Re: [VOTE] Standardize Spark Exception Messages SPIP

2020-11-08 Thread Takeshi Yamamuro
+1

On Thu, Nov 5, 2020 at 3:41 AM Xinyi Yu  wrote:

> Hi all,
>
> We discussed the SPIP: Standardize Spark Exception Messages at
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-SPIP-Standardize-Spark-Exception-Messages-td30341.html
>
> The SPIP document is at
>
> https://docs.google.com/document/d/1XGj1o3xAFh8BA7RCn3DtwIPC6--hIFOaNUNSlpaOIZs/edit?usp=sharing
>
> We would like to hold a vote on this for 72 hours.
>
> Please vote before November 7th at noon:
>
> [ ] +1: Accept this SPIP proposal
> [ ] -1: Do not agree to standardize Spark exception messages, because ...
>
>
> Thanks for your time and feedback!
>
> --
> Xinyi
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
---
Takeshi Yamamuro


Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-08 Thread Jungtaek Lim
After the check logic was introduced in Spark 3.0, I addressed another
related issue in Spark 3.1: SPARK-24634 [1].

Before SPARK-24634, there was no way to know how many rows were discarded
for being late, or even whether there were any late rows at all. In other
words, this correctness issue has been "silently" impacting the result.
SPARK-24634 will provide the overall number of late rows in the streaming
listener, as well as the number of late rows "per operator" in the SQL UI
graph, so end users are no longer impacted "blindly".

Even so, I agree that it's pretty hard to construct a query that does
chained stateful operations and still avoids correctness issues. I've seen
two separate JIRA issues reporting the same correctness behavior, meaning
this is already impacting end users' queries. (Many more end users may not
even notice the impact, as SPARK-24634 hasn't been released yet.)
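To make the failure mode concrete, here is a toy Python model, not Spark's implementation. It assumes a single global watermark equal to the maximum event time seen so far minus a fixed delay, computed at the end of each batch and applied in the next one. A tumbling-window count in append mode feeds a second stateful operator that uses the window start as its event time. `WINDOW`, `DELAY`, `WindowCount`, `DownstreamSum`, and `run` are hypothetical names. Because the first operator only emits a window after the watermark has passed the window's end, the window start it emits is already behind the watermark, so the downstream operator drops it as late.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

WINDOW = 10  # hypothetical tumbling-window size (seconds)
DELAY = 5    # hypothetical watermark delay (seconds)


@dataclass
class WindowCount:
    """First stateful operator: tumbling-window count, emitting in append mode."""
    state: Dict[int, int] = field(default_factory=lambda: defaultdict(int))
    late_rows: int = 0

    def process(self, event_times: List[int], watermark: float) -> List[Tuple[int, int]]:
        for t in event_times:
            if t < watermark:
                self.late_rows += 1  # row is behind the watermark: dropped
                continue
            self.state[(t // WINDOW) * WINDOW] += 1
        # Append mode: a window is finalized once the watermark passes its end.
        emitted = []
        for start in sorted(self.state):
            if start + WINDOW <= watermark:
                emitted.append((start, self.state.pop(start)))
        return emitted


@dataclass
class DownstreamSum:
    """Second stateful operator: uses the window start as its event time."""
    total: int = 0
    late_rows: int = 0

    def process(self, rows: List[Tuple[int, int]], watermark: float) -> None:
        for start, count in rows:
            if start < watermark:
                self.late_rows += 1  # judged late against the same global watermark
                continue
            self.total += count


def run(batches: List[List[int]]) -> Tuple[WindowCount, DownstreamSum]:
    first, second = WindowCount(), DownstreamSum()
    max_seen = float("-inf")
    watermark = float("-inf")  # no watermark before the first batch
    for batch in batches:
        # Both operators see the watermark computed at the end of the previous batch.
        second.process(first.process(batch, watermark), watermark)
        if batch:
            max_seen = max(max_seen, max(batch))
        watermark = max_seen - DELAY
    return first, second
```

With `run([[1, 3, 7], [12, 14], [25], []])`, windows [0, 10) and [10, 20) complete with five events between them, yet the downstream total stays at 0 and its late-row counter reads 2, while the upstream operator drops nothing. A per-operator late-row count is what makes this kind of silent loss visible. The final empty batch only serves to let the advanced watermark trigger emission.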

So overall I'm +1 on preventing such queries up front. This change may
break some user queries, but I suspect those queries already suffer from
correctness issues without their users noticing.
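The check proposed in this thread could be sketched roughly as follows, again in Python rather than Spark's Scala. `PlanNode`, `has_chained_stateful`, `check_streaming_plan`, and the `check_enabled` flag are hypothetical stand-ins for the real logical plan, checker, and SQL config in the PR. The rule is simplified to "a stateful operator that consumes, directly or transitively, the output of another stateful operator"; Spark's actual rule may take more into account (for example the output mode).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical, simplified logical-plan node."""
    name: str
    stateful: bool = False
    children: List["PlanNode"] = field(default_factory=list)


class UnsupportedChainedStatefulQuery(Exception):
    """Raised when a plan chains stateful operators and the check is enabled."""


def has_stateful_descendant(node: PlanNode) -> bool:
    return any(c.stateful or has_stateful_descendant(c) for c in node.children)


def has_chained_stateful(node: PlanNode) -> bool:
    """True if some stateful operator consumes another stateful operator's output."""
    if node.stateful and has_stateful_descendant(node):
        return True
    return any(has_chained_stateful(c) for c in node.children)


def check_streaming_plan(plan: PlanNode, check_enabled: bool = True) -> None:
    if check_enabled and has_chained_stateful(plan):
        raise UnsupportedChainedStatefulQuery(
            "Chained stateful operators may produce incorrect results under the "
            "global watermark; disable the check to run this query anyway."
        )


# Usage: a windowed aggregation followed by a second aggregation is rejected.
source = PlanNode("source")
agg1 = PlanNode("window_agg", stateful=True, children=[source])
agg2 = PlanNode("agg_on_window", stateful=True, children=[agg1])
check_streaming_plan(agg1)                       # one stateful operator: accepted
check_streaming_plan(agg2, check_enabled=False)  # chained, but check disabled
```

Two independent stateful operators under a union pass the check, since neither consumes the other's output; passing `check_enabled=False` mirrors the opt-out through a SQL config.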

Certainly, a better approach would be to drop the global watermark and
implement a proper per-operator watermark. This proposal is just a
workaround, but fixing the watermark would require major effort.

Thanks,
Jungtaek Lim (HeartSaVioR)

1. https://issues.apache.org/jira/browse/SPARK-24634


On Sat, Nov 7, 2020 at 3:59 PM Liang-Chi Hsieh  wrote:

> Hi devs,
>
> In Spark structured streaming, chained stateful operators may produce
> incorrect results under the global watermark. SPARK-33259
> (https://issues.apache.org/jira/browse/SPARK-33259) has an example
> demonstrating what the correctness issue could be.
>
> Currently we don't prevent users from running such queries. Because the
> possible correctness issue in chained stateful operators is not obvious to
> users, from the users' perspective it may well be considered a Spark bug,
> as in SPARK-33259. In the worse case, users are not even aware of the
> correctness issue and use wrong results.
>
> IMO, it is better to disable such queries and let users choose to run the
> query if they understand the risk, instead of implicitly running the query
> and letting users find out about the correctness issue by themselves.
>
> I would like to propose disabling streaming queries with a possible
> correctness issue in chained stateful operators. The behavior can be
> controlled by a SQL config, so if users understand the risk and still want
> to run the query, they can disable the check.
>
> In the PR (https://github.com/apache/spark/pull/30210), the concern raised
> so far is that this changes the current behavior, and by default it will
> break some existing streaming queries. But I think it is pretty easy to
> disable the check with the new config. There is currently no objection in
> the PR, only a suggestion to hear more voices. Please let me know if you
> have any thoughts.
>
> Thanks.
> Liang-Chi Hsieh
>