Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-11 Thread Liang-Chi Hsieh
Thanks all for the responses! Based on these responses, I think we can go forward with the PR. I will put the new config in the migration guide. Please help review the PR if you have more comments. Thank you! Yuanjian Li wrote > Already +1 in the PR. It would be great to mention the new

Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-11 Thread Yuanjian Li
Already +1 in the PR. It would be great to mention the new config in the SS migration guide. On Wed, Nov 11, 2020 at 7:48 AM Ryan Blue wrote: > +1, I agree with Tom. > > On Tue, Nov 10, 2020 at 3:00 PM Dongjoon Hyun > wrote: > >> +1 for Apache Spark 3.1.0. >> >> Bests, >> Dongjoon. >> >> On Tue, Nov 10,

Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-10 Thread Ryan Blue
+1, I agree with Tom. On Tue, Nov 10, 2020 at 3:00 PM Dongjoon Hyun wrote: > +1 for Apache Spark 3.1.0. > > Bests, > Dongjoon. > > On Tue, Nov 10, 2020 at 6:17 AM Tom Graves > wrote: > >> +1 since it's a correctness issue, I think it's OK to change the behavior >> to make sure the user is aware

Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-10 Thread Dongjoon Hyun
+1 for Apache Spark 3.1.0. Bests, Dongjoon. On Tue, Nov 10, 2020 at 6:17 AM Tom Graves wrote: > +1 since it's a correctness issue, I think it's OK to change the behavior to > make sure the user is aware of it and let them decide. > > Tom > > On Saturday, November 7, 2020, 01:00:11 AM CST,

Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-10 Thread Tom Graves
+1 since it's a correctness issue, I think it's OK to change the behavior to make sure the user is aware of it and let them decide. Tom On Saturday, November 7, 2020, 01:00:11 AM CST, Liang-Chi Hsieh wrote: Hi devs, In Spark structured streaming, chained stateful operators possibly

Re: [DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-08 Thread Jungtaek Lim
After the check logic was introduced in Spark 3.0, there's another related issue I addressed in Spark 3.1, SPARK-24634 [1]. Before SPARK-24634, there was no way to know how many rows were discarded for being late, or even whether there were any late rows at all. That said, the issue has been the

[DISCUSS] Disable streaming query with possible correctness issue by default

2020-11-06 Thread Liang-Chi Hsieh
Hi devs, In Spark Structured Streaming, chained stateful operators can produce incorrect results under the global watermark. SPARK-33259 (https://issues.apache.org/jira/browse/SPARK-33259) has an example demonstrating what the correctness issue could be. Currently we don't prevent users