Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Yang Jie
hmm... I guess this is meant to cc @Bingkun Pan? On 2024/03/05 02:16:12 Hyukjin Kwon wrote: > Is this related to https://github.com/apache/spark/pull/42428? > > cc @Yang,Jie(INF) > > On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim > wrote: > > > Shall we revisit this functionality? The API doc

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread yangjie01
That sounds like a great suggestion. From: Jungtaek Lim Date: Tuesday, March 5, 2024, 10:46 To: Hyukjin Kwon Cc: yangjie01, Dongjoon Hyun, dev, user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Yes, it's relevant to that PR. I wonder, if we want to expose a version switcher, it should be in

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Jungtaek Lim
Yes, it's relevant to that PR. I wonder: if we want to expose a version switcher, it should be in the versionless doc (spark-website) rather than in a doc pinned to a specific version. On Tue, Mar 5, 2024 at 11:18 AM Hyukjin Kwon wrote: > Is this related to

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Hyukjin Kwon
Is this related to https://github.com/apache/spark/pull/42428? cc @Yang,Jie(INF) On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim wrote: > Shall we revisit this functionality? The API doc is built with individual > versions, and for each individual version we depend on other released > versions.

Re: When a Spark job shows FetchFailedException it creates a few duplicate records and a few records also go missing, please explain why

2024-03-04 Thread Prem Sahoo
Thanks Jason for the detailed information and the bug associated with it. Hopefully someone provides more information about this pressing issue. On Mon, Mar 4, 2024 at 1:26 PM Jason Xu wrote: > Hi Prem, > > From the symptoms of shuffle fetch failure plus a few duplicated and a few > missing records, I think

Re: When a Spark job shows FetchFailedException it creates a few duplicate records and a few records also go missing, please explain why

2024-03-04 Thread Jason Xu
Hi Prem, From the symptoms of shuffle fetch failure plus a few duplicated and a few missing records, I think you might be running into this correctness bug: https://issues.apache.org/jira/browse/SPARK-38388. Node/shuffle failure is hard to avoid; I wonder if you have non-deterministic logic and are calling
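
A minimal sketch of the pattern being hinted at here, assuming the non-deterministic logic is something like rand() feeding a round-robin repartition; the object name, column names, and output paths are illustrative only, not taken from the actual job:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

object RepartitionRetrySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repartition-retry-sketch").getOrCreate()
    import spark.implicits._

    val df = spark.range(0, 1000000).toDF("id")

    // Risky pattern: a non-deterministic column computed before a round-robin
    // repartition. If a FetchFailedException forces a stage retry, rand() is
    // re-evaluated and rows can land in different partitions than in the first
    // attempt, so recomputed partitions no longer match the surviving ones:
    // some rows appear twice, others disappear (see SPARK-38388).
    val risky = df.withColumn("salt", rand()).repartition(200)
    risky.write.mode("overwrite").parquet("/tmp/risky_output")

    // Safer: make the partitioning a deterministic function of existing columns,
    // so every retry reproduces exactly the same row-to-partition mapping.
    val deterministic = df.repartition(200, $"id")
    deterministic.write.mode("overwrite").parquet("/tmp/deterministic_output")

    spark.stop()
  }
}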

Re: When a Spark job shows FetchFailedException it creates a few duplicate records and a few records also go missing, please explain why

2024-03-04 Thread Prem Sahoo
super :( On Mon, Mar 4, 2024 at 6:19 AM Mich Talebzadeh wrote: > "... in a nutshell, if FetchFailedException occurs due to a data node reboot > then it can create duplicate / missing data, so this is more of a > hardware (env) issue rather than a Spark issue." > > As an overall conclusion your

Re: When a Spark job shows FetchFailedException it creates a few duplicate records and a few records also go missing, please explain why

2024-03-04 Thread Mich Talebzadeh
"... in a nutshell if fetchFailedException occurs due to data node reboot then it can create duplicate / missing data . so this is more of hardware(env issue ) rather than spark issue ." As an overall conclusion your point is correct but again the answer is not binary. Spark core relies on