Re: spark driver with OOM due to org.apache.spark.status.ElementTrackingStore

2022-08-04 Thread Qian SUN
I have no other advice for this. Does the situation improve after the parameter configuration?

On Fri, Aug 5, 2022 at 06:55, Jason Jun wrote:
> Hi Qian,
>
> Thanks for your feedback. We're using Spark version 3.1.2, and these are set:
>
> spark.ui.retainedJobs 10
> spark.ui.retainedStages 10
> spark.ui.retainedTasks 100
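For readers who want to reproduce the settings quoted above, here is a minimal sketch that renders them as spark-submit arguments. The three keys and values come from the thread; the helper name `to_submit_flags` is hypothetical, and passing the settings at submit time is an assumption about how the job is launched.

```python
# Values quoted in the thread; the helper below is a hypothetical convenience.
UI_RETENTION_CONF = {
    "spark.ui.retainedJobs": "10",
    "spark.ui.retainedStages": "10",
    "spark.ui.retainedTasks": "100",
}


def to_submit_flags(conf):
    """Render a config mapping as spark-submit `--conf key=value` arguments."""
    flags = []
    for key, value in conf.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_flags(UI_RETENTION_CONF)))
```

The same mapping could also be written to spark-defaults.conf, one key and value per line, if that is how the cluster is configured.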

Re: structured streaming join of streaming dataframe with static dataframe performance

2022-08-04 Thread Koert Kuipers
That's a good point about skewness and potential join optimizations. I will try turning off all skew optimizations and forcing a sort-merge join, and see if it then reuses shuffle files on the static side. Unfortunately, my static side is too large to broadcast. The streaming side can be broadcast
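A rough sketch of what such an experiment might look like follows. The message does not say which settings are meant, so the three keys below are an assumption about what "turning off skew optimizations and forcing a sort-merge join" could map to; `apply_conf` is a hypothetical helper that only relies on the session exposing `conf.set(key, value)`, as a SparkSession does.

```python
# Assumed settings for the experiment; the thread does not name them.
SORT_MERGE_CONF = {
    # Disable adaptive skew-join handling.
    "spark.sql.adaptive.skewJoin.enabled": "false",
    # -1 disables automatic broadcast joins.
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    # Prefer sort-merge join over shuffled hash join.
    "spark.sql.join.preferSortMergeJoin": "true",
}


def apply_conf(session, conf):
    """Apply each setting through session.conf.set and return the session."""
    for key, value in conf.items():
        session.conf.set(key, value)
    return session
```

With a real session this would be called as `apply_conf(spark, SORT_MERGE_CONF)` before the join is defined. In Spark 3.0 and later, a `merge` join hint on one side of the join (for example `static_df.hint("merge")`, where `static_df` is a placeholder name) is another way to request a sort-merge join. Neither approach is confirmed here to make Spark reuse shuffle files on the static side.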

Re: structured streaming join of streaming dataframe with static dataframe performance

2022-08-04 Thread kant kodali
I suspect it is because the incoming rows, when joined with the static frame, can lead to a variable degree of skew over time; if so, it is probably better to employ different join strategies at run time. But if you know your Dataset, I believe you can just do a broadcast join for your