not work for me. that is, rdd.coalesce(200, forceShuffle) . Does
anyone have ideas on how to distribute your data evenly and co-locate
partitions of interest?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Any-ideas-why-a-few-tasks-would-stall
I ran into something similar before. 19/20 partitions would complete very
quickly, and 1 would take the bulk of time and shuffle reads writes. This was
because the majority of partitions were empty, and 1 had all the data. Perhaps
something similar is going on here - I would suggest taking a
Good point, Ankit.
Steve - You can click on the link for '27' in the first column to get a
break down of how much data is in each of those 116 cached partitions. But
really, you want to also understand how much data is in the 4 non-cached
partitions, as they may be huge. One thing you can try
Thanks - I found the same thing -
calling
boolean forceShuffle = true;
myRDD = myRDD.coalesce(120,forceShuffle );
worked - there were 120 partitions but forcing a shuffle distributes the
work
I believe there is a bug in my code causing memory to accumulate as
partitions grow in
This did not work for me. that is, rdd.coalesce(200, forceShuffle) . Does
anyone have ideas on how to distribute your data evenly and co-locate
partitions of interest?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Any-ideas-why-a-few-tasks-would-stall
Have you tried taking thread dumps via the UI? There is a link to do so on
the Executors' page (typically under http://driver IP:4040/exectuors.
By visualizing the thread call stack of the executors with slow running
tasks, you can see exactly what code is executing at an instant in time. If
you
1) I can go there but none of the links are clickable
2) when I see something like 116/120 partitions succeeded in the stages ui
in the storage ui I see
NOTE RDD 27 has 116 partitions cached - 4 not and those are exactly the
number of machines which will not complete
Also RDD 27 does not show up