Re: Any ideas why a few tasks would stall

2014-12-05 Thread Andrew Or
not work for me. that is, rdd.coalesce(200, forceShuffle) . Does anyone have ideas on how to distribute your data evenly and co-locate partitions of interest? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Any-ideas-why-a-few-tasks-would-stall

Re: Any ideas why a few tasks would stall

2014-12-04 Thread Ankit Soni
I ran into something similar before. 19/20 partitions would complete very quickly, and 1 would take the bulk of time and shuffle reads writes. This was because the majority of partitions were empty, and 1 had all the data. Perhaps something similar is going on here - I would suggest taking a

Re: Any ideas why a few tasks would stall

2014-12-04 Thread Sameer Farooqui
Good point, Ankit. Steve - You can click on the link for '27' in the first column to get a break down of how much data is in each of those 116 cached partitions. But really, you want to also understand how much data is in the 4 non-cached partitions, as they may be huge. One thing you can try

Re: Any ideas why a few tasks would stall

2014-12-04 Thread Steve Lewis
Thanks - I found the same thing - calling boolean forceShuffle = true; myRDD = myRDD.coalesce(120,forceShuffle ); worked - there were 120 partitions but forcing a shuffle distributes the work I believe there is a bug in my code causing memory to accumulate as partitions grow in

Re: Any ideas why a few tasks would stall

2014-12-04 Thread akhandeshi
This did not work for me. that is, rdd.coalesce(200, forceShuffle) . Does anyone have ideas on how to distribute your data evenly and co-locate partitions of interest? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Any-ideas-why-a-few-tasks-would-stall

Re: Any ideas why a few tasks would stall

2014-12-02 Thread Sameer Farooqui
Have you tried taking thread dumps via the UI? There is a link to do so on the Executors' page (typically under http://driver IP:4040/exectuors. By visualizing the thread call stack of the executors with slow running tasks, you can see exactly what code is executing at an instant in time. If you

Re: Any ideas why a few tasks would stall

2014-12-02 Thread Steve Lewis
1) I can go there but none of the links are clickable 2) when I see something like 116/120 partitions succeeded in the stages ui in the storage ui I see NOTE RDD 27 has 116 partitions cached - 4 not and those are exactly the number of machines which will not complete Also RDD 27 does not show up