Re: Heavy Stage Concentration - Ends With Failure

2016-07-19 Thread Andrew Ehrlich
Yes, this is a good suggestion; also check the 25th percentile, median, and 75th percentile to see how skewed the input data is. If you find that the RDD’s partitions are skewed, you can solve it either by changing the partitioner when you read the files, as already suggested, or by calling repartition()
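
A rough sketch of the percentile check described above, in plain Python so it runs without a cluster. The function name `partition_skew_summary` and the sample counts are made up for illustration. It assumes the per-partition record counts have already been collected, for example in PySpark with `rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()`.

```python
from statistics import quantiles


def partition_skew_summary(counts):
    """Summarize per-partition record counts: quartiles plus a max/median ratio.

    `counts` is a list with one record count per partition.
    """
    if len(counts) < 2:
        raise ValueError("need at least two partitions to compute quartiles")
    # n=4 yields the three quartile cut points: 25th, 50th, 75th percentile.
    p25, median, p75 = quantiles(counts, n=4, method="inclusive")
    return {
        "min": min(counts),
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": max(counts),
        # A ratio far above 1 means one partition holds far more than a typical one.
        "max_over_median": max(counts) / median if median else float("inf"),
    }


# Hypothetical counts, not taken from the thread. In PySpark they could come from:
#   counts = rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
example_counts = [100, 110, 90, 105, 95, 5000]
summary = partition_skew_summary(example_counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

If the quartiles sit close together but the maximum is many times the median, a handful of partitions carry most of the records, which would be consistent with a few long-running tasks at the end of a stage. Note that repartition() spreads records evenly, but a key-based shuffle later in the job can reintroduce the skew if one key dominates.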

Re: Heavy Stage Concentration - Ends With Failure

2016-07-19 Thread Kuchekar
Hi, can you check whether the RDD is partitioned correctly, with the correct number of partitions (if you are setting the partition value manually)? Try using a hash partitioner while reading the files. One way you can debug is by checking the number of records that executor has compared to others in the

Heavy Stage Concentration - Ends With Failure

2016-07-19 Thread Aaron Jackson
Hi, I have a cluster with 15 nodes, of which 5 are HDFS nodes. I kick off a job that creates some 120 stages. Eventually, the active and pending stages reduce down to a small bottleneck and it never fails... the tasks associated with the 10 (or so) running tasks are always allocated to the same