Spark 1.3.0 on YARN (Amazon EMR), cluster of 10 m3.2xlarge (8cpu, 30GB), executor memory 20GB, driver memory 10GB
I'm using Spark SQL, mainly via spark-shell, to query 15GB of data spread out over roughly 2,000 Parquet files and my queries frequently hang. Simple queries like "select count(*) from ..." on the entire data set work ok. Slightly more demanding ones with group by's and some aggregate functions (percentile_approx, avg, etc.) work ok as well, as long as I have some criteria in my where clause to keep the number of rows down. Once I hit some limit on query complexity and rows processed, my queries start to hang. I've left them for up to an hour without seeing any progress. No OOM's either - the job is just stuck. I've tried setting spark.sql.shuffle.partitions to 400 and even 800, but with the same results: usually near the end of the tasks (like 780 of 800 complete), progress just stops: 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 788.0 in stage 1.0 (TID 1618) in 800 ms on ip-10-209-22-211.eu-west-1.compute.internal (748/800) 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 793.0 in stage 1.0 (TID 1623) in 622 ms on ip-10-105-12-41.eu-west-1.compute.internal (749/800) 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 797.0 in stage 1.0 (TID 1627) in 616 ms on ip-10-90-2-201.eu-west-1.compute.internal (750/800) 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 799.0 in stage 1.0 (TID 1629) in 611 ms on ip-10-90-2-201.eu-west-1.compute.internal (751/800) 15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 795.0 in stage 1.0 (TID 1625) in 669 ms on ip-10-105-12-41.eu-west-1.compute.internal (752/800) ^^^^^^^ this is where it stays forever Looking at the Spark UI, several of the executors still list active tasks. I do see that the Shuffle Read for executors that don't have any tasks remaining is around 100MB, whereas it's more like 10MB for the executors that still have tasks. The first stage, mapPartitions, always completes fine. It's the second stage (takeOrdered), that hangs. I've had this issue in 1.2.0 and 1.2.1 as well as 1.3.0. I've also encountered it when using JSON files (instead of Parquet). Thoughts? I'm blocked on using Spark SQL b/c most of the queries I do are having this issue.