Is it possible to jstack the executors and see where they are hanging?
On Thu, Mar 26, 2015 at 2:02 PM, Jon Chase jon.ch...@gmail.com wrote:
Spark 1.3.0 on YARN (Amazon EMR), cluster of 10 m3.2xlarge (8cpu, 30GB),
executor memory 20GB, driver memory 10GB
I'm using Spark SQL, mainly via spark-shell, to query 15GB of data spread
out over roughly 2,000 Parquet files and my queries frequently hang. Simple
queries like select count(*) from ... on the entire data set work ok.
Slightly more demanding ones with group by's and some aggregate functions
(percentile_approx, avg, etc.) work ok as well, as long as I have some
criteria in my where clause to keep the number of rows down.
Once I hit some limit on query complexity and rows processed, my queries
start to hang. I've left them for up to an hour without seeing any
progress. No OOM's either - the job is just stuck.
I've tried setting spark.sql.shuffle.partitions to 400 and even 800, but
with the same results: usually near the end of the tasks (like 780 of 800
complete), progress just stops:
15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 788.0 in
stage 1.0 (TID 1618) in 800 ms on
ip-10-209-22-211.eu-west-1.compute.internal (748/800)
15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 793.0 in
stage 1.0 (TID 1623) in 622 ms on
ip-10-105-12-41.eu-west-1.compute.internal (749/800)
15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 797.0 in
stage 1.0 (TID 1627) in 616 ms on ip-10-90-2-201.eu-west-1.compute.internal
(750/800)
15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 799.0 in
stage 1.0 (TID 1629) in 611 ms on ip-10-90-2-201.eu-west-1.compute.internal
(751/800)
15/03/26 20:53:29 INFO scheduler.TaskSetManager: Finished task 795.0 in
stage 1.0 (TID 1625) in 669 ms on
ip-10-105-12-41.eu-west-1.compute.internal (752/800)
^^^ this is where it stays forever
Looking at the Spark UI, several of the executors still list active
tasks. I do see that the Shuffle Read for executors that don't have any
tasks remaining is around 100MB, whereas it's more like 10MB for the
executors that still have tasks.
The first stage, mapPartitions, always completes fine. It's the second
stage (takeOrdered), that hangs.
I've had this issue in 1.2.0 and 1.2.1 as well as 1.3.0. I've also
encountered it when using JSON files (instead of Parquet).
Thoughts? I'm blocked on using Spark SQL b/c most of the queries I do are
having this issue.