From: Aaron Jackson [mailto:ajack...@pobox.com] 
Sent: Tuesday, July 19, 2016 7:17 PM
To: user <user@spark.apache.org>
Subject: Heavy Stage Concentration - Ends With Failure

 

Hi,

 

I have a cluster with 15 nodes, of which 5 are HDFS nodes.  I kick off a job 
that creates some 120 stages.  Eventually, the active and pending stages reduce 
down to a small bottleneck, and it never fails: the tasks associated with the 
10 (or so) running stages are always allocated to the same executor on the same 
host.
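
To make the shape of the job concrete, it is essentially a read-from-HDFS-then-shuffle 
pipeline; a minimal sketch is below (the paths, keys, and aggregation are placeholders, 
not the actual code):

  import org.apache.spark.{SparkConf, SparkContext}

  object JobSketch {
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf().setAppName("job-sketch")
      val sc = new SparkContext(conf)

      // Input sits on the 5 HDFS nodes; the other 10 nodes are compute-only.
      val lines = sc.textFile("hdfs:///placeholder/input")

      // A wide transformation that forces a shuffle; the real job chains many
      // of these, which is where the ~120 stages come from.
      val counts = lines
        .map(line => (line.split(",")(0), 1L))
        .reduceByKey(_ + _)

      counts.saveAsTextFile("hdfs:///placeholder/output")
      sc.stop()
    }
  }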

 

Sooner or later, that executor runs out of memory (or some other resource), falls 
over, and the tasks are then reallocated to another executor.

 

Why do we see such a heavy concentration of tasks on a single executor when 
other executors are free?  Were the tasks assigned to an executor when the job 
was decomposed into stages?
