Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Imran Rashid
Hi Michael,

judging from the logs, it seems that those tasks are just working a really
long time.  If you have long running tasks, then you wouldn't expect the
driver to output anything while those tasks are working.
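
(A rough check of the numbers in the logs quoted below supports that reading: the last two tasks of stage 3 finish at 23:31:34 with reported durations of 7208259 ms and 7208503 ms, i.e. about 7,208 s ≈ 2 hours each, so they must have started around 21:31. The driver's last burst of scheduling output is at 21:45:28, so the quiet window from 21:45:28 to 23:31:34 falls entirely inside the lifetime of those two still-running tasks.)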

What is unusual is that there is no activity during all that time the tasks
are executing.  Are you sure you are looking at the activity of the
executors (the nodes that are actually running the tasks), and not the
activity of the driver node (the node where your main program lives, but
that doesn't do any of the distributed computation)?  It would be perfectly
normal for the driver node to be idle while all the executors were busy
with long running tasks.

I would look at:
(a) the cpu usage etc. of the executor nodes during those long running tasks
(b) the thread dumps of the executors during those long running tasks
(available via the UI under the Executors tab, or just log into the boxes
and run jstack; a rough sketch of scripting that over SSH follows after this
list).  Ideally this will point out a hotspot in your code that
is making these tasks take so long.  (Or perhaps it'll point out what is
going on in spark internals that is so slow)
(c) the summary metrics for the long running stage, when it finally
finishes (also available in the UI, under the Stages tab).  You will get
a breakdown of how much time is spent in various phases of the tasks, how
much data is read, etc., which can help you figure out why tasks are slow
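
For (b), here is a minimal sketch, not from this thread, of pulling those thread dumps without the UI. It assumes passwordless SSH from the node you run it on to the worker nodes, that the JDK's jps and jstack tools are on their PATH, and that you replace the placeholder hostnames with your own executor nodes:

import scala.sys.process._

// Rough sketch only: fetch a thread dump from each executor host over SSH.
object ExecutorThreadDumps {
  // Placeholder hostnames; substitute your executor nodes.
  val executorHosts = Seq(
    "ip-10-171-0-124.ec2.internal",
    "ip-10-171-0-128.ec2.internal")

  // On YARN the executor JVM runs CoarseGrainedExecutorBackend, so jps can find its pid.
  val remoteCmd =
    "for pid in $(jps | awk '/CoarseGrainedExecutorBackend/ {print $1}'); do jstack $pid; done"

  def main(args: Array[String]): Unit = {
    for (host <- executorHosts) {
      println("===== thread dumps from " + host + " =====")
      // Runs jstack remotely and prints the dump; throws if ssh exits non-zero.
      println(Seq("ssh", host, remoteCmd).!!)
    }
  }
}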


Hopefully this will help you find out what is taking so long.  If you find
 out the executors really aren't doing anything during these really long
tasks, it would be great to find that out, and maybe get some more info for
a bug report.

Imran


On Tue, Feb 3, 2015 at 6:18 PM, Michael Albert 
m_albert...@yahoo.com.invalid wrote:

 Greetings!

 First, my sincere thanks to all who have given me advice.
 Following previous discussion, I've rearranged my code to try to keep the
 partitions to more manageable sizes.
 Thanks to all who commented.
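
 One common way to bound partition sizes, sketched generically here (the actual Transposer code isn't shown in this thread, so the names, input path, and partition count below are placeholders):

 import org.apache.spark.{SparkConf, SparkContext}

 // Generic sketch only: repartition() forces a full shuffle into a fixed number
 // of pieces, so each partition holds a bounded slice of the input.
 object PartitionSizing {
   def main(args: Array[String]): Unit = {
     val sc = new SparkContext(new SparkConf().setAppName("partition-sizing"))
     val raw = sc.textFile("hdfs:///tmp/some-input")   // placeholder input path
     val evenlySized = raw.repartition(720)            // placeholder count; tune to the data size
     println("partitions: " + evenlySized.partitions.length)
     sc.stop()
   }
 }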

 At the moment, the input set I'm trying to work with is about 90GB (avro
 parquet format).

 When I run on a reasonable chunk of the data (say half) things work
 reasonably.

 On the full data, the spark process stalls.
 That is, for about 1.5 hours out of a 3.5 hour run, I see no activity.
 No cpu usage, no error message, no network activity.
 It just seems to sit there.
 The messages bracketing the stall are shown below.

 Any advice on how to diagnose this?
 I don't get any error messages.
 The spark UI says that it is running a stage, but it makes no discernible
 progress.
 Ganglia shows no CPU usage or network activity.
 When I shell into the worker nodes there are no filled disks or other
 obvious problems.

 How can I discern what Spark is waiting for?

 The only weird thing seen, other than the stall, is that the yarn logs on
 the workers have lines with messages like this:
 2015-02-03 22:59:58,890 INFO
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
 (Container Monitor): Memory usage of ProcessTree 13158 for container-id
 container_1422834185427_0083_01_21: 7.1 GB of 8.5 GB physical memory
 used; 7.6 GB of 42.5 GB virtual memory used

 It's rather strange that it mentions 42.5 GB of virtual memory.  The
 machines are EMR machines with 32 GB of physical memory and, as far as I
 can determine, no swap space.
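
 (A plausible, but unconfirmed, reading of that 42.5 GB figure: the YARN NodeManager caps a container's virtual memory at its physical allocation times yarn.nodemanager.vmem-pmem-ratio, and 8.5 GB x 5.0 = 42.5 GB, which would mean the ratio is configured as 5 on this cluster rather than the stock default of 2.1. On that reading it is a per-container limit, not a statement about the machine's 32 GB of RAM or its swap.)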

 The messages bracketing the stall are shown below.


 Any advice is welcome.

 Thanks!

 Sincerely,
  Mike Albert

 Before the stall.
 15/02/03 21:45:28 INFO cluster.YarnClientClusterScheduler: Removed TaskSet
 5.0, whose tasks have all completed, from pool
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: Stage 5
 (mapPartitionsWithIndex at Transposer.scala:147) finished in 4880.317 s
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: looking for newly runnable
 stages
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: running: Set(Stage 3)
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: waiting: Set(Stage 6, Stage
 7, Stage 8)
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: failed: Set()
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage
 6: List(Stage 3)
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage
 7: List(Stage 6)
 15/02/03 21:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage
 8: List(Stage 7)
 At this point, I see no activity for 1.5 hours except for this (XXX for
 I.P. address)
 15/02/03 22:13:24 INFO util.AkkaUtils: Connecting to ExecutorActor:
 akka.tcp://sparkExecutor@ip-XXX.ec2.internal:36301/user/ExecutorActor

 Then finally it started again:
 15/02/03 23:31:34 INFO scheduler.TaskSetManager: Finished task 1.0 in
 stage 3.0 (TID 7301) in 7208259 ms on ip-10-171-0-124.ec2.internal (3/4)
 15/02/03 23:31:34 INFO scheduler.TaskSetManager: Finished task 0.0 in
 stage 3.0 (TID 7300) in 7208503 ms on ip-10-171-0-128.ec2.internal (4/4)
 15/02/03 23:31:34 INFO scheduler.DAGScheduler: Stage 3 (mapPartitions at
 Transposer.scala:211) finished in 7209.534 s






Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Sandy Ryza
Also, do you see any lines in the YARN NodeManager logs where it says that
it's killing a container?

-Sandy


Re: advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-04 Thread Michael Albert
Greetings!
Thanks to all who have taken the time to look at this.
While the process is stalled, I see, in the yarn log on the head node, 
repeating messages of the form "Trying to fulfill reservation for application 
XXX on node YYY", but that node is reserved by XXX_01.  Below is a chunk 
of the log.
Any suggestions as to what I can investigate?
Thanks!
-Mike
2015-02-04 18:18:28,617 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1422834185427_0088 on node: ip-10-171-0-124.ec2.internal:9103
2015-02-04 18:18:28,617 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): assignContainers: node=ip-10-171-0-124.ec2.internal application=88 priority=1 request={Priority: 1, Capability: memory:8704, vCores:1, # Containers: 17, Labels: , Location: *, Relax Locality: true} type=OFF_SWITCH
2015-02-04 18:18:28,617 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt (ResourceManager Event Processor): Application application_1422834185427_0088 reserved container container_1422834185427_0088_01_25 on node host: ip-10-171-0-124.ec2.internal:9103 #containers=2 available=5632 used=17408, currently has 6 at priority 1; currentReservation 52224
2015-02-04 18:18:28,618 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container  application=application_1422834185427_0088 resource=memory:8704, vCores:1 queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:226304, vCores:26 usedCapacity=0.982, absoluteUsedCapacity=0.982, numApps=1, numContainers=26 usedCapacity=0.982 absoluteUsedCapacity=0.982 used=memory:226304, vCores:26 cluster=memory:230400, vCores:160
2015-02-04 18:18:28,618 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-171-0-124.ec2.internal:9103 is reserved by application appattempt_1422834185427_0088_01
2015-02-04 18:18:28,623 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): default usedResources: memory:226304, vCores:26 clusterResources: memory:230400, vCores:160 currentCapacity 0.982 required memory:8704, vCores:1 potentialNewCapacity: 1.02 (  max-capacity: 1.0)
2015-02-04 18:18:28,625 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1422834185427_0088 on node: ip-10-171-0-119.ec2.internal:9103
2015-02-04 18:18:28,625 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): assignContainers: node=ip-10-171-0-119.ec2.internal application=88 priority=1 request={Priority: 1, Capability: memory:8704, vCores:1, # Containers: 17, Labels: , Location: *, Relax Locality: true} type=OFF_SWITCH
2015-02-04 18:18:28,626 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt (ResourceManager Event Processor): Application application_1422834185427_0088 reserved container container_1422834185427_0088_01_26 on node host: ip-10-171-0-119.ec2.internal:9103 #containers=2 available=5632 used=17408, currently has 6 at priority 1; currentReservation 52224
2015-02-04 18:18:28,626 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): Reserved container  application=application_1422834185427_0088 resource=memory:8704, vCores:1 queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=memory:226304, vCores:26 usedCapacity=0.982, absoluteUsedCapacity=0.982, numApps=1, numContainers=26 usedCapacity=0.982 absoluteUsedCapacity=0.982 used=memory:226304, vCores:26 cluster=memory:230400, vCores:160
2015-02-04 18:18:28,626 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Skipping scheduling since node ip-10-171-0-119.ec2.internal:9103 is reserved by application appattempt_1422834185427_0088_01
2015-02-04 18:18:28,636 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (ResourceManager Event Processor): Trying to fulfill reservation for application application_1422834185427_0088 on node: ip-10-171-0-129.ec2.internal:9103
2015-02-04 18:18:28,636 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue (ResourceManager Event Processor): assignContainers: node=ip-10-171-0-129.ec2.internal application=88 priority=1 request={Priority: 1, Capability: memory:8704, vCores:1, # Containers: 17, Labels: , Location: *, Relax Locality: true} 
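
(One hedged way to read that excerpt, rather than anything the log states outright: each pending request asks for an 8704 MB container, but the nodes shown have only 5632 MB still unallocated (17408 MB already in use), so the CapacityScheduler reserves each node for application_1422834185427_0088 and skips scheduling on it. With the default queue at usedCapacity=0.982 of a 230400 MB cluster, the 17 outstanding containers cannot be placed until running containers finish, so the pending requests simply sit waiting even though nothing has failed.)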

advice on diagnosing Spark stall for 1.5hr out of 3.5hr job?

2015-02-03 Thread Michael Albert
Greetings!
First, my sincere thanks to all who have given me advice.
Following previous discussion, I've rearranged my code to try to keep the partitions to more manageable sizes.
Thanks to all who commented.
At the moment, the input set I'm trying to work with is about 90GB (avro parquet format).
When I run on a reasonable chunk of the data (say half) things work reasonably.
On the full data, the spark process stalls.
That is, for about 1.5 hours out of a 3.5 hour run, I see no activity.
No cpu usage, no error message, no network activity.
It just seems to sit there.
The messages bracketing the stall are shown below.
Any advice on how to diagnose this?
I don't get any error messages.
The spark UI says that it is running a stage, but it makes no discernible progress.
Ganglia shows no CPU usage or network activity.
When I shell into the worker nodes there are no filled disks or other obvious problems.
How can I discern what Spark is waiting for?
The only weird thing seen, other than the stall, is that the yarn logs on the workers have lines with messages like this:
2015-02-03 22:59:58,890 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 13158 for container-id container_1422834185427_0083_01_21: 7.1 GB of 8.5 GB physical memory used; 7.6 GB of 42.5 GB virtual memory used
It's rather strange that it mentions 42.5 GB of virtual memory.  The machines 
are EMR machines with 32 GB of physical memory and, as far as I can determine, 
no swap space.
The messages bracketing the stall are shown below.

Any advice is welcome.
Thanks!
Sincerely, Mike Albert
Before the stall.
15/02/03 21:45:28 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 5.0, whose tasks have all completed, from pool
15/02/03 21:45:28 INFO scheduler.DAGScheduler: Stage 5 (mapPartitionsWithIndex at Transposer.scala:147) finished in 4880.317 s
15/02/03 21:45:28 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/03 21:45:28 INFO scheduler.DAGScheduler: running: Set(Stage 3)
15/02/03 21:45:28 INFO scheduler.DAGScheduler: waiting: Set(Stage 6, Stage 7, Stage 8)
15/02/03 21:45:28 INFO scheduler.DAGScheduler: failed: Set()
15/02/03 21:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage 6: List(Stage 3)
15/02/03 21:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage 7: List(Stage 6)
15/02/03 21:45:28 INFO scheduler.DAGScheduler: Missing parents for Stage 8: List(Stage 7)
At this point, I see no activity for 1.5 hours except for this (XXX for I.P. address)
15/02/03 22:13:24 INFO util.AkkaUtils: Connecting to ExecutorActor: akka.tcp://sparkExecutor@ip-XXX.ec2.internal:36301/user/ExecutorActor
Then finally it started again:
15/02/03 23:31:34 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7301) in 7208259 ms on ip-10-171-0-124.ec2.internal (3/4)
15/02/03 23:31:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 7300) in 7208503 ms on ip-10-171-0-128.ec2.internal (4/4)
15/02/03 23:31:34 INFO scheduler.DAGScheduler: Stage 3 (mapPartitions at Transposer.scala:211) finished in 7209.534 s