Thanks Akhil and Sean.

All three workers are doing the work and tasks stall simultaneously on all 
three. I think Sean hit on my issue. I've been under the impression that each 
application has one executor process per worker machine (not per core per 
machine). Is that incorrect? If an executor is running on each core, that would 
explain why things are stalling.

Akhil, I'm running 8 cores per machine, and tasks are stalling on all three 
machines simultaneously. Also, no other Spark contexts are running, so I didn't 
think this was an issue of spark.executor.memory vs. SPARK_WORKER_MEMORY (which 
is currently left at its default).
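
For reference, here's roughly how I understand the two settings interact (a 
sketch only; the app name below is illustrative, not my actual code):

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.memory is requested per application, and the standalone
// master can only grant it out of the memory each worker advertises
// (SPARK_WORKER_MEMORY, set in conf/spark-env.sh).
val conf = new SparkConf()
  .setAppName("count-test")              // illustrative app name
  .set("spark.executor.memory", "2g")    // per-executor heap request
val sc = new SparkContext(conf)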

App UI
ID:               app-20140819101355-0001 (http://tc1-master:8080/app?appId=app-20140819101355-0001)
Name:             Spark shell (http://tc1-master:4040/)
Cores:            24
Memory per Node:  2.0 GB

Worker UI
ExecutorID:  2
Cores:       8
State:       RUNNING
Memory:      2.0 GB
Tasks when it stalls:
Index  ID   Status   Locality    Host      Launch Time    Duration  GC Time
129    129  SUCCESS  NODE_LOCAL  worker01  8/19/14 10:16  0.1 s     1 ms
130    130  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
131    131  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
132    132  SUCCESS  NODE_LOCAL  worker02  8/19/14 10:16  0.1 s     1 ms
133    133  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
134    134  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
135    135  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
136    136  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
137    137  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
138    138  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
139    139  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
140    140  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
141    141  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s
142    142  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
143    143  RUNNING  NODE_LOCAL  worker01  8/19/14 10:16  5 s
144    144  RUNNING  NODE_LOCAL  worker03  8/19/14 10:16  5 s
145    145  RUNNING  NODE_LOCAL  worker02  8/19/14 10:16  5 s


From: Sean Owen <so...@cloudera.com>
Date: Tuesday, August 19, 2014 at 9:23 AM
To: Capital One <benjamin.la...@capitalone.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Executor Memory, Task hangs


Given a fixed amount of memory allocated to your workers, more memory per 
executor means fewer executors can execute in parallel, so it takes longer to 
finish all of the tasks. Set executor memory high enough and no worker has 
enough memory to launch an executor, so the tasks all sit waiting for 
resources. The tasks seem to take longer because they are really spending that 
time waiting for an executor rather than spending more time running. That's my 
first guess.

If you want Spark to use more memory on your machines, give the workers more 
memory. It sounds like there is no value in increasing executor memory here; 
it only underutilizes your cluster's CPU by running fewer tasks in parallel 
than would be optimal.
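
To make that concrete with round numbers (illustrative only, not your actual 
settings): the standalone master launches an executor on a worker only if the 
worker's advertised memory covers the request.

val workerMemGb   = 2   // memory a worker advertises (SPARK_WORKER_MEMORY)
val executorMemGb = 4   // spark.executor.memory requested by the app
// No worker can cover the request, so the app's tasks wait indefinitely:
val canLaunch = executorMemGb <= workerMemGb   // false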

Hi all,

I'm doing some testing on a small dataset (HadoopRDD, 2 GB, ~10M records) with 
a cluster of 3 nodes.

Simple calculations like count take approximately 5s when using the default 
value of spark.executor.memory (512 MB). When I scale this up to 2 GB, several 
tasks take 1m or more (while most still take <1s), and tasks hang indefinitely 
if I set it to 4 GB or higher.
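
Roughly the test I'm running in spark-shell (the path below is illustrative, 
not my real one):

val records = sc.textFile("hdfs:///path/to/dataset")   // ~2 GB, ~10M records
records.count()   // ~5 s at the 512 MB default; stalls at 4 GB and up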

While these worker nodes aren't very powerful, they seem to have enough RAM to 
handle this:

Running 'free -m' shows I have >7 GB free on each worker.

Any tips on why these jobs would hang when given more available RAM?

Thanks
Ben
