Re: Hung spark executors don't count toward worker memory limit

2015-09-01 Thread hai
Hi Keith, we are running into the same issue here with Spark standalone
1.2.1. I was wondering if you have found a solution or workaround.






Re: Hung spark executors don't count toward worker memory limit

2014-10-13 Thread Keith Simmons
Maybe I should put this another way.  If Spark has two jobs, A and B, both
of which consume the entire allocated memory pool, is it expected that
Spark can launch B before the executor processes tied to A have completely
terminated?
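
(A hypothetical illustration with made-up numbers: if the worker is
configured with SPARK_WORKER_MEMORY=16g and each of A and B requests
executors totalling 16g, then during any overlap the box is hosting
roughly 32g of executor heap, which matches the doubling shown in the log
excerpt later in this thread.)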




Hung spark executors don't count toward worker memory limit

2014-10-09 Thread Keith Simmons
Hi Folks,

We have a Spark job that occasionally runs out of memory and hangs (I
believe in GC).  That's its own issue we're debugging, but in the meantime
there's another unfortunate side effect.  When the job is killed (most
often because of GC errors), each worker attempts to kill its respective
executor.  However, several of the executors fail to shut themselves down
(I actually have to kill -9 them).  Even though the worker fails to clean
up the executor, it starts the next job as though all of the resources had
been freed up.  This causes the Spark worker to exceed its configured
memory limit, which in turn runs our boxes out of memory.  Is there a
setting I can configure to prevent this?  Perhaps one that makes the
worker force-kill the executor, or hold off on starting the next job until
it has confirmed the executor has exited?  Let me know if there's any
additional information I can provide.

Keith

P.S. We're running Spark 1.0.2.
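
One mitigation worth experimenting with (an assumption on my part, not
something confirmed on this thread): since the hung executors appear to be
stuck in GC/OOM territory, the executor JVM can be asked to kill itself
when it throws OutOfMemoryError, so the worker isn't left hosting a zombie
process.  A minimal pyspark sketch, assuming Spark 1.0+ and a HotSpot JVM;
the app name and memory size below are placeholders:

from pyspark import SparkConf, SparkContext

# Sketch: have the executor JVM terminate itself on OutOfMemoryError
# instead of lingering in GC.  spark.executor.extraJavaOptions and the
# HotSpot flag -XX:OnOutOfMemoryError are real settings; the names and
# sizes here are placeholders.
conf = (
    SparkConf()
    .setAppName("executor-oom-self-kill-sketch")   # placeholder app name
    .set("spark.executor.memory", "4g")            # placeholder executor size
    # %p expands to the executor JVM's own pid (HotSpot).
    .set("spark.executor.extraJavaOptions",
         "-XX:OnOutOfMemoryError='kill -9 %p'")
)
sc = SparkContext(conf=conf)

Note this only helps when the executor actually throws OutOfMemoryError; a
JVM that is merely thrashing in GC without throwing may still need the
external kill -9.  The quoting of an option value containing spaces can
also need adjustment depending on how the executor command line is built.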


Re: Hung spark executors don't count toward worker memory limit

2014-10-09 Thread Keith Simmons
Actually, it looks like even when the job shuts down cleanly, there can be
a few minutes of overlap between the time the next job launches and the
time the first job's process actually terminates.  Here are some relevant
lines from my log:

14/10/09 20:49:20 INFO Worker: Asked to kill executor
app-20141009204127-0029/1
14/10/09 20:49:20 INFO ExecutorRunner: Runner thread for executor
app-20141009204127-0029/1 interrupted
14/10/09 20:49:20 INFO ExecutorRunner: Killing process!
14/10/09 20:49:20 INFO Worker: Asked to launch executor
app-20141009204508-0030/1 for Job
... More lines about launching new job...
14/10/09 20:51:17 INFO Worker: Executor app-20141009204127-0029/1 finished
with state KILLED

As you can see, the first app didn't actually shut down until two minutes
after the new job launched.  During that time, we were at double the
worker memory limit.
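
Since the worker evidently reports the executor as handled long before the
process exits, a crude workaround is to automate the manual kill -9
described above.  A rough watchdog sketch, assuming Python 3 on the worker
host; the class-name match on CoarseGrainedExecutorBackend and the
60-second grace period are assumptions, not anything the worker itself
provides:

import os
import signal
import subprocess
import time

EXECUTOR_CLASS = "CoarseGrainedExecutorBackend"  # executor JVM main class
GRACE_SECONDS = 60  # assumed grace period before escalating to SIGKILL

def executor_pids():
    """Return pids of executor JVMs found via ps on this worker host."""
    out = subprocess.check_output(["ps", "-eo", "pid,args"]).decode()
    pids = []
    for line in out.splitlines()[1:]:            # skip the ps header row
        pid, _, args = line.strip().partition(" ")
        if EXECUTOR_CLASS in args:
            pids.append(int(pid))
    return pids

def kill_with_grace(pid):
    """SIGTERM first; escalate to SIGKILL if the JVM is still alive."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return                                   # already gone
    deadline = time.time() + GRACE_SECONDS
    while time.time() < deadline:
        try:
            os.kill(pid, 0)                      # probe; raises once it exits
        except ProcessLookupError:
            return
        time.sleep(1)
    try:
        os.kill(pid, signal.SIGKILL)             # the manual kill -9, automated
    except ProcessLookupError:
        pass

One would only feed kill_with_grace() the pids of executors whose app the
worker has already been asked to kill; as written, executor_pids() matches
every executor on the box, healthy or not.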

Keith

