Re: Hung spark executors don't count toward worker memory limit
Hi Keith, we are running into the same issue here with Spark standalone 1.2.1. I was wondering if you have found a solution or workaround.
Re: Hung spark executors don't count toward worker memory limit
Maybe I should put this another way. If Spark has two jobs, A and B, both of which consume the entire allocated memory pool, is it expected that Spark can launch B before the executor processes tied to A have completely terminated?

Keith
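If the worker really does hand B the memory as soon as it asks A's executor to die, one band-aid outside Spark entirely is a per-box watchdog that tails the worker log and delivers the kill -9 automatically once an executor has outlived its kill request. The sketch below is untested and built on assumptions: the log path and work directory are placeholders, and finding the executor PID by matching /proc/<pid>/cwd against the executor's work/<app>/<execid> directory is a Linux heuristic, not a Spark API.

    import os
    import re
    import signal
    import time

    LOG = "/opt/spark/logs/spark-worker.out"  # placeholder: your worker log
    WORK = "/opt/spark/work"                  # placeholder: worker work dir
    GRACE = 120    # seconds an executor may outlive its kill request

    KILL_RE = re.compile(r"Asked to kill executor (app-\S+)/(\d+)")

    def executor_pid(app, execid):
        """Heuristic: the executor JVM's cwd is its work/<app>/<execid> dir."""
        target = os.path.join(WORK, app, execid)
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                if os.readlink("/proc/%s/cwd" % pid) == target:
                    return int(pid)
            except OSError:
                pass  # process exited mid-scan, or not ours to inspect
        return None

    def follow(path):
        """Yield appended lines, plus empty ticks while the log is idle."""
        with open(path) as f:
            f.seek(0, os.SEEK_END)
            while True:
                line = f.readline()
                if not line:
                    time.sleep(1)
                yield line

    deadlines = {}  # (app, execid) -> time after which we kill -9
    for line in follow(LOG):
        m = KILL_RE.search(line)
        if m:
            deadlines[m.groups()] = time.time() + GRACE
        for key, deadline in list(deadlines.items()):
            if time.time() >= deadline:
                pid = executor_pid(*key)
                if pid is not None:
                    os.kill(pid, signal.SIGKILL)  # the manual kill -9, automated
                del deadlines[key]

The watchdog has to run as the same user as the worker (or root) for both the /proc inspection and the SIGKILL to succeed.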
Hung spark executors don't count toward worker memory limit
Hi Folks,

We have a Spark job that is occasionally running out of memory and hanging (I believe in GC). That's its own issue we're debugging, but in the meantime there's another unfortunate side effect. When the job is killed (most often because of GC errors), each worker attempts to kill its respective executor. However, it appears that several of the executors fail to shut themselves down (I actually have to kill -9 them). Even though the worker fails to successfully clean up the executor, it starts the next job as though all the resources have been freed up. This causes the Spark worker to exceed its configured memory limit, which in turn runs our boxes out of memory.

Is there a setting I can configure to prevent this issue? Perhaps by having the worker force kill the executor, or not start the next job until it's confirmed the executor has exited? Let me know if there's any additional information I can provide.

Keith

P.S. We're running Spark 1.0.2
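Since the executors seem to be wedging during GC/OOM, one setting worth a try is the stock HotSpot flag -XX:OnOutOfMemoryError, passed through spark.executor.extraJavaOptions, so the executor JVM kill -9s itself on the first OutOfMemoryError instead of hanging. This doesn't fix the worker's resource accounting, only the wedged-executor half of the problem. A minimal sketch, assuming PySpark; the app name is hypothetical, and the embedded quotes are sensitive to how your deploy scripts build the executor command line:

    from pyspark import SparkConf, SparkContext

    # Have HotSpot run `kill -9 <pid>` the moment the executor JVM throws
    # OutOfMemoryError, so a GC-wedged executor cannot linger after its app dies.
    conf = (SparkConf()
            .setAppName("oom-self-kill-example")
            .set("spark.executor.extraJavaOptions",
                 "-XX:OnOutOfMemoryError='kill -9 %p'"))

    sc = SparkContext(conf=conf)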
Re: Hung spark executors don't count toward worker memory limit
Actually, it looks like even when the job shuts down cleanly, there can be a few minutes of overlap between the time the next job launches and the time the first job actually terminates its process. Here are some relevant lines from my log:

14/10/09 20:49:20 INFO Worker: Asked to kill executor app-20141009204127-0029/1
14/10/09 20:49:20 INFO ExecutorRunner: Runner thread for executor app-20141009204127-0029/1 interrupted
14/10/09 20:49:20 INFO ExecutorRunner: Killing process!
14/10/09 20:49:20 INFO Worker: Asked to launch executor app-20141009204508-0030/1 for Job
... more lines about launching the new job ...
14/10/09 20:51:17 INFO Worker: Executor app-20141009204127-0029/1 finished with state KILLED

As you can see, the first app didn't actually shut down until two minutes after the new job launched. During that time, I was at double the worker memory limit.

Keith
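The lag in a log like this is easy to measure mechanically across many apps. A short sketch, assuming the exact Worker log format shown above (the filename is a placeholder):

    import re
    from datetime import datetime

    # Matches the two Worker lines we care about, e.g.:
    #   14/10/09 20:49:20 INFO Worker: Asked to kill executor app-...-0029/1
    #   14/10/09 20:51:17 INFO Worker: Executor app-...-0029/1 finished with state KILLED
    STAMP = "%y/%m/%d %H:%M:%S"
    KILL = re.compile(r"^(\S+ \S+) INFO Worker: Asked to kill executor (\S+)")
    DONE = re.compile(r"^(\S+ \S+) INFO Worker: Executor (\S+) finished with state KILLED")

    asked = {}
    with open("spark-worker.log") as log:  # placeholder filename
        for line in log:
            m = KILL.match(line)
            if m:
                asked[m.group(2)] = datetime.strptime(m.group(1), STAMP)
                continue
            m = DONE.match(line)
            if m and m.group(2) in asked:
                lag = datetime.strptime(m.group(1), STAMP) - asked.pop(m.group(2))
                print("%s outlived its kill request by %s" % (m.group(2), lag))

Against the excerpt above this would report app-20141009204127-0029/1 outliving its kill request by 0:01:57, i.e. the window in which both apps' executors are resident.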