Re: Mesos/Spark Deadlock

2014-08-25 Thread Gary Malouf
We have not tried the work-around because there are other bugs in there
that affected our set-up, though it seems it would help.



Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
BTW it seems to me that even without that patch, you should be getting tasks 
launched as long as you leave at least 32 MB of memory free on each machine 
(that is, the sum of the executor memory sizes is not exactly the same as the 
total size of the machine). Then Mesos will be able to re-offer that machine 
whenever CPUs free up.
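
A quick arithmetic sketch of that headroom condition (a minimal
illustration in Scala; the slave and executor sizes below are invented,
and only the 32 MB threshold comes from this thread):

    // Mesos keeps re-offering a slave as long as the executors on it
    // leave at least ~32 MB of memory unallocated. Sizes are illustrative.
    val slaveMemMb  = 15 * 1024                     // a ~15 GB slave
    val executorsMb = Seq(8 * 1024, 6 * 1024)       // two executors: 8 GB + 6 GB
    val freeMb      = slaveMemMb - executorsMb.sum  // 1024 MB of headroom
    val reoffered   = freeMb >= 32                  // true, so tasks can launch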

Matei



Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
This is kind of weird then; it seems perhaps unrelated to this issue (or at 
least to the way I understood it). Is the problem maybe that Mesos saw 0 MB 
being freed and didn't re-offer the machine *even though there was more than 
32 MB free overall*?

Matei




Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
Anyway, it would be good if someone from the Mesos side investigated this and 
proposed a solution. The 32 MB-per-task hack isn't completely foolproof either 
(e.g. people might allocate all the RAM to their executor and thus stop being 
able to launch tasks), so maybe we should wait on a Mesos fix for this one.
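
As a minimal sketch of that failure case (same invented numbers as the
headroom condition sketched earlier in the thread):

    // If the executor is granted every MB on the slave, offers from that
    // slave carry 0 MB, which is below the assumed 32 MB-per-task minimum,
    // so no task can launch there even when CPUs free up. Illustrative only.
    val slaveMemMb    = 15 * 1024
    val executorMemMb = slaveMemMb                  // all RAM to the executor
    val offeredMemMb  = slaveMemMb - executorMemMb  // 0 MB left to offer
    val canLaunchTask = offeredMemMb >= 32          // false: tasks starve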

Matei




Re: Mesos/Spark Deadlock

2014-08-25 Thread Cody Koeninger
I definitely saw a case where

a. the only job running was a 256m shell
b. I started a 2g job
c. a little while later, the same user as in (a) started another 256m shell

My job immediately stopped making progress. Once user (a) killed his shells,
it started again.

This is on nodes with ~15G of memory, on which we have successfully run 8G
jobs.





Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
Hi Matei,

I'm going to investigate from both the Mesos and Spark sides and will
hopefully have a good long-term solution. In the meantime, having a
work-around to start with is going to unblock folks.

Tim




Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
My problem is that I'm not sure this workaround would solve things, given the 
issue described here (where there was a lot of memory free but it didn't get 
re-offered). If you think it does, it would be good to explain why it behaves 
like that.

Matei



Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
I don't think it solves Cody's problem, which still needs more
investigation, but I believe it does solve the problem you described
earlier.

I just confirmed with the Mesos folks that we no longer need the minimum
memory requirement, so we'll be dropping that soon and the work-around
might not be needed for the next Mesos release.

Tim




Re: Mesos/Spark Deadlock

2014-08-24 Thread Matei Zaharia
Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad 
that this happens in fine-grained mode -- would be really good to fix. I'll see 
if we can get the workaround in https://github.com/apache/spark/pull/1860 into 
Spark 1.1. Incidentally, have you tried that?

Matei




Re: Mesos/Spark Deadlock

2014-08-24 Thread Timothy Chen
+1 to have the work around in.

I'll be investigating from the Mesos side too.

Tim




Re: Mesos/Spark Deadlock

2014-08-23 Thread Matei Zaharia
Hey Gary, just as a workaround, note that you can use Mesos in coarse-grained 
mode by setting spark.mesos.coarse=true. Then it will hold onto CPUs for the 
duration of the job.
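
For reference, a minimal sketch of setting that programmatically
(spark.mesos.coarse is the property named above; the master URL and app
name are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // Coarse-grained mode: Spark acquires CPUs once and holds them for the
    // whole job instead of launching fine-grained Mesos tasks.
    val conf = new SparkConf()
      .setMaster("mesos://host:5050")      // placeholder Mesos master URL
      .setAppName("CoarseGrainedExample")  // placeholder app name
      .set("spark.mesos.coarse", "true")
    val sc = new SparkContext(conf)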

Matei

On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) wrote:

I just wanted to bring up a significant Mesos/Spark issue that makes the 
combo difficult to use for teams larger than 4-5 people. It's covered in 
https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that 
Spark's use of executors in fine-grained mode is a very different behavior 
than many of the other common frameworks for Mesos. 


Re: Mesos/Spark Deadlock

2014-08-23 Thread Gary Malouf
Hi Matei,

We have an analytics team that uses the cluster on a daily basis.  They use
two types of 'run modes':

1) For running actual queries, they set spark.executor.memory to
something between 4 and 8 GB of RAM per worker.

2) A shell that takes a minimal amount of memory on workers (128 MB) for
prototyping out a larger query. This allows them to avoid taking up RAM on
the cluster when they do not really need it.
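
As a rough sketch, those two modes might look like this in code
(spark.executor.memory is the real property; the exact values just mirror
the numbers above):

    import org.apache.spark.SparkConf

    // Mode 1: an actual query run, granting 4-8 GB of RAM per worker.
    val queryConf = new SparkConf().set("spark.executor.memory", "6g")

    // Mode 2: a prototyping shell holding only 128 MB per worker.
    val shellConf = new SparkConf().set("spark.executor.memory", "128m")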

We see the deadlocks when there are a few shells open in either mode. Given
the usage patterns we have, coarse-grained mode would be a challenge, as we
would have to constantly remind people to kill their shells as soon as their
queries finish.

Am I correct in viewing Mesos in coarse-grained mode as being similar to
Spark Standalone's CPU allocation behavior?



