We have not tried the work-around because there are other bugs in there that affected our set-up, though it seems it would help.
On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen <tnac...@gmail.com> wrote:

> +1 to have the work around in.
>
> I'll be investigating from the Mesos side too.
>
> Tim
>
> On Sun, Aug 24, 2014 at 9:52 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> > Yeah, Mesos in coarse-grained mode probably wouldn't work here. It's too bad that this happens in fine-grained mode -- would be really good to fix. I'll see if we can get the workaround in https://github.com/apache/spark/pull/1860 into Spark 1.1. Incidentally have you tried that?
> >
> > Matei
> >
> > On August 23, 2014 at 4:30:27 PM, Gary Malouf (malouf.g...@gmail.com) wrote:
> >
> > Hi Matei,
> >
> > We have an analytics team that uses the cluster on a daily basis. They use two types of 'run modes':
> >
> > 1) For running actual queries, they set the spark.executor.memory to something between 4 and 8GB of RAM/worker.
> >
> > 2) A shell that takes a minimal amount of memory on workers (128MB) for prototyping out a larger query. This allows them to not take up RAM on the cluster when they do not really need it.
> >
> > We see the deadlocks when there are a few shells in either case. From the usage patterns we have, coarse-grained mode would be a challenge as we have to constantly remind people to kill their shells as soon as their queries finish.
> >
> > Am I correct in viewing Mesos in coarse-grained mode as being similar to Spark Standalone's cpu allocation behavior?
> >
> > On Sat, Aug 23, 2014 at 7:16 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> >
> > Hey Gary, just as a workaround, note that you can use Mesos in coarse-grained mode by setting spark.mesos.coarse=true. Then it will hold onto CPUs for the duration of the job.
> >
> > Matei
> >
> > On August 23, 2014 at 7:57:30 AM, Gary Malouf (malouf.g...@gmail.com) wrote:
> >
> > I just wanted to bring up a significant Mesos/Spark issue that makes the combo difficult to use for teams larger than 4-5 people. It's covered in https://issues.apache.org/jira/browse/MESOS-1688. My understanding is that Spark's use of executors in fine-grained mode is a very different behavior than many of the other common frameworks for Mesos.
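
For anyone following along, here is a rough sketch of the settings discussed in this thread, written as spark-defaults.conf-style lines. The helper `render_spark_conf` and the dictionary names are hypothetical and exist only for illustration; the 4g value is just one pick from the 4-8GB range mentioned above. Only the property names `spark.executor.memory` and `spark.mesos.coarse` come from the thread itself.

```python
# Hypothetical helper (not part of Spark): renders properties as
# "key value" lines, the layout used by spark-defaults.conf.
def render_spark_conf(props):
    return "\n".join(f"{key} {value}" for key, value in props.items())


# Run mode 1 from the thread: real queries, 4-8GB per worker.
# 4g is an arbitrary example value within that range.
QUERY_MODE = {"spark.executor.memory": "4g"}

# Run mode 2 from the thread: a lightweight prototyping shell, 128MB per worker.
PROTOTYPE_MODE = {"spark.executor.memory": "128m"}

# The suggested workaround: coarse-grained Mesos mode, which holds
# onto CPUs for the duration of the job.
COARSE_GRAINED = {"spark.mesos.coarse": "true"}


if __name__ == "__main__":
    # Example: a prototyping shell running in coarse-grained mode.
    print(render_spark_conf({**PROTOTYPE_MODE, **COARSE_GRAINED}))
```

Note that combining the prototyping mode with coarse-grained mode is exactly the case Gary describes as problematic: an idle shell would keep its CPUs until someone kills it.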