> On Sept. 15, 2014, 9:02 p.m., Vinod Kone wrote:
> > CHANGELOG, lines 1-9
> > <https://reviews.apache.org/r/25035/diff/7/?file=688718#file688718line1>
> >
> > Thinking a bit more about this and talking to others. Adding
> > deprecations in a bug fix release is a bit weird.
> >
> > 2 options.
> >
> > 1) We can land this feature in 0.21.0 and not 0.20.1. That way we will
> > do the deprecation warning in 0.21.0 and disallow cpu/mem only executors in
> > 0.22.0. This is the most straightforward.
> >
> > 2) Land this in 0.20.1, but the deprecation warning, in the changelog (and
> > ResourceUsageChecker?), happens in 0.21.0. The disallowing happens in
> > 0.22.0. This is a bit weird but not too bad if you absolutely need this in
> > 0.20.1.
> >
> > Considering 0.21.0 would happen in a month or so, I prefer #1. Does
> > that work for you?

All that matters to me is that the problem gets fixed in the near future, so I have adjusted the patch to land in 0.21.0.


- Martin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review53362
-----------------------------------------------------------


On Sept. 16, 2014, 9:05 p.m., Martin Weindel wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25035/
> -----------------------------------------------------------
>
> (Updated Sept. 16, 2014, 9:05 p.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-1688
>     https://issues.apache.org/jira/browse/MESOS-1688
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> As already explained in JIRA MESOS-1688, some schedulers allocate
> memory only for the executor and not for tasks. In this case, tasks are
> allocated only CPU resources.
> Such a scheduler is not offered any idle CPUs once nearly all of the
> slave's memory is in use.
> This can easily lead to a deadlock (in the application, not in Mesos).
>
> Simple example:
> 1. Scheduler allocates all memory of a slave for an executor
> 2. Scheduler launches a task for this executor (allocating 1 CPU)
> 3. Task finishes: 1 CPU, 0 MB memory allocatable.
> 4. No offers are made, as no memory is left. Scheduler will wait for offers
>    forever. Deadlock in the application.
>
> To fix this problem, offers must be made whenever CPU resources are
> allocatable, without considering allocatable memory.
>
>
> Diffs
> -----
>
>   CHANGELOG a822cc4
>   src/common/resources.cpp edf36b1
>   src/master/constants.cpp faa1503
>   src/master/hierarchical_allocator_process.hpp 34f8cd6
>   src/master/master.cpp 18464ba
>   src/tests/allocator_tests.cpp 774528a
>
> Diff: https://reviews.apache.org/r/25035/diff/
>
>
> Testing
> -------
>
> Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and tested
> running multiple parallel Spark jobs in "fine-grained" mode to saturate
> allocatable memory. The jobs run fine now. With the unpatched Mesos, this
> load always caused a deadlock in all Spark jobs within one minute.
>
>
> Thanks,
>
> Martin Weindel
>
>
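
For readers following the thread, the four-step example in the description can be sketched as a small model of the offer decision. This is an illustration only, not the code in the patch: the function names and the threshold values MIN_CPUS and MIN_MEM_MB are assumptions made for the sketch, and the "after" check follows the wording of the description (CPU alone is enough to make an offer).

```python
# Hypothetical thresholds for illustration; not taken from the Mesos source.
MIN_CPUS = 0.1
MIN_MEM_MB = 32


def allocatable_before(cpus, mem_mb):
    """Model of the behavior described as broken: offer only if both
    CPU and memory are free on the slave."""
    return cpus >= MIN_CPUS and mem_mb >= MIN_MEM_MB


def allocatable_after(cpus, mem_mb):
    """Model of the proposed behavior: offer whenever CPU is free,
    without checking free memory."""
    return cpus >= MIN_CPUS


# State after step 3 of the example: the task has finished, so 1 CPU is
# free again, but the executor still holds all of the slave's memory.
free_cpus, free_mem_mb = 1.0, 0

print("offer before patch:", allocatable_before(free_cpus, free_mem_mb))
print("offer after patch: ", allocatable_after(free_cpus, free_mem_mb))
```

In this model the unpatched check never produces an offer for the idle CPU, which matches step 4 of the example, while the relaxed check does.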