> On Sept. 2, 2014, 5:53 p.m., Vinod Kone wrote:
> > src/master/hierarchical_allocator_process.hpp, lines 825-840
> > <https://reviews.apache.org/r/25035/diff/2/?file=672690#file672690line825>
> >
> >     I suggest deleting this comment altogether, because frameworks can 
> > utilize offers with either no memory or no cpus, depending on how they 
> > allocate resources between executors and tasks. Also, change the code to 
> >     
> >     ```
> >     return (cpus.isSome() && cpus.get() >= MIN_CPUS) || 
> >            (mem.isSome() && mem.get() >= MIN_MEM);
> >     ```
> >     
> >     The important thing to note here is that executors should be launched 
> > with both cpus *and* memory. Mind adding a TODO in ResourceUsageChecker in 
> > master.cpp to that effect and log a warning? The reason we are doing a TODO 
> > and warning instead of fixing ResourceUsageChecker is to give frameworks 
> > (e.g., Spark) time to update their code to adhere to these new semantics. 
> > We will enforce this in the next release. Sounds good?
> 
> Martin Weindel wrote:
>     Ok, I will take a look at allocator_tests and see how to extend it.
>     
>     Your suggested code change was actually my first try. But there were test 
> cases in allocator_tests which failed with this code.
>     I do not have the time to investigate the allocation algorithm and its 
> constraints well enough to really understand the cause.
>     So either somebody with a better understanding of the allocation 
> algorithm takes a closer look at this, or we keep my suggested variant.
>     It would be good if we agree on this before I write the test.
>     
>     BTW, can you explain why it is so important that "executors should be 
> launched with both cpus and memory"?
>     What's the difference between these two allocations?
>     a) executor: 0 cpu, its 4 parallel tasks: each 1 cpu
>     b) executor: 0.1 cpu, its 4 parallel tasks: each 1 cpu
>     
>     Is it correct that in case b) the framework can only run 3 parallel 
> tasks if there are 4 cpus allocatable?
>     That seems to be a waste of resources just to get a conservative 
> estimate of the cpu resources actually consumed by the executor itself.
>     Why is it so important to reserve cpu resources for the little overhead 
> the executor may cause by computing the next tasks and communicating with 
> Mesos and its tasks?

```
Your suggested code change was actually my first try. But there were test cases 
in allocator_tests which failed with this code.
```

I see. If you can paste the logs of the tests that fail I'll be happy to help 
diagnose/fix. Alternatively, add a note on why you are only doing this for cpu 
and not for memory.
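
To make the asymmetry under discussion concrete, here is a minimal sketch of 
the predicate with the candidate behaviors side by side. It assumes the check 
lives in a helper like `allocatable()` in hierarchical_allocator_process.hpp 
and that `MIN_CPUS`/`MIN_MEM` are the thresholds referenced in the quoted 
snippet; this is an illustration, not the actual patch.

```cpp
// Sketch only: assumes Resources::cpus() returns Option<double>,
// Resources::mem() returns Option<Bytes>, and MIN_CPUS / MIN_MEM are the
// allocator's minimum-offer thresholds, as in the snippet quoted above.
bool allocatable(const Resources& resources)
{
  Option<double> cpus = resources.cpus();
  Option<Bytes> mem = resources.mem();

  bool enoughCpus = cpus.isSome() && cpus.get() >= MIN_CPUS;
  bool enoughMem = mem.isSome() && mem.get() >= MIN_MEM;

  // (a) Strict behavior: both resources must clear their minimums.
  //     return enoughCpus && enoughMem;

  // (b) Symmetric variant suggested above: either resource is sufficient.
  return enoughCpus || enoughMem;

  // (c) Cpu-only relaxation (one reading of the patch): cpus alone are
  //     sufficient, memory alone is not.
  //     return enoughCpus;
}
```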


```
executors should be launched with both cpus and memory
``` 

This is because an executor is an actual unix process that is launched by the 
slave. If an executor doesn't specify cpus, what should the cpu limits be for 
that executor *when there are no tasks running* on it? If no cpu limits are 
set, it might starve other executors/tasks on the slave, violating isolation 
guarantees. The same goes for memory. Moreover, the current 
containerizer/isolator code will throw failures when using such an executor, 
e.g., when the last task on the executor finishes and Containerizer::update() 
is called with 0 cpus or 0 mem.
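
For frameworks adapting to these semantics, a minimal sketch of an executor 
declaration that reserves a little of both cpus and memory might look like the 
following. The identifiers and values are illustrative only, not taken from 
the patch or from any particular framework.

```cpp
#include <mesos/mesos.hpp>
#include <mesos/resources.hpp>

using namespace mesos;

// Illustrative only: an ExecutorInfo that reserves a small amount of both
// cpus and memory for the executor process itself, so the slave can set
// sane limits even when no tasks are running on it.
ExecutorInfo createExecutor()
{
  ExecutorInfo executor;
  executor.mutable_executor_id()->set_value("example-executor");
  executor.mutable_command()->set_value("/path/to/executor");

  // Both cpus *and* memory, per the semantics described above.
  executor.mutable_resources()->MergeFrom(
      Resources::parse("cpus:0.1;mem:32").get());

  return executor;
}
```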


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review52048
-----------------------------------------------------------


On Sept. 2, 2014, 5:52 p.m., Martin Weindel wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25035/
> -----------------------------------------------------------
> 
> (Updated Sept. 2, 2014, 5:52 p.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Bugs: MESOS-1688
>     https://issues.apache.org/jira/browse/MESOS-1688
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> As already explained in JIRA MESOS-1688, there are schedulers that allocate 
> memory only for the executor and not for tasks. In this case, only CPU 
> resources are allocated for tasks.
> Such a scheduler does not get offered any idle CPUs if the slave has nearly 
> used up all its memory.
> This can easily lead to a deadlock (in the application, not in Mesos).
> 
> Simple example:
> 1. Scheduler allocates all memory of a slave for an executor
> 2. Scheduler launches a task for this executor (allocating 1 CPU)
> 3. Task finishes: 1 CPU, 0 MB memory allocatable.
> 4. No offers are made, as no memory is left. The scheduler will wait for 
> offers forever. Deadlock in the application.
> 
> To fix this problem, offers must be made if CPU resources are allocatable, 
> without considering allocatable memory.
> 
> 
> Diffs
> -----
> 
>   src/master/hierarchical_allocator_process.hpp 
> 34f8cd658920b36b1062bd3b7f6bfbd1bcb6bb52 
> 
> Diff: https://reviews.apache.org/r/25035/diff/
> 
> 
> Testing
> -------
> 
> Deployed the patched Mesos 0.19.1 on a small cluster with 3 slaves and 
> tested running multiple parallel Spark jobs in "fine-grained" mode to 
> saturate allocatable memory. The jobs run fine now. With unpatched Mesos, 
> this load always caused a deadlock in all Spark jobs within one minute.
> 
> 
> Thanks,
> 
> Martin Weindel
> 
>
