Re: Review Request 25035: Updated allocator to offer cpu only or memory only resources.

2014-09-17 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 17, 2014, 6:36 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Updated the summary.

Also edited the CHANGELOG to point to a new ticket regarding deprecation.

I'll commit this now.


Summary (updated)
-

Updated allocator to offer cpu only or memory only resources.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Description
---

As already explained in JIRA MESOS-1688, there are schedulers that allocate 
memory only for the executor and not for its tasks. In this case, only CPU 
resources are allocated for tasks.
Such a scheduler is not offered any idle CPUs if nearly all memory on the 
slave is already in use.
This can easily lead to a deadlock (in the application, not in Mesos).

Simple example:
1. Scheduler allocates all memory of a slave for an executor.
2. Scheduler launches a task for this executor (allocating 1 CPU).
3. Task finishes: 1 CPU, 0 MB memory allocatable.
4. No offers are made, as no memory is left. The scheduler will wait for 
offers forever: a deadlock in the application.

To fix this problem, offers must be made whenever CPU resources are 
allocatable, regardless of allocatable memory.
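
The effect on the example can be sketched as a predicate over a slave's free 
resources. This is only an illustration: the struct, the function names, and 
the threshold values are assumptions made for this sketch and are not taken 
from the patch.

```cpp
#include <cassert>

// Hypothetical thresholds; the real values live in src/master/constants.cpp
// and are not quoted in this thread.
constexpr double MIN_CPUS = 0.1;     // cores (assumed)
constexpr double MIN_MEM_MB = 32.0;  // megabytes (assumed)

// Simplified stand-in for the unallocated resources of one slave.
struct Free {
  double cpus;
  double memMb;
};

// Behaviour described as the problem: both resources must be available.
bool allocatableBefore(const Free& f) {
  assert(f.cpus >= 0 && f.memMb >= 0);
  return f.cpus >= MIN_CPUS && f.memMb >= MIN_MEM_MB;
}

// Behaviour the patch aims for: either resource alone is enough.
bool allocatableAfter(const Free& f) {
  assert(f.cpus >= 0 && f.memMb >= 0);
  return f.cpus >= MIN_CPUS || f.memMb >= MIN_MEM_MB;
}
```

For step 3 of the example (1 CPU, 0 MB free), `allocatableBefore` is false and 
`allocatableAfter` is true, so the scheduler would receive an offer again.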


Diffs
-

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Testing
---

Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and ran 
multiple parallel Spark jobs in fine-grained mode to saturate allocatable 
memory. The jobs now run fine. With unpatched Mesos, this load always caused 
a deadlock in all Spark jobs within one minute.


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-16 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 16, 2014, 9:05 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Adjusted CHANGELOG and comments for integration with 0.21.0 instead of 0.20.1.
Improved warning log.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs (updated)
-

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-16 Thread Martin Weindel


 On Sept. 15, 2014, 9:02 p.m., Vinod Kone wrote:
  CHANGELOG, lines 1-9
  https://reviews.apache.org/r/25035/diff/7/?file=688718#file688718line1
 
  Thinking a bit more about this and talking to others. Adding 
  deprecations in a bug fix release is a bit weird.
  
  2 options. 
  
  1) We can land this feature in 0.21.0 and not 0.20.1. That way we will 
  do the deprecation warning in 0.21.0 and disallow cpu/mem only executors in 
  0.22.0. This is the most straightforward.
  
  2) Land this in 0.20.1, but the deprecation warning, in the changelog (and 
  ResourceUsageChecker?), happens in 0.21.0. The disallowing happens in 
  0.22.0. This is a bit weird but not too bad if you absolutely need this in 
  0.20.1. 
  
  Considering 0.21.0 would happen in a month or so, I prefer #1. Does 
  that work for you?

For me it only matters that the problem gets fixed in the near future.
So I adjusted the patch for integration with 0.21.0.


- Martin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review53362
---



Re: Review Request 25035: Fix for MESOS-1688

2014-09-16 Thread Martin Weindel


 On Sept. 15, 2014, 3:23 p.m., Timothy St. Clair wrote:
  src/master/hierarchical_allocator_process.hpp, line 837
  https://reviews.apache.org/r/25035/diff/7/?file=688721#file688721line837
 
  What happens in the case where all CPUs are taken but memory is 
  available?  It looks like it will return (true), but this should not be 
  possible. 
  
  I think you want to give an offer in the case where there are CPU 
  resources available, but memory is consumed by the executor.
 
 Vinod Kone wrote:
 Giving memory only resources is ok as long as it is used for a task and 
 not an executor. See my comments above.
 
 Timothy St. Clair wrote:
 Could you please add a detailed comment in the code above the modification, 
 as on first inspection it still leaves me feeling unsettled.

I agree with Vinod. An executor may make use of additional offered memory, 
e.g. for expanding a cache.
In this scenario, the already allocated CPU resources are sufficient.


- Martin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review53343
---



Re: Review Request 25035: Fix for MESOS-1688

2014-09-13 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 13, 2014, 6:56 p.m.)


Review request for mesos and Vinod Kone.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs (updated)
-

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-13 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 13, 2014, 7:10 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

Made the patch in Resources::find() easier to understand.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs (updated)
-

  CHANGELOG a822cc4 
  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-10 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 10, 2014, 10 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

fixed review issues


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs (updated)
-

  src/common/resources.cpp edf36b1 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-10 Thread Martin Weindel


 On Sept. 9, 2014, 7:10 p.m., Vinod Kone wrote:
  src/master/master.cpp, line 1901
  https://reviews.apache.org/r/25035/diff/4/?file=682182#file682182line1901
 
  I like these warnings.
  
  Are you planning to get this into 0.20.1 or 0.21.0? If the former, 
  can you add this to the list of deprecations in CHANGELOG?

It would be nice to see this in 0.20.1.
But it is not clear to me how to update the CHANGELOG. There is no section for 
upcoming releases.


- Martin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review52763
---



Re: Review Request 25035: Fix for MESOS-1688

2014-09-06 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 6, 2014, 6:37 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

- allow pure cpus or mem offers
- added tests in allocator_tests
- added log warning in ResourceUsageChecker
- fixed Resources::operator = and ::find to deal correctly with zero resources
- added constant MIN_CPUS_EXECUTOR
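
A rough sketch of what the warning in ResourceUsageChecker could look like. 
The actual code in master.cpp is not quoted in this thread, so the struct, the 
function, the message text, and the threshold values (including the value of 
MIN_CPUS_EXECUTOR and the hypothetical MIN_MEM_EXECUTOR_MB) are assumptions.

```cpp
#include <cassert>
#include <string>

// Assumed value; only the name MIN_CPUS_EXECUTOR appears in the change list.
constexpr double MIN_CPUS_EXECUTOR = 0.01;
// Hypothetical counterpart for memory, in megabytes.
constexpr double MIN_MEM_EXECUTOR_MB = 32.0;

// Simplified stand-in for the resources requested for an executor.
struct ExecutorRequest {
  double cpus;
  double memMb;
};

// Returns the warning to log, or an empty string if the request is fine.
// The request itself is still accepted; the check only produces a warning.
std::string executorWarning(const ExecutorRequest& e) {
  assert(e.cpus >= 0 && e.memMb >= 0);
  if (e.cpus < MIN_CPUS_EXECUTOR) {
    return "Executor requests fewer cpus than MIN_CPUS_EXECUTOR";
  }
  if (e.memMb < MIN_MEM_EXECUTOR_MB) {
    return "Executor requests less mem than MIN_MEM_EXECUTOR_MB";
  }
  return "";
}
```

An executor launched with 0 cpus would trigger the first warning but would 
still run, which gives frameworks time to adapt.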


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs (updated)
-

  src/common/resources.cpp edf36b1 
  src/master/constants.hpp ce7995b 
  src/master/constants.cpp faa1503 
  src/master/hierarchical_allocator_process.hpp 34f8cd6 
  src/master/master.cpp 18464ba 
  src/tests/allocator_tests.cpp 774528a 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-02 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Sept. 2, 2014, 5:52 p.m.)


Review request for mesos and Vinod Kone.


Changes
---

I'll shepherd this -- @vinodkone.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs
-

  src/master/hierarchical_allocator_process.hpp 34f8cd658920b36b1062bd3b7f6bfbd1bcb6bb52 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-09-02 Thread Martin Weindel


 On Sept. 2, 2014, 5:53 p.m., Vinod Kone wrote:
  src/master/hierarchical_allocator_process.hpp, lines 825-840
  https://reviews.apache.org/r/25035/diff/2/?file=672690#file672690line825
 
  I suggest to delete this comment altogether because frameworks can 
  utilize offers with either no memory or no cpus based on how they allocate 
  resources between executors and tasks. Also, change the code to 
  
  ```
  return (cpus.isSome() && cpus.get() >= MIN_CPUS) || 
         (mem.isSome() && mem.get() >= MIN_MEM);
  ```
  
  The important thing to note here is that executors should be launched 
  with both cpus *and* memory. Mind adding a TODO in ResourceUsageChecker in 
  master.cpp to that effect and log a warning? The reason we are doing a TODO 
  and warning instead of fixing ResourceUsageChecker is to give frameworks 
  (e.g., Spark) time to update their code to adhere to these new semantics. 
  We will enforce this in the next release. Sounds good?

Ok, I will take a look at allocator_tests and see how to extend it.

Your suggested code change was actually my first try. But there were test cases 
in allocator_tests which failed with this code.
I do not have the time to investigate the allocation algorithm and its 
constraints to really understand the cause.
So either somebody with better understanding for the allocation algorithm takes 
a closer look at this or we keep my suggested variant.
It would be good if we agree on this, before I write the test.

BTW, can you explain the background of the importance that executors should be 
launched with both cpus and memory?
What's the difference between these two allocations?
a) executor: 0 cpu, its 4 parallel tasks: each 1 cpu
b) executor: 0.1 cpu, its 4 parallel tasks: each 1 cpu

Is it correct that in case b) the framework can only run 3 parallel tasks if 
there are 4 cpu resources allocatable?
That seems to be a waste of resources only to make some conservative estimation 
for the cpu resources really consumed by the executor itself.
Why is it so important to reserve cpu resources for the little overhead the 
executor may cause by calculating the next tasks and communicating with Mesos 
and its tasks?
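
The arithmetic behind this question can be written down as a small sketch, 
assuming equally sized tasks and assuming that the executor's CPU share is 
simply subtracted from what its tasks can use. The function and parameter 
names are made up for illustration and are not Mesos code.

```cpp
#include <cassert>
#include <cmath>

// Number of equally sized tasks that fit next to one executor on a slave.
// totalCpus:    CPUs allocatable on the slave
// executorCpus: CPUs reserved for the executor itself
// cpusPerTask:  CPUs requested by each task
int maxParallelTasks(double totalCpus, double executorCpus, double cpusPerTask) {
  assert(cpusPerTask > 0 && executorCpus >= 0 && totalCpus >= executorCpus);
  // Only whole tasks can be launched, so round down.
  return static_cast<int>(std::floor((totalCpus - executorCpus) / cpusPerTask));
}
```

With 4 allocatable CPUs and 1 CPU per task, this gives 4 tasks for case a) 
and 3 tasks for case b).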


- Martin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/#review52048
---



Re: Review Request 25035: Fix for MESOS-1688

2014-08-30 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Aug. 30, 2014, 6:34 p.m.)


Review request for mesos.


Changes
---

uploaded same diff once again


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs (updated)
-

  src/master/hierarchical_allocator_process.hpp 34f8cd658920b36b1062bd3b7f6bfbd1bcb6bb52 

Diff: https://reviews.apache.org/r/25035/diff/


Thanks,

Martin Weindel



Re: Review Request 25035: Fix for MESOS-1688

2014-08-26 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

(Updated Aug. 26, 2014, 7:53 a.m.)


Review request for mesos.


Changes
---

added manual testing


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Diffs
-

  src/master/hierarchical_allocator_process.hpp 34f8cd658920b36b1062bd3b7f6bfbd1bcb6bb52 

Diff: https://reviews.apache.org/r/25035/diff/


Testing (updated)
---

Deployed patched Mesos 0.19.1 on a small cluster with 3 slaves and ran 
multiple parallel Spark jobs in fine-grained mode to saturate allocatable 
memory. The jobs now run fine. With unpatched Mesos, this load always caused 
a deadlock in all Spark jobs within one minute.


Thanks,

Martin Weindel



Review Request 25035: Fix for MESOS-1688

2014-08-25 Thread Martin Weindel

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25035/
---

Review request for mesos.


Bugs: MESOS-1688
https://issues.apache.org/jira/browse/MESOS-1688


Repository: mesos-git


Description
---

As already explained in JIRA MESOS-1688, there are schedulers that allocate 
memory only for the executor and not for its tasks. In this case, only CPU 
resources are allocated for tasks.
Such a scheduler is not offered any idle CPUs if nearly all memory on the 
slave is already in use.
This can easily lead to a deadlock (in the application, not in Mesos).

Simple example:
1. Scheduler allocates all memory of a slave for an executor.
2. Scheduler launches a task for this executor (allocating 1 CPU).
3. Task finishes: 1 CPU, 0 MB memory allocatable.
4. No offers are made, as no memory is left. The scheduler will wait for 
offers forever: a deadlock in the application.

To fix this problem, offers must be made whenever CPU resources are 
allocatable, regardless of allocatable memory.


Diffs
-

  src/master/hierarchical_allocator_process.hpp 34f8cd658920b36b1062bd3b7f6bfbd1bcb6bb52 

Diff: https://reviews.apache.org/r/25035/diff/


Testing
---

`make` and `make check` executed.
Allocation tests succeeded.
On my machine the test `MasterTest.MetricsInStatsEndpoint` failed both with and 
without the patch. So I'm not sure if all tests were executed.


Thanks,

Martin Weindel