Re: Review Request 50480: Multiple executor support in Scheduler

2016-07-26 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50480/#review143664
---




CHANGELOG (lines 1 - 4)


This file is updated automatically by our release script. You can revert 
these changes.



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
 (line 165)


Kill this line?



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
 (lines 33 - 34)


Fix indentation. Style for continuation indents should be:

public ExecutorSettings(
ImmutableMap<...> config,
boolean populateDiscoveryInfo) {

  // code continues here after blank line above



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
 (lines 40 - 45)


Instead of creating separate methods for get config and checking a config's 
existence, how about changing `getExecutorConfig` to return 
`Optional` and using `Optional#isPresent` as the indicator of 
whether config exists for that name?



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader.java
 (line 61)


s/An map/A map



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader.java
 (line 82)


House style is: `Map foo = Maps.newHashMap()`. It's a legacy of the 
project's pre-Java 7 origins. In theory at some point we could just move to 
Java 7 diamond syntax (`... = new HashMap<>()`), but until that time, best to 
remain consistent.



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader.java
 (lines 97 - 98)


multiExecutors.put(
e.getName(),
new ExecutorConfig(...))



src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader.java
 (line 100)


We should probably return an `ImmutableMap` here.



src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java (line 230)


Instead of passing in the task, just to use it to get the executor name, 
why not just pass in the executor name?



src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java (line 271)


Same here.



src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java (lines 
141 - 143)


Should fit on one line?



src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java (line 
158)


Are we guaranteed that the executor config exists here? The previous branch 
is checking if executorConfig is set and that the config exists, but won't we 
fall into this branch if executorConfig is not set?


- Joshua Cohen


On July 27, 2016, 2:04 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50480/
> ---
> 
> (Updated July 27, 2016, 2:04 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adds support for using multiple executors in a single scheduler.
> 
> Added warnings for pre-emption performance degradation when increasing 
> executor overhead.
> 
> Made executor config file require a JSON array.
> 
> Modified the way in which Thermos and any custom executors are loaded at 
> startup.
> 
> Thermos now always exists regardless of any other executors being used.
> 
> Jobs sent where no executor configuration is found for them are rejected.
> 
> 
> Diffs
> -
> 
>   CHANGELOG fc6a46d77ebf889c8f60402c95e8a7472980ba1c 
>   RELEASE-NOTES.md d46420ef876b88d9ab68c4816f1c2d6780aba702 
>   docs/operations/configuration.md 0615c545f560597683f727c2425de4cc4e82c364 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> c43e04aee0df8988ed3af9d463dd6747d6631e3b 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  93ed3600268f231b0e53ceb6b3674ff742d94407 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
>  8b7f8dcf4d8e0372208e636b3f57fc159272ecc7 
>   
> 

Re: Review Request 50478: Improve `executorLost` error message.

2016-07-26 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50478/#review143663
---


Ship it!




Ship It!

- Joshua Cohen


On July 27, 2016, 1:25 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50478/
> ---
> 
> (Updated July 27, 2016, 1:25 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Dmitriy Shirchenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve `executorLost` error message by including the slave id.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImpl.java 
> 433fc904e419b533f879ad34284ce043118f33d4 
> 
> Diff: https://reviews.apache.org/r/50478/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 50480: Multiple executor support in Scheduler

2016-07-26 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50480/#review143653
---


Ship it!




Master (08792d4) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On July 27, 2016, 2:04 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50480/
> ---
> 
> (Updated July 27, 2016, 2:04 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adds support for using multiple executors in a single scheduler.
> 
> Added warnings for pre-emption performance degradation when increasing 
> executor overhead.
> 
> Made executor config file require a JSON array.
> 
> Modified the way in which Thermos and any custom executors are loaded at 
> startup.
> 
> Thermos now always exists regardless of any other executors being used.
> 
> Jobs sent where no executor configuration is found for them are rejected.
> 
> 
> Diffs
> -
> 
>   CHANGELOG fc6a46d77ebf889c8f60402c95e8a7472980ba1c 
>   RELEASE-NOTES.md d46420ef876b88d9ab68c4816f1c2d6780aba702 
>   docs/operations/configuration.md 0615c545f560597683f727c2425de4cc4e82c364 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> c43e04aee0df8988ed3af9d463dd6747d6631e3b 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  93ed3600268f231b0e53ceb6b3674ff742d94407 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorModule.java
>  8b7f8dcf4d8e0372208e636b3f57fc159272ecc7 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettings.java
>  e919d3f2e2f86c26c0448029b06277a3a41a6941 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader.java
>  b74edf46d35ac99164ec1bcf33edf36556baf9ed 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> cbbd6be94aa857b02cd7b45bfb2f0216d9a1cec3 
>   src/main/java/org/apache/aurora/scheduler/mesos/TestExecutorSettings.java 
> 1dfa97e659c2fca8308cb592f37d14328e4b42bc 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java
>  ee6fe95eaa41c7952a974ebea069b3de6ed82994 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 3b84dbcfde9ae686110409173d742b3fa86ac764 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  147327036a872c9ccd03e17daaaf8cb1df489843 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoaderTest.java
>  7de7c46e6e1f478e25f7362d1d060b7f765dcb36 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  d34e12f6e0837fa43ea9b4e2a17db579e0ee10a2 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 500fd435b4c72b25abd8df7eea6b3850edc96e99 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  7eb1714d14581a6ab25e85d36a1f3e973380c536 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  fba427bd327e7f63b640c8b8753bfdeec3ee31e7 
>   src/test/java/org/apache/aurora/scheduler/thrift/Fixtures.java 
> a79b0f413e603374347f8ce5765fb6e71bc9a5b5 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  c0fe84371ff2f9d47721126a9c356180f7c166de 
>   
> src/test/resources/org/apache/aurora/scheduler/configuration/executor/test-missing-field.json
>  464617028b1e765a563a3e11f70d66089ede6866 
>   
> src/test/resources/org/apache/aurora/scheduler/configuration/executor/test-multiple-executor.json
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/configuration/executor/test-single-executor.json
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/configuration/executor/test-thermos-executor.json
>  114eb4f6c3eaeca3ad976e4907a3ec32c2339a09 
> 
> Diff: https://reviews.apache.org/r/50480/diff/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 50478: Improve `executorLost` error message.

2016-07-26 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50478/#review143651
---



Master (08792d4) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On July 27, 2016, 1:25 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50478/
> ---
> 
> (Updated July 27, 2016, 1:25 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Dmitriy Shirchenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve `executorLost` error message by including the slave id.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosSchedulerImpl.java 
> 433fc904e419b533f879ad34284ce043118f33d4 
> 
> Diff: https://reviews.apache.org/r/50478/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 47853: Isolate the executor's filesystem from the task's.

2016-07-26 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47853/
---

(Updated July 26, 2016, 4:57 p.m.)


Review request for Aurora, Jie Yu, Maxim Khutornenko, and Stephan Erb.


Changes
---

Update to use new mesos-containerizer launch command to isolate the task's 
filesystem rather than replicating all of that logic from Mesos into Thermos.

A couple of notes:

1) this depends on Mesos 1.0.0, so I won't ship this until that has shipped and 
we've upgraded Aurora to depend on it.
2) End to end tests are failing for the GPU job which I'm assuming is related 
to the fact that I've upgraded my vagrant image to Mesos 1.0.0 but have not 
upgraded Aurora.


Repository: aurora


Description
---

This changes the approach to launching tasks with filesystem images in the 
unified containerizer. Instead of adding an `Image` to the `MesosContainer`, we 
instead add the task filesystem as a `Volume` with an associated image. This 
image is mounted in the mesos directory under the `taskfs` path. The executor, 
on start up does the following:

1. Creates user/group under the taskfs root.
2. `pivot_root`s into the taskfs, while bind mounting the sandbox under that 
root as well as mounting procfs.
3. From there, task execution is essentially unchanged minus some slight 
changes to the environment depending on whether we're running in a pivoted root.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d66208490aff6ea8af4c737845fa2cf13617529 
  examples/vagrant/upstart/aurora-scheduler.conf 
954ddb48e923e0a2a29c415975c4f69afcad37b5 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
cbbd6be94aa857b02cd7b45bfb2f0216d9a1cec3 
  src/main/python/apache/aurora/executor/bin/thermos_executor_main.py 
0ef3856abc0df5403e3443ac35ba8d6940de8938 
  src/main/python/apache/aurora/executor/common/sandbox.py 
6d8b7f58b639a60cc5a0c0c9ef98dfaaa8a64486 
  src/main/python/apache/aurora/executor/thermos_task_runner.py 
3896e3841562600379705dbf78a6f62728246348 
  src/main/python/apache/thermos/core/BUILD 
1094664e112cc71af37835f32037e9eb6d047202 
  src/main/python/apache/thermos/core/process.py 
1791b5ff9a36eef7470bef9a6ebbafaf0ab05ca3 
  src/main/python/apache/thermos/core/runner.py 
fe971edaa2448afaf0fc342e11bc370de96ef5e4 
  src/main/python/apache/thermos/runner/thermos_runner.py 
0d06e8e2ac78d26ba8f63744853eb5ce3f6aced6 
  src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
500fd435b4c72b25abd8df7eea6b3850edc96e99 
  src/test/python/apache/aurora/executor/common/test_sandbox.py 
63f46e25bdd6fa387dd64975d7b95ee2659f5874 
  src/test/python/apache/thermos/core/test_process.py 
77f644c09116266ce02479b9a80403aa68767bd6 
  src/test/sh/org/apache/aurora/e2e/Dockerfile 
6fdea3d28760f59235c51c5b6913d2ee0172ef1a 
  src/test/sh/org/apache/aurora/e2e/Dockerfile.netcat PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 
bf6ef69401782d6c65a2d72e9270e52d1ad9fb9f 
  src/test/sh/org/apache/aurora/e2e/http/http_example_bad_healthcheck.aurora 
edeafbea288c95c19c82aede09717840b569528d 
  src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora 
9569eec2a32e0ea5212517c0082fc906036d1e57 
  src/test/sh/org/apache/aurora/e2e/run-server.sh PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
47bd94d1ce2aeffaefb67ac8325fc2f1d21c934c 

Diff: https://reviews.apache.org/r/47853/diff/


Testing
---

Lots of manual testing, e2e tests, etc.

I didn't add much coverage on the thermos side of things because it seemed like 
this was better served by the e2e tests than by doing a bunch of 
subprocess.check_call mocking. On the e2e front I created a new Dockerfile that 
sets up a much slimmer filesystem image that explicitly does not include python 
to ensure that the executor's filesystem is truly isolated.


Thanks,

Joshua Cohen



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-26 Thread Mehrdad Nurolahzade


> On July 26, 2016, 7:36 a.m., Joshua Cohen wrote:
> > I know this is already landed, but it would be nice to follow this up with 
> > a test case if possible?
> 
> Mehrdad Nurolahzade wrote:
> The logic has two test cases: 
> (https://github.com/apache/aurora/blob/380307ac1878cb71bc86d2ccd7887192161254cf/src/test/python/apache/aurora/client/test_config.py#L228-L243).
> 
> Joshua Cohen wrote:
> Yes, but they weren't failing though? It seems they were not sufficient 
> to catch this problem.

Alright, I'll add a test case with moustache variables (to prove that it works 
now with bindings).


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review143525
---


On July 25, 2016, 7:36 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 25, 2016, 7:36 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/config.py 
> 96cd9ddf77cac449d68537cd652bca03ef87d75d 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-26 Thread Joshua Cohen


> On July 26, 2016, 2:36 p.m., Joshua Cohen wrote:
> > I know this is already landed, but it would be nice to follow this up with 
> > a test case if possible?
> 
> Mehrdad Nurolahzade wrote:
> The logic has two test cases: 
> (https://github.com/apache/aurora/blob/380307ac1878cb71bc86d2ccd7887192161254cf/src/test/python/apache/aurora/client/test_config.py#L228-L243).

Yes, but they weren't failing though? It seems they were not sufficient to 
catch this problem.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review143525
---


On July 26, 2016, 2:36 a.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 26, 2016, 2:36 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/config.py 
> 96cd9ddf77cac449d68537cd652bca03ef87d75d 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 50432: AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710

2016-07-26 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50432/#review143525
---



I know this is already landed, but it would be nice to follow this up with a 
test case if possible?

- Joshua Cohen


On July 26, 2016, 2:36 a.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50432/
> ---
> 
> (Updated July 26, 2016, 2:36 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1741
> https://issues.apache.org/jira/browse/AURORA-1741
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1741 Fix pystachio binding bug introduced by AURORA-1710
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/config.py 
> 96cd9ddf77cac449d68537cd652bca03ef87d75d 
> 
> Diff: https://reviews.apache.org/r/50432/diff/
> 
> 
> Testing
> ---
> 
> ```
> ./build-support/jenkins/build.sh
> 
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>