Re: Review Request 57487: Implementation of Dynamic Reservations Proposal

2017-04-17 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57487/#review172163
---



Master (b847db8) is red with this patch.
  ./build-support/jenkins/build.sh

Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:commons:generateThriftResources
:commons:processResources
:commons:classes
:commons:jar
:compileJava
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74: Note: Wrote forwarder org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java:250: warning: [deprecation] hasRole() in FrameworkInfo has been deprecated
  && (!frameworkInfoFactory.getFrameworkInfo().hasRole()
  ^
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java:251: warning: [deprecation] getRole() in FrameworkInfo has been deprecated
  || frameworkInfoFactory.getFrameworkInfo().getRole().isEmpty())) {
^
error: warnings found and -Werror specified
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java:400: warning: [deprecation] getRole() in FrameworkInfo has been deprecated
.setRole(frameworkInfoFactory.getFrameworkInfo().getRole())
^
1 error
3 warnings
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 1 mins 2.599 secs
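
The failure above comes from deprecation warnings being promoted to errors by -Werror, not from a failing test. A minimal sketch of one way to scope such a warning, assuming the deprecated accessor still has to be called for now. FakeFrameworkInfo and RoleReader are hypothetical stand-ins for illustration, not Mesos or Aurora classes:

```java
// Hypothetical stand-in for a generated message class; NOT the real Mesos FrameworkInfo.
final class FakeFrameworkInfo {
  private final String role;

  FakeFrameworkInfo(String role) {
    this.role = role;
  }

  /** Deprecated accessor, mirroring the getRole() warning in the log above. */
  @Deprecated
  String getRole() {
    return role;
  }
}

// Hypothetical helper that isolates the deprecated call in one place.
final class RoleReader {
  private RoleReader() {}

  // Suppressing on a single small method keeps -Werror effective for the rest of the code.
  @SuppressWarnings("deprecation")
  static String roleOrDefault(FakeFrameworkInfo info, String fallback) {
    String role = info.getRole();
    return (role == null || role.isEmpty()) ? fallback : role;
  }
}
```

Whether suppressing the warning is preferable to moving off the deprecated field is a decision for the patch author and reviewers.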


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On April 18, 2017, 1:42 a.m., Dmitriy Shirchenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57487/
> ---
> 
> (Updated April 18, 2017, 1:42 a.m.)
> 
> 
> Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Esteemed reviewers, here is the latest iteration of the implementation of 
> dynamic reservations. The main changes: the patches are merged into a single 
> one, and the design document is updated with a higher-level overview and user 
> stories told from an operator’s point of view. Unit tests will be written as 
> soon as we agree on the approach; so far I have tested this patch on a local 
> Vagrant setup and a multi-node dev cluster. The Jenkins build is expected to 
> fail because the tests are incomplete.
> 
> For reference, here are the two previous patches whose feedback I addressed 
> in this new single patch. 
> Previous 2 patches:
> https://reviews.apache.org/r/56690/
> https://reviews.apache.org/r/56691/
> 
> RFC document: 
> https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A
> Design Doc [UPDATED]: 
> https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE
> 
> 
> Diffs
> -
> 
>   examples/vagrant/mesos_config/etc_mesos-slave/resources 
> aa0e97e1c4a6c1a76cc712549159db9336d051eb 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 63fcc87be653835cb3c3f25dae4164f1d7c8d4da 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> f2296a9d7a88be7e43124370edecfe64415df00f 
>   src/jmh/java/org/apache/aurora/benchmark/fakes/FakeOfferManager.java 
> 6f2ca35c5d83dde29c24865b4826d4932e96da80 
>   src/main/java/org/apache/aurora/scheduler/HostOffer.java 
> bc40d0798f40003cab5bf6efe607217e4d5de9f1 
>   src/main/java/org/apache/aurora/scheduler/TaskVars.java 
> 676dfd9f9d7ee0633c05424f788fd0ab116976bb 
>   src/main/java/org/apache/aurora/scheduler/TierInfo.java 
> c45b949ae7946fc92d7e62f94696ddc4f0790cfa 
>   src/main/java/org/apache/aurora/scheduler/TierManager.java 
> c6ad2b1c48673ca2c14ddd308684d81ce536beca 
>   src/main/java/org/apache/aurora/scheduler/base/I

Re: Review Request 57487: Implementation of Dynamic Reservations Proposal

2017-04-17 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57487/
---

(Updated April 18, 2017, 1:42 a.m.)


Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji.


Changes
---

Adding integration tests for tasks with the reserved tier, plus an additional 
test to make sure reservations are done correctly.
Keeping port reservations stable, so the same ports are assigned to tasks 
between upgrades (port mappings are not stable yet; this can be future work 
unless someone objects).


Repository: aurora


Description
---

Esteemed reviewers, here is the latest iteration of the implementation of 
dynamic reservations. The main changes: the patches are merged into a single 
one, and the design document is updated with a higher-level overview and user 
stories told from an operator’s point of view. Unit tests will be written as 
soon as we agree on the approach; so far I have tested this patch on a local 
Vagrant setup and a multi-node dev cluster. The Jenkins build is expected to 
fail because the tests are incomplete.

For reference, here are the two previous patches whose feedback I addressed in 
this new single patch. 
Previous 2 patches:
https://reviews.apache.org/r/56690/
https://reviews.apache.org/r/56691/

RFC document: 
https://docs.google.com/document/d/15n29HSQPXuFrnxZAgfVINTRP1Iv47_jfcstJNuMwr5A
Design Doc [UPDATED]: 
https://docs.google.com/document/d/1L2EKEcKKBPmuxRviSUebyuqiNwaO-2hsITBjt3SgWvE


Diffs (updated)
-

  examples/vagrant/mesos_config/etc_mesos-slave/resources 
aa0e97e1c4a6c1a76cc712549159db9336d051eb 
  examples/vagrant/upstart/aurora-scheduler.conf 
63fcc87be653835cb3c3f25dae4164f1d7c8d4da 
  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
f2296a9d7a88be7e43124370edecfe64415df00f 
  src/jmh/java/org/apache/aurora/benchmark/fakes/FakeOfferManager.java 
6f2ca35c5d83dde29c24865b4826d4932e96da80 
  src/main/java/org/apache/aurora/scheduler/HostOffer.java 
bc40d0798f40003cab5bf6efe607217e4d5de9f1 
  src/main/java/org/apache/aurora/scheduler/TaskVars.java 
676dfd9f9d7ee0633c05424f788fd0ab116976bb 
  src/main/java/org/apache/aurora/scheduler/TierInfo.java 
c45b949ae7946fc92d7e62f94696ddc4f0790cfa 
  src/main/java/org/apache/aurora/scheduler/TierManager.java 
c6ad2b1c48673ca2c14ddd308684d81ce536beca 
  src/main/java/org/apache/aurora/scheduler/base/InstanceKeys.java 
b12ac83168401c15fb1d30179ea8e4816f09cd3d 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
  
src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
 754fde0fdc976b673d78ae15d8ccd8c85b792373 
  
src/main/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilter.java 
f6c759f03c4152ae93317692fc9db202fe251122 
  src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java 
36608a9f027c95723c31f9915852112beb367223 
  src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java 
df51d4cf4893899613683603ab4aa9aefa88faa6 
  src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
0d639f66db456858278b0485c91c40975c3b45ac 
  src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
78255e6dfa31c4920afc0221ee60ec4f8c2a12c4 
  src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
adf7f33e4a72d87c3624f84dfe4998e20dc75fdc 
  src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
317a2d26d8bfa27988c60a7706b9fb3aa9b4e2a2 
  
src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java 
5ed578cc4c11b49f607db5f7e516d9e6022a926c 
  src/main/java/org/apache/aurora/scheduler/resources/AcceptedOffer.java 
291d5c95916915afc48a7143759e523fccd52feb 
  
src/main/java/org/apache/aurora/scheduler/resources/MesosResourceConverter.java 
7040004ae48d3a9d0985cb9b231f914ebf6ff5a4 
  src/main/java/org/apache/aurora/scheduler/resources/ResourceManager.java 
9aa263a9cfae03a9a0c5bc7fe3a1405397d3009c 
  src/main/java/org/apache/aurora/scheduler/resources/ResourceMapper.java 
375f93c5277a78666fc4823382c82ac4d179f39d 
  
src/main/java/org/apache/aurora/scheduler/scheduling/ReservationTimeoutCalculator.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
03a0e8485d1a392f107fda5b4af05b7f8f6067c6 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
203f62bacc47470545d095e4d25f7e0f25990ed9 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
a177b301203143539b052524d14043ec8a85a46d 
  src/main/java/org/apache/aurora/scheduler/stats/AsyncStatsModule.java 
40451e91aed45866c2030d901160cc4e084834df 
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
c129896d8cd54abd2634e2a339c27921042b0162 
  src/main/resources/org/apache/aurora/scheduler/tiers.json 
34ddb1dc769a73115c209c9b2ee158cd364392d8 
  src/tes

Re: Review Request 58259: Add update affinity to Scheduler

2017-04-17 Thread David McLaughlin


> On April 17, 2017, 8:27 p.m., Stephan Erb wrote:
> > The code change looks decent to me. 
> > 
> > However, I am unsure about two things:
> > 
> > * For us it is common to have jobs with #instance in the ballpark of 
> > #agents. The proposed code change could easily block a significant number 
> > of agents for scheduling, even if there would be enough capacity for other 
> > job instances. So while improving the MTTA for job updates, this could 
> > easily lead to increased MTTA for regularly launched jobs (cron, adhoc, 
> > etc).
> > * The alternative dynamic reservation proposal has the advantage that it 
> > works when multiple frameworks are used. Would it be plausible to just 
> > reserve any used resources in a generic fashion, so that we ensure 
> > reservations always come back to Aurora and cannot be intercepted by 
> > another framework?
> > 
> > Please run `./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.*'` to help 
> > ensure the scheduling changes don't come with an unexpected performance 
> > regression.

I think for (1) you described a problem that wouldn't be an issue in clusters 
with decent amounts of capacity available. It's only really an issue in low 
capacity clusters. And this change is specifically targeting the use case you 
mentioned (big, hard-to-schedule task of an important production job being 
killed for an update and some low priority task like a cron taking its place 
and then the prod job not being able to be scheduled.. triggering preemption 
and churn across the cluster - rinse, repeat for thousands of instances of a 
task). 

We run Aurora as a single framework, so can't really speak to (2). I think 
though you'd just want Dynamic Reservations for this? Is that what you're 
suggesting? Now we're back to the other approach which also has a bunch of open 
questions.

To be clear - this approach has one major difference I care about: it does not 
expose this to users via a new tier. In practice it means we don't need to ask 
people to opt in to what is essentially caching, and we also don't need to 
expose the reserved tier for users (Twitter also has the use case where we want 
to expose user-managed dynamic reservations via some reserved tier).
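
To make the discussion concrete, the in-memory agent hold described in this review could look roughly like the following. This is a simplified sketch under stated assumptions, not the patch's code: AgentReservations, its method names, the one-reservation-per-agent rule, and the injected millisecond clock are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a non-persistent, expiring agent hold used during updates. */
final class AgentReservations {
  private static final class Entry {
    final String instanceKey;
    final long expiresAtMs;

    Entry(String instanceKey, long expiresAtMs) {
      this.instanceKey = instanceKey;
      this.expiresAtMs = expiresAtMs;
    }
  }

  private final Map<String, Entry> byAgent = new HashMap<>();
  private final long holdMs;
  private final LongSupplier clockMs;

  AgentReservations(long holdMs, LongSupplier clockMs) {
    this.holdMs = holdMs;
    this.clockMs = clockMs;
  }

  /** Remember that agentId should be kept free for instanceKey while its task is replaced. */
  synchronized void reserve(String agentId, String instanceKey) {
    byAgent.put(agentId, new Entry(instanceKey, clockMs.getAsLong() + holdMs));
  }

  /** Drop the hold, for example once the replacement task has been assigned. */
  synchronized void release(String agentId) {
    byAgent.remove(agentId);
  }

  /** True if an offer from agentId should be vetoed for instanceKey. */
  synchronized boolean isReservedForOther(String agentId, String instanceKey) {
    Entry entry = byAgent.get(agentId);
    if (entry == null) {
      return false;
    }
    if (clockMs.getAsLong() >= entry.expiresAtMs) {
      // Expired holds are dropped lazily, so a stale entry never blocks an agent for long.
      byAgent.remove(agentId);
      return false;
    }
    return !entry.instanceKey.equals(instanceKey);
  }
}
```

Because nothing here is persisted, a scheduler failover simply forgets the holds, which matches the fallback to first-fit scheduling mentioned in the review description. The hold duration is the knob that trades update MTTA against agents being blocked for other jobs.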


> On April 17, 2017, 8:27 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java
> > Lines 52-55 (patched)
> > 
> >
> > I am trying to understand if this is a good default for this 
> > best-effort feature.
> > 
> > What is your cluster-wide MTTA? It should give us a decent hint for a 
> > suitable default.

Our MTTA can range from a couple milliseconds to several minutes. Depends how 
many tasks are pending and how full the cluster is.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172122
---


On April 12, 2017, 7:51 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58259/
> ---
> 
> (Updated April 12, 2017, 7:51 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In the Dynamic Reservations review (and on the mailing list), I mentioned 
> that we could implement update affinity with less complexity using the same 
> technique as preemption. Here is how that would work. 
> 
> This just adds a simple wrapper around the preemptor's BiCache structure and 
> then optimistically tries to keep an agent free for a task during the update 
> process. 
> 
> 
> Note: I don't bother even checking the resources before reserving the agent. 
> I figure there is a chance the agent has enough room, and if not we'll catch 
> it when we attempt to veto the offer. We need to always check the offer like 
> this anyway in case constraints change. In the worst case it adds some delay 
> in the rare cases you increase resources. 
> 
> We also don't persist the reservations, so if the Scheduler fails over during 
> an update, the worst case is that any instances between the KILLED and 
> ASSIGNED in-flight batch need to fall back to the current first-fit 
> scheduling algorithm.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 203f62bacc47470545d095e4d25f7e0f25990ed9 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> a177b301203143539b052524d14043ec8a85a46d 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceAction.java 
> b4cd01b3e0302

Re: Review Request 58259: Add update affinity to Scheduler

2017-04-17 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172122
---



The code change looks decent to me. 

However, I am unsure about two things:

* For us it is common to have jobs with #instance in the ballpark of #agents. 
The proposed code change could easily block a significant number of agents for 
scheduling, even if there would be enough capacity for other job instances. So 
while improving the MTTA for job updates, this could easily lead to increased 
MTTA for regularly launched jobs (cron, adhoc, etc).
* The alternative dynamic reservation proposal has the advantage that it works 
when multiple frameworks are used. Would it be plausible to just reserve any 
used resources in a generic fashion, so that we ensure reservations always come 
back to Aurora and cannot be intercepted by another framework?

Please run `./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.*'` to help ensure 
the scheduling changes don't come with an unexpected performance regression.


src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java
Lines 52-55 (patched)


I am trying to understand if this is a good default for this best-effort 
feature.

What is your cluster-wide MTTA? It should give us a decent hint for a 
suitable default.


- Stephan Erb


On April 12, 2017, 9:51 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58259/
> ---
> 
> (Updated April 12, 2017, 9:51 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In the Dynamic Reservations review (and on the mailing list), I mentioned 
> that we could implement update affinity with less complexity using the same 
> technique as preemption. Here is how that would work. 
> 
> This just adds a simple wrapper around the preemptor's BiCache structure and 
> then optimistically tries to keep an agent free for a task during the update 
> process. 
> 
> 
> Note: I don't bother even checking the resources before reserving the agent. 
> I figure there is a chance the agent has enough room, and if not we'll catch 
> it when we attempt to veto the offer. We need to always check the offer like 
> this anyway in case constraints change. In the worst case it adds some delay 
> in the rare cases you increase resources. 
> 
> We also don't persist the reservations, so if the Scheduler fails over during 
> an update, the worst case is that any instances between the KILLED and 
> ASSIGNED in-flight batch need to fall back to the current first-fit 
> scheduling algorithm.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 203f62bacc47470545d095e4d25f7e0f25990ed9 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> a177b301203143539b052524d14043ec8a85a46d 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceAction.java 
> b4cd01b3e03029157d5ca5d1d8e79f01296b57c2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> f25dc0c6d9c05833b9938b023669c9c36a489f68 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  e14112479807b4477b82554caf84fe733f62cf58 
>   src/main/java/org/apache/aurora/scheduler/updater/StateEvaluator.java 
> c95943d242dc2f539778bdc9e071f342005e8de3 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateAgentReserver.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 13cbdadad606d9acaadc541320b22b0ae538cc5e 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  fa1a81785802b82542030e1aae786fe9570d9827 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> cf2d25ec2e407df7159e0021ddb44adf937e1777 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> b2c4c66850dd8f35e06a631809530faa3b776252 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 30b44f88a5b8477e917da21d92361aea1a39ceeb 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 833fd62c870f96b96343ee5e0eed0d439536381f 
>   
> src/test/java/org/apache/aurora/scheduler/updater/NullAgentReserverTest.ja

Re: Review Request 58259: Add update affinity to Scheduler

2017-04-17 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172100
---


Ship it!




Master (cc2aa46) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On April 12, 2017, 7:51 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58259/
> ---
> 
> (Updated April 12, 2017, 7:51 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In the Dynamic Reservations review (and on the mailing list), I mentioned 
> that we could implement update affinity with less complexity using the same 
> technique as preemption. Here is how that would work. 
> 
> This just adds a simple wrapper around the preemptor's BiCache structure and 
> then optimistically tries to keep an agent free for a task during the update 
> process. 
> 
> 
> Note: I don't bother even checking the resources before reserving the agent. 
> I figure there is a chance the agent has enough room, and if not we'll catch 
> it when we attempt to veto the offer. We need to always check the offer like 
> this anyway in case constraints change. In the worst case it adds some delay 
> in the rare cases you increase resources. 
> 
> We also don't persist the reservations, so if the Scheduler fails over during 
> an update, the worst case is that any instances between the KILLED and 
> ASSIGNED in-flight batch need to fall back to the current first-fit 
> scheduling algorithm.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 203f62bacc47470545d095e4d25f7e0f25990ed9 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> a177b301203143539b052524d14043ec8a85a46d 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceAction.java 
> b4cd01b3e03029157d5ca5d1d8e79f01296b57c2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> f25dc0c6d9c05833b9938b023669c9c36a489f68 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  e14112479807b4477b82554caf84fe733f62cf58 
>   src/main/java/org/apache/aurora/scheduler/updater/StateEvaluator.java 
> c95943d242dc2f539778bdc9e071f342005e8de3 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateAgentReserver.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 13cbdadad606d9acaadc541320b22b0ae538cc5e 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  fa1a81785802b82542030e1aae786fe9570d9827 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> cf2d25ec2e407df7159e0021ddb44adf937e1777 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> b2c4c66850dd8f35e06a631809530faa3b776252 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 30b44f88a5b8477e917da21d92361aea1a39ceeb 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 833fd62c870f96b96343ee5e0eed0d439536381f 
>   
> src/test/java/org/apache/aurora/scheduler/updater/NullAgentReserverTest.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/updater/UpdateAgentReserverImplTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58259/diff/2/
> 
> 
> Testing
> ---
> 
> ./gradlew build
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 58259: Add update affinity to Scheduler

2017-04-17 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172099
---



@ReviewBot retry

- David McLaughlin





Re: Review Request 52669: Move the H2 database off heap.

2017-04-17 Thread David McLaughlin


> On April 17, 2017, 11:09 a.m., Stephan Erb wrote:
> > David, you mentioned on the mailing list that off-heap storage only offered 
> > marginal improvements. I suppose I can therefore discard this patch?

We'll give it one last shot in our scale test environment (it finally looks 
like we've been able to reproduce real GC issues) and then I'll confirm.
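
For reference, the switch under test amounts to pointing H2 at a different in-memory file system. A small sketch, assuming the scheduler builds its JDBC URL from a database name. H2Urls and the database name are hypothetical; the mem: and nioMemFS: prefixes are the ones described in the H2 documentation quoted in the review:

```java
/** Hypothetical helper; the class and method names are illustrative only. */
final class H2Urls {
  private H2Urls() {}

  /**
   * Builds an H2 JDBC URL for an in-memory database.
   * offHeap=false uses the regular on-heap store (mem:),
   * offHeap=true uses the off-heap in-memory file system (nioMemFS:).
   */
  static String inMemoryUrl(String dbName, boolean offHeap) {
    String prefix = offHeap ? "jdbc:h2:nioMemFS:" : "jdbc:h2:mem:";
    return prefix + dbName;
  }
}
```

Calling H2Urls.inMemoryUrl("aurora", true) yields a URL that keeps table data out of the JVM heap, which is the property the experiment relies on to reduce GC pressure.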


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52669/#review172078
---


On Oct. 11, 2016, 6:17 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52669/
> ---
> 
> (Updated Oct. 11, 2016, 6:17 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, John Sirois, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This experiment is inspired by David's comment: "I don’t think the
> storage engine matters. We just need to be able to offload it from
> the Scheduler JVM. The problem with H2 isn’t SQL or anything else,
> it’s the GC pressure."
> 
> Basic idea is to switch to another storage backend: "nioMemFS stores
> data outside of the VM's heap - useful for large memory DBs without
> incurring GC costs" (http://www.h2database.com/html/advanced.html)
> 
> Our micro-benchmarks look promising
> 
> Current Master (on-heap db with latest versions):
> 
> Benchmark                                             (instanceOverrides)  (instances)  (metadata)  (numTasks)   Mode  Cnt      Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           1  thrpt    5  72851.249 ± 15794.210  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           5  thrpt    5  31626.929 ± 17326.988  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A          10  thrpt    5      0.078 ±     0.013  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A           1  thrpt    5    414.135 ±   315.838  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A           5  thrpt    5     68.643 ±    24.303  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A          10  thrpt    5     32.032 ±    13.870  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         1000         N/A         N/A  thrpt    5    143.981 ±    78.985  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         5000         N/A         N/A  thrpt    5     35.224 ±    25.593  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A            1         N/A         N/A  thrpt    5     18.869 ±     3.318  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                      1          N/A         N/A         N/A  thrpt    5     36.013 ±    19.743  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                     10          N/A         N/A         N/A  thrpt    5     33.813 ±    11.216  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                    100          N/A         N/A         N/A  thrpt    5     20.516 ±    10.526  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                   1000          N/A         N/A         N/A  thrpt    5     16.564 ±     2.993  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A          10         N/A  thrpt    5     32.399 ±    21.310  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A         100         N/A  thrpt    5     35.518 ±     7.468  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A        1000         N/A  thrpt    5     19.757 ±    10.035  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A           1         N/A  thrpt    5     10.849 ±    10.660  ops/s
> 
> This patch (off-heap db):
> 
> Benchmark                                             (instanceOverrides)  (instances)  (metadata)  (numTasks)   Mode  Cnt      Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           1  thrpt    5  77746.436 ± 47191.240  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           5  thrpt    5  70099.087 ± 37223.642  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A          10  thrpt    5  30461.428 ± 22964.261  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark

Re: Review Request 58467: Update to Mesos 1.2.0

2017-04-17 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58467/#review172093
---


Ship it!




Ship It!

- Zameer Manji


On April 15, 2017, 3:44 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58467/
> ---
> 
> (Updated April 15, 2017, 3:44 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update to Mesos 1.2.0
> 
> Changelog: https://github.com/apache/mesos/blob/1.2.0/CHANGELOG
> 
> The items that stand out to me the most:
> 
> * Unified containerizer support for specifying Docker images by Image ID. 
> (MESOS-3505)
> * new posix/rlimits isolator (MESOS-6402)
> * new IPC namespace isolator (MESOS-6557)
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD 7648ac8ca81ef1bfef13d840334a03f4bb7b8198 
>   RELEASE-NOTES.md 5babea532760e908d80235e1d6b8a1548c57cce3 
>   Vagrantfile d1c536bc3868409e184c9c97845d65c5a3e1722c 
>   build-support/packer/build.sh 548cf37e097c6ed56fc6cc718a642b105afb9331 
>   build.gradle bca669881e95e1415f5848f298dc4bab4fb65ba0 
> 
> 
> Diff: https://reviews.apache.org/r/58467/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> ./gradlew -Pq build
> ./pants test.pytest src/{main,test}/python:: -- -v
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 58467: Update to Mesos 1.2.0

2017-04-17 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58467/#review172085
---


Ship it!

- Joshua Cohen


On April 15, 2017, 10:44 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58467/
> ---
> 
> (Updated April 15, 2017, 10:44 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update to Mesos 1.2.0
> 
> Changelog: https://github.com/apache/mesos/blob/1.2.0/CHANGELOG
> 
> The items that stand out to me the most:
> 
> * Unified containerizer support for specifying Docker images by Image ID. 
> (MESOS-3505)
> * new posix/rlimits isolator (MESOS-6402)
> * new IPC namespace isolator (MESOS-6557)
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD 7648ac8ca81ef1bfef13d840334a03f4bb7b8198 
>   RELEASE-NOTES.md 5babea532760e908d80235e1d6b8a1548c57cce3 
>   Vagrantfile d1c536bc3868409e184c9c97845d65c5a3e1722c 
>   build-support/packer/build.sh 548cf37e097c6ed56fc6cc718a642b105afb9331 
>   build.gradle bca669881e95e1415f5848f298dc4bab4fb65ba0 
> 
> 
> Diff: https://reviews.apache.org/r/58467/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> ./gradlew -Pq build
> ./pants test.pytest src/{main,test}/python:: -- -v
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 52669: Move the H2 database off heap.

2017-04-17 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52669/#review172078
---



David, you mentioned on the mailing list that off-heap storage offered only 
marginal improvements. I suppose I can therefore discard this patch?

- Stephan Erb


On Oct. 11, 2016, 8:17 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52669/
> ---
> 
> (Updated Oct. 11, 2016, 8:17 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, John Sirois, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This experiment is inspired by David's comment: "I don’t think the
> storage engine matters. We just need to be able to offload it from
> the Scheduler JVM. The problem with H2 isn’t SQL or anything else,
> it’s the GC pressure."
> 
> The basic idea is to switch to another storage backend: "nioMemFS stores
> data outside of the VM's heap - useful for large memory DBs without
> incurring GC costs" (http://www.h2database.com/html/advanced.html)
> 
> Our micro-benchmarks look promising:
> 
> Current Master (on-heap db with latest versions):
> 
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           1  thrpt    5  72851.249 ± 15794.210  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           5  thrpt    5  31626.929 ± 17326.988  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A          10  thrpt    5      0.078 ±     0.013  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A           1  thrpt    5    414.135 ±   315.838  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A           5  thrpt    5     68.643 ±    24.303  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A          10  thrpt    5     32.032 ±    13.870  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         1000         N/A         N/A  thrpt    5    143.981 ±    78.985  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A         5000         N/A         N/A  thrpt    5     35.224 ±    25.593  ops/s
> UpdateStoreBenchmarks.JobDetailsBenchmark.run                         N/A            1         N/A         N/A  thrpt    5     18.869 ±     3.318  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                      1          N/A         N/A         N/A  thrpt    5     36.013 ±    19.743  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                     10          N/A         N/A         N/A  thrpt    5     33.813 ±    11.216  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                    100          N/A         N/A         N/A  thrpt    5     20.516 ±    10.526  ops/s
> UpdateStoreBenchmarks.JobInstructionsBenchmark.run                   1000          N/A         N/A         N/A  thrpt    5     16.564 ±     2.993  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A          10         N/A  thrpt    5     32.399 ±    21.310  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A         100         N/A  thrpt    5     35.518 ±     7.468  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A        1000         N/A  thrpt    5     19.757 ±    10.035  ops/s
> UpdateStoreBenchmarks.JobUpdateMetadataBenchmark.run                  N/A          N/A           1         N/A  thrpt    5     10.849 ±    10.660  ops/s
> 
> This patch (off-heap db):
> 
> Benchmark                                             (instanceOverrides)  (instances)  (metadata)  (numTasks)   Mode  Cnt      Score       Error  Units
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           1  thrpt    5  77746.436 ± 47191.240  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A           5  thrpt    5  70099.087 ± 37223.642  ops/s
> TaskStoreBenchmarks.DBFetchTasksBenchmark.run                         N/A          N/A         N/A          10  thrpt    5  30461.428 ± 22964.261  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                        N/A          N/A         N/A           1  thrpt    5    335.302 ±   229.328  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.runN/A 
>  N/A