Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148643
---




src/main/python/apache/aurora/client/api/__init__.py (line 266)


This should send both key and query for both v-1 and v schedulers to work 
correctly.



src/main/python/apache/aurora/client/cli/update.py (lines 529 - 533)


Are you sure the `detailsList` is going to be set when a response is coming 
from the v-1 scheduler? Even if that was the case, it would be completely 
empty. I'd expect something like this instead:
```
if detailsList is not None:
  # process detailsList
else
  # process details
```


- Maxim Khutornenko


On Sept. 12, 2016, 10:26 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 10:26 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/python/apache/aurora/client/cli/update.py 
> d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
>   src/test/python/apache/aurora/client/cli/test_supdate.py 
> 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Maxim Khutornenko


> On Sept. 13, 2016, 1:14 a.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 188
> > 
> >
> > Our current metrics use 'nanos' to indicate nanoseconds instead of 
> > 'ns'. Mind using that here for consistency?

We actually have both "nanos" and "ns". I agree that "nanos" looks more 
prevalent as it's what `TimedInterceptor` is using.


> On Sept. 13, 2016, 1:14 a.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 105
> > 
> >
> > I agree we cannot use `java.util.concurrent.FutureTask` here. However, 
> > this seems to be pretty close to 
> > `java.util.concurrent.CompletableFuture` (new in Java 8).
> > 
> > It's not clear to me here if you have investigated it or not, but if 
> > you have I would like a brief explanation as to why it is not suitable. If 
> > you haven't please take a look at it as it seems to provide all of the 
> > funtionality you need.
> > 
> > Further if it is possible to use it, then we do not need pubsub for 
> > notification for when work completes, if item is it's own future, we can 
> > use the `CompletionStage` functionality to execute code after a future has 
> > been fulfilled. 
> > 
> > It has a `get()` method that can replace `waitForResult()` and a 
> > `complete()` method that allows you to set the value.

I did evaluate the `CompletableFuture` for the initial version of the 
`BatchWorker` and did not find it up to the task. Looking at it again though, I 
don't see any reasons why it would not satisfy the current version! Thanks for 
the nudge, the diff is incoming.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review148633
---


On Sept. 12, 2016, 10:51 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 12, 2016, 10:51 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number

Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review148633
---



I noticed that `CompletableFuture` might be suitable here. I will hold off a 
deeper review until I'm convinced that it is not suitable. I feel if it were 
possible to adopt and it was adopted here it could simplify the design.

Overall, I like the approach of queueing up multiple writes together and then 
executing them together inside the same storage lock. I think the interface is 
suitable for this purpose.


src/main/java/org/apache/aurora/scheduler/BatchWorker.java (line 105)


I agree we cannot use `java.util.concurrent.FutureTask` here. However, this 
seems to be pretty close to `java.util.concurrent.CompletableFuture` (new in 
Java 8).

It's not clear to me here if you have investigated it or not, but if you 
have I would like a brief explanation as to why it is not suitable. If you 
haven't please take a look at it as it seems to provide all of the funtionality 
you need.

Further if it is possible to use it, then we do not need pubsub for 
notification for when work completes, if item is it's own future, we can use 
the `CompletionStage` functionality to execute code after a future has been 
fulfilled. 

It has a `get()` method that can replace `waitForResult()` and a 
`complete()` method that allows you to set the value.



src/main/java/org/apache/aurora/scheduler/BatchWorker.java (line 188)


Our current metrics use 'nanos' to indicate nanoseconds instead of 'ns'. 
Mind using that here for consistency?


- Zameer Manji


On Sept. 12, 2016, 3:51 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 12, 2016, 3:51 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max ba

Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148630
---


Ship it!




Master (f0ff1c4) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 10:26 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 10:26 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/python/apache/aurora/client/cli/update.py 
> d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
>   src/test/python/apache/aurora/client/cli/test_supdate.py 
> 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148628
---



@ReviewBot retry

- Zameer Manji


On Sept. 12, 2016, 3:26 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 3:26 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/python/apache/aurora/client/cli/update.py 
> d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
>   src/test/python/apache/aurora/client/cli/test_supdate.py 
> 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review148625
---


Ship it!




Master (c4903d8) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 10:51 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 12, 2016, 10:51 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if interested.
> 
> #Upcoming
> The introduction of the `BatchWorker` by itself was not enough to 
> substantially improve the MTTA. It, however, paves the way for the next phase 
> of scheduling perf improvement - taking more than 1 task from a given 
> `TaskGroup` in a single scheduling round (coming soon). That improvement 
> wouldn't deliver without decreasing the lock contention first. 
> 
> Note: it wasn't easy to have a clean diff split, so some functionality in 
> `BatchWorker` (e.g.: `executeWithReplay`) appears to be unused in the current 
> patch but will become obvious in the part 2 (coming out shortly).  
> 
> [1] - 
> https://github.com/apache/aurora/blob/f6ac13b169aaad5aad73ef3cc72873781e30a705/src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java#L540-L555
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/BatchWorker.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/Sched

Re: Review Request 51826: Implement `toString` on lazy modules.

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51826/#review148624
---


Ship it!




Master (c4903d8) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 11:07 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51826/
> ---
> 
> (Updated Sept. 12, 2016, 11:07 p.m.)
> 
> 
> Review request for Aurora and Joshua Cohen.
> 
> 
> Bugs: AURORA-1770
> https://issues.apache.org/jira/browse/AURORA-1770
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This will change the help output from:
> `-shiro_realm_modules (default 
> [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
> to
> `-shiro_realm_modules (default [class 
> org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/app/MoreModules.java 
> d5f96543d1068bf30b9d173a247c2806faf35578 
>   src/test/java/org/apache/aurora/scheduler/app/MoreModulesTest.java 
> b2fb3c9dcba64f69a05894f506ba43766f74ddaa 
> 
> Diff: https://reviews.apache.org/r/51826/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51765: Batching writes - Part 3 (of 3): Converting TaskScheduler to use BatchWorker.

2016-09-12 Thread Maxim Khutornenko


> On Sept. 11, 2016, 11:18 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 115-128
> > 
> >
> > The `scheduleTask` method is doing some staggering amount of work, such 
> > as iterating over offers and  matching constraints. Only a small part of 
> > this actually requires an open write transaction.
> > 
> > It seems to me as if moving part of this computation out of the write 
> > transaction could significantly improve write throughput. You have probably 
> > considered that, so what am I missing here? :-)

It's true that only a small part requires writing into the store. 
Unfortunately, we do use write transaction as a global system lock to freeze 
the state for the duration of the logical change. It's very hard to reason 
about that logic if things change underneath. The assumptions about being 
inside of a "locked" state begin with computing the `AttributeAggregate` in the 
`TaskScheduler` and follow along until we are ready to launch task or consider 
a preemption. There are MANY things that can go wrong if we move the 
transaction boundary. Starting from trivial nullref and concurrent collection 
modification exceptions to very hard to troubleshoot logical errors (offers 
reused, limit constraints are not enforced and etc.). While it's theoretically 
possible to implement an optimistic task/offer matching it would require 
substantial rewrite of what we currently have and is definitely outside the 
scope of this change.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51765/#review148434
---


On Sept. 9, 2016, 6:52 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51765/
> ---
> 
> (Updated Sept. 9, 2016, 6:52 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the final part of the `BatchWorker` conversion work that converts 
> `TaskScheduler`. See https://reviews.apache.org/r/51759 for more background 
> on the `BatchWorker`.
> 
> #Problem
> See https://reviews.apache.org/r/51759
> 
> #Remediation
> Task scheduling is one of the most dominant users of the write lock. It's 
> also one of the heaviest and the most latency-sensitive. As such, the default 
> max batch size is chosen conservatively low (3) and batch items are executed 
> in a blocking way. 
> 
> BTW, attempting to make task scheduling non-blocking resulted in a much worse 
> scheduling performance. The way our `DBTaskStore` is wired, all async 
> activities, including `EventBus` are bound to use a single async `Executor`, 
> which is currently limited at 8 threads [1]. Relying on the same `EventBus` 
> to deliver scheduling completion events resulted in slower scheduling perf as 
> those events were backed up behind all other activities, including tasks 
> status events, reconciliation and etc. Increasing the executor thread pool 
> size to a larger number on the other side, also increased the lock contention 
> defeating the whole purpose of this work.
> 
> #Results
> See https://reviews.apache.org/r/51759 for the lock contention results.
> 
> https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L51-L54
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 11e8033438ad0808e446e41bb26b3fa4c04136c7 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> c044ebe6f72183a67462bbd8e5be983eb592c3e9 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> d266f6a25ae2360db2977c43768a19b1f1efe8ff 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c2ceb4e7685a9301f8014a9183e02fbad65bca26 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  72562e6bd9a9860c834e6a9faa094c28600a8fed 
> 
> Diff: https://reviews.apache.org/r/51765/diff/
> 
> 
> Testing
> ---
> 
> All types of testing including deploying to test and production clusters.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>



Re: Review Request 51826: Implement `toString` on lazy modules.

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51826/
---

(Updated Sept. 12, 2016, 4:07 p.m.)


Review request for Aurora and Joshua Cohen.


Changes
---

Add tests.


Bugs: AURORA-1770
https://issues.apache.org/jira/browse/AURORA-1770


Repository: aurora


Description
---

This will change the help output from:
`-shiro_realm_modules (default 
[org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
to
`-shiro_realm_modules (default [class 
org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/app/MoreModules.java 
d5f96543d1068bf30b9d173a247c2806faf35578 
  src/test/java/org/apache/aurora/scheduler/app/MoreModulesTest.java 
b2fb3c9dcba64f69a05894f506ba43766f74ddaa 

Diff: https://reviews.apache.org/r/51826/diff/


Testing
---


Thanks,

Zameer Manji



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/
---

(Updated Sept. 12, 2016, 10:51 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Joshua's comments.


Repository: aurora


Description
---

This is the first (out of 3) patches intending to reduce storage write lock 
contention and as such improve overall system write throughput. It introduces 
the `BatchWorker` and migrates the majority of storage writes due to task 
status change events to use `TaskEventBatchWorker`.

#Problem
Our current storage system writes effectively behave as `SERIALIZABLE` 
transaction isolation level in SQL terms. This means all writes require 
exclusive access to the storage and no two transactions can happen in parallel 
[1]. While it certainly simplifies our implementation, it creates a single 
hotspot where multiple threads are competing for the storage write access. This 
type of contention only worsens as the cluster size grows, more tasks are 
scheduled, more status updates are processed, more subscribers are listening to 
status updates and etc. Eventually, the scheduler throughput (and especially 
task scheduling) becomes degraded to the extent that certain operations wait 
much longer (4x and more) for the lock acquisition than it takes to process 
their payload when inside the transaction. Some ops (like event processing) are 
generally tolerant of these types of delays. Others - not as much. The task 
scheduling suffers the most as backing up the scheduling queue directly affects 
t
 he Median Time To Assigned (MTTA).

#Remediation
Given the above, it's natural to assume that reducing the number of write 
transactions should help reducing the lock contention. This patch introduces a 
generic `BatchWorker` service that delivers a "best effort" batching approach 
by redirecting multiple individual write requests into a single FIFO queue 
served non-stop by a single dedicated thread. Every batch shares a single write 
transaction thus reducing the number of potential write lock requests. To 
minimize wait-in-queue time, items are dispatched immediately and the max 
number of items is bounded. There are a few `BatchWorker` instances specialized 
on particular workload types: task even processing, cron scheduling and task 
scheduling. Every instance can be tuned independently (max batch size) and 
provides specialized metrics helping to monitor each workload type perf.

#Results
The proposed approach has been heavily tested in production and delivered the 
best results. The lock contention latencies got down between 2x and 5x 
depending on the cluster load. A number of other approaches tried but discarded 
as not performing well or even performing much worse than the current master:
- Clock-driven batch execution - every batch is dispatched on a time schedule
- Max batch with a deadline - a batch is dispatched when max size is reached OR 
a timeout expires
- Various combinations of the above - some `BatchWorkers` are using 
clock-driven execution while others are using max batch with a deadline
- Completely non-blocking (event-based) completion notification - all call 
sites are notified of item completion via a `BatchWorkCompleted` event

Happy to provide more details on the above if interested.

#Upcoming
The introduction of the `BatchWorker` by itself was not enough to substantially 
improve the MTTA. It, however, paves the way for the next phase of scheduling 
perf improvement - taking more than 1 task from a given `TaskGroup` in a single 
scheduling round (coming soon). That improvement wouldn't deliver without 
decreasing the lock contention first. 

Note: it wasn't easy to have a clean diff split, so some functionality in 
`BatchWorker` (e.g.: `executeWithReplay`) appears to be unused in the current 
patch but will become obvious in the part 2 (coming out shortly).  

[1] - 
https://github.com/apache/aurora/blob/f6ac13b169aaad5aad73ef3cc72873781e30a705/src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java#L540-L555


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/BatchWorker.java PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/SchedulerModule.java 
4a7ef0b1b90607f68d89fe8e207f42c42a8c56a0 
  src/main/java/org/apache/aurora/scheduler/events/PubsubEvent.java 
70b5470b9dad1af838b5222cae5ac86487e2f2e4 
  src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
f07746c2b990c1c2235e99f9e4775fc84f9c27b1 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskThrottler.java 
bbd971a2aa8a96cf79edd879ad60e1bebd933d79 
  src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
3c7cda09ab292d696070ca4d9dfedb1f6f71b0fe 
  
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 
594bb621929

Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Maxim Khutornenko


> On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote:
> > Along the lines of the question Stephan raised, what happens in the event 
> > of a failover mid-batch, especially w.r.t. repeatable work?

Same as today: the transaction either happens or does not. In case of a cron 
job (the only user of `RepeatableWork` at the moment), the cron job will not be 
triggered (again, same as today).


> On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 106-107
> > 
> >
> > I think this should be "does not initialize"?

Good catch. Fixed.


> On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 257-259
> > 
> >
> > Even though there's only a single callsite where we initialize a 
> > `WorkItem` in `FOLLOWUP` state, there's no guarantee that `laskBackoffMsec` 
> > will be present just because the state is `FOLLOWUP`.
> > 
> > We should add some state checks to the `WorkItem` constructor so that 
> > if state is `FOLLOWUP` we'll throw if `lastBackoffMsec` is not present to 
> > protect against future mistakes? (Or if that's a valid scenario we should 
> > perform an `isPresent` check or use `or` here...).

I assert both `backoffStrategy` and `lastBackoffMsec` inside the `backoffFor()` 
method on line 315. I think it should be sufficient given that `WorkItem` is a 
private class?


> On Sept. 12, 2016, 6:26 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 273-275
> > 
> >
> > Should we do this after the batch is processed rather than before?

It's largely irrelevant but I guess it's more logical to expect it at the end 
while going through the code. Moved.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review148501
---


On Sept. 9, 2016, 5:29 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 9, 2016, 5:29 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Resul

Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148616
---



Master (c4903d8) is red with this patch.
  ./build-support/jenkins/build.sh

 # Create file stdout for capturing output. We 
can't use StringIO mock
 # because TestProcess is running fork.
 with open(os.path.join(td, 'sys_stdout'), 
'w+') as stdout:
   with open(os.path.join(td, 'sys_stderr'), 
'w+') as stderr:
 with mutable_sys():
   sys.stdout, sys.stderr = stdout, 
stderr
 
   p = TestProcess('process', 'echo hello 
world; echo >&2 hello stderr', 0,
   taskpath, sandbox, 
logger_destination=LoggerDestination.BOTH)
   p.start()
   rc = 
wait_for_rc(taskpath.getpath('process_checkpoint'))
 
   assert rc == 0
   # Check log files were created in std 
path with correct content
 > assert_log_content(taskpath, 'stdout', 
'hello world\n')
 
 src/test/python/apache/thermos/core/test_process.py:487: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 
 taskpath = 
 log_name = 'stdout'
 expected_content = 'hello world\n'
 
 def assert_log_content(taskpath, log_name, 
expected_content):
   log = 
taskpath.with_filename(log_name).getpath('process_logdir')
   assert os.path.exists(log)
   with open(log, 'r') as fp:
 >   assert fp.read() == expected_content
 E   assert '' == 'hello world\n'
 E + hello world
 
 src/test/python/apache/thermos/core/test_process.py:313: 
AssertionError
  generated xml file: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
 
  1 failed, 700 passed, 6 skipped, 1 warnings in 
367.18 seconds 
 
FAILURE


22:46:52 06:41   [complete]
   FAILURE


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 10:26 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 10:26 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/python/apache/aurora/client/cli/update.py 
> d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
>   src/test/python/apache/aurora/client/cli/test_supdate.py 
> 92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/
---

(Updated Sept. 12, 2016, 3:26 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Appease the robot.


Bugs: AURORA-1764
https://issues.apache.org/jira/browse/AURORA-1764


Repository: aurora


Description
---

This extends getJobUpdateDetails to return a list of details instead of being 
scoped to a single update.


Diffs (updated)
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
4202f0eec59ef16668fca6b6ebb925335ad5dea1 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
929d021a336c6a3438613c9340c84a1096dc9069 
  src/main/python/apache/aurora/client/api/__init__.py 
9149c3018ae58d405f284fcbd4076d251ccc8192 
  src/main/python/apache/aurora/client/cli/update.py 
d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
  src/main/resources/scheduler/assets/js/services.js 
315fc17894c2261333715fbb9c45fc02505f2942 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
a7d1f74acdfe5f58eb822a4d826e920cad53dced 
  src/test/python/apache/aurora/client/api/test_api.py 
7a0797d5402e931006d4f215e2d9c0dbbd113257 
  src/test/python/apache/aurora/client/cli/test_supdate.py 
92fb039fb7d3e368d7c0dfa5ebdb465c7f112416 

Diff: https://reviews.apache.org/r/51712/diff/


Testing
---


Thanks,

Zameer Manji



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Maxim Khutornenko


> On Sept. 11, 2016, 11:02 p.m., Stephan Erb wrote:
> > Thanks for the excellent write up and the time you took to split the review 
> > into sizable chunks. A full review will take some time. I will start with a 
> > couple of highlevel questions.
> > 
> > a) Do you have some results of the jmh benachmarks that you might share?
> > 
> > b) If I understand this correctly, you are basically trading one lock with 
> > another: Instead of waiting on the global storage lock, threads do now have 
> > to wait on the lock of the `workQueue`. Obviously waiting on that lock is 
> > much faster as there is less work being done in the critical region 
> > protected by that lock. However, isn't that a strategy that we could try to 
> > use as well? For details, see my comment in Part 3.

> a) Do you have some results of the jmh benachmarks that you might share?

Good question. Creating a reliable benchmark to simulate a real world lock 
contention is next to impossible. JMH guidelines warn the accuracy of 
benchmarking degrades guickly with the number of threads involved. The best we 
could prove here is that the overhead introduced by the `BatchWorker` is not 
prohibitive:

Master:
```
Benchmark  
Mode  Cnt ScoreError  Units
SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark  
thrpt   10  2714.177 ± 84.900  ops/s
```

Part 3 (https://reviews.apache.org/r/51765/):
```
Benchmark  
Mode  Cnt ScoreError  Units
SchedulingBenchmarks.LimitConstraintMismatchSchedulingBenchmark.runBenchmark  
thrpt   10  2664.389 ± 65.013  ops/s
```

The results are well within the margin of error and both are a few orders of 
magnitude higher than the real scheduling perf possible/observed.

> b) If I understand this correctly, you are basically trading one lock with 
> another...

The major benefit here is improving the response time by reducing the hotspot 
contention. According to Little's Law [1]:
```
MeanResponseTime = MeanNumberInSystem / MeanThroughput
```

Given the fixed throughput (native log perf), the only way to reduce queue 
saturation (and as such improve response time) is to reduce the number of the 
outstanding requests. Batching multiple individual requests accomplishes 
exactly that goal. It's worth noting that the above formula is applicable when 
all requests take about the same time to process. That is *not* the case in our 
system. Some take longer (task scheduling), others are very low latency (single 
tasks status processing). Varying the batch size of different types of requests 
(a few scheduling requests vs. hundreds of task events) lets us level out the 
request shape (as much as it's physically possible) and as such improve overall 
perf.

As for reducing the critical section weight, I'll address it in the Part 3 
comment. In general though, reducing the latency helps to level out the request 
shape but is still not sufficient to reduce the hotspot (see above).

[1] - https://en.wikipedia.org/wiki/Little%27s_law


> On Sept. 11, 2016, 11:02 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 58
> > 
> >
> > I am wondering if this queue shouldn't have a fixed capacity, so that 
> > we gain some kind of back pressure throughout the system.

Not sure I follow. The default `LinkedBlockingQueue` constructor uses 
`Integer.MAX_VALUE` for the max number of items in the queue. That should be 
more than sufficient for our use case.


> On Sept. 11, 2016, 11:02 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, line 164
> > 
> >
> > `Work` is a subclass of `RepeatableWork` but meant to express something 
> > non-repeatable. This sounds like a violation of the [Liskov Substitution 
> > Principle]( https://en.wikipedia.org/wiki/Liskov_substitution_principle). 
> > 
> > In other words, I feel like `RepeatableWork` should extend `Work` and 
> > not the other way round.

LSP is a pricinple applicable to mutable types. These are interfaces. The 
`RepeatableWork` is a broader type that supports repetition. The `Work` type a 
convenience wrapper limiting to just one repetition. We use similar approach in 
`Storage`. See `StorageOperation` and the multitude of derived interfaces 
(`Quiet`, `NoResult` and etc.)


> On Sept. 11, 2016, 11:02 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/BatchWorker.java, lines 240-247
> > 
> >
> > A queue could hold vital information that we should not loose during 
> > scheduler failover (in particular, as we do this once per day).
> > 
> > Would it be possible to dra

Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/
---

(Updated Sept. 13, 2016, 12:08 a.m.)


Review request for Aurora and Maxim Khutornenko.


Changes
---

Add comment proposed in reviews.


Repository: aurora


Description
---

We plan to open source a very simple Mesos ResourceEstimator and QosController 
that supports RAM and CPU oversubscription (ETA ~2 weeks). We have been using 
it internally with a patched Aurora version where the hardcoded 
`isMesosRevocable` flag of RAM has been set to `true`. This patch makes this 
behaviour configurable.


Diffs (updated)
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  docs/features/resource-isolation.md 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
  docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
  docs/reference/scheduler-configuration.md 
87d2cded0780ffa34884b52cc049c6a0ad808f0a 
  src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
4c102a3d4c4bdd27a1d0536b689acd6257e77788 

Diff: https://reviews.apache.org/r/51807/diff/


Testing
---

./gradlew -Pq build


Thanks,

Stephan Erb



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Stephan Erb


> On Sept. 12, 2016, 9:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java, line 
> > 75
> > 
> >
> > Would be great to have a e2e test when this functionality is available 
> > in Mesos. Perhaps drop a TODO here?
> 
> Stephan Erb wrote:
> Acutally, it would work in Mesos out of the box even today. All we have 
> to do is adapt our Mesos config to also pass along memory resources 
> https://github.com/apache/aurora/blob/master/examples/vagrant/mesos_config/etc_mesos-slave/modules#L8.
> 
> I didn't do it because I felt like it wouldn't add any additional test 
> coverage, now that the entire resource handling within Aurora is generic.
> 
> Stephan Erb wrote:
> Additional info: We already have an e2e test using revocable CPU 
> resources.
> 
> Maxim Khutornenko wrote:
> It's true that things should now "just work". Reality proves that's not 
> always the case, so any test coverage helping to validate we can actually run 
> tasks against offers with more than one revocable resource would be welcome. 
> Does not have to be in this RB though.

Alright. Will put it on my list and do a follow up sometime soon.


> On Sept. 12, 2016, 9:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java, 
> > lines 22-33
> > 
> >
> > This is somewhat non-standard way to define cmd flags. We tend to 
> > declare them in modules and use 'setting' classes to act as pure wrappers. 
> > Perhaps it's time to introduce a `ResourceModule` class (if only to bind 
> > the `ResourceSettings` instance for now)?
> 
> Stephan Erb wrote:
> I believe there would be no way to inject the settings into the 
> `ResourceType` enum at compile time with guice. That's why I've adopted this 
> approach here.  Happy to give it a try if you have some pointers for me how 
> to make it work.
> 
> Maxim Khutornenko wrote:
> Oh, that's right. I forgot java enums are extremely picky when it comes 
> to field injections. Ignore this comment. Perhaps having an extra note in 
> this class javadoc will help avoiding this type of questions in the future.

Added the comment.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148563
---


On Sept. 12, 2016, 3:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 3:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51826: Implement `toString` on lazy modules.

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51826/#review148608
---



Master (b429612) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 9:44 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51826/
> ---
> 
> (Updated Sept. 12, 2016, 9:44 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1770
> https://issues.apache.org/jira/browse/AURORA-1770
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This will change the help output from:
> `-shiro_realm_modules (default 
> [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
> to
> `-shiro_realm_modules (default [class 
> org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/app/MoreModules.java 
> d5f96543d1068bf30b9d173a247c2806faf35578 
> 
> Diff: https://reviews.apache.org/r/51826/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148604
---



Master (b429612) is red with this patch.
  ./build-support/jenkins/build.sh

  Running setup.py bdist_wheel for twitter.common.lang: finished with status 
'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/b4/91/6b/e0a5ae9722528a67f600d436e0b97b30a040a65a5a35978272
  Running setup.py bdist_wheel for twitter.common.log: started
  Running setup.py bdist_wheel for twitter.common.log: finished with status 
'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/26/36/82/e6a6b7ca739262435cecee1cde7950bf60950f1aaa2ce21093
  Running setup.py bdist_wheel for cov-core: started
  Running setup.py bdist_wheel for cov-core: finished with status 'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/86/e1/c2/9ff8cfe9773ce07003f2c2be096e169af4614c2f634671d49b
  Running setup.py bdist_wheel for twitter.common.options: started
  Running setup.py bdist_wheel for twitter.common.options: finished with status 
'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/60/c2/40/54b323809df9598cc125f02527f93ff743cd9bd979f4a1737d
Successfully built pantsbuild.pants ansicolors setproctitle 
twitter.common.collections pathspec twitter.common.dirutil pystache scandir 
psutil pywatchman Markdown Pygments docutils twitter.common.confluence coverage 
pytest pytest-cov lmdb twitter.common.lang twitter.common.log cov-core 
twitter.common.options
Installing collected packages: ansicolors, setproctitle, twitter.common.lang, 
twitter.common.collections, six, pathspec, twitter.common.dirutil, requests, 
pystache, scandir, psutil, pywatchman, futures, setuptools, pex, Markdown, 
Pygments, docutils, twitter.common.options, twitter.common.log, 
twitter.common.confluence, monotonic, fasteners, coverage, py, pytest, 
cov-core, pytest-cov, lmdb, pantsbuild.pants
  Found existing installation: setuptools 21.2.1
Uninstalling setuptools-21.2.1:
  Successfully uninstalled setuptools-21.2.1
Successfully installed Markdown-2.1.1 Pygments-1.4 ansicolors-1.0.2 
cov-core-1.15.0 coverage-3.7.1 docutils-0.12 fasteners-0.14.1 futures-3.0.5 
lmdb-0.89 monotonic-1.2 pantsbuild.pants-1.1.0rc7 pathspec-0.3.4 pex-1.1.10 
psutil-4.3.0 py-1.4.31 pystache-0.5.3 pytest-2.6.4 pytest-cov-1.8.1 
pywatchman-1.3.0 requests-2.5.3 scandir-1.2 setproctitle-1.1.10 
setuptools-5.4.1 six-1.10.0 twitter.common.collections-0.3.7 
twitter.common.confluence-0.3.7 twitter.common.dirutil-0.3.7 
twitter.common.lang-0.3.7 twitter.common.log-0.3.7 twitter.common.options-0.3.7

21:47:26 00:00 [main]
   (To run a reporting server: ./pants server)
21:47:26 00:00   [setup]
21:47:26 00:00 [parse]
   Executing tasks in goals: compile
21:47:27 00:01   [compile]
21:47:27 00:01 [compile-prep-command]
21:47:27 00:01 [compile]
21:47:27 00:01 [python-eval]
21:47:27 00:01 [pythonstyle]
21:47:27 00:01   [cache]  
   No cached artifacts for 42 targets.
   Invalidated 42 targets.
T100:ERROR   src/main/python/apache/aurora/client/cli/update.py:531 Indentation 
of 4 instead of 2
 |context.print_err("There is no update for key: %s" % key)


FAILURE: 1 Python Style issues found


21:47:46 00:20   [complete]
   FAILURE


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 9:22 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 9:22 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/py

Re: Review Request 51826: Implement `toString` on lazy modules.

2016-09-12 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51826/#review148605
---


Fix it, then Ship it!




:)


src/main/java/org/apache/aurora/scheduler/app/MoreModules.java (line 69)


Test plz~


- Joshua Cohen


On Sept. 12, 2016, 9:44 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51826/
> ---
> 
> (Updated Sept. 12, 2016, 9:44 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1770
> https://issues.apache.org/jira/browse/AURORA-1770
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This will change the help output from:
> `-shiro_realm_modules (default 
> [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
> to
> `-shiro_realm_modules (default [class 
> org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/app/MoreModules.java 
> d5f96543d1068bf30b9d173a247c2806faf35578 
> 
> Diff: https://reviews.apache.org/r/51826/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51826: Implement `toString` on lazy modules.

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51826/
---

(Updated Sept. 12, 2016, 2:44 p.m.)


Review request for Aurora and Stephan Erb.


Bugs: AURORA-1770
https://issues.apache.org/jira/browse/AURORA-1770


Repository: aurora


Description
---

This will change the help output from:
`-shiro_realm_modules (default 
[org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
to
`-shiro_realm_modules (default [class 
org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`


Diffs
-

  src/main/java/org/apache/aurora/scheduler/app/MoreModules.java 
d5f96543d1068bf30b9d173a247c2806faf35578 

Diff: https://reviews.apache.org/r/51826/diff/


Testing
---


Thanks,

Zameer Manji



Review Request 51826: Implement `toString` on lazy modules.

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51826/
---

Review request for Aurora and Stephan Erb.


Repository: aurora


Description
---

This will change the help output from:
`-shiro_realm_modules (default 
[org.apache.aurora.scheduler.app.MoreModules$1@158a8276])`
to
`-shiro_realm_modules (default [class 
org.apache.aurora.scheduler.http.api.security.IniShiroRealmModule])`


Diffs
-

  src/main/java/org/apache/aurora/scheduler/app/MoreModules.java 
d5f96543d1068bf30b9d173a247c2806faf35578 

Diff: https://reviews.apache.org/r/51826/diff/


Testing
---


Thanks,

Zameer Manji



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/#review148602
---


Ship it!




Master (b429612) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 9:29 p.m., Karthik Anantha Padmanabhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51662/
> ---
> 
> (Updated Sept. 12, 2016, 9:29 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1602 Aurora admin commands for reconcilation
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   
> src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
> 3275d72a0a74909e635bce615e2036ec72aa38ee 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/admin/admin.py 
> 76009b9c1c7a130c25abad544a176dc590dafb12 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
>  b9317dc20456f90057ec2bf4d10619a5ae986187 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  779dc302602ae8842084807ca868a491ea99b676 
>   src/test/python/apache/aurora/admin/test_admin.py 
> eb193c4a33f5275f05a995338446bec0e19bc1cf 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> afac2500551af2fce406bb906aa4e33f353e90a1 
> 
> Diff: https://reviews.apache.org/r/51662/diff/
> 
> 
> Testing
> ---
> 
> * Manually tested on my local vagrant installation
> * ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Karthik Anantha Padmanabhan
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Maxim Khutornenko


> On Sept. 12, 2016, 7:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java, 
> > lines 22-33
> > 
> >
> > This is somewhat non-standard way to define cmd flags. We tend to 
> > declare them in modules and use 'setting' classes to act as pure wrappers. 
> > Perhaps it's time to introduce a `ResourceModule` class (if only to bind 
> > the `ResourceSettings` instance for now)?
> 
> Stephan Erb wrote:
> I believe there would be no way to inject the settings into the 
> `ResourceType` enum at compile time with guice. That's why I've adopted this 
> approach here.  Happy to give it a try if you have some pointers for me how 
> to make it work.

Oh, that's right. I forgot java enums are extremely picky when it comes to 
field injections. Ignore this comment. Perhaps having an extra note in this 
class javadoc will help avoiding this type of questions in the future.


> On Sept. 12, 2016, 7:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java, line 
> > 75
> > 
> >
> > Would be great to have a e2e test when this functionality is available 
> > in Mesos. Perhaps drop a TODO here?
> 
> Stephan Erb wrote:
> Acutally, it would work in Mesos out of the box even today. All we have 
> to do is adapt our Mesos config to also pass along memory resources 
> https://github.com/apache/aurora/blob/master/examples/vagrant/mesos_config/etc_mesos-slave/modules#L8.
> 
> I didn't do it because I felt like it wouldn't add any additional test 
> coverage, now that the entire resource handling within Aurora is generic.
> 
> Stephan Erb wrote:
> Additional info: We already have an e2e test using revocable CPU 
> resources.

It's true that things should now "just work". Reality proves that's not always 
the case, so any test coverage helping to validate we can actually run tasks 
against offers with more than one revocable resource would be welcome. Does not 
have to be in this RB though.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148563
---


On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 1:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Stephan Erb


> On Sept. 12, 2016, 9:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java, line 
> > 75
> > 
> >
> > Would be great to have a e2e test when this functionality is available 
> > in Mesos. Perhaps drop a TODO here?
> 
> Stephan Erb wrote:
> Acutally, it would work in Mesos out of the box even today. All we have 
> to do is adapt our Mesos config to also pass along memory resources 
> https://github.com/apache/aurora/blob/master/examples/vagrant/mesos_config/etc_mesos-slave/modules#L8.
> 
> I didn't do it because I felt like it wouldn't add any additional test 
> coverage, now that the entire resource handling within Aurora is generic.

Additional info: We already have an e2e test using revocable CPU resources.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148563
---


On Sept. 12, 2016, 3:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 3:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148600
---


Ship it!




Ship It!

- Maxim Khutornenko


On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 1:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Karthik Anantha Padmanabhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/
---

(Updated Sept. 12, 2016, 9:29 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Changes
---

* Use Precondition to check validity of settings argument
* Make reconciliation type checking more explicit


Repository: aurora


Description
---

AURORA-1602 Aurora admin commands for reconcilation


Diffs (updated)
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
3275d72a0a74909e635bce615e2036ec72aa38ee 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
929d021a336c6a3438613c9340c84a1096dc9069 
  src/main/python/apache/aurora/admin/admin.py 
76009b9c1c7a130c25abad544a176dc590dafb12 
  src/main/python/apache/aurora/client/api/__init__.py 
9149c3018ae58d405f284fcbd4076d251ccc8192 
  
src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
 b9317dc20456f90057ec2bf4d10619a5ae986187 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 779dc302602ae8842084807ca868a491ea99b676 
  src/test/python/apache/aurora/admin/test_admin.py 
eb193c4a33f5275f05a995338446bec0e19bc1cf 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
afac2500551af2fce406bb906aa4e33f353e90a1 

Diff: https://reviews.apache.org/r/51662/diff/


Testing
---

* Manually tested on my local vagrant installation
* ./build-support/jenkins/build.sh


Thanks,

Karthik Anantha Padmanabhan



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji


> On Sept. 12, 2016, 12:48 p.m., Maxim Khutornenko wrote:
> > src/main/python/apache/aurora/client/api/__init__.py, line 264
> > 
> >
> > Why not going all the way to the client command and initializing both 
> > values in this changelist? Otherwise, there is a risk of cutting a release 
> > in between and delaying removal by extra release (until both client and 
> > scheduler are dual writing/reading).
> 
> Zameer Manji wrote:
> We can't send both values (it doesn't make sense). I will upgrade the 
> client to send the query object instead of the key object. I was not aware 
> when we made an API change we would also change the clients in the same 
> commit.
> 
> Zameer Manji wrote:
> Actually never mind, if I make this change (updating the client) it will 
> force us to deploy the client and the scheduler together. I think we need to 
> keep it as is for now right?
> 
> Maxim Khutornenko wrote:
> > We can't send both values (it doesn't make sense).
> 
> This will work just fine. The newer scheduler will prefer the query (and 
> collection in the response), while the old one will continue using the key 
> completely unaware of the new value (thrift drops unknown fields). This way 
> no functional changes will be needed on the client side, just removal of the 
> deprecated fields.

Done.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148565
---


On Sept. 12, 2016, 2:22 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 2:22 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/python/apache/aurora/client/cli/update.py 
> d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/
---

(Updated Sept. 12, 2016, 2:22 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Update client and ui to use new argument.


Bugs: AURORA-1764
https://issues.apache.org/jira/browse/AURORA-1764


Repository: aurora


Description
---

This extends getJobUpdateDetails to return a list of details instead of being 
scoped to a single update.


Diffs (updated)
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
4202f0eec59ef16668fca6b6ebb925335ad5dea1 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
929d021a336c6a3438613c9340c84a1096dc9069 
  src/main/python/apache/aurora/client/api/__init__.py 
9149c3018ae58d405f284fcbd4076d251ccc8192 
  src/main/python/apache/aurora/client/cli/update.py 
d23243dcf93dd82f66b4c8cc4342bd2c8d2adc79 
  src/main/resources/scheduler/assets/js/services.js 
315fc17894c2261333715fbb9c45fc02505f2942 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
a7d1f74acdfe5f58eb822a4d826e920cad53dced 
  src/test/python/apache/aurora/client/api/test_api.py 
7a0797d5402e931006d4f215e2d9c0dbbd113257 

Diff: https://reviews.apache.org/r/51712/diff/


Testing
---


Thanks,

Zameer Manji



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Stephan Erb


> On Sept. 12, 2016, 8:06 p.m., Zameer Manji wrote:
> > docs/reference/scheduler-configuration.md, line 205
> > 
> >
> > As an aside, could you file a ticket to clean up the default here? 
> > Seems useless ATM.
> 
> Joshua Cohen wrote:
> Do you mean the rendering of the default value, or the actual default 
> value for `shiro_realm_modules`? The actual default is not useless, and I 
> think including the default value is useful as well, but we could improve 
> `MoreModules#lazilyInstantiated` to override `toString` on the anonymous 
> `AbstractModule` instance so it will print out the actual module name?
> 
> Zameer Manji wrote:
> I mean the rendering of the default value. We need to implement 
> `toString` somewhere.

Issue filed https://issues.apache.org/jira/browse/AURORA-1770


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148541
---


On Sept. 12, 2016, 3:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 3:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Stephan Erb


> On Sept. 12, 2016, 9:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java, 
> > lines 22-33
> > 
> >
> > This is somewhat non-standard way to define cmd flags. We tend to 
> > declare them in modules and use 'setting' classes to act as pure wrappers. 
> > Perhaps it's time to introduce a `ResourceModule` class (if only to bind 
> > the `ResourceSettings` instance for now)?

I believe there would be no way to inject the settings into the `ResourceType` 
enum at compile time with guice. That's why I've adopted this approach here.  
Happy to give it a try if you have some pointers for me how to make it work.


> On Sept. 12, 2016, 9:31 p.m., Maxim Khutornenko wrote:
> > src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java, line 
> > 75
> > 
> >
> > Would be great to have a e2e test when this functionality is available 
> > in Mesos. Perhaps drop a TODO here?

Acutally, it would work in Mesos out of the box even today. All we have to do 
is adapt our Mesos config to also pass along memory resources 
https://github.com/apache/aurora/blob/master/examples/vagrant/mesos_config/etc_mesos-slave/modules#L8.

I didn't do it because I felt like it wouldn't add any additional test 
coverage, now that the entire resource handling within Aurora is generic.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148563
---


On Sept. 12, 2016, 3:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 3:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148576
---


Ship it!




Master (b429612) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 7:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 7:39 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Maxim Khutornenko


> On Sept. 12, 2016, 7:48 p.m., Maxim Khutornenko wrote:
> > src/main/python/apache/aurora/client/api/__init__.py, line 264
> > 
> >
> > Why not going all the way to the client command and initializing both 
> > values in this changelist? Otherwise, there is a risk of cutting a release 
> > in between and delaying removal by extra release (until both client and 
> > scheduler are dual writing/reading).
> 
> Zameer Manji wrote:
> We can't send both values (it doesn't make sense). I will upgrade the 
> client to send the query object instead of the key object. I was not aware 
> when we made an API change we would also change the clients in the same 
> commit.
> 
> Zameer Manji wrote:
> Actually never mind, if I make this change (updating the client) it will 
> force us to deploy the client and the scheduler together. I think we need to 
> keep it as is for now right?

> We can't send both values (it doesn't make sense).

This will work just fine. The newer scheduler will prefer the query (and 
collection in the response), while the old one will continue using the key 
completely unaware of the new value (thrift drops unknown fields). This way no 
functional changes will be needed on the client side, just removal of the 
deprecated fields.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148565
---


On Sept. 12, 2016, 7:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 7:39 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji


> On Sept. 12, 2016, 12:48 p.m., Maxim Khutornenko wrote:
> > src/main/python/apache/aurora/client/api/__init__.py, line 264
> > 
> >
> > Why not going all the way to the client command and initializing both 
> > values in this changelist? Otherwise, there is a risk of cutting a release 
> > in between and delaying removal by extra release (until both client and 
> > scheduler are dual writing/reading).
> 
> Zameer Manji wrote:
> We can't send both values (it doesn't make sense). I will upgrade the 
> client to send the query object instead of the key object. I was not aware 
> when we made an API change we would also change the clients in the same 
> commit.

Actually never mind, if I make this change (updating the client) it will force 
us to deploy the client and the scheduler together. I think we need to keep it 
as is for now right?


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148565
---


On Sept. 12, 2016, 12:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 12:39 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji


> On Sept. 12, 2016, 12:48 p.m., Maxim Khutornenko wrote:
> > src/main/python/apache/aurora/client/api/__init__.py, line 264
> > 
> >
> > Why not going all the way to the client command and initializing both 
> > values in this changelist? Otherwise, there is a risk of cutting a release 
> > in between and delaying removal by extra release (until both client and 
> > scheduler are dual writing/reading).

We can't send both values (it doesn't make sense). I will upgrade the client to 
send the query object instead of the key object. I was not aware when we made 
an API change we would also change the clients in the same commit.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148565
---


On Sept. 12, 2016, 12:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 12:39 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148565
---




src/main/python/apache/aurora/client/api/__init__.py (line 264)


Why not going all the way to the client command and initializing both 
values in this changelist? Otherwise, there is a risk of cutting a release in 
between and delaying removal by extra release (until both client and scheduler 
are dual writing/reading).


- Maxim Khutornenko


On Sept. 12, 2016, 7:39 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 12, 2016, 7:39 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/#review148566
---


Ship it!




LGTM.

I am not commiting this because there are outstanding feedback items.

Once those are complete, please ask a comitter to land this patch for you. 
Thanks for your contribution!

- Zameer Manji


On Sept. 12, 2016, 10:49 a.m., Karthik Anantha Padmanabhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51662/
> ---
> 
> (Updated Sept. 12, 2016, 10:49 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1602 Aurora admin commands for reconcilation
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   
> src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
> 3275d72a0a74909e635bce615e2036ec72aa38ee 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/admin/admin.py 
> 76009b9c1c7a130c25abad544a176dc590dafb12 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
>  b9317dc20456f90057ec2bf4d10619a5ae986187 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  779dc302602ae8842084807ca868a491ea99b676 
>   src/test/python/apache/aurora/admin/test_admin.py 
> eb193c4a33f5275f05a995338446bec0e19bc1cf 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> afac2500551af2fce406bb906aa4e33f353e90a1 
> 
> Diff: https://reviews.apache.org/r/51662/diff/
> 
> 
> Testing
> ---
> 
> * Manually tested on my local vagrant installation
> * ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Karthik Anantha Padmanabhan
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/
---

(Updated Sept. 12, 2016, 12:39 p.m.)


Review request for Aurora, Joshua Cohen and Maxim Khutornenko.


Changes
---

Josh's feedback.


Bugs: AURORA-1764
https://issues.apache.org/jira/browse/AURORA-1764


Repository: aurora


Description
---

This extends getJobUpdateDetails to return a list of details instead of being 
scoped to a single update.


Diffs (updated)
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
4202f0eec59ef16668fca6b6ebb925335ad5dea1 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
929d021a336c6a3438613c9340c84a1096dc9069 
  src/main/python/apache/aurora/client/api/__init__.py 
9149c3018ae58d405f284fcbd4076d251ccc8192 
  src/main/resources/scheduler/assets/js/services.js 
315fc17894c2261333715fbb9c45fc02505f2942 
  
src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java 
a7d1f74acdfe5f58eb822a4d826e920cad53dced 
  src/test/python/apache/aurora/client/api/test_api.py 
7a0797d5402e931006d4f215e2d9c0dbbd113257 

Diff: https://reviews.apache.org/r/51712/diff/


Testing
---


Thanks,

Zameer Manji



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/#review148564
---


Fix it, then Ship it!





RELEASE-NOTES.md (line 37)


s/reconcile/reconcile_tasks



src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
(lines 651 - 657)


No need to have an extra method here. Can be inlined.


- Maxim Khutornenko


On Sept. 12, 2016, 5:49 p.m., Karthik Anantha Padmanabhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51662/
> ---
> 
> (Updated Sept. 12, 2016, 5:49 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1602 Aurora admin commands for reconcilation
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   
> src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
> 3275d72a0a74909e635bce615e2036ec72aa38ee 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/admin/admin.py 
> 76009b9c1c7a130c25abad544a176dc590dafb12 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
>  b9317dc20456f90057ec2bf4d10619a5ae986187 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  779dc302602ae8842084807ca868a491ea99b676 
>   src/test/python/apache/aurora/admin/test_admin.py 
> eb193c4a33f5275f05a995338446bec0e19bc1cf 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> afac2500551af2fce406bb906aa4e33f353e90a1 
> 
> Diff: https://reviews.apache.org/r/51662/diff/
> 
> 
> Testing
> ---
> 
> * Manually tested on my local vagrant installation
> * ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Karthik Anantha Padmanabhan
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148563
---




src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
(lines 22 - 33)


This is somewhat non-standard way to define cmd flags. We tend to declare 
them in modules and use 'setting' classes to act as pure wrappers. Perhaps it's 
time to introduce a `ResourceModule` class (if only to bind the 
`ResourceSettings` instance for now)?



src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java (line 75)


Would be great to have a e2e test when this functionality is available in 
Mesos. Perhaps drop a TODO here?


- Maxim Khutornenko


On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 1:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148557
---


Ship it!





src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
(line 321)


I think `Either key or query must be set` would be a better error message?


- Joshua Cohen


On Sept. 9, 2016, 12:30 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 9, 2016, 12:30 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/#review148549
---


Ship it!




lgtm modulo the below.

Thanks for iterating Karthik!


src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
(lines 654 - 656)


you can use `Preconditions#checkArgument` here.



src/main/python/apache/aurora/admin/admin.py (lines 356 - 359)


I still don't like defaulting to `explicit` if the value is not `implicit`. 
It could lead to unexpected behavior if the option is changed without updating 
the code here. Instead we should:

if optiona.type == 'implicit':
  ...
elif options.type == 'explicit':
  ...
else:
  die('Unexpected value for --type: %s' % options.type)


- Joshua Cohen


On Sept. 12, 2016, 5:49 p.m., Karthik Anantha Padmanabhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51662/
> ---
> 
> (Updated Sept. 12, 2016, 5:49 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1602 Aurora admin commands for reconcilation
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   
> src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
> 3275d72a0a74909e635bce615e2036ec72aa38ee 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/admin/admin.py 
> 76009b9c1c7a130c25abad544a176dc590dafb12 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
>  b9317dc20456f90057ec2bf4d10619a5ae986187 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  779dc302602ae8842084807ca868a491ea99b676 
>   src/test/python/apache/aurora/admin/test_admin.py 
> eb193c4a33f5275f05a995338446bec0e19bc1cf 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> afac2500551af2fce406bb906aa4e33f353e90a1 
> 
> Diff: https://reviews.apache.org/r/51662/diff/
> 
> 
> Testing
> ---
> 
> * Manually tested on my local vagrant installation
> * ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Karthik Anantha Padmanabhan
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Zameer Manji


> On Sept. 12, 2016, 11:06 a.m., Zameer Manji wrote:
> > docs/reference/scheduler-configuration.md, line 205
> > 
> >
> > As an aside, could you file a ticket to clean up the default here? 
> > Seems useless ATM.
> 
> Joshua Cohen wrote:
> Do you mean the rendering of the default value, or the actual default 
> value for `shiro_realm_modules`? The actual default is not useless, and I 
> think including the default value is useful as well, but we could improve 
> `MoreModules#lazilyInstantiated` to override `toString` on the anonymous 
> `AbstractModule` instance so it will print out the actual module name?

I mean the rendering of the default value. We need to implement `toString` 
somewhere.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148541
---


On Sept. 12, 2016, 6:38 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 6:38 a.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Joshua Cohen


> On Sept. 12, 2016, 6:06 p.m., Zameer Manji wrote:
> > docs/reference/scheduler-configuration.md, line 205
> > 
> >
> > As an aside, could you file a ticket to clean up the default here? 
> > Seems useless ATM.

Do you mean the rendering of the default value, or the actual default value for 
`shiro_realm_modules`? The actual default is not useless, and I think including 
the default value is useful as well, but we could improve 
`MoreModules#lazilyInstantiated` to override `toString` on the anonymous 
`AbstractModule` instance so it will print out the actual module name?


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148541
---


On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 1:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51759: Batching writes - Part 1 (of 3): Introducing BatchWorker and task event batching.

2016-09-12 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51759/#review148501
---



Along the lines of the question Stephan raised, what happens in the event of a 
failover mid-batch, especially w.r.t. repeatable work?


src/main/java/org/apache/aurora/scheduler/BatchWorker.java (lines 106 - 107)


I think this should be "does not initialize"?



src/main/java/org/apache/aurora/scheduler/BatchWorker.java (lines 257 - 259)


Even though there's only a single callsite where we initialize a `WorkItem` 
in `FOLLOWUP` state, there's no guarantee that `laskBackoffMsec` will be 
present just because the state is `FOLLOWUP`.

We should add some state checks to the `WorkItem` constructor so that if 
state is `FOLLOWUP` we'll throw if `lastBackoffMsec` is not present to protect 
against future mistakes? (Or if that's a valid scenario we should perform an 
`isPresent` check or use `or` here...).



src/main/java/org/apache/aurora/scheduler/BatchWorker.java (lines 273 - 275)


Should we do this after the batch is processed rather than before?


- Joshua Cohen


On Sept. 9, 2016, 5:29 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51759/
> ---
> 
> (Updated Sept. 9, 2016, 5:29 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first (out of 3) patches intending to reduce storage write lock 
> contention and as such improve overall system write throughput. It introduces 
> the `BatchWorker` and migrates the majority of storage writes due to task 
> status change events to use `TaskEventBatchWorker`.
> 
> #Problem
> Our current storage system writes effectively behave as `SERIALIZABLE` 
> transaction isolation level in SQL terms. This means all writes require 
> exclusive access to the storage and no two transactions can happen in 
> parallel [1]. While it certainly simplifies our implementation, it creates a 
> single hotspot where multiple threads are competing for the storage write 
> access. This type of contention only worsens as the cluster size grows, more 
> tasks are scheduled, more status updates are processed, more subscribers are 
> listening to status updates and etc. Eventually, the scheduler throughput 
> (and especially task scheduling) becomes degraded to the extent that certain 
> operations wait much longer (4x and more) for the lock acquisition than it 
> takes to process their payload when inside the transaction. Some ops (like 
> event processing) are generally tolerant of these types of delays. Others - 
> not as much. The task scheduling suffers the most as backing up the 
> scheduling queue directly affects
  the Median Time To Assigned (MTTA).
> 
> #Remediation
> Given the above, it's natural to assume that reducing the number of write 
> transactions should help reducing the lock contention. This patch introduces 
> a generic `BatchWorker` service that delivers a "best effort" batching 
> approach by redirecting multiple individual write requests into a single FIFO 
> queue served non-stop by a single dedicated thread. Every batch shares a 
> single write transaction thus reducing the number of potential write lock 
> requests. To minimize wait-in-queue time, items are dispatched immediately 
> and the max number of items is bounded. There are a few `BatchWorker` 
> instances specialized on particular workload types: task even processing, 
> cron scheduling and task scheduling. Every instance can be tuned 
> independently (max batch size) and provides specialized metrics helping to 
> monitor each workload type perf.
> 
> #Results
> The proposed approach has been heavily tested in production and delivered the 
> best results. The lock contention latencies got down between 2x and 5x 
> depending on the cluster load. A number of other approaches tried but 
> discarded as not performing well or even performing much worse than the 
> current master:
> - Clock-driven batch execution - every batch is dispatched on a time schedule
> - Max batch with a deadline - a batch is dispatched when max size is reached 
> OR a timeout expires
> - Various combinations of the above - some `BatchWorkers` are using 
> clock-driven execution while others are using max batch with a deadline
> - Completely non-blocking (event-based) completion notification - all call 
> sites are notified of item completion via a `BatchWorkCompleted` event
> 
> Happy to provide more details on the above if 

Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148541
---




docs/reference/scheduler-configuration.md (line 205)


As an aside, could you file a ticket to clean up the default here? Seems 
useless ATM.


- Zameer Manji


On Sept. 12, 2016, 6:38 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 6:38 a.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/#review148542
---


Ship it!




Master (b429612) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 5:49 p.m., Karthik Anantha Padmanabhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51662/
> ---
> 
> (Updated Sept. 12, 2016, 5:49 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1602 Aurora admin commands for reconcilation
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   
> src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
> 3275d72a0a74909e635bce615e2036ec72aa38ee 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/admin/admin.py 
> 76009b9c1c7a130c25abad544a176dc590dafb12 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   
> src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
>  b9317dc20456f90057ec2bf4d10619a5ae986187 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  779dc302602ae8842084807ca868a491ea99b676 
>   src/test/python/apache/aurora/admin/test_admin.py 
> eb193c4a33f5275f05a995338446bec0e19bc1cf 
>   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
> afac2500551af2fce406bb906aa4e33f353e90a1 
> 
> Diff: https://reviews.apache.org/r/51662/diff/
> 
> 
> Testing
> ---
> 
> * Manually tested on my local vagrant installation
> * ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Karthik Anantha Padmanabhan
> 
>



Re: Review Request 51712: Extend getJobUpdateDetails to accept JobUpdateQuery

2016-09-12 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51712/#review148539
---



Bumping this for visability.

- Zameer Manji


On Sept. 8, 2016, 5:30 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51712/
> ---
> 
> (Updated Sept. 8, 2016, 5:30 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1764
> https://issues.apache.org/jira/browse/AURORA-1764
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This extends getJobUpdateDetails to return a list of details instead of being 
> scoped to a single update.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c5765b70501c101f0535b4eed94e9948c36808f9 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> 4202f0eec59ef16668fca6b6ebb925335ad5dea1 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  929d021a336c6a3438613c9340c84a1096dc9069 
>   src/main/python/apache/aurora/client/api/__init__.py 
> 9149c3018ae58d405f284fcbd4076d251ccc8192 
>   src/main/resources/scheduler/assets/js/services.js 
> 315fc17894c2261333715fbb9c45fc02505f2942 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImplTest.java
>  a7d1f74acdfe5f58eb822a4d826e920cad53dced 
>   src/test/python/apache/aurora/client/api/test_api.py 
> 7a0797d5402e931006d4f215e2d9c0dbbd113257 
> 
> Diff: https://reviews.apache.org/r/51712/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 51662: AURORA-1602 Aurora admin commands for reconcilation

2016-09-12 Thread Karthik Anantha Padmanabhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51662/
---

(Updated Sept. 12, 2016, 5:49 p.m.)


Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Zameer Manji.


Repository: aurora


Description
---

AURORA-1602 Aurora admin commands for reconcilation


Diffs (updated)
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c5765b70501c101f0535b4eed94e9948c36808f9 
  src/main/java/org/apache/aurora/scheduler/reconciliation/TaskReconciler.java 
3275d72a0a74909e635bce615e2036ec72aa38ee 
  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
929d021a336c6a3438613c9340c84a1096dc9069 
  src/main/python/apache/aurora/admin/admin.py 
76009b9c1c7a130c25abad544a176dc590dafb12 
  src/main/python/apache/aurora/client/api/__init__.py 
9149c3018ae58d405f284fcbd4076d251ccc8192 
  
src/test/java/org/apache/aurora/scheduler/reconciliation/TaskReconcilerTest.java
 b9317dc20456f90057ec2bf4d10619a5ae986187 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 779dc302602ae8842084807ca868a491ea99b676 
  src/test/python/apache/aurora/admin/test_admin.py 
eb193c4a33f5275f05a995338446bec0e19bc1cf 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
afac2500551af2fce406bb906aa4e33f353e90a1 

Diff: https://reviews.apache.org/r/51662/diff/


Testing
---

* Manually tested on my local vagrant installation
* ./build-support/jenkins/build.sh


Thanks,

Karthik Anantha Padmanabhan



Re: Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/#review148485
---



Master (b429612) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 12, 2016, 1:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51807/
> ---
> 
> (Updated Sept. 12, 2016, 1:38 p.m.)
> 
> 
> Review request for Aurora and Maxim Khutornenko.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We plan to open source a very simple Mesos ResourceEstimator and 
> QosController that supports RAM and CPU oversubscription (ETA ~2 weeks). We 
> have been using it internally with a patched Aurora version where the 
> hardcoded `isMesosRevocable` flag of RAM has been set to `true`. This patch 
> makes this behaviour configurable.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
>   docs/features/resource-isolation.md 
> 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
>   docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
>   docs/reference/scheduler-configuration.md 
> 87d2cded0780ffa34884b52cc049c6a0ad808f0a 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
> 4c102a3d4c4bdd27a1d0536b689acd6257e77788 
> 
> Diff: https://reviews.apache.org/r/51807/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Review Request 51807: Introduce a flag to treat RAM as a revocable resources

2016-09-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51807/
---

Review request for Aurora and Maxim Khutornenko.


Repository: aurora


Description
---

We plan to open source a very simple Mesos ResourceEstimator and QosController 
that supports RAM and CPU oversubscription (ETA ~2 weeks). We have been using 
it internally with a patched Aurora version where the hardcoded 
`isMesosRevocable` flag of RAM has been set to `true`. This patch makes this 
behaviour configurable.


Diffs
-

  RELEASE-NOTES.md bbf7198dc7ce53bb02a1bc59f75a661e23472913 
  docs/features/resource-isolation.md 01c5b407a4964e89741d0f6c72d04ab5dc4c2f87 
  docs/operations/configuration.md 90dde574ce517355d2d6045a5ab036c1a3838882 
  docs/reference/scheduler-configuration.md 
87d2cded0780ffa34884b52cc049c6a0ad808f0a 
  src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java 
4c102a3d4c4bdd27a1d0536b689acd6257e77788 

Diff: https://reviews.apache.org/r/51807/diff/


Testing
---

./gradlew -Pq build


Thanks,

Stephan Erb