Re: Review Request 55089: AURORA-1826 Expose Thrift server request workload stats

2017-01-24 Thread Mehrdad Nurolahzade


> On Jan. 24, 2017, 5:35 p.m., Reza Motamedi wrote:
> > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java,
> >  line 510
> > 
> >
> > Should this be something like an atomic long?

`tasksKilled` is a local variable in method scope (not a field of a singleton 
object on the heap), so there is no concurrent access to it. Read more here: 
http://stackoverflow.com/questions/12825847/why-are-local-variables-thread-safe-in-java
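
The same confinement argument can be illustrated with a minimal sketch. It is written in Python rather than Java, and the names `kill_tasks`, `SharedCounter` and `run_concurrently` are hypothetical stand-ins, not Aurora code. A counter that lives in a local variable is private to each call, while state shared between threads is what actually needs protection (a lock here, an atomic long in Java).

```python
import threading


def kill_tasks(task_ids):
    # `tasks_killed` is a local variable: every call gets its own copy in the
    # calling thread's stack frame, so no other thread can read or modify it.
    tasks_killed = 0
    for _ in task_ids:
        tasks_killed += 1
    return tasks_killed


class SharedCounter:
    """State shared between threads, which does need protection."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, amount):
        # The lock plays the role an atomic long would play in Java.
        with self._lock:
            self.value += amount


def run_concurrently(num_threads=8, tasks_per_call=1000):
    shared = SharedCounter()
    results = [None] * num_threads

    def worker(index):
        # Each thread works on its own local counter inside kill_tasks().
        results[index] = kill_tasks(range(tasks_per_call))
        shared.add(results[index])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, shared.value
```

Every entry returned in `results` equals `tasks_per_call` regardless of how the threads interleave, because the local counter is never shared. Only `SharedCounter` needs synchronization.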


- Mehrdad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55089/#review162893
---


On Dec. 29, 2016, 9:58 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55089/
> ---
> 
> (Updated Dec. 29, 2016, 9:58 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Bugs: AURORA-1826
> https://issues.apache.org/jira/browse/AURORA-1826
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1826   Expose Thrift server request workload stats
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 16b1b52f8691d978a9ec1bf7aa0c9716b3484cf0
>   src/main/java/org/apache/aurora/scheduler/thrift/aop/Measured.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptor.java d57f910d8f9bbe5c24aec960e88d03702bc353da
>   src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java b28cd2489a52041a8e7e53f298fad8d8cd29406f
>   src/test/java/org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptorTest.java 9c40ec51c28c8c57365dc21c3cd7391a3894784c
> 
> Diff: https://reviews.apache.org/r/55089/diff/
> 
> 
> Testing
> ---
> 
> ```
> curl 192.168.33.7:8081/vars | grep thrift_workload
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 41334    0 41334    0     0  3695k      0 --:--:-- --:--:-- --:--:-- 4036k
> thrift_workload_addInstances 0
> thrift_workload_createJob 0
> thrift_workload_createOrUpdateCronTemplate 0
> thrift_workload_drainHosts 0
> thrift_workload_endMaintenance 0
> thrift_workload_getConfigSummary 0
> thrift_workload_getJobSummary 0
> thrift_workload_getJobUpdateDetails 0
> thrift_workload_getJobUpdateSummaries 0
> thrift_workload_getJobs 0
> thrift_workload_getPendingReason 0
> thrift_workload_getRoleSummary 0
> thrift_workload_getTaskStatus 0
> thrift_workload_getTasksWithoutConfigs 0
> thrift_workload_killTasks 0
> thrift_workload_maintenanceStatus 0
> thrift_workload_restartShards 0
> thrift_workload_rewriteConfigs 0
> thrift_workload_startJobUpdate 0
> thrift_workload_startMaintenance 0
> ```
> 
> ```
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> ...
> 
> *** OK (All tests passed) ***
> 
> mesos-master start/running, process 2359
> + RETCODE=0
> + restore_netrc
> + mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
> + true
> Connection to 127.0.0.1 closed.
> 
> real  28m58.389s
> user  0m1.508s
> sys   0m0.820s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55902: Capture health check output

2017-01-24 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/#review162896
---



Master (52ddce9) is red with this patch.
  ./build-support/jenkins/build.sh

virtualenv-15.0.2/virtualenv_support/wheel-0.29.0-py2.py3-none-any.whl
+ touch virtualenv-15.0.2/BOOTSTRAPPED
+ popd
/home/jenkins/jenkins-slave/workspace/AuroraBot
+ exec /usr/bin/python2.7 /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/virtualenv-15.0.2/virtualenv.py --no-download /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv
New python executable in /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/bin/python2.7
Also creating executable in /home/jenkins/jenkins-slave/workspace/AuroraBot/build-support/python/isort.venv/bin/python
Installing setuptools, pip, wheel...done.
Collecting isort==4.0.0
  Downloading isort-4.0.0-py2.py3-none-any.whl
Installing collected packages: isort
Successfully installed isort-4.0.0
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/aurora/common/health_check/shell.py Imports are incorrectly sorted.
--- /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/aurora/common/health_check/shell.py:before  2017-01-25 01:41:27.669006
+++ /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/python/apache/aurora/common/health_check/shell.py:after   2017-01-25 01:45:51.877003
@@ -16,6 +16,7 @@
 import sys
 
 from subprocess32 import STDOUT
+
 # Recommended pattern for Python 2 and 3 support from https://github.com/google/python-subprocess32
 # Backport which adds bug fixes and timeout support for Python 2.7
 if os.name == 'posix' and sys.version_info[0] < 3:
ERROR: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/python/apache/aurora/common/health_check/test_shell.py Imports are incorrectly sorted.
--- /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/python/apache/aurora/common/health_check/test_shell.py:before  2017-01-25 01:41:27.669006
+++ /home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/python/apache/aurora/common/health_check/test_shell.py:after   2017-01-25 01:45:52.368200
@@ -17,10 +17,10 @@
 import unittest
 
 import mock
+from subprocess32 import STDOUT
 
 from apache.aurora.common.health_check.shell import ShellHealthCheck
 
-from subprocess32 import STDOUT
 # Recommended pattern for Python 2 and 3 support from https://github.com/google/python-subprocess32
 # Backport which adds bug fixes and timeout support for Python 2.7
 if os.name == 'posix' and sys.version_info[0] < 3:


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Jan. 25, 2017, 1:34 a.m., Dmitriy Shirchenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55902/
> ---
> 
> (Updated Jan. 25, 2017, 1:34 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1881
> https://issues.apache.org/jira/browse/AURORA-1881
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Users could really benefit from seeing the output of a failed shell health 
> check, so this patch plumbs the output through.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/common/health_check/shell.py 
> 58470a48a7a14092eeb8837aada9e358c6922c93 
>   src/test/python/apache/aurora/common/health_check/test_shell.py 
> 792ef40028cda112db5b93c4cc37e305937bc351 
> 
> Diff: https://reviews.apache.org/r/55902/diff/
> 
> 
> Testing
> ---
> 
> added unit tests
> e2e tests
> screenshot attached.
> 
> 
> File Attachments
> 
> 
> Updated screenshot
>   
> https://reviews.apache.org/media/uploaded/files/2017/01/25/90d6ff4f-84f9-4b4d-9b3d-56dadf7027ae__Screen_Shot_2017-01-24_at_5.32.08_PM.png
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>



Re: Review Request 55902: Capture health check output

2017-01-24 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/#review162894
---



@ReviewBot retry

- Dmitriy Shirchenko


On Jan. 25, 2017, 1:34 a.m., Dmitriy Shirchenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55902/
> ---
> 
> (Updated Jan. 25, 2017, 1:34 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1881
> https://issues.apache.org/jira/browse/AURORA-1881
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Users could really benefit from seeing the output of a failed shell health 
> check, so this patch plumbs the output through.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/common/health_check/shell.py 
> 58470a48a7a14092eeb8837aada9e358c6922c93 
>   src/test/python/apache/aurora/common/health_check/test_shell.py 
> 792ef40028cda112db5b93c4cc37e305937bc351 
> 
> Diff: https://reviews.apache.org/r/55902/diff/
> 
> 
> Testing
> ---
> 
> added unit tests
> e2e tests
> screenshot attached.
> 
> 
> File Attachments
> 
> 
> Updated screenshot
>   
> https://reviews.apache.org/media/uploaded/files/2017/01/25/90d6ff4f-84f9-4b4d-9b3d-56dadf7027ae__Screen_Shot_2017-01-24_at_5.32.08_PM.png
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>



Re: Review Request 55902: Capture health check output

2017-01-24 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/
---

(Updated Jan. 25, 2017, 1:34 a.m.)


Review request for Aurora, Joshua Cohen and Zameer Manji.


Changes
---

Updating screenshot.


Bugs: AURORA-1881
https://issues.apache.org/jira/browse/AURORA-1881


Repository: aurora


Description
---

Users could really benefit from seeing the output of a failed shell health 
check, so this patch plumbs the output through.


Diffs
-

  src/main/python/apache/aurora/common/health_check/shell.py 
58470a48a7a14092eeb8837aada9e358c6922c93 
  src/test/python/apache/aurora/common/health_check/test_shell.py 
792ef40028cda112db5b93c4cc37e305937bc351 

Diff: https://reviews.apache.org/r/55902/diff/


Testing
---

added unit tests
e2e tests
screenshot attached.


File Attachments (updated)


Updated screenshot
  
https://reviews.apache.org/media/uploaded/files/2017/01/25/90d6ff4f-84f9-4b4d-9b3d-56dadf7027ae__Screen_Shot_2017-01-24_at_5.32.08_PM.png


Thanks,

Dmitriy Shirchenko



Re: Review Request 55089: AURORA-1826 Expose Thrift server request workload stats

2017-01-24 Thread Reza Motamedi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55089/#review162893
---




src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
(line 510)


Should this be something like an atomic long?


- Reza Motamedi


On Dec. 30, 2016, 5:58 a.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55089/
> ---
> 
> (Updated Dec. 30, 2016, 5:58 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Bugs: AURORA-1826
> https://issues.apache.org/jira/browse/AURORA-1826
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1826   Expose Thrift server request workload stats
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 16b1b52f8691d978a9ec1bf7aa0c9716b3484cf0
>   src/main/java/org/apache/aurora/scheduler/thrift/aop/Measured.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptor.java d57f910d8f9bbe5c24aec960e88d03702bc353da
>   src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java b28cd2489a52041a8e7e53f298fad8d8cd29406f
>   src/test/java/org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptorTest.java 9c40ec51c28c8c57365dc21c3cd7391a3894784c
> 
> Diff: https://reviews.apache.org/r/55089/diff/
> 
> 
> Testing
> ---
> 
> ```
> curl 192.168.33.7:8081/vars | grep thrift_workload
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 41334    0 41334    0     0  3695k      0 --:--:-- --:--:-- --:--:-- 4036k
> thrift_workload_addInstances 0
> thrift_workload_createJob 0
> thrift_workload_createOrUpdateCronTemplate 0
> thrift_workload_drainHosts 0
> thrift_workload_endMaintenance 0
> thrift_workload_getConfigSummary 0
> thrift_workload_getJobSummary 0
> thrift_workload_getJobUpdateDetails 0
> thrift_workload_getJobUpdateSummaries 0
> thrift_workload_getJobs 0
> thrift_workload_getPendingReason 0
> thrift_workload_getRoleSummary 0
> thrift_workload_getTaskStatus 0
> thrift_workload_getTasksWithoutConfigs 0
> thrift_workload_killTasks 0
> thrift_workload_maintenanceStatus 0
> thrift_workload_restartShards 0
> thrift_workload_rewriteConfigs 0
> thrift_workload_startJobUpdate 0
> thrift_workload_startMaintenance 0
> ```
> 
> ```
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> ...
> 
> *** OK (All tests passed) ***
> 
> mesos-master start/running, process 2359
> + RETCODE=0
> + restore_netrc
> + mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
> + true
> Connection to 127.0.0.1 closed.
> 
> real  28m58.389s
> user  0m1.508s
> sys   0m0.820s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55902: Capture health check output

2017-01-24 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/
---

(Updated Jan. 25, 2017, 1:34 a.m.)


Review request for Aurora, Joshua Cohen and Zameer Manji.


Bugs: AURORA-1881
https://issues.apache.org/jira/browse/AURORA-1881


Repository: aurora


Description
---

Users could really benefit from seeing the output of a failed shell health 
check, so this patch plumbs the output through.


Diffs
-

  src/main/python/apache/aurora/common/health_check/shell.py 
58470a48a7a14092eeb8837aada9e358c6922c93 
  src/test/python/apache/aurora/common/health_check/test_shell.py 
792ef40028cda112db5b93c4cc37e305937bc351 

Diff: https://reviews.apache.org/r/55902/diff/


Testing
---

added unit tests
e2e tests
screenshot attached.


File Attachments (updated)


Screenshot
  
https://reviews.apache.org/media/uploaded/files/2017/01/25/c4e69424-71ad-4d71-b1f4-895bc6a7821e__Screen_Shot_2017-01-24_at_5.03.06_PM.png
Updated screenshot
  
https://reviews.apache.org/media/uploaded/files/2017/01/25/90d6ff4f-84f9-4b4d-9b3d-56dadf7027ae__Screen_Shot_2017-01-24_at_5.32.08_PM.png


Thanks,

Dmitriy Shirchenko



Re: Review Request 55902: Capture health check output

2017-01-24 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/#review162892
---



Master (52ddce9) is red with this patch.
  ./build-support/jenkins/build.sh

---
* What went wrong:
Execution failed for task ':analyzeReport'.
> Test coverage missing for org/apache/aurora/scheduler/storage/db/views/DbImage
  Test coverage missing for org/apache/aurora/scheduler/http/Mname
  Test coverage missing for org/apache/aurora/scheduler/http/Services
  Test coverage missing for org/apache/aurora/scheduler/http/QuitCallback
  Test coverage missing for org/apache/aurora/scheduler/http/Cron
  Test coverage missing for org/apache/aurora/scheduler/app/VolumeParser
  Test coverage missing for org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader$Schema
  Test coverage missing for org/apache/aurora/scheduler/configuration/executor/ExecutorSettingsLoader
  Test coverage missing for org/apache/aurora/scheduler/pruning/TaskHistoryPruner$1
  Test coverage missing for org/apache/aurora/scheduler/stats/AsyncStatsModule$OfferAdapter
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/IniShiroRealmModule
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/ShiroUtils
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/HttpSecurityModule$3
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/HttpSecurityModule$2
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/KerberosPrincipalParser
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/Kerberos5ShiroRealmModule
  Test coverage missing for org/apache/aurora/scheduler/http/api/security/Kerberos5ShiroRealmModule$1
  Test coverage missing for org/apache/aurora/scheduler/log/mesos/MesosLog$LogStream
  Test coverage missing for org/apache/aurora/scheduler/log/mesos/MesosLog
  Test coverage missing for org/apache/aurora/scheduler/log/mesos/MesosLog$LogStream$OpStats
  Test coverage missing for org/apache/aurora/scheduler/log/mesos/MesosLog$LogStream$1
  Test coverage missing for org/apache/aurora/scheduler/log/mesos/MesosLog$LogStream$LogEntry
  Test coverage missing for org/apache/aurora/scheduler/log/mesos/MesosLog$LogStream$LogPosition
  Test coverage missing for org/apache/aurora/scheduler/discovery/CommonsServiceDiscoveryModule
  Test coverage missing for org/apache/aurora/scheduler/reconciliation/KillRetry$KillAttempt
  Test coverage missing for org/apache/aurora/scheduler/preemptor/Preemptor$PreemptorImpl
  Test coverage missing for org/apache/aurora/scheduler/events/PubsubEvent$DriverDisconnected
  Test coverage missing for org/apache/aurora/scheduler/events/PubsubEvent$DriverRegistered
  Test coverage missing for org/apache/aurora/scheduler/storage/db/typehandlers/VolumeModeTypeHandler

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
==

BUILD FAILED

Total time: 6 mins 8.393 secs


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Jan. 25, 2017, 1:08 a.m., Dmitriy Shirchenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55902/
> ---
> 
> (Updated Jan. 25, 2017, 1:08 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Bugs: AURORA-1881
> https://issues.apache.org/jira/browse/AURORA-1881
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Users could really benefit from seeing the output of a failed shell health 
> check, so this patch plumbs the output through.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/common/health_check/shell.py 
> 58470a48a7a14092eeb8837aada9e358c6922c93 
>   src/test/python/apache/aurora/common/health_check/test_shell.py 
> 792ef40028cda112db5b93c4cc37e305937bc351 
> 
> Diff: https://reviews.apache.org/r/55902/diff/
> 
> 
> Testing
> ---
> 
> added unit tests
> e2e tests
> screenshot attached.
> 
> 
> File Attachments
> 
> 
> Screenshot
>   
> https://reviews.apache.org/media/uploaded/files/2017/01/25/c4e69424-71ad-4d71-b1f4-895bc6a7821e__Screen_Shot_2017-01-24_at_5.03.06_PM.png
> 
> 
> Thanks,
> 
> Dmitriy Shirchenko
> 
>



Review Request 55902: Capture health check output

2017-01-24 Thread Dmitriy Shirchenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55902/
---

Review request for Aurora, Joshua Cohen and Zameer Manji.


Bugs: AURORA-1881
https://issues.apache.org/jira/browse/AURORA-1881


Repository: aurora


Description
---

Users could really benefit from seeing the output of a failed shell health 
check, so this patch plumbs the output through.
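
As a rough illustration of what plumbing the output through can look like, here is a minimal sketch using only the Python 3 standard library `subprocess` module. It is not the code under review: the function name `run_shell_health_check`, its `(healthy, message)` return shape and the message wording are assumptions made for this example. The key point is `stderr=subprocess.STDOUT`, which folds the command's error stream into the captured output so it can be reported on failure.

```python
import subprocess


def run_shell_health_check(command, timeout_secs=5):
    """Run a shell command and return (healthy, message).

    Hypothetical helper: on failure the message carries the command's
    combined stdout/stderr so the user can see why the check failed.
    """
    try:
        subprocess.check_output(
            command,
            shell=True,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            timeout=timeout_secs,
        )
        return True, None
    except subprocess.CalledProcessError as error:
        output = error.output.decode("utf-8", errors="replace").strip()
        return False, "exit code %d, output: %s" % (error.returncode, output)
    except subprocess.TimeoutExpired:
        return False, "timed out after %s seconds" % timeout_secs
```

For example, `run_shell_health_check("echo boom; exit 3")` reports the check as unhealthy with a message that contains both the exit code and the text `boom`.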


Diffs
-

  src/main/python/apache/aurora/common/health_check/shell.py 
58470a48a7a14092eeb8837aada9e358c6922c93 
  src/test/python/apache/aurora/common/health_check/test_shell.py 
792ef40028cda112db5b93c4cc37e305937bc351 

Diff: https://reviews.apache.org/r/55902/diff/


Testing
---

added unit tests
e2e tests
screenshot attached.


File Attachments


Screenshot
  
https://reviews.apache.org/media/uploaded/files/2017/01/25/c4e69424-71ad-4d71-b1f4-895bc6a7821e__Screen_Shot_2017-01-24_at_5.03.06_PM.png


Thanks,

Dmitriy Shirchenko



Re: Review Request 55357: AURORA-1867 Consider reserving for multiple tasks per preemption round

2017-01-24 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55357/#review162834
---


Ship it!




Master (86d8f2f) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Jan. 24, 2017, 4:42 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55357/
> ---
> 
> (Updated Jan. 24, 2017, 4:42 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1867
> https://issues.apache.org/jira/browse/AURORA-1867
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> To be fair, PendingTaskProcessor interleaves tasks from different groups. 
> However, this fairness comes at the price of increasing reservation time. 
> Even if reservations are being made for the same task group, the processor 
> would still restart iterating through slaves for each task instance. This 
> results in reevaluating all slaves already rejected in a previous search 
> before it finds a new viable candidate.
> 
> This patch improves `PendingTaskProcessor` performance by reducing slave 
> search/evaluation time, at the cost of reduced fairness. 
> `PendingTaskProcessor` now does reservation for a configurable maximum of _N_ 
> candidates per task group in each iteration over the list of slaves.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java fa37236e68657b539b182519b9d46d96d5b0953a
>   src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java f59f3fd8959b1ba3726b55a2943fb9228a049ac5
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java 67822cafbe89f4798b4ea6da3856663cc4872798
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 23d1c120657d5cb9d294a80c63e8a04512d361ca
>   src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java d11ae5883f2a00dca4c4b36f0ab58ea95c7ecb2e
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 67b6d69e3ddd1028dfe9ff451b171cd888674920
> 
> Diff: https://reviews.apache.org/r/55357/diff/
> 
> 
> Testing
> ---
> 
> As is, the cluster setup in our existing preemption benchmark does not 
> reflect the improvements resulting from this patch. Currently, all existing 
> victims can be preempted, so all `PendingTaskProcessor` has to do is look 
> at the next slave.
> 
> ```
> BEFORE
> Benchmark                                                       (numPendingTasks)   Mode  Cnt   Score   Error  Units
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                  1  thrpt   10  75.386 ± 2.984  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                 10  thrpt   10  74.584 ± 2.598  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                100  thrpt   10  79.731 ± 2.182  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark               1000  thrpt   10  66.386 ± 1.833  ops/s
> 
> AFTER
> Benchmark                                                       (numPendingTasks)   Mode  Cnt   Score   Error  Units
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                  1  thrpt   10  78.266 ± 3.290  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                 10  thrpt   10  76.743 ± 2.073  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                100  thrpt   10  75.343 ± 1.943  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark               1000  thrpt   10  68.284 ± 2.413  ops/s
> ```
> 
> I need to further improve the cluster setup for this benchmark to reflect the 
> improvements in the patch. A more representative cluster setup would be one 
> in which only a subset of potential victims pass the 
> `PreemptionVictimFilter.filterPreemptionVictims()` test.
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55357: AURORA-1867 Consider reserving for multiple tasks per preemption round

2017-01-24 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55357/
---

(Updated Jan. 24, 2017, 8:42 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and 
Zameer Manji.


Changes
---

- Corrected spelling mistake
- Fixed bug in unit test


Bugs: AURORA-1867
https://issues.apache.org/jira/browse/AURORA-1867


Repository: aurora


Description
---

To be fair, PendingTaskProcessor interleaves tasks from different groups. 
However, this fairness comes at the price of increasing reservation time. Even 
if reservations are being made for the same task group, the processor would 
still restart iterating through slaves for each task instance. This results in 
reevaluating all slaves already rejected in a previous search before it finds a 
new viable candidate.

This patch improves `PendingTaskProcessor` performance by reducing slave 
search/evaluation time, at the cost of reduced fairness. `PendingTaskProcessor` 
now does reservation for a configurable maximum of _N_ candidates per task 
group in each iteration over the list of slaves.
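
The idea described above can be sketched with a simplified model. This is an illustration in Python, not the Java code in the patch: `reserve_slots`, `is_viable` and the data shapes are hypothetical, and real slot search involves preemption victims and scheduling filters that are reduced here to a single predicate. Groups are still interleaved, but within one pass over the slaves a group may take up to `max_per_group` reservations instead of restarting the scan for every task instance.

```python
def reserve_slots(pending_by_group, slaves, is_viable, max_per_group):
    """Return a list of (task, slave) reservations.

    pending_by_group: dict mapping a task group to its pending task ids.
    slaves:           slave ids, in the order they are scanned.
    is_viable:        hypothetical predicate (group, slave) -> bool.
    max_per_group:    the configurable maximum N of reservations a group
                      may make during one pass over the slaves.
    """
    remaining = {group: list(tasks) for group, tasks in pending_by_group.items()}
    available = list(slaves)
    reservations = []
    progress = True
    while progress and available:
        progress = False
        # Interleave groups: each group gets one pass per round.
        for group, tasks in remaining.items():
            reserved = 0
            for slave in list(available):
                if not tasks or reserved == max_per_group:
                    break
                if is_viable(group, slave):
                    # Keep scanning from this point for the next task of the
                    # same group rather than restarting at the first slave.
                    reservations.append((tasks.pop(0), slave))
                    available.remove(slave)
                    reserved += 1
            if reserved:
                progress = True
    return reservations
```

With `max_per_group=1` the sketch behaves like strict interleaving. Larger values trade fairness between groups for fewer repeated evaluations of slaves that were already rejected.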


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java fa37236e68657b539b182519b9d46d96d5b0953a
  src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java f59f3fd8959b1ba3726b55a2943fb9228a049ac5
  src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java 67822cafbe89f4798b4ea6da3856663cc4872798
  src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 23d1c120657d5cb9d294a80c63e8a04512d361ca
  src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java d11ae5883f2a00dca4c4b36f0ab58ea95c7ecb2e
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 67b6d69e3ddd1028dfe9ff451b171cd888674920

Diff: https://reviews.apache.org/r/55357/diff/


Testing
---

As is, the cluster setup in our existing preemption benchmark does not reflect 
the improvements resulting from this patch. Currently, all existing victims can 
be preempted, so all `PendingTaskProcessor` has to do is look at the next 
slave.

```
BEFORE
Benchmark                                                       (numPendingTasks)   Mode  Cnt   Score   Error  Units
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                  1  thrpt   10  75.386 ± 2.984  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                 10  thrpt   10  74.584 ± 2.598  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                100  thrpt   10  79.731 ± 2.182  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark               1000  thrpt   10  66.386 ± 1.833  ops/s

AFTER
Benchmark                                                       (numPendingTasks)   Mode  Cnt   Score   Error  Units
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                  1  thrpt   10  78.266 ± 3.290  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                 10  thrpt   10  76.743 ± 2.073  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                100  thrpt   10  75.343 ± 1.943  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark               1000  thrpt   10  68.284 ± 2.413  ops/s
```

I need to further improve the cluster setup for this benchmark to reflect the 
improvements in the patch. A more representative cluster setup would be one in 
which only a subset of potential victims pass the 
`PreemptionVictimFilter.filterPreemptionVictims()` test.


Thanks,

Mehrdad Nurolahzade



Re: Review Request 55243: AURORA-1868 Evaluate multiple preemption proposals per round

2017-01-24 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55243/#review162825
---



Ping!

- Mehrdad Nurolahzade


On Jan. 5, 2017, 5:17 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55243/
> ---
> 
> (Updated Jan. 5, 2017, 5:17 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Zameer Manji.
> 
> 
> Bugs: AURORA-1868
> https://issues.apache.org/jira/browse/AURORA-1868
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When `TaskScheduler` fails to schedule one or more tasks, it attempts to 
> preempt already identified candidates through `Preemptor`. However, 
> `Preemptor` currently evaluates only one proposal per invocation, and that 
> proposal may be vetoed at this point by scheduling filters. If a proposal fails 
> validation, `TaskGroups` might penalize the task group to give 
> `PendingTaskProcessor` time to find new preemption candidates, even though 
> another proposal may already exist in `slotCache`. This penalty can cause 
> existing proposals in `slotCache` to expire, slowing down the overall 
> preemption process.
> 
> This patch modifies `Preemptor` so that it evaluates all existing preemption 
> proposals before giving up.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/preemptor/Preemptor.java 
> 7d2903a47dacfc35f9e547ccb6c5896efe3e013f 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 40c42b1b3aa63797da8dea61732b49155034fea2 
> 
> Diff: https://reviews.apache.org/r/55243/diff/
> 
> 
> Testing
> ---
> 
> ```
> $ ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> ...
> 
> *** OK (All tests passed) ***
> 
> mesos-master start/running, process 12228
> + RETCODE=0
> + restore_netrc
> + mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
> + true
> Connection to 127.0.0.1 closed.
> 
> real  26m20.055s
> user  0m1.412s
> sys   0m0.705s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55020: AURORA-1835 Expose finer grained offer veto stats

2017-01-24 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55020/#review162820
---



Ping!

- Mehrdad Nurolahzade


On Dec. 23, 2016, 2:46 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55020/
> ---
> 
> (Updated Dec. 23, 2016, 2:46 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Bugs: AURORA-1835
> https://issues.apache.org/jira/browse/AURORA-1835
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1835 Expose finer grained offer veto stats
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/TaskVars.java 
> 6351cc74c152d1f902078154ad14376c19c6ef1a 
>   src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 
> 05cd78f4c7c7d8dd6eeb6f2f9a3e8f7a167f274d 
> 
> Diff: https://reviews.apache.org/r/55020/diff/
> 
> 
> Testing
> ---
> 
> ```
> curl 192.168.33.7:8081/vars | grep scheduling_veto
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 50237    0 50237    0     0  8672k      0 --:--:-- --:--:-- --:--:-- 9811k
> scheduling_veto_insufficient_resources 5
> scheduling_veto_static 5
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>



Re: Review Request 55105: AURORA-1870 Add finer grained timings to the Snapshot process

2017-01-24 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55105/#review162822
---



Ping!

- Mehrdad Nurolahzade


On Jan. 9, 2017, 5:11 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55105/
> ---
> 
> (Updated Jan. 9, 2017, 5:11 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Bugs: AURORA-1870
> https://issues.apache.org/jira/browse/AURORA-1870
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1870 Add finer grained timings to the Snapshot process
> 
> I gave up on the `@Timed` interceptor approach because it would require major 
> refactoring to have the snapshot fields instantiated by Guice through 
> `Provider` or `@Provides`. The abstract class approach is much easier and 
> cleaner.
> 
> 
> Diffs
> -
> 
>   commons/src/main/java/org/apache/aurora/common/stats/SlidingStats.java f7a5ae41e307627fc55157758e9b7cdd861c3268
>   commons/src/test/java/org/apache/aurora/common/stats/SlidingStatsTest.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 7aa111ec14696ae40f518c42f3c7f45d8ab0e94c
>   src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java f56a1624c7188da175ad3e6de323c3442802f2ea
> 
> Diff: https://reviews.apache.org/r/55105/diff/
> 
> 
> Testing
> ---
> 
> ```
> $ curl localhost:8081/vars | grep snapshot_restore
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> snapshot_restore_crons_events 1
> snapshot_restore_crons_events_per_sec 0.0
> snapshot_restore_crons_nanos_per_event 0.0
> snapshot_restore_crons_nanos_total 73648
> snapshot_restore_crons_nanos_total_per_sec 0.0
> snapshot_restore_dbscript_events 1
> snapshot_restore_dbscript_events_per_sec 0.0
> snapshot_restore_dbscript_nanos_per_event 0.0
> snapshot_restore_dbscript_nanos_total 1148842021
> snapshot_restore_dbscript_nanos_total_per_sec 0.0
> snapshot_restore_hosts_events 1
> snapshot_restore_hosts_events_per_sec 0.0
> snapshot_restore_hosts_nanos_per_event 0.0
> snapshot_restore_hosts_nanos_total 76166
> snapshot_restore_hosts_nanos_total_per_sec 0.0
> snapshot_restore_job_updates_events 1
> snapshot_restore_job_updates_events_per_sec 0.0
> snapshot_restore_job_updates_nanos_per_event 0.0
> snapshot_restore_job_updates_nanos_total 49482
> snapshot_restore_job_updates_nanos_total_per_sec 0.0
> snapshot_restore_locks_events 1
> snapshot_restore_locks_events_per_sec 0.0
> snapshot_restore_locks_nanos_per_event 0.0
> snapshot_restore_locks_nanos_total 125084
> snapshot_restore_locks_nanos_total_per_sec 0.0
> snapshot_restore_quota_events 1
> snapshot_restore_quota_events_per_sec 0.0
> snapshot_restore_quota_nanos_per_event 0.0
> snapshot_restore_quota_nanos_total 52305
> snapshot_restore_quota_nanos_total_per_sec 0.0
> snapshot_restore_scheduler_metadata_events 1
> snapshot_restore_scheduler_metadata_events_per_sec 0.0
> snapshot_restore_scheduler_metadata_nanos_per_event 0.0
> snapshot_restore_scheduler_metadata_nanos_total 70816
> snapshot_restore_scheduler_metadata_nanos_total_per_sec 0.0
> snapshot_restore_tasks_events 1
> snapshot_restore_tasks_events_per_sec 0.0
> snapshot_restore_tasks_nanos_per_event 0.0
> snapshot_restore_tasks_nanos_total 91377
> snapshot_restore_tasks_nanos_total_per_sec 0.0
> ```
> 
> ```
> $ aurora_admin scheduler_snapshot devcluster
>  INFO] Response from scheduler: OK (message: )
>  
> $ curl localhost:8081/vars | grep snapshot_save
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 48226    0 48226    0     0  3266k      0 --:--:-- --:--:-- --:--:-- 3363k
> snapshot_save_crons_events 1
> snapshot_save_crons_events_per_sec 0.0
> snapshot_save_crons_nanos_per_event 0.0
> snapshot_save_crons_nanos_total 466181
> snapshot_save_crons_nanos_total_per_sec 0.0
> snapshot_save_dbscript_events 1
> snapshot_save_dbscript_events_per_sec 0.0
> snapshot_save_dbscript_nanos_per_event 0.0
> snapshot_save_dbscript_nanos_total 18201542
> snapshot_save_dbscript_nanos_total_per_sec 0.0
> snapshot_save_hosts_events 1
> snapshot_save_hosts_events_per_sec 0.0
> snapshot_save_hosts_nanos_per_event 0.0
> snapshot_save_hosts_nanos_total 1286180
> snapshot_save_hosts_nanos_total_per_sec 0.0
> snapshot_save_job_updates_events 1
> snapshot_save_job_updates_events_per_sec 0.0
> snapshot_save_job_updates_nanos_per_event 0.0
> 
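
The `_events` and `_nanos_total` pairs in the output above suggest that each snapshot field is wrapped in a timer that counts invocations and accumulates elapsed nanoseconds. The following is a minimal sketch of that idea under stated assumptions: `TimedSection` is a hypothetical class, not the `SlidingStats` API touched by this patch, and the per-second rate stats are omitted.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical illustration, not Aurora's SlidingStats: times a unit of work
// and accumulates an event count plus total elapsed nanoseconds.
final class TimedSection {
  private final String name;
  private final AtomicLong events = new AtomicLong();
  private final AtomicLong nanosTotal = new AtomicLong();

  TimedSection(String name) {
    this.name = name;
  }

  // Runs the work and records one event and its duration, even if it throws.
  <T> T time(Supplier<T> work) {
    long start = System.nanoTime();
    try {
      return work.get();
    } finally {
      events.incrementAndGet();
      nanosTotal.addAndGet(System.nanoTime() - start);
    }
  }

  long events() {
    return events.get();
  }

  long nanosTotal() {
    return nanosTotal.get();
  }

  // Renders the two counters in the "name value" style of the /vars output.
  String render() {
    return name + "_events " + events.get() + "\n"
        + name + "_nanos_total " + nanosTotal.get();
  }

  public static void main(String[] args) {
    TimedSection section = new TimedSection("snapshot_save_demo");
    String result = section.time(() -> "saved");
    System.out.println(result);
    System.out.println(section.render());
  }
}
```

Recording in a `finally` block means a failing snapshot step still shows up in the counters, which is one possible design choice rather than a statement about how the patch behaves.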

Re: Review Request 55089: AURORA-1826 Expose Thrift server request workload stats

2017-01-24 Thread Mehrdad Nurolahzade

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55089/#review162821
---



Ping!

- Mehrdad Nurolahzade


On Dec. 29, 2016, 9:58 p.m., Mehrdad Nurolahzade wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55089/
> ---
> 
> (Updated Dec. 29, 2016, 9:58 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Stephan Erb.
> 
> 
> Bugs: AURORA-1826
> https://issues.apache.org/jira/browse/AURORA-1826
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> AURORA-1826   Expose Thrift server request workload stats
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  16b1b52f8691d978a9ec1bf7aa0c9716b3484cf0 
>   src/main/java/org/apache/aurora/scheduler/thrift/aop/Measured.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptor.java
>  d57f910d8f9bbe5c24aec960e88d03702bc353da 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  b28cd2489a52041a8e7e53f298fad8d8cd29406f 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/aop/ThriftStatsExporterInterceptorTest.java
>  9c40ec51c28c8c57365dc21c3cd7391a3894784c 
> 
> Diff: https://reviews.apache.org/r/55089/diff/
> 
> 
> Testing
> ---
> 
> ```
> curl 192.168.33.7:8081/vars | grep thrift_workload
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 41334    0 41334    0     0  3695k      0 --:--:-- --:--:-- --:--:-- 4036k
> thrift_workload_addInstances 0
> thrift_workload_createJob 0
> thrift_workload_createOrUpdateCronTemplate 0
> thrift_workload_drainHosts 0
> thrift_workload_endMaintenance 0
> thrift_workload_getConfigSummary 0
> thrift_workload_getJobSummary 0
> thrift_workload_getJobUpdateDetails 0
> thrift_workload_getJobUpdateSummaries 0
> thrift_workload_getJobs 0
> thrift_workload_getPendingReason 0
> thrift_workload_getRoleSummary 0
> thrift_workload_getTaskStatus 0
> thrift_workload_getTasksWithoutConfigs 0
> thrift_workload_killTasks 0
> thrift_workload_maintenanceStatus 0
> thrift_workload_restartShards 0
> thrift_workload_rewriteConfigs 0
> thrift_workload_startJobUpdate 0
> thrift_workload_startMaintenance 0
> ```
> 
> ```
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> ...
> 
> *** OK (All tests passed) ***
> 
> mesos-master start/running, process 2359
> + RETCODE=0
> + restore_netrc
> + mv /home/vagrant/.netrc.bak /home/vagrant/.netrc
> + true
> Connection to 127.0.0.1 closed.
> 
> real  28m58.389s
> user  0m1.508s
> sys   0m0.820s
> ```
> 
> 
> Thanks,
> 
> Mehrdad Nurolahzade
> 
>
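
The `thrift_workload_*` stats above are per-method counters. A rough sketch of how such counters could be kept safely when many request threads share them follows. It assumes that "workload" means the number of items a request touched (for example, tasks killed). `WorkloadCounters` is a hypothetical class and does not reflect the actual `ThriftStatsExporterInterceptor` code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration, not Aurora code: shared per-method workload
// counters that are safe to update from concurrent request threads.
final class WorkloadCounters {
  private static final String PREFIX = "thrift_workload_";
  private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

  // Adds the workload of one request to the counter for the given method.
  void record(String method, long workload) {
    counters.computeIfAbsent(PREFIX + method, k -> new LongAdder()).add(workload);
  }

  // Returns the accumulated workload, or 0 if the method was never recorded.
  long get(String method) {
    LongAdder adder = counters.get(PREFIX + method);
    return adder == null ? 0L : adder.sum();
  }

  public static void main(String[] args) {
    WorkloadCounters counters = new WorkloadCounters();
    // A per-request tally can stay a plain local variable; only the shared
    // counter needs a thread-safe type.
    long tasksKilled = 3;
    counters.record("killTasks", tasksKilled);
    counters.record("killTasks", 2);
    System.out.println("thrift_workload_killTasks " + counters.get("killTasks"));
  }
}
```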