-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51763/
-----------------------------------------------------------

(Updated Sept. 14, 2016, 11:12 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
-------

Rebasing and comments.


Repository: aurora


Description
-------

This is the second part of the `BatchWorker` conversion work that moves cron 
jobs to use non-blocking kill followups and reduces the number of trigger 
threads. See https://reviews.apache.org/r/51759 for more background on the 
`BatchWorker`.

#####Problem
The current implementation of the cron scheduling relies on a large number of 
threads (`cron_scheduler_num_threads`=100) to support cron triggering and 
killing existing tasks according to `KILL_EXISTING` collision policy. This 
creates large spikes of activities at synchronized intervals as users tend to 
schedule their cron runs around similar schedules. Moreover, the current 
implementation re-acquires write locks multiple times to deliver on 
`KILL_EXISTING` policy. 

#####Remediation
Trigger level batching is still done in a blocking way but multiple cron 
triggers may be bundled together to share the same write transaction. Any 
followups, however, are performed in a non-blocking way by relying on a 
`BatchWorker.executeWithReplay()` and the `BatchWorkCompleted` notification. In 
order to still ensure non-concurrent execution of a given job key trigger, a 
token (job key) is saved within the trigger itself. A concurrent trigger will 
bail if a kill followup is still in progress (token is set AND no entry in 
`killFollowups` set exists yet).

#####Results
The above approach allowed reducing the number of cron threads to 10 and likely 
can be reduced even further. See https://reviews.apache.org/r/51759 for the 
lock contention results.


Diffs (updated)
-----

  commons/src/main/java/org/apache/aurora/common/util/BackoffHelper.java 
8e73dd9ebc43e06f696bbdac4d658e4b225e7df7 
  commons/src/test/java/org/apache/aurora/common/util/BackoffHelperTest.java 
bc30990d57f444f7d64805ed85c363f1302736d0 
  config/findbugs/excludeFilter.xml fe3f4ca5db1484124af14421a3349950dfec8519 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJob.java 
c07551e94f9221b5b21c5dc9715e82caa290c2e8 
  src/main/java/org/apache/aurora/scheduler/cron/quartz/CronModule.java 
155d702d68367b247dd066f773c662407f0e3b5b 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/AuroraCronJobTest.java 
5c64ff2994e200b3453603ac5470e8e152cebc55 
  src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
1c0a3fa84874d7bc185b78f13d2664cb4d8dd72f 

Diff: https://reviews.apache.org/r/51763/diff/


Testing
-------

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko

Reply via email to