-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56723/
-----------------------------------------------------------
(Updated Feb. 15, 2017, 6:24 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar Shanmugham.


Bugs: AURORA-1890
    https://issues.apache.org/jira/browse/AURORA-1890


Repository: aurora


Description
-------

Currently, on startup or recovery the scheduler forces all coordinated ("pulsed") updates into ROLL_FORWARD_AWAITING_PULSE or ROLL_BACK_AWAITING_PULSE. This happens because the last pulse timestamp is not durably stored, so after a restart it is reset to 0L (i.e. no pulse received yet).

When the pulse timeout is large and failovers are fast or frequent, this causes many updates to transition unnecessarily into a pulse-related state, where they remain until the next pulse arrives.

It is possible to avoid these unnecessary transitions by traversing the job update events and initializing the last pulse timestamp to the timestamp of the last event, provided the last event was not a pulse event.


Diffs (updated)
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift efd4e534c4ad90862d7a9fae437ed724da3a34dc
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 49e5b2cfc0b84bb0e0c95cca375cd0503f9dcdb5
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 729c1234a2e27f1e756ddfd6a4e5a04fa20bbd7f
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java ea0b89a232c2fc10f2183218b750bb0478d51a58

Diff: https://reviews.apache.org/r/56723/diff/


Testing
-------


Thanks,

Zameer Manji
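
For readers who want a concrete picture of the approach in the Description, here is a minimal sketch of how the last pulse timestamp might be seeded from the stored job update events on recovery. It is not the code in the diff: the `Status` enum, the `Event` class, and the helper names are hypothetical stand-ins for the scheduler's types, and it assumes that a "pulse event" means an event whose status is one of the two AWAITING_PULSE states.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical, not Aurora's API.
class PulseRecoverySketch {
    // Simplified stand-in for the job update statuses mentioned in the description.
    enum Status {
        ROLLING_FORWARD,
        ROLLING_BACK,
        ROLL_FORWARD_AWAITING_PULSE,
        ROLL_BACK_AWAITING_PULSE
    }

    // Simplified stand-in for a stored job update event.
    static final class Event {
        final Status status;
        final long timestampMs;

        Event(Status status, long timestampMs) {
            this.status = status;
            this.timestampMs = timestampMs;
        }
    }

    // 0L is the "no pulse yet" value described in the review.
    static final long NO_PULSE = 0L;

    // Assumption: a "pulse event" is one that put the update into an AWAITING_PULSE state.
    static boolean isPulseEvent(Event event) {
        return event.status == Status.ROLL_FORWARD_AWAITING_PULSE
            || event.status == Status.ROLL_BACK_AWAITING_PULSE;
    }

    // Seeds the last pulse timestamp from the most recent event (events are oldest first).
    // Falls back to NO_PULSE when there is no history or the update was already
    // waiting on a pulse before the restart.
    static long initialPulseTimestamp(List<Event> events) {
        if (events.isEmpty()) {
            return NO_PULSE;
        }
        Event last = events.get(events.size() - 1);
        return isPulseEvent(last) ? NO_PULSE : last.timestampMs;
    }

    // Hypothetical timeout check: the update blocks once the timeout has elapsed
    // since the last pulse.
    static boolean isBlockedOnPulse(long lastPulseMs, long nowMs, long timeoutMs) {
        return nowMs - lastPulseMs >= timeoutMs;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(Status.ROLL_FORWARD_AWAITING_PULSE, 1_000L),
            new Event(Status.ROLLING_FORWARD, 5_000L));
        long seeded = initialPulseTimestamp(events);
        System.out.println("seeded last pulse timestamp: " + seeded);
        System.out.println("blocked with seeded value: " + isBlockedOnPulse(seeded, 6_000L, 60_000L));
        System.out.println("blocked with 0L: " + isBlockedOnPulse(NO_PULSE, 6_000L, 60_000L));
    }
}
```

With a seeded timestamp, an update that was actively rolling forward just before a failover is treated as recently pulsed, so it is not pushed into an AWAITING_PULSE state until the timeout has actually elapsed.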