Hi folks,

I'm currently working on a feature that touches the Aurora scheduler and executor. The implementation strategy became controversial on the review board, so I'd like to bring it to a wider audience and start a discussion. Please let me know your thoughts; your help is greatly appreciated!

The high-level goal of this feature is to improve the reliability and performance of the Aurora scheduler's job updater by relying on health check status, rather than the watch_secs timeout, when deciding the update state of an individual instance. For more details and background, please see:

Original review request: https://reviews.apache.org/r/51536/
Aurora JIRA ticket: https://issues.apache.org/jira/browse/AURORA-894
Design doc: https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#

Note: The "scheduler change summary" part of the design doc is slightly outdated (this is what the review request is trying to address). I've left some comments there to clarify the latest proposed implementation plan for the scheduler change.

There are two questions I'm trying to address here:

1. How does the scheduler infer the executor version while remaining backward compatible?
2. Where do we determine whether health checking is enabled?

In short, three different solutions have been proposed on the review board.

In the first two approaches, the scheduler relies on a string to determine the executor version. Whether health checking is enabled is determined solely on the executor side. There will be communication between the executor and the scheduler.

Solution 1: The vCurrent executor sends a message from its health check thread during the transition to RUNNING. The vCurrent updater infers the executor version from the presence of this message and skips watch_secs if necessary.

Solution 2: Instead of relying on the presence of an arbitrary string in the message, rely on the presence of a string like "capabilities:CAPABILITY_1,CAPABILITY_2", where CAPABILITY_1 and CAPABILITY_2 (etc.) are constants defined in api.thrift. This essentially formalizes the mechanism and makes it a bit more future proof.

In the third solution, the scheduler infers the executor version from JobUpdateSettings on the scheduler side.
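
To make Solution 2 a bit more concrete, here is a rough Python sketch of how the updater side might interpret the capability string. This is only an illustration under assumptions: the capability name HEALTH_CHECK_DRIVEN_UPDATES and both function names are made up for this email (the real constants would live in api.thrift, and the scheduler side would not actually be written in Python).

```python
# Prefix taken from the example string in Solution 2.
CAPABILITIES_PREFIX = "capabilities:"

# Hypothetical capability name, invented for this sketch only.
HEALTH_CHECK_DRIVEN_UPDATES = "HEALTH_CHECK_DRIVEN_UPDATES"


def parse_capabilities(message):
    """Return the set of capability names advertised in a status message.

    Any message without the prefix (for example one sent by a vPrev
    executor) yields an empty set.
    """
    if not message or not message.startswith(CAPABILITIES_PREFIX):
        return frozenset()
    body = message[len(CAPABILITIES_PREFIX):]
    return frozenset(name.strip() for name in body.split(",") if name.strip())


def should_skip_watch_secs(running_status_message):
    """True only if the executor advertised health-check-driven updates."""
    capabilities = parse_capabilities(running_status_message)
    return HEALTH_CHECK_DRIVEN_UPDATES in capabilities
```

With this shape, a missing or unrecognized message makes the updater fall back to watch_secs, which is the backward-compatible behavior we want for vPrev executors.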

Solution 3: Add a bit to JobUpdateSettings called 'executorDrivenUpdates'. If it is set, the scheduler assumes that the STARTING -> RUNNING transition indicates the executor is healthy. Concurrently, we release thermos and change HealthCheckConfig to say that a task should only go to RUNNING once it is healthy.

Pros and Cons:

The main benefits of Solution 1 are:

1. By using the message in the task status update, we don't have to make any schema change, which keeps the design simple.
2. The feature is fully backward compatible. When we roll out the vCurrent schedulers and executors, we do not have to instruct users to provide an additional field in their Job or Update configs, which could confuse customers while vPrev and vCurrent executors coexist in the cluster.

Concerns: Relying on the presence of a message makes things brittle. Also, we do not want to expose this message to users.

The benefit of Solution 2 is that it makes the feature more future proof. However, if we do not envision a new executor feature in the short term, it is not much different from Solution 1.

The benefits of Solution 3 include:

1. We support more than just thermos now (and others rely on custom executors).
2. A lot of things in Aurora treat the executor as opaque. The status update message sent by the executor should be visible to users only if it is an error message.

Concerns:

1. In addition to the 'executorDrivenUpdates' bit that identifies the executor version, we still need to notify the scheduler whether health checking is enabled on the vCurrent executor. If it is not, the scheduler must be able to fall back to watch_secs.
2. Users have to provide an additional field in their .aurora config files. The feature wouldn't be available until new clients are rolled out as well.

Please let me know if I've understood your suggestions correctly; hopefully this gets everyone on the same page!

Thanks,
Kai