Hello everyone, Sending a message to everyone and collecting feedback on the AIP-35 on adding signal-based scheduling. This was previously briefly mentioned in the discussion of development slack channel. The key motivation of this proposal is to support a mixture of batch and stream jobs in the same workflow.
In practice, there are many business logics that need collaboration between streaming and batch jobs. For example, in machine learning, an online learning job is a streaming job running forever. It emits a model periodically and a batch evaluate job should run to validate each model. So this is a streaming-triggered-batch case. If we continue from the above example, once the model passes the validation, the model needs to be deployed to a serving job. That serving job could be a streaming job that keeps polling records from Kafka and uses a model to do prediction. And the availability of a new model requires that streaming prediction job either to restart, or to reload the new model on the fly. In either case, it is a batch-to-streaming job triggering. At this point above, people basically have to invent their own way to deal with such workflows consisting of both batch and streaming jobs. I think having a system that can help smoothly work with a mixture of streaming and batch jobs is valuable here. This AIP-35 focuses on signal-based scheduling to let the operators send signals to the scheduler to trigger a scheduling action, such as starting jobs, stopping jobs and restarting jobs. With the change of scheduling mechanism, a streaming job can send signals to the scheduler to indicate the state change of the dependencies. Signal-based scheduling allows the scheduler to know the change of the dependency state immediately without periodically querying the metadata database. This also allows potential support for richer scheduling semantics such as periodic execution and manual trigger at per operator granularity. Changes proposed: - Signals are used to define conditions that must be met to run an operator. State change of the upstream tasks is one type of the signals. There may be other types of signals. The scheduler may take different actions when receiving different signals. To let the operators take signals as their starting condition, we propose to introduce SignalOperator which is mentioned in the public interface section. - A notification service is necessary to help receive and propagate the signals from the operators and other sources to the scheduler. Upon receiving a signal, the scheduler can take action according to the predefined signal-based conditions on the operators. Therefore we propose to introduce a Signal Notification Service component to Airflow. Please see related documents and PRs for details: AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-35+Add+Signal+Based+Scheduling+To+Airflow Issue: https://github.com/apache/airflow/issues/9321 Please let me know if there are any aspects that you agree/disagree with or need more clarification (especially the SignalOperator and SignalService). Any comments are welcome and I am looking forward to it! Thanks, Nicholas