Re: [AIP-35] Add Signal Based Scheduling To Airflow

Chris Palmer Tue, 16 Jun 2020 10:38:17 -0700

Nicholas,

Are you saying that you actually have tasks in Airflow that are intended to
run indefinitely?


That in and of itself seems to be a huge, fundamental departure from many of
the assumptions built into Airflow.

Chris

On Tue, Jun 16, 2020 at 12:00 PM Gerard Casas Saez
<[email protected]> wrote:

> That looks interesting. My main worry is how to handle multiple executions
> of Signal based operators. If I follow your definition correctly, the idea
> is to run multiple evaluations of the online trained model (on a
> permanently running DAGRun). So what happens when you have triggered the
> downstream operators using a signal once? Do you clear state of the
> operators once a new signal is generated? Do you generate a new DAGRun?
>
> How do you evaluate the model several times while keeping a history of
> operator execution?
>
> Related to this, TFX has a similar proposal for ASYNC DAGs, which
> basically describes something similar to what you are proposing in this AIP:
> https://github.com/tensorflow/community/pull/253 (an interesting read, as
> it’s also related to the ML field).
>
> My main concern would be that you may need to change a few more things to
> support signal-based triggers, as the relationship between upstream and
> downstream tasks is no longer 1:1 but 1:many.
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez
> On Jun 16, 2020, 7:00 AM -0600, 蒋晓峰 <[email protected]>, wrote:
> > Hello everyone,
> >
> > I am writing to collect feedback on AIP-35, which adds signal-based
> > scheduling. It was briefly mentioned earlier in a discussion in the
> > development Slack channel. The key motivation of this proposal is to
> > support a mixture of batch and stream jobs in the same workflow.
> >
> > In practice, many business workflows need streaming and batch jobs to
> > work together. For example, in machine learning, an online learning job
> > is a streaming job that runs forever. It emits a model periodically, and
> > a batch evaluation job should run to validate each model, so this is a
> > streaming-triggered-batch case. Continuing the example, once a model
> > passes validation, it needs to be deployed to a serving job. That serving
> > job could be a streaming job that keeps polling records from Kafka and
> > uses a model to make predictions. The availability of a new model
> > requires the streaming prediction job either to restart or to reload the
> > new model on the fly. In either case, it is batch-to-streaming job
> > triggering.
> >
> > At present, people basically have to invent their own way to deal with
> > such workflows consisting of both batch and streaming jobs. I think
> > having a system that can work smoothly with a mixture of streaming and
> > batch jobs is valuable here.
> >
> > This AIP-35 focuses on signal-based scheduling, which lets operators send
> > signals to the scheduler to trigger a scheduling action, such as
> > starting, stopping, or restarting jobs. With this change to the
> > scheduling mechanism, a streaming job can send signals to the scheduler
> > to indicate a change in the state of the dependencies.
> >
> > Signal-based scheduling lets the scheduler learn of a change in
> > dependency state immediately, without periodically querying the metadata
> > database. It also allows potential support for richer scheduling
> > semantics, such as periodic execution and manual triggering at
> > per-operator granularity.
> >
> > Changes proposed:
> >
> > - Signals are used to define conditions that must be met to run an
> > operator. A state change of an upstream task is one type of signal;
> > there may be other types. The scheduler may take different actions when
> > receiving different signals. To let operators take signals as their
> > starting condition, we propose to introduce SignalOperator, which is
> > mentioned in the public interface section.
> > - A notification service is necessary to help receive and propagate the
> > signals from the operators and other sources to the scheduler. Upon
> > receiving a signal, the scheduler can take action according to the
> > predefined signal-based conditions on the operators. Therefore we propose
> > to introduce a Signal Notification Service component to Airflow.
> >
> > Please see related documents and PRs for details:
> >
> > AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-35+Add+Signal+Based+Scheduling+To+Airflow
> > Issue: https://github.com/apache/airflow/issues/9321
> >
> > Please let me know if there are any aspects that you agree or disagree
> > with, or that need more clarification (especially the SignalOperator and
> > SignalService). Any comments are welcome, and I look forward to them!
> >
> > Thanks,
> > Nicholas
>
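
For anyone following the thread, the flow described in the quoted proposal
(an operator sends a signal, a notification service pushes it to the
scheduler, and the scheduler starts whatever is waiting on it) can be
sketched roughly as below. This is only an illustration under stated
assumptions: the Signal, NotificationService, and Scheduler classes and the
run_on_signal method are hypothetical names made up here, not Airflow's API
and not the interface defined in the AIP. It also shows one possible answer
to the 1:many question raised above, namely recording one execution per
signal.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Signal:
    """A named event (hypothetical), e.g. 'model_emitted', with a payload."""
    key: str
    value: str = ""


class NotificationService:
    """Hypothetical in-memory stand-in for a signal notification service."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[Signal], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, key: str, listener: Callable[[Signal], None]) -> None:
        self._listeners[key].append(listener)

    def send(self, signal: Signal) -> None:
        # Push the signal to every listener; nothing polls a database here.
        for listener in list(self._listeners[signal.key]):
            listener(signal)


@dataclass
class Scheduler:
    """Toy scheduler that records one execution per received signal."""
    service: NotificationService
    history: List[str] = field(default_factory=list)

    def run_on_signal(self, task_id: str, key: str) -> None:
        # Register task_id to be started each time a signal with this key arrives.
        def on_signal(signal: Signal) -> None:
            self.history.append(f"{task_id}:{signal.value}")

        self.service.subscribe(key, on_signal)


service = NotificationService()
scheduler = Scheduler(service)
scheduler.run_on_signal("evaluate_model", "model_emitted")

# The long-running training job emits two models; each triggers one evaluation.
service.send(Signal("model_emitted", "v1"))
service.send(Signal("model_emitted", "v2"))
print(scheduler.history)  # ['evaluate_model:v1', 'evaluate_model:v2']
```

Keeping a separate history entry per signal is just one design choice; the
alternatives mentioned in the thread, clearing operator state or creating a
new DAGRun for each signal, would change what gets recorded.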
