Hi,

I have been using Airflow extensively in my current work at Walmart Labs.
While working on our requirements, I came across functionality that is
missing in Airflow and would be very useful if implemented.
Currently, Airflow is a schedule-based workflow management system: a cron
expression defines when DAG runs are created. If there is a dependency on a
different DAG, TriggerDagRunOperator helps create DAG runs.
Suppose there is an upstream dependency outside the Airflow cluster, e.g. a
different database, a filesystem, or an event from an API. There is no way
in Airflow to react to this unless we schedule a DAG at a very short
interval and let it poll.

To solve the above issue, what if a DAG took two different arguments -
schedule_interval
and trigger_sensor?

   - schedule_interval - works the same way as it does today
   - trigger_sensor - accepts a sensor that returns true when an event is
   sensed, which in turn creates a DAG run
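To make the idea concrete, here is a rough sketch of what a trigger_sensor
could look like. Note that trigger_sensor does not exist in Airflow today;
the sensor interface and the FileLandedSensor below are illustrative
stand-ins, not real Airflow classes.

```python
import os


class BaseTriggerSensor:
    """Stand-in for the proposed sensor interface (hypothetical)."""

    def poke(self):
        """Return True once the external event has occurred."""
        raise NotImplementedError


class FileLandedSensor(BaseTriggerSensor):
    """Illustrative sensor: fires when a marker file appears on a
    shared filesystem (an upstream dependency outside Airflow)."""

    def __init__(self, path):
        self.path = path

    def poke(self):
        return os.path.exists(self.path)


# Proposed usage -- names are illustrative, not a real Airflow signature:
# dag = DAG(
#     dag_id="load_upstream_table",
#     schedule_interval=None,
#     trigger_sensor=FileLandedSensor("/data/landing/_SUCCESS"),
# )
```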

If you specify both arguments, schedule_interval takes precedence.
The scheduler parses all DAGs in a loop on every heartbeat and checks for
DAGs that have reached their scheduled time, creating DAG runs; the same
loop could also check trigger_sensor and, if the argument is set, create a
DAG run when the sensor returns true. This might slow down the scheduler
since it now has to execute sensors; we can find some other way to avoid
the slowness.
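The heartbeat check described above could be sketched roughly as follows.
Everything here is illustrative (the Dag class, the schedule_due flag
standing in for cron evaluation, and the poke() method), not Airflow's
actual scheduler code.

```python
class Dag:
    """Toy stand-in for a parsed DAG, assuming the proposed
    trigger_sensor argument (hypothetical)."""

    def __init__(self, dag_id, schedule_due=False, trigger_sensor=None):
        self.dag_id = dag_id
        self.schedule_due = schedule_due      # stands in for cron evaluation
        self.trigger_sensor = trigger_sensor


def heartbeat(dags):
    """Return the dag_ids that should get a new DAG run this heartbeat."""
    to_run = []
    for dag in dags:
        if dag.schedule_due:
            # schedule_interval takes precedence when both are set
            to_run.append(dag.dag_id)
        elif dag.trigger_sensor is not None and dag.trigger_sensor.poke():
            to_run.append(dag.dag_id)
    return to_run
```

Running the sensors inline like this is exactly the slowness concern:
each poke() call blocks the heartbeat, so a real implementation would
likely need to offload sensor checks elsewhere.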
Can we create an AIP for this? Any thoughts?

Thanks,
Bharath
