Hi, I have been using Airflow extensively in my current work at Walmart Labs. While working on our requirements, I came across functionality that is missing in Airflow and that would be very useful if implemented. Currently, Airflow is a schedule-based workflow manager: a cron expression defines when DAG runs are created. If there is a dependency on a different DAG, TriggerDagRunOperator helps create DAG runs. But suppose there is an upstream dependency outside of the Airflow cluster, e.g. a different database, a filesystem, or an event from an API. There is no way in Airflow to react to it unless we schedule a DAG at a very short interval and let it poll.
To solve the above issue, what if a DAG accepted two different arguments, schedule_interval and trigger_sensor?
- schedule_interval - works the same way it does today
- trigger_sensor - accepts a sensor that returns true when an event is sensed, which in turn creates a DAG run
If both arguments are specified, schedule_interval takes precedence. The scheduler already parses all DAGs in a loop on every heartbeat, checks for DAGs that have reached their scheduled time, and creates DAG runs; the same loop could also check trigger_sensor and, if that argument is set and the sensor returns true, create a DAG run. This might slow down the scheduler, as it would now have to execute sensors; we can look for ways to avoid that slowness. Can we create an AIP for this? Any thoughts? Thanks, Bharath
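P.S. A rough pure-Python sketch of the heartbeat check I'm imagining is below. The names, the dict-based DAG representation, and the create_dag_run callback are purely illustrative assumptions, not Airflow's actual scheduler internals; the point is only the precedence rule (schedule_interval wins) and where the sensor check would slot in:

```python
def maybe_create_dag_runs(dags, now, create_dag_run):
    """One heartbeat pass over all parsed DAGs (illustrative sketch only).

    dags: list of dicts with keys dag_id, schedule_interval,
          next_run_at (for scheduled DAGs), trigger_sensor (callable or None).
    create_dag_run: callback invoked with a dag_id to create a run.
    """
    for dag in dags:
        if dag.get("schedule_interval") is not None:
            # schedule_interval takes precedence: behave exactly as today.
            if now >= dag["next_run_at"]:
                create_dag_run(dag["dag_id"])
        elif dag.get("trigger_sensor") is not None:
            # New path: execute the sensor; True means the external
            # event has fired, so a DAG run should be created.
            if dag["trigger_sensor"]():
                create_dag_run(dag["dag_id"])
```

Executing sensors inline like this is exactly what could slow the heartbeat down; offloading the sensor calls (e.g. to a separate pool or process) would be one way to address that.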
