A possible (hacky?) workaround would be:

1. Create a DAG that is scheduled to run every 1 minute with max_active_runs=2
and catchup=False
2. Make the first task in the DAG the sensor that waits on the condition that
should trigger the downstream DAG, and set task_concurrency=1 on it
3. Make the second task (downstream of the first) the TriggerDagRunOperator

This should work as long as you don't need the triggered DAG to run more than
once a minute. You will also need to handle the sensor logic with care so that
the next sensor run doesn't fire the TriggerDagRunOperator for the same event.
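The "handle with care" part of step 2 -- making sure two overlapping sensor runs don't both fire the trigger for the same event -- could be sketched roughly as below. This is a hypothetical illustration, not Airflow API code: check_and_claim_event is a made-up name standing in for whatever poke logic your sensor uses, and it "claims" each event atomically via an exclusively created marker file so only one DAG run ever sees it as new.

```python
import os

def check_and_claim_event(event_path, claimed_dir):
    """Return True (and claim the event) exactly once per event file.

    Intended as the poke logic of the sensor in step 2: the sensor
    succeeds only for the run that manages to create the claim marker,
    so the downstream TriggerDagRunOperator fires once per event even
    when two DAG runs overlap (max_active_runs=2).
    """
    if not os.path.exists(event_path):
        return False  # nothing to do yet; the sensor keeps poking
    marker = os.path.join(claimed_dir,
                          os.path.basename(event_path) + ".claimed")
    try:
        # O_CREAT | O_EXCL makes the claim atomic: only one sensor run
        # can create the marker; any other run gets FileExistsError.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True   # this run owns the event; safe to trigger
    except FileExistsError:
        return False  # another run already claimed this event
```

In an actual DAG this function would be wrapped in a sensor (for instance a small BaseSensorOperator subclass whose poke() calls it) with task_concurrency=1, followed by the TriggerDagRunOperator as in step 3.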

Damian

-----Original Message-----
From: bharath palaksha <[email protected]> 
Sent: Friday, February 14, 2020 05:33
To: [email protected]
Subject: Re: Trigger based dag run

1. The external system I am thinking of doesn't have the ability to run code
2. If we use the first task as a sensor, how does the dag run get created in
the first place so that the sensor task can run?

Running user code on the scheduler does seem problematic; I can look for some
other way around that. I will think about it

Thanks,
Bharath


On Fri, Feb 14, 2020 at 3:57 PM Ash Berlin-Taylor <[email protected]> wrote:

> If you have the ability to run code from the external system you might 
> want to consider using the ("experimental") API to trigger the dag run 
> from the external system?
>
>
> http://airflow.apache.org/docs/stable/api.html#post--api-experimental-dags--DAG_ID--dag_runs
>
> When using the API doesn't work for you, the common approach I have 
> seen is as you hint at -- having a "trigger" dag that runs (as 
> frequently as your needs require), checks the external condition, and 
> uses TriggerDagRunOperator.
> The other way I have seen this done is to just have the first task of 
> your dag be a sensor that checks/waits on the external resource. With 
> the recently added "reschedule" mode of sensors this also doesn't tie 
> up a worker slot when the sensor isn't running. This is the approach I 
> have used in the past when processing weekly datasets that would 
> appear anywhere in a 72 hour window after the expected delivery time.
>
> Given these options exist, I'm not quite sure I see the need for a new 
> parameter to the DAG (especially one which runs user code in the 
> scheduler; that gets quite a strong no from me). Could you perhaps 
> explain your idea in more detail, specifically how it fits into your 
> workflow, and why you don't want to use the two methods I talked about here?
> Thanks,
> Ash
> On Feb 14 2020, at 10:10 am, bharath palaksha <[email protected]>
> wrote:
> > Hi,
> >
> > I have been using Airflow extensively in my current work at Walmart Labs.
> > While working on our requirements, I came across a piece of 
> > functionality that is missing in Airflow and would be very useful if 
> > implemented.
> > Currently, Airflow is a schedule-based workflow manager: a cron 
> > expression defines the creation of dag runs. If there is a dependency 
> > on a different dag, TriggerDagRunOperator helps in creating dag runs.
> > Suppose there is a dependency outside of the Airflow cluster, e.g. a 
> > different database, a filesystem, or an event from an API that is an 
> > upstream dependency. There is no way in Airflow to achieve this unless 
> > we schedule a DAG at a very short interval and let it poll.
> >
> > To solve the above issue, what if Airflow took 2 different args: 
> > schedule_interval and trigger_sensor?
> >
> > - schedule_interval - works the same way as it does today
> > - trigger_sensor - accepts a sensor that returns true when an event 
> > is sensed, which in turn creates a dag run
> >
> > If you specify both arguments, schedule_interval takes precedence.
> > The scheduler parses all DAGs in a loop every heartbeat, checks for 
> > DAGs that have reached their scheduled time, and creates DAG runs; 
> > the same loop could also check for trigger_sensor and, if the 
> > argument is set, create a dag run whenever the sensor returns true.
> > This might slow down the scheduler since it now has to execute 
> > sensors; we can find some other way to avoid that slowness.
> > Can we create an AIP for this? Any thoughts?
> >
> > Thanks,
> > Bharath
> >
>
>
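As an aside on the API option quoted above: triggering a dag run through the experimental endpoint is just an HTTP POST, which the external system could issue with nothing but the Python standard library. The sketch below only builds the request; the base URL is a placeholder, and any authentication your deployment enforces is left out as an assumption.

```python
import json
import urllib.request

def build_trigger_request(base_url, dag_id, conf=None):
    """Build a POST request for the experimental trigger-dag endpoint.

    Endpoint shape per the docs linked above:
    POST /api/experimental/dags/<DAG_ID>/dag_runs
    """
    url = "%s/api/experimental/dags/%s/dag_runs" % (base_url.rstrip("/"), dag_id)
    # The request body carries an optional "conf" dict that the
    # triggered dag run can read from its context.
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (not executed here): the external system would send it with
#   urllib.request.urlopen(build_trigger_request("http://airflow:8080", "my_dag"))
```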



