I have done kind of the same hack now, but I am starting this discussion thread to see if we can make a feature out of this. As Ash said, making the scheduler run user code is against the design, and I had also realized it would make the scheduler thread slower. I will try to come up with a different solution to implement this.
Does the AIP need to have the solution explained? Can't it have just the problem statement, so that it helps with tracking and can be used for discussion?

On Fri, Feb 14, 2020, 8:39 PM Shaw, Damian P. <[email protected]> wrote:

> A possible (hacky?) workaround would be:
>
> 1. Create a DAG that is scheduled to run every 1 minute with
> max_active_runs=2 and catchup=False
> 2. As the first task in the DAG, set the sensor which waits on whatever
> should trigger the DAG in Airflow, and set task_concurrency to 1
> 3. As the second task (downstream from the 1st task), have the
> TriggerDagRunOperator
>
> This should work as long as you don't need the triggered DAG to run more
> than once a minute. You will also need to handle the sensor logic with
> care so the next sensor doesn't trigger the same downstream DAG again.
>
> Damian
>
> -----Original Message-----
> From: bharath palaksha <[email protected]>
> Sent: Friday, February 14, 2020 05:33
> To: [email protected]
> Subject: Re: Trigger based dag run
>
> 1. The external system I am thinking of doesn't have the ability to run
> code.
> 2. If we use the first task as a sensor, how does the dag run get created
> for that first task (the sensor) to run in?
>
> Running user code on the scheduler does seem problematic; maybe we can
> think of some other way for that. I will think about it.
>
> Thanks,
> Bharath
>
> On Fri, Feb 14, 2020 at 3:57 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> > If you have the ability to run code from the external system, you might
> > want to consider using the ("experimental") API to trigger the dag run
> > from the external system:
> >
> > http://airflow.apache.org/docs/stable/api.html#post--api-experimental-dags--DAG_ID--dag_runs
> >
> > When using the API doesn't work for you, the common approach I have
> > seen is as you hint at -- having a "trigger" dag that runs (frequently,
> > depending on your needs), checks the external condition, and uses
> > TriggerDagRunOperator.
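For concreteness, Damian's three-step workaround can be sketched as a pure-Python simulation (this is deliberately NOT real Airflow code: the event names, the `seen` set, and `trigger_dag_run` are stand-ins for the external system and for `TriggerDagRunOperator`):

```python
# Simulation of the workaround: a trigger dag scheduled every minute,
# whose first task is a sensor (task_concurrency=1) and whose second
# task fires the downstream dag. The `seen` set is the "handle with
# care" part: it stops two overlapping dag runs from triggering the
# same downstream run twice.

def sensor_poke(pending_events, seen):
    """Return an unseen external event if one is ready, else None."""
    for event in pending_events:
        if event not in seen:
            seen.add(event)
            return event
    return None

def trigger_dag_run(event, triggered):
    """Stand-in for TriggerDagRunOperator.execute()."""
    triggered.append(f"target_dag run for {event}")

# Simulate a few one-minute "dag runs" of the trigger dag.
external_events = ["file_a_landed", "file_b_landed"]  # hypothetical events
seen, triggered = set(), []
for _minute in range(4):          # stands in for schedule_interval='* * * * *'
    event = sensor_poke(external_events, seen)
    if event is not None:         # sensor succeeded -> downstream task runs
        trigger_dag_run(event, triggered)

print(triggered)
```

Note how each event triggers the downstream dag exactly once, even though the trigger dag "runs" four times; that is the guard Damian's caveat asks for.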
> > The other way I have seen this done is to just have the first task of
> > your dag be a sensor that checks/waits on the external resource. With
> > the recently added "reschedule" mode of sensors, this also doesn't tie
> > up a worker slot when the sensor isn't running. This is the approach I
> > have used in the past when processing weekly datasets that would
> > appear anywhere in a 72-hour window after the expected delivery time.
> >
> > Given these options exist, I'm not quite sure I see the need for a new
> > parameter on the DAG (especially one which runs user code in the
> > scheduler -- that gets quite a strong no from me). Could you perhaps
> > explain your idea in more detail, specifically how it fits into your
> > workflow, and why you don't want to use the two methods I talked about
> > here?
> >
> > Thanks,
> > Ash
> >
> > On Feb 14 2020, at 10:10 am, bharath palaksha <[email protected]>
> > wrote:
> > > Hi,
> > >
> > > I have been using Airflow extensively in my current work at Walmart
> > > Labs. While working on our requirements, I came across a
> > > functionality which is missing in Airflow and which, if implemented,
> > > would be very useful.
> > > Currently, Airflow is a schedule-based workflow management system: a
> > > cron expression defines the creation of dag runs. If there is a
> > > dependency on a different dag, TriggerDagRunOperator helps in
> > > creating dag runs.
> > > Suppose there is a dependency which is outside of the Airflow
> > > cluster, e.g. a different database, a filesystem, or an event from an
> > > API that is an upstream dependency. There is no way in Airflow to
> > > achieve this unless we schedule a DAG at a very short interval and
> > > allow it to poll.
> > >
> > > To solve the above issue, what if Airflow took 2 different args:
> > > schedule_interval and trigger_sensor?
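The "reschedule" mode Ash mentions can be illustrated with a rough pure-Python sketch. This is not Airflow's actual sensor implementation; `poke_interval` and `timeout` mirror the real `BaseSensorOperator` parameters, but everything else here is invented to show the scheduling behaviour:

```python
# Sketch of sensor mode='reschedule' semantics: instead of blocking a
# worker slot between pokes, the task gives the slot back and asks the
# scheduler to re-run it after poke_interval.

def run_sensor_with_reschedule(poke, poke_interval, timeout):
    """Drive a sensor callable the way mode='reschedule' behaves.

    `poke` returns True when the external resource is ready. Between
    pokes no worker slot is held; here that is just a log entry.
    """
    elapsed = 0
    log = []
    while elapsed <= timeout:
        if poke():
            log.append("success")
            return log
        log.append(f"rescheduled; worker slot freed for {poke_interval}s")
        elapsed += poke_interval
    log.append("timed out")
    return log

# Example: the data appears on the third poke.
attempts = iter([False, False, True])
log = run_sensor_with_reschedule(lambda: next(attempts),
                                 poke_interval=60, timeout=600)
print(log)
```

For Ash's 72-hour-window use case, a large `timeout` with a modest `poke_interval` means the dag simply waits cheaply until the weekly dataset lands.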
> > > - schedule_interval - works the same way as it already works now
> > > - trigger_sensor - accepts a sensor which returns true when an
> > > event is sensed, and this in turn creates a dag run
> > >
> > > If you specify both arguments, schedule_interval takes precedence.
> > > The scheduler parses all DAGs in a loop every heartbeat and checks
> > > for DAGs which have reached their scheduled time, creating DAG runs;
> > > the same loop could also check for trigger_sensor and, if the
> > > argument is set, check whether it returns true to create a dag run.
> > > This might slow down the scheduler as it would have to execute
> > > sensors now; we can find some other way to avoid the slowness.
> > > Can we create an AIP for this? Any thoughts?
> > >
> > > Thanks,
> > > Bharath
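To make the proposal concrete, here is a hypothetical sketch of how a `trigger_sensor` check might sit in the scheduler loop. None of this API exists in Airflow; the class and function names are invented purely for illustration, and polling user sensors inside the heartbeat is exactly the part Ash objects to:

```python
# Hypothetical sketch of the proposed `trigger_sensor` DAG argument.
# NOT real Airflow code -- all names here are made up.

class Dag:
    def __init__(self, dag_id, schedule_interval=None, trigger_sensor=None):
        self.dag_id = dag_id
        self.schedule_interval = schedule_interval  # e.g. seconds between runs
        self.trigger_sensor = trigger_sensor        # callable -> bool

def scheduler_heartbeat(dags, now, dag_runs):
    """One pass of the scheduler loop under the proposal.

    schedule_interval takes precedence, as the mail specifies;
    trigger_sensor is only polled for dags without one.
    """
    for dag in dags:
        if dag.schedule_interval is not None:
            if now % dag.schedule_interval == 0:
                dag_runs.append((dag.dag_id, now, "scheduled"))
        elif dag.trigger_sensor is not None and dag.trigger_sensor():
            dag_runs.append((dag.dag_id, now, "sensed"))

runs = []
dags = [
    Dag("cron_dag", schedule_interval=10),
    Dag("event_dag", trigger_sensor=lambda: True),  # event always ready here
]
for tick in range(0, 20, 10):  # two heartbeats, 10s apart
    scheduler_heartbeat(dags, tick, runs)
print(runs)
```

The sketch also makes the performance concern visible: every heartbeat now calls arbitrary user code (`trigger_sensor`) for every event-driven dag, which is what would slow the scheduler down.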
