1. The external system I am thinking of doesn't have the ability to run
code.
2. If we use the first task as a sensor, how does the dag run get created
in the first place so that the sensor can run?

Running user code on the scheduler does seem problematic; there may be some
other way around that. I will think about it.

Thanks,
Bharath


On Fri, Feb 14, 2020 at 3:57 PM Ash Berlin-Taylor <[email protected]> wrote:

> If you have the ability to run code from the external system, you might
> want to consider using the ("experimental") API to trigger the dag run from
> the external system:
>
>
> http://airflow.apache.org/docs/stable/api.html#post--api-experimental-dags--DAG_ID--dag_runs
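>
> A minimal sketch of that call (the host, dag id, and payload here are just
> placeholders for your setup, and you would need to add whatever auth your
> deployment uses):
>
>     import requests
>
>     AIRFLOW_HOST = "http://localhost:8080"  # placeholder host
>     DAG_ID = "my_downstream_dag"            # placeholder dag id
>
>     # POST to the experimental endpoint to create a new dag run; the
>     # JSON body ("conf") is optional and gets passed to the dag run.
>     resp = requests.post(
>         f"{AIRFLOW_HOST}/api/experimental/dags/{DAG_ID}/dag_runs",
>         json={"conf": {"source": "external-system"}},
>     )
>     resp.raise_for_status()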
> When using the API doesn't work for you, the common approach I have seen
> is, as you hint at, having a "trigger" dag that runs (frequently, depending
> on your needs), checks the external condition, and uses
> TriggerDagRunOperator.
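>
> Roughly, such a trigger dag could look like this (1.10-style imports; the
> condition check and dag ids are placeholders, untested):
>
>     from datetime import datetime, timedelta
>
>     from airflow import DAG
>     from airflow.operators.dagrun_operator import TriggerDagRunOperator
>     from airflow.operators.python_operator import ShortCircuitOperator
>
>     def _external_condition_met():
>         # Placeholder: replace with your real check against the
>         # external database / filesystem / API.
>         return True
>
>     with DAG(
>         dag_id="trigger_dag",
>         start_date=datetime(2020, 1, 1),
>         schedule_interval=timedelta(minutes=5),  # poll frequency
>         catchup=False,
>     ) as dag:
>         # Skips the downstream trigger when the condition isn't met.
>         check = ShortCircuitOperator(
>             task_id="check_condition",
>             python_callable=_external_condition_met,
>         )
>         trigger = TriggerDagRunOperator(
>             task_id="trigger_target",
>             trigger_dag_id="target_dag",  # the dag you want to run
>         )
>         check >> trigger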
> The other way I have seen this done is to just have the first task of your
> dag be a sensor that checks/waits on the external resource. With the
> recently added "reschedule" mode of sensors this also doesn't tie up a
> worker slot while the sensor isn't actively poking. This is the approach I
> have used in the past when processing weekly datasets that could appear
> anywhere in a 72 hour window after the expected delivery time.
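>
> A minimal sketch of that pattern (the sensor type, paths, and intervals
> are illustrative only):
>
>     from datetime import datetime
>
>     from airflow import DAG
>     from airflow.contrib.sensors.file_sensor import FileSensor
>     from airflow.operators.bash_operator import BashOperator
>
>     with DAG(
>         dag_id="weekly_processing",
>         start_date=datetime(2020, 1, 1),
>         schedule_interval="@weekly",
>     ) as dag:
>         wait_for_data = FileSensor(
>             task_id="wait_for_data",
>             filepath="/data/incoming/weekly.csv",  # placeholder path
>             mode="reschedule",       # frees the worker slot between pokes
>             poke_interval=60 * 30,   # re-check every 30 minutes
>             timeout=60 * 60 * 72,    # fail after the 72 hour window
>         )
>         process = BashOperator(
>             task_id="process",
>             bash_command="echo processing",  # placeholder work
>         )
>         wait_for_data >> process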
>
> Given these options exist, I'm not quite sure I see the need for a new
> parameter on the DAG (especially one which runs user code in the scheduler;
> that gets quite a strong no from me). Could you perhaps explain your idea
> in more detail, specifically how it fits in to your workflow, and why you
> don't want to use the two methods I talked about here?
> Thanks,
> Ash
> On Feb 14 2020, at 10:10 am, bharath palaksha <[email protected]>
> wrote:
> > Hi,
> >
> > I have been using Airflow extensively in my current work at Walmart Labs.
> > While working on our requirements, I came across a piece of functionality
> > which is missing in Airflow and which would be very useful if implemented.
> > Currently, Airflow is schedule-based workflow management: a cron
> > expression defines the creation of dag runs. If there is a dependency on
> > a different dag, TriggerDagRunOperator helps in creating dag runs.
> > Suppose there is a dependency which lives outside of the Airflow cluster,
> > e.g. a different database, a filesystem, or an event from an API which is
> > an upstream dependency. There is no way in Airflow to react to this
> > unless we schedule a DAG at a very short interval and allow it to poll.
> >
> > To solve the above issue, what if Airflow accepted two different args,
> > schedule_interval and trigger_sensor (a rough sketch of the interface is
> > below)?
> >
> > - schedule_interval - works the same way as it already does now
> > - trigger_sensor - accepts a sensor which returns true when an event is
> > sensed, and this in turn creates a dag run
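> >
> > Something like this (purely illustrative; neither the argument nor the
> > sensor class exists today):
> >
> >     from airflow import DAG
> >     from my_company.sensors import NewFileSensor  # hypothetical sensor
> >
> >     dag = DAG(
> >         dag_id="event_driven_dag",
> >         schedule_interval=None,  # no cron-based runs
> >         # Proposed: the scheduler evaluates this sensor and creates a
> >         # dag run whenever it returns true.
> >         trigger_sensor=NewFileSensor(filepath="/data/incoming/new"),
> >     )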
> >
> > If you specify both arguments, schedule_interval takes precedence.
> > The scheduler parses all DAGs in a loop on every heartbeat and checks for
> > DAGs which have reached their scheduled time, creating DAG runs; the same
> > loop could also check for trigger_sensor and, when the argument is set,
> > create a dag run if the sensor returns true. This might slow down the
> > scheduler, since it now has to execute sensors, but we can find some
> > other way to avoid that slowness.
> > Can we create an AIP for this? Any thoughts?
> >
> > Thanks,
> > Bharath
> >
>
>
