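Vincent, I think "dag.catchup = False" affects the whole DAG: the scheduler does not create the missed DAG runs at all, so every task in them is skipped. "LatestOnlyOperator", on the other hand, can be used to skip only some of the tasks in a DAG, namely the ones placed downstream of it.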
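To make the difference concrete, here is a minimal pure-Python sketch with no Airflow import. The helper names scheduled_runs and is_latest_run are made up for illustration. The window check follows the same idea as the skip_to_current_job function quoted further down, and treating catchup=False as "only the most recent interval gets a run" is my simplification of what the scheduler does, not its actual code.

```python
from datetime import datetime, timedelta


def scheduled_runs(start, interval, now, catchup=True):
    """Execution dates a scheduler would create runs for at time `now`.

    Simplified model (an assumption, not Airflow's scheduler code): a run for
    execution date d becomes due once d + interval has passed. With
    catchup=False only the most recent due interval gets a run.
    """
    due = []
    d = start
    while d + interval <= now:
        due.append(d)
        d += interval
    return due if catchup else due[-1:]


def is_latest_run(execution_date, interval, now):
    """True only for the run whose following schedule window contains `now`."""
    left_window = execution_date + interval
    right_window = left_window + interval
    return left_window < now <= right_window


# A daily DAG that was paused for two weeks and is re-enabled "now".
start = datetime(2017, 3, 8)
interval = timedelta(days=1)
now = datetime(2017, 3, 22, 1, 0)

# catchup=True plus a latest-only gate: every missed run is created, but the
# gated tasks only do work in the newest one; ungated tasks run in all of them.
all_runs = scheduled_runs(start, interval, now, catchup=True)
gated_runs = [d for d in all_runs if is_latest_run(d, interval, now)]

# catchup=False: the missed runs are never created, for any task in the DAG.
no_catchup_runs = scheduled_runs(start, interval, now, catchup=False)

print(len(all_runs), "runs created with catchup")
print(len(gated_runs), "run(s) where the gated tasks execute")
print(len(no_catchup_runs), "run(s) created without catchup")
```

So with the operator you still get one DAG run per missed interval, and tasks that are not downstream of the gate run in each of them; with catchup off those runs never exist.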
On Wed, Mar 22, 2017 at 7:05 PM, Vincent Poulain <vincent.poul...@tinyclues.com> wrote:

> I did not see the clear explanation there :
> http://airflow.incubator.apache.org/concepts.html?highlight=provide_context#latest-run-only
>
> All good!
>
> On Wed, Mar 22, 2017 at 2:22 PM, Vincent Poulain <vincent.poul...@tinyclues.com> wrote:
>
> > Sid, in your example what is the difference between using the
> > LatestOnlyOperator & set catch_up feature to False ? "[The catch up
> > feature] kick off a DAG Run for any interval that has not been run"
> > I am still learning Airflow concepts too..
> >
> > Thanks!
> >
> > On Tue, Mar 21, 2017 at 10:31 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
> >
> >> Thank you for the detailed explanation Boris.
> >>
> >>
> >> Best regards,
> >>
> >> Ruslan Dautkhanov
> >>
> >> On Mon, Mar 20, 2017 at 12:12 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
> >>
> >> > depends_on_past is looking at previous task instance which sounds the same
> >> > as "latestonly" but the difference becomes apparent if you look at this
> >> > example.
> >> >
> >> > Let's say you have a dag, scheduled to run every day and it has been
> >> > failing for the past 3 days. The whole purpose of that dag is to populate
> >> > snapshot table or do a daily backup. If you use depends on past, you would
> >> > have to rerun all missed runs or mark them as successful eventually doing
> >> > useless work (3 daily snapshots or backups for the same data).
> >> >
> >> > LatestOnly allows you to bypass missed runs and just do it once for most
> >> > recent instance.
> >> >
> >> > Another difference, depends on past is tricky if you use BranchOperator
> >> > because some branches may not run one day and run another - it will really
> >> > mess up your logic.
> >> >
> >> > On Mon, Mar 20, 2017 at 12:45 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
> >> >
> >> > > Thanks Boris. It does make sense.
> >> > > Although how it's different from depends_on_past task-level parameter?
> >> > > In both cases, a task will be skipped if there is another TI of this task
> >> > > is still running (from a previous dagrun), right?
> >> > >
> >> > >
> >> > > Thanks,
> >> > > Ruslan
> >> > >
> >> > >
> >> > > On Sat, Mar 18, 2017 at 7:11 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
> >> > >
> >> > > > you would just chain them - there is an example that came with airflow 1.8
> >> > > > https://github.com/apache/incubator-airflow/blob/master/airflow/example_dags/example_latest_only.py
> >> > > >
> >> > > > so in your case, instead of dummy operator, you would use your Oracle
> >> > > > operator.
> >> > > >
> >> > > > Does it make sense?
> >> > > >
> >> > > >
> >> > > > On Sat, Mar 18, 2017 at 7:12 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
> >> > > >
> >> > > > > Is there is a way to combine scheduling behavior operators (like this
> >> > > > > LatestOnlyOperator)
> >> > > > > with a functional operator (like Oracle_Operator)? I was thinking multiple
> >> > > > > inheritance would do,like
> >> > > > >
> >> > > > > > class Oracle_LatestOnly_Operator (Oracle_Operator, LatestOnlyOperator):
> >> > > > > > ...
> >> > > > >
> >> > > > > I might be overthinking this and there could be a simpler way?
> >> > > > > Sorry, I am still learning Airflow concepts...
> >> > > > >
> >> > > > > Thanks.
> >> > > > >
> >> > > > > --
> >> > > > > Ruslan Dautkhanov
> >> > > > >
> >> > > > > On Sat, Mar 18, 2017 at 2:15 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
> >> > > > >
> >> > > > > > Thanks George for that feature!
> >> > > > > >
> >> > > > > > sure, just created a jira on this
> >> > > > > > https://issues.apache.org/jira/browse/AIRFLOW-1008
> >> > > > > >
> >> > > > > >
> >> > > > > > On Sat, Mar 18, 2017 at 12:05 PM, siddharth anand <san...@apache.org> wrote:
> >> > > > > >
> >> > > > > > > Thx Boris. Credit goes to George (gwax) for the implementation of the
> >> > > > > > > LatestOnlyOperator.
> >> > > > > > >
> >> > > > > > > Boris,
> >> > > > > > > Can you describe what you mean in a Jira?
> >> > > > > > > -s
> >> > > > > > >
> >> > > > > > > On Fri, Mar 17, 2017 at 6:02 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
> >> > > > > > >
> >> > > > > > > > this is nice indeed along with the new catchup option
> >> > > > > > > > https://airflow.incubator.apache.org/scheduler.html#backfill-and-catchup
> >> > > > > > > >
> >> > > > > > > > Thanks Sid and Ben for adding these new options!
> >> > > > > > > >
> >> > > > > > > > for a complete picture, it would be nice to force only one dag run at the
> >> > > > > > > > time.
> >> > > > > > > >
> >> > > > > > > > On Fri, Mar 17, 2017 at 7:33 PM, siddharth anand <san...@apache.org> wrote:
> >> > > > > > > >
> >> > > > > > > > > With the Apache Airflow 1.8 release imminent, you may want to try out the
> >> > > > > > > > >
> >> > > > > > > > > *LatestOnlyOperator.*
> >> > > > > > > > >
> >> > > > > > > > > If you want your DAG to only run on the most recent scheduled slot,
> >> > > > > > > > > regardless of backlog, this operator will skip running downstream tasks for
> >> > > > > > > > > all DAG Runs prior to the current time slot.
> >> > > > > > > > >
> >> > > > > > > > > For example, I might have a DAG that takes a DB snapshot once a day. It
> >> > > > > > > > > might be that I paused that DAG for 2 weeks or that I had set the start
> >> > > > > > > > > date to a fixed date 2 weeks in the past. When I enable my DAG, I don't
> >> > > > > > > > > want it to run 14 days' worth of snapshots for the current state of the DB
> >> > > > > > > > > -- that's unnecessary work.
> >> > > > > > > > >
> >> > > > > > > > > The LatestOnlyOperator avoids that work.
> >> > > > > > > > >
> >> > > > > > > > > https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a
> >> > > > > > > > >
> >> > > > > > > > > With it, you can simply use
> >> > > > > > > > > latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
> >> > > > > > > > >
> >> > > > > > > > > instead of
> >> > > > > > > > > # imports this snippet needs (module path as of Airflow 1.8)
> >> > > > > > > > > import logging
> >> > > > > > > > > from datetime import datetime
> >> > > > > > > > > from airflow.operators.python_operator import ShortCircuitOperator
> >> > > > > > > > >
> >> > > > > > > > > def skip_to_current_job(ds, **kwargs):
> >> > > > > > > > >     now = datetime.now()
> >> > > > > > > > >     # window of the schedule slot that follows this run's execution date
> >> > > > > > > > >     left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
> >> > > > > > > > >     right_window = kwargs['dag'].following_schedule(left_window)
> >> > > > > > > > >     logging.info('Left Window {}, Now {}, Right Window {}'.format(
> >> > > > > > > > >         left_window, now, right_window))
> >> > > > > > > > >     if not now <= right_window:
> >> > > > > > > > >         logging.info('Not latest execution, skipping downstream.')
> >> > > > > > > > >         return False
> >> > > > > > > > >     return True
> >> > > > > > > > >
> >> > > > > > > > > short_circuit = ShortCircuitOperator(
> >> > > > > > > > >     task_id='short_circuit_if_not_current_job',
> >> > > > > > > > >     provide_context=True,
> >> > > > > > > > >     python_callable=skip_to_current_job,
> >> > > > > > > > >     dag=dag
> >> > > > > > > > > )
> >> > > > > > > > >
> >> > > > > > > > > -s
> >
> > --
> >
> > *Vincent Poulain*
> > Senior Software Engineer
> >
> > Office +33 1 75 50 67 26 | Mobile +33 6 21 82 87 62 | vinc...@tinyclues.com <supp...@tinyclues.com>
> >
> > Tinyclues | 51 rue Étienne Marcel, 75001 Paris
> >
> > www.tinyclues.com <http://bit.ly/2hNL4Fs> | @tinyclues <https://twitter.com/Tinyclues>