This is a change to one of Airflow's core concepts, and it would require a lot of work for existing DAGs to cut over to it. Given that, my personal preference would be to allow arbitrary customization rather than just a boolean toggle. For example, we could allow passing in a mapping function: given an interval's start date and end date, when should the run be executed?
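
A rough sketch of what such a mapping function could look like. Every name here (`RunTimeMapper`, `run_at_interval_end`, `run_at_interval_start`, `run_with_delay`) is hypothetical and is not part of Airflow or of the PR; it only illustrates the shape of the idea, using the hourly interval from the docs quoted below.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical type: given an interval's (start, end), return when the run should fire.
RunTimeMapper = Callable[[datetime, datetime], datetime]


def run_at_interval_end(start: datetime, end: datetime) -> datetime:
    """Mirror the current default: run once the interval has elapsed."""
    return end


def run_at_interval_start(start: datetime, end: datetime) -> datetime:
    """Mirror the proposed toggle: run as soon as the interval begins."""
    return start


def run_with_delay(delay: timedelta) -> RunTimeMapper:
    """Build a mapper that waits `delay` after the interval closes,
    for example to give late-arriving data time to land."""
    def mapper(start: datetime, end: datetime) -> datetime:
        return end + delay
    return mapper


if __name__ == "__main__":
    # The hourly interval that begins at 2019-08-16 17:00.
    start = datetime(2019, 8, 16, 17, 0)
    end = start + timedelta(hours=1)

    mappers = {
        "end": run_at_interval_end,
        "start": run_at_interval_start,
        "end+15m": run_with_delay(timedelta(minutes=15)),
    }
    for name, mapper in mappers.items():
        print(name, mapper(start, end).isoformat())
```

The two toggle states fall out as the two simplest mappers, while anything else (a fixed delay, business-hours-only execution, and so on) would be just another function with the same signature.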

On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Happy for it as well. There are a number of cases where scheduling at start
> makes more sense and as we see Airflow is used now in multiple cases where
> there is no need to process data from an interval and wait until that data
> is ready.
> But indeed some more tests would be great - especially for edge cases.
> Changing mid-air is one but I think there should be test about Daylight
> Saving Time changing.
> There are some tests for DST so they just need to be extended to cover
> those two different cases.
>
> J.
>
> On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > Happy for this feature to be merged
> >
> > On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > This has come up a few times before, someone has now opened a PR that
> > > makes this a global+per-dag setting:
> > > https://github.com/apache/airflow/pull/5787 and it also includes docs
> > > that I think does a good job of illustrating the two modes.
> > >
> > > Does anyone object to this being merged? If no one says anything by
> > > midday on Tuesday I will take that as assent and will merge it.
> > >
> > > The docs from the PR included below.
> > >
> > > Thanks,
> > > Ash
> > >
> > > Scheduled Time vs Execution Time
> > > ''''''''''''''''''''''''''''''''
> > >
> > > A DAG with a ``schedule_interval`` will execute once per interval. By
> > > default, the execution of a DAG will occur at the **end** of the
> > > schedule interval.
> > >
> > > A few examples:
> > >
> > > - A DAG with ``schedule_interval='@hourly'``: The DAG run that processes
> > >   2019-08-16 17:00 will start running just after 2019-08-16 17:59:59,
> > >   i.e. once that hour is over.
> > > - A DAG with ``schedule_interval='@daily'``: The DAG run that processes
> > >   2019-08-16 will start running shortly after 2019-08-17 00:00.
> > >
> > > The reasoning behind this execution vs scheduling behaviour is that
> > > data for the interval to be processed won't be fully available until
> > > the interval has elapsed.
> > >
> > > In cases where you wish the DAG to be executed at the **start** of the
> > > interval, specify ``schedule_at_interval_end=False``, either in
> > > ``airflow.cfg``, or on a per-DAG basis.
> > >
> >
>
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129