I definitely agree. If we don't update it in 2.0, it is going to be hard to change that in any 2.x version.

On Thu, Sep 26, 2019 at 10:51 AM James Meickle <jmeic...@quantopian.com.invalid> wrote:

> I am *strongly* in favor of using the 2.0 update to break compat here,
> because this is a very confusing feature for most new users of Airflow,
> but changing it will also break a _lot_ of DAGs. I feel like if we don't
> change this in 2.0 we probably won't for any 2.x either, which would be
> a shame.
>
> On Wed, Sep 25, 2019 at 8:33 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > I agree with Dan to change the default execution to the start of the
> > interval.
> >
> > How about adding this for 2.0?
> >
> > Don't want to keep delaying this if we have a consensus already.
> >
> > Regards,
> > Kaxil
> >
> > On Fri, Aug 23, 2019, 15:39 Dan Davydov <ddavy...@twitter.com.invalid> wrote:
> >
> > > What are people's feelings on changing the default execution to
> > > schedule interval start and communicating this to existing users in
> > > the Updating notes so that they can preserve the old behavior? Could
> > > potentially cause headaches for users who don't read the notes but I
> > > think it might make sense to bite the bullet at some point for more
> > > intuitive behavior overall for new users.
> > >
> > > On Fri, Aug 23, 2019 at 10:29 AM Dan Davydov <ddavy...@twitter.com> wrote:
> > >
> > > > I am for this change, since I feel like in general the start of
> > > > the interval is more intuitive (I have been working on Airflow for
> > > > 3 years and this still trips me up). That being said I'm not sure
> > > > how I feel about allowing customization at DAG level instead of
> > > > cluster level, as it makes it harder to make assumptions about
> > > > DAGs on the cluster for ops, though maybe this isn't a huge deal
> > > > given there are tools available that show you why tasks aren't
> > > > running.
> > > >
> > > > I agree with Bole that we should communicate recommended migration
> > > > strategies if they can't be done automatically.
> > > >
> > > > I don't think I'm a fan of arbitrary customization of the interval
> > > > via a callback; my feeling is this would not provide significant
> > > > value and could be an ops nightmare.
> > > >
> > > > On Fri, Aug 23, 2019 at 9:11 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > > DST: I recall problems with DST, especially when the hour goes
> > > > > back and the daily schedule time technically occurs twice the
> > > > > same day or does not occur at all. We have some code that
> > > > > arbitrarily chooses the first occurrence in the latter case
> > > > > (there was a problem that it worked differently in Python 3.6
> > > > > vs 3.5 (!)). But also the case when we move forward is an
> > > > > interesting one. I am not 100% sure it will work correctly after
> > > > > changing the scheduling mechanisms, but it's rather easy to test
> > > > > and there is no harm adding it. There is DST-specific logic
> > > > > implemented in our next/previous run calculation and I imagine
> > > > > it could go wrong.
> > > > >
> > > > > The tests I am talking about:
> > > > >
> > > > > DagTest.test_following_previous_schedule_daily_dag_CEST_to_CET/
> > > > > DagTest.test_following_previous_schedule_daily_dag_CET_to_CEST.
> > > > >
> > > > > Re: arbitrary customisation/converting DAGs. I think there is no
> > > > > need to convert existing DAGs - the default behaviour remains as
> > > > > it is, as far as I understand. And this flag is much simpler to
> > > > > understand and reason about than an arbitrary function, and it
> > > > > corresponds to real business cases:
> > > > >
> > > > > 1) schedule_at_interval_end = True -> wait for the data to be
> > > > >    ready for the interval (current/default behaviour related to
> > > > >    processing batches of data)
> > > > > 2) schedule_at_interval_end = False -> CRON-like behaviour where
> > > > >    we simply run an arbitrary operation at regular intervals
> > > > >    (more intuitive for people who are used to CRON-like jobs)
> > > > >
> > > > > You can always build your schedule differently if you need
> > > > > something "in-between" IMHO.
> > > > >
> > > > > J.
> > > > >
> > > > > On Fri, Aug 23, 2019 at 8:44 AM James Meickle <jmeic...@quantopian.com.invalid> wrote:
> > > > >
> > > > > > This is a change to one of Airflow's core concepts, and it
> > > > > > would require a lot of work for existing DAGs to cut over to
> > > > > > it. Given that, my personal preference would be to allow
> > > > > > arbitrary customization rather than just a bit toggle, such
> > > > > > as allowing passing in a mapping function: given an
> > > > > > interval's start date and end date, when should it be
> > > > > > executed?
> > > > > >
> > > > > > On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > > > >
> > > > > > > Happy for it as well. There are a number of cases where
> > > > > > > scheduling at start makes more sense, and as we see,
> > > > > > > Airflow is now used in multiple cases where there is no
> > > > > > > need to process data from an interval and wait until that
> > > > > > > data is ready.
> > > > > > > But indeed some more tests would be great - especially for
> > > > > > > edge cases. Changing mid-air is one, but I think there
> > > > > > > should be a test about Daylight Saving Time changing.
> > > > > > > There are some tests for DST so they just need to be
> > > > > > > extended to cover those two different cases.
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Happy for this feature to be merged
> > > > > > > >
> > > > > > > > On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > This has come up a few times before; someone has now
> > > > > > > > > opened a PR that makes this a global+per-dag setting:
> > > > > > > > > https://github.com/apache/airflow/pull/5787 and it
> > > > > > > > > also includes docs that I think do a good job of
> > > > > > > > > illustrating the two modes.
> > > > > > > > >
> > > > > > > > > Does anyone object to this being merged? If no one
> > > > > > > > > says anything by midday on Tuesday I will take that as
> > > > > > > > > assent and will merge it.
> > > > > > > > >
> > > > > > > > > The docs from the PR are included below.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Ash
> > > > > > > > >
> > > > > > > > > Scheduled Time vs Execution Time
> > > > > > > > > ''''''''''''''''''''''''''''''''
> > > > > > > > >
> > > > > > > > > A DAG with a ``schedule_interval`` will execute once
> > > > > > > > > per interval. By default, the execution of a DAG will
> > > > > > > > > occur at the **end** of the schedule interval.
> > > > > > > > >
> > > > > > > > > A few examples:
> > > > > > > > >
> > > > > > > > > - A DAG with ``schedule_interval='@hourly'``: The DAG
> > > > > > > > >   run that processes 2019-08-16 17:00 will start
> > > > > > > > >   running just after 2019-08-16 17:59:59, i.e. once
> > > > > > > > >   that hour is over.
> > > > > > > > > - A DAG with ``schedule_interval='@daily'``: The DAG
> > > > > > > > >   run that processes 2019-08-16 will start running
> > > > > > > > >   shortly after 2019-08-17 00:00.
> > > > > > > > >
> > > > > > > > > The reasoning behind this execution vs scheduling
> > > > > > > > > behaviour is that data for the interval to be
> > > > > > > > > processed won't be fully available until the interval
> > > > > > > > > has elapsed.
> > > > > > > > >
> > > > > > > > > In cases where you wish the DAG to be executed at the
> > > > > > > > > **start** of the interval, specify
> > > > > > > > > ``schedule_at_interval_end=False``, either in
> > > > > > > > > ``airflow.cfg``, or on a per-DAG basis.
> > > > > > >
> > > > > > > --
> > > > > > > Jarek Potiuk
> > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > >
> > > > > > > M: +48 660 796 129 <+48660796129>
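
For readers skimming the thread, the two modes described in the quoted docs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ``run_start_time`` is a hypothetical helper and not part of Airflow; the only name taken from the thread is the ``schedule_at_interval_end`` flag. It also assumes fixed-length intervals expressed as a ``timedelta``, so it ignores cron expressions and the DST edge cases raised above.

```python
from datetime import datetime, timedelta


def run_start_time(interval_start, interval, schedule_at_interval_end=True):
    """Hypothetical helper (not an Airflow API): earliest moment at which
    the run covering [interval_start, interval_start + interval) may start."""
    if schedule_at_interval_end:
        # Default mode from the quoted docs: wait until the interval has
        # elapsed, so the data for it is complete.
        return interval_start + interval
    # CRON-like mode: run as soon as the interval begins.
    return interval_start


# The '@hourly' example from the docs: the run that processes
# 2019-08-16 17:00 starts once that hour is over.
hourly = run_start_time(datetime(2019, 8, 16, 17, 0), timedelta(hours=1))

# The '@daily' example: the run that processes 2019-08-16 starts at
# (shortly after) 2019-08-17 00:00.
daily = run_start_time(datetime(2019, 8, 16), timedelta(days=1))

# Same daily interval with the flag turned off.
daily_at_start = run_start_time(
    datetime(2019, 8, 16), timedelta(days=1), schedule_at_interval_end=False
)

print(hourly)          # 2019-08-16 18:00:00
print(daily)           # 2019-08-17 00:00:00
print(daily_at_start)  # 2019-08-16 00:00:00
```

The boolean flag only ever picks one of the two interval endpoints, which is the simplicity argued for in the thread; the mapping-function alternative would instead let each DAG return any time derived from the interval's start and end.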