I think that will lead to a very large number of questions about why something worked before and no longer works after a clean install. Additionally, if you develop on a new install and deploy to an old install, you would get different behavior, which adds to the confusion.
James Coder

> On Sep 26, 2019, at 6:41 AM, Ash Berlin-Taylor <a...@apache.org> wrote:
>
> I'm wondering if there is some way we can do this so that new installs will
> pick up the new default, but anyone that carries over an Airflow.cfg from an
> old install will keep their existing behaviour.
>
> And then also is that a good/helpful idea or will that be more confusing than
> not?
>
> -a
>
>> On 26 Sep 2019, at 11:40, Kaxil Naik <kaxiln...@gmail.com> wrote:
>>
>> I definitely agree. If we don't update it in 2.0 it is going to be hard to
>> change that in any 2.x versions
>>
>> On Thu, Sep 26, 2019 at 10:51 AM James Meickle
>> <jmeic...@quantopian.com.invalid> wrote:
>>
>>> I am *strongly* in favor of using the 2.0 update to break compat here,
>>> because this is a very confusing feature to most new users of Airflow, but
>>> also will break a _lot_ of DAGs. I feel like if we don't change this in 2.0
>>> we probably won't for any 2.x either, which would be a shame.
>>>
>>>> On Wed, Sep 25, 2019 at 8:33 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>
>>>> I agree with Dan to change the default execution at start of the
>>>> interval.
>>>>
>>>> How about adding this for 2.0 ??
>>>>
>>>> Don't want to keep delaying this if we have a consensus already.
>>>>
>>>> Regards,
>>>> Kaxil
>>>>
>>>> On Fri, Aug 23, 2019, 15:39 Dan Davydov <ddavy...@twitter.com.invalid>
>>>> wrote:
>>>>
>>>>> What are people's feelings on changing the default execution to schedule
>>>>> interval start and communicating this to existing users in the Updating
>>>>> notes so that they can preserve the old behavior? Could potentially cause
>>>>> headaches for users who don't read the notes but I think it might make
>>>>> sense to bite the bullet at some point for more intuitive behavior overall
>>>>> for new users.
>>>>>
>>>>> On Fri, Aug 23, 2019 at 10:29 AM Dan Davydov <ddavy...@twitter.com>
>>>>> wrote:
>>>>>
>>>>>> I am for this change, since I feel like in general the start of the
>>>>>> interval is more intuitive (I have been working on Airflow for 3 years
>>>>>> and this still trips me up). That being said I'm not sure how I feel about
>>>>>> allowing customization at DAG level instead of cluster level as it makes
>>>>>> it harder to make assumptions about DAGs on the cluster for ops, though
>>>>>> maybe this isn't a huge deal given there are tools available that show you
>>>>>> why tasks aren't running.
>>>>>>
>>>>>> I agree with Bole that we should communicate recommended migration
>>>>>> strategies if they can't be done automatically.
>>>>>>
>>>>>> I don't think I'm a fan of arbitrary customization of the interval via a
>>>>>> callback, my feeling is this would not provide significant value and could
>>>>>> be an ops nightmare.
>>>>>>
>>>>>> On Fri, Aug 23, 2019 at 9:11 AM Jarek Potiuk <jarek.pot...@polidea.com>
>>>>>> wrote:
>>>>>>
>>>>>>> DST: I recall problems with DST especially when the hour goes back and
>>>>>>> the daily schedule time technically occurs twice the same day or does not
>>>>>>> occur at all. We have some code that arbitrarily chooses the first
>>>>>>> occurrence in the latter case (there was a problem that it worked
>>>>>>> differently in Python 3.6 vs 3.5 (!)). But also the case when we move
>>>>>>> forward is an interesting one. I am not 100% sure it will work correctly
>>>>>>> after changing the scheduling mechanisms but it's rather easy to test and
>>>>>>> there is no harm adding it.
>>>>>>> There is DST-specific logic implemented in our next/previous run
>>>>>>> calculation and I imagine it could go wrong.
>>>>>>>
>>>>>>> The tests I am talking about:
>>>>>>>
>>>>>>> DagTest.test_following_previous_schedule_daily_dag_CEST_to_CET/DagTest.test_following_previous_schedule_daily_dag_CET_to_CEST.
>>>>>>>
>>>>>>> Re: arbitrary customisation/converting DAGs. I think there is no need to
>>>>>>> convert existing dags - the default behaviour remains as it is as far as I
>>>>>>> understand. And this flag is much simpler to understand and reason about
>>>>>>> than arbitrary function and it corresponds to real business cases:
>>>>>>>
>>>>>>> 1) schedule_at_interval_end = True -> wait for the data to be ready for
>>>>>>> the interval (current/default behaviour related to processing batches of
>>>>>>> data)
>>>>>>> 2) schedule_at_interval_end = False -> CRON-like behaviour where we simply
>>>>>>> run arbitrary operation in regular intervals (more intuitive for people
>>>>>>> who are used to CRON-like jobs)
>>>>>>>
>>>>>>> You can always build your schedule differently if you need something
>>>>>>> "in-between" IMHO.
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>> On Fri, Aug 23, 2019 at 8:44 AM James Meickle
>>>>>>> <jmeic...@quantopian.com.invalid> wrote:
>>>>>>>
>>>>>>>> This is a change to one of Airflow's core concepts, and it would
>>>>>>>> require a lot of work for existing DAGs to cut over to it. Given that,
>>>>>>>> my personal preference would be to allow arbitrary customization rather
>>>>>>>> than just a bit toggle. Such as allowing passing in a mapping function:
>>>>>>>> given an interval's start date and end date, when should it be executed?
>>>>>>>>
>>>>>>>> On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk <jarek.pot...@polidea.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Happy for it as well.
>>>>>>>>> There are a number of cases where scheduling at start
>>>>>>>>> makes more sense and as we see Airflow is used now in multiple cases
>>>>>>>>> where there is no need to process data from an interval and wait until
>>>>>>>>> that data is ready.
>>>>>>>>> But indeed some more tests would be great - especially for edge cases.
>>>>>>>>> Changing mid-air is one but I think there should be a test about
>>>>>>>>> Daylight Saving Time changing.
>>>>>>>>> There are some tests for DST so they just need to be extended to cover
>>>>>>>>> those two different cases.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Happy for this feature to be merged
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> This has come up a few times before, someone has now opened a PR
>>>>>>>>>>> that makes this a global+per-dag setting:
>>>>>>>>>>> https://github.com/apache/airflow/pull/5787 and it also includes
>>>>>>>>>>> docs that I think do a good job of illustrating the two modes.
>>>>>>>>>>>
>>>>>>>>>>> Does anyone object to this being merged? If no one says anything by
>>>>>>>>>>> midday on Tuesday I will take that as assent and will merge it.
>>>>>>>>>>>
>>>>>>>>>>> The docs from the PR included below.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Ash
>>>>>>>>>>>
>>>>>>>>>>> Scheduled Time vs Execution Time
>>>>>>>>>>> ''''''''''''''''''''''''''''''''
>>>>>>>>>>>
>>>>>>>>>>> A DAG with a ``schedule_interval`` will execute once per interval. By
>>>>>>>>>>> default, the execution of a DAG will occur at the **end** of the
>>>>>>>>>>> schedule interval.
>>>>>>>>>>>
>>>>>>>>>>> A few examples:
>>>>>>>>>>>
>>>>>>>>>>> - A DAG with ``schedule_interval='@hourly'``: The DAG run that processes
>>>>>>>>>>>   2019-08-16 17:00 will start running just after 2019-08-16 17:59:59,
>>>>>>>>>>>   i.e. once that hour is over.
>>>>>>>>>>> - A DAG with ``schedule_interval='@daily'``: The DAG run that processes
>>>>>>>>>>>   2019-08-16 will start running shortly after 2019-08-17 00:00.
>>>>>>>>>>>
>>>>>>>>>>> The reasoning behind this execution vs scheduling behaviour is that
>>>>>>>>>>> data for the interval to be processed won't be fully available until
>>>>>>>>>>> the interval has elapsed.
>>>>>>>>>>>
>>>>>>>>>>> In cases where you wish the DAG to be executed at the **start** of the
>>>>>>>>>>> interval, specify ``schedule_at_interval_end=False``, either in
>>>>>>>>>>> ``airflow.cfg``, or on a per-DAG basis.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Jarek Potiuk
>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>>>
>>>>>>>>> M: +48 660 796 129 <+48660796129>
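
To make the two modes from the quoted docs concrete, here is a minimal sketch in plain Python. It does not use Airflow at all: `earliest_run_time` is a hypothetical helper invented for this illustration, and it only assumes the semantics described in the docs excerpt (the flag name `schedule_at_interval_end` is taken from the thread; how the PR actually implements it is not shown here).

```python
from datetime import datetime, timedelta


def earliest_run_time(interval_start: datetime,
                      interval: timedelta,
                      schedule_at_interval_end: bool = True) -> datetime:
    """Earliest moment the run covering the interval may start.

    Hypothetical helper for illustration only; this is NOT an Airflow function.
    The interval is [interval_start, interval_start + interval).
    """
    if schedule_at_interval_end:
        # Batch-style: wait until the whole interval has elapsed,
        # so the data for that interval is complete.
        return interval_start + interval
    # Cron-style: run as soon as the interval begins.
    return interval_start


# The two examples from the docs excerpt (default: end of interval).
hourly = earliest_run_time(datetime(2019, 8, 16, 17, 0), timedelta(hours=1))
daily = earliest_run_time(datetime(2019, 8, 16), timedelta(days=1))

# The same daily interval with the flag switched off (start of interval).
daily_at_start = earliest_run_time(
    datetime(2019, 8, 16), timedelta(days=1), schedule_at_interval_end=False
)

print(hourly)          # 2019-08-16 18:00:00
print(daily)           # 2019-08-17 00:00:00
print(daily_at_start)  # 2019-08-16 00:00:00
```

The sketch ignores time zones and DST, which is exactly where the thread expects the edge cases to show up.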