I'm wondering if there is some way we can do this so that new installs will 
pick up the new default, but anyone that carries over an airflow.cfg from an 
old install will keep their existing behaviour.

And then also is that a good/helpful idea or will that be more confusing than 
not?
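
One possible way to get that split, sketched very roughly: write the new
value explicitly into the default config template that new installs
generate, and keep the in-code fallback at the old behaviour for any config
that lacks the key. The [scheduler] section placement, the constant names
and the helper function below are all assumptions for illustration, not
what the PR actually does.

```python
import configparser

# Assumption: when the key is absent we fall back to the legacy behaviour
# (run at the end of the interval).
LEGACY_FALLBACK = True

# Hypothetical template written out for brand-new installs: the new default
# is stated explicitly, so new users get start-of-interval scheduling.
NEW_INSTALL_TEMPLATE = """
[scheduler]
schedule_at_interval_end = False
"""

# Hypothetical config carried over from an older install: the key is missing.
CARRIED_OVER_CFG = """
[scheduler]
job_heartbeat_sec = 5
"""


def resolve_schedule_at_interval_end(cfg_text: str) -> bool:
    """Return the effective setting for a given config file's contents."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback= is used when the section or the option is missing.
    return parser.getboolean(
        "scheduler", "schedule_at_interval_end", fallback=LEGACY_FALLBACK
    )


if __name__ == "__main__":
    print(resolve_schedule_at_interval_end(NEW_INSTALL_TEMPLATE))
    print(resolve_schedule_at_interval_end(CARRIED_OVER_CFG))
```

The obvious downside is that the value in the template and the fallback in
code disagree, which is exactly the kind of thing that might be confusing.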

-a

> On 26 Sep 2019, at 11:40, Kaxil Naik <kaxiln...@gmail.com> wrote:
> 
> I definitely agree. If we don't update it in 2.0 it is going to be hard to
> change that in any 2.x versions
> 
> On Thu, Sep 26, 2019 at 10:51 AM James Meickle
> <jmeic...@quantopian.com.invalid> wrote:
> 
>> I am *strongly* in favor of using the 2.0 update to break compat here,
>> because this is a very confusing feature to most new users of Airflow, but
>> also will break a _lot_ of DAGs. I feel like if we don't change this in 2.0
>> we probably won't for any 2.x either, which would be a shame.
>> 
>> On Wed, Sep 25, 2019 at 8:33 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> 
>>> I agree with Dan that we should change the default to execute at the
>>> start of the interval.
>>> 
>>> How about adding this for 2.0 ??
>>> 
>>> Don't want to keep delaying this if we have a consensus already.
>>> 
>>> Regards,
>>> Kaxil
>>> 
>>> 
>>> On Fri, Aug 23, 2019, 15:39 Dan Davydov <ddavy...@twitter.com.invalid>
>>> wrote:
>>> 
>>>> What are people's feelings on changing the default execution to the
>>>> schedule interval start and communicating this to existing users in the
>>>> Updating notes so that they can preserve the old behavior? It could
>>>> potentially cause headaches for users who don't read the notes, but I
>>>> think it might make sense to bite the bullet at some point for more
>>>> intuitive behavior overall for new users.
>>>> 
>>>> On Fri, Aug 23, 2019 at 10:29 AM Dan Davydov <ddavy...@twitter.com>
>>>> wrote:
>>>> 
>>>>> I am for this change, since I feel that in general the start of the
>>>>> interval is more intuitive (I have been working on Airflow for 3 years
>>>>> and this still trips me up). That being said, I'm not sure how I feel
>>>>> about allowing customization at the DAG level instead of the cluster
>>>>> level, as it makes it harder to make assumptions about DAGs on the
>>>>> cluster for ops, though maybe this isn't a huge deal given there are
>>>>> tools available that show you why tasks aren't running.
>>>>> 
>>>>> I agree with Bole that we should communicate recommended migration
>>>>> strategies if they can't be done automatically.
>>>>> 
>>>>> I don't think I'm a fan of arbitrary customization of the interval via
>>>>> a callback; my feeling is that this would not provide significant value
>>>>> and could be an ops nightmare.
>>>>> 
>>>>> On Fri, Aug 23, 2019 at 9:11 AM Jarek Potiuk <jarek.pot...@polidea.com>
>>>>> wrote:
>>>>> 
>>>>>> DST: I recall problems with DST, especially when the hour goes back and
>>>>>> the daily schedule time technically occurs twice on the same day or
>>>>>> does not occur at all. We have some code that arbitrarily chooses the
>>>>>> first occurrence in the latter case (there was a problem where it
>>>>>> worked differently on Python 3.6 vs 3.5 (!)). But the case when we move
>>>>>> forward is also an interesting one. I am not 100% sure it will work
>>>>>> correctly after changing the scheduling mechanisms, but it's rather
>>>>>> easy to test and there is no harm in adding it.
>>>>>> There is DST-specific logic implemented in our next/previous run
>>>>>> calculation and I imagine it could go wrong.
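
To make the two DST edge cases concrete, here is a small sketch that
classifies a wall-clock time as unique, ambiguous (occurs twice) or missing
(never occurs). It uses only the standard library (zoneinfo, Python 3.9+);
the helper name and the choice of Europe/Zurich as the CET/CEST zone are my
assumptions, and this is not the logic Airflow itself uses.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_wall_time(naive: datetime, tz: ZoneInfo) -> str:
    """Classify a naive local time as 'unique', 'ambiguous' or 'missing'."""
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    # Outside a transition both folds resolve to the same UTC offset.
    if first.utcoffset() == second.utcoffset():
        return "unique"
    # Inside a transition: a time that survives a round trip through UTC
    # really exists (and so occurs twice); otherwise it falls in the gap.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) == naive:
        return "ambiguous"
    return "missing"


if __name__ == "__main__":
    zurich = ZoneInfo("Europe/Zurich")
    # CEST -> CET: clocks go back, 02:30 happens twice.
    print(classify_wall_time(datetime(2019, 10, 27, 2, 30), zurich))
    # CET -> CEST: clocks go forward, 02:30 never happens.
    print(classify_wall_time(datetime(2019, 3, 31, 2, 30), zurich))
```

A daily schedule that lands on such a time needs an explicit rule for both
cases, whichever end of the interval the run is scheduled at.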
>>>>>> 
>>>>>> The tests I am talking about:
>>>>>> 
>>>>>> DagTest.test_following_previous_schedule_daily_dag_CEST_to_CET and
>>>>>> DagTest.test_following_previous_schedule_daily_dag_CET_to_CEST.
>>>>>> 
>>>>>> Re: arbitrary customisation/converting DAGs. I think there is no need
>>>>>> to convert existing DAGs - the default behaviour remains as it is, as
>>>>>> far as I understand. And this flag is much simpler to understand and
>>>>>> reason about than an arbitrary function, and it corresponds to real
>>>>>> business cases:
>>>>>> 
>>>>>> 1) schedule_at_interval_end = True -> wait for the data to be ready for
>>>>>>    the interval (current/default behaviour, related to processing
>>>>>>    batches of data)
>>>>>> 2) schedule_at_interval_end = False -> CRON-like behaviour where we
>>>>>>    simply run an arbitrary operation at regular intervals (more
>>>>>>    intuitive for people who are used to CRON-like jobs)
>>>>>> 
>>>>>> You can always build your schedule differently if you need something
>>>>>> "in-between" IMHO.
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, Aug 23, 2019 at 8:44 AM James Meickle
>>>>>> <jmeic...@quantopian.com.invalid> wrote:
>>>>>> 
>>>>>>> This is a change to one of Airflow's core concepts, and it would
>>>>>>> require a lot of work for existing DAGs to cut over to it. Given that,
>>>>>>> my personal preference would be to allow arbitrary customization
>>>>>>> rather than just a bit toggle, such as allowing passing in a mapping
>>>>>>> function: given an interval's start date and end date, when should it
>>>>>>> be executed?
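
For reference, a mapping function of that shape could look roughly like the
sketch below. Nothing here exists in Airflow or in the PR: the RunTimeFn
type and the helper names are hypothetical and only show what "given an
interval's start and end, when should it run?" might mean in code.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical signature: (interval_start, interval_end) -> time to run.
RunTimeFn = Callable[[datetime, datetime], datetime]


def at_midpoint(start: datetime, end: datetime) -> datetime:
    """Run halfway through the interval."""
    return start + (end - start) / 2


def delayed_end(delay: timedelta) -> RunTimeFn:
    """Build a function that runs a fixed delay after the interval ends,
    e.g. to give late-arriving data some time to land."""
    def run_time(start: datetime, end: datetime) -> datetime:
        return end + delay
    return run_time


if __name__ == "__main__":
    start = datetime(2019, 8, 16, 17, 0)
    end = datetime(2019, 8, 16, 18, 0)
    print(at_midpoint(start, end))
    print(delayed_end(timedelta(minutes=15))(start, end))
```

This is the flexibility being weighed against the simpler boolean flag: any
"in-between" rule is expressible, but ops can no longer assume one of two
known behaviours per DAG.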
>>>>>>> 
>>>>>>> On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk <jarek.pot...@polidea.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Happy for it as well. There are a number of cases where scheduling at
>>>>>>>> the start makes more sense, and as we see, Airflow is now used in
>>>>>>>> multiple cases where there is no need to process data from an
>>>>>>>> interval and wait until that data is ready.
>>>>>>>> But indeed some more tests would be great - especially for edge
>>>>>>>> cases. Changing mid-air is one, but I think there should be a test
>>>>>>>> for Daylight Saving Time changes.
>>>>>>>> There are some tests for DST, so they just need to be extended to
>>>>>>>> cover those two different cases.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> J.
>>>>>>>> 
>>>>>>>> On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Happy for this feature to be merged
>>>>>>>>> 
>>>>>>>>> On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <a...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> This has come up a few times before; someone has now opened a PR
>>>>>>>>>> that makes this a global+per-DAG setting:
>>>>>>>>>> https://github.com/apache/airflow/pull/5787 and it also includes
>>>>>>>>>> docs that I think do a good job of illustrating the two modes.
>>>>>>>>>> 
>>>>>>>>>> Does anyone object to this being merged? If no one says anything by
>>>>>>>>>> midday on Tuesday I will take that as assent and will merge it.
>>>>>>>>>> 
>>>>>>>>>> The docs from the PR are included below.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Ash
>>>>>>>>>> 
>>>>>>>>>> Scheduled Time vs Execution Time
>>>>>>>>>> ''''''''''''''''''''''''''''''''
>>>>>>>>>> 
>>>>>>>>>> A DAG with a ``schedule_interval`` will execute once per interval.
>>>>>>>>>> By default, the execution of a DAG will occur at the **end** of the
>>>>>>>>>> schedule interval.
>>>>>>>>>> 
>>>>>>>>>> A few examples:
>>>>>>>>>> 
>>>>>>>>>> - A DAG with ``schedule_interval='@hourly'``: The DAG run that
>>>>>>>>>>   processes 2019-08-16 17:00 will start running just after
>>>>>>>>>>   2019-08-16 17:59:59, i.e. once that hour is over.
>>>>>>>>>> - A DAG with ``schedule_interval='@daily'``: The DAG run that
>>>>>>>>>>   processes 2019-08-16 will start running shortly after
>>>>>>>>>>   2019-08-17 00:00.
>>>>>>>>>> 
>>>>>>>>>> The reasoning behind this execution vs scheduling behaviour is that
>>>>>>>>>> data for the interval to be processed won't be fully available
>>>>>>>>>> until the interval has elapsed.
>>>>>>>>>> 
>>>>>>>>>> In cases where you wish the DAG to be executed at the **start** of
>>>>>>>>>> the interval, specify ``schedule_at_interval_end=False``, either in
>>>>>>>>>> ``airflow.cfg``, or on a per-DAG basis.
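
The two modes described in these docs boil down to simple date arithmetic.
The sketch below is only an illustration of that arithmetic: the function
name is hypothetical, and it models a schedule as a fixed timedelta rather
than the cron expressions and presets Airflow actually accepts.

```python
from datetime import datetime, timedelta


def earliest_run_time(
    interval_start: datetime,
    interval: timedelta,
    schedule_at_interval_end: bool = True,
) -> datetime:
    """Earliest moment the run covering [interval_start, +interval) may start."""
    if schedule_at_interval_end:
        # Default mode: wait until the whole interval has elapsed.
        return interval_start + interval
    # Start-of-interval mode: run as soon as the interval begins.
    return interval_start


if __name__ == "__main__":
    hour = datetime(2019, 8, 16, 17, 0)
    print(earliest_run_time(hour, timedelta(hours=1)))         # 2019-08-16 18:00:00
    print(earliest_run_time(hour, timedelta(hours=1), False))  # 2019-08-16 17:00:00
    day = datetime(2019, 8, 16)
    print(earliest_run_time(day, timedelta(days=1)))           # 2019-08-17 00:00:00
```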
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Jarek Potiuk
>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software
>> Engineer
>>>>>>>> 
>>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> Jarek Potiuk
>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>> 
>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
