What are people's feelings on changing the default so that DAGs execute at
the start of the schedule interval, and communicating this to existing users
in the Updating notes so that they can preserve the old behavior? It could
cause headaches for users who don't read the notes, but I think it might make
sense to bite the bullet at some point in exchange for more intuitive
behavior for new users.
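
To make the two modes concrete, here is a rough sketch in plain Python. It is
only an illustration, not Airflow's scheduler code: `run_time` is a made-up
helper, and `schedule_at_interval_end` is the flag name from the PR docs
quoted below.

```python
from datetime import datetime, timedelta


def run_time(interval_start: datetime, interval: timedelta,
             schedule_at_interval_end: bool = True) -> datetime:
    """Hypothetical helper: earliest moment the run for an interval may start."""
    if schedule_at_interval_end:
        # Current default: wait until the whole interval has elapsed,
        # so the data for that interval is complete.
        return interval_start + interval
    # Proposed alternative: cron-like, run as soon as the interval begins.
    return interval_start


start = datetime(2019, 8, 16)
daily = timedelta(days=1)
print(run_time(start, daily))         # 2019-08-17 00:00:00
print(run_time(start, daily, False))  # 2019-08-16 00:00:00
```

With the flag left at True, the run for 2019-08-16 fires on 2019-08-17, which
is the part that keeps tripping people up.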

On Fri, Aug 23, 2019 at 10:29 AM Dan Davydov <ddavy...@twitter.com> wrote:

> I am for this change, since I feel like in general the start of the
> interval is more intuitive (I have been working on Airflow for 3 years and
> this still trips me up). That being said I'm not sure how I feel about
> allowing customization at DAG level instead of cluster level as it makes it
> harder to make assumptions about DAGs on the cluster for ops, though maybe
> this isn't a huge deal given there are tools available that show you why
> tasks aren't running.
>
> I agree with Bole that we should communicate recommended migration
> strategies if they can't be done automatically.
>
> I don't think I'm a fan of arbitrary customization of the interval via a
> callback. My feeling is that this would not provide significant value and
> could be an ops nightmare.
>
> On Fri, Aug 23, 2019 at 9:11 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>> DST: I recall problems with DST, especially when the hour goes back and the
>> daily schedule time technically occurs twice on the same day or does not
>> occur at all. We have some code that arbitrarily chooses the first
>> occurrence in the latter case (there was a problem that it worked
>> differently in Python 3.6 vs 3.5 (!)). But the case when we move forward is
>> also an interesting one. I am not 100% sure it will work correctly after
>> changing the scheduling mechanisms, but it's rather easy to test and there
>> is no harm in adding it.
>> There is DST-specific logic implemented in our next/previous run
>> calculation and I imagine it could go wrong.
>>
>> The tests I am talking about:
>>
>> DagTest.test_following_previous_schedule_daily_dag_CEST_to_CET/DagTest.test_following_previous_schedule_daily_dag_CET_to_CEST.
>>
>> Re: arbitrary customisation/converting DAGs. I think there is no need to
>> convert existing DAGs - the default behaviour remains as it is, as far as I
>> understand. And this flag is much simpler to understand and reason about
>> than an arbitrary function, and it corresponds to real business cases:
>>
>> 1) schedule_at_interval_end = True -> wait for the data to be ready for
>> the interval (current/default behaviour related to processing batches of
>> data)
>> 2) schedule_at_interval_end = False -> CRON-like behaviour where we simply
>> run an arbitrary operation at regular intervals (more intuitive for people
>> who are used to CRON-like jobs)
>>
>> You can always build your schedule differently if you need something
>> "in-between" IMHO.
>>
>> J.
>>
>>
>>
>>
>> On Fri, Aug 23, 2019 at 8:44 AM James Meickle
>> <jmeic...@quantopian.com.invalid> wrote:
>>
>> > This is a change to one of Airflow's core concepts, and it would require
>> > a lot of work for existing DAGs to cut over to it. Given that, my personal
>> > preference would be to allow arbitrary customization rather than just a
>> > bit toggle, such as allowing passing in a mapping function: given an
>> > interval's start date and end date, when should it be executed?
>> >
>> > On Fri, Aug 23, 2019 at 8:24 AM Jarek Potiuk <jarek.pot...@polidea.com>
>> > wrote:
>> >
>> > > Happy for it as well. There are a number of cases where scheduling at
>> > > start makes more sense, and as we see, Airflow is now used in multiple
>> > > cases where there is no need to process data from an interval and wait
>> > > until that data is ready.
>> > > But indeed some more tests would be great, especially for edge cases.
>> > > Changing mid-air is one, but I think there should also be a test for
>> > > Daylight Saving Time changes.
>> > > There are some tests for DST, so they just need to be extended to cover
>> > > those two different cases.
>> > >
>> > >
>> > > J.
>> > >
>> > > On Fri, Aug 23, 2019 at 7:37 AM Kaxil Naik <kaxiln...@gmail.com>
>> wrote:
>> > >
>> > > > Happy for this feature to be merged
>> > > >
>> > > > On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor <a...@apache.org>
>> wrote:
>> > > >
>> > > > > This has come up a few times before. Someone has now opened a PR
>> > > > > that makes this a global+per-DAG setting:
>> > > > > https://github.com/apache/airflow/pull/5787 and it also includes
>> > > > > docs that I think do a good job of illustrating the two modes.
>> > > > >
>> > > > > Does anyone object to this being merged? If no one says anything
>> > > > > by midday on Tuesday I will take that as assent and will merge it.
>> > > > >
>> > > > > The docs from the PR are included below.
>> > > > >
>> > > > > Thanks,
>> > > > > Ash
>> > > > >
>> > > > > Scheduled Time vs Execution Time
>> > > > > ''''''''''''''''''''''''''''''''
>> > > > >
>> > > > > A DAG with a ``schedule_interval`` will execute once per
>> > > > > interval. By default, the execution of a DAG will occur at the
>> > > > > **end** of the schedule interval.
>> > > > >
>> > > > > A few examples:
>> > > > >
>> > > > > - A DAG with ``schedule_interval='@hourly'``: The DAG run that
>> > > > > processes 2019-08-16 17:00 will start running just after
>> > > > > 2019-08-16 17:59:59, i.e. once that hour is over.
>> > > > > - A DAG with ``schedule_interval='@daily'``: The DAG run that
>> > > > > processes 2019-08-16 will start running shortly after 2019-08-17
>> > > > > 00:00.
>> > > > >
>> > > > > The reasoning behind this execution vs scheduling behaviour is
>> > > > > that data for the interval to be processed won't be fully
>> > > > > available until the interval has elapsed.
>> > > > >
>> > > > > In cases where you wish the DAG to be executed at the **start** of
>> > > > > the interval, specify ``schedule_at_interval_end=False``, either
>> > > > > in ``airflow.cfg``, or on a per-DAG basis.
>> > > >
>> > >
>> > >
>> > > --
>> > >
>> > > Jarek Potiuk
>> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > >
>> > > M: +48 660 796 129 <+48660796129>
>> > >
>> >
>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129 <+48660796129>
>>
>
