As an Airflow user, I would just like to add that I strongly agree with this.
The current behavior is intuitive but documented incorrectly.

IMO, cron notation is expected to work like cron on modern Linux/Unix, i.e. it
follows the DST rules of whatever the system or user timezone is. And for jobs
you wish to run every 24 hours, starting at a specific time or in a specific
timezone, timedelta(days=1) makes sense.
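To make the two behaviors concrete, here is a minimal Python sketch (standard library only, not Airflow code). It assumes a hypothetical, simplified US Pacific rule valid only for 2019 (PDT, UTC-7, from 2019-03-10 02:00 to 2019-11-03 02:00 local; PST, UTC-8, otherwise) in place of a real tz database. The helper names local_wall_clock, to_local, cron_style_runs and timedelta_style_runs are illustrative, not Airflow's API.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical, simplified US Pacific rule valid only for 2019 (an assumption,
# not a real tz database): PDT (UTC-7) from 2019-03-10 02:00 to 2019-11-03
# 02:00 local time, PST (UTC-8) otherwise.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")
DST_START_UTC = datetime(2019, 3, 10, 10, tzinfo=timezone.utc)  # 02:00 PST
DST_END_UTC = datetime(2019, 11, 3, 9, tzinfo=timezone.utc)     # 02:00 PDT


def local_wall_clock(day: date, hour: int) -> datetime:
    """Attach the simplified Pacific offset to a local wall-clock time."""
    naive = datetime(day.year, day.month, day.day, hour)
    in_dst = datetime(2019, 3, 10, 2) <= naive < datetime(2019, 11, 3, 2)
    return naive.replace(tzinfo=PDT if in_dst else PST)


def to_local(instant: datetime) -> datetime:
    """Convert an aware datetime to simplified Pacific local time."""
    in_dst = DST_START_UTC <= instant < DST_END_UTC
    return instant.astimezone(PDT if in_dst else PST)


def cron_style_runs(first_day: date, count: int, hour: int = 17) -> list:
    """Like '0 17 * * *': every run keeps the same local wall-clock hour."""
    return [local_wall_clock(first_day + timedelta(days=i), hour)
            for i in range(count)]


def timedelta_style_runs(first_run: datetime, count: int) -> list:
    """Like timedelta(days=1): every run is exactly 24 elapsed hours later."""
    instant = first_run.astimezone(timezone.utc)
    runs = []
    for _ in range(count):
        runs.append(to_local(instant))
        instant += timedelta(days=1)  # UTC has no DST, so this is exactly 24h
    return runs


first = local_wall_clock(date(2019, 3, 9), 17)
cron_runs = [r.strftime("%m-%d %H:%M %Z")
             for r in cron_style_runs(date(2019, 3, 9), 3)]
delta_runs = [r.strftime("%m-%d %H:%M %Z")
              for r in timedelta_style_runs(first, 3)]
print(cron_runs)   # ['03-09 17:00 PST', '03-10 17:00 PDT', '03-11 17:00 PDT']
print(delta_runs)  # ['03-09 17:00 PST', '03-10 18:00 PDT', '03-11 18:00 PDT']
```

Under this assumption the cron-style runs stay at 17:00 local across the March 10 change, while the 24-hour runs land at 18:00 local afterwards.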

Regards
Damian



-----Original Message-----
From: Bolke de Bruin [mailto:bdbr...@gmail.com] 
Sent: Tuesday, May 14, 2019 2:13 AM
To: dev@airflow.apache.org
Subject: Re: Cron schedule with DST-aware timezone

The idea is that with a timedelta you want to say "add 24h". Across a DST
change this shifts the actual point in local time (in local time 17.00 can
become 16.00 or 18.00).

Cron schedules specify points in local time, so 17.00h always stays 17.00h.

There is nothing to make configurable here. Otherwise the "add 24h" premise of
a timedelta would no longer hold, and you would be adding either "23h" or
"25h".
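The "23h or 25h" arithmetic can be checked with a small sketch that measures the real elapsed time between two consecutive 17:00 local runs. It assumes the same hypothetical, simplified 2019 US Pacific rule (not a tz database), and hours_between_local_runs is an illustrative helper, not part of Airflow or pendulum.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical, simplified US Pacific rule valid only for 2019 (an assumption,
# not a real tz database): PDT (UTC-7) from 2019-03-10 02:00 to 2019-11-03
# 02:00 local time, PST (UTC-8) otherwise.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def local_wall_clock(day: date, hour: int) -> datetime:
    """Attach the simplified Pacific offset to a local wall-clock time."""
    naive = datetime(day.year, day.month, day.day, hour)
    in_dst = datetime(2019, 3, 10, 2) <= naive < datetime(2019, 11, 3, 2)
    return naive.replace(tzinfo=PDT if in_dst else PST)


def hours_between_local_runs(day: date, hour: int = 17) -> float:
    """Elapsed hours between `hour` local time on `day` and on the next day."""
    this_run = local_wall_clock(day, hour)
    next_run = local_wall_clock(day + timedelta(days=1), hour)
    # Convert both to UTC so the subtraction measures real elapsed time.
    delta = next_run.astimezone(timezone.utc) - this_run.astimezone(timezone.utc)
    return delta / timedelta(hours=1)


spring = hours_between_local_runs(date(2019, 3, 9))
ordinary = hours_between_local_runs(date(2019, 6, 1))
autumn = hours_between_local_runs(date(2019, 11, 2))
print(spring, ordinary, autumn)  # 23.0 24.0 25.0
```

Keeping 17.00 fixed in local time therefore means the interval itself is 23h, 24h or 25h, which is why a "follow DST" option would contradict the "add 24h" meaning of a timedelta.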

As a side note: we need to upgrade pendulum; Python 3.6+ introduced changed
behavior.

B.

> On 14 May 2019, at 06:57, David Klosowski <dav...@thinknear.com> wrote:
> 
> The distinction could come from non-UTC vs. UTC. You can schedule a DAG
> at any time in UTC and it would follow 24-hour intervals, but if it is
> non-UTC it would follow DST changes where relevant (by timezone). You could
> technically make this behavior configurable and follow the former path as
> well (not sure of the use cases).
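The UTC vs. non-UTC distinction can be sketched by collecting every gap between consecutive daily 17:00 runs over 2019. Under the same hypothetical, simplified Pacific rule (an assumption for illustration only), a UTC wall clock only ever yields 24-hour gaps, so "daily at 17:00" and "every 24 hours" coincide there, while the Pacific wall clock yields 23, 24 and 25 hours. The helper names are illustrative.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical, simplified US Pacific rule valid only for 2019 (an assumption,
# not a real tz database): PDT (UTC-7) from 2019-03-10 02:00 to 2019-11-03
# 02:00 local time, PST (UTC-8) otherwise.
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def pacific_wall_clock(day: date, hour: int) -> datetime:
    naive = datetime(day.year, day.month, day.day, hour)
    in_dst = datetime(2019, 3, 10, 2) <= naive < datetime(2019, 11, 3, 2)
    return naive.replace(tzinfo=PDT if in_dst else PST)


def utc_wall_clock(day: date, hour: int) -> datetime:
    return datetime(day.year, day.month, day.day, hour, tzinfo=timezone.utc)


def distinct_gaps(wall_clock, year: int = 2019, hour: int = 17) -> set:
    """Elapsed hours between consecutive daily runs at `hour` on `wall_clock`."""
    gaps = set()
    day = date(year, 1, 1)
    while day.year == year:
        nxt = day + timedelta(days=1)
        # Compare in UTC so the gap is real elapsed time.
        delta = (wall_clock(nxt, hour).astimezone(timezone.utc)
                 - wall_clock(day, hour).astimezone(timezone.utc))
        gaps.add(delta / timedelta(hours=1))
        day = nxt
    return gaps


print(sorted(distinct_gaps(utc_wall_clock)))      # [24.0]
print(sorted(distinct_gaps(pacific_wall_clock)))  # [23.0, 24.0, 25.0]
```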
> 
> On Mon, May 13, 2019 at 9:17 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> 
>> Just to add to that: there are already tests that test this behaviour:
>> 
>> https://github.com/PolideaInternal/airflow/blob/master/tests/models/test_dag.py#L749
>> 
>> and indeed the schedule follows DST changes rather than ignoring them.
>> 
>> I think it is generally a good idea to follow DST (for clarity), but I
>> might be missing some context/cases. Maybe it is indeed better to always
>> have 24-hour intervals for daily schedules, for example, rather than
>> sometimes 23 or 25.
>> 
>> BTW, I fixed one of those tests that was recently failing on the Python 3.6
>> CI. Python 3.6 behaved a bit differently than 3.5 at the DST change:
>> https://issues.apache.org/jira/browse/AIRFLOW-4308
>> 
>> J.
>> 
>> 
>> 
>> On Mon, May 13, 2019 at 11:39 PM Maxime Beauchemin <
>> maximebeauche...@gmail.com> wrote:
>> 
>>> It would be great if people could provide failing unit tests as PRs, with
>>> clear expectations stated as code. That makes it easier for people to
>>> reach consensus on the expectations and for anyone to jump in and
>>> implement a fix.
>>> 
>>> Max
>>> 
>>> On Mon, May 13, 2019 at 12:48 PM David Klosowski <dav...@thinknear.com>
>>> wrote:
>>> 
>>>> Damian is correct. We've observed that exact behavior and noticed that the
>>>> timedelta logic is seemingly broken for DST but works for cron.
>>>> 
>>>> On Mon, May 13, 2019 at 12:38 PM Shaw, Damian P. <
>>>> damian.sha...@credit-suisse.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I’m not part of the Airflow team, but I came to the same conclusion: the
>>>>> behavior is the opposite of what the documentation specifies. When
>>>>> using cron notation, DST is properly honored; when using timedelta, it
>>>>> is not. I played around with the DAG.following_schedule method to
>>>>> satisfy myself that this was the case.
>>>>> 
>>>>> I’ve had a production instance of Airflow based on this that
>>>>> successfully respected the March/April DST changes in many timezones.
>>>>> 
>>>>> Regards
>>>>> Damian
>>>>> 
>>>>> From: Jiahao Chen [mailto:jhc...@google.com.INVALID]
>>>>> Sent: Monday, May 13, 2019 2:08 PM
>>>>> To: dev@airflow.apache.org
>>>>> Subject: Cron schedule with DST-aware timezone
>>>>> 
>>>>> Hi team,
>>>>> 
>>>>> I have a question about the expected behavior of the Airflow scheduler
>>>>> when the schedule_interval is a cron expression and the start_date is in
>>>>> a timezone with DST.
>>>>> 
>>>>> Based on the Airflow documentation
>>>>> https://airflow.apache.org/timezone.html#cron-schedules, the DST change
>>>>> will be ignored if schedule_interval is a cron expression (e.g.
>>>>> '0 17 * * *'). It gives an example in which the GMT offset does not
>>>>> change, regardless of how DST changes. If I'm understanding it
>>>>> correctly, that means that if I upload a DAG with a schedule_interval of
>>>>> "0 17 * * *" and a start_date of 2019-03-15 17:00 PST (GMT-8), which is
>>>>> before the DST change on March 10, the Airflow scheduler will always
>>>>> start the DAG at 5 pm every day GMT-8, even after the DST change on
>>>>> March 10.
>>>>> 
>>>>> However, that is not the behavior I've seen with my experimental code
>>>>> (see attachments). It looks like Airflow is actually taking DST into
>>>>> account, since the execution time is always 17:00 local time, which is
>>>>> 1 hour off in GMT after the DST change.
>>>>> 
>>>>> Could you please confirm the behavior of the Airflow scheduler in this
>>>>> use case?
>>>>> 
>>>>> Thank you!
>>>>> Jiahao
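The GMT arithmetic behind this observation can be shown in a small sketch. It assumes US Pacific time moved from PST (GMT-8) to PDT (GMT-7) on 2019-03-10 and uses 2019-03-09 and 2019-03-11 as illustrative dates on either side of the change; the helper name as_utc is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets on either side of the 2019-03-10 US DST change.
PST = timezone(timedelta(hours=-8), "PST")  # GMT-8
PDT = timezone(timedelta(hours=-7), "PDT")  # GMT-7


def as_utc(dt: datetime) -> str:
    """Format an aware datetime as a UTC (GMT) timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


# Observed behavior: the run stays at 17:00 local wall-clock time.
observed_before = datetime(2019, 3, 9, 17, tzinfo=PST)
observed_after = datetime(2019, 3, 11, 17, tzinfo=PDT)

# Documented reading: the run stays at 17:00 GMT-8 regardless of DST.
documented_after = datetime(2019, 3, 11, 17, tzinfo=PST)

print(as_utc(observed_before))  # 2019-03-10 01:00 UTC
print(as_utc(observed_after))   # 2019-03-12 00:00 UTC
print(documented_after.astimezone(PDT).strftime("%H:%M %Z"))  # 18:00 PDT
```

So the observed run moves from 01:00 to 00:00 UTC, the one-hour GMT shift described above, whereas the documented reading would put the run at 18:00 local time.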
>>>>> 
>>>>> ===============================================================================
>>>>> Please access the attached hyperlink for an important electronic
>>>>> communications disclaimer:
>>>>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
>>>>> ===============================================================================
>> 
>> 
>> --
>> 
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> 
>> M: +48 660 796 129
>> E: jarek.pot...@polidea.com
>> 

