As an Airflow user, I would just like to add that I strongly agree with this. The current behavior is intuitive but documented incorrectly.
IMO, cron notation is expected to work like cron on modern Linux/Unix, i.e. it follows the DST rules of whatever the system or user timezone is. For jobs you wish to run every 24 hours, starting at a specific time or in a specific timezone, timedelta(days=1) makes sense.

Regards
Damian

-----Original Message-----
From: Bolke de Bruin [mailto:bdbr...@gmail.com]
Sent: Tuesday, May 14, 2019 2:13 AM
To: dev@airflow.apache.org
Subject: Re: Cron schedule with DST-aware timezone

The idea is obviously that with a timedelta you want to say "add 24h". With DST, this shifts the actual point in local time (in local time, 17.00 can become 16.00/18.00). Cron schedules specify points in local time, so 17.00h always stays 17.00h.

There is nothing to make configurable here. Otherwise condition #1 wouldn't be true anymore and you would be adding either "23h" or "25h".

As a side note: we need to upgrade pendulum; Python >3.6 introduced changed behavior.

B.

Sent from my iPhone

> On 14 May 2019, at 06:57, David Klosowski <dav...@thinknear.com> wrote:
>
> The distinction could come from being non-UTC vs. UTC. You can schedule a DAG at any time UTC and it would follow the 24-hour intervals, but if it is non-UTC it would follow DST changes when relevant (by timezone). You could technically make this behavior configurable and follow the former path as well (not sure of the use cases).
>
> On Mon, May 13, 2019 at 9:17 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
>> Just to add to that: there are already tests that test this behaviour:
>>
>> https://github.com/PolideaInternal/airflow/blob/master/tests/models/test_dag.py#L749
>> and indeed, the schedule follows DST changes rather than discarding the DST time.
>>
>> I think it is generally a good idea to follow DST (for clarity), but I might miss some context/cases. Maybe it is indeed better to always have 24-hour intervals for daily schedules, for example, rather than sometimes 23/25.
>>
>> BTW.
>> I fixed one of those tests that was failing recently on Python 3.6 CI. Python 3.6 behaved a bit differently than 3.5 at the DST change:
>> https://issues.apache.org/jira/browse/AIRFLOW-4308
>>
>> J.
>>
>> On Mon, May 13, 2019 at 11:39 PM Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>
>>> It would be great if people could provide failing unit tests as a PR, with clear expectations stated as code. It makes it easier for people to reach consensus on expectations and for anyone to jump in and implement a fix.
>>>
>>> Max
>>>
>>> On Mon, May 13, 2019 at 12:48 PM David Klosowski <dav...@thinknear.com> wrote:
>>>
>>>> Damian is correct. We've observed that exact behavior and noticed the timedelta logic is dubiously broken for DST but works for cron.
>>>>
>>>> On Mon, May 13, 2019 at 12:38 PM Shaw, Damian P. <damian.sha...@credit-suisse.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I’m not part of the Airflow team, but I came to the same conclusion: the behavior is the opposite of what the documentation specifies. When using the cron notation, DST is properly honored; when using timedelta, it is not. I played around with the DAG.following_schedule method to satisfy myself that this was the case.
>>>>>
>>>>> I’ve had a production instance of Airflow based on this that successfully respected the March/April DST changes in many timezones.
>>>>>
>>>>> Regards
>>>>> Damian
>>>>>
>>>>> From: Jiahao Chen [mailto:jhc...@google.com.INVALID]
>>>>> Sent: Monday, May 13, 2019 2:08 PM
>>>>> To: dev@airflow.apache.org
>>>>> Subject: Cron schedule with DST-aware timezone
>>>>>
>>>>> Hi team,
>>>>>
>>>>> I have a question about the expected behavior of the Airflow scheduler when the schedule_interval is a cron expression and the start_date is in a timezone with DST.
>>>>>
>>>>> Based on the Airflow documentation (https://airflow.apache.org/timezone.html#cron-schedules), the DST change will be ignored if schedule_interval is a cron expression (e.g. '0 17 * * *'). It also gives an example in which the GMT offset does not change, regardless of DST. If I'm understanding it correctly, that means if I upload a DAG with a schedule_interval of "0 17 * * *" and a start_date of 2019-03-15 17:00 PST (GMT-8), which is before the DST change on March 10, the Airflow scheduler will always start the DAG at 5 pm GMT-8 every day, even after the DST change on March 10.
>>>>>
>>>>> However, that is not the behavior I've seen with my experimental code (see attachments). It looks like Airflow is actually taking DST into account, since the execution time is always 17:00 locally, which is 1 hour off in GMT terms after the DST change.
>>>>>
>>>>> Could you please confirm the behavior of the Airflow scheduler in this use case?
>>>>>
>>>>> Thank you!
>>>>> Jiahao
>>>>>
>>>>> ===============================================================================
>>>>>
>>>>> Please access the attached hyperlink for an important electronic communications disclaimer:
>>>>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
>>>>>
>>>>> ===============================================================================
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129
>> E: jarek.pot...@polidea.com
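
To make the distinction in this thread concrete, here is a minimal sketch in plain Python of the two semantics Bolke describes: a timedelta schedule adds a fixed amount of elapsed time (so the local wall-clock time shifts across a DST change), while a cron schedule pins a local wall-clock time (so the UTC offset shifts instead). It is not Airflow's scheduler code. The helpers next_run_timedelta and next_run_daily_cron and the SimplifiedUSPacific class are hypothetical, and that class hard-codes the post-2007 US DST rules instead of using a tz database such as pendulum.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)


def _first_sunday_on_or_after(dt: datetime) -> datetime:
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    return dt + timedelta(days=(6 - dt.weekday()) % 7)


class SimplifiedUSPacific(tzinfo):
    """Hypothetical, simplified US Pacific zone (post-2007 US DST rules).

    Illustrative only: real code should use a tz database (e.g. pendulum or
    zoneinfo). The ambiguous hour at the end of DST is not handled.
    """

    STANDARD = timedelta(hours=-8)

    def _dst_range(self, year: int):
        # DST starts at 2:00 standard time on the second Sunday of March...
        start = _first_sunday_on_or_after(datetime(year, 3, 8, 2))
        # ...and ends at 2:00 daylight time (1:00 standard) on the first Sunday of November.
        end = _first_sunday_on_or_after(datetime(year, 11, 1, 1))
        return start, end

    def dst(self, dt):
        if dt is None:
            return timedelta(0)
        start, end = self._dst_range(dt.year)
        return HOUR if start <= dt.replace(tzinfo=None) < end else timedelta(0)

    def utcoffset(self, dt):
        return self.STANDARD + self.dst(dt)

    def tzname(self, dt):
        return "PDT" if self.dst(dt) else "PST"


PACIFIC = SimplifiedUSPacific()


def next_run_timedelta(prev: datetime, interval: timedelta) -> datetime:
    """Fixed elapsed time: add `interval` in UTC, then convert back to local time."""
    return (prev.astimezone(timezone.utc) + interval).astimezone(prev.tzinfo)


def next_run_daily_cron(prev: datetime, hour: int, minute: int = 0) -> datetime:
    """Next occurrence of a fixed local wall-clock time, like cron 'M H * * *'."""
    candidate = prev.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= prev:
        # Adding a timedelta to an aware datetime keeps its tzinfo and wall-clock
        # time; the UTC offset is recomputed, so 17:00 stays 17:00 across DST.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    start = datetime(2019, 3, 9, 17, 0, tzinfo=PACIFIC)  # 17:00 PST, the day before DST starts
    print(next_run_timedelta(start, timedelta(days=1)).isoformat())  # 2019-03-10T18:00:00-07:00
    print(next_run_daily_cron(start, 17).isoformat())  # 2019-03-10T17:00:00-07:00
```

Starting from 17:00 PST on 2019-03-09, the timedelta variant lands at 18:00 PDT after exactly 24 hours, while the cron variant stays at 17:00 PDT after only 23 hours. Around the November change the same comparison gives 16:00 versus 17:00, which is the "23h" or "25h" Bolke mentions.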