This is all just to say: I think this one specific use case is already achievable.
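The thread below describes two different readings of `timedelta` scheduling. This minimal pure-Python sketch makes them concrete without calling Airflow. `schedule_from_previous_schedule` and `schedule_from_previous_end` are hypothetical helpers. The first models Ash's "next schedule = previous schedule + interval", and the second models Elad's "14 days after the previous run's end_date". The start time reuses Elad's `datetime(2021, 1, 19, 20, 5)` example, and the 2-day run duration is his illustrative figure.

```python
from datetime import datetime, timedelta


def schedule_from_previous_schedule(start, interval, count):
    """Model A (hypothetical): each run is due one interval after the previous *scheduled* time."""
    return [start + i * interval for i in range(count)]


def schedule_from_previous_end(start, interval, run_duration, count):
    """Model B (hypothetical): each run is due one interval after the previous run *finished*."""
    times = [start]
    for _ in range(count - 1):
        # The previous run ends run_duration after it started; wait a full interval after that.
        times.append(times[-1] + run_duration + interval)
    return times


start = datetime(2021, 1, 19, 20, 5)   # Elad's example start_date (a Tuesday)
interval = timedelta(days=14)          # schedule_interval=timedelta(days=14)
run_duration = timedelta(days=2)       # illustrative: the DAG takes 2 days to finish

for a, b in zip(schedule_from_previous_schedule(start, interval, 3),
                schedule_from_previous_end(start, interval, run_duration, 3)):
    print(f"model A: {a:%a %Y-%m-%d %H:%M}   model B: {b:%a %Y-%m-%d %H:%M}")
```

Under model A, the 2-day run does not move later runs: every start lands on a Tuesday at 20:05, exactly 14 days apart. Under model B, the start day slides from Tuesday to Thursday to Saturday. Ash notes he still has to test which of the two Airflow actually does.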
On 20 January 2021 19:36:54 GMT, Ash Berlin-Taylor <a...@apache.org> wrote:
> I'll have to test, but my gut says that isn't how it works; instead, next schedule = previous schedule + interval.
>
> On 20 January 2021 19:25:50 GMT, Elad Kalif <elad...@gmail.com> wrote:
>> Sorry, ignore my explanation. It's wrong.
>>
>> https://github.com/apache/airflow/issues/8649 explains why `schedule_interval=timedelta(days=14), start_date=...` is not exactly the same as bi-weekly at an exact specified time.
>> It also provides an example for daily.
>>
>> On Wed, Jan 20, 2021 at 9:19 PM Elad Kalif <elad...@gmail.com> wrote:
>>
>>> > `schedule_interval=timedelta(days=14), start_date=...`.
>>>
>>> Ash, from what I've seen it doesn't give exactly the same functionality. This is what I observed:
>>> cron = schedules exactly, regardless of what happened with the previous run (mostly)
>>> timedelta() = schedules after the delta has passed. However, it counts from end_date, not from start_date.
>>> So schedule_interval=timedelta(days=14) means it will be scheduled 14 days after the end_date of the previous run. If, for example, the DAG takes 2 days to complete, then the next one will be scheduled 14 days after end_date (= 16 days after start_date).
>>>
>>> This exact case is explained really well, using a daily example, in the first issue listed: https://github.com/apache/airflow/issues/8649
>>>
>>> On Wed, Jan 20, 2021 at 9:17 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>
>>>> `schedule_interval=timedelta(days=14), start_date=...`.
>>>>
>>>> That won't support every second Thursday, for example.
>>>>
>>>> On Wed, Jan 20, 2021 at 6:54 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>
>>>>> Running exactly every two weeks can be done by setting `schedule_interval=timedelta(days=14), start_date=...`.
>>>>>
>>>>> Does this do what you need, Elad?
>>>>>
>>>>> On 20 January 2021 18:12:36 GMT, Elad Kalif <elad...@gmail.com> wrote:
>>>>>>
>>>>>> >> In the example of a twice-a-month DAG (not sure if you have this use case too?) what do you expect the "data interval" (i.e. execution_date) to be?
>>>>>> Yes, we have this use case too. The execution date does matter because I want it to be bi-weekly starting from a specific day and time, so with the current implementation I would expect to provide start_date=datetime(2021,1,19,20,5) & schedule_interval='2 weeks'.
>>>>>>
>>>>>> Currently Airflow has 'hourly', 'daily', 'weekly', which don't allow us to set it.
>>>>>> So a possible solution for this specific use case could be defining:
>>>>>> repeat_every - integer that represents the frequency (1, 2, 3, ... n)
>>>>>> unit - str that provides the "gaps" (minutes, hours, days, weeks, months, years)
>>>>>> Example: bi-weekly / twice a month can be: repeat_every = 2, unit = 'weeks'
>>>>>> To get the 'hourly', 'daily', 'weekly' functionality, you just need to set repeat_every=1.
>>>>>>
>>>>>> By the way, this is exactly what Google Calendar lets you set if you click on custom scheduling for a meeting.
>>>>>>
>>>>>> I'm still in favor of the Python function approach, as it should cover all cases and give users full control.
>>>>>>
>>>>>> On Wed, Jan 20, 2021 at 7:20 PM Deng Xiaodong <xd.den...@gmail.com> wrote:
>>>>>>
>>>>>>> A quick thought (*maybe not making sense*): if *schedule_interval* accepts a list of values, we may support much higher complexity.
>>>>>>>
>>>>>>> For example, I may want to schedule my jobs every day at 04:05 AND 02:31, which cannot be expressed by a single cron pattern. Then I may want to have *schedule_interval = ["5 4 * * *", "31 2 * * *"]*.
>>>>>>>
>>>>>>> Maybe I missed something or the idea doesn't make sense. Please let me know.
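XD's list idea can be read as "the next run is the earliest upcoming fire time of any pattern in the list". This sketch shows that merge under a deliberately narrow assumption: it only understands daily patterns of the form `"M H * * *"` on naive datetimes. `next_daily_fire` and `next_fire` are hypothetical helpers, not anything in Airflow. A real version would need a full cron parser.

```python
from datetime import datetime, timedelta


def next_daily_fire(cron, after):
    """Next fire time strictly after `after` for a daily pattern 'M H * * *'.

    Only this one pattern shape is handled; anything else raises ValueError.
    """
    fields = cron.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError(f"only 'M H * * *' patterns are supported: {cron!r}")
    minute, hour = int(fields[0]), int(fields[1])
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= after:
        # Today's slot has already passed (or is now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def next_fire(crons, after):
    """Earliest next fire time across every pattern in the list."""
    return min(next_daily_fire(c, after) for c in crons)


schedule_interval = ["5 4 * * *", "31 2 * * *"]  # XD's example list
t = datetime(2021, 1, 20, 0, 0)
for _ in range(4):
    t = next_fire(schedule_interval, t)
    print(t)
```

Starting from midnight on 20 January 2021, the loop yields 02:31 and 04:05 on the 20th, then the same two times on the 21st. This leaves open Ash's question of what the "data interval" of each run should be, because the gaps between runs are no longer equal.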
>>>>>>>
>>>>>>> XD
>>>>>>>
>>>>>>> On Wed, Jan 20, 2021 at 6:09 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>>
>>>>>>>> Yes, we quite possibly could do this -- I'm trying to work out what the needs are here.
>>>>>>>>
>>>>>>>> In the example of a twice-a-month DAG (not sure if you have this use case too?) what do you expect the "data interval" (i.e. execution_date) to be?
>>>>>>>>
>>>>>>>> Or for this case does it not matter?
>>>>>>>>
>>>>>>>> -ash
>>>>>>>>
>>>>>>>> On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Another case that is mentioned in one of the issues is the ability to schedule a bi-weekly job (the equivalent of a bi-weekly meeting that you can set in a calendar), which is very much needed.
>>>>>>>>
>>>>>>>> Maybe this is unrealistic, but I think the game changer would be letting users define their own logic, which Airflow would then use to schedule DAGs.
>>>>>>>> My thought here is: if I can define the logic in a Python function (regardless of what this logic is), can't Airflow utilize it?
>>>>>>>>
>>>>>>>> On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I'd like to (re)start the discussion about a new feature I'd like to add for Airflow 2.1, which I am loosely calling "improving schedule_interval" (catchy name, I know!)
>>>>>>>>>
>>>>>>>>> I have two main high-level goals in mind here:
>>>>>>>>>
>>>>>>>>> 1. To reduce the confusion around execution_date (specifically the naming of the parameter!) - the whole start vs end discussion.
>>>>>>>>> 2. To support more complex schedules.
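Elad suggested letting users supply the scheduling logic as a Python function. This is a rough sketch of what that could look like, using the "2nd & 4th Tuesday" case from issue #10123 as the rule. Nothing in this thread defines how Airflow would call such a function. The contract shown (take the previous run time, return the next one) and the names `nth_weekdays_of_month` and `second_and_fourth_tuesday` are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def nth_weekdays_of_month(year, month, weekday, ns):
    """Midnight datetimes of the n-th `weekday` (Mon=0 ... Sun=6) in a month, for each n in `ns`."""
    first = datetime(year, month, 1)
    first_match = first + timedelta(days=(weekday - first.weekday()) % 7)
    # The first match is on day 1-7, so the 4th occurrence is at most day 28:
    # any n <= 4 always stays inside the month.
    return [first_match + timedelta(weeks=n - 1) for n in ns]


def second_and_fourth_tuesday(after, hour=0, minute=0):
    """Hypothetical user-defined rule: the next 2nd or 4th Tuesday strictly after `after`."""
    year, month = after.year, after.month
    while True:
        for day in nth_weekdays_of_month(year, month, weekday=1, ns=(2, 4)):
            candidate = day.replace(hour=hour, minute=minute)
            if candidate > after:
                return candidate
        # Nothing left in this month: continue with the next one.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


run = datetime(2021, 1, 20)
for _ in range(3):
    run = second_and_fourth_tuesday(run)
    print(run.date())
```

Starting from 20 January 2021, the three calls return 26 January, 9 February and 23 February 2021. Because the rule is ordinary Python, an odd schedule like this needs no special syntax. The open questions are mainly how Airflow would discover and call the function, and what data interval each run gets.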
>>>>>>>>>
>>>>>>>>> Previous thread on point 1 here: https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E (but I'm taking a bit of a step back from that to think about whether there's a bigger change we could make that encompasses this)
>>>>>>>>>
>>>>>>>>> I don't yet have a concrete plan or implementation in mind, but I'd like to start collecting people's "wish list" when it comes to scheduling DAGs:
>>>>>>>>>
>>>>>>>>> - What do you wish you could express natively in terms of scheduling your DAGs? (I.e. without using "hacks" such as date sensors/skip tasks at start)
>>>>>>>>> - What schedules do you wish you could express now, but just can't?
>>>>>>>>> - Do you have workflows that give a good example of where you want to schedule at start? Follow-up question: do you also want this to be different for different DAGs in your Airflow install?
>>>>>>>>>
>>>>>>>>> Existing issues:
>>>>>>>>> https://github.com/apache/airflow/issues/8649 "Add support for more than 1 cron exp per DAG"
>>>>>>>>> https://github.com/apache/airflow/issues/10194 "Ability to better support odd scheduling time"
>>>>>>>>> https://github.com/apache/airflow/issues/10449 "Dynamic Schedule Intervals"
>>>>>>>>> https://github.com/apache/airflow/issues/10123 "Job Schedule Interval on 2nd & 4th Tuesday"
>>>>>>>>>
>>>>>>>>> I'll start:
>>>>>>>>>
>>>>>>>>> Case1:
>>>>>>>>>
>>>>>>>>> One example that came up recently in Slack was an actual astronomer wanting a DAG to run with a schedule of "@sunset"! This also brings up the subject of "running DAGs at interval start or end".
>>>>>>>>>
>>>>>>>>> Case2:
>>>>>>>>>
>>>>>>>>> I'd like to be able to run a daily process at the end of each weekday, i.e.
>>>>>>>>> to process data for Monday..Friday. The naive way of expressing this would be "0 0 * * MON-FRI", but that means the DAGs would run Tuesday, Wednesday, Thursday, Friday, Monday -- meaning Friday's data isn't processed until Monday!
>>>>>>>>>
>>>>>>>>> My thought on this is that we need to separate the schedule interval (when to run a task) from the period duration (i.e. look at one day's worth of data).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ash
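Ash's Case2 becomes concrete once you list, for each run, the data period it covers and the moment it fires. This sketch is a simplified model, not Airflow code. In the "naive" list, a run covers the period from one `0 0 * * MON-FRI` tick to the next and fires at the end of it, which is how Ash describes the current behaviour. The "decoupled" list follows his proposal to separate when to run (midnight, Tuesday to Saturday) from the period duration (always one day). `weekday_ticks` is a hypothetical helper.

```python
from datetime import datetime, timedelta

MON_FRI = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday=0 ... Sunday=6
TUE_SAT = {1, 2, 3, 4, 5}


def weekday_ticks(start, days, allowed):
    """Midnights in [start, start + days) whose weekday is in `allowed`, like cron '0 0 * * <days>'."""
    ticks = []
    for i in range(days):
        tick = start + timedelta(days=i)
        if tick.weekday() in allowed:
            ticks.append(tick)
    return ticks


start = datetime(2021, 1, 18)  # a Monday

# Naive: the period runs from one MON-FRI tick to the next, and the run fires at its end.
ticks = weekday_ticks(start, 14, MON_FRI)
naive = list(zip(ticks, ticks[1:]))  # (period_start, fires_at)

# Decoupled: fire at midnight Tue-Sat, each run covering exactly the previous day.
period = timedelta(days=1)
decoupled = [(fires_at - period, fires_at) for fires_at in weekday_ticks(start, 14, TUE_SAT)]

for label, runs in (("naive", naive), ("decoupled", decoupled)):
    for period_start, fires_at in runs[:5]:
        print(f"{label:9} data from {period_start:%a %d} processed at {fires_at:%a %d}")
```

In the naive list, Friday 22 January's data waits until Monday the 25th, and its period spans the whole weekend. In the decoupled list, the same data is processed at midnight on Saturday and every period is exactly one day.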