This is all just to say: I think this one specific use case is already achievable.

On 20 January 2021 19:36:54 GMT, Ash Berlin-Taylor <a...@apache.org> wrote:
>I'll have to test, but my gut says that isn't how it works; instead, next
>schedule = previous schedule + interval.
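A minimal sketch of the "next schedule = previous schedule + interval" model in plain Python. `scheduled_runs` is a hypothetical helper, not Airflow's scheduler code, and the start_date is the one from Elad's example further down the thread. Run duration never enters the calculation, so every run lands on the same weekday and time:

```python
from datetime import datetime, timedelta

def scheduled_runs(start_date, interval, count):
    """Return the first `count` schedule points, each `interval` after the last.

    Hypothetical model of "next schedule = previous schedule + interval":
    how long each run takes is irrelevant here.
    """
    runs = [start_date]
    for _ in range(count - 1):
        runs.append(runs[-1] + interval)
    return runs

for run in scheduled_runs(datetime(2021, 1, 19, 20, 5), timedelta(days=14), 4):
    print(run.strftime("%a %Y-%m-%d %H:%M"))
# prints Tue 2021-01-19 20:05, Tue 2021-02-02 20:05,
#        Tue 2021-02-16 20:05, Tue 2021-03-02 20:05
```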
>
>On 20 January 2021 19:25:50 GMT, Elad Kalif <elad...@gmail.com> wrote:
>>Sorry, ignore my explanation. It's wrong.
>>
>> https://github.com/apache/airflow/issues/8649 explains why
>>`schedule_interval=timedelta(days=14), start_date=...` is not exactly
>>the same as bi-weekly at an exact specified time.
>>It also provides an example for daily.
>>
>>On Wed, Jan 20, 2021 at 9:19 PM Elad Kalif <elad...@gmail.com> wrote:
>>
>>> > `schedule_interval=timedelta(days=14), start_date=...`.
>>>
>>> Ash, from what I've seen it doesn't give exactly the same functionality.
>>> This is what I observed:
>>> cron = schedule exactly on time, regardless of what happened with the
>>> previous run (mostly).
>>> timedelta() = schedule after the delta has passed. However, it counts
>>> from end_date, not from start_date.
>>> So schedule_interval=timedelta(days=14) means it will be scheduled 14 days
>>> after the end_date of the previous run. If, for example, the DAG takes
>>> 2 days to complete, then the next one will be scheduled 14 days after
>>> end_date (= 16 days after start_date).
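The arithmetic above as a small sketch contrasting the two models under discussion: a fixed cadence versus "14 days after the previous run ends". Both functions are hypothetical illustrations with an assumed 2-day run time; which one matches Airflow's actual behaviour is exactly what Ash's reply disputes.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=14)
RUN_DURATION = timedelta(days=2)  # assumed run time, taken from the example above

def fixed_cadence(start, n):
    # Run k starts at start + k * INTERVAL; run duration is irrelevant.
    return [start + k * INTERVAL for k in range(n)]

def after_previous_end(start, n):
    # Run k starts INTERVAL after run k-1 *ends* (the model described above).
    starts = [start]
    for _ in range(n - 1):
        starts.append(starts[-1] + RUN_DURATION + INTERVAL)
    return starts

start = datetime(2021, 1, 4)
a = fixed_cadence(start, 3)
b = after_previous_end(start, 3)
print([(y - x).days for x, y in zip(a, a[1:])])  # [14, 14]
print([(y - x).days for x, y in zip(b, b[1:])])  # [16, 16]
```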
>>>
>>> This exact case is actually explained really well, using a daily
>>> example, in the first issue listed: https://github.com/apache/airflow/issues/8649
>>>
>>> On Wed, Jan 20, 2021 at 9:17 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>
>>>>
>>>> `schedule_interval=timedelta(days=14), start_date=...`.
>>>>
>>>>
>>>> That won't support "every second Thursday", for example.
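Assuming "every second Thursday" means the second Thursday of each month (on an alternate-Thursdays reading, a 14-day timedelta would be enough), a quick check shows why no fixed timedelta can express it: the gap between consecutive dates varies. `second_thursday` is a hypothetical helper.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday(): Monday == 0

def second_thursday(year, month):
    first = date(year, month, 1)
    # Days from the 1st to the first Thursday, plus one more week.
    offset = (THURSDAY - first.weekday()) % 7
    return first + timedelta(days=offset + 7)

dates = [second_thursday(2021, m) for m in range(1, 7)]
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
print(gaps)  # [28, 28, 28, 35, 28]: no single timedelta fits
```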
>>>>
>>>> On Wed, Jan 20, 2021 at 6:54 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>
>>>>> Running exactly every two weeks can be done by setting
>>>>> `schedule_interval=timedelta(days=14), start_date=...`.
>>>>>
>>>>> Does this do what you need, Elad?
>>>>>
>>>>> On 20 January 2021 18:12:36 GMT, Elad Kalif <elad...@gmail.com> wrote:
>>>>>>
>>>>>> >> In the example of a twice-a-month dag (not sure if you have this
>>>>>> >> use case too?) what do you expect the "data interval" (i.e.
>>>>>> >> execution_date) to be?
>>>>>> Yes, we have this use case too. The execution date does matter because
>>>>>> I want it to be bi-weekly, starting from a specific day and time,
>>>>>> so with the current implementation I would expect to provide
>>>>>> start_date=datetime(2021,1,19,20,5) & schedule_interval='2 weeks'
>>>>>>
>>>>>> Currently Airflow has 'hourly', 'daily', 'weekly', which don't let us
>>>>>> express it.
>>>>>> So a possible solution for this specific use case could be defining:
>>>>>> repeat_every - an integer that represents the frequency (1, 2, 3, ... n)
>>>>>> unit - a str that provides the "gaps" (minutes, hours, days, weeks,
>>>>>> months, years)
>>>>>> Example: bi-weekly (every two weeks) can be: repeat_every = 2,
>>>>>> unit = 'weeks'
>>>>>> To get the 'hourly', 'daily', 'weekly' functionality you just need to
>>>>>> set repeat_every=1.
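A rough sketch of how the proposed repeat_every/unit pair could be evaluated. `next_runs` and its day-clamping behaviour are assumptions for illustration, not an existing Airflow API. Fixed-length units map directly onto timedelta; months and years need calendar arithmetic because their length varies.

```python
import calendar
from datetime import datetime, timedelta

# Hypothetical evaluator for the proposed repeat_every/unit settings.
_FIXED_UNITS = {"minutes", "hours", "days", "weeks"}

def _add_months(dt, months):
    # Move `months` calendar months forward, clamping the day to the
    # target month's length (e.g. 31 Jan + 1 month -> 28 Feb in 2021).
    total = dt.month - 1 + months
    year, month = dt.year + total // 12, total % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)

def next_runs(start, repeat_every, unit, count):
    if repeat_every < 1:
        raise ValueError("repeat_every must be >= 1")
    runs = []
    for k in range(count):
        n = k * repeat_every  # always offset from start to avoid drift
        if unit in _FIXED_UNITS:
            runs.append(start + timedelta(**{unit: n}))
        elif unit == "months":
            runs.append(_add_months(start, n))
        elif unit == "years":
            runs.append(_add_months(start, 12 * n))
        else:
            raise ValueError(f"unknown unit: {unit!r}")
    return runs

print(next_runs(datetime(2021, 1, 19, 20, 5), 2, "weeks", 3))
print(next_runs(datetime(2021, 1, 31), 1, "months", 3))
```

Computing each run from start_date rather than from the previous run avoids drift: chaining "+1 month" from 31 Jan would give 28 Feb and then 28 Mar instead of 31 Mar.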
>>>>>>
>>>>>> By the way, this is exactly what Google Calendar lets you set if you
>>>>>> click on custom scheduling for a meeting.
>>>>>>
>>>>>> I'm still in favor of the Python function approach, as it should cover
>>>>>> all cases and give users full control.
>>>>>>
>>>>>> On Wed, Jan 20, 2021 at 7:20 PM Deng Xiaodong <xd.den...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> A quick thought (*maybe not making sense*): if *schedule_interval*
>>>>>>> accepts a list of values, we could support much higher complexity.
>>>>>>>
>>>>>>> For example, I may want to schedule my jobs every day at 04:05 AND
>>>>>>> 02:31, which cannot be expressed by a single cron pattern. Then I may
>>>>>>> want to have *schedule_interval = ["5 4 * * *", "31 2 * * *"]*.
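One way a list-valued schedule_interval could be interpreted: the next run is the earliest upcoming time across all expressions. This sketch only parses the "M H * * *" form and ignores time zones; `parse_daily_cron` and `next_fire` are hypothetical, and a real implementation would use a full cron parser.

```python
from datetime import datetime, timedelta

def parse_daily_cron(expr):
    # Minimal parser for the "M H * * *" form only (fixed minute and hour,
    # every day); real cron syntax is far richer.
    minute, hour, dom, month, dow = expr.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError(f"only 'M H * * *' is handled here: {expr!r}")
    return int(hour), int(minute)

def next_fire(after, exprs):
    """Earliest time strictly after `after` matching any expression."""
    candidates = []
    for hour, minute in map(parse_daily_cron, exprs):
        today = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
        candidates.append(today if today > after else today + timedelta(days=1))
    return min(candidates)

schedule = ["5 4 * * *", "31 2 * * *"]
t = datetime(2021, 1, 20)
for _ in range(4):
    t = next_fire(t, schedule)
    print(t)  # 02:31 and 04:05 on 20 Jan, then the same on 21 Jan
```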
>>>>>>>
>>>>>>> Maybe I missed something or the idea doesn't make sense. Please let me
>>>>>>> know.
>>>>>>>
>>>>>>>
>>>>>>> XD
>>>>>>>
>>>>>>> On Wed, Jan 20, 2021 at 6:09 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes, we quite possibly could do this -- I'm trying to work out what
>>>>>>>> the needs are here.
>>>>>>>>
>>>>>>>> In the example of a twice-a-month dag (not sure if you have this
>>>>>>>> use case too?) what do you expect the "data interval" (i.e.
>>>>>>>> execution_date) to be?
>>>>>>>>
>>>>>>>> Or for this case does it not matter?
>>>>>>>>
>>>>>>>> -ash
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Another case that is mentioned in one of the issues is the ability to
>>>>>>>> schedule a bi-weekly job (the equivalent of a bi-weekly meeting that
>>>>>>>> you can set in a calendar), which is very much needed.
>>>>>>>>
>>>>>>>> Maybe this is unrealistic, but I think the game changer would be to
>>>>>>>> let users define their own logic, which Airflow would then use to
>>>>>>>> schedule DAGs.
>>>>>>>> My thought here is: if I can define the logic in a Python function
>>>>>>>> (regardless of what that logic is), can't Airflow utilize it?
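A sketch of the "user-defined Python function" idea, using the 2nd & 4th Tuesday case from issue 10123 listed in the original email. The rule signature (take the last run time, return the next one) and the `upcoming` driver loop are assumptions about how Airflow *might* call such a function; they are hypothetical, not an existing Airflow interface.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday(): Monday == 0

def nth_weekday(year, month, weekday, n):
    # The n-th occurrence of `weekday` in the given month, at midnight.
    first = datetime(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))

def second_and_fourth_tuesday(after):
    """User-supplied rule (hypothetical signature): next run strictly after `after`."""
    year, month = after.year, after.month
    while True:
        for n in (2, 4):
            candidate = nth_weekday(year, month, TUESDAY, n)
            if candidate > after:
                return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

def upcoming(rule, after, count):
    # How a scheduler might drive such a rule: call it repeatedly.
    out = []
    for _ in range(count):
        after = rule(after)
        out.append(after)
    return out

print(upcoming(second_and_fourth_tuesday, datetime(2021, 1, 20), 4))
```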
>>>>>>>>
>>>>>>>> On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I'd like to (re)start the discussion about a new feature I'd like to
>>>>>>>>> add for Airflow 2.1, which I am loosely calling "improving
>>>>>>>>> schedule_interval" (catchy name, I know!).
>>>>>>>>>
>>>>>>>>> I have two main high-level goals in mind here:
>>>>>>>>>
>>>>>>>>> 1. To reduce the confusion around execution_date (specifically the
>>>>>>>>> naming of the parameter!) - the whole start vs end discussion.
>>>>>>>>> 2. To support more complex schedules.
>>>>>>>>>
>>>>>>>>> Previous thread on point 1 here:
>>>>>>>>> https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E
>>>>>>>>> (but I'm taking a bit of a step back from that to think about whether
>>>>>>>>> there's a bigger change we could make that encompasses this)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't yet have a concrete plan or implementation in mind, but
>>>>>>>>> I'd like to start collecting people's "wish lists" when it comes to
>>>>>>>>> scheduling DAGs:
>>>>>>>>>
>>>>>>>>> - What do you wish you could express natively in terms of scheduling
>>>>>>>>> your DAGs? (I.e. without using "hacks" such as date sensors or skip
>>>>>>>>> tasks at the start.)
>>>>>>>>> - What schedules do you wish you could express now, but just can't?
>>>>>>>>> - Do you have workflows that give a good example of where you want
>>>>>>>>> the DAG to run at the start of the interval? Follow-up question: do
>>>>>>>>> you also want this to be different for different DAGs in your
>>>>>>>>> Airflow install?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Existing issues:
>>>>>>>>> https://github.com/apache/airflow/issues/8649 "Add support for more
>>>>>>>>> than 1 cron exp per DAG"
>>>>>>>>> https://github.com/apache/airflow/issues/10194 "Ability to better
>>>>>>>>> support odd scheduling time"
>>>>>>>>> https://github.com/apache/airflow/issues/10449 "Dynamic Schedule
>>>>>>>>> Intervals"
>>>>>>>>> https://github.com/apache/airflow/issues/10123 "Job Schedule
>>>>>>>>> Interval on 2nd & 4th Tuesday"
>>>>>>>>>
>>>>>>>>> I'll start:
>>>>>>>>>
>>>>>>>>> Case1:
>>>>>>>>>
>>>>>>>>> One example that came up recently in Slack was an actual astronomer
>>>>>>>>> wanting a DAG to run with a schedule of "@sunset"! This also brings
>>>>>>>>> up the subject of "running DAGs at interval start or end".
>>>>>>>>>
>>>>>>>>> Case2:
>>>>>>>>>
>>>>>>>>> I'd like to be able to run a daily process at the end of each
>>>>>>>>> weekday, i.e. to process data for Monday..Friday. The naive way of
>>>>>>>>> expressing this would be "0 0 * * MON-FRI", but that means the DAGs
>>>>>>>>> would run on Tuesday, Wednesday, Thursday, Friday, Monday -- meaning
>>>>>>>>> Friday's data isn't processed until Monday!
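The Case 2 problem, sketched by hand: with "run at the end of the interval" semantics, the run whose interval starts at one cron point is only triggered at the next cron point. `cron_points` is a hypothetical helper hand-rolled for this one expression rather than a general cron library; it lists the MON-FRI midnights for one week and pairs them up:

```python
from datetime import datetime, timedelta

def cron_points(start, end):
    # Midnights matching "0 0 * * MON-FRI" from start to end, inclusive.
    # Hand-rolled for this one expression; not a general cron parser.
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday == 0 ... Friday == 4
            yield day
        day += timedelta(days=1)

points = list(cron_points(datetime(2021, 1, 18), datetime(2021, 1, 26)))
# End-of-interval semantics: the interval starting at points[i]
# is only run once points[i + 1] arrives.
for start, end in zip(points, points[1:]):
    print(f"data for {start:%a %d} -> triggered {end:%a %d}")
# The Friday line reads: data for Fri 22 -> triggered Mon 25
```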
>>>>>>>>>
>>>>>>>>> My thought on this is that we need to separate the schedule interval
>>>>>>>>> (when to run a task) from the period duration (i.e. look at one
>>>>>>>>> day's worth of data).
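A sketch of that separation, with two hypothetical settings: a trigger schedule of 00:00 Tuesday through Saturday (the midnight that ends each weekday) and a fixed one-day data period ending at each trigger. Friday's data is then picked up at 00:00 on Saturday rather than on Monday. The names here are illustrative, not a proposed API.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(days=1)  # how much data each run covers (hypothetical setting)

def trigger_times(start, end):
    # Hypothetical trigger schedule: midnight at the *end* of each weekday,
    # i.e. 00:00 Tuesday..Saturday (weekday() 1..5).
    day = start
    while day <= end:
        if 1 <= day.weekday() <= 5:
            yield day
        day += timedelta(days=1)

for trigger in trigger_times(datetime(2021, 1, 19), datetime(2021, 1, 26)):
    # Each run covers the PERIOD immediately before its trigger time.
    interval_start, interval_end = trigger - PERIOD, trigger
    print(f"{trigger:%a %d %H:%M}: data {interval_start:%a} -> {interval_end:%a}")
```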
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ash