> `schedule_interval=timedelta(days=14), start_date=...`.

That won't support every second Thursday, for example.
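
To make the difference concrete, here is a small stdlib-only sketch (not Airflow code). It assumes "every second Thursday" means the second Thursday of each calendar month, which is only one possible reading. `fortnightly` and `second_thursday` are made-up helper names.

```python
from datetime import datetime, timedelta


def fortnightly(start, count):
    """Run times produced by stepping a fixed 14 days from `start`."""
    return [start + timedelta(days=14 * i) for i in range(count)]


def second_thursday(year, month, hour=0, minute=0):
    """Second Thursday of a month (assumed meaning of 'every second Thursday')."""
    first = datetime(year, month, 1, hour, minute)
    # weekday(): Monday == 0, so Thursday == 3
    days_to_thursday = (3 - first.weekday()) % 7
    return first + timedelta(days=days_to_thursday + 7)


start = datetime(2021, 1, 14, 20, 5)  # a Thursday
fixed = fortnightly(start, 4)
monthly = [second_thursday(2021, m, 20, 5) for m in (1, 2, 3)]
```

The fixed step fires every 14 days regardless of month boundaries, while the month-based rule fires once per month, so the two sequences agree only on some dates.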

On Wed, Jan 20, 2021 at 6:54 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> Running exactly every two weeks can be done by setting
> `schedule_interval=timedelta(days=14), start_date=...`.
>
> Does this do what you need Elad?
>
> On 20 January 2021 18:12:36 GMT, Elad Kalif <elad...@gmail.com> wrote:
>>
>> >> In the example of a twice-a-month dag (not sure if you have this
>> use case too?) what do you expect the "data interval" (i.e. execution_date)
>> to be?
>> Yes, we have this use case too. The execution date does matter because I
>> want it to be bi-weekly, starting at a specific day and time,
>> so with the current implementation I would expect to provide
>> start_date=datetime(2021,1,19,20,5) & schedule_interval='2 weeks'
>>
>> Currently Airflow has 'hourly', 'daily', 'weekly' - none of which allows
>> us to set it.
>> So a possible solution for this specific use case could be defining:
>> repeat_every - integer that represents the frequency (1,2,3,... n)
>> unit - str that provide the "gaps" (minutes, hours, days, weeks, months,
>> years)
>> Example: bi-weekly / twice a month can be: repeat_every = 2, unit =
>> 'weeks'
>> To get the 'hourly', 'daily', 'weekly' functionality, it
>> just needs to set repeat_every=1.
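
A rough sketch of how the `repeat_every` / `unit` idea could compute the next run time, using only the standard library. Nothing here is an existing Airflow API: `next_run` is a hypothetical helper, and clamping the day for month/year steps (e.g. Jan 31 + 1 month -> Feb 28) is an assumption of this sketch, not something proposed in the thread.

```python
import calendar
from datetime import datetime, timedelta

# Units that map directly onto a fixed timedelta.
_FIXED_UNITS = {"minutes", "hours", "days", "weeks"}


def next_run(previous, repeat_every, unit):
    """Return the run time following `previous` (hypothetical helper)."""
    if repeat_every < 1:
        raise ValueError("repeat_every must be a positive integer")
    if unit in _FIXED_UNITS:
        return previous + timedelta(**{unit: repeat_every})
    if unit in ("months", "years"):
        months = repeat_every * (12 if unit == "years" else 1)
        total = previous.year * 12 + (previous.month - 1) + months
        year, month = divmod(total, 12)
        month += 1
        # Assumption: clamp the day so Jan 31 + 1 month lands on the last day of Feb.
        last_day = calendar.monthrange(year, month)[1]
        return previous.replace(year=year, month=month, day=min(previous.day, last_day))
    raise ValueError(f"unsupported unit: {unit!r}")


# Bi-weekly, starting from the date used in the example above.
biweekly = next_run(datetime(2021, 1, 19, 20, 5), repeat_every=2, unit="weeks")
```

With `repeat_every=1` the same helper covers the plain hourly, daily and weekly cases.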
>>
>> By the way this is exactly what google calendar allows to set if you
>> click on custom scheduling for a meeting.
>>
>> I'm still in favor of the python function approach as it should cover all
>> cases and provide full control for the users.
>>
>> On Wed, Jan 20, 2021 at 7:20 PM Deng Xiaodong <xd.den...@gmail.com>
>> wrote:
>>
>>> A quick thought (*maybe not making sense*): if *schedule_interval* accepts
>>> a list of values, we may support much higher complexity.
>>>
>>> For example, I may want to schedule my jobs at 04:05 AND 02:31 every
>>> day, which cannot be expressed by a single cron pattern. Then I may want
>>> to have *schedule_interval = ["5 4 * * *", "31 2 * * *"]*.
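
As an illustration of the list idea, the next run across several expressions could simply be the earliest next run of any one of them. The sketch below is not Airflow code and deliberately handles only the `M H * * *` form used in this example. `next_daily` and `next_run` are made-up names, and a real implementation would need a full cron parser.

```python
from datetime import datetime, timedelta


def next_daily(expr, after):
    """Next time strictly after `after` for a cron string of the form 'M H * * *'."""
    minute, hour, *rest = expr.split()
    if rest != ["*", "*", "*"]:
        raise ValueError("this sketch only handles 'M H * * *'")
    candidate = after.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


def next_run(exprs, after):
    """Earliest next run across several expressions."""
    return min(next_daily(expr, after) for expr in exprs)


schedule = ["5 4 * * *", "31 2 * * *"]
first = next_run(schedule, datetime(2021, 1, 20, 0, 0))
second = next_run(schedule, first)
third = next_run(schedule, second)
```

Here the runs alternate between the two expressions: 02:31, then 04:05, then 02:31 on the following day.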
>>>
>>> Maybe I missed something or the idea doesn't make sense. Please let me
>>> know.
>>>
>>>
>>> XD
>>>
>>> On Wed, Jan 20, 2021 at 6:09 PM Ash Berlin-Taylor <a...@apache.org>
>>> wrote:
>>>
>>>> Yes, we quite possibly could do this -- I'm trying to work out what the
>>>> needs are here.
>>>>
>>>> In the example of a twice-a-month dag (not sure if you have this use
>>>> case too?) what do you expect the "data interval" (i.e. execution_date) to
>>>> be?
>>>>
>>>> Or for this case does it not matter?
>>>>
>>>> -ash
>>>>
>>>>
>>>> On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
>>>>
>>>> Another case that is mentioned in one of the issues is the ability to
>>>> schedule a bi-weekly job (equivalent of bi-weekly meeting that you can set
>>>> in a calendar) which is very much needed.
>>>>
>>>> Maybe this is unrealistic but I think the game changer is if it would
>>>> be possible to let the users define their own logic and airflow will use it
>>>> to schedule DAGs.
>>>> My thought here is - if I can define the logic in a python function
>>>> (regardless of what this logic is). Can't Airflow utilize it?
>>>>
>>>> On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'd like to (re)start the discussion about a new feature I'd like to
>>>>> add for Airflow 2.1, that I am loosely calling "improving
>>>>> schedule_interval" (catchy name I know!)
>>>>>
>>>>> I have two main high-level goals in mind here:
>>>>>
>>>>> 1. To reduce the confusion around execution_date (specifically the
>>>>> naming of the parameter!) - the whole start vs end discussion.
>>>>> 2. To support more complex schedules.
>>>>>
>>>>> Previous thread on point 1 is here:
>>>>> https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E
>>>>> (but I'm taking a bit of a step back from that to think about whether
>>>>> there's a bigger change we could make that encompasses this)
>>>>>
>>>>>
>>>>> I don't yet have a concrete plan, nor an implementation in mind, but I'd
>>>>> like to start collecting people's "wish list" when it comes to scheduling
>>>>> DAGs:
>>>>>
>>>>> - What do you wish you could express natively in terms of scheduling
>>>>> your DAGs? (I.e. without using "hacks" such as date sensor/skip tasks at
>>>>> start)
>>>>> - What schedules do you wish you could express now, that you just
>>>>> can't?
>>>>> - Do you have good example workflows that show where you want to
>>>>> schedule at start? Follow up question: do you also want this to be
>>>>> different for different DAGs in your Airflow install?
>>>>>
>>>>>
>>>>> Existing issues:
>>>>> https://github.com/apache/airflow/issues/8649 "Add support for more
>>>>> than 1 cron exp per DAG"
>>>>> https://github.com/apache/airflow/issues/10194 "Ability to better
>>>>> support odd scheduling time"
>>>>> https://github.com/apache/airflow/issues/10449 "Dynamic Schedule
>>>>> Intervals"
>>>>> https://github.com/apache/airflow/issues/10123 "Job Schedule Interval
>>>>> on 2nd & 4th Tuesday"
>>>>>
>>>>> I'll start:
>>>>>
>>>>> Case1:
>>>>>
>>>>> One example that came up recently in slack was an actual astronomer
>>>>> wanting a DAG to run with a schedule of "@sunset"! This also brings up the
>>>>> subject of "running dags at interval start or end"
>>>>>
>>>>> Case2:
>>>>>
>>>>> I'd like to be able to run a daily process at the end of each
>>>>> weekday, i.e. to process data for Monday..Friday. The naive way of expressing
>>>>> this would be "0 0 * * MON-FRI", but that means that the dags would run
>>>>> Tuesday, Wednesday, Thursday, Friday, Monday -- meaning Friday's data
>>>>> isn't processed until Monday!
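
The effect can be reproduced without Airflow. The sketch below lists the points at which "0 0 * * MON-FRI" fires and pairs each one with the next, on the assumption (matching the behaviour described above) that a run is labelled with the start of its interval and triggered at the end of it. `weekday_midnights` is a made-up helper.

```python
from datetime import datetime, timedelta


def weekday_midnights(start, count):
    """Yield successive 00:00 schedule points that fall on Monday..Friday."""
    point = start.replace(hour=0, minute=0, second=0, microsecond=0)
    found = 0
    while found < count:
        if point.weekday() < 5:  # Monday == 0 ... Friday == 4
            yield point
            found += 1
        point += timedelta(days=1)


points = list(weekday_midnights(datetime(2021, 1, 18), 7))  # Monday 18 Jan 2021
# Each run is labelled with the start of its interval and triggered at the end of it.
runs = list(zip(points, points[1:]))
for execution_date, triggered_at in runs:
    print(f"{execution_date:%a %d} data -> run starts {triggered_at:%a %d}")
```

The pair that starts on Friday is the one whose run waits three days, until Monday's schedule point.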
>>>>>
>>>>> My thought on this is that we need to separate the schedule interval (when to
>>>>> run a task) from the period duration (i.e. look at one day's worth of data).
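
One way to picture that separation is a plan object that carries the run time and the data period as independent fields. This is only a sketch of the idea, not a proposed design: `PlannedRun` and `weekday_runs` are hypothetical names, and running exactly at the midnight that ends each weekday is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlannedRun:
    run_at: datetime        # when the scheduler should start the run
    period_start: datetime  # start of the data the run looks at
    period_end: datetime    # end of the data the run looks at


def weekday_runs(first_day, days, period=timedelta(days=1)):
    """Plan one run at the end of each Monday..Friday, each covering `period` of data."""
    day = first_day.replace(hour=0, minute=0, second=0, microsecond=0)
    plans = []
    for _ in range(days):
        if day.weekday() < 5:  # Monday == 0 ... Friday == 4
            end = day + timedelta(days=1)
            plans.append(PlannedRun(run_at=end, period_start=end - period, period_end=end))
        day += timedelta(days=1)
    return plans


plans = weekday_runs(datetime(2021, 1, 18), 7)  # the week starting Monday 18 Jan 2021
```

With the two decoupled, Friday's data is picked up by a run at Saturday 00:00 instead of waiting for Monday.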
>>>>>
>>>>> Thanks,
>>>>> Ash
>>>>>
>>>>>
>>>>>
>>>>>
