A quick thought (*maybe it doesn't make sense*): if *schedule_interval*
accepted a list of values, we could support much more complex schedules.

For example, I may want to schedule my jobs every day at 04:05 AND 02:31,
which cannot be expressed by a single cron pattern. Then I may want to
have *schedule_interval = ["5 4 * * *", "31 2 * * *"]*.
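To make the idea concrete, here is a minimal sketch in plain Python. It is
not Airflow code: *next_run* is a hypothetical helper, and it assumes every
entry has the simple daily form "MINUTE HOUR * * *". It only shows how a
scheduler could pick the earliest upcoming run from such a list.

```python
from datetime import datetime, timedelta


def next_run(after, cron_exprs):
    """Return the earliest fire time strictly after `after`.

    Hypothetical helper for illustration only. It assumes each expression
    is a daily pattern of the form "MINUTE HOUR * * *".
    """
    candidates = []
    for expr in cron_exprs:
        minute, hour, dom, month, dow = expr.split()
        if (dom, month, dow) != ("*", "*", "*"):
            raise ValueError(f"unsupported pattern: {expr!r}")
        # Today's occurrence of this pattern.
        fire = after.replace(
            hour=int(hour), minute=int(minute), second=0, microsecond=0
        )
        # If today's occurrence is not in the future, use tomorrow's.
        if fire <= after:
            fire += timedelta(days=1)
        candidates.append(fire)
    # The combined schedule fires at whichever pattern comes first.
    return min(candidates)


schedule_interval = ["5 4 * * *", "31 2 * * *"]
print(next_run(datetime(2021, 1, 20, 3, 0), schedule_interval))
```

With the list above, a call made at 03:00 picks that day's 04:05 run, and a
call made at 04:05 picks the next day's 02:31 run.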

Maybe I missed something or the idea doesn't make sense. Please let me know.


XD

On Wed, Jan 20, 2021 at 6:09 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> Yes, we quite possibly could do this -- I'm trying to work out what the
> needs are here.
>
> In the example of a twice-a-month DAG (not sure if you have this use
> case too?) what do you expect the "data interval" (i.e. execution_date) to
> be?
>
> Or for this case does it not matter?
>
> -ash
>
>
> On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
>
> Another case that is mentioned in one of the issues is the ability to
> schedule a bi-weekly job (the equivalent of a bi-weekly meeting that you
> can set in a calendar), which is very much needed.
>
> Maybe this is unrealistic, but I think the game changer would be letting
> users define their own logic and having Airflow use it to schedule DAGs.
> My thought here is: if I can define the logic in a Python function
> (regardless of what this logic is), can't Airflow utilize it?
>
> On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
>> Hi everyone,
>>
>> I'd like to (re)start the discussion about a new feature I'd like to add
>> for Airflow 2.1, that I am loosely calling "improving schedule_interval"
>> (catchy name I know!)
>>
>> I have two main high-level goals in mind here:
>>
>> 1. To reduce the confusion around execution_date (specifically the naming
>> of the parameter!) - the whole start vs end discussion.
>> 2. To support more complex schedules.
>>
>> The previous thread on point 1 is here:
>> https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E,
>> (but I'm taking a bit of a step back from that to think if there's a bigger
>> change we could make that encompasses this)
>>
>>
>> I don't yet have a concrete plan, nor an implementation in mind, but I'd
>> like to start collecting people's "wish list" when it comes to scheduling
>> DAGs:
>>
>> - What do you wish you could express natively in terms of scheduling your
>> DAGs? (I.e. without using "hacks" such as date sensor/skip tasks at start)
>> - What schedules do you wish you could express now, that you just can't?
>> - Do you have example workflows that show where you want the schedule at
>> the start of the interval? Follow-up question: do you also want this to be
>> different for different DAGs in your Airflow install?
>>
>>
>> Existing issues:
>> https://github.com/apache/airflow/issues/8649 "Add support for more than
>> 1 cron exp per DAG"
>> https://github.com/apache/airflow/issues/10194 "Ability to better
>> support odd scheduling time"
>> https://github.com/apache/airflow/issues/10449 "Dynamic Schedule
>> Intervals"
>> https://github.com/apache/airflow/issues/10123 "Job Schedule Interval on
>> 2nd & 4th Tuesday"
>>
>> I'll start:
>>
>> Case 1:
>>
>> One example that came up recently in Slack was an actual astronomer
>> wanting a DAG to run with a schedule of "@sunset"! This also brings up the
>> subject of "running DAGs at interval start or end".
>>
>> Case 2:
>>
>> I'd like to be able to run a daily process at the end of each weekday,
>> i.e. to process data for Monday..Friday. The naive way of expressing this
>> would be "0 0 * * MON-FRI", but that means that the DAGs would run Tuesday,
>> Wednesday, Thursday, Friday, Monday -- meaning Friday's data isn't
>> processed until Monday!
>>
>> My thought on this is that we need to separate the schedule interval (when
>> to run a task) from the period duration (i.e. look at one day's worth of
>> data).
>>
>> Thanks,
>> Ash
>>
>>
>>
>>
