Yes, we quite possibly could do this -- I'm trying to work out what the
needs are here.
In the example of a twice-a-month DAG (not sure if you have this use
case too?), what do you expect the "data interval" (i.e. execution_date)
to be?
Or for this case does it not matter?
-ash
On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
Another case that is mentioned in one of the issues is the ability to
schedule a bi-weekly job (the equivalent of a bi-weekly meeting that you
can set in a calendar), which is very much needed.
Maybe this is unrealistic, but I think the game changer would be
letting users define their own logic and having Airflow use it to
schedule DAGs.
My thought here is: if I can define the logic in a Python function
(regardless of what that logic is), can't Airflow utilize it?
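As a rough illustration of this idea, a user-defined schedule could be a plain Python function that, given the time of the last run, returns the next one. The sketch below uses a bi-weekly rule anchored to a chosen start date; the function name and its `(after, anchor)` contract are hypothetical, not an existing Airflow API.

```python
from datetime import datetime, timedelta

def next_biweekly_run(after: datetime, anchor: datetime) -> datetime:
    """Hypothetical user-defined schedule: every 14 days starting at `anchor`.

    Returns the first scheduled time strictly after `after`.
    """
    period = timedelta(weeks=2)
    if after < anchor:
        # Nothing has run yet: the first slot is the anchor itself.
        return anchor
    # Whole periods elapsed since the anchor (timedelta // timedelta -> int),
    # then step one period past the last slot that has already passed.
    elapsed = (after - anchor) // period
    return anchor + (elapsed + 1) * period

anchor = datetime(2021, 1, 4, 9, 0)  # a Monday, 09:00
print(next_biweekly_run(datetime(2021, 1, 10), anchor))  # 2021-01-18 09:00:00
```

A scheduler only needs to call such a function repeatedly to get the next run, which is why the logic inside it can be arbitrary.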
On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org> wrote:
Hi everyone,
I'd like to (re)start the discussion about a new feature I'd like to
add for Airflow 2.1, which I am loosely calling "improving
schedule_interval" (catchy name, I know!).
I have two main high-level goals in mind here:
1. To reduce the confusion around execution_date (specifically the
naming of the parameter!): the whole start vs. end discussion.
2. To support more complex schedules.
Previous thread on point 1 is here:
<https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E>
(but I'm taking a bit of a step back from that to think about whether
there's a bigger change we could make that encompasses this).
I don't yet have a concrete plan or implementation in mind, but
I'd like to start collecting people's "wish lists" when it comes to
scheduling DAGs:
- What do you wish you could express natively in terms of scheduling
your DAGs? (i.e. without using "hacks" such as date sensors or skipping
tasks at the start)
- What schedules do you wish you could express now, but just
can't?
- Do you have example workflows that show where you'd want a DAG
to run at the start of its interval? Follow-up question: do you also
want this to differ between DAGs in your Airflow install?
Existing issues:
<https://github.com/apache/airflow/issues/8649> "Add support for
more than 1 cron exp per DAG"
<https://github.com/apache/airflow/issues/10194> "Ability to better
support odd scheduling time"
<https://github.com/apache/airflow/issues/10449> "Dynamic Schedule
Intervals"
<https://github.com/apache/airflow/issues/10123> "Job Schedule
Interval on 2nd & 4th Tuesday"
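The "2nd & 4th Tuesday" case is awkward in standard cron because when both day-of-month and day-of-week are restricted, cron fires if either field matches, so "0 0 8-14 * TUE" runs on every day from the 8th to the 14th and on every Tuesday. A minimal sketch of computing those dates in plain Python follows; `nth_weekdays` is a hypothetical helper, not part of Airflow.

```python
import calendar
from datetime import date
from typing import Iterable, List

def nth_weekdays(year: int, month: int, weekday: int, ns: Iterable[int]) -> List[date]:
    """Return the n-th occurrences of `weekday` (calendar.MONDAY=0 .. SUNDAY=6) in a month.

    Occurrences that don't exist (e.g. a 5th Tuesday in a short month) are skipped.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    # Every date in the month that falls on the requested weekday, in order.
    matches = [
        date(year, month, d)
        for d in range(1, days_in_month + 1)
        if date(year, month, d).weekday() == weekday
    ]
    return [matches[n - 1] for n in ns if 1 <= n <= len(matches)]

print(nth_weekdays(2021, 1, calendar.TUESDAY, (2, 4)))
# [datetime.date(2021, 1, 12), datetime.date(2021, 1, 26)]
```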
I'll start:
Case 1:
One example that came up recently in Slack was an actual astronomer
wanting a DAG to run with a schedule of "@sunset"! This also raises
the subject of "running DAGs at interval start or end".
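To make the start-vs-end distinction concrete, here is a minimal sketch assuming a fixed-length interval and a hypothetical `trigger_time` helper (not an Airflow API). Today's schedule_interval behaviour corresponds to `run_at="end"`: the run for an interval fires only once that interval has closed. A schedule such as "@sunset" would need more than this, since the interval boundaries themselves shift from day to day.

```python
from datetime import datetime, timedelta

def trigger_time(interval_start: datetime, interval: timedelta, run_at: str = "end") -> datetime:
    """Hypothetical helper: when the run for [interval_start, interval_start + interval) fires."""
    if run_at == "start":
        # Fire as soon as the interval opens.
        return interval_start
    if run_at == "end":
        # Fire once the interval has closed, so all of its data should exist.
        return interval_start + interval
    raise ValueError(f"run_at must be 'start' or 'end', got {run_at!r}")

day = datetime(2021, 1, 20)
print(trigger_time(day, timedelta(days=1), "start"))  # 2021-01-20 00:00:00
print(trigger_time(day, timedelta(days=1), "end"))    # 2021-01-21 00:00:00
```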
Case 2:
I'd like to be able to run a daily process at the end of each
weekday, i.e. to process data for Monday..Friday. The naive way of
expressing this would be "0 0 * * MON-FRI", but that means the
DAG would run on Tuesday, Wednesday, Thursday, Friday, and Monday --
meaning Friday's data isn't processed until Monday!
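A small simulation of the problem, without a cron library: the sketch hand-rolls the ticks of "0 0 * * MON-FRI" and pairs consecutive ticks the way the current scheduler does, so the run for interval [tick_i, tick_i+1) fires at tick_i+1. Function names are illustrative only.

```python
from datetime import datetime, timedelta

def weekday_midnights(start: datetime, days: int) -> list:
    """Mimic the ticks of the cron expression "0 0 * * MON-FRI"."""
    ticks = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    for _ in range(days):
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            ticks.append(day)
        day += timedelta(days=1)
    return ticks

def runs(ticks: list) -> list:
    """Pair each interval start with the tick at which its run fires."""
    return [(ticks[i], ticks[i + 1]) for i in range(len(ticks) - 1)]

for interval_start, fires_at in runs(weekday_midnights(datetime(2021, 1, 18), 8)):
    print(f"data for {interval_start:%a %d} processed at {fires_at:%a %d}")
# data for Mon 18 processed at Tue 19
# data for Tue 19 processed at Wed 20
# data for Wed 20 processed at Thu 21
# data for Thu 21 processed at Fri 22
# data for Fri 22 processed at Mon 25
```

The last pair is the issue: because the tick after Friday is Monday, Friday's interval also spans the weekend and its run waits until Monday.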
My thought on this is that we need to separate the schedule interval
(when to run a task) from the period duration (i.e. how much data to
look at, such as one day's worth).
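A minimal sketch of that separation, assuming two independent and hypothetical rules: `run_times` decides when a run fires (midnight Tuesday through Saturday, i.e. just after each weekday ends), and `data_interval` decides what each run processes (the one-day `PERIOD` before it fires). With this split, Friday's data is processed at Saturday 00:00 instead of Monday.

```python
from datetime import datetime, timedelta

# Hypothetical "period duration": each run looks at one day's worth of data.
PERIOD = timedelta(days=1)

def run_times(start: datetime, days: int) -> list:
    """Hypothetical "when to run" rule: 00:00 Tue..Sat, right after each weekday ends."""
    out = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    for _ in range(days):
        if day.weekday() in (1, 2, 3, 4, 5):  # Tuesday=1 .. Saturday=5
            out.append(day)
        day += timedelta(days=1)
    return out

def data_interval(run_time: datetime) -> tuple:
    """The data a run processes: the PERIOD immediately before it fires."""
    return (run_time - PERIOD, run_time)

for rt in run_times(datetime(2021, 1, 18), 7):
    start, end = data_interval(rt)
    print(f"run at {rt:%a %d} processes {start:%a %d}")
```

Changing `PERIOD` (say, to a week) would not affect when runs fire, which is the point of keeping the two rules apart.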
Thanks,
Ash