I love the idea of allowing users to create their own scheduling objects or 
scheduling Python functions. They could live either in the scheduler or in a 
separate process that sets some value in the DB when it is “true”. It would 
also be great from a “marketplace” standpoint, as users could post their 
custom scheduling objects for others to use.

On Wed, Jan 20, 2021 at 10:12 AM, Elad Kalif <elad...@gmail.com> wrote:
>> In the example of a twice-a-month dag (not sure if it you have this use case 
>> too?) what do you expect the "data interval" (i.e. execution_date) to be?
Yes, we have this use case too. The execution date does matter because I want 
it to be bi-weekly, starting on a specific day and time,
so with the current implementation I would expect to provide 
start_date=datetime(2021,1,19,20,5) & schedule_interval='2 weeks'

Currently Airflow has 'hourly', 'daily', 'weekly', which don't allow us to 
set it.
So a possible solution for this specific use case could be defining:
repeat_every - integer that represents the frequency (1,2,3,... n)
unit - str that provides the "gaps" (minutes, hours, days, weeks, months, years)
Example: a bi-weekly schedule can be: repeat_every = 2, unit = 'weeks'
To get the 'hourly', 'daily', 'weekly' functionality you just need to set 
repeat_every=1 with the matching unit.
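
A minimal sketch of how the repeat_every / unit idea above could be evaluated. 
The function name next_run and its parameters are hypothetical and not part of 
Airflow. It only covers fixed-length units; months and years vary in length and 
would need calendar-aware arithmetic, so they are left out here.

```python
from datetime import datetime, timedelta

# Hypothetical helper: neither the name nor the parameters exist in Airflow.
# Only fixed-length units are handled in this sketch.
_UNIT_TO_DELTA = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
    "weeks": timedelta(weeks=1),
}


def next_run(start_date: datetime, repeat_every: int, unit: str,
             after: datetime) -> datetime:
    """Return the first scheduled time strictly later than `after`."""
    if repeat_every < 1:
        raise ValueError("repeat_every must be a positive integer")
    step = _UNIT_TO_DELTA[unit] * repeat_every
    if after < start_date:
        return start_date
    # Whole steps already elapsed since start_date, plus one more.
    elapsed_steps = (after - start_date) // step
    return start_date + (elapsed_steps + 1) * step


start = datetime(2021, 1, 19, 20, 5)
print(next_run(start, 2, "weeks", after=start))  # 2021-02-02 20:05:00
```

With repeat_every=1 the same function reproduces the plain hourly, daily and 
weekly cases.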

By the way, this is exactly what Google Calendar lets you set if you click on 
custom scheduling for a meeting.

I'm still in favor of the Python function approach as it should cover all cases 
and provide full control for the users.

On Wed, Jan 20, 2021 at 7:20 PM Deng Xiaodong <xd.den...@gmail.com> wrote:
A quick thought (maybe not making sense): if schedule_interval accepts a list 
of values, we may support much higher complexity.
For example, I may want to schedule my jobs every day at 04:05 AND 02:31, 
which cannot be expressed by a single cron pattern. Then I may want to have 
schedule_interval = ["5 4 * * *", "31 2 * * *"].
Maybe I missed something or the idea doesn't make sense. Please let me know.
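
A rough sketch of what a list-valued schedule could mean: the next run is the 
earliest next fire time across all expressions. To stay self-contained, the 
hypothetical helper below only understands the 'M H * * *' form used in the 
example; a real implementation would need a full cron parser. None of these 
names are Airflow APIs.

```python
from datetime import datetime, timedelta
from typing import Sequence


def next_daily_fire(expr: str, after: datetime) -> datetime:
    """Next fire time for a cron expression of the form 'M H * * *' only."""
    minute, hour, dom, month, dow = expr.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only understands 'M H * * *'")
    candidate = after.replace(hour=int(hour), minute=int(minute),
                              second=0, microsecond=0)
    if candidate <= after:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def next_fire(exprs: Sequence[str], after: datetime) -> datetime:
    """Union of schedules: the earliest next fire time of any expression."""
    return min(next_daily_fire(e, after) for e in exprs)


schedule = ["5 4 * * *", "31 2 * * *"]
print(next_fire(schedule, datetime(2021, 1, 20, 3, 0)))  # 2021-01-20 04:05:00
```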

XD
On Wed, Jan 20, 2021 at 6:09 PM Ash Berlin-Taylor <a...@apache.org> wrote:
Yes, we quite possibly could do this -- I'm trying to work out what the needs 
are here.
In the example of a twice-a-month DAG (not sure if you have this use case 
too?) what do you expect the "data interval" (i.e. execution_date) to be?
Or for this case does it not matter?
-ash

On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
Another case that is mentioned in one of the issues is the ability to schedule 
a bi-weekly job (equivalent of bi-weekly meeting that you can set in a 
calendar) which is very much needed.

Maybe this is unrealistic, but I think the game changer would be letting users 
define their own logic and having Airflow use it to schedule DAGs.
My thought here is: if I can define the logic in a Python function (regardless 
of what that logic is), can't Airflow use it?
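
As an illustration of the kind of logic a user might supply, here is a sketch 
of a plain Python callable for the "2nd & 4th Tuesday" schedule from one of the 
existing issues. The next_run(after) signature is an assumption made for this 
example; Airflow does not define such a hook in this thread.

```python
import calendar
from datetime import datetime, time


def second_and_fourth_tuesdays(year: int, month: int):
    """Dates of the 2nd and 4th Tuesday in the given month."""
    tuesdays = [
        d for d in calendar.Calendar().itermonthdates(year, month)
        if d.month == month and d.weekday() == calendar.TUESDAY
    ]
    return [tuesdays[1], tuesdays[3]]


def next_run(after: datetime, at: time = time(0, 0)) -> datetime:
    """Hypothetical user-supplied callable: first run strictly after `after`."""
    year, month = after.year, after.month
    while True:
        for day in second_and_fourth_tuesdays(year, month):
            candidate = datetime.combine(day, at)
            if candidate > after:
                return candidate
        # Nothing left this month; move on to the next one.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


print(next_run(datetime(2021, 1, 20)))  # 2021-01-26 00:00:00
```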

On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org> wrote:
Hi everyone,
I'd like to (re)start the discussion about a new feature I'd like to add for 
Airflow 2.1, that I am loosely calling "improving schedule_interval" (catchy 
name I know!)
I have two main high-level goals in mind here:
1. To reduce the confusion around execution_date (specifically the naming of 
the parameter!) - the whole start vs end discussion.
2. To support more complex schedules.
Previous thread on point 1 here: 
https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E
(but I'm taking a bit of a step back from that to think about whether there's a 
bigger change we could make that encompasses this)

I don't yet have a concrete plan, nor implementation in mind, but I'd like to 
start collecting people's "wish list" when it comes to scheduling DAGs:
- What do you wish you could express natively in terms of scheduling your DAGs? 
(I.e. without using "hacks" such as date sensor/skip tasks at start)
- What schedules do you wish you could express now, that you just can't?
- Do you have good example workflows that show where you want to schedule at 
start? Follow up question: do you also want this to be different for different 
DAGs in your Airflow install?

Existing issues:
- https://github.com/apache/airflow/issues/8649 "Add support for more than 1 
cron exp per DAG"
- https://github.com/apache/airflow/issues/10194 "Ability to better support odd 
scheduling time"
- https://github.com/apache/airflow/issues/10449 "Dynamic Schedule Intervals"
- https://github.com/apache/airflow/issues/10123 "Job Schedule Interval on 2nd 
& 4th Tuesday"
I'll start:
Case 1:
One example that came up recently in Slack was an actual astronomer wanting a 
DAG to run with a schedule of "@sunset"! This also brings up the subject of 
"running DAGs at interval start or end".
Case 2:
I'd like to be able to run a daily process at the end of each weekday, i.e. to 
process data for Monday..Friday. The naive way of expressing this would be "0 0 
* * MON-FRI", but that means that the DAGs would run Tuesday, Wednesday, 
Thursday, Friday, Monday -- meaning Friday's data isn't processed until 
Monday! My thought on this is that we need to separate the schedule interval 
(when to run a task) from the period duration (i.e. look at one day's worth of 
data).
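
To make Case 2 concrete, here is a small sketch of what separating the two 
could look like: each weekday gets a one-day data period, and the run happens 
as soon as that period ends, so Friday's data is processed at Saturday 00:00 
rather than on Monday. The function weekday_runs is purely illustrative and 
not an Airflow API.

```python
from datetime import date, datetime, time, timedelta


def weekday_runs(first_day: date, days: int):
    """Yield (run_time, interval_start, interval_end) for each weekday."""
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        if day.weekday() >= 5:  # Saturday=5, Sunday=6: no data to process
            continue
        interval_start = datetime.combine(day, time.min)
        interval_end = interval_start + timedelta(days=1)
        # Run as soon as the day's data is complete, i.e. at the interval end.
        yield interval_end, interval_start, interval_end


# Week starting Monday 2021-01-18.
for run_time, start, end in weekday_runs(date(2021, 1, 18), 7):
    print(f"run at {run_time:%a %Y-%m-%d %H:%M} for data {start:%a} .. {end:%a}")
```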
Thanks, Ash
