On Wed, Jan 20, 2021 at 7:28 PM Daniel Imberman <daniel.imber...@gmail.com>
wrote:

> @Jarek the problem with just cron is I don’t think cron can handle “every
> third Thursday” or “the next open market day after the 15th.” I think we
> need something more flexible than just cron (though agree that cron syntax
> can get a fair bit of mileage)
>

In 1) Yeah, I am not talking about specification but behaviour. We can still
use a pythonic way of specifying "cron-like" behaviour where you specify
"run something at time X" (which is what CRON does) rather than "run this
for every interval <A, B>" (which Airflow is all about).
So even in case 1) that I described, I am all for a more dynamic
specification of the time when things should start. I think we are on the
same page here :)
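
To make the "run something at time X" idea concrete, here is a minimal
sketch of Daniel's "every third Thursday" case written as plain Python.
This is not an Airflow API: the names third_thursday and next_run_after are
made up for illustration, and it only uses the standard library.

```python
import calendar
from datetime import date, datetime, time


def third_thursday(year: int, month: int) -> date:
    """Return the date of the third Thursday of the given month."""
    # monthcalendar() returns weeks as lists of day numbers (Monday first),
    # with 0 for days that fall outside the month.
    thursdays = [
        week[calendar.THURSDAY]
        for week in calendar.monthcalendar(year, month)
        if week[calendar.THURSDAY] != 0
    ]
    return date(year, month, thursdays[2])


def next_run_after(now: datetime, at: time = time(0, 0)) -> datetime:
    """Hypothetical "run at time X" hook: next third Thursday strictly after now."""
    year, month = now.year, now.month
    while True:
        candidate = datetime.combine(third_thursday(year, month), at)
        if candidate > now:
            return candidate
        # Nothing left in this month, so move on to the next one.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```

The point is only that "when should this start next?" is easy to express as
a function of the current time, with no data interval involved.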



>
> On Wed, Jan 20, 2021 at 10:24 AM, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> My thoughts (no final solution in mind, just wild thoughts):
>
> 1) I think we should add support for regular CRON behaviour. Simply "cron"
> schedule for dag execution, without the "data interval" rhetoric.
>
> There are a number of good cases where Airflow can be used as just a
> scheduler to run the jobs. This should be akin to CI jobs: either
> trigger the run on some event (trigger) or at regular intervals, but each
> run should not be tied to a particular "data interval", which means that
> the whole backfill, re-running, idempotency of runs etc. will not be
> applicable. This should IMHO even be a different type of DAG, treated
> differently in the UI (for example, every rerun should result in a NEW run
> rather than a repetition of the previous run for a specific interval). I
> think we should very, very clearly distinguish it from the "Data interval"
> kind - maybe even the base class should be called differently for those
> (CronDAG)? It should be very, very clear what kind of DAG you have when
> you look at it, both in the code and in the UI.
>
> 2) Get rid of CRON in the "Data Interval" (i.e all current DAGs !). This
> might be bold, but I think it might be best.
>
> It is very confusing that we are using the CRON syntax but not the
> execution model. I think this is a major source of confusion among
> users. The current way of specifying the schedule should be deprecated and
> dropped in 3.0, or automatically converted to a new form.
> To that end, I am all for Elad's proposal of using a Python function (with
> a predefined set of parameterizable ones expressing intervals, not
> start/end times). The CRON specification part is the only part that is
> declarative rather than imperative in Airflow. All other stuff is Python
> code. Heck, why not the schedule? It has of course a number of problems to
> solve (largely optimisations in the scheduler, which needs to look ahead
> and plan scheduling in the future), but it is all solvable imho.
>
> J.
>
> On Wed, Jan 20, 2021 at 6:20 PM Deng Xiaodong <xd.den...@gmail.com> wrote:
>
>> A quick thought (*maybe not making sense*): if *schedule_interval*
>> accepts a list of values, we may support much higher complexity.
>>
>> For example, I may want to schedule my jobs every day at 04:05 AND
>> 02:31, which cannot be expressed by a single cron pattern. Then I may want
>> to have *schedule_interval = ["5 4 * * *", "31 2 * * *"]*.
>>
>> Maybe I missed something or the idea doesn't make sense. Please let me
>> know.
>>
>>
>> XD
>>
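
As a rough illustration of the list idea above: the next run is simply the
earliest "next occurrence" across all entries. This is only a sketch, not
how Airflow treats schedule_interval. The function names are hypothetical,
and plain daily times stand in for the two cron expressions because the
Python standard library has no cron parser.

```python
from datetime import datetime, time, timedelta
from typing import Iterable


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next occurrence of the daily time `at` strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_run(now: datetime, daily_times: Iterable[time]) -> datetime:
    """Hypothetical multi-schedule rule: the earliest next run of any entry."""
    return min(next_daily_run(now, at) for at in daily_times)


# Stand-ins for "5 4 * * *" and "31 2 * * *".
SCHEDULE = [time(4, 5), time(2, 31)]
```

With a real cron parser, each entry would produce its own next run time and
the same min() step would pick the earliest one.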
>> On Wed, Jan 20, 2021 at 6:09 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>
>>> Yes, we quite possibly could do this -- I'm trying to work out what the
>>> needs are here.
>>>
>>> In the example of a twice-a-month dag (not sure if you have this use
>>> case too?) what do you expect the "data interval" (i.e. execution_date) to
>>> be?
>>>
>>> Or for this case does it not matter?
>>>
>>> -ash
>>>
>>>
>>> On Wed, 20 Jan, 2021 at 19:06, Elad Kalif <elad...@gmail.com> wrote:
>>>
>>> Another case that is mentioned in one of the issues is the ability to
>>> schedule a bi-weekly job (equivalent of bi-weekly meeting that you can set
>>> in a calendar) which is very much needed.
>>>
>>> Maybe this is unrealistic, but I think the game changer would be to let
>>> users define their own logic and have Airflow use it to schedule DAGs.
>>> My thought here is: if I can define the logic in a Python function
>>> (regardless of what this logic is), can't Airflow utilize it?
>>>
>>> On Wed, Jan 20, 2021 at 5:39 PM Ash Berlin-Taylor <a...@apache.org>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I'd like to (re)start the discussion about a new feature I'd like to
>>>> add for Airflow 2.1, that I am loosely calling "improving
>>>> schedule_interval" (catchy name I know!)
>>>>
>>>> I have two main high-level goals in mind here:
>>>>
>>>> 1. To reduce the confusion around execution_date (specifically the
>>>> naming of the parameter!) - the whole start vs end discussion.
>>>> 2. To support more complex schedules.
>>>>
>>>> Previous thread on point 1 here:
>>>> https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E,
>>>> (but I'm taking a bit of a step back from that to think if there's a bigger
>>>> change we could make that encompasses this)
>>>>
>>>>
>>>> I don't yet have a concrete plan, nor an implementation in mind, but I'd
>>>> like to start collecting people's "wish list" when it comes to scheduling
>>>> DAGs:
>>>>
>>>> - What do you wish you could express natively in terms of scheduling
>>>> your DAGs? (I.e. without using "hacks" such as date sensor/skip tasks at
>>>> start)
>>>> - What schedules do you wish you could express now, that you just can't?
>>>> - Do you have example workflows that show where you want to schedule
>>>> at start? Follow-up question: do you also want this to be
>>>> different for different DAGs in your Airflow install?
>>>>
>>>>
>>>> Existing issues:
>>>> https://github.com/apache/airflow/issues/8649 "Add support for more
>>>> than 1 cron exp per DAG"
>>>> https://github.com/apache/airflow/issues/10194 "Ability to better
>>>> support odd scheduling time"
>>>> https://github.com/apache/airflow/issues/10449 "Dynamic Schedule
>>>> Intervals"
>>>> https://github.com/apache/airflow/issues/10123 "Job Schedule Interval
>>>> on 2nd & 4th Tuesday"
>>>>
>>>> I'll start:
>>>>
>>>> Case1:
>>>>
>>>> One example that came up recently in slack was an actual astronomer
>>>> wanting a DAG to run with a schedule of "@sunset"! This also brings up the
>>>> subject of "running dags at interval start or end"
>>>>
>>>> Case2:
>>>>
>>>> I'd like to be able to run a daily process at the end of each week day,
>>>> i.e. to process data for Monday..Friday. The naive way of expressing this
>>>> would be "0 0 * * MON-FRI", but that means that the dags would run Tuesday,
>>>> Wednesday, Thursday, Friday, Monday -- meaning Friday's data isn't
>>>> processed until Monday!
>>>> My thought on this is that we need to separate the schedule interval (when
>>>> to run a task) from the period duration (i.e. look at one day's worth of
>>>> data).
>>>>
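
A small sketch of the separation described in Case2, under the assumption
that each weekday's data should be processed at the midnight that ends that
day. This is not an existing Airflow feature; weekday_runs is a hypothetical
helper that only uses the standard library.

```python
from datetime import date, datetime, time, timedelta


def weekday_runs(start: date, end: date):
    """Yield (run_at, interval_start, interval_end) for each Mon-Fri day in [start, end)."""
    day = start
    while day < end:
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            interval_start = datetime.combine(day, time.min)
            interval_end = interval_start + timedelta(days=1)
            # The run fires when the one-day period closes, regardless of
            # when the next weekday happens to be.
            yield interval_end, interval_start, interval_end
        day += timedelta(days=1)


# The week of Monday 2021-01-18.
runs = list(weekday_runs(date(2021, 1, 18), date(2021, 1, 25)))
```

Here Friday's data is picked up at Saturday 00:00 instead of waiting for
Monday, because the run time comes from the period end and not from the
next entry in the schedule.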
>>>> Thanks,
>>>> Ash
>>>>
>>>>
>>>>
>>>>
>
> --
> +48 660 796 129 <+48660796129>
>
>

-- 
+48 660 796 129
