I worry that multiple crons would become difficult to read for stranger
use-cases (for example "run on the first trading day after the 15th of the
month"). If we create a Python function or class we can easily create a
"CronTimeTable" that does exactly what Dmitry is suggesting while still
leaving open the possibility of creating other custom schedules.
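
As a rough sketch of what I mean (not a proposal for the real interface):
the `BaseTimetable` / `get_next_execution_time` names follow James Timmins'
description further down the thread, while `FirstWeekdayAfter15th`,
`DailyAt` and `AnyOfTimetable` are hypothetical names made up for
illustration. Weekdays stand in for trading days here, so market holidays
are ignored.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta


class BaseTimetable(ABC):
    """Hypothetical interface, as described by James Timmins in this thread."""

    @abstractmethod
    def get_next_execution_time(self, after: datetime) -> datetime:
        """Return the first run time strictly later than `after`."""


class FirstWeekdayAfter15th(BaseTimetable):
    """Midnight on the first weekday after the 15th of each month.

    Assumption: weekdays approximate trading days; holidays are ignored.
    """

    @staticmethod
    def _run_for_month(year: int, month: int) -> datetime:
        day = datetime(year, month, 16)
        while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            day += timedelta(days=1)
        return day

    def get_next_execution_time(self, after: datetime) -> datetime:
        year, month = after.year, after.month
        while True:
            candidate = self._run_for_month(year, month)
            if candidate > after:
                return candidate
            # Move on to the following month, rolling over the year.
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)


class DailyAt(BaseTimetable):
    """Once a day at a fixed hour and minute."""

    def __init__(self, hour: int, minute: int = 0):
        self.hour, self.minute = hour, minute

    def get_next_execution_time(self, after: datetime) -> datetime:
        candidate = after.replace(
            hour=self.hour, minute=self.minute, second=0, microsecond=0
        )
        if candidate <= after:
            candidate += timedelta(days=1)
        return candidate


class AnyOfTimetable(BaseTimetable):
    """Union of several timetables: the earliest next run of any of them wins."""

    def __init__(self, *timetables: BaseTimetable):
        self.timetables = timetables

    def get_next_execution_time(self, after: datetime) -> datetime:
        return min(t.get_next_execution_time(after) for t in self.timetables)
```

A list of crons would then just be one more subclass whose
`get_next_execution_time` takes the minimum over its expressions, the same
way `AnyOfTimetable` does above, so both approaches can sit behind a single
interface.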

On Sat, Jan 23, 2021, 2:32 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> I think whatever approach we decide on we should display
> *next_execution_date* in the webserver for each DAG. This would help most
> of the users.
>
> Regards,
> Kaxil
>
> On Sat, Jan 23, 2021 at 10:25 PM Dmitri Khokhlov <dkhokh...@gmail.com>
> wrote:
>
>> Root problem:
>> - existing Airflow schedule syntax defines only one interval pattern per
>> DAG
>> - there are use-cases that need multiple interval patterns per DAG
>> (during a day etc)
>>
>> I vote for "crontab list" solution from Deng Xiaodong. Example:
>>
>> schedule_interval = ["* 0,22,23 * * *", "30 1-21 * * *"]
>>
>> Reasoning:
>> - it is an additive change - does not remove or break existing usage
>> patterns (very important)
>> - it is generic and has a compact definition - easy to
>> read/print/present in the UI (a string). That is why it is better than
>> the "function" approach.
>> - it is a complete solution, as it allows defining interval-based schedules
>> of any complexity.
>> - it is relatively easy to implement by OR-ing the crontab times and
>> choosing the earliest next run time, following these instructions from Ash
>> Berlin-Taylor <a...@apache.org>:
>> "
>> The way the scheduler works now, it just looks at two columns on the dag
>> (model) table called, I think, "next_dagrun_after" (which is the earliest
>> date that the dag run can be created) and "next execution date" (which is
>> the value to put in the execution date of the dag run when it's created).
>>
>> Both these values are set by the dag parser process, which has full
>> access to run code. Whatever interface is used for defining new schedule
>> expressions should run in the existing process, much like how James C did in
>> a subclass.
>> "
>> --
>> Dmitri
>>
>>
>> On 2021/01/21 19:12:06, Daniel Imberman <daniel.imber...@gmail.com>
>> wrote:
>> > My only concern with tying this to the dag_parsing process is that that
>> process might miss SLAs because it takes too long to loop around. I could
>> imagine a separate thread or component that can read either TimeTable
>> objects or SmartSensor objects and run them might make sense.
>> > Ultimately I don’t see anything about SmartSensors that specifically
>> needs to run in a DAG. It could just as easily be a while loop or something
>> embarrassingly parallel (as sensors/timetables shouldn’t depend on each
>> other).
>> >
>> > On Thu, Jan 21, 2021 at 11:07 AM, Vikram Koka <vik...@astronomer.io>
>> wrote:
>> > Great discussion.
>> > I generally agree with the "Custom scheduling class" / subclass
>> approach which would run as part of the "scheduler" set of processes,
>> rather than an internal DAG approach.
>> > I do think it would be good to have boundaries on what information this
>> class would operate on and at what frequency. This is primarily from a
>> performance standpoint, though it could be argued that there are security
>> concerns with that as well.
>> > Specifically from the "what information would this have access to"
>> perspective, I think that interface would be helpful in clarifying some of
>> the use cases and making sure that those are covered. One example I was
>> thinking about in the "sunset" example is location. I was originally
>> thinking of a timezone, but this is more specific than that.
>> >
>> >
>> > On Thu, Jan 21, 2021 at 10:35 AM Ash Berlin-Taylor <a...@apache.org>
>> wrote:
>> > It shouldn't need something as complex (or to my mind hacky) as an
>> internal DAG.
>> >
>> > The way the scheduler works now, it just looks at two columns on the dag
>> (model) table called, I think, "next_dagrun_after" (which is the earliest
>> date that the dag run can be created) and "next execution date" (which is
>> the value to put in the execution date of the dag run when it's created).
>> >
>> > Both these values are set by the dag parser process, which has full
>> access to run code. Whatever interface is used for defining new schedule
>> expressions should run in the existing process, much like how James C did in
>> a subclass.
>> >
>> > Ash
>> >
>> > On 21 January 2021 18:21:58 GMT, Daniel Imberman
>> <daniel.imber...@gmail.com> wrote:
>> > I think James's idea sounds like a pretty good idea. What would you all
>> think of us doing something similar to how we handle smart sensors for how
>> we implement this? Have an internal DAG that reads all custom timetables and
>> triggers a DAG if the function returns True? Seems like a pretty
>> simple/customizable solution.
>> > On Wed, Jan 20, 2021 at 5:52 PM, James Timmins <ja...@astronomer.io>
>> wrote:
>> > Django provides a really good model for allowing users to customize the
>> behavior of Class Based Views. It's in line w/ what Daniel/Kaxil and co are
>> saying about a consistent backend class. It uses a standard base class as
>> well as a default concrete implementation. Customization then only requires
>> setting an explicit class if you're overriding the default.
>> > Seems that the interface is more important than the backend mechanism
>> to make this work. There are multiple ways to make this work internally,
>> but the interface should be in line with future plans for hooks/extensible
>> areas.
>> > Just to make things concrete, here's my understanding of what that
>> would look like / what they're suggesting.
>> > BaseTimetable abstract class
>> > - Defines a `get_next_execution_time` method. This method accepts one
>> argument, an arbitrary datetime value. Based on that datetime, this method
>> returns the next time the DAG should start. This makes it easy to schedule
>> past events, and also makes it easy to print out a "dry run" of execution
>> times for testing purposes.
>> > - Defines a `_check_timetable_arguments` method that looks for any
>> existing timetable args in the DAG and makes sure they're used by whatever
>> Timetable class is selected. Error checking.
>> > CronTimetable - Default TimetableClass. Built on BaseTimetable.
>> > If they want a different timetable, they can just extend BaseTimetable
>> and define a custom `get_next_execution_time` method. Then pass the class
>> into the DAG constructor under the `timetable_class` argument. So for
>> `sunset` or `sunrise`, they could easily create a `SolarTimetable` class
>> and pass that in.
>> > `get_next_execution_time` can then be called whenever DAGs are parsed
>> or whenever tasks run.
>> > On Wed, Jan 20, 2021 at 3:53 PM James Coder <jcode...@gmail.com> wrote:
>> > Kaxil you beat me to it. I actually have a dag where I achieve an
>> irregular schedule by overriding DAG.next_dagrun_info(). If that method
>> were swapped out for an object it may be a semi-easy way to make the
>> schedule “pluggable”.
>> >
>> > James Coder
>> > On Jan 20, 2021, at 6:37 PM, Kaxil Naik <kaxiln...@gmail.com> wrote:
>> >
>> > "CronBackend" / "ScheduleIntervalBackend" :D similar to Xcom and
>> Secrets Backend
>> > It would definitely be good to have custom schedule intervals using
>> functions/classes that are serializable too.
>> >
>> > On Wed, Jan 20, 2021 at 11:02 PM QP Hou <q...@scribd.com.invalid> wrote:
>> > On Wed, Jan 20, 2021 at 10:22 AM Daniel Imberman
>> > <daniel.imber...@gmail.com> wrote:
>> > >
>> > > I love the idea of allowing users to create their own scheduling
>> objects/scheduling python functions. They could either live in the
>> scheduler or as a separate process that trips some value in the DB when it
>> is “true”. Would be great from a “marketplace” standpoint as well as users
>> could post their custom scheduling objects for others to use.
>> > >
>> >
>> > I like this idea as well, a quick escape hatch for custom and complex
>> > scheduling behaviors without having to wait for upstream support.
>>
>
