As discussed on Slack, I agree with Malthe that the current timetable interface is complex. But my assessment of the situation and my proposal, which include a bit more context on the plans we had for AIP-39, are a bit different.

TL;DR: I think it is about time to complete what we planned in AIP-39 as "Future Enhancement" and implement a few simple timetables that handle the most popular use cases. They would be built on the existing "complex" timetable API and be available to regular users without the need to write new code. My proposal is that we define which timetables to add and aim to include them in Airflow 2.3. That sounds doable and should solve the real problem of our users.

Assessment of the situation

I do not think the current interface is "too complex". Not at all. But I think it is targeted at a different audience than the one Malthe and Bas talk about. It is addressed to "power users" - not only because it requires a deep understanding of Airflow scheduling internals and optimizations, but also because it requires "admin" rights to develop, test and install a timetable. Regular users who are DAG authors cannot create new Timetables, mostly for security reasons. The "regular users" need to convince the admins to do so. And yes, I am talking about the important segment of our users where professional admins/devops configure Airflow and DAG authors just write DAGs. To be honest, I think this is the biggest and most interesting segment of our users, and IMHO we should always think about it first.

What I very strongly agree with is that we have a very limited "offering" for DAG authors who want to harness the power of our non-cron-based timetables. A typical ask that cannot be easily fulfilled (and I have seen many cases of it) is this one from Friday's Slack discussion:

*"Can someone provide me some codesamples of scheduling a job on the second to last day of every month using timetables?"*
https://apache-airflow.slack.com/archives/CCPRP7943/p1645697960286899

Currently, Airflow out of the box has no way of supporting that (rather typical) use case unless you actually become the "power user with admin rights".
You need to have "someone else" provide it as a plugin that you install. This is what we currently miss, and it was actually already planned as a future enhancement in AIP-39: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-39+Richer+scheduler_interval

What current API provides and for whom

The current API is great for power users who know Airflow's scheduling internals and the optimizations that Ash explained. Looking at where AIP-39 came from, the goals were to:

* have a "versatile API" with which you can implement literally ANY timetable
* where there are no fixed schedule intervals
* where manual runs can co-exist with scheduled runs
* where you can specify a backfill range and it will figure out how many dagruns there will be in this range and run them
* and which allows scheduling decisions to be optimized (the date of the next run is stored in the database for easy DB queries, among others)

I think the current API fulfills those goals very well and is a great "low level" API on which we can build "higher-level" implementations of Custom Timetables. But the current API is terrible for casual DAG authors who want to use non-cron-compatible timetables - both because of its complexity and because of the security limitations.

What can we do?

I think we should design and write a few (literally a few) higher-level timetables meant to be used by "regular" DAG authors without installing anything. Not many. Just a few. We could rather easily ask our users and produce a list of several timetables that do not have the "cron" limitations but handle only a subset of the "general timetable" cases. For example, a Timetable that allows the user to run on the "-2 day of every month" (the second to last day). Those timetables should be available in Airflow out of the box. No package installation and no admin permission necessary. We literally need two or three such schedules, and we should be open to users describing their "typical" non-cron-compliant schedules and add them as needed.

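To make the "-2 day of every month" case concrete, here is a minimal sketch of just the date arithmetic such a timetable would need. It deliberately does not use the Airflow Timetable API: the function names (day_from_month_end, next_run_date, runs_in_range) are hypothetical, and a real implementation would still have to wrap this logic in a Timetable class and deal with time zones and data intervals.

```python
import calendar
from datetime import date, timedelta
from typing import List


def day_from_month_end(year: int, month: int, offset: int) -> date:
    """Return the day counted from the end of the month.

    offset=-1 is the last day, offset=-2 the second to last day, and so on.
    """
    if offset >= 0:
        raise ValueError("offset must be negative, e.g. -2")
    days_in_month = calendar.monthrange(year, month)[1]
    day = days_in_month + offset + 1
    if day < 1:
        raise ValueError("offset reaches before the start of the month")
    return date(year, month, day)


def next_run_date(after: date, offset: int = -2) -> date:
    """Return the first matching date strictly after `after`."""
    candidate = day_from_month_end(after.year, after.month, offset)
    if candidate > after:
        return candidate
    # This month's date has already passed, so move on to the next month.
    if after.month == 12:
        return day_from_month_end(after.year + 1, 1, offset)
    return day_from_month_end(after.year, after.month + 1, offset)


def runs_in_range(start: date, end: date, offset: int = -2) -> List[date]:
    """List all matching dates in the inclusive range, as a backfill would need."""
    runs = []
    current = next_run_date(start - timedelta(days=1), offset)
    while current <= end:
        runs.append(current)
        current = next_run_date(current, offset)
    return runs
```

With something like this hidden behind a built-in timetable, a DAG author would only have to pass the offset; working out the next run and the runs inside a backfill range would stay inside Airflow.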
I do not yet have a clear idea of the "UX/declarative configuration" for such timetables. One thing that comes to mind is that one of them could accept a textual description of the schedule - it would be extremely cool if users could create the schedule like timetable="run on the second to last day of every month". With NLP solutions out there, it should be possible, because the domain of "typical" scheduling is really narrow. Maybe there are some libraries we could use for that :D. But this is just an idea, maybe we can do it differently.

Those are my thoughts :).

J.

On Wed, Feb 23, 2022 at 4:23 PM Malthe <[email protected]> wrote:

> On Wed, 23 Feb 2022 at 15:20, Ash Berlin-Taylor <[email protected]> wrote:
> >
> > On Wed, Feb 23 2022 at 15:17:48 +0000, Malthe <[email protected]> wrote:
> >
> > Backfilling is not out of scope for a timetable at all. If I run
> > `airflow dags backfill mydagid --start-date 2020-01-01 --end-date
> > 2021-06-30` how many DagRuns are created and what are logical
> > dates/intervals of them?
> >
> > If the timetable has a daily frequency, then one dagrun per day in that
> > interval.
> >
> >
> > DAGs don't have a frequency. They have a timetable. They don't even have
> > a scheduler_interval anymore -- that gets converted to an instance of the
> > CronDataIntervalTimetable
>
> Yes, if the timetable has a daily frequency internally – that is, if
> the timetable has a logic that produces dagruns spaced out daily –
> then I would expect one dagrun per day in the given interval.
>
