For my problem, and the one mentioned earlier for those of us in the financial world dealing with holidays this could be a solid solution. For my example below you could derive DAG and add a max_interval property that is a timedelta and if the delta between dttm and the value coming out of following/previous schedule is greater than that property, return dttm + max_interval. You might actually be able to do it without adding a property and just look at the delta for other runs and derive it from that. For the holidays, one could probably just check if the return value of a super() following/previous schedule is in your holiday list, and then just super following/previous_schedule again until it’s not a holiday.
While these are somewhat orthogonal to whether this PR should be merged, it is a helpful conversation for dealing with funky scheduling logic. Thanks for the idea Max! -James > On Sep 6, 2019, at 1:40 AM, Maxime Beauchemin <maximebeauche...@gmail.com> > wrote: > > Just had a thought and looked a tiny bit at the source code to assess > feasibility, but it seems like you could just derive the DAG class and > override `previous_schedule` and `following_schedule` methods. The > signature of both is you get a `datetime.datetime` and have to return > another. It's pretty easy to put your arbitrarily complex logic in there. > > There may be a few hiccups to sort out things like like > `airflow.utils.dates.date_range` (where duplicated time-step logic exist) > to make sure that all time-step logic aligns with these two methods I just > mentioned, but from that point it could be become the official way to > incorporate funky date-step logic. > > Max > > On Wed, Sep 4, 2019 at 12:54 PM Daniel Standish <dpstand...@gmail.com> > wrote: > >> Re: >> >>> For example, if I need to run a DAG every 20 minutes between 8 AM and 4 >>> PM... >> >> >> This makes a lot of sense! Thank you for providing this example. My >> initial thought of course is "well can't you just set it to run */20 >> between 7:40am and 3:40pm," but I don't think that is possible in cron. >> Which is why you have to do hacky shit as you've said and it indeed sounds >> terrible. I never had to achieve a schedule like this, and yeah -- it >> should not be this hard. >> >> Re: >> >>> I can’t see how adding a property to Dagrun that is essentially >>> identical to next_execution_date would add any benefit. >> >> That's why i was like what the hell is the point of this thing! I thought >> it was just purely cosmetic, so that in effect "execution_date" would >> optionally mean "run_date". >> >> >> >> >> >> >> >>> On Wed, Sep 4, 2019 at 12:10 PM James Coder <jcode...@gmail.com> wrote: >>> >>> I can’t see how adding a property to Dagrun that is essentially >>> identical to next_execution_date would add any benefit. The way I see >>> it the issue at hand here is not the availability of dates. There are >>> plenty of options in the template context for dates before and after >>> execution date. My view point is the problem this is trying to solve >>> is that waiting until the right edge of an interval has passed to >>> schedule a dag run has some shortcomings. Mainly that if your >>> intervals vary in length you are forced to put scheduling logic that >>> should reside in the scheduler in your DAGs. For example, if I need to >>> run a DAG every 20 minutes between 8 AM and 4 PM, in it's current >>> form, the scheduler won't schedule that 4PM run until 8 AM the next >>> day. "Just use next_execution_date" you say, well that's all well and >>> good between 8AM and 3:40 PM, but when 4:01 PM rolls around and you >>> don't have the results because they won't be available until after 8 >>> the next day, that doesn't sound so good, does it? In order to work >>> around this, you have to add additional runs and short circuit >>> operators over and over. It's a hassle. Allowing for scheduling dags >>> at the left edge of an interval and allowing it to behave more like >>> cron, where it runs at the time specified, not schedule + interval, >>> would make things much less complicated for users like myself that >>> can't always wait until the right edge of the interval. >>> >>> >>> James Coder >>> >>>> On Sep 3, 2019, at 11:14 PM, Daniel Standish <dpstand...@gmail.com> >>> wrote: >>>> >>>> What if we merely add a property "run_date" to DagRun? At present >>>> this would be essentially same as "next_execution_date". >>>> >>>> Then no change to scheduler would be required, and no new dag parameter >>> or >>>> config. Perhaps you could add a toggle to the DAGs UI view that lets >> you >>>> choose whether to display "last run" by "run_date" or "execution_date". >>>> >>>> If you want your dags to be parameterized by the date when they meant >> to >>> be >>>> run -- as opposed to their implicit interval-of-interest -- then you >> can >>>> reference "run_date". >>>> >>>> One potential source of confusion with this is backfilling: what does >>>> "run_date" mean in the context of a backfill? You could say it means >>>> essentially "initial run date", i.e. "do not run before date", i.e. >> "run >>>> after date" or "run-at date". So, for a daily, job the 2019-01-02 >>>> "run_date" corresponds to a 2019-01-01 execution_date. This makes >> sense >>>> right? >>>> >>>> Perhaps in the future, the relationship between "run_date" and >>>> "execution_date" can be more dynamic. Perhaps in the future we rename >>>> "execution_date" for clarity, or to be more generic. But it makes >> sense >>>> that a dag run will always have a run date, so it doesn't seem like a >>>> terrible idea to add a property representing this. >>>> >>>> Would this meet the goals of the PR? >>>> >>>> >>>> >>>> >>>> On Wed, Aug 28, 2019 at 11:50 AM James Meickle >>>> <jmeic...@quantopian.com.invalid> wrote: >>>> >>>>> Totally agree with Daniel here. I think that if we implement this >>> feature >>>>> as proposed, it will actively discourage us from implementing a better >>>>> data-aware feature that would remain invisible to most users while >>> neatly >>>>> addressing a lot of edge cases that currently require really ugly >>> hacks. I >>>>> believe that having more data awareness features in Airflow (like the >>> data >>>>> lineage work, or other metadata integrations) is worth investing in if >>> we >>>>> can do it without too much required user-facing complexity. The >> Airflow >>>>> project isn't a full data warehouse suite but it's also not just "cron >>> with >>>>> a UI", so we should try to be pragmatic and fit in power-user features >>>>> where we can do so without compromising the project's overall goals. >>>>> >>>>> On Wed, Aug 28, 2019 at 2:24 PM Daniel Standish <dpstand...@gmail.com >>> >>>>> wrote: >>>>> >>>>>> I am just thinking there is the potential for a more comprehensive >>>>>> enhancement here, and I worry that this is a band-aid that, like all >>> new >>>>>> features has the potential to constrain future options. It does not >>> help >>>>>> us to do anything we cannot already do. >>>>>> >>>>>> The source of this problem is that scheduling and >> interval-of-interest >>>>> are >>>>>> mixed together. >>>>>> >>>>>> My thought is there may be a way to separate scheduling and >>>>>> interval-of-interest to uniformly resolve "execution_date" vs >>> "run_date" >>>>>> confusion. We could make *explicit* instead of *implicit* the >>>>> relationship >>>>>> between run_date *(not currently a concept in airflow)* and >>>>>> "interval-of-interest" *(currently represented by execution_date)*. >>>>>> >>>>>> I also see in this the potential to unlock some other improvements: >>>>>> * support a greater diversity of incremental processes >>>>>> * allow more flexible backfilling >>>>>> * provide better views of data you have vs data you don't. >>>>>> >>>>>> The canonical airflow job is date-partitioned idempotent data pull. >>> Your >>>>>> interval of interest is from execution_date to execution_date + 1 >>>>>> interval. Schedule_interval is not just the scheduling cadence but >> it >>> is >>>>>> also your interval-of-interest partition function. If that doesn't >>> work >>>>>> for your job, you set catchup=False and roll your own. >>>>>> >>>>>> What if there was a way to generalize? E.g. could we allow for more >>>>>> flexible partition function that deviated from scheduler cadence? >> E.g. >>>>>> what if your interval-of-interest partitions could be governed by >> "min >>> 1 >>>>>> day, max 30 days". Then on on-going basis, your daily loads would >> be a >>>>>> range of 1 day but then if server down for couple days, this could be >>>>>> caught up in one task and if you backfill it could be up to 30-day >>>>> batches. >>>>>> >>>>>> Perhaps there is an abstraction that could be used by a greater >>> diversity >>>>>> of incremental processes. Such a thing could support a nice "data >>>>>> contiguity view". I imagine a horizontal bar that is solid where we >>> have >>>>>> the data and empty where we don't. Then you click on a "missing" >>> section >>>>>> and you can trigger a backfill task with that date interval >> according >>> to >>>>>> your partitioning rules. >>>>>> >>>>>> I can imagine using this for an incremental job where each time we >> pull >>>>> the >>>>>> new data since last time; in the `execute` method the operator could >>> set >>>>>> `self.high_watermark` with the max datetime processed. Or maybe a >>>>> callback >>>>>> function could be used to gather this value. This value could be >> used >>> in >>>>>> next run, and cold be depicted in a view. >>>>>> >>>>>> Default intervals of interest could be status quo -- i.e. partitions >>>>> equal >>>>>> to schedule interval -- but could be overwritten using templating or >>>>>> callbacks or setting it during `execute`. >>>>>> >>>>>> So anyway, I don't have a master plan all figured out. But I think >>> there >>>>>> is opportunity in this area for more comprehensive enhancement that >>> goes >>>>>> more directly at the root of the problem. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Aug 27, 2019 at 10:00 AM Maxime Beauchemin < >>>>>> maximebeauche...@gmail.com> wrote: >>>>>> >>>>>>> How about an alternative approach that would introduce 2 new keyword >>>>>>> arguments that are clear (something like, but maybe better than >>>>>>> `period_start_dttm`, `period_end_dttm`) and leave `execution_date` >>>>>>> unchanged, but plan it's deprecation. As a first step >> `execution_date` >>>>>>> would be inferred from the new args, and warn about deprecation when >>>>>> used. >>>>>>> >>>>>>> Max >>>>>>> >>>>>>> On Tue, Aug 27, 2019 at 9:26 AM Bolke de Bruin <bdbr...@gmail.com> >>>>>> wrote: >>>>>>> >>>>>>>> Execution date is execution date for a dag run no matter what. >> There >>>>> is >>>>>>> no >>>>>>>> end interval or start interval for a dag run. The only time this is >>>>>>>> relevant is when we calculate the next or previous dagrun. >>>>>>>> >>>>>>>> So I don't Daniels rationale makes sense (?) >>>>>>>> >>>>>>>> Sent from my iPhone >>>>>>>> >>>>>>>>> On 27 Aug 2019, at 17:40, Philippe Gagnon <philgagn...@gmail.com> >>>>>>> wrote: >>>>>>>>> >>>>>>>>> I agree with Daniel's rationale but I am also worried about >>>>> backwards >>>>>>>>> compatibility as this would perhaps be the most disruptive >> breaking >>>>>>>> change >>>>>>>>> possible. I think maybe we should write down the different options >>>>>>>>> available to us (AIP?) and call for a vote. What does everyone >>>>> think? >>>>>>>>> >>>>>>>>>> On Tue, Aug 27, 2019 at 9:25 AM James Coder <jcode...@gmail.com> >>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Can't execution date can already mean different things depending >>>>> on >>>>>> if >>>>>>>> the >>>>>>>>>> dag run was initiated via the scheduler or manually via command >>>>>>>> line/API? >>>>>>>>>> I agree that making it consistent might make it easier to explain >>>>> to >>>>>>> new >>>>>>>>>> users, but should we exchange that for breaking pretty much every >>>>>>>> existing >>>>>>>>>> dag by re-defining what execution date is? >>>>>>>>>> -James >>>>>>>>>> >>>>>>>>>> On Mon, Aug 26, 2019 at 11:12 PM Daniel Standish < >>>>>>> dpstand...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> To Daniel’s concerns, I would argue this is not a change to >>>>> what a >>>>>>> dag >>>>>>>>>>> run >>>>>>>>>>>> is, it is rather a change to WHEN that dag run will be >>>>> scheduled. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Execution date is part of the definition of a dag_run; it is >>>>>> uniquely >>>>>>>>>>> identified by an execution_date and dag_id. >>>>>>>>>>> >>>>>>>>>>> When someone asks what is a dag_run, we should be able to >> provide >>>>>> an >>>>>>>>>>> answer. >>>>>>>>>>> >>>>>>>>>>> Imagine trying to explain what a dag run is, when execution_date >>>>>> can >>>>>>>> mean >>>>>>>>>>> different things. >>>>>>>>>>> Admin: "A dag run is an execution_date and a dag_id". >>>>>>>>>>> New user: "Ok. Clear as a bell. What's an execution_date?" >>>>>>>>>>> Admin: "Well, it can be one of two things. It *could* be when >>>>>> the >>>>>>>>>> dag >>>>>>>>>>> will be run... but it could *also* be 'the time when dag should >>>>> be >>>>>>> run >>>>>>>>>>> minus one schedule interval". It depends on whether you choose >>>>>> 'end' >>>>>>>> or >>>>>>>>>>> 'start' for 'schedule_interval_edge.' If you choose 'start' >> then >>>>>>>>>>> execution_date means 'when dag will be run'. If you choose >> 'end' >>>>>>> then >>>>>>>>>>> execution_date means 'when dag will be run minus one interval.' >>>>> If >>>>>>> you >>>>>>>>>>> change the parameter after some time, then we don't necessarily >>>>>> know >>>>>>>> what >>>>>>>>>>> it means at all times". >>>>>>>>>>> >>>>>>>>>>> Why would we do this to ourselves? >>>>>>>>>>> >>>>>>>>>>> Alternatively, we can give dag_run a clear, unambiguous meaning: >>>>>>>>>>> * dag_run is dag_id + execution_date >>>>>>>>>>> * execution_date is when dag will be run (notwithstanding >>>>> scheduler >>>>>>>>>> delay, >>>>>>>>>>> queuing) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Execution_date is defined as "run-at date minus 1 interval". >> The >>>>>>>>>>> assumption in this is that you tasks care about this particular >>>>>> date. >>>>>>>>>>> Obviously this makes sense for some tasks but not for others. >>>>>>>>>>> >>>>>>>>>>> I would prop >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Sat, Aug 24, 2019 at 5:08 AM James Coder < >> jcode...@gmail.com >>>>>> >>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I think this is a great improvement and should be merged. To >>>>>>> Daniel’s >>>>>>>>>>>> concerns, I would argue this is not a change to what a dag run >>>>> is, >>>>>>> it >>>>>>>>>> is >>>>>>>>>>>> rather a change to WHEN that dag run will be scheduled. >>>>>>>>>>>> I had implemented a similar change in my own version but >>>>>> ultimately >>>>>>>>>>> backed >>>>>>>>>>>> so I didn’t have to patch after each new release. In my opinion >>>>>> the >>>>>>>>>> main >>>>>>>>>>>> flaw in the current scheduler, and I have brought this up >>>>> before, >>>>>> is >>>>>>>>>> when >>>>>>>>>>>> you don’t have a consistent schedule interval (e.g. only run >>>>> M-F). >>>>>>>>>> After >>>>>>>>>>>> backing out the “schedule at interval start” I had to switch to >>>>> a >>>>>>>> daily >>>>>>>>>>>> schedule and go through and put a short circuit operator in >> each >>>>>> of >>>>>>> my >>>>>>>>>>> M-F >>>>>>>>>>>> dags to get the behavior that I wanted. This results in putting >>>>>>>>>>> scheduling >>>>>>>>>>>> logic inside the dag, when scheduling logic should be in the >>>>>>>> scheduler. >>>>>>>>>>>> >>>>>>>>>>>> -James >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Aug 23, 2019, at 3:14 PM, Daniel Standish < >>>>>> dpstand...@gmail.com >>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Re >>>>>>>>>>>>> >>>>>>>>>>>>>> What are people's feelings on changing the default execution >>>>> to >>>>>>>>>>> schedule >>>>>>>>>>>>>> interval start >>>>>>>>>>>>> >>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>>>> I'm in favor of doing that, but then exposing new variables >> of >>>>>>>>>>>>>> "interval_start" and "interval_end", etc. so that people >> write >>>>>>>>>>>>>> clearer-looking at-a-glance DAGs >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> While I am def on board with the spirit of this PR, I would >>>>> vote >>>>>> we >>>>>>>>>> do >>>>>>>>>>>> not >>>>>>>>>>>>> accept this PR as is, because it cements a confusing option. >>>>>>>>>>>>> >>>>>>>>>>>>> *What is the right representation of a dag run?* >>>>>>>>>>>>> >>>>>>>>>>>>> Right now the representation is "dag run-at date minus 1 >>>>>> interval". >>>>>>>>>> It >>>>>>>>>>>>> should just be "dag run-at date". >>>>>>>>>>>>> >>>>>>>>>>>>> We don't need to address the question of whether execution >> date >>>>>> is >>>>>>>>>> the >>>>>>>>>>>>> start or the end of an interval; it doesn't matter. >>>>>>>>>>>>> >>>>>>>>>>>>> In all cases, a given dag run will be targeted for *some* >>>>> initial >>>>>>>>>>> "run-at >>>>>>>>>>>>> time"; so *that* should be the time that is part of the PK of >> a >>>>>> dag >>>>>>>>>>> run, >>>>>>>>>>>>> and *that *is the time that should be exposed as the dag run >>>>>>>>>> "execution >>>>>>>>>>>>> date" >>>>>>>>>>>>> >>>>>>>>>>>>> *Interval of interest is not a dag_run attribute* >>>>>>>>>>>>> >>>>>>>>>>>>> We also mix in this question of the date interval that the >>>>>> *tasks* >>>>>>>>>> are >>>>>>>>>>>>> interested in. But the *dag run* need not concern itself with >>>>>> this >>>>>>>>>> in >>>>>>>>>>>> any >>>>>>>>>>>>> way. That is for the tasks to figure out: if they happen to >>>>> need >>>>>>>>>> "dag >>>>>>>>>>>>> run-at date," then they can reference that; if they want the >>>>>> prior >>>>>>>>>> one, >>>>>>>>>>>> ask >>>>>>>>>>>>> for the prior one. >>>>>>>>>>>>> >>>>>>>>>>>>> Previously, I was in the camp that thought it was a great idea >>>>> to >>>>>>>>>>> rename >>>>>>>>>>>>> "execution_date" to "period_start" or "interval_start". But I >>>>>> now >>>>>>>>>>> think >>>>>>>>>>>>> this is folly. It invokes this question of the "interval of >>>>>>>>>> interest" >>>>>>>>>>> or >>>>>>>>>>>>> "period of interest". But the dag doesn't need to know >>>>> anything >>>>>>>>>> about >>>>>>>>>>>>> that. >>>>>>>>>>>>> >>>>>>>>>>>>> Within the same dag you may have tasks with different >> intervals >>>>>> of >>>>>>>>>>>>> interest. So why make assumptions in the dag; just give the >>>>>> facts: >>>>>>>>>>> this >>>>>>>>>>>> is >>>>>>>>>>>>> my run date; this is the prior run date, etc. It would be a >>>>>>>>>> regression >>>>>>>>>>>>> from the perspective of providing accurate names. >>>>>>>>>>>>> >>>>>>>>>>>>> *Proposal* >>>>>>>>>>>>> >>>>>>>>>>>>> So, I would propose we change "execution_date" to mean "dag >>>>>> run-at >>>>>>>>>>> date" >>>>>>>>>>>> as >>>>>>>>>>>>> opposed to "dag run-at date minus 1". But we should do so >>>>>> without >>>>>>>>>>>>> reference to interval end or interval start. >>>>>>>>>>>>> >>>>>>>>>>>>> *Configurability* >>>>>>>>>>>>> >>>>>>>>>>>>> The more configuration options we have, the more noise there >> is >>>>>> as >>>>>>> a >>>>>>>>>>> user >>>>>>>>>>>>> trying to understand how to use airflow, so I'd rather us not >>>>>> make >>>>>>>>>> this >>>>>>>>>>>>> configurable at all. >>>>>>>>>>>>> >>>>>>>>>>>>> That said, perhaps a more clear and more explicit means making >>>>>> this >>>>>>>>>>>>> configurable would be to define an integer param >>>>>>>>>>>>> "dag_run_execution_date_interval_offset", which would control >>>>> how >>>>>>>>>> many >>>>>>>>>>>>> intervals back from actual "dag run-at date" the "execution >>>>> date" >>>>>>>>>>> should >>>>>>>>>>>>> be. (current behavior = 1, new behavior = 0). >>>>>>>>>>>>> >>>>>>>>>>>>> *Side note* >>>>>>>>>>>>> >>>>>>>>>>>>> Hopefully not to derail discussion: I think there are >>>>> additional, >>>>>>>>>>> related >>>>>>>>>>>>> task attributes that may want to come into being: namely, >>>>>>>>>> low_watermark >>>>>>>>>>>> and >>>>>>>>>>>>> high_watermark. There is the potential, with attributes like >>>>>> this, >>>>>>>>>> for >>>>>>>>>>>>> adding better out-of-the-box support for common data workflows >>>>>> that >>>>>>>>>> we >>>>>>>>>>>> now >>>>>>>>>>>>> need to use xcom for, namely incremental loads. But I want to >>>>>> give >>>>>>>>>> it >>>>>>>>>>>> more >>>>>>>>>>>>> thought before proposing anything specific. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Aug 23, 2019 at 9:42 AM Jarek Potiuk < >>>>>>>>>> jarek.pot...@polidea.com >>>>>>>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Good one Damian. I will have a list of issues that can be >>>>>> possible >>>>>>>>>> to >>>>>>>>>>>>>> handle at the workshop, so that one goes there. >>>>>>>>>>>>>> >>>>>>>>>>>>>> J. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Principal Software Engineer >>>>>>>>>>>>>> Phone: +48660796129 >>>>>>>>>>>>>> >>>>>>>>>>>>>> pt., 23 sie 2019, 11:09 użytkownik Shaw, Damian P. < >>>>>>>>>>>>>> damian.sha...@credit-suisse.com> napisał: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I can't understate what a conceptual improvement this would >>>>> be >>>>>>> for >>>>>>>>>>> the >>>>>>>>>>>>>> end >>>>>>>>>>>>>>> users of Airflow in our environment. I've written a lot of >>>>> code >>>>>>> so >>>>>>>>>>> all >>>>>>>>>>>>>> our >>>>>>>>>>>>>>> configuration works like this anyway. But the UI still shows >>>>>> the >>>>>>>>>>>> Airflow >>>>>>>>>>>>>>> dates which still to this day sometimes confuse me. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'll be at the NY meet ups on Monday and Tuesday, maybe some >>>>> of >>>>>>> my >>>>>>>>>>>> first >>>>>>>>>>>>>>> PRs could be additional test cases around edge cases to do >>>>> with >>>>>>> DST >>>>>>>>>>> and >>>>>>>>>>>>>>> cron scheduling that I have concerns about :) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Damian >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>>>> From: Ash Berlin-Taylor [mailto:a...@apache.org] >>>>>>>>>>>>>>> Sent: Friday, August 23, 2019 6:50 AM >>>>>>>>>>>>>>> To: dev@airflow.apache.org >>>>>>>>>>>>>>> Subject: Setting to add choice of schedule at end or >> schedule >>>>>> at >>>>>>>>>>> start >>>>>>>>>>>> of >>>>>>>>>>>>>>> interval >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This has come up a few times before, someone has now opened >> a >>>>>> PR >>>>>>>>>> that >>>>>>>>>>>>>>> makes this a global+per-dag setting: >>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/5787 and it also >>>>>> includes >>>>>>>>>>> docs >>>>>>>>>>>>>>> that I think does a good job of illustrating the two modes. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Does anyone object to this being merged? If no one says >>>>>> anything >>>>>>> by >>>>>>>>>>>>>> midday >>>>>>>>>>>>>>> on Tuesday I will take that as assent and will merge it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The docs from the PR included below. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Ash >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Scheduled Time vs Execution Time >>>>>>>>>>>>>>> '''''''''''''''''''''''''''''''' >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> A DAG with a ``schedule_interval`` will execute once per >>>>>>> interval. >>>>>>>>>> By >>>>>>>>>>>>>>> default, the execution of a DAG will occur at the **end** of >>>>>> the >>>>>>>>>>>>>>> schedule interval. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> A few examples: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - A DAG with ``schedule_interval='@hourly'``: The DAG run >>>>> that >>>>>>>>>>>> processes >>>>>>>>>>>>>>> 2019-08-16 17:00 will start running just after 2019-08-16 >>>>>>> 17:59:59, >>>>>>>>>>>>>>> i.e. once that hour is over. >>>>>>>>>>>>>>> - A DAG with ``schedule_interval='@daily'``: The DAG run >> that >>>>>>>>>>> processes >>>>>>>>>>>>>>> 2019-08-16 will start running shortly after 2019-08-17 >> 00:00. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The reasoning behind this execution vs scheduling behaviour >>>>> is >>>>>>> that >>>>>>>>>>>>>>> data for the interval to be processed won't be fully >>>>> available >>>>>>>>>> until >>>>>>>>>>>>>>> the interval has elapsed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In cases where you wish the DAG to be executed at the >>>>> **start** >>>>>>> of >>>>>>>>>>> the >>>>>>>>>>>>>>> interval, specify ``schedule_at_interval_end=False``, either >>>>> in >>>>>>>>>>>>>>> ``airflow.cfg``, or on a per-DAG basis. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> =============================================================================== >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Please access the attached hyperlink for an important >>>>>> electronic >>>>>>>>>>>>>>> communications disclaimer: >>>>>>>>>>>>>>> >>>>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> =============================================================================== >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >>