What about aliasing execution_date to period_start, and next_execution_date to period_end? Would this help at all, do we think?
(Though things like ds and ts might still be confusing? This is probably where the OP got the idea for run_stamped from? One step at a time.)

Ash

On 27 September 2018 20:42:07 BST, George Leslie-Waksman <waks...@gmail.com> wrote:
> I would like to challenge the notion that "execution_date" is well documented. Looking at airflow.apache.org right now and searching for all references to "execution_date", I find that the only definition of execution_date is, "The execution date of the DAG". There are some other passing references that imply more but nothing explicit.
>
> From the documentation, as currently published, it seems reasonable to expect some concurrence between "execution_date" and when a dag executes, especially given the heavy repetition of, "execution_date - The execution date of the DAG".
>
> Personally, I think the problem is the word "execution", not which bound is used to label/define an interval. I think this is especially difficult for people coming to Airflow with a cron background who are not necessarily thinking about intervals.
>
> On Thu, Sep 27, 2018 at 11:23 AM Brian Greene <br...@heisenbergwoodworking.com> wrote:
>
>> Second use of “inane” on this subject. Brilliant, less combative response Chris.
>>
>> There’s another point: left bound makes sense to some people, right bound to others.
>>
>> There’s no way to know or measure how “hard” this is to new users, so even if the change was made (new name, use right bound), how can you be sure you’re not actually confusing a LARGER number of new users from that point on?
>>
>> It’s like left-handed versus right-handed people, except there’s no statistical basis for your argument that one group is larger than the other, or that there would actually be a measurable uptick in understanding and usability across the ENTIRE user community.
>>
>> So your proposal 100% breaks backwards compatibility of code AND concept, on anecdotal evidence that it would somehow make usage magically easier?
>>
>> Airflow is like a bulldozer made out of scalpels that can fly (not well, but it’s possible). A slick dag can accomplish a staggering amount of work with the smallest little bit of elegant code. Learning to “think in airflow” though is so, so much more than understanding execution date. That’s barely table stakes in terms of concepts you’ll need to accept to be effective with airflow.
>>
>> Maybe somebody just has a thing against lefties? Some kind of left-bound-thinking conspiracy?
>>
>> Sent from a device with less than stellar autocorrect
>>
>> On Sep 27, 2018, at 12:56 PM, Chris Palmer <ch...@crpalmer.com> wrote:
>>
>>> While taking a step back makes some sense, we also need to identify what the issue is. Simply saying 'execution_date behavior is confusing to new users' isn't good enough. What is confusing about it? Is it what it represents, or just the name itself?
>>>
>>> There are a number of different timestamps that might be of interest, including (but not limited to):
>>>
>>> *Identifying timestamp*
>>> For any time interval, there are two natural choices of timestamps to represent that interval: the left and right bounds. For Airflow the left bound has been chosen, and is called execution_date. For various reasons, I think that makes a much better choice than the right bound.
>>>
>>> *Create/update/delete timestamps*
>>> Timestamps representing when particular database records were created, updated and/or deleted. I don't believe that Airflow currently records these.
>>>
>>> *Runtime timestamps*
>>> The timestamps at which a task or other process started and stopped. Airflow records these for Tasks, but I think the implementation is maybe a little lacking for DagRuns.
>>>
>>> So what's the confusion with execution_date? Is it what it represents or the name itself?
>>>
>>> I think part of the learning curve with Airflow is understanding that execution_date is the left bound of the interval. No matter what name you use for the identifying timestamp, I think new users will need to learn what that choice means. Changing the name won't magically make all the confusion go away.
>>>
>>> While I don't think execution_date is the greatest name in the world, it's a lot better than the suggested alternative run_stamped. Tasks also have an identifying timestamp, and if I saw run_stamped on a Task I would have no idea what it means (stamped by what?).
>>>
>>> While there may be better names than execution_date, I don't think they are so much better that it is worth the effort to overhaul such an integral part of Airflow. Maybe some improvements to the documentation could be made, but nothing so drastic as renaming such a core item.
>>>
>>> As for the second suggestion, to add "a new variable which indicated the actual datetime when the DAG run was generated. call it execution_start_date": it is very unclear what the desired outcome is with this.
>>>
>>> To me "generated" implies creation time, i.e. recorded in the database. However, creation of a DagRun record in the database is a distinct event from when Tasks associated with that DagRun start executing. Plus DagRuns themselves don't actually "run" - Tasks are the only thing that really gets run by Airflow.
>>>
>>> What is actually desired here?
>>> - The right bound of the schedule interval?
>>> - The time the DagRun was created?
>>> - The time that any Tasks associated with a DagRun were first considered by the scheduler?
>>> - The time that any Tasks associated with a DagRun were first scheduled?
>>> - The time that any Tasks associated with a DagRun were actually started by a worker?
>>>
>>> The lack of clarity and completeness around these suggestions, alongside inane declarations like "This name won't cause people to get confused", is hardly a good way to get people to take suggestions seriously.
>>>
>>> Chris
>>>
>>> On Wed, Sep 26, 2018 at 7:37 PM George Leslie-Waksman <waks...@gmail.com> wrote:
>>>
>>>> This comes up a lot. I've seen it on this mailing list multiple times and it's something that I have to explicitly call out to every single person that I've helped train up on Airflow.
>>>>
>>>> If we take a moment to set aside why things are the way they are, what the documentation says, and how experienced users feel things should behave, there still remains the fact that a lot of new users get confused by how "execution_date" works.
>>>>
>>>> Whether it's a problem, whether we need to do something, and what we could do are all separate questions, but I think it's important that we acknowledge and start from:
>>>>
>>>> A lot of new users get confused by how "execution_date" works.
>>>>
>>>> I recognize that some of this is a learning curve issue and some of this is a mindset issue, but it raises the question: do enough users benefit from the current structure to justify the harm to new users?
>>>>
>>>> --George
>>>>
>>>> On Wed, Sep 26, 2018 at 1:40 PM Brian Greene <br...@heisenbergwoodworking.com> wrote:
>>>>
>>>>> It took a minute to grok, but in the larger context of how Airflow works it makes perfect sense the way it is. Changing something so fundamentally breaking to every dag in existence should bring a comparable benefit. Beyond avoiding teaching a concept you disagree with, what benefits does the proposal bring to offset the cost of change?
>>>>>
>>>>> I’m gonna make a meme: “do you even airflow bro?”
>>>>>
>>>>> Sent from a device with less than stellar autocorrect
>>>>>
>>>>> On Sep 26, 2018, at 8:33 AM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>>>>
>>>>>> I think if you have a functional mindset (as in "functional data engineering <https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a>") as opposed to a cron mindset, using the left bound of the time interval makes a lot of sense. Things like your daily table partition keys align with your Airflow execution_date.
>>>>>>
>>>>>> The main thing is that whatever we do we cannot break backwards compatibility. Offering both views (left bound/right bound), as it's been proposed before, either as an environment setting or a user personal preference, is even more confusing to me personally. Users would have to switch context as they help each other or change environments.
>>>>>>
>>>>>> Also note that your intuition may differ from other people's intuition, and that "unlearning" something is way harder than learning something.
>>>>>>
>>>>>> My personal take on this is to make this a rite of passage. This is just one of the many things you have to learn when learning Airflow.
>>>>>>
>>>>>> Max
>>>>>>
>>>>>> On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin <hussam.ela...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Bolke
>>>>>>>
>>>>>>> Speaking as a consultant who is constantly training other teams how to use airflow, I do frequently see this confusion.
>>>>>>>
>>>>>>> Another one is how the batch_date is always batch_date + interval, or as the docs make quite clear:
>>>>>>>
>>>>>>> "*Let’s Repeat That* The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period."
>>>>>>>
>>>>>>> Renaming it would make it simpler for newbies, but essentially they will need to understand how Airflow behaves, execution_date being the batch execution date, not the run_date of the DAG.
>>>>>>>
>>>>>>> I am actually in the process of writing a blog post <https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/> about this, on which I could use people's feedback.
>>>>>>>
>>>>>>> If it helps, I find that explaining how backfills work and why they are important will drive home what the execution_date is :)
>>>>>>>
>>>>>>> Regards
>>>>>>> Sam
>>>>>>>
>>>>>>> On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don't think this makes sense and I don't think that anyone had a real issue with this. Execution date has been clearly documented and is part of the core principles of airflow. Renaming will create more confusion.
>>>>>>>>
>>>>>>>> Please note that I do think that as an anonymous user you cannot speak for any "new airflow user". That is a contradiction to me.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bolke
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>> On 26 Sep 2018, at 07:59, airflowuser <airflowu...@protonmail.com.INVALID> wrote:
>>>>>>>>
>>>>>>>>> One of the most annoying, hard-to-understand and against-all-common-sense things is the execution_date behavior. I assume that any new Airflow user has been struggling with it.
>>>>>>>>> The number of questions with answers referring to https://airflow.apache.org/scheduler.html?scheduling-triggers is uncountable.
>>>>>>>>>
>>>>>>>>> Most people mistakenly think that execution_date is the datetime at which the DAG started to run.
>>>>>>>>>
>>>>>>>>> I suggest the following changes:
>>>>>>>>> 1. Renaming the execution_date to something else like: run_stamped. This name won't cause people to get confused.
>>>>>>>>> 2. Adding a new variable which indicated the actual datetime when the DAG run was generated. call it execution_start_date. People seem to want the information on when the DAG actually started to be executed/run.
>>>>>>>>>
>>>>>>>>> This is only naming changes. No need to actually change the behavior. This will only make things simpler, as when users encounter run_stamped they won't be confused by the name like they are with execution_date.
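The interval semantics argued over in this thread can be modelled in a few lines of plain Python. The thread describes three facts:

- execution_date is the left bound of a schedule interval.
- next_execution_date is the right bound.
- The run itself can only happen after the interval closes.

The sketch below is a simplified, hypothetical model, not Airflow's scheduler:

- `schedule_periods` and `with_period_aliases` are illustrative helpers, not Airflow APIs.
- `period_start` and `period_end` are only the alias names Ash proposes, not existing context keys.
- A plain dict stands in for a task's context.
- Cron expressions, catchup and timezones are ignored.

```python
from datetime import datetime, timedelta


def schedule_periods(start_date, interval, now):
    """Yield (period_start, period_end) for each schedule interval that has
    fully elapsed by `now`, using the left-bound convention: the run for a
    period is labelled with its start but can only happen after its end.

    Hypothetical helper: a plain-datetime model, not Airflow's scheduler code.
    """
    period_start = start_date
    while period_start + interval <= now:
        period_end = period_start + interval
        yield period_start, period_end
        period_start = period_end


def with_period_aliases(context):
    """Return a copy of a context-like dict with the alias keys proposed in
    this thread. 'period_start' and 'period_end' are hypothetical names,
    not keys Airflow provides."""
    aliased = dict(context)
    aliased["period_start"] = context["execution_date"]      # left bound
    aliased["period_end"] = context["next_execution_date"]   # right bound
    return aliased


if __name__ == "__main__":
    # A daily schedule starting 2018-09-26, observed just after midnight on 09-28.
    periods = schedule_periods(datetime(2018, 9, 26), timedelta(days=1),
                               now=datetime(2018, 9, 28, 0, 5))
    for start, end in periods:
        ctx = with_period_aliases({"execution_date": start,
                                   "next_execution_date": end})
        print(f"execution_date={ctx['period_start']:%Y-%m-%d} "
              f"runs no earlier than {ctx['period_end']:%Y-%m-%d %H:%M}")
```

Take a daily schedule starting on 2018-09-26. The period labelled 2018-09-26 cannot run before 2018-09-27 00:00. That gap between the "execution date" and the moment the work actually executes is the source of the confusion described above. The alias only renames the two bounds; it does not move either of them.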