This comes up a lot. I've seen it on this mailing list multiple times and it's something that I have to explicitly call out to every single person that I've helped train up on Airflow.
If we take a moment to set aside why things are the way they are, what
the documentation says, and how experienced users feel things should
behave, there still remains the fact that a lot of new users get
confused by how "execution_date" works. Whether it's a problem, whether
we need to do something, and what we could do are all separate
questions, but I think it's important that we acknowledge and start
from this: a lot of new users get confused by how "execution_date"
works.

I recognize that some of this is a learning curve issue and some of
this is a mindset issue, but it raises the question: do enough users
benefit from the current structure to justify the harm to new users?

--George

On Wed, Sep 26, 2018 at 1:40 PM Brian Greene <br...@heisenbergwoodworking.com> wrote:

> It took a minute to grok, but in the larger context of how af works it
> makes perfect sense the way it is. Changing something so fundamentally
> breaking to every dag in existence should bring a comparable benefit.
> Beyond the avoiding teaching a concept you disagree with, what benefits
> does the proposal bring to offset the cost of change?
>
> I’m gonna make a meme - “do you even airflow bro?”
>
> Sent from a device with less than stellar autocorrect
>
> > On Sep 26, 2018, at 8:33 AM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> >
> > I think if you have a functional mindset (as in "functional data
> > engineering
> > <https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a>")
> > as opposed to a cron mindset, using the left bound of the time
> > interval makes a lot of sense. Things like your daily table partition
> > keys align with your Airflow execution_date.
> >
> > The main thing is that whatever we do we cannot break backwards
> > compatibility. Offering both views (left bound/right bound), as it's
> > been proposed before, either as an environment setting or a user
> > personal preference is even more confusing to me personally. Users
> > would have to switch context as they help each other or change
> > environments.
> >
> > Also note that your intuition may differ from other people's
> > intuition, and that "unlearning" something is way harder than
> > learning something.
> >
> > My personal take on this is to make this a rite of passage. This is
> > just one of the many thing you have to learn when learning Airflow.
> >
> > Max
> >
> >> On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin <hussam.ela...@gmail.com> wrote:
> >>
> >> Hi Bolke
> >>
> >> Speaking as a consultant who is constantly training other teams how
> >> to use airflow, I do frequently see this confusion.
> >> Another one is how the batch_date is always batch_date + interval or
> >> as the docs make it quite clear
> >>
> >> "*Let’s Repeat That* The scheduler runs your job one
> >> schedule_interval AFTER the start date, at the END of the period."
> >>
> >> Renaming it would make it simpler for newbies, but essentially they
> >> will need to understand how Airflow behaves, execution_date being
> >> the batch execution date not the run_date of the DAG
> >>
> >> I am actually in the process of writing a blog post
> >> <https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/>
> >> about this which I could use peoples feedback
> >>
> >> If it helps, I find that explaining how backfills work and why they
> >> are important will drive home what the execution_date is :)
> >>
> >> Regards
> >> Sam
> >>
> >>> On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>
> >>> I dont think this makes sense and I dont that think anyone had a
> >>> real issue with this. Execution date has been clearly documented
> >>> and is part of the core principles of airflow. Renaming will create
> >>> more confusion.
> >>>
> >>> Please note that I do think that as an anonymous user you cannot
> >>> speak for any "new airflow user". That is a contradiction to me.
> >>>
> >>> Thanks
> >>> Bolke
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 26 Sep 2018, at 07:59, airflowuser <airflowu...@protonmail.com.INVALID> wrote:
> >>>>
> >>>> One of the most annoying, hard to understand and against all
> >>>> common sense is the execution_date behavior. I assume that any new
> >>>> Airflow user has been struggling with it.
> >>>> The amount of questions with answers referring to :
> >>>> https://airflow.apache.org/scheduler.html?scheduling-triggers is
> >>>> uncountable.
> >>>>
> >>>> Most people mistakenly think that execution_date is the datetime
> >>>> which the DAG started to run.
> >>>>
> >>>> I suggest the following changes:
> >>>> 1. Renaming the execution_date to something else like: run_stamped
> >>>> This name won't cause people to get confused.
> >>>> 2. Adding a new variable which indicated the actual datetime when
> >>>> the DAG run was generated. call it execution_start_date. People
> >>>> seem to want the information when the DAG actually started to be
> >>>> executed/run.
> >>>>
> >>>> This is only naming changes. No need to actual change the
> >>>> behavior - This will only make things simpler as when user
> >>>> encounter run_stamped he won't be confused by the name like
> >>>> execution_date
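
For anyone reading this thread without the scheduler docs at hand, here is a minimal sketch of the behavior being debated, in plain Python with no Airflow imports. It assumes a fixed schedule_interval: execution_date marks the start (left bound) of the interval, and the run can be triggered only once that interval has ended. The function name describe_run and the dictionary keys are hypothetical, chosen for illustration; they are not Airflow APIs.

```python
from datetime import datetime, timedelta


def describe_run(execution_date, schedule_interval):
    """Illustrative only: relate execution_date to the interval it labels.

    Assumes a fixed schedule_interval. This is not Airflow code; the
    names here are made up for the example.
    """
    interval_end = execution_date + schedule_interval
    return {
        # execution_date is the left bound of the interval being processed
        "execution_date": execution_date,
        "interval_start": execution_date,
        "interval_end": interval_end,
        # the scheduler waits until the interval has ended before running
        "earliest_trigger": interval_end,
    }


run = describe_run(datetime(2018, 9, 25), timedelta(days=1))
print("execution_date:  ", run["execution_date"])
print("earliest trigger:", run["earliest_trigger"])
```

With a daily interval, the run labeled 2018-09-25 cannot start before 2018-09-26 00:00, which is the gap between "execution_date" and "the datetime which the DAG started to run" that the original post describes.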