Changing terms and aliasing may both introduce another set of confusions. Would refining the documentation systematically be a more feasible solution to this sort of issue? For example, we could cover “execution_date” in the “Concepts” section, or add a dedicated section named “Vocabularies” that lists all potentially confusing terms.
Thanks.

XD

On Mon, Oct 1, 2018 at 23:51 Maxime Beauchemin <maximebeauche...@gmail.com> wrote:

> I'm not against aliasing personally.
>
> The downside is that it creates more vocabulary overall and most users will need to learn the mapping of the given aliases at some point in their learning curve anyways. Only users in environments free of `execution_date` will benefit from less confusion, and it's likely that the pre-aliased terms will live on in perpetuity (habit + legacy code).
>
> I'm assuming that the scope of the aliasing would be BaseOperator, the tutorial, examples, the web UI and CLI. If we start using `period_start` in those user-facing locations, it creates a bit of a dissonance with the object naming in the code base and database. Contributors will really need to understand that aliasing, with `period_start` and `execution_date` potentially being used interchangeably in the codebase.
>
> I don't think anyone is pushing for this, but I feel strongly that any campaign to deprecate the original interface would be a giant waste of effort and time and alienate the community as a whole.
>
> Max
>
> On Sun, Sep 30, 2018 at 1:15 AM airflowuser <airflowu...@protonmail.com.invalid> wrote:
>
> > Yep.
> > Aliasing seems a reasonable solution that preserves the structure and makes things simpler for new users.
> >
> > While I agree with everyone that learning a new technology has a learning curve, we can still see more and more technologies embrace the user-friendly flag.
> >
> > Sent with ProtonMail Secure Email.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Saturday, September 29, 2018 9:47 AM, <a...@apache.org> wrote:
> >
> > > What about (aliasing) execution_date to period_start, and next_execution_date to period_end? Would this help any, do we think?
> > >
> > > (Though things like ds and ts might still be confusing? This is probably where the OP got the idea for run_stamped from? One step at a time.)
> > >
> > > Ash
> > >
> > > On 27 September 2018 20:42:07 BST, George Leslie-Waksman waks...@gmail.com wrote:
> > >
> > > > I would like to challenge the notion that "execution_date" is well documented. Looking at airflow.apache.org right now and searching for all references to "execution_date", I find that the only definition of execution_date is, "The execution date of the DAG". There are some other passing references that imply more but nothing explicit.
> > > >
> > > > From the documentation, as currently published, it seems reasonable to expect some concurrence between "execution_date" and when a dag executes, especially given the heavy repetition of, "execution_date - The execution date of the DAG".
> > > >
> > > > Personally, I think the problem is the word "execution", not with which bound is used to label/define an interval. I think this is especially difficult for people coming to Airflow with a cron background who are not necessarily thinking about intervals.
> > > >
> > > > On Thu, Sep 27, 2018 at 11:23 AM Brian Greene <br...@heisenbergwoodworking.com> wrote:
> > > >
> > > > > Second use of “inane” on this subject. Brilliant, less combative response Chris.
> > > > >
> > > > > There’s another point.. left bound makes sense to some people, right bound to others.
> > > > >
> > > > > There’s no way to know or measure how “hard” this is to new users, so even if the change was made - new name, use right bound... how can you be sure you’re not actually confusing a LARGER number of new users from that point on.
> > > > >
> > > > > It’s like left-handed versus right-handed people, except there’s no statistical basis for your argument that one group is larger than the other, or that there would actually be a measurable uptick in understanding and usability across the ENTIRE user community.
> > > > >
> > > > > So your proposal 100% breaks backwards compatibility of code AND concept, on anecdotal evidence that it would somehow make usage magically easier?
> > > > >
> > > > > Airflow is like a bulldozer made out of scalpels that can fly (not well, but it’s possible). A slick dag can accomplish a staggering amount of work with the smallest little bit of elegant code. Learning to “think in airflow” though is so, so much more than understanding execution date. That’s barely table stakes in terms of concepts you’ll need to accept to be effective with airflow.
> > > > >
> > > > > Maybe somebody just has a thing against lefties? Some kind of left-bound-thinking conspiracy?
> > > > >
> > > > > Sent from a device with less than stellar autocorrect
> > > > >
> > > > > On Sep 27, 2018, at 12:56 PM, Chris Palmer ch...@crpalmer.com wrote:
> > > > >
> > > > > > While taking a step back makes some sense, we also need to identify what the issue is. Simply saying 'execution_date behavior is confusing to new users' isn't good enough. What is confusing about it? Is it what it represents, or just the name itself?
> > > > > >
> > > > > > There are a number of different timestamps that might be of interest, including (but not limited to):
> > > > > >
> > > > > > Identifying timestamp
> > > > > > For any time interval, there are two natural choices of timestamps to represent that interval, the left and right bounds.
> > > > > > For Airflow the left bound has been chosen, and is called execution_date. For various reasons, I think that makes a much better choice than the right bound.
> > > > > >
> > > > > > Create/update/delete timestamps
> > > > > > Timestamps representing when particular database records were created, updated and/or deleted. I don't believe that Airflow currently records these.
> > > > > >
> > > > > > Runtime timestamps
> > > > > > The timestamps that a task or other process started and stopped. Airflow records these for Tasks, but I think the implementation is maybe a little lacking for DagRuns.
> > > > > >
> > > > > > So what's the confusion with execution_date? Is it what it represents or the name itself?
> > > > > >
> > > > > > I think part of the learning curve with Airflow is understanding that execution_date is the left bound of the interval. No matter what name you use for the identifying timestamp I think new users will need to learn what that choice means. Changing the name won't magically make all the confusion go away.
> > > > > >
> > > > > > While I don't think execution_date is the greatest name in the world, it's a lot better than the suggested alternative run_stamped. Tasks also have an identifying timestamp, and if I saw run_stamped on a Task I would have no idea what it means (stamped by what?).
> > > > > >
> > > > > > While there may be better names than execution_date, I don't think they are so much better that it is worth the effort to overhaul such an integral part of Airflow. Maybe some improvements to the documentation could be made, but nothing so drastic as renaming such a core item.
> > > > > >
> > > > > > As for the second suggestion to add "a new variable which indicated the actual datetime when the DAG run was generated. call it execution_start_date". It is very unclear what the desired outcome is with this.
> > > > > >
> > > > > > To me "generated" implies creation time, i.e. recorded in the database. However, creation of a DagRun record in the database is a distinct event from when Tasks associated with that DagRun start executing. Plus DagRuns themselves don't actually "run" - Tasks are the only thing that really gets run by Airflow.
> > > > > >
> > > > > > What is actually desired here?
> > > > > >
> > > > > > - The right bound of the schedule interval?
> > > > > > - The time the DagRun was created?
> > > > > > - The time that any Tasks associated with a DagRun were first considered by the scheduler?
> > > > > > - The time that any Tasks associated with a DagRun were first scheduled?
> > > > > > - The time that any Tasks associated with a DagRun were actually started by a worker?
> > > > > >
> > > > > > The lack of clarity and completeness around these suggestions, alongside inane declarations like "This name won't cause people to get confused" is hardly a good way to get people to take suggestions seriously.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > On Wed, Sep 26, 2018 at 7:37 PM George Leslie-Waksman <waks...@gmail.com> wrote:
> > > > > >
> > > > > > > This comes up a lot. I've seen it on this mailing list multiple times and it's something that I have to explicitly call out to every single person that I've helped train up on Airflow.
> > > > > > >
> > > > > > > If we take a moment to set aside why things are the way they are, what the documentation says, and how experienced users feel things should behave; there still remains the fact that a lot of new users get confused by how "execution_date" works.
> > > > > > >
> > > > > > > Whether it's a problem, whether we need to do something, and what we could do are all separate questions but I think it's important that we acknowledge and start from:
> > > > > > >
> > > > > > > A lot of new users get confused by how "execution_date" works.
> > > > > > >
> > > > > > > I recognize that some of this is a learning curve issue and some of this is a mindset issue but it begs the question: do enough users benefit from the current structure to justify the harm to new users?
> > > > > > >
> > > > > > > --George
> > > > > > >
> > > > > > > On Wed, Sep 26, 2018 at 1:40 PM Brian Greene <br...@heisenbergwoodworking.com> wrote:
> > > > > > >
> > > > > > > > It took a minute to grok, but in the larger context of how af works it makes perfect sense the way it is. Changing something so fundamentally breaking to every dag in existence should bring a comparable benefit. Beyond avoiding teaching a concept you disagree with, what benefits does the proposal bring to offset the cost of change?
> > > > > > > >
> > > > > > > > I’m gonna make a meme - “do you even airflow bro?”
> > > > > > > >
> > > > > > > > Sent from a device with less than stellar autocorrect
> > > > > > > >
> > > > > > > > On Sep 26, 2018, at 8:33 AM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I think if you have a functional mindset (as in "functional data engineering" <https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a>) as opposed to a cron mindset, using the left bound of the time interval makes a lot of sense. Things like your daily table partition keys align with your Airflow execution_date.
> > > > > > > > >
> > > > > > > > > The main thing is that whatever we do we cannot break backwards compatibility. Offering both views (left bound/right bound), as it's been proposed before, either as an environment setting or a user personal preference, is even more confusing to me personally.
> > > > > > > > > Users would have to switch context as they help each other or change environments.
> > > > > > > > >
> > > > > > > > > Also note that your intuition may differ from other people's intuition, and that "unlearning" something is way harder than learning something.
> > > > > > > > >
> > > > > > > > > My personal take on this is to make this a rite of passage. This is just one of the many things you have to learn when learning Airflow.
> > > > > > > > >
> > > > > > > > > Max
> > > > > > > > >
> > > > > > > > > On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin hussam.ela...@gmail.com wrote:
> > > > > > > > >
> > > > > > > > > > Hi Bolke
> > > > > > > > > >
> > > > > > > > > > Speaking as a consultant who is constantly training other teams how to use airflow, I do frequently see this confusion.
> > > > > > > > > >
> > > > > > > > > > Another one is how the batch_date is always batch_date + interval, or as the docs make quite clear:
> > > > > > > > > >
> > > > > > > > > > "Let’s Repeat That The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period."
> > > > > > > > > >
> > > > > > > > > > Renaming it would make it simpler for newbies, but essentially they will need to understand how Airflow behaves, execution_date being the batch execution date, not the run_date of the DAG.
> > > > > > > > > >
> > > > > > > > > > I am actually in the process of writing a blog post <https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/> about this, on which I could use people's feedback.
> > > > > > > > > >
> > > > > > > > > > If it helps, I find that explaining how backfills work and why they are important will drive home what the execution_date is :)
> > > > > > > > > >
> > > > > > > > > > Regards
> > > > > > > > > > Sam
> > > > > > > > > >
> > > > > > > > > > On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin bdbr...@gmail.com wrote:
> > > > > > > > > >
> > > > > > > > > > > I don't think this makes sense and I don't think that anyone had a real issue with this. Execution date has been clearly documented and is part of the core principles of airflow. Renaming will create more confusion.
> > > > > > > > > > >
> > > > > > > > > > > Please note that I do think that as an anonymous user you cannot speak for any "new airflow user". That is a contradiction to me.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Bolke
> > > > > > > > > > >
> > > > > > > > > > > Sent from my iPhone
> > > > > > > > > > >
> > > > > > > > > > > On 26 Sep 2018, at 07:59, airflowuser <airflowu...@protonmail.com.INVALID> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > One of the most annoying, hard-to-understand things, against all common sense, is the execution_date behavior. I assume that any new Airflow user has been struggling with it.
> > > > > > > > > > > >
> > > > > > > > > > > > The number of questions with answers referring to https://airflow.apache.org/scheduler.html?scheduling-triggers is uncountable.
> > > > > > > > > > > >
> > > > > > > > > > > > Most people mistakenly think that execution_date is the datetime at which the DAG started to run.
> > > > > > > > > > > >
> > > > > > > > > > > > I suggest the following changes:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Renaming the execution_date to something else like: run_stamped
> > > > > > > > > > > >    This name won't cause people to get confused.
> > > > > > > > > > > > 2. Adding a new variable which indicated the actual datetime when the DAG run was generated. call it execution_start_date. People seem to want the information when the DAG actually started to be executed/run.
> > > > > > > > > > > >
> > > > > > > > > > > > This is only a naming change. No need to actually change the behavior. This will only make things simpler, since when a user encounters run_stamped they won't be confused by the name the way they are by execution_date.
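
For readers skimming the thread, the interval semantics under discussion can be sketched in plain Python. This is only an illustration of the behavior described above (execution_date is the left bound of the interval, and the run is triggered at the right bound); it does not use or reproduce Airflow's scheduler code. `RunWindow` and `run_windows` are hypothetical names made up for this sketch, and `period_start` / `period_end` are the aliases proposed in the thread, not existing Airflow attributes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class RunWindow:
    """One schedule interval (hypothetical helper, not an Airflow class)."""

    execution_date: datetime  # left bound: the identifying timestamp
    period_end: datetime      # right bound: the run cannot start before this

    @property
    def period_start(self) -> datetime:
        # Alias proposed in the thread: same value as execution_date.
        return self.execution_date


def run_windows(start_date: datetime,
                schedule_interval: timedelta,
                now: datetime) -> List[RunWindow]:
    """Return every interval that has fully elapsed by `now`."""
    windows = []
    left = start_date
    # A run only becomes eligible once its whole interval is over.
    while left + schedule_interval <= now:
        windows.append(RunWindow(left, left + schedule_interval))
        left += schedule_interval
    return windows


windows = run_windows(
    start_date=datetime(2018, 9, 26),
    schedule_interval=timedelta(days=1),
    now=datetime(2018, 9, 28, 6, 0),
)
for w in windows:
    print(w.execution_date.date(), "-> earliest start", w.period_end)
```

With a daily interval, the window whose execution_date is 2018-09-26 cannot start before 2018-09-27 00:00, which is the gap between the name and the wall-clock run time that new users report stumbling over.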