To (slightly) hijack this thread, on the subject of execution_date: as I'm sure we're all aware, the concept of execution_date is confusing to newcomers to Airflow (there are many questions like "Why hasn't my DAG run yet?", "Why is my DAG a day behind?", etc.), and although we mention this in the docs it remains a confusing concept.
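To make the confusion concrete, here is a minimal sketch assuming a plain daily schedule. The helper `daily_run_window` is hypothetical and not part of Airflow; it only illustrates that execution_date marks the start of the period a run covers, while the run itself cannot start before that period has ended.

```python
from datetime import datetime, timedelta


def daily_run_window(execution_date):
    """Return (start, end) of the period a daily run covers.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    interval = timedelta(days=1)  # assumption: the schedule interval is one day
    return execution_date, execution_date + interval


execution_date = datetime(2019, 4, 8)
start, end = daily_run_window(execution_date)

# The run stamped 2019-04-08 is only triggered once its period is over,
# i.e. no earlier than 2019-04-09. That gap is the "day behind".
print(f"execution_date={start:%Y-%m-%d}, earliest actual start={end:%Y-%m-%d}")
```

So a newcomer looking at the clock on 2019-04-08 and expecting a run stamped with that date will not see one until the following day.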
What do people think about adding two new parameters, `period_start` and `period_end`, and making these the preferred terms in place of execution_date and next_execution_date? This hopefully avoids ambiguous terms like "execution" or "run", which are understandably easy to conflate with the time the task is being run (i.e. `now()`).

If people think this naming is better and less confusing, I would suggest we update all the docs and examples to use these terms (but still mention the old names somewhere, probably in the macros docs).

Thoughts?

-ash

> On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com> wrote:
>
> Hi Bas,
>
> 1) I am aware of a few places where those parameters are used in production
> in a few hundred jobs. I highly recommend we don't deprecate them unless we
> do it in a major version.
>
> 2) As James mentioned, inlets and outlets are a lineage annotation feature
> which is still under development. Let's leave them in, but we can guard
> them behind a feature flag if you prefer.
>
> 3) the yesterday*/tomorrow* params are convenience ones if you use a daily
> ETL. I agree with you that they are simple to compute, but not everyone
> using Apache Airflow is amazing with Python. Some users are only trying to
> get a query scheduled and rely on a couple of niceties like these to get by.
>
> 4) latest_date, end_date (I feel like there used to be start_date, but
> maybe it got lost) were a blend of things which were used by a backfill
> framework used internally at Airbnb. Latest date was used if you needed to
> join to a dimension for which you only wanted the latest version of the
> attributes in your backfill. end_date was used for time ranges where several
> days were processed together in a range to save on compute. I don't see an
> issue with removing them.
>
> Best regards,
> Arthur
>
>
>
> On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <basharens...@godatadriven.com>
> wrote:
>
>> Hi all,
>>
>> Following Tao Feng’s question to discuss this PR
>> <https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192
>> <https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss here
>> if you agree/disagree/would change.
>>
>> -----------
>>
>> The summary of the PR:
>>
>> I was confused by the task context values and suggest to clean up and
>> clarify these variables. Some are derivations from other variables, some
>> are undocumented and unused, some are wrong (name doesn’t match the value).
>> Please discuss what you think of the removal of these variables:
>>
>>
>> * Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>>   tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful
>>   since these require complex logic to compute the next execution date,
>>   however would leave computing the yesterday* and tomorrow* variables up to
>>   the user since they are simple one-liners and don't relate to the DAG
>>   interval.
>> * Removed tables. This is a field in params, and is thus also
>>   accessible by the user ({{ params.tables }}). Also, it was undocumented.
>> * Removed latest_date. It's the same as ds and was also undocumented.
>> * Removed inlets and outlets. Also undocumented, and have the
>>   inlets/outlets ever worked/ever been used by anybody?
>> * Removed end_date and END_DATE. Both have the same value, so it
>>   doesn't make sense to have both variables. Also, the value is ds which
>>   contains the start date of the interval, so the naming didn't make sense to
>>   me. However, if anybody argues in favour of adding "start_date" and
>>   "end_date" to provide the start and end datetime of task instance
>>   intervals, I'd be happy to add them.
>>
>> Cheers,
>> Bas
>>
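
For reference, a rough sketch of how the yesterday*/tomorrow* values discussed in the quoted thread could be derived from ds in plain Python, assuming ds is a YYYY-MM-DD string. The helper `shift_ds` is hypothetical and only for illustration; inside a template, the existing `macros.ds_add(ds, -1)` macro should give the same result for the dashed form.

```python
from datetime import datetime, timedelta


def shift_ds(ds, days):
    """Shift a YYYY-MM-DD date string by a number of days.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    shifted = datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)
    return shifted.strftime("%Y-%m-%d")


ds = "2019-04-08"  # example value of the ds context variable

yesterday_ds = shift_ds(ds, -1)
tomorrow_ds = shift_ds(ds, 1)

# The *_nodash variants are the same dates with the dashes stripped.
yesterday_ds_nodash = yesterday_ds.replace("-", "")
tomorrow_ds_nodash = tomorrow_ds.replace("-", "")
```

Note that these are calendar-day offsets from ds and are unrelated to the DAG's schedule interval, which is the distinction Bas draws against the next_* and previous_* variables.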