To (slightly) hijack this thread:

On the subject of execution_date: as I'm sure we're all aware, the concept of 
execution_date is confusing to newcomers to Airflow (there are many questions 
like "Why hasn't my DAG run yet?", "Why is my DAG a day behind?", etc.) and, 
although we mention this in the docs, it remains a confusing concept.

What do people think about adding two new parameters, `period_start` and 
`period_end`, and making these the preferred terms in place of execution_date 
and next_execution_date?

This hopefully avoids any ambiguous terms like "execution" or "run", which are 
understandably easy to conflate with the time the task is actually being run 
(i.e. `now()`).
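
To make the proposed names concrete, here is a rough sketch of what they would 
mean for a daily DAG. This is not Airflow API: `period_bounds` is a made-up 
helper, and I'm assuming a fixed `timedelta` schedule (a cron schedule would 
need more logic to find the next tick).

```python
from datetime import datetime, timedelta


def period_bounds(execution_date, schedule_interval):
    """Hypothetical helper: map execution_date to the proposed names.

    Assumes a fixed timedelta schedule_interval.
    """
    period_start = execution_date                    # today's execution_date
    period_end = execution_date + schedule_interval  # today's next_execution_date
    return period_start, period_end


period_start, period_end = period_bounds(datetime(2019, 4, 8), timedelta(days=1))
print(period_start)  # 2019-04-08 00:00:00
print(period_end)    # 2019-04-09 00:00:00
```

The run covering this period is only scheduled once `period_end` has passed, 
which is why a daily DAG looks "a day behind" when you compare it to `now()`.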

If people think this naming is better and less confusing, I would suggest we 
update all the docs and examples to use these terms (but still mention the old 
names somewhere, probably in the macros docs).

Thoughts?

-ash


> On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com> wrote:
> 
> Hi Bas,
> 
> 1) I am aware of a few places where those parameters are used in production
> in a few hundred jobs. I highly recommend we don't deprecate them unless we
> do it in a major version.
> 
> 2) As James mentioned, inlets and outlets are a lineage annotation feature
> which is still under development. Let's leave them in, but we can guard
> them behind a feature flag if you prefer.
> 
> 3) the yesterday*/tomorrow* params are convenience ones if you use a daily
> ETL. I agree with you that they are simple to compute, but not everyone
> using Apache Airflow is amazing with Python. Some users are only trying to
> get a query scheduled and rely on a couple of niceties like these to get by.
> 
> 4) latest_date, end_date (I feel like there used to be start_date, but
> maybe it got lost) were a blend of things which were used by a backfill
> framework used internally at Airbnb. Latest date was used if you needed to
> join to a dimension for which you only wanted the latest version of the
> attributes in your backfill. end_date was used for time ranges where several
> days were processed together in a range to save on compute. I don't see an
> issue with removing them.
> 
> Best regards,
> Arthur
> 
> 
> 
> On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <basharens...@godatadriven.com>
> wrote:
> 
>> Hi all,
>> 
>> Following Tao Feng’s question to discuss this PR<
>> https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192<
>> https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss here
>> if you agree/disagree/would change.
>> 
>> -----------
>> 
>> The summary of the PR:
>> 
>> I was confused by the task context values and suggest to clean up and
>> clarify these variables. Some are derivations from other variables, some
>> are undocumented and unused, some are wrong (name doesn’t match the value).
>> Please discuss what you think of the removal of these variables:
>> 
>> 
>>  *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>> tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful
>> since these require complex logic to compute the next execution date,
>> however would leave computing the yesterday* and tomorrow* variables up to
>> the user since they are simple one-liners and don't relate to the DAG
>> interval.
>>  *   Removed tables. This is a field in params, and is thus also
>> accessible by the user ({{ params.tables }}). Also, it was undocumented.
>>  *   Removed latest_date. It's the same as ds and was also undocumented.
>>  *   Removed inlets and outlets. Also undocumented, and have the
>> inlets/outlets ever worked/ever been used by anybody?
>>  *   Removed end_date and END_DATE. Both have the same value, so it
>> doesn't make sense to have both variables. Also, the value is ds which
>> contains the start date of the interval, so the naming didn't make sense to
>> me. However, if anybody argues in favour of adding "start_date" and
>> "end_date" to provide the start and end datetime of task instance
>> intervals, I'd be happy to add them.
>> 
>> Cheers,
>> Bas
>> 
