(trying to break this out into another thread)

The mailing list doesn't allow images, but I'd guess that it is the deps section
of a task instance details screen?

I'm not saying it's not clear once you know to look there, but I'm trying to
remove/reduce the confusion in the first place. And I think we as committers
aren't best placed to know what makes sense, as we have internalised how Airflow
works :)

So I guess this is a question to the newest people on the list: would
`period_start` and `period_end` have been more or less confusing for you when
you were first getting started with Airflow?
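
To make the question concrete, here is a rough sketch of what the two names would mean for a daily schedule. It is plain Python rather than Airflow code: `period_bounds` is a hypothetical helper, and it assumes a fixed-length schedule interval, so that `period_start` plays the role of today's execution_date and `period_end` the role of next_execution_date.

```python
from datetime import datetime, timedelta


def period_bounds(execution_date, interval):
    """Return (period_start, period_end) for a single run.

    Hypothetical helper, not part of Airflow. Assumes a fixed-length
    schedule interval.
    """
    period_start = execution_date            # what is called execution_date today
    period_end = execution_date + interval   # what is called next_execution_date today
    return period_start, period_end


start, end = period_bounds(datetime(2019, 4, 8), timedelta(days=1))
# The run covering 8 April is only triggered once that period has ended,
# i.e. on or after 9 April, which is the usual "why is my DAG a day behind?".
print(start.date(), end.date())
```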

-ash

> On 9 Apr 2019, at 14:47, Driesprong, Fokko <fo...@driesprong.frl> wrote:
> 
> Ash,
> 
> Personally, I think this is quite clear, there is a list of reasons why the 
> job isn't being scheduled:
> 
> 
> Coming back to the question of Bas, I believe that yesterday_ds does not make 
> sense since we cannot assume that the schedule is daily. I don't see any 
> usage of this variable. Personally, I do use next_execution_date quite 
> extensively. Say you have a job that runs daily, but you want to change it 
> to an hourly job. In that case you don't want to change {{ (execution_date 
> + macros.timedelta(days=1)) }} to {{ (execution_date + 
> macros.timedelta(hours=1)) }} everywhere.
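
A minimal sketch of the difference described above, in plain Python rather than as a rendered template. Both function names are made up for illustration, and the second one only approximates next_execution_date by adding a fixed interval; Airflow derives the real value from the DAG's schedule.

```python
from datetime import datetime, timedelta


def hardcoded_window_end(execution_date):
    # Mirrors {{ execution_date + macros.timedelta(days=1) }}:
    # the daily assumption is baked into every template that uses it.
    return execution_date + timedelta(days=1)


def schedule_window_end(execution_date, schedule_interval):
    # Illustrative stand-in for next_execution_date: the end of the window
    # follows whatever interval the DAG is scheduled with.
    return execution_date + schedule_interval


execution_date = datetime(2019, 4, 9, 12, 0)
hourly = timedelta(hours=1)

# After switching the DAG from daily to hourly, the hard-coded version
# still spans a full day, while the schedule-derived one spans one hour.
stale_end = hardcoded_window_end(execution_date)
correct_end = schedule_window_end(execution_date, hourly)
print(stale_end, correct_end)
```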
> 
> I'm just not sure if the aggressive deprecation of these variables is really 
> worth it. I don't see too much harm in letting them stay.
> 
> Cheers, Fokko 
> 
> On Tue 9 Apr 2019 at 12:17, Ash Berlin-Taylor wrote <a...@apache.org 
> <mailto:a...@apache.org>>:
> To (slightly) hijack this thread:
> 
> On the subject of execution_date: as I'm sure we're all aware, the concept of 
> execution_date is confusing to newcomers to Airflow (there are many 
> questions like "Why hasn't my DAG run yet?", "Why is my DAG a day behind?", 
> etc.), and although we mention this in the docs it remains a confusing concept.
> 
> What do people think about adding two new parameters, `period_start` and 
> `period_end`, and making these the preferred terms in place of execution_date 
> and next_execution_date?
> 
> This hopefully avoids any ambiguous terms like "execution" or "run", which are 
> understandably easy to conflate with the time the task is being run (i.e. 
> `now()`).
> 
> If people think this naming is better and less confusing I would suggest we 
> update all the docs and examples to use these terms (but still mention the 
> old names somewhere, probably in the macros docs)
> 
> Thoughts?
> 
> -ash
> 
> 
> > On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com 
> > <mailto:arthur.wied...@gmail.com>> wrote:
> > 
> > Hi Bas,
> > 
> > 1) I am aware of a few places where those parameters are used in production
> > in a few hundred jobs. I highly recommend we don't deprecate them unless we
> > do it in a major version.
> > 
> > 2) As James mentioned, inlets and outlets are a lineage annotation feature
> > which is still under development. Let's leave them in, but we can guard
> > them behind a feature flag if you prefer.
> > 
> > 3) the yesterday*/tomorrow* params are convenience ones if you use a daily
> > ETL. I agree with you that they are simple to compute, but not everyone
> > using Apache Airflow is amazing with Python. Some users are only trying to
> > get a query scheduled and rely on a couple of niceties like these to get by.
> > 
> > 4) latest_date, end_date (I feel like there used to be start_date, but
> > maybe it got lost) were a blend of things which were used by a backfill
> > framework used internally at Airbnb. Latest date was used if you needed to
> > join to a dimension for which you only wanted the latest version of the
> > attributes in your backfill. end_date was used for time ranges where several
> > days were processed together in a range to save on compute. I don't see an
> > issue with removing them.
> > 
> > Best regards,
> > Arthur
> > 
> > 
> > 
> > On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <basharens...@godatadriven.com 
> > <mailto:basharens...@godatadriven.com>>
> > wrote:
> > 
> >> Hi all,
> >> 
> >> Following Tao Feng’s question to discuss this PR
> >> <https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192
> >> <https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss here
> >> if you agree/disagree/would change.
> >> 
> >> -----------
> >> 
> >> The summary of the PR:
> >> 
> >> I was confused by the task context values and suggest cleaning up and
> >> clarifying these variables. Some are derived from other variables, some
> >> are undocumented and unused, some are wrong (name doesn’t match the value).
> >> Please discuss what you think of the removal of these variables:
> >> 
> >> 
> >>  *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
> >> tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful
> >> since complex logic is required to compute the next execution date;
> >> however, I would leave computing the yesterday* and tomorrow* variables up
> >> to the user since they are simple one-liners and don't relate to the DAG
> >> interval.
> >>  *   Removed tables. This is a field in params, and is thus also
> >> accessible by the user ({{ params.tables }}). Also, it was undocumented.
> >>  *   Removed latest_date. It's the same as ds and was also undocumented.
> >>  *   Removed inlets and outlets. Also undocumented, and have the
> >> inlets/outlets ever worked/ever been used by anybody?
> >>  *   Removed end_date and END_DATE. Both have the same value, so it
> >> doesn't make sense to have both variables. Also, the value is ds which
> >> contains the start date of the interval, so the naming didn't make sense to
> >> me. However, if anybody argues in favour of adding "start_date" and
> >> "end_date" to provide the start and end datetime of task instance
> >> intervals, I'd be happy to add them.
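
For reference, the "simple one-liners" for the yesterday*/tomorrow* values could look roughly like the sketch below. It uses plain Python with an example date standing in for the task's execution_date; inside a template, something like `{{ macros.ds_add(ds, -1) }}` should give the same result for yesterday_ds.

```python
from datetime import datetime, timedelta

# Example value standing in for the execution_date from the task context.
execution_date = datetime(2019, 4, 8)

ds = execution_date.strftime("%Y-%m-%d")

# One calendar day either side of execution_date, regardless of the DAG interval.
yesterday_ds = (execution_date - timedelta(days=1)).strftime("%Y-%m-%d")
tomorrow_ds = (execution_date + timedelta(days=1)).strftime("%Y-%m-%d")

# The *_nodash variants are the same dates without separators.
yesterday_ds_nodash = yesterday_ds.replace("-", "")
tomorrow_ds_nodash = tomorrow_ds.replace("-", "")

print(ds, yesterday_ds, tomorrow_ds, yesterday_ds_nodash, tomorrow_ds_nodash)
```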
> >> 
> >> Cheers,
> >> Bas
> >> 
> 
