Ash,

Personally, I think this is quite clear. There is a list of reasons why the
job isn't being scheduled:
[image: image.png]

Coming back to Bas's question, I believe that yesterday_ds does not make
sense, since we cannot assume that the schedule is daily. I don't see any
usage of this variable. Personally, I do use next_execution_date quite
extensively. Suppose you have a job that runs daily and you want to change
it to an hourly job. In that case you don't want to have to change {{
(execution_date + macros.timedelta(days=1)) }} to {{ (execution_date +
macros.timedelta(hours=1)) }} everywhere.
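
To illustrate, here is a minimal sketch in plain Python (no Airflow import).
The helper next_execution_date() and the fixed schedule_interval timedelta
are assumptions made for this example; they only mimic, for a fixed-interval
schedule, what Airflow derives from the DAG's schedule.

```python
from datetime import datetime, timedelta


def next_execution_date(execution_date: datetime,
                        schedule_interval: timedelta) -> datetime:
    """Hypothetical stand-in for the value Airflow exposes as
    next_execution_date, assuming a fixed-interval schedule."""
    return execution_date + schedule_interval


execution_date = datetime(2019, 4, 9)

# Hard-coded offset, as in {{ execution_date + macros.timedelta(days=1) }}.
# This stays at +1 day even after the schedule changes.
hardcoded_end = execution_date + timedelta(days=1)

# Derived from the schedule: follows the DAG when it moves to hourly.
daily_end = next_execution_date(execution_date, timedelta(days=1))
hourly_end = next_execution_date(execution_date, timedelta(hours=1))

print(hardcoded_end, daily_end, hourly_end)
```

With a daily schedule both approaches agree. After the switch to hourly, only
the schedule-derived value moves, while every template with the hard-coded
offset has to be edited by hand.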

I'm just not sure if the aggressive deprecation of these variables is really
worth it. I don't see too much harm in letting them stay.

Cheers, Fokko

On Tue, 9 Apr 2019 at 12:17, Ash Berlin-Taylor <a...@apache.org> wrote:

> To (slightly) hijack this thread:
>
> On the subject of execution_date: as I'm sure we're all aware, the concept
> of execution_date is confusing to newcomers to Airflow (there are many
> questions about "Why hasn't my DAG run yet?", "Why is my DAG a day behind?",
> etc.) and although we mention this in the docs it's a confusing concept.
>
> What do people think about adding two new parameters: `period_start` and
> `period_end` and making these the preferred terms in place of
> execution_date and next_execution_date?
>
> This hopefully avoids any ambiguous terms like "execution" or "run", which
> are understandably easy to conflate with the time the task is being run
> (i.e. `now()`).
>
> If people think this naming is better and less confusing I would suggest
> we update all the docs and examples to use these terms (but still mention
> the old names somewhere, probably in the macros docs)
>
> Thoughts?
>
> -ash
>
>
> > On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com> wrote:
> >
> > Hi Bas,
> >
> > 1) I am aware of a few places where those parameters are used in
> > production in a few hundred jobs. I highly recommend we don't deprecate
> > them unless we do it in a major version.
> >
> > 2) As James mentioned, inlets and outlets are a lineage annotation
> > feature which is still under development. Let's leave them in, but we can
> > guard them behind a feature flag if you prefer.
> >
> > 3) the yesterday*/tomorrow* params are convenience ones if you use a
> > daily ETL. I agree with you that they are simple to compute, but not
> > everyone using Apache Airflow is amazing with Python. Some users are only
> > trying to get a query scheduled and rely on a couple of niceties like
> > these to get by.
> >
> > 4) latest_date, end_date (I feel like there used to be start_date, but
> > maybe it got lost) were a blend of things which were used by a backfill
> > framework used internally at Airbnb. latest_date was used if you needed
> > to join to a dimension for which you only wanted the latest version of
> > the attributes in your backfill. end_date was used for time ranges where
> > several days were processed together in a range to save on compute. I
> > don't see an issue with removing them.
> >
> > Best regards,
> > Arthur
> >
> >
> >
> > On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <basharens...@godatadriven.com> wrote:
> >
> >> Hi all,
> >>
> >> Following Tao Feng’s question to discuss this PR
> >> <https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192
> >> <https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss here
> >> if you agree/disagree/would change.
> >>
> >> -----------
> >>
> >> The summary of the PR:
> >>
> >> I was confused by the task context values and suggest cleaning up and
> >> clarifying these variables. Some are derivations from other variables,
> >> some are undocumented and unused, some are wrong (name doesn’t match the
> >> value). Please discuss what you think of the removal of these variables:
> >>
> >>
> >>  *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
> >> tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful
> >> since these require complex logic to compute the next execution date;
> >> however, I would leave computing the yesterday* and tomorrow* variables
> >> up to the user since they are simple one-liners and don't relate to the
> >> DAG interval.
> >>  *   Removed tables. This is a field in params, and is thus also
> >> accessible by the user ({{ params.tables }}). Also, it was undocumented.
> >>  *   Removed latest_date. It's the same as ds and was also undocumented.
> >>  *   Removed inlets and outlets. Also undocumented, and have the
> >> inlets/outlets ever worked/ever been used by anybody?
> >>  *   Removed end_date and END_DATE. Both have the same value, so it
> >> doesn't make sense to have both variables. Also, the value is ds which
> >> contains the start date of the interval, so the naming didn't make sense
> >> to me. However, if anybody argues in favour of adding "start_date" and
> >> "end_date" to provide the start and end datetime of task instance
> >> intervals, I'd be happy to add them.
> >>
> >> Cheers,
> >> Bas
> >>
>
>
