`end_date` is useful, some DAGs / tasks may be scheduled to expire, say an
A/B test in an A/B testing framework with an experiment duration, a
backfill framework, or a framework/UI that allows users to schedule to run
a task to run for say 30 days. I'd keep this one for sure, not all
pipelines / tasks are open ended.

I agree on "tomorrow" and "yesterday", clearly should be "previous" and
"next". My bad, I'm guessing I didn't have enough coffee that day...

Max

On Thu, Apr 11, 2019 at 2:52 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> I agree with Max here, we should be careful.
>
> Regarding the yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
> tomorrow_ds_nodash. I'm not against having better readable shorthands, but
> more about the fact that having a tomorrow_ds doesn't make sense when we
> have an hourly (or weekly) job.
>
> I'm against guarding the {in,out}lets. Since the guarding will add extra
> logic, and this is not worth the added complexity (and another flag) in my
> opinion.
>
> - We start by putting deprecation warnings on tables, latest_date, end_date
> and END_DATE and remove them in Airflow 2.0.
>
> This is only possible if there is another version before 2.0. Currently,
> we're preparing to move to 2.0
>
> - We add a lineage_enabled config which is false by default and thus
> inlets/outlets aren’t provided, unless set to true.
>
> I'm against guarding the {in,out}lets. Since the guarding will add extra
> logic, and this is not worth the added complexity (and another flag) in my
> opinion.
>
> - We continue discussion about clarification of execution_date and better
> named variables such as something_start and something_end
>
>
> https://lists.apache.org/thread.html/31423aa7feba421c53356a1e566f777c7a7973966c3320611286a2fb@%3Cdev.airflow.apache.org%3E
>
> Cheers, Fokko
>
> Op do 11 apr. 2019 om 08:44 schreef Bas Harenslak <
> basharens...@godatadriven.com>:
>
> > Great discussion, let’s stay on track. If I can summarise:
> >
> >
> >   *   yesterday_ds, yesterday_ds_nodash, tomorrow_ds, tomorrow_ds_nodash
> >      *   Arthur: some users use these for convenience
> >      *   Bas/Fokko: these are values that can be easily derived in a
> > one-liner
> >
> >   *   tables
> >      *   nobody?
> >
> >   *   latest_date
> >      *   Arthur: used by internal backfill framework at Airbnb, but no
> > issue removing them.
> >
> >   *   inlets/outlets
> >      *   Arthur: still in development, could be guarded behind feature
> > flag.
> >
> >   *   end_date/END_DATE
> >      *   Arthur: used by internal backfill framework at Airbnb, but no
> > issue removing them.
> >
> > And to add to the discussion:
> >
> >
> >   *   execution_date
> >      *   The meaning of execution_date is ambivalent and super confusing
> > to everybody because it’s not the date of execution.
> >      *   Suggestions to add period_start/period_end, or
> > interval_start/interval_end, or something like that, to provide datetime
> > variables for start & end datetime of a DAG run.
> >   *   Other:
> >      *   Removal of variables should be done in major version and
> > deprecation warnings should be added.
> >
> > So how about the following:
> > - We start by putting deprecation warnings on tables, latest_date,
> > end_date and END_DATE and remove them in Airflow 2.0.
> > - We add a lineage_enabled config which is false by default and thus
> > inlets/outlets aren’t provided, unless set to true.
> > - We continue discussion about yesterday* and tomorrow*
> > - We continue discussion about clarification of execution_date and better
> > named variables such as something_start and something_end
> >
> > Cheers,
> > Bas
> >
> >
> > On 11 Apr 2019, at 04:52, Maxime Beauchemin <maximebeauche...@gmail.com
> > <mailto:maximebeauche...@gmail.com>> wrote:
> >
> > Making backwards incompatible changes that require altering the thousands
> > (millions?!) of DAGs in the wild will alienate the community and prevent
> > many from orchestrating an upgrade. Upgrading hundreds of DAGs and
> Airflow
> > atomically would be hard and dangerous.
> >
> > To mitigate this, changes to the DAG / core API should be backward
> > compatible, at least for a release range, and alert on upcoming
> deprecation
> > (if needed)
> >
> > To be clear if `execution_date` is renamed to `foo`, we should have a
> > version range where both work just as well, but using `execution_date`
> > would result is warning messages about its upcoming deprecation, maybe
> > something like "It appears you're using the task parameter
> `execution_date`
> > which will be deprecated in favor or `foo` in upcoming release 2.0.0,
> find
> > more info at link.to/foo<http://link.to/foo>"
> >
> > Max
> >
> > On Tue, Apr 9, 2019 at 7:58 AM James Meickle
> > <jmeic...@quantopian.com.invalid<mailto:jmeic...@quantopian.com.invalid
> >>
> > wrote:
> >
> > I agree with Ash here. The naming of "execution_date" is incredibly
> > confusing to people who are new to Airflow, who think it has something to
> > do with... execution.
> >
> > However, I think that there's still room for improvement with
> > "period_start" and "period_end". Think about manually triggered tasks -
> > they'd have a "period_start", but no useful "period_end" until the next
> > trigger occurs. And mid-day ETL tasks that are dated to "today" still
> have
> > to reference "period_end" to get the correct date, even though the DAGrun
> > won't be over yet!
> >
> > If we're going to go through and rename "execution_date", I'd rather see
> a
> > wider-ranging rework that understands a mapping from a date to a window
> > (like "every day, process the data from one week ago", or "every
> Saturday,
> > process today's data"). Maybe something closer to how Beam does it
> > (but retaining simplicity for the 99% of cases that are just "run daily,
> > for the past day").
> >
> >
> >
> https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions
> >
> > I could see something where there are two default DAG parameters are:
> > schedule="@daily"
> > window=-1 (or some schedule_delta object that provides a windowing impl,
> or
> > None if there is no window)
> >
> > In this world, all DAGRuns would have a "schedule_date" (the date this
> > would have normally started at), and an "execution_date" (when it was
> > actually executed). If a given DAGrun has a window (one-off DAGs may or
> may
> > not!), there would be variables for "window_start" (the start of the
> > window) and "window_end" (the end of the window; this would default to
> the
> > same as schedule_date, and to the next DAGrun's window_start).
> >
> > Disconnecting schedule date, execution date, and window/period would also
> > open a pathway for more advanced users to implement irregular schedules
> and
> > windows, without having to rely on hacks. For example: a DAG that runs at
> > "the end of the week" to produce a end-of-week would normally run on
> > Friday, but on Friday holidays, instead complete a day early (and scan
> one
> > fewer day in its window).
> >
> > I know there's some historical baggage here, but I think the above is
> much
> > more natural to new users than what we're offering today. And I think
> from
> > a current user perspective, there would be breaking variable name
> changes,
> > but no logical differences for the majority of DAGs.
> >
> > On Tue, Apr 9, 2019 at 6:17 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > To (slightly) hijack this thread:
> >
> > On the subject of execuction_date: as I'm sure we're all aware the
> > concept
> > of execution_date is confusing to new-commers to Airflow (there are many
> > questions about "why hasn't my DAG run yet"? "Why is my dag a day
> > behind?"
> > etc.) and although we mention this in the docs it's a confusing concept.
> >
> > What to people think about adding two new parameters: `period_start` and
> > `period_end` and making these the preferred terms in place of
> > execution_date and next_execution_date?
> >
> > This hopefully avoids any ambitious terms like "execution" or "run" which
> > is understandably easy to conflate with the time the task is being run
> > (i.e. `now()`)
> >
> > If people think this naming is better and less confusing I would suggest
> > we update all the docs and examples to use these terms (but still mention
> > the old names somewhere, probably in the macros docs)
> >
> > Thoughts?
> >
> > -ash
> >
> >
> > On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com>
> > wrote:
> >
> > Hi Bas,
> >
> > 1) I am aware of a few places where those parameters are used in
> > production
> > in a few hundred jobs. I highly recommend we don't deprecate them
> > unless
> > we
> > do it in a major version.
> >
> > 2) As James mentioned, inlets and outlets are a lineage annotation
> > feature
> > which is still under development. Let's leave them in, but we can guard
> > them behind a feature flag if you prefer.
> >
> > 3) the yesterday*/tomorrow* params are convenience ones if you use a
> > daily
> > ETL. I agree with you that they are simple to compute, but not everyone
> > using Apache Airflow is amazing with Python. Some users are only trying
> > to
> > get a query scheduled and rely on a couple of niceties like these to
> > get
> > by.
> >
> > 4) latest_date, end_date (I feel like there used to be start_date, but
> > maybe it got lost) were a blend of things which were used by a backfill
> > framework used internally at Airbnb. Latest date was used if you needed
> > to
> > join to a dimension for which you only wanted the latest version of the
> > attributes in you backfill. end_date was used for time ranges where
> > several
> > days were processed together in a range to save on compute. I don't see
> > an
> > issue with removing them.
> >
> > Best regards,
> > Arthur
> >
> >
> >
> > On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <
> > basharens...@godatadriven.com>
> > wrote:
> >
> > Hi all,
> >
> > Following Tao Feng’s question to discuss this PR<
> > https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192<
> > https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss
> > here
> > if you agree/disagree/would change.
> >
> > -----------
> >
> > The summary of the PR:
> >
> > I was confused by the task context values and suggest to clean up and
> > clarify these variables. Some are derivations from other variables,
> > some
> > are undocumented and unused, some are wrong (name doesn’t match the
> > value).
> > Please discuss what you think of the removal of these variables:
> >
> >
> > *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
> > tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful
> > since these require complex logic to compute the next execution date,
> > however would leave computing the yesterday* and tomorrow* variables
> > up
> > to
> > the user since they are simple one-liners and don't relate to the DAG
> > interval.
> > *   Removed tables. This is a field in params, and is thus also
> > accessible by the user ({{ params.tables }}). Also, it was
> > undocumented.
> > *   Removed latest_date. It's the same as ds and was also
> > undocumented.
> > *   Removed inlets and outlets. Also undocumented, and have the
> > inlets/outlets ever worked/ever been used by anybody?
> > *   Removed end_date and END_DATE. Both have the same value, so it
> > doesn't make sense to have both variables. Also, the value is ds which
> > contains the start date of the interval, so the naming didn't make
> > sense to
> > me. However, if anybody argues in favour of adding "start_date" and
> > "end_date" to provide the start and end datetime of task instance
> > intervals, I'd be happy to add them.
> >
> > Cheers,
> > Bas
> >
> >
> >
> >
> >
> >
>

Reply via email to