Does anyone actually use end_date (of either spelling) given it's value is 
currently the same as `ds`: 
https://github.com/apache/airflow/blob/3020d9733cb189f091489a62f58f0f586dc8d4a9/airflow/models/taskinstance.py#L1183-L1184
 


    'END_DATE': ds,
    'end_date': ds,

That seems like it is just the wrong value - I could see it being useful as 
either the same value as `next_execution_date` or `dag.end_date` but the 
current value (from a quick thought) doesn't seem like it is useful?

Given what the value is currently set to I vote for deprecating/removing these 
two.

> On 12 Apr 2019, at 09:39, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> I don’t think guarding inlets and outlets makes sense, cause if they are not 
> defined they code won’t be executed anyway. Ie. The logic is already there, 
> no need for a config flag.
> 
> End-date is actually used by some beyond Airbnb as far as I know. What is the 
> reason for deprecating it?
> 
> Execution_date in finance is actually very well known. A transaction has an 
> execution date it is not the time of the sending of the data. I honestly 
> think that a change here would make it a thing that techies understand but 
> does not make business sense. What is “today’s data?” it might not be the 
> data that was sent today.
> 
> 
> 
> Verstuurd vanaf mijn iPad
> 
>> Op 12 apr. 2019 om 08:35 heeft Maxime Beauchemin 
>> <maximebeauche...@gmail.com> het volgende geschreven:
>> 
>> It should be labelled as a "bridge" or "transition" release somehow, and
>> tell the community to absolutely to go there first and address all
>> deprecation warnings prior to upgrade to 2.0. I wonder if semver has
>> something for that.
>> 
>> It might be the time to deprecate more things. Spring cleaning!
>> 
>> Should we start another thread labeled "[2.0 spring cleaning] what Airflow
>> features should we deprecate!?"
>> 
>> Max
>> 
>>> On Thu, Apr 11, 2019 at 11:26 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> 
>>> I am happy to do another small(!) 1.10.x release.
>>> 
>>> (There was a small bug I introduced where I broke the rendering of doc_md
>>> on dags)
>>> 
>>> On 11 April 2019 22:52:13 BST, "Driesprong, Fokko" <fo...@driesprong.frl>
>>> wrote:
>>>> I agree with Max here, we should be careful.
>>>> 
>>>> Regarding the yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>>>> tomorrow_ds_nodash. I'm not against having better readable shorthands,
>>>> but
>>>> more about the fact that having a tomorrow_ds doesn't make sense when
>>>> we
>>>> have an hourly (or weekly) job.
>>>> 
>>>> I'm against guarding the {in,out}lets. Since the guarding will add
>>>> extra
>>>> logic, and this is not worth the added complexity (and another flag) in
>>>> my
>>>> opinion.
>>>> 
>>>> - We start by putting deprecation warnings on tables, latest_date,
>>>> end_date
>>>> and END_DATE and remove them in Airflow 2.0.
>>>> 
>>>> This is only possible if there is another version before 2.0.
>>>> Currently,
>>>> we're preparing to move to 2.0
>>>> 
>>>> - We add a lineage_enabled config which is false by default and thus
>>>> inlets/outlets aren’t provided, unless set to true.
>>>> 
>>>> I'm against guarding the {in,out}lets. Since the guarding will add
>>>> extra
>>>> logic, and this is not worth the added complexity (and another flag) in
>>>> my
>>>> opinion.
>>>> 
>>>> - We continue discussion about clarification of execution_date and
>>>> better
>>>> named variables such as something_start and something_end
>>>> 
>>>> 
>>> https://lists.apache.org/thread.html/31423aa7feba421c53356a1e566f777c7a7973966c3320611286a2fb@%3Cdev.airflow.apache.org%3E
>>>> 
>>>> Cheers, Fokko
>>>> 
>>>> Op do 11 apr. 2019 om 08:44 schreef Bas Harenslak <
>>>> basharens...@godatadriven.com>:
>>>> 
>>>>> Great discussion, let’s stay on track. If I can summarise:
>>>>> 
>>>>> 
>>>>> *   yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>>>> tomorrow_ds_nodash
>>>>>    *   Arthur: some users use these for convenience
>>>>>    *   Bas/Fokko: these are values that can be easily derived in a
>>>>> one-liner
>>>>> 
>>>>> *   tables
>>>>>    *   nobody?
>>>>> 
>>>>> *   latest_date
>>>>>    *   Arthur: used by internal backfill framework at Airbnb, but
>>>> no
>>>>> issue removing them.
>>>>> 
>>>>> *   inlets/outlets
>>>>>    *   Arthur: still in development, could be guarded behind
>>>> feature
>>>>> flag.
>>>>> 
>>>>> *   end_date/END_DATE
>>>>>    *   Arthur: used by internal backfill framework at Airbnb, but
>>>> no
>>>>> issue removing them.
>>>>> 
>>>>> And to add to the discussion:
>>>>> 
>>>>> 
>>>>> *   execution_date
>>>>>    *   The meaning of execution_date is ambivalent and super
>>>> confusing
>>>>> to everybody because it’s not the date of execution.
>>>>>    *   Suggestions to add period_start/period_end, or
>>>>> interval_start/interval_end, or something like that, to provide
>>>> datetime
>>>>> variables for start & end datetime of a DAG run.
>>>>> *   Other:
>>>>>    *   Removal of variables should be done in major version and
>>>>> deprecation warnings should be added.
>>>>> 
>>>>> So how about the following:
>>>>> - We start by putting deprecation warnings on tables, latest_date,
>>>>> end_date and END_DATE and remove them in Airflow 2.0.
>>>>> - We add a lineage_enabled config which is false by default and thus
>>>>> inlets/outlets aren’t provided, unless set to true.
>>>>> - We continue discussion about yesterday* and tomorrow*
>>>>> - We continue discussion about clarification of execution_date and
>>>> better
>>>>> named variables such as something_start and something_end
>>>>> 
>>>>> Cheers,
>>>>> Bas
>>>>> 
>>>>> 
>>>>> On 11 Apr 2019, at 04:52, Maxime Beauchemin
>>>> <maximebeauche...@gmail.com
>>>>> <mailto:maximebeauche...@gmail.com>> wrote:
>>>>> 
>>>>> Making backwards incompatible changes that require altering the
>>>> thousands
>>>>> (millions?!) of DAGs in the wild will alienate the community and
>>>> prevent
>>>>> many from orchestrating an upgrade. Upgrading hundreds of DAGs and
>>>> Airflow
>>>>> atomically would be hard and dangerous.
>>>>> 
>>>>> To mitigate this, changes to the DAG / core API should be backward
>>>>> compatible, at least for a release range, and alert on upcoming
>>>> deprecation
>>>>> (if needed)
>>>>> 
>>>>> To be clear if `execution_date` is renamed to `foo`, we should have a
>>>>> version range where both work just as well, but using
>>>> `execution_date`
>>>>> would result is warning messages about its upcoming deprecation,
>>>> maybe
>>>>> something like "It appears you're using the task parameter
>>>> `execution_date`
>>>>> which will be deprecated in favor or `foo` in upcoming release 2.0.0,
>>>> find
>>>>> more info at link.to/foo<http://link.to/foo>"
>>>>> 
>>>>> Max
>>>>> 
>>>>> On Tue, Apr 9, 2019 at 7:58 AM James Meickle
>>>>> 
>>>> <jmeic...@quantopian.com.invalid<mailto:jmeic...@quantopian.com.invalid>>
>>>>> wrote:
>>>>> 
>>>>> I agree with Ash here. The naming of "execution_date" is incredibly
>>>>> confusing to people who are new to Airflow, who think it has
>>>> something to
>>>>> do with... execution.
>>>>> 
>>>>> However, I think that there's still room for improvement with
>>>>> "period_start" and "period_end". Think about manually triggered tasks
>>>> -
>>>>> they'd have a "period_start", but no useful "period_end" until the
>>>> next
>>>>> trigger occurs. And mid-day ETL tasks that are dated to "today" still
>>>> have
>>>>> to reference "period_end" to get the correct date, even though the
>>>> DAGrun
>>>>> won't be over yet!
>>>>> 
>>>>> If we're going to go through and rename "execution_date", I'd rather
>>>> see a
>>>>> wider-ranging rework that understands a mapping from a date to a
>>>> window
>>>>> (like "every day, process the data from one week ago", or "every
>>>> Saturday,
>>>>> process today's data"). Maybe something closer to how Beam does it
>>>>> (but retaining simplicity for the 99% of cases that are just "run
>>>> daily,
>>>>> for the past day").
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions
>>>>> 
>>>>> I could see something where there are two default DAG parameters are:
>>>>> schedule="@daily"
>>>>> window=-1 (or some schedule_delta object that provides a windowing
>>>> impl, or
>>>>> None if there is no window)
>>>>> 
>>>>> In this world, all DAGRuns would have a "schedule_date" (the date
>>>> this
>>>>> would have normally started at), and an "execution_date" (when it was
>>>>> actually executed). If a given DAGrun has a window (one-off DAGs may
>>>> or may
>>>>> not!), there would be variables for "window_start" (the start of the
>>>>> window) and "window_end" (the end of the window; this would default
>>>> to the
>>>>> same as schedule_date, and to the next DAGrun's window_start).
>>>>> 
>>>>> Disconnecting schedule date, execution date, and window/period would
>>>> also
>>>>> open a pathway for more advanced users to implement irregular
>>>> schedules and
>>>>> windows, without having to rely on hacks. For example: a DAG that
>>>> runs at
>>>>> "the end of the week" to produce a end-of-week would normally run on
>>>>> Friday, but on Friday holidays, instead complete a day early (and
>>>> scan one
>>>>> fewer day in its window).
>>>>> 
>>>>> I know there's some historical baggage here, but I think the above is
>>>> much
>>>>> more natural to new users than what we're offering today. And I think
>>>> from
>>>>> a current user perspective, there would be breaking variable name
>>>> changes,
>>>>> but no logical differences for the majority of DAGs.
>>>>> 
>>>>> On Tue, Apr 9, 2019 at 6:17 AM Ash Berlin-Taylor <a...@apache.org>
>>>> wrote:
>>>>> 
>>>>> To (slightly) hijack this thread:
>>>>> 
>>>>> On the subject of execuction_date: as I'm sure we're all aware the
>>>>> concept
>>>>> of execution_date is confusing to new-commers to Airflow (there are
>>>> many
>>>>> questions about "why hasn't my DAG run yet"? "Why is my dag a day
>>>>> behind?"
>>>>> etc.) and although we mention this in the docs it's a confusing
>>>> concept.
>>>>> 
>>>>> What to people think about adding two new parameters: `period_start`
>>>> and
>>>>> `period_end` and making these the preferred terms in place of
>>>>> execution_date and next_execution_date?
>>>>> 
>>>>> This hopefully avoids any ambitious terms like "execution" or "run"
>>>> which
>>>>> is understandably easy to conflate with the time the task is being
>>>> run
>>>>> (i.e. `now()`)
>>>>> 
>>>>> If people think this naming is better and less confusing I would
>>>> suggest
>>>>> we update all the docs and examples to use these terms (but still
>>>> mention
>>>>> the old names somewhere, probably in the macros docs)
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> -ash
>>>>> 
>>>>> 
>>>>> On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> Hi Bas,
>>>>> 
>>>>> 1) I am aware of a few places where those parameters are used in
>>>>> production
>>>>> in a few hundred jobs. I highly recommend we don't deprecate them
>>>>> unless
>>>>> we
>>>>> do it in a major version.
>>>>> 
>>>>> 2) As James mentioned, inlets and outlets are a lineage annotation
>>>>> feature
>>>>> which is still under development. Let's leave them in, but we can
>>>> guard
>>>>> them behind a feature flag if you prefer.
>>>>> 
>>>>> 3) the yesterday*/tomorrow* params are convenience ones if you use a
>>>>> daily
>>>>> ETL. I agree with you that they are simple to compute, but not
>>>> everyone
>>>>> using Apache Airflow is amazing with Python. Some users are only
>>>> trying
>>>>> to
>>>>> get a query scheduled and rely on a couple of niceties like these to
>>>>> get
>>>>> by.
>>>>> 
>>>>> 4) latest_date, end_date (I feel like there used to be start_date,
>>>> but
>>>>> maybe it got lost) were a blend of things which were used by a
>>>> backfill
>>>>> framework used internally at Airbnb. Latest date was used if you
>>>> needed
>>>>> to
>>>>> join to a dimension for which you only wanted the latest version of
>>>> the
>>>>> attributes in you backfill. end_date was used for time ranges where
>>>>> several
>>>>> days were processed together in a range to save on compute. I don't
>>>> see
>>>>> an
>>>>> issue with removing them.
>>>>> 
>>>>> Best regards,
>>>>> Arthur
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <
>>>>> basharens...@godatadriven.com>
>>>>> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> Following Tao Feng’s question to discuss this PR<
>>>>> https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192<
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss
>>>>> here
>>>>> if you agree/disagree/would change.
>>>>> 
>>>>> -----------
>>>>> 
>>>>> The summary of the PR:
>>>>> 
>>>>> I was confused by the task context values and suggest to clean up and
>>>>> clarify these variables. Some are derivations from other variables,
>>>>> some
>>>>> are undocumented and unused, some are wrong (name doesn’t match the
>>>>> value).
>>>>> Please discuss what you think of the removal of these variables:
>>>>> 
>>>>> 
>>>>> *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>>>>> tomorrow_ds_nodash. IMO the next_* and previous_* variables are
>>>> useful
>>>>> since these require complex logic to compute the next execution date,
>>>>> however would leave computing the yesterday* and tomorrow* variables
>>>>> up
>>>>> to
>>>>> the user since they are simple one-liners and don't relate to the DAG
>>>>> interval.
>>>>> *   Removed tables. This is a field in params, and is thus also
>>>>> accessible by the user ({{ params.tables }}). Also, it was
>>>>> undocumented.
>>>>> *   Removed latest_date. It's the same as ds and was also
>>>>> undocumented.
>>>>> *   Removed inlets and outlets. Also undocumented, and have the
>>>>> inlets/outlets ever worked/ever been used by anybody?
>>>>> *   Removed end_date and END_DATE. Both have the same value, so it
>>>>> doesn't make sense to have both variables. Also, the value is ds
>>>> which
>>>>> contains the start date of the interval, so the naming didn't make
>>>>> sense to
>>>>> me. However, if anybody argues in favour of adding "start_date" and
>>>>> "end_date" to provide the start and end datetime of task instance
>>>>> intervals, I'd be happy to add them.
>>>>> 
>>>>> Cheers,
>>>>> Bas
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 

Reply via email to