I am happy to do another small(!) 1.10.x release.

(There was a small bug I introduced where I broke the rendering of doc_md on 
dags)

On 11 April 2019 22:52:13 BST, "Driesprong, Fokko" <fo...@driesprong.frl> wrote:
>I agree with Max here, we should be careful.
>
>Regarding the yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>tomorrow_ds_nodash. I'm not against having better readable shorthands,
>but
>more about the fact that having a tomorrow_ds doesn't make sense when
>we
>have an hourly (or weekly) job.
>
>I'm against guarding the {in,out}lets. Since the guarding will add
>extra
>logic, and this is not worth the added complexity (and another flag) in
>my
>opinion.
>
>- We start by putting deprecation warnings on tables, latest_date,
>end_date
>and END_DATE and remove them in Airflow 2.0.
>
>This is only possible if there is another version before 2.0.
>Currently,
>we're preparing to move to 2.0
>
>- We add a lineage_enabled config which is false by default and thus
>inlets/outlets aren’t provided, unless set to true.
>
>I'm against guarding the {in,out}lets. Since the guarding will add
>extra
>logic, and this is not worth the added complexity (and another flag) in
>my
>opinion.
>
>- We continue discussion about clarification of execution_date and
>better
>named variables such as something_start and something_end
>
>https://lists.apache.org/thread.html/31423aa7feba421c53356a1e566f777c7a7973966c3320611286a2fb@%3Cdev.airflow.apache.org%3E
>
>Cheers, Fokko
>
>Op do 11 apr. 2019 om 08:44 schreef Bas Harenslak <
>basharens...@godatadriven.com>:
>
>> Great discussion, let’s stay on track. If I can summarise:
>>
>>
>>   *   yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>tomorrow_ds_nodash
>>      *   Arthur: some users use these for convenience
>>      *   Bas/Fokko: these are values that can be easily derived in a
>> one-liner
>>
>>   *   tables
>>      *   nobody?
>>
>>   *   latest_date
>>      *   Arthur: used by internal backfill framework at Airbnb, but
>no
>> issue removing them.
>>
>>   *   inlets/outlets
>>      *   Arthur: still in development, could be guarded behind
>feature
>> flag.
>>
>>   *   end_date/END_DATE
>>      *   Arthur: used by internal backfill framework at Airbnb, but
>no
>> issue removing them.
>>
>> And to add to the discussion:
>>
>>
>>   *   execution_date
>>      *   The meaning of execution_date is ambivalent and super
>confusing
>> to everybody because it’s not the date of execution.
>>      *   Suggestions to add period_start/period_end, or
>> interval_start/interval_end, or something like that, to provide
>datetime
>> variables for start & end datetime of a DAG run.
>>   *   Other:
>>      *   Removal of variables should be done in major version and
>> deprecation warnings should be added.
>>
>> So how about the following:
>> - We start by putting deprecation warnings on tables, latest_date,
>> end_date and END_DATE and remove them in Airflow 2.0.
>> - We add a lineage_enabled config which is false by default and thus
>> inlets/outlets aren’t provided, unless set to true.
>> - We continue discussion about yesterday* and tomorrow*
>> - We continue discussion about clarification of execution_date and
>better
>> named variables such as something_start and something_end
>>
>> Cheers,
>> Bas
>>
>>
>> On 11 Apr 2019, at 04:52, Maxime Beauchemin
><maximebeauche...@gmail.com
>> <mailto:maximebeauche...@gmail.com>> wrote:
>>
>> Making backwards incompatible changes that require altering the
>thousands
>> (millions?!) of DAGs in the wild will alienate the community and
>prevent
>> many from orchestrating an upgrade. Upgrading hundreds of DAGs and
>Airflow
>> atomically would be hard and dangerous.
>>
>> To mitigate this, changes to the DAG / core API should be backward
>> compatible, at least for a release range, and alert on upcoming
>deprecation
>> (if needed)
>>
>> To be clear if `execution_date` is renamed to `foo`, we should have a
>> version range where both work just as well, but using
>`execution_date`
>> would result is warning messages about its upcoming deprecation,
>maybe
>> something like "It appears you're using the task parameter
>`execution_date`
>> which will be deprecated in favor or `foo` in upcoming release 2.0.0,
>find
>> more info at link.to/foo<http://link.to/foo>"
>>
>> Max
>>
>> On Tue, Apr 9, 2019 at 7:58 AM James Meickle
>>
><jmeic...@quantopian.com.invalid<mailto:jmeic...@quantopian.com.invalid>>
>> wrote:
>>
>> I agree with Ash here. The naming of "execution_date" is incredibly
>> confusing to people who are new to Airflow, who think it has
>something to
>> do with... execution.
>>
>> However, I think that there's still room for improvement with
>> "period_start" and "period_end". Think about manually triggered tasks
>-
>> they'd have a "period_start", but no useful "period_end" until the
>next
>> trigger occurs. And mid-day ETL tasks that are dated to "today" still
>have
>> to reference "period_end" to get the correct date, even though the
>DAGrun
>> won't be over yet!
>>
>> If we're going to go through and rename "execution_date", I'd rather
>see a
>> wider-ranging rework that understands a mapping from a date to a
>window
>> (like "every day, process the data from one week ago", or "every
>Saturday,
>> process today's data"). Maybe something closer to how Beam does it
>> (but retaining simplicity for the 99% of cases that are just "run
>daily,
>> for the past day").
>>
>>
>>
>https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions
>>
>> I could see something where there are two default DAG parameters are:
>> schedule="@daily"
>> window=-1 (or some schedule_delta object that provides a windowing
>impl, or
>> None if there is no window)
>>
>> In this world, all DAGRuns would have a "schedule_date" (the date
>this
>> would have normally started at), and an "execution_date" (when it was
>> actually executed). If a given DAGrun has a window (one-off DAGs may
>or may
>> not!), there would be variables for "window_start" (the start of the
>> window) and "window_end" (the end of the window; this would default
>to the
>> same as schedule_date, and to the next DAGrun's window_start).
>>
>> Disconnecting schedule date, execution date, and window/period would
>also
>> open a pathway for more advanced users to implement irregular
>schedules and
>> windows, without having to rely on hacks. For example: a DAG that
>runs at
>> "the end of the week" to produce a end-of-week would normally run on
>> Friday, but on Friday holidays, instead complete a day early (and
>scan one
>> fewer day in its window).
>>
>> I know there's some historical baggage here, but I think the above is
>much
>> more natural to new users than what we're offering today. And I think
>from
>> a current user perspective, there would be breaking variable name
>changes,
>> but no logical differences for the majority of DAGs.
>>
>> On Tue, Apr 9, 2019 at 6:17 AM Ash Berlin-Taylor <a...@apache.org>
>wrote:
>>
>> To (slightly) hijack this thread:
>>
>> On the subject of execuction_date: as I'm sure we're all aware the
>> concept
>> of execution_date is confusing to new-commers to Airflow (there are
>many
>> questions about "why hasn't my DAG run yet"? "Why is my dag a day
>> behind?"
>> etc.) and although we mention this in the docs it's a confusing
>concept.
>>
>> What to people think about adding two new parameters: `period_start`
>and
>> `period_end` and making these the preferred terms in place of
>> execution_date and next_execution_date?
>>
>> This hopefully avoids any ambitious terms like "execution" or "run"
>which
>> is understandably easy to conflate with the time the task is being
>run
>> (i.e. `now()`)
>>
>> If people think this naming is better and less confusing I would
>suggest
>> we update all the docs and examples to use these terms (but still
>mention
>> the old names somewhere, probably in the macros docs)
>>
>> Thoughts?
>>
>> -ash
>>
>>
>> On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com>
>> wrote:
>>
>> Hi Bas,
>>
>> 1) I am aware of a few places where those parameters are used in
>> production
>> in a few hundred jobs. I highly recommend we don't deprecate them
>> unless
>> we
>> do it in a major version.
>>
>> 2) As James mentioned, inlets and outlets are a lineage annotation
>> feature
>> which is still under development. Let's leave them in, but we can
>guard
>> them behind a feature flag if you prefer.
>>
>> 3) the yesterday*/tomorrow* params are convenience ones if you use a
>> daily
>> ETL. I agree with you that they are simple to compute, but not
>everyone
>> using Apache Airflow is amazing with Python. Some users are only
>trying
>> to
>> get a query scheduled and rely on a couple of niceties like these to
>> get
>> by.
>>
>> 4) latest_date, end_date (I feel like there used to be start_date,
>but
>> maybe it got lost) were a blend of things which were used by a
>backfill
>> framework used internally at Airbnb. Latest date was used if you
>needed
>> to
>> join to a dimension for which you only wanted the latest version of
>the
>> attributes in you backfill. end_date was used for time ranges where
>> several
>> days were processed together in a range to save on compute. I don't
>see
>> an
>> issue with removing them.
>>
>> Best regards,
>> Arthur
>>
>>
>>
>> On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak <
>> basharens...@godatadriven.com>
>> wrote:
>>
>> Hi all,
>>
>> Following Tao Feng’s question to discuss this PR<
>> https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192<
>> https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss
>> here
>> if you agree/disagree/would change.
>>
>> -----------
>>
>> The summary of the PR:
>>
>> I was confused by the task context values and suggest to clean up and
>> clarify these variables. Some are derivations from other variables,
>> some
>> are undocumented and unused, some are wrong (name doesn’t match the
>> value).
>> Please discuss what you think of the removal of these variables:
>>
>>
>> *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds,
>> tomorrow_ds_nodash. IMO the next_* and previous_* variables are
>useful
>> since these require complex logic to compute the next execution date,
>> however would leave computing the yesterday* and tomorrow* variables
>> up
>> to
>> the user since they are simple one-liners and don't relate to the DAG
>> interval.
>> *   Removed tables. This is a field in params, and is thus also
>> accessible by the user ({{ params.tables }}). Also, it was
>> undocumented.
>> *   Removed latest_date. It's the same as ds and was also
>> undocumented.
>> *   Removed inlets and outlets. Also undocumented, and have the
>> inlets/outlets ever worked/ever been used by anybody?
>> *   Removed end_date and END_DATE. Both have the same value, so it
>> doesn't make sense to have both variables. Also, the value is ds
>which
>> contains the start date of the interval, so the naming didn't make
>> sense to
>> me. However, if anybody argues in favour of adding "start_date" and
>> "end_date" to provide the start and end datetime of task instance
>> intervals, I'd be happy to add them.
>>
>> Cheers,
>> Bas
>>
>>
>>
>>
>>
>>

Reply via email to