Reason for deprecating end_date is it is incorrect since the name does not match the value, it’s currently the same as ds (the start datetime of the interval).
Another option would be to keep it but set the value to the end date of the interval, although I think that would create even more confusion. I’m aware that execution_date is used by practically all Airflow users so wouldn’t deprecate that. However adding new, clearly named variables is what I’d do to remove the discussion and confusion around execution_date. And last: are the inlets/outlets still in development? Bas On 12 Apr 2019, at 10:39, Bolke de Bruin <bdbr...@gmail.com<mailto:bdbr...@gmail.com>> wrote: I don’t think guarding inlets and outlets makes sense, cause if they are not defined they code won’t be executed anyway. Ie. The logic is already there, no need for a config flag. End-date is actually used by some beyond Airbnb as far as I know. What is the reason for deprecating it? Execution_date in finance is actually very well known. A transaction has an execution date it is not the time of the sending of the data. I honestly think that a change here would make it a thing that techies understand but does not make business sense. What is “today’s data?” it might not be the data that was sent today. Verstuurd vanaf mijn iPad Op 12 apr. 2019 om 08:35 heeft Maxime Beauchemin <maximebeauche...@gmail.com<mailto:maximebeauche...@gmail.com>> het volgende geschreven: It should be labelled as a "bridge" or "transition" release somehow, and tell the community to absolutely to go there first and address all deprecation warnings prior to upgrade to 2.0. I wonder if semver has something for that. It might be the time to deprecate more things. Spring cleaning! Should we start another thread labeled "[2.0 spring cleaning] what Airflow features should we deprecate!?" Max On Thu, Apr 11, 2019 at 11:26 PM Ash Berlin-Taylor <a...@apache.org<mailto:a...@apache.org>> wrote: I am happy to do another small(!) 1.10.x release. (There was a small bug I introduced where I broke the rendering of doc_md on dags) On 11 April 2019 22:52:13 BST, "Driesprong, Fokko" <fo...@driesprong.frl<mailto:fo...@driesprong.frl>> wrote: I agree with Max here, we should be careful. Regarding the yesterday_ds, yesterday_ds_nodash, tomorrow_ds, tomorrow_ds_nodash. I'm not against having better readable shorthands, but more about the fact that having a tomorrow_ds doesn't make sense when we have an hourly (or weekly) job. I'm against guarding the {in,out}lets. Since the guarding will add extra logic, and this is not worth the added complexity (and another flag) in my opinion. - We start by putting deprecation warnings on tables, latest_date, end_date and END_DATE and remove them in Airflow 2.0. This is only possible if there is another version before 2.0. Currently, we're preparing to move to 2.0 - We add a lineage_enabled config which is false by default and thus inlets/outlets aren’t provided, unless set to true. I'm against guarding the {in,out}lets. Since the guarding will add extra logic, and this is not worth the added complexity (and another flag) in my opinion. - We continue discussion about clarification of execution_date and better named variables such as something_start and something_end https://lists.apache.org/thread.html/31423aa7feba421c53356a1e566f777c7a7973966c3320611286a2fb@%3Cdev.airflow.apache.org%3E Cheers, Fokko Op do 11 apr. 2019 om 08:44 schreef Bas Harenslak < basharens...@godatadriven.com>: Great discussion, let’s stay on track. If I can summarise: * yesterday_ds, yesterday_ds_nodash, tomorrow_ds, tomorrow_ds_nodash * Arthur: some users use these for convenience * Bas/Fokko: these are values that can be easily derived in a one-liner * tables * nobody? * latest_date * Arthur: used by internal backfill framework at Airbnb, but no issue removing them. * inlets/outlets * Arthur: still in development, could be guarded behind feature flag. * end_date/END_DATE * Arthur: used by internal backfill framework at Airbnb, but no issue removing them. And to add to the discussion: * execution_date * The meaning of execution_date is ambivalent and super confusing to everybody because it’s not the date of execution. * Suggestions to add period_start/period_end, or interval_start/interval_end, or something like that, to provide datetime variables for start & end datetime of a DAG run. * Other: * Removal of variables should be done in major version and deprecation warnings should be added. So how about the following: - We start by putting deprecation warnings on tables, latest_date, end_date and END_DATE and remove them in Airflow 2.0. - We add a lineage_enabled config which is false by default and thus inlets/outlets aren’t provided, unless set to true. - We continue discussion about yesterday* and tomorrow* - We continue discussion about clarification of execution_date and better named variables such as something_start and something_end Cheers, Bas On 11 Apr 2019, at 04:52, Maxime Beauchemin <maximebeauche...@gmail.com <mailto:maximebeauche...@gmail.com>> wrote: Making backwards incompatible changes that require altering the thousands (millions?!) of DAGs in the wild will alienate the community and prevent many from orchestrating an upgrade. Upgrading hundreds of DAGs and Airflow atomically would be hard and dangerous. To mitigate this, changes to the DAG / core API should be backward compatible, at least for a release range, and alert on upcoming deprecation (if needed) To be clear if `execution_date` is renamed to `foo`, we should have a version range where both work just as well, but using `execution_date` would result is warning messages about its upcoming deprecation, maybe something like "It appears you're using the task parameter `execution_date` which will be deprecated in favor or `foo` in upcoming release 2.0.0, find more info at link.to/foo<http://link.to/foo>" Max On Tue, Apr 9, 2019 at 7:58 AM James Meickle <jmeic...@quantopian.com.invalid<mailto:jmeic...@quantopian.com.invalid>> wrote: I agree with Ash here. The naming of "execution_date" is incredibly confusing to people who are new to Airflow, who think it has something to do with... execution. However, I think that there's still room for improvement with "period_start" and "period_end". Think about manually triggered tasks - they'd have a "period_start", but no useful "period_end" until the next trigger occurs. And mid-day ETL tasks that are dated to "today" still have to reference "period_end" to get the correct date, even though the DAGrun won't be over yet! If we're going to go through and rename "execution_date", I'd rather see a wider-ranging rework that understands a mapping from a date to a window (like "every day, process the data from one week ago", or "every Saturday, process today's data"). Maybe something closer to how Beam does it (but retaining simplicity for the 99% of cases that are just "run daily, for the past day"). https://beam.apache.org/documentation/programming-guide/#provided-windowing-functions I could see something where there are two default DAG parameters are: schedule="@daily" window=-1 (or some schedule_delta object that provides a windowing impl, or None if there is no window) In this world, all DAGRuns would have a "schedule_date" (the date this would have normally started at), and an "execution_date" (when it was actually executed). If a given DAGrun has a window (one-off DAGs may or may not!), there would be variables for "window_start" (the start of the window) and "window_end" (the end of the window; this would default to the same as schedule_date, and to the next DAGrun's window_start). Disconnecting schedule date, execution date, and window/period would also open a pathway for more advanced users to implement irregular schedules and windows, without having to rely on hacks. For example: a DAG that runs at "the end of the week" to produce a end-of-week would normally run on Friday, but on Friday holidays, instead complete a day early (and scan one fewer day in its window). I know there's some historical baggage here, but I think the above is much more natural to new users than what we're offering today. And I think from a current user perspective, there would be breaking variable name changes, but no logical differences for the majority of DAGs. On Tue, Apr 9, 2019 at 6:17 AM Ash Berlin-Taylor <a...@apache.org> wrote: To (slightly) hijack this thread: On the subject of execuction_date: as I'm sure we're all aware the concept of execution_date is confusing to new-commers to Airflow (there are many questions about "why hasn't my DAG run yet"? "Why is my dag a day behind?" etc.) and although we mention this in the docs it's a confusing concept. What to people think about adding two new parameters: `period_start` and `period_end` and making these the preferred terms in place of execution_date and next_execution_date? This hopefully avoids any ambitious terms like "execution" or "run" which is understandably easy to conflate with the time the task is being run (i.e. `now()`) If people think this naming is better and less confusing I would suggest we update all the docs and examples to use these terms (but still mention the old names somewhere, probably in the macros docs) Thoughts? -ash On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com> wrote: Hi Bas, 1) I am aware of a few places where those parameters are used in production in a few hundred jobs. I highly recommend we don't deprecate them unless we do it in a major version. 2) As James mentioned, inlets and outlets are a lineage annotation feature which is still under development. Let's leave them in, but we can guard them behind a feature flag if you prefer. 3) the yesterday*/tomorrow* params are convenience ones if you use a daily ETL. I agree with you that they are simple to compute, but not everyone using Apache Airflow is amazing with Python. Some users are only trying to get a query scheduled and rely on a couple of niceties like these to get by. 4) latest_date, end_date (I feel like there used to be start_date, but maybe it got lost) were a blend of things which were used by a backfill framework used internally at Airbnb. Latest date was used if you needed to join to a dimension for which you only wanted the latest version of the attributes in you backfill. end_date was used for time ranges where several days were processed together in a range to save on compute. I don't see an issue with removing them. Best regards, Arthur On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak < basharens...@godatadriven.com> wrote: Hi all, Following Tao Feng’s question to discuss this PR< https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192< https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss here if you agree/disagree/would change. ----------- The summary of the PR: I was confused by the task context values and suggest to clean up and clarify these variables. Some are derivations from other variables, some are undocumented and unused, some are wrong (name doesn’t match the value). Please discuss what you think of the removal of these variables: * Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds, tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful since these require complex logic to compute the next execution date, however would leave computing the yesterday* and tomorrow* variables up to the user since they are simple one-liners and don't relate to the DAG interval. * Removed tables. This is a field in params, and is thus also accessible by the user ({{ params.tables }}). Also, it was undocumented. * Removed latest_date. It's the same as ds and was also undocumented. * Removed inlets and outlets. Also undocumented, and have the inlets/outlets ever worked/ever been used by anybody? * Removed end_date and END_DATE. Both have the same value, so it doesn't make sense to have both variables. Also, the value is ds which contains the start date of the interval, so the naming didn't make sense to me. However, if anybody argues in favour of adding "start_date" and "end_date" to provide the start and end datetime of task instance intervals, I'd be happy to add them. Cheers, Bas