Yeah, that's the other thing that has been talked about from time to time, 
which is a mode to change from "run at end of period, I need all the data 
available for this period" (the current behaviour) to "run at _this_ time on 
the schedule_interval, don't wait for the period to end".

(No such flag exists right now, before you go looking.)

> On 9 Apr 2019, at 15:31, Shaw, Damian P. <damian.sha...@credit-suisse.com> 
> wrote:
> 
> Hi all,
> 
> I'm new to this Airflow dev mailing list, so I wasn't expecting to reply to 
> anything, but I feel I am the target audience for this question. I am quite 
> new to Airflow and have been setting up an Airflow environment for my 
> business over the last month.
> 
> I find the current "execution_date" a small technical burden and a large 
> cognitive burden. Our workflow is based on DAGs running at a specified time 
> in a specified timezone using the same date as the current calendar date.
> 
> I have worked around this by creating my own macro and context variables, 
> with the logic looking like this:
>        airflow_execution_date = context['execution_date']
>        dag_timezone = context['dag'].timezone
>        local_execution_date = dag_timezone.convert(airflow_execution_date)
>        local_cal_date = local_execution_date + datetime.timedelta(days=1)
> 
> As you can see this isn't a lot of technical effort, but having a date that 
> 1) is in the timezone the business users are working in, and 2) is the same 
> calendar date the business users are working in significantly reduces the 
> cognitive effort required to set up tasks. Of course this doesn't help with 
> cron-format scheduling, which I just set up myself from the business's 
> requirements, as the date logic there is still confusing: it doesn't work 
> like the real cron scheduling everyone is familiar with.
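(For anyone following along, Damian's workaround can be sketched self-contained with just the standard library — `zoneinfo` standing in for the pendulum timezone objects Airflow actually hands you; the function name and the example timezone are illustrative, not Airflow API.)

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def local_cal_date(execution_date: datetime, tz_name: str) -> str:
    """Convert a UTC execution_date to the calendar date the business
    users work with: shift into their timezone, then add one day
    (for a daily schedule the period ends on the next calendar day)."""
    local_execution_date = execution_date.astimezone(ZoneInfo(tz_name))
    return (local_execution_date + timedelta(days=1)).strftime("%Y-%m-%d")

# A daily run stamped 2019-04-08T22:00 UTC falls on the evening of
# 2019-04-08 in New York, so the business-facing date is 2019-04-09.
print(local_cal_date(datetime(2019, 4, 8, 22, 0, tzinfo=timezone.utc),
                     "America/New_York"))  # → 2019-04-09
```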
> 
> "period_start" and "period_end" might help people on Day 0 of understanding 
> Airflow realize that the dates they are dealing with are not what they 
> expect, but Day 1+ there's still a lot of cognitive overhead if you don't 
> have the exact same model as Airbnb for running DAGs and tasks.
> 
> My 2 cents anyway,
> Damian Shaw
> 
> 
> -----Original Message-----
> From: Ash Berlin-Taylor [mailto:a...@apache.org] 
> Sent: Tuesday, April 09, 2019 10:08 AM
> To: dev@airflow.apache.org
> Subject: [DISCUSS] period_start/period_end instead of 
> execution_date/next_execution_date 
> 
> (trying to break this out into another thread)
> 
> The ML doesn't allow images, but I can guess that it is the deps section of 
> a task instance details screen?
> 
> I'm not saying it's not clear once you know to look there, but I'm trying to 
> remove/reduce the confusion in the first place. And I think we as committers 
> aren't best placed to know what makes sense, as we have internalised how 
> Airflow works :)
> 
> So I guess this is a question to the newest people on the list: Would 
> `period_start` and `period_end` be more or less confusing for you when you 
> were first getting started with Airflow?
> 
> -ash
> 
>> On 9 Apr 2019, at 14:47, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>> 
>> Ash,
>> 
>> Personally, I think this is quite clear; there is a list of reasons why the 
>> job isn't being scheduled:
>> 
>> 
>> Coming back to Bas's question, I believe that yesterday_ds does not make 
>> sense, since we cannot assume that the schedule is daily, and I don't see 
>> any usage of this variable. Personally, I do use next_execution_date quite 
>> extensively. It is useful when you have a job that runs daily but you want 
>> to change it to an hourly job: in that case you don't want to change {{ 
>> (execution_date + macros.timedelta(days=1)) }} to {{ (execution_date + 
>> macros.timedelta(hours=1)) }} everywhere.
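Fokko's point shows up in a tiny sketch: a hard-coded offset bakes the schedule into every template, whereas next_execution_date is supplied by the scheduler and tracks schedule changes for free (the dates below are illustrative, not pulled from a real run):

```python
from datetime import datetime, timedelta

execution_date = datetime(2019, 4, 9, 0, 0)

# What {{ execution_date + macros.timedelta(days=1) }} computes --
# correct only while the DAG is scheduled @daily:
daily_offset = execution_date + timedelta(days=1)

# After switching the DAG to @hourly, every such template must be
# hand-edited to hours=1; next_execution_date would need no changes:
hourly_offset = execution_date + timedelta(hours=1)

print(daily_offset)   # 2019-04-10 00:00:00
print(hourly_offset)  # 2019-04-09 01:00:00
```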
>> 
>> I'm just not sure if the aggressive deprecation of these variables is 
>> really worth it. I don't see too much harm in letting them stay.
>> 
>> Cheers, Fokko 
>> 
>> On Tue 9 Apr 2019 at 12:17, Ash Berlin-Taylor <a...@apache.org> wrote:
>> To (slightly) hijack this thread:
>> 
>> On the subject of execution_date: as I'm sure we're all aware, the concept 
>> of execution_date is confusing to newcomers to Airflow (there are many 
>> questions like "why hasn't my DAG run yet?", "why is my DAG a day behind?", 
>> etc.), and although we mention this in the docs it remains a confusing 
>> concept.
>> 
>> What do people think about adding two new parameters, `period_start` and 
>> `period_end`, and making these the preferred terms in place of 
>> execution_date and next_execution_date?
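To make the proposal concrete, the new names would simply alias the existing context values. A minimal sketch (this is not current Airflow behaviour, and the helper name is made up for illustration):

```python
def add_period_names(context: dict) -> dict:
    """Sketch: expose the proposed names alongside the old ones, so
    existing templates that use execution_date keep working."""
    context = dict(context)  # don't mutate the caller's dict
    context["period_start"] = context["execution_date"]
    context["period_end"] = context["next_execution_date"]
    return context

ctx = add_period_names({
    "execution_date": "2019-04-09T00:00:00+00:00",
    "next_execution_date": "2019-04-10T00:00:00+00:00",
})
print(ctx["period_start"], "->", ctx["period_end"])
```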
>> 
>> This hopefully avoids ambiguous terms like "execution" or "run", which are 
>> understandably easy to conflate with the time the task is actually being 
>> run (i.e. `now()`).
>> 
>> If people think this naming is better and less confusing, I would suggest 
>> we update all the docs and examples to use these terms (but still mention 
>> the old names somewhere, probably in the macros docs).
>> 
>> Thoughts?
>> 
>> -ash
>> 
>> 
>>> On 8 Apr 2019, at 16:20, Arthur Wiedmer <arthur.wied...@gmail.com> wrote:
>>> 
>>> Hi Bas,
>>> 
>>> 1) I am aware of a few places where those parameters are used in production
>>> in a few hundred jobs. I highly recommend we don't deprecate them unless we
>>> do it in a major version.
>>> 
>>> 2) As James mentioned, inlets and outlets are a lineage annotation feature
>>> which is still under development. Let's leave them in, but we can guard
>>> them behind a feature flag if you prefer.
>>> 
>>> 3) the yesterday*/tomorrow* params are convenience ones if you use a daily
>>> ETL. I agree with you that they are simple to compute, but not everyone
>>> using Apache Airflow is amazing with Python. Some users are only trying to
>>> get a query scheduled and rely on a couple of niceties like these to get by.
>>> 
>>> 4) latest_date and end_date (I feel like there used to be a start_date, 
>>> but maybe it got lost) were a blend of things used by a backfill framework 
>>> used internally at Airbnb. latest_date was used if you needed to join to a 
>>> dimension for which you only wanted the latest version of the attributes 
>>> in your backfill. end_date was used for time ranges where several days 
>>> were processed together in a range to save on compute. I don't see an 
>>> issue with removing them.
>>> 
>>> Best regards,
>>> Arthur
>>> 
>>> 
>>> 
>>> On Mon, Apr 8, 2019 at 5:37 AM Bas Harenslak 
>>> <basharens...@godatadriven.com> wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> Following Tao Feng's question to discuss this PR 
>>>> <https://github.com/apache/airflow/pull/5010> (AIRFLOW-4192 
>>>> <https://issues.apache.org/jira/browse/AIRFLOW-4192>), please discuss 
>>>> here if you agree/disagree/would change anything.
>>>> 
>>>> -----------
>>>> 
>>>> The summary of the PR:
>>>> 
>>>> I was confused by the task context values and suggest cleaning up and 
>>>> clarifying these variables. Some are derivations of other variables, some 
>>>> are undocumented and unused, and some are wrong (the name doesn't match 
>>>> the value). Please discuss what you think of the removal of these 
>>>> variables:
>>>> 
>>>> 
>>>> *   Removed yesterday_ds, yesterday_ds_nodash, tomorrow_ds, and
>>>> tomorrow_ds_nodash. IMO the next_* and previous_* variables are useful,
>>>> since they require complex logic to compute the next execution date, but
>>>> I would leave computing the yesterday* and tomorrow* variables up to the
>>>> user, since they are simple one-liners and don't relate to the DAG
>>>> interval.
>>>> *   Removed tables. This is a field in params, and is thus also
>>>> accessible by the user ({{ params.tables }}). Also, it was undocumented.
>>>> *   Removed latest_date. It's the same as ds and was also undocumented.
>>>> *   Removed inlets and outlets. Also undocumented, and have the
>>>> inlets/outlets ever worked/ever been used by anybody?
>>>> *   Removed end_date and END_DATE. Both have the same value, so it
>>>> doesn't make sense to have both variables. Also, the value is ds, which
>>>> contains the start date of the interval, so the naming didn't make sense
>>>> to me. However, if anybody argues in favour of adding "start_date" and
>>>> "end_date" to provide the start and end datetimes of task instance
>>>> intervals, I'd be happy to add them.
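For reference, the removed yesterday*/tomorrow* values really are one-liners over execution_date, so users who miss them could recreate them themselves (a sketch with a plain datetime standing in for the pendulum object Airflow provides):

```python
from datetime import datetime, timedelta

execution_date = datetime(2019, 4, 9)

# Equivalents of the removed convenience variables:
yesterday_ds = (execution_date - timedelta(days=1)).strftime("%Y-%m-%d")
tomorrow_ds = (execution_date + timedelta(days=1)).strftime("%Y-%m-%d")
yesterday_ds_nodash = yesterday_ds.replace("-", "")

print(yesterday_ds, tomorrow_ds, yesterday_ds_nodash)
# → 2019-04-08 2019-04-10 20190408
```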
>>>> 
>>>> Cheers,
>>>> Bas
>>>> 
>> 
> 
> 
> 
