Good point, Bas.

While I am all for deprecation, I agree this one is very much "hard-wired"
into many users of Airflow; for many it is likely almost muscle memory,
or simply copied from existing DAGs.
It's not enough to mandate the change; we should also be empathetic and
provide a viable replacement.

Unless there are good reasons against it (thinking out loud: performance?
scheduling complexity? the delete/recreate scenario and the edge cases
involved?), I like the proposal of start_date defaulting to the DAG creation
date.
It is fully backwards-compatible, might be a nice feature of 2.3.0, and is
much more user-friendly than having to figure out an artificial date and
"catchup=False" to achieve the same.
It does slightly bend a somewhat unwritten rule - "everything needed to
create a DAG needs to be in the DAG Python code" - because the "creation
date" is only stored in the database. But the start date is not really a
structural property of a DAG when you think about it, so this is not a real
violation of the "DAG structure". Since all our DAGs are now stored in
serialized form in the DB in order to be scheduled, using "creation_date"
to fill the "start_date" actually makes perfect sense.
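
To make the idea concrete, here is a rough sketch of the fallback. It is
not Airflow code: `effective_start_date` and its parameters are
hypothetical names, and `created_at` simply stands in for whatever
creation timestamp the database would hold for the DAG.

```python
from datetime import datetime, timezone
from typing import Optional


def effective_start_date(
    declared: Optional[datetime], created_at: datetime
) -> datetime:
    """Hypothetical helper: prefer the start_date declared in the DAG file,
    otherwise fall back to the DAG's creation date."""
    return declared if declared is not None else created_at


# Assumed creation timestamp, as it might be recorded in the DB.
created = datetime(2022, 2, 1, 9, 55, tzinfo=timezone.utc)

# No start_date in the DAG file: scheduling would start from creation.
assert effective_start_date(None, created) == created

# An explicit start_date still wins, so existing DAGs are unaffected.
explicit = datetime(2021, 1, 1, tzinfo=timezone.utc)
assert effective_start_date(explicit, created) == explicit
```

Keeping the explicit value as the winner is what would make such a
default backwards-compatible.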

J,

On Tue, Feb 1, 2022 at 9:55 AM Bas Harenslak <b...@astronomer.io.invalid>
wrote:

> Regardless of the behaviour of days_ago(), a lot of people use it so we’ll
> definitely need to document it well with some good examples of alternatives.
>
> That said, I think the usage of days_ago() is actually a side effect of
> users who don’t really need their DAGs to start X days ago, but want
> their DAGs to “just run”. Airflow requiring a start_date forces people to
> set something which they often do using days_ago(). Having Airflow default
> the start_date to the date a DAG was added would take away the need for
> days_ago().
>
> Bas
>
> On 1 Feb 2022, at 05:33, Tzu-ping Chung <t...@astronomer.io.INVALID> wrote:
>
> I was brought here by
> https://github.com/apache/airflow/pull/20508#issuecomment-1026414890
> Also +1 to deprecation from me. Since the function cannot be safely used
> in start_date and end_date, the only sensible way for a general,
> non-advanced user to use the function is in a task Python callable (e.g.
> @task function). But importing Airflow in a task callable is always a bad
> practice since it can slow things down way too much, and a more lightweight
> solution (e.g. Pendulum as Daniel mentioned) is much preferred. Conversely,
> by having the function in Airflow core, we are somewhat suggesting the
> function can be used in DAG definition, which is bad. The presence of the
> function does not provide any advantages.
>
> TP
>
> On Jan 6 2022, at 12:11 pm, Josh Fell <josh.d.f...@astronomer.io.invalid>
> wrote:
>
>
> +1 for deprecation as well.
>
> `days_ago()` was removed from example DAGs and other documentation since
> it was mainly being used for dynamic `start_date` values, which is not a
> best practice in DAG authoring. It seemed to create more confusion and odd
> behavior than value.
>
> On Tue, Dec 28, 2021 at 7:00 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> I'd be for deprecating it. It's too easy to use too loosely, and it
> offers too little value. I see no real "business" value in it.
>
> On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
> >
> > Yeah that's correct. Sorry, I should have used `pendulum.today`. But
> yeah, it's also equivalent to `pendulum.today('UTC').add(days=-N)` (while
> `days_ago` uses timedelta, it's the same when no DST is involved)
> >
> >
> > On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >>
> >> days_ago is not just the same as utcnow minus N days, it is always
> "truncated" to the start of the day, so it's closer to
> "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)"
> >>
> >>
> >> On 28 December 2021 00:08:53 GMT, Daniel Standish <
> daniel.stand...@astronomer.io.INVALID> wrote:
> >>>
> >>> I recall some time ago we removed `days_ago` from all example DAGs.
> Not sure why we didn't also deprecate it.
> >>>
> >>> For reference, `days_ago(N)` returns utcnow minus N days.
> >>>
> >>> There's a PR to make it return a value in the default timezone, so
> that when you use it in an expression for dag `start_date`, the dag will be
> in the default timezone.
> >>>
> >>> I don't want to get into the merits of that here.  But even assuming
> that this would be desirable, there's still some ambiguity we'd have to
> resolve.  Namely, should we return `now minus N 24-hour periods` (as `now -
> timedelta(N)` would do) or should we return now minus N days (as
> pendulum.now().add(days=-N)  would do)?  Because of DST the two different
> approaches result in values that differ by 1 hour.
> >>>
> >>> What I do want to explore here is whether folks think we can / should
> just deprecate the function entirely.  Personally this would be my
> preference.  Using `days_ago(5)` is not much more convenient than
> `dttm.add(days=-N)`.   And the latter has the benefit that it is
> unambiguous, doesn't make assumptions, and doesn't get in the way between
> user and library.
> >>>
> >>> So my proposal would be, don't change the behavior of `days_ago` and
> deprecate it with removal targeted in 3.0.
> >>>
>
>
>
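
For readers following the thread, the two points above (truncation to the
start of the day, and "N 24-hour periods" versus "N calendar days") can be
illustrated with the standard library alone. This is only a sketch:
`days_ago_utc` is a hypothetical stand-in written from the description in
this thread, not the Airflow function itself, and the time zone and dates
are picked purely because a DST change falls between them.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs time zone data installed


def days_ago_utc(n, now=None):
    """Hypothetical stand-in: midnight UTC of the current day, minus n days."""
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# The result is truncated to the start of the day, not "now minus N days".
sample = datetime(2022, 2, 1, 9, 55, tzinfo=timezone.utc)
assert days_ago_utc(2, now=sample) == datetime(2022, 1, 30, tzinfo=timezone.utc)

# Noon local time on the day after the 2022 spring DST change in Amsterdam.
tz = ZoneInfo("Europe/Amsterdam")
now = datetime(2022, 3, 28, 12, 0, tzinfo=tz)

# "N 24-hour periods": subtract elapsed time in UTC, then convert back.
by_elapsed = (now.astimezone(timezone.utc) - timedelta(days=2)).astimezone(tz)

# "N calendar days": same wall-clock time, two dates earlier, in local time.
by_calendar = now - timedelta(days=2)

# Compared as instants in UTC, the two readings are one hour apart.
gap = by_calendar.astimezone(timezone.utc) - by_elapsed.astimezone(timezone.utc)
assert gap == timedelta(hours=1)
```

Note that with plain `datetime` objects, subtracting a `timedelta` from an
aware local time keeps the wall-clock time, which is why the elapsed-time
variant goes through UTC here.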
