Good point Bas. While I am all for deprecation, I agree this one is very much "hard-wired" into many users of Airflow, for many likely this is almost a muscle-memory or copying from existing DAGs. It's not enough to mandate it but we also should be empathetic and provide a viable replacement.
Unless there are good reasons (loudly thinking - performance? scheduling complexity? the delete/recreate scenario and edge cases involved? ) I like the proposal of start_date defaulting to DAG creation date. It is fully backwards-compatible, might be a nice feature of 2.3.0 and it is much more user-friendly than having to figure out artificial date and "catchup=False" to achieve the same. It does break a little the premise of a bit unwritten rule - "everything needed to create a DAG needs to be in the dag python code" - because the "creation date" is only stored in the database, but actually the start-date is not really a "DAG" structural property when you think of it, so this is not a real "DAG structure validation". Since all our DAGs are now stored in serialized form in the DB in order to be scheduled, using "creation_date" to fill the "start_date" makes perfect sense actually. J, On Tue, Feb 1, 2022 at 9:55 AM Bas Harenslak <b...@astronomer.io.invalid> wrote: > Regardless the behaviour of days_ago(), a lot of people use it so we’ll > definitely need to document it well with some good examples of alternatives. > > That said, I think the usage of days_ago() is actually a side-effect of > users that don’t really need their DAGs to start at X days ago, but want > their DAGs to “just run”. Airflow requiring a start_date forces people to > set something which they often do using days_ago(). Having Airflow default > the start_date to the date a DAG was added would take away the need for > days_ago(). > > Bas > > On 1 Feb 2022, at 05:33, Tzu-ping Chung <t...@astronomer.io.INVALID> wrote: > > I was brought here by > https://github.com/apache/airflow/pull/20508#issuecomment-1026414890 > Also +1 to deprecation from me. Since the function cannot be safely used > in start_date and end_date, the only sensible way for a general, > non-advanced user to use the function is in a task Python callable (e.g. > @task function). But importing Airflow in a task callable is always a bad > practice since it can slow things down way too much, and a more lightweight > solution (e.g. Pendulum as Daniel mentioned) is much preferred. Conversely, > by having the function in Airflow core, we are somewhat suggesting the > function can be used in DAG definition, which is bad. The presence of the > function does not provide any advantages. > > TP > > On Jan 6 2022, at 12:11 pm, Josh Fell <josh.d.f...@astronomer.io.invalid> > wrote: > > > +1 for deprecation as well. > > `days_ago()` was removed from example DAGs and other documentation since > it was mainly being used for dynamic `start_date` values which is not a > best practice in DAG authoring. Seemed to create more confusion and odd > behavior than value. > > On Tue, Dec 28, 2021 at 7:00 PM Jarek Potiuk <ja...@potiuk.com> wrote: > > I'd be for deprecating it. It's too easy to use with too much too > loose and too little value. I see no real "business" value in it. > > On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish > <daniel.stand...@astronomer.io.invalid> wrote: > > > > Yeah that's correct. Sorry, I should have used `pendulum.today`. But > yeah also equivalent to `pendulum.today('UTC').add(days=-N)` (while > `days_ago` uses timedelta it's the same when there's no DST is involved) > > > > > > On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <a...@apache.org> wrote: > >> > >> days_ago is not just the same as utcnow minus N days, it is always > "truncated" to the start of the day, so it's closer to > "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)” > >> > >> > >> On 28 December 2021 00:08:53 GMT, Daniel Standish < > daniel.stand...@astronomer.io.INVALID> wrote: > >>> > >>> I recall some time ago we removed `days_ago` from all example dags. > Not sure why we didn't also deprecate it. > >>> > >>> For reference, `days_ago(N)` returns utcnow minus N days. > >>> > >>> There's a PR to make it return a value in the default timezone, so > that when you use it in an expression for dag `start_date`, the dag will be > in the default timezone. > >>> > >>> I don't want to get into the merits of that here. But even assuming > that this would be desirable, there's still some ambiguity we'd have to > resolve. Namely, should we return `now minus N 24-hour periods` (as `now - > timedelta(N)` would do) or should we return now minus N days (as > pendulum.now().add(days=-N) would do)? Because of DST the two different > approaches result in values that differ by 1 hour. > >>> > >>> What I do want to explore here is whether folks think we can / should > just deprecate the function entirely. Personally this would be my > preference. Using `days_ago(5)` is not much more convenient than > `dttm.add(days=-N)`. And the latter has the benefit that it is > unambiguous, doesn't make assumptions, and doesn't get in the way between > user and library. > >>> > >>> So my proposal would be, don't change the behavior of `days_ago` and > deprecate it with removal targeted in 3.0. > >>> > > >