About the log table.

*Functional vs forensic*

The log table, as it stands, is I think *mostly* non-functional -- i.e.
informational, or forensic, only.

BUT.... it is not *quite* only that.

I know of at least one place where it is used functionally.  And that is
when a task is stuck in queued.

When a task is stuck in queued state, we use the log table to ascertain how
many times it has been stuck in queued and requeued, before we attempt to
requeue it again.

I think this is not really a bad approach.  It's quite flexible and
practical.
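As a rough sketch of that functional use: before requeuing a task that is
stuck in queued, count how many requeue events the log table already records
for it. The table layout, event name, and threshold below are illustrative
assumptions for the sketch, not Airflow's actual schema.

```python
import sqlite3

# Hypothetical policy: give up after this many requeue attempts.
MAX_REQUEUES = 2

def requeue_count(conn, dag_id, task_id, run_id):
    """Count prior 'stuck in queued' requeue events recorded in the log table."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM log"
        " WHERE dag_id = ? AND task_id = ? AND run_id = ?"
        " AND event = 'stuck in queued requeue'",  # illustrative event name
        (dag_id, task_id, run_id),
    ).fetchone()
    return n

def may_requeue(conn, dag_id, task_id, run_id):
    """Allow another requeue only while the recorded attempts stay under the cap."""
    return requeue_count(conn, dag_id, task_id, run_id) < MAX_REQUEUES

# Tiny demo against an in-memory stand-in for the log table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (dag_id TEXT, task_id TEXT, run_id TEXT, event TEXT)")
conn.execute("INSERT INTO log VALUES ('d1', 't1', 'r1', 'stuck in queued requeue')")
allowed = may_requeue(conn, "d1", "t1", "r1")  # one prior requeue: still allowed
```

The nice property, as noted above, is flexibility: the policy lives entirely
in the query, with no schema change needed to track attempts.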

That said, if you want to argue that we should separate the "only forensic"
log entries from those that are possibly functional, for some reason, then,
maybe there's a case.  But the distinction seems rather fuzzy.

My initial thought is that keeping it all in one table is probably best,
simply for the simplicity: we don't have to decide where each kind of event
belongs, and the UI doesn't have to pull from two places.  But surely
reasonable minds can differ.

*Log table vs text logs*

Where the log table is more useful to end users is that the log records are
visible in the UI right next to the task you are looking at, so you can see
what happened to your task even across different Airflow components.  And
it is, in theory, more selective and structured than text logs.

*On the main question here: Log table vs last queued at*

Wait, did we consider just updating `queued_at` to always mean
"last_queued_at"?  That would seem to me to be the best option here.  I
don't know why we would *ever* care about the *first* queued-at time after
the dag run has been cleared, etc.
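To make the two semantics concrete, here is a minimal plain-Python sketch
(field and function names are assumptions, not Airflow's actual model) of a
separate `last_queued_at` that is refreshed on every (re)queue, with deadline
math falling back to the old `queued_at` for pre-migration rows:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class DagRunSketch:
    """Toy stand-in for a dagrun row; not the real Airflow model."""
    queued_at: Optional[datetime] = None       # first time queued; never updated
    last_queued_at: Optional[datetime] = None  # refreshed on every (re)queue

    def mark_queued(self, now: datetime) -> None:
        if self.queued_at is None:
            self.queued_at = now
        self.last_queued_at = now  # updated even when the run is cleared and requeued

def deadline(run: DagRunSketch, window: timedelta) -> datetime:
    # Roughly COALESCE(last_queued_at, queued_at) + window: rows that predate
    # the migration (last_queued_at is NULL) fall back to the old column.
    base = run.last_queued_at or run.queued_at
    return base + window

run = DagRunSketch()
t0 = datetime(2026, 1, 14, 6, 0, tzinfo=timezone.utc)
run.mark_queued(t0)
run.mark_queued(t0 + timedelta(hours=1))  # e.g. the run was cleared and requeued
d = deadline(run, timedelta(minutes=30))  # measured from the *latest* queue time
```

The "just rename the semantics" alternative would collapse the two fields
into one that `mark_queued` always overwrites; the fallback only matters if
both columns are kept.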

On Wed, Jan 14, 2026 at 6:12 AM Jarek Potiuk <[email protected]> wrote:

> > So yeah I feel adding a field to dagrun is reasonable.
>
> Quite agree. Also as discussed in the past "log" should be really treated
> as an audit log of what happens rather than being used for all kinds of
> queries including other db entities that could impact what is being
> displayed in UI when we look at particular entity (dag run, task etc. ).
>
> The drawback of having this information in logs is that essentially our log
> table can (and it should be expected) be cleaned "any time" for "any time
> range" - without having to delete the corresponding entities -  including
> very recent dag run data that is still kept for general UI / history
> browsing in UI.
> Adding "last_attempted" to log would be prone to losing the
> `last_queued_at` information even if dag run would be still in the
> database.
>
> J.
>
> On Wed, Jan 14, 2026 at 6:13 AM Tzu-ping Chung via dev <[email protected]> wrote:
>
> > Both options make sense to me. Using the Log table allows retrospectively
> > investigating the scheduler’s behaviour, but I think that is arguably not
> > valuable since you can already do that with logging. Most of the time it
> > just takes up needless disk space. So yeah I feel adding a field to
> > dagrun is reasonable.
> >
> > TP
> >
> >
> > > On 14 Jan 2026, at 04:36, Ferruzzi, Dennis <[email protected]> wrote:
> > >
> > > Proposal:  I plan to implement a new timestamp column named
> > > `last_queued_at` to the `dagrun` table which is updated any time the
> > > run is queued, including when it is cleared.  DeadlineAlert code will
> > > be modified to use this new column for any calculations which
> > > currently use `dagrun.queued_at` and will fall back on `queued_at` if
> > > it is `null` or missing.  This will require a small migration which
> > > sets the new column to `null` for existing rows.
> > >
> > > To summarize the discussion [1] regarding the `dagrun.queued_at`
> > > field: it currently tracks the initial queue time and is never
> > > updated, which breaks expected behavior of DeadlineAlerts (and maybe
> > > other areas?) if a run is cleared or re-triggered.  Which means the
> > > `queued_at` column essentially represents the first time this run was
> > > queued, not the most recent time it was attempted.  For example, if
> > > you expect an email if the run takes more than 30 minutes from when
> > > it was queued and it gets cleared and restarted, you get that email
> > > 30 minutes from the first time it was queued regardless of how long
> > > it actually took to run.
> > >
> > > There was a good discussion there and on Slack about expectations and
> > > a few ideas were proposed.  I think these are the two primary options:
> > >
> > > Option 1: We leave `dagrun.queued_at` alone to represent the first
> > > time it was attempted and add a new field to the `dagrun` table which
> > > is updated each time it is queued, representing the most recent
> > > attempt.
> > >
> > > Option 2: Add rows to the `Log` table to store when a run was
> > > queued/requeued (as suggested by Standish) and use that as the source
> > > of truth for when a specific run was last attempted.
> > >
> > >
> > > While I like Option 2, it's a bigger project and feels like overkill
> > > for this, especially considering the recent discussion [2] about the
> > > Log table getting out of hand on some environments.  I think maybe
> > > Option 1 is the right answer.  It maintains backward compatibility
> > > and solves the immediate issue well.
> > >
> > > If there are no objections, I'll consider this accepted on Friday,
> > > 16 Jan at 21:00 UTC.
> > >
> > >
> > > [1] Email thread "DagRun queued_at timestamp discussion":
> > > https://lists.apache.org/thread/n5y2khy8l9472spoclmql3nj2bskqksj
> > > [2] Email thread "Managing airflow database size and retention":
> > > https://lists.apache.org/thread/88odp590r1syklo5rok4tq3kxpkhv922
> > >
> > > - ferruzzi
