> So yeah I feel adding a field to dagrun is reasonable.

Quite agree. Also, as discussed in the past, "log" should really be treated as an audit log of what happens, rather than being used for all kinds of queries involving other db entities, which could affect what is displayed in the UI when we look at a particular entity (dag run, task, etc.).
The drawback of keeping this information in logs is that our log table can (and should be expected to) be cleaned at any time, for any time range, without the corresponding entities having to be deleted - including very recent dag run data that is still kept for history browsing in the UI. Adding a "last_attempted" entry to the log would therefore be prone to losing the `last_queued_at` information even while the dag run is still in the database.

J.

On Wed, Jan 14, 2026 at 6:13 AM Tzu-ping Chung via dev <[email protected]> wrote:

> Both options make sense to me. Using the Log table allows retrospectively investigating the scheduler's behaviour, but I think that is arguably not valuable since you can already do that with logging. Most of the time it just takes up needless disk space. So yeah I feel adding a field to dagrun is reasonable.
>
> TP
>
> > On 14 Jan 2026, at 04:36, Ferruzzi, Dennis <[email protected]> wrote:
> >
> > Proposal: I plan to add a new timestamp column named `last_queued_at` to the `dagrun` table which is updated any time the run is queued, including when it is cleared. DeadlineAlert code will be modified to use this new column for any calculations which currently use `dagrun.queued_at`, and will fall back on `queued_at` if it is `null` or missing. This will require a small migration which sets the new column to `null` for existing rows.
> >
> > To summarize the discussion [1] regarding the `dagrun.queued_at` field: it currently tracks the initial queue time and is never updated, which breaks the expected behavior of DeadlineAlerts (and maybe other areas?) if a run is cleared or re-triggered. This means the `queued_at` column essentially represents the first time the run was queued, not the most recent time it was attempted. For example, if you expect an email when a run takes more than 30 minutes from when it was queued, and that run gets cleared and restarted, you get the email 30 minutes after the first time it was queued, regardless of how long the restarted run actually takes.
> >
> > There was a good discussion there and on Slack about expectations, and a few ideas were proposed. I think these are the two primary options:
> >
> > Option 1: Leave `dagrun.queued_at` alone to represent the first time the run was attempted, and add a new field to the `dagrun` table which is updated each time the run is queued, representing the most recent attempt.
> >
> > Option 2: Add rows to the `Log` table to store when a run was queued/requeued (as suggested by Standish) and use that as the source of truth for when a specific run was last attempted.
> >
> > While I like Option 2, it is a bigger project and feels like overkill for this, especially considering the recent discussion [2] about the Log table getting out of hand in some environments. I think maybe Option 1 is the right answer: it maintains backward compatibility and solves the immediate issue well.
> >
> > If there are no objections, I'll consider this accepted on Friday, 16 Jan at 21:00 UTC.
> >
> > [1] Email thread "DagRun queued_at timestamp discussion": https://lists.apache.org/thread/n5y2khy8l9472spoclmql3nj2bskqksj
> > [2] Email thread "Managing airflow database size and retention": https://lists.apache.org/thread/88odp590r1syklo5rok4tq3kxpkhv922
> >
> > - ferruzzi
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
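A minimal sketch of what Option 1 could look like, written against plain SQLAlchemy rather than the real Airflow models: only the `queued_at` / `last_queued_at` column names and the fall-back-to-`queued_at` behaviour come from the proposal above, while the simplified `DagRun` table and the `mark_queued` / `deadline_missed` helpers are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch of Option 1 using plain SQLAlchemy.
# The real Airflow DagRun model and DeadlineAlert code are not shown here;
# the table layout, helper names, and the deadline check are assumptions.
from __future__ import annotations

from datetime import datetime, timedelta, timezone

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DagRun(Base):
    """Simplified stand-in for the dagrun table."""

    __tablename__ = "dagrun"

    id = Column(Integer, primary_key=True)
    dag_id = Column(String(250), nullable=False)
    # Existing column: set once, never updated (first time the run was queued).
    queued_at = Column(DateTime(timezone=True), nullable=True)
    # Proposed column: updated every time the run is (re)queued, including on clear.
    # The migration would add it as nullable and leave existing rows as NULL.
    last_queued_at = Column(DateTime(timezone=True), nullable=True)


def mark_queued(session: Session, run: DagRun) -> None:
    """Hypothetical hook called whenever a run is queued or re-queued."""
    now = datetime.now(timezone.utc)
    if run.queued_at is None:      # keep the "first time queued" semantics
        run.queued_at = now
    run.last_queued_at = now       # always refresh the "most recent attempt"
    session.add(run)


def deadline_reference(run: DagRun) -> datetime | None:
    """Prefer the new column, falling back to queued_at for pre-migration rows."""
    return run.last_queued_at or run.queued_at


def deadline_missed(run: DagRun, allowed: timedelta) -> bool:
    """True if more than `allowed` has elapsed since the run was last queued."""
    reference = deadline_reference(run)
    if reference is None:
        return False
    return datetime.now(timezone.utc) - reference > allowed


if __name__ == "__main__":
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        run = DagRun(dag_id="example_dag")
        mark_queued(session, run)  # first attempt: sets both columns
        mark_queued(session, run)  # clear + re-trigger: only last_queued_at moves
        session.commit()
        print(deadline_missed(run, timedelta(minutes=30)))
```

Keeping `last_queued_at` nullable means rows created before the migration keep working through the fallback to `queued_at`, at the cost of one extra NULL check in the deadline path.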
