mwisnicki opened a new issue, #68721:
URL: https://github.com/apache/airflow/issues/68721
### Under which category would you file this issue?
Airflow Core
### Apache Airflow version
3.2.2
### What happened and how to reproduce it?
Again more slop but hopefully useful enough.
`<🤖>`
---
When a backfill is created for a DAG with fast-completing tasks (sub-second per run), the scheduler marks the backfill as complete before all queued runs have been executed.
The root cause is in `_mark_backfills_complete` (`scheduler_job_runner.py`, ~line 1967), which runs every 30 seconds and marks a backfill complete when no dag runs are in `running` or `queued` state:
```python
~exists(
    select(DagRun.id).where(
        and_(
            DagRun.backfill_id == Backfill.id,
            DagRun.state.in_(unfinished_states),
        )
    )
)
```
When tasks complete faster than the scheduler's next scheduling loop can queue new runs, there is a window where all current runs are `success` and the next batch has not yet been dispatched. The completion check fires in this window and incorrectly marks the backfill done, leaving remaining queued runs stranded.
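
The window described above can be illustrated with a toy model. This is a minimal sketch using an in-memory SQLite table, not Airflow's actual code: the `dag_run` schema is reduced to three columns, and `looks_complete`, `UNFINISHED_STATES`, and `EXPECTED_RUNS` are hypothetical names introduced only for illustration. It assumes the check considers nothing except whether any run is currently `queued` or `running`.

```python
import sqlite3

# Assumed to mirror `unfinished_states` from the snippet above.
UNFINISHED_STATES = ("queued", "running")


def looks_complete(conn: sqlite3.Connection, backfill_id: int) -> bool:
    """Return True when no run of the backfill is queued or running."""
    row = conn.execute(
        "SELECT NOT EXISTS ("
        "  SELECT 1 FROM dag_run"
        "  WHERE backfill_id = ? AND state IN (?, ?)"
        ")",
        (backfill_id, *UNFINISHED_STATES),
    ).fetchone()
    return bool(row[0])


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag_run (id INTEGER PRIMARY KEY, backfill_id INTEGER, state TEXT)"
)

EXPECTED_RUNS = 5  # hypothetical number of runs the backfill should cover

# Only the first batch is visible so far, and every run in it already succeeded.
conn.executemany(
    "INSERT INTO dag_run (backfill_id, state) VALUES (?, ?)",
    [(1, "success")] * 3,
)
created = conn.execute(
    "SELECT COUNT(*) FROM dag_run WHERE backfill_id = 1"
).fetchone()[0]

# The check passes even though fewer runs exist than the backfill expects.
premature = looks_complete(conn, 1) and created < EXPECTED_RUNS
print(premature)  # True

# As soon as another run is queued, the same check reports "not complete".
conn.execute("INSERT INTO dag_run (backfill_id, state) VALUES (1, 'queued')")
print(looks_complete(conn, 1))  # False
```

The point of the sketch is that a check based only on "nothing unfinished right now" cannot distinguish "all work done" from "no work visible at this instant".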
**To reproduce:**
1. Create a DAG with a no-op task and a long date range:
```python
from datetime import datetime

from airflow.sdk import dag, task


@dag(
    dag_id="test_backfill_bug",
    schedule="@daily",
    start_date=datetime(2020, 1, 1),
    end_date=datetime(2022, 12, 31),
    catchup=False,
)
def test_backfill_bug():
    @task
    def noop():
        pass

    noop()


test_backfill_bug()
```
2. Create a backfill:
```bash
airflow backfill create \
    --dag-id test_backfill_bug \
    --from-date 2020-01-01 \
    --to-date 2022-12-31 \
    --max-active-runs 10
```
3. Observe that the backfill completes having only processed a fraction of the 1096 runs:
```python
import os
import sqlite3

# SQLite metadata DB used in this reproduction; adjust the path if needed.
conn = sqlite3.connect(os.path.expanduser("~/airflow/airflow.db"))
cur = conn.cursor()
# Look up the backfill created for the test DAG.
cur.execute(
    "SELECT id, completed_at FROM backfill WHERE dag_id = ?",
    ("test_backfill_bug",),
)
b_id, completed_at = cur.fetchone()
# Count this backfill's dag runs per state.
cur.execute(
    "SELECT state, COUNT(*) FROM dag_run WHERE backfill_id = ? GROUP BY state",
    (b_id,),
)
print("completed_at:", completed_at)
for row in cur.fetchall():
    print(row)
conn.close()
```
**Observed output:**
```
completed_at: 2026-06-18 03:42:47.759611
('success', 441)
('queued', 455) <- remaining runs never executed
```
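
For reference, the observed counts can be compared against the expected number of runs with some quick arithmetic. This sketch assumes one run per calendar day for the `@daily` schedule and an inclusive date range; the numbers are copied from the backfill command and the output above.

```python
from datetime import date

# Inclusive date range from the `airflow backfill create` command.
expected_runs = (date(2022, 12, 31) - date(2020, 1, 1)).days + 1

# Per-state counts copied from the observed output.
observed = {"success": 441, "queued": 455}
rows_present = sum(observed.values())

print(expected_runs)                 # 1096
print(rows_present)                  # 896
print(expected_runs - rows_present)  # 200
```

So in this reproduction 896 dag runs are linked to the backfill, 200 fewer than the 1096 expected, and 455 of the linked runs are still `queued` after `completed_at` was set.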
### What you think should happen instead?
The backfill should only be marked complete when all dag runs associated with it have reached a terminal state (`success` or `failed`), regardless of whether there is a momentary window where none are `running` or `queued`.
A possible fix: check that the count of terminal dag runs equals the total `BackfillDagRun` associations (excluding skipped entries) before marking complete.
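
The count-based check proposed above could look roughly like the following. This is a sketch under stated assumptions rather than a patch: it uses an in-memory SQLite database with a deliberately simplified, hypothetical schema (a `skipped` flag on `backfill_dag_run` stands in for however skipped entries are actually recorded), and `is_backfill_complete` is an illustrative name, not an Airflow function.

```python
import sqlite3

TERMINAL_STATES = ("success", "failed")


def is_backfill_complete(conn: sqlite3.Connection, backfill_id: int) -> bool:
    """True only when every non-skipped association has a terminal dag run."""
    expected = conn.execute(
        "SELECT COUNT(*) FROM backfill_dag_run "
        "WHERE backfill_id = ? AND skipped = 0",
        (backfill_id,),
    ).fetchone()[0]
    finished = conn.execute(
        "SELECT COUNT(*) FROM backfill_dag_run AS bdr "
        "JOIN dag_run AS dr ON dr.id = bdr.dag_run_id "
        "WHERE bdr.backfill_id = ? AND bdr.skipped = 0 AND dr.state IN (?, ?)",
        (backfill_id, *TERMINAL_STATES),
    ).fetchone()[0]
    return expected > 0 and finished == expected


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE backfill_dag_run (
        backfill_id INTEGER, dag_run_id INTEGER, skipped INTEGER DEFAULT 0
    );
    -- Two finished runs, one association with no run yet, one skipped entry.
    INSERT INTO dag_run (id, state) VALUES (1, 'success'), (2, 'success');
    INSERT INTO backfill_dag_run VALUES (1, 1, 0), (1, 2, 0), (1, NULL, 0), (1, NULL, 1);
    """
)

# No run is queued or running here, yet one expected run has not finished.
before = is_backfill_complete(conn, 1)

# The missing run is created, linked, and succeeds.
conn.execute("INSERT INTO dag_run (id, state) VALUES (3, 'success')")
conn.execute(
    "UPDATE backfill_dag_run SET dag_run_id = 3 "
    "WHERE dag_run_id IS NULL AND skipped = 0"
)
after = is_backfill_complete(conn, 1)

print(before, after)  # False True
```

Comparing against the expected total, instead of asking whether anything is unfinished at this instant, means a momentary gap between batches no longer looks like completion.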
### Operating System
macOS
### Deployment
Virtualenv installation
### Apache Airflow Provider(s)
_No response_
### Versions of Apache Airflow Providers
_No response_
### Official Helm Chart version
Not Applicable
### Kubernetes Version
_No response_
### Helm Chart configuration
_No response_
### Docker Image customizations
_No response_
### Anything else?
Note: PR #62561 (merged in 3.2.2) fixed a related but distinct issue where a backfill was marked complete before *any* dag runs were created (zero-runs race). This issue occurs after dag runs are created and begin executing: the completion window opens between scheduling batches when tasks complete faster than new ones are dispatched.
Related issue: the SQLite `database is locked` error (see [apache/airflow#68699](https://github.com/apache/airflow/issues/68699)) can cause fewer dag runs to be created than expected, which makes this bug easier to trigger since fewer runs complete faster.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)