chigili commented on PR #63166: URL: https://github.com/apache/airflow/pull/63166#issuecomment-4033375328
> Do you have a query plan showing that coalesce makes it not using indexes?
@ashb ,
Generated on Postgres with 50,000 DagRun rows. Key takeaway: OLD forces a
Seq Scan (scans all 50k rows, removes 43,200 by filter) because COALESCE()
wraps the column. NEW uses Bitmap Index Scan on
idx_dag_run_start_date — touches only the relevant rows. Execution time
drops from 4.0ms → 1.3ms on this test dataset; the gap widens with larger
tables.
Please find the query plan as below:
======================================================================
QUERY PLAN: OLD — func.coalesce (non-sargable)
======================================================================
HashAggregate (cost=2284.78..2284.79 rows=1 width=18) (actual
time=3.986..3.987 rows=1 loops=1)
Group Key: run_type
Batches: 1 Memory Usage: 24kB
-> Seq Scan on dag_run (cost=0.00..2257.00 rows=5556 width=27) (actual
time=2.851..3.471 rows=6800 loops=1)
Filter: ((COALESCE(start_date, '2026-03-10
17:29:41.071763+00'::timestamp with time zone) >= '2026-02-08
17:29:41.071763+00'::timestamp with time zone) AND (COALESCE(end_date,
'2026-03-10
17:29:41.071763+00'::timestamp with time zone) <= '2026-03-10
17:29:41.071763+00'::timestamp with time zone))
Rows Removed by Filter: 43200
Planning Time: 0.169 ms
Execution Time: 4.000 ms
======================================================================
QUERY PLAN: NEW — sargable OR/IS NULL
======================================================================
HashAggregate (cost=1800.33..1800.34 rows=1 width=26) (actual
time=1.277..1.277 rows=1 loops=1)
Group Key: run_type, state
Batches: 1 Memory Usage: 24kB
-> Bitmap Heap Scan on dag_run (cost=139.34..1749.00 rows=6844
width=35) (actual time=0.107..0.604 rows=6800 loops=1)
Recheck Cond: ((start_date >= '2026-02-08
17:29:41.071763+00'::timestamp with time zone) OR (start_date IS NULL))
Filter: ((end_date <= '2026-03-10 17:29:41.071763+00'::timestamp
with time zone) OR (end_date IS NULL))
Heap Blocks: exact=207
-> BitmapOr (cost=139.34..139.34 rows=6844 width=0) (actual
time=0.095..0.095 rows=0 loops=1)
-> Bitmap Index Scan on idx_dag_run_start_date
(cost=0.00..131.62 rows=6844 width=0) (actual time=0.094..0.094 rows=6800
loops=1)
Index Cond: (start_date >= '2026-02-08
17:29:41.071763+00'::timestamp with time zone)
-> Bitmap Index Scan on idx_dag_run_start_date
(cost=0.00..4.30 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
Index Cond: (start_date IS NULL)
Planning Time: 0.101 ms
Execution Time: 1.290 ms
I am going to include the benchmark code for this in the dev folder in the
next commit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
