hussein-awala opened a new pull request, #56257: URL: https://github.com/apache/airflow/pull/56257
The performance was already improved by https://github.com/apache/airflow/pull/55942, but the issue in the original query comes from the `SerializedDagModel.id != latest_serdag.id` filter being misplaced inside the `IN` clause. ``` SELECT serialized_dag.id, serialized_dag.dag_id FROM serialized_dag WHERE serialized_dag.dag_version_id IN ( SELECT task_instance.dag_version_id FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id WHERE dag_run.id IN (10205, 10203, 10201, 10200, 10198, 9455, 9430, 9415, 9413, 9407) AND serialized_dag.id != '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid); id | dag_id --------------------------------------+--------------------- 01998540-a9c1-7d90-8b59-7fbe38294778 | some_dag QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on serialized_dag (cost=0.00..923463.34 rows=21476 width=1641) (actual time=9509.231..10509.646 rows=1 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 42876 Buffers: shared hit=8200223 SubPlan 1 -> Result (cost=0.70..41.65 rows=41 width=16) (actual time=0.007..0.225 rows=236 loops=42877) One-Time Filter: (serialized_dag.id <> '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid) Buffers: shared hit=8189132 -> Nested Loop (cost=0.70..41.65 rows=41 width=16) (actual time=0.007..0.200 rows=236 loops=42876) Buffers: shared hit=8189132 -> Index Scan using dag_run_pkey on dag_run (cost=0.29..15.20 rows=10 width=61) (actual time=0.001..0.013 rows=10 loops=42876) Index Cond: (id = ANY ('{10205,10203,10201,10200,10198,9455,9430,9415,9413,9407}'::integer[])) Buffers: shared hit=1286253 -> Index Scan using ti_dag_run on task_instance (cost=0.41..2.63 rows=1 width=71) (actual time=0.005..0.011 rows=24 loops=428751) Index Cond: (((dag_id)::text = (dag_run.dag_id)::text) AND ((run_id)::text = (dag_run.run_id)::text)) Buffers: shared hit=6902879 Planning Time: 2.056 ms Execution Time: 10509.694 ms ``` And after the fix (without the mentioned PR): ``` SELECT serialized_dag.id, serialized_dag.dag_id FROM serialized_dag WHERE serialized_dag.dag_version_id IN ( SELECT task_instance.dag_version_id FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id AND dag_run.run_id = task_instance.run_id WHERE dag_run.id IN (10205, 10203, 10201, 10200, 10198, 9455, 9430, 9415, 9413, 9407) ) AND serialized_dag.id != '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid; id | dag_id --------------------------------------+--------------------- 01998540-a9c1-7d90-8b59-7fbe38294778 | some_dag QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=42.04..95.70 rows=41 width=1641) (actual time=0.398..0.400 rows=1 loops=1) Buffers: shared hit=197 -> HashAggregate (cost=41.75..42.16 rows=41 width=16) (actual time=0.386..0.387 rows=2 loops=1) Group Key: task_instance.dag_version_id Batches: 1 Memory Usage: 24kB Buffers: shared hit=191 -> Nested Loop (cost=0.70..41.65 rows=41 width=16) (actual time=0.031..0.339 rows=236 loops=1) Buffers: shared hit=191 -> Index Scan using dag_run_pkey on dag_run (cost=0.29..15.20 rows=10 width=61) (actual time=0.012..0.036 rows=10 loops=1) Index Cond: (id = ANY ('{10205,10203,10201,10200,10198,9455,9430,9415,9413,9407}'::integer[])) Buffers: shared hit=30 -> Index Scan using ti_dag_run on task_instance (cost=0.41..2.63 rows=1 width=71) (actual time=0.007..0.020 rows=24 loops=10) Index Cond: (((dag_id)::text = (dag_run.dag_id)::text) AND ((run_id)::text = (dag_run.run_id)::text)) Buffers: shared hit=161 -> Index Scan using serialized_dag_dag_version_id_uq on serialized_dag (cost=0.29..1.31 rows=1 width=1641) (actual time=0.005..0.005 rows=0 loops=2) Index Cond: (dag_version_id = task_instance.dag_version_id) Filter: (id <> '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid) Rows Removed by Filter: 0 Buffers: shared hit=6 Planning: Buffers: shared hit=99 Planning Time: 2.678 ms Execution Time: 0.448 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
