hussein-awala opened a new pull request, #56257:
URL: https://github.com/apache/airflow/pull/56257

   The performance was already improved by 
https://github.com/apache/airflow/pull/55942, but the issue in the original 
query comes from the `SerializedDagModel.id != latest_serdag.id` filter being 
misplaced inside the `IN` clause.
   ```
   SELECT serialized_dag.id, serialized_dag.dag_id
   FROM serialized_dag
   WHERE serialized_dag.dag_version_id IN (
     SELECT task_instance.dag_version_id
       FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id 
AND dag_run.run_id = task_instance.run_id
       WHERE dag_run.id IN (10205, 10203, 10201, 10200, 10198, 9455, 9430, 
9415, 9413, 9407)
       AND serialized_dag.id != '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid);
                     id                  |       dag_id
   --------------------------------------+---------------------
    01998540-a9c1-7d90-8b59-7fbe38294778 | some_dag
   
   
                                                                        QUERY 
PLAN
   
-----------------------------------------------------------------------------------------------------------------------------------------------------
    Seq Scan on serialized_dag  (cost=0.00..923463.34 rows=21476 width=1641) 
(actual time=9509.231..10509.646 rows=1 loops=1)
      Filter: (SubPlan 1)
      Rows Removed by Filter: 42876
      Buffers: shared hit=8200223
      SubPlan 1
        ->  Result  (cost=0.70..41.65 rows=41 width=16) (actual 
time=0.007..0.225 rows=236 loops=42877)
              One-Time Filter: (serialized_dag.id <> 
'019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid)
              Buffers: shared hit=8189132
              ->  Nested Loop  (cost=0.70..41.65 rows=41 width=16) (actual 
time=0.007..0.200 rows=236 loops=42876)
                    Buffers: shared hit=8189132
                    ->  Index Scan using dag_run_pkey on dag_run  
(cost=0.29..15.20 rows=10 width=61) (actual time=0.001..0.013 rows=10 
loops=42876)
                          Index Cond: (id = ANY 
('{10205,10203,10201,10200,10198,9455,9430,9415,9413,9407}'::integer[]))
                          Buffers: shared hit=1286253
                    ->  Index Scan using ti_dag_run on task_instance  
(cost=0.41..2.63 rows=1 width=71) (actual time=0.005..0.011 rows=24 
loops=428751)
                          Index Cond: (((dag_id)::text = 
(dag_run.dag_id)::text) AND ((run_id)::text = (dag_run.run_id)::text))
                          Buffers: shared hit=6902879
    Planning Time: 2.056 ms
    Execution Time: 10509.694 ms
   ```
   And after the fix (without the mentioned PR):
   ```
   SELECT serialized_dag.id, serialized_dag.dag_id
   FROM serialized_dag
   WHERE serialized_dag.dag_version_id IN (
     SELECT task_instance.dag_version_id
       FROM task_instance JOIN dag_run ON dag_run.dag_id = task_instance.dag_id 
AND dag_run.run_id = task_instance.run_id
       WHERE dag_run.id IN (10205, 10203, 10201, 10200, 10198, 9455, 9430, 
9415, 9413, 9407)
   ) AND serialized_dag.id != '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid;
                     id                  |       dag_id
   --------------------------------------+---------------------
    01998540-a9c1-7d90-8b59-7fbe38294778 | some_dag
   
   
   
                                                                           
QUERY PLAN
   
----------------------------------------------------------------------------------------------------------------------------------------------------------
    Nested Loop  (cost=42.04..95.70 rows=41 width=1641) (actual 
time=0.398..0.400 rows=1 loops=1)
      Buffers: shared hit=197
      ->  HashAggregate  (cost=41.75..42.16 rows=41 width=16) (actual 
time=0.386..0.387 rows=2 loops=1)
            Group Key: task_instance.dag_version_id
            Batches: 1  Memory Usage: 24kB
            Buffers: shared hit=191
            ->  Nested Loop  (cost=0.70..41.65 rows=41 width=16) (actual 
time=0.031..0.339 rows=236 loops=1)
                  Buffers: shared hit=191
                  ->  Index Scan using dag_run_pkey on dag_run  
(cost=0.29..15.20 rows=10 width=61) (actual time=0.012..0.036 rows=10 loops=1)
                        Index Cond: (id = ANY 
('{10205,10203,10201,10200,10198,9455,9430,9415,9413,9407}'::integer[]))
                        Buffers: shared hit=30
                  ->  Index Scan using ti_dag_run on task_instance  
(cost=0.41..2.63 rows=1 width=71) (actual time=0.007..0.020 rows=24 loops=10)
                        Index Cond: (((dag_id)::text = (dag_run.dag_id)::text) 
AND ((run_id)::text = (dag_run.run_id)::text))
                        Buffers: shared hit=161
      ->  Index Scan using serialized_dag_dag_version_id_uq on serialized_dag  
(cost=0.29..1.31 rows=1 width=1641) (actual time=0.005..0.005 rows=0 loops=2)
            Index Cond: (dag_version_id = task_instance.dag_version_id)
            Filter: (id <> '019986ae-e948-79cc-b9cc-d49cde67a0e1'::uuid)
            Rows Removed by Filter: 0
            Buffers: shared hit=6
    Planning:
      Buffers: shared hit=99
    Planning Time: 2.678 ms
    Execution Time: 0.448 ms
   ```  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to