dstandish commented on PR #53492:
URL: https://github.com/apache/airflow/pull/53492#issuecomment-3161548920

   I also, well, I tried to take a look at this question of whether any of the 
window functions can be collapsed into the same select.
   
   So I took the two innermost window functions in your query.
   
   This is what it looks like when I pull out only the two innermost selects.  
Here it is with no changes except naming of the aliases.
   
   ```sql
   WITH
       cte_running AS
           (SELECT
                task_instance.dag_id AS dag_id,
                task_instance.run_id AS run_id,
                count(:count_1) AS running_this_run
            FROM task_instance
            WHERE
                task_instance.state IN ('RUNNING')
            GROUP BY task_instance.dag_id, task_instance.run_id
           )
   SELECT
       task_instance.*,
           row_number() OVER (
           PARTITION BY task_instance.dag_id, task_instance.task_id
           ORDER BY -task_instance.priority_weight, dag_run.logical_date, 
task_instance.map_index
           ) AS tis_per_dag_count
   FROM
       task_instance
       JOIN (SELECT
                 task_instance.*,
                     row_number() OVER (
                     PARTITION BY task_instance.dag_id, task_instance.run_id
                     ORDER BY -task_instance.priority_weight, 
dag_run.logical_date, task_instance.map_index
                     ) AS total_tis_per_dagrun_count
             FROM
                 task_instance
                 JOIN dag_run ON dag_run.dag_id = task_instance.dag_id
                     AND dag_run.run_id = task_instance.run_id
                 JOIN dag ON task_instance.dag_id = dag.dag_id
             WHERE
                 dag_run.state = :state_2
                 AND NOT dag.is_paused
                 AND task_instance.state = :state_3
                 AND dag.bundle_name IS NOT NULL
       ) AS anon_4 ON task_instance.id = anon_4.id
       LEFT JOIN cte_running
           ON task_instance.dag_id = cte_running.dag_id AND 
task_instance.run_id = cte_running.run_id
       JOIN dag_run ON task_instance.run_id = dag_run.run_id
       JOIN dag ON task_instance.dag_id = dag.dag_id
   WHERE
       coalesce(anon_4.total_tis_per_dagrun_count, 0) + 
coalesce(cte_running.running_this_run, 0) <= coalesce(dag.max_active_tasks, 10)
   ```
   
   So then I explored, what would happen if I collapsed them into one.  Is it 
actually not possible?
   
   So this is what it looks like when they are collapsed into one:
   
   ```sql
   WITH
       cte_running AS
           (SELECT
                task_instance.dag_id AS dag_id,
                task_instance.run_id AS run_id,
                count(:count_1) AS running_this_run
            FROM task_instance
            WHERE
                task_instance.state IN ('RUNNING')
            GROUP BY task_instance.dag_id, task_instance.run_id
           )
   select *
   from
       (SELECT
           dag.max_active_tasks,
            task_instance.*,
                row_number() OVER (
                PARTITION BY task_instance.dag_id, task_instance.run_id
                ORDER BY -task_instance.priority_weight, dag_run.logical_date, 
task_instance.map_index
                ) AS total_tis_per_dagrun_count,
                row_number() OVER (
                PARTITION BY task_instance.dag_id, task_instance.task_id
                ORDER BY -task_instance.priority_weight, dag_run.logical_date, 
task_instance.map_index
                ) AS tis_per_dag_count
        FROM
            task_instance
            JOIN dag_run ON dag_run.dag_id = task_instance.dag_id
                AND dag_run.run_id = task_instance.run_id
            JOIN dag ON task_instance.dag_id = dag.dag_id
            LEFT JOIN cte_running
                ON task_instance.dag_id = cte_running.dag_id AND 
task_instance.run_id = cte_running.run_id
        WHERE
            dag_run.state = :state_2
            AND NOT dag.is_paused
            AND task_instance.state = :state_3
            AND dag.bundle_name IS NOT NULL
       ) base
   where
       coalesce(base.total_tis_per_dagrun_count, 0) + 
coalesce(cte_running.running_this_run, 0)
           <= coalesce(base.max_active_tasks, 10)
   ```
   
   But then I noticed something.  The two window functions are identical.  So, 
it does seem a bit doubtful that we actually need both of them.  Couldn't the 
outer queries just reference `total_tis_per_dagrun_count` instead of 
`tis_per_dag_count`, given that they are the same thing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to