dstandish commented on PR #53492:
URL: https://github.com/apache/airflow/pull/53492#issuecomment-3161548920
I also, well, I tried to take a look at this question of whether any of the
window functions can be collapsed into the same select.
So I took the two innermost window functions in your query.
This is what it looks like when I pull out only the two innermost selects.
Here it is with no changes except naming of the aliases.
```sql
WITH
cte_running AS
(SELECT
task_instance.dag_id AS dag_id,
task_instance.run_id AS run_id,
count(:count_1) AS running_this_run
FROM task_instance
WHERE
task_instance.state IN ('RUNNING')
GROUP BY task_instance.dag_id, task_instance.run_id
)
SELECT
task_instance.*,
row_number() OVER (
PARTITION BY task_instance.dag_id, task_instance.task_id
ORDER BY -task_instance.priority_weight, dag_run.logical_date,
task_instance.map_index
) AS tis_per_dag_count
FROM
task_instance
JOIN (SELECT
task_instance.*,
row_number() OVER (
PARTITION BY task_instance.dag_id, task_instance.run_id
ORDER BY -task_instance.priority_weight,
dag_run.logical_date, task_instance.map_index
) AS total_tis_per_dagrun_count
FROM
task_instance
JOIN dag_run ON dag_run.dag_id = task_instance.dag_id
AND dag_run.run_id = task_instance.run_id
JOIN dag ON task_instance.dag_id = dag.dag_id
WHERE
dag_run.state = :state_2
AND NOT dag.is_paused
AND task_instance.state = :state_3
AND dag.bundle_name IS NOT NULL
) AS anon_4 ON task_instance.id = anon_4.id
LEFT JOIN cte_running
ON task_instance.dag_id = cte_running.dag_id AND
task_instance.run_id = cte_running.run_id
JOIN dag_run ON task_instance.run_id = dag_run.run_id
JOIN dag ON task_instance.dag_id = dag.dag_id
WHERE
coalesce(anon_4.total_tis_per_dagrun_count, 0) +
coalesce(cte_running.running_this_run, 0) <= coalesce(dag.max_active_tasks, 10)
```
So then I explored, what would happen if I collapsed them into one. Is it
actually not possible?
So this is what it looks like when they are collapsed into one:
```sql
WITH
cte_running AS
(SELECT
task_instance.dag_id AS dag_id,
task_instance.run_id AS run_id,
count(:count_1) AS running_this_run
FROM task_instance
WHERE
task_instance.state IN ('RUNNING')
GROUP BY task_instance.dag_id, task_instance.run_id
)
select *
from
(SELECT
dag.max_active_tasks,
task_instance.*,
row_number() OVER (
PARTITION BY task_instance.dag_id, task_instance.run_id
ORDER BY -task_instance.priority_weight, dag_run.logical_date,
task_instance.map_index
) AS total_tis_per_dagrun_count,
row_number() OVER (
PARTITION BY task_instance.dag_id, task_instance.task_id
ORDER BY -task_instance.priority_weight, dag_run.logical_date,
task_instance.map_index
) AS tis_per_dag_count
FROM
task_instance
JOIN dag_run ON dag_run.dag_id = task_instance.dag_id
AND dag_run.run_id = task_instance.run_id
JOIN dag ON task_instance.dag_id = dag.dag_id
LEFT JOIN cte_running
ON task_instance.dag_id = cte_running.dag_id AND
task_instance.run_id = cte_running.run_id
WHERE
dag_run.state = :state_2
AND NOT dag.is_paused
AND task_instance.state = :state_3
AND dag.bundle_name IS NOT NULL
) base
where
coalesce(base.total_tis_per_dagrun_count, 0) +
coalesce(cte_running.running_this_run, 0)
<= coalesce(base.max_active_tasks, 10)
```
But then I noticed something. The two window functions are identical. So,
it does seem a bit doubtful that we actually need both of them. Couldn't the
outer queries just reference `total_tis_per_dagrun_count` instead of
`tis_per_dag_count`, given that they are the same thing?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]