dstandish commented on PR #53492:
URL: https://github.com/apache/airflow/pull/53492#issuecomment-3161258751

   You seem to pretty quickly dismiss the notion that the query is .... pretty 
gnarly.  You don't think it can be simplified?  Or maybe you just want to get 
some data on performance first.
   
   It's also, you know, I'm not sure that piling so much into one query is 
really required to solve starvation problems.  But it is not a positive for 
intelligibility, and can make it harder to optimize.  The nesting in particular 
I think is pretty tough for intelligibility.  Maybe it's necessary but I'd be 
surprised.
   
   I did notice you are missing a join condition here:
   
   ```
   JOIN dag_run ON task_instance.run_id = dag_run.run_id
   ```
   
   This would result in join explosion if you have same run id for diff dags 
(and since airlfow's run id is derived from a date, it happens commonly).
   
   You may be able to avoid some joins to dag run if you are just after TI.  
Some of our relationships are eager loading and result in joins we don't need 
(but it can be disabled with query options).
   
   After taking a little closer look at the query.... 
   
   I was struck by how often we go back and do a row number over the task 
instance table.  Is that really necessary?  I would have expected that maybe 
once we grab the pool of candidate TIs in the innermost select, that we'd be 
able to just successively filter that down to the final set, even if we do have 
to involve some nested window functions.  Over and over it's select *, 
row_number() from TI, DR, inner_query.   Is there no way to stop going back to 
TI and DR after the first result set?
   
   It also can be a nice thing to do (for readability and debugging) to provide 
an alias or label for your subselects and ctes (e.g. to avoid anon_6 etc.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to