Re: [I] Avoid large tuple IN query in SchedulerJobRunner._activate_referenced_assets on PostgreSQL (performance / perceived hanging) [airflow]

via GitHub Mon, 09 Feb 2026 05:55:22 -0800


Nataneljpwd commented on issue #61453:
URL: https://github.com/apache/airflow/issues/61453#issuecomment-3871871325


   > @pjavier29 
   > 
   > Yes, in my opinion, considering the usage scenario, I don't think 
additional filtering is necessary (refer to PR description), but in-memory 
filtering would be nice. However, I have a question. What was the main cause of 
the query performance degradation you observed?
   > 
   > 1. Network IO from passing long string queries (the items going into the 
IN operation)
   > 2. Query performance of the IN operation
   > 
   > In case of 1, the same (or very similar) network IO would have occurred 
when calling 
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L2992.
   > 
   > If 2 is the problem, doing in-memory filtering in Python would be slower 
than the database if anything, not faster, wouldn't it?
   
   1) Network IO is not the issue. The long IN list is small compared with the 
amount of data you will now get back without the filter; with a lot of assets 
it will be far worse than the current state.
   
   2) The IN query itself is not that bad; planning does not take long.
   
   I think we either need a proper query with a join (which is not that 
complex), or we keep it as is and maybe paginate the request (though I think 
that will degrade performance further).
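
To illustrate the join idea, here is a rough sketch using `sqlite3` so that it runs standalone. The table names (`asset`, `dag_asset_ref`, `asset_active`), their columns, and both function names are made up for the example and are not Airflow's actual schema or code; I'm also assuming the goal is "assets referenced by some DAG that are not yet active", which may not match the scheduler's exact logic.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical tables (not Airflow's schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE asset (id INTEGER PRIMARY KEY, name TEXT, uri TEXT);
        CREATE TABLE dag_asset_ref (dag_id TEXT, asset_id INTEGER);
        CREATE TABLE asset_active (name TEXT, uri TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO asset VALUES (?, ?, ?)",
        [(1, "a", "s3://a"), (2, "b", "s3://b"), (3, "c", "s3://c")],
    )
    # Asset 2 is referenced by two DAGs, asset 3 by none.
    conn.executemany(
        "INSERT INTO dag_asset_ref VALUES (?, ?)",
        [("dag1", 1), ("dag1", 2), ("dag2", 2)],
    )
    # Asset 1 is already active.
    conn.executemany("INSERT INTO asset_active VALUES (?, ?)", [("a", "s3://a")])
    return conn


def find_inactive_referenced_assets(conn):
    """Return (name, uri) pairs referenced by at least one DAG but not yet active.

    The filtering is done by the database through joins, so Python neither
    sends a long list of (name, uri) tuples nor receives unfiltered rows.
    """
    return conn.execute(
        """
        SELECT DISTINCT a.name, a.uri
        FROM asset AS a
        JOIN dag_asset_ref AS r ON r.asset_id = a.id
        LEFT JOIN asset_active AS act
               ON act.name = a.name AND act.uri = a.uri
        WHERE act.name IS NULL
        ORDER BY a.name, a.uri
        """
    ).fetchall()
```

With the demo data, only asset `b` qualifies: `a` is already active and `c` is not referenced by any DAG. The anti-join (`LEFT JOIN ... WHERE act.name IS NULL`) replaces both the large tuple IN list and any post-filtering in Python.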
   
   In-memory filtering in Python will be slow. I say that from experience: I 
once tried filtering SQL objects in memory in Python, and it was way slower 
than doing it in SQL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
