Nataneljpwd commented on issue #61453: URL: https://github.com/apache/airflow/issues/61453#issuecomment-3871871325
> @pjavier29
>
> Yes, in my opinion, considering the usage scenario, I don't think additional filtering is necessary (refer to PR description), but in-memory filtering would be nice. However, I have a question. What was the main cause of the query performance degradation you observed?
>
> 1. Network IO from passing long string queries (the items going into the IN operation)
> 2. Query performance of the IN operation
>
> In case of 1, the same (or very similar) network IO would have occurred when calling https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/scheduler_job_runner.py#L2992.
>
> If 2 is the problem, doing in-memory filtering in Python would be slower than the database if anything, not faster, wouldn't it?

1) is not an issue compared to the amount of data you will now get without the filter; with a lot of assets it will be much worse than the current state.
2) is not that bad; query planning does not take a long time.

I think we either need a proper query with a join (which is not that complex), or we keep it as is and maybe paginate the request (though I think that will cause further performance degradation).

In-memory Python filtering will be slow. I say that from experience: I once tried filtering SQL objects in memory in Python, and it was way slower than doing it in SQL.
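
To make the two options concrete, here is a minimal sketch using the standard-library `sqlite3` module. The tables `asset_event` and `wanted_asset` and both helper functions are hypothetical names made up for illustration; this is not Airflow's schema or the scheduler's actual query. It only shows that a join returns just the matching rows, while in-memory filtering has to fetch every row first and then discard most of them.

```python
import sqlite3

# Hypothetical schema for illustration only; not Airflow's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset_event (id INTEGER PRIMARY KEY, asset_id INTEGER, payload TEXT);
    CREATE TABLE wanted_asset (asset_id INTEGER PRIMARY KEY);
    """
)
# 20,000 events spread evenly over 1,000 assets (20 events per asset).
conn.executemany(
    "INSERT INTO asset_event (asset_id, payload) VALUES (?, ?)",
    [(i % 1000, "x") for i in range(20000)],
)
# Only 10 of the 1,000 assets are of interest.
conn.executemany(
    "INSERT INTO wanted_asset VALUES (?)",
    [(i,) for i in range(0, 1000, 100)],
)


def filter_in_sql(conn):
    """Let the database do the filtering via a join; only matches are returned."""
    return conn.execute(
        """
        SELECT e.id, e.asset_id
        FROM asset_event AS e
        JOIN wanted_asset AS w ON w.asset_id = e.asset_id
        ORDER BY e.id
        """
    ).fetchall()


def filter_in_python(conn):
    """Fetch every event, then filter in memory. Also returns how many rows were fetched."""
    wanted = {row[0] for row in conn.execute("SELECT asset_id FROM wanted_asset")}
    all_rows = conn.execute(
        "SELECT id, asset_id FROM asset_event ORDER BY id"
    ).fetchall()
    matching = [row for row in all_rows if row[1] in wanted]
    return matching, len(all_rows)


sql_rows = filter_in_sql(conn)
py_rows, fetched = filter_in_python(conn)
print(len(sql_rows), len(py_rows), fetched)
```

Both variants produce the same result set, but the Python variant pulls all 20,000 rows across the connection to keep 200 of them. With real ORM objects the per-row cost is higher still, since each row also has to be materialized as a Python object before it can be discarded.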
