potiuk commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1113020259
##########
airflow/dag_processing/manager.py:
##########
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE
         """
         query = session.query(errors.ImportError)
         if file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(file_paths))
+            for file_path in file_paths:
+                if file_path.endswith(".zip"):
+                    query = query.filter(~(errors.ImportError.filename.startswith(file_path)))
+                else:
+                    query = query.filter(errors.ImportError.filename != file_path)

Review Comment:
   Yep. That's what I was afraid of. For installations with a lot of DAG files, an extra column seems to me the only way to make this truly performant. The "IN" query takes advantage of the index in this case. I am also worried that, with one filter added per file path, the SELECT query generated by SQLAlchemy might simply grow too big to be parsed and passed to the database efficiently.
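To make the trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` instead of SQLAlchemy and Airflow's models. The `import_error` table, the `source_path` column (standing in for the extra column mentioned in the comment), and the helpers `per_path_filter`, `in_filter` and `stale_errors` are all hypothetical, not Airflow's actual schema or code. It compares the per-path filter shape from the diff with a single `NOT IN` on the extra column: both select the same stale rows, but the per-path form adds a whole predicate for every path.

```python
import sqlite3

# Hypothetical schema for illustration only: `source_path` stands in for the
# extra column suggested in the review (for a zipped DAG it would hold the
# .zip archive path, so it matches entries in `file_paths` exactly).
SCHEMA = """
CREATE TABLE import_error (filename TEXT NOT NULL, source_path TEXT NOT NULL);
CREATE INDEX idx_import_error_source_path ON import_error (source_path);
"""


def per_path_filter(file_paths):
    """Mirror the shape of the diff: one extra predicate per file path."""
    clauses, params = [], []
    for path in file_paths:
        if path.endswith(".zip"):
            # Negated prefix match, like ~filename.startswith(path).
            clauses.append("NOT (substr(filename, 1, ?) = ?)")
            params += [len(path), path]
        else:
            clauses.append("filename != ?")
            params.append(path)
    return " AND ".join(clauses), params


def in_filter(file_paths):
    """One NOT IN predicate on the extra column; only the placeholders grow."""
    placeholders = ", ".join("?" for _ in file_paths)
    return f"source_path NOT IN ({placeholders})", list(file_paths)


def stale_errors(conn, where, params):
    """Return filenames of import errors matched by the WHERE clause."""
    sql = f"SELECT filename FROM import_error WHERE {where} ORDER BY filename"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO import_error VALUES (?, ?)",
    [
        ("/dags/a.py", "/dags/a.py"),
        ("/dags/gone.py", "/dags/gone.py"),
        ("/dags/pack.zip/inner.py", "/dags/pack.zip"),
        ("/dags/old.zip/inner.py", "/dags/old.zip"),
    ],
)
live = ["/dags/a.py", "/dags/pack.zip"]
print(stale_errors(conn, *per_path_filter(live)))  # ['/dags/gone.py', '/dags/old.zip/inner.py']
print(stale_errors(conn, *in_filter(live)))        # same rows

# Size of the generated WHERE clause for 1000 plain .py paths.
many = [f"/dags/dag_{i}.py" for i in range(1000)]
print(len(per_path_filter(many)[0]), len(in_filter(many)[0]))  # 17995 3019
```

Both forms still grow linearly with `len(file_paths)` (and, like the original query, callers should skip the filter when the list is empty); the difference is the constant factor and that the `NOT IN` form stays a single predicate. Whether a given database actually uses the index for it depends on its query planner.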