potiuk commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r1113025147
##########
airflow/dag_processing/manager.py:
##########
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE
     """
     query = session.query(errors.ImportError)
     if file_paths:
-        query = query.filter(~errors.ImportError.filename.in_(file_paths))
+        for file_path in file_paths:
+            if file_path.endswith(".zip"):
+                query = query.filter(~(errors.ImportError.filename.startswith(file_path)))
+            else:
+                query = query.filter(errors.ImportError.filename != file_path)

Review Comment:
   Put simply, the extra column moves the overhead of the path/DAG mapping to the moment the file is found and the entry is created. That stored "source" information then effectively acts as a cache which, thanks to an exact-match index, we can query very, very efficiently.
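   A minimal sketch of the idea in the comment above, using the standard-library `sqlite3` module instead of Airflow's SQLAlchemy models. The table layout, the `source` column name, the index name, and all helper functions are hypothetical and only illustrate the approach; they are not Airflow's actual schema or API. The point is that the zip-vs-plain-file distinction is resolved once, when the error row is written, so the cleanup only needs an exact-match comparison on `source` and no per-path prefix filter.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a hypothetical `source` column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE import_error ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"  # path reported in the error, may point inside a zip
        " source TEXT NOT NULL)"    # file found on disk: the .zip itself or the .py file
    )
    # Index intended for exact-match comparisons on the source path.
    conn.execute("CREATE INDEX idx_import_error_source ON import_error (source)")
    return conn


def record_import_error(conn, filename, source):
    """Store the error together with the file it came from.

    The path-to-source mapping is paid for here, once, at insert time.
    """
    conn.execute(
        "INSERT INTO import_error (filename, source) VALUES (?, ?)",
        (filename, source),
    )


def clear_stale_import_errors(conn, file_paths):
    """Delete errors whose source file is not in file_paths.

    Assumption of this sketch: an empty file_paths clears every row,
    since no file remains that could own an error.
    """
    if file_paths:
        placeholders = ", ".join("?" for _ in file_paths)
        conn.execute(
            f"DELETE FROM import_error WHERE source NOT IN ({placeholders})",
            list(file_paths),
        )
    else:
        conn.execute("DELETE FROM import_error")
    conn.commit()


def remaining_filenames(conn):
    """Return the filenames of the errors still stored, sorted."""
    rows = conn.execute(
        "SELECT filename FROM import_error ORDER BY filename"
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db()
record_import_error(conn, "/dags/bundle.zip/dag_a.py", "/dags/bundle.zip")
record_import_error(conn, "/dags/plain.py", "/dags/plain.py")
record_import_error(conn, "/dags/removed.py", "/dags/removed.py")

# Only bundle.zip and plain.py still exist on disk.
clear_stale_import_errors(conn, ["/dags/bundle.zip", "/dags/plain.py"])
print(remaining_filenames(conn))
# → ['/dags/bundle.zip/dag_a.py', '/dags/plain.py']
```

   With this layout the error raised from inside the zip archive survives because its `source` matches the archive path exactly, without any `startswith` filter on `filename`.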