potiuk commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1113025147


##########
airflow/dag_processing/manager.py:
##########
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] 
| None, session=NEW_SE
         """
         query = session.query(errors.ImportError)
         if file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(file_paths))
+            for file_path in file_paths:
+                if file_path.endswith(".zip"):
+                    query = 
query.filter(~(errors.ImportError.filename.startswith(file_path)))
+                else:
+                    query = query.filter(errors.ImportError.filename != 
file_path)

Review Comment:
   Simply speaking what happens with the extra column - we distribute the 
overhead connected with the path/DAG mapping to the time when the file is found 
and entry gets created, and by having this "source" informatio we have 
effectively the "cache" that (by using exact match index) we can query very, 
very efficiently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to