potiuk commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1113020259


##########
airflow/dag_processing/manager.py:
##########
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE
         """
         query = session.query(errors.ImportError)
         if file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(file_paths))
+            for file_path in file_paths:
+                if file_path.endswith(".zip"):
+                    query = query.filter(~(errors.ImportError.filename.startswith(file_path)))
+                else:
+                    query = query.filter(errors.ImportError.filename != file_path)

Review Comment:
   Yep, that's what I was afraid of. When people have a lot of DAGs, the extra 
column seems to me the only way to make this truly performant. The "IN" query 
can take advantage of the index speedup in that case. I am also afraid that, 
with this many filters added, the SELECT query generated by SQLAlchemy might 
simply become too big to be parsed and sent to the database efficiently.
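
To make the query-size concern concrete, here is a rough sketch that builds both query shapes by hand. It uses the stdlib `sqlite3` module rather than SQLAlchemy, so the SQL text only approximates what SQLAlchemy would emit. The table name `import_error`, the sample rows, and the helper names are all assumptions for illustration. It does not measure index usage; it only shows that the loop adds one predicate per path.

```python
import sqlite3

# Hypothetical stand-in for the import error table; only filename matters here.
ROWS = ["/dags/a.py", "/dags/gone.py", "/dags/pkg.zip/inner.py", "/dags/old.zip/x.py"]


def build_not_in(file_paths):
    """Single NOT IN predicate (the shape before the change). Needs a non-empty list."""
    placeholders = ", ".join("?" for _ in file_paths)
    sql = f"SELECT filename FROM import_error WHERE filename NOT IN ({placeholders})"
    return sql, list(file_paths)


def build_chained(file_paths):
    """One AND-ed predicate per path (the shape the loop produces). Needs a non-empty list."""
    clauses, params = [], []
    for path in file_paths:
        if path.endswith(".zip"):
            # Prefix match, roughly what ~filename.startswith(path) renders to.
            # Note: "_" and "%" inside the path would act as LIKE wildcards here.
            clauses.append("NOT (filename LIKE ? || '%')")
        else:
            clauses.append("filename != ?")
        params.append(path)
    sql = "SELECT filename FROM import_error WHERE " + " AND ".join(clauses)
    return sql, params


def run(sql, params):
    """Execute against a throwaway in-memory table and return sorted filenames."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE import_error (filename TEXT)")
        conn.executemany("INSERT INTO import_error VALUES (?)", [(r,) for r in ROWS])
        return sorted(row[0] for row in conn.execute(sql, params))
    finally:
        conn.close()


if __name__ == "__main__":
    paths = ["/dags/a.py", "/dags/pkg.zip"]
    print(run(*build_not_in(paths)))   # keeps the error for the file inside pkg.zip
    print(run(*build_chained(paths)))  # drops it, because of the prefix match
    many = [f"/dags/dag{i}.py" for i in range(500)]
    print(len(build_not_in(many)[0]), len(build_chained(many)[0]))
```

With the sample rows, the `NOT IN` form still returns the error stored for the file inside `pkg.zip`, which is the behaviour the change tries to fix, while the chained form handles it at the cost of one extra `AND` clause per DAG file.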



