potiuk commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1058096682


##########
airflow/dag_processing/manager.py:
##########
@@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session):
         :param session: session for ORM operations
         """
         query = session.query(errors.ImportError)
-        if self._file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(self._file_paths))
+        files = list_py_file_paths(self._dag_directory, include_examples=False, include_zip_paths=True)

Review Comment:
   > I think there are two possible alternatives. One is to introduce a new 
attribute on DagFileProcessorManager that stores the “full” paths, so we can 
use it instead of `_file_paths` here. The other is to introduce a new column on 
ImportError that stores the filesystem path (i.e. path to the zip file) so we 
can filter it against `_file_paths`.
   > 
   > The root issue here is that both `_file_paths` and `ImportError.filename` 
essentially have a double meaning: they both represent the actual filesystem entry 
(path to an actual file), and a Python code loading target (path for the 
interpreter). Right now `_file_paths` is a list of filesystem entries, while 
`ImportError.filename` is a code target, and trying to compare them is 
fundamentally not a good idea.
   
   Agreed. I think adding `filesystem_path` to import errors is a good idea, and 
the column can likely be populated automatically during the migration, so it 
should be fairly easy to do.
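
   To make the idea concrete, here is a rough sketch of how the value might be derived and used. It assumes that an import error raised inside a zipped DAG is recorded with a filename of the form `<archive>.zip/<member>.py`; the helper names `filesystem_path_for` and `stale_import_errors` are hypothetical and not part of Airflow, and a real migration would need to double-check how existing rows are actually stored.

```python
from typing import Iterable, List


def filesystem_path_for(filename: str) -> str:
    """Map an import-error filename (code target) to its filesystem entry.

    Assumption: errors from zipped DAGs look like "<archive>.zip/<member>.py".
    A plain directory whose name ends in ".zip" would be misclassified here.
    """
    head, sep, _member = filename.partition(".zip/")
    # If the marker was found, the on-disk entry is the archive itself.
    return head + ".zip" if sep else filename


def stale_import_errors(error_filenames: Iterable[str], file_paths: Iterable[str]) -> List[str]:
    """Return the error filenames whose filesystem entry is no longer known."""
    known = set(file_paths)
    return [name for name in error_filenames if filesystem_path_for(name) not in known]


if __name__ == "__main__":
    errors = ["/dags/archive.zip/broken.py", "/dags/removed.py", "/dags/plain.py"]
    paths = ["/dags/archive.zip", "/dags/plain.py"]
    print(stale_import_errors(errors, paths))
```

   With a stored `filesystem_path`, the comparison against `_file_paths` would be between two filesystem entries, so the error for a file inside a zip that still exists would not be cleared by mistake.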


