potiuk commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1058096682

##########
airflow/dag_processing/manager.py:
##########
@@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session):
         :param session: session for ORM operations
         """
         query = session.query(errors.ImportError)
-        if self._file_paths:
-            query = query.filter(~errors.ImportError.filename.in_(self._file_paths))
+        files = list_py_file_paths(self._dag_directory, include_examples=False, include_zip_paths=True)

Review Comment:
   > I think there are two possible alternatives. One is to introduce a new attribute on DagFileProcessorManager that stores the “full” paths, so we can use it instead of `_file_paths` here. The other is to introduce a new column on ImportError that stores the filesystem path (i.e. the path to the zip file) so we can filter it against `_file_paths`.
   >
   > The root issue here is that both `_file_paths` and `ImportError.filename` essentially have a double meaning: each can denote the actual filesystem entry (the path to an actual file) or a Python code-loading target (the path for the interpreter). Right now `_file_paths` is a list of filesystem entries, while `ImportError.filename` is a code target, and comparing them is fundamentally not a good idea.

   Agreed. I think adding a `filesystem_path` column to import errors is a good idea, and it can likely be populated automatically during the migration, so it should be fairly easy to do.
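
To make the distinction between a filesystem entry and a code target concrete, here is a minimal, self-contained sketch of the second alternative. It is not Airflow code: `ImportErrorRecord`, `derive_filesystem_path`, and `stale_import_errors` are hypothetical names, and the sketch assumes that a code target inside an archive looks like `<archive>.zip/<member>`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImportErrorRecord:
    """Hypothetical stand-in for an import error row (not the real ORM model)."""

    filename: str  # code-loading target; may point inside a zip archive
    filesystem_path: str  # hypothetical new column: the actual entry on disk


def derive_filesystem_path(filename: str) -> str:
    """Derive the on-disk entry from a code target, e.g. during a migration.

    Assumption: a target inside an archive has the form '<archive>.zip/<member>'.
    Anything else is treated as a plain file and returned unchanged.
    """
    marker = ".zip/"
    idx = filename.find(marker)
    if idx == -1:
        return filename
    # Keep everything up to and including '.zip', drop the member path.
    return filename[: idx + len(".zip")]


def stale_import_errors(
    records: Iterable[ImportErrorRecord], file_paths: Iterable[str]
) -> List[ImportErrorRecord]:
    """Return the records whose filesystem entry is no longer being tracked.

    Both sides of the comparison are filesystem entries, so zipped DAGs are
    matched by the archive path rather than by the member inside it.
    """
    known = set(file_paths)
    return [r for r in records if r.filesystem_path not in known]


if __name__ == "__main__":
    targets = ["/dags/bundle.zip/my_dag.py", "/dags/plain_dag.py", "/dags/gone.py"]
    records = [ImportErrorRecord(t, derive_filesystem_path(t)) for t in targets]
    tracked = ["/dags/bundle.zip", "/dags/plain_dag.py"]
    for record in stale_import_errors(records, tracked):
        print(record.filename)
```

With a column like this, the original `_file_paths` filter could stay as it is and simply compare against the filesystem path instead of `filename`. Whether the derivation rule above matches how Airflow actually records zipped DAG paths would need to be checked against the real data.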