[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r1112176062 ## airflow/dag_processing/manager.py: ## @@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE """ query = session.query(errors.ImportError) if file_paths: -query = query.filter(~errors.ImportError.filename.in_(file_paths)) +for file_path in file_paths: +if file_path.endswith(".zip"): +query = query.filter(~(errors.ImportError.filename.startswith(file_path))) +else: +query = query.filter(errors.ImportError.filename != file_path) Review Comment: Oops Got it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r1112172817 ## airflow/dag_processing/manager.py: ## @@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE """ query = session.query(errors.ImportError) if file_paths: -query = query.filter(~errors.ImportError.filename.in_(file_paths)) +for file_path in file_paths: +if file_path.endswith(".zip"): +query = query.filter(~(errors.ImportError.filename.startswith(file_path))) +else: +query = query.filter(errors.ImportError.filename != file_path) Review Comment: It's not same with the one you linked. What we have here is equivalent to: ```python for i in range(3): x = i ``` at the end, the value of x is 2 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r674660 ## airflow/dag_processing/manager.py: ## @@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE """ query = session.query(errors.ImportError) if file_paths: -query = query.filter(~errors.ImportError.filename.in_(file_paths)) +for file_path in file_paths: +if file_path.endswith(".zip"): +query = query.filter(~(errors.ImportError.filename.startswith(file_path))) +else: +query = query.filter(errors.ImportError.filename != file_path) Review Comment: Then if I understand correctly, each time through the loop, new query is formed and old one thrown away? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r619848 ## airflow/dag_processing/manager.py: ## @@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] | None, session=NEW_SE """ query = session.query(errors.ImportError) if file_paths: -query = query.filter(~errors.ImportError.filename.in_(file_paths)) +for file_path in file_paths: +if file_path.endswith(".zip"): +query = query.filter(~(errors.ImportError.filename.startswith(file_path))) +else: +query = query.filter(errors.ImportError.filename != file_path) Review Comment: Should we make it so that if there's no zip file in file_paths then we don't need to iterate the file_paths but run the removed query: `query = query.filter(~errors.ImportError.filename.in_(file_paths))`, because now we run a query for every file. I think we still have performance issue here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r1083416339 ## airflow/utils/file.py: ## @@ -292,6 +293,7 @@ def list_py_file_paths( core.DAG_DISCOVERY_SAFE_MODE configuration setting. If not set, default to safe. :param include_examples: include example DAGs +:param zip_paths: include path to python files inside zip files Review Comment: ```suggestion :param include_zip_paths: include path to python files inside zip files ``` ## airflow/utils/file.py: ## @@ -324,6 +326,12 @@ def find_dag_file_paths(directory: str | pathlib.Path, safe_mode: bool) -> list[ continue if not might_contain_dag(file_path, safe_mode): continue +if include_zip_paths and zipfile.is_zipfile(file_path): +with zipfile.ZipFile(file_path) as current_zip_file: +for zip_info in current_zip_file.infolist(): +mod_name, ext = os.path.splitext(zip_info.filename) Review Comment: What if the `zip_info` object is a directory? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r1058346122 ## airflow/dag_processing/manager.py: ## @@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session): :param session: session for ORM operations """ query = session.query(errors.ImportError) -if self._file_paths: -query = query.filter(~errors.ImportError.filename.in_(self._file_paths)) +files = list_py_file_paths(self._dag_directory, include_examples=False, include_zip_paths=True) Review Comment: > Agree. I think having filesystem_path in import errors is a good idea - and likely it's an easy one that can be automatically set on migration, so should be rather easy to do. I like the idea too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.
ephraimbuddy commented on code in PR #28256: URL: https://github.com/apache/airflow/pull/28256#discussion_r1045474431 ## airflow/dag_processing/manager.py: ## @@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session): :param session: session for ORM operations """ query = session.query(errors.ImportError) -if self._file_paths: -query = query.filter(~errors.ImportError.filename.in_(self._file_paths)) +files = list_py_file_paths(self._dag_directory, include_examples=False, include_zip_paths=True) Review Comment: Sadly, `list_py_file_paths` is an expensive operation... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org