[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-02-20 Thread via GitHub


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1112176062


##
airflow/dag_processing/manager.py:
##
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] 
| None, session=NEW_SE
 """
 query = session.query(errors.ImportError)
 if file_paths:
-query = query.filter(~errors.ImportError.filename.in_(file_paths))
+for file_path in file_paths:
+if file_path.endswith(".zip"):
+query = 
query.filter(~(errors.ImportError.filename.startswith(file_path)))
+else:
+query = query.filter(errors.ImportError.filename != 
file_path)

Review Comment:
   Oops Got it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-02-20 Thread via GitHub


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1112172817


##
airflow/dag_processing/manager.py:
##
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] 
| None, session=NEW_SE
 """
 query = session.query(errors.ImportError)
 if file_paths:
-query = query.filter(~errors.ImportError.filename.in_(file_paths))
+for file_path in file_paths:
+if file_path.endswith(".zip"):
+query = 
query.filter(~(errors.ImportError.filename.startswith(file_path)))
+else:
+query = query.filter(errors.ImportError.filename != 
file_path)

Review Comment:
   It's not same with the one you linked. What we have here is equivalent to:
   ```python
   for i in range(3):
   x = i
   ```
   at the end, the value of x is 2



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-02-20 Thread via GitHub


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r674660


##
airflow/dag_processing/manager.py:
##
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] 
| None, session=NEW_SE
 """
 query = session.query(errors.ImportError)
 if file_paths:
-query = query.filter(~errors.ImportError.filename.in_(file_paths))
+for file_path in file_paths:
+if file_path.endswith(".zip"):
+query = 
query.filter(~(errors.ImportError.filename.startswith(file_path)))
+else:
+query = query.filter(errors.ImportError.filename != 
file_path)

Review Comment:
   Then if I understand correctly, each time through the loop, new query is 
formed and old one thrown away?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-02-20 Thread via GitHub


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r619848


##
airflow/dag_processing/manager.py:
##
@@ -782,7 +782,11 @@ def clear_nonexistent_import_errors(file_paths: list[str] 
| None, session=NEW_SE
 """
 query = session.query(errors.ImportError)
 if file_paths:
-query = query.filter(~errors.ImportError.filename.in_(file_paths))
+for file_path in file_paths:
+if file_path.endswith(".zip"):
+query = 
query.filter(~(errors.ImportError.filename.startswith(file_path)))
+else:
+query = query.filter(errors.ImportError.filename != 
file_path)

Review Comment:
   Should we make it so that if there's no zip file in file_paths then we don't 
need to iterate the file_paths but run the removed query: `query = 
query.filter(~errors.ImportError.filename.in_(file_paths))`, because now we run 
a query for every file. I think we still have performance issue here 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-22 Thread via GitHub


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1083416339


##
airflow/utils/file.py:
##
@@ -292,6 +293,7 @@ def list_py_file_paths(
 core.DAG_DISCOVERY_SAFE_MODE configuration setting. If not set, default
 to safe.
 :param include_examples: include example DAGs
+:param zip_paths: include path to python files inside zip files

Review Comment:
   ```suggestion
   :param include_zip_paths: include path to python files inside zip files
   ```



##
airflow/utils/file.py:
##
@@ -324,6 +326,12 @@ def find_dag_file_paths(directory: str | pathlib.Path, 
safe_mode: bool) -> list[
 continue
 if not might_contain_dag(file_path, safe_mode):
 continue
+if include_zip_paths and zipfile.is_zipfile(file_path):
+with zipfile.ZipFile(file_path) as current_zip_file:
+for zip_info in current_zip_file.infolist():
+mod_name, ext = os.path.splitext(zip_info.filename)

Review Comment:
   What if the `zip_info` object is a directory?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2022-12-28 Thread GitBox


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1058346122


##
airflow/dag_processing/manager.py:
##
@@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session):
 :param session: session for ORM operations
 """
 query = session.query(errors.ImportError)
-if self._file_paths:
-query = 
query.filter(~errors.ImportError.filename.in_(self._file_paths))
+files = list_py_file_paths(self._dag_directory, 
include_examples=False, include_zip_paths=True)

Review Comment:
   > Agree. I think having filesystem_path in import errors is a good idea - 
and likely it's an easy one that can be automatically set on migration, so 
should be rather easy to do.
   
   I like the idea too



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28256: Include full path to Python files under zip path while clearing import errors.

2022-12-11 Thread GitBox


ephraimbuddy commented on code in PR #28256:
URL: https://github.com/apache/airflow/pull/28256#discussion_r1045474431


##
airflow/dag_processing/manager.py:
##
@@ -777,8 +777,9 @@ def clear_nonexistent_import_errors(self, session):
 :param session: session for ORM operations
 """
 query = session.query(errors.ImportError)
-if self._file_paths:
-query = 
query.filter(~errors.ImportError.filename.in_(self._file_paths))
+files = list_py_file_paths(self._dag_directory, 
include_examples=False, include_zip_paths=True)

Review Comment:
   Sadly, `list_py_file_paths` is an expensive operation...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org