SameerMesiah97 commented on code in PR #62633:
URL: https://github.com/apache/airflow/pull/62633#discussion_r2868013041


##########
providers/amazon/src/airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -1734,27 +1734,38 @@ def delete_bucket_tagging(self, bucket_name: str | None 
= None) -> None:
         s3_client.delete_bucket_tagging(Bucket=bucket_name)
 
     def _sync_to_local_dir_delete_stale_local_files(self, current_s3_objects: 
list[Path], local_dir: Path):
-        current_s3_keys = {key for key in current_s3_objects}
-
-        for item in local_dir.iterdir():
-            item: Path  # type: ignore[no-redef]
-            absolute_item_path = item.resolve()
-
-            if absolute_item_path not in current_s3_keys:
-                try:
-                    if item.is_file():
-                        item.unlink(missing_ok=True)
-                        self.log.debug("Deleted stale local file: %s", item)
-                    elif item.is_dir():
-                        # delete only when the folder is empty
-                        if not os.listdir(item):
+        current_s3_keys = {key.resolve() for key in current_s3_objects}
+
+        def _clean_recursive(dir_path: Path) -> None:
+            """Process directory recursively (bottom-up) to delete stale files 
and empty dirs."""
+            # Use list() to avoid modifying directory while iterating
+            for item in list(dir_path.iterdir()):
+                item: Path  # type: ignore[no-redef]
+                absolute_item_path = item.resolve()
+
+                if item.is_file():
+                    if absolute_item_path not in current_s3_keys:
+                        try:
+                            item.unlink(missing_ok=True)
+                            self.log.debug("Deleted stale local file: %s", 
item)
+                        except OSError as e:
+                            self.log.error("Error deleting stale item %s: %s", 
item, e)
+                            raise e
+                elif item.is_dir():
+                    _clean_recursive(item)
+                    # After recursing, delete directory only if empty
+                    try:
+                        if not any(item.iterdir()):
                             item.rmdir()

Review Comment:
   I don't think you need `not any(item.iterdir())` here. It is a needlessly 
defensive check that subly impacts performance (you are traversing the entire 
directory). You already have `rmdir()`, which I believe only works on empty 
directories and will raise an exception that will be caught 2 lines below. I 
would remove line 1758.



##########
providers/amazon/src/airflow/providers/amazon/aws/hooks/s3.py:
##########
@@ -1734,27 +1734,38 @@ def delete_bucket_tagging(self, bucket_name: str | None 
= None) -> None:
         s3_client.delete_bucket_tagging(Bucket=bucket_name)
 
     def _sync_to_local_dir_delete_stale_local_files(self, current_s3_objects: 
list[Path], local_dir: Path):
-        current_s3_keys = {key for key in current_s3_objects}
-
-        for item in local_dir.iterdir():
-            item: Path  # type: ignore[no-redef]
-            absolute_item_path = item.resolve()
-
-            if absolute_item_path not in current_s3_keys:
-                try:
-                    if item.is_file():
-                        item.unlink(missing_ok=True)
-                        self.log.debug("Deleted stale local file: %s", item)
-                    elif item.is_dir():
-                        # delete only when the folder is empty
-                        if not os.listdir(item):
+        current_s3_keys = {key.resolve() for key in current_s3_objects}
+
+        def _clean_recursive(dir_path: Path) -> None:
+            """Process directory recursively (bottom-up) to delete stale files 
and empty dirs."""
+            # Use list() to avoid modifying directory while iterating
+            for item in list(dir_path.iterdir()):
+                item: Path  # type: ignore[no-redef]
+                absolute_item_path = item.resolve()
+
+                if item.is_file():
+                    if absolute_item_path not in current_s3_keys:
+                        try:
+                            item.unlink(missing_ok=True)
+                            self.log.debug("Deleted stale local file: %s", 
item)
+                        except OSError as e:
+                            self.log.error("Error deleting stale item %s: %s", 
item, e)
+                            raise e
+                elif item.is_dir():

Review Comment:
   My understanding is that `Path.is_dir() `follows symlinks by default. This 
means that if a symlinked directory points to a parent (or otherwise creates a 
cycle), the recursive traversal could enter an infinite loop. It may be safer 
to add this `item.is_dir() and not item.is_symlink()`. This should prevent 
potential recursion cycles or even traversal outside the immediate directory 
tree.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to