Re: [PR] Add bounded retry cleanup to RedshiftCreateClusterOperator on post-start failure [airflow]

via GitHub Mon, 09 Mar 2026 08:58:58 -0700


SameerMesiah97 commented on code in PR #63074:
URL: https://github.com/apache/airflow/pull/63074#discussion_r2906361939



##########
providers/amazon/src/airflow/providers/amazon/aws/operators/redshift_cluster.py:
##########
@@ -235,6 +239,70 @@ def __init__(
         self.deferrable = deferrable
         self.kwargs = kwargs
         self.delete_cluster_on_failure = delete_cluster_on_failure
+        self.cleanup_timeout_seconds = cleanup_timeout_seconds
+
+    def _attempt_cleanup_with_retry(self) -> None:
+        """
+        Attempt bounded best-effort deletion of the cluster.
+
+        This method is only invoked during task failure handling.
+        It does not block until deletion completes and will not
+        mask the original exception.
+        """
+        RETRY_INTERVAL_SECONDS = 60
+
+        # Bound cleanup attempts to avoid indefinitely occupying a worker slot.
+        deadline = time.monotonic() + self.cleanup_timeout_seconds
+        attempt = 1
+
+        while True:
+            try:
+                self.log.info(
+                    "Attempt %s: Deleting Redshift cluster %s.",
+                    attempt,
+                    self.cluster_identifier,
+                )
+
+                # Do not wait for deletion to complete; cleanup is best-effort.
+                self.hook.delete_cluster(cluster_identifier=self.cluster_identifier)

Review Comment:
   > You are basically calling `self.hook.delete_cluster` with some retry 
strategy. Could you use `tenacity`? It would make the code way cleaner.
   
   I could use `tenacity` here and, as you said, it would be cleaner and more 
idiomatic for providers. But since this is best-effort cleanup after a failure 
rather than normal retry logic, I opted for a simple bounded loop so the 
timeout semantics remain explicit.
   
   Do you think `tenacity` would improve readability here, or were you mainly 
suggesting it for consistency with other retry patterns?
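   
   To make "explicit timeout semantics" concrete, here is a minimal standalone 
sketch of the bounded-loop pattern, not the code in this PR. The names 
`cleanup_with_deadline`, `delete_fn`, `clock` and `sleep` are hypothetical and 
exist only for illustration; catching a bare `Exception` is also an assumption, 
and real code would likely narrow it.

```python
import time
from typing import Callable


def cleanup_with_deadline(
    delete_fn: Callable[[], None],
    timeout_seconds: float,
    retry_interval_seconds: float = 60,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call delete_fn until it succeeds or the deadline passes.

    Returns True if a delete call succeeded, False if time ran out.
    Errors from delete_fn are swallowed so that the caller's original
    exception is never masked by a cleanup failure.
    """
    # Hypothetical helper: the deadline is computed once, up front, so the
    # total time spent in cleanup is bounded regardless of attempt count.
    deadline = clock() + timeout_seconds

    while True:
        try:
            delete_fn()
            return True
        except Exception:
            # Best-effort: give up quietly once the deadline is reached.
            remaining = deadline - clock()
            if remaining <= 0:
                return False
            # Never sleep past the deadline.
            sleep(min(retry_interval_seconds, remaining))
```

   In the operator, `delete_fn` would presumably wrap the 
`self.hook.delete_cluster(...)` call shown in the diff, and the operator would 
re-raise the original exception after this returns, whatever the result.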



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
