SameerMesiah97 opened a new pull request, #61333: URL: https://github.com/apache/airflow/pull/61333
**Description**

Added best-effort cleanup for Redshift cluster creation so that clusters are deleted when a failure occurs after the cluster has been successfully created. Cleanup behavior is **guarded by a flag** and is **enabled by default**.

Previously, Redshift cluster creation could succeed via `create_cluster`, but the operator could then fail during post-creation steps when `wait_for_completion=True` and the IAM role lacked the `redshift:DescribeClusters` permission. In these cases, the Airflow task failed while the Redshift cluster continued provisioning or remained active in AWS, resulting in leaked infrastructure.

Cleanup has now been implemented for `RedshiftCreateClusterOperator`. If a `WaiterError` is raised **after cluster creation has been initiated**, the operator attempts a best-effort deletion of the cluster. Cleanup failures are logged but do not mask or replace the original exception.

**Rationale**

Redshift cluster creation can succeed while post-creation steps fail. This commonly occurs with partially scoped IAM roles, for example a role that allows `redshift:CreateCluster` but denies `redshift:DescribeClusters`, which is required by the availability waiter. In these scenarios, the Airflow task fails while the cluster continues provisioning or running in AWS, leading to leaked infrastructure and ongoing cost.

This change ensures that when a cluster has been started by the operator, failures during post-creation steps trigger a best-effort cleanup without altering error semantics or impacting unrelated resources.

**Tests**

* Added a unit test verifying that cluster deletion is attempted when a `WaiterError` occurs during the wait phase after successful cluster creation.
* Added a unit test ensuring that failures during cleanup do not mask or override the original exception raised by the waiter.

**Documentation**

The docstring for `RedshiftCreateClusterOperator` has been updated to document the new flag `delete_cluster_on_failure` and its default behavior.
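The cleanup behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function `create_cluster_with_cleanup`, the `FakeRedshiftClient` and `FakeWaiter` classes, and the local `WaiterError` stand-in (for `botocore.exceptions.WaiterError`) are all hypothetical, and the `create_cluster` call is simplified to a single argument so the example runs without AWS credentials.

```python
import logging

log = logging.getLogger(__name__)


class WaiterError(Exception):
    """Stand-in for botocore.exceptions.WaiterError so the sketch runs without botocore."""


def create_cluster_with_cleanup(
    client,
    cluster_identifier,
    wait_for_completion=True,
    delete_cluster_on_failure=True,
):
    """Create a cluster; on a post-creation waiter failure, try to delete it."""
    # If this call raises, nothing was created, so no cleanup is needed.
    client.create_cluster(ClusterIdentifier=cluster_identifier)
    if not wait_for_completion:
        return
    try:
        client.get_waiter("cluster_available").wait(ClusterIdentifier=cluster_identifier)
    except WaiterError:
        if delete_cluster_on_failure:
            try:
                client.delete_cluster(
                    ClusterIdentifier=cluster_identifier,
                    SkipFinalClusterSnapshot=True,
                )
            except Exception:
                # Best effort only: log the cleanup failure and fall through.
                log.exception("Cleanup of cluster %s failed", cluster_identifier)
        # Bare raise re-raises the original WaiterError in every case.
        raise


class FakeWaiter:
    """Hypothetical waiter that always fails, mimicking a denied DescribeClusters."""

    def wait(self, **kwargs):
        raise WaiterError("DescribeClusters denied")


class FakeRedshiftClient:
    """Hypothetical in-memory client used only to exercise the sketch."""

    def __init__(self, delete_fails=False):
        self.deleted = []
        self.delete_fails = delete_fails

    def create_cluster(self, **kwargs):
        return {"Cluster": kwargs}

    def get_waiter(self, name):
        return FakeWaiter()

    def delete_cluster(self, **kwargs):
        if self.delete_fails:
            raise RuntimeError("delete denied")
        self.deleted.append(kwargs["ClusterIdentifier"])
```

Placing the bare `raise` outside the inner `try` is what keeps a failed cleanup from replacing the waiter's exception: the task still fails with the original `WaiterError`, whether or not the deletion succeeded.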
**Backwards Compatibility**

A new flag called `delete_cluster_on_failure` has been added to `RedshiftCreateClusterOperator` with a default value of `True`. Best-effort cleanup will now be attempted if a post-creation failure (including `WaiterError`) occurs after the cluster has been successfully created.

Closes: #61324

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
