radhwene opened a new pull request, #68361:
URL: https://github.com/apache/airflow/pull/68361

   ## Problem
   
   Cloud SQL allows only one administrative operation at a time per instance.
   When `CloudSQLImportInstanceOperator` or `CloudSQLExportInstanceOperator`
   submits an operation while another Cloud SQL admin operation is still running
   on the same instance, the Cloud SQL Admin API returns
   `HTTP 409 operationInProgress`.
   
   `CloudSQLHook` already uses `GoogleBaseHook.operation_in_progress_retry()` for
   several Cloud SQL admin methods, but `import_instance` and `export_instance`
   were not covered. As a result, transient backend contention fails the Airflow
   task instead of retrying the operation submit.
   
   Closes: #68040
   
   ## Solution
   
   This PR applies the existing `operation_in_progress_retry()` policy to:
   
   * `CloudSQLHook.import_instance`
   * `CloudSQLHook.export_instance`
   
   It also fixes a latent exception-handling issue in `import_instance`.
   
   Before this change, `import_instance` wrapped every `HttpError` into an
   `AirflowException`. That prevented `operation_in_progress_retry()` from seeing
   the original retryable `HttpError`.
   
   With this change:
   
   * retryable `operationInProgress` `HttpError`s are re-raised unchanged so the
     retry decorator can evaluate them;
   * terminal `HttpError`s are still converted to the existing friendly
     `AirflowException` message.
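
   The split between retryable and terminal errors can be sketched as follows.
   This is a minimal, stdlib-only illustration, not the code in this PR:
   `FakeHttpError` and `FakeAirflowException` are hypothetical stand-ins for
   `HttpError` and `AirflowException`, and the `is_operation_in_progress`
   predicate (status 409 plus an `operationInProgress` reason) is an assumption
   made for the example.

```python
class FakeHttpError(Exception):
    """Hypothetical stand-in for the HttpError raised by the API client."""

    def __init__(self, status, reason):
        super().__init__(f"HTTP {status}: {reason}")
        self.status = status
        self.reason = reason


class FakeAirflowException(Exception):
    """Hypothetical stand-in for AirflowException."""


def is_operation_in_progress(exc):
    # Assumed predicate for this sketch; the real check may differ.
    return exc.status == 409 and exc.reason == "operationInProgress"


def submit_import(do_request):
    """Run the request, wrapping only terminal errors."""
    try:
        return do_request()
    except FakeHttpError as exc:
        if is_operation_in_progress(exc):
            # Re-raise unchanged so an outer retry decorator still sees
            # the original exception type.
            raise
        # Terminal error: convert to the friendly exception, keeping the cause.
        raise FakeAirflowException(f"Importing instance failed: {exc}") from exc
```

   With this shape, a retry decorator wrapped around `submit_import` receives
   the original error for the contention case, while callers still get the
   friendly exception for everything else.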
   
   There is no public API change, no new operator parameter, and no new
   authentication surface.
   
   ## Why hook-level retry, not only a sensor
   
   A standalone "no operation in progress" sensor can be useful as an optional
   pre-wait primitive, but it cannot fully fix this bug.
   
   The sensor checks the Cloud SQL instance state before the import/export
   operator submits the operation. There is still a race between:
   
   1. the sensor observing the instance as idle;
   2. the downstream task being scheduled;
   3. the operator submitting the actual Cloud SQL Admin API request.
   
   Another DAG, scheduler, user, maintenance operation, or external process can
   start an admin operation during that gap. Therefore, the retry must exist at
   the submit call itself.
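
   To illustrate why retrying at the submit call closes that gap, here is a
   small simulation. It is a sketch under stated assumptions, not the
   implementation of `operation_in_progress_retry()`: the `retry_on_contention`
   decorator, the `OperationInProgress` exception, and `FakeInstance` are all
   hypothetical names invented for this example, and the backoff values are
   arbitrary.

```python
import functools
import time


class OperationInProgress(Exception):
    """Hypothetical stand-in for an HTTP 409 operationInProgress error."""


def retry_on_contention(max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call with exponential backoff on contention only."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except OperationInProgress:
                    if attempt == max_attempts:
                        raise  # retry budget exhausted: surface the error
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


class FakeInstance:
    """Simulated instance that rejects the first `busy_for` submits."""

    def __init__(self, busy_for):
        self.busy_for = busy_for
        self.calls = 0

    def submit(self):
        self.calls += 1
        if self.calls <= self.busy_for:
            raise OperationInProgress("409 operationInProgress")
        return "operation-accepted"


delays = []  # record the backoff delays instead of really sleeping
instance = FakeInstance(busy_for=2)


@retry_on_contention(max_attempts=5, base_delay=1.0, sleep=delays.append)
def submit_export():
    return instance.submit()


result = submit_export()
```

   Even if a pre-wait sensor had reported the instance as idle, the two
   rejected submits here are absorbed by the retry, and the third attempt is
   accepted.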
   
   This keeps the retry policy centralized in `CloudSQLHook`, where all
   operators using these methods benefit automatically. It is also consistent
   with the existing retry behavior already used by other Cloud SQL admin
   methods.
   
   A sensor can still be added separately as an optional convenience, but the
   correctness fix for `409 operationInProgress` belongs in the hook.
   
   ## Testing
   
   This PR is covered at two levels.
   
   ### Unit tests
   
   The unit tests verify the local retry contract:
   
   * `export_instance` retries retryable `HttpError` responses.
   * `import_instance` retries retryable `HttpError` responses.
   * `import_instance` re-raises retryable `operationInProgress` `HttpError`
     unchanged, so `operation_in_progress_retry()` can evaluate the original
     exception type.
   * terminal `HttpError`s are still converted to the existing
     `AirflowException` message.
   
   The assertions are based on exception types and retry behavior, not string
   matching on error messages.
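
   The general shape of such a test, asserting on call counts and exception
   types only, might look like the sketch below. It does not reproduce the
   tests in this PR: `FakeHttpError` and `call_with_retry` are hypothetical
   stand-ins so the example runs with the standard library alone.

```python
from unittest import mock


class FakeHttpError(Exception):
    """Hypothetical stand-in for a retryable HTTP 409 error."""


def call_with_retry(func, attempts=3):
    """Toy retry loop used only to give the tests something to exercise."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except FakeHttpError:
            if attempt == attempts:
                raise


def test_submit_is_retried_until_accepted():
    # Two contention errors, then a successful response.
    submit = mock.Mock(
        side_effect=[FakeHttpError(), FakeHttpError(), {"name": "op-1"}]
    )
    assert call_with_retry(submit) == {"name": "op-1"}
    assert submit.call_count == 3


def test_exhausted_retries_reraise_original_type():
    # Every call fails; the original exception type must propagate.
    submit = mock.Mock(side_effect=FakeHttpError("still busy"))
    try:
        call_with_retry(submit, attempts=2)
    except FakeHttpError:
        pass
    else:
        raise AssertionError("expected FakeHttpError")
    assert submit.call_count == 2
```

   Neither test inspects the error message text, so rewording a message would
   not break them.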
   
   ### E2E validation
   
   The fix was also validated against a real Cloud SQL PostgreSQL instance.
   
   * Stock provider: parallel Cloud SQL admin operations against the same
     instance reproduce `409 operationInProgress`.
   * Patched provider: the same topology succeeds after retrying the submit.
   * The E2E scenario was run twice to reduce flakiness risk.
   
   The E2E DAG submits imports and exports in parallel against the same
   instance, which validates both patched hook methods under real Cloud SQL
   operation serialization.
   
   ## Breaking changes
   
   None.
   
   The public API is unchanged. The only behavioral change is that Cloud SQL
   `import` and `export` now retry the same transient backend contention
   condition already handled by other Cloud SQL admin methods.
   

