SameerMesiah97 opened a new pull request, #62422: URL: https://github.com/apache/airflow/pull/62422
**Description**

This change makes `DatabricksReposCreateOperator` resilient to a race condition when multiple tasks attempt to create a repository at the same `repo_path` concurrently.

Previously, the operator performed a `get_repo_by_path` check followed by `create_repo`. If two tasks ran at the same time, both could observe that the repository did not exist and both attempt creation. One task would succeed, while the other would fail with a 400 error from the Databricks API indicating that the repo already exists.

The operator now treats this as a recoverable condition. If `create_repo` fails because the repo already exists, the operator re-fetches the repo ID via `get_repo_by_path`. If the repo is found, execution proceeds normally and preserves the existing `ignore_existing_repo` semantics.

**Rationale**

The previous implementation relied on a non-atomic existence check followed by creation. In concurrent DAG runs, this leads to a classic time-of-check/time-of-use race condition. Two tasks can both pass the existence check and attempt creation, even though only one creation can succeed.

Since repository creation is an external side-effect managed by the Databricks API, the operator cannot assume exclusivity or single-writer behavior. It must defensively handle the possibility that another task or DAG run creates the resource between the check and the create call.

Handling this explicitly makes the operator more robust under concurrency without changing its single-run behavior.

**Tests**

Add tests that verify that:

* the operator recovers when `create_repo` raises an “already exists” error by re-fetching the repo ID and proceeding successfully.
* a genuine creation failure (where the repo still cannot be found after the error) is propagated and not silently swallowed.
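The check, create, and re-fetch flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeReposHook`, `RacingHook`, `RepoAlreadyExistsError`, and `ensure_repo` are hypothetical names, the in-memory hook only borrows the method names `get_repo_by_path` and `create_repo`, and the `ignore_existing_repo` handling is left out.

```python
class RepoAlreadyExistsError(Exception):
    """Hypothetical stand-in for the 400 'already exists' API error."""


class FakeReposHook:
    """In-memory stand-in for a repos hook (assumption, not the real hook)."""

    def __init__(self):
        self.repos = {}  # repo_path -> repo_id
        self._next_id = 1

    def get_repo_by_path(self, path):
        # Returns the repo ID, or None when nothing exists at this path.
        return self.repos.get(path)

    def create_repo(self, url, path):
        if path in self.repos:
            raise RepoAlreadyExistsError(path)
        repo_id = str(self._next_id)
        self._next_id += 1
        self.repos[path] = repo_id
        return {"id": repo_id}


class RacingHook(FakeReposHook):
    """Simulates another task creating the repo between check and create."""

    def __init__(self):
        super().__init__()
        self._raced = False

    def get_repo_by_path(self, path):
        if not self._raced:
            self._raced = True
            # Report "not found", then let the competing task win the race.
            super().create_repo("https://example.com/repo.git", path)
            return None
        return super().get_repo_by_path(path)


def ensure_repo(hook, url, path):
    """Return the repo ID at `path`, creating the repo if needed."""
    repo_id = hook.get_repo_by_path(path)
    if repo_id is not None:
        return repo_id
    try:
        return hook.create_repo(url, path)["id"]
    except RepoAlreadyExistsError:
        # Another writer may have created the repo after our check.
        repo_id = hook.get_repo_by_path(path)
        if repo_id is None:
            # Still not found: a genuine failure, so propagate it.
            raise
        return repo_id


if __name__ == "__main__":
    racing = RacingHook()
    print(ensure_repo(racing, "https://example.com/repo.git", "/Repos/demo"))
```

With `RacingHook`, the first existence check reports nothing, the create call then fails because the competing writer got there first, and the re-fetch returns the ID that writer produced. If the re-fetch also finds nothing, the original error is re-raised, which corresponds to the second test case listed above.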
