kaxil opened a new pull request, #63288: URL: https://github.com/apache/airflow/pull/63288
Follow-up to #63269. The backfill command previously used shared `providers.json` state, meaning two `breeze registry backfill` runs for different providers couldn't safely execute concurrently. This adds `--provider` and `--providers-json` flags to both extraction scripts (`extract_parameters.py`, `extract_connections.py`) so each backfill run uses an isolated temp `providers.json` and only scans the target provider. In `--provider` mode, `modules.json` is not written (it would be incomplete), so concurrent runs don't clobber each other. ## What changed - **`extract_parameters.py`**: `--provider` flag filters to single provider and skips `modules.json`/`runtime_modules.json` writes; `--providers-json` overrides the default search paths - **`extract_connections.py`**: Same two flags — filters output to single provider and accepts a custom providers.json path - **`registry_commands.py`**: Backfill command now creates a temp `providers.json` per version in a `TemporaryDirectory`, passes `--provider`/`--providers-json` to scripts, removes the `_patch_providers_json` save/restore pattern ## Usage Two terminal sessions can now safely backfill different providers simultaneously: ```bash # Terminal 1 breeze registry backfill --provider amazon --version 9.15.0 --version 9.14.0 # Terminal 2 breeze registry backfill --provider google --version 14.0.0 --version 13.0.0 ``` The `registry-backfill.yml` GitHub Actions workflow already uses a matrix strategy per provider, so this also makes individual CI jobs faster (no longer scanning 100+ providers per run). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
