deepujain opened a new pull request, #63206:
URL: https://github.com/apache/airflow/pull/63206

## Summary

Fixes the DagBag timeout that occurs when DAGs import `DataprocCreateBatchOperator` (#62373). Previously, importing this operator pulled in the entire `operators.dataproc` module and its heavy dependencies (`google.cloud.dataproc_v1`, `DataprocHook`, triggers), causing parse times of 30+ seconds on small workers.

## Change

- **Lazy-load `DataprocCreateBatchOperator`:** `operators.dataproc` is now a package. `DataprocCreateBatchOperator` is served from a lightweight `._batch` submodule that defers importing `google.cloud.dataproc_v1`, `DataprocHook`, and related modules until they are actually needed (for example in `execute()` or when `hook` is accessed). All other operators remain in `._core`, which is loaded on first access.
- **`dataproc/__init__.py`:** Uses a module-level `__getattr__` to return `DataprocCreateBatchOperator` from `._batch` and all other names from `._core`.
- **`dataproc/_batch.py`:** Contains only `DataprocCreateBatchOperator`; its heavy dependencies are imported locally inside its methods.
- **`dataproc/_core.py`:** The previous `dataproc.py` content, minus the Batch operator class.
- **Tests:** `DATAPROC_PATH` now points at `._core` for the non-Batch operators. `TestDataprocCreateBatchOperator` uses `DATAPROC_BATCH_HOOK_PATH` and `DATAPROC_BATCH_TO_DICT_PATH` so that mocks are applied where the Batch operator actually resolves its imports (the hooks module and `google.cloud.dataproc_v1`).

## Why no new tests

The existing `TestDataprocCreateBatchOperator` tests were updated to patch the correct modules and continue to cover the operator, so no new test file was added.

Fixes #62373
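A minimal, self-contained sketch of the `__getattr__` dispatch described above follows (PEP 562 module-level `__getattr__`). It writes a throwaway package to a temporary directory so it can run anywhere. The package name `lazy_dataproc_demo` and the placeholder class bodies are hypothetical; only the `._batch`/`._core` split mirrors this PR. Accessing `DataprocCreateBatchOperator` loads `._batch` alone, while `._core` is imported only when another operator name is first requested.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout mirroring the PR's dataproc/ split; class bodies
# are placeholders, only the __getattr__ dispatch reflects the described change.
FILES = {
    "__init__.py": '''
import importlib

# Names served by the lightweight submodule; everything else comes from _core.
_BATCH_NAMES = frozenset({"DataprocCreateBatchOperator"})


def __getattr__(name):
    # PEP 562: called only when normal attribute lookup on the module fails.
    submodule = "._batch" if name in _BATCH_NAMES else "._core"
    module = importlib.import_module(submodule, __name__)
    try:
        value = getattr(module, name)
    except AttributeError:
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}") from None
    globals()[name] = value  # cache so later lookups skip __getattr__
    return value
''',
    "_batch.py": '''
class DataprocCreateBatchOperator:
    def execute(self, context):
        # Heavy dependency imported only when the task actually runs.
        from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
        return DataprocHook
''',
    "_core.py": '''
class DataprocSubmitJobOperator:  # placeholder for the remaining operators
    pass
''',
}

root = Path(tempfile.mkdtemp())
pkg_dir = root / "lazy_dataproc_demo"
pkg_dir.mkdir()
for filename, source in FILES.items():
    (pkg_dir / filename).write_text(source)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

dataproc = importlib.import_module("lazy_dataproc_demo")

batch_cls = dataproc.DataprocCreateBatchOperator
core_loaded_after_batch = "lazy_dataproc_demo._core" in sys.modules
core_cls = dataproc.DataprocSubmitJobOperator
core_loaded_after_core = "lazy_dataproc_demo._core" in sys.modules

print(batch_cls.__module__, core_loaded_after_batch)  # lazy_dataproc_demo._batch False
print(core_cls.__module__, core_loaded_after_core)    # lazy_dataproc_demo._core True
```

Caching the resolved name in `globals()` means later lookups go straight to the module dict. Because `execute()` is never called here, the `DataprocHook` import inside it never runs; that deferral is what keeps DAG parsing cheap.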

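The test change follows from how local imports resolve names. Because `._batch` imports its dependencies inside methods, those names never become attributes of `._batch`, so a mock has to target the module the name is imported *from*. The hedged sketch below uses in-memory stand-in modules; `fake_dataproc_hooks` and `fake_dataproc_batch` are hypothetical and are not the actual values of `DATAPROC_BATCH_HOOK_PATH` or `DATAPROC_BATCH_TO_DICT_PATH`.

```python
import sys
import types
from unittest import mock


def _make_hooks_module():
    # Hypothetical stand-in for the provider's hooks module.
    module = types.ModuleType("fake_dataproc_hooks")

    class DataprocHook:  # placeholder for the real, import-heavy hook
        def create_batch(self):
            return "real call"

    module.DataprocHook = DataprocHook
    return module


sys.modules["fake_dataproc_hooks"] = _make_hooks_module()


class DataprocCreateBatchOperator:  # placeholder operator body
    def execute(self, context):
        # Local import: the name is looked up on the hooks module at call time.
        from fake_dataproc_hooks import DataprocHook

        return DataprocHook().create_batch()


# Hypothetical stand-in for the `_batch` submodule that defines the operator.
batch_module = types.ModuleType("fake_dataproc_batch")
batch_module.DataprocCreateBatchOperator = DataprocCreateBatchOperator
sys.modules["fake_dataproc_batch"] = batch_module

op = DataprocCreateBatchOperator()

# Patch where the name is imported *from*: the local import picks up the mock.
with mock.patch("fake_dataproc_hooks.DataprocHook") as hook_cls:
    hook_cls.return_value.create_batch.return_value = "mocked call"
    patched_result = op.execute(context={})
unpatched_result = op.execute(context={})

# Patching the operator's own module fails: DataprocHook was never bound there.
try:
    with mock.patch("fake_dataproc_batch.DataprocHook"):
        pass
    consumer_patch_failed = False
except AttributeError:
    consumer_patch_failed = True

print(patched_result, unpatched_result, consumer_patch_failed)  # mocked call real call True
```

The failing second patch is the reason the Batch tests cannot reuse a path rooted at the operator module and instead point at the hooks module and `google.cloud.dataproc_v1`.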