amoghrajesh commented on PR #67473: URL: https://github.com/apache/airflow/pull/67473#issuecomment-4666827014
Addressed all review comments. Here is a summary of what's in the final state:

**Hook (`spark_submit.py`)**

- `_is_yarn_cluster_mode` flag added (YARN + `deploy_mode=cluster`)
- `submit()` no longer blocks for RM API polling when `yarn_track_via_rm_api=True`. It now exits after capturing `_yarn_application_id`. Polling responsibility moved to the operator.
- `_track_yarn_application` renamed to `_start_yarn_application_status_tracking`, with state-change logging and a periodic heartbeat every 10 polls
- `query_yarn_application_status()` added: normalizes the YARN `(state, finalStatus)` tuple to a single string for the `ResumableJobMixin` interface
- `kill_yarn_application()` public wrapper removed. `on_kill()` in the operator calls `_kill_yarn_application()` directly so Kerberos auth is applied

**Operator (`spark_submit.py`)**

Four paths for YARN cluster mode in `execute()`:

1. `reconnect_on_retry=True` + `yarn_track_via_rm_api=True` -- full crash recovery via `execute_resumable()`
2. `reconnect_on_retry=True` + `yarn_track_via_rm_api=False` -- raises `ValueError` at startup (no way to resume without RM API)
3. `reconnect_on_retry=False` + `yarn_track_via_rm_api=True` -- submit + poll without task_state persistence
4. `reconnect_on_retry=False` + `yarn_track_via_rm_api=False` -- falls through to `hook.submit()`; spark-submit blocks with `waitAppCompletion=true` (unchanged legacy behavior)

`on_kill()` kills via REST API for YARN cluster mode, since spark-submit has already exited at that point.
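
For readers skimming the four `execute()` paths, the flag combinations can be sketched as a small decision function. This is an illustrative sketch only, not the code in the PR: `YarnClusterPath` and `select_yarn_cluster_path` are hypothetical names, and the error message is made up. Only the two flag names and the four outcomes come from the summary above.

```python
from enum import Enum


class YarnClusterPath(Enum):
    """Hypothetical labels for the execute() paths described in the comment."""

    RESUMABLE = "execute_resumable"        # path 1: full crash recovery
    SUBMIT_AND_POLL = "submit_and_poll"    # path 3: poll RM API, no task_state persistence
    LEGACY_BLOCKING = "legacy_blocking"    # path 4: spark-submit blocks until completion


def select_yarn_cluster_path(
    reconnect_on_retry: bool, yarn_track_via_rm_api: bool
) -> YarnClusterPath:
    """Map the two flags to one path (assumed logic, not the PR's actual code)."""
    if reconnect_on_retry and not yarn_track_via_rm_api:
        # Path 2: resuming needs the RM API, so this combination is rejected early.
        raise ValueError(
            "reconnect_on_retry=True requires yarn_track_via_rm_api=True"
        )
    if reconnect_on_retry:
        return YarnClusterPath.RESUMABLE
    if yarn_track_via_rm_api:
        return YarnClusterPath.SUBMIT_AND_POLL
    return YarnClusterPath.LEGACY_BLOCKING
```

Calling `select_yarn_cluster_path(False, False)` yields the legacy blocking path, which matches the "unchanged legacy behavior" note for path 4.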

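
The `(state, finalStatus)` normalization mentioned for `query_yarn_application_status()` could look roughly like the following. This is a sketch under assumptions: the function name `normalize_yarn_status` is hypothetical, and the exact mapping used in the PR is not shown in the comment. The idea assumed here is that YARN reports `finalStatus` as `UNDEFINED` until the application finishes, so the final status is only meaningful once `state` is `FINISHED`.

```python
def normalize_yarn_status(state: str, final_status: str) -> str:
    """Collapse YARN's (state, finalStatus) pair into one string.

    Assumed rule, not necessarily the PR's: a FINISHED application is
    reported by its finalStatus (e.g. SUCCEEDED or FAILED); any other
    state (ACCEPTED, RUNNING, FAILED, KILLED, ...) is reported as-is.
    """
    state = state.upper()
    final_status = final_status.upper()
    if state == "FINISHED":
        return final_status
    return state
```

With this rule, a caller polling for completion only has to compare a single string against a set of terminal values instead of inspecting two fields.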