amoghrajesh opened a new issue, #67168:
URL: https://github.com/apache/airflow/issues/67168

   ## Context
   
   PR #67118 adds sync resumable support to `SparkSubmitOperator` via `ResumableJobMixin`. During review, @ashb suggested also supporting a `deferrable=True` mode in the same operator.
   
   ## What this issue tracks
   
   Add a `deferrable: bool = False` parameter to `SparkSubmitOperator`:
   
   - `deferrable=False` (default): synchronous path using `ResumableJobMixin`. The worker slot stays occupied while polling, and on infrastructure failure the operator reconnects to the existing driver instead of resubmitting (implemented in #67118).
   - `deferrable=True`: submit the job, then `defer()` to `SparkDriverTrigger` so the worker slot is freed while polling. If `execute()` is called again (which only happens when a user clears the task), resubmit fresh. No reconnect is needed, because crashes are handled by the persisted Trigger row.
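A minimal sketch of how `execute()` might branch on `deferrable`, written against hypothetical stand-ins so it runs without Airflow: `TaskDeferred` here only mimics the shape of Airflow's exception, `task_state` is a plain dict, and `_submit()` / `_poll_until_done()` replace the real spark-submit and polling logic. In a real operator the deferrable branch would call `self.defer(trigger=SparkDriverTrigger(...), method_name="on_driver_finished")`; the trigger's constructor arguments are not specified in this issue.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


class TaskDeferred(Exception):
    """Hypothetical stand-in for Airflow's TaskDeferred (simplified shape)."""

    def __init__(self, trigger: dict[str, Any], method_name: str) -> None:
        super().__init__(method_name)
        self.trigger = trigger
        self.method_name = method_name


@dataclass
class SparkSubmitOperatorSketch:
    """Hypothetical operator showing only how `deferrable` splits execute()."""

    deferrable: bool = False
    task_state: dict[str, str] = field(default_factory=dict)  # stand-in for task_state
    submitted: list[str] = field(default_factory=list)  # records every fresh submission

    def _submit(self) -> str:
        # Stand-in for spark-submit; returns a fake driver id.
        driver_id = f"driver-{len(self.submitted) + 1}"
        self.submitted.append(driver_id)
        return driver_id

    def _poll_until_done(self, driver_id: str) -> str:
        # Stand-in for the blocking poll loop that holds the worker slot.
        return "FINISHED"

    def execute(self) -> str:
        if self.deferrable:
            # Deferrable path: always submit fresh, since a second execute()
            # only happens when a user clears the task.
            driver_id = self._submit()
            self.task_state["spark_job_id"] = driver_id
            # A real operator would call self.defer(trigger=..., method_name=...).
            raise TaskDeferred(
                trigger={"classpath": "SparkDriverTrigger", "driver_id": driver_id},
                method_name="on_driver_finished",
            )
        # Sync path: reuse a recorded driver id (reconnect) before submitting.
        driver_id = self.task_state.get("spark_job_id")
        if driver_id is None:
            driver_id = self._submit()
            self.task_state["spark_job_id"] = driver_id
        return self._poll_until_done(driver_id)

    def on_driver_finished(self, event: dict[str, str]) -> str:
        # Resumed after the trigger fires; fail unless the driver finished cleanly.
        if event["state"] != "FINISHED":
            raise RuntimeError(f"Spark driver {event['driver_id']} ended in {event['state']}")
        return event["driver_id"]
```

Running the sync path twice reuses `driver-1` (the reconnect case), while switching the same instance to `deferrable=True` submits `driver-2` and writes it under the same `spark_job_id` key.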
   
   ## What's needed
   
   1. `SparkDriverTrigger`: polls the relevant REST API asynchronously until the driver reaches a terminal state
      - Standalone: `GET http://master:6066/v1/submissions/status/{driver_id}` via `aiohttp`
      - YARN: `GET http://rm:8088/ws/v1/cluster/apps/{app_id}` (once YARN sets `_should_track_driver_status=True`)
      - K8s: the Kubernetes pod phase API (once K8s sets `_should_track_driver_status=True`)
   2. A `deferrable` parameter on the operator, plus an `on_driver_finished()` callback that runs when the trigger fires
   3. Tests
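A sketch of the polling core that `SparkDriverTrigger.run()` could wrap, covering only the Standalone endpoint listed above. Assumptions: `poll_driver`, `make_standalone_fetcher`, and `standalone_state` are hypothetical helpers; the terminal-state set is assumed to match what `SparkSubmitHook` already waits on for standalone driver tracking; and the fetcher is injected so the loop can be tested without a cluster. In a real trigger, this loop would sit inside the async `run()` generator and `yield TriggerEvent({...})` with the final state, which Airflow then passes as `event` to `on_driver_finished()`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Assumed terminal standalone driver states (mirroring SparkSubmitHook's tracking loop).
STANDALONE_TERMINAL = frozenset({"FINISHED", "UNKNOWN", "KILLED", "FAILED", "ERROR"})

Fetcher = Callable[[], Awaitable[dict]]


def standalone_state(payload: dict[str, Any]) -> str:
    # The standalone REST status response carries the state in "driverState".
    return payload["driverState"]


def make_standalone_fetcher(master: str, driver_id: str) -> Fetcher:
    """Build a coroutine function that GETs the standalone submission status."""
    url = f"http://{master}:6066/v1/submissions/status/{driver_id}"

    async def fetch() -> dict:
        import aiohttp  # imported lazily so the rest of the sketch runs without it

        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                resp.raise_for_status()
                # content_type=None skips aiohttp's strict Content-Type check.
                return await resp.json(content_type=None)

    return fetch


async def poll_driver(
    fetch: Fetcher,
    extract_state: Callable[[dict], str],
    terminal: frozenset,
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> str:
    """Poll until extract_state(payload) is terminal; return that state."""
    polls = 0
    while True:
        state = extract_state(await fetch())
        if state in terminal:
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"driver still {state} after {polls} polls")
        # Non-blocking sleep: other triggers on the event loop keep running.
        await asyncio.sleep(interval)
```

YARN and K8s would plug in as different `extract_state` / `terminal` pairs (for example `app.state` from the ResourceManager response, or the pod phase). YARN likely also needs `finalStatus` to tell success from failure once `state` is `FINISHED`, so a plain state set is probably not enough there.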
   
   ## Relationship to #67118
   
   Both modes store `spark_job_id` in `task_state`, so a user can switch from sync to deferrable without any state migration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.