shahar1 commented on code in PR #67118:
URL: https://github.com/apache/airflow/pull/67118#discussion_r3288908028


##########
providers/apache/spark/docs/operators.rst:
##########
@@ -181,3 +181,24 @@ Reference
 """""""""
 
 For further information, look at `Apache Spark submitting applications <https://spark.apache.org/docs/latest/submitting-applications.html>`_.
+
+Cluster mode crash recovery (Spark standalone)
+"""""""""""""""""""""""""""""""""""""""""""""""
+
+When running in Spark standalone cluster mode (``--deploy-mode cluster``), the Spark driver runs
+independently on the cluster. If the Airflow worker dies while the Spark job is running, the driver keeps running but
+Airflow loses track of it and the behaviour to submit a brand new job would be wasting
+the compute already done.
+
+Now, the ``SparkSubmitOperator`` solves this by persisting the driver ID to ``task_state`` immediately after
+submission. On retry, it reads the ID back and reconnects to the already-running driver instead of
+resubmitting.
+
+This is the **synchronous path** — the worker holds a slot for the duration of polling. This is
+intentional for teams that prefer sync operators for log observability, org constraints, or
+because a Triggerer is not available. It is not a replacement for deferrable operators; the two
+approaches are complementary.

Review Comment:
   ```suggestion
   This is the **synchronous path** — the worker holds a slot for the duration of polling. This is
   a crash-safety net for teams running sync operators for log observability, org constraints, or
   because a Triggerer is not available. If a Triggerer is available, deferrable
   operators are the better choice for long-running tasks.
   ```
   
   The original "complementary" implies parity that doesn't exist - deferrable wins on resource cost whenever it's an option. Worth updating the docs to say so explicitly.
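
For readers following the thread, the persist-then-reconnect behaviour the new docs section describes can be sketched roughly as below. This is only an illustration and not the operator's actual code: the state store is modelled as a plain dict standing in for ``task_state``, and the key name `spark_driver_id`, the helper `submit_or_reconnect`, and the `submit` callable are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, MutableMapping

# Hypothetical key name; the real operator may store the ID differently.
DRIVER_ID_KEY = "spark_driver_id"


def submit_or_reconnect(
    state: MutableMapping[str, str],
    submit: Callable[[], str],
) -> tuple[str, bool]:
    """Return (driver_id, submitted_now).

    If a driver ID was persisted by an earlier attempt, reuse it instead of
    submitting a new job. Otherwise submit and persist the ID right away so
    that a retry after a worker crash can find it.
    """
    driver_id = state.get(DRIVER_ID_KEY)
    if driver_id is not None:
        # Retry path: an earlier attempt already submitted the job.
        return driver_id, False

    # First attempt: submit, then persist before any polling starts.
    driver_id = submit()
    state[DRIVER_ID_KEY] = driver_id
    return driver_id, True
```

In either case the caller would then poll the returned driver ID synchronously, which is why the worker slot stays occupied for the duration of the job.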


