mridulm commented on code in PR #40286:
URL: https://github.com/apache/spark/pull/40286#discussion_r1125790378


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2479,4 +2479,14 @@ package object config {
       .version("3.4.0")
       .booleanConf
       .createWithDefault(false)
+
+  private[spark] val STAGE_MAX_ATTEMPTS =
+    ConfigBuilder("spark.stage.maxAttempts")
+      .doc("The max attempts for a stage, the spark job will be aborted if any 
of its stages is " +
+        "resubmitted multiple times beyond the limitation. The value should be 
no less " +
+        "than `spark.stage.maxConsecutiveAttempts` which defines the max 
attempts for " +
+        "fetch failures.")
+      .version("3.5.0")
+      .intConf
+      .createWithDefault(16)

Review Comment:
   Since this is a behavior change, let us make this an optional parameter and preserve the current behavior when it is not configured (or default it to `Int.MaxValue`).
   We can change this to a more restrictive default value in a future release.
   
   Given cascading stage retries, and deployments where decommissioning is not applicable (YARN, for example), particularly due to `INDETERMINATE` stages, keeping the current behavior by default minimizes application failures.
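   
   For illustration only, a minimal sketch of what that suggestion could look like with the `ConfigBuilder` API already used in this file; the config name and doc text are from the PR diff above, while the choice of `createOptional` here is just one of the two options mentioned:
   
   ```scala
   // Sketch: declare the entry without a hard default so existing behavior
   // (no per-stage attempt cap beyond spark.stage.maxConsecutiveAttempts)
   // is preserved unless the user explicitly sets a value.
   private[spark] val STAGE_MAX_ATTEMPTS =
     ConfigBuilder("spark.stage.maxAttempts")
       .doc("The max attempts for a stage, the spark job will be aborted if any of its stages is " +
         "resubmitted multiple times beyond the limitation. The value should be no less " +
         "than `spark.stage.maxConsecutiveAttempts` which defines the max attempts for " +
         "fetch failures.")
       .version("3.5.0")
       .intConf
       .createOptional
       // Alternative: .createWithDefault(Int.MaxValue) keeps a concrete default
       // while still behaving as "no limit" in practice, and a more restrictive
       // default can be introduced in a later release.
   ```
   
   On the reading side the scheduler could then treat the unset case as unlimited, e.g. `conf.get(config.STAGE_MAX_ATTEMPTS).getOrElse(Int.MaxValue)`; that read site is an assumption for illustration, not part of the quoted diff.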



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

