Github user erikerlandson commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21241#discussion_r186855467

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -561,6 +561,13 @@ specific to Spark on Kubernetes.
         This is distinct from <code>spark.executor.cores</code>: it is only used and takes precedence over
         <code>spark.executor.cores</code> for specifying the executor pod cpu request if set. Task
         parallelism, e.g., number of tasks an executor can run concurrently is not affected by this.
       </tr>
    +<tr>
    +  <td><code>spark.kubernetes.executor.maxInitFailures</code></td>
    +  <td>10</td>
    +  <td>
    +    Maximum number of times executors are allowed to fail with an Init:Error state before failing the
    +    application. Note that Init:Error failures should not be caused by Spark itself because Spark does
    +    not attach init-containers to pods. Init-containers can be attached by the cluster itself. Users
    +    should check with their cluster administrator if these kinds of failures to start the executor pod
    +    occur frequently.
    +  </td>
    +</tr>
    --- End diff --

    As long as it's relatively easy to extend, generalizing on a case-by-case basis should be OK.
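For context, if the property proposed in this diff lands as written, it would be supplied like any other Spark conf, e.g. at submit time. This is a sketch only: `spark.kubernetes.executor.maxInitFailures` comes from this unmerged diff and may change or never ship, and the API server address and container image below are placeholders.

```shell
# Hypothetical usage of the setting proposed in this PR's diff.
# spark.kubernetes.executor.maxInitFailures is NOT a released Spark
# property at the time of this review; the name is taken from the diff.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.executor.maxInitFailures=20 \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Per the diff's documentation text, raising the default of 10 would only make sense when the cluster itself injects init-containers that fail transiently; Spark does not attach init-containers to executor pods on its own.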