Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21241#discussion_r186853242
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -561,6 +561,13 @@ specific to Spark on Kubernetes.
         This is distinct from <code>spark.executor.cores</code>: if set, it is used 
only for specifying the executor pod CPU request and takes precedence over 
<code>spark.executor.cores</code> for that purpose. Task parallelism, e.g., the number 
of tasks an executor can run concurrently, is not affected by this.
     </tr>
    +<tr>
    +  <td><code>spark.kubernetes.executor.maxInitFailures</code></td>
    +  <td>10</td>
    +  <td>
    +    Maximum number of times executors are allowed to fail with an Init:Error 
state before the application is failed. Note that Init:Error failures should not be 
caused by Spark itself, because Spark does not attach init-containers to executor 
pods; init-containers can, however, be attached by the cluster. Users should check 
with their cluster administrator if these kinds of failures to start executor pods 
occur frequently.
    --- End diff ---
    
    As per https://github.com/apache/spark/pull/21241#discussion_r186789848, I think 
it's important to define the full set of error types here. Do we have a comprehensive 
list we can follow?
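    
    For illustration, a minimal sketch of how the proposed setting would be applied, 
assuming the config key from this diff (it exists only in this PR; the master URL and 
container image below are placeholders):
    
    ```scala
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession
    
    // Sketch only: spark.kubernetes.executor.maxInitFailures comes from this PR's
    // diff and may not exist in any released Spark version. The master URL and
    // container image are placeholders.
    val conf = new SparkConf()
      .setMaster("k8s://https://kubernetes.default.svc")     // placeholder API server
      .setAppName("init-failure-limit-example")
      .set("spark.kubernetes.container.image", "registry.example.com/spark:latest") // placeholder image
      .set("spark.kubernetes.executor.maxInitFailures", "5") // proposed key; default per this diff is 10
    
    val spark = SparkSession.builder().config(conf).getOrCreate()
    ```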


---
