[ 
https://issues.apache.org/jira/browse/SPARK-38794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914918#comment-17914918
 ] 

zuotingbing commented on SPARK-38794:
-------------------------------------

Is there a solution to the bug?

> When ConfigMap creation fails, Spark driver starts but fails to start 
> executors
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-38794
>                 URL: https://issues.apache.org/jira/browse/SPARK-38794
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1
>            Reporter: Seth Horrigan
>            Priority: Major
>
> When running Spark in Kubernetes client mode, all executors assume that a 
> ConfigMap exactly matching `KubernetesClientUtils.configMapNameExecutor` 
> exists (see 
> [https://github.com/apache/spark/blob/02a055a42de5597cd42c1c0d4470f0e769571dc3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L98]).
> If the ConfigMap creation in 
> [https://github.com/apache/spark/blob/02a055a42de5597cd42c1c0d4470f0e769571dc3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L80]
> fails (because the Kubernetes control plane is temporarily unavailable or the 
> ServiceAccount lacks permission to create a ConfigMap), 
> the driver starts fully, then waits for executors that will never 
> start because of "MountVolume.SetUp failed for volume 
> \"spark-conf-volume-exec\" : configmap \"spark-exec-...-conf-map\" not found".
>  
> Either the driver start-up should fail with an error, or the driver should 
> retry the attempt to create the ConfigMap.
> --
> To reproduce the problem when the Kubernetes control plane is not 
> experiencing issues, start Spark in client mode, but do not give the 
> Kubernetes ServiceAccount permission to create ConfigMaps. The driver pod 
> starts successfully, but the executor pods terminate upon creation, and 
> the driver does not create new executors.
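
The two remedies proposed in the description (retry the ConfigMap creation, and fail the driver start-up if it still does not succeed) could be combined roughly as in the sketch below. This is only an illustration in Python, not Spark's Scala code and not a proposed patch: `create_with_retry`, `ConfigMapCreationError`, and the `create` callable are hypothetical names, and `create` merely stands in for whatever request the driver sends to create the executor ConfigMap.

```python
import time


class ConfigMapCreationError(RuntimeError):
    """Raised when the ConfigMap could not be created after all attempts."""


def create_with_retry(create, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call `create()` until it succeeds or `max_attempts` is exhausted.

    `create` is a hypothetical zero-argument callable standing in for the
    driver's ConfigMap creation request; it is expected to raise on failure.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except Exception as err:  # a real implementation would catch narrower errors
            last_error = err
            if attempt < max_attempts:
                # Exponential backoff: base, 2 * base, 4 * base, ...
                sleep(base_delay_s * 2 ** (attempt - 1))
    # Fail start-up with an error instead of waiting for executors
    # that can never mount the missing ConfigMap volume.
    raise ConfigMapCreationError(
        f"ConfigMap creation failed after {max_attempts} attempts"
    ) from last_error
```

Retrying would presumably help only with the transient case (control plane temporarily unavailable). With missing ServiceAccount permissions every attempt fails the same way, so the final error is what surfaces the problem at start-up.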



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

