[ https://issues.apache.org/jira/browse/SPARK-38794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914918#comment-17914918 ]
zuotingbing commented on SPARK-38794:
-------------------------------------

Is there a fix for this bug?

> When ConfigMap creation fails, Spark driver starts but fails to start executors
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-38794
>                 URL: https://issues.apache.org/jira/browse/SPARK-38794
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1, 3.1.2, 3.2.0, 3.2.1
>            Reporter: Seth Horrigan
>            Priority: Major
>
> When running Spark in Kubernetes client mode, all executors assume that a
> ConfigMap exactly matching `KubernetesClientUtils.configMapNameExecutor` will
> exist (see
> [https://github.com/apache/spark/blob/02a055a42de5597cd42c1c0d4470f0e769571dc3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L98]).
> If ConfigMap creation fails
> ([https://github.com/apache/spark/blob/02a055a42de5597cd42c1c0d4470f0e769571dc3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L80]),
> for example because the Kubernetes control plane is temporarily unavailable
> or because the ServiceAccount lacks permission to create ConfigMaps, the
> driver still starts fully and then waits for executors that will never start,
> each failing with "MountVolume.SetUp failed for volume
> \"spark-conf-volume-exec\" : configmap \"spark-exec-...-conf-map\" not found".
>
> Either driver start-up should fail with an error, or the driver should retry
> creating the ConfigMap.
> --
> To reproduce the problem while the Kubernetes control plane is healthy, start
> Spark in client mode without granting the Kubernetes ServiceAccount permission
> to create ConfigMaps. The driver pod will start successfully, but the executor
> pods will terminate upon creation, and the driver will not create new
> executors.
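The two remedies proposed in the report (fail driver start-up, or retry ConfigMap creation) can be combined: retry a bounded number of times, then abort. The following is only a minimal Scala sketch under stated assumptions, not Spark's code or a proposed patch: `ConfigMapCreation`, `createWithRetries`, and the `create` callback are hypothetical names, and the real creation call at the linked `KubernetesClusterSchedulerBackend.scala#L80` is stood in for by a plain function.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: retry a side-effecting creation step a bounded number
// of times, then fail loudly instead of letting the driver keep running
// without the ConfigMap that every executor pod mounts.
object ConfigMapCreation {

  /** Runs `create` up to `maxAttempts` times, sleeping `backoffMs * attempt`
    * milliseconds after each failed attempt except the last. Returns on the
    * first success; otherwise throws an IllegalStateException wrapping the
    * last failure.
    */
  def createWithRetries(maxAttempts: Int, backoffMs: Long)(create: () => Unit): Unit = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")

    @tailrec
    def loop(attempt: Int): Unit = Try(create()) match {
      case Success(_) => ()
      case Failure(_) if attempt < maxAttempts =>
        Thread.sleep(backoffMs * attempt) // linear backoff before retrying
        loop(attempt + 1)
      case Failure(e) =>
        // Give up: surfacing the error here is the "driver start-up should
        // fail" option from the report, applied after the retries are spent.
        throw new IllegalStateException(
          s"Executor ConfigMap creation failed after $maxAttempts attempt(s); " +
            "executors would not be able to mount spark-conf-volume-exec",
          e)
    }

    loop(1)
  }
}
```

A transient control-plane outage would then be absorbed by the retries, while a persistent failure such as a missing RBAC permission would stop the driver with a clear error instead of leaving it waiting for executors that can never start.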
-- This message was sent by Atlassian Jira (v8.20.10#820010)