Seth Horrigan created SPARK-38794:
-------------------------------------

             Summary: When ConfigMap creation fails, Spark driver starts but fails to start executors
                 Key: SPARK-38794
                 URL: https://issues.apache.org/jira/browse/SPARK-38794
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1
            Reporter: Seth Horrigan
When running Spark in Kubernetes client mode, every executor assumes that a ConfigMap whose name exactly matches `KubernetesClientUtils.configMapNameExecutor` exists (see [https://github.com/apache/spark/blob/02a055a42de5597cd42c1c0d4470f0e769571dc3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L98]).

If creation of that ConfigMap fails ([https://github.com/apache/spark/blob/02a055a42de5597cd42c1c0d4470f0e769571dc3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L80]), for example because the Kubernetes control plane is temporarily unavailable or the service account lacks permission to create ConfigMaps, the driver nonetheless starts fully and then waits for executors that can never come up, each failing with:

MountVolume.SetUp failed for volume "spark-conf-volume-exec" : configmap "spark-exec-...-conf-map" not found

Either driver start-up should fail with an error, or the driver should retry the attempt to create the ConfigMap.
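A minimal sketch of the retry option, for illustration only: it wraps the same fabric8 call the scheduler backend makes at the line linked above in bounded retries, and aborts start-up loudly if creation never succeeds. The object name `ConfigMapRetry`, the method `createWithRetry`, and the retry/backoff parameters are hypothetical, not existing Spark API; the `create(configMap)` call assumes the fabric8 5.x client Spark 3.2 ships with.

{code:scala}
import io.fabric8.kubernetes.api.model.ConfigMap
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException}

// Hypothetical helper: retry ConfigMap creation a few times with linear
// backoff, then fail driver start-up instead of starting without the
// ConfigMap the executors will try to mount.
object ConfigMapRetry {
  def createWithRetry(
      client: KubernetesClient,
      namespace: String,
      configMap: ConfigMap,
      maxAttempts: Int = 5,
      backoffMillis: Long = 1000L): ConfigMap = {
    var attempt = 1
    while (true) {
      try {
        // The same creation call KubernetesClusterSchedulerBackend makes
        // today, now wrapped in retries.
        return client.configMaps().inNamespace(namespace).create(configMap)
      } catch {
        case _: KubernetesClientException if attempt < maxAttempts =>
          // Possibly transient (e.g. control plane briefly unavailable):
          // back off and try again.
          Thread.sleep(backoffMillis * attempt)
          attempt += 1
        case e: KubernetesClientException =>
          // Out of attempts (e.g. RBAC forbids ConfigMap creation):
          // surface the failure instead of waiting on executors forever.
          throw new IllegalStateException(
            s"Failed to create executor ConfigMap after $maxAttempts attempts", e)
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
{code}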