Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21067#discussion_r194559094

    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala ---
    @@ -67,12 +68,19 @@ private[spark] class BasicExecutorFeatureStep(
       }

       private val executorLimitCores = kubernetesConf.get(KUBERNETES_EXECUTOR_LIMIT_CORES)

    -  override def configurePod(pod: SparkPod): SparkPod = {
    -    val name = s"$executorPodNamePrefix-exec-${kubernetesConf.roleSpecificConf.executorId}"
    +  // If the driver pod is killed, the new driver pod will try to
    +  // create new executors with the same name, but it will fail
    +  // and hangs indefinitely because a terminating executors blocks
    +  // the creation of the new ones, so to avoid that apply salt
    +  private val executorNameSalt = Random.alphanumeric.take(4).mkString("").toLowerCase
    --- End diff --

    > I think it is only generated when a new Spark Application is submitted. We need a random number which is regenerated for every new driver Pod, not for every new Application. Also, it is too long, because Kubernetes only allows pod names of at most 64 characters.

    The application ID is generated when the JVM launches - see `SchedulerBackend.scala`. Note that this application ID isn't populated by spark-submit itself.
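For context, a minimal standalone sketch of the salting scheme the diff proposes: a short random lowercase alphanumeric suffix mixed into each executor pod name so that a restarted driver pod does not collide with still-terminating executors of the previous driver. The `ExecutorNameSalt` object and the `executorPodName` helper are hypothetical names for illustration; only the `Random.alphanumeric.take(4).mkString.toLowerCase` expression comes from the diff itself.

```scala
import scala.util.Random

object ExecutorNameSalt {
  // Generate a short lowercase alphanumeric salt, as in the diff above.
  // Regenerated once per driver JVM, so each new driver pod gets fresh
  // executor names even if the application ID is unchanged.
  def newSalt(length: Int = 4): String =
    Random.alphanumeric.take(length).mkString.toLowerCase

  // Hypothetical helper: build an executor pod name from a prefix,
  // the per-driver salt, and the executor id.
  def executorPodName(prefix: String, salt: String, executorId: Long): String =
    s"$prefix-exec-$salt-$executorId"
}
```

A short salt keeps the overall name well under the Kubernetes name-length limit the reviewer mentions, while still making collisions between consecutive driver pods unlikely.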