[ https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503229#comment-17503229 ]
Thomas Graves commented on SPARK-38379: --------------------------------------- so the issue here is there is a race between when kubernetes call MountVolumesFeatureStep via adding it to the ExecutorPodsLifecycleManager which calls addSubscriber in ExecutorPodsSnapshotsStoreImpl. and when the spark.app.id is actually set in the Spark Context. Here spark context isn't set until after the scheduler backend has started. If its not set the only way to get the appId is to get the one generated in KubernetesClusterSchedulerBackend since that is wha tis ultimately used in spark context to set spark.app.id. I'll investigate a fix. > Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes > ------------------------------------------------------------------------------ > > Key: SPARK-38379 > URL: https://issues.apache.org/jira/browse/SPARK-38379 > Project: Spark > Issue Type: Bug > Components: Kubernetes > Affects Versions: 3.2.1 > Reporter: Thomas Graves > Priority: Major > > I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in > client mode. I'm using persistent local volumes to mount nvme under /data in > the executors and on startup the driver always throws the warning below. > using these options: > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false > > > {code:java} > 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. > java.util.NoSuchElementException: spark.app.id > at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.SparkConf.get(SparkConf.scala:245) > at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:81) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org