[ https://issues.apache.org/jira/browse/SPARK-34684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307392#comment-17307392 ]
Attila Zsolt Piros commented on SPARK-34684:
--------------------------------------------

> Have you tried creating a pod from a simple Linux image with the Hadoop client
> tools and accessing HDFS from the command line?

> Hadoop config could not be successfully serilized from driver pods to
> executor pods
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-34684
>                 URL: https://issues.apache.org/jira/browse/SPARK-34684
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1, 3.0.2
>            Reporter: Yue Peng
>            Priority: Major
>
> I have set HADOOP_CONF_DIR correctly, and I have verified that the Hadoop
> configs are stored in a ConfigMap and mounted to the driver. However, the
> SparkPi example job keeps failing because the executor does not know how to
> talk to HDFS. I strongly suspect a bug here: when I manually create a
> ConfigMap holding the Hadoop configs and mount it to the executor via the
> template file, the error goes away.
>
> Spark submit command:
> /opt/spark-3.0/bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --deploy-mode cluster \
>   --master k8s://https://10.***.18.96:6443 \
>   --num-executors 1 \
>   --conf spark.kubernetes.namespace=test \
>   --conf spark.kubernetes.container.image=**** \
>   --conf spark.kubernetes.driver.podTemplateFile=/opt/spark-3.0/conf/spark-driver.template \
>   --conf spark.kubernetes.executor.podTemplateFile=/opt/spark-3.0/conf/spark-executor.template \
>   --conf spark.kubernetes.file.upload.path=/opt/spark-3.0/examples/jars \
>   hdfs:///tmp/spark-examples_2.12-3.0.125067.jar 1000
>
>
> Error log:
>
> 21/03/10 06:59:58 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 608 ms (392 ms spent in bootstraps)
> 21/03/10 06:59:58 INFO SecurityManager: Changing view acls to: root
> 21/03/10 06:59:58 INFO SecurityManager: Changing modify acls to: root
> 21/03/10 06:59:58 INFO SecurityManager: Changing view acls groups to:
> 21/03/10 06:59:58 INFO SecurityManager: Changing modify acls groups to:
> 21/03/10 06:59:58 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 21/03/10 06:59:59 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 130 ms (104 ms spent in bootstraps)
> 21/03/10 06:59:59 INFO DiskBlockManager: Created local directory at /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/blockmgr-981cfb62-5b27-4d1a-8fbd-eddb466faf1d
> 21/03/10 06:59:59 INFO MemoryStore: MemoryStore started with capacity 2047.2 MiB
> 21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://coarsegrainedschedu...@org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078
> 21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
> 21/03/10 06:59:59 INFO ResourceUtils: Resources for spark.executor:
> 21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
> 21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
> 21/03/10 06:59:59 INFO Executor: Starting executor ID 1 on host 100.64.0.192
> 21/03/10 07:00:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37956.
> 21/03/10 07:00:00 INFO NettyBlockTransferService: Server created on 100.64.0.192:37956
> 21/03/10 07:00:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 21/03/10 07:00:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 0
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 1
> 21/03/10 07:00:01 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 21/03/10 07:00:01 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 21/03/10 07:00:01 INFO Executor: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587432
> 21/03/10 07:00:01 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 65 ms (58 ms spent in bootstraps)
> 21/03/10 07:00:01 INFO Utils: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar to /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/fetchFileTemp12837078937383244276.tmp
> 21/03/10 07:00:01 INFO Utils: Copying /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/-3355581251615359587432_cache to /opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar
> 21/03/10 07:00:01 INFO Executor: Adding file:/opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar to class loader
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>     at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>     at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>     at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>     at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2
> 21/03/10 07:00:01 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>     at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>     at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>     at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>     at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 3
> 21/03/10 07:00:01 INFO Executor: Running task 1.1 in stage 0.0 (TID 3)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>     at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>     at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>     at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>     at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 4
> 21/03/10 07:00:01 INFO Executor: Running task 0.1 in stage 0.0 (TID 4)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 3)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
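
For anyone reading the log above: the `Incomplete HDFS URI, no host` error is consistent with the executor resolving `hdfs:///tmp/...` without any `fs.defaultFS` to supply a NameNode host. The Python sketch below is only a simplified model of that lookup order as I understand it from Hadoop's `FileSystem.get`, not the actual Hadoop code. The function name `resolve_filesystem_uri` and the NameNode address `namenode.example:8020` are made up for illustration.

```python
from urllib.parse import urlparse

# Hadoop falls back to the local filesystem when no core-site.xml is loaded.
LOCAL_DEFAULT_FS = "file:///"


def resolve_filesystem_uri(path_uri, default_fs=LOCAL_DEFAULT_FS):
    """Return the filesystem URI that would serve path_uri (simplified model)."""
    uri = urlparse(path_uri)
    default = urlparse(default_fs)

    # No scheme and no authority: the default filesystem serves the path.
    if not uri.scheme and not uri.netloc:
        return default_fs

    if uri.scheme and not uri.netloc:
        # Scheme without a host: borrow the host from fs.defaultFS,
        # but only when the schemes match.
        if uri.scheme == default.scheme and default.netloc:
            return default_fs
        if uri.scheme == "hdfs":
            # Nothing supplies a NameNode host, as in the executor log.
            raise IOError("Incomplete HDFS URI, no host: " + path_uri)

    return uri.scheme + "://" + uri.netloc


if __name__ == "__main__":
    jar = "hdfs:///tmp/spark-examples_2.12-3.0.125067.jar"
    # Hypothetical NameNode address, standing in for a mounted core-site.xml.
    print(resolve_filesystem_uri(jar, "hdfs://namenode.example:8020"))
    try:
        resolve_filesystem_uri(jar)  # no Hadoop config visible
    except IOError as err:
        print(err)
```

If this model is right, it would explain why mounting the Hadoop ConfigMap into the executor makes the same `hdfs:///` path resolve, while the driver, which already has the config, never hits the error.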