Yue Peng created SPARK-34684:
--------------------------------

             Summary: Hadoop config could not be successfully serialized from driver pods to executor pods
                 Key: SPARK-34684
                 URL: https://issues.apache.org/jira/browse/SPARK-34684
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.0.1
            Reporter: Yue Peng


I have set HADOOP_CONF_DIR correctly, and I have verified that the Hadoop configs are stored in a ConfigMap and mounted into the driver pod. However, the SparkPi example job keeps failing because the executors do not know how to talk to HDFS. I strongly suspect a bug here: when I manually create a ConfigMap holding the Hadoop configs and mount it into the executors via the pod template file, the error goes away. A sketch of that workaround follows.
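For illustration only, the manual workaround would look roughly like the executor pod template below. The ConfigMap name, volume name, and mount path are hypothetical placeholders of mine; spark-kubernetes-executor is, to my knowledge, the default executor container name in Spark 3.x, and whether the HADOOP_CONF_DIR variable alone is honored depends on how the image builds the executor classpath.

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-kubernetes-executor    # default executor container name (assumption)
      env:
        - name: HADOOP_CONF_DIR          # point Hadoop at the mounted config
          value: /etc/hadoop/conf
      volumeMounts:
        - name: hadoop-conf-vol
          mountPath: /etc/hadoop/conf
  volumes:
    - name: hadoop-conf-vol
      configMap:
        name: hadoop-conf                # hand-created, e.g. kubectl create configmap hadoop-conf --from-file=$HADOOP_CONF_DIR

That this manual mount fixes the job is what points at the bug: the driver gets an equivalent Spark-generated ConfigMap automatically, but the executors apparently never do.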

Spark submit command:

/opt/spark-3.0/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster \
  --master k8s://https://10.***.18.96:6443 \
  --num-executors 1 \
  --conf spark.kubernetes.namespace=test \
  --conf spark.kubernetes.container.image=**** \
  --conf spark.kubernetes.driver.podTemplateFile=/opt/spark-3.0/conf/spark-driver.template \
  --conf spark.kubernetes.executor.podTemplateFile=/opt/spark-3.0/conf/spark-executor.template \
  --conf spark.kubernetes.file.upload.path=/opt/spark-3.0/examples/jars \
  hdfs:///tmp/spark-examples_2.12-3.0.125067.jar 1000
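One way to narrow this down (my suggestion, not something tried in this report): pass a fully qualified HDFS URI for the application jar, so the executors do not need fs.defaultFS at all. Here <namenode-host>:<port> is a placeholder for the real NameNode address, and "..." stands for the same flags as above:

/opt/spark-3.0/bin/spark-submit ... \
  hdfs://<namenode-host>:<port>/tmp/spark-examples_2.12-3.0.125067.jar 1000

If the job succeeds in that form, the executors can reach HDFS fine and the failure is purely the missing Hadoop configuration.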

Error log:

21/03/10 06:59:58 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 608 ms (392 ms spent in bootstraps)
21/03/10 06:59:58 INFO SecurityManager: Changing view acls to: root
21/03/10 06:59:58 INFO SecurityManager: Changing modify acls to: root
21/03/10 06:59:58 INFO SecurityManager: Changing view acls groups to:
21/03/10 06:59:58 INFO SecurityManager: Changing modify acls groups to:
21/03/10 06:59:58 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/03/10 06:59:59 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 130 ms (104 ms spent in bootstraps)
21/03/10 06:59:59 INFO DiskBlockManager: Created local directory at /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/blockmgr-981cfb62-5b27-4d1a-8fbd-eddb466faf1d
21/03/10 06:59:59 INFO MemoryStore: MemoryStore started with capacity 2047.2 MiB
21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://coarsegrainedschedu...@org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078
21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
21/03/10 06:59:59 INFO ResourceUtils: Resources for spark.executor:

21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
21/03/10 06:59:59 INFO Executor: Starting executor ID 1 on host 100.64.0.192
21/03/10 07:00:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37956.
21/03/10 07:00:00 INFO NettyBlockTransferService: Server created on 100.64.0.192:37956
21/03/10 07:00:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/03/10 07:00:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
21/03/10 07:00:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
21/03/10 07:00:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 100.64.0.192, 37956, None)
21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 0
21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 1
21/03/10 07:00:01 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
21/03/10 07:00:01 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/03/10 07:00:01 INFO Executor: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587432
21/03/10 07:00:01 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 65 ms (58 ms spent in bootstraps)
21/03/10 07:00:01 INFO Utils: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar to /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/fetchFileTemp12837078937383244276.tmp
21/03/10 07:00:01 INFO Utils: Copying /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/-3355581251615359587432_cache to /opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar
21/03/10 07:00:01 INFO Executor: Adding file:/opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar to class loader
21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
21/03/10 07:00:01 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
 at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
 at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
 at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
 at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
 at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
 at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
 at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
 at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
 at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.base/java.lang.Thread.run(Unknown Source)
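The "Incomplete HDFS URI, no host" IOException is what DistributedFileSystem.initialize throws when the URI carries no host and nothing supplies one: an authority-less hdfs:/// path is only resolvable when fs.defaultFS names the NameNode. On the driver that value comes from the files under HADOOP_CONF_DIR, typically a core-site.xml along these lines (the address below is a placeholder, not taken from this report):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>

An executor pod that never receives this file falls back to an effectively empty Hadoop configuration, so its very first hdfs:/// access dies in exactly the frame shown above.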

21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2
21/03/10 07:00:01 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
21/03/10 07:00:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
 [stack trace identical to TID 1 above]
21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 3
21/03/10 07:00:01 INFO Executor: Running task 1.1 in stage 0.0 (TID 3)
21/03/10 07:00:01 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
 [stack trace identical to TID 1 above]
21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 4
21/03/10 07:00:01 INFO Executor: Running task 0.1 in stage 0.0 (TID 4)
21/03/10 07:00:01 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 3)
java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar


