[jira] [Commented] (SPARK-34684) Hadoop config could not be successfully serialized from driver pods to executor pods

Attila Zsolt Piros (Jira) Tue, 23 Mar 2021 13:10:05 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-34684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307392#comment-17307392
 ]


Attila Zsolt Piros commented on SPARK-34684:
--------------------------------------------


> Have you tried creating a pod from a simple Linux image with the Hadoop 
> client tools and accessing HDFS from the command line?
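A related check that can be run inside any pod (driver, executor, or a plain test pod) is to confirm that a Hadoop client config is actually visible there. The following is only a minimal sketch, assuming the config lives in core-site.xml under HADOOP_CONF_DIR and that the property of interest is fs.defaultFS; the helper name read_default_fs is made up for illustration and is not part of Spark or Hadoop.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional


def read_default_fs(conf_dir: str) -> Optional[str]:
    """Return fs.defaultFS from <conf_dir>/core-site.xml, or None if missing.

    Hypothetical helper for illustration only; not a Spark or Hadoop API.
    """
    path = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(path):
        return None
    root = ET.parse(path).getroot()
    # Hadoop config files are a flat list of <property><name/><value/></property>.
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.defaultFS":
            value = prop.findtext("value", "").strip()
            return value or None
    return None


if __name__ == "__main__":
    conf_dir = os.environ.get("HADOOP_CONF_DIR", "")
    print("HADOOP_CONF_DIR =", conf_dir or "<unset>")
    print("fs.defaultFS    =", read_default_fs(conf_dir) if conf_dir else None)
```

If this prints a value in the driver pod but None in the executor pod, that would be consistent with the config not reaching the executors.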



> Hadoop config could not be successfully serialized from driver pods to 
> executor pods
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-34684
>                 URL: https://issues.apache.org/jira/browse/SPARK-34684
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1, 3.0.2
>            Reporter: Yue Peng
>            Priority: Major
>
> I have set HADOOP_CONF_DIR correctly, and I have verified that the Hadoop 
> configs are stored in a ConfigMap and mounted to the driver. However, the 
> SparkPi example job keeps failing because the executors do not know how to 
> talk to HDFS. I strongly suspect a bug here: when I manually create a 
> ConfigMap holding the Hadoop configs and mount it to the executor in the 
> template file, the error goes away.
>  
> Spark submit command:
> /opt/spark-3.0/bin/spark-submit \
>   --class org.apache.spark.examples.SparkPi \
>   --deploy-mode cluster \
>   --master k8s://https://10.***.18.96:6443 \
>   --num-executors 1 \
>   --conf spark.kubernetes.namespace=test \
>   --conf spark.kubernetes.container.image=**** \
>   --conf spark.kubernetes.driver.podTemplateFile=/opt/spark-3.0/conf/spark-driver.template \
>   --conf spark.kubernetes.executor.podTemplateFile=/opt/spark-3.0/conf/spark-executor.template \
>   --conf spark.kubernetes.file.upload.path=/opt/spark-3.0/examples/jars \
>   hdfs:///tmp/spark-examples_2.12-3.0.125067.jar 1000
>  
>  
> Error log:
>  
> 21/03/10 06:59:58 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 608 ms (392 ms spent in bootstraps)
> 21/03/10 06:59:58 INFO SecurityManager: Changing view acls to: root
> 21/03/10 06:59:58 INFO SecurityManager: Changing modify acls to: root
> 21/03/10 06:59:58 INFO SecurityManager: Changing view acls groups to:
> 21/03/10 06:59:58 INFO SecurityManager: Changing modify acls groups to:
> 21/03/10 06:59:58 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
> 21/03/10 06:59:59 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 130 ms (104 ms spent in bootstraps)
> 21/03/10 06:59:59 INFO DiskBlockManager: Created local directory at /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/blockmgr-981cfb62-5b27-4d1a-8fbd-eddb466faf1d
> 21/03/10 06:59:59 INFO MemoryStore: MemoryStore started with capacity 2047.2 MiB
> 21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://coarsegrainedschedu...@org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078
> 21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
> 21/03/10 06:59:59 INFO ResourceUtils: Resources for spark.executor:
> 21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
> 21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
> 21/03/10 06:59:59 INFO Executor: Starting executor ID 1 on host 100.64.0.192
> 21/03/10 07:00:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37956.
> 21/03/10 07:00:00 INFO NettyBlockTransferService: Server created on 100.64.0.192:37956
> 21/03/10 07:00:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 21/03/10 07:00:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 100.64.0.192, 37956, None)
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 0
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 1
> 21/03/10 07:00:01 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 21/03/10 07:00:01 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 21/03/10 07:00:01 INFO Executor: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587432
> 21/03/10 07:00:01 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 65 ms (58 ms spent in bootstraps)
> 21/03/10 07:00:01 INFO Utils: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar to /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/fetchFileTemp12837078937383244276.tmp
> 21/03/10 07:00:01 INFO Utils: Copying /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/-3355581251615359587432_cache to /opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar
> 21/03/10 07:00:01 INFO Executor: Adding file:/opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar to class loader
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2
> 21/03/10 07:00:01 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 3
> 21/03/10 07:00:01 INFO Executor: Running task 1.1 in stage 0.0 (TID 3)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
>  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
>  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
>  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
>  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
>  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
>  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
>  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
>  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
>  at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
>  at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
>  at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>  at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>  at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>  at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
>  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> 21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
> 21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 4
> 21/03/10 07:00:01 INFO Executor: Running task 0.1 in stage 0.0 (TID 4)
> 21/03/10 07:00:01 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 3)
> java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
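For readers puzzling over the "Incomplete HDFS URI, no host" message in the log above: the jar URI hdfs:///tmp/... carries a scheme and a path but no authority (host:port), so something has to supply the NameNode address, normally a default filesystem from the Hadoop config. The sketch below only illustrates that idea with Python's standard URL parser; it is not Hadoop's resolution code, and the address hdfs://namenode.example:8020 and the helper names has_host and qualify are placeholders made up for the example.

```python
from urllib.parse import urlparse, urlunparse


def has_host(uri: str) -> bool:
    """True if the URI names an authority (host[:port])."""
    return bool(urlparse(uri).netloc)


def qualify(uri: str, default_fs: str) -> str:
    """Fill in a missing authority from default_fs when the schemes match.

    Illustrative only; loosely mirrors the idea of resolving a host-less
    URI against a configured default filesystem.
    """
    parsed = urlparse(uri)
    if parsed.netloc:
        return uri  # already fully qualified
    base = urlparse(default_fs)
    if base.scheme != parsed.scheme or not base.netloc:
        raise ValueError(f"Incomplete URI, no host: {uri}")
    return urlunparse(parsed._replace(netloc=base.netloc))


jar = "hdfs:///tmp/spark-examples_2.12-3.0.125067.jar"
print(has_host(jar))  # False: scheme and path only
# Placeholder NameNode address, assumed for the example.
print(qualify(jar, "hdfs://namenode.example:8020"))
```

With no HDFS default available (for example a file:/// default), qualify raises, which is roughly the situation the executor log suggests.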



