[ https://issues.apache.org/jira/browse/SPARK-28992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28992:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                           3.1.0

> Support update dependencies from hdfs when task run on executor pods
> --------------------------------------------------------------------
>
>                 Key: SPARK-28992
>                 URL: https://issues.apache.org/jira/browse/SPARK-28992
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Priority: Major
>
> Here is a case:
> {code:java}
> bin/spark-submit --class com.github.ehiggs.spark.terasort.TeraSort \
>   hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1
> {code}
> Spark supports adding jars and the application jar from HDFS:
> [http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit]
> Take Spark on YARN for example: it creates a __spark_hadoop_conf__.xml file and uploads it to the Hadoop distributed cache, so the executor processes can use it to identify where their dependencies are located.
> But on K8s, I tried and failed to update dependencies.
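For illustration only (not Spark source code): a minimal sketch of why a logical HA nameservice such as `hz-cluster10` resolves only when the client carries the cluster's hdfs-site.xml. The config keys mirror Hadoop's `dfs.nameservices` / `dfs.ha.namenodes.*` / `dfs.namenode.rpc-address.*` settings; the NameNode host names and ID values below are hypothetical. On YARN the shipped __spark_hadoop_conf__.xml supplies this mapping; a bare executor pod has no such config, so the client falls through to a DNS lookup of the logical name and hits `UnknownHostException`.

```python
# Sketch of HDFS HA nameservice resolution, assuming Hadoop-style config keys.
from urllib.parse import urlparse


def resolve_namenode(uri, hadoop_conf):
    """Return the NameNode address an HDFS client would dial for `uri`.

    If the URI authority is declared as a nameservice in the config, map it
    to a configured RPC address; otherwise fall back to treating it as a
    plain DNS host, which is what fails on executor pods without the conf.
    """
    authority = urlparse(uri).netloc
    nameservices = hadoop_conf.get("dfs.nameservices", "").split(",")
    if authority in nameservices:
        nn_ids = hadoop_conf[f"dfs.ha.namenodes.{authority}"].split(",")
        key = f"dfs.namenode.rpc-address.{authority}.{nn_ids[0]}"
        return hadoop_conf[key]
    # No nameservice mapping: the real client would attempt a DNS lookup of
    # "hz-cluster10" and raise java.net.UnknownHostException.
    return f"DNS lookup of '{authority}' (fails for a purely logical name)"


# With the cluster's hdfs-site.xml contents available (hypothetical values):
conf_with_hdfs_site = {
    "dfs.nameservices": "hz-cluster10",
    "dfs.ha.namenodes.hz-cluster10": "nn1,nn2",
    "dfs.namenode.rpc-address.hz-cluster10.nn1": "nn1.example.com:8020",
    "dfs.namenode.rpc-address.hz-cluster10.nn2": "nn2.example.com:8020",
}
jar = ("hdfs://hz-cluster10/user/kyuubi/udf/"
       "spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar")
print(resolve_namenode(jar, conf_with_hdfs_site))  # nn1.example.com:8020
print(resolve_namenode(jar, {}))                   # falls through to DNS
```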
> {code:java}
> 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): java.lang.IllegalArgumentException: java.net.UnknownHostException: hz-cluster10
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
> 	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
> 	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> 	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> 	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
> 	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> 	at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881)
> 	at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
> 	at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
> 	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869)
> 	at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860)
> 	at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
> 	at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> 	at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> 	at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> 	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> 	at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> 	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
> 	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org