[ 
https://issues.apache.org/jira/browse/SPARK-28992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28992:
----------------------------------
        Parent:     (was: SPARK-33005)
    Issue Type: Improvement  (was: Sub-task)

> Support updating dependencies from HDFS when tasks run on executor pods
> -----------------------------------------------------------------------
>
>                 Key: SPARK-28992
>                 URL: https://issues.apache.org/jira/browse/SPARK-28992
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Kent Yao
>            Priority: Major
>
> Here is a case: 
> {code:bash}
> bin/spark-submit --class com.github.ehiggs.spark.terasort.TeraSort \
>   hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1
> {code}
> Spark supports adding jars, as well as the application jar itself, from HDFS; see 
> [http://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit]
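> For reference, a minimal sketch of that usage (the extra jar path below is illustrative, not taken from this report):
> {code:bash}
> # Both --jars dependencies and the application jar itself may be HDFS paths;
> # executors are expected to fetch them before running tasks.
> bin/spark-submit \
>   --master yarn \
>   --deploy-mode cluster \
>   --jars hdfs://hz-cluster10/user/kyuubi/udf/extra-udfs.jar \
>   --class com.github.ehiggs.spark.terasort.TeraSort \
>   hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1
> {code}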
> Take Spark on YARN for example: it creates a __spark_hadoop_conf__.xml file 
> and uploads it to the Hadoop distributed cache, so executor processes can use 
> it to identify where their dependencies are located.
> But on Kubernetes, I tried the same thing and failed to update dependencies:
> {code:java}
> 19/09/04 08:08:52 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (newAPIHadoopFile at TeraSort.scala:60) failed in 1.058 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 9, 100.66.0.75, executor 2): java.lang.IllegalArgumentException: java.net.UnknownHostException: hz-cluster10
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>     at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881)
>     at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
>     at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:869)
>     at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:860)
>     at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:792)
>     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:791)
>     at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:860)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
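> A possible mitigation until this is supported end to end, assuming a Spark version whose Kubernetes backend can mount a Hadoop configuration ConfigMap via spark.kubernetes.hadoop.configMapName (the ConfigMap name, conf directory, and API server address below are placeholders):
> {code:bash}
> # Sketch only: publish the HDFS client configuration (core-site.xml /
> # hdfs-site.xml, which define the hz-cluster10 nameservice) as a ConfigMap,
> # so driver and executor pods can resolve the nameservice when fetching
> # dependencies.
> kubectl create configmap hadoop-conf --from-file=/etc/hadoop/conf
> 
> bin/spark-submit \
>   --master k8s://https://<api-server>:6443 \
>   --deploy-mode cluster \
>   --conf spark.kubernetes.hadoop.configMapName=hadoop-conf \
>   --class com.github.ehiggs.spark.terasort.TeraSort \
>   hdfs://hz-cluster10/user/kyuubi/udf/spark-terasort-1.1-SNAPSHOT-jar-with-dependencies.jar \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g \
>   hdfs://hz-cluster10/user/kyuubi/terasort/1000g-out1
> {code}
> Setting HADOOP_CONF_DIR in the spark-submit environment should have a similar effect where supported; without some such mechanism, the executor pods have no hdfs-site.xml and cannot resolve the logical nameservice, hence the UnknownHostException above.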
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
