[ 
https://issues.apache.org/jira/browse/SPARK-8369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lei updated SPARK-8369:
----------------------------
    Description: 
Currently, in standalone cluster mode, Spark can take care of the app jar 
whether it is specified with a file:// or hdfs:// prefix. But the dependencies 
specified by --jars and --files do not support an hdfs:// prefix. 

For example:
spark-submit \
  --master spark://ip:port \
  --deploy-mode cluster \
  ... \
  --jars hdfs://path1/1.jar,hdfs://path2/2.jar \
  --files hdfs://path3/3.file,hdfs://path4/4.file \
  hdfs://path5/app.jar

Only app.jar will be downloaded to the driver and distributed to the executors; 
the others (1.jar, 2.jar, 3.file, 4.file) will not be. 
I think such a feature would be useful for users. 

----------------------------
To support this, I think we can treat the jars and files like the app jar in 
DriverRunner: download them and replace the remote addresses with local 
addresses, so the DriverWrapper does not need to be aware of the change.  

The problem is that replacing these addresses is harder than replacing the 
location of the app jar, because we only have a placeholder for the app jar 
("<<USER_JAR>>"), not for the other dependencies. We may need to do some 
string matching to achieve it. 
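
A minimal sketch of the idea (not the actual DriverRunner code; the helper 
downloadToLocal and the way the driver arguments are scanned are assumptions 
made for illustration):

import java.net.URI
import java.nio.file.{Path, Paths}

object RemoteDepsSketch {
  // Hypothetical helper: fetch a remote file into the driver's work dir and
  // return the local path. In DriverRunner this would go through the Hadoop
  // FileSystem API, as the existing app-jar download does; here it only
  // computes the target path.
  def downloadToLocal(remote: URI, workDir: Path): Path =
    workDir.resolve(Paths.get(remote.getPath).getFileName)

  // Rewrite any hdfs:// address in the driver command arguments to point at
  // the downloaded local copy, so DriverWrapper only ever sees local paths.
  def rewriteRemoteArgs(args: Seq[String], workDir: Path): Seq[String] =
    args.map { arg =>
      if (arg.startsWith("hdfs://")) downloadToLocal(URI.create(arg), workDir).toString
      else arg
    }
}

The string matching mentioned above corresponds to the startsWith check here: 
unlike the app jar, there is no "<<USER_JAR>>"-style placeholder to substitute 
for --jars and --files.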

  was:
Currently, in standalone cluster mode, Spark can take care of the app jar 
whether it is specified with a file:// or hdfs:// prefix. But the dependencies 
specified by --jars and --files do not support an hdfs:// prefix. 

For example:
spark-submit \
  ... \
  --jars hdfs://path1/1.jar,hdfs://path2/2.jar \
  --files hdfs://path3/3.file,hdfs://path4/4.file \
  hdfs://path5/app.jar

Only app.jar will be downloaded to the driver and distributed to the executors; 
the others (1.jar, 2.jar, 3.file, 4.file) will not be. 
I think such a feature would be useful for users. 

----------------------------
To support this, I think we can treat the jars and files like the app jar in 
DriverRunner: download them and replace the remote addresses with local 
addresses, so the DriverWrapper does not need to be aware of the change.  

The problem is that replacing these addresses is harder than replacing the 
location of the app jar, because we only have a placeholder for the app jar 
("<<USER_JAR>>"), not for the other dependencies. We may need to do some 
string matching to achieve it. 


> Support dependency jar and files on HDFS in standalone cluster mode
> -------------------------------------------------------------------
>
>                 Key: SPARK-8369
>                 URL: https://issues.apache.org/jira/browse/SPARK-8369
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Dong Lei
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
