[ 
https://issues.apache.org/jira/browse/SPARK-21714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-21714:
-----------------------------------
    Fix Version/s: 2.2.1

> SparkSubmit in Yarn Client mode downloads remote files and then reuploads 
> them again
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21714
>                 URL: https://issues.apache.org/jira/browse/SPARK-21714
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.2.0
>            Reporter: Thomas Graves
>            Assignee: Saisai Shao
>            Priority: Critical
>             Fix For: 2.2.1, 2.3.0
>
>
> SPARK-10643 added the ability for spark-submit to download remote file in 
> client mode.
> However in yarn mode this introduced a bug where it downloads them for the 
> client but then yarn client just reuploads them to HDFS and uses them again. 
> This should not happen when the remote file is HDFS.  This is wasting 
> resources and its defeating the  distributed cache because if the original 
> object was public it would have been shared by many users. By us downloading 
> and reuploading, it becomes private.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to