[ https://issues.apache.org/jira/browse/SPARK-21714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127199#comment-16127199 ]
Saisai Shao commented on SPARK-21714:
-------------------------------------

Let me take a crack at this if no one is working on it.

> SparkSubmit in Yarn Client mode downloads remote files and then reuploads them again
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-21714
>                 URL: https://issues.apache.org/jira/browse/SPARK-21714
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.2.0
>            Reporter: Thomas Graves
>            Priority: Critical
>
> SPARK-10643 added the ability for spark-submit to download remote files in
> client mode.
> However, in YARN mode this introduced a bug: spark-submit downloads the files
> for the client, but the YARN client then just reuploads them to HDFS and uses
> those copies.
> This should not happen when the remote file is already on HDFS. It wastes
> resources and it defeats the distributed cache, because if the original
> object was public it would have been shared by many users. By downloading
> and reuploading it, we make it private.
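
The behaviour the report asks for can be illustrated with a rough sketch. This is not Spark's code: the function name `should_download_locally`, the `YARN_READABLE_SCHEMES` set, and the rule that only client mode triggers a download are assumptions made for illustration, based only on the description above.

```python
from urllib.parse import urlparse

# Hypothetical: schemes the YARN distributed cache could localize directly,
# so a client-side download followed by a reupload would be redundant.
YARN_READABLE_SCHEMES = {"hdfs"}


def should_download_locally(uri: str, cluster_manager: str, deploy_mode: str) -> bool:
    """Decide whether the submitting client should fetch a remote resource.

    Illustrative only; the names and rules here are assumptions, not Spark's API.
    """
    # A bare path has no scheme; treat it as a local file.
    scheme = urlparse(uri).scheme or "file"

    if scheme in ("file", "local"):
        return False  # already on the local filesystem, nothing to fetch

    if deploy_mode != "client":
        return False  # assumption: only client mode downloads remote files

    if cluster_manager == "yarn" and scheme in YARN_READABLE_SCHEMES:
        # Leave the file where it is so YARN can reference the original
        # (possibly public, shared) object instead of a private reupload.
        return False

    return True


if __name__ == "__main__":
    for uri in ("hdfs://nn/libs/app.jar", "https://example.com/app.jar"):
        print(uri, should_download_locally(uri, "yarn", "client"))
```

Under these assumptions, an `hdfs://` resource submitted to YARN in client mode is passed through untouched, while an `https://` resource is still downloaded first.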