
Apache Spark reassigned SPARK-16787:
------------------------------------

    Assignee: Apache Spark  (was: Josh Rosen)

> SparkContext.addFile() should not fail if called twice with the same file
> -------------------------------------------------------------------------
>
>                 Key: SPARK-16787
>                 URL: https://issues.apache.org/jira/browse/SPARK-16787
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Apache Spark
>
> The behavior of SparkContext.addFile() changed slightly with the Netty-RPC-based 
> file server, which was introduced in Spark 1.6 (where it was disabled by default) 
> and became the default (and only) file server in Spark 2.0.0.
>
> Prior to 2.0, calling SparkContext.addFile() twice with the same path would 
> succeed and cause future tasks to receive an updated copy of the file. This 
> behavior was never explicitly documented, but Spark has behaved this way since 
> very early 1.x versions (some of the relevant lines in 
> Executor.updateDependencies() have existed since 2012).
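>
> For illustration, a minimal reproduction in local mode (the file path and app 
> name here are hypothetical examples, not from the Spark source):
> {code:scala}
> import java.io.File
> import java.nio.charset.StandardCharsets
> import java.nio.file.Files
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(
>   new SparkConf().setMaster("local[2]").setAppName("addfile-repro"))
>
> val file = new File("/tmp/settings.txt")  // hypothetical path
> Files.write(file.toPath, "v1".getBytes(StandardCharsets.UTF_8))
> sc.addFile(file.getAbsolutePath)   // first registration: succeeds
>
> Files.write(file.toPath, "v2".getBytes(StandardCharsets.UTF_8))
> sc.addFile(file.getAbsolutePath)   // pre-2.0 default: succeeds, and future
>                                    // tasks see the updated copy; in 2.0 (Netty
>                                    // file server) this second call fails with
>                                    // an IllegalArgumentException from require
> {code}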
>
> In 2.0 (or 1.6 with the Netty file server enabled), the second addFile() call 
> will fail with a requirement error because NettyStreamManager tries to guard 
> against duplicate file registration.
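>
> The guard in question is roughly of this shape (a paraphrase, assuming the 
> registered files live in a ConcurrentHashMap keyed by file name; not an exact 
> quote of the Spark source):
> {code:scala}
> // putIfAbsent returns null only for the first registration, so a second
> // addFile() with the same file name trips the require and throws
> // IllegalArgumentException.
> require(files.putIfAbsent(file.getName(), file) == null,
>   s"File ${file.getAbsolutePath} already registered.")
> {code}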
>
> I believe this change in behavior was unintentional and propose removing the 
> {{require}} check so that Spark 2.0 matches 1.x's default behavior.
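>
> Concretely, the proposal amounts to replacing the {{require}} with an 
> unconditional put, so that re-registering a file overwrites the previous entry 
> (a sketch under the same assumption about the file map, not the actual patch):
> {code:scala}
> // Last addFile() wins, restoring the pre-2.0 default behavior.
> files.put(file.getName(), file)
> {code}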
>
> This problem also affects addJar() in a more subtle way: the fileServer.addJar() 
> call also fails with an exception, but that exception is logged and ignored by 
> code added in 2014 to tolerate errors caused by missing Spark examples JARs when 
> running in YARN cluster mode (AFAIK).
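>
> In other words (the jar path is a hypothetical example):
> {code:scala}
> sc.addJar("/tmp/my-lib.jar")  // succeeds
> sc.addJar("/tmp/my-lib.jar")  // fileServer.addJar() throws, but the exception
>                               // is only logged, so the failure is silent
> {code}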


