[ https://issues.apache.org/jira/browse/SPARK-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-16787:
------------------------------------

    Assignee: Apache Spark  (was: Josh Rosen)

> SparkContext.addFile() should not fail if called twice with the same file
> -------------------------------------------------------------------------
>
>                 Key: SPARK-16787
>                 URL: https://issues.apache.org/jira/browse/SPARK-16787
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Josh Rosen
>            Assignee: Apache Spark
>
> The behavior of SparkContext.addFile() changed slightly with the introduction of the Netty-RPC-based file server, which was added in Spark 1.6 (where it was disabled by default) and became the default / only file server in Spark 2.0.0.
> Prior to 2.0, calling SparkContext.addFile() twice with the same path would succeed and would cause future tasks to receive an updated copy of the file. This behavior was never explicitly documented, but Spark has behaved this way since very early 1.x versions (some of the relevant lines in Executor.updateDependencies() have existed since 2012).
> In 2.0 (or in 1.6 with the Netty file server enabled), the second addFile() call fails with a requirement error because NettyStreamManager tries to guard against duplicate file registration.
> I believe that this behavior change was unintentional and propose removing the {{require}} check so that Spark 2.0 matches 1.x's default behavior.
> This problem also affects addJar() in a more subtle way: the fileServer.addJar() call also fails with an exception, but that exception is logged and ignored due to code added in 2014 to ignore errors caused by missing Spark examples JARs when running in YARN cluster mode (AFAIK).
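For illustration, a minimal sketch of the failure mode described above. The file path and application name are placeholders, not from the issue; the point is only that the second addFile() call with the same path throws (Scala's {{require}} fails with an IllegalArgumentException) under the Netty file server:

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

object AddFileTwiceRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("addfile-repro"))

    val path = "/tmp/data.txt"  // placeholder path; any existing local file works

    sc.addFile(path)  // first registration succeeds in all versions
    // Prior to 2.0 (with the default file server), this second call also
    // succeeded and future tasks received an updated copy of the file. With
    // the Netty file server (the default in 2.0.0), the duplicate-registration
    // guard in NettyStreamManager makes this call fail instead.
    sc.addFile(path)

    sc.stop()
  }
}
{code}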
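The guard in question is, roughly, a duplicate-registration check of the following shape. This is a paraphrased sketch, not the actual NettyStreamManager source; it contrasts the strict check that the issue proposes removing with the lenient, pre-2.0-style "last registration wins" behavior:

{code:scala}
import java.io.File
import java.util.concurrent.ConcurrentHashMap

// Paraphrased sketch only: `files` maps an exposed file name to its local path.
class FileRegistry {
  private val files = new ConcurrentHashMap[String, File]()

  // 2.0 behavior: reject any second registration under the same name.
  // putIfAbsent returns null only if the name was previously unregistered,
  // so a repeated addFile() trips the require().
  def addFileStrict(file: File): Unit = {
    require(files.putIfAbsent(file.getName, file) == null,
      s"File ${file.getName} was already registered.")
  }

  // Pre-2.0-style behavior: overwrite the existing entry, so a later
  // addFile() call simply refreshes the copy that gets served to tasks.
  def addFileLenient(file: File): Unit = {
    files.put(file.getName, file)
  }
}
{code}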