[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936313#comment-14936313 ]
Saisai Shao commented on SPARK-10858:
-------------------------------------

Hi [~tgraves], I tested again on Mac and Linux (CentOS), and the behavior seems to differ between the two.

On Mac, if we use {{--jars my.jar#renamed.jar}}, the path is resolved to the URI {{file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar}}. If we use {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}}, the path is resolved to the URI {{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}.

This resolution is done by Utils#resolveURI:

{code}
def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      return uri
    }
  } catch {
    case e: URISyntaxException => // not a valid URI: fall back to treating it as a local path
  }
  new File(path).getAbsoluteFile().toURI()
}
{code}

If no scheme is specified, this code converts the file path into a URI via {{File#toURI}}. Note that {{toURI}} escapes "#" to "%23".

On CentOS, both {{--jars my.jar#renamed.jar}} and {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}} are resolved to {{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}} by Utils#resolveURI, so "#" is evidently not escaped. As a result, both ways of using --jars failed in my test on CentOS.

Digging into the Hadoop code, RawLocalFileSystem#pathToFile does the following:

{code}
public File pathToFile(Path path) {
  checkPath(path);
  if (!path.isAbsolute()) {
    path = new Path(getWorkingDirectory(), path);
  }
  // URI#getPath() excludes anything after an unescaped '#' (the fragment)
  return new File(path.toUri().getPath());
}
{code}

Using {{URI#getPath}} to obtain the file path behaves differently depending on whether "#" has been escaped to "%23": an unescaped "#" starts a fragment, so the part after it is treated as the fragment rather than as part of the path. That is why, on Mac, the form without a scheme succeeds, whereas on CentOS both forms fail. However, if we instead use {{--jars my.jar%23renamed.jar}} or {{--jars file:///path/to/my.jar%23renamed.jar}}, it succeeds on CentOS.
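To make the "#" versus "%23" difference concrete, here is a minimal sketch using only {{java.net.URI}} and {{java.io.File}}. It is a hypothetical Java port of the Scala Utils#resolveURI quoted above (the class name {{HashRenameDemo}} and the {{/tmp}} paths are made up for illustration; the files need not exist). It calls {{URI#getPath}} directly on the resolved URI as a stand-in for the {{path.toUri().getPath()}} step in RawLocalFileSystem#pathToFile, assuming the Hadoop Path keeps the same URI. On a standard JDK, {{File#toURI}} percent-encodes "#", which corresponds to the escaped case seen on Mac; this sketch does not try to reproduce the CentOS behavior.

{code}
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class HashRenameDemo {

    // Hypothetical Java port of the Scala Utils#resolveURI quoted above.
    static URI resolveURI(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                // Explicit scheme: returned as parsed, so "#..." becomes the fragment.
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a parseable URI: fall back to treating it as a local file path.
        }
        // No scheme: File#toURI percent-encodes '#' in the path as "%23".
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "file:///tmp/my.jar#renamed.jar",   // scheme given, '#' unescaped
            "/tmp/my.jar#renamed.jar",          // no scheme
            "file:///tmp/my.jar%23renamed.jar"  // manually escaped workaround
        };
        for (String input : inputs) {
            URI uri = resolveURI(input);
            System.out.println(input);
            System.out.println("  uri      = " + uri);
            // Stand-in for pathToFile's path.toUri().getPath() (assumption, see lead-in).
            System.out.println("  getPath  = " + uri.getPath());
            System.out.println("  fragment = " + uri.getFragment());
        }
    }
}
{code}

Running it shows that with an explicit scheme and an unescaped "#", {{getPath}} stops at {{my.jar}} and {{renamed.jar}} lands in the fragment, while both the scheme-less input (escaped by {{File#toURI}}) and the hand-escaped "%23" input keep the whole name in the decoded path and have no fragment.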
> YARN: archives/jar/files rename with # doesn't work unless scheme given
> -----------------------------------------------------------------------
>
>                 Key: SPARK-10858
>                 URL: https://issues.apache.org/jira/browse/SPARK-10858
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Priority: Minor
>
> The YARN distributed cache feature with --jars, --archives, --files where you
> can rename the file/archive using a # symbol only works if you explicitly
> include the scheme in the path:
> works:
> --jars file:///home/foo/my.jar#renamed.jar
> doesn't work:
> --jars /home/foo/my.jar#renamed.jar
> Exception in thread "main" java.io.FileNotFoundException: File file:/home/foo/my.jar#renamed.jar does not exist
>     at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
>     at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240)
>     at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329)
>     at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393)
>     at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)