[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948941#comment-14948941 ] Thomas Graves commented on SPARK-10858: ---

Sorry for the delay on this; I didn't have time to look at it. I'm not sure why you are seeing different behavior from me, but thanks for looking into this. I agree the problem is in the parsing in resolveURI, where it calls new File(path).getAbsoluteFile().toURI().

When I don't specify file://:
15/10/08 15:35:56 INFO Client: local uri is: file:/homes/tgraves/R_install/R_install.tgz%23R_installation
With file://:
15/10/08 15:38:27 INFO Client: local uri is: file:/homes/tgraves/R_install/R_install.tgz#R_installation

Without the scheme, the # comes back encoded as %23. When I originally wrote that code, it wasn't calling Utils.resolveURIs. Looking at the actual code for File.toURI(), you will see it doesn't parse the fragment out before calling the URI constructor, which I think is the problem:
{code}
public URI toURI() {
    try {
        File f = getAbsoluteFile();
        String sp = slashify(f.getPath(), f.isDirectory());
        if (sp.startsWith("//"))
            sp = "//" + sp;
        return new URI("file", null, sp, null);
    } catch (URISyntaxException x) {
        throw new Error(x);         // Can't happen
    }
}
{code}
It seems like a bad idea to call this when the string might already be in URI format: we are now going from a possible URI to a File and back to a URI. When we convert it to a File, it isn't expecting a URI that already has a fragment, so it treats the fragment as part of the path.
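To reproduce the %23 encoding outside Spark, here is a minimal sketch: a hypothetical standalone Java port of the Scala Utils.resolveURI discussed in this thread (the class name ResolveUriDemo is made up for illustration, and a Unix-like file system is assumed). It uses only java.net.URI and java.io.File.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical standalone port of the Scala Utils.resolveURI quoted in this thread,
// used only to reproduce the %23 encoding outside Spark.
class ResolveUriDemo {

    static URI resolveURI(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                // Explicit scheme: the string is used as a URI, fragment included.
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a parseable URI; fall through and treat it as a plain file path.
        }
        // No scheme: the whole string, "#renamed.jar" included, becomes the file name,
        // and File.toURI() percent-encodes that '#' as %23.
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        URI noScheme = resolveURI("/home/foo/my.jar#renamed.jar");
        URI withScheme = resolveURI("file:///home/foo/my.jar#renamed.jar");
        System.out.println(noScheme + "  fragment=" + noScheme.getFragment());
        System.out.println(withScheme + "  fragment=" + withScheme.getFragment());
    }
}
```

On a Unix-like system the first line should show file:/home/foo/my.jar%23renamed.jar with fragment=null, and the second file:///home/foo/my.jar#renamed.jar with fragment=renamed.jar, the same split as the two log lines above.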
> YARN: archives/jar/files rename with # doesn't work unless scheme given > --- > > Key: SPARK-10858 > URL: https://issues.apache.org/jira/browse/SPARK-10858 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Priority: Minor > > The YARN distributed cache feature with --jars, --archives, --files where you > can rename the file/archive using a # symbol only works if you explicitly > include the scheme in the path: > works: > --jars file:///home/foo/my.jar#renamed.jar > doesn't work: > --jars /home/foo/my.jar#renamed.jar > Exception in thread "main" java.io.FileNotFoundException: File > file:/home/foo/my.jar#renamed.jar does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289) > at > org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240) > at > org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329) > at > org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393) > at > org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949372#comment-14949372 ] Apache Spark commented on SPARK-10858: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/9035 > YARN: archives/jar/files rename with # doesn't work unless scheme given > --- > > Key: SPARK-10858 > URL: https://issues.apache.org/jira/browse/SPARK-10858 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949334#comment-14949334 ] Thomas Graves commented on SPARK-10858: ---

So now I'm really confused about why this is backwards for you. The code in resolveURI clearly skips the code causing the problem if the path already has a scheme:
{code}
def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      return uri
    }
{code}
Unless perhaps your shell environment is doing something, which might be why it works if you escape it with \#. I'm using bash.
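One possible shape of a fix, sketched under assumptions: parse the argument as a URI first and, when there is no scheme but there is a fragment, resolve only the path part to an absolute file URI and then reattach the fragment. This is a hypothetical Java sketch (FragmentPreservingResolve is an invented name), not necessarily what the pull request linked in this thread changes.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical variant of resolveURI that keeps a '#' fragment when no scheme is given.
class FragmentPreservingResolve {

    static URI resolveURI(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                return uri; // unchanged: explicit scheme, use the URI as written
            }
            if (uri.getFragment() != null) {
                // Resolve only the path portion (before '#') against the working directory...
                URI absolute = new File(uri.getPath()).getAbsoluteFile().toURI();
                // ...then reattach the fragment (the YARN link name) as a real fragment.
                return new URI(absolute.getScheme(), absolute.getHost(),
                        absolute.getPath(), uri.getFragment());
            }
        } catch (URISyntaxException e) {
            // Not parseable as a URI: treat it as a plain local path below.
        }
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        URI u = resolveURI("/home/foo/my.jar#renamed.jar");
        System.out.println(u + "  path=" + u.getPath() + "  fragment=" + u.getFragment());
    }
}
```

For /home/foo/my.jar#renamed.jar this should yield file:/home/foo/my.jar#renamed.jar, with path /home/foo/my.jar and fragment renamed.jar, the same shape as the explicit file:// case. The trade-off is that a file whose name really contains # could then no longer be passed as a bare path; it would need an explicit scheme with the # percent-encoded, such as file:///home/foo/my%23file.jar.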
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934724#comment-14934724 ] Saisai Shao commented on SPARK-10858: -

[~tgraves], I assume you're using yarn-cluster mode to submit the application, because yarn-client mode handles {{--jars}} differently and the stack trace would be different. Interestingly, I get the opposite result from yours: it succeeds without the scheme but fails with the scheme added. Here are my two commands:

success:
{code}
./bin/spark-submit --master yarn-cluster --queue a --jars /Users/sshao/projects/apache-spark/my.jar\#renamed.jar --class org.apache.spark.examples.SparkPi examples/target/scala-2.10/spark-examples-1.6.0-SNAPSHOT-hadoop2.6.0.jar 10
{code}
failed:
{code}
./bin/spark-submit --master yarn-cluster --queue a --jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar --class org.apache.spark.examples.SparkPi examples/target/scala-2.10/spark-examples-1.6.0-SNAPSHOT-hadoop2.6.0.jar 10
{code}
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935129#comment-14935129 ] Thomas Graves commented on SPARK-10858: ---

Yes, it's a bad thing, as users don't know when # works. It should work in all cases: file://, hdfs://. The default is file://, so I would expect it to act the same whether or not you specify the scheme. [~jerryshao], what was the error you got in the failed case? You escaped the # in the first case but not in the second. What platform are you on? My assumption was that the difference between the explicit-scheme and no-scheme cases came from our use of getFragment(): perhaps the URI wasn't being fully parsed without the scheme.
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935134#comment-14935134 ] Thomas Graves commented on SPARK-10858: ---

Note that the name after the # is the name we give the file on the YARN side, which is what the executor actually sees, so it doesn't matter where the file originates from (file:// or hdfs://).
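To illustrate that point, a hypothetical helper linkName (not a Spark or YARN API) could derive the name an entry is exposed under: the fragment if there is one, otherwise the last path segment. The scheme never enters into it, but a # that has already been encoded as %23 is no longer a fragment and ends up inside the name. The host nn:8020 below is a placeholder.

```java
import java.net.URI;

// Hypothetical helper: the name a distributed-cache entry would be exposed under.
// Uses the URI fragment if present, otherwise the last path segment.
// The scheme (file://, hdfs://) plays no part in the result.
class LinkNameDemo {

    static String linkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath(); // decoded path, so %23 becomes '#'
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(linkName(URI.create("file:///home/foo/my.jar#renamed.jar")));
        System.out.println(linkName(URI.create("hdfs://nn:8020/user/foo/my.jar#renamed.jar")));
        System.out.println(linkName(URI.create("hdfs://nn:8020/user/foo/my.jar")));
        // The encoded form produced without a scheme: no fragment survives.
        System.out.println(linkName(URI.create("file:/home/foo/my.jar%23renamed.jar")));
    }
}
```

This should print renamed.jar, renamed.jar, my.jar and finally my.jar#renamed.jar, the last being the symptom reported in this issue.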
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936317#comment-14936317 ] Saisai Shao commented on SPARK-10858: -

So basically I think the question is whether we need to treat a name like "xx#xx" as legal; if so, we need to fix this behavior. The other interesting thing is that I'm not sure why your results differ from mine.
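On whether a name like "xx#xx" can be legal: java.net.URI can already represent a file whose name contains # alongside a separate link name, as long as the # in the path is percent-encoded. A minimal sketch, using a made-up file name my#file.jar and a hypothetical helper fileUri:

```java
import java.net.URI;
import java.net.URISyntaxException;

class LiteralHashDemo {

    // Hypothetical helper: builds a file URI whose path may contain '#'.
    // The multi-argument URI constructor percent-encodes the '#' in the path,
    // while the fragment argument becomes a real "#fragment".
    static URI fileUri(String absolutePath, String linkName) throws URISyntaxException {
        return new URI("file", null, absolutePath, linkName);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI built = fileUri("/home/foo/my#file.jar", "renamed.jar");
        System.out.println(built);               // file:/home/foo/my%23file.jar#renamed.jar
        System.out.println(built.getPath());     // /home/foo/my#file.jar
        System.out.println(built.getFragment()); // renamed.jar

        // Parsing the equivalent string form gives the same path and fragment.
        URI parsed = URI.create("file:///home/foo/my%23file.jar#renamed.jar");
        System.out.println(parsed.getPath() + " " + parsed.getFragment());
    }
}
```

The last line should print /home/foo/my#file.jar renamed.jar, so the two uses of # only collide when the path's # is left unencoded.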
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936313#comment-14936313 ] Saisai Shao commented on SPARK-10858: -

Hi [~tgraves], I tested again on Mac and Linux (CentOS), and the behavior seems to differ.

On Mac, if we use {{--jars my.jar#renamed.jar}}, the file path is resolved to the URI {{file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar}}; if we use {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}}, it is resolved to {{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}. This is done by Utils#resolveURI:
{code}
def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      return uri
    }
  } catch {
    case e: URISyntaxException =>
  }
  new File(path).getAbsoluteFile().toURI()
}
{code}
If no scheme is specified, this code transforms the file path into a URI, and note that "#" is translated into "%23" by `toURI`.

On CentOS, both {{--jars my.jar#renamed.jar}} and {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}} are resolved to {{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}} through Utils#resolveURI; obviously "#" is not escaped. So in my test, both ways of using --jars fail on CentOS.

After digging into the Hadoop code for RawLocalFileSystem#pathToFile:
{code}
public File pathToFile(Path path) {
  checkPath(path);
  if (!path.isAbsolute()) {
    path = new Path(getWorkingDirectory(), path);
  }
  return new File(path.toUri().getPath());
}
{code}
Using `URI.getPath` to get the file path leads to different behavior if "#" is not escaped to "%23": the part after "#" is treated as a fragment, not as part of the path. So on Mac the case without a scheme succeeds, whereas on CentOS both ways fail.

But if we instead use {{--jars my.jar%23renamed.jar}} or {{--jars file:///path/to/my.jar%23renamed.jar}}, it succeeds on CentOS.
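The getPath difference described above can be seen with java.net.URI alone. This minimal sketch uses plain java.net.URI as a stand-in for Hadoop's path.toUri() (the helper localFilePath is hypothetical, not a Hadoop API) to show what the last line of pathToFile would receive for each form.

```java
import java.net.URI;

class GetPathDemo {

    // Hypothetical stand-in for the final step of RawLocalFileSystem#pathToFile,
    // new File(path.toUri().getPath()): returns only the decoded path component.
    static String localFilePath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // '#' escaped as %23: decoded back to '#' and kept as part of the file name.
        System.out.println(localFilePath("file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar"));
        // Unescaped '#': it starts a fragment, so the path ends at "my.jar".
        System.out.println(localFilePath("file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar"));
    }
}
```

The first call should return /Users/sshao/projects/apache-spark/my.jar#renamed.jar and the second /Users/sshao/projects/apache-spark/my.jar, so which local file gets looked up depends entirely on whether the # was encoded earlier.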
[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936176#comment-14936176 ] Saisai Shao commented on SPARK-10858: - The error I got in the failed case is the same as the one you mentioned above. I'm running on Mac OS with Hadoop 2.6.0.