[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-10-08 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948941#comment-14948941
 ] 

Thomas Graves commented on SPARK-10858:
---

Sorry for the delay on this; I didn't have time to look at it.  I'm not sure 
why you are seeing different behavior from me.

Thanks for looking into this.  I agree it's in the parsing in resolveURI, 
where it calls new File(path).getAbsoluteFile().toURI().

When I don't specify file://:
15/10/08 15:35:56 INFO Client: local uri is: 
file:/homes/tgraves/R_install/R_install.tgz%23R_installation

With file://:
15/10/08 15:38:27 INFO Client: local uri is: 
file:/homes/tgraves/R_install/R_install.tgz#R_installation

The first one is coming back with the # encoded as %23 rather than left as a 
literal #.  When I originally wrote that code it wasn't calling 
Utils.resolveURIs.

Looking at the actual code for File.toURI(), you will see it doesn't really 
parse the fragment out before calling URI(), which I think is the problem:

    public URI toURI() {
        try {
            File f = getAbsoluteFile();
            String sp = slashify(f.getPath(), f.isDirectory());
            if (sp.startsWith("//"))
                sp = "//" + sp;
            return new URI("file", null, sp, null);
        } catch (URISyntaxException x) {
            throw new Error(x); // Can't happen
        }
    }


It seems like a bad idea to call this given that the string might already be 
in URI format.  We are going from a possible URI to a File and back to a URI.  
When we turn it into a File, it isn't expecting a URI that already has a 
fragment, so it treats the fragment as part of the path.
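
To make the %23 behavior concrete, here is a minimal Java sketch (not Spark 
code; the class and method names are made up for illustration). It calls the 
same four-argument URI constructor that File.toURI() uses above, passing the 
whole string as the path, and compares that with parsing the same text as a 
URI string.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: FragmentQuoting and its methods are hypothetical names.
class FragmentQuoting {
    // Mirrors the constructor call in File.toURI(): the whole string is the
    // path and the fragment argument is null, so '#' gets percent-encoded.
    static URI asPathOnly(String path) {
        try {
            return new URI("file", null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Parses the text as a URI string, so the first '#' starts the fragment.
    static URI asParsedUri(String text) {
        return URI.create(text);
    }

    public static void main(String[] args) {
        URI pathOnly = asPathOnly("/home/foo/my.jar#renamed.jar");
        System.out.println(pathOnly);               // file:/home/foo/my.jar%23renamed.jar
        System.out.println(pathOnly.getFragment()); // null

        URI parsed = asParsedUri("file:///home/foo/my.jar#renamed.jar");
        System.out.println(parsed.getPath());       // /home/foo/my.jar
        System.out.println(parsed.getFragment());   // renamed.jar
    }
}
```

The point of the comparison is that the fragment is only recognized when the 
string is parsed as a URI; once it has been handed to File, the # is just 
another path character.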


> YARN: archives/jar/files rename with # doesn't work unless scheme given
> ---
>
> Key: SPARK-10858
> URL: https://issues.apache.org/jira/browse/SPARK-10858
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Minor
>
> The YARN distributed cache feature with --jars, --archives, --files where you 
> can rename the file/archive using a # symbol only works if you explicitly 
> include the scheme in the path:
> works:
> --jars file:///home/foo/my.jar#renamed.jar
> doesn't work:
> --jars /home/foo/my.jar#renamed.jar
> Exception in thread "main" java.io.FileNotFoundException: File 
> file:/home/foo/my.jar#renamed.jar does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
> at 
> org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240)
> at 
> org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329)
> at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393)
> at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392)
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-10-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949372#comment-14949372
 ] 

Apache Spark commented on SPARK-10858:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/9035

> YARN: archives/jar/files rename with # doesn't work unless scheme given
> ---
>
> Key: SPARK-10858
> URL: https://issues.apache.org/jira/browse/SPARK-10858
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor



[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-10-08 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949334#comment-14949334
 ] 

Thomas Graves commented on SPARK-10858:
---

So now I'm really confused about why this is backwards for you.  The code in 
resolveURI clearly skips the code causing the problem if the path already has 
a scheme on it:

def resolveURI(path: String): URI = {
  try {
    val uri = new URI(path)
    if (uri.getScheme() != null) {
      return uri
    }
  } catch {
    case e: URISyntaxException =>
  }
  new File(path).getAbsoluteFile().toURI()
}

Unless perhaps your shell environment is doing something, which may be why it 
works if you escape it with \#.  I'm using bash.




[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-09-29 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934724#comment-14934724
 ] 

Saisai Shao commented on SPARK-10858:
-

[~tgraves], I assume you're using yarn-cluster mode to submit the application, 
because yarn-client mode handles {{--jars}} differently and the stack trace 
would be different.

The interesting thing is that I get the opposite result from yours. I succeed 
without the scheme but fail with the scheme added. Here are my two commands:

success:

{code}
./bin/spark-submit --master yarn-cluster --queue a --jars 
/Users/sshao/projects/apache-spark/my.jar\#renamed.jar  --class 
org.apache.spark.examples.SparkPi 
examples/target/scala-2.10/spark-examples-1.6.0-SNAPSHOT-hadoop2.6.0.jar 10
{code}

failed:

{code}
./bin/spark-submit --master yarn-cluster --queue a --jars 
file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar  --class 
org.apache.spark.examples.SparkPi 
examples/target/scala-2.10/spark-examples-1.6.0-SNAPSHOT-hadoop2.6.0.jar 10
{code}







[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-09-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935129#comment-14935129
 ] 

Thomas Graves commented on SPARK-10858:
---

Yes, it's a bad thing, as users don't know when # works.  It should work in all 
cases, file:// and hdfs://. The default is file://, so I would expect it to act 
the same whether or not you specify the scheme.

[~jerryshao]  what was the error you got in the failed case?  You escaped the # 
in the first case and not the second. What platform are you on?

I was assuming it was failing when the scheme was not explicit because we are 
using getFragment() and perhaps it wasn't fully parsing the URI without the 
scheme.
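
One way the scheme-less case could be made to behave like the explicit file:// 
case is to parse the fragment off before going through File. The Java sketch 
below illustrates that idea under my own assumptions: resolveKeepingFragment 
is a hypothetical helper, not the code in Utils.resolveURI and not necessarily 
how this issue ends up being fixed.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: class and method names are hypothetical.
class FragmentAwareResolve {
    // Resolve a possibly scheme-less local path, keeping any "#name" suffix
    // as a URI fragment instead of letting File treat it as part of the path.
    static URI resolveKeepingFragment(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                return uri; // already a full URI, leave it untouched
            }
            if (uri.getFragment() != null) {
                // Only the path part goes through File; the fragment is re-attached.
                URI abs = new File(uri.getPath()).getAbsoluteFile().toURI();
                return new URI(abs.getScheme(), null, abs.getPath(), uri.getFragment());
            }
        } catch (URISyntaxException e) {
            // Not parseable as a URI (for example, it contains spaces):
            // fall through and treat it as a plain file path.
        }
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        URI resolved = resolveKeepingFragment("/home/foo/my.jar#renamed.jar");
        System.out.println(resolved.getScheme());   // file
        System.out.println(resolved.getFragment()); // renamed.jar
    }
}
```

With this shape, a plain path without a # still takes the original 
File-based route, so only inputs that parse with a fragment are treated 
differently.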




[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-09-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935134#comment-14935134
 ] 

Thomas Graves commented on SPARK-10858:
---

Note that the name after the # is the name we give the file on the YARN side, 
which is what the executors actually see, so it doesn't matter where the file 
originates from (file:// or hdfs://).
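
As a rough illustration of that point, the Java sketch below picks the name a 
resource would be exposed under from the URI fragment, falling back to the 
file name when there is no fragment. The helper linkName and the host name 
"nn" are assumptions made for the example; this is not the YARN client code.

```java
import java.io.File;
import java.net.URI;

// Illustrative only: LinkName and linkName are hypothetical names.
class LinkName {
    // The fragment, when present, is the name seen on the executor side;
    // otherwise the last path component is used.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        return fragment != null ? fragment : new File(uri.getPath()).getName();
    }

    public static void main(String[] args) {
        // Same result regardless of where the file originates from.
        System.out.println(linkName(URI.create("file:///home/foo/my.jar#renamed.jar")));     // renamed.jar
        System.out.println(linkName(URI.create("hdfs://nn:8020/libs/my.jar#renamed.jar"))); // renamed.jar
        System.out.println(linkName(URI.create("file:///home/foo/my.jar")));                 // my.jar
    }
}
```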




[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-09-29 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936317#comment-14936317
 ] 

Saisai Shao commented on SPARK-10858:
-

So basically I think the question is whether we need to treat a name like 
"xx#xx" as a legal name; if so, we need to fix this behavior.

Another interesting thing is that I'm not sure why your result is different 
from mine.




[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-09-29 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936313#comment-14936313
 ] 

Saisai Shao commented on SPARK-10858:
-

Hi [~tgraves], I tested again on Mac and Linux (CentOS), and the behavior 
seems to be different.

On Mac:

if we use {{--jars my.jar#renamed.jar}}

this file path will be resolved to URI 
{{file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar}}

if we use {{--jars 
file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}}

this file path will be resolved to URI 
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}

This is done by Utils#resolveURI

{code}
  def resolveURI(path: String): URI = {
    try {
      val uri = new URI(path)
      if (uri.getScheme() != null) {
        return uri
      }
    } catch {
      case e: URISyntaxException =>
    }
    new File(path).getAbsoluteFile().toURI()
  }
{code}

If the scheme is not specified, this code transforms the file path into a 
URI; note that "#" is translated into "%23" by this `toURI` call.
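
The two branches can be tried in isolation with a small Java transcription of 
the Scala method quoted above. This is only a sketch for experimenting with 
the JDK behavior: the class name ResolveUriDemo is made up, and the paths are 
the example paths from this comment.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: a Java transcription of the quoted Scala resolveURI.
class ResolveUriDemo {
    static URI resolveURI(String path) {
        try {
            URI uri = new URI(path);
            if (uri.getScheme() != null) {
                return uri; // scheme given: returned as parsed, fragment intact
            }
        } catch (URISyntaxException e) {
            // ignored, as in the quoted code
        }
        // no scheme: the whole string, '#' included, is treated as a file path
        return new File(path).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        URI noScheme = resolveURI("/Users/sshao/projects/apache-spark/my.jar#renamed.jar");
        System.out.println(noScheme.getRawPath());    // raw path contains %23
        System.out.println(noScheme.getFragment());   // null

        URI withScheme = resolveURI("file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar");
        System.out.println(withScheme.getFragment()); // renamed.jar
    }
}
```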

On CentOS:

both 

{{--jars my.jar#renamed.jar}} 

and 

{{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}} 

will be resolved to 
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}} through 
Utils#resolveURI; the "#" is not escaped.

So in my test, both of these ways of using --jars fail on CentOS.

After digging into the Hadoop code RawLocalFileSystem#pathToFile:

{code}
  public File pathToFile(Path path) {
    checkPath(path);
    if (!path.isAbsolute()) {
      path = new Path(getWorkingDirectory(), path);
    }
    return new File(path.toUri().getPath());
  }
{code}

Here, using `URI.getPath` to get the file path behaves differently depending 
on whether "#" is escaped to "%23": if it is not escaped, the part after "#" 
is treated as the fragment, not as part of the path. So on Mac, omitting the 
scheme succeeds, whereas on CentOS both ways fail.
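
The effect of `URI.getPath` on the two forms can be seen with plain 
java.net.URI. The sketch below does not involve Hadoop's Path or 
RawLocalFileSystem at all; localPath is a hypothetical helper that only 
mirrors the `toUri().getPath()` step of the method above.

```java
import java.net.URI;

// Illustrative only: GetPathDemo and localPath are hypothetical names.
class GetPathDemo {
    // Returns the decoded path component, as URI.getPath() does.
    static String localPath(String uriText) {
        return URI.create(uriText).getPath();
    }

    public static void main(String[] args) {
        // Unescaped '#': everything after it is the fragment, so it is dropped from the path.
        System.out.println(localPath("file:/Users/sshao/my.jar#renamed.jar"));   // /Users/sshao/my.jar
        // Escaped as %23: getPath() decodes it back into a literal '#' in the file name.
        System.out.println(localPath("file:/Users/sshao/my.jar%23renamed.jar")); // /Users/sshao/my.jar#renamed.jar
    }
}
```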

But if we instead use 

{{--jars my.jar%23renamed.jar}} 

or 

{{--jars file:///path/to/my.jar%23renamed.jar}},

it succeeds on CentOS.
















[jira] [Commented] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given

2015-09-29 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936176#comment-14936176
 ] 

Saisai Shao commented on SPARK-10858:
-

The error I got in the failed case is the same as the one you mentioned above. 
I'm running on Mac OS with Hadoop 2.6.0.


