[ 
https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936313#comment-14936313
 ] 

Saisai Shao commented on SPARK-10858:
-------------------------------------

Hi [~tgraves], I tested again on Mac and Linux (CentOS), and the behavior 
differs between the two.

On Mac:

If we use {{--jars my.jar#renamed.jar}}, the file path is resolved to the URI 
{{file:/Users/sshao/projects/apache-spark/my.jar%23renamed.jar}}.

If we use {{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}}, 
the file path is resolved to the URI 
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}}.

This is done by Utils#resolveURI:

{code}
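  import java.io.File
  import java.net.{URI, URISyntaxException}

  def resolveURI(path: String): URI = {
    try {
      val uri = new URI(path)
      // A path that already carries a scheme is returned as parsed.
      if (uri.getScheme() != null) {
        return uri
      }
    } catch {
      case e: URISyntaxException => // not a valid URI; fall through
    }
    // No scheme: build a file: URI from the absolute local path.
    new File(path).getAbsoluteFile().toURI()
  }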
{code}

When no scheme is specified, this code converts the file path into a URI 
through `File#toURI`. Note that `toURI` escapes "#" as "%23".
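
To make the escaping concrete, here is a minimal Java sketch of the two JDK calls that resolveURI relies on. The class name, method names and the {{/tmp/my.jar}} path are hypothetical and chosen only for illustration. It isolates the JDK behavior and does not try to reproduce the Mac/CentOS difference reported in this comment.

```java
import java.io.File;
import java.net.URI;

final class FileToUriDemo {
    // Mirrors the no-scheme branch: build the URI through java.io.File.
    static URI viaFile(String path) {
        return new File(path).getAbsoluteFile().toURI();
    }

    // Mirrors the scheme branch: parse the string as a URI directly.
    static URI viaParse(String uriString) {
        return URI.create(uriString);
    }

    public static void main(String[] args) {
        // Hypothetical path; the file does not need to exist.
        URI fromFile = viaFile("/tmp/my.jar#renamed.jar");
        URI parsed = viaParse("file:///tmp/my.jar#renamed.jar");

        // Raw (still-encoded) path and fragment of each URI.
        System.out.println(fromFile.getRawPath() + " fragment=" + fromFile.getFragment());
        System.out.println(parsed.getRawPath() + " fragment=" + parsed.getFragment());
    }
}
```

On a Unix-like system the first URI should carry the "#" escaped as "%23" in its path with no fragment, while the second should keep {{renamed.jar}} as a fragment.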

On CentOS, both 

{{--jars my.jar#renamed.jar}} 

and 

{{--jars file:///Users/sshao/projects/apache-spark/my.jar#renamed.jar}} 

are resolved to 
{{file:/Users/sshao/projects/apache-spark/my.jar#renamed.jar}} by 
Utils#resolveURI; the "#" is not escaped.

So in my test, both ways of using --jars fail on CentOS.

Digging into the Hadoop code, I found RawLocalFileSystem#pathToFile:

{code}
  public File pathToFile(Path path) {
    checkPath(path);
    if (!path.isAbsolute()) {
      path = new Path(getWorkingDirectory(), path);
    }
    // Only the path component of the URI is used to build the local File.
    return new File(path.toUri().getPath());
  }
{code}

Here the file path is obtained with `URI.getPath`, so the result depends on 
whether "#" was escaped to "%23": if it was not, everything after "#" is 
treated as the URI fragment rather than as part of the path. That is why on 
Mac the form without a scheme succeeds, whereas on CentOS both forms fail.
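
The effect of `URI.getPath` can be sketched in plain Java as follows. This uses {{java.net.URI}} directly rather than Hadoop's {{Path}}, and the class name, method name and {{/tmp/my.jar}} path are hypothetical, so treat it as an illustration of the JDK behavior only, not of the exact Hadoop code path.

```java
import java.io.File;
import java.net.URI;

final class GetPathDemo {
    // Same expression as the last line of pathToFile, applied to a plain URI.
    static File toLocalFile(URI uri) {
        return new File(uri.getPath());
    }

    public static void main(String[] args) {
        // Unescaped "#": the part after it is parsed as the fragment.
        URI unescaped = URI.create("file:/tmp/my.jar#renamed.jar");
        // Escaped "%23": it stays in the path and getPath() decodes it back to "#".
        URI escaped = URI.create("file:/tmp/my.jar%23renamed.jar");

        System.out.println(toLocalFile(unescaped).getName());
        System.out.println(toLocalFile(escaped).getName());
    }
}
```

With the unescaped form the resulting file name drops everything after "#", while the escaped form keeps the whole {{my.jar#renamed.jar}} string as the file name.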

But if we instead use 

{{--jars my.jar%23renamed.jar}} 

or 

{{--jars file:///path/to/my.jar%23renamed.jar}},

it succeeds on CentOS.

> YARN: archives/jar/files rename with # doesn't work unless scheme given
> -----------------------------------------------------------------------
>
>                 Key: SPARK-10858
>                 URL: https://issues.apache.org/jira/browse/SPARK-10858
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Priority: Minor
>
> The YARN distributed cache feature with --jars, --archives, --files where you 
> can rename the file/archive using a # symbol only works if you explicitly 
> include the scheme in the path:
> works:
> --jars file:///home/foo/my.jar#renamed.jar
> doesn't work:
> --jars /home/foo/my.jar#renamed.jar
> Exception in thread "main" java.io.FileNotFoundException: File 
> file:/home/foo/my.jar#renamed.jar does not exist
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
>         at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
>         at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
>         at 
> org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240)
>         at 
> org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329)
>         at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393)
>         at 
> org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392)
>         at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)


