[jira] [Commented] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly

Chris Trezzo (JIRA) Thu, 09 Feb 2017 15:26:23 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860382#comment-15860382
 ]


Chris Trezzo commented on MAPREDUCE-6846:
-----------------------------------------

bq. I was under the impression that if the wildcard mapped to only one file 
then we would not convey this as a wildcard through to the staging directory 
but instead remap it to the one entry that it globbed to (i.e.: as if the user 
had specified the one path directly rather than a glob to that one path).

True, once it is in the staging dir it will not look like a wildcard. That 
being said, there is a second part to the feature. I will attempt to explain my 
current understanding:

See {{JobResourceUploader#uploadLibJars}}:
{code:java}
  private void uploadLibJars(Configuration conf, Collection<String> libjars,
      Path submitJobDir, FsPermission mapredSysPerms, short submitReplication)
      throws IOException {
    Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
    if (!libjars.isEmpty()) {
      FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
      for (String tmpjars : libjars) {
        Path tmp = new Path(tmpjars);
        Path newPath =
            copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

        // Add each file to the classpath
        DistributedCache.addFileToClassPath(
            new Path(newPath.toUri().getPath()), conf, jtFs, !useWildcard);
      }

      if (useWildcard) {
        // Add the whole directory to the cache
        Path libJarsDirWildcard =
            jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));

        DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
      }
    }
  }
{code}
{{useWildcard}} is set by the {{mapreduce.client.libjars.wildcard}} config 
parameter. If this is set to true, then we add the files individually to the 
classpath (i.e. {{mapreduce.job.classpath.files}}), but then we glob them all 
together when adding them to the distributed cache (i.e. 
{{mapreduce.job.cache.files}}). At that point, we would loose the fragment name 
because the LocalResource objects submitted to YARN are created based off of 
those paths.

> Fragments specified for libjar paths are not handled correctly
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-6846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.3, 3.0.0-alpha2
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>            Priority: Minor
>
> If a user specifies a fragment for a libjars path via generic options parser, 
> the client crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt 
> does not exist
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
>       at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
>       at 
> org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
>       at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
>       at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
>       at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
>       at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>       at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>       at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>       at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>       at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>       at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>       at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>       at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>       at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {noformat}
> This is actually inconsistent with the behavior for files and archives. Here 
> is a table showing the current behavior for each type of path and resource:
> | || Qualified path (i.e. file://home/mapred/test.txt#frag.txt) || Absolute 
> path (i.e. /home/mapred/test.txt#frag.txt) || Relative path (i.e. 
> test.txt#frag.txt) ||
> || -libjars | FileNotFound | FileNotFound|FileNotFound|
> || -files | (/) | (/) | (/) |
> || -archives | (/) | (/) | (/) |



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

[jira] [Commented] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly

Reply via email to