[jira] [Comment Edited] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941283#comment-15941283 ]

Daniel Templeton edited comment on MAPREDUCE-6846 at 3/24/17 10:41 PM:
---

What I meant was:

{code}
Path newPath =
    copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

try {
  DistributedCache.addFileToClassPath(newPath, conf, jtFs, false);

  if (!foundFragment) {
    foundFragment = tmpURI.getFragment() != null;
  }

  libjarURIs.add(getPathURI(newPath, tmpURI.getFragment()));
}
{code}

but after taking a second look, I'm fine with it as is.

was (Author: templedf):
What I meant was:

{code}
Path newPath =
    copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

try {
  DistributedCache.addFileToClassPath(newPath), conf, jtFs, false);

  if (!foundFragment) {
    foundFragment = tmpURI.getFragment() != null;
  }

  libjarURIs.add(getPathURI(newPath, tmpURI.getFragment()));
}
{code}

but after taking a second look, I'm fine with it as is.

> Fragments specified for libjar paths are not handled correctly
> --
>
> Key: MAPREDUCE-6846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Priority: Minor
> Attachments: MAPREDUCE-6846-trunk.001.patch, MAPREDUCE-6846-trunk.002.patch
>
>
> If a user specifies a fragment for a libjars path via generic options parser, the client crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt does not exist
>   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
>   at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
>   at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
>   at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
>   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {noformat}
> This is a
[jira] [Comment Edited] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860382#comment-15860382 ]

Chris Trezzo edited comment on MAPREDUCE-6846 at 2/9/17 11:31 PM:
--

bq. I was under the impression that if the wildcard mapped to only one file then we would not convey this as a wildcard through to the staging directory but instead remap it to the one entry that it globbed to (i.e.: as if the user had specified the one path directly rather than a glob to that one path).

True, once it is in the staging dir it will not look like a wildcard. That being said, there is a second part to the feature. I will attempt to explain my current understanding:

See {{JobResourceUploader#uploadLibJars}}:
{code:java}
private void uploadLibJars(Configuration conf, Collection<String> libjars,
    Path submitJobDir, FsPermission mapredSysPerms, short submitReplication)
    throws IOException {
  Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
  if (!libjars.isEmpty()) {
    FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
    for (String tmpjars : libjars) {
      Path tmp = new Path(tmpjars);
      Path newPath =
          copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

      // Add each file to the classpath
      DistributedCache.addFileToClassPath(
          new Path(newPath.toUri().getPath()), conf, jtFs, !useWildcard);
    }

    if (useWildcard) {
      // Add the whole directory to the cache
      Path libJarsDirWildcard =
          jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));

      DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
    }
  }
}
{code}

{{useWildcard}} is set by the {{mapreduce.client.libjars.wildcard}} config parameter. If this is set to true, then we add the files individually to the classpath (i.e. {{mapreduce.job.classpath.files}}), but then we glob them all together when adding them to the distributed cache (i.e. {{mapreduce.job.cache.files}}). At that point, we would lose the fragment name because the LocalResource objects submitted to YARN are created based on those paths.

As a side note, this method also contains the original bug that motivated this jira. The bug is due to the {{uploadLibJars}} method creating a {{Path}} from {{tmpjars}} instead of a {{URI}}. The {{Path}} constructor does not support fragments, so we lose them at this point with or without wildcards.

was (Author: ctrezzo):
bq. I was under the impression that if the wildcard mapped to only one file then we would not convey this as a wildcard through to the staging directory but instead remap it to the one entry that it globbed to (i.e.: as if the user had specified the one path directly rather than a glob to that one path).

True, once it is in the staging dir it will not look like a wildcard. That being said, there is a second part to the feature. I will attempt to explain my current understanding:

See {{JobResourceUploader#uploadLibJars}}:
{code:java}
private void uploadLibJars(Configuration conf, Collection<String> libjars,
    Path submitJobDir, FsPermission mapredSysPerms, short submitReplication)
    throws IOException {
  Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
  if (!libjars.isEmpty()) {
    FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
    for (String tmpjars : libjars) {
      Path tmp = new Path(tmpjars);
      Path newPath =
          copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

      // Add each file to the classpath
      DistributedCache.addFileToClassPath(
          new Path(newPath.toUri().getPath()), conf, jtFs, !useWildcard);
    }

    if (useWildcard) {
      // Add the whole directory to the cache
      Path libJarsDirWildcard =
          jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));

      DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
    }
  }
}
{code}

{{useWildcard}} is set by the {{mapreduce.client.libjars.wildcard}} config parameter. If this is set to true, then we add the files individually to the classpath (i.e. {{mapreduce.job.classpath.files}}), but then we glob them all together when adding them to the distributed cache (i.e. {{mapreduce.job.cache.files}}). At that point, we would lose the fragment name because the LocalResource objects submitted to YARN are created based on those paths.

> Fragments specified for libjar paths are not handled correctly
> --
>
> Key: MAPREDUCE-6846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.3, 3.0.0-alpha2
> Reporter: Chris Trezzo
>A
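To illustrate the side note in the comment above about fragments: the following is a minimal sketch, using only {{java.net.URI}} and no Hadoop classes, of how parsing a libjars entry as a URI keeps the path and the fragment separate. The class {{LibjarFragmentDemo}} and its methods are hypothetical names introduced for illustration; they are not part of Hadoop or of the attached patches.

```java
import java.net.URI;

/** Hypothetical helper, not part of Hadoop: splits a libjars entry into path and fragment. */
final class LibjarFragmentDemo {

  /** Returns the path component of the entry, without any "#fragment" suffix. */
  static String pathOf(String libjar) {
    return URI.create(libjar).getPath();
  }

  /** Returns the fragment (the requested link name), or null if none was given. */
  static String fragmentOf(String libjar) {
    return URI.create(libjar).getFragment();
  }

  public static void main(String[] args) {
    // The entry from the stack trace in this issue.
    String libjar = "file:/home/mapred/test.txt#testFrag.txt";

    // Parsed as a URI, the fragment is kept apart from the file path.
    System.out.println("path     = " + pathOf(libjar));     // /home/mapred/test.txt
    System.out.println("fragment = " + fragmentOf(libjar)); // testFrag.txt

    // An entry without a fragment yields null.
    System.out.println("fragment = " + fragmentOf("file:/home/mapred/test.txt"));
  }
}
```

If the whole string is instead treated as a file name, the {{#testFrag.txt}} suffix ends up in the name being looked up, which matches the "File file:/home/mapred/test.txt#testFrag.txt does not exist" message in the issue description.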
[jira] [Comment Edited] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858637#comment-15858637 ]

Chris Trezzo edited comment on MAPREDUCE-6846 at 2/8/17 10:40 PM:
--

Unfortunately, it looks like wildcards were implemented on top of this bug (see MAPREDUCE-6719). Fixing this so that fragments are honored might be a little tricky when wildcards are being used because you would lose the per-path fragment information. I can see two potential approaches so far:
# Only use a wildcard when there have been no fragments specified by the user. This would preserve the intended naming of resources, but would reduce the number of instances where wildcards could be used.
# Silently ignore fragments specified by libjars when wildcards are enabled - I am not a fan of this approach because the application could be expecting a specific resource name for libjars so that symlinks don't conflict when resources are localized.

I will start working on a v1 patch for #1. Thoughts [~templedf] and [~sjlee0]?

was (Author: ctrezzo):
Unfortunately, it looks like wildcards were implemented on top of this bug (see MAPREDUCE-6719). Fixing this so that fragments are honored might be a little tricky when wildcards are being used because you would lose the per-path fragment information. I can see two potential approaches so far:
# Only use a wildcard when there have been no fragments specified by the user. This would preserve the intended naming of resources, but would reduce the number of instances where wildcards could be used.
# Silently ignore fragments specified by libjars - I am not a fan of this approach because the application could be expecting a specific resource name for libjars so that symlinks don't conflict when resources are localized.

I will start working on a v1 patch for #1. Thoughts [~templedf] and [~sjlee0]?
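Approach #1 from the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the v1 patch: {{WildcardDecisionDemo}}, {{anyFragment}} and {{canUseWildcard}} are hypothetical names, the entries are plain URI strings, and {{wildcardEnabled}} stands in for the value of {{mapreduce.client.libjars.wildcard}}.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of approach #1; not taken from the Hadoop source or the patch. */
final class WildcardDecisionDemo {

  /** Returns true if at least one libjars entry carries a "#fragment". */
  static boolean anyFragment(Collection<String> libjars) {
    for (String libjar : libjars) {
      if (URI.create(libjar).getFragment() != null) {
        return true;
      }
    }
    return false;
  }

  /**
   * Uses the wildcard only when it is enabled and no entry has a fragment,
   * so that a user-specified resource name is never dropped.
   */
  static boolean canUseWildcard(boolean wildcardEnabled, Collection<String> libjars) {
    return wildcardEnabled && !anyFragment(libjars);
  }

  public static void main(String[] args) {
    List<String> plain = Arrays.asList("file:/libs/a.jar", "file:/libs/b.jar");
    List<String> named = Arrays.asList("file:/libs/a.jar", "file:/libs/b.jar#renamed.jar");

    System.out.println(canUseWildcard(true, plain));  // true
    System.out.println(canUseWildcard(true, named));  // false
    System.out.println(canUseWildcard(false, plain)); // false
  }
}
```

The trade-off is the one named in the comment: a single entry with a fragment turns the wildcard off for the whole job, which keeps resource names intact at the cost of fewer cases where the wildcard applies.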
> Fragments specified for libjar paths are not handled correctly
> --
>
> Key: MAPREDUCE-6846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.3, 3.0.0-alpha2
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Priority: Minor
>
> If a user specifies a fragment for a libjars path via generic options parser, the client crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt does not exist
>   at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
>   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
>   at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
>   at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
>   at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
>   at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
>   at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver