[jira] [Comment Edited] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly

2017-03-24 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941283#comment-15941283
 ] 

Daniel Templeton edited comment on MAPREDUCE-6846 at 3/24/17 10:41 PM:
---

What I meant was: {code}
Path newPath =
    copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);
try {
  DistributedCache.addFileToClassPath(newPath, conf, jtFs, false);
  if (!foundFragment) {
    foundFragment = tmpURI.getFragment() != null;
  }
  libjarURIs.add(getPathURI(newPath, tmpURI.getFragment()));
}
{code} but after taking a second look, I'm fine with it as is.



> Fragments specified for libjar paths are not handled correctly
> --
>
> Key: MAPREDUCE-6846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.7.3, 3.0.0-alpha2
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: MAPREDUCE-6846-trunk.001.patch, 
> MAPREDUCE-6846-trunk.002.patch
>
>
> If a user specifies a fragment for a libjars path via the generic options 
> parser, the client crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
>   at 
> org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
>   at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
>   at 
> org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {noformat}
> This is a

[jira] [Comment Edited] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly

2017-02-09 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860382#comment-15860382
 ] 

Chris Trezzo edited comment on MAPREDUCE-6846 at 2/9/17 11:31 PM:
--

bq. I was under the impression that if the wildcard mapped to only one file 
then we would not convey this as a wildcard through to the staging directory 
but instead remap it to the one entry that it globbed to (i.e.: as if the user 
had specified the one path directly rather than a glob to that one path).

True, once it is in the staging dir it will not look like a wildcard. That 
being said, there is a second part to the feature. I will attempt to explain my 
current understanding:

See {{JobResourceUploader#uploadLibJars}}:
{code:java}
  private void uploadLibJars(Configuration conf, Collection<String> libjars,
  Path submitJobDir, FsPermission mapredSysPerms, short submitReplication)
  throws IOException {
Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
if (!libjars.isEmpty()) {
  FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
  for (String tmpjars : libjars) {
Path tmp = new Path(tmpjars);
Path newPath =
copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

// Add each file to the classpath
DistributedCache.addFileToClassPath(
new Path(newPath.toUri().getPath()), conf, jtFs, !useWildcard);
  }

  if (useWildcard) {
// Add the whole directory to the cache
Path libJarsDirWildcard =
jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));

DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
  }
}
  }
{code}
{{useWildcard}} is set by the {{mapreduce.client.libjars.wildcard}} config 
parameter. If it is set to true, we add the files individually to the 
classpath (i.e. {{mapreduce.job.classpath.files}}), but then we glob them all 
together when adding them to the distributed cache (i.e. 
{{mapreduce.job.cache.files}}). At that point, we would lose the fragment 
names, because the LocalResource objects submitted to YARN are created based 
on those paths.
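
Concretely, with {{mapreduce.client.libjars.wildcard}} set to true, the two 
properties end up shaped roughly like this (the staging paths here are 
hypothetical):

```
mapreduce.job.classpath.files = /user/alice/.staging/job_1/libjars/a.jar,/user/alice/.staging/job_1/libjars/b.jar
mapreduce.job.cache.files     = hdfs://nn:8020/user/alice/.staging/job_1/libjars/*
```

Any {{#fragment}} the user attached to {{a.jar}} or {{b.jar}} has no place to 
live in the single globbed cache entry.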

As a side note, this method also contains the original bug that motivated this 
jira: {{uploadLibJars}} creates a {{Path}} from {{tmpjars}} instead of a URI. 
The {{Path}} constructor does not support fragments, so we lose them at this 
point with or without wildcards.
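
The loss described above can be illustrated with plain {{java.net.URI}} (a 
standalone sketch, not Hadoop's {{Path}} class itself): rebuilding a location 
from only its path component, which is effectively what constructing a 
{{Path}} from the raw string does, silently drops the fragment.

```java
import java.net.URI;

public class FragmentLossDemo {
    public static void main(String[] args) throws Exception {
        // A libjar specified with a fragment, as the generic options parser sees it.
        URI withFragment = new URI("file:/home/mapred/test.jar#myAlias.jar");
        System.out.println(withFragment.getFragment());  // prints "myAlias.jar"

        // Rebuilding from the path component alone drops the fragment,
        // analogous to building a Hadoop Path from the bare string.
        URI pathOnly = new URI("file", null, withFragment.getPath(), null);
        System.out.println(pathOnly.getFragment());  // prints "null"
    }
}
```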




[jira] [Comment Edited] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly

2017-02-08 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858637#comment-15858637
 ] 

Chris Trezzo edited comment on MAPREDUCE-6846 at 2/8/17 10:40 PM:
--

Unfortunately, it looks like the wildcard feature was implemented on top of 
this bug (see MAPREDUCE-6719). Fixing this so that fragments are honored might 
be tricky when wildcards are in use, because you would lose the per-path 
fragment information.

I can see two potential approaches so far:
# Only use a wildcard when there have been no fragments specified by the user. 
This would preserve the intended naming of resources, but would reduce the 
number of instances where wildcards could be used.
# Silently ignore fragments specified by libjars when wildcards are enabled - I 
am not a fan of this approach because the application could be expecting a 
specific resource name for libjars so that symlinks don't conflict when 
resources are localized.

I will start working on a v1 patch for #1. Thoughts [~templedf] and [~sjlee0]?
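
Approach #1 can be sketched in a few lines (the names here, such as 
{{useWildcard}}, are illustrative rather than the patch itself): scan the 
libjar URIs, and fall back to per-file cache entries as soon as any of them 
carries a fragment.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class WildcardGate {
    // Approach #1: only keep the wildcard optimization when no libjar
    // URI carries a fragment, so user-chosen symlink names survive.
    static boolean useWildcard(boolean wildcardEnabled, List<URI> libjars) {
        for (URI jar : libjars) {
            if (jar.getFragment() != null) {
                return false;  // a fragment means per-file naming matters
            }
        }
        return wildcardEnabled;
    }

    public static void main(String[] args) throws Exception {
        List<URI> jars = Arrays.asList(
            new URI("file:/tmp/a.jar"),
            new URI("file:/tmp/b.jar#alias.jar"));
        System.out.println(useWildcard(true, jars));  // prints "false"
    }
}
```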


