[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876629#action_12876629 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1641:

The latest patch looks fine.

Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives

Key: MAPREDUCE-1641
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distributed-cache
Reporter: Amareshwari Sriramadasu
Assignee: Dick King
Fix For: 0.22.0
Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch, duped-files-archives--off-0-20-101--2010-04-21.patch, duped-files-archives--off-0-20-101--2010-04-23--1819.patch, mapreduce-1641--2010-04-27.patch, mapreduce-1641--2010-05-19.patch, mapreduce-1641--2010-05-21.patch, patch-1641-ydist-bugfix.txt

The behavior of mapred.cache.files and mapred.cache.archives differs during localization in the following way:
If a jar file is added to mapred.cache.files, it is localized under the TaskTracker under a unique path.
If a jar file is added to mapred.cache.archives, it is localized under a unique path, in a directory named after the jar file, and is unarchived under the same directory.
If the same jar file is passed for both configurations, the behavior is undefined. Thus the job submission should fail.
Currently, since the distributed cache processes files before archives, the jar file is only localized and not unarchived.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869933#action_12869933 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1641:

The new patch does not apply to trunk. I tried resolving the conflicts and applying it, but the result does not compile either. Can you please update the patch to trunk?
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870263#action_12870263 ]

Hadoop QA commented on MAPREDUCE-1641:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12445204/mapreduce-1641--2010-05-21.patch
against trunk revision 947112.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867585#action_12867585 ]

Dick King commented on MAPREDUCE-1641:

Should {{mapred.cache.libjars}} be checked too?
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861683#action_12861683 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1641:

+1. The patch for trunk looks good.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861596#action_12861596 ]

Hadoop QA commented on MAPREDUCE-1641:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12442974/mapreduce-1641--2010-04-27.patch
against trunk revision 938576.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/148/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/148/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/148/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/148/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861606#action_12861606 ]

Dick King commented on MAPREDUCE-1641:

I checked out trunk from Apache's {{git}} repository shortly before I checked in this patch. It, too, fails {{TestJobACLs.testACLS}}.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860802#action_12860802 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1641:

The following code change in JobClient does not look correct:

{code}
@@ -767,6 +766,9 @@ public class JobClient extends Configured implements MRConstants, Tool {
           (new Path("file:///" + binaryTokenFilename), jobCopy);
       }
+      // First we check whether the cached archives and files are legal.
+      TrackerDistributedCacheManager.validate(jobCopy);
+
       copyAndConfigureFiles(jobCopy, submitJobDir);
{code}

copyAndConfigureFiles adds the files/archives given by the command-line options -files, -archives and -libjars, so the patch does not validate these files. Validation should happen after the call to copyAndConfigureFiles. A test with the same file added for both the -files and -archives options would fail with the patch.
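The ordering issue raised in this review comment can be illustrated with a minimal, self-contained Java sketch. It does not use the Hadoop API: the `Map`-based configuration, `mergeCommandLineOptions` (a stand-in for `copyAndConfigureFiles`) and `hasOverlap` (a stand-in for `validate`) are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ValidateOrderSketch {
    static final String CACHE_FILES = "mapred.cache.files";
    static final String CACHE_ARCHIVES = "mapred.cache.archives";

    // Hypothetical stand-in for copyAndConfigureFiles: appends a command-line
    // URI to the comma-separated list stored under the given key.
    static void mergeCommandLineOptions(Map<String, String> conf, String key, String uri) {
        conf.merge(key, uri, (oldValue, added) -> oldValue + "," + added);
    }

    // Hypothetical stand-in for validate: true if any URI is in both lists.
    static boolean hasOverlap(Map<String, String> conf) {
        Set<String> files = split(conf.get(CACHE_FILES));
        files.retainAll(split(conf.get(CACHE_ARCHIVES)));
        return !files.isEmpty();
    }

    // Splits a comma-separated value into a set; null or empty yields an empty set.
    private static Set<String> split(String csv) {
        Set<String> out = new HashSet<>();
        if (csv != null && !csv.isEmpty()) {
            out.addAll(Arrays.asList(csv.split(",")));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Checking before the command-line options are merged sees nothing.
        boolean before = hasOverlap(conf);

        // The same jar is passed through both -files and -archives.
        mergeCommandLineOptions(conf, CACHE_FILES, "/user/blah/foo.jar");
        mergeCommandLineOptions(conf, CACHE_ARCHIVES, "/user/blah/foo.jar");

        // Checking after the merge sees the duplicate.
        boolean after = hasOverlap(conf);

        System.out.println("overlap seen before merge: " + before);
        System.out.println("overlap seen after merge: " + after);
    }
}
```

In this sketch the check only detects the duplicate when it runs after the merge step, which is the reason given above for moving validation after `copyAndConfigureFiles`.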
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859224#action_12859224 ]

Arun C Murthy commented on MAPREDUCE-1641:

Dick, some comments:

# I'd rename DistributedCache.validateCachesDontOverlap to DistributedCache.validate to keep it simple. In future we may add more validations there, so having one public api might prove useful and succinct.
# In the same function, and elsewhere, I'd define a public static final String for each of 'mapred.cache.archives' and 'mapred.cache.files' and use them instead of the raw strings.
# Request you to add a test-case which uses the Mini* clusters to ensure the validation works with fully-qualified hdfs uris (hdfs://nn:8000/user/blah/foo) and absolute Paths (/user/blah/foo).
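Arun's second and third points can be sketched in plain Java. This is only an illustration under stated assumptions: it uses `java.net.URI` rather than Hadoop's own path qualification, the class name `CacheUriSketch` and the helper `qualify` are hypothetical, and the default filesystem `hdfs://nn:8000/` is simply taken from the example URI in the comment.

```java
import java.net.URI;

public class CacheUriSketch {
    // Named constants instead of raw strings, as suggested in the review.
    public static final String CACHE_FILES = "mapred.cache.files";
    public static final String CACHE_ARCHIVES = "mapred.cache.archives";

    // Hypothetical helper: resolves a possibly scheme-less absolute path
    // against an assumed default filesystem URI, so that the fully-qualified
    // and the absolute-path spellings of one location compare equal.
    static URI qualify(URI defaultFs, String uriOrPath) {
        return defaultFs.resolve(uriOrPath).normalize();
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://nn:8000/");

        URI fullyQualified = qualify(defaultFs, "hdfs://nn:8000/user/blah/foo");
        URI absolutePath = qualify(defaultFs, "/user/blah/foo");

        // Both spellings should be treated as the same cache entry.
        System.out.println(fullyQualified.equals(absolutePath));
    }
}
```

Comparing qualified URIs rather than raw strings is what lets a duplicate be caught when one list uses the `hdfs://` form and the other uses the bare path.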
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859231#action_12859231 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1641:

Dick, though DistributedCache.validateCachesDontOverlap is a public api, it is not a user-facing api. So I think we should move it to TrackerDistributedCacheManager, where other public apis that the framework uses, such as determineTimeStamps and determineCacheVisibilities, are present. We should also remove the comment saying it is a public api and add one saying it is used by the framework only.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858629#action_12858629 ]

Dick King commented on MAPREDUCE-1641:

The subject of this jira is that in one case a user specified that a particular archive be in both {{mapred.cache.archives}} and {{mapred.cache.files}}. This duplication was not foreseen in the code, which happened to do something unpleasant.

While the particular instance that triggered the bug report was accidental, it occurs to me that you could want that -- especially with a {{.jar}} file.

I see the point that if we silently explode a {{.jar}} file as well as copying it locally [and using it as a {{-classpath}}, presumably] we could be exposed to pain -- not in terms of the code working, because the exploded copy wouldn't ever be accessed, but in terms of cleanup performance.

For the time being, I'll just error out as described on 16/Apr/10 08:12 PM.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858374#action_12858374 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1641:

bq. Perhaps we should allow this, and both localize the file and unarchive it? What do you think?

We should not make the file option unarchive the file. We have seen many use cases where users do not want their jars to be unjarred, for example HADOOP-5175.

bq. We perform the check for conflicts between mapred.cache.files and mapred.cache.archives when the user finally submits the offending JobConf.

+1

bq. In particular, I plan to make a new class DistributedCache.DuplicatedURI extends InvalidJobConfException and throw that.

+1
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857854#action_12857854 ]

Dick King commented on MAPREDUCE-1641:

Perhaps we should allow this, and both localize the file _and_ unarchive it? What do you think?

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858057#action_12858057 ]

Dick King commented on MAPREDUCE-1641:

We will _not_ allow this file duplication as proposed in 16/Apr/10 11:41AM.

However, we will not throw an {{IllegalArgumentException}}. We will throw an {{InvalidJobConfException}} instead.

A consequence of this is that the check cannot be performed as you add individual files or blocks of files to the cache; the interface is wrong. We perform the check for conflicts between {{mapred.cache.files}} and {{mapred.cache.archives}} when the user finally submits the offending {{JobConf}}.

In particular, I plan to make a new class {{DistributedCache.DuplicatedURI extends InvalidJobConfException}} and throw _that_.
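The plan in this comment, a dedicated exception thrown once at submission time, might look roughly like the following sketch. It is not the patch itself: `InvalidJobConfException` is re-declared locally as a stand-in (assumed here to be an `IOException` subclass) so that the example compiles without Hadoop, and `checkAtSubmit` is a hypothetical name.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DuplicatedUriSketch {
    // Local stand-in for Hadoop's InvalidJobConfException, so the sketch
    // compiles on its own. Assumed to be an IOException subclass.
    static class InvalidJobConfException extends IOException {
        InvalidJobConfException(String msg) {
            super(msg);
        }
    }

    // Hypothetical shape of the proposed exception.
    static class DuplicatedURI extends InvalidJobConfException {
        DuplicatedURI(String uri) {
            super("URI appears in both mapred.cache.files and mapred.cache.archives: " + uri);
        }
    }

    // Runs once over the final comma-separated values, at submission time,
    // rather than each time a file is added to the cache.
    static void checkAtSubmit(String cacheFiles, String cacheArchives) throws DuplicatedURI {
        Set<String> files = new HashSet<>(Arrays.asList(cacheFiles.split(",")));
        for (String archive : cacheArchives.split(",")) {
            if (!archive.isEmpty() && files.contains(archive)) {
                throw new DuplicatedURI(archive);
            }
        }
    }

    public static void main(String[] args) {
        try {
            checkAtSubmit("/lib/a.jar,/data/b.txt", "/lib/a.jar");
        } catch (InvalidJobConfException e) {
            // Callers that already handle the parent type also see the new one.
            System.out.println("submission rejected: " + e.getMessage());
        }
    }
}
```

Subclassing the existing job-configuration exception means callers that already catch the parent type need no changes, while callers that care about this specific failure can catch `DuplicatedURI` directly.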