[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870263#action_12870263 ] Hadoop QA commented on MAPREDUCE-1641: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445204/mapreduce-1641--2010-05-21.patch against trunk revision 947112. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/201/console This message is automatically generated. 
> Job submission should fail if same uri is added for mapred.cache.files and > mapred.cache.archives > > > Key: MAPREDUCE-1641 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache >Reporter: Amareshwari Sriramadasu >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch, > duped-files-archives--off-0-20-101--2010-04-21.patch, > duped-files-archives--off-0-20-101--2010-04-23--1819.patch, > mapreduce-1641--2010-04-27.patch, mapreduce-1641--2010-05-19.patch, > mapreduce-1641--2010-05-21.patch, patch-1641-ydist-bugfix.txt > > > The behavior of mapred.cache.files and mapred.cache.archives is different > during localization in the following way: > If a jar file is added to mapred.cache.files, it will be localized under > TaskTracker under a unique path. > If a jar file is added to mapred.cache.archives, it will be localized under a > unique path in a directory named after the jar file, and will be unarchived > under the same directory. > If the same jar file is passed for both configurations, the behavior > is undefined. Thus the job submission should fail. > Currently, since the distributed cache processes files before archives, the jar > file will just be localized and not unarchived. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
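As a rough illustration of the check this issue asks for (not the code in the attached patches), the sketch below rejects a submission when the same URI appears in both comma-separated lists. The class CacheUriCheck and its methods are hypothetical; an actual fix would read mapred.cache.files and mapred.cache.archives from the job configuration, and this sketch compares URIs as plain strings without any normalization.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hadoop.
class CacheUriCheck {
    // Splits a comma-separated property value into trimmed, non-empty entries.
    static Set<String> parse(String value) {
        Set<String> uris = new LinkedHashSet<>();
        if (value == null) {
            return uris;
        }
        for (String s : value.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                uris.add(t);
            }
        }
        return uris;
    }

    // Returns the entries that appear in both lists (plain string comparison).
    static Set<String> duplicates(String files, String archives) {
        Set<String> common = parse(files);
        common.retainAll(parse(archives));
        return common;
    }

    // Fails fast, as job submission would, if any URI is listed twice.
    static void validate(String files, String archives) {
        Set<String> dup = duplicates(files, archives);
        if (!dup.isEmpty()) {
            throw new IllegalArgumentException(
                "URIs listed as both cache files and cache archives: " + dup);
        }
    }

    public static void main(String[] args) {
        validate("hdfs://nn/lib/a.jar", "hdfs://nn/lib/b.zip"); // no overlap
        try {
            validate("hdfs://nn/lib/a.jar", "hdfs://nn/lib/a.jar");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at validation time keeps the outcome independent of the order in which files and archives happen to be localized.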
[jira] Updated: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1533: - Attachment: mapreduce-1533--2010-05-21a.patch Slight rework of the previous patch to fit the changes made to trunk since I downloaded it to write the former. > Reduce or remove usage of String.format() usage in > CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString() > -- > > Key: MAPREDUCE-1533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan >Assignee: Dick King > Attachments: mapreduce-1533--2010-05-10a.patch, > mapreduce-1533--2010-05-21.patch, mapreduce-1533--2010-05-21a.patch, > MAPREDUCE-1533-and-others-20100413.1.txt, > MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, > mapreduce-1533-v1.8.patch > > > When short jobs are executed in Hadoop with OutOfBandHeardBeat=true, the JT > executes the heartBeat() method heavily. This internally makes a call to > CapacityTaskScheduler.updateQSIObjects(). > CapacityTaskScheduler.updateQSIObjects() internally calls String.format() > to set the job scheduling information. Depending on the size of the > "jobQueuesManager" and "queueInfoMap" data structures, the number of times String.format() > gets executed becomes very high. String.format() internally does pattern > matching, which turns out to be very heavy. (This was revealed while profiling the > JT: almost 57% of the time was spent in CapacityScheduler.assignTasks(), out of > which String.format() took 46%.) > Would it be possible to call String.format() only at the time of invoking > JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT while > processing heartbeats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
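The idea raised in the description, deferring string construction until the scheduling info is actually requested, can be sketched as follows. This is only an illustration under assumptions: LazySchedulingInfo and its fields are hypothetical and do not correspond to the classes touched by the attached patches. The heartbeat path stores raw numbers, and the text is assembled with a StringBuilder only on demand.

```java
// Hypothetical stand-in for a per-job scheduling info object; not Hadoop code.
class LazySchedulingInfo {
    private int runningMaps;
    private int runningReduces;

    // Hot path (every heartbeat): store the raw counts, do no formatting.
    synchronized void update(int maps, int reduces) {
        runningMaps = maps;
        runningReduces = reduces;
    }

    // Cold path: build the text only when a caller asks for it.
    // StringBuilder appends avoid the format-string parsing in String.format().
    @Override
    public synchronized String toString() {
        return new StringBuilder()
            .append(runningMaps).append(" running map tasks, ")
            .append(runningReduces).append(" running reduce tasks")
            .toString();
    }

    public static void main(String[] args) {
        LazySchedulingInfo info = new LazySchedulingInfo();
        for (int i = 0; i < 1000; i++) {
            info.update(i, i / 2); // cheap, no string work per update
        }
        System.out.println(info);  // formatted once, at read time
    }
}
```

The trade-off is that readers pay the formatting cost instead of the heartbeat path, which fits a workload where updates vastly outnumber reads.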
[jira] Commented: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870210#action_12870210 ] Hadoop QA commented on MAPREDUCE-1533: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445219/mapreduce-1533--2010-05-21.patch against trunk revision 947112. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/544/console This message is automatically generated. > Reduce or remove usage of String.format() usage in > CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString() > -- > > Key: MAPREDUCE-1533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan >Assignee: Dick King > Attachments: mapreduce-1533--2010-05-10a.patch, > mapreduce-1533--2010-05-21.patch, MAPREDUCE-1533-and-others-20100413.1.txt, > MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, > mapreduce-1533-v1.8.patch > > > When short jobs are executed in Hadoop with OutOfBandHeardBeat=true, the JT > executes the heartBeat() method heavily. This internally makes a call to > CapacityTaskScheduler.updateQSIObjects(). > CapacityTaskScheduler.updateQSIObjects() internally calls String.format() > to set the job scheduling information. Depending on the size of the > "jobQueuesManager" and "queueInfoMap" data structures, the number of times String.format() > gets executed becomes very high. String.format() internally does pattern > matching, which turns out to be very heavy. (This was revealed while profiling the > JT: almost 57% of the time was spent in CapacityScheduler.assignTasks(), out of > which String.format() took 46%.) > Would it be possible to call String.format() only at the time of invoking > JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT while > processing heartbeats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1533: - Status: Patch Available (was: Open) > Reduce or remove usage of String.format() usage in > CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString() > -- > > Key: MAPREDUCE-1533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan >Assignee: Dick King > Attachments: mapreduce-1533--2010-05-10a.patch, > mapreduce-1533--2010-05-21.patch, MAPREDUCE-1533-and-others-20100413.1.txt, > MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, > mapreduce-1533-v1.8.patch > > > When short jobs are executed in Hadoop with OutOfBandHeardBeat=true, the JT > executes the heartBeat() method heavily. This internally makes a call to > CapacityTaskScheduler.updateQSIObjects(). > CapacityTaskScheduler.updateQSIObjects() internally calls String.format() > to set the job scheduling information. Depending on the size of the > "jobQueuesManager" and "queueInfoMap" data structures, the number of times String.format() > gets executed becomes very high. String.format() internally does pattern > matching, which turns out to be very heavy. (This was revealed while profiling the > JT: almost 57% of the time was spent in CapacityScheduler.assignTasks(), out of > which String.format() took 46%.) > Would it be possible to call String.format() only at the time of invoking > JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT while > processing heartbeats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1533: - Attachment: mapreduce-1533--2010-05-21.patch This new patch incorporates the minor items suggested in the previous comment. > Reduce or remove usage of String.format() usage in > CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString() > -- > > Key: MAPREDUCE-1533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan >Assignee: Dick King > Attachments: mapreduce-1533--2010-05-10a.patch, > mapreduce-1533--2010-05-21.patch, MAPREDUCE-1533-and-others-20100413.1.txt, > MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, > mapreduce-1533-v1.8.patch > > > When short jobs are executed in Hadoop with OutOfBandHeardBeat=true, the JT > executes the heartBeat() method heavily. This internally makes a call to > CapacityTaskScheduler.updateQSIObjects(). > CapacityTaskScheduler.updateQSIObjects() internally calls String.format() > to set the job scheduling information. Depending on the size of the > "jobQueuesManager" and "queueInfoMap" data structures, the number of times String.format() > gets executed becomes very high. String.format() internally does pattern > matching, which turns out to be very heavy. (This was revealed while profiling the > JT: almost 57% of the time was spent in CapacityScheduler.assignTasks(), out of > which String.format() took 46%.) > Would it be possible to call String.format() only at the time of invoking > JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT while > processing heartbeats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-815) Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro Serialization
[ https://issues.apache.org/jira/browse/MAPREDUCE-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting resolved MAPREDUCE-815. Resolution: Duplicate Closing this as a duplicate. This can be re-opened if someone objects. > Add AvroInputFormat and AvroOutputFormat so that hadoop can use Avro > Serialization > -- > > Key: MAPREDUCE-815 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-815 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Ravi Gummadi >Assignee: Aaron Kimball > Attachments: MAPREDUCE-815.2.patch, MAPREDUCE-815.3.patch, > MAPREDUCE-815.4.patch, MAPREDUCE-815.5.patch, MAPREDUCE-815.patch > > > MapReduce needs AvroInputFormat similar to other InputFormats like > TextInputFormat to be able to use avro serialization in hadoop. Similarly > AvroOutputFormat is needed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1774) Large-scale Automated Framework
[ https://issues.apache.org/jira/browse/MAPREDUCE-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1774: -- Attachment: MAPREDUCE-1774.patch This patch version has all the correct build modifications in place. However, because of the code changes between MR in 0.20 and in trunk, the aspects aren't binding anymore, and this needs to be fixed. > Large-scale Automated Framework > --- > > Key: MAPREDUCE-1774 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1774 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Konstantin Boudnik > Attachments: MAPREDUCE-1774.patch, MAPREDUCE-1774.patch > > > This is the MapReduce part of HADOOP-6332 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1641: - Status: Patch Available (was: Open) > Job submission should fail if same uri is added for mapred.cache.files and > mapred.cache.archives > > > Key: MAPREDUCE-1641 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache >Reporter: Amareshwari Sriramadasu >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch, > duped-files-archives--off-0-20-101--2010-04-21.patch, > duped-files-archives--off-0-20-101--2010-04-23--1819.patch, > mapreduce-1641--2010-04-27.patch, mapreduce-1641--2010-05-19.patch, > mapreduce-1641--2010-05-21.patch, patch-1641-ydist-bugfix.txt > > > The behavior of mapred.cache.files and mapred.cache.archives is different > during localization in the following way: > If a jar file is added to mapred.cache.files, it will be localized under > TaskTracker under a unique path. > If a jar file is added to mapred.cache.archives, it will be localized under a > unique path in a directory named after the jar file, and will be unarchived > under the same directory. > If the same jar file is passed for both configurations, the behavior > is undefined. Thus the job submission should fail. > Currently, since the distributed cache processes files before archives, the jar > file will just be localized and not unarchived. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dick King updated MAPREDUCE-1641: - Attachment: mapreduce-1641--2010-05-21.patch This is a new patch to accommodate trunk changes in the locations of defined literals. > Job submission should fail if same uri is added for mapred.cache.files and > mapred.cache.archives > > > Key: MAPREDUCE-1641 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache >Reporter: Amareshwari Sriramadasu >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch, > duped-files-archives--off-0-20-101--2010-04-21.patch, > duped-files-archives--off-0-20-101--2010-04-23--1819.patch, > mapreduce-1641--2010-04-27.patch, mapreduce-1641--2010-05-19.patch, > mapreduce-1641--2010-05-21.patch, patch-1641-ydist-bugfix.txt > > > The behavior of mapred.cache.files and mapred.cache.archives is different > during localization in the following way: > If a jar file is added to mapred.cache.files, it will be localized under > TaskTracker under a unique path. > If a jar file is added to mapred.cache.archives, it will be localized under a > unique path in a directory named after the jar file, and will be unarchived > under the same directory. > If the same jar file is passed for both configurations, the behavior > is undefined. Thus the job submission should fail. > Currently, since the distributed cache processes files before archives, the jar > file will just be localized and not unarchived. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1801) do not throw exception if cannot get a delegation token, it may be from a unsecured cluster (part of HDFS-1044)
[ https://issues.apache.org/jira/browse/MAPREDUCE-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870172#action_12870172 ] Jitendra Nath Pandey commented on MAPREDUCE-1801: - +1 for the patch. > do not throw exception if cannot get a delegation token, it may be from a > unsecured cluster (part of HDFS-1044) > --- > > Key: MAPREDUCE-1801 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1801 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: MAPREDUCE_1801.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run
[ https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870141#action_12870141 ] Hadoop QA commented on MAPREDUCE-1783: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445178/submit-mapreduce-1783.patch against trunk revision 946955. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/543/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/543/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/543/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/543/console This message is automatically generated. 
> Task Initialization should be delayed till when a job can be run > > > Key: MAPREDUCE-1783 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/fair-share >Affects Versions: 0.20.1 >Reporter: Ramkumar Vadali > Fix For: 0.22.0 > > Attachments: 0001-Pool-aware-job-initialization.patch, > 0001-Pool-aware-job-initialization.patch.1, submit-mapreduce-1783.patch > > > The FairScheduler task scheduler uses PoolManager to impose limits on the > number of jobs that can be running at a given time. However, jobs that are > submitted are initialized immediately by EagerTaskInitializationListener by > calling JobInProgress.initTasks. This causes the job split file to be read > into memory. The split information is not needed until the number of running > jobs is less than the maximum specified. If the amount of split information > is large, this leads to unnecessary memory pressure on the Job Tracker. > To ease memory pressure, FairScheduler can use another implementation of > JobInProgressListener that is aware of PoolManager limits and can delay task > initialization until the number of running jobs is below the maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
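The delayed-initialization behavior described above can be sketched in plain Java as follows. This is a simplified illustration, not the code in the attached patches: DelayedInitializer is a hypothetical class, it uses a single global limit where PoolManager applies its own limits, and adding a job ID to a list stands in for the expensive JobInProgress.initTasks call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical listener-like class for illustration; not Hadoop code.
class DelayedInitializer {
    private final int maxRunning;                           // assumed single global limit
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> initialized = new ArrayList<>();
    private int running = 0;

    DelayedInitializer(int maxRunning) {
        this.maxRunning = maxRunning;
    }

    // A submitted job is only queued; it is initialized if there is room.
    synchronized void jobAdded(String jobId) {
        pending.add(jobId);
        initializeEligible();
    }

    // A finished job frees a slot, which may let a pending job initialize.
    synchronized void jobCompleted() {
        if (running > 0) {
            running--;
        }
        initializeEligible();
    }

    // Initializes pending jobs in submission order while below the limit.
    private void initializeEligible() {
        while (running < maxRunning && !pending.isEmpty()) {
            initialized.add(pending.remove()); // stand-in for the costly split read
            running++;
        }
    }

    synchronized List<String> initializedJobs() {
        return new ArrayList<>(initialized);
    }

    public static void main(String[] args) {
        DelayedInitializer init = new DelayedInitializer(1);
        init.jobAdded("job_a");
        init.jobAdded("job_b");
        System.out.println(init.initializedJobs()); // job_b is still pending
        init.jobCompleted();
        System.out.println(init.initializedJobs()); // job_b initialized after a slot frees up
    }
}
```

With this shape, split data for a queued job is never loaded until the job is actually allowed to run.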
[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870104#action_12870104 ] Krishna Ramachandran commented on MAPREDUCE-1744: - Dick, I am not sure that FileSystem.get(conf) is the right call in Job.java. According to the docs, it returns the configured file system, whereas cluster.getFilesystem() gets the FileSystem where job-specific files are stored. I am checking further. > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}} . > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block, > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block, > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is stored as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1354: - Status: Resolved (was: Patch Available) Resolution: Fixed I just committed this. Thanks Dick! > Incremental enhancements to the JobTracker for better scalability > - > > Key: MAPREDUCE-1354 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Devaraj Das >Assignee: Dick King >Priority: Critical > Attachments: mapreduce-1354--2010-03-10.patch, > mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > mr-1354-y20.patch > > > It'd be nice to have the JobTracker object not be locked while accessing the > HDFS for reading the jobconf file and while writing the jobinfo file in the > submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-1354: - Hadoop Flags: [Reviewed] Issue Type: Improvement (was: Bug) > Incremental enhancements to the JobTracker for better scalability > - > > Key: MAPREDUCE-1354 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Devaraj Das >Assignee: Dick King >Priority: Critical > Attachments: mapreduce-1354--2010-03-10.patch, > mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > mr-1354-y20.patch > > > It'd be nice to have the JobTracker object not be locked while accessing the > HDFS for reading the jobconf file and while writing the jobinfo file in the > submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
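The locking concern in this issue, not holding the JobTracker lock during slow file system access, follows a general pattern that can be sketched as below. This is an assumption-laden illustration rather than the committed change: NarrowLockSubmit is a hypothetical class, and the Supplier stands in for reading the jobconf file from HDFS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of narrowing a lock's scope; not Hadoop code.
class NarrowLockSubmit {
    private final Map<String, String> jobs = new HashMap<>();

    // The slow read happens before the lock is taken, so other threads
    // (for example, heartbeat handlers) are not blocked behind the I/O.
    String submit(String jobId, Supplier<String> readJobConf) {
        String conf = readJobConf.get();   // slow I/O, no lock held
        synchronized (this) {
            jobs.put(jobId, conf);         // short critical section
        }
        return jobId;
    }

    synchronized int jobCount() {
        return jobs.size();
    }

    public static void main(String[] args) {
        NarrowLockSubmit tracker = new NarrowLockSubmit();
        tracker.submit("job_1", () -> "contents of job.xml");
        System.out.println(tracker.jobCount());
    }
}
```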
[jira] Commented: (MAPREDUCE-1662) TaskRunner.prepare() and close() can be removed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870069#action_12870069 ] Hudson commented on MAPREDUCE-1662: --- Integrated in Hadoop-Mapreduce-trunk #324 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/324/]) MAPREDUCE-1662. Remove unused methods from TaskRunner. Contributed by Amareshwari Sriramadasu > TaskRunner.prepare() and close() can be removed > --- > > Key: MAPREDUCE-1662 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1662 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1662.txt > > > TaskRunner.prepare() and close() methods call only mapOutputFile.removeAll(). > The removeAll() call is always a no-op in prepare(), because the directory > is always empty during startup of the task. The removeAll() call in close() > is useless, because it is followed by an attempt directory cleanup. Since the > map output files are in the attempt directory, the call to close() is useless. > After MAPREDUCE-842, these calls are under TaskTracker space, passing the > wrong conf. Now, the calls do not make sense at all. > I think we can remove the methods. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1804) Stress-test tool for HDFS introduced in HDFS-708
[ https://issues.apache.org/jira/browse/MAPREDUCE-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870071#action_12870071 ] Hudson commented on MAPREDUCE-1804: --- Integrated in Hadoop-Mapreduce-trunk #324 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/324/]) Moving MAPREDUCE-1804 to new features. MAPREDUCE-1804. Stress-test tool for HDFS introduced in HDFS-708. Contributed by Joshua Harlow. > Stress-test tool for HDFS introduced in HDFS-708 > > > Key: MAPREDUCE-1804 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1804 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: benchmarks, test >Affects Versions: 0.22.0 >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko > Fix For: 0.22.0 > > Attachments: slive.patch.2 > > > This issue is to commit the SLive test developed in HDFS-708 to MR trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1617) TestBadRecords failed once in our test runs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870070#action_12870070 ] Hudson commented on MAPREDUCE-1617: --- Integrated in Hadoop-Mapreduce-trunk #324 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/324/]) MAPREDUCE-1617. Use IPv4 stack for unit tests. Contributed by Amar Kamat and Luke Lu > TestBadRecords failed once in our test runs > --- > > Key: MAPREDUCE-1617 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1617 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.20.2 >Reporter: Amareshwari Sriramadasu >Assignee: Luke Lu > Fix For: 0.22.0 > > Attachments: mr-1617-trunk-v1.patch, mr-1617-trunk-v2.patch, > mr-1617-v1.3.patch, TestBadRecords.txt > > > org.apache.hadoop.mapred.TestBadRecords.testBadMapRed failed with the > following > exception: > java.io.IOException: Job failed! > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142) > at > org.apache.hadoop.mapred.TestBadRecords.runMapReduce(TestBadRecords.java:94) > at > org.apache.hadoop.mapred.TestBadRecords.testBadMapRed(TestBadRecords.java:211) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1807) TestQueueManager can take long enough to time out
[ https://issues.apache.org/jira/browse/MAPREDUCE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870046#action_12870046 ] Dick King commented on MAPREDUCE-1807: -- I agree. > TestQueueManager can take long enough to time out > - > > Key: MAPREDUCE-1807 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1807 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King > > Sometimes TestQueueManager takes such a long time that the JUnit engine times > out and declares it a failure. We should fix this, possibly by splitting the > file's 19 test cases into two or more manageable test sets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1808) Have a configurable metric reporting CPU/disk usage per user
[ https://issues.apache.org/jira/browse/MAPREDUCE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870044#action_12870044 ] Hadoop QA commented on MAPREDUCE-1808: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444513/HADOOP-6755.patch against trunk revision 946955. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/200/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/200/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/200/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/200/console This message is automatically generated. 
> Have a configurable metric reporting CPU/disk usage per user > > > Key: MAPREDUCE-1808 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1808 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: tasktracker >Reporter: Alex Kozlov >Assignee: Alex Kozlov > Attachments: HADOOP-6755.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Many organizations are looking at resource usage per department/group/user > for diagnostic and resource allocation purposes. It should be > straightforward to implement a metric showing the simple resource usage like > CPU time and disk I/O per user and aggregate them using Ganglia. > Eventually, we can create an API for pluggable metrics (there is one for > Jobtracker and Tasktracker). > Let me know your thoughts. > Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1798) normalize property names for JT kerberos principal names in configuration (from HADOOP 6633)
[ https://issues.apache.org/jira/browse/MAPREDUCE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1798: --- Status: Resolved (was: Patch Available) Fix Version/s: 0.22.0 Resolution: Fixed I just committed this. Thanks, Boris! > normalize property names for JT kerberos principal names in configuration > (from HADOOP 6633) > > > Key: MAPREDUCE-1798 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1798 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1798.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run
[ https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-1783: --- Status: Patch Available (was: Open) Hadoop Flags: [Reviewed] > Task Initialization should be delayed till when a job can be run > > > Key: MAPREDUCE-1783 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/fair-share >Affects Versions: 0.20.1 >Reporter: Ramkumar Vadali > Fix For: 0.22.0 > > Attachments: 0001-Pool-aware-job-initialization.patch, > 0001-Pool-aware-job-initialization.patch.1, submit-mapreduce-1783.patch > > > The FairScheduler task scheduler uses PoolManager to impose limits on the > number of jobs that can be running at a given time. However, jobs that are > submitted are initialized immediately by EagerTaskInitializationListener by > calling JobInProgress.initTasks. This causes the job split file to be read > into memory. The split information is not needed until the number of running > jobs is less than the maximum specified. If the amount of split information > is large, this leads to unnecessary memory pressure on the Job Tracker. > To ease memory pressure, FairScheduler can use another implementation of > JobInProgressListener that is aware of PoolManager limits and can delay task > initialization until the number of running jobs is below the maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run
[ https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-1783: --- Status: Open (was: Patch Available) Patch was not generated correctly > Task Initialization should be delayed till when a job can be run > > > Key: MAPREDUCE-1783 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/fair-share >Affects Versions: 0.20.1 >Reporter: Ramkumar Vadali > Fix For: 0.22.0 > > Attachments: 0001-Pool-aware-job-initialization.patch, > 0001-Pool-aware-job-initialization.patch.1, submit-mapreduce-1783.patch > > > The FairScheduler task scheduler uses PoolManager to impose limits on the > number of jobs that can be running at a given time. However, jobs that are > submitted are initialized immediately by EagerTaskInitializationListener by > calling JobInProgress.initTasks. This causes the job split file to be read > into memory. The split information is not needed until the number of running > jobs is less than the maximum specified. If the amount of split information > is large, this leads to unnecessary memory pressure on the Job Tracker. > To ease memory pressure, FairScheduler can use another implementation of > JobInProgressListener that is aware of PoolManager limits and can delay task > initialization until the number of running jobs is below the maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run
[ https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramkumar Vadali updated MAPREDUCE-1783: --- Attachment: submit-mapreduce-1783.patch Formatted patch, this should work. > Task Initialization should be delayed till when a job can be run > > > Key: MAPREDUCE-1783 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/fair-share >Affects Versions: 0.20.1 >Reporter: Ramkumar Vadali > Fix For: 0.22.0 > > Attachments: 0001-Pool-aware-job-initialization.patch, > 0001-Pool-aware-job-initialization.patch.1, submit-mapreduce-1783.patch > > > The FairScheduler task scheduler uses PoolManager to impose limits on the > number of jobs that can be running at a given time. However, jobs that are > submitted are initialized immediately by EagerTaskInitializationListener by > calling JobInProgress.initTasks. This causes the job split file to be read > into memory. The split information is not needed until the number of running > jobs is less than the maximum specified. If the amount of split information > is large, this leads to unnecessary memory pressure on the Job Tracker. > To ease memory pressure, FairScheduler can use another implementation of > JobInProgressListener that is aware of PoolManager limits and can delay task > initialization until the number of running jobs is below the maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
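The delayed-initialization idea in the MAPREDUCE-1783 description can be illustrated with a minimal, self-contained sketch. It is not the attached patch and uses no Hadoop classes: DelayedInitSketch, jobAdded, jobCompleted and the single global limit are hypothetical stand-ins for a JobInProgressListener that consults PoolManager limits. Adding a job id to the initialized list stands in for the expensive JobInProgress.initTasks call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: none of these names are Hadoop APIs.
class DelayedInitSketch {
    private final int maxRunningJobs;                      // assumed single limit (real pools have per-pool limits)
    private final Queue<String> waiting = new ArrayDeque<>();
    private final Set<String> running = new HashSet<>();
    private final List<String> initialized = new ArrayList<>();

    DelayedInitSketch(int maxRunningJobs) {
        this.maxRunningJobs = maxRunningJobs;
    }

    // Called when a job is submitted: queue it instead of initializing eagerly.
    void jobAdded(String jobId) {
        waiting.add(jobId);
        initEligibleJobs();
    }

    // Called when a job finishes: a slot may have opened for a waiting job.
    void jobCompleted(String jobId) {
        if (running.remove(jobId)) {
            initEligibleJobs();
        }
    }

    // Initialize waiting jobs only while the number of running jobs is below the limit.
    private void initEligibleJobs() {
        while (running.size() < maxRunningJobs && !waiting.isEmpty()) {
            String jobId = waiting.poll();
            initialized.add(jobId);   // stands in for reading the split file into memory
            running.add(jobId);
        }
    }

    List<String> initializedJobs() {
        return initialized;
    }

    int waitingCount() {
        return waiting.size();
    }
}
```

With a limit of one, a second submitted job stays in the waiting queue (and its split data stays unread) until the first job completes.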
[jira] Commented: (MAPREDUCE-1683) Remove JNI calls from ClusterStatus cstr
[ https://issues.apache.org/jira/browse/MAPREDUCE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869972#action_12869972 ] Hadoop QA commented on MAPREDUCE-1683: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445141/M1683-1.patch against trunk revision 946833. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/542/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/542/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/542/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/542/console This message is automatically generated. 
> Remove JNI calls from ClusterStatus cstr > > > Key: MAPREDUCE-1683 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1683 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.2 >Reporter: Chris Douglas >Assignee: Luke Lu > Fix For: 0.21.0, 0.22.0 > > Attachments: M1683-1.patch, MAPREDUCE-1683_part2_yhadoop_20_10.patch, > MAPREDUCE-1683_yhadoop_20_9.patch, MAPREDUCE-1683_yhadoop_20_S.patch, > mr-1683-trunk-v1.patch > > > The {{ClusterStatus}} constructor makes two JNI calls to the {{Runtime}} to > fetch memory information. {{ClusterStatus}} instances are often created > inside the {{JobTracker}} to obtain other, unrelated metrics (sometimes from > schedulers' inner loops). Given that this information is related to the > {{JobTracker}} process and not the cluster, the metrics are also available > via {{JvmMetrics}}, and the jsps can gather this information for themselves: > these fields can be removed from {{ClusterStatus}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869959#action_12869959 ] Hadoop QA commented on MAPREDUCE-1073: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440953/mapreduce-1073--2010-04-06.patch against trunk revision 946955. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/199/console This message is automatically generated. > Progress reported for pipes tasks is incorrect. 
> --- > > Key: MAPREDUCE-1073 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: pipes >Affects Versions: 0.20.1 >Reporter: Sreekanth Ramakrishnan > Attachments: mapreduce-1073--2010-03-31.patch, > mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch > > > Currently in pipes, > {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, > OutputCollector, Reporter)}} we do the following: > {code} > while (input.next(key, value)) { > downlink.mapItem(key, value); > if(skipping) { > downlink.flush(); > } > } > {code} > This would result in consumption of all the records for current task and > taking task progress to 100% whereas the actual pipes application would be > trailing behind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1808) Have a configurable metric reporting CPU/disk usage per user
[ https://issues.apache.org/jira/browse/MAPREDUCE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1808: - Status: Patch Available (was: Open) Assignee: Alex Kozlov > Have a configurable metric reporting CPU/disk usage per user > > > Key: MAPREDUCE-1808 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1808 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: tasktracker >Reporter: Alex Kozlov >Assignee: Alex Kozlov > Attachments: HADOOP-6755.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Many organizations are looking at resource usage per department/group/user > for diagnostic and resource allocation purposes. It should be > straightforward to implement a metric showing the simple resource usage like > CPU time and disk I/O per user and aggregate them using Ganglia. > Eventually, we can create an API for pluggable metrics (there is one for > Jobtracker and Tasktracker). > Let me know your thoughts. > Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Moved: (MAPREDUCE-1808) Have a configurable metric reporting CPU/disk usage per user
[ https://issues.apache.org/jira/browse/MAPREDUCE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas moved HADOOP-6755 to MAPREDUCE-1808: -- Project: Hadoop Map/Reduce (was: Hadoop Common) Key: MAPREDUCE-1808 (was: HADOOP-6755) Component/s: tasktracker (was: metrics) > Have a configurable metric reporting CPU/disk usage per user > > > Key: MAPREDUCE-1808 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1808 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: tasktracker >Reporter: Alex Kozlov > Attachments: HADOOP-6755.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Many organizations are looking at resource usage per department/group/user > for diagnostic and resource allocation purposes. It should be > straightforward to implement a metric showing the simple resource usage like > CPU time and disk I/O per user and aggregate them using Ganglia. > Eventually, we can create an API for pluggable metrics (there is one for > Jobtracker and Tasktracker). > Let me know your thoughts. > Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1808) Have a configurable metric reporting CPU/disk usage per user
[ https://issues.apache.org/jira/browse/MAPREDUCE-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1808: - Status: Open (was: Patch Available) > Have a configurable metric reporting CPU/disk usage per user > > > Key: MAPREDUCE-1808 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1808 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: tasktracker >Reporter: Alex Kozlov > Attachments: HADOOP-6755.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > Many organizations are looking at resource usage per department/group/user > for diagnostic and resource allocation purposes. It should be > straightforward to implement a metric showing the simple resource usage like > CPU time and disk I/O per user and aggregate them using Ganglia. > Eventually, we can create an API for pluggable metrics (there is one for > Jobtracker and Tasktracker). > Let me know your thoughts. > Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1641) Job submission should fail if same uri is added for mapred.cache.files and mapred.cache.archives
[ https://issues.apache.org/jira/browse/MAPREDUCE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869933#action_12869933 ] Amareshwari Sriramadasu commented on MAPREDUCE-1641: The new patch does not apply to trunk. I tried resolving the conflicts and applying it, but then it does not compile either. Can you please update the patch to trunk? > Job submission should fail if same uri is added for mapred.cache.files and > mapred.cache.archives > > > Key: MAPREDUCE-1641 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1641 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache >Reporter: Amareshwari Sriramadasu >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: BZ-3539321--off-0-20-101--2010-04-20.patch, > duped-files-archives--off-0-20-101--2010-04-21.patch, > duped-files-archives--off-0-20-101--2010-04-23--1819.patch, > mapreduce-1641--2010-04-27.patch, mapreduce-1641--2010-05-19.patch, > patch-1641-ydist-bugfix.txt > > > The behavior of mapred.cache.files and mapred.cache.archives is different > during localization in the following way: > If a jar file is added to mapred.cache.files, it will be localized under > TaskTracker under a unique path. > If a jar file is added to mapred.cache.archives, it will be localized under a > unique path in a directory named the jar file name, and will be unarchived > under the same directory. > If the same jar file is passed for both configurations, the behavior is > undefined. Thus the job submission should fail. > Currently, since distributed cache processes files before archives, the jar > file will be just localized and not unarchived. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
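The check requested in MAPREDUCE-1641 could look roughly like the sketch below. It is an illustration only, not the attached patch: CacheUriCheckSketch and validate are hypothetical names, the two lists stand in for the parsed values of mapred.cache.files and mapred.cache.archives, and ignoring the #link fragment when comparing is an assumption made here, not something the issue states.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: not the Hadoop DistributedCache API.
class CacheUriCheckSketch {

    // Rejects a submission when the same URI appears in both lists.
    static void validate(List<URI> cacheFiles, List<URI> cacheArchives) {
        Set<URI> files = new HashSet<>();
        for (URI file : cacheFiles) {
            files.add(withoutFragment(file));
        }
        for (URI archive : cacheArchives) {
            if (files.contains(withoutFragment(archive))) {
                throw new IllegalArgumentException(
                    "URI listed as both cache file and cache archive: " + archive);
            }
        }
    }

    // Assumption: the "#linkname" fragment does not distinguish two entries.
    static URI withoutFragment(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getSchemeSpecificPart(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid cache URI: " + uri, e);
        }
    }
}
```

Throwing at validation time mirrors the issue's request that the submission fail, rather than silently localizing the jar without unarchiving it.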
[jira] Updated: (MAPREDUCE-1543) Log messages of JobACLsManager should use security logging of HADOOP-6586
[ https://issues.apache.org/jira/browse/MAPREDUCE-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1543: - Attachment: M1543-3.patch Merged with trunk. > Log messages of JobACLsManager should use security logging of HADOOP-6586 > - > > Key: MAPREDUCE-1543 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1543 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security >Reporter: Vinod K V >Assignee: Luke Lu > Fix For: 0.22.0 > > Attachments: hadoop-mapreduce.audit.log, M1543-3.patch, > mapreduce-1543-y20s-3.patch, mapreduce-1543-y20s.patch, > mr-1543-trunk-v2.patch, mr-1543-v1.9.2.patch > > > {{JobACLsManager}} added in MAPREDUCE-1307 logs the successes and failures > w.r.t job-level authorization in the corresponding Daemons' logs. The log > messages should instead use security logging of HADOOP-6586. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1617) TestBadRecords failed once in our test runs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1617: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed +1 I committed this. Thanks Amar and Luke! > TestBadRecords failed once in our test runs > --- > > Key: MAPREDUCE-1617 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1617 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.20.2 >Reporter: Amareshwari Sriramadasu >Assignee: Luke Lu > Fix For: 0.22.0 > > Attachments: mr-1617-trunk-v1.patch, mr-1617-trunk-v2.patch, > mr-1617-v1.3.patch, TestBadRecords.txt > > > org.apache.hadoop.mapred.TestBadRecords.testBadMapRed failed with the > following > exception: > java.io.IOException: Job failed! > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142) > at > org.apache.hadoop.mapred.TestBadRecords.runMapReduce(TestBadRecords.java:94) > at > org.apache.hadoop.mapred.TestBadRecords.testBadMapRed(TestBadRecords.java:211) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-572) If #link is missing from uri format of -cacheArchive then streaming does not throw error.
[ https://issues.apache.org/jira/browse/MAPREDUCE-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869921#action_12869921 ] Hadoop QA commented on MAPREDUCE-572: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12445034/patch-572.txt against trunk revision 946833. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/198/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/198/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/198/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/198/console This message is automatically generated. > If #link is missing from uri format of -cacheArchive then streaming does not > throw error. 
> - > > Key: MAPREDUCE-572 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-572 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Karam Singh >Assignee: Amareshwari Sriramadasu >Priority: Minor > Fix For: 0.22.0 > > Attachments: patch-572.txt > > > Ran the hadoop streaming command as: > bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out > -mapper "xargs cat" -reducer "bin/cat" -cacheArchive hdfs://h:p/pathofJarFile > Streaming submits the job to the jobtracker and the map fails. > For a similar command with -cacheFile: > bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar -input in -output out > -mapper "xargs cat" -reducer "bin/cat" -cacheFile hdfs://h:p/pathofFile > the following error is reported back: > [ > You need to specify the uris as hdfs://host:port/#linkname,Please specify a > different link name for all of your caching URIs > ] > Streaming should check for the presence of #link after the uri of cacheArchive and > should throw a proper error. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
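The missing check described in MAPREDUCE-572 amounts to testing whether the URI carries a #linkname fragment. A minimal sketch using only java.net.URI follows; CacheLinkCheckSketch and hasLinkName are hypothetical names, and the host, port and paths in the usage note are made up for illustration. This is not the code in patch-572.txt.

```java
import java.net.URI;

// Hypothetical sketch: not the streaming contrib code.
class CacheLinkCheckSketch {

    // Returns true only when the URI has a non-empty "#linkname" fragment.
    static boolean hasLinkName(String uriString) {
        String fragment = URI.create(uriString).getFragment();
        return fragment != null && !fragment.isEmpty();
    }
}
```

For example, hasLinkName("hdfs://host:8020/user/me/lib.jar#lib") is true, while hasLinkName("hdfs://host:8020/user/me/lib.jar") is false; a caller could apply the same test to both -cacheFile and -cacheArchive arguments and report the error before the job is submitted.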
[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1073: - Status: Patch Available (was: Open) > Progress reported for pipes tasks is incorrect. > --- > > Key: MAPREDUCE-1073 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: pipes >Affects Versions: 0.20.1 >Reporter: Sreekanth Ramakrishnan > Attachments: mapreduce-1073--2010-03-31.patch, > mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch > > > Currently in pipes, > {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, > OutputCollector, Reporter)}} we do the following: > {code} > while (input.next(key, value)) { > downlink.mapItem(key, value); > if(skipping) { > downlink.flush(); > } > } > {code} > This would result in consumption of all the records for current task and > taking task progress to 100% whereas the actual pipes application would be > trailing behind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1073: - Status: Open (was: Patch Available) > Progress reported for pipes tasks is incorrect. > --- > > Key: MAPREDUCE-1073 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: pipes >Affects Versions: 0.20.1 >Reporter: Sreekanth Ramakrishnan > Attachments: mapreduce-1073--2010-03-31.patch, > mapreduce-1073--2010-04-06.patch, MAPREDUCE-1073_yhadoop20.patch > > > Currently in pipes, > {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, > OutputCollector, Reporter)}} we do the following: > {code} > while (input.next(key, value)) { > downlink.mapItem(key, value); > if(skipping) { > downlink.flush(); > } > } > {code} > This would result in consumption of all the records for current task and > taking task progress to 100% whereas the actual pipes application would be > trailing behind. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1662) TaskRunner.prepare() and close() can be removed
[ https://issues.apache.org/jira/browse/MAPREDUCE-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1662: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Assignee: Amareshwari Sriramadasu Resolution: Fixed +1 I committed this. Thanks, Amareshwari! > TaskRunner.prepare() and close() can be removed > --- > > Key: MAPREDUCE-1662 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1662 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1662.txt > > > TaskRunner.prepare() and close() methods call only mapOutputFile.removeAll(). > The removeAll() call is always a no-op in prepare(), because the directory > is always empty during start up of the task. The removeAll() call in close() > is useless, because it is followed by an attempt directory cleanup. Since the > map output files are in the attempt directory, the call to close() is useless. > After MAPREDUCE-842, these calls are under TaskTracker space, passing the > wrong conf. Now, the calls do not make sense at all. > I think we can remove the methods. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat reassigned MAPREDUCE-1778: - Assignee: Amar Kamat > CompletedJobStatusStore initialization should fail if > {mapred.job.tracker.persist.jobstatus.dir} is unwritable > -- > > Key: MAPREDUCE-1778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Amar Kamat >Assignee: Amar Kamat > > If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable > location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then > CompletedJobStatusStore silently ignores the failure and disables > CompletedJobStatusStore. Ideally the JobTracker should bail out early > indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
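The fail-fast behavior asked for in MAPREDUCE-1778 can be sketched as follows. This is an illustration under loud assumptions: it checks a local directory with java.nio.file, whereas the real store would go through Hadoop's FileSystem abstraction, and PersistDirCheckSketch and ensureWritable are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: local-filesystem stand-in for the job status store directory check.
class PersistDirCheckSketch {

    // Creates the directory if needed and throws instead of silently disabling the store.
    static void ensureWritable(Path dir) {
        try {
            Files.createDirectories(dir);   // fails if the path cannot be created
        } catch (IOException e) {
            throw new IllegalStateException("Cannot create job status directory " + dir, e);
        }
        if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
            throw new IllegalStateException("Job status directory is not writable: " + dir);
        }
    }
}
```

Raising an exception during initialization surfaces the misconfiguration at startup, which is the outcome the issue describes as ideal for the JobTracker.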
[jira] Updated: (MAPREDUCE-1683) Remove JNI calls from ClusterStatus cstr
[ https://issues.apache.org/jira/browse/MAPREDUCE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1683: - Status: Patch Available (was: Open) > Remove JNI calls from ClusterStatus cstr > > > Key: MAPREDUCE-1683 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1683 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.2 >Reporter: Chris Douglas >Assignee: Luke Lu > Fix For: 0.21.0, 0.22.0 > > Attachments: M1683-1.patch, MAPREDUCE-1683_part2_yhadoop_20_10.patch, > MAPREDUCE-1683_yhadoop_20_9.patch, MAPREDUCE-1683_yhadoop_20_S.patch, > mr-1683-trunk-v1.patch > > > The {{ClusterStatus}} constructor makes two JNI calls to the {{Runtime}} to > fetch memory information. {{ClusterStatus}} instances are often created > inside the {{JobTracker}} to obtain other, unrelated metrics (sometimes from > schedulers' inner loops). Given that this information is related to the > {{JobTracker}} process and not the cluster, the metrics are also available > via {{JvmMetrics}}, and the jsps can gather this information for themselves: > these fields can be removed from {{ClusterStatus}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1683) Remove JNI calls from ClusterStatus cstr
[ https://issues.apache.org/jira/browse/MAPREDUCE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1683: - Attachment: M1683-1.patch That's a good idea. Updated patch > Remove JNI calls from ClusterStatus cstr > > > Key: MAPREDUCE-1683 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1683 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.2 >Reporter: Chris Douglas >Assignee: Luke Lu > Fix For: 0.21.0, 0.22.0 > > Attachments: M1683-1.patch, MAPREDUCE-1683_part2_yhadoop_20_10.patch, > MAPREDUCE-1683_yhadoop_20_9.patch, MAPREDUCE-1683_yhadoop_20_S.patch, > mr-1683-trunk-v1.patch > > > The {{ClusterStatus}} constructor makes two JNI calls to the {{Runtime}} to > fetch memory information. {{ClusterStatus}} instances are often created > inside the {{JobTracker}} to obtain other, unrelated metrics (sometimes from > schedulers' inner loops). Given that this information is related to the > {{JobTracker}} process and not the cluster, the metrics are also available > via {{JvmMetrics}}, and the jsps can gather this information for themselves: > these fields can be removed from {{ClusterStatus}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1683) Remove JNI calls from ClusterStatus cstr
[ https://issues.apache.org/jira/browse/MAPREDUCE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1683: - Status: Open (was: Patch Available) > Remove JNI calls from ClusterStatus cstr > > > Key: MAPREDUCE-1683 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1683 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.2 >Reporter: Chris Douglas >Assignee: Luke Lu > Fix For: 0.21.0, 0.22.0 > > Attachments: M1683-1.patch, MAPREDUCE-1683_part2_yhadoop_20_10.patch, > MAPREDUCE-1683_yhadoop_20_9.patch, MAPREDUCE-1683_yhadoop_20_S.patch, > mr-1683-trunk-v1.patch > > > The {{ClusterStatus}} constructor makes two JNI calls to the {{Runtime}} to > fetch memory information. {{ClusterStatus}} instances are often created > inside the {{JobTracker}} to obtain other, unrelated metrics (sometimes from > schedulers' inner loops). Given that this information is related to the > {{JobTracker}} process and not the cluster, the metrics are also available > via {{JvmMetrics}}, and the jsps can gather this information for themselves: > these fields can be removed from {{ClusterStatus}} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1807) TestQueueManager can take long enough to time out
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved MAPREDUCE-1807.
------------------------------------------------

    Resolution: Duplicate

I agree it is fixed by MAPREDUCE-28. Resolving as duplicate
Dick, please re-open if you don't think so.

> TestQueueManager can take long enough to time out
> -------------------------------------------------
>
>                 Key: MAPREDUCE-1807
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1807
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Dick King
>
> Sometimes TestQueueManager takes such a long time that the JUnit engine times
> out and declares it a failure. We should fix this, possibly by splitting the
> file's 19 test cases into two or more manageable test sets.
[jira] Resolved: (MAPREDUCE-629) Modify TestQueueManager to improve execution time
     [ https://issues.apache.org/jira/browse/MAPREDUCE-629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu resolved MAPREDUCE-629.
-----------------------------------------------

    Resolution: Duplicate

Fixed by MAPREDUCE-28

> Modify TestQueueManager to improve execution time
> -------------------------------------------------
>
>                 Key: MAPREDUCE-629
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-629
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jothi Padmanabhan
>            Priority: Minor
>
> With a few small changes, the run time of this test can be brought down by
> half.
[jira] Updated: (MAPREDUCE-488) JobTracker webui should report heap memory used
     [ https://issues.apache.org/jira/browse/MAPREDUCE-488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-488:
---------------------------------

        Summary: JobTracker webui should report heap memory used  (was: ClusterStatus should report heap memory used)
    Description: As of today JobTracker's webui reports _total-available-heap-memory_ and _max-heap-memory_. I think it will be useful to show the _actual_ heap memory used i.e {{total - free}}.  (was: As of today {{ClusterStatus}} reports _total-available-heap-memory_ and _max-heap-memory_. I think it will be useful to show the _actual_ heap memory used i.e {{total - free}}. Note that this was introduced by HADOOP-4435.)

> JobTracker webui should report heap memory used
> -----------------------------------------------
>
>                 Key: MAPREDUCE-488
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-488
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Amar Kamat
>            Priority: Minor
>         Attachments: HADOOP-4929-v1.0.patch
>
>
> As of today JobTracker's webui reports _total-available-heap-memory_ and
> _max-heap-memory_. I think it will be useful to show the _actual_ heap memory
> used i.e {{total - free}}.
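The {{total - free}} figure requested above can be computed from the standard java.lang.Runtime methods. The following is only a sketch of that arithmetic: the HeapUsage class and its method names are hypothetical, and nothing here is taken from the attached patch or from the JobTracker jsps.

```java
/** Hypothetical helper; illustrates "used = total - free" for the current JVM. */
final class HeapUsage {
    private HeapUsage() {}

    /** Heap in use: the currently committed heap minus the free part of it. */
    static long usedBytes(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    /** Samples the running JVM and returns its used heap in bytes. */
    static long currentUsedBytes() {
        Runtime rt = Runtime.getRuntime();
        return usedBytes(rt.totalMemory(), rt.freeMemory());
    }

    /** Samples the running JVM and returns the heap ceiling in bytes. */
    static long currentMaxBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```

A web page could then show HeapUsage.currentUsedBytes() alongside the total and max figures it already reports. The two Runtime samples are taken one after the other, so the result is a snapshot rather than an exact value.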