[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Release Note:     (was: Resubmit the patch to fix the findbugs warnings.)

> Map reduce running local should also go through ShuffleConsumerPlugin for
> enabling different MergeManager implementations
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4961
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Jerry Chen
>            Assignee: Jerry Chen
>         Attachments: MAPREDUCE-4961.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MAPREDUCE-4049 provides the ability for a pluggable Shuffle, and MAPREDUCE-4080
> extends Shuffle to be able to provide different MergeManager implementations.
> While using these pluggable features, I found that when a MapReduce job is
> running locally, a RawKeyValueIterator is returned directly from a static call
> to Merger.merge, which breaks the assumption that the Shuffle may provide
> different merge methods, even though there is no copy phase in this situation.
> The use case: I am implementing a hash-based MergeManager that does not need a
> sort on the map side, but when the job runs locally the hash-based MergeManager
> never gets a chance to be used, because the code goes directly to Merger.merge.
> This makes the pluggable Shuffle and MergeManager incomplete.
> So we need to move the code calling Merger.merge from ReduceTask into the
> ShuffleConsumerPlugin implementation, so that the Shuffle implementation can
> decide how to do the merge and return the corresponding iterator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
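The change the issue asks for can be sketched as follows. This is a hypothetical, simplified model, not the actual Hadoop API: the interface names mirror the issue text, but the signatures are invented for illustration. The point is that both the local and the distributed paths should obtain their iterator from the plugin, instead of the local path short-circuiting to a static Merger.merge call.

```java
// Simplified sketch: route the local case through the plugin too,
// so any MergeManager implementation gets a chance to run.

interface RawKeyValueIterator { boolean next(); }

interface ShuffleConsumerPlugin {
    // Distributed case: copy map outputs, then merge.
    RawKeyValueIterator run();
    // Local case: no copy phase, but the plugin still owns the merge.
    RawKeyValueIterator runLocal();
}

class ReduceTaskSketch {
    // Previously the isLocal branch called Merger.merge() directly,
    // bypassing the plugin; now both branches delegate to it.
    RawKeyValueIterator getIterator(ShuffleConsumerPlugin plugin, boolean isLocal) {
        return isLocal ? plugin.runLocal() : plugin.run();
    }
}
```

With this shape, a hash-based MergeManager plugged into the Shuffle is consulted in local mode as well, which is exactly the gap the issue describes.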
[jira] [Commented] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563385#comment-13563385 ]

Hadoop QA commented on MAPREDUCE-4961:
--------------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12566612/MAPREDUCE-4961.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3279//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3279//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Release Note: Resubmit the patch for fixing the find bug warnings.
          Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Attachment:     (was: MAPREDUCE-4961.patch)
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Attachment: MAPREDUCE-4961.patch

Updated the patch to fix the findbugs warning.
[jira] [Updated] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated MAPREDUCE-4963:
------------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Thanks Robert. Committed to branch-1.

> StatisticsCollector improperly keeps track of "Last Day" and "Last Hour"
> statistics for new TaskTrackers
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4963
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4963
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>    Affects Versions: 1.1.1
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>             Fix For: 1.2.0
>
>         Attachments: MAPREDUCE-4963.patch
>
>
> The StatisticsCollector keeps track of updates to the "Total Tasks Last Day",
> "Succeed Tasks Last Day", "Total Tasks Last Hour", and "Succeeded Tasks Last
> Hour" per TaskTracker, which are displayed on the JobTracker web UI. It uses
> buckets to manage when to shift task counts from "Last Hour" to "Last Day"
> and out of "Last Day". After the JT has been running for a while, the
> connected TTs will have the max number of buckets and will keep shifting them
> at each update. If a new TT connects (or an old one rejoins), it won't have
> the max number of buckets, but the code that drops the buckets uses the same
> counter for all sets of buckets. This means that new TTs will prematurely
> drop their buckets and the stats will be incorrect.
> Example:
> # Max buckets is 5
> # TaskTracker A has these values in its buckets: [4, 2, 0, 3, 10] (i.e. 19)
> # A new TaskTracker, B, connects; it has nothing in its buckets: [ ] (i.e. 0)
> # TaskTracker B runs 3 tasks and TaskTracker A runs 5
> # An update occurs
> # TaskTracker A has [2, 0, 3, 10, 5] (i.e. 20)
> # TaskTracker B should have [3], but it will drop that bucket after adding it
> during the update and instead have [ ] again (i.e. 0)
> # TaskTracker B will keep doing that forever and always show 0 in the web UI
> We can fix this by not using the same counter for all sets of buckets
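The shared-counter bug described above can be modeled in miniature. This is a hypothetical sketch, not the actual StatisticsCollector code: class and method names are invented, and the "shared counter" is reduced to a single parameter so the buggy and fixed shift logic can be compared side by side.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Each TaskTracker keeps a sliding window of per-update task counts,
// capped at MAX_BUCKETS entries.
class BucketWindow {
    static final int MAX_BUCKETS = 5;
    final Deque<Integer> buckets = new ArrayDeque<>();

    // Buggy shift: the decision to drop the oldest bucket comes from a
    // counter shared across ALL trackers' windows. Once the JobTracker has
    // been up for MAX_BUCKETS updates, even a brand-new window drops its
    // only bucket immediately.
    void updateShared(int tasks, int sharedUpdateCount) {
        buckets.addLast(tasks);
        if (sharedUpdateCount >= MAX_BUCKETS) {
            buckets.removeFirst();
        }
    }

    // Fixed shift: decide from this window's own length, as the issue
    // suggests ("not using the same counter for all sets of buckets").
    void updateFixed(int tasks) {
        buckets.addLast(tasks);
        if (buckets.size() > MAX_BUCKETS) {
            buckets.removeFirst();
        }
    }

    int total() {
        int sum = 0;
        for (int b : buckets) sum += b;
        return sum;
    }
}
```

Running TaskTracker B's scenario through `updateShared` with a large shared counter reproduces the report: the freshly added bucket of 3 tasks is dropped in the same update, so the window stays empty and the UI shows 0 forever; `updateFixed` keeps the bucket until the window is actually full.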
[jira] [Commented] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563343#comment-13563343 ]

Hadoop QA commented on MAPREDUCE-4961:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12566604/MAPREDUCE-4961.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3278//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3278//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3278//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563341#comment-13563341 ]

Alejandro Abdelnur commented on MAPREDUCE-4963:
-----------------------------------------------

+1
[jira] [Commented] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563325#comment-13563325 ]

Hadoop QA commented on MAPREDUCE-4963:
--------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12566603/MAPREDUCE-4963.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3277//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Status: Patch Available  (was: Open)

The primary modifications are:
1. Go through ShuffleConsumerPlugin.runLocal when isLocal is true. This makes
   the code path for isLocal almost the same as for !isLocal, except that the
   copy phase is already complete.
2. The default Shuffle implementation routes runLocal to
   MergeManager.closeLocal, allowing the MergeManager implementation to handle
   the merge work.

Please kindly help review.
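The second modification listed in the message above can be sketched roughly as follows. The names runLocal and closeLocal come from the comment itself, but the signatures and surrounding classes are simplified assumptions, not the real Hadoop API: the sketch only shows the delegation pattern, where the default Shuffle forwards the local merge to whatever MergeManager was plugged in.

```java
// Simplified sketch of "Shuffle routes runLocal to MergeManager.closeLocal".

interface RawKeyValueIterator { boolean next(); }

interface MergeManager {
    // Local case: merge the already-local map outputs and hand back an
    // iterator; a hash-based implementation may skip sorting entirely.
    RawKeyValueIterator closeLocal();
}

class DefaultShuffle {
    private final MergeManager mergeManager;

    DefaultShuffle(MergeManager mergeManager) {
        this.mergeManager = mergeManager;
    }

    // The shuffle plugin does not merge itself in the local case; it
    // delegates, so the merge strategy stays with the MergeManager.
    RawKeyValueIterator runLocal() {
        return mergeManager.closeLocal();
    }
}
```

The design point is that the choice of merge strategy is made exactly once, by the plugged-in MergeManager, for both local and distributed runs.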
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated MAPREDUCE-4961:
----------------------------------
    Attachment: MAPREDUCE-4961.patch

Patch for the fix attached.
[jira] [Commented] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563321#comment-13563321 ]

Robert Kanter commented on MAPREDUCE-4963:
------------------------------------------

The patch fixes the problem by keeping a separate counter for each set of
buckets and checking the length of the buckets. I also added a test that does
something similar to the above example.
[jira] [Updated] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-4963:
-------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated MAPREDUCE-4963: - Attachment: MAPREDUCE-4963.patch > StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" > statistics for new TaskTrackers > > > Key: MAPREDUCE-4963 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4963 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: 1.1.1 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-4963.patch > > > The StatisticsCollector keeps track of updates to the "Total Tasks Last Day", > "Succeeded Tasks Last Day", "Total Tasks Last Hour", and "Succeeded Tasks Last > Hour" per TaskTracker, which are displayed on the JobTracker web UI. It uses > buckets to manage when to shift task counts from "Last Hour" to "Last Day" > and out of "Last Day". After the JT has been running for a while, the > connected TTs will have the max number of buckets and will keep shifting them > at each update. If a new TT connects (or an old one rejoins), it won't have > the max number of buckets, but the code that drops the buckets uses the same > counter for all sets of buckets. This means that new TTs will prematurely > drop their buckets and the stats will be incorrect. > Example: > # Max buckets is 5 > # TaskTracker A has these values in its buckets [4, 2, 0, 3, 10] (i.e. 19) > # A new TaskTracker, B, connects; it has nothing in its buckets: [ ] (i.e. 0) > # TaskTracker B runs 3 tasks and TaskTracker A runs 5 > # An update occurs > # TaskTracker A has [2, 0, 3, 10, 5] (i.e. 20) > # TaskTracker B should have [3] but it will drop that bucket after adding it > during the update and instead have [ ] again (i.e. 0) > # TaskTracker B will keep doing that forever and always show 0 in the web UI > We can fix this by not using the same counter for all sets of buckets -- This message is automatically generated by JIRA. 
[jira] [Created] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers
Robert Kanter created MAPREDUCE-4963: Summary: StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers Key: MAPREDUCE-4963 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4963 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.1.1 Reporter: Robert Kanter Assignee: Robert Kanter The StatisticsCollector keeps track of updates to the "Total Tasks Last Day", "Succeeded Tasks Last Day", "Total Tasks Last Hour", and "Succeeded Tasks Last Hour" per TaskTracker, which are displayed on the JobTracker web UI. It uses buckets to manage when to shift task counts from "Last Hour" to "Last Day" and out of "Last Day". After the JT has been running for a while, the connected TTs will have the max number of buckets and will keep shifting them at each update. If a new TT connects (or an old one rejoins), it won't have the max number of buckets, but the code that drops the buckets uses the same counter for all sets of buckets. This means that new TTs will prematurely drop their buckets and the stats will be incorrect. Example: # Max buckets is 5 # TaskTracker A has these values in its buckets [4, 2, 0, 3, 10] (i.e. 19) # A new TaskTracker, B, connects; it has nothing in its buckets: [ ] (i.e. 0) # TaskTracker B runs 3 tasks and TaskTracker A runs 5 # An update occurs # TaskTracker A has [2, 0, 3, 10, 5] (i.e. 20) # TaskTracker B should have [3] but it will drop that bucket after adding it during the update and instead have [ ] again (i.e. 0) # TaskTracker B will keep doing that forever and always show 0 in the web UI We can fix this by not using the same counter for all sets of buckets -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
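The walk-through above can be sketched in a few lines. This is an illustrative stand-in for the StatisticsCollector, not the actual Hadoop code: the proposed fix is that each set of buckets keeps its own per-tracker update count, so a newly joined TaskTracker does not drop buckets just because older trackers are already at the maximum.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of per-TaskTracker "Last Hour"/"Last Day" buckets.
class BucketStats {
    private final int maxBuckets;
    private final Deque<Integer> buckets = new ArrayDeque<>();
    private int updatesSeen = 0;   // per-tracker counter, NOT shared globally

    BucketStats(int maxBuckets) { this.maxBuckets = maxBuckets; }

    /** Record the tasks run since the last update and shift buckets. */
    void update(int tasksSinceLastUpdate) {
        buckets.addLast(tasksSinceLastUpdate);
        updatesSeen++;
        // Only drop the oldest bucket once THIS tracker has filled all slots.
        if (updatesSeen > maxBuckets) {
            buckets.removeFirst();
        }
    }

    /** Sum over the retained buckets, e.g. "Total Tasks Last Day". */
    int total() {
        int sum = 0;
        for (int b : buckets) sum += b;
        return sum;
    }
}
```

With a shared counter, tracker B's `update(3)` would be followed by an immediate drop (the scenario in the report); with the per-instance `updatesSeen` above, B keeps its single bucket until it has accumulated `maxBuckets` of its own.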
[jira] [Assigned] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Chen reassigned MAPREDUCE-4961: - Assignee: Jerry Chen > Map reduce running local should also go through ShuffleConsumerPlugin for > enabling different MergeManager implementations > - > > Key: MAPREDUCE-4961 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: trunk >Reporter: Jerry Chen >Assignee: Jerry Chen > Original Estimate: 72h > Remaining Estimate: 72h > > MAPREDUCE-4049 provides the ability for a pluggable Shuffle, and MAPREDUCE-4080 > extends Shuffle to be able to provide different MergeManager implementations. > While using these pluggable features, I find that when a map reduce is > running locally, a RawKeyValueIterator is returned directly from a static > call to Merger.merge, which breaks the assumption that the Shuffle may provide > different merge methods, even though there is no copy phase in this situation. > The use case: when I am implementing a hash-based MergeManager, we don't > need to sort on the map side, but when running the map reduce locally, the > hash-based MergeManager will have no chance to be used, as the code goes directly to > Merger.merge. This makes the pluggable Shuffle and MergeManager incomplete. > So we need to move the code calling Merger.merge from the ReduceTask to the > ShuffleConsumerPlugin implementation, so that the Shuffle implementation can > decide how to do the merge and return the corresponding iterator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
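The proposal above can be sketched as follows. The interfaces are simplified, hypothetical stand-ins for the real Hadoop API: the point is only that the ReduceTask asks the plugin for its merged input in both the distributed and the local case, rather than special-casing a static Merger.merge call.

```java
// Illustrative stand-in for Hadoop's RawKeyValueIterator.
interface RawKeyValueIterator {
    boolean next();
    void close();
}

// Illustrative stand-in for ShuffleConsumerPlugin: the plugin decides how
// the merge is done and returns the resulting iterator, even when there is
// no copy phase (the local case).
interface ShuffleConsumerPlugin {
    RawKeyValueIterator run(boolean isLocal);
}

class LocalMergeSketch {
    // No special-casing of the local path in the task itself, so a
    // hash-based MergeManager supplied by the plugin still gets used.
    static RawKeyValueIterator reduceInput(ShuffleConsumerPlugin plugin,
                                           boolean isLocal) {
        return plugin.run(isLocal);
    }
}
```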
[jira] [Assigned] (MAPREDUCE-4958) close method of RawKeyValueIterator is not called after use
[ https://issues.apache.org/jira/browse/MAPREDUCE-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jerry Chen reassigned MAPREDUCE-4958: - Assignee: Jerry Chen > close method of RawKeyValueIterator is not called after use > - > > Key: MAPREDUCE-4958 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4958 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: trunk >Reporter: Jerry Chen >Assignee: Jerry Chen > Original Estimate: 48h > Remaining Estimate: 48h > > I observed that the close method of the RawKeyValueIterator returned from > MergeManager is not called, > which will cause resource leaks for RawKeyValueIterator implementations that > depend on RawKeyValueIterator.close to do cleanup when finished. > Some other places in MapTask also do not follow the convention of calling > RawKeyValueIterator.close after using it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
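The convention the report asks for can be sketched with a try/finally, so that implementations relying on close() for cleanup do not leak resources even when the reduce loop fails. The iterator interface here is a simplified stand-in, not the real Hadoop type:

```java
class CloseAfterUse {
    // Simplified stand-in for RawKeyValueIterator.
    interface KVIterator {
        boolean next();
        void close();
    }

    // Consume the iterator, guaranteeing close() runs afterwards.
    static int drain(KVIterator iter) {
        int records = 0;
        try {
            while (iter.next()) {
                records++;           // consume one record
            }
        } finally {
            iter.close();            // cleanup happens even on exception
        }
        return records;
    }
}
```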
[jira] [Commented] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563288#comment-13563288 ] Hadoop QA commented on MAPREDUCE-4962: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12566593/MAPREDUCE-4962.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3276//console This message is automatically generated. > jobdetails.jsp uses display name instead of real name to get counters > - > > Key: MAPREDUCE-4962 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4962 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker, mrv1 >Affects Versions: 1.1.1 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4962.patch > > > jobdetails.jsp displays details for a job including its counters. Counters > may have different real names and display names, but the display names are > used to look the counter values up, so counter values can incorrectly show up > as 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-4962: -- Attachment: MAPREDUCE-4962.patch > jobdetails.jsp uses display name instead of real name to get counters > - > > Key: MAPREDUCE-4962 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4962 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker, mrv1 >Affects Versions: 1.1.1 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4962.patch > > > jobdetails.jsp displays details for a job including its counters. Counters > may have different real names and display names, but the display names are > used to look the counter values up, so counter values can incorrectly show up > as 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-4962: -- Status: Patch Available (was: Open) > jobdetails.jsp uses display name instead of real name to get counters > - > > Key: MAPREDUCE-4962 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4962 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker, mrv1 >Affects Versions: 1.1.1 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4962.patch > > > jobdetails.jsp displays details for a job including its counters. Counters > may have different real names and display names, but the display names are > used to look the counter values up, so counter values can incorrectly show up > as 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters
Sandy Ryza created MAPREDUCE-4962: - Summary: jobdetails.jsp uses display name instead of real name to get counters Key: MAPREDUCE-4962 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4962 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, mrv1 Affects Versions: 1.1.1 Reporter: Sandy Ryza Assignee: Sandy Ryza jobdetails.jsp displays details for a job including its counters. Counters may have different real names and display names, but the display names are used to look the counter values up, so counter values can incorrectly show up as 0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
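The lookup bug in MAPREDUCE-4962 above reduces to a key mismatch: counters are stored under their real (internal) name, so a lookup by display name silently yields 0. A minimal sketch, assuming a plain map keyed by real name (the counter name and display-name string below are illustrative, not taken from the patch):

```java
import java.util.HashMap;
import java.util.Map;

class CounterLookup {
    // Counter group keyed by the counter's REAL name.
    static final Map<String, Long> group = new HashMap<>();
    static { group.put("HDFS_BYTES_READ", 1024L); }

    // A lookup by the wrong name returns the default, which renders as 0
    // on the web UI; the fix is to look up by real name, not display name.
    static long value(String name) {
        return group.getOrDefault(name, 0L);
    }
}
```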
[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563228#comment-13563228 ] Hudson commented on MAPREDUCE-4049: --- Integrated in Hadoop-trunk-Commit #3282 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3282/]) Amending MR CHANGES.txt to reflect that MAPREDUCE-4049/4809/4807/4808 are in branch-2 (Revision 1438799) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438799 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt > plugin for generic shuffle service > -- > > Key: MAPREDUCE-4049 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: performance, task, tasktracker >Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 >Reporter: Avner BenHanoch >Assignee: Avner BenHanoch > Labels: merge, plugin, rdma, shuffle > Fix For: 2.0.3-alpha > > Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, > MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch > > > Support a generic shuffle service as a set of two plugins: ShuffleProvider & > ShuffleConsumer. > This will satisfy the following needs: > # Better shuffle and merge performance. For example: we are working on a > shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, > or InfiniBand) instead of using the current HTTP shuffle. Based on the fast > RDMA shuffle, the plugin can also utilize a suitable merge approach during > the intermediate merges, hence getting much better performance. > # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding a hidden > dependency of the NodeManager on a specific version of the mapreduce shuffle > (currently targeted to 0.24.0). > References: > # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu > from Auburn University with others, > [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] > # I am attaching 2 documents with a suggested Top Level Design for both plugins > (currently based on the 1.0 branch) > # I am providing a link for downloading UDA - Mellanox's open source plugin > that implements a generic shuffle service using RDMA and levitated merge. > Note: At this phase, the code is in C++ through JNI and you should consider > it as beta only. Still, it can serve anyone that wants to implement or > contribute to levitated merge. (Please be advised that levitated merge is > mostly suited to very fast networks) - > [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2264) Job status exceeds 100% in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-2264: -- Fix Version/s: (was: 3.0.0) 2.0.3-alpha > Job status exceeds 100% in some cases > -- > > Key: MAPREDUCE-2264 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.2, 0.20.205.0 >Reporter: Adam Kramer >Assignee: Devaraj K > Labels: critical-0.22.0 > Fix For: 1.2.0, 2.0.3-alpha > > Attachments: MAPREDUCE-2264-0.20.205-1.patch, > MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, > MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, > MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, > MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, > MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk.patch, more than 100%.bmp > > > I'm looking now at my jobtracker's list of running reduce tasks. One of them > is 120.05% complete, the other is 107.28% complete. > I understand that these numbers are estimates, but there is no case in which > an estimate of 100% for a non-complete task is better than an estimate of > 99.99%, nor is there any case in which an estimate greater than 100% is valid. > I suggest that whatever logic is computing these set 99.99% as a hard maximum. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
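The reporter's suggestion in MAPREDUCE-2264 above is a hard cap on the progress estimate. A minimal sketch (the method name is illustrative and the 99.99% figure is taken directly from the report; the actual patch may fix the underlying estimate instead of clamping):

```java
class ProgressClamp {
    // Clamp a task-progress estimate so an in-flight task never reports
    // 100% or more, and never reports a negative value.
    static float clamp(float estimate) {
        if (estimate > 0.9999f) return 0.9999f;
        if (estimate < 0f) return 0f;
        return estimate;
    }
}
```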
[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-2454: -- Fix Version/s: (was: 3.0.0) 2.0.3-alpha > Allow external sorter plugin for MR > --- > > Key: MAPREDUCE-2454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha >Reporter: Mariappan Asokan >Assignee: Mariappan Asokan >Priority: Minor > Labels: features, performance, plugin, sort > Fix For: 2.0.3-alpha > > Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, > KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, > mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, > mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, > mapreduce-2454.patch, mapreduce-2454-protection-change.patch, > mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, > ReduceInputSorter.java > > > Define interfaces and some abstract classes in the Hadoop framework to > facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4808: -- Fix Version/s: (was: 3.0.0) 2.0.3-alpha > Refactor MapOutput and MergeManager to facilitate reuse by Shuffle > implementations > -- > > Key: MAPREDUCE-4808 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Arun C Murthy >Assignee: Mariappan Asokan > Fix For: 2.0.3-alpha > > Attachments: COMBO-mapreduce-4809-4812-4808.patch, M4808-0.patch, > M4808-1.patch, mapreduce-4808.patch, mapreduce-4808.patch, > mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, > mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, > MergeManagerPlugin.pdf, MR-4808.patch > > > Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for > alternate implementations to be able to reuse portions of the default > implementation. > This would come with the strong caveat that these classes are LimitedPrivate > and Unstable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4807: -- Fix Version/s: (was: trunk) 2.0.3-alpha > Allow MapOutputBuffer to be pluggable > - > > Key: MAPREDUCE-4807 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Arun C Murthy >Assignee: Mariappan Asokan > Fix For: 2.0.3-alpha > > Attachments: COMBO-mapreduce-4809-4807.patch, > COMBO-mapreduce-4809-4807.patch, COMBO-mapreduce-4809-4807.patch, > mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, > mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch > > > Allow MapOutputBuffer to be pluggable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4809) Change visibility of classes for pluggable sort changes
[ https://issues.apache.org/jira/browse/MAPREDUCE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4809: -- Fix Version/s: (was: trunk) 2.0.3-alpha > Change visibility of classes for pluggable sort changes > --- > > Key: MAPREDUCE-4809 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4809 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 2.0.2-alpha >Reporter: Arun C Murthy >Assignee: Mariappan Asokan > Fix For: 2.0.3-alpha > > Attachments: MAPREDUCE-4809-1.patch, mapreduce-4809.patch, > mapreduce-4809.patch, mapreduce-4809.patch > > > Make classes required for MAPREDUCE-2454 to be java public (with > LimitedPrivate) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4049) plugin for generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated MAPREDUCE-4049: -- Fix Version/s: (was: 3.0.0) 2.0.3-alpha > plugin for generic shuffle service > -- > > Key: MAPREDUCE-4049 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: performance, task, tasktracker >Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0 >Reporter: Avner BenHanoch >Assignee: Avner BenHanoch > Labels: merge, plugin, rdma, shuffle > Fix For: 2.0.3-alpha > > Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, > MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch > > > Support a generic shuffle service as a set of two plugins: ShuffleProvider & > ShuffleConsumer. > This will satisfy the following needs: > # Better shuffle and merge performance. For example: we are working on a > shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, > or InfiniBand) instead of using the current HTTP shuffle. Based on the fast > RDMA shuffle, the plugin can also utilize a suitable merge approach during > the intermediate merges, hence getting much better performance. > # Satisfy MAPREDUCE-3060 - a generic shuffle service for avoiding a hidden > dependency of the NodeManager on a specific version of the mapreduce shuffle > (currently targeted to 0.24.0). > References: > # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu > from Auburn University with others, > [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf] > # I am attaching 2 documents with a suggested Top Level Design for both plugins > (currently based on the 1.0 branch) > # I am providing a link for downloading UDA - Mellanox's open source plugin > that implements a generic shuffle service using RDMA and levitated merge. > Note: At this phase, the code is in C++ through JNI and you should consider > it as beta only. Still, it can serve anyone that wants to implement or > contribute to levitated merge. (Please be advised that levitated merge is > mostly suited to very fast networks) - > [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4918) Better error message in TrackerDistributedCacheManager.ancestorsHaveExecutePermissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562990#comment-13562990 ] Hadoop QA commented on MAPREDUCE-4918: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564360/MAPREDUCE-4918.1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3275//console This message is automatically generated. > Better error message in > TrackerDistributedCacheManager.ancestorsHaveExecutePermissions > -- > > Key: MAPREDUCE-4918 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4918 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Xuan Gong >Priority: Minor > Attachments: MAPREDUCE-4918.1.patch > > > Better logging/error message in > TrackerDistributedCacheManager.ancestorsHaveExecutePermissions should help > debugging (e.g. MAPREDUCE-4916). We should log the offending parent directory > with the incorrect permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4918) Better error message in TrackerDistributedCacheManager.ancestorsHaveExecutePermissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated MAPREDUCE-4918: - Status: Patch Available (was: Open) > Better error message in > TrackerDistributedCacheManager.ancestorsHaveExecutePermissions > -- > > Key: MAPREDUCE-4918 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4918 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arun C Murthy >Assignee: Xuan Gong >Priority: Minor > Attachments: MAPREDUCE-4918.1.patch > > > Better logging/error message in > TrackerDistributedCacheManager.ancestorsHaveExecutePermissions should help > debugging (e.g. MAPREDUCE-4916). We should log the offending parent directory > with the incorrect permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2208) Flexible CSV text parser InputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562696#comment-13562696 ] Marcelo Elias Del Valle commented on MAPREDUCE-2208: Created an improved version of a CSVInputFormat, able to read multiline CSVs, just in case it is of interest: https://github.com/mvallebr/CSVInputFormat > Flexible CSV text parser InputFormat > > > Key: MAPREDUCE-2208 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2208 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Lance Norskog >Priority: Trivial > Attachments: CSVTextInputFormat.java, TestCSVTextFormat.java > > > CSVTextInputFormat is a configurable CSV parser tuned to most of the > csv-style datasets I've found. The Hadoop samples I've seen all extend > FileInputFormat and Mapper. They drop the LongWritable key > and parse the Text value as a CSV line. But they are all custom-coded for > the format. > CSVTextInputFormat takes any csv-encoded file and rearranges the fields into > the format required by a Mapper. You can drop fields & rearrange them. There > is also a random sampling option to make training/test runs easier. > Attached are CSVTextInputFormat.java and a unit test for it. Both go into > org.apache.hadoop.mapreduce.lib.input under src/java and test/mapred/src. > This is compiled against hadoop-0.0.20. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
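The field drop/rearrange idea behind CSVTextInputFormat can be sketched standalone. In a real InputFormat this logic would live in a RecordReader; the helper below is an illustrative, hypothetical stand-in and deliberately ignores quoted commas:

```java
class CsvFieldSelect {
    // Re-emit only the configured fields of a CSV line, in the configured
    // order; fields not listed in fieldOrder are dropped.
    static String select(String line, int[] fieldOrder) {
        String[] cols = line.split(",");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fieldOrder.length; i++) {
            if (i > 0) out.append(',');
            out.append(cols[fieldOrder[i]]);
        }
        return out.toString();
    }
}
```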
[jira] [Commented] (MAPREDUCE-4709) Counters that track max values
[ https://issues.apache.org/jira/browse/MAPREDUCE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562685#comment-13562685 ] Harsh J commented on MAPREDUCE-4709: Hi Arun, Sorry we forgot to link the discussion, but please also see http://search-hadoop.com/m/cuZMf2humC > Counters that track max values > -- > > Key: MAPREDUCE-4709 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4709 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Jeremy Lewi >Priority: Minor > > A nice feature to help monitor MR jobs would be mapreduce counters that track > the maximum of some metric across all workers. These trackers would work just > like regular counters except it would track the max value of all arguments > passed to the "increment" function as opposed to summing them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
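The max-tracking counter proposed in MAPREDUCE-4709 above can be sketched as follows. The class name is hypothetical, not an existing Hadoop API: increment(v) records the maximum of all values seen instead of summing them, with a CAS loop so concurrent workers stay consistent.

```java
import java.util.concurrent.atomic.AtomicLong;

class MaxCounter {
    private final AtomicLong max = new AtomicLong(0);

    /** Same call shape as a regular counter, but keeps the max, not the sum. */
    void increment(long value) {
        long cur;
        // Retry until our value is no longer larger than the stored max.
        while (value > (cur = max.get())) {
            if (max.compareAndSet(cur, value)) break;
        }
    }

    long getValue() { return max.get(); }
}
```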
[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562655#comment-13562655 ] Hudson commented on MAPREDUCE-2264: --- Integrated in Hadoop-Mapreduce-trunk #1324 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1324/]) MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and sandyr via tucu) (Revision 1438277) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438277 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestMerger.java > Job status exceeds 100% in some cases > -- > > Key: MAPREDUCE-2264 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.2, 0.20.205.0 >Reporter: Adam Kramer >Assignee: Devaraj K > Labels: critical-0.22.0 > Fix For: 1.2.0, 3.0.0 > > Attachments: MAPREDUCE-2264-0.20.205-1.patch, > MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, > MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, > MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, > MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, > MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk.patch, more than 100%.bmp > > > I'm looking now at my 
jobtracker's list of running reduce tasks. One of them > is 120.05% complete, the other is 107.28% complete. > I understand that these numbers are estimates, but there is no case in which > an estimate of 100% for a non-complete task is better than an estimate of > 99.99%, nor is there any case in which an estimate greater than 100% is valid. > I suggest that whatever logic is computing these set 99.99% as a hard maximum. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562641#comment-13562641 ]

Hudson commented on MAPREDUCE-2264:
---

Integrated in Hadoop-Hdfs-trunk #1296 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1296/])
MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and sandyr via tucu) (Revision 1438277)

Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438277
(Same commit, file list, and quoted issue description as the Hadoop-Mapreduce-trunk #1324 comment.)
[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562621#comment-13562621 ]

Gelesh commented on MAPREDUCE-4882:
---

Could you please share how this is impacting jobs?

> Error in estimating the length of the output file in Spill Phase
> --
>
> Key: MAPREDUCE-4882
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.2, 1.0.3
> Environment: Any Environment
> Reporter: Lijie Xu
> Labels: patch
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The sortAndSpill() method in MapTask.java has an error in estimating the length of the output file.
> The "long size" should be "(bufvoid - bufstart) + bufend", not "(bufvoid - bufend) + bufstart", when "bufend < bufstart".
> Here is the original code in MapTask.java:
>
>   private void sortAndSpill() throws IOException, ClassNotFoundException,
>                                      InterruptedException {
>     // approximate the length of the output file to be the length of the
>     // buffer + header lengths for the partitions
>     long size = (bufend >= bufstart
>                  ? bufend - bufstart
>                  : (bufvoid - bufend) + bufstart) +
>                 partitions * APPROX_HEADER_LENGTH;
>     FSDataOutputStream out = null;
>
> I ran a test with "TeraSort". A snippet from a mapper's log follows:
> MapTask: Spilling map output: record full = true
> MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
> MapTask: kvstart = 262142; kvend = 131069; length = 655360
> MapTask: Finished spill 3
> In this case, the spilled bytes should be (199229440 - 157286200) + 10485460 = 52428700 (52 MB), because the number of spilled records is 524287 and each record costs 100 B.
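The reporter's corrected formula is the standard length computation for a wrapped circular buffer: when the valid region wraps past the end of the buffer (bufend < bufstart), the data spans from bufstart to bufvoid plus from 0 to bufend. A minimal sketch of just that computation (the class name is illustrative, not Hadoop code):

```java
// Sketch of the corrected spill-size estimate for a circular buffer.
// When bufend >= bufstart the valid data is contiguous; when it wraps,
// the length is (bufvoid - bufstart) + bufend, not (bufvoid - bufend) + bufstart.
public class SpillSize {
    public static long dataLength(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart               // contiguous region
            : (bufvoid - bufstart) + bufend;  // wrapped region
    }
}
```

Plugging in the TeraSort log values above reproduces the reporter's expected result: dataLength(157286200, 10485460, 199229440) = 52428700 bytes, i.e. 524287 records of 100 B each.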
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562607#comment-13562607 ]

Tom White commented on MAPREDUCE-4951:
---

+1 on the latest patch.

> Container preemption interpreted as task failure
> --
>
> Key: MAPREDUCE-4951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mr-am, mrv2
> Affects Versions: 2.0.2-alpha
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, MAPREDUCE-4951.patch
>
> When YARN reports a completed container to the MR AM, it always interprets it as a failure. This can lead to a job failing because too many of its tasks failed, when in fact they only failed because the scheduler preempted them.
> MR needs to recognize the special exit code value of -100 and interpret it as a container being killed instead of a container failure.
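The fix MAPREDUCE-4951 describes amounts to classifying a completed container's exit status before counting it against the task-failure limit. A hypothetical sketch of that classification (the constant and method names are illustrative; they are not the actual MR AM code):

```java
// Hypothetical sketch: classify a completed container's exit status.
// The special value -100 means the container was killed by the framework
// (e.g. preempted by the scheduler) and should not count as a task failure.
public class ContainerExitClassifier {
    static final int ABORTED = -100; // container killed, not failed

    // True only for genuine task failures: nonzero exit that is not
    // the framework-kill sentinel.
    public static boolean isTaskFailure(int exitStatus) {
        return exitStatus != 0 && exitStatus != ABORTED;
    }
}
```

With this check in place, a preempted container is retried without incrementing the job's failed-attempt count.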
[jira] [Updated] (MAPREDUCE-2931) CLONE - LocalJobRunner should support parallel mapper execution
[ https://issues.apache.org/jira/browse/MAPREDUCE-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-2931:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

I just committed this. Thanks, Sandy.

> CLONE - LocalJobRunner should support parallel mapper execution
> --
>
> Key: MAPREDUCE-2931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2931
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Forest Tan
> Assignee: Sandy Ryza
>
> Attachments: MAPREDUCE-1367-branch1.patch
>
> The LocalJobRunner currently supports only a single execution thread. Given the prevalence of multi-core CPUs, it makes sense to allow users to run multiple tasks in parallel for improved performance on small (local-only) jobs.
> It is necessary to backport MAPREDUCE-1367 into the Hadoop 0.20.x line. MAPREDUCE-434 should also be submitted together.
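The core idea of parallel mapper execution in a local runner is to submit each map task to a fixed-size thread pool instead of running them one at a time. This is an illustrative sketch only (not LocalJobRunner's actual implementation); the class name and the use of Callable to stand in for a map task are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: run "map tasks" (modeled as Callables) on a pool of worker
// threads and wait for all of them, rethrowing any task's exception.
public class ParallelLocalRunner {
    public static <T> List<T> runMappers(List<Callable<T>> tasks, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task completes and returns the
            // futures in the same order the tasks were submitted.
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // rethrows a mapper's exception, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because invokeAll preserves submission order, results line up with their tasks even though the tasks run concurrently.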
[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562589#comment-13562589 ]

Hudson commented on MAPREDUCE-2264:
---

Integrated in Hadoop-Yarn-trunk #107 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/107/])
MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and sandyr via tucu) (Revision 1438277)

Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438277
(Same commit, file list, and quoted issue description as the Hadoop-Mapreduce-trunk #1324 comment.)
[jira] [Commented] (MAPREDUCE-4709) Counters that track max values
[ https://issues.apache.org/jira/browse/MAPREDUCE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562567#comment-13562567 ]

Arun A K commented on MAPREDUCE-4709:
---

@Jeremy Lewi, could you please elaborate on the problem with an example?

> Counters that track max values
> --
>
> Key: MAPREDUCE-4709
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4709
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Jeremy Lewi
> Priority: Minor
>
> A useful feature for monitoring MR jobs would be mapreduce counters that track the maximum of some metric across all workers. These trackers would work just like regular counters, except that they would track the maximum of all values passed to the "increment" function instead of summing them.
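The proposed semantics can be sketched in a few lines: "increment" keeps the largest value seen, and merging two counters (as the framework would do when aggregating across workers) takes the max of the two. This is an illustrative sketch, not Hadoop's Counter API; it assumes nonnegative metric values so the counter can start at zero like a regular counter.

```java
// Sketch: a counter that tracks a maximum instead of a sum.
public class MaxCounter {
    private long value = 0; // assumes nonnegative metrics, like sum counters

    // Unlike a regular counter, this keeps the max of all arguments.
    public void increment(long v) { value = Math.max(value, v); }

    // Aggregating across workers is also a max, not a sum.
    public void merge(MaxCounter other) { value = Math.max(value, other.value); }

    public long getValue() { return value; }
}
```

Note that max, like sum, is associative and commutative, so per-worker partial results can be combined in any order — the property that makes this a well-defined counter semantics.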
[jira] [Commented] (MAPREDUCE-4875) coverage fixing for org.apache.hadoop.mapred
[ https://issues.apache.org/jira/browse/MAPREDUCE-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562531#comment-13562531 ]

Hadoop QA commented on MAPREDUCE-4875:
---

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12566466/MAPREDUCE-4875-trunk.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3274//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3274//console
This message is automatically generated.

> coverage fixing for org.apache.hadoop.mapred
> --
>
> Key: MAPREDUCE-4875
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4875
> Project: Hadoop Map/Reduce
> Issue Type: Test
> Components: test
> Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
> Reporter: Aleksey Gorshkov
> Fix For: 3.0.0, 2.0.3-alpha, 0.23.6
>
> Attachments: MAPREDUCE-4875-branch-0.23.patch, MAPREDUCE-4875-trunk.patch
>
> Added some tests for org.apache.hadoop.mapred:
> MAPREDUCE-4875-trunk.patch for trunk and branch-2
> MAPREDUCE-4875-branch-0.23.patch for branch-0.23
[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-4961:
---
Assignee: (was: Tsuyoshi OZAWA)

> Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations
> --
>
> Key: MAPREDUCE-4961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Jerry Chen
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> MAPREDUCE-4049 provides the ability for a pluggable Shuffle, and MAPREDUCE-4080 extends Shuffle to be able to provide different MergeManager implementations.
> While using these pluggable features, I found that when a map reduce job is running locally, a RawKeyValueIterator is returned directly from a static call to Merger.merge, which breaks the assumption that the Shuffle may provide different merge methods, even though there is no copy phase in this situation.
> The use case: I am implementing a hash-based MergeManager that does not need a sort on the map side, but when the map reduce job runs locally, the hash-based MergeManager never gets a chance to be used, because execution goes directly to Merger.merge. This makes the pluggable Shuffle and MergeManager incomplete.
> So we need to move the code calling Merger.merge from ReduceTask to the ShuffleConsumerPlugin implementation, so that the Shuffle implementation can decide how to do the merge and return the corresponding iterator.
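The refactoring the reporter proposes can be sketched with a simplified strategy interface: ReduceTask always asks the configured plugin for the merged iterator, so a local run no longer short-circuits to a static sort-merge. The interface and class names below are deliberately simplified illustrations, not the actual ShuffleConsumerPlugin API.

```java
// Simplified sketch of delegating the merge decision to a pluggable
// strategy instead of a hard-coded static Merger.merge call.
interface MergeStrategy<V> {
    Iterable<V> merge(Iterable<V> mapOutputs);
}

// A sort-free strategy, as a hash-based MergeManager might use: with no
// ordering requirement, the map outputs pass through untouched. Under the
// proposed change, this would apply to local runs as well.
class PassThroughMerge<V> implements MergeStrategy<V> {
    public Iterable<V> merge(Iterable<V> mapOutputs) {
        return mapOutputs; // no map-side sort needed
    }
}
```

The point of the sketch is the indirection: once the local path also goes through the plugin, a strategy like PassThroughMerge is reachable whether or not a copy phase ran.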