[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Release Note:   (was: Resubmit the patch to fix the Findbugs warnings.)

> Map reduce running local should also go through ShuffleConsumerPlugin for 
> enabling different MergeManager implementations
> -
>
> Key: MAPREDUCE-4961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Jerry Chen
>Assignee: Jerry Chen
> Attachments: MAPREDUCE-4961.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MAPREDUCE-4049 provides the ability to plug in a custom Shuffle, and MAPREDUCE-4080 
> extends Shuffle so that it can provide different MergeManager implementations. 
> While using these pluggable features, I found that when a MapReduce job runs 
> locally, the RawKeyValueIterator is returned directly from a static call to 
> Merger.merge. This breaks the assumption that the Shuffle may provide different 
> merge methods, even though there is no copy phase in this situation.
> My use case is a hash-based MergeManager, which doesn't need sorting on the map 
> side. When the job runs locally, however, the hash-based MergeManager never gets 
> a chance to be used, because the reduce task goes directly to Merger.merge. This 
> leaves the pluggable Shuffle and MergeManager support incomplete.
> So we need to move the code that calls Merger.merge from ReduceTask into the 
> ShuffleConsumerPlugin implementation, so that the Shuffle implementation can 
> decide how to do the merge and return the corresponding iterator.
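To make the sort-based versus hash-based distinction concrete, here is a minimal, self-contained sketch in plain Java of two ways a reduce-side merge could turn map outputs into key groups: a k-way merge of segments that are already sorted (roughly what a sort-based merge relies on) and a hash-based grouping that needs no map-side sort. This is not Hadoop's Merger or MergeManager API; the class MergeStrategies, its methods, and the simplified (String, Integer) records are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: two ways to group map outputs by key on the reduce side.
// Not Hadoop's Merger/MergeManager API; records are simplified to (String, Integer).
public class MergeStrategies {

    static Map.Entry<String, Integer> kv(String k, int v) {
        return new AbstractMap.SimpleEntry<>(k, v);
    }

    // Sort-based: k-way merge of segments that are each already sorted by key,
    // using a min-heap of {segmentIndex, position} cursors. Output keys are ordered.
    static List<Map.Entry<String, List<Integer>>> sortMerge(
            List<List<Map.Entry<String, Integer>>> sortedSegments) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparing(
                (int[] p) -> sortedSegments.get(p[0]).get(p[1]).getKey()));
        for (int s = 0; s < sortedSegments.size(); s++) {
            if (!sortedSegments.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<Map.Entry<String, List<Integer>>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] p = heap.poll();
            Map.Entry<String, Integer> rec = sortedSegments.get(p[0]).get(p[1]);
            // Equal keys come out of the heap consecutively, so a key change starts a new group.
            if (out.isEmpty() || !out.get(out.size() - 1).getKey().equals(rec.getKey())) {
                out.add(new AbstractMap.SimpleEntry<String, List<Integer>>(rec.getKey(), new ArrayList<>()));
            }
            out.get(out.size() - 1).getValue().add(rec.getValue());
            if (p[1] + 1 < sortedSegments.get(p[0]).size()) heap.add(new int[] {p[0], p[1] + 1});
        }
        return out;
    }

    // Hash-based: group records from unsorted segments by key. There is no ordering
    // across keys, so the map side would not need to sort its output.
    static Map<String, List<Integer>> hashMerge(List<List<Map.Entry<String, Integer>>> segments) {
        Map<String, List<Integer>> groups = new HashMap<>();
        for (List<Map.Entry<String, Integer>> segment : segments) {
            for (Map.Entry<String, Integer> rec : segment) {
                groups.computeIfAbsent(rec.getKey(), k -> new ArrayList<>()).add(rec.getValue());
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> sorted = List.of(
                List.of(kv("a", 1), kv("c", 3)),
                List.of(kv("a", 2), kv("b", 5)));
        System.out.println(sortMerge(sorted).size());     // 3 groups: a, b, c
        List<List<Map.Entry<String, Integer>>> unsorted = List.of(
                List.of(kv("c", 3), kv("a", 1)),
                List.of(kv("b", 5), kv("a", 2)));
        System.out.println(hashMerge(unsorted).get("a")); // [1, 2]
    }
}
```

In this sketch the sort-based path only works if every input segment is sorted, while the hash-based path drops that requirement at the cost of holding all groups in a map. A pluggable MergeManager is what would let a job choose the second approach, which is why a local run bypassing it matters.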

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563385#comment-13563385
 ] 

Hadoop QA commented on MAPREDUCE-4961:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566612/MAPREDUCE-4961.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3279//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3279//console

This message is automatically generated.



[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Release Note: Resubmit the patch to fix the Findbugs warnings.
  Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Attachment: (was: MAPREDUCE-4961.patch)



[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Status: Open  (was: Patch Available)

> Map reduce running local should also go through ShuffleConsumerPlugin for 
> enabling different MergeManager implementations
> -
>
> Key: MAPREDUCE-4961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Jerry Chen
>Assignee: Jerry Chen
> Attachments: MAPREDUCE-4961.patch, MAPREDUCE-4961.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>


[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Attachment: MAPREDUCE-4961.patch

Updated the patch to fix the Findbugs warning.

> Map reduce running local should also go through ShuffleConsumerPlugin for 
> enabling different MergeManager implementations
> -
>
> Key: MAPREDUCE-4961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Jerry Chen
>Assignee: Jerry Chen
> Attachments: MAPREDUCE-4961.patch, MAPREDUCE-4961.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>


[jira] [Updated] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4963:
--

   Resolution: Fixed
Fix Version/s: 1.2.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Robert. Committed to branch-1.

> StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" 
> statistics for new TaskTrackers
> 
>
> Key: MAPREDUCE-4963
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4963
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 1.1.1
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 1.2.0
>
> Attachments: MAPREDUCE-4963.patch
>
>
> The StatisticsCollector keeps track of updates to the "Total Tasks Last Day", 
> "Succeeded Tasks Last Day", "Total Tasks Last Hour", and "Succeeded Tasks Last 
> Hour" statistics per TaskTracker, which are displayed on the JobTracker web UI.  It uses 
> buckets to manage when to shift task counts from "Last Hour" to "Last Day" 
> and out of "Last Day".  After the JT has been running for a while, the 
> connected TTs will have the max number of buckets and will keep shifting them 
> at each update.  If a new TT connects (or an old one rejoins), it won't have 
> the max number of buckets, but the code that drops buckets uses the same 
> counter for all sets of buckets.  This means that new TTs will prematurely 
> drop their buckets and their stats will be incorrect.
> Example:
> # Max buckets is 5
> # TaskTracker A has these values in its buckets: [4, 2, 0, 3, 10] (i.e. 19)
> # A new TaskTracker, B, connects; it has nothing in its buckets: [ ] (i.e. 0)
> # TaskTracker B runs 3 tasks and TaskTracker A runs 5
> # An update occurs
> # TaskTracker A has [2, 0, 3, 10, 5] (i.e. 20)
> # TaskTracker B should have [3], but it drops that bucket right after adding it 
> during the update and instead has [ ] again (i.e. 0)
> # TaskTracker B will keep doing that forever and always show 0 in the web UI
> We can fix this by not using the same counter for all sets of buckets.
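The failure in the example can be reproduced with a minimal, hypothetical model. This is not the actual StatisticsCollector code: it assumes one bucket is appended per update, and a single counter shared by all TaskTrackers decides when the oldest bucket is dropped. The class SharedCounterBug and all of its members are illustrative.

```java
import java.util.*;

// Hypothetical, simplified model of the bug (not the actual StatisticsCollector code):
// one bucket is appended per update, and a single counter shared by all
// TaskTrackers decides when the oldest bucket is dropped.
public class SharedCounterBug {
    static final int MAX_BUCKETS = 5;
    final Map<String, Deque<Integer>> buckets = new LinkedHashMap<>();
    int sharedBucketCount = 0; // BUG: one counter for every tracker's buckets

    void register(String tracker) {
        buckets.putIfAbsent(tracker, new ArrayDeque<>());
    }

    void update(Map<String, Integer> tasksSinceLastUpdate) {
        tasksSinceLastUpdate.keySet().forEach(this::register);
        for (Map.Entry<String, Deque<Integer>> e : buckets.entrySet()) {
            e.getValue().addLast(tasksSinceLastUpdate.getOrDefault(e.getKey(), 0));
        }
        sharedBucketCount++;
        if (sharedBucketCount > MAX_BUCKETS) {
            // Drops the oldest bucket of every tracker, even one that just joined.
            for (Deque<Integer> d : buckets.values()) d.pollFirst();
            sharedBucketCount--;
        }
    }

    static int sum(Deque<Integer> d) {
        return d.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        SharedCounterBug stats = new SharedCounterBug();
        for (int n : new int[] {4, 2, 0, 3, 10}) stats.update(Map.of("A", n));
        stats.register("B");                        // a new TaskTracker connects
        stats.update(Map.of("A", 5, "B", 3));
        System.out.println(stats.buckets.get("A")); // [2, 0, 3, 10, 5]
        System.out.println(stats.buckets.get("B")); // []
    }
}
```

Because the shared counter is already at the maximum when B joins, every later update removes B's only bucket immediately after adding it, which matches the "always show 0" symptom.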



[jira] [Commented] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563343#comment-13563343
 ] 

Hadoop QA commented on MAPREDUCE-4961:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566604/MAPREDUCE-4961.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3278//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3278//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3278//console



[jira] [Commented] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563341#comment-13563341
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4963:
---

+1



[jira] [Commented] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563325#comment-13563325
 ] 

Hadoop QA commented on MAPREDUCE-4963:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566603/MAPREDUCE-4963.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3277//console



[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Status: Patch Available  (was: Open)

The primary modifications are:
1. Go through ShuffleConsumerPlugin.runLocal when isLocal is true. This 
makes the code path for isLocal almost the same as for !isLocal, except that 
there is no copy phase.

2. The default Shuffle implementation routes runLocal to 
MergeManager.closeLocal, allowing the MergeManager implementation to handle 
the merge.

Please help review.
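The two modifications can be sketched as follows. This is a hypothetical, simplified model: the method names runLocal and closeLocal come from the comment above, but the interface shapes, all signatures, the SortingMergeManager and UnsortedMergeManager classes, and the Iterator<String> stand-in for RawKeyValueIterator are assumptions, not Hadoop's actual API.

```java
import java.util.*;

// Hypothetical sketch of the proposed routing. runLocal and closeLocal are the
// names used in the comment; every signature here is an illustrative assumption,
// and Iterator<String> stands in for RawKeyValueIterator.
public class LocalShuffleRouting {

    interface MergeManager {
        Iterator<String> close();                               // after a copy phase
        Iterator<String> closeLocal(List<String> localOutputs); // no copy phase
    }

    interface ShuffleConsumerPlugin {
        Iterator<String> run();
        Iterator<String> runLocal(List<String> localOutputs);
    }

    // Default plugin: runLocal just hands the local map outputs to the MergeManager.
    static class Shuffle implements ShuffleConsumerPlugin {
        private final MergeManager merger;
        Shuffle(MergeManager merger) { this.merger = merger; }
        public Iterator<String> run() {
            // A real shuffle would fetch remote map outputs here before merging.
            return merger.close();
        }
        public Iterator<String> runLocal(List<String> localOutputs) {
            return merger.closeLocal(localOutputs);
        }
    }

    // Sort-based manager: sorting stands in for the static Merger.merge call.
    static class SortingMergeManager implements MergeManager {
        public Iterator<String> close() { return Collections.emptyIterator(); }
        public Iterator<String> closeLocal(List<String> localOutputs) {
            List<String> copy = new ArrayList<>(localOutputs);
            Collections.sort(copy);
            return copy.iterator();
        }
    }

    // A manager that skips sorting (as a hash-based one could); records pass through.
    static class UnsortedMergeManager implements MergeManager {
        public Iterator<String> close() { return Collections.emptyIterator(); }
        public Iterator<String> closeLocal(List<String> localOutputs) {
            return new ArrayList<>(localOutputs).iterator();
        }
    }

    // ReduceTask-side dispatch: both branches now go through the plugin.
    static Iterator<String> reduceInput(ShuffleConsumerPlugin plugin, boolean isLocal,
                                        List<String> localOutputs) {
        return isLocal ? plugin.runLocal(localOutputs) : plugin.run();
    }

    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> outputs = List.of("b=2", "a=1", "c=3");
        // [a=1, b=2, c=3]
        System.out.println(drain(reduceInput(new Shuffle(new SortingMergeManager()), true, outputs)));
        // [b=2, a=1, c=3]
        System.out.println(drain(reduceInput(new Shuffle(new UnsortedMergeManager()), true, outputs)));
    }
}
```

The design point is that ReduceTask no longer decides how local map outputs are merged; swapping the MergeManager changes the local-mode result without touching the dispatch code.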



[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen updated MAPREDUCE-4961:
--

Attachment: MAPREDUCE-4961.patch

Patch for the fix attached.



[jira] [Commented] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563321#comment-13563321
 ] 

Robert Kanter commented on MAPREDUCE-4963:
--

The patch fixes the problem by keeping a separate counter for each set of 
buckets and checking the length of the buckets.  I also added a test that does 
something similar to the above example.  
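A minimal sketch of the fix as described, under a simplified model that assumes one bucket is appended per update. It is not the committed patch: the class PerTrackerBuckets and its members are hypothetical, and it uses each list's own length in place of a separate per-set counter, so a TaskTracker that joined late keeps its buckets until it has accumulated the maximum.

```java
import java.util.*;

// Hypothetical sketch of the described fix (not the committed patch): each
// tracker's bucket list is trimmed against its own length rather than a
// counter shared across trackers.
public class PerTrackerBuckets {
    static final int MAX_BUCKETS = 5;
    final Map<String, Deque<Integer>> buckets = new LinkedHashMap<>();

    void register(String tracker) {
        buckets.putIfAbsent(tracker, new ArrayDeque<>());
    }

    void update(Map<String, Integer> tasksSinceLastUpdate) {
        tasksSinceLastUpdate.keySet().forEach(this::register);
        for (Map.Entry<String, Deque<Integer>> e : buckets.entrySet()) {
            Deque<Integer> d = e.getValue();
            d.addLast(tasksSinceLastUpdate.getOrDefault(e.getKey(), 0));
            if (d.size() > MAX_BUCKETS) d.pollFirst(); // only a full list shifts
        }
    }

    static int sum(Deque<Integer> d) {
        return d.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        PerTrackerBuckets stats = new PerTrackerBuckets();
        for (int n : new int[] {4, 2, 0, 3, 10}) stats.update(Map.of("A", n));
        stats.register("B");                        // a new TaskTracker connects
        stats.update(Map.of("A", 5, "B", 3));
        System.out.println(stats.buckets.get("A")); // [2, 0, 3, 10, 5]
        System.out.println(stats.buckets.get("B")); // [3]
    }
}
```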



[jira] [Updated] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-4963:
-

Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-4963:
-

Attachment: MAPREDUCE-4963.patch



[jira] [Created] (MAPREDUCE-4963) StatisticsCollector improperly keeps track of "Last Day" and "Last Hour" statistics for new TaskTrackers

2013-01-25 Thread Robert Kanter (JIRA)
Robert Kanter created MAPREDUCE-4963:


 Summary: StatisticsCollector improperly keeps track of "Last Day" 
and "Last Hour" statistics for new TaskTrackers
 Key: MAPREDUCE-4963
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4963
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.1.1
Reporter: Robert Kanter
Assignee: Robert Kanter


The StatisticsCollector keeps track of updates to the "Total Tasks Last Day", 
"Succeeded Tasks Last Day", "Total Tasks Last Hour", and "Succeeded Tasks Last 
Hour" per TaskTracker, which are displayed on the JobTracker web UI.  It uses 
buckets to manage when to shift task counts from "Last Hour" to "Last Day" and 
out of "Last Day".  After the JT has been running for a while, the connected 
TTs will have the max number of buckets and will keep shifting them at each 
update.  If a new TT connects (or an old one rejoins), it won't have the max 
number of buckets, but the code that drops the buckets uses the same counter 
for all sets of buckets.  This means that new TTs will prematurely drop their 
buckets and the stats will be incorrect.  

Example:
# Max buckets is 5
# TaskTracker A has these values in its buckets [4, 2, 0, 3, 10] (i.e. 19)
# A new TaskTracker, B, connects; it has nothing in its buckets: [ ] (i.e. 0)
# TaskTracker B runs 3 tasks and TaskTracker A runs 5
# An update occurs
# TaskTracker A has [2, 0, 3, 10, 5] (i.e. 20)
# TaskTracker B should have [3] but it will drop that bucket after adding it 
during the update and instead have [ ] again (i.e. 0)
# TaskTracker B will keep doing that forever and always show 0 in the web UI

We can fix this by not using the same counter for all sets of buckets.
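A minimal Java sketch of the proposed fix, under simplifying assumptions: each TaskTracker's stats are modeled as a single sliding window of per-update buckets (ignoring the "Last Hour" to "Last Day" hand-off), and the class and method names are hypothetical rather than the actual StatisticsCollector API. Because each tracker's deque carries its own size, a newly joined tracker only evicts a bucket once its own window is full.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the real StatisticsCollector: one sliding window
// of task-count buckets per TaskTracker.
class PerTrackerBuckets {
    private final int maxBuckets;
    // Each tracker owns its own deque, so deque.size() acts as a per-tracker
    // counter instead of one counter shared by every set of buckets.
    private final Map<String, Deque<Integer>> windows = new HashMap<>();

    PerTrackerBuckets(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    // Records the task count for one update period, evicting the oldest
    // bucket only when this tracker's own window is already full.
    void update(String tracker, int tasksThisPeriod) {
        Deque<Integer> window = windows.computeIfAbsent(tracker, t -> new ArrayDeque<>());
        window.addLast(tasksThisPeriod);
        if (window.size() > maxBuckets) {
            window.removeFirst();
        }
    }

    // Sum of the buckets currently retained for the tracker (0 if unknown).
    int total(String tracker) {
        int sum = 0;
        for (int count : windows.getOrDefault(tracker, new ArrayDeque<>())) {
            sum += count;
        }
        return sum;
    }
}
```

Replaying the example above: after A's five updates (4, 2, 0, 3, 10) and one more update in which A runs 5 and B runs 3, `total("A")` is 20 and `total("B")` is 3 rather than 0.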



[jira] [Assigned] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen reassigned MAPREDUCE-4961:
-

Assignee: Jerry Chen

> Map reduce running local should also go through ShuffleConsumerPlugin for 
> enabling different MergeManager implementations
> -
>
> Key: MAPREDUCE-4961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Jerry Chen
>Assignee: Jerry Chen
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MAPREDUCE-4049 provides the ability to plug in a Shuffle implementation, and MAPREDUCE-4080 
> extends Shuffle to be able to provide different MergeManager implementations. 
> While using these pluggable features, I found that when a MapReduce job is 
> running locally, a RawKeyValueIterator is returned directly from a static 
> call to Merger.merge, which breaks the assumption that the Shuffle may provide 
> different merge methods, even though there is no copy phase in this situation.
> The use case: when implementing a hash-based MergeManager, we don't 
> need to sort on the map side, but when running the job locally, the 
> hash-based MergeManager never gets a chance to be used because execution goes directly to 
> Merger.merge. This makes the pluggable Shuffle and MergeManager incomplete.
> So we need to move the code that calls Merger.merge from ReduceTask into the 
> ShuffleConsumerPlugin implementation, so that the Shuffle implementation can 
> decide how to do the merge and return the corresponding iterator.
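A rough Java sketch of the proposed change, assuming a simplified record model (String keys, Integer values) and hypothetical names (`MergeStrategy`, `ReduceInput.prepare`) that are not Hadoop's actual ShuffleConsumerPlugin or MergeManager API. What it illustrates is that the local path also goes through the pluggable strategy instead of a hard-wired static merge, so a hash-based strategy that skips sorting is used in both modes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical merge-strategy contract (not Hadoop's ShuffleConsumerPlugin API):
// turns map-output segments into key -> values groups for the reducer.
interface MergeStrategy {
    Map<String, List<Integer>> merge(List<List<Map.Entry<String, Integer>>> segments);

    // Shared helper: appends every (key, value) pair of every segment to `groups`.
    static Map<String, List<Integer>> groupInto(Map<String, List<Integer>> groups,
                                                List<List<Map.Entry<String, Integer>>> segments) {
        for (List<Map.Entry<String, Integer>> segment : segments) {
            for (Map.Entry<String, Integer> kv : segment) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return groups;
    }
}

// Sort-based grouping (keys come out ordered), standing in for Merger.merge.
class SortMergeStrategy implements MergeStrategy {
    public Map<String, List<Integer>> merge(List<List<Map.Entry<String, Integer>>> segments) {
        return MergeStrategy.groupInto(new TreeMap<>(), segments);
    }
}

// Hash-based grouping: no sorting, so key order is unspecified.
class HashMergeStrategy implements MergeStrategy {
    public Map<String, List<Integer>> merge(List<List<Map.Entry<String, Integer>>> segments) {
        return MergeStrategy.groupInto(new HashMap<>(), segments);
    }
}

final class ReduceInput {
    // Local and distributed runs both delegate merging to the configured
    // strategy; local mode only skips the (stubbed) copy phase.
    static Map<String, List<Integer>> prepare(MergeStrategy strategy,
                                              List<List<Map.Entry<String, Integer>>> segments,
                                              boolean localMode) {
        List<List<Map.Entry<String, Integer>>> input =
            localMode ? segments : copyFromMappers(segments);
        return strategy.merge(input);
    }

    // Hypothetical stand-in for fetching map outputs over the network.
    private static List<List<Map.Entry<String, Integer>>> copyFromMappers(
            List<List<Map.Entry<String, Integer>>> segments) {
        return new ArrayList<>(segments);
    }
}
```

With the sort-based strategy the keys come back ordered; the hash-based strategy produces the same groups without that ordering guarantee.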



[jira] [Assigned] (MAPREDUCE-4958) close method of RawKeyValueIterator is not called after finish using.

2013-01-25 Thread Jerry Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Chen reassigned MAPREDUCE-4958:
-

Assignee: Jerry Chen

> close method of RawKeyValueIterator is not called after finish using.
> -
>
> Key: MAPREDUCE-4958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4958
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: trunk
>Reporter: Jerry Chen
>Assignee: Jerry Chen
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I observed that the close method of the RawKeyValueIterator returned from 
> MergeManager is not called.
> This causes resource leaks for RawKeyValueIterator implementations that 
> depend on RawKeyValueIterator.close to clean up when finished.
> Some other places in MapTask also do not follow the convention of calling 
> RawKeyValueIterator.close after using the iterator. 
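A minimal sketch of the convention the report asks for: whoever obtains the iterator closes it in a `finally` block, so cleanup runs even if reading or processing throws. `KeyValueSource`, `ArraySource`, and `IteratorDrainer` are hypothetical stand-ins, not Hadoop classes; the same pattern would apply to a RawKeyValueIterator.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for RawKeyValueIterator: an iterator that
// may hold resources (buffers, file handles) until close() is called.
interface KeyValueSource {
    boolean next() throws IOException;
    String currentKey();
    void close() throws IOException;
}

// Toy implementation that records whether close() was called.
class ArraySource implements KeyValueSource {
    private final String[] keys;
    private int pos = -1;
    boolean closed = false;

    ArraySource(String... keys) {
        this.keys = keys;
    }

    public boolean next() {
        return ++pos < keys.length;
    }

    public String currentKey() {
        return keys[pos];
    }

    public void close() {
        closed = true;
    }
}

final class IteratorDrainer {
    // Drains every key and closes the source in a finally block, so the
    // iterator's cleanup runs whether or not the loop completes normally.
    static List<String> drain(KeyValueSource source) throws IOException {
        List<String> seen = new ArrayList<>();
        try {
            while (source.next()) {
                seen.add(source.currentKey());
            }
        } finally {
            source.close();
        }
        return seen;
    }
}
```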



[jira] [Commented] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563288#comment-13563288
 ] 

Hadoop QA commented on MAPREDUCE-4962:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566593/MAPREDUCE-4962.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3276//console

This message is automatically generated.

> jobdetails.jsp uses display name instead of real name to get counters
> -
>
> Key: MAPREDUCE-4962
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4962
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker, mrv1
>Affects Versions: 1.1.1
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-4962.patch
>
>
> jobdetails.jsp displays details for a job including its counters.  Counters 
> may have different real names and display names, but the display names are 
> used to look the counter values up, so counter values can incorrectly show up 
> as 0.



[jira] [Updated] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters

2013-01-25 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-4962:
--

Attachment: MAPREDUCE-4962.patch



[jira] [Updated] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters

2013-01-25 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-4962:
--

Status: Patch Available  (was: Open)



[jira] [Created] (MAPREDUCE-4962) jobdetails.jsp uses display name instead of real name to get counters

2013-01-25 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-4962:
-

 Summary: jobdetails.jsp uses display name instead of real name to 
get counters
 Key: MAPREDUCE-4962
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4962
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker, mrv1
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza


jobdetails.jsp displays details for a job including its counters.  Counters may 
have different real names and display names, but the display names are used to 
look the counter values up, so counter values can incorrectly show up as 0.
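A simplified illustration of the bug and the fix, assuming counter values are stored in a map keyed by each counter's real name. `CounterGroup` is a hypothetical class, not Hadoop's Counters API, and `MAP_INPUT_RECORDS` / "Map input records" is just an example name pair. Looking a value up by display name misses and silently falls back to 0; iterating by real name and using the display name only for rendering avoids that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified counter group: values are keyed by the counter's
// real (internal) name, while the UI shows a separate display name.
class CounterGroup {
    private final Map<String, Long> valuesByName = new LinkedHashMap<>();
    private final Map<String, String> displayNames = new LinkedHashMap<>();

    void add(String name, String displayName, long value) {
        valuesByName.put(name, value);
        displayNames.put(name, displayName);
    }

    // Lookup by real name; any other string (e.g. a display name) misses
    // and falls back to 0, which is the symptom described in the report.
    long getValue(String name) {
        return valuesByName.getOrDefault(name, 0L);
    }

    // Builds "display name -> value" rows by iterating real names, so the
    // display name is used only for rendering, never for the lookup.
    Map<String, Long> rowsForDisplay() {
        Map<String, Long> rows = new LinkedHashMap<>();
        for (String name : valuesByName.keySet()) {
            rows.put(displayNames.get(name), getValue(name));
        }
        return rows;
    }
}
```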



[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

2013-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563228#comment-13563228
 ] 

Hudson commented on MAPREDUCE-4049:
---

Integrated in Hadoop-trunk-Commit #3282 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3282/])
Amending MR CHANGES.txt to reflect that MAPREDUCE-4049/4809/4807/4808 are 
in branch-2 (Revision 1438799)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438799
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt


> plugin for generic shuffle service
> --
>
> Key: MAPREDUCE-4049
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: performance, task, tasktracker
>Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>Reporter: Avner BenHanoch
>Assignee: Avner BenHanoch
>  Labels: merge, plugin, rdma, shuffle
> Fix For: 2.0.3-alpha
>
> Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Plugin Design.rtf, 
> MAPREDUCE-4049--branch-1.patch, mapreduce-4049.patch
>
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & 
> ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example, we are working on a 
> shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, 
> or InfiniBand) instead of using the current HTTP shuffle. Building on the fast 
> RDMA shuffle, the plugin can also use a suitable merge approach during 
> the intermediate merges, yielding much better performance.
> # Satisfy MAPREDUCE-3060: a generic shuffle service avoids the hidden 
> dependency of the NodeManager on a specific version of the MapReduce shuffle 
> (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
> from Auburn University with others, 
> [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with a suggested top-level design for both plugins 
> (currently based on the 1.0 branch).
> # I am providing a link for downloading UDA, Mellanox's open source plugin 
> that implements a generic shuffle service using RDMA and levitated merge.  
> Note: At this phase, the code is in C++ through JNI and you should consider 
> it beta only.  Still, it can serve anyone who wants to implement or 
> contribute to levitated merge. (Please be advised that levitated merge is 
> mostly suited to very fast networks.) - 
> [http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family=144&menu_section=69]
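A minimal sketch of the consumer side of such a plugin system, assuming the implementation class is named in configuration and instantiated reflectively, with the built-in HTTP shuffle as the default. `ShuffleConsumer`, `HttpShuffleConsumer`, and `PluginLoader` are hypothetical names for illustration; they are not the interfaces this issue added to Hadoop, and no real configuration key is implied.

```java
// Hypothetical consumer-side plugin contract (illustrative only; not the
// interface that MAPREDUCE-4049 actually added).
interface ShuffleConsumer {
    String describe();
}

// Default implementation, standing in for the built-in HTTP shuffle.
class HttpShuffleConsumer implements ShuffleConsumer {
    public String describe() {
        return "http";
    }
}

final class PluginLoader {
    // Instantiates the configured class by name, falling back to the default
    // when nothing is configured. The class needs an accessible no-arg
    // constructor and must implement ShuffleConsumer.
    static ShuffleConsumer load(String configuredClassName) throws ReflectiveOperationException {
        if (configuredClassName == null || configuredClassName.isEmpty()) {
            return new HttpShuffleConsumer();
        }
        Class<? extends ShuffleConsumer> cls =
            Class.forName(configuredClassName).asSubclass(ShuffleConsumer.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

Passing a class that does not implement the interface fails fast with a `ClassCastException` from `asSubclass`, which is easier to diagnose than a failure deep inside the shuffle.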



[jira] [Updated] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-2264:
--

Fix Version/s: (was: 3.0.0)
   2.0.3-alpha

> Job status exceeds 100% in some cases 
> --
>
> Key: MAPREDUCE-2264
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.2, 0.20.205.0
>Reporter: Adam Kramer
>Assignee: Devaraj K
>  Labels: critical-0.22.0
> Fix For: 1.2.0, 2.0.3-alpha
>
> Attachments: MAPREDUCE-2264-0.20.205-1.patch, 
> MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, 
> MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, 
> MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, 
> MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, 
> MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk.patch, more than 100%.bmp
>
>
> I'm looking now at my jobtracker's list of running reduce tasks. One of them 
> is 120.05% complete, the other is 107.28% complete.
> I understand that these numbers are estimates, but there is no case in which 
> an estimate of 100% for a non-complete task is better than an estimate of 
> 99.99%, nor is there any case in which an estimate greater than 100% is valid.
> I suggest that whatever logic computes these estimates use 99.99% as a hard maximum.
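A minimal sketch of the reporter's suggestion (not necessarily what the committed fix does): clamp the displayed estimate to [0, 99.99] while a task is still running, and report exactly 100% only once it has completed. `ProgressDisplay` is a hypothetical helper.

```java
// Hypothetical helper implementing the suggestion above: an unfinished task
// never shows more than 99.99%, and only a completed task shows 100%.
final class ProgressDisplay {
    static final double MAX_INCOMPLETE_PERCENT = 99.99;

    // rawFraction is the progress estimate as a fraction (1.0 == 100%).
    static double percent(double rawFraction, boolean complete) {
        if (complete) {
            return 100.0;
        }
        double pct = rawFraction * 100.0;
        // Clamp estimates into [0, 99.99] for tasks that are still running.
        return Math.max(0.0, Math.min(pct, MAX_INCOMPLETE_PERCENT));
    }
}
```

With this rule, the two tasks from the report (estimates of 1.2005 and 1.0728) would both display 99.99% until they finish.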



[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-2454:
--

Fix Version/s: (was: 3.0.0)
   2.0.3-alpha

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Fix For: 2.0.3-alpha
>
> Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
> KeyValueIterator.java, MapOutputSorterAbstract.java, MapOutputSorter.java, 
> mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, 
> mapreduce-2454-new-test.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
> mapreduce-2454.patch, mapreduce-2454-protection-change.patch, 
> mr-2454-on-mr-279-build82.patch.gz, MR-2454-trunkPatchPreview.gz, 
> ReduceInputSorter.java
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.



[jira] [Updated] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4808:
--

Fix Version/s: (was: 3.0.0)
   2.0.3-alpha

> Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
> implementations
> --
>
> Key: MAPREDUCE-4808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Arun C Murthy
>Assignee: Mariappan Asokan
> Fix For: 2.0.3-alpha
>
> Attachments: COMBO-mapreduce-4809-4812-4808.patch, M4808-0.patch, 
> M4808-1.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
> mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
> MergeManagerPlugin.pdf, MR-4808.patch
>
>
> Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for 
> alternate implementations to be able to reuse portions of the default 
> implementation. 
> This would come with the strong caveat that these classes are LimitedPrivate 
> and Unstable.



[jira] [Updated] (MAPREDUCE-4807) Allow MapOutputBuffer to be pluggable

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4807:
--

Fix Version/s: (was: trunk)
   2.0.3-alpha

> Allow MapOutputBuffer to be pluggable
> -
>
> Key: MAPREDUCE-4807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4807
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Arun C Murthy
>Assignee: Mariappan Asokan
> Fix For: 2.0.3-alpha
>
> Attachments: COMBO-mapreduce-4809-4807.patch, 
> COMBO-mapreduce-4809-4807.patch, COMBO-mapreduce-4809-4807.patch, 
> mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch, 
> mapreduce-4807.patch, mapreduce-4807.patch, mapreduce-4807.patch
>
>
> Allow MapOutputBuffer to be pluggable



[jira] [Updated] (MAPREDUCE-4809) Change visibility of classes for pluggable sort changes

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4809:
--

Fix Version/s: (was: trunk)
   2.0.3-alpha

> Change visibility of classes for pluggable sort changes
> ---
>
> Key: MAPREDUCE-4809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4809
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Affects Versions: 2.0.2-alpha
>Reporter: Arun C Murthy
>Assignee: Mariappan Asokan
> Fix For: 2.0.3-alpha
>
> Attachments: MAPREDUCE-4809-1.patch, mapreduce-4809.patch, 
> mapreduce-4809.patch, mapreduce-4809.patch
>
>
> Make the classes required for MAPREDUCE-2454 Java public (annotated as 
> LimitedPrivate).



[jira] [Updated] (MAPREDUCE-4049) plugin for generic shuffle service

2013-01-25 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4049:
--

Fix Version/s: (was: 3.0.0)
   2.0.3-alpha



[jira] [Commented] (MAPREDUCE-4918) Better error message in TrackerDistributedCacheManager.ancestorsHaveExecutePermissions

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562990#comment-13562990
 ] 

Hadoop QA commented on MAPREDUCE-4918:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12564360/MAPREDUCE-4918.1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3275//console

This message is automatically generated.

> Better error message in 
> TrackerDistributedCacheManager.ancestorsHaveExecutePermissions
> --
>
> Key: MAPREDUCE-4918
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4918
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Arun C Murthy
>Assignee: Xuan Gong
>Priority: Minor
> Attachments: MAPREDUCE-4918.1.patch
>
>
> Better logging/error message in 
> TrackerDistributedCacheManager.ancestorsHaveExecutePermissions should help 
> debugging (e.g. MAPREDUCE-4916). We should log the offending parent directory 
> with the incorrect permissions.
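A sketch of the improved diagnostic, using the local filesystem through java.nio rather than Hadoop's FileSystem API (an assumption made to keep it self-contained; POSIX only). It assumes, as for public distributed-cache files, that the check is for the execute bit for others; it walks up from the file's parent and returns the first ancestor lacking that bit, so the error message can name the offending directory. `AncestorCheck` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;

// Illustrative local-filesystem sketch (not TrackerDistributedCacheManager):
// find the first ancestor directory that lacks the "others execute" bit.
final class AncestorCheck {
    // Returns the nearest offending ancestor, or null if every ancestor
    // is traversable by others.
    static Path firstAncestorWithoutOthersExecute(Path file) throws IOException {
        for (Path dir = file.toAbsolutePath().getParent(); dir != null; dir = dir.getParent()) {
            if (!Files.getPosixFilePermissions(dir).contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return dir;
            }
        }
        return null;
    }

    // Builds a message that names the offending directory instead of only
    // reporting that "some ancestor" has the wrong permissions.
    static String explain(Path file) throws IOException {
        Path bad = firstAncestorWithoutOthersExecute(file);
        return bad == null
            ? "All ancestors of " + file + " are executable by others"
            : "Ancestor " + bad + " of " + file + " lacks execute permission for others";
    }
}
```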



[jira] [Updated] (MAPREDUCE-4918) Better error message in TrackerDistributedCacheManager.ancestorsHaveExecutePermissions

2013-01-25 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated MAPREDUCE-4918:
-

Status: Patch Available  (was: Open)



[jira] [Commented] (MAPREDUCE-2208) Flexible CSV text parser InputFormat

2013-01-25 Thread Marcelo Elias Del Valle (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562696#comment-13562696
 ] 

Marcelo Elias Del Valle commented on MAPREDUCE-2208:


I created an improved version of CSVInputFormat that can read multiline CSV files, 
in case it is of interest: https://github.com/mvallebr/CSVInputFormat

> Flexible CSV text parser InputFormat
> 
>
> Key: MAPREDUCE-2208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2208
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Lance Norskog
>Priority: Trivial
> Attachments: CSVTextInputFormat.java, TestCSVTextFormat.java
>
>
> CSVTextInputFormat is a configurable CSV parser tuned to most of the 
> CSV-style datasets I've found. The Hadoop samples I've seen all use 
> FileInputFormat and Mapper. They drop the LongWritable key 
> and parse the Text value as a CSV line. But they are all custom-coded for 
> the format.
> CSVTextInputFormat takes any CSV-encoded file and rearranges the fields into 
> the format required by a Mapper. You can drop fields & rearrange them. There 
> is also a random sampling option to make training/test runs easier.
> Attached are CSVTextInputFormat.java and a unit test for it. Both go into 
> org.apache.hadoop.mapreduce.lib.input under src/java and test/mapred/src.
> This is compiled against hadoop-0.20.
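For illustration only (this is not the attached CSVTextInputFormat), a minimal Java sketch of the two steps described: splitting one CSV line while honoring RFC 4180-style quoting (commas inside quotes, doubled quotes as escapes), then dropping and reordering fields by column index. `CsvLine` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: parse one CSV line, then keep/reorder fields by index.
final class CsvLine {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;  // closing quote
                } else {
                    field.append(c);   // commas are literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    // Drops and rearranges fields: output position j gets input column order[j].
    static List<String> select(List<String> fields, int... order) {
        List<String> out = new ArrayList<>();
        for (int idx : order) {
            out.add(fields.get(idx));
        }
        return out;
    }
}
```

This handles a single line only; fields with embedded newlines, which the multiline CSVInputFormat mentioned above targets, need a record reader that can continue a quoted field across line boundaries.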



[jira] [Commented] (MAPREDUCE-4709) Counters that track max values

2013-01-25 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562685#comment-13562685
 ] 

Harsh J commented on MAPREDUCE-4709:


Hi Arun,

Sorry we forgot to link the discussion, but please also see 
http://search-hadoop.com/m/cuZMf2humC

> Counters that track max values
> --
>
> Key: MAPREDUCE-4709
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4709
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jeremy Lewi
>Priority: Minor
>
> A nice feature to help monitor MR jobs would be mapreduce counters that track 
> the maximum of some metric across all workers. These trackers would work just 
> like regular counters, except that they would track the maximum of all values 
> passed to the "increment" function instead of summing them.
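A minimal sketch of the requested behaviour, using a hypothetical MaxCounter class rather than Hadoop's Counter API. Because max, like sum, is associative and commutative, per-task values can be combined in any order, which is what would let such a counter be aggregated the same way regular counters are.

```java
/**
 * Hypothetical max-tracking counter (not part of Hadoop's Counters API):
 * increment() keeps the largest value seen instead of adding.
 */
public class MaxCounter {
    // Long.MIN_VALUE means "no value recorded yet".
    private long value = Long.MIN_VALUE;

    /** Records a value; only the maximum seen so far is kept. */
    public void increment(long v) {
        value = Math.max(value, v);
    }

    /** Aggregates another task's counter into this one, e.g. at the job level. */
    public void merge(MaxCounter other) {
        increment(other.value);
    }

    public long getValue() {
        return value;
    }

    public static void main(String[] args) {
        MaxCounter taskA = new MaxCounter();
        taskA.increment(3);
        taskA.increment(17);
        taskA.increment(5);

        MaxCounter taskB = new MaxCounter();
        taskB.increment(9);
        taskB.increment(12);

        MaxCounter job = new MaxCounter();
        job.merge(taskA);
        job.merge(taskB);
        System.out.println(job.getValue()); // 17
    }
}
```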


[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2013-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562655#comment-13562655
 ] 

Hudson commented on MAPREDUCE-2264:
---

Integrated in Hadoop-Mapreduce-trunk #1324 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1324/])
MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and 
sandyr via tucu) (Revision 1438277)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438277
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestMerger.java


> Job status exceeds 100% in some cases 
> --
>
> Key: MAPREDUCE-2264
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.2, 0.20.205.0
>Reporter: Adam Kramer
>Assignee: Devaraj K
>  Labels: critical-0.22.0
> Fix For: 1.2.0, 3.0.0
>
> Attachments: MAPREDUCE-2264-0.20.205-1.patch, 
> MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, 
> MAPREDUCE-2264-branch-1-1.patch, MAPREDUCE-2264-branch-1-2.patch, 
> MAPREDUCE-2264-branch-1.patch, MAPREDUCE-2264-trunk-1.patch, 
> MAPREDUCE-2264-trunk-1.patch, MAPREDUCE-2264-trunk-2.patch, 
> MAPREDUCE-2264-trunk-3.patch, MAPREDUCE-2264-trunk.patch, more than 100%.bmp
>
>
> I'm looking now at my jobtracker's list of running reduce tasks. One of them 
> is 120.05% complete, the other is 107.28% complete.
> I understand that these numbers are estimates, but there is no case in which 
> an estimate of 100% for a non-complete task is better than an estimate of 
> 99.99%, nor is there any case in which an estimate greater than 100% is valid.
> I suggest that whatever logic computes these values use 99.99% as a hard maximum.
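The reporter's suggestion can be sketched as a clamp applied when progress is reported. This only illustrates the proposal: the class and constant names are hypothetical, and the change committed under this issue (per the Hudson comments) touched Merger, MergeManagerImpl and OnDiskMapOutput, so it is not necessarily implemented this way.

```java
/** Hypothetical helper illustrating the proposed 99.99% cap for unfinished tasks. */
public class ProgressClamp {
    /** Highest progress reported for a task that has not completed yet. */
    static final float MAX_INCOMPLETE_PROGRESS = 0.9999f;

    static float reportedProgress(float estimate, boolean complete) {
        if (complete) {
            return 1.0f; // only a finished task reports 100%
        }
        if (Float.isNaN(estimate) || estimate < 0f) {
            return 0f; // guard against nonsensical estimates
        }
        return Math.min(estimate, MAX_INCOMPLETE_PROGRESS);
    }

    public static void main(String[] args) {
        // The two estimates quoted in the report, 120.05% and 107.28%.
        System.out.println(reportedProgress(1.2005f, false)); // 0.9999
        System.out.println(reportedProgress(1.0728f, false)); // 0.9999
    }
}
```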


[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2013-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562641#comment-13562641
 ] 

Hudson commented on MAPREDUCE-2264:
---

Integrated in Hadoop-Hdfs-trunk #1296 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1296/])
MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and 
sandyr via tucu) (Revision 1438277)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438277
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestMerger.java



[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2013-01-25 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562621#comment-13562621
 ] 

Gelesh commented on MAPREDUCE-4882:
---

Could you please share what impact this has?

> Error in estimating the length of the output file in Spill Phase
> 
>
> Key: MAPREDUCE-4882
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2, 1.0.3
> Environment: Any Environment
>Reporter: Lijie Xu
>  Labels: patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The sortAndSpill() method in MapTask.java has an error in estimating the 
> length of the output file. 
> The "long size" should be "(bufvoid - bufstart) + bufend" not "(bufvoid - 
> bufend) + bufstart" when "bufend < bufstart".
> Here is the original code in MapTask.java.
>   private void sortAndSpill() throws IOException, ClassNotFoundException,
>                                      InterruptedException {
>     //approximate the length of the output file to be the length of the
>     //buffer + header lengths for the partitions
>     long size = (bufend >= bufstart
>         ? bufend - bufstart
>         : (bufvoid - bufend) + bufstart) +
>         partitions * APPROX_HEADER_LENGTH;
>     FSDataOutputStream out = null;
> --
> I ran a test with TeraSort. Here is a snippet from the mapper's log:
> MapTask: Spilling map output: record full = true
> MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
> MapTask: kvstart = 262142; kvend = 131069; length = 655360
> MapTask: Finished spill 3
> In this case, the spill size should be (199229440 - 157286200) + 10485460 = 
> 52428700 bytes (about 52 MB), because 524287 records were spilled and each 
> record takes 100 bytes.
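The arithmetic can be checked with a small standalone program that plugs the logged values into both formulas. It assumes, as the issue does, that the buffer is circular and wraps at bufvoid, so valid data runs from bufstart to bufvoid and then from 0 to bufend. The partitions * APPROX_HEADER_LENGTH term is left out because the log does not give the partition count. The class and method names are hypothetical.

```java
/** Compares the two spill-size formulas from this issue on the logged TeraSort values. */
public class SpillSizeEstimate {

    /** Formula as written in the quoted sortAndSpill() snippet (header term omitted). */
    static long originalEstimate(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufend) + bufstart;
    }

    /** Formula proposed in this issue: tail of the buffer plus the wrapped head. */
    static long proposedEstimate(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufstart) + bufend;
    }

    public static void main(String[] args) {
        long bufstart = 157286200L, bufend = 10485460L, bufvoid = 199229440L;
        System.out.println(originalEstimate(bufstart, bufend, bufvoid)); // 346030180
        System.out.println(proposedEstimate(bufstart, bufend, bufvoid)); // 52428700
        System.out.println(524287L * 100L);                              // 52428700
    }
}
```

With these values the original formula overestimates the spill by roughly 6.6x (346030180 vs 52428700 bytes), while the proposed one matches 524287 records at 100 bytes each.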


[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-01-25 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562607#comment-13562607
 ] 

Tom White commented on MAPREDUCE-4951:
--

+1 on the latest patch.

> Container preemption interpreted as task failure
> 
>
> Key: MAPREDUCE-4951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mr-am, mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, 
> MAPREDUCE-4951.patch
>
>
> When YARN reports a completed container to the MR AM, the AM always interprets 
> it as a failure. This can lead to a job failing because too many of its tasks 
> failed, when in fact they were only preempted by the scheduler.
> MR needs to recognize the special exit code value of -100 and interpret it as 
> a container being killed instead of a container failure.
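A sketch of the distinction being requested, with hypothetical names: a completed container whose exit status is the special value -100 named in this issue is counted as killed rather than failed, so it does not count toward the job's task-failure limit. The real AM works with YARN's container status records; plain ints stand in for them here, and treating 0 as success is an assumption of the sketch.

```java
/** Hypothetical classifier for completed-container exit statuses. */
public class ContainerCompletionClassifier {

    /** Special exit status named in this issue for a container killed by the framework. */
    static final int KILLED_BY_FRAMEWORK = -100;

    enum Outcome { SUCCEEDED, KILLED, FAILED }

    static Outcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return Outcome.SUCCEEDED;
        }
        if (exitStatus == KILLED_BY_FRAMEWORK) {
            return Outcome.KILLED; // e.g. preempted by the scheduler: not the task's fault
        }
        return Outcome.FAILED;
    }

    /** Counts only genuine failures, i.e. those that should count toward a failure limit. */
    static int countFailures(int[] exitStatuses) {
        int failures = 0;
        for (int status : exitStatuses) {
            if (classify(status) == Outcome.FAILED) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Two preemptions and one real failure.
        int[] statuses = {-100, -100, 1};
        System.out.println(countFailures(statuses)); // 1
    }
}
```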


[jira] [Updated] (MAPREDUCE-2931) CLONE - LocalJobRunner should support parallel mapper execution

2013-01-25 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-2931:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Sandy.

> CLONE - LocalJobRunner should support parallel mapper execution
> ---
>
> Key: MAPREDUCE-2931
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2931
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Forest Tan
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-1367-branch1.patch
>
>
> The LocalJobRunner currently supports only a single execution thread. Given 
> the prevalence of multi-core CPUs, it makes sense to allow users to run 
> multiple tasks in parallel for improved performance on small (local-only) 
> jobs.
> MAPREDUCE-1367 needs to be backported to Hadoop 0.20.x, and MAPREDUCE-434 
> should be submitted along with it.
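The idea of running several local map tasks at once can be sketched with a standard java.util.concurrent thread pool. This is not the MAPREDUCE-1367 code: the class, the methods and the thread count are hypothetical, and each "map task" is a toy word count over an in-memory split. ExecutorService.invokeAll returns its futures in the same order as the submitted tasks, so results line up with their inputs even though the tasks run concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: run local "map tasks" on a fixed-size thread pool. */
public class ParallelLocalMaps {

    /** Runs all tasks on `threads` workers and returns their results in task order. */
    static <T> List<T> runAll(List<Callable<T>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task's exception as ExecutionException
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Toy "map" over each input split: count whitespace-separated words. */
    static List<Integer> wordCounts(List<String> splits, int threads) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (String split : splits) {
            tasks.add(() -> split.trim().split("\\s+").length);
        }
        return runAll(tasks, threads);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b c", "d e", "f", "g h i j");
        System.out.println(wordCounts(splits, 2)); // [3, 2, 1, 4]
    }
}
```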


[jira] [Commented] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2013-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562589#comment-13562589
 ] 

Hudson commented on MAPREDUCE-2264:
---

Integrated in Hadoop-Yarn-trunk #107 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/107/])
MAPREDUCE-2264. Job status exceeds 100% in some cases. (devaraj.k and 
sandyr via tucu) (Revision 1438277)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1438277
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Merger.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestMerger.java



[jira] [Commented] (MAPREDUCE-4709) Counters that track max values

2013-01-25 Thread Arun A K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562567#comment-13562567
 ] 

Arun A K commented on MAPREDUCE-4709:
-

@Jeremy Lewi, 
Could you please elaborate on the problem with an example? 


[jira] [Commented] (MAPREDUCE-4875) coverage fixing for org.apache.hadoop.mapred

2013-01-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562531#comment-13562531
 ] 

Hadoop QA commented on MAPREDUCE-4875:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12566466/MAPREDUCE-4875-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3274//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3274//console

This message is automatically generated.

> coverage fixing for org.apache.hadoop.mapred
> 
>
> Key: MAPREDUCE-4875
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4875
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
>Reporter: Aleksey Gorshkov
> Fix For: 3.0.0, 2.0.3-alpha, 0.23.6
>
> Attachments: MAPREDUCE-4875-branch-0.23.patch, 
> MAPREDUCE-4875-trunk.patch
>
>
> Added some tests for org.apache.hadoop.mapred.
> MAPREDUCE-4875-trunk.patch for trunk and branch-2
> MAPREDUCE-4875-branch-0.23.patch for branch-0.23


[jira] [Updated] (MAPREDUCE-4961) Map reduce running local should also go through ShuffleConsumerPlugin for enabling different MergeManager implementations

2013-01-25 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-4961:
--

Assignee: (was: Tsuyoshi OZAWA)

> Map reduce running local should also go through ShuffleConsumerPlugin for 
> enabling different MergeManager implementations
> -
>
> Key: MAPREDUCE-4961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Jerry Chen
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> MAPREDUCE-4049 provides the ability to plug in a custom Shuffle, and MAPREDUCE-4080 
> extends Shuffle so that it can use different MergeManager implementations. 
> While using these pluggable features, I found that when a MapReduce job runs 
> locally, a RawKeyValueIterator is returned directly from a static call to 
> Merger.merge. This breaks the assumption that the Shuffle may provide its own 
> merge method, even though there is no copy phase in this situation.
> My use case: I am implementing a hash-based MergeManager, which does not need 
> sorting on the map side. When the job runs locally, however, the hash-based 
> MergeManager is never used, because the reduce task goes directly to 
> Merger.merge. This leaves the pluggable Shuffle and MergeManager incomplete.
> So we need to move the code that calls Merger.merge from ReduceTask into the 
> ShuffleConsumerPlugin implementation, so that the Shuffle implementation can 
> decide how to do the merge and return the corresponding iterator.
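The proposed refactoring can be sketched as follows. All names are hypothetical stand-ins: MergePlugin is not Hadoop's ShuffleConsumerPlugin, and map outputs are in-memory lists rather than spill files. It needs Java 9+ for List.of and Map.entry. The point is that the reduce side asks the plugin for its input iterator on every code path, so a hash-based implementation is also used when the job runs locally.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: the reduce side always obtains its input from a pluggable merger. */
public class LocalMergeDelegation {

    /** Stand-in for a pluggable shuffle/merge component (not Hadoop's real interface). */
    interface MergePlugin {
        Iterator<Map.Entry<String, List<Integer>>> merge(List<List<Map.Entry<String, Integer>>> mapOutputs);
    }

    /** Sort-based merge: keys come out in sorted order. */
    static class SortMerge implements MergePlugin {
        public Iterator<Map.Entry<String, List<Integer>>> merge(List<List<Map.Entry<String, Integer>>> mapOutputs) {
            return group(mapOutputs, new TreeMap<>()).entrySet().iterator();
        }
    }

    /** Hash-based merge: groups by key without sorting (first-seen order here). */
    static class HashMerge implements MergePlugin {
        public Iterator<Map.Entry<String, List<Integer>>> merge(List<List<Map.Entry<String, Integer>>> mapOutputs) {
            return group(mapOutputs, new LinkedHashMap<>()).entrySet().iterator();
        }
    }

    /** Collects every value under its key in the supplied map. */
    static Map<String, List<Integer>> group(List<List<Map.Entry<String, Integer>>> mapOutputs,
                                            Map<String, List<Integer>> target) {
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> kv : output) {
                target.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return target;
    }

    /** Same call whether the job runs locally or on a cluster: no direct static merge. */
    static List<String> reduceKeys(MergePlugin plugin, List<List<Map.Entry<String, Integer>>> mapOutputs) {
        List<String> keys = new ArrayList<>();
        Iterator<Map.Entry<String, List<Integer>>> input = plugin.merge(mapOutputs);
        while (input.hasNext()) {
            keys.add(input.next().getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
            List.of(Map.entry("b", 1), Map.entry("a", 2)),
            List.of(Map.entry("c", 3), Map.entry("a", 4)));
        System.out.println(reduceKeys(new SortMerge(), mapOutputs)); // [a, b, c]
        System.out.println(reduceKeys(new HashMerge(), mapOutputs)); // [b, a, c]
    }
}
```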
