[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-24 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561763#comment-13561763
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Alejandro, Arun, Chris, and Tom,
  Now that MAPREDUCE-4807, MAPREDUCE-4809, and MAPREDUCE-4808 have been 
committed, I would like to work on MAPREDUCE-4039 (sort avoidance) as two 
contributed plugins.  One will avoid sorting on the map side and the other on 
the reduce side.  These plugins will open up more use cases.  For example, 
Jerry can implement hash aggregation in the Combiner and/or in the Reducer 
without any sort overhead.
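
To illustrate why hash aggregation does not need sorted input, here is a 
minimal sketch.  It is not Hadoop code and not part of any patch on this 
issue; the class name HashAggregationSketch and its aggregate method are 
hypothetical, and the real Combiner/Reducer API is not used.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Hadoop or of any patch on this issue.
class HashAggregationSketch {

    // Sums the values for each key in a single pass. The result does not
    // depend on input order, so the records never have to be sorted first.
    static Map<String, Long> aggregate(List<Map.Entry<String, Long>> records) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> record : records) {
            // merge() inserts the value for a new key or adds it to the running sum.
            totals.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Records arrive in arbitrary (unsorted) key order.
        List<Map.Entry<String, Long>> unsorted = List.of(
            Map.entry("b", 1L), Map.entry("a", 2L), Map.entry("b", 3L));
        System.out.println(aggregate(unsorted));
    }
}
```

The trade-off is that the hash table has to hold one entry per distinct key, 
so a real plugin would presumably need some spill strategy when keys do not 
fit in memory.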

If there is no objection from you, I will post a brief description of a design 
in MAPREDUCE-4039 soon.  Please let me know.

Thanks.

-- Asokan


 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 3.0.0

 Attachments: COMBO-mapreduce-4809-4812-4808.patch, M4808-0.patch, 
 M4808-1.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 MergeManagerPlugin.pdf, MR-4808.patch


 Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for 
 alternate implementations to be able to reuse portions of the default 
 implementation. 
 This would come with the strong caveat that these classes are LimitedPrivate 
 and Unstable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560548#comment-13560548
 ] 

Hudson commented on MAPREDUCE-4808:
---

Integrated in Hadoop-Yarn-trunk #105 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/105/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by 
Shuffle implementations. (masokan via tucu) (Revision 1436936)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeThread.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560656#comment-13560656
 ] 

Hudson commented on MAPREDUCE-4808:
---

Integrated in Hadoop-Hdfs-trunk #1294 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1294/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by 
Shuffle implementations. (masokan via tucu) (Revision 1436936)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560704#comment-13560704
 ] 

Hudson commented on MAPREDUCE-4808:
---

Integrated in Hadoop-Mapreduce-trunk #1322 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1322/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by 
Shuffle implementations. (masokan via tucu) (Revision 1436936)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559658#comment-13559658
 ] 

Hudson commented on MAPREDUCE-4808:
---

Integrated in Hadoop-trunk-Commit #3266 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3266/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by 
Shuffle implementations. (masokan via tucu) (Revision 1436936)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-22 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559785#comment-13559785
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Alejandro, thanks for committing this.

Alejandro, Arun, Chris, and Tom, thanks to all of you for providing valuable 
comments and feedback.

-- Asokan




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558584#comment-13558584
 ] 

Chris Douglas commented on MAPREDUCE-4808:
--

Fair point.

This looks ready to commit. Unless someone would like additional changes, I'll 
plan to push it in tomorrow.



[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557508#comment-13557508
 ] 

Arun C Murthy commented on MAPREDUCE-4808:
--

bq. I will try to explain a simple use case of an external implementation of 
merge on the reduce side. Let us say this merge implementation has some fixed 
area of memory (Java byte array) allocated to store the shuffled data. This may 
be done to avoid frequent garbage collection by JVM or for better processor 
cache efficiency.
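
The use case quoted above can be sketched roughly as follows.  This is only 
an illustration under stated assumptions: FixedShuffleBuffer and its methods 
are hypothetical names, not classes from Hadoop or from any patch attached 
to this issue.

```java
import java.util.Arrays;

// Hypothetical sketch; class and method names are not from Hadoop.
class FixedShuffleBuffer {
    private final byte[] area; // allocated once and reused for every segment
    private int used;          // offset of the next free byte

    FixedShuffleBuffer(int capacityBytes) {
        this.area = new byte[capacityBytes];
    }

    // Copies one shuffled segment into the fixed area.
    // Returns the start offset, or -1 when the segment does not fit.
    int append(byte[] segment) {
        if (segment.length > area.length - used) {
            return -1;
        }
        System.arraycopy(segment, 0, area, used, segment.length);
        int offset = used;
        used += segment.length;
        return offset;
    }

    // Returns a copy of the bytes stored at [offset, offset + length).
    byte[] read(int offset, int length) {
        return Arrays.copyOfRange(area, offset, offset + length);
    }

    // Makes the whole area available again without reallocating it.
    void reset() {
        used = 0;
    }

    int remaining() {
        return area.length - used;
    }
}
```

Because the array is allocated once, storing a segment creates no new 
long-lived objects for the garbage collector to track, and the data stays 
contiguous in memory.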

Asokan - this is the first time I've heard this use case, which seems like 
something Syncsort can take advantage of. As a consequence, I've been viewing 
this through the lens of 'limit-N/hash-join' merge etc.

In the future, being clear and upfront about use cases will prevent such 
confusion.



Having said that, I still feel a better approach would be to use a custom 
shuffle via MAPREDUCE-4049 and friends since you get more control. For example, 
you might want to defer shuffle based on memory on the heap (byte[]) and memory 
outside the heap (JNI or DirectBuffers) for the Syncsort plugin, and clearly 
the current MergeManager will not suffice for that.
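
As a rough sketch of what deferring on two memory pools could look like: 
the class below is hypothetical (ShuffleMemoryBudget, Placement, reserve and 
release are made-up names), it is not the MergeManager API, and it only 
tracks byte counts rather than allocating real heap or direct buffers.

```java
// Hypothetical sketch; not the MergeManager API and not code from any patch here.
class ShuffleMemoryBudget {
    enum Placement { HEAP, OFF_HEAP, DEFER }

    private long heapFree;    // bytes still available for on-heap byte[] storage
    private long offHeapFree; // bytes still available outside the heap

    ShuffleMemoryBudget(long heapBytes, long offHeapBytes) {
        this.heapFree = heapBytes;
        this.offHeapFree = offHeapBytes;
    }

    // Decides where a map output of the given size may be stored.
    // DEFER means neither pool has room, so the fetch should wait.
    synchronized Placement reserve(long size) {
        if (size <= heapFree) {
            heapFree -= size;
            return Placement.HEAP;
        }
        if (size <= offHeapFree) {
            offHeapFree -= size;
            return Placement.OFF_HEAP;
        }
        return Placement.DEFER;
    }

    // Returns previously reserved bytes to the pool they came from.
    synchronized void release(Placement placement, long size) {
        if (placement == Placement.HEAP) {
            heapFree += size;
        } else if (placement == Placement.OFF_HEAP) {
            offHeapFree += size;
        }
    }
}
```

A custom shuffle would call reserve() before fetching a map output and 
release() once the output has been merged; how a deferred fetch is retried 
is left out of this sketch.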

However, if this unblocks you in the short run I think the approach is fine. 
Thanks for the clarification. I'll take another look at the details of the 
patch once you upload it, but it seems mostly fine to me. Thanks.



[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557635#comment-13557635
 ] 

Chris Douglas commented on MAPREDUCE-4808:
--

bq. So, I renamed MergeManagerPlugin to MergeManagerI since it is just an 
interface. Is that okay?

That's not a naming convention used anywhere else in Hadoop. I'd rather not 
introduce it for this case. Renaming test classes and problems with 
backporting(?) are not issues.



[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557681#comment-13557681
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4808:
---

Mhhh, I would prefer not to rename the MergeManager class, as fixes in trunk 
(after the rename) will require manual patching in branch 0.23 and maintenance 
releases of branch 2. Why not keep the original MergeManagerPlugin name?



[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557788#comment-13557788
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4808:
---

Sure, me good. thx



[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557798#comment-13557798
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Chris,
  Thanks for posting the proper patch.  I am ashamed of myself:) I should have 
done it myself without coming up with excuses.  I ran all mapreduce tests and 
verified your patch.

-- Asokan




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557809#comment-13557809
 ] 

Hadoop QA commented on MAPREDUCE-4808:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565578/M4808-0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3252//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3252//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3252//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13557837#comment-13557837
 ] 

Hadoop QA commented on MAPREDUCE-4808:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12565595/M4808-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3254//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3254//console

This message is automatically generated.

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, M4808-0.patch, 
 M4808-1.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 MergeManagerPlugin.pdf, MR-4808.patch




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556247#comment-13556247
 ] 

Arun C Murthy commented on MAPREDUCE-4808:
--

bq. The goal is to be able to write alternate implementations of the Shuffle

Alejandro - it seems like you understand something about the use-case that I 
don't. Maybe you and Asokan have had a private chat? 

What are the use-cases for alternate implementations of the Shuffle? As Chris 
also mentioned, MAPREDUCE-4049 already allows alternate implementations of 
Shuffle; is this redundant then?

bq. While some of this logic replacement could be done at Merge level as you 
suggested, other, like MapOutput allocation cannot be done there as this is 
driven by the MergeManager. 

So, a combination of MapOutput re-factor and Merger interface should suffice?

In any case, what are the use-cases for alternate implementations of MapOutput? 
Or is the MapOutput refactor merely a code-hygiene issue?



I'm not trying to be difficult here, but I feel like I just don't understand 
the use-case. So I'd appreciate it if we could focus on concrete use-cases for 
the plugin. I admit I am still having a hard time understanding why we need 
this complexity.

Thanks.

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556405#comment-13556405
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Arun,
  MAPREDUCE-4049 expects the plugin implementer to implement the shuffle from 
scratch.  Since the default HTTP shuffle implementation is robust and secure, 
it can be reused in the majority of situations.

The alternate implementation of MapOutput can be left to the plugin 
implementer.  For example, it can be optimized to use less JVM memory and 
minimize Java garbage collection.

Some of the concrete use cases for the plugin are: hash aggregation, hash join, 
limit-N query, etc.
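
To illustrate why a use case like hash aggregation does not need sorted input, here is a minimal sketch that is independent of any Hadoop API. The class and method names are hypothetical and only show the idea: records arrive in arbitrary order and are folded into a hash table in a single pass.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Hadoop or of the patch.
class HashAggregationSketch {

    // Sums values per key in one pass over unsorted (key, value) pairs.
    static Map<String, Long> aggregate(String[] keys, long[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<String, Long> totals = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            // merge() adds the value to the running total for this key.
            totals.merge(keys[i], values[i], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Keys are deliberately out of order; no sort is performed anywhere.
        String[] keys = {"b", "a", "b"};
        long[] values = {2L, 1L, 5L};
        System.out.println(aggregate(keys, values));
    }
}
```

A sort-based pipeline would have to order all records by key before the first total could be produced; the hash table removes that step at the cost of keeping the distinct keys in memory.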

Thanks.

-- Asokan


 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556536#comment-13556536
 ] 

Chris Douglas commented on MAPREDUCE-4808:
--

Asokan, the concern is that breaking an API, even one marked unstable, is an 
incompatible change. Since the pluggable shuffle is particularly useful for 
frameworks, breaking this contract could require patching, validation, or a 
rewrite of plugin and optimizer code in projects that invest in it (Hive, Pig, 
etc.). Moreover, if we wanted to change the default {{Shuffle}} to a different 
implementation, then user/framework code would perform badly, or break, unless 
we exposed this implementation-specific mechanism in the _new_ impl. So it's 
fair to press for use cases, to ensure the API is _sufficient_ and that the 
abstraction could apply to most {{Shuffle}} implementations.

Personally, I'm ambivalent about exposing this as an API and am +1 on the patch 
overall (mostly because I like the {{MapOutput}} refactoring). The user can 
always configure the current {{Shuffle}}, which is exactly how frameworks would 
handle this until they port/specialize their efficient {{MergeManager}} plugin.

As a compromise, would it make sense to just add a protected 
{{createMergeManager}} method to the {{Shuffle}}? The user still needs to 
configure their custom {{Shuffle}} impl now, but that's better than the 
inevitable future where they configure both. It also makes its tie to this 
implementation explicit.
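
The compromise above can be sketched roughly as follows. This is only an illustration of the factory-method idea: the types below are simplified, hypothetical stand-ins, and the real {{Shuffle}} and {{MergeManager}} classes have different generics, constructors, and methods.

```java
// Simplified stand-ins for illustration only. The real Shuffle and
// MergeManager classes have different generics, constructors, and methods.
class CreateMergeManagerSketch {

    // Hypothetical minimal view of what a shuffle needs from its merge manager.
    interface MergeManagerLike {
        String describe();
    }

    static class DefaultMergeManager implements MergeManagerLike {
        @Override
        public String describe() {
            return "default merge";
        }
    }

    static class ShuffleLike {
        private MergeManagerLike merger;

        // Called once before the shuffle runs; uses whatever the hook returns.
        void init() {
            merger = createMergeManager();
        }

        // The protected hook: subclasses override it to supply their own manager.
        protected MergeManagerLike createMergeManager() {
            return new DefaultMergeManager();
        }

        String run() {
            return "default fetchers + " + merger.describe();
        }
    }

    // A custom shuffle reuses the fetch logic and swaps only the merge side.
    static class CustomShuffle extends ShuffleLike {
        @Override
        protected MergeManagerLike createMergeManager() {
            return new MergeManagerLike() {
                @Override
                public String describe() {
                    return "custom merge";
                }
            };
        }
    }

    public static void main(String[] args) {
        ShuffleLike stock = new ShuffleLike();
        stock.init();
        System.out.println(stock.run());

        ShuffleLike custom = new CustomShuffle();
        custom.init();
        System.out.println(custom.run());
    }
}
```

With this shape the user configures a single class (the custom shuffle), and the dependency on this particular shuffle implementation is visible in the subclass declaration.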

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556580#comment-13556580
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4808:
---

Chris, are you suggesting the following?

* remove the MergeManagerPlugin interface
* introduce a protected createMergeManager() in the Shuffle class to 
instantiate (via new) and initialize the existing MergeManager.




 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556595#comment-13556595
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Arun,
  I will think about your suggestion to make the Merger class pluggable and 
post my findings for different use cases.

-- Asokan


 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556600#comment-13556600
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Chris,
  I will work on creating a real working plugin for the use cases to show that 
the proposed API is sufficient to handle them.

-- Asokan

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556602#comment-13556602
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Alejandro,
  If the MergeManagerPlugin is to be removed, it should be possible for an 
external implementation to extend the framework's MergeManager.

-- Asokan


 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556606#comment-13556606
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Alejandro,
  I meant to ask whether it is okay to make the existing MergeManager 
extensible.

-- Asokan

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556705#comment-13556705
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Arun,
  I will try to explain a simple use case of an external implementation of 
merge on the reduce side.  Let us say this merge implementation has a fixed 
area of memory (a Java byte array) allocated to store the shuffled data.  This 
may be done to avoid frequent garbage collection by the JVM or for better 
processor cache efficiency.

Looking at the methods in the {{Merger}} class, they accept input to the merge 
either as disk files (an array of {{Path}} objects) or as memory segments (a 
list of {{Segment}} objects).  The former is not suitable since the merge is 
done in memory first and any intermediate merged output file is under the 
control of the plugin implementation.  The latter is not suitable because 
memory for the shuffled data is not under the control of the plugin 
implementation.

Ideally, if an {{InputStream}} object is available, the external implementation 
can read shuffled data from the stream to the fixed area of memory at a 
specific offset in the byte array.
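
As a minimal sketch of that idea, the following reads shuffled bytes from a stream straight into a preallocated array at a chosen offset. It uses only java.io; the class name, the helper, and the in-memory streams standing in for the HTTP connection are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; the streams here stand in for HTTP connections.
class FixedBufferReadSketch {

    // Reads exactly `length` bytes from `in` into `buffer`, starting at `offset`.
    static void readFully(InputStream in, byte[] buffer, int offset, int length)
            throws IOException {
        int done = 0;
        while (done < length) {
            // read() may return fewer bytes than requested, so loop until done.
            int n = in.read(buffer, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("stream ended after " + done + " of " + length + " bytes");
            }
            done += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fixedArea = new byte[16];   // allocated once, reused for all map outputs
        byte[] firstOutput = {1, 2, 3};
        byte[] secondOutput = {4, 5};

        // Two map outputs are placed back to back without any new allocation.
        readFully(new ByteArrayInputStream(firstOutput), fixedArea, 0, firstOutput.length);
        readFully(new ByteArrayInputStream(secondOutput), fixedArea, firstOutput.length,
                  secondOutput.length);

        System.out.println(java.util.Arrays.toString(fixedArea));
    }
}
```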

With the {{MergeManagerPlugin}}, the external implementation will get the HTTP 
connection's {{InputStream}} object via the {{shuffle()}} method in the 
{{MapOutput}} object.  In addition, if the merge goes through multiple passes 
because the memory area is limited in size, there should be some way for the 
{{Shuffle}} to wait until memory is released by a merge pass.  There is no 
method in {{Merger}} for that either.
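
The waiting behaviour described here can be sketched with a small memory budget that blocks callers until a merge pass frees space. This is an assumption-laden illustration using plain Java monitors; the class and its methods are hypothetical and are not the plugin interface from the patch.

```java
// Hypothetical illustration of "wait until memory is released by a merge pass".
class MergeMemoryBudgetSketch {
    private final long capacity;
    private long used;

    MergeMemoryBudgetSketch(long capacity) {
        this.capacity = capacity;
    }

    // Blocks the calling (fetcher) thread until `bytes` fit in the fixed area.
    synchronized void reserve(long bytes) throws InterruptedException {
        if (bytes > capacity) {
            throw new IllegalArgumentException("request exceeds total capacity");
        }
        while (used + bytes > capacity) {
            wait();   // woken up by release()
        }
        used += bytes;
    }

    // Called when a merge pass has drained `bytes` from the fixed area.
    synchronized void release(long bytes) {
        used -= bytes;
        notifyAll();
    }

    synchronized long used() {
        return used;
    }

    public static void main(String[] args) throws InterruptedException {
        MergeMemoryBudgetSketch budget = new MergeMemoryBudgetSketch(10);
        budget.reserve(6);

        // This request cannot be satisfied until the first 6 bytes are released.
        Thread fetcher = new Thread(() -> {
            try {
                budget.reserve(6);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();

        budget.release(6);   // simulates a merge pass finishing
        fetcher.join();
        System.out.println("bytes in use: " + budget.used());
    }
}
```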

I find that it is possible to define the interaction points between current 
{{Shuffle}} and {{MergeManager}} using the {{MergeManagerPlugin}} interface.  
The plugin interface has only three methods and it allows the external plugin 
to have a lot of freedom in its implementation.  As a side effect, the 
{{MapOutput}} is also refactored.

Hope I explained this well.  If you have any questions, please let me know.

-- Asokan


 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556738#comment-13556738
 ] 

Chris Douglas commented on MAPREDUCE-4808:
--

+1 Looked through it; the latest patch lgtm. Asokan, is that sufficient for 
your use cases? Arun?

_Very_ minor, optional nit: {{s/MergeManager/MergeManagerImpl/}} and 
{{s/MergeManagerPlugin/MergeManager/}}. There's an argument to be made for 
doing the same with the {{ShuffleScheduler}} while we're at it, but neither of 
these is blocking, IMO.

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-17 Thread Mariappan Asokan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556749#comment-13556749
 ] 

Mariappan Asokan commented on MAPREDUCE-4808:
-

Hi Chris,
  Thanks for your quick feedback.  I looked at the patch.  It has one minor 
nit.  The {{createMergeManager}} method should take a 
{{ShuffleConsumerPlugin.Context}} object. I will go over it one more time, work 
out the change, run tests, and post the patch shortly.

Thanks.

-- Asokan


 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-16 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555211#comment-13555211
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4808:
---

The goal is to be able to write alternate implementations of the Shuffle, like 
the ones Asokan and Jerry are trying to do, while reusing functionality 
provided by the Hadoop default implementation. For example, being able to 
leverage the logic in the default shuffle (i.e., fetchers), while replacing the 
merge logic and the merge resource allocation logic driven by the MergeManager. 
While some of this logic replacement could be done at the Merge level as you 
suggested, other parts, like MapOutput allocation, cannot be done there as this 
is driven by the MergeManager. The refactoring of MapOutput from a struct to 
classes permits alternate implementations that can reuse memory buffers, thus 
reducing JVM heap allocation.
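
A rough sketch of what the move from a struct to classes makes possible is shown below. All types are hypothetical and heavily simplified compared to the real MapOutput; the point is only that the fetching code calls one abstract method, while each subclass decides where the bytes are stored.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified types; the real MapOutput API differs.
class MapOutputSketch {

    // Each kind of map output knows how to copy its own bytes off the wire.
    abstract static class Output {
        abstract void shuffle(InputStream in, int length) throws IOException;
    }

    // Allocates a fresh heap array for every map output.
    static class HeapOutput extends Output {
        byte[] data;

        @Override
        void shuffle(InputStream in, int length) throws IOException {
            data = new byte[length];
            new DataInputStream(in).readFully(data, 0, length);
        }
    }

    // Writes into a slice of a buffer owned by the plugin: no per-output allocation.
    static class SharedBufferOutput extends Output {
        private final byte[] shared;
        private final int offset;

        SharedBufferOutput(byte[] shared, int offset) {
            this.shared = shared;
            this.offset = offset;
        }

        @Override
        void shuffle(InputStream in, int length) throws IOException {
            new DataInputStream(in).readFully(shared, offset, length);
        }
    }

    // Stand-in for a fetcher: it only sees the abstract type.
    static void fetch(Output out, byte[] payload) throws IOException {
        out.shuffle(new ByteArrayInputStream(payload), payload.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {7, 8, 9};

        HeapOutput heap = new HeapOutput();
        fetch(heap, payload);
        System.out.println("heap copy length: " + heap.data.length);

        byte[] shared = new byte[8];
        fetch(new SharedBufferOutput(shared, 2), payload);
        System.out.println("shared buffer: " + java.util.Arrays.toString(shared));
    }
}
```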

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf




[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

2013-01-16 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555602#comment-13555602
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4808:
---

Assuming all concerns/questions have been addressed, can we move forward and 
commit the latest patch?

 Refactor MapOutput and MergeManager to facilitate reuse by Shuffle 
 implementations
 --

 Key: MAPREDUCE-4808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.2-alpha
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
 Fix For: 2.0.3-alpha

 Attachments: COMBO-mapreduce-4809-4812-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, 
 mapreduce-4808.patch, MergeManagerPlugin.pdf

