[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561763#comment-13561763 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Alejandro, Arun, Chris, and Tom,

Now that MAPREDUCE-4807, MAPREDUCE-4809, and MAPREDUCE-4808 have been committed, I would like to work on MAPREDUCE-4039 (sort avoidance) as two contributed plugins. One will avoid sorting on the map side and the other on the reduce side. These plugins will open up more use cases. For example, Jerry can implement hash aggregation in the Combiner and/or in the Reducer without any sort overhead. If there is no objection from you, I will post a brief description of a design in MAPREDUCE-4039 soon. Please let me know. Thanks.

-- Asokan

Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations

Key: MAPREDUCE-4808
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Mariappan Asokan
Fix For: 3.0.0
Attachments: COMBO-mapreduce-4809-4812-4808.patch, M4808-0.patch, M4808-1.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf, MR-4808.patch

Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. This would come with the strong caveat that these classes are LimitedPrivate and Unstable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
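The hash-aggregation use case mentioned in the comment above can be pictured with a minimal sketch. It assumes nothing about the design that was to be posted in MAPREDUCE-4039: the class name HashAggregationSketch and its aggregate method are hypothetical, plain Java collections stand in for Hadoop types, and the only point shown is that per-key totals can be built in arrival order without sorting the records first.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Hadoop or of any patch on this issue. */
final class HashAggregationSketch {

    private HashAggregationSketch() {
    }

    /**
     * Sums the values seen for each key, consuming records in whatever
     * order they arrive. No sort of the input is required.
     */
    static Map<String, Long> aggregate(Iterable<? extends Map.Entry<String, Long>> records) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> record : records) {
            // merge() inserts the value for a new key, or adds it to the running total.
            totals.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return totals;
    }
}
```

The trade-off in this kind of approach is that the hash table must fit in memory (or spill by some other means), which a sort-based merge does not require.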
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560548#comment-13560548 ]

Hudson commented on MAPREDUCE-4808:

Integrated in Hadoop-Yarn-trunk #105 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/105/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations. (masokan via tucu) (Revision 1436936)

Result = SUCCESS

tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeThread.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560656#comment-13560656 ]

Hudson commented on MAPREDUCE-4808:

Integrated in Hadoop-Hdfs-trunk #1294 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1294/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations. (masokan via tucu) (Revision 1436936)

Result = FAILURE

tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeThread.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560704#comment-13560704 ]

Hudson commented on MAPREDUCE-4808:

Integrated in Hadoop-Mapreduce-trunk #1322 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1322/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations. (masokan via tucu) (Revision 1436936)

Result = SUCCESS

tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeThread.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559658#comment-13559658 ]

Hudson commented on MAPREDUCE-4808:

Integrated in Hadoop-trunk-Commit #3266 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3266/])
MAPREDUCE-4808. Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations. (masokan via tucu) (Revision 1436936)

Result = SUCCESS

tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436936
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManagerImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeThread.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/OnDiskMapOutput.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestMergeManager.java
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559785#comment-13559785 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Alejandro, thanks for committing this.

Alejandro, Arun, Chris, and Tom, thanks to all of you for providing valuable comments and feedback.

-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558584#comment-13558584 ]

Chris Douglas commented on MAPREDUCE-4808:

Fair point. This looks ready to commit. Unless someone would like additional changes, I'll plan to push it in tomorrow.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557508#comment-13557508 ]

Arun C Murthy commented on MAPREDUCE-4808:

bq. I will try to explain a simple use case of an external implementation of merge on the reduce side. Let us say this merge implementation has some fixed area of memory (Java byte array) allocated to store the shuffled data. This may be done to avoid frequent garbage collection by JVM or for better processor cache efficiency.

Asokan - this is the first time I've heard this use case, which seems like something Syncsort can take advantage of, and, as a consequence, I've been viewing this through the lens of 'limit-N/hash-join' merge etc. In future, being clear and upfront about use-cases will obviously prevent further such confusion.

Having said that, I still feel a better approach would be to use a custom shuffle via MAPREDUCE-4049 and friends since you get more control - e.g. you might want to defer shuffle based on memory on the heap (byte[]) and memory outside the heap (JNI or DirectBuffers) for the Syncsort plugin - and clearly, the current MergeManager will not suffice for that.

However, if this unblocks you in the short run I think the approach is fine. Thanks for the clarification.

I'll take another look at the details of the patch once you upload it, but it seems mostly fine to me. Thanks.
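The use case quoted in the comment above, a merge implementation that keeps shuffled data in a fixed area of memory, can be sketched roughly as follows. This is an assumption-laden illustration, not code from any patch on this issue: the class FixedShuffleBufferSketch and its methods are hypothetical, and a real plugin would also need concurrency control and a spill path.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of Hadoop or of any patch on this issue. */
final class FixedShuffleBufferSketch {

    private final byte[] arena; // allocated once and reused for every segment
    private int used;           // number of bytes currently occupied

    FixedShuffleBufferSketch(int capacityBytes) {
        this.arena = new byte[capacityBytes];
    }

    /** Copies a segment into the arena; returns its offset, or -1 if it does not fit. */
    int append(byte[] segment) {
        if (segment.length > arena.length - used) {
            return -1; // the caller would have to merge or spill before retrying
        }
        int offset = used;
        System.arraycopy(segment, 0, arena, offset, segment.length);
        used += segment.length;
        return offset;
    }

    /** Returns a copy of the bytes stored at the given offset. */
    byte[] read(int offset, int length) {
        return Arrays.copyOfRange(arena, offset, offset + length);
    }

    int remaining() {
        return arena.length - used;
    }

    /** Makes the whole arena available again without reallocating it. */
    void reset() {
        used = 0;
    }
}
```

Because the array is allocated once and recycled through reset(), storing segments creates no new buffer objects, which is the garbage-collection benefit the quoted comment refers to.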
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557635#comment-13557635 ]

Chris Douglas commented on MAPREDUCE-4808:

bq. So, I renamed MergeManagerPlugin to MergeManagerI since it is just an interface. Is that okay?

That's not a naming convention used anywhere else in Hadoop. I'd rather not introduce it for this case. Renaming test classes and problems with backporting(?) are not issues.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557681#comment-13557681 ]

Alejandro Abdelnur commented on MAPREDUCE-4808:

Mhhh, I would prefer not to rename the MergeManager class, as fixes in trunk (after the rename) will require manual patching in branch 0.23 and in maintenance releases of branch 2. Why not keep the original MergeManagerPlugin name?
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557788#comment-13557788 ]

Alejandro Abdelnur commented on MAPREDUCE-4808:

Sure, me good. thx
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557798#comment-13557798 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Chris,

Thanks for posting the proper patch. I am ashamed of myself :) I should have done it myself without coming up with excuses. I ran all mapreduce tests and verified your patch.

-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557809#comment-13557809 ]

Hadoop QA commented on MAPREDUCE-4808:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565578/M4808-0.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3252//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3252//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3252//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557837#comment-13557837 ]

Hadoop QA commented on MAPREDUCE-4808:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12565595/M4808-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3254//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3254//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556247#comment-13556247 ]

Arun C Murthy commented on MAPREDUCE-4808:

bq. The goal is to be able to write alternate implementations of the Shuffle

Alejandro - it seems like you understand something about the use-case that I don't. Maybe you and Asokan have had a private chat? What are the use-cases for alternate implementations of the Shuffle? As Chris also mentioned, with MAPREDUCE-4049 we already allow alternate implementations of Shuffle; is this redundant then?

bq. While some of this logic replacement could be done at Merge level as you suggested, other, like MapOutput allocation cannot be done there as this is driven by the MergeManager.

So, a combination of the MapOutput re-factor and a Merger interface should suffice? In any case, what are the use-cases for alternate implementations of MapOutput? Or is the MapOutput re-factor merely a code-hygiene issue?

I'm not trying to be difficult here. But I feel like I just don't understand the use-case. So, I'd appreciate it if we could focus on concrete use-cases for the plugin. I admit I am still having a hard time understanding why we need this complexity.

Thanks.

Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations -- Key: MAPREDUCE-4808 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.2-alpha Reporter: Arun C Murthy Assignee: Mariappan Asokan Fix For: 2.0.3-alpha Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf Now that Shuffle is pluggable (MAPREDUCE-4049), it would be convenient for alternate implementations to be able to reuse portions of the default implementation. 
This would come with the strong caveat that these classes are LimitedPrivate and Unstable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556405#comment-13556405 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Arun,

MAPREDUCE-4049 expects the plugin implementer to implement the shuffle from scratch. Since the default HTTP shuffle implementation is robust and secure, it can be reused in the majority of situations. The alternate implementation of MapOutput can be left to the plugin implementer. For example, it can be optimized to use less JVM memory and minimize Java garbage collection.

Some of the concrete use cases for the plugin are: hash aggregation, hash join, limit-N query, etc.

Thanks.
-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556536#comment-13556536 ]

Chris Douglas commented on MAPREDUCE-4808:

Asokan, the concern is that breaking an API, even if it's marked unstable, is an incompatible change. Since the pluggable shuffle is particularly useful for frameworks, breaking this contract could require patching/validation/rewrite of plugin and optimizer code in projects that invest in it (Hive, Pig, etc.). Moreover, if we wanted to change the default {{Shuffle}} to a different implementation, then user/framework code would perform badly- or break- unless we exposed this implementation-specific mechanism in the _new_ impl. So it's fair to press for use cases, to ensure it's _sufficient_ and that the abstraction could apply to most {{Shuffle}} implementations.

Personally, I'm ambivalent about exposing this as an API and am +1 on the patch overall (mostly because I like the {{MapOutput}} refactoring). The user can always configure the current {{Shuffle}}, which is exactly how frameworks would handle this until they port/specialize their efficient {{MergeManager}} plugin.

As a compromise, would it make sense to just add a protected {{createMergeManager}} method to the {{Shuffle}}? The user still needs to configure their custom {{Shuffle}} impl now, but that's better than the inevitable future where they configure both. It also makes its tie to this implementation explicit.
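For readers following the thread, the compromise amounts to a factory-method hook. The following is only a rough sketch of that idea, not the patch itself: every class and method name below other than {{createMergeManager}} is a hypothetical stand-in, and map outputs are reduced to plain strings so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not the Hadoop classes.
interface MergeManagerSketch {
    void accept(String mapOutputId);   // called once per fetched map output
    List<String> close();              // returns the outputs in their final order
}

// Stand-in for the default behaviour: outputs come back sorted.
class SortingMergeManagerSketch implements MergeManagerSketch {
    private final List<String> outputs = new ArrayList<>();

    @Override
    public void accept(String mapOutputId) {
        outputs.add(mapOutputId);
    }

    @Override
    public List<String> close() {
        List<String> sorted = new ArrayList<>(outputs);
        Collections.sort(sorted);
        return sorted;
    }
}

class ShuffleSketch {
    // The single extension point: subclasses override this factory method.
    protected MergeManagerSketch createMergeManager() {
        return new SortingMergeManagerSketch();
    }

    // Fetch loop shared by every subclass; only the merge step varies.
    public final List<String> run(List<String> fetchedOutputs) {
        MergeManagerSketch merger = createMergeManager();
        for (String id : fetchedOutputs) {
            merger.accept(id);
        }
        return merger.close();
    }
}

// A custom shuffle reuses run() and replaces only the merge manager.
class ArrivalOrderShuffleSketch extends ShuffleSketch {
    @Override
    protected MergeManagerSketch createMergeManager() {
        return new MergeManagerSketch() {
            private final List<String> outputs = new ArrayList<>();

            @Override
            public void accept(String mapOutputId) {
                outputs.add(mapOutputId);
            }

            @Override
            public List<String> close() {
                return new ArrayList<>(outputs); // keep arrival order, no sort
            }
        };
    }
}
```

With this shape a user would configure only the custom shuffle class (here {{ArrivalOrderShuffleSketch}}), and the custom merge manager is visibly tied to that particular shuffle implementation.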
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556580#comment-13556580 ]

Alejandro Abdelnur commented on MAPREDUCE-4808:

Chris, are you suggesting the following?

* remove the MergeManagerPlugin interface
* introduce a protected createMergeManager() in the Shuffle class to instantiate (via new) and initialize the existing MergeManager.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556595#comment-13556595 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Arun,

I will think about your suggestion to make the Merger class pluggable and post my findings for different use cases.

-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556600#comment-13556600 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Chris,

I will work on creating a real working plugin for the use cases to show that the proposed API is sufficient to handle them.

-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556602#comment-13556602 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Alejandro,

If the MergeManagerPlugin is to be removed, it should be possible for an external implementation to extend the framework's MergeManager.

-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556606#comment-13556606 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Alejandro,

I meant to ask: is it okay to make the existing MergeManager extendable?

-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556705#comment-13556705 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Arun,

I will try to explain a simple use case of an external implementation of merge on the reduce side. Let us say this merge implementation has a fixed area of memory (a Java byte array) allocated to store the shuffled data. This may be done to avoid frequent garbage collection by the JVM or for better processor cache efficiency.

Looking at the methods in the {{Merger}} class, they accept input to the merge either as disk files (an array of {{Path}} objects) or as memory segments (a list of {{Segment}} objects). The former is not suitable since the merge is done in memory first and any intermediate merged output file is under the control of the plugin implementation. The latter is not suitable because memory for the shuffled data is not under the control of the plugin implementation. Ideally, if an {{InputStream}} object is available, the external implementation can read shuffled data from the stream into the fixed area of memory at a specific offset in the byte array. With the {{MergeManagerPlugin}}, the external implementation will get the HTTP connection's {{InputStream}} object via the {{shuffle()}} method of the {{MapOutput}} object.

In addition, if the merge goes through multiple passes because the memory area is limited in size, there should be some way for the {{Shuffle}} to wait until memory is released by a merge pass. There is no method in {{Merger}} for that either.

I find that it is possible to define the interaction points between the current {{Shuffle}} and {{MergeManager}} using the {{MergeManagerPlugin}} interface. The plugin interface has only three methods and it allows the external plugin a lot of freedom in its implementation. As a side effect, the {{MapOutput}} is also refactored.

Hope I explained this well. If you have any questions, please let me know.

-- Asokan
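The fixed-memory-area use case described in this comment can be pictured roughly as follows. This is a minimal sketch under stated assumptions rather than the plugin's actual code: {{FixedAreaStore}} and its methods are hypothetical names, and a return value of -1 from {{reserve}} stands in for "the shuffle must wait until a merge pass releases memory".

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of a plugin-owned, fixed-size shuffle buffer.
class FixedAreaStore {
    private final byte[] area;   // allocated once, never reallocated
    private int used;            // bytes currently reserved

    FixedAreaStore(int capacity) {
        this.area = new byte[capacity];
    }

    // Reserve space for one map output. Returns the offset into the area,
    // or -1 when there is not enough room and the caller has to wait.
    synchronized int reserve(int length) {
        if (length > area.length - used) {
            return -1;
        }
        int offset = used;
        used += length;
        return offset;
    }

    // Copy exactly 'length' bytes from the stream into the area at 'offset'.
    void shuffle(InputStream in, int offset, int length) throws IOException {
        int done = 0;
        while (done < length) {
            int n = in.read(area, offset + done, length - done);
            if (n < 0) {
                throw new EOFException("stream ended after " + done + " bytes");
            }
            done += n;
        }
    }

    // Called when a merge pass has consumed the buffered data.
    synchronized void releaseAll() {
        used = 0;
    }

    byte byteAt(int index) {
        return area[index];
    }
}
```

Neither a {{Path}} array nor a {{Segment}} list would let an implementation like this own the memory, which is why access to the raw {{InputStream}} matters for the use case.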
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556738#comment-13556738 ]

Chris Douglas commented on MAPREDUCE-4808:

+1 Looked through it; the latest patch lgtm. Asokan, is that sufficient for your use cases? Arun?

_Very_ minor, optional nit: {{s/MergeManager/MergeManagerImpl/}} and {{s/MergeManagerPlugin/MergeManager/}}. There's an argument to be made for doing the same with the {{ShuffleScheduler}} while we're at it, but neither of these are blocking, IMO.
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13556749#comment-13556749 ]

Mariappan Asokan commented on MAPREDUCE-4808:

Hi Chris,

Thanks for your quick feedback. I looked at the patch. It has one minor nit: the {{createMergeManager}} method should take a {{ShuffleConsumerPlugin.Context}} object. I will go over it one more time, work out the change, run tests, and post the patch shortly.

Thanks.
-- Asokan
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555211#comment-13555211 ]

Alejandro Abdelnur commented on MAPREDUCE-4808:

The goal is to be able to write alternate implementations of the Shuffle, like the ones Asokan and Jerry are trying to do, while reusing functionality provided by the Hadoop default implementation. For example, being able to leverage the logic in the default shuffle (i.e. the fetchers), while replacing the merge logic and the merge resource allocation logic driven by the MergeManager.

While some of this logic replacement could be done at the Merge level as you suggested, other parts, like MapOutput allocation, cannot be done there as this is driven by the MergeManager.

The refactoring of MapOutput from a struct into classes permits alternate implementations that can reuse memory buffers, thus reducing JVM heap allocation.
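To make the last point of this comment concrete, here is a rough sketch of what turning a struct-like map output into a small class hierarchy could allow. All names are hypothetical and the real Hadoop classes differ; the only point illustrated is that one subclass can allocate per output while another borrows from a reusable pool.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical base class: each kind of map output decides where bytes go.
abstract class MapOutputSketch {
    abstract void shuffle(InputStream in, int length) throws IOException;
    abstract void commit();   // data handed to the merge; release resources
    abstract int size();

    static void readFully(InputStream in, byte[] buf, int length) throws IOException {
        int done = 0;
        while (done < length) {
            int n = in.read(buf, done, length - done);
            if (n < 0) {
                throw new EOFException("stream ended after " + done + " bytes");
            }
            done += n;
        }
    }
}

// Allocates a fresh array for every map output.
class HeapMapOutputSketch extends MapOutputSketch {
    private byte[] data = new byte[0];

    @Override
    void shuffle(InputStream in, int length) throws IOException {
        data = new byte[length];
        readFully(in, data, length);
    }

    @Override
    void commit() {
        data = new byte[0];
    }

    @Override
    int size() {
        return data.length;
    }
}

// Hands out fixed-size buffers and counts how many it really allocated.
class BufferPoolSketch {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations;

    BufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    byte[] borrow() {
        byte[] buffer = free.poll();
        if (buffer == null) {
            allocations++;
            buffer = new byte[bufferSize];
        }
        return buffer;
    }

    void giveBack(byte[] buffer) {
        free.push(buffer);
    }

    int bufferSize() {
        return bufferSize;
    }

    int allocations() {
        return allocations;
    }
}

// Borrows a pooled buffer instead of allocating, then returns it on commit.
class PooledMapOutputSketch extends MapOutputSketch {
    private final BufferPoolSketch pool;
    private byte[] buffer;
    private int length;

    PooledMapOutputSketch(BufferPoolSketch pool) {
        this.pool = pool;
    }

    @Override
    void shuffle(InputStream in, int length) throws IOException {
        if (length > pool.bufferSize()) {
            throw new IllegalArgumentException("output larger than pooled buffer");
        }
        buffer = pool.borrow();
        readFully(in, buffer, length);
        this.length = length;
    }

    @Override
    void commit() {
        if (buffer != null) {
            pool.giveBack(buffer);
            buffer = null;
        }
    }

    @Override
    int size() {
        return length;
    }
}
```

In this sketch, shuffling several outputs one after another through {{PooledMapOutputSketch}} allocates a single buffer, whereas {{HeapMapOutputSketch}} allocates one array per output.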
[jira] [Commented] (MAPREDUCE-4808) Refactor MapOutput and MergeManager to facilitate reuse by Shuffle implementations
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13555602#comment-13555602 ]

Alejandro Abdelnur commented on MAPREDUCE-4808:

Assuming all concerns/questions have been addressed, can we move forward and commit the latest patch?