Author: tucu
Date: Wed Feb 6 19:52:25 2013
New Revision: 1443168

URL: http://svn.apache.org/viewvc?rev=1443168&view=rev
Log:
MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort. (tucu)

Added:
    hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
Modified:
    hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt

Modified: hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt?rev=1443168&r1=1443167&r2=1443168&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt Wed Feb 6 19:52:25 2013
@@ -230,6 +230,9 @@ Release 2.0.3-alpha - 2013-02-06
     MAPREDUCE-4971. Minor extensibility enhancements to Counters &
     FileOutputFormat. (Arun C Murthy via sseth)
 
+    MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort.
+    (tucu)
+
   OPTIMIZATIONS
 
     MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of

Added: hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm?rev=1443168&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm (added)
+++ hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm Wed Feb 6 19:52:25 2013
@@ -0,0 +1,96 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
+
+  \[ {{{./index.html}Go Back}} \]
+
+* Introduction
+
+  The pluggable shuffle and pluggable sort capabilities allow replacing the
+  built-in shuffle and sort logic with alternate implementations. Example use
+  cases are: using an application protocol other than HTTP, such as RDMA, for
+  shuffling data from the Map nodes to the Reducer nodes; or replacing the
+  sort logic with custom algorithms that enable hash aggregation and
+  Limit-N queries.
+
+  <<IMPORTANT:>> The pluggable shuffle and pluggable sort capabilities are
+  experimental and unstable. This means the provided APIs may change and break
+  compatibility in future versions of Hadoop.
+
+* Implementing a Custom Shuffle and a Custom Sort
+
+  A custom shuffle implementation requires a
+  <<<org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService>>>
+  implementation class running in the NodeManagers and a
+  <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> implementation class
+  running in the Reducer tasks.
+
+  The default implementations provided by Hadoop can be used as references:
+
+  * <<<org.apache.hadoop.mapred.ShuffleHandler>>>
+
+  * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
+
+  A custom sort implementation requires a <<<org.apache.hadoop.mapred.MapOutputCollector>>>
+  implementation class running in the Mapper tasks and (optionally, depending
+  on the sort implementation) a <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>>
+  implementation class running in the Reducer tasks.
+
+  The default implementations provided by Hadoop can be used as references:
+
+  * <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>>
+
+  * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
+
+* Configuration
+
+  Except for the auxiliary service running in the NodeManagers serving the
+  shuffle (by default the <<<ShuffleHandler>>>), all the pluggable components
+  run in the job tasks. This means they can be configured on a per-job basis.
+  The auxiliary service serving the shuffle must be configured in the
+  NodeManager configuration.
+
+** Job Configuration Properties (on a per-job basis):
+
+*--------------------------------------+---------------------+-----------------+
+| <<Property>>                         | <<Default Value>>   | <<Explanation>> |
+*--------------------------------------+---------------------+-----------------+
+| <<<mapreduce.job.reduce.shuffle.consumer.plugin.class>>> | <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>> | The <<<ShuffleConsumerPlugin>>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+| <<<mapreduce.job.map.output.collector.class>>> | <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>> | The <<<MapOutputCollector>>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+
+  These properties can also be set in <<<mapred-site.xml>>> to change the default values for all jobs.
+
+** NodeManager Configuration Properties, <<<yarn-site.xml>>> on all nodes:
+
+*--------------------------------------+---------------------+-----------------+
+| <<Property>>                         | <<Default Value>>   | <<Explanation>> |
+*--------------------------------------+---------------------+-----------------+
+| <<<yarn.nodemanager.aux-services>>> | <<<...,mapreduce.shuffle>>> | The auxiliary service name |
+*--------------------------------------+---------------------+-----------------+
+| <<<yarn.nodemanager.aux-services.mapreduce.shuffle.class>>> | <<<org.apache.hadoop.mapred.ShuffleHandler>>> | The auxiliary service class to use |
+*--------------------------------------+---------------------+-----------------+
+
+  <<IMPORTANT:>> If setting an auxiliary service in addition to the default
+  <<<mapreduce.shuffle>>> service, then a new service key should be added to the
+  <<<yarn.nodemanager.aux-services>>> property, for example <<<mapreduce.shufflex>>>.
+  Then the property defining the corresponding class must be
+  <<<yarn.nodemanager.aux-services.mapreduce.shufflex.class>>>.
+
\ No newline at end of file
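
For readers of this commit, the NodeManager settings documented in the new file could be written out roughly as follows. This is only a sketch of a yarn-site.xml fragment that registers a second shuffle service next to the default one. The service key mapreduce.shufflex is chosen to match the class property name given in the documentation; the class com.example.shuffle.CustomShuffleHandler is hypothetical and stands in for whatever AuxiliaryService implementation is deployed.

```xml
<!-- Sketch only: com.example.shuffle.CustomShuffleHandler is a hypothetical class name. -->
<configuration>
  <!-- Comma-separated list of auxiliary service keys: the default plus the new one. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle,mapreduce.shufflex</value>
  </property>
  <!-- Default shuffle service class, as listed in the documentation table. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- The class property name must embed the same key that was added to the list above. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shufflex.class</name>
    <value>com.example.shuffle.CustomShuffleHandler</value>
  </property>
</configuration>
```

A job that should use the additional service would presumably also set mapreduce.job.reduce.shuffle.consumer.plugin.class to a matching ShuffleConsumerPlugin implementation, either per job or in mapred-site.xml.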