[
https://issues.apache.org/jira/browse/MAPREDUCE-5643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
tang shanjiang updated MAPREDUCE-5643:
--------------------------------------
Attachment: DynamicMR_TCC_SupplementalMaterial.pdf
DynamicMR A Dynamic Slot Allocation Optimization Framework for
MapReduce Clusters.pdf
A technical report on DynamicMR
> DynamicMR: A Dynamic Slot Utilization Optimization Framework for Hadoop MRv1
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-5643
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5643
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: contrib/fair-share
> Affects Versions: 1.2.1
> Reporter: tang shanjiang
> Assignee: tang shanjiang
> Labels: performance
> Attachments: DynamicMR A Dynamic Slot Allocation Optimization
> Framework for MapReduce Clusters.pdf, DynamicMR-0.1.1-patch,
> DynamicMR_TCC_SupplementalMaterial.pdf, README
>
>
> Hadoop MRv1 uses a slot-based resource model with a static configuration
> of map/reduce slots. There is a strict utilization constraint: map tasks can
> only run on map slots, and reduce tasks can only run on reduce slots. Because
> of the rigid execution order between map and reduce tasks in a MapReduce
> environment, slots can be severely under-utilized, which significantly
> degrades performance.
> In contrast to YARN, which gives up the slot-based resource model and
> proposes a container-based model that maximizes resource utilization by
> ignoring the distinction between map and reduce tasks, we keep the slot-based
> model and propose a dynamic slot utilization optimization system called
> DynamicMR. It improves the performance of Hadoop by maximizing slot
> utilization as well as slot utilization efficiency while guaranteeing
> fairness across pools. DynamicMR consists of three scheduling components:
> Dynamic Hadoop Fair Scheduler (DHFS), Dynamic Speculative Task Scheduler
> (DSTS), and Data Locality Maximization Scheduler (DLMS).
> Our tests show that DynamicMR outperforms YARN for MapReduce workloads with
> multiple jobs, especially when the number of jobs is large. The explanation
> is that, for a fixed amount of resources, controlling the ratio of
> concurrently running map and reduce tasks performs better than leaving it
> uncontrolled. Without such control, too many reduce tasks can easily end up
> running at once, making the network a serious bottleneck. In YARN, both map
> and reduce tasks can run on any idle container, and there is no mechanism to
> control the ratio of resources allocated to map and reduce tasks. As a
> result, when there are pending reduce tasks, idle containers will most
> likely be taken by them. DynamicMR, on the other hand, follows the
> traditional slot-based model. In
> contrast to the 'hard' constraint of slot allocation, under which map slots
> must be allocated to map tasks and reduce slots to reduce tasks, DynamicMR
> obeys a 'soft' constraint that allows a map slot to be allocated to a reduce
> task and vice versa. However, whenever there are pending map tasks, map
> slots are given to map tasks first, and the rule is analogous for reduce
> tasks. This means that the traditional static map/reduce slot configuration
> still works in DynamicMR as a way to control the ratio of running map/reduce
> tasks. In comparison to YARN, which only maximizes resource utilization,
> DynamicMR can maximize slot utilization while dynamically controlling the
> ratio of running map/reduce tasks via the map/reduce slot configuration.
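
The 'soft' slot-allocation rule described in the issue can be illustrated
with a small sketch. This is not code from the DynamicMR patch: the names
`SlotAssignment` and `assign_slots` are hypothetical, and the model is
deliberately simplified to a single scheduling round on one pool of slots.
It ignores fairness across pools, speculative tasks, and data locality. It
only shows the priority rule: each slot type serves its own task type first,
and only slots left idle are lent to the other type.

```python
from dataclasses import dataclass


@dataclass
class SlotAssignment:
    """Counts of tasks placed on each slot type in one scheduling round."""
    maps_on_map_slots: int
    reduces_on_map_slots: int
    reduces_on_reduce_slots: int
    maps_on_reduce_slots: int

    @property
    def used_slots(self) -> int:
        return (self.maps_on_map_slots + self.reduces_on_map_slots
                + self.reduces_on_reduce_slots + self.maps_on_reduce_slots)


def assign_slots(map_slots: int, reduce_slots: int,
                 pending_maps: int, pending_reduces: int,
                 soft: bool = True) -> SlotAssignment:
    """Hypothetical single-round allocator (an illustration, not DynamicMR).

    soft=False models the 'hard' constraint: idle slots are never lent.
    soft=True models the 'soft' constraint: idle slots of one type may be
    borrowed by pending tasks of the other type.
    """
    # Each slot type always serves its own task type first.
    maps_on_map = min(map_slots, pending_maps)
    reduces_on_reduce = min(reduce_slots, pending_reduces)
    if not soft:
        return SlotAssignment(maps_on_map, 0, reduces_on_reduce, 0)

    # Only slots that are still idle can be borrowed by the other type.
    idle_map = map_slots - maps_on_map
    idle_reduce = reduce_slots - reduces_on_reduce
    reduces_on_map = min(idle_map, pending_reduces - reduces_on_reduce)
    maps_on_reduce = min(idle_reduce, pending_maps - maps_on_map)
    return SlotAssignment(maps_on_map, reduces_on_map,
                          reduces_on_reduce, maps_on_reduce)


# Map phase of a job: 10 pending maps, no reduces ready yet, 4 + 4 slots.
hard = assign_slots(4, 4, 10, 0, soft=False)
soft = assign_slots(4, 4, 10, 0, soft=True)
print(hard.used_slots, soft.used_slots)  # 4 8
```

When both task types have enough pending work to fill their own slots,
nothing is borrowed, so the configured map/reduce slot counts still bound the
ratio of running map and reduce tasks, as the description states.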