    [ https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Chen updated MAPREDUCE-5605:
---------------------------------

    Attachment:     (was: CacheOutputStream.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5605
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>         Environment: x86-64 Linux/Unix
> jdk7 preferred
>            Reporter: Ming Chen
>            Assignee: Ming Chen
>         Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java
>
>
> Memory is a key resource for bridging the performance gap between CPUs and
> I/O devices, so the idea is to maximize memory usage to relieve the I/O
> bottleneck. We developed a multi-threaded task execution engine that runs in
> a single JVM on each node. Within this engine we implemented a memory
> scheduling algorithm that provides global memory management. Building on it,
> we developed techniques such as sequential disk access and multi-cache, and
> we solved the problem of full garbage collection in the JVM. We conducted
> extensive experiments comparing the system against native Hadoop. The
> results show that the Mammoth system can reduce job execution time by more
> than 40% in typical cases, without requiring any modifications to Hadoop
> programs. When a system is short of memory, Mammoth can improve performance
> by up to 4 times, as observed for I/O-intensive applications such as
> PageRank.
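
The description does not spell out Mammoth's memory-scheduling algorithm. To illustrate the general idea of global memory management inside one multi-threaded JVM, here is a minimal Java sketch. It assumes a single fixed byte budget that all map and reduce task threads on the node share. A task reserves memory before buffering data. If the reservation times out, the task would fall back to spilling to disk. The names GlobalMemoryPool, reserve(), release() and available() are hypothetical. This simple blocking pool is not the scheduling policy used in the Mammoth patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch of a node-wide memory budget shared by all task threads
 * running in one JVM. GlobalMemoryPool, reserve() and release() are
 * illustrative names, not the API of the Mammoth patch.
 */
final class GlobalMemoryPool {
    private final long capacity;          // total bytes the pool may hand out
    private long used;                    // bytes currently held by tasks (guarded by lock)
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition freed = lock.newCondition();

    GlobalMemoryPool(long capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    /**
     * Waits up to the timeout for {@code bytes} to become free. Returns false
     * on timeout, in which case the caller could spill to disk instead.
     */
    boolean reserve(long bytes, long timeout, TimeUnit unit) throws InterruptedException {
        if (bytes <= 0 || bytes > capacity) throw new IllegalArgumentException("bad request: " + bytes);
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            // Loop guards against spurious wakeups and against other tasks winning the race.
            while (used + bytes > capacity) {
                if (nanos <= 0L) return false;
                nanos = freed.awaitNanos(nanos);   // returns the remaining wait time
            }
            used += bytes;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Returns memory to the pool and wakes every waiting task. */
    void release(long bytes) {
        lock.lock();
        try {
            if (bytes <= 0 || bytes > used) throw new IllegalArgumentException("bad release: " + bytes);
            used -= bytes;
            freed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    long available() {
        lock.lock();
        try {
            return capacity - used;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GlobalMemoryPool pool = new GlobalMemoryPool(100);
        System.out.println(pool.reserve(60, 0, TimeUnit.MILLISECONDS));   // true
        System.out.println(pool.reserve(60, 10, TimeUnit.MILLISECONDS));  // false: only 40 free

        // Another "task" frees its buffer shortly; the blocked reservation then succeeds.
        Thread other = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            pool.release(60);
        });
        other.start();
        System.out.println(pool.reserve(60, 5, TimeUnit.SECONDS));       // true
        other.join();
        System.out.println(pool.available());                            // 40
    }
}
```

A single lock-protected counter is the simplest possible design. A real scheduler would also need to decide which waiting task receives memory first, for example by task phase or priority. This sketch does not attempt that.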



--
This message was sent by Atlassian JIRA
(v6.1#6144)
