[ https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Chen updated MAPREDUCE-5605:
---------------------------------
    Attachment:     (was: JVMId.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5605
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>         Environment: x86-64 Linux/Unix
> jdk7 preferred
>            Reporter: Ming Chen
>            Assignee: Ming Chen
>         Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java,
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java,
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java,
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java,
> OutputCollector.java, OutputCommitter.java, OutputFormat.java,
> OutputLogFilter.java, Partitioner.java, RamManager.java,
> RawBufferedOutputStream.java, RawHistoryFileServlet.java,
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java,
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java,
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java,
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java,
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java,
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java,
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java,
> TaskTrackerAction.java, TaskTrackerInstrumentation.java,
> TaskTrackerStatus.java, TextOutputFormat.java
>
>
> Memory is a very important resource to bridge the gap between CPUs and I/O
> devices. So the idea is to maximize the usage of memory to solve the problem
> of I/O bottleneck. We developed a multi-threaded task execution engine, which
> runs in a single JVM on a node.
> In the execution engine, we have implemented a memory scheduling algorithm
> to realize global memory management, on top of which we developed techniques
> such as sequential disk access and multi-cache, and solved the problem of
> full garbage collection in the JVM. We have conducted extensive experiments
> comparing the system against the native Hadoop platform. The results show
> that our system, Mammoth, can reduce job execution time by more than 40% in
> typical cases, without requiring any modifications to Hadoop programs. When
> a system is short of memory, Mammoth can improve performance by up to 4
> times, as observed for I/O-intensive applications such as PageRank.



--
This message was sent by Atlassian JIRA
(v6.1#6144)