[ https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812644#comment-13812644 ]
Karthik Kambatla commented on MAPREDUCE-5605:
---------------------------------------------

For uploading a patch, it is recommended to create the diff using git or svn. For instance, I use
{noformat}
git diff --no-prefix <base-commit> <latest-commit>
{noformat}
The base commit could be the Hadoop version you used.

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5605
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>         Environment: x86-64 Linux/Unix
> jdk7 preferred
>            Reporter: Ming Chen
>            Assignee: Ming Chen
>         Attachments: OutputCollector.java, OutputCommitter.java,
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java,
> RawBufferedOutputStream.java, RawHistoryFileServlet.java,
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java,
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java,
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java,
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java,
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java
>
>
> Memory is a key resource for bridging the gap between CPUs and I/O
> devices, so the idea is to maximize memory usage to relieve the I/O
> bottleneck. We developed a multi-threaded task execution engine that
> runs in a single JVM on a node. In this execution engine, we implemented
> a memory scheduling algorithm to realize global memory management, on top
> of which we further developed techniques such as sequential disk access
> and multi-cache, and we solved the problem of full garbage collection in
> the JVM. We have conducted extensive experiments comparing our system
> against the native Hadoop platform.
> The results show that the Mammoth system can reduce
> job execution time by more than 40% in typical cases, without requiring
> any modifications to the Hadoop programs. When a system is short of memory,
> Mammoth can improve performance by up to 4 times, as observed for
> I/O-intensive applications such as PageRank.
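The issue description mentions a memory scheduling algorithm that provides global memory management for all task threads in one JVM, but does not say how it works. As a rough illustration only, here is a minimal Java sketch of one common pattern: a single shared memory budget that task threads reserve from and release back to. The class name GlobalMemoryManager and its methods are hypothetical. They are not taken from the attached RamManager.java or ReduceRamManager.java, and Mammoth's actual scheduler may work quite differently.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not from the attached patch): one memory budget
 * shared by every task thread running in a single JVM.
 */
final class GlobalMemoryManager {
    private final long capacityBytes;
    private long usedBytes = 0;

    GlobalMemoryManager(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Grants the request only if it fits right now; false lets the caller spill to disk instead. */
    synchronized boolean tryReserve(long bytes) {
        checkRequest(bytes);
        if (usedBytes + bytes > capacityBytes) {
            return false;
        }
        usedBytes += bytes;
        return true;
    }

    /** Blocks until the request fits or the timeout expires; returns whether it was granted. */
    synchronized boolean reserve(long bytes, long timeout, TimeUnit unit)
            throws InterruptedException {
        checkRequest(bytes);
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (usedBytes + bytes > capacityBytes) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Releases this object's monitor while waiting; woken by notifyAll() in release().
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
        usedBytes += bytes;
        return true;
    }

    /** Returns memory to the shared budget and wakes all waiting threads. */
    synchronized void release(long bytes) {
        checkRequest(bytes);
        if (bytes > usedBytes) {
            throw new IllegalStateException("releasing more than was reserved");
        }
        usedBytes -= bytes;
        notifyAll();
    }

    synchronized long available() {
        return capacityBytes - usedBytes;
    }

    // Rejects non-positive requests and requests that could never fit the budget.
    private void checkRequest(long bytes) {
        if (bytes <= 0 || bytes > capacityBytes) {
            throw new IllegalArgumentException("invalid request: " + bytes);
        }
    }
}
```

A task thread would call tryReserve before buffering output in memory and fall back to spilling to disk when it returns false, or call reserve with a timeout when it can afford to wait; each successful reservation must be matched by a release. A single monitor with notifyAll keeps the sketch simple, at the cost of waking every waiting thread on each release.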