[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2015-05-05 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5605:

Labels: BB2015-05-TBR  (was: )

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> 64-bit jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>  Labels: BB2015-05-TBR
> Fix For: 1.0.1
>
> Attachments: MAPREDUCE-5605-v1.patch, TR-mammoth-HUST.pdf, 
> hadoop-core-1.0.1-mammoth-0.9.0.jar
>
>
> Memory is a very important resource to bridge the gap between CPUs and I/O 
> devices. So the idea is to maximize the usage of memory to solve the problem 
> of I/O bottleneck. We developed a multi-threaded task execution engine, which 
> runs in a single JVM on a node. In the execution engine, we have implemented 
> the algorithm of memory scheduling to realize global memory management, based 
> on which we further developed the techniques such as sequential disk 
> accessing, multi-cache and solved the problem of full garbage collection in 
> the JVM. The benchmark results show that it can achieve an impressive
> improvement in typical cases. When a system is relatively short of memory
> (e.g., HPC, small- and medium-size enterprises), the improvement will be
> even more impressive.
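
As a rough illustration of the global memory management idea in the description above, the following Java sketch shows one memory budget shared by all task threads running in a single JVM. It is not taken from the Mammoth patch: the class name GlobalMemoryPool, its methods, and the reserve-or-spill policy are assumptions made only for this example.

```java
// Hypothetical illustration only; names and policy are not from the Mammoth patch.
final class GlobalMemoryPool {
    private final long capacityBytes; // total budget shared by every task thread
    private long usedBytes;           // bytes currently reserved by tasks

    GlobalMemoryPool(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    // Grants the request only if it fits in the remaining budget.
    // A caller that gets false would typically spill its buffer to disk instead.
    synchronized boolean tryReserve(long bytes) {
        if (bytes < 0 || usedBytes + bytes > capacityBytes) {
            return false;
        }
        usedBytes += bytes;
        return true;
    }

    // Returns previously reserved bytes to the shared budget.
    synchronized void release(long bytes) {
        usedBytes = Math.max(0, usedBytes - bytes);
    }

    // Reports how much of the budget is still unreserved.
    synchronized long availableBytes() {
        return capacityBytes - usedBytes;
    }
}
```

Because every method is synchronized, map and reduce tasks running as threads in the same JVM could share one pool instance instead of each holding a fixed private buffer. The actual scheduling algorithm is the one documented in the attached technical report, not this sketch.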



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2014-04-23 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: TR-mammoth-HUST.pdf

The design, evaluation and discoveries are included in the paper.

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> 64-bit jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Fix For: 1.0.1
>
> Attachments: MAPREDUCE-5605-v1.patch, TR-mammoth-HUST.pdf, 
> hadoop-core-1.0.1-mammoth-0.9.0.jar
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-11 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Tags: memory-centric multi-thread optimization task  (was: memory-centric 
muluti-thread optimization task)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> 64-bit jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Fix For: 1.0.1
>
> Attachments: MAPREDUCE-5605-v1.patch, 
> hadoop-core-1.0.1-mammoth-0.9.0.jar
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-05 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: hadoop-core-1.0.1-mammoth-0.9.0.jar

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> 64-bit jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Fix For: 1.0.1
>
> Attachments: MAPREDUCE-5605-v1.patch, 
> hadoop-core-1.0.1-mammoth-0.9.0.jar
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-05 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Fix Version/s: 1.0.1

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> 64-bit jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Fix For: 1.0.1
>
> Attachments: MAPREDUCE-5605-v1.patch
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-04 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Environment: 
x86-64 Linux/Unix
64-bit jdk7 preferred

  was:
x86-64 Linux/Unix
jdk7 preferred


> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> 64-bit jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MAPREDUCE-5605-v1.patch
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-04 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Description: Memory is a very important resource to bridge the gap between 
CPUs and I/O devices. So the idea is to maximize the usage of memory to solve 
the problem of I/O bottleneck. We developed a multi-threaded task execution 
engine, which runs in a single JVM on a node. In the execution engine, we have 
implemented the algorithm of memory scheduling to realize global memory 
management, based on which we further developed the techniques such as 
sequential disk accessing, multi-cache and solved the problem of full garbage 
collection in the JVM. The benchmark results show that it can achieve an 
impressive improvement in typical cases. When a system is relatively short of 
memory (e.g., HPC, small- and medium-size enterprises), the improvement will 
be even more impressive.  (was: Memory is a very important resource to bridge the gap 
between CPUs and I/O devices. So the idea is to maximize the usage of memory to 
solve the problem of I/O bottleneck. We developed a multi-threaded task 
execution engine, which runs in a single JVM on a node. In the execution 
engine, we have implemented the algorithm of memory scheduling to realize 
global memory management, based on which we further developed the techniques 
such as sequential disk accessing, multi-cache and solved the problem of full 
garbage collection in the JVM. We have conducted extensive experiments with 
comparison against the native Hadoop platform. The results show that the 
Mammoth system can reduce the job execution time by more than 40% in typical 
cases, without requiring any modifications of the Hadoop programs. When a 
system is short of memory, Mammoth can improve the performance by up to 4 
times, as observed for I/O intensive applications, such as PageRank. )

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MAPREDUCE-5605-v1.patch
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-04 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: MAPREDUCE-5605-v1.patch

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MAPREDUCE-5605-v1.patch
>
>
> Memory is a very important resource to bridge the gap between CPUs and I/O 
> devices. So the idea is to maximize the usage of memory to solve the problem 
> of I/O bottleneck. We developed a multi-threaded task execution engine, which 
> runs in a single JVM on a node. In the execution engine, we have implemented 
> the algorithm of memory scheduling to realize global memory management, based 
> on which we further developed the techniques such as sequential disk 
> accessing, multi-cache and solved the problem of full garbage collection in 
> the JVM. We have conducted extensive experiments with comparison against the 
> native Hadoop platform. The results show that the Mammoth system can reduce 
> the job execution time by more than 40% in typical cases, without requiring 
> any modifications of the Hadoop programs. When a system is short of memory, 
> Mammoth can improve the performance by up to 4 times, as observed for I/O 
> intensive applications, such as PageRank. 




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-04 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: release-1.0.1.patch

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-04 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: release-1.0.1.patch)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RunningJob.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: Task.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: SequenceFileOutputFormat.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: SpillScheduler.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RoundQueue.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: ReinitTrackerAction.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: OutputFormat.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: ReduceTaskStatus.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RecordReader.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: Partitioner.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RamManager.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RawKeyValueIterator.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RawHistoryFileServlet.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskInProgress.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: OutputLogFilter.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: RawBufferedOutputStream.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: ReduceRamManager.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: OutputCommitter.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskLog.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: ReduceTaskRunner.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: ReduceTask.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: OutputCollector.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskLogAppender.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskRunner.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MergeSorter.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskTracker.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskTrackerInstrumentation.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskStatus.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: Operation.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskLogServlet.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskReport.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskLogsTruncater.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskScheduler.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskMemoryManagerThread.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskTrackerAction.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: Merger.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: OutputCollector.java, OutputCommitter.java, 
> OutputFormat.java, OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapTaskRunner.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: JvmTask.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
>     jdk7 preferred
> Reporter: Ming Chen
> Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java
>
>




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapTask.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TaskTrackerStatus.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MemoryElement.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapRunner.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: TextOutputFormat.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapTaskStatus.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapRamManager.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: JvmManager.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapTaskCompletionEventsUpdate.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: MapOutputFile.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: MapTaskStatus.java, MemoryElement.java, 
> MergeSorter.java, Merger.java, Operation.java, OutputCollector.java, 
> OutputCommitter.java, OutputFormat.java, OutputLogFilter.java, 
> Partitioner.java, RamManager.java, RawBufferedOutputStream.java, 
> RawHistoryFileServlet.java, RawKeyValueIterator.java, RecordReader.java, 
> ReduceRamManager.java, ReduceTask.java, ReduceTaskRunner.java, 
> ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, 
> RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, 
> Task.java, TaskInProgress.java, TaskLog.java, TaskLogAppender.java, 
> TaskLogServlet.java, TaskLogsTruncater.java, TaskMemoryManagerThread.java, 
> TaskReport.java, TaskRunner.java, TaskScheduler.java, TaskStatus.java, 
> TaskTracker.java, TaskTrackerAction.java, TaskTrackerInstrumentation.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: TextOutputFormat.java
TaskTrackerStatus.java
TaskTrackerInstrumentation.java
TaskTrackerAction.java
TaskTracker.java
TaskStatus.java
TaskScheduler.java
TaskRunner.java
TaskReport.java
TaskMemoryManagerThread.java
TaskLogsTruncater.java
TaskLogServlet.java
TaskLogAppender.java
TaskLog.java
TaskInProgress.java
Task.java
SpillScheduler.java
SequenceFileOutputFormat.java
RunningJob.java
RoundQueue.java
ReinitTrackerAction.java
ReduceTaskStatus.java
ReduceTaskRunner.java
ReduceTask.java
ReduceRamManager.java
RecordReader.java
RawKeyValueIterator.java
RawHistoryFileServlet.java
RawBufferedOutputStream.java
RamManager.java
Partitioner.java
OutputLogFilter.java
OutputFormat.java
OutputCommitter.java
OutputCollector.java
Operation.java
MergeSorter.java
Merger.java
MemoryElement.java
MapTaskStatus.java
MapTaskRunner.java
MapTaskCompletionEventsUpdate.java
MapTask.java
MapRunner.java
MapRamManager.java
MapOutputFile.java
JvmTask.java
JvmManager.java
JVMId.java
JobTaskRunner.java
JobConf.java
IFile.java
DefaultJvmMemoryManager.java
ChildRamManager.java
Child.java
CachePool.java
CacheOutputStream.java
CacheFile.java

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: ChildRamManager.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java

[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: IFile.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: DefaultJvmMemoryManager.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: Child.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: CacheFile.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: CacheOutputStream.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: JobTaskRunner.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JvmManager.java, JvmTask.java, MapOutputFile.java, 
> MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: CachePool.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: JobConf.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-03 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Attachment: (was: JVMId.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen
> Attachments: JobTaskRunner.java, JvmManager.java, JvmTask.java, 
> MapOutputFile.java, MapRamManager.java, MapRunner.java, MapTask.java, 
> MapTaskCompletionEventsUpdate.java, MapTaskRunner.java, MapTaskStatus.java, 
> MemoryElement.java, MergeSorter.java, Merger.java, Operation.java, 
> OutputCollector.java, OutputCommitter.java, OutputFormat.java, 
> OutputLogFilter.java, Partitioner.java, RamManager.java, 
> RawBufferedOutputStream.java, RawHistoryFileServlet.java, 
> RawKeyValueIterator.java, RecordReader.java, ReduceRamManager.java, 
> ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, 
> ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, 
> SequenceFileOutputFormat.java, SpillScheduler.java, Task.java, 
> TaskInProgress.java, TaskLog.java, TaskLogAppender.java, TaskLogServlet.java, 
> TaskLogsTruncater.java, TaskMemoryManagerThread.java, TaskReport.java, 
> TaskRunner.java, TaskScheduler.java, TaskStatus.java, TaskTracker.java, 
> TaskTrackerAction.java, TaskTrackerInstrumentation.java, 
> TaskTrackerStatus.java, TextOutputFormat.java




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-02 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Status: Patch Available  (was: In Progress)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen




[jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

2013-11-02 Thread Ming Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Chen updated MAPREDUCE-5605:
-

Description: Memory is a very important resource to bridge the gap between 
CPUs and I/O devices. So the idea is to maximize the usage of memory to solve 
the problem of I/O bottleneck. We developed a multi-threaded task execution 
engine, which runs in a single JVM on a node. In the execution engine, we have 
implemented the algorithm of memory scheduling to realize global memory 
management, based on which we further developed the techniques such as 
sequential disk accessing, multi-cache and solved the problem of full garbage 
collection in the JVM. We have conducted extensive experiments with comparison 
against the native Hadoop platform. The results show that the Mammoth system 
can reduce the job execution time by more than 40% in typical cases, without 
requiring any modifications of the Hadoop programs. When a system is short of 
memory, Mammoth can improve the performance by up to 4 times, as observed for 
I/O intensive applications, such as PageRank.   (was: Memory is a very 
important resource to bridge the gap between
CPUs and I/O devices. So the idea is to maximize the usage of memory to solve 
the problem of I/O bottleneck. We developed a multi-threaded task execution 
engine, which runs in a single JVM on a node. In the execution engine, we have 
implemented the algorithm of memory scheduling to realize global memory 
management, based on which we further developed the techniques such as 
sequential disk accessing, multi-cache and solved the problem of full garbage 
collection in the JVM. We have conducted extensive experiments with comparison 
against the native Hadoop platform. The results show that the Mammoth system 
can reduce the job execution time by more than 40% in typical cases, without 
requiring any modifications of the Hadoop programs. When a system is short of 
memory, Mammoth can improve the performance by up to 4 times, as observed for 
I/O intensive applications, such as PageRank. )

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> ---
>
> Key: MAPREDUCE-5605
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.0.1
> Environment: x86-64 Linux/Unix
> jdk7 preferred
>Reporter: Ming Chen
>Assignee: Ming Chen



