[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

Hadoop QA (JIRA) Fri, 05 Sep 2014 20:12:07 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124284#comment-14124284
 ]


Hadoop QA commented on MAPREDUCE-2841:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12666979/mr-2841-merge-2.txt
  against trunk revision e6420fe.

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 71 new 
or modified test files.

    {color:red}-1 javac{color}.  The applied patch generated 1304 javac 
compiler warnings (more than the trunk's current 1264 warnings).

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.
        See 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 8 
release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-assemblies 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//console

This message is automatically generated.

> Task level native optimization
> ------------------------------
>
>                 Key: MAPREDUCE-2841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>         Environment: x86-64 Linux/Unix
>            Reporter: Binglin Chang
>            Assignee: Sean Zhong
>         Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, 
> MAPREDUCE-2841.v2.patch, MR-2841benchmarks.pdf, dualpivot-0.patch, 
> dualpivotv20-0.patch, fb-shuffle.patch, 
> hadoop-3.0-mapreduce-2841-2014-7-17.patch, micro-benchmark.txt, 
> mr-2841-merge-2.txt, mr-2841-merge.txt
>
>
> I have recently been working on a native optimization for MapTask based on JNI. 
> The basic idea is to add a NativeMapOutputCollector to handle the k/v pairs 
> emitted by the mapper, so that sort, spill, and IFile serialization can all be 
> done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed 
> promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string compare is 
> supported).
> 2. IFile serialization is about 3x as fast as Java, about 500MB/s. If hardware 
> CRC32C is used, things can get much faster (1G/
> 3. Merge code is not completed yet, so the test uses a large enough 
> io.sort.mb to prevent mid-spill.
> This leads to a total speedup of 2x~3x for the whole MapTask when 
> IdentityMapper (a mapper that does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are 
> supported, and I have not yet thought through many things, such as how to 
> support map-side combine. I discussed this with someone familiar with 
> Hive, and it seems these limitations won't be much of a problem for Hive to 
> benefit from these optimizations, at least. Advice or discussion about 
> improving compatibility is most welcome :) 
> Currently NativeMapOutputCollector has a static method called canEnable(), 
> which checks whether the key/value type, comparator type, and combiner are 
> all compatible; if so, MapTask can choose to enable NativeMapOutputCollector.
> This is only a preliminary test; more work needs to be done. I expect better 
> final results, and I believe similar optimizations can be applied to the 
> reduce task and shuffle too. 
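
The compatibility check described above can be illustrated with a minimal 
sketch. This is not the code from the patch: the class name 
NativeCollectorCheck, the string-based signature, and the convention that a 
null comparator or combiner means "none configured" are all assumptions made 
for illustration. It only mirrors the stated limitations (Text and 
BytesWritable only, binary compare only, no map-side combine).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a canEnable()-style check. The names and the
 * signature are assumptions for illustration, not the patch's actual API.
 */
final class NativeCollectorCheck {
    // Per the description, only Text and BytesWritable are supported.
    private static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.io.BytesWritable"));

    private NativeCollectorCheck() {
    }

    /**
     * Returns true only if every setting is one the native path could handle.
     *
     * @param keyClass        fully qualified map output key class name
     * @param valueClass      fully qualified map output value class name
     * @param comparatorClass custom comparator class name, or null if none
     *                        is configured (assumed to mean binary compare)
     * @param combinerClass   combiner class name, or null if none is configured
     */
    static boolean canEnable(String keyClass, String valueClass,
                             String comparatorClass, String combinerClass) {
        if (!SUPPORTED_TYPES.contains(keyClass)
                || !SUPPORTED_TYPES.contains(valueClass)) {
            return false; // unsupported key or value type
        }
        if (comparatorClass != null) {
            return false; // only binary string compare is supported
        }
        if (combinerClass != null) {
            return false; // map-side combine is not handled yet
        }
        return true;
    }
}
```

Under these assumptions, a job with Text keys, BytesWritable values, no 
custom comparator, and no combiner would pass the check, and any other 
combination would fall back to the existing Java collector.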



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
