[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786814#action_12786814
 ] 

Todd Lipcon commented on MAPREDUCE-1270:
----------------------------------------

This is pretty interesting. How are you implementing TaskUmbilicalProtocol?

> Hadoop C++ Extention
> --------------------
>
>                 Key: MAPREDUCE-1270
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.20.1
>         Environment:  hadoop linux
>            Reporter: Wang Shouyan
>
>   Hadoop C++ extension is an internal project in baidu, We start it for these 
> reasons:
>    1  To provide C++ API. We mostly use Streaming before, and we also try to 
> use PIPES, but we do not find PIPES is more efficient than Streaming. So we 
> think a new C++ extention is needed for us.
>    2  Even using PIPES or Streaming, it is hard to control memory of hadoop 
> map/reduce Child JVM.
>    3  It costs so much to read/write/sort TB/PB data by Java. When using 
> PIPES or Streaming, pipe or socket is not efficient to carry so huge data.
>    What we want to do: 
>    1 We do not use map/reduce Child JVM to do any data processing, which just 
> prepares environment, starts C++ mapper, tells mapper which split it should  
> deal with, and reads report from mapper until that finished. The mapper will 
> read record, ivoke user defined map, to do partition, write spill, combine 
> and merge into file.out. We think these operations can be done by C++ code.
>    2 Reducer is similar to mapper, it was started after sort finished, it 
> read from sorted files, ivoke user difined reduce, and write to user defined 
> record writer.
>    3 We also intend to rewrite shuffle and sort with C++, for efficience and 
> memory control.
>    at first, 1 and 2, then 3.  
>    What's the difference with PIPES:
>    1 Yes, We will reuse most PIPES code.
>    2 And, We should do it more completely, nothing changed in scheduling and 
> management, but everything in execution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to