[ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833445#action_12833445
 ] 

Owen O'Malley commented on MAPREDUCE-326:
-----------------------------------------

@Tom: Your interface would be perfectly implementable on *top* of the context 
object API with very little overhead or work. I haven't seen *any* 
motivation to introduce a new API below the user-level ones. If you moved your 
API into o.a.h.mapreduce.lib.raw and marked it unstable, framework writers 
could experiment with it.
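
As a rough illustration of the layering described above, here is a minimal, 
self-contained Java sketch. Every type in it (Context, RawMapper, RawCollector, 
ListContext) is a hypothetical stand-in invented for this example. None of them 
is a real Hadoop class, and this is not the API from the attached patch. The 
sketch only shows that a byte-oriented mapper can be driven by a short adapter 
written against a typed context object, assuming the context is instantiated 
with byte[] keys and values.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawOnContextSketch {

    /** Hypothetical stand-in for a typed, user-level context object. */
    interface Context<K, V> {
        boolean nextKeyValue();
        K getCurrentKey();
        V getCurrentValue();
        void write(K key, V value);
    }

    /** Hypothetical byte-oriented collector handed to a raw mapper. */
    interface RawCollector {
        void collect(byte[] key, byte[] value);
    }

    /** Hypothetical byte-oriented mapper, in the spirit of a "raw" API. */
    interface RawMapper {
        void map(byte[] key, byte[] value, RawCollector out);
    }

    /** The adapter: runs a RawMapper using nothing but the typed context. */
    static void runRaw(RawMapper mapper, Context<byte[], byte[]> context) {
        RawCollector out = context::write;
        while (context.nextKeyValue()) {
            mapper.map(context.getCurrentKey(), context.getCurrentValue(), out);
        }
    }

    /** In-memory context used only to exercise the adapter. */
    static final class ListContext implements Context<byte[], byte[]> {
        private final List<byte[][]> input;
        private final List<String> output = new ArrayList<>();
        private int pos = -1;

        ListContext(List<byte[][]> input) { this.input = input; }

        public boolean nextKeyValue() { return ++pos < input.size(); }
        public byte[] getCurrentKey() { return input.get(pos)[0]; }
        public byte[] getCurrentValue() { return input.get(pos)[1]; }
        public void write(byte[] key, byte[] value) {
            output.add(utf8(key) + "=" + utf8(value));
        }
        List<String> output() { return output; }
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String utf8(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    /** Feeds two (offset, line) records through a word-splitting raw mapper. */
    public static List<String> demo() {
        List<byte[][]> input = new ArrayList<>();
        input.add(new byte[][] { bytes("0"), bytes("a b") });
        input.add(new byte[][] { bytes("4"), bytes("a") });
        ListContext context = new ListContext(input);

        // Emits (word, "1") for every space-separated word in the value.
        RawMapper wordSplitter = (key, value, out) -> {
            for (String word : utf8(value).split(" ")) {
                out.collect(bytes(word), bytes("1"));
            }
        };
        runRaw(wordSplitter, context);
        return context.output();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a=1, b=1, a=1]
    }
}
```

In this sketch the adapter is only the few lines of runRaw. Whether a real 
implementation would be as cheap depends on how the framework hands bytes to 
the context, which the sketch does not model.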

@Chris Dyer: Pipes and streaming certainly need a major pass to improve 
their performance, although benchmarks have shown that for sort, which is 
the worst case, Pipes' performance is comparable to Java's. Pipes *would* get 
much easier if it moved to the context object API. Moving to Tom's API wouldn't 
help at all over the context object API. 

To avoid the needless serialization, Pipes applications should be using 
SequenceFileAsBinaryInputFormat (and SequenceFileAsBinaryOutputFormat). That 
said, to maximize compatibility with Java, Pipes applications are allowed to 
use any input or output format.

> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>         Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf
>
>
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to 
> use arbitrary types complicate the design and lead to lots of object creation 
> and other overhead that a byte oriented design would not suffer.  I believe 
> the lowest level implementation of Hadoop map-reduce should have byte string 
> oriented APIs (for keys and values).  This API would be more performant, 
> simpler and easier to use across languages.
> The existing API could be maintained as a thin layer on top of the leaner API.

