[jira] Commented: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

Owen O'Malley (JIRA) Tue, 16 Feb 2010 10:41:53 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834370#action_12834370 ]


Owen O'Malley commented on MAPREDUCE-326:
-----------------------------------------

{quote}
This seems very similar to what I suggested yesterday. Is there a notable 
difference?
{quote}

Yes. To implement yours, a serializer needs to write into a ByteBuffer so that
it can hand it to the framework. So yours is a complicated way of implementing
my write(int, ByteBuffer, ByteBuffer). The advantages of my
RawKeyValueOutputStream are that:
* the serializer can write directly to it without first putting the bytes into
  a ByteBuffer
* the mapper doesn't need to pre-declare the sizes of the key and value.
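To make the two advantages concrete, here is a minimal sketch of a stream-style output. Only the name `RawKeyValueOutputStream` comes from the comment; its methods were not specified, so `startValue()`, `finishRecord()`, the in-memory buffering, and the `SerializerDemo` helper are hypothetical. What it tries to show: an ordinary `java.io` serializer writes straight into the stream, and the key and value lengths are measured afterwards instead of being declared up front.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: only the class name comes from the discussion; the
// startValue()/finishRecord() markers and the buffering are assumptions.
class RawKeyValueOutputStream extends OutputStream {
  // All key and value bytes written so far, back to back.
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  // One {keyStart, valueStart, end} triple of offsets into data per record.
  private final List<int[]> records = new ArrayList<>();
  private int recordStart = 0;
  private int valueStart = -1;

  @Override
  public void write(int b) {
    data.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) {
    data.write(b, off, len);
  }

  /** Ends the key; bytes written from now on belong to the value. */
  public void startValue() {
    valueStart = data.size();
  }

  /** Ends the record. Key and value lengths are measured, never declared up front. */
  public void finishRecord() {
    if (valueStart < 0) {
      throw new IllegalStateException("startValue() was not called");
    }
    records.add(new int[] {recordStart, valueStart, data.size()});
    recordStart = data.size();
    valueStart = -1;
  }

  public int recordCount() { return records.size(); }

  public int keyLength(int i) { int[] r = records.get(i); return r[1] - r[0]; }

  public int valueLength(int i) { int[] r = records.get(i); return r[2] - r[1]; }
}

class SerializerDemo {
  // An ordinary java.io serializer writes straight into the raw stream.
  // DataOutputStream does no buffering, so every byte has reached `out`
  // by the time startValue()/finishRecord() are called.
  static RawKeyValueOutputStream emitOne() {
    RawKeyValueOutputStream out = new RawKeyValueOutputStream();
    DataOutputStream serializer = new DataOutputStream(out);
    try {
      serializer.writeUTF("apple"); // key: 2-byte length prefix + 5 bytes
      out.startValue();
      serializer.writeLong(42L);    // value: 8 bytes
      out.finishRecord();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }
}
```

Here the key ends up 7 bytes long and the value 8, and neither number had to be known before the serializer ran. With `write(int, ByteBuffer, ByteBuffer)`, by contrast, the same serializer would first have to produce complete buffers for both.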

If the goal is to help non-Java frameworks, the best choice is
write(int, ByteBuffer, ByteBuffer), because they can just pass the
DirectByteBuffers that they read from the underlying stream. If the goal is to
enable other object serialization models, some variant of the
RawKeyValueOutputStream makes sense, because they can use the stream to
serialize the objects. Since I think that

{quote}
Many folks do seem to have expressed interest in this approach.
{quote}
I disagree. Each of them has a concrete goal, and none of those goals is met by
adding new abstraction levels.
* Joydeep said he wants sort on output, which is being addressed elsewhere.
* Chris Dyer wants efficient pipes, which needs only the raw write.
* Eric14 is primarily motivated by simplifying APIs and avoiding buffer copies,
  both of which argue against adding new levels of abstraction.

I'm not against adding the new method to MapContext and a raw map/reduce API
in a contrib module; that will let us build experience with it. I am strongly
against adding a new level of abstraction at this point.
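The raw method and the point about non-Java frameworks passing through DirectByteBuffers can be sketched together. `RawOutput` is a hypothetical stand-in for the proposed MapContext method. Reading the `int` argument as a partition number, and the length-prefixed framing of the incoming stream, are both assumptions. Each key and value handed on is a `slice()` view over the input buffer, so no record bytes are copied before the framework sees them.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the proposed raw write on MapContext.
// Treating the int as a partition number is an assumption.
interface RawOutput {
  void write(int partition, ByteBuffer key, ByteBuffer value);
}

// Assumed wire framing, repeated until the buffer is exhausted:
// [int partition][int keyLen][key bytes][int valueLen][value bytes]
final class FramedRelay {
  private FramedRelay() {}

  /** Hands each framed record to out as zero-copy views; returns the record count. */
  static int relay(ByteBuffer in, RawOutput out) {
    int count = 0;
    while (in.remaining() >= 4) {
      int partition = in.getInt();
      ByteBuffer key = view(in, in.getInt());
      ByteBuffer value = view(in, in.getInt());
      out.write(partition, key, value);
      count++;
    }
    return count;
  }

  // Returns a view of the next len bytes and advances past them.
  // A slice shares the backing memory, and a slice of a direct buffer is direct.
  private static ByteBuffer view(ByteBuffer in, int len) {
    ByteBuffer v = in.slice();
    v.limit(len);
    in.position(in.position() + len);
    return v;
  }
}
```

The design choice this illustrates is that the relay never interprets the bytes: a Pipes-style bridge that already holds the data in a direct buffer can forward it as-is.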


> The lowest level map-reduce APIs should be byte oriented
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>         Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf
>
>
> As discussed here:
> https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
> The templates, serializers and other complexities that allow map-reduce to
> use arbitrary types complicate the design and lead to many object creations
> and other overhead that a byte-oriented design would not suffer. I believe
> the lowest level implementation of Hadoop map-reduce should have byte-string
> oriented APIs (for keys and values). This API would be more performant,
> simpler and easier to use across languages.
> The existing API could be maintained as a thin layer on top of the leaner API.
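The idea of keeping the typed API as a thin layer over a byte-oriented one can be sketched as a small adapter. `RawSink`, `TypedCollector` and `Serializers` are hypothetical names. The `Function<T, ByteBuffer>` serializers are a deliberate simplification, not Hadoop's real `Serializer` interface. Partitioning by key hash uses the same formula as Hadoop's `HashPartitioner`. The sketch only shows that each typed `emit(key, value)` reduces to one raw byte-level write.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical byte-oriented lowest layer: keys and values are just bytes.
interface RawSink {
  void write(int partition, ByteBuffer key, ByteBuffer value);
}

// Hypothetical thin typed layer built entirely on top of RawSink.
final class TypedCollector<K, V> {
  private final RawSink sink;
  private final Function<K, ByteBuffer> keySerializer;
  private final Function<V, ByteBuffer> valueSerializer;
  private final int numPartitions;

  TypedCollector(RawSink sink, Function<K, ByteBuffer> keySerializer,
                 Function<V, ByteBuffer> valueSerializer, int numPartitions) {
    this.sink = sink;
    this.keySerializer = keySerializer;
    this.valueSerializer = valueSerializer;
    this.numPartitions = numPartitions;
  }

  /** Serializes both objects and forwards the bytes in a single raw write. */
  void emit(K key, V value) {
    // Same formula as Hadoop's HashPartitioner: non-negative hash mod partitions.
    int partition = (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    sink.write(partition, keySerializer.apply(key), valueSerializer.apply(value));
  }
}

// Hypothetical example serializers for the sketch.
final class Serializers {
  private Serializers() {}

  static ByteBuffer utf8(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
  }

  static ByteBuffer int32(Integer i) {
    // Absolute put leaves position at 0, so all 4 bytes remain readable.
    return ByteBuffer.allocate(4).putInt(0, i);
  }
}
```

For example, emitting `("apple", 42)` produces one raw write with a 5-byte key and a 4-byte value. All type-specific work stays above the raw layer.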

-- 
This message is automatically generated by JIRA.
