[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934624#action_12934624
 ] 

Konstantin Boudnik commented on HADOOP-6685:
--------------------------------------------

bq. However, in the case of a PB serialization, for example, the PB library is 
not used in Hadoop except in the serialization code for serializing the user's 
data type. So it's a user-level concern, and should be compiled as such - 
putting it in core Hadoop is asking for trouble in the future, since the Hadoop 
releases won't keep pace with the union of PB, Thrift, and Avro releases. 
These serialization plugins should be standalone, or at least easily 
re-compilable in a way that doesn't involve recompiling all of Hadoop, such as 
a contrib module. The user just treats the plugin JAR as another code 
dependency.

+1 on Tom's point: having a variety of serialisation frameworks in a product is 
a good thing, as long as it doesn't come at the cost of the mess they might 
cause if their public APIs start deviating in a way that forces core Hadoop to 
change just to keep user applications working. Testing those is another 
matter: if Hadoop claims to support something explicitly, somebody needs to 
make the effort to guarantee that it actually does.

Having a clean abstraction for serialisation, with frameworks pluggable as the 
user wishes, sounds like a reasonable compromise.
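
To make the "clean abstraction plus pluggable frameworks" idea concrete, here 
is a minimal sketch in Java. Every name in it (SerializationPlugin, 
Utf8StringPlugin, PluginRegistry) is hypothetical and is not Hadoop's actual 
serialization API. It only assumes that core code sees a small interface and 
loads the concrete plugin by class name, so a PB, Thrift, or Avro binding could 
live in its own JAR and be rebuilt without recompiling core.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plugin contract; NOT the real Hadoop serialization interface.
interface SerializationPlugin<T> {
    String name();
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

// Stand-in for a plugin that would ship in a separate JAR
// (for example a PB, Thrift, or Avro binding).
final class Utf8StringPlugin implements SerializationPlugin<String> {
    public String name() { return "utf8-string"; }

    public byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Core-side registry: it depends only on the interface above,
// never on a concrete serialization framework.
final class PluginRegistry {
    private final Map<String, SerializationPlugin<?>> plugins = new HashMap<>();

    // Load a plugin reflectively by class name, so that core has no
    // compile-time dependency on the plugin's library.
    void register(String className) {
        try {
            SerializationPlugin<?> plugin = (SerializationPlugin<?>)
                Class.forName(className).getDeclaredConstructor().newInstance();
            plugins.put(plugin.name(), plugin);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load plugin " + className, e);
        }
    }

    // The caller is responsible for asking for the right value type.
    @SuppressWarnings("unchecked")
    <T> SerializationPlugin<T> get(String name) {
        SerializationPlugin<?> plugin = plugins.get(name);
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin registered as " + name);
        }
        return (SerializationPlugin<T>) plugin;
    }
}
```

With this shape, a change in a framework's public API would be absorbed by the 
plugin alone, and the test burden for that framework sits with whoever ships 
the plugin.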

> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization-specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).
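
The proposal in the issue summary above can be pictured with a small Java 
sketch. It is only an illustration under stated assumptions: the names 
OpaqueSerialization, ClassNameSerialization, and MetadataStore are 
hypothetical and are not taken from the attached patches. The point it shows 
is that the serialization encodes its own metadata, while the framework stores 
and passes along a byte array it never interprets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical base class: each serialization reads and writes its own
// metadata, instead of exposing it as a Map<String,String>.
abstract class OpaqueSerialization {
    abstract void writeMetadata(DataOutput out) throws IOException;
    abstract void readMetadata(DataInput in) throws IOException;
}

// Example serialization whose private metadata is a class name and a version.
final class ClassNameSerialization extends OpaqueSerialization {
    String className = "";
    int version = 0;

    @Override
    void writeMetadata(DataOutput out) throws IOException {
        out.writeUTF(className);
        out.writeInt(version);
    }

    @Override
    void readMetadata(DataInput in) throws IOException {
        className = in.readUTF();
        version = in.readInt();
    }
}

// Framework side: it only ever handles the opaque blob.
final class MetadataStore {
    static byte[] toBytes(OpaqueSerialization serialization) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            serialization.writeMetadata(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static void fromBytes(OpaqueSerialization serialization, byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            serialization.readMetadata(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is that the framework can no longer inspect individual settings 
the way it could with string key/value pairs; in exchange, a serialization is 
free to change its metadata layout without touching the shared interface.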

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
