[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932396#action_12932396
 ] 

Tom White commented on HADOOP-6685:
-----------------------------------

Here's my feedback on the patch:

# I think the new serializations should be optional dependencies. Mandating a 
particular version of Thrift, Protocol Buffers, and Avro is going to cause 
problems for folks down the line, since we would be tying the version to 
Hadoop's release cycle, which is infrequent. Making the serializations 
libraries (or contrib modules, as in MAPREDUCE-376, MAPREDUCE-377) keeps them 
independent of that cycle, and will make it easier to support the version of 
the serialization library the user wants.
# I preferred the version where the Serialization could choose the way it 
serialized itself. In the current patch, if you wrote Avro data in a 
SequenceFile you would have Writables for the file container, and a PB-encoded 
Avro schema for the serialization. Having so many serialization mechanisms is 
potentially brittle.
# I'm not sure we need the full generality of PB for serializing serializations. 
If the serialization could choose its self-serialization mechanism, then 
TypedSerialization could just write its type as a Text object. Doing this 
would remove the core dependency on PB, and allow point 1 (optional 
dependencies).
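
To make point 3 concrete, here is a minimal sketch of a serialization storing 
its own metadata as nothing more than a type name. It is not code from the 
patch: the class TypeNameMetadata and its write/read methods are hypothetical, 
and java.io's writeUTF/readUTF stand in for Hadoop's Text so that the example 
runs without Hadoop on the classpath (the byte layout therefore differs from 
what Text would produce).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of the patch or of Hadoop.
final class TypeNameMetadata {
    private TypeNameMetadata() {}

    // Encode the metadata of a type-based serialization: just the type name.
    // writeUTF is an assumed stand-in for writing a Text object.
    static byte[] write(String typeName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(typeName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decode the type name back out of the opaque metadata bytes.
    static String read(byte[] metadata) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(metadata))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The container only ever sees the byte[]; how those bytes are produced stays 
private to the serialization, so no PB dependency is needed in core for this 
case.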


> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
