[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932396#action_12932396 ]
Tom White commented on HADOOP-6685:
-----------------------------------

Here's my feedback on the patch:

# I think the new serializations should be optional dependencies. Mandating a particular version of Thrift, Protocol Buffers, and Avro is going to cause problems for folks down the line, because it ties those versions to Hadoop's infrequent release cycle. Making the serializations separate libraries (or contrib modules, as in MAPREDUCE-376 and MAPREDUCE-377) makes them independent and will make it easier to support whichever version of the serialization library the user wants.
# I preferred the version where the Serialization could choose the way it serialized itself. In the current patch, if you wrote Avro data to a SequenceFile, you would have Writables for the file container and a PB-encoded Avro schema for the serialization. Having so many serialization mechanisms is potentially brittle.
# I'm not sure we need the full generality of PB for serializing serializations. If the serialization could choose its own self-serialization mechanism, then TypedSerialization could just write its type as a Text object. Doing this would remove the core dependency on PB and make the first point (optional dependencies) possible.

> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-6685
> URL: https://issues.apache.org/jira/browse/HADOOP-6685
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization-specific configuration.
> Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc. (MAPREDUCE-1183).
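To make the combination of the issue's proposal (configuration as an opaque binary blob) and the third review point (TypedSerialization writing its type as a Text-like string, with no PB involved) concrete, here is a minimal sketch. It is not Hadoop's API: `SelfDescribingSerialization`, `serializeSelf`, `deserializeSelf` and this `TypedSerialization` class are hypothetical names chosen for illustration. The type name is written as a 4-byte length followed by UTF-8 bytes, a simplified stand-in for Hadoop's `Text`, which uses a variable-length int for the length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class OpaqueMetadataSketch {

    /** Hypothetical interface: each serialization owns the format of its own metadata bytes. */
    interface SelfDescribingSerialization {
        byte[] serializeSelf() throws IOException;
        void deserializeSelf(byte[] metadata) throws IOException;
    }

    /**
     * Hypothetical TypedSerialization whose only state is the name of the type it handles.
     * The name is written as a fixed 4-byte length plus UTF-8 bytes (a simplified
     * stand-in for Hadoop's Text, which uses a variable-length int for the length).
     */
    static class TypedSerialization implements SelfDescribingSerialization {
        private String typeName;

        TypedSerialization() {}

        TypedSerialization(Class<?> type) {
            this.typeName = type.getName();
        }

        String getTypeName() {
            return typeName;
        }

        @Override
        public byte[] serializeSelf() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                byte[] utf8 = typeName.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length); // length prefix
                out.write(utf8);           // type name itself
            }
            return bytes.toByteArray();
        }

        @Override
        public void deserializeSelf(byte[] metadata) throws IOException {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(metadata))) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                typeName = new String(utf8, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A container (e.g. a file header) would store these bytes without interpreting them.
        byte[] blob = new TypedSerialization(String.class).serializeSelf();

        // Only the serialization itself knows how to read its metadata back.
        TypedSerialization restored = new TypedSerialization();
        restored.deserializeSelf(blob);
        System.out.println(restored.getTypeName() + " (" + blob.length + " bytes)");
    }
}
```

Running `main` prints `java.lang.String (20 bytes)`. The design point is that the framework only passes `byte[]` around, so a container such as a SequenceFile would not need to know whether a given serialization encodes its state as a string, an Avro schema, or anything else.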