[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931595#action_12931595 ]
Doug Cutting commented on HADOOP-6685:
--------------------------------------

> That support is much easier if the metadata for each serialization is in a
> separate structure and not dumped into the Configuration.

Got it. Thanks for clarifying.

As I've commented earlier in this issue, I prefer the use of simple textual formats (properties, XML, JSON, etc.) for metadata and configuration data, as in HTTP, SMTP, and most config file formats, rather than binary data. Such textual formats seem to me to be more natural when bootstrapping interoperable systems. Metadata and configuration data are not usually performance or size sensitive, the normal motivation for the use of binary.

> Providing customer choice over the serialization is much richer than forcing
> them into a single one.

I agree that we should provide a general-purpose API that does not force a particular serialization, but we should encourage a primary serialization to provide better interoperability.

> Any file format that only supports one serialization doesn't meet my needs.

I certainly don't think we should mandate a single file format, and we don't at present. But I think we should focus our support around a single format. A format that contains multiple serializations is harder to support across multiple languages and greatly increases the chance that you'll have data that cannot be processed by another system. As an existence proof, Google seems to get a lot of mileage with a single preferred serialization.

Thanks for responding to my concerns. I am -0 on this patch as currently implemented: I think we could do better but I will not block progress.
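To make the trade-off between textual and binary metadata concrete, here is a minimal Java sketch. It is not taken from the patch or from Hadoop: the class `MetadataSketch`, its methods, and the sample keys are all hypothetical, and the text encoding assumes keys and values contain no `=` or newline characters. It shows the same metadata stored once as `key=value` text, which any language can read with a line splitter, and once as an opaque binary blob, which only code that knows the field order can decode.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class MetadataSketch {

    // Textual form: one "key=value" line per entry, sorted by key.
    // Assumes keys and values contain no '=' or newline characters.
    static String encodeText(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Parses the textual form back into a sorted map.
    static Map<String, String> decodeText(String text) {
        Map<String, String> result = new TreeMap<>();
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip blank or malformed lines
            }
            result.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return result;
    }

    // Opaque form: the serialization picks its own binary layout.
    // Here that layout is a length-prefixed name followed by a 4-byte version.
    static byte[] encodeOpaque(String name, int version) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            out.writeInt(version);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only code that knows the layout above can make sense of the blob.
    static String decodeOpaque(byte[] blob) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
            String name = in.readUTF();
            int version = in.readInt();
            return name + " v" + version;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new TreeMap<>();
        metadata.put("name", "example");
        metadata.put("version", "1");
        System.out.print(encodeText(metadata));
        System.out.println(decodeOpaque(encodeOpaque("example", 1)));
    }
}
```

The textual form can be inspected by eye and parsed from any language, while the binary form leaves the layout entirely to the serialization that wrote it. That is the interoperability concern raised in the comment, and equally the encapsulation the issue description argues for.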

> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.