[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934009#action_12934009 ]
Arun C Murthy commented on HADOOP-6685:
---------------------------------------

{quote}
> So, the patch, as it stands allows SequenceFiles to use the new serialization
> framework i.e. adds a feature. Are you against this feature? Can you please
> explain why?

Yes, I am against this feature. I've explained why several times above, and will try again now. Creating new concrete data formats that are functionally equivalent to other concrete formats decreases ecosystem interoperability, flexibility and maintainability. Above I cited the Dremel paper, whose section 2 outlines a scenario that they argue is only possible because all of the systems involved share a single common serialization and file format.
{quote}

Thanks for laying it out again.

Your objections seem very unreasonable to me.

I understand you prefer to have a single Avro-based data format, but Hadoop is a software framework used by many people and organizations. People and organizations already have data in different formats.

Dremel is implemented and used by a single organization that has a specific technical and historical context. What they need and use isn't something everyone on the planet can use.

Hadoop as a framework should not be in the business of dictating formats. We should facilitate and encourage users and organizations to use interoperable formats, not necessarily the *one* format.

IAC, this seems like a discussion which belongs elsewhere - I just don't see how blocking a feature is useful. You can ask for it to be done in a separate jira, which we can do, but this specific objection of yours is very unreasonable, IMO.

> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch,
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.