[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934627#action_12934627 ]
Steve Loughran commented on HADOOP-6685:
----------------------------------------

I've been doing lots of JSON work recently, net.sf.jsonobject, gson, jackson, etc: so many parsers, so many other dependencies. Any of those that tries to reimplement the dream of WS-* (seamless serialisation between native objects) is repeating the same mistakes. But it's good for shoving stuff around, serving up over HTTP, parsing in different languages. Compared to JSON, the fact that Xerces ships on all Hadoop-compatible JVMs gives XML an edge, one that DOM takes away in its pain of use.

I'm =0 on it internally. Less painful than XML, but the extra dependencies and the time I waste converting between different Java models of the graph hurt. And like XML, you end up escaping and base-64-ing stuff.

The ASF would veto any release of Hadoop that depended on an unreleased in-incubation artifact. This would complicate any plan to branch to 0.22, or at least release it, unless the build file was set up to exclude thrift-specific code. But if HDFS already depends on that, that's something in the schedule plans anyway, and Hadoop core + hdfs will depend on a specific thrift version.

+1 to Tom's suggestion of keeping PB off in a contrib package, the same for thrift if HDFS can remove its dependencies.

=0 to binary config data vs map<string, string>. Binary is efficient but brittle, map easier to debug. Question is, what would the performance cost of staying in string maps be?
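
To make the trade-off concrete, here is a minimal Java sketch of what "staying in string maps" means when a serialization's metadata is binary: the bytes have to be Base64-encoded into a string value and decoded again on the other side. This is an illustration only, not code from any of the attached patches; the class name, the key name and the sample bytes are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: carrying binary serialization metadata in a
// Map<String,String> versus keeping it as an opaque byte[].
final class ConfigEncodingSketch {
    // Assumed key name, invented for this example.
    static final String SCHEMA_KEY = "example.serialization.schema";

    // String-map route: binary metadata must be Base64-encoded to fit a String value.
    static Map<String, String> toStringMap(byte[] schema) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SCHEMA_KEY, Base64.getEncoder().encodeToString(schema));
        return conf;
    }

    // Reading it back costs a decode on every access.
    static byte[] fromStringMap(Map<String, String> conf) {
        return Base64.getDecoder().decode(conf.get(SCHEMA_KEY));
    }

    public static void main(String[] args) {
        byte[] schema = {0x00, 0x01, (byte) 0xFF, 0x7F};
        Map<String, String> conf = toStringMap(schema);
        byte[] back = fromStringMap(conf);

        // The map form is printable, which is what makes it easy to debug.
        System.out.println("as map: " + conf);
        System.out.println("round-trips: " + Arrays.equals(schema, back));
        // Base64 inflates the payload by roughly a third, before any per-key overhead.
        System.out.println("raw bytes: " + schema.length + ", encoded bytes: "
                + conf.get(SCHEMA_KEY).getBytes(StandardCharsets.US_ASCII).length);
    }
}
```

For small metadata the encode/decode step is probably noise; whether it matters at scale depends on how often the configuration is parsed, which is the open question above.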
> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for the serialization specific configuration. Since this data is really internal to the specific serialization, I think we should change it to be an opaque binary blob. This will simplify the interface for defining specific serializations for different contexts (MAPREDUCE-1462). It will also move us toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.