[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934627#action_12934627
 ] 

Steve Loughran commented on HADOOP-6685:
----------------------------------------

I've been doing lots of JSON work recently: net.sf.jsonobject, gson, jackson, 
etc. So many parsers, so many other dependencies. Any of those that try to 
reimplement the dream of WS-* (seamless serialisation between native objects) 
are repeating the same mistakes. But JSON is good for shoving stuff around, 
serving it up over HTTP, and parsing in different languages. Xerces ships on 
all Hadoop-compatible JVMs, which gives XML an edge, one that DOM takes away 
in its pain of use. I'm =0 on JSON internally. It's less painful than XML, but 
the extra dependencies and the time I waste converting between different Java 
models of the graph hurt. And like XML, you end up escaping and base-64-ing 
stuff. 
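
To make the base-64 point concrete, here is a minimal sketch using only the 
JDK's java.util.Base64. The class and method names are hypothetical and not 
taken from Hadoop or any of the JSON libraries above; it only shows that raw 
bytes have to be re-encoded as text (growing by roughly a third) before they 
can sit in a JSON or XML string value.

```java
import java.util.Base64;

// Hypothetical sketch: names are illustrative, not from Hadoop or any JSON library.
class Base64FieldSketch {

    // Encode raw bytes so they can be carried inside a text-only value.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Recover the original bytes from the text form.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x01, (byte) 0xFF};
        String text = encode(raw);
        // Three raw bytes become four characters of text.
        System.out.println(raw.length + " bytes -> " + text.length() + " chars: " + text);
    }
}
```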

The ASF would veto any release of Hadoop that depended on an unreleased 
in-incubation artifact. This would complicate any plan to branch to 0.22, or at 
least release it, unless the build file was set up to exclude thrift-specific 
code. But if HDFS already depends on that, it's something in the schedule plans 
anyway, and Hadoop core + HDFS will depend on a specific Thrift version.

+1 to Tom's suggestion of keeping PB off in a contrib package, and the same for 
Thrift if HDFS can remove its dependencies. 

=0 to binary config data vs map<string, string>. Binary is efficient but 
brittle; a map is easier to debug. The question is, what would the performance 
cost of staying in string maps be?
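
A rough sketch of the trade-off, assuming nothing about the actual patches: 
the class below is hypothetical, the keys and values are made up, and the 
encoding (an entry count followed by UTF key/value pairs) is just one 
arbitrary way to build a blob. It holds the same metadata both as a string 
map, which prints readably, and as opaque bytes, which need the matching 
decoder before anyone can inspect them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from Hadoop or the attached patches.
class ConfigEncodingSketch {

    // Pack string metadata into an opaque blob: entry count, then key/value pairs.
    static byte[] toBytes(Map<String, String> conf) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(conf.size());
            for (Map.Entry<String, String> e : conf.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Unpack the blob; this only works if the reader knows the exact layout.
    static Map<String, String> fromBytes(byte[] blob) {
        Map<String, String> conf = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                conf.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new TreeMap<>();
        conf.put("example.serialization", "example.SomeSerialization"); // made-up key/value
        conf.put("example.schema", "{\"type\":\"string\"}");            // made-up key/value

        // The map form can be dumped straight into a log.
        System.out.println("as map:   " + conf);

        // The blob form is opaque without the decoder above.
        byte[] blob = toBytes(conf);
        System.out.println("as bytes: " + blob.length + " opaque bytes");
        System.out.println("round trip equal: " + fromBytes(blob).equals(conf));
    }
}
```

The sketch says nothing about performance; measuring the cost of string maps 
would need a benchmark against real job configurations.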



> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
