[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935580#action_12935580
 ] 

Ryan Holmes commented on HADOOP-6685:
-------------------------------------

bq. Avro is already a dependency. Thrift is already a dependency for HDFS (see 
HDFS-1484). I'm only adding ProtocolBuffers, which is a commonly used 
serialization format that many users including me find extremely useful.

This line of reasoning is overly general: it could be used to justify adding 
literally any dependency (dependency x already exists, so it's OK to add y). 

Hadoop should focus on providing a pluggable API for serialization rather than 
providing specific internal implementations (optional implementations would be 
fine). I also think Hadoop will benefit greatly in the long term by promoting 
a single, default serialization and file format for new users. I was under the 
impression that this was a shared goal and that the chosen format was Avro. 
Adding a direct dependency on Protocol Buffers and increasing the scope of 
the dependency on Thrift seem to directly contradict that goal.
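
To make "pluggable API with optional implementations" concrete, here is a rough 
sketch of the shape I have in mind. Every name in it (Serializer, Registry, 
Utf8StringSerializer) is hypothetical and invented for illustration; this is 
not Hadoop's existing serialization interface and not what the attached patches 
do. The core depends only on the interface, and a format is looked up by name 
at runtime.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class PluggableSerializationSketch {

    /** Hypothetical plug-in point: the only type the core would depend on. */
    interface Serializer<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    /** One optional implementation; the core never references it by type. */
    static final class Utf8StringSerializer implements Serializer<String> {
        @Override
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical registry: implementations are selected by name. */
    static final class Registry<T> {
        private final Map<String, Serializer<T>> byName = new HashMap<>();

        void register(String name, Serializer<T> serializer) {
            byName.put(name, serializer);
        }

        Serializer<T> lookup(String name) {
            Serializer<T> serializer = byName.get(name);
            if (serializer == null) {
                throw new IllegalArgumentException("No serializer registered as: " + name);
            }
            return serializer;
        }
    }

    public static void main(String[] args) {
        Registry<String> registry = new Registry<>();
        registry.register("utf8", new Utf8StringSerializer());

        // The caller only sees the Serializer interface.
        Serializer<String> s = registry.lookup("utf8");
        System.out.println(s.deserialize(s.serialize("hello")));
    }
}
```

Running main prints hello. In a layout like this, an Avro, Thrift, or Protocol 
Buffers implementation could live in an optional module and register itself 
under its own name, so the core would carry no compile-time dependency on any 
of them.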

bq. In MAPREDUCE-980, you took out the custom JSON parser and replaced it with 
calls into Avro. Using ProtoBuf is efficient and meant that I wrote 2 lines of 
code. If I used JSON, I would need to write a parser and printer.

Can't you use Jackson, which is already a dependency? 


> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
