[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935580#action_12935580 ]
Ryan Holmes commented on HADOOP-6685:
-------------------------------------

bq. Avro is already a dependency. Thrift is already a dependency for HDFS (see HDFS-1484). I'm only adding ProtocolBuffers, which is a commonly used serialization format that many users including me find extremely useful.

This line of reasoning is overly general and could be used to support the addition of literally any dependency (i.e. dependency x already exists, so it's OK to add y). Hadoop should focus on providing a pluggable API for serialization rather than providing specific internal implementations (optional implementations would be fine).

I also think Hadoop will benefit greatly in the long term by promoting a single, default serialization and file format for new users. I was under the impression that this was a shared goal and that the chosen format was Avro. Adding a direct dependency on Protocol Buffers and increasing the scope of dependency on Thrift seems to directly contradict that goal.

bq. In MAPREDUCE-980, you took out the custom JSON parser and replaced it with calls into Avro. Using ProtoBuf is efficient and meant that I wrote 2 lines of code. If I used JSON, I would need to write a parser and printer.

Can't you use Jackson, which is already a dependency?

> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch,
>                      serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration.
> Since this data is really internal to the specific serialization, I think we
> should change it to be an opaque binary blob. This will simplify the
> interface for defining specific serializations for different contexts
> (MAPREDUCE-1462). It will also move us toward having serialized objects for
> Mappers, Reducers, etc (MAPREDUCE-1183).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.