[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965119#action_12965119 ]
Owen O'Malley commented on HADOOP-6685:
---------------------------------------

{quote}
I don't yet see an advantage for opaque binary.
{quote}

The two different solutions that have been proposed for serialization metadata are:
* string to string map (HADOOP-6165, HADOOP-6120, HADOOP-6323, HADOOP-6443, MAPREDUCE-1126)
** type unsafe - users may put the wrong type of value into a slot
** unchecked keys - users may misspell a key and get the default value by mistake
** complete visibility - details of implementation are completely visible to user and impossible to change
* opaque blob (HADOOP-6685)
** may be encoded as binary or text
** given a versioned format (ProtoBuf, JSON, Thrift, XML), is completely extensible
** since interface is via API
*** it is type-safe
*** all of the setters and getters are checked for validity by the compiler
*** can specify the visibility to the user
*** it can be easily documented via javadoc

In both cases, the metadata is specific to the serialization and can't be interpreted without reference to the corresponding serialization.

{quote}
A common, transparent configuration data format will simplify the creation of configuration editing tools.
{quote}

Since the metadata is specific to each serialization, there are no common interfaces to support those editing tools. So the string to string maps give the appearance of a common format, but without the semantics it isn't possible to write tools to edit it. In both implementations, it is easy to write dumpers.

{quote}
As we consider replacing serialization metadata we should probably look for a solution that's appropriate for replacing all configuration data
{quote}

When MAPREDUCE-1183 is done, there is very little need for the string to string map of Configuration. We will have it for a long time to support old applications, but users won't need it.
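
To make the comparison of the two approaches above concrete, here is a rough sketch in plain Java. It is not taken from the attached patches: the class {{ExampleMetadata}}, the key {{buffer.size}}, and the semicolon-separated text encoding are all made up for illustration. A real serialization would more likely use one of the versioned formats listed above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MetadataSketch {

    // Approach 1: string to string map. Neither the key nor the value type is checked.
    static int bufferSizeFromMap(Map<String, String> conf) {
        // A misspelled key on the writer's side silently yields this default.
        return Integer.parseInt(conf.getOrDefault("buffer.size", "4096"));
    }

    // Approach 2: hypothetical typed metadata whose encoding is hidden behind an API.
    static final class ExampleMetadata {
        private static final int VERSION = 1;
        private int bufferSize = 4096;
        private boolean compressed = false;

        ExampleMetadata setBufferSize(int bufferSize) {
            this.bufferSize = bufferSize;
            return this;
        }

        ExampleMetadata setCompressed(boolean compressed) {
            this.compressed = compressed;
            return this;
        }

        int getBufferSize() {
            return bufferSize;
        }

        boolean isCompressed() {
            return compressed;
        }

        // The framework would only ever see these bytes; the layout stays private.
        byte[] toBytes() {
            String text = VERSION + ";" + bufferSize + ";" + compressed;
            return text.getBytes(StandardCharsets.UTF_8);
        }

        static ExampleMetadata fromBytes(byte[] blob) {
            String[] parts = new String(blob, StandardCharsets.UTF_8).split(";");
            if (Integer.parseInt(parts[0]) != VERSION) {
                throw new IllegalArgumentException("unsupported version " + parts[0]);
            }
            return new ExampleMetadata()
                    .setBufferSize(Integer.parseInt(parts[1]))
                    .setCompressed(Boolean.parseBoolean(parts[2]));
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("bufer.size", "8192"); // typo in the key still compiles
        System.out.println(bufferSizeFromMap(conf)); // prints 4096, the default

        byte[] blob = new ExampleMetadata().setBufferSize(8192).setCompressed(true).toBytes();
        ExampleMetadata meta = ExampleMetadata.fromBytes(blob);
        System.out.println(meta.getBufferSize() + " " + meta.isCompressed()); // prints 8192 true
    }
}
```

With the map, the typo is only discovered at runtime, if at all. With the typed class, a misspelled setter or a value of the wrong type fails to compile, and the byte layout can change behind {{toBytes}}/{{fromBytes}} without touching callers.
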
It would be nice to move the servers away from the current XML encoded string to string configurations, but that is *way* outside the scope of this jira.

> Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).