[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966667#action_12966667
 ] 

Doug Cutting commented on HADOOP-6685:
--------------------------------------

> We don't have one right now. We have XML and JSON. Neither is user-friendly.

We don't currently use JSON for configuration data.  Today we use 
Map<String,String> as the configuration data model.  This is usually serialized 
as XML and sometimes in other forms (e.g., inside a SequenceFile).  The 
simplicity of this model permits differing serializations without significant 
loss of transparency or interoperability.  This model interoperates well with 
Java properties, including system properties, with environment variables, etc.  
Adding a prefix to keys has been demonstrated to be an effective, if 
inelegant, way to implement nesting in this model.  This model does not easily 
map to objects, nor does it provide any type support.
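
To illustrate the prefix convention, here is a minimal sketch in plain Java. 
It is not Hadoop code: the class name PrefixNesting, the helper subMap, and 
the keys and class names in main are all hypothetical, chosen only to show 
how a flat Map<String,String> can carry a nested group of settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixNesting {
    /**
     * Returns the entries of conf whose keys start with prefix + ".",
     * with that prefix stripped. Hypothetical helper for illustration.
     */
    static Map<String, String> subMap(Map<String, String> conf, String prefix) {
        String dotted = prefix + ".";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(dotted)) {
                result.put(e.getKey().substring(dotted.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys and values below are made up for the example.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("serialization.key.class", "org.example.KeyType");
        conf.put("serialization.value.class", "org.example.ValueType");
        conf.put("io.buffer.size", "4096");

        // Only the "serialization." entries are selected, prefix removed.
        System.out.println(subMap(conf, "serialization"));
    }
}
```

Every value stays a String, so the caller still has to parse numbers or 
class names itself, which is the lack of type support noted above.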

If we wish to use a more complex data model, that's nestable, that's more 
strongly typed and that can be easily mapped to objects, then a standard 
serialization, like JSON or YAML, is a good way to still ensure transparency 
and interoperability.

YAML could work well as a data model.  Nesting YAML requires adjusting 
indentation, while JSON permits simple string appends to nest.  But if a Java 
API like YamlBeans is used, then indentation would be handled automatically.
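
The difference in nesting effort can be sketched with plain string handling. 
This is an illustration only, under simplifying assumptions: the key needs no 
escaping, the JSON fragment is already valid, and the YAML fragment is 
block-style text with "\n" line endings.  The names NestingDemo, nestJson and 
nestYaml are hypothetical, and no YAML or JSON library is involved.

```java
public class NestingDemo {
    /** Nests a JSON document under a key by simple string concatenation. */
    static String nestJson(String key, String json) {
        return "{\"" + key + "\": " + json + "}";
    }

    /** Nests a block-style YAML document under a key: every line is re-indented. */
    static String nestYaml(String key, String yaml) {
        StringBuilder sb = new StringBuilder(key).append(":\n");
        for (String line : yaml.split("\n")) {
            sb.append("  ").append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(nestJson("outer", "{\"a\": 1}"));
        System.out.print(nestYaml("outer", "a: 1\nb: 2"));
    }
}
```

The JSON case leaves the inner text untouched, while the YAML case has to 
rewrite each line, which is the work a library would otherwise do.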

If we can read/write YAML, what reason is there to support arbitrary binary 
configuration data?

> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
