[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969031#action_12969031
 ] 

Allen Wittenauer commented on HADOOP-6685:
------------------------------------------

> works against integrating external configuration systems with existing 
> components

I'm thinking of when we are past the limitations of the existing components.
What if we don't pass files around for configuration information at all?  In
that case, does it still make sense to require that everything be
representable as a UTF-16 string?  I don't think it does.

> Do we have much binary configuration data?

Given that it is currently impossible, the answer is obviously no.

But this seems like a major flaw of the existing system.  Who are we to dictate 
what the user can/can't put in what is essentially a private part of the 
configuration name space?  Hadoop as a framework shouldn't care what the 
representation of that value is if it doesn't have to read it.  If I want to 
build a mass document-signing system and provide the binary representation 
of the CA cert as a configuration option to my serializer, why shouldn't I be 
able to do that?  If I want to work in UTF-32 and pass information as a config 
option to my serializer, why shouldn't I be able to do that?
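
As a minimal sketch of why a string-only value is a problem for the CA cert
case, the Java below pushes a few raw bytes through a Map<String,String>
without any encoding step. The class name, the helper names, and the key
"my.serializer.ca.cert" are hypothetical, and the four bytes are only an
assumed stand-in for the start of a DER-encoded certificate. Nothing here is
Hadoop API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StringConfLossSketch {
    // Hypothetical helper: store raw bytes by decoding them as UTF-8 text.
    public static void putNaive(Map<String, String> conf, String key, byte[] value) {
        // Malformed byte sequences are replaced with U+FFFD here.
        conf.put(key, new String(value, StandardCharsets.UTF_8));
    }

    // Hypothetical helper: read the value back as bytes.
    public static byte[] getNaive(Map<String, String> conf, String key) {
        return conf.get(key).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed sample data: not valid UTF-8 on its own.
        byte[] raw = {0x30, (byte) 0x82, 0x03, (byte) 0xE8};

        Map<String, String> conf = new HashMap<>();
        putNaive(conf, "my.serializer.ca.cert", raw);
        byte[] back = getNaive(conf, "my.serializer.ca.cert");

        System.out.println("round trip intact: " + Arrays.equals(raw, back));
    }
}
```

The bytes that come back differ from the ones that went in, so binary data
cannot live in a string-valued map unless it is first encoded as text.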

Now one could argue that I could base64-encode my data or do the wacky !!binary 
thing that YAML does.  (JSON doesn't support binary, so to me, that instantly 
eliminates it.  Even crusty X.500 supports binary!  ... and XML... well, you 
all know how I feel about it. *smile*)  But why should I take a performance 
hit to support my use case?
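
To make the base64 workaround and its cost concrete, here is a rough sketch
using java.util.Base64 from the standard library (Java 8 and later). The
class name, helper names, configuration key, and the 3000-byte payload are
all assumptions made up for illustration; none of this is part of the Hadoop
serialization framework.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class Base64ConfSketch {
    // Hypothetical helper: encode bytes as base64 text so they fit in a String value.
    public static void putBinary(Map<String, String> conf, String key, byte[] value) {
        conf.put(key, Base64.getEncoder().encodeToString(value));
    }

    // Hypothetical helper: decode the base64 text back into the original bytes.
    public static byte[] getBinary(Map<String, String> conf, String key) {
        return Base64.getDecoder().decode(conf.get(key));
    }

    public static void main(String[] args) {
        // Assumed payload: 3000 bytes of arbitrary binary data.
        byte[] raw = new byte[3000];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }

        Map<String, String> conf = new HashMap<>();
        putBinary(conf, "my.serializer.ca.cert", raw);

        String stored = conf.get("my.serializer.ca.cert");
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("encoded chars: " + stored.length());
        System.out.println("round trip intact: "
            + Arrays.equals(raw, getBinary(conf, "my.serializer.ca.cert")));
    }
}
```

The round trip is lossless, but every 3 bytes of input become 4 characters of
text (3000 bytes turn into 4000 characters here), and each write and read
pays for an extra encode or decode pass.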

I don't see the value in supporting the existing system when it has what I 
would say is a major flaw.

> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: serial.patch, serial4.patch, serial6.patch, 
> serial7.patch, serial9.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
