[ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934009#action_12934009
 ] 

Arun C Murthy commented on HADOOP-6685:
---------------------------------------

{quote}
> So, the patch, as it stands allows SequenceFiles to use the new serialization 
> framework i.e. adds a feature. Are you against this feature? Can you please 
> explain why?

Yes, I am against this feature. I've explained why several times above, and 
will try again now. Creating new concrete data formats that are functionally 
equivalent to other concrete formats decreases ecosystem interoperability, 
flexibility and maintainability. Above I cited the Dremel paper, whose section 
2 outlines a scenario that they argue is only possible because all of the 
systems involved share a single common serialization and file format.
{quote}

Thanks for laying it out again.

Your objections seem very unreasonable to me. 

I understand you prefer to have a single Avro-based data format, but Hadoop is 
a software framework used by many people and organizations. People and 
organizations already have data in different formats. 

Dremel is implemented and used by a single organization that has a specific 
technical and historical context. What they need and use isn't something 
everyone on the planet can use. 

Hadoop as a framework should not be in the business of dictating formats. 

We should facilitate and encourage users and organizations to use 
inter-operable formats, not necessarily the *one* format.

In any case, this seems like a discussion that belongs elsewhere - I just 
don't see how blocking a feature is useful. You can ask for it to be done in a 
separate jira, which we can do, but this specific objection of yours is very 
unreasonable, IMO.

> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.22.0
>
>         Attachments: libthrift.jar, serial.patch, serial4.patch, 
> serial6.patch, serial7.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
