[jira] Commented: (HADOOP-6685) Change the generic serialization framework API to use serialization-specific bytes instead of Map<String,String> for configuration

Owen O'Malley (JIRA) Fri, 12 Nov 2010 16:04:47 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931588#action_12931588
 ]


Owen O'Malley commented on HADOOP-6685:
---------------------------------------

{quote}
Owen, thanks for the slides
{quote}

You're welcome. Everyone had seen them before, but I wanted to make sure they 
were easily available for this conversation.

{quote}
I don't see a direct relation between this issue and the issue of simplifying 
the implementation of efficient map-side joins (MAPREDUCE-1183, more or less). 
Am I missing the connection, or is this a distinct issue?
{quote}

It is related because we want to support context-specific serializations. That 
support is much easier if the metadata for each serialization is in a separate 
structure and not dumped into the Configuration. This is the same problem that 
comes from MAPREDUCE-1183 for InputFormats, Mappers, etc. They are similar 
issues, and it would be nice to have a consistent solution.
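
As a rough illustration of that distinction, here is a minimal Java sketch. 
None of these names come from Hadoop or from serial.patch: 
SerializationContext, SerializationMetadata, and JobSerializations are 
hypothetical and exist only to show the shape of the idea. Each context owns 
its own metadata object instead of sharing flat keys in one job-wide 
configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical contexts in which a job might need a distinct serialization.
enum SerializationContext { MAP_OUTPUT_KEY, MAP_OUTPUT_VALUE, REDUCE_OUTPUT_VALUE }

// Hypothetical per-serialization metadata: a label for the serialization plus
// whatever bytes that serialization needs (a schema, a class name, ...).
final class SerializationMetadata {
    final String serializationName;
    private final byte[] opaque;

    SerializationMetadata(String serializationName, byte[] opaque) {
        this.serializationName = serializationName;
        this.opaque = opaque.clone(); // defensive copy
    }

    byte[] opaque() { return opaque.clone(); }
}

// One metadata object per context, kept apart from the general job settings.
final class JobSerializations {
    private final Map<SerializationContext, SerializationMetadata> byContext =
        new EnumMap<>(SerializationContext.class);

    void set(SerializationContext ctx, SerializationMetadata md) {
        byContext.put(ctx, md);
    }

    SerializationMetadata get(SerializationContext ctx) {
        SerializationMetadata md = byContext.get(ctx);
        if (md == null) {
            throw new IllegalStateException("no serialization set for " + ctx);
        }
        return md;
    }
}

class ContextDemo {
    public static void main(String[] args) {
        JobSerializations job = new JobSerializations();
        // The labels and payloads below are made up for the example.
        job.set(SerializationContext.MAP_OUTPUT_KEY,
                new SerializationMetadata("writable",
                        "org.example.MyKey".getBytes(StandardCharsets.UTF_8)));
        job.set(SerializationContext.MAP_OUTPUT_VALUE,
                new SerializationMetadata("avro",
                        "{\"type\":\"string\"}".getBytes(StandardCharsets.UTF_8)));
        for (SerializationContext ctx : new SerializationContext[] {
                SerializationContext.MAP_OUTPUT_KEY,
                SerializationContext.MAP_OUTPUT_VALUE}) {
            System.out.println(ctx + " -> " + job.get(ctx).serializationName);
        }
    }
}
```

With this layout, two contexts can use the same serialization with different 
metadata without inventing a new family of configuration key names for each 
context.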

{quote}
File formats are forever.
{quote}

I'm adding no new file formats. I'm just making the ones that we've had for 
years more useful.

{quote}
We badly need to add support for a higher-level object serialization system 
than Writable.
{quote}

I obviously agree enough that I'm working on supporting it. Giving customers a 
choice of serialization is much richer than forcing them into a single one. 
Each serialization makes different design decisions; by making the choice 
pluggable, the *user* can decide. I understand that you want Avro everywhere. 
Other users have other priorities.

{quote}
But I'm not convinced it's wise to add such support to the existing Java-only 
container file formats.
{quote}

I'm supporting the containers we have. I'd love for someone to implement 
SequenceFiles or TFiles in C. That is an orthogonal issue. Any file format that 
only supports one serialization doesn't meet my needs.

This change should have no impact on any current applications. Very few of them 
depend on the serialization library directly. My hope is that by extending the 
library and supporting a wider range of serializations, users will be able to 
code their applications using the types that *they* find convenient.

> Change the generic serialization framework API to use serialization-specific 
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for 
> the serialization specific configuration. Since this data is really internal 
> to the specific serialization, I think we should change it to be an opaque 
> binary blob. This will simplify the interface for defining specific 
> serializations for different contexts (MAPREDUCE-1462). It will also move us 
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).
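
A minimal Java sketch of the idea in the description above, under stated 
assumptions: the interface OpaqueSerialization, its method names, and the toy 
ToyStringSerialization class are all hypothetical and are not the API in 
serial.patch. The framework would store and ship the metadata as bytes without 
interpreting them; only the serialization itself knows how to read them back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical interface: metadata travels as an opaque blob rather than
// as Map<String,String> entries that the framework can see.
interface OpaqueSerialization<T> {
    byte[] serializeSelf();                 // metadata -> opaque bytes
    void deserializeSelf(byte[] metadata);  // opaque bytes -> metadata
    byte[] serialize(T value);
    T deserialize(byte[] data);
}

// Toy serialization for strings whose only metadata is the charset name.
final class ToyStringSerialization implements OpaqueSerialization<String> {
    private Charset charset;

    ToyStringSerialization() { this(StandardCharsets.UTF_8); }

    ToyStringSerialization(Charset charset) { this.charset = charset; }

    @Override
    public byte[] serializeSelf() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(charset.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    @Override
    public void deserializeSelf(byte[] metadata) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(metadata))) {
            charset = Charset.forName(in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] serialize(String value) { return value.getBytes(charset); }

    @Override
    public String deserialize(byte[] data) { return new String(data, charset); }
}

class OpaqueMetadataDemo {
    public static void main(String[] args) {
        // Writer side: configure the serialization, then export its metadata.
        ToyStringSerialization writer =
            new ToyStringSerialization(StandardCharsets.UTF_16BE);
        byte[] blob = writer.serializeSelf();
        byte[] data = writer.serialize("hello");

        // Reader side: rebuild the serialization from the blob alone.
        ToyStringSerialization reader = new ToyStringSerialization();
        reader.deserializeSelf(blob);
        System.out.println(reader.deserialize(data));
    }
}
```

The reader never needs to know which keys the serialization uses internally; 
it only passes the blob back to the serialization that produced it.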

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
