[ https://issues.apache.org/jira/browse/HADOOP-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931588#action_12931588 ]
Owen O'Malley commented on HADOOP-6685:
---------------------------------------

{quote}
Owen, thanks for the slides
{quote}

You're welcome. Everyone had seen them before, but I wanted to make sure they were easily available for this conversation.

{quote}
I don't see a direct relation between this issue and the issue of simplifying the implementation of efficient map-side joins (MAPREDUCE-1183, more or less). Am I missing the connection, or is this a distinct issue?
{quote}

It is related because we want to support context-specific serializations. That support is much easier if the metadata for each serialization is in a separate structure and not dumped into the Configuration. This is the same problem that comes from MAPREDUCE-1183 for InputFormats, Mappers, etc. They are similar issues and it would be nice to have a consistent solution.

{quote}
File formats are forever.
{quote}

I'm adding no new file formats. I'm just making the ones that we've had for years more useful.

{quote}
We badly need to add support for a higher-level object serialization system than Writable.
{quote}

I obviously agree enough that I'm working on supporting it. Giving users a choice of serialization is much richer than forcing them into a single one. Each serialization makes different design decisions; by making the choice pluggable, the *user* can decide. I understand that you want Avro everywhere. Other users have other priorities.

{quote}
But I'm not convinced it's wise to add such support to the existing Java-only container file formats.
{quote}

I'm supporting the containers we have. I'd love for someone to implement SequenceFiles or TFiles in C. That is an orthogonal issue. Any file format that only supports one serialization doesn't meet my needs.

This change should have no impact on any current applications. Very few of them depend on the serialization library directly.
My hope is that by extending the library and supporting a wider range of serializations, users will be able to code their applications using the types that *they* find convenient.

> Change the generic serialization framework API to use serialization-specific
> bytes instead of Map<String,String> for configuration
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-6685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6685
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: serial.patch, SerializationAtSummit.pdf
>
>
> Currently, the generic serialization framework uses Map<String,String> for
> the serialization specific configuration. Since this data is really internal
> to the specific serialization, I think we should change it to be an opaque
> binary blob. This will simplify the interface for defining specific
> serializations for different contexts (MAPREDUCE-1462). It will also move us
> toward having serialized objects for Mappers, Reducers, etc (MAPREDUCE-1183).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
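
As a rough illustration of the change proposed in the issue description, the Java sketch below shows a serialization that encodes its own metadata as an opaque byte array instead of exposing it as Map<String,String>. It is only a sketch under stated assumptions: the names SketchSerialization, ClassNameSerialization, and storeAndReload are hypothetical and are not taken from serial.patch or from the Hadoop API, and the metadata fields (a class name and a version) are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class OpaqueMetadataSketch {

    /** Hypothetical interface (not the Hadoop API): each serialization encodes its own metadata. */
    interface SketchSerialization {
        byte[] serializeMetadata() throws IOException;
        void deserializeMetadata(byte[] blob) throws IOException;
    }

    /** Hypothetical serialization whose metadata is a class name plus a version number. */
    static class ClassNameSerialization implements SketchSerialization {
        String className = "";
        int version = 0;

        @Override
        public byte[] serializeMetadata() throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                // The layout is private to this serialization; callers never parse it.
                out.writeUTF(className);
                out.writeInt(version);
            }
            return buffer.toByteArray();
        }

        @Override
        public void deserializeMetadata(byte[] blob) throws IOException {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
                className = in.readUTF();
                version = in.readInt();
            }
        }
    }

    /** Stands in for a container (for example a file header) that stores the blob without interpreting it. */
    static byte[] storeAndReload(byte[] blob) {
        return blob.clone();
    }

    public static void main(String[] args) throws IOException {
        ClassNameSerialization writer = new ClassNameSerialization();
        writer.className = "org.example.MyKey";
        writer.version = 2;

        // The container only ever sees bytes.
        byte[] blob = storeAndReload(writer.serializeMetadata());

        ClassNameSerialization reader = new ClassNameSerialization();
        reader.deserializeMetadata(blob);
        System.out.println(reader.className + " v" + reader.version);
    }
}
```

In this arrangement the container code has no knowledge of keys or value types, so a serialization can change or extend its metadata layout without touching the framework interface. The trade-off is that the metadata is no longer human-readable or inspectable by generic tools.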