[
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532998
]
Tom White commented on HADOOP-1986:
-----------------------------------
Vivek,
> I'm thinking about serialization not just for key-value pairs for Map/Reduce,
> but also in other places
I agree that it would be useful to have a common serialization mechanism for
all parts of Hadoop. The mechanism proposed so far is likely to be applicable
more widely since it is so general: it talks in terms of input/output streams
and parameterized types.
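To make that concrete, here is a rough sketch of the sort of interface I have
in mind (the names and method signatures are illustrative only, not a
committed API):

  // Illustrative sketch only -- names are not final. The key points are
  // that the interfaces are parameterized on the type being serialized
  // and work against raw streams.
  // (Serializer and Deserializer would each live in their own file.)
  import java.io.IOException;
  import java.io.InputStream;
  import java.io.OutputStream;

  public interface Serializer<T> {
    void open(OutputStream out) throws IOException;
    void serialize(T t) throws IOException;
    void close() throws IOException;
  }

  public interface Deserializer<T> {
    void open(InputStream in) throws IOException;
    T deserialize(T reuse) throws IOException; // may reuse an instance
    void close() throws IOException;
  }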
This Jira issue is confined to the MapReduce part, since we have to start
somewhere. I think it would be a useful exercise to think through the
implications of the design for other parts of Hadoop before committing any
changes, though.
> I don't think you want a serializer/deserializer per class.
Not per concrete class, agreed. But per base class (e.g. Writable,
Serializable, Thriftable, etc.).
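For example (purely illustrative -- the class name and its relationship to the
Serializer sketch above are assumptions), a single serializer could cover the
whole Writable family by delegating to each object's own write() method:

  // Hypothetical example: one serializer for the whole Writable family,
  // not one per concrete class. Implements the Serializer<T> sketch above.
  import java.io.DataOutputStream;
  import java.io.IOException;
  import java.io.OutputStream;
  import org.apache.hadoop.io.Writable;

  public class WritableSerializer<T extends Writable> implements Serializer<T> {
    private DataOutputStream dataOut;

    public void open(OutputStream out) throws IOException {
      dataOut = new DataOutputStream(out);
    }

    public void serialize(T w) throws IOException {
      w.write(dataOut); // every Writable already knows how to write itself
    }

    public void close() throws IOException {
      dataOut.close();
    }
  }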
> Someone still needs to implement the code for serializing/deserializing that
> class and I don't see any
> discussion on Hadoop support for Thrift or Record which the user can just
> invoke. plus, if you think of
> using this mechanism for Hadoop RPC, we will have so many instances of the
> Serializer<T> interface. You're
> far better off having a HadoopSerializer class that takes in any object and
> automatically
> serializes/deserializes it. All a user has to do is decide which
> serialization platform to use.
I think you pretty much describe where I would like to get to. If people are
using Thrift, for example (and there is a common Thrift interface), then there
would be a ThriftSerializer that would just work for people, with little or no
configuration. While it should still be relatively easy to write a custom
serializer/deserializer, most people will use the standard ones for the
standard serialization platforms.
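One way that might work (just a sketch -- Serialization, accept() and the
factory idea below are not existing APIs, only a possible direction): the
framework could ask each registered serialization whether it accepts a given
class, so a Thrift-generated type would be picked up automatically:

  // Hypothetical sketch of how a serialization could be selected with
  // little or no configuration. None of these names exist in Hadoop today.
  public interface Serialization<T> {
    boolean accept(Class<?> c);              // can this framework handle c?
    Serializer<T> getSerializer(Class<T> c);
    Deserializer<T> getDeserializer(Class<T> c);
  }

  // A factory would walk a configured list of serializations (Writable,
  // Thrift, Java Serializable, ...) and return the first one whose
  // accept() returns true for the key or value class in question.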
There is a question about where these serializers would go - e.g. would
ThriftSerializer go in core Hadoop?
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
> Key: HADOOP-1986
> URL: https://issues.apache.org/jira/browse/HADOOP-1986
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Tom White
> Fix For: 0.16.0
>
> Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable
> key-value pairs. While it's possible to write Writable wrappers for other
> serialization frameworks (such as Thrift), this is not very convenient: it
> would be nicer to be able to use arbitrary types directly, without explicit
> wrapping and unwrapping.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.