[
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533022
]
Owen O'Malley commented on HADOOP-1986:
---------------------------------------
Vivek,
No one was suggesting a serializer per concrete class, except in the case
of Thrift, if its types don't implement a generic interface. Your proposal doesn't
address how the mapping from an object to its Serializer is managed. I think my
suggestion provides the most flexibility, since you only need one serializer per
root class and it places no requirements on the implementation classes
at all. Basically, each serialization library that someone wanted to use with
Hadoop would have a single generic serializer, and a library routine would do
the lookups at the first level:
{code}
public interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  /** Get the base class that this serializer will work on. */
  Class<T> getTargetClass();
}
{code}
org.apache.hadoop.io.serializer.WritableSerializer would be coded to read and
write any Writable, while org.apache.hadoop.io.serializer.ThriftSerializer
would read and write any Thrift type.
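To make the one-serializer-per-root-class idea concrete, here is a minimal sketch. The {{Record}} root interface, {{RecordSerializer}}, and {{IntRecord}} are hypothetical stand-ins for Writable, WritableSerializer, and a user type; none of this is existing Hadoop code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The proposed interface, restated from the {code} block above.
interface Serializer<T> {
    void serialize(T t, OutputStream out) throws IOException;
    void deserialize(T t, InputStream in) throws IOException;
    Class<T> getTargetClass();
}

// Hypothetical root type standing in for Writable (sketch only).
interface Record {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// One generic serializer covers every Record implementation, just as a
// WritableSerializer would cover every Writable.
class RecordSerializer implements Serializer<Record> {
    public void serialize(Record r, OutputStream out) throws IOException {
        r.write(new DataOutputStream(out));
    }
    public void deserialize(Record r, InputStream in) throws IOException {
        r.readFields(new DataInputStream(in));
    }
    public Class<Record> getTargetClass() {
        return Record.class;
    }
}

// A concrete type only needs to implement the root interface; it knows
// nothing about the serialization framework.
class IntRecord implements Record {
    int value;
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}
```

Note that deserialize fills in a caller-supplied instance (Writable-style) rather than constructing a new object, which is what lets one serializer handle subtypes it has never seen.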
I'd probably make a utility class:
{code}
class org.apache.hadoop.io.serializer.SerializerFactory extends Configured {
  <T> Serializer<T> getSerializer(Class<? extends T> cls);
}
{code}
and presumably the SerializerFactory would include a cache from each class to its
serializer (hopefully with weak references to allow garbage collection).
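One way the lookup-plus-cache might look is sketched below. This is not the eventual Hadoop API: the {{register}} method and the plain {{SerializerFactory}} class (dropping the Configured base for self-containment) are assumptions, and the Serializer interface is abbreviated to its target-class method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Abbreviated stand-in for the proposed Serializer interface (sketch only).
interface Serializer<T> {
    Class<T> getTargetClass();
}

// Hypothetical factory: serializers are registered once per root class, and
// lookups for concrete classes are cached. The cache is a WeakHashMap, so
// Class keys are held weakly and entries can be garbage-collected when a
// class is unloaded.
class SerializerFactory {
    private final Map<Class<?>, Serializer<?>> registered = new HashMap<>();
    private final Map<Class<?>, Serializer<?>> cache = new WeakHashMap<>();

    public <T> void register(Class<T> rootClass, Serializer<T> serializer) {
        registered.put(rootClass, serializer);
    }

    @SuppressWarnings("unchecked")
    public synchronized <T> Serializer<T> getSerializer(Class<? extends T> cls) {
        Serializer<?> found = cache.get(cls);
        if (found == null) {
            // First-level lookup: find the registered root class that this
            // concrete class implements or extends.
            for (Map.Entry<Class<?>, Serializer<?>> e : registered.entrySet()) {
                if (e.getKey().isAssignableFrom(cls)) {
                    found = e.getValue();
                    cache.put(cls, found);
                    break;
                }
            }
        }
        return (Serializer<T>) found;
    }
}
```

With this shape, registering one serializer for the Writable root class (and one for the Thrift base type) would make every subtype of either resolvable through a single {{getSerializer}} call.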
This would allow you to remove all references to Writable in SequenceFile and
the map/reduce classes. Any object could be written into sequence files or
passed around in map/reduce jobs. It would be cool and should result in only a
modest amount of confusion to the users.
Furthermore, since it makes only relatively minor use of reflection, a C++
implementation along similar lines should be feasible. (Although it would be a
lot more expensive to evaluate, since dynamic_cast is outrageously expensive
under C++ multiple-inheritance semantics.)
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
> Key: HADOOP-1986
> URL: https://issues.apache.org/jira/browse/HADOOP-1986
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Tom White
> Fix For: 0.16.0
>
> Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable
> key-value pairs. While it's possible to write Writable wrappers for other
> serialization frameworks (such as Thrift), this is not very convenient: it
> would be nicer to be able to use arbitrary types directly, without explicit
> wrapping and unwrapping.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.