[ 
https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533022
 ] 

Owen O'Malley commented on HADOOP-1986:
---------------------------------------

Vivek,
   No one was suggesting a serializer per concrete class, except in the case 
of Thrift if the generated classes don't implement a generic interface. Your 
proposal doesn't address how the mapping from an object to a serializer is 
managed. I think my suggestion provides the most flexibility, since you only 
need one serializer per root class, and it places no requirements on the 
implementation classes at all. Basically, each serialization library that 
someone wanted to use with Hadoop would have a single generic serializer, and 
a library routine would do the lookups at the first level:

{code}
public interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  // Get the base class that this serializer will work on
  Class<T> getTargetClass();
}
{code}

org.apache.hadoop.io.serializer.WritableSerializer would be coded to read and 
write any Writable, while org.apache.hadoop.io.serializer.ThriftSerializer 
would read and write any Thrift type.
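To make the shape of that concrete, here is a minimal, self-contained sketch of 
the proposed interface in action. The Text holder and TextSerializer are 
hypothetical stand-ins (not Hadoop's actual classes), used only to show that one 
serializer covers every instance of its root class, and that deserialize fills a 
caller-supplied object in place, as the interface above implies:

{code}
import java.io.*;

// The proposed interface, as given above.
interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  // Get the base class that this serializer will work on
  Class<T> getTargetClass();
}

// Hypothetical mutable text type, standing in for a Writable-like class.
class Text {
  private final StringBuilder value = new StringBuilder();
  Text set(String s) { value.setLength(0); value.append(s); return this; }
  public String toString() { return value.toString(); }
}

// Hypothetical serializer for the Text root class: one serializer,
// no further requirements on the classes it handles.
class TextSerializer implements Serializer<Text> {
  public void serialize(Text t, OutputStream out) throws IOException {
    new DataOutputStream(out).writeUTF(t.toString());
  }
  public void deserialize(Text t, InputStream in) throws IOException {
    // fill the caller's existing object rather than allocating a new one
    t.set(new DataInputStream(in).readUTF());
  }
  public Class<Text> getTargetClass() { return Text.class; }
}
{code}

A round trip through a ByteArrayOutputStream/ByteArrayInputStream pair then 
recovers the original value without the caller ever naming a concrete wire 
format.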

I'd probably make a utility class:

{code}
class org.apache.hadoop.io.serializer.SerializerFactory extends Configured {
  <T> Serializer<T> getSerializer(Class<? extends T> cls);
}
{code}

and presumably the SerializerFactory would include a cache from class to 
serializer (hopefully with weak references to allow garbage collection). 
This would allow you to remove all references to Writable in SequenceFile and 
the map/reduce classes. Any object could be written into sequence files or 
passed around in map/reduce jobs. It would be cool and should result in only a 
modest amount of confusion to the users. 
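As a sketch of that lookup-and-cache idea: the register method and the 
superclass walk below are my assumptions about how the first-level lookup might 
work, not part of the proposal, and the Serializer interface is trimmed to the 
one method the lookup needs. WeakHashMap keys let class objects be garbage 
collected when their loader goes away:

{code}
import java.util.Map;
import java.util.WeakHashMap;

// Trimmed to the lookup-relevant method of the proposed interface.
interface Serializer<T> {
  Class<T> getTargetClass();
}

// Hypothetical factory: maps a concrete class to the serializer
// registered for its nearest ancestor root class.
class SerializerFactory {
  private final Map<Class<?>, Serializer<?>> roots = new WeakHashMap<>();
  private final Map<Class<?>, Serializer<?>> cache = new WeakHashMap<>();

  <T> void register(Serializer<T> s) {
    roots.put(s.getTargetClass(), s);
  }

  @SuppressWarnings("unchecked")
  <T> Serializer<T> getSerializer(Class<? extends T> cls) {
    Serializer<?> found = cache.get(cls);
    if (found == null) {
      // walk up the hierarchy to the first registered root class
      for (Class<?> c = cls; c != null && found == null; c = c.getSuperclass()) {
        found = roots.get(c);
      }
      if (found != null) cache.put(cls, found);
    }
    return (Serializer<T>) found;
  }
}
{code}

A real version would also have to walk implemented interfaces, since a root 
type like Writable is an interface rather than a superclass.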

Furthermore, since it makes only relatively minor use of reflection, a C++ 
implementation along similar lines should be feasible. (Although it would be a 
lot more expensive to evaluate, since dynamic_cast is outrageously expensive 
under C++ multiple inheritance semantics.)

> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable 
> key-value pairs. While it's possible to write Writable wrappers for other 
> serialization frameworks (such as Thrift), this is not very convenient: it 
> would be nicer to be able to use arbitrary types directly, without explicit 
> wrapping and unwrapping.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
