[ https://issues.apache.org/jira/browse/HADOOP-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533022 ]
Owen O'Malley commented on HADOOP-1986:
---------------------------------------

Vivek, no one was suggesting a serializer per concrete class, except in the case of Thrift if the classes don't implement a generic interface. Your proposal doesn't address how the mapping from an Object to a Serializer is managed. I think my suggestion provides the most flexibility, since you only need one serializer per root class and it imposes no requirements on the implementation classes at all. Basically, each serialization library that someone wanted to use with Hadoop would have a single generic serializer, and a library routine would do the lookups at the first level:

{code}
public interface Serializer<T> {
  void serialize(T t, OutputStream out) throws IOException;
  void deserialize(T t, InputStream in) throws IOException;
  // Get the base class that this serializer will work on
  Class<T> getTargetClass();
}
{code}

org.apache.hadoop.io.serializer.WritableSerializer would be coded to read and write any Writable, while org.apache.hadoop.io.serializer.ThriftSerializer would read and write any Thrift type. I'd probably make a utility class:

{code}
class org.apache.hadoop.io.serializer.SerializerFactory extends Configured {
  <T> Serializer<T> getSerializer(Class<? extends T> cls);
}
{code}

and presumably the SerializerFactory would include a cache from class to serializer (hopefully with weak references to allow garbage collection). This would allow you to remove all references to Writable in SequenceFile and the map/reduce classes. Any object could be written into sequence files or passed around in map/reduce jobs. It would be cool and should cause only a modest amount of confusion for users. Furthermore, since it makes only relatively minor use of reflection, a C++ implementation along similar lines should be feasible. (Although it would be a lot more expensive to evaluate, because dynamic_cast is outrageously expensive due to the C++ multiple inheritance semantics.)
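To make the lookup scheme concrete, here is a minimal, self-contained sketch of the proposed design. The `Text` record type, `TextSerializer`, and the `register` method are illustrative assumptions standing in for a real root class such as Writable and its WritableSerializer; only the `Serializer` interface and `getSerializer` signature come from the proposal above.

{code}
import java.io.*;
import java.util.*;

public class SerializerSketch {

    // The interface as proposed: one serializer per root class.
    public interface Serializer<T> {
        void serialize(T t, OutputStream out) throws IOException;
        void deserialize(T t, InputStream in) throws IOException; // fills in t
        Class<T> getTargetClass();
    }

    // Toy mutable record type, standing in for a Writable implementation.
    public static class Text {
        String value = "";
    }

    // Toy root-class serializer (hypothetical; a real WritableSerializer
    // would delegate to Writable.write/readFields instead).
    public static class TextSerializer implements Serializer<Text> {
        public void serialize(Text t, OutputStream out) throws IOException {
            new DataOutputStream(out).writeUTF(t.value);
        }
        public void deserialize(Text t, InputStream in) throws IOException {
            t.value = new DataInputStream(in).readUTF();
        }
        public Class<Text> getTargetClass() { return Text.class; }
    }

    // Factory doing the first-level lookup: walk up the class hierarchy to
    // the nearest registered root class, caching results in a WeakHashMap
    // so class keys can still be garbage collected.
    public static class SerializerFactory {
        private final Map<Class<?>, Serializer<?>> registered = new HashMap<>();
        private final Map<Class<?>, Serializer<?>> cache =
            Collections.synchronizedMap(new WeakHashMap<>());

        public void register(Serializer<?> s) {
            registered.put(s.getTargetClass(), s);
        }

        @SuppressWarnings("unchecked")
        public <T> Serializer<T> getSerializer(Class<? extends T> cls) {
            Serializer<?> s = cache.get(cls);
            for (Class<?> c = cls; s == null && c != null; c = c.getSuperclass()) {
                s = registered.get(c);
            }
            if (s != null) cache.put(cls, s);
            return (Serializer<T>) s;
        }
    }

    public static void main(String[] args) throws IOException {
        SerializerFactory factory = new SerializerFactory();
        factory.register(new TextSerializer());

        // Round-trip a value through the serializer found by the factory.
        Serializer<Text> ser = factory.getSerializer(Text.class);
        Text in = new Text();
        in.value = "hello";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ser.serialize(in, buf);

        Text out = new Text();
        ser.deserialize(out, new ByteArrayInputStream(buf.toByteArray()));
        if (!out.value.equals("hello")) throw new AssertionError(out.value);
    }
}
{code}

Note that subclasses resolve to their root class's serializer automatically, which is the point: the implementation classes carry no serialization requirements of their own.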
> Add support for a general serialization mechanism for Map Reduce
> ----------------------------------------------------------------
>
>                 Key: HADOOP-1986
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1986
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Tom White
>             Fix For: 0.16.0
>
>         Attachments: SerializableWritable.java
>
>
> Currently Map Reduce programs have to use WritableComparable-Writable key-value pairs. While it's possible to write Writable wrappers for other serialization frameworks (such as Thrift), this is not very convenient: it would be nicer to be able to use arbitrary types directly, without explicit wrapping and unwrapping.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.