Curt Cox wrote:
I'm curious why the new Writable interface was chosen rather than
using Serializable.

The Writable interface is subtly different than Serializable. Serializable does not assume the class of stored values is known. So each instance is tagged with its class. ObjectOutputStream and ObjectInputStream optimize this somewhat, so that 5-byte handles are written for instances of a class after the first. But object sequences with handles cannot be then accessed randomly, since they rely on stream state. This complicates things like sorting.

Writable, on the other hand, assumes that the application knows the expected class. The application must be able to create an instance in order to call readFields(). So the class need not be stored with each instance. This results in considerably more compact binary files, straightforward random access and generally higher performance.

Arguably Hadoop could use Serializable. One could override writeObject or writeExternal for each class whose serialization was performance critical. (MapReduce is very i/o intensive, so nearly every class's serialization is performance critical.) One could implement ObjectOutputStream.writeObjectOverride() and ObjectInputStream.readObjectOverride() to use a more compact representation, that, e.g., did not need to tag each top-level instance in a file with its class. This would probably require as least as much code as Haddop has in Writable, ObjectWritable, etc., and the code would be a bit more complicated, since it would be trying to work around a different typing model. But it might have the advantage of better built-in version control. Or would it?

Serializable's version mechanism is to have classes define a static named serialVersionUID. This permits one to protect against incompatible changes, but does not easily permit one to implement back-compatibility. For that, the application must explicitly deal with versions. It must reason, in a class-specific manner, about the version that was written while reading, to decide what to do. But Serializeable's version mechanism does not support this any more or less than Writable.

See, for example, the "Design Considerations" section of:

http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-control_p.html

So, in summary, I don't think Serializable holds many advantages: it wouldn't substantially reduce the amount of code, and it wouldn't solve versioning.

Doug

Reply via email to