On Dec 7, 2010, at 2:37 PM, Roy T. Fielding wrote:
> > The proposal is to change the extension mechanism incompatibly with
> > unclear benefits,
>
> Good, these are technical reasons. The benefits can be cleared by docs.
> By incompatible, I assume you mean forward-compatibility of old versions
> of Hadoop reading newer files. Can we fix that by having the new
> implementation use the old file format by default until it is configured
> to use one of the new interfaces for writing?
There are two goals here. The first is to extend the serialization
plugin interface. The current patch does things completely compatibly
including a shim that will use the previous plugins to satisfy the new
API. The benefits are also clear: Avro serialization becomes possible
where it wasn't previously. It also provides a wide range of opportunities
that weren't previously possible.
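To illustrate the shim idea only, here is a minimal sketch of an adapter that satisfies a new interface by delegating to an old plugin. This is not the code in the patch: OldSerializer, NewSerialization, LegacyShim, and Utf8Plugin are hypothetical names made up for the example and are not Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ShimSketch {
    /** Hypothetical stand-in for the previous plugin interface. */
    interface OldSerializer<T> {
        void serialize(T value, OutputStream out) throws IOException;
        T deserialize(InputStream in) throws IOException;
    }

    /** Hypothetical stand-in for the extended plugin interface. */
    interface NewSerialization<T> {
        String name();
        void write(T value, OutputStream out) throws IOException;
        T read(InputStream in) throws IOException;
    }

    /** The shim: implements the new API by delegating to an old plugin. */
    static final class LegacyShim<T> implements NewSerialization<T> {
        private final OldSerializer<T> delegate;

        LegacyShim(OldSerializer<T> delegate) {
            this.delegate = delegate;
        }

        public String name() {
            return "legacy";
        }

        public void write(T value, OutputStream out) throws IOException {
            delegate.serialize(value, out);
        }

        public T read(InputStream in) throws IOException {
            return delegate.deserialize(in);
        }
    }

    /** An example old-style plugin that stores strings as UTF-8 bytes. */
    static final class Utf8Plugin implements OldSerializer<String> {
        public void serialize(String value, OutputStream out) throws IOException {
            out.write(value.getBytes(StandardCharsets.UTF_8));
        }

        public String deserialize(InputStream in) throws IOException {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Writes and reads a value through the new API, backed by the old plugin. */
    static String roundTrip(String value) {
        NewSerialization<String> serialization = new LegacyShim<>(new Utf8Plugin());
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            serialization.write(value, out);
            return serialization.read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

Callers written against the new interface never see the old plugin directly, which is why existing plugins can keep working unchanged.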
The file format was changed as a demonstration that the serialization
interface was useful and complete. The file change is also backwards
compatible and will automatically read old versions of the file. Old
versions of the code will complain with an error message if they are
given a new version. This is exactly the pattern we have used in the
past.
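As a rough sketch of that pattern (not the actual SequenceFile code; the one-byte version header and the version numbers are assumptions for the example), a reader accepts any version up to the newest one it knows and rejects anything newer with an error:

```java
class VersionCheckSketch {
    // Hypothetical version numbers, chosen only for illustration.
    static final byte OLD_FORMAT = 1;
    static final byte NEW_FORMAT = 2;

    /**
     * Reads the version byte at the start of a file and returns it if this
     * reader supports it. A reader that only knows older versions fails
     * with an error message instead of misreading the data.
     */
    static int openFile(byte[] file, int newestSupported) {
        int version = file[0];
        if (version > newestSupported) {
            throw new IllegalStateException(
                "file version " + version
                + " is newer than supported version " + newestSupported);
        }
        return version;
    }

    public static void main(String[] args) {
        byte[] oldFile = {OLD_FORMAT};
        byte[] newFile = {NEW_FORMAT};

        // New code reads both old and new files.
        System.out.println(openFile(oldFile, NEW_FORMAT));
        System.out.println(openFile(newFile, NEW_FORMAT));

        // Old code complains when handed a new file.
        try {
            openFile(newFile, OLD_FORMAT);
        } catch (IllegalStateException e) {
            System.out.println("old reader: " + e.getMessage());
        }
    }
}
```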
So no, there are no technical issues with the patch as it stands.
> You keep referring to the kernel as if it were a product. I don't see
> a kernel product in the list of things released by Apache Hadoop.
The kernel is a very loosely defined concept. Utilities that are
currently used by the framework are "kernel"; others are just used by
the users. Some classes are clearly kernel and some are clearly
library, but there are some such as BooleanWritable that aren't
obvious. It would take a fair amount of work and likely some
duplication to segregate out the library code. I also worry that
creating such a project would make Hadoop less useful out of the box
and decrease the value of the Apache release of Hadoop.
But back to the original point. Doug's (and Tom's) veto was based on:
1. Modification to SequenceFile.
2. It introduces a dependence on Protocol Buffers.
On the first point, there was strong consensus that SequenceFile was
required and should be updated as the framework evolves. The second is
not a technical reason. I believe that the entire veto should be
considered invalid.
-- Owen