On Dec 7, 2010, at 9:26 AM, Doug Cutting wrote:
> On 12/07/2010 08:12 AM, Arun C Murthy wrote:
>> Blocking extensions to SequenceFile is unreasonable, as has been noted
>> by several folks; there is no *technical* reason to do that.
>
> The change to SequenceFile is incompatible with older versions of
> Hadoop. It changes the file's version number so that older versions
> will not be able to read data written by newer versions. This is a
> technical issue.

The new code reads the new or old versions of SequenceFile seamlessly
using auto-detection of the version. The old code fails with an
explicit message saying that it can't read this version. This is the
only mechanism available when upgrading a file format with a single
version number and is the mechanism that we've used 6 times in the past.
If we'd used ProtocolBuffers for the SequenceFile header, we'd have
more options for backwards compatibility, but we didn't.
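
To make the mechanism concrete, here is a minimal sketch of a single-version-number check, assuming a header of three magic bytes followed by one version byte. The class name, the exception type, and the version numbers in the usage note are hypothetical and chosen for illustration; this is not the actual SequenceFile code.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: HeaderVersionSketch and VersionMismatch are hypothetical names.
class HeaderVersionSketch {
    // Assumed layout: three magic bytes, then a single version byte.
    static final byte[] MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

    /** Raised when the file was written by a newer format version than the reader knows. */
    static class VersionMismatch extends RuntimeException {
        VersionMismatch(int supported, int found) {
            super("File is version " + found
                + ", but this reader supports versions up to " + supported);
        }
    }

    /**
     * Returns the version stored in the header so the caller can pick the
     * matching parsing path, or fails explicitly if the reader is too old.
     */
    static int detectVersion(byte[] header, int maxSupported) {
        if (header.length < MAGIC.length + 1) {
            throw new IllegalArgumentException("header too short");
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                throw new IllegalArgumentException("magic bytes do not match");
            }
        }
        int found = header[MAGIC.length] & 0xff; // treat the version byte as unsigned
        if (found > maxSupported) {
            // With one version number, an old reader can only refuse the file.
            throw new VersionMismatch(maxSupported, found);
        }
        return found; // a newer reader branches on this value for old and new layouts
    }
}
```

With this sketch, a reader built with `maxSupported = 7` accepts headers carrying version 6 or 7 and branches on the returned value, while a reader built with `maxSupported = 6` throws on a version 7 header instead of misparsing it.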
-- Owen