On 12/07/2010 10:25 AM, Owen O'Malley wrote:
> The new code reads the new or old versions of SequenceFile seamlessly
> using auto-detection of the version. The old code fails with an explicit
> message saying that it can't read this version. This is the only
> mechanism available when upgrading a file format with a single version
> number and is the mechanism that we've used 6 times in the past.
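A minimal sketch of the single-version-number scheme described above, assuming a header made of the three-byte magic "SEQ" followed by one version byte. The version values, the exception name and the read_version helper are hypothetical and only illustrate the idea. A reader accepts any version up to the newest one it knows and fails explicitly on anything newer.

```python
import io

MAGIC = b"SEQ"               # header magic, followed by one version byte (assumed layout)
OLD_VERSION = 6              # hypothetical: newest version an old reader understands
NEW_VERSION = 7              # hypothetical: the proposed new version
SUPPORTED_MAX = NEW_VERSION  # a new reader understands both versions


class UnsupportedVersionError(IOError):
    """Raised when a file was written with a newer format version."""


def read_version(stream, supported_max=SUPPORTED_MAX):
    """Read the header and return the auto-detected format version."""
    header = stream.read(len(MAGIC) + 1)
    if len(header) < len(MAGIC) + 1 or header[:len(MAGIC)] != MAGIC:
        raise IOError("not a SequenceFile-style header")
    version = header[len(MAGIC)]  # indexing bytes yields an int
    if version > supported_max:
        # Older code cannot parse a newer layout, so it fails loudly
        # instead of silently misreading the file.
        raise UnsupportedVersionError(
            f"version {version} is newer than supported version {supported_max}")
    return version
```

Calling read_version(stream) with the default limit reads old and new files alike, while read_version(stream, supported_max=OLD_VERSION) models the old code rejecting a new file.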

The last such change was nearly four years ago, in:

https://issues.apache.org/jira/browse/HADOOP-732

The quantity of data stored in SequenceFiles has greatly increased over the past four years, and the project's concern for compatibility has grown correspondingly.

The new format version need not be written when folks are using Writable or some other serialization currently supported by SequenceFile. In your patch, the new version is only required for Avro. You might simply drop support for Avro and leave the file version number alone, since Avro already includes a container file format. Or you might use the new format version only for non-class-determined serializations like Avro. Or you might use SequenceFile's existing metadata for non-class-determined serializations like Avro and leave the file version number alone.
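A rough sketch of the last two options, assuming a writer that picks its header version per serialization. The version numbers, the set of class-determined serialization names and the "serialization" metadata key are all hypothetical. Existing Writable-style users keep writing the old version either way; only the choice for Avro-like serializations differs.

```python
OLD_VERSION = 6  # hypothetical: version written today
NEW_VERSION = 7  # hypothetical: version introduced for the new serializations

# Hypothetical names of serializations whose key/value classes fully
# determine how records are encoded.
CLASS_DETERMINED = {"writable", "java"}


def header_for(serialization, use_metadata=False):
    """Return (version, metadata) for a new file.

    use_metadata=False: bump the version only for non-class-determined
    serializations such as Avro.
    use_metadata=True: never bump the version; record the serialization
    in the file's existing metadata instead.
    """
    if serialization in CLASS_DETERMINED:
        return OLD_VERSION, {}
    if use_metadata:
        # Hypothetical metadata key; old readers can still parse the header.
        return OLD_VERSION, {"serialization": serialization}
    return NEW_VERSION, {}
```

With either option, files written through class-determined serializations stay readable by old code; the metadata option trades an explicit version bump for a convention that readers must know to check.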

Doug
