Re: [VOTE] Direction for Hadoop development

Doug Cutting Wed, 08 Dec 2010 11:21:10 -0800

On 12/07/2010 10:25 AM, Owen O'Malley wrote:

> The new code reads the new or old versions of SequenceFile seamlessly
> using auto-detection of the version. The old code fails with an explicit
> message saying that it can't read this version. This is the only
> mechanism available when upgrading a file format with a single version
> number and is the mechanism that we've used 6 times in the past.
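The quoted paragraph describes single-version-byte auto-detection. Here is a minimal Python sketch of that mechanism. It assumes a header of the three bytes `SEQ` followed by one version byte. The names `read_version` and `UnsupportedVersionError` are hypothetical, and the limits used (6 for an "old" reader, 7 for a "new" one) are illustrative only.

```python
import io

MAGIC = b"SEQ"  # assumed header prefix: three magic bytes, then one version byte


class UnsupportedVersionError(Exception):
    """Raised when the version byte is newer than this reader understands."""


def read_version(stream, max_supported):
    """Read the magic and the version byte; fail explicitly on unknown versions."""
    header = stream.read(4)
    if len(header) < 4 or header[:3] != MAGIC:
        raise ValueError("not a SequenceFile: bad magic")
    version = header[3]  # indexing bytes yields an int
    if version > max_supported:
        raise UnsupportedVersionError(
            f"can't read SequenceFile version {version}; "
            f"this reader supports up to version {max_supported}")
    return version


# Illustrative limits (hypothetical): an old reader and a reader one version ahead.
OLD_READER_MAX = 6
NEW_READER_MAX = 7

if __name__ == "__main__":
    print(read_version(io.BytesIO(b"SEQ\x07"), NEW_READER_MAX))  # 7
    try:
        read_version(io.BytesIO(b"SEQ\x07"), OLD_READER_MAX)
    except UnsupportedVersionError as err:
        print(err)  # can't read SequenceFile version 7; this reader supports up to version 6
```

The new reader accepts both versions. The old reader, given a newer file, refuses with an explicit message rather than misparsing it. Under this scheme, every file written with the bumped version becomes unreadable to older deployments.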


The last such change was nearly four years ago, in:

https://issues.apache.org/jira/browse/HADOOP-732

The quantity of data stored in SequenceFiles has greatly increased over the past four years. The project's concern for compatibility has also correspondingly increased over that time.

The new format version might not be written when folks are using Writable or some other serialization currently supported by SequenceFile. The only situation in your patch where the new version is required is for Avro. You might simply drop support for Avro and leave the file version number alone since Avro already includes a container file format. Or you might only use the new format version for non-class-determined serializations like Avro. Or you might use SequenceFile's existing metadata for non-class-determined serializations like Avro and leave the file version number alone.
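Here is a minimal Python sketch of the last option: the writer records a non-class-determined serialization in header metadata, and the file version stays unchanged. The header is modeled as a plain dict. The key `serialization`, the value `avro`, and the functions `make_header` and `choose_deserializer` are hypothetical and are not Hadoop's actual metadata keys or API.

```python
VERSION = 6  # illustrative: the version byte is left alone in this scheme

# Hypothetical metadata key; not an actual Hadoop constant.
SERIALIZATION_KEY = "serialization"


def make_header(key_class, value_class, metadata=None):
    """Build a simplified in-memory header; the version is never bumped."""
    return {"version": VERSION,
            "key_class": key_class,
            "value_class": value_class,
            "metadata": dict(metadata or {})}


def choose_deserializer(header):
    """Pick a serialization from metadata, else fall back to the declared classes."""
    kind = header["metadata"].get(SERIALIZATION_KEY)
    if kind is None:
        # Existing files: serialization is determined by the key/value classes.
        return "class-determined"
    # Non-class-determined serializations (e.g. Avro) are flagged in metadata.
    return kind
```

Files written with Writable look exactly as before. Only files that opt into a non-class-determined serialization carry the extra metadata entry. The version number therefore never signals an incompatibility that most files don't have.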


Doug
