Thanks, Harsh. In any case, I'm really curious about how sequence file headers are formatted, as the documentation in the SequenceFile javadocs seems very generic.
To make my questions more concrete:

1) I notice that the FileSplit class has a getStart() method. It is documented as returning the place to start "processing". Does that imply that a FileSplit does, or does not, include a header? http://hadoop.apache.org/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/input/FileSplit.html#getStart%28%29

2) Also, it's not clear to me how compression and serialization are related. These are two intricately coupled aspects of HDFS file writing, and I'm not sure what the idiom is for coordinating the compression of records with their deserialization.