Cross-posting this from the cdh-users group, where it received little interest: is there a bug in SequenceFile.sync()? This is from CDH 4.3.0:

    /** Seek to the next sync mark past a given position.*/
    public synchronized void sync(long position) throws IOException {
      if (position+SYNC_SIZE >= end) {
        seek(end);
        return;
      }

      if (position < headerEnd) {
        // seek directly to first record
        in.seek(headerEnd);    <==== should this not call seek (i.e. this.seek) instead?
        // note the sync marker "seen" in the header
        syncSeen = true;
        return;
      }

The problem is that when you sync to the start of a compressed file, noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't triggered. When you subsequently call next(), you may get keys from the buffer, which still contains keys from the previous position in the file.
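
The suspected failure mode can be illustrated with a toy model. This is not Hadoop code: ToyBlockReader, rawSeek and the block layout are made up for illustration, and only the noBufferedKeys name is borrowed from the snippet above. It is a minimal sketch, assuming the reader refills its key buffer only when noBufferedKeys reaches zero.

```java
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a block-buffered reader. Hypothetical, NOT SequenceFile.Reader. */
class ToyBlockReader {
    private final List<List<String>> blocks; // each block holds a batch of keys
    private int streamPos = 0;               // next block to load (models the "in" stream position)
    private List<String> buffer = Arrays.asList();
    private int bufferIdx = 0;               // next key to hand out from the buffer
    private int noBufferedKeys = 0;          // keys still waiting in the buffer

    ToyBlockReader(List<List<String>> blocks) { this.blocks = blocks; }

    /** Moves only the underlying stream (models in.seek); the buffer is left alone. */
    void rawSeek(int block) { streamPos = block; }

    /** Moves the stream AND discards buffered keys (models this.seek). */
    void seek(int block) { streamPos = block; noBufferedKeys = 0; }

    /** Returns the next key, loading a block only when the buffer is empty; null at end. */
    String next() {
        while (noBufferedKeys == 0) {
            if (streamPos >= blocks.size()) return null;
            buffer = blocks.get(streamPos++);
            bufferIdx = 0;
            noBufferedKeys = buffer.size();
        }
        noBufferedKeys--;
        return buffer.get(bufferIdx++);
    }

    public static void main(String[] args) {
        List<List<String>> blocks =
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c", "d"));

        ToyBlockReader stale = new ToyBlockReader(blocks);
        stale.seek(1);
        stale.next();                      // reads "c"; "d" stays buffered
        stale.rawSeek(0);                  // stream moved, buffer not cleared
        System.out.println(stale.next());  // prints "d", a key from the old position

        ToyBlockReader fresh = new ToyBlockReader(blocks);
        fresh.seek(1);
        fresh.next();
        fresh.seek(0);                     // buffer cleared, block read is triggered
        System.out.println(fresh.next());  // prints "a"
    }
}
```

In this model the only difference between the two paths is whether the buffered-key count is reset, which is the behaviour the question above is asking about for the headerEnd branch.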