Cross-posting this from the cdh-users group, where it received little interest:

Is there a bug in SequenceFile.Reader.sync()? This is from CDH 4.3.0:

    /** Seek to the next sync mark past a given position.*/
    public synchronized void sync(long position) throws IOException {
      if (position+SYNC_SIZE >= end) {
        seek(end);
        return;
      }

      if (position < headerEnd) {
        // seek directly to first record
        in.seek(headerEnd);   <==== should this not call seek() (i.e. this.seek()) instead?
        // note the sync marker "seen" in the header
        syncSeen = true;
        return;
      }

The problem is that when you sync to the start of a compressed file,
noBufferedKeys and valuesDecompressed aren't reset, so a block read isn't
triggered. When you subsequently call next(), you may get keys from the
buffer, which still holds keys from the previous position in the file.
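
To make the failure mode concrete, here is a minimal sketch of the behaviour I mean. It is a toy model, not Hadoop code: ToyBlockReader, rawSeek() and firstKeyAfterRewind() are made-up names, rawSeek() stands in for in.seek(), seek() stands in for this.seek(), and the bufferedKeys queue stands in for the state tracked by noBufferedKeys.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model (hypothetical, NOT Hadoop code) of a block-buffered key reader. */
public class StaleBufferDemo {

    static class ToyBlockReader {
        private final String[][] blocks;  // each block holds several keys
        private int blockIndex = 0;       // stands in for the underlying stream position
        private final Deque<String> bufferedKeys = new ArrayDeque<>();

        ToyBlockReader(String[][] blocks) {
            this.blocks = blocks;
        }

        /** Analogue of in.seek(): moves the stream only, keeps buffered keys. */
        void rawSeek(int block) {
            blockIndex = block;
        }

        /** Analogue of this.seek(): moves the stream and discards buffered keys. */
        void seek(int block) {
            blockIndex = block;
            bufferedKeys.clear();
        }

        /** Returns the next key; a block is read only when the buffer is empty. */
        String next() {
            if (bufferedKeys.isEmpty()) {
                if (blockIndex >= blocks.length) {
                    return null; // end of data
                }
                for (String key : blocks[blockIndex++]) {
                    bufferedKeys.add(key);
                }
            }
            return bufferedKeys.poll();
        }
    }

    /** Reads into the second block, rewinds to the start, returns the next key. */
    static String firstKeyAfterRewind(boolean resetBuffer) {
        ToyBlockReader reader = new ToyBlockReader(
                new String[][] {{"a1", "a2"}, {"b1", "b2"}});
        reader.next(); // a1
        reader.next(); // a2
        reader.next(); // b1 (b2 is still sitting in the buffer)
        if (resetBuffer) {
            reader.seek(0);
        } else {
            reader.rawSeek(0);
        }
        return reader.next();
    }

    public static void main(String[] args) {
        System.out.println("raw seek:  " + firstKeyAfterRewind(false)); // prints b2 (stale)
        System.out.println("full seek: " + firstKeyAfterRewind(true));  // prints a1
    }
}
```

With the raw seek, the first key after the rewind comes from the old block even though the stream position is back at the start, which matches what I am seeing with sync().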
