Colin Patrick McCabe created HADOOP-9667:
--------------------------------------------

             Summary: SequenceFile: Reset keys and values when syncing to a place before the header
                 Key: HADOOP-9667
                 URL: https://issues.apache.org/jira/browse/HADOOP-9667
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Colin Patrick McCabe
            Priority: Minor


There seems to be a bug in the {{SequenceFile#sync}} function. Thanks to Christopher Ng for this report:

{code}
/** Seek to the next sync mark past a given position. */
public synchronized void sync(long position) throws IOException {
  if (position+SYNC_SIZE >= end) {
    seek(end);
    return;
  }

  if (position < headerEnd) {
    // seek directly to first record
    in.seek(headerEnd);  // <==== should this not call seek (i.e. this.seek) instead?
    // note the sync marker "seen" in the header
    syncSeen = true;
    return;
  }
{code}

The problem is that when you sync to the start of a compressed file, {{noBufferedKeys}} and {{valuesDecompressed}} aren't reset, so a block read isn't triggered. When you subsequently call {{next()}}, you may get keys from the buffer, which still contains keys from the previous position in the file.
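
To make the failure mode concrete, here is a minimal sketch of the stale-buffer behavior, assuming the reader only refills its key buffer when that buffer is empty. Everything in it is hypothetical and for illustration only: {{ToyBlockReader}}, {{rawSeek}}, and the in-memory blocks are stand-ins, not Hadoop's {{SequenceFile.Reader}} API. {{rawSeek}} plays the role of {{in.seek}} (moves the position only), while {{seek}} plays the role of {{this.seek}} (moves the position and discards buffered keys).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy stand-in for a block-compressed reader; NOT Hadoop's SequenceFile.Reader. */
final class ToyBlockReader {
    // Each inner list plays the role of one compressed block of keys (assumption).
    private final List<List<String>> blocks;
    // Keys already read from the current block but not yet returned.
    private final Deque<String> bufferedKeys = new ArrayDeque<>();
    // Stand-in for the position of the underlying stream.
    private int nextBlock = 0;

    ToyBlockReader(List<List<String>> blocks) {
        this.blocks = blocks;
    }

    /** Analogue of in.seek(...): moves the stream position and nothing else. */
    void rawSeek(int block) {
        nextBlock = block;
    }

    /** Analogue of this.seek(...): moves the position and drops buffered keys. */
    void seek(int block) {
        rawSeek(block);
        bufferedKeys.clear(); // forces the next call to next() to read a block
    }

    /** Returns the next key, or null when no keys remain. */
    String next() {
        if (bufferedKeys.isEmpty()) {
            if (nextBlock >= blocks.size()) {
                return null;
            }
            // A block read is only triggered when the buffer is empty.
            bufferedKeys.addAll(blocks.get(nextBlock++));
        }
        return bufferedKeys.poll();
    }
}
```

With two blocks, ("a1", "a2") and ("b1", "b2"), calling {{seek(1)}} and then {{next()}} returns "b1" and leaves "b2" buffered. A following {{rawSeek(0)}} and {{next()}} returns the stale "b2", whereas {{seek(0)}} and {{next()}} returns "a1". This mirrors the suspicion in the report that {{sync}} should go through the resetting seek rather than {{in.seek}}.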