MapFile.Reader does not seek to first entry for multi-valued key
----------------------------------------------------------------
Key: HADOOP-6494
URL: https://issues.apache.org/jira/browse/HADOOP-6494
Project: Hadoop Common
Issue Type: Bug
Components: io
Reporter: Peter Spiro
Priority: Minor
When a MapFile contains a key with multiple entries, and an entry other than the
first one for that key happens to be stored in the index, the Reader's seek()
and get*() methods generally position the reader at that indexed entry rather
than at the key's first entry. The earlier entries for the key are skipped, so
it is impossible to retrieve all of the key's entries using next().
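The following is a minimal, dependency-free sketch of the problem. It is not Hadoop code: the class and method names are hypothetical, the data file is modeled as a sorted in-memory List<String> of keys, and seek() is assumed to binary-search the indexed keys, fall back to the preceding index entry when there is no exact match, and then scan forward to the first key >= the target. The index is built with the current size % indexInterval rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a MapFile-style sparse index (not Hadoop's classes).
class SparseIndexSeekModel {

    // Current rule: entry at position pos is indexed when pos % indexInterval == 0,
    // regardless of whether it repeats the previous key.
    static List<Integer> buildIndex(List<String> keys, int indexInterval) {
        List<Integer> positions = new ArrayList<>();
        for (int pos = 0; pos < keys.size(); pos++) {
            if (pos % indexInterval == 0) {
                positions.add(pos);
            }
        }
        return positions;
    }

    // Models seek(): binary-search the indexed keys, start from the matching or
    // preceding indexed position, then scan forward to the first key >= target.
    // Returns that data position, or -1 if every key is smaller than target.
    static int seek(List<String> keys, List<Integer> index, String target) {
        List<String> indexKeys = new ArrayList<>();
        for (int p : index) {
            indexKeys.add(keys.get(p));
        }
        int i = Collections.binarySearch(indexKeys, target);
        if (i < 0) {
            i = -i - 2;                          // insertion point minus one
        }
        int pos = (i < 0) ? 0 : index.get(i);    // before first index entry: start of file
        while (pos < keys.size() && keys.get(pos).compareTo(target) < 0) {
            pos++;
        }
        return pos < keys.size() ? pos : -1;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "b", "b", "c");
        List<Integer> index = buildIndex(keys, 2);
        System.out.println(index);                   // [0, 2, 4]
        System.out.println(seek(keys, index, "b"));  // 2, although the first "b" is at 1
    }
}
```

Because the exact match in the index points at the second "b", the scan never looks back, and calling next() from position 2 yields only two of the three "b" entries.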
One easy solution would be to modify the Writer's append() method so that it
indexes an entry only if it is the first entry belonging to its key, e.g.:
    public synchronized void append(WritableComparable key, Writable val)
        throws IOException {
      // Compare before checkKey(), which overwrites lastKey with key.
      boolean equalsLastKey = (size != 0 && comparator.compare(lastKey, key) == 0);
      checkKey(key);
      boolean largeEnoughInterval = size % indexInterval == 0;
      if (largeEnoughInterval && !equalsLastKey) {  // add an index entry
        position.set(data.getLength());             // point to current eof
        index.append(key, position);
      }
      data.append(key, val);                        // append key/value to data
      // If an index slot was skipped because this entry repeats the last key,
      // leave size unchanged so the next entry gets the chance to be indexed.
      if (!largeEnoughInterval || !equalsLastKey)
        size++;
    }
(The size variable should then be renamed to something more accurate, since it
would no longer count every appended entry.)
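A standalone sketch of the indexing rule proposed above, again modeling the data file as a sorted in-memory List<String> instead of using Hadoop classes. The class name, method names and the indexCount counter (standing in for the renamed size field) are hypothetical. It mirrors the two conditions in the proposed append() and checks that no indexed position repeats the preceding key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the proposed append() indexing rule (not Hadoop code).
class FirstEntryIndexModel {

    // Returns the data positions the proposed rule would store in the index.
    static List<Integer> buildIndex(List<String> keys, int indexInterval) {
        List<Integer> positions = new ArrayList<>();
        int indexCount = 0;                           // plays the role of "size"
        for (int pos = 0; pos < keys.size(); pos++) {
            String key = keys.get(pos);
            boolean equalsLastKey = pos > 0 && keys.get(pos - 1).equals(key);
            boolean largeEnoughInterval = indexCount % indexInterval == 0;
            if (largeEnoughInterval && !equalsLastKey) {
                positions.add(pos);                   // only a key's first entry is indexed
            }
            if (!largeEnoughInterval || !equalsLastKey) {
                indexCount++;                         // a deferred slot is retried on the next entry
            }
        }
        return positions;
    }

    // True if every indexed position holds the first entry of its key.
    static boolean indexesOnlyFirstEntries(List<String> keys, List<Integer> index) {
        for (int pos : index) {
            if (pos > 0 && keys.get(pos - 1).equals(keys.get(pos))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "b", "b", "c");
        List<Integer> index = buildIndex(keys, 2);
        System.out.println(index);                                 // [0, 4]
        System.out.println(indexesOnlyFirstEntries(keys, index));  // true
    }
}
```

For these keys the slot that the current rule gives to the second "b" is deferred until "c", so a seek for "b" starts from the index entry for "a" and scans forward to the first "b".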