Having an interesting issue fixing up a technical point in the class SequenceFileVectorIterable. Its Iterator is wrong in that hasNext() advances the iteration and next() doesn't. There's a way to fix this easily: the Iterator just needs to always read one item ahead to know whether a next one exists.
However doing this the straightforward way, the Iterator doesn't know the current key/value it's on -- always the next one. This is an issue since in the one usage of this class, in VectorDumper, the current key is accessed for printing. The Iterator could just store the last key/value it saw. However, the key can be an arbitrary Writable. To do this correctly, the Writable would have to be Cloneable and be clone()-ed, which is not guaranteed and maybe undesirable. 1. We can remove the option in VectorDumper to print keys to fix this, since that's the only thing that wants to read the current key. How bad is that? 2. We can iterate directly over the SequenceFile in VectorDumper to get desired behavior. OK? Then actually SequenceFileVectorIterable goes away. Sean
