I'm having an interesting issue fixing up a technical point in the class
SequenceFileVectorIterable. Its Iterator is wrong in that hasNext()
advances the iteration and next() doesn't. There's an easy fix: the
Iterator just needs to always read one item ahead so that it knows
whether a next one exists.
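A minimal sketch of that read-ahead fix, assuming a hypothetical `Supplier` source that returns the next record or `null` at end of file (the real class wraps a Hadoop `SequenceFile.Reader`, which is not modeled here). `hasNext()` only inspects the buffered record and has no side effects; `next()` returns the buffered record and reads the following one.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/**
 * Sketch of a read-ahead iterator. The Supplier is a hypothetical source
 * that returns the next record, or null once the input is exhausted.
 */
class ReadAheadIterator<T> implements Iterator<T> {
  private final Supplier<T> source;
  private T buffered; // the record next() will return; null means "no more"

  ReadAheadIterator(Supplier<T> source) {
    this.source = source;
    this.buffered = source.get(); // prime the buffer with the first record
  }

  @Override
  public boolean hasNext() {
    return buffered != null; // no side effects, so repeated calls are safe
  }

  @Override
  public T next() {
    if (buffered == null) {
      throw new NoSuchElementException();
    }
    T result = buffered;
    buffered = source.get(); // read one item ahead
    return result;
  }
}
```

Note that the buffer always holds the *next* record, so the iterator itself no longer knows the record the caller just received. And if the source refills one shared object per read, as `SequenceFile.Reader.next(key, value)` does, the read-ahead would also overwrite the object just returned.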

However, done the straightforward way, the Iterator no longer knows
the current key/value it's on, only the next one. This is an issue
since in the one usage of this class, in VectorDumper, the current key
is accessed for printing.

The Iterator could just store the last key/value it saw. However, the
key can be an arbitrary Writable, and the same instance gets refilled
on each read, so storing it correctly means copying it: the Writable
would have to be Cloneable and be clone()-ed, which is not guaranteed
and may be undesirable.
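To illustrate why keeping a plain reference to the last key isn't enough, here is a small simulation. `MutableKey` is a hypothetical stand-in for a Writable that the reader refills in place, and its `copy()` plays the role of the clone() that an arbitrary Writable does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mutable key standing in for a reused Writable. */
class MutableKey {
  private String value;

  void set(String value) { this.value = value; }

  String get() { return value; }

  /** Explicit deep copy; a real Writable offers no such guarantee. */
  MutableKey copy() {
    MutableKey k = new MutableKey();
    k.set(value);
    return k;
  }
}

class ReuseDemo {
  /**
   * Simulates a read-ahead loop in which the reader refills one shared key.
   * Returns the key the caller would see as "current" at each step,
   * remembering the previous key either by reference or by copy.
   */
  static List<String> currentKeys(String[] records, boolean copyKeys) {
    MutableKey shared = new MutableKey(); // reused for every read
    List<String> seen = new ArrayList<>();
    MutableKey current = null;
    for (String r : records) {
      shared.set(r); // read-ahead overwrites the shared instance
      if (current != null) {
        seen.add(current.get()); // what the caller believes is current
      }
      current = copyKeys ? shared.copy() : shared;
    }
    if (current != null) {
      seen.add(current.get());
    }
    return seen;
  }
}
```

For records `a`, `b`, `c`, remembering by reference reports `b`, `c`, `c` (each key is clobbered by the read-ahead), while copying reports the correct `a`, `b`, `c`.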

1. We can remove the option in VectorDumper to print keys to fix this,
since that's the only thing that wants to read the current key. How
bad is that?
2. We can iterate directly over the SequenceFile in VectorDumper to
get the desired behavior. OK? Then SequenceFileVectorIterable actually
goes away.
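Option 2 might look roughly like this. `KeyValueReader` and `Holder` are hypothetical stand-ins modeled on the `next(key, value)` style of Hadoop's `SequenceFile.Reader`, which fills caller-supplied objects and returns false at end of file; this is a sketch, not the actual VectorDumper code. Because each key is printed before the next read overwrites it, no copying or clone() is needed.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a reused Writable holder. */
class Holder {
  String text;
}

/**
 * Hypothetical reader modeled on the next(key, value) style: fills the
 * supplied objects in place and returns false at end of input.
 */
interface KeyValueReader {
  boolean next(Holder key, Holder value);
}

class DirectDump {
  /** Dumps every record, using each key before the holders are reused. */
  static String dump(KeyValueReader reader, boolean printKeys) {
    Holder key = new Holder();
    Holder value = new Holder();
    StringBuilder out = new StringBuilder();
    while (reader.next(key, value)) {
      // The key is still valid here, so no copy is required.
      if (printKeys) {
        out.append(key.text).append('\t');
      }
      out.append(value.text).append('\n');
    }
    return out.toString();
  }

  /** In-memory reader over {key, value} pairs, for trying the loop out. */
  static KeyValueReader inMemory(List<String[]> records) {
    Iterator<String[]> it = records.iterator();
    return (key, value) -> {
      if (!it.hasNext()) {
        return false;
      }
      String[] r = it.next();
      key.text = r[0];
      value.text = r[1];
      return true;
    };
  }
}
```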

Sean
