Hi all!

I'm looking for details on major compaction. Some of my colleagues and I
have been working on an iterator which we are attaching at major compaction
scope. The logic of this iterator requires that it always see entire rows -
i.e. it iterates over all KV entries which make up all versions of a given row.
From the Accumulo documentation, we had assumed this was guaranteed for
major compactions since tablets are partitioned at row boundaries.

However, we are seeing some intermittent (and fairly rare) occurrences of
incorrect behaviour from our iterator. Having reviewed and tested the
iterator logic, we are quite confident it works as intended. Were we
incorrect in thinking that only entire rows will take part in major
compactions? Are there instances where a major compaction within a tablet
will see only partial rows? On reviewing the documentation, it seems this *may*
be possible when a major compaction is called to merge a subset of RFiles
in a given tablet, but it's not very clear. Would anyone be able to clarify
this for us?

Issues with our iterator logic may also occur if reseeks are performed
during a major compaction. However, from our reading of the available
documentation, we got the impression that reseeks do not occur during major
compaction, and we can't see why they would. Is this guaranteed, or are
there cases where a reseek may in fact be called during major compaction?

Sorry for the long, involved questions but any clarification would help us
greatly and be very appreciated :)

Hope you all are having a good week,
Bradley Barber
