Hi all! I'm looking for details on major compaction. Some of my colleagues and I have been working on an iterator that we attach at major compaction scope. The logic of this iterator requires that it always see entire rows, i.e., that it iterates over all KV entries that make up all versions of a given row. From the Accumulo documentation, we had assumed this was guaranteed for major compactions, since tablets are partitioned at row boundaries.
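To make the suspected failure mode concrete, here is a minimal, hypothetical simulation in plain Java. It uses no Accumulo classes; `PartialCompactionSketch`, `file`, `mergeFiles` and `entriesPerRow` are illustrative names only. It assumes each RFile can be modelled as a sorted map of `row|column` keys, and that a compaction's iterator sees the merged contents of only the files selected for that compaction. Under that assumption, tablet boundaries falling between rows would not be enough: a compaction over a subset of files would show the iterator only the part of a row stored in those files.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model only: each "file" is a sorted map from "row|column" to value.
// This is NOT the Accumulo API; it just simulates what a compaction would read.
class PartialCompactionSketch {

    // Build a "file" from alternating key/value arguments, e.g. file("r1|a", "1").
    static SortedMap<String, String> file(String... keyValuePairs) {
        SortedMap<String, String> f = new TreeMap<>();
        for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
            f.put(keyValuePairs[i], keyValuePairs[i + 1]);
        }
        return f;
    }

    // Merge only the selected files into one sorted view, standing in for the
    // merged input a compaction's iterator stack would see.
    // Simplification: on identical keys the later file wins (real keys carry timestamps).
    static SortedMap<String, String> mergeFiles(List<SortedMap<String, String>> files) {
        SortedMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> f : files) {
            merged.putAll(f);
        }
        return merged;
    }

    // Count how many entries of each row are visible in the merged view.
    static SortedMap<String, Integer> entriesPerRow(SortedMap<String, String> merged) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (String key : merged.keySet()) {
            String row = key.substring(0, key.indexOf('|'));
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, with files A = `file("r1|a", "1", "r2|a", "2")`, B = `file("r2|b", "3")` and C = `file("r1|b", "4")`, merging all three yields `{r1=2, r2=2}`, but merging only A and B yields `{r1=1, r2=2}`: in that simulated partial compaction the iterator would see row `r1` with only one of its two entries.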
However, we are seeing intermittent (and fairly rare) occurrences of incorrect behaviour from our iterator. Having reviewed and tested the iterator logic, we are quite confident it works as intended. Were we incorrect in thinking that only entire rows take part in major compactions? Are there cases where a major compaction within a tablet sees only partial rows? On reviewing the documentation, it seems this *may* be possible when a major compaction is invoked to merge only a subset of the RFiles in a given tablet, but it's not very clear. Would anyone be able to clarify this for us?

Issues with our iterator logic may also occur if reseeks are performed during a major compaction. However, from our reading of the available documentation, we got the impression that reseeks do not occur during major compaction, and we can't see why they would. Is this guaranteed, or are there cases where a reseek may in fact be called during a major compaction?

Sorry for the long, involved questions, but any clarification would help us greatly and be very much appreciated :)

Hope you all are having a good week,
Bradley Barber
