Github user keith-turner commented on the issue:
https://github.com/apache/accumulo/pull/224
> @keith-turner we talked about this yesterday, but I wanted to post it
here. What would happen if a file is deleted, like maybe compacted and gc'd,
after the file list is grabbed?
@mjwall I had not thought of this case and currently have no handling for
it. Yet another win for code reviews.
I think the best solution to this problem is to introduce a new inaccuracy
counter called `deleted`. There are already a few inaccuracy counters reported
when gathering summary information. I will add another comment that shows where
these can be found.
At first I thought I could circle back and use the file that replaced a
missing file. However, this approach has a problem. Multiple deleted files
could have been compacted into the replacement file, and for some of those
deleted files we may have already gathered and merged summary information.
Trying to avoid this problem would make gathering summaries more expensive. In
order to keep gathering summaries fast, I think it would be best to just report
the problem. If someone really wants to avoid this problem, they can clone the
table and make the request against the clone. I can put this avoidance
strategy in the javadoc for `deleted`.
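To make the proposal concrete, here is a rough sketch of the idea. It is not the code in this PR: the class `SummaryGatherSketch`, the `Result` type, and its fields are hypothetical names, and a plain map stands in for the files that still exist at read time. A file that is in the snapshot of the file list but gone when we try to read it is counted in `deleted` instead of being chased down.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SummaryGatherSketch {

    /** Hypothetical result type: merged counts plus inaccuracy counters. */
    public static final class Result {
        public final Map<String, Long> merged = new HashMap<>();
        public int total;   // files in the snapshot of the file list
        public int deleted; // files that vanished before they could be read
    }

    /**
     * @param snapshot  file names captured when the request started
     * @param liveFiles per-file summaries for files that still exist
     *                  (a stand-in for actually opening each file)
     */
    public static Result gather(List<String> snapshot,
                                Map<String, Map<String, Long>> liveFiles) {
        Result result = new Result();
        result.total = snapshot.size();
        for (String file : snapshot) {
            Map<String, Long> summary = liveFiles.get(file);
            if (summary == null) {
                // File was compacted and gc'd after the list was grabbed.
                // Report it rather than reading its replacement, which may
                // also contain data from files that were already merged.
                result.deleted++;
                continue;
            }
            summary.forEach((key, count) -> result.merged.merge(key, count, Long::sum));
        }
        return result;
    }
}
```

A caller can compare `deleted` against `total` to decide whether the result is accurate enough, or rerun the request against a clone of the table.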