keith-turner opened a new pull request, #5620: URL: https://github.com/apache/accumulo/pull/5620
When a tablet has too many files and is not compacting with the configured compaction ratio, the planner searches for the highest ratio that would cause a compaction. Encountered a problem with this search where two constraints in the code could cause nothing to be found. One constraint was a check to only accept compactions that satisfied `filesToCompact.size() < goalCompactionSize`. The other constraint was that `DefaultCompactionPlanner.findDataFilesToCompact()` would find the largest set of small files to compact. Putting these two together, in some cases `findDataFilesToCompact()` would keep finding a set of files smaller than `goalCompactionSize` and log a warning that it could not find anything. However, the set it did find would have been good to compact.

Looking into fixing this, came to the conclusion that the behavior of `findDataFilesToCompact()`, where it finds the largest set of small files to compact, is probably not optimal when varying the compaction ratio. Also, it's probably best not to set a minimum size to compact; it's OK if it takes multiple compactions to resolve the issue. Changed the code to find the files with the highest ratio and dropped those two other constraints.

Added a test that causes the two constraints to conflict; this new test will not pass with the old code. Adjusted some of the existing tests because of the change in behavior.

Added a new config option to limit the minimum compaction ratio of the search. Defaulted this to 1.1, which may be extreme, not sure. Files with a difference in size of 10x will have a ratio of 1.1. This is roughly the minimum ratio that the old code would search.
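To make the ratio arithmetic above concrete, here is a minimal sketch of a highest-ratio search. It is not the code from this PR: the class `RatioSearchSketch` and its methods `ratio` and `findHighestRatioSet` are hypothetical names, and it assumes the usual definition where a set's ratio is the sum of its file sizes divided by its largest file size, and that candidate sets are prefixes of the files sorted by size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not Accumulo's DefaultCompactionPlanner code.
final class RatioSearchSketch {

  // Ratio of a file set: sum of all sizes divided by the largest size.
  // Assumes a non-empty list of positive sizes.
  static double ratio(List<Long> sizes) {
    long sum = 0;
    long max = 0;
    for (long s : sizes) {
      sum += s;
      max = Math.max(max, s);
    }
    return (double) sum / max;
  }

  // Returns the prefix of the size-sorted files with the highest ratio,
  // or an empty list if no prefix of two or more files reaches minRatio.
  static List<Long> findHighestRatioSet(List<Long> fileSizes, double minRatio) {
    List<Long> sorted = new ArrayList<>(fileSizes);
    Collections.sort(sorted);

    List<Long> best = List.of();
    double bestRatio = minRatio;
    long sum = 0;
    for (int i = 0; i < sorted.size(); i++) {
      sum += sorted.get(i);
      if (i == 0) {
        continue; // a single file is not a compaction candidate here
      }
      // sorted.get(i) is the largest file in the prefix [0, i].
      double r = (double) sum / sorted.get(i);
      if (r >= bestRatio) { // on ties, prefer the larger set
        bestRatio = r;
        best = new ArrayList<>(sorted.subList(0, i + 1));
      }
    }
    return best;
  }
}
```

With sizes 10 and 1, `ratio` gives 11 / 10 = 1.1, matching the 10x example above. For sizes 1000, 100, 10, 10, 10 and a minimum ratio of 1.1, the sketch selects the three files of size 10 (ratio 3.0) rather than a larger set with a lower ratio, so resolving a tablet with too many files may take several compactions.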
