keith-turner opened a new pull request, #5620:
URL: https://github.com/apache/accumulo/pull/5620

   When a tablet has too many files and is not compacting with the configured 
compaction ratio, the planner searches for the highest ratio that would 
cause a compaction.  I encountered a problem with this search where two 
constraints in the code could cause nothing to be found. One constraint was a 
check to only accept compactions that satisfied `filesToCompact.size() < 
goalCompactionSize`.  The other constraint was that 
`DefaultCompactionPlanner.findDataFilesToCompact()` would find the largest set 
of small files to compact. Put together, in some cases 
`findDataFilesToCompact()` would keep finding a set of files smaller than 
`goalCompactionSize`, and a warning would be logged that nothing could be 
found.  However, the set it did find would have been good to compact.
   
   Looking into a fix, I came to the conclusion that the behavior of 
`findDataFilesToCompact()`, where it finds the largest set of small files to 
compact, is probably not optimal when varying the compaction ratio. Also, it's 
probably best not to set a minimum size to compact; it's ok if it takes 
multiple compactions to resolve the issue.  Changed the code to find the files 
with the highest ratio and dropped those two other constraints.
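
   As a rough illustration of "find the files with the highest ratio", here is 
a minimal sketch. It is not the code in this PR: the class, the method name 
`findHighestRatio`, and the use of plain file sizes are all hypothetical. It 
assumes the ratio of a set is the sum of its file sizes divided by its largest 
file size, that sizes are positive, and that only smallest-first prefixes of 
the size-sorted files are considered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HighestRatioSketch {
  // Hypothetical helper, not Accumulo code. Returns the smallest-first prefix
  // of the size-sorted files whose ratio (sum of sizes / largest size) is
  // highest, or an empty list when no prefix of two or more files reaches
  // minRatio.
  static List<Long> findHighestRatio(List<Long> fileSizes, double minRatio) {
    List<Long> sorted = new ArrayList<>(fileSizes);
    Collections.sort(sorted);

    double bestRatio = 0;
    int bestEnd = 0; // exclusive end index of the best prefix found so far
    long sum = 0;
    for (int i = 0; i < sorted.size(); i++) {
      sum += sorted.get(i);
      if (i == 0) {
        continue; // a single file is not a compaction
      }
      // The list is sorted, so sorted.get(i) is the largest file in the prefix.
      double ratio = (double) sum / sorted.get(i);
      if (ratio >= minRatio && ratio > bestRatio) {
        bestRatio = ratio;
        bestEnd = i + 1;
      }
    }
    return new ArrayList<>(sorted.subList(0, bestEnd));
  }

  public static void main(String[] args) {
    // Three small files have ratio 3.0, higher than any larger prefix.
    System.out.println(findHighestRatio(List.of(1000L, 10L, 10L, 10L, 500L), 1.1)); // [10, 10, 10]
    // Ratio is 1.001, below the 1.1 minimum, so nothing is selected.
    System.out.println(findHighestRatio(List.of(1000L, 1L), 1.1)); // []
  }
}
```

   Note that the sketch sets no minimum number of files to select, so a small 
set is returned even if several compactions would be needed to get the tablet 
back under its file limit.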
   
   Added a test that causes the two constraints to conflict; this new test 
does not pass with the old code.  Adjusted some of the existing tests because 
of the change in behavior.
   
   Added a new config option that sets the minimum compaction ratio the 
search will consider. Defaulted this to 1.1, which may be extreme, not sure.  
Two files whose sizes differ by 10x have a ratio of 1.1.  This is roughly the 
minimum ratio that the old code would search.
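
   The 1.1 figure can be checked with a few lines of arithmetic. This sketch 
assumes the ratio of a set of files is the sum of the file sizes divided by the 
largest file size; the class and method names are made up for illustration.

```java
import java.util.List;

public class RatioExample {
  // Ratio of a candidate set: total size divided by the largest file's size.
  // Assumed definition for illustration; sizes must be positive.
  static double ratio(List<Long> sizes) {
    if (sizes.isEmpty()) {
      throw new IllegalArgumentException("need at least one file");
    }
    long sum = 0;
    long max = 0;
    for (long s : sizes) {
      sum += s;
      max = Math.max(max, s);
    }
    return (double) sum / max;
  }

  public static void main(String[] args) {
    // One file 10x larger than the other: (100 + 10) / 100
    System.out.println(ratio(List.of(100L, 10L))); // 1.1
    // Three equally sized files: (10 + 10 + 10) / 10
    System.out.println(ratio(List.of(10L, 10L, 10L))); // 3.0
  }
}
```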

