dlmarion commented on code in PR #5620:
URL: https://github.com/apache/accumulo/pull/5620#discussion_r2132101785
##########
core/src/main/java/org/apache/accumulo/core/spi/compaction/DefaultCompactionPlanner.java:
##########
@@ -124,8 +124,9 @@
* </ol>
* For example, given a tablet with 20 files, and table.file.max is 15 and no compactions are
* planned. If the compaction ratio is set to 3, then this plugin will find the largest compaction
- * ratio less than 3 that results in a compaction.
- *
+ * ratio less than 3 that results in a compaction. The lowest compaction ratio that will be
+ * considered in this search defaults to 1.1. Starting in 2.1.4, thw lower bound for the search can
Review Comment:
```suggestion
* considered in this search defaults to 1.1. Starting in 2.1.4, the lower bound for the search can
```
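The Javadoc describes a search for the largest compaction ratio below the configured one (3 in the example), bounded below by a value that defaults to 1.1. The code removed in the second hunk searched this range with a binary search on the ratio, starting from a lower bound of 1.0 and stopping once the bounds were within 0.1. The following is a minimal Java sketch of that idea. It assumes the usual ratio rule: a set of files qualifies at ratio r when r × (largest file size) ≤ (sum of sizes). That condition is monotone in r, so it can be binary-searched. `compactsAt`, `largestCompactingRatio`, and `goalCompactionSize` are hypothetical helpers, not Accumulo APIs, and plain `long` sizes stand in for `CompactableFile`. The goal-size helper mirrors the removed `goalCompactionSize` arithmetic. For the Javadoc's example of 20 files with `table.file.max` of 15, one compaction must merge at least 20 - 15 + 1 = 6 files.

```java
import java.util.List;
import java.util.function.DoublePredicate;

class RatioSearchSketch {

  // Hypothetical helper (assumed rule): a set of files compacts at `ratio`
  // when ratio * largestFile <= sum of all file sizes in the set.
  static boolean compactsAt(List<Long> sizes, double ratio) {
    long sum = 0;
    long largest = 0;
    for (long s : sizes) {
      sum += s;
      largest = Math.max(largest, s);
    }
    return sizes.size() > 1 && ratio * largest <= sum;
  }

  // Binary search for (approximately) the largest ratio in [lowRatio, highRatio]
  // for which `compacts` is true. Invariant: lowRatio always compacts.
  // Returns -1 if even lowRatio does not compact.
  static double largestCompactingRatio(double lowRatio, double highRatio,
      DoublePredicate compacts) {
    if (!compacts.test(lowRatio)) {
      return -1;
    }
    while (highRatio - lowRatio > 0.1) {
      double ratioToCheck = (highRatio - lowRatio) / 2 + lowRatio;
      if (compacts.test(ratioToCheck)) {
        lowRatio = ratioToCheck; // still compacts, try a higher ratio
      } else {
        highRatio = ratioToCheck; // too strict, try a lower ratio
      }
    }
    return lowRatio;
  }

  // Merging k of N files leaves N - k + 1 files, so reaching maxTabletFiles needs
  // k >= N - maxTabletFiles + 1. If that exceeds maxFilesToCompact, several
  // compactions are needed and no goal size is set (0), as in the removed code.
  static int goalCompactionSize(int numFiles, int maxTabletFiles, int maxFilesToCompact) {
    int goal = numFiles - maxTabletFiles + 1;
    return goal > maxFilesToCompact ? 0 : goal;
  }

  public static void main(String[] args) {
    List<Long> sizes = List.of(10L, 10L, 10L, 25L);
    double r = largestCompactingRatio(1.1, 3.0, ratio -> compactsAt(sizes, ratio));
    System.out.println(r);
    System.out.println(goalCompactionSize(20, 15, 10)); // 6
  }
}
```

With sizes 10, 10, 10, 25 the exact threshold is 55 / 25 = 2.2, and the search returns a ratio no more than 0.1 below it.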
##########
core/src/main/java/org/apache/accumulo/core/spi/compaction/DefaultCompactionPlanner.java:
##########
@@ -368,52 +374,49 @@ static int getMaxTabletFiles(ServiceEnvironment.Configuration configuration) {
*/
private Collection<CompactableFile> findFilesToCompactWithLowerRatio(PlanningParameters params,
long maxSizeToCompact, int maxTabletFiles) {
- double lowRatio = 1.0;
- double highRatio = params.getRatio();
-
- Preconditions.checkArgument(highRatio >= lowRatio);
var candidates = Set.copyOf(params.getCandidates());
- Collection<CompactableFile> found = Set.of();
-
- int goalCompactionSize = candidates.size() - maxTabletFiles + 1;
- if (goalCompactionSize > maxFilesToCompact) {
- // The tablet is way over max tablet files, so multiple compactions will be needed. Therefore,
- // do not set a goal size for this compaction and find the largest compaction ratio that will
- // compact some set of files.
- goalCompactionSize = 0;
- }
-
- // Do a binary search of the compaction ratios.
- while (highRatio - lowRatio > .1) {
- double ratioToCheck = (highRatio - lowRatio) / 2 + lowRatio;
-
- // This is continually resorting the list of files in the following call, could optimize this
- var filesToCompact =
- findDataFilesToCompact(candidates, ratioToCheck, maxFilesToCompact, maxSizeToCompact);
-
- log.trace("Tried ratio {} and found {} {} {}", ratioToCheck, filesToCompact,
- filesToCompact.size() >= goalCompactionSize, goalCompactionSize);
+ List<CompactableFile> sortedFiles = sortAndLimitByMaxSize(candidates, maxSizeToCompact);
+
+ List<CompactableFile> found = List.of();
+ double largestRatioSeen = Double.MIN_VALUE;
+
+ if (sortedFiles.size() > 1) {
+ int windowStart = 0;
+ int windowEnd = Math.min(sortedFiles.size(), maxFilesToCompact);
+
+ while (windowEnd <= sortedFiles.size()) {
+ var filesInWindow = sortedFiles.subList(windowStart, windowEnd);
+
+ long sum = filesInWindow.get(0).getEstimatedSize();
+ for (int i = 1; i < filesInWindow.size(); i++) {
+ long size = filesInWindow.get(i).getEstimatedSize();
+ sum += size;
+ // This is the compaction ratio needed to compact these files
+ double neededCompactionRatio = sum / (double) size;
Review Comment:
The estimated size for a bulk-imported file is zero, right? Could this cause a division-by-zero error?
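In Java, `sum / (double) size` is floating-point division, so a zero `size` does not throw `ArithmeticException`. It yields `Infinity` when `sum` is positive and `NaN` when `sum` is also zero. Suppose `sortedFiles` is ordered by ascending estimated size (an assumption, since `sortAndLimitByMaxSize` is not shown here). Then a zero-sized file at index `i` means every earlier file in the window is zero-sized too, and the expression becomes `0.0 / 0.0`, which is `NaN`. How that `NaN` behaves depends on how the truncated code updates `largestRatioSeen`. A `>` comparison against a `NaN` is always false, so the value would be skipped silently. `Math.max` with a `NaN` argument returns `NaN`, so the value would propagate. The minimal sketch below makes several assumptions. It uses plain `long` sizes in place of `CompactableFile`. It slides the window one file at a time (an assumption, because the diff is cut off before the window advances). It guards against the problem by skipping non-positive sizes as divisors. `largestNeededRatio` is a hypothetical name.

```java
import java.util.List;

class NeededRatioSketch {

  // Largest "needed compaction ratio" (sum of sizes / largest size) over every
  // prefix of every window of maxFilesToCompact consecutive files.
  // Assumes sortedSizes is ascending, so index i holds the largest size in its prefix.
  static double largestNeededRatio(List<Long> sortedSizes, int maxFilesToCompact) {
    double largestRatioSeen = 0.0;
    if (sortedSizes.size() < 2 || maxFilesToCompact < 2) {
      return largestRatioSeen;
    }
    int windowEnd = Math.min(sortedSizes.size(), maxFilesToCompact);
    // Assumed: the window slides forward by one file per iteration.
    for (int windowStart = 0; windowEnd <= sortedSizes.size(); windowStart++, windowEnd++) {
      long sum = sortedSizes.get(windowStart);
      for (int i = windowStart + 1; i < windowEnd; i++) {
        long size = sortedSizes.get(i);
        sum += size;
        if (size <= 0) {
          // Guard: with size == 0, sum / (double) size would be NaN (0.0 / 0.0)
          // or Infinity; it would not throw, so skip it explicitly.
          continue;
        }
        double neededCompactionRatio = sum / (double) size;
        largestRatioSeen = Math.max(largestRatioSeen, neededCompactionRatio);
      }
    }
    return largestRatioSeen;
  }

  public static void main(String[] args) {
    // Floating-point division by zero does not throw in Java.
    System.out.println(5L / (double) 0L); // Infinity
    System.out.println(0L / (double) 0L); // NaN
    System.out.println(largestNeededRatio(List.of(0L, 0L, 10L, 10L, 25L), 3)); // 2.0
  }
}
```

With sizes `[0, 0, 10, 10, 25]` and a window of 3, the zero-sized files are never used as divisors. The result is 2.0, from the prefix 10 + 10 divided by 10.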
--
This is an automated message from the Apache Git Service.