tolbertam commented on code in PR #4126:
URL: https://github.com/apache/cassandra/pull/4126#discussion_r2296385350
##########
src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java:
##########
@@ -484,8 +498,8 @@ private void cleanupAndUpdateStats(RepairTurn turn,
AutoRepairConfig.RepairType
TimeUnit.SECONDS.toDays(repairState.getClusterRepairTimeInSec()));
}
repairState.setLastRepairTime(timeFunc.get());
-
repairState.setRepairInProgress(false);
Review Comment:
Should we also reset ToRepair/AlreadyRepaired metrics here when we are done
with repairs? Otherwise these metrics will be set to their completed values
until the next repair, although that may be desirable, as it'll be more evident
how much data was covered in the last repair, so maybe its better the way it
currently is.
##########
src/java/org/apache/cassandra/repair/autorepair/RepairTokenRangeSplitter.java:
##########
@@ -486,7 +487,7 @@ static SizedRepairAssignment
merge(List<SizedRepairAssignment> assignments)
@VisibleForTesting
protected List<SizedRepairAssignment> getRepairAssignmentsForTable(String
keyspaceName, String tableName, Range<Token> tokenRange)
{
- List<SizeEstimate> sizeEstimates = getRangeSizeEstimate(keyspaceName,
tableName, tokenRange);
+ List<AutoRepairUtils.SizeEstimate> sizeEstimates =
AutoRepairUtils.getRangeSizeEstimate(repairType, keyspaceName, tableName,
tokenRange);
Review Comment:
One thing we could maybe improve on here is that we already do this in
`calcTotalBytesToBeRepaired` it looks like `KeyspaceRepairPlan` has
`ksTablesEstimatedBytes`, I think we could just reuse that here to prevent
touching the disk again to recalculate this data.
There is only one downside to this, in that since we're now doing the
evaluation at the start, if the amount of data change between issuing repairs,
this may skew things a bit, but I think its an acceptable tradeoff.
##########
src/java/org/apache/cassandra/repair/autorepair/AutoRepairUtils.java:
##########
@@ -1186,4 +1196,197 @@ public static Collection<Range<Token>>
split(Range<Token> tokenRange, int number
}
return ranges;
}
+
Review Comment:
Nice that we are able to reuse this code :)
##########
src/java/org/apache/cassandra/repair/autorepair/RepairTokenRangeSplitter.java:
##########
@@ -224,6 +215,16 @@ public Iterator<KeyspaceRepairAssignments>
getRepairAssignments(boolean primaryR
return new BytesBasedRepairAssignmentIterator(primaryRangeOnly,
repairPlans);
}
+ @Override
+ public long adjustTotalBytesToRepair(long totalBytesToRepair)
+ {
+ if (totalBytesToRepair > maxBytesPerSchedule.toBytes())
+ {
+ return maxBytesPerSchedule.toBytes();
+ }
+ return totalBytesToRepair;
Review Comment:
Can be simplified using `Math.min()`, e.g.:
```suggestion
return Math.min(totalBytesToRepair, maxBytesPerSchedule.toBytes());
```
##########
src/java/org/apache/cassandra/repair/autorepair/IAutoRepairTokenRangeSplitter.java:
##########
@@ -49,6 +49,27 @@ public interface IAutoRepairTokenRangeSplitter
*/
Iterator<KeyspaceRepairAssignments> getRepairAssignments(boolean
primaryRangeOnly, List<PrioritizedRepairPlan> repairPlans);
+ /**
+ * Adjusts the total bytes to be repaired within a schedule.
+ * <p>
+ * There are two splitter implementations supported:
+ * <ol>
+ * <li>{@link FixedSplitTokenRangeSplitter} — No upper cap on the number
of bytes to be repaired. It repairs all
+ * qualified bytes on this node.</li>
+ * <li>{@link RepairTokenRangeSplitter} — Applies an upper cap on the
maximum number of bytes to be repaired per
+ * schedule.</li>
+ * </ol>
+ *
+ * @param totalBytesToRepair the total bytes available to be repaired
+ * @return the adjusted {@code totalBytesToRepair} based on the
implementation; for
+ * {@link FixedSplitTokenRangeSplitter}, the output will be the
same as the input, whereas for
+ * {@link RepairTokenRangeSplitter}, the output may be capped
based on the maximum allowed bytes
+ * per schedule.
+ */
+ default long adjustTotalBytesToRepair(long totalBytesToRepair)
Review Comment:
It doesn't look like this is currently being used currently.
##########
src/java/org/apache/cassandra/repair/autorepair/FixedSplitTokenRangeSplitter.java:
##########
@@ -117,20 +107,24 @@ private KeyspaceRepairAssignments
getRepairAssignmentsForKeyspace(boolean primar
if (byKeyspace)
{
+ long totalBytes = repairPlan.getEstimatedBytes();
+ long bytesPerRange = Math.max(1, totalBytes / splitsPerRange);
Review Comment:
Approximating bytes per range as a function of `totalBytes / splitsPerRange`
can work, but it could also be pretty inaccurate if there are wide partitions.
I think its fine but probably just need to document it.
##########
src/java/org/apache/cassandra/repair/autorepair/RepairTokenRangeSplitter.java:
##########
@@ -224,6 +215,16 @@ public Iterator<KeyspaceRepairAssignments>
getRepairAssignments(boolean primaryR
return new BytesBasedRepairAssignmentIterator(primaryRangeOnly,
repairPlans);
}
+ @Override
+ public long adjustTotalBytesToRepair(long totalBytesToRepair)
+ {
+ if (totalBytesToRepair > maxBytesPerSchedule.toBytes())
+ {
+ return maxBytesPerSchedule.toBytes();
+ }
+ return totalBytesToRepair;
Review Comment:
One concern here is that `maxBytesPerSchedule` is an upper bound, but not
actually how many bytes may be repaired in a schedule. I think if we complete
our schedule plan, maybe we should just adjust the totalBytesToRepairMetric.
I suppose we may have the same problem with
`totalKeyspaceRepairPlansToRepair` as we may not actually repair all keyspaces.
##########
src/java/org/apache/cassandra/repair/autorepair/RepairTokenRangeSplitter.java:
##########
@@ -808,8 +680,6 @@ public String toString()
protected static class SizedRepairAssignment extends RepairAssignment {
Review Comment:
Outside of description, this class is now the same as `RepairAssignment`,
wonder if we could just promote 'description' up to `RepairAssignment` as I
think it might be generally useful to annotate `RepairAssignment` with
descriptions?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]