kfaraz commented on code in PR #17770:
URL: https://github.com/apache/druid/pull/17770#discussion_r1977751659
##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -273,9 +273,17 @@ private Interval findIntervalForKill(
)
{
final DateTime minStartTime =
datasourceToLastKillIntervalEnd.get(dataSource);
+
+ // Once the first segment from a datasource is killed, we have a valid
minStartTime.
+ // Restricting the upper bound to scan segments metadata while running the
kill task results in a efficient SQL query.
+ final DateTime maxIntervalFromLastKill = minStartTime != null
+ ?
DateTimes.min(minStartTime.plus(maxIntervalToKill),
+
DateTimes.nowUtc().minus(durationToRetain))
+ :
DateTimes.nowUtc().minus(durationToRetain);
+
Review Comment:
It's probably cleaner to just initialize the `maxEndTime` directly like so:
```suggestion
final DateTime maxEndTime;
if (ignoreDurationToRetain) {
maxEndTime = DateTimes.COMPARE_DATE_AS_STRING_MAX;
} else if (minStartTime == null) {
maxEndTime = DateTimes.nowUtc().minus(durationToRetain);
} else {
// If we have already killed a segment, limit the kill interval
based on the minStartTime
maxEndTime = DateTimes.min(
DateTimes.nowUtc().minus(durationToRetain),
minStartTime.plus(maxIntervalToKill)
);
}
= minStartTime != null
?
DateTimes.min(minStartTime.plus(maxIntervalToKill),
DateTimes.nowUtc().minus(durationToRetain))
:
DateTimes.nowUtc().minus(durationToRetain);
```
##########
server/src/main/java/org/apache/druid/server/coordinator/duty/KillUnusedSegments.java:
##########
@@ -273,9 +273,17 @@ private Interval findIntervalForKill(
)
{
final DateTime minStartTime =
datasourceToLastKillIntervalEnd.get(dataSource);
+
+ // Once the first segment from a datasource is killed, we have a valid
minStartTime.
+ // Restricting the upper bound to scan segments metadata while running the
kill task results in a efficient SQL query.
+ final DateTime maxIntervalFromLastKill = minStartTime != null
+ ?
DateTimes.min(minStartTime.plus(maxIntervalToKill),
+
DateTimes.nowUtc().minus(durationToRetain))
+ :
DateTimes.nowUtc().minus(durationToRetain);
+
final DateTime maxEndTime = ignoreDurationToRetain
Review Comment:
You may remove this assignment if you use the code suggestion above.
##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill()
validateLastKillStateAndReset(DS1, YEAR_OLD);
}
+ @Test
+ public void testRestrictKillQueryToMaxInterval()
+ {
+ configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration())
+ .withMaxIntervalToKill(MAX_KILL_INTERVAL);
+
+ initDuty();
+
+ createAndAddUnusedSegment(DS1, MONTH_OLD, VERSION, NOW.minusDays(29));
Review Comment:
```suggestion
// For a new datasource, the duration to retain is used to determine
kill interval
createAndAddUnusedSegment(DS1, MONTH_OLD, VERSION, NOW.minusDays(29));
```
##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill()
validateLastKillStateAndReset(DS1, YEAR_OLD);
}
+ @Test
+ public void testRestrictKillQueryToMaxInterval()
Review Comment:
```suggestion
public void testMaxIntervalToKillOverridesDurationToRetain()
```
##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -87,6 +88,9 @@ public class KillUnusedSegmentsTest
private static final String DS2 = "DS2";
private static final String DS3 = "DS3";
+ private static final Period RETAIN_DURATION = Period.hours(6);
Review Comment:
Non-blocker: Maybe also add a test where retain duration is bigger so that
it leads to smaller kill interval and thus is given precedence over the
`maxIntervalToKill`.
##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill()
validateLastKillStateAndReset(DS1, YEAR_OLD);
}
+ @Test
+ public void testRestrictKillQueryToMaxInterval()
+ {
+ configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration())
+ .withMaxIntervalToKill(MAX_KILL_INTERVAL);
Review Comment:
```suggestion
configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration())
.withMaxIntervalToKill(MAX_KILL_INTERVAL);
```
##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill()
validateLastKillStateAndReset(DS1, YEAR_OLD);
}
+ @Test
+ public void testRestrictKillQueryToMaxInterval()
+ {
+ configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration())
+ .withMaxIntervalToKill(MAX_KILL_INTERVAL);
Review Comment:
Nit: The fields `RETAIN_DURATION` and `MAX_KILL_INTERVAL` probably don't
need to be constants as they are only used in the new test. Not using constants
here might make the test more readable.
##########
server/src/test/java/org/apache/druid/server/coordinator/duty/KillUnusedSegmentsTest.java:
##########
@@ -604,6 +608,29 @@ public void testLowerMaxIntervalToKill()
validateLastKillStateAndReset(DS1, YEAR_OLD);
}
+ @Test
+ public void testRestrictKillQueryToMaxInterval()
+ {
+ configBuilder.withDurationToRetain(RETAIN_DURATION.toStandardDuration())
+ .withMaxIntervalToKill(MAX_KILL_INTERVAL);
+
+ initDuty();
+
+ createAndAddUnusedSegment(DS1, MONTH_OLD, VERSION, NOW.minusDays(29));
+ CoordinatorRunStats newDatasourceStats = runDutyAndGetStats();
+
+ Assert.assertEquals(1,
newDatasourceStats.get(Stats.Kill.ELIGIBLE_UNUSED_SEGMENTS, DS1_STAT_KEY));
+ validateLastKillStateAndReset(DS1, MONTH_OLD);
+
+ createAndAddUnusedSegment(DS1, FIFTEEN_DAY_OLD, VERSION,
NOW.minusDays(14));
Review Comment:
```suggestion
// For a datasource where kill has already happened, maxIntervalToKill
is used
// if it leads to a smaller kill interval than durationToRetain
createAndAddUnusedSegment(DS1, FIFTEEN_DAY_OLD, VERSION,
NOW.minusDays(14));
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]