kfaraz commented on code in PR #16334:
URL: https://github.com/apache/druid/pull/16334#discussion_r1580415217
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/LagBasedAutoScalerConfig.java:
##########
@@ -61,7 +62,8 @@ public LagBasedAutoScalerConfig(
@Nullable @JsonProperty("scaleInStep") Integer scaleInStep,
@Nullable @JsonProperty("scaleOutStep") Integer scaleOutStep,
@Nullable @JsonProperty("enableTaskAutoScaler") Boolean
enableTaskAutoScaler,
- @Nullable @JsonProperty("minTriggerScaleActionFrequencyMillis") Long
minTriggerScaleActionFrequencyMillis
+ @Nullable @JsonProperty("minTriggerScaleActionFrequencyMillis") Long
minTriggerScaleActionFrequencyMillis,
+ @Nullable @JsonProperty("useDefaultTotalLag") Boolean
useDefaultTotalLag
Review Comment:
I agree that the name should be `lagStatsType` or if you just want to use
max, then `useMaxLag`. I don't mind it being an enum, my only concern is that
it is limiting. What do we do if we want something outside of total/max/avg?
The approach in this PR is not exactly the same as the original PR, because
here we drive the decision by a config and not a method on the `Supervisor`
itself.
I would prefer it we could make the following changes in this PR:
1. If we intend to drive the decision via a config, then let's get rid of
the method `computeLagForAutoScaler()` altogether.
2. Add a new field `preferredScalingMetric` to `LagStats` (returned by
`computeLagStats()`). This field can take the same enum values as the new
config. For now, this value will be `TOTAL` for both Kinesis and Kafka. After
we have done some validation, the `KinesisSupervisor.computeLagStats()` would
return a `LagStats` with `preferredScalingMetric = MAX`.
3. The `LagBasedAutoScaler` will ultimately decide which metric to use. The
new config will essentially override the `preferredScalingMetric`. If the
config is not specified, we use the `preferredScalingMetric` returned by the
`computeLagStats` method.
4. Have a plan on how we will eventually get rid of the new config or
atleast compute its default value better, because a single default value of
this config is not going to work for all kinds of supervisors. We wouldn't want
users to have to specify this config in every supervisor spec. I think step 2
above will help with this.
---
My concerns with the original approach were:
1. It was computing the lag in a roundabout manner. Supervisor needed to
expose two methods `computeLagStats` and `getLagMetric`. Then the auto-scaler
had to use the result of these two methods to determine the actual lag. If it
is the Supervisor which drives this computation, the Supervisor might as well
have a single method that returns the required lag value.
2. Enum is limited to TOTAL, MAX, AVG. Having a separate method for
`computeLagForAutoScaler` allows us to keep the implementation more flexible. I
don't know when we would need a value outside of total / max / avg though,
perhaps if we wanted a normalized / truncated value of one of these metrics.
3. I didn't fully like the idea of adding a new method
`computeLagForAutoScaler` either because the method `computeLagStats` itself
was meant only to be used by the auto-scaler. Why have one more method? I
suggested that approach as I didn't have a better alternative at that time
which could move things along faster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]