kfaraz commented on code in PR #16334:
URL: https://github.com/apache/druid/pull/16334#discussion_r1580415217


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/autoscaler/LagBasedAutoScalerConfig.java:
##########
@@ -61,7 +62,8 @@ public LagBasedAutoScalerConfig(
           @Nullable @JsonProperty("scaleInStep") Integer scaleInStep,
           @Nullable @JsonProperty("scaleOutStep") Integer scaleOutStep,
           @Nullable @JsonProperty("enableTaskAutoScaler") Boolean 
enableTaskAutoScaler,
-          @Nullable @JsonProperty("minTriggerScaleActionFrequencyMillis") Long 
minTriggerScaleActionFrequencyMillis
+          @Nullable @JsonProperty("minTriggerScaleActionFrequencyMillis") Long 
minTriggerScaleActionFrequencyMillis,
+          @Nullable @JsonProperty("useDefaultTotalLag") Boolean 
useDefaultTotalLag

Review Comment:
   I agree that the name should be `lagStatsType` or if you just want to use 
max, then `useMaxLag`. I don't mind it being an enum, my only concern is that 
it is limiting. What do we do if we want something outside of total/max/avg?
   
   The approach in this PR is not exactly the same as the original PR, because 
here we drive the decision by a config and not a method on the `Supervisor` 
itself.
   
   I would prefer it we could make the following changes in this PR:
   1. If we intend to drive the decision via a config, then let's get rid of 
the method `computeLagForAutoScaler()` altogether.
   2. Add a new field `preferredScalingMetric` to `LagStats` (returned by 
`computeLagStats()`). This field can take the same enum values as the new 
config. For now, this value will be `TOTAL` for both Kinesis and Kafka. After 
we have done some validation, the `KinesisSupervisor.computeLagStats()` would 
return a `LagStats` with `preferredScalingMetric = MAX`.
   3. The `LagBasedAutoScaler` will ultimately decide which metric to use. The 
new config will essentially override the `preferredScalingMetric`. If the 
config is not specified, we use the `preferredScalingMetric` returned by the 
`computeLagStats` method.
   4. Have a plan on how we will eventually get rid of the new config or 
atleast compute its default value better, because a single default value of 
this config is not going to work for all kinds of supervisors. We wouldn't want 
users to have to specify this config in every supervisor spec. I think step 2 
above will help with this.
   
   ---
   
   My concerns with the original approach were:
   1. It was computing the lag in a roundabout manner. Supervisor needed to 
expose two methods `computeLagStats` and `getLagMetric`. Then the auto-scaler 
had to use the result of these two methods to determine the actual lag. If it 
is the Supervisor which drives this computation, the Supervisor might as well 
have a single method that returns the required lag value.
   2. Enum is limited to TOTAL, MAX, AVG. Having a separate method for 
`computeLagForAutoScaler` allows us to keep the implementation more flexible. I 
don't know when we would need a value outside of total / max / avg though, 
perhaps if we wanted a normalized / truncated value of one of these metrics.
   3. I didn't fully like the idea of adding a new method 
`computeLagForAutoScaler` either because the method `computeLagStats` itself 
was meant only to be used by the auto-scaler. Why have one more method? I 
suggested that approach as I didn't have a better alternative at that time 
which could move things along faster.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to