1996fanrui commented on code in PR #726:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/726#discussion_r1422419813


##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricEvaluator.java:
##########
@@ -297,6 +301,28 @@ private void computeTargetDataRate(
         }
     }
 
+    @VisibleForTesting
+    protected static Map<ScalingMetric, EvaluatedScalingMetric> 
evaluateGlobalMetrics(
+            SortedMap<Instant, CollectedMetrics> metricHistory) {
+        var latest = 
metricHistory.get(metricHistory.lastKey()).getGlobalMetrics();
+        var out = new HashMap<ScalingMetric, EvaluatedScalingMetric>();
+
+        var gcPressure = latest.getOrDefault(GC_PRESSURE, Double.NaN);
+        var lastHeapUsage = latest.getOrDefault(HEAP_USAGE, Double.NaN);
+
+        out.put(GC_PRESSURE, EvaluatedScalingMetric.of(gcPressure));
+        out.put(
+                HEAP_USAGE,
+                new EvaluatedScalingMetric(
+                        lastHeapUsage, getAverageGlobalMetric(HEAP_USAGE, 
metricHistory)));
+        return out;
+    }
+
+    private static double getAverageGlobalMetric(
+            ScalingMetric metric, SortedMap<Instant, CollectedMetrics> 
metricsHistory) {
+        return getAverage(metric, null, metricsHistory);
+    }
+
     public static double getAverage(
             ScalingMetric metric,
             JobVertexID jobVertexId,

Review Comment:
   ```suggestion
               @Nullable JobVertexID jobVertexId,
   ```
   
   Could we add the `@Nullable`and add some comments here? When it's null, it's 
`global metrics`.



##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/config/AutoScalerOptions.java:
##########
@@ -222,6 +222,18 @@ private static ConfigOptions.OptionBuilder 
autoScalerConfig(String key) {
                     .withDescription(
                             "Processing rate increase threshold for detecting 
ineffective scaling threshold. 0.1 means if we do not accomplish at least 10% 
of the desired capacity increase with scaling, the action is marked 
ineffective.");
 
+    public static final ConfigOption<Double> GC_PRESSURE_THRESHOLD =
+            autoScalerConfig("memory.gc-pressure.threshold")
+                    .doubleType()
+                    .defaultValue(0.3)
+                    .withDescription("Max allowed GC pressure during scaling 
operations");
+
+    public static final ConfigOption<Double> HEAP_USAGE_THRESHOLD =
+            autoScalerConfig("memory.heap-usage.threshold")
+                    .doubleType()
+                    .defaultValue(0.9)

Review Comment:
   I have 2 questions for this autoscaler rule:
   
   1. Does high heap usage indicate insufficient memory?
       -  When the GC is severe, the memory is indeed not enough. 
       - But when the GC time is very low, and the heap usage is high, 
taskManagers might work well, right? (The memory may be just enough). 
   2. Is the insufficient memory caused by rescale down?
       - The GC is fine before rescaling, but the busy ratio is very low, so 
autoscaler scale down this job.
       - But the GC is severe after rescale down.
       - Could we revert this rescaling?
   



##########
flink-autoscaler/src/main/java/org/apache/flink/autoscaler/metrics/ScalingMetric.java:
##########
@@ -65,7 +65,16 @@ public enum ScalingMetric {
     SCALE_DOWN_RATE_THRESHOLD(false),
 
     /** Expected true processing rate after scale up. */
-    EXPECTED_PROCESSING_RATE(false);
+    EXPECTED_PROCESSING_RATE(false),
+
+    /**
+     * Maximum GC pressure across taskmanagers. Percentage of time spent 
garbage collecting between
+     * 0 (no time in GC) and 1 (100% time in GC).
+     */
+    GC_PRESSURE(false),
+
+    /** Percentage of max heap used (between 0 and 1). */
+    HEAP_USAGE(true);

Review Comment:
   I'm not sure should we add a field here to distinguish each metric is vertex 
level or tm level?
   
   We register some metrics at `AutoscalerFlinkMetrics#registerScalingMetrics`, 
and we register all `ScalingMetric` types for all jobVertices even if it's NaN. 
   
   It's obviously we don't need to register Memory related metrics for 
jobVertices.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to