HuangZhenQiu commented on a change in pull request #8952: URL: https://github.com/apache/flink/pull/8952#discussion_r548939375
##########
File path: flink-core/src/main/java/org/apache/flink/configuration/ResourceManagerOptions.java
##########
@@ -67,6 +69,33 @@
 				"for streaming workloads, which may fail if there are not enough slots. Note that this configuration option does not take " +
 				"effect for standalone clusters, where how many slots are allocated is not controlled by Flink.");
 
+	/**
+	 * Defines the maximum number of worker (YARN / Mesos / Kubernetes) failures per minute before rejecting subsequent worker
+	 * requests until the failure rate falls below the maximum. It is to quickly catch external dependency caused
+	 * workers failure and wait for retry interval before sending new request. By default, the value is set to 10/min.
+	 */
+	public static final ConfigOption<Double> MAXIMUM_WORKERS_FAILURE_RATE = ConfigOptions
+		.key("resourcemanager.start-worker.max-failure-rate")
+		.doubleType()
+		.defaultValue(10.0)
+		.withDescription("Defines the maximum number of worker (YARN / Mesos) failures per minute before rejecting" + " subsequent worker requests until the failure rate falls below the maximum. It is to quickly catch" + " external dependency caused workers failure and terminate job accordingly." + " By default, the value is set to 10/min.");
+
+	/**
+	 * Defines the worker creation interval in milliseconds. In case of worker creation failures, we should wait for an interval before
+	 * trying to create new workers when the failure rate exceeds. Otherwise, ActiveResourceManager will always re-requesting
+	 * the worker, which keeps the main thread busy.
+	 */
+	public static final ConfigOption<Duration> WORKER_CREATION_RETRY_INTERVAL = ConfigOptions
+		.key("resourcemanager.start-worker.retry-interval")
+		.durationType()
+		.defaultValue(Duration.ofMillis(30))

Review comment:
Updated the default to 3 seconds, matching the default value of the original Kubernetes retry interval.
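
For readers following the thread, here is a minimal sketch of how the two options discussed above could interact. It is not the code from this pull request: the class `WorkerFailureRateSketch` and its methods are hypothetical names, and it assumes a one-minute sliding window in which the rate counts as exceeded only when the number of failures is strictly greater than the configured maximum.

```java
import java.time.Duration;
import java.util.ArrayDeque;

/** Hypothetical illustration only; not Flink's actual implementation. */
final class WorkerFailureRateSketch {
    // Assumption: failures are counted over a sliding one-minute window.
    private static final long WINDOW_MILLIS = Duration.ofMinutes(1).toMillis();

    private final double maxFailuresPerMinute;
    private final ArrayDeque<Long> failureTimestampsMillis = new ArrayDeque<>();

    WorkerFailureRateSketch(double maxFailuresPerMinute) {
        this.maxFailuresPerMinute = maxFailuresPerMinute;
    }

    /** Records one worker failure observed at the given time. */
    void recordFailure(long nowMillis) {
        failureTimestampsMillis.addLast(nowMillis);
    }

    /** Returns true if more failures than the maximum fall inside the last minute. */
    boolean exceedsMaximum(long nowMillis) {
        // Drop failures that are at least one minute old.
        while (!failureTimestampsMillis.isEmpty()
                && nowMillis - failureTimestampsMillis.peekFirst() >= WINDOW_MILLIS) {
            failureTimestampsMillis.removeFirst();
        }
        return failureTimestampsMillis.size() > maxFailuresPerMinute;
    }

    /** Delay before the next worker request: the retry interval while the rate is exceeded, otherwise zero. */
    Duration delayBeforeNextRequest(long nowMillis, Duration retryInterval) {
        return exceedsMaximum(nowMillis) ? retryInterval : Duration.ZERO;
    }
}
```

With a maximum of 2 failures per minute and a retry interval of `Duration.ofSeconds(3)`, a third failure inside the same minute makes `delayBeforeNextRequest` return the 3-second interval. Once the older failures leave the window, it returns `Duration.ZERO` again and requests can go out immediately.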