klsince commented on code in PR #11740:
URL: https://github.com/apache/pinot/pull/11740#discussion_r1353180549


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceConfig.java:
##########
@@ -84,6 +84,24 @@ public class RebalanceConfig {
   @ApiModelProperty(example = "false")
   private boolean _updateTargetTier = false;
 
+  // Update job status every this interval as heartbeat, to indicate the job is still actively running.
+  @JsonProperty("heartbeatIntervalInMs")
+  @ApiModelProperty(example = "300000")
+  private long _heartbeatIntervalInMs = 300000L;
+
+  // The job is considered as failed if not updating its status by this timeout, even though it's IN_PROGRESS status.
+  @JsonProperty("heartbeatTimeoutInMs")
+  @ApiModelProperty(example = "3600000")
+  private long _heartbeatTimeoutInMs = 3600000L;
+
+  @JsonProperty("maxRetry")
+  @ApiModelProperty(example = "3")
+  private int _maxRetry = 3;
+
+  @JsonProperty("retryInitialDelayInMs")
+  @ApiModelProperty(example = "300000")
+  private long _retryInitialDelayInMs = 300000L;

Review Comment:
   I think this is still useful to avoid a herd effect, particularly for
clusters with many tables. If many tables are rebalancing and all of them fail,
for example because the controllers got restarted, then all of those rebalances
would be retried at once the next time the RebalanceChecker task runs. If
wanted, setting this to 0 disables the backoff mechanism.
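
To illustrate the idea, here is a minimal sketch of how an initial delay could
spread retries out. It assumes exponential backoff with random jitter, which is
an assumption on my part and not necessarily what RebalanceChecker does. The
class and method names (`RetryBackoffSketch`, `retryDelayMs`) are hypothetical
and are not part of Pinot.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not Pinot's actual implementation.
class RetryBackoffSketch {

  // Returns the delay to wait before the given retry attempt (1-based).
  // Assumed scheme: initialDelayMs * 2^(attempt - 1), plus a random jitter in
  // [0, initialDelayMs) so that jobs that failed together do not retry together.
  static long retryDelayMs(long initialDelayMs, int attempt) {
    if (initialDelayMs <= 0) {
      // An initial delay of 0 disables the backoff: retry right away.
      return 0L;
    }
    // Cap the exponent so the shift stays small; this sketch assumes delays
    // far below Long.MAX_VALUE / 1024 and does not guard against overflow.
    int shift = Math.min(Math.max(attempt - 1, 0), 10);
    long base = initialDelayMs << shift;
    long jitter = ThreadLocalRandom.current().nextLong(initialDelayMs);
    return base + jitter;
  }
}
```

With `retryInitialDelayInMs` at its default of 300000, the first retry under
this scheme would wait between 5 and 10 minutes and the third between 20 and 25
minutes, so tables that failed at the same moment would not all be retried in
the same RebalanceChecker run.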


