Re: [PR] Docs: Changes for coordinator improvements (druid)

via GitHub Sun, 16 Jul 2023 23:12:46 -0700


kfaraz commented on code in PR #14590:
URL: https://github.com/apache/druid/pull/14590#discussion_r1264911402



##########
docs/configuration/index.md:
##########
@@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec 
that is currently in
 |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the 
Coordinator can move from decommissioning servers to active non-decommissioning 
servers during a single run. This value is relative to the total maximum number 
of segments that can be moved at any given time based upon the value of 
`maxSegmentsToMove`.<br /><br />If 
`decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not 
move segments from decommissioning servers, effectively putting them in a type of 
"maintenance" mode. In this case, decommissioning servers do not participate in 
balancing or assignment by load rules. The Coordinator still considers segments 
on decommissioning servers as candidates to replicate on active servers.<br 
/><br />Decommissioning can stall if there are no available active servers to 
move the segments to. Use this percentage to prioritize either balancing (lower 
values) or shorter decommissioning time (higher values), while preventing 
active servers from being overloaded. The value must be between 0 and 100.|70|
 |`pauseCoordination`| Boolean flag for whether the Coordinator should execute 
its various duties of coordinating the cluster. Setting this to true pauses all 
coordination work while allowing the API to remain up. Paused duties include 
all classes that implement the `CoordinatorDuty` interface, such as segment 
balancing, segment compaction, submitting kill tasks for unused segments (if 
enabled), logging of used segments in the cluster, marking of newly unused or 
overshadowed segments, matching and execution of load/drop rules for used 
segments, and unloading segments that are no longer marked as used from 
Historical servers. For example, an admin may want to pause coordination while 
doing deep storage maintenance on HDFS NameNodes with downtime, so that the 
Coordinator does not direct Historical servers to hit the NameNode with API 
requests until maintenance is done and deep storage is declared healthy for use 
again.|false|
 |`replicateAfterLoadTimeout`| Boolean flag for whether additional replication 
is needed for segments that have failed to load due to the expiry of 
`druid.coordinator.load.timeout`. If this is set to true, the Coordinator will 
attempt to replicate the failed segment on a different Historical server. This 
helps improve segment availability if there are a few slow Historicals in the 
cluster. However, the slow Historical may still load the segment later, and the 
Coordinator may then issue drop requests if the segment is 
over-replicated.|false|
-|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary 
segment replicants to load per Coordination run. This number can be set to put 
a hard upper limit on the number of replicants loaded. It is a tool that can 
help prevent long delays in new data being available for query after events 
that require many non-primary replicants to be loaded by the cluster; such as a 
Historical node disconnecting from the cluster. The default value essentially 
means there is no limit on the number of replicants loaded per coordination 
cycle. If you want to use a non-default value for this config, you may want to 
start with it being `~20%` of the number of segments found on your Historical 
server with the most segments. You can use the Druid metric, `coordinator/time` 
with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see 
how different values of this config impact your Coordinator execution 
time.|`Integer.MAX_VALUE`|
+|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be 
assigned across all tiers in a single Coordinator run. This parameter serves 
the same purpose as `replicationThrottleLimit`, except that this limit applies 
at the cluster level instead of per tier. The default value effectively imposes 
no limit on the number of replicas assigned per coordination cycle. If you want 
to use a non-default value for this config, consider starting at about 20% of 
the number of segments on the Historical server with the most segments. Use the 
Druid metric `coordinator/time` with the filter 
`duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different 
values of this config affect Coordinator execution 
time.|`Integer.MAX_VALUE` (i.e., no limit)|
 
+##### Smart segment loading
 
+The `smartSegmentLoading` mode of the Coordinator makes configuring segment 
loading and balancing much easier.
+In this mode, the Coordinator computes the following parameters automatically, 
so the user does not need to provide values for them.
+Any values provided for these parameters are ignored.
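
For a rough sense of scale, the move budget described for `decommissioningMaxPercentOfMaxSegmentsToMove` can be sketched as below. This is a minimal illustration, not Druid's code: `split_move_budget` is a hypothetical helper, and both the rounding of the decommissioning share (up) and treating the remainder as the balancing budget are assumptions.

```python
def split_move_budget(max_segments_to_move: int, decommissioning_max_percent: int) -> tuple[int, int]:
    """Hypothetical helper: split one Coordinator run's move budget.

    Returns (cap on moves from decommissioning servers, moves left for balancing).
    Assumptions: the decommissioning share is rounded up, and the rest of
    maxSegmentsToMove is what remains for ordinary balancing.
    """
    if not 0 <= decommissioning_max_percent <= 100:
        raise ValueError("decommissioningMaxPercentOfMaxSegmentsToMove must be between 0 and 100")
    if max_segments_to_move < 0:
        raise ValueError("maxSegmentsToMove must be non-negative")
    # Integer ceiling of max_segments_to_move * percent / 100 (no float rounding).
    from_decommissioning = -(-max_segments_to_move * decommissioning_max_percent // 100)
    # Assumed: whatever is not reserved for decommissioning goes to balancing.
    for_balancing = max_segments_to_move - from_decommissioning
    return from_decommissioning, for_balancing


# Default of 70% with a budget of 100 moves.
print(split_move_budget(100, 70))  # (70, 30)
# A percent of 0 allows no moves off decommissioning servers ("maintenance" mode).
print(split_move_budget(100, 0))   # (0, 100)
```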

Review Comment:
   Done.
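
The tuning advice for `maxNonPrimaryReplicantsToLoad` and the boolean flags in the diff can be rendered as a dynamic config payload. This sketch assumes you already know the segment count on your fullest Historical; `build_dynamic_config` and `suggested_max_non_primary_replicants` are hypothetical helpers, and the JSON is the kind of body you would submit to the Coordinator dynamic configuration endpoint (`/druid/coordinator/v1/config`). It does not make the HTTP call.

```python
import json


def suggested_max_non_primary_replicants(segments_on_fullest_historical: int) -> int:
    """Hypothetical helper: start at roughly 20% of the segment count on the
    Historical server with the most segments, never below 1."""
    return max(1, segments_on_fullest_historical // 5)


def build_dynamic_config(
    segments_on_fullest_historical: int,
    pause_coordination: bool = False,
    smart_segment_loading: bool = True,
) -> str:
    """Hypothetical helper: render a Coordinator dynamic config body as JSON."""
    config = {
        # Cluster-wide cap on replicas assigned per Coordinator run.
        "maxNonPrimaryReplicantsToLoad": suggested_max_non_primary_replicants(
            segments_on_fullest_historical
        ),
        # True pauses all CoordinatorDuty work while the API stays up.
        "pauseCoordination": pause_coordination,
        # Assumption: check the smartSegmentLoading parameter list in the docs
        # to see which values the Coordinator computes (and ignores from here).
        "smartSegmentLoading": smart_segment_loading,
    }
    return json.dumps(config, indent=2)


print(build_dynamic_config(12_500))
```

From that starting point, adjust the value while watching `coordinator/time` filtered on `duty=org.apache.druid.server.coordinator.duty.RunRules`.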



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Docs: Changes for coordinator improvements (druid)

Reply via email to