writer-jill commented on code in PR #14590: URL: https://github.com/apache/druid/pull/14590#discussion_r1265154823
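For context on the comments below: the properties under review belong to the Coordinator dynamic configuration, which is set by POSTing a JSON payload to the Coordinator's `/druid/coordinator/v1/config` endpoint. A minimal sketch of such a payload, assuming the `smartSegmentLoading` flag this PR documents; per the docs under review, with it set to `true` the throttling properties discussed in this thread (`maxNonPrimaryReplicantsToLoad`, `replicationThrottleLimit`, `useRoundRobinSegmentAssignment`, and so on) are computed automatically and any user-supplied values are ignored:

```json
{
  "smartSegmentLoading": true,
  "pauseCoordination": false
}
```

Properties not in the auto-computed list, such as `pauseCoordination` above, still take effect as configured.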
########## docs/configuration/index.md: ########## @@ -949,10 +949,11 @@ Issuing a GET request at the same URL will return the spec that is currently in |`millisToWaitBeforeDeleting`|How long does the Coordinator need to be a leader before it can start marking overshadowed segments as unused in metadata storage.|900000 (15 mins)| |`mergeBytesLimit`|The maximum total uncompressed size in bytes of segments to merge.|524288000L| |`mergeSegmentsLimit`|The maximum number of segments that can be in a single [append task](../ingestion/tasks.md).|100| +|`smartSegmentLoading`|Whether to turn on the new ["smart"-mode of segment loading](#smart-segment-loading) which dynamically computes the optimal values of several parameters that maximize Coordinator performance.|true| Review Comment: ```suggestion |`smartSegmentLoading`|Enables ["smart" segment loading mode](#smart-segment-loading) which dynamically computes the optimal values of several properties that maximize Coordinator performance.|true| ``` ########## docs/configuration/index.md: ########## @@ -887,7 +887,7 @@ These Coordinator static configurations can be defined in the `coordinator/runti |Property|Possible Values|Description|Default| |--------|---------------|-----------|-------| |`druid.serverview.type`|batch or http|Segment discovery method to use. "http" enables discovering segments using HTTP instead of ZooKeeper.|http| -|`druid.coordinator.loadqueuepeon.type`|curator or http|Whether to use "http" or "curator" implementation to assign segment loads/drops to historical|http| +|`druid.coordinator.loadqueuepeon.type`|curator or http|Implementation to use to assign segment loads and drops to historicals. Curator-based implementation is now deprecated and all users should move to using HTTP-based segment assignments.|http| Review Comment: ```suggestion |`druid.coordinator.loadqueuepeon.type`|curator or http|Implementation to use to assign segment loads and drops to historicals. 
Curator-based implementation is now deprecated, so you should transition to using HTTP-based segment assignments.|http| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface.
Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. 
You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| Review Comment: ```suggestion |`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value does not apply a limit to the number of replicas assigned per coordination cycle. If you want to use a non-default value for this property, you may want to start with `~20%` of the number of segments found on the historical server with the most segments. 
Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this property impact your Coordinator execution time.|`Integer.MAX_VALUE` (no limit)| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface.
Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. 
You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. Review Comment: ```suggestion The `smartSegmentLoading` mode simplifies Coordinator configuration for segment loading and balancing. ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. 
In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster.
However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. 
no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** Review Comment: ```suggestion > If you enable `smartSegmentLoading` mode **and** provide values for the following properties, Druid ignores your values. The Coordinator computes them automatically. ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up.
Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. 
If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. 
+ +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| Review Comment: ```suggestion |`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment.| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface.
Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. 
You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| Review Comment: ```suggestion |Property|Computed value|Description| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. 
This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again.
| false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. 
If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. Review Comment: ```suggestion The computed values are based on the current state of the cluster and are designed to optimize Coordinator performance. ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. 
The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster.
However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. 
no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| +|`replicationThrottleLimit`|2% of used segments, minimum value = 100|Ensures that replication is not done too aggressively in case of a historical disappearing only intermittently.| Review Comment: ```suggestion |`replicationThrottleLimit`|2% of used segments, minimum value 100|Prevents aggressive replication when a historical disappears only intermittently.| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. 
The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster.
However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. 
no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| +|`replicationThrottleLimit`|2% of used segments, minimum value = 100|Ensures that replication is not done too aggressively in case of a historical disappearing only intermittently.| +|`replicantLifetime`|60|Allows segments to wait about an hour (assuming a coordinator period of 1 minute) in the load queue before an alert is raised. This value is higher than the previous default of 15 because in `smartSegmentLoading` mode, load queues are not limited by size. Thus, segments might get assigned to a load queue even if the corresponding server is slow to load them.| +|`maxNonPrimaryReplicantsToLoad`|`Integer.MAX_VALUE` (no limit)|This throttling is already handled by `replicationThrottleLimit`.| +|`maxSegmentsToMove`|2% of used segments, minimum value = 100, maximum value = 1000|Ensures that some segments are always moving in the cluster to keep it well balanced. The maximum value keeps the coordinator run times bounded.| Review Comment: ```suggestion |`maxSegmentsToMove`|2% of used segments, minimum value 100, maximum value 1000|Ensures that some segments are always moving in the cluster to keep it well balanced. 
The maximum value keeps the Coordinator run times bounded.| ``` ########## docs/operations/metrics.md: ########## @@ -283,19 +283,21 @@ These metrics are for the Druid Coordinator and are reset each time the Coordina |Metric|Description|Dimensions|Normal Value| |------|-----------|----------|------------| -|`segment/assigned/count`|Number of segments assigned to be loaded in the cluster.|`tier`|Varies| -|`segment/moved/count`|Number of segments moved in the cluster.|`tier`|Varies| -|`segment/unmoved/count`|Number of segments which were chosen for balancing but were found to be already optimally placed.|`tier`|Varies| -|`segment/dropped/count`|Number of segments chosen to be dropped from the cluster due to being over-replicated.|`tier`|Varies| -|`segment/deleted/count`|Number of segments marked as unused due to drop rules.| |Varies| -|`segment/unneeded/count`|Number of segments dropped due to being marked as unused.|`tier`|Varies| -|`segment/cost/raw`|Used in cost balancing. The raw cost of hosting segments.|`tier`|Varies| -|`segment/cost/normalization`|Used in cost balancing. The normalization of hosting segments.|`tier`|Varies| -|`segment/cost/normalized`|Used in cost balancing. 
The normalized cost of hosting segments.|`tier`|Varies| +|`segment/assigned/count`|Number of segments assigned to be loaded in the cluster.|`dataSource`, `tier`|Varies| +|`segment/moved/count`|Number of segments moved in the cluster.|`dataSource`, `tier`|Varies| +|`segment/dropped/count`|Number of segments chosen to be dropped from the cluster due to being over-replicated.|`dataSource`, `tier`|Varies| +|`segment/deleted/count`|Number of segments marked as unused due to drop rules.|`dataSource`|Varies| +|`segment/unneeded/count`|Number of segments dropped due to being marked as unused.|`dataSource`, `tier`|Varies| +|`segment/assignSkipped/count`|Number of segments that could not be assigned to any server for loading due to replication throttling, no available disk space, full load queue, or some other reason.|`dataSource`, `tier`, `description`|Varies| +|`segment/moveSkipped/count`|Number of segments that were chosen for balancing but could not be moved either due to already being optimally placed or some other reason.|`dataSource`, `tier`, `description`|Varies| Review Comment: ```suggestion |`segment/moveSkipped/count`|Number of segments that were chosen for balancing but could not be moved. This can occur when segments are already optimally placed.|`dataSource`, `tier`, `description`|Varies| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. 
In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster.
However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. 
no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| +|`replicationThrottleLimit`|2% of used segments, minimum value = 100|Ensures that replication is not done too aggressively in case of a historical disappearing only intermittently.| +|`replicantLifetime`|60|Allows segments to wait about an hour (assuming a coordinator period of 1 minute) in the load queue before an alert is raised. This value is higher than the previous default of 15 because in `smartSegmentLoading` mode, load queues are not limited by size. Thus, segments might get assigned to a load queue even if the corresponding server is slow to load them.| +|`maxNonPrimaryReplicantsToLoad`|`Integer.MAX_VALUE` (no limit)|This throttling is already handled by `replicationThrottleLimit`.| +|`maxSegmentsToMove`|2% of used segments, minimum value = 100, maximum value = 1000|Ensures that some segments are always moving in the cluster to keep it well balanced. The maximum value keeps the coordinator run times bounded.| +|`decommissioningMaxPercentOfMaxSegmentsToMove`|100|Prioritizes move of segments from decommissioning servers so that they can be terminated quickly.| + +When `smartSegmentLoading` is disabled, the configured values of these parameters are used without any modification. 
Review Comment: ```suggestion When `smartSegmentLoading` is disabled, Druid uses the configured values of these properties. ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from b eing overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. 
Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. 
You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. 
Review Comment: ```suggestion If you enable this mode, do not provide values for the properties in the table below. ``` ########## docs/configuration/index.md: ########## @@ -949,10 +949,11 @@ Issuing a GET request at the same URL will return the spec that is currently in |`millisToWaitBeforeDeleting`|How long does the Coordinator need to be a leader before it can start marking overshadowed segments as unused in metadata storage.|900000 (15 mins)| |`mergeBytesLimit`|The maximum total uncompressed size in bytes of segments to merge.|524288000L| |`mergeSegmentsLimit`|The maximum number of segments that can be in a single [append task](../ingestion/tasks.md).|100| +|`smartSegmentLoading`|Whether to turn on the new ["smart"-mode of segment loading](#smart-segment-loading) which dynamically computes the optimal values of several parameters that maximize Coordinator performance.|true| |`maxSegmentsToMove`|The maximum number of segments that can be moved at any given time.|100| -|`replicantLifetime`|The maximum number of Coordinator runs for a segment to be replicated before we start alerting.|15| -|`replicationThrottleLimit`|The maximum number of segments that can be in the replication queue of a historical tier at any given time.|500| -|`balancerComputeThreads`|Thread pool size for computing moving cost of segments in segment balancing. Consider increasing this if you have a lot of segments and moving segments starts to get stuck.|1| +|`replicantLifetime`|The maximum number of Coordinator runs for which a segment can wait in the load queue of a Historical before Druid raises an alert.|15| +|`replicationThrottleLimit`|The maximum number of segment replicas that can be assigned to a Historical tier in a single Coordinator run.
This parameter is a defensive measure to prevent Historicals from getting overwhelmed loading extra replicas of segments that are already available in the cluster.|500| Review Comment: ```suggestion |`replicationThrottleLimit`|The maximum number of segment replicas that can be assigned to a historical tier in a single Coordinator run. This property prevents historicals from becoming overwhelmed when loading extra replicas of segments that are already available in the cluster.|500| ``` ########## docs/operations/metrics.md: ########## @@ -283,19 +283,21 @@ These metrics are for the Druid Coordinator and are reset each time the Coordina |Metric|Description|Dimensions|Normal Value| |------|-----------|----------|------------| -|`segment/assigned/count`|Number of segments assigned to be loaded in the cluster.|`tier`|Varies| -|`segment/moved/count`|Number of segments moved in the cluster.|`tier`|Varies| -|`segment/unmoved/count`|Number of segments which were chosen for balancing but were found to be already optimally placed.|`tier`|Varies| -|`segment/dropped/count`|Number of segments chosen to be dropped from the cluster due to being over-replicated.|`tier`|Varies| -|`segment/deleted/count`|Number of segments marked as unused due to drop rules.| |Varies| -|`segment/unneeded/count`|Number of segments dropped due to being marked as unused.|`tier`|Varies| -|`segment/cost/raw`|Used in cost balancing. The raw cost of hosting segments.|`tier`|Varies| -|`segment/cost/normalization`|Used in cost balancing. The normalization of hosting segments.|`tier`|Varies| -|`segment/cost/normalized`|Used in cost balancing. 
The normalized cost of hosting segments.|`tier`|Varies| +|`segment/assigned/count`|Number of segments assigned to be loaded in the cluster.|`dataSource`, `tier`|Varies| +|`segment/moved/count`|Number of segments moved in the cluster.|`dataSource`, `tier`|Varies| +|`segment/dropped/count`|Number of segments chosen to be dropped from the cluster due to being over-replicated.|`dataSource`, `tier`|Varies| +|`segment/deleted/count`|Number of segments marked as unused due to drop rules.|`dataSource`|Varies| +|`segment/unneeded/count`|Number of segments dropped due to being marked as unused.|`dataSource`, `tier`|Varies| +|`segment/assignSkipped/count`|Number of segments that could not be assigned to any server for loading due to replication throttling, no available disk space, full load queue, or some other reason.|`dataSource`, `tier`, `description`|Varies| +|`segment/moveSkipped/count`|Number of segments that were chosen for balancing but could not be moved either due to already being optimally placed or some other reason.|`dataSource`, `tier`, `description`|Varies| +|`segment/dropSkipped/count`|Number of segments that could not be dropped from any server.|`dataSource`, `tier`, `description`|Varies| |`segment/loadQueue/size`|Size in bytes of segments to load.|`server`|Varies| -|`segment/loadQueue/failed`|Number of segments that failed to load.|`server`|0| |`segment/loadQueue/count`|Number of segments to load.|`server`|Varies| |`segment/dropQueue/count`|Number of segments to drop.|`server`|Varies| +|`segment/loadQueue/assigned`|Number of segments assigned for load or drop to the load queue of a server.|`dataSource`, `server`|Varies| +|`segment/loadQueue/success`|Number of segment assignments that completed successfully.|`dataSource`, `server`|Varies| +|`segment/loadQueue/failed`|Number of segment assignments that failed to complete.|`dataSource`, `server`|0| +|`segment/loadQueue/cancelled`|Number of segment assignments that were cancelled before completion.|`dataSource`, 
`server`|Varies| Review Comment: ```suggestion |`segment/loadQueue/cancelled`|Number of segment assignments that were canceled before completion.|`dataSource`, `server`|Varies| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from b eing overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. 
Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. 
You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. 
+ +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| +|`replicationThrottleLimit`|2% of used segments, minimum value = 100|Ensures that replication is not done too aggressively in case of a historical disappearing only intermittently.| +|`replicantLifetime`|60|Allows segments to wait about an hour (assuming a coordinator period of 1 minute) in the load queue before an alert is raised. This value is higher than the previous default of 15 because in `smartSegmentLoading` mode, load queues are not limited by size. Thus, segments might get assigned to a load queue even if the corresponding server is slow to load them.| +|`maxNonPrimaryReplicantsToLoad`|`Integer.MAX_VALUE` (no limit)|This throttling is already handled by `replicationThrottleLimit`.| +|`maxSegmentsToMove`|2% of used segments, minimum value = 100, maximum value = 1000|Ensures that some segments are always moving in the cluster to keep it well balanced. The maximum value keeps the coordinator run times bounded.| +|`decommissioningMaxPercentOfMaxSegmentsToMove`|100|Prioritizes move of segments from decommissioning servers so that they can be terminated quickly.| + +When `smartSegmentLoading` is disabled, the configured values of these parameters are used without any modification. +You should disable this mode only if you want to explicitly set the value of any of the above parameters. Review Comment: ```suggestion Disable `smartSegmentLoading` only if you want to explicitly set the values of any of the above properties. 
``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers.
An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. 
This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| Review Comment: ```suggestion |`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size.| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. 
This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers. An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again.
| false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. 
If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| +|`replicationThrottleLimit`|2% of used segments, minimum value = 100|Ensures that replication is not done too aggressively in case of a historical disappearing only intermittently.| +|`replicantLifetime`|60|Allows segments to wait about an hour (assuming a coordinator period of 1 minute) in the load queue before an alert is raised. This value is higher than the previous default of 15 because in `smartSegmentLoading` mode, load queues are not limited by size. Thus, segments might get assigned to a load queue even if the corresponding server is slow to load them.| Review Comment: ```suggestion |`replicantLifetime`|60|Allows segments to wait about an hour (assuming a Coordinator period of 1 minute) in the load queue before an alert is raised. In `smartSegmentLoading` mode, load queues are not limited by size. 
Segments might therefore be assigned to a load queue even if the corresponding server is slow to load them.| ``` ########## docs/configuration/index.md: ########## @@ -961,9 +962,29 @@ Issuing a GET request at the same URL will return the spec that is currently in |`decommissioningMaxPercentOfMaxSegmentsToMove`| Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of `maxSegmentsToMove`.<br /><br />If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, the Coordinator does not move segments to decommissioning servers, effectively putting them in a type of "maintenance" mode. In this case, decommissioning servers do not participate in balancing or assignment by load rules. The Coordinator still considers segments on decommissioning servers as candidates to replicate on active servers.<br /><br />Decommissioning can stall if there are no available active servers to move the segments to. You can use the maximum percent of decommissioning segment movements to prioritize balancing or to decrease commissioning time to prevent active servers from being overloaded. The value must be between 0 and 100.|70| |`pauseCoordination`| Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the `CoordinatorDuty` Interface. Such duties include: Segment balancing, Segment compaction, Submitting kill tasks for unused segments (if enabled), Logging of used segments in the cluster, Marking of newly unused or overshadowed segments, Matching and execution of load/drop rules for used segments, Unloading segments that are no longer marked as used from Historical servers.
An example of when an admin may want to pause coordination would be if they are doing deep storage maintenance on HDFS Name Nodes with downtime and don't want the coordinator to be directing Historical Nodes to hit the Name Node with API requests until maintenance is done and the deep store is declared healthy for use again. | false| |`replicateAfterLoadTimeout`| Boolean flag for whether or not additional replication is needed for segments that have failed to load due to the expiry of `druid.coordinator.load.timeout`. If this is set to true, the coordinator will attempt to replicate the failed segment on a different historical server. This helps improve the segment availability if there are a few slow historicals in the cluster. However, the slow historical may still load the segment later and the coordinator may issue drop requests if the segment is over-replicated.|false| -|`maxNonPrimaryReplicantsToLoad`|This is the maximum number of non-primary segment replicants to load per Coordination run. This number can be set to put a hard upper limit on the number of replicants loaded. It is a tool that can help prevent long delays in new data being available for query after events that require many non-primary replicants to be loaded by the cluster; such as a Historical node disconnecting from the cluster. The default value essentially means there is no limit on the number of replicants loaded per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on your Historical server with the most segments. You can use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE`| +|`maxNonPrimaryReplicantsToLoad`|The maximum number of replicas that can be assigned across all tiers in a single Coordinator run. 
This parameter serves the same purpose as `replicationThrottleLimit` except this limit applies at the cluster-level instead of per tier. The default value essentially means that there is no limit on the number of replicas assigned per coordination cycle. If you want to use a non-default value for this config, you may want to start with it being `~20%` of the number of segments found on the Historical server with the most segments. Use the Druid metric, `coordinator/time` with the filter `duty=org.apache.druid.server.coordinator.duty.RunRules` to see how different values of this config impact your Coordinator execution time.|`Integer.MAX_VALUE` (i.e. no limit)| +##### Smart segment loading +The `smartSegmentLoading` mode of the Coordinator makes configuring it for segment loading and balancing much easier. +In this mode, the Coordinator does not require the user to provide values of the following parameters and computes them automatically instead. +**If provided, the values are simply ignored.** +The computed values are based on the current state of the cluster and are meant to optimize the performance of the Coordinator. + +|Property|Computed value|Explanation| +|--------|--------------|-----------| +|`useRoundRobinSegmentAssignment`|true|Speeds up segment assignment| +|`maxSegmentsInNodeLoadingQueue`|0|Removes the limit on load queue size| +|`replicationThrottleLimit`|2% of used segments, minimum value = 100|Ensures that replication is not done too aggressively in case of a historical disappearing only intermittently.| +|`replicantLifetime`|60|Allows segments to wait about an hour (assuming a coordinator period of 1 minute) in the load queue before an alert is raised. This value is higher than the previous default of 15 because in `smartSegmentLoading` mode, load queues are not limited by size. 
Thus, segments might get assigned to a load queue even if the corresponding server is slow to load them.| +|`maxNonPrimaryReplicantsToLoad`|`Integer.MAX_VALUE` (no limit)|This throttling is already handled by `replicationThrottleLimit`.| +|`maxSegmentsToMove`|2% of used segments, minimum value = 100, maximum value = 1000|Ensures that some segments are always moving in the cluster to keep it well balanced. The maximum value keeps the coordinator run times bounded.| +|`decommissioningMaxPercentOfMaxSegmentsToMove`|100|Prioritizes move of segments from decommissioning servers so that they can be terminated quickly.| Review Comment: ```suggestion |`decommissioningMaxPercentOfMaxSegmentsToMove`|100|Prioritizes the move of segments from decommissioning servers so that they can be terminated quickly.| ``` ########## docs/operations/metrics.md: ########## @@ -283,19 +283,21 @@ These metrics are for the Druid Coordinator and are reset each time the Coordina |Metric|Description|Dimensions|Normal Value| |------|-----------|----------|------------| -|`segment/assigned/count`|Number of segments assigned to be loaded in the cluster.|`tier`|Varies| -|`segment/moved/count`|Number of segments moved in the cluster.|`tier`|Varies| -|`segment/unmoved/count`|Number of segments which were chosen for balancing but were found to be already optimally placed.|`tier`|Varies| -|`segment/dropped/count`|Number of segments chosen to be dropped from the cluster due to being over-replicated.|`tier`|Varies| -|`segment/deleted/count`|Number of segments marked as unused due to drop rules.| |Varies| -|`segment/unneeded/count`|Number of segments dropped due to being marked as unused.|`tier`|Varies| -|`segment/cost/raw`|Used in cost balancing. The raw cost of hosting segments.|`tier`|Varies| -|`segment/cost/normalization`|Used in cost balancing. The normalization of hosting segments.|`tier`|Varies| -|`segment/cost/normalized`|Used in cost balancing. 
The normalized cost of hosting segments.|`tier`|Varies| +|`segment/assigned/count`|Number of segments assigned to be loaded in the cluster.|`dataSource`, `tier`|Varies| +|`segment/moved/count`|Number of segments moved in the cluster.|`dataSource`, `tier`|Varies| +|`segment/dropped/count`|Number of segments chosen to be dropped from the cluster due to being over-replicated.|`dataSource`, `tier`|Varies| +|`segment/deleted/count`|Number of segments marked as unused due to drop rules.|`dataSource`|Varies| +|`segment/unneeded/count`|Number of segments dropped due to being marked as unused.|`dataSource`, `tier`|Varies| +|`segment/assignSkipped/count`|Number of segments that could not be assigned to any server for loading due to replication throttling, no available disk space, full load queue, or some other reason.|`dataSource`, `tier`, `description`|Varies| Review Comment: ```suggestion |`segment/assignSkipped/count`|Number of segments that could not be assigned to any server for loading. This can occur due to replication throttling, no available disk space, or a full load queue.|`dataSource`, `tier`, `description`|Varies| ``` ########## docs/configuration/index.md: ########## @@ -949,10 +949,11 @@ Issuing a GET request at the same URL will return the spec that is currently in |`millisToWaitBeforeDeleting`|How long does the Coordinator need to be a leader before it can start marking overshadowed segments as unused in metadata storage.|900000 (15 mins)| |`mergeBytesLimit`|The maximum total uncompressed size in bytes of segments to merge.|524288000L| |`mergeSegmentsLimit`|The maximum number of segments that can be in a single [append task](../ingestion/tasks.md).|100| +|`smartSegmentLoading`|Whether to turn on the new ["smart"-mode of segment loading](#smart-segment-loading) which dynamically computes the optimal values of several parameters that maximize Coordinator performance.|true| |`maxSegmentsToMove`|The maximum number of segments that can be moved at any given 
time.|100| -|`replicantLifetime`|The maximum number of Coordinator runs for a segment to be replicated before we start alerting.|15| -|`replicationThrottleLimit`|The maximum number of segments that can be in the replication queue of a historical tier at any given time.|500| -|`balancerComputeThreads`|Thread pool size for computing moving cost of segments in segment balancing. Consider increasing this if you have a lot of segments and moving segments starts to get stuck.|1| +|`replicantLifetime`|The maximum number of Coordinator runs for which a segment can wait in the load queue of a Historical before Druid raises an alert.|15| +|`replicationThrottleLimit`|The maximum number of segment replicas that can be assigned to a Historical tier in a single Coordinator run. This parameter is a defensive measure to prevent Historicals from getting overwhelmed loading extra replicas of segments that are already available in the cluster.|500| +|`balancerComputeThreads`|Thread pool size for computing moving cost of segments during segment balancing. Consider increasing this if you have a lot of segments and moving segments starts to get stuck.|1| Review Comment: ```suggestion |`balancerComputeThreads`|Thread pool size for computing moving cost of segments during segment balancing. Consider increasing this if you have a lot of segments and moving segments begins to stall.|1| ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
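As a sanity check on the smart segment loading table reviewed above (computed values of "2% of used segments" clamped to the stated minimums and maximums), the derivation can be sketched as below. This is an illustrative paraphrase of the documented behavior only, not the actual Druid implementation; the function name `smart_loading_values` is made up for this sketch.

```python
def smart_loading_values(used_segments: int) -> dict:
    """Illustrative sketch: the values the docs say `smartSegmentLoading`
    mode computes, i.e. 2% of used segments with the stated clamps."""
    two_percent = used_segments * 2 // 100
    return {
        # 2% of used segments, minimum value = 100
        "replicationThrottleLimit": max(100, two_percent),
        # 2% of used segments, minimum value = 100, maximum value = 1000
        "maxSegmentsToMove": min(1000, max(100, two_percent)),
    }

# For a cluster with 500,000 used segments, 2% is 10,000, so the move
# limit is capped at 1000 while the replication throttle is not capped.
print(smart_loading_values(500_000))
```

This matches the table's note that the cap on `maxSegmentsToMove` keeps Coordinator run times bounded even on very large clusters, while `replicationThrottleLimit` only has a floor.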
