Re: [PR] Add the capability to turboload segments onto historicals (druid)

via GitHub Tue, 11 Mar 2025 11:15:50 -0700


vtlim commented on code in PR #17775:
URL: https://github.com/apache/druid/pull/17775#discussion_r1989873296



##########
docs/configuration/index.md:
##########
@@ -885,7 +885,7 @@ These Coordinator static configurations can be defined in 
the `coordinator/runti
 |`druid.coordinator.kill.maxInterval`|The largest interval, as an [ISO 8601 
duration](https://en.wikipedia.org/wiki/ISO_8601#Durations), of segments to 
delete per kill task. Set to zero, e.g. `PT0S`, for unlimited. This only 
applies when `druid.coordinator.kill.on=true`.|`P30D`|
 |`druid.coordinator.balancer.strategy`|The [balancing 
strategy](../design/coordinator.md#balancing-segments-in-a-tier) used by the 
Coordinator to distribute segments among the Historical servers in a tier. The 
`cost` strategy distributes segments by minimizing a cost function, 
`diskNormalized` weights these costs with the disk usage ratios of the servers 
and `random` distributes segments randomly.|`cost`|
 |`druid.coordinator.loadqueuepeon.http.repeatDelay`|The start and repeat delay 
(in milliseconds) for the load queue peon, which manages the load/drop queue of 
segments for any server.|1 minute|
-|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop 
requests to batch in one HTTP request. Note that it must be smaller than 
`druid.segmentCache.numLoadingThreads` config on Historical service.|1|
+|`druid.coordinator.loadqueuepeon.http.batchSize`|Number of segment load/drop 
requests to batch in one HTTP request. Note that it must be smaller than 
`druid.segmentCache.numLoadingThreads` config on Historical service. If the 
value is not provided, automatically sets the value to the `numLoadingThreads` 
on the historical. | `druid.segmentCache.numLoadingThreads` |

Review Comment:
   If `numLoadingThreads` will be the default, does the previous sentence need 
to change to say "smaller than or equal to"?
   
   Also need to capitalize Historical.



##########
docs/configuration/index.md:
##########
@@ -953,6 +953,7 @@ The following table shows the dynamic configuration 
properties for the Coordinat
 |`decommissioningNodes`|List of Historical servers to decommission. 
Coordinator will not assign new segments to decommissioning servers, and 
segments will be moved away from them to be placed on non-decommissioning 
servers at the maximum rate specified by `maxSegmentsToMove`.|none|
 |`pauseCoordination`|Boolean flag for whether or not the Coordinator should 
execute its various duties of coordinating the cluster. Setting this to true 
essentially pauses all coordination work while allowing the API to remain up. 
Duties that are paused include all classes that implement the `CoordinatorDuty` 
interface. Such duties include: segment balancing, segment compaction, 
submitting kill tasks for unused segments (if enabled), logging of used 
segments in the cluster, marking of newly unused or overshadowed segments, 
matching and execution of load/drop rules for used segments, unloading segments 
that are no longer marked as used from Historical servers. An example of when 
an admin may want to pause coordination would be if they are doing deep storage 
maintenance on HDFS name nodes with downtime and don't want the Coordinator to 
be directing Historical nodes to hit the name node with API requests until 
maintenance is done and the deep store is declared healthy for use again.|fa
 lse|
 |`replicateAfterLoadTimeout`|Boolean flag for whether or not additional 
replication is needed for segments that have failed to load due to the expiry 
of `druid.coordinator.load.timeout`. If this is set to true, the Coordinator 
will attempt to replicate the failed segment on a different historical server. 
This helps improve the segment availability if there are a few slow Historicals 
in the cluster. However, the slow Historical may still load the segment later 
and the Coordinator may issue drop requests if the segment is 
over-replicated.|false|
+|`turboLoadingNodes`|List of Historical servers to place in turbo loading 
mode. This causes the historical to load segments faster at the cost of query 
performance. For any performance increase, the runtime parameter 
`druid.coordinator.loadqueuepeon.http.batchSize` must not be configured. |none|

Review Comment:
   ```suggestion
   |`turboLoadingNodes`|List of Historical servers to place in turbo loading 
mode. These Historicals will load segments more quickly but at the cost of 
query performance. For any performance increase, don't configure the runtime 
parameter `druid.coordinator.loadqueuepeon.http.batchSize`. |none|
   ```



##########
docs/configuration/index.md:
##########
@@ -953,6 +953,7 @@ The following table shows the dynamic configuration 
properties for the Coordinat
 |`decommissioningNodes`|List of Historical servers to decommission. 
Coordinator will not assign new segments to decommissioning servers, and 
segments will be moved away from them to be placed on non-decommissioning 
servers at the maximum rate specified by `maxSegmentsToMove`.|none|
 |`pauseCoordination`|Boolean flag for whether or not the Coordinator should 
execute its various duties of coordinating the cluster. Setting this to true 
essentially pauses all coordination work while allowing the API to remain up. 
Duties that are paused include all classes that implement the `CoordinatorDuty` 
interface. Such duties include: segment balancing, segment compaction, 
submitting kill tasks for unused segments (if enabled), logging of used 
segments in the cluster, marking of newly unused or overshadowed segments, 
matching and execution of load/drop rules for used segments, unloading segments 
that are no longer marked as used from Historical servers. An example of when 
an admin may want to pause coordination would be if they are doing deep storage 
maintenance on HDFS name nodes with downtime and don't want the Coordinator to 
be directing Historical nodes to hit the name node with API requests until 
maintenance is done and the deep store is declared healthy for use again.|fa
 lse|
 |`replicateAfterLoadTimeout`|Boolean flag for whether or not additional 
replication is needed for segments that have failed to load due to the expiry 
of `druid.coordinator.load.timeout`. If this is set to true, the Coordinator 
will attempt to replicate the failed segment on a different historical server. 
This helps improve the segment availability if there are a few slow Historicals 
in the cluster. However, the slow Historical may still load the segment later 
and the Coordinator may issue drop requests if the segment is 
over-replicated.|false|
+|`turboLoadingNodes`|List of Historical servers to place in turbo loading 
mode. This causes the historical to load segments faster at the cost of query 
performance. For any performance increase, the runtime parameter 
`druid.coordinator.loadqueuepeon.http.batchSize` must not be configured. |none|

Review Comment:
   It's not immediately clear how `turboLoadingNodes` relates to `batchSize`. 
Does it mean that configuring `batchSize` _with_ `turboLoadingNodes` impacts 
query performance worse than just turbo mode? Or does it mean that not 
configuring the batch size will lead to query performance increase when in 
turbo mode?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add the capability to turboload segments onto historicals (druid)

Reply via email to