Re: [PR] Historical Startup -- Configurable loading strategy (druid)

via GitHub Mon, 27 Oct 2025 03:08:20 -0700


GWphua commented on code in PR #18687:
URL: https://github.com/apache/druid/pull/18687#discussion_r2465083710



##########
docs/configuration/index.md:
##########
@@ -1602,6 +1604,14 @@ In `druid.segmentCache.locationSelector.strategy`, one of `leastBytesUsed`, `rou
 
 Note that if `druid.segmentCache.numLoadingThreads` > 1, multiple threads can download different segments at the same time. In this case, with the `leastBytesUsed` strategy or `mostAvailableSize` strategy, Historicals may select a sub-optimal storage location because each decision is based on a snapshot of the storage location status of when a segment is requested to download.
 
+In `druid.segmentCache.startupLoadStrategy`, one of `loadAllEagerly`, `loadAllLazily`, or `loadEagerlyBeforePeriod` could be specified to represent the strategy to load segments when starting the Historical service.
+
+|Strategy|Description|
+|--------|-----------|
+|`loadAllEagerly`|The default startup strategy. The Historical service will load all segment column metadata immediately during the initial startup process.|
+|`loadAllLazily`|To significantly improve historical system startup time, segments are not loaded during the initial startup sequence. Instead, the loading cost is deferred, and will be incurred the first time a segment is referenced by a query.|
+|`loadEagerlyBeforePeriod`|Provides a balance between fast startup and query performance. The Historical service will eagerly load column metadata only for segments that fall within the most recent period defined by `druid.segmentCache.startupLoadPeriod`. Segments outside this recent period will be loaded on-demand when first queried.|

Review Comment:
   > How feasible/extensible is it to accept a map of datasource to load period, to allow configurable periods per datasource? (similar to the `loadByPeriod` - load [rules](https://druid.apache.org/docs/latest/operations/rule-configuration#period-load-rule) config where each datasource can have different load retention rules)
   > 
   > I think having that option would allow a lot more flexibility to operators as the query workloads can be vastly different.
   
   I feel we can leave this for another PR, since it is outside the intended scope of this one. WDYT? @abhishekrb19
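
   For discussion only, here is a rough sketch of how a per-datasource period map with a global fallback might be resolved in a follow-up. Everything in it is hypothetical: the class `StartupLoadPeriods`, its methods, and the choice to compare a segment's interval end against the cutoff are assumptions, not part of this PR or of Druid. It uses `java.time` to stay self-contained, whereas Druid itself uses Joda-Time.

```java
import java.time.Instant;
import java.time.Period;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Druid or in this PR.
final class StartupLoadPeriods
{
  // Fallback period, playing the role of druid.segmentCache.startupLoadPeriod.
  private final Period defaultPeriod;
  // Assumed per-datasource overrides, keyed by datasource name.
  private final Map<String, Period> perDatasource;

  StartupLoadPeriods(Period defaultPeriod, Map<String, Period> perDatasource)
  {
    this.defaultPeriod = defaultPeriod;
    this.perDatasource = Map.copyOf(perDatasource);
  }

  // Returns the override for the datasource, or the default when none is set.
  Period periodFor(String datasource)
  {
    return perDatasource.getOrDefault(datasource, defaultPeriod);
  }

  // Assumption: a segment counts as "recent" when its interval end is
  // strictly after (now - period), evaluated in UTC.
  boolean shouldLoadEagerly(String datasource, Instant segmentEnd, Instant now)
  {
    ZonedDateTime cutoff = now.atZone(ZoneOffset.UTC).minus(periodFor(datasource));
    return segmentEnd.isAfter(cutoff.toInstant());
  }
}
```

   With a default of `P7D` and an override of `P30D` for one datasource, a segment ending 17 days ago would be loaded eagerly for that datasource and lazily for every other one.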


