GWphua commented on code in PR #18687: URL: https://github.com/apache/druid/pull/18687#discussion_r2465083710
##########
docs/configuration/index.md:
##########
@@ -1602,6 +1604,14 @@ In `druid.segmentCache.locationSelector.strategy`, one of `leastBytesUsed`, `rou
 Note that if `druid.segmentCache.numLoadingThreads` > 1, multiple threads can download different segments at the same time. In this case, with the `leastBytesUsed` strategy or `mostAvailableSize` strategy, Historicals may select a sub-optimal storage location because each decision is based on a snapshot of the storage location status of when a segment is requested to download.
+In `druid.segmentCache.startupLoadStrategy`, one of `loadAllEagerly`, `loadAllLazily`, or `loadEagerlyBeforePeriod` could be specified to represent the strategy to load segments when starting the Historical service.
+
+|Strategy|Description|
+|--------|-----------|
+|`loadAllEagerly`|The default startup strategy. The Historical service will load all segment column metadata immediately during the initial startup process.|
+|`loadAllLazily`|To significantly improve historical system startup time, segments are not loaded during the initial startup sequence. Instead, the loading cost is deferred, and will be incurred the first time a segment is referenced by a query.|
+|`loadEagerlyBeforePeriod`|Provides a balance between fast startup and query performance. The Historical service will eagerly load column metadata only for segments that fall within the most recent period defined by `druid.segmentCache.startupLoadPeriod`. Segments outside this recent period will be loaded on-demand when first queried.|

Review Comment:
> How feasible/extensible is it to accept a map of datasource to load period, to allow configurable periods per datasource?
> (similar to the `loadByPeriod` - load [rules](https://druid.apache.org/docs/latest/operations/rule-configuration#period-load-rule) config where each datasource can have different load retention rules)
>
> I think having that option would allow a lot more flexibility to operators as the query workloads can be vastly different.

I feel we can leave this for another PR, since it is out of scope of this intended PR. WDYT? @abhishekrb19

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
