abhishekrb19 commented on code in PR #18687:
URL: https://github.com/apache/druid/pull/18687#discussion_r2458328111


##########
docs/configuration/index.md:
##########
@@ -1585,7 +1585,9 @@ These Historical configurations can be defined in the 
`historical/runtime.proper
 |`druid.segmentCache.announceIntervalMillis`|How frequently to announce 
segments while segments are loading from cache. Set this value to zero to wait 
for all segments to be loaded before announcing.|5000 (5 seconds)|
 |`druid.segmentCache.numLoadingThreads`|How many segments to drop or load 
concurrently from deep storage. Note that the work of loading segments involves 
downloading segments from deep storage, decompressing them and loading them to 
a memory mapped location. So the work is not all I/O Bound. Depending on CPU 
and network load, one could possibly increase this config to a higher 
value.|max(1,Number of cores / 6)|
 |`druid.segmentCache.numBootstrapThreads`|How many segments to load 
concurrently during historical startup.|`druid.segmentCache.numLoadingThreads`|
-|`druid.segmentCache.lazyLoadOnStart`|Whether or not to load segment columns 
metadata lazily during historical startup. When set to true, Historical startup 
time will be dramatically improved by deferring segment loading until the first 
time that segment takes part in a query, which will incur this cost 
instead.|false|
+|`druid.segmentCache.lazyLoadOnStart`|_DEPRECATED_ Use 
`druid.segmentCache.startupLoadStrategy` instead. Whether or not to load 
segment columns metadata lazily during historical startup. When set to true, 
Historical startup time will be dramatically improved by deferring segment 
loading until the first time that segment takes part in a query, which will 
incur this cost instead.|false|
+|`druid.segmentCache.startupLoadStrategy`|Selects the segment column metadata 
loading strategy during historical startup. Possible values are 
`loadAllEagerly`, `loadAllLazily`, and `loadEagerlyBeforePeriod`. More details 
on each strategy below.|`loadAllEagerly`|
+|`druid.segmentCache.startupLoadPeriod`| Used only when startup load strategy 
`loadEagerlyBeforePeriod` is configured. Suppose timestamp `t` is when the 
Historical started up. Any segment metadata with interval that does not overlap 
with the interval of `[t - startupLoadPeriod, t]` will be lazily loaded.|`P7D`|

Review Comment:
   Why do we default to `P7D`? IMO this should be a required property 
configured by operators and fail startup if unspecified (since there's no 
reasonable default that will apply to most users).



##########
server/src/main/java/org/apache/druid/segment/loading/SegmentLoaderConfig.java:
##########
@@ -84,11 +97,31 @@ public List<StorageLocationConfig> getLocations()
     return locations;
   }
 
+  /**
+   * @deprecated Use {@link #getStartupCacheLoadStrategy()} instead.
+   * Removal of this method in the future will requires a change in {@link 
#getStartupCacheLoadStrategy()}
+   * to default to {@link LoadAllEagerlyStrategy#STRATEGY_NAME} when {@link 
#startupLoadStrategy} is null.
+   */
+  @Deprecated
   public boolean isLazyLoadOnStart()
   {
     return lazyLoadOnStart;
   }
 
+  public String getStartupCacheLoadStrategy()
+  {
+    return startupLoadStrategy == null
+           ? isLazyLoadOnStart()

Review Comment:
   Do we have validations that we don't have incompatible configurations 
enabled accidentally? for example, when `lazyLoadOnStart = true` and 
`startupLoadStrategy` are both  set



##########
server/src/main/java/org/apache/druid/server/coordination/startup/LoadEagerlyBeforePeriod.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.coordination.startup;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.timeline.DataSegment;
+import org.joda.time.DateTime;
+import org.joda.time.Interval;
+import org.joda.time.Period;
+
+public class LoadEagerlyBeforePeriod implements 
HistoricalStartupCacheLoadStrategy

Review Comment:
   Please add a brief javadoc for these implementations



##########
server/src/main/java/org/apache/druid/server/coordination/startup/HistoricalStartupCacheLoadStrategyFactory.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.coordination.startup;
+
+import org.apache.druid.error.DruidException;
+import org.apache.druid.segment.loading.SegmentLoaderConfig;
+
+public class HistoricalStartupCacheLoadStrategyFactory
+{
+  public static HistoricalStartupCacheLoadStrategy 
factorize(SegmentLoaderConfig config)
+  {
+    String strategyName = config.getStartupCacheLoadStrategy();
+    switch (strategyName) {
+      case LoadAllLazilyStrategy.STRATEGY_NAME:
+        return new LoadAllLazilyStrategy();
+      case LoadAllEagerlyStrategy.STRATEGY_NAME:
+        return new LoadAllEagerlyStrategy();
+      case LoadEagerlyBeforePeriod.STRATEGY_NAME:
+        return new LoadEagerlyBeforePeriod(config.getStartupLoadPeriod());
+      default:
+        throw DruidException.forPersona(DruidException.Persona.OPERATOR)
+                            .ofCategory(DruidException.Category.UNSUPPORTED)
+                            .build("Unknown configured Historical Startup 
Loading Strategy[%s]", strategyName);

Review Comment:
   please see my related comment on simplifying this with 
`getStartupCacheLoadStrategy`



##########
docs/configuration/index.md:
##########
@@ -1602,6 +1604,14 @@ In `druid.segmentCache.locationSelector.strategy`, one 
of `leastBytesUsed`, `rou
 
 Note that if `druid.segmentCache.numLoadingThreads` > 1, multiple threads can 
download different segments at the same time. In this case, with the 
`leastBytesUsed` strategy or `mostAvailableSize` strategy, Historicals may 
select a sub-optimal storage location because each decision is based on a 
snapshot of the storage location status of when a segment is requested to 
download.
 
+In `druid.segmentCache.startupLoadStrategy`, one of `loadAllEagerly`, 
`loadAllLazily`, or `loadEagerlyBeforePeriod` could be specified to represent 
the strategy to load segments when starting the Historical service.
+
+|Strategy|Description|
+|--------|-----------|
+|`loadAllEagerly`|The default startup strategy. The Historical service will 
load all segment column metadata immediately during the initial startup 
process.|
+|`loadAllLazily`|To significantly improve historical system startup time, 
segments are not loaded during the initial startup sequence. Instead, the 
loading cost is deferred, and will be incurred the first time a segment is 
referenced by a query.|
+|`loadEagerlyBeforePeriod`|Provides a balance between fast startup and query 
performance. The Historical service will eagerly load column metadata only for 
segments that fall within the most recent period defined by 
`druid.segmentCache.startupLoadPeriod`. Segments outside this recent period 
will be loaded on-demand when first queried.|

Review Comment:
   How feasible/extensible is it to accept a map of datasource to load period, 
to allow configurable periods per datasource? (similar to the `loadByPeriod` - 
load 
[rules](https://druid.apache.org/docs/latest/operations/rule-configuration#period-load-rule)
 config where each datasource can have different load retention rules)
   
   I think having that option would allow a lot more flexibility to operators 
as the query workloads can be vastly different.



##########
server/src/main/java/org/apache/druid/segment/loading/SegmentLocalCacheManager.java:
##########
@@ -853,7 +858,8 @@ public void mount(StorageLocation mountLocation) throws 
SegmentLoadingException
           final SegmentizerFactory factory = getSegmentFactory(storageDir);
 
           @SuppressWarnings("ObjectEquality")
-          final boolean lazy = config.isLazyLoadOnStart() && lazyLoadCallback 
!= SegmentLazyLoadFailCallback.NOOP;
+          final boolean lazy = loadStrategy.shouldLoadLazily(dataSegment)
+                               && lazyLoadCallback != 
SegmentLazyLoadFailCallback.NOOP;

Review Comment:
   Just to confirm that this would only affect bootstrapping flow and not other 
features like virtual fabric / on-demand queries that @clintropolis was working 
on? 
   I'm asking because this config usage is deep in the segment load flow that 
can  inadvertently affect other features since it's used by 4-5 related callers 
(I can help double check on this later)
   
   
   Alternatively, it'd be cleaner and safer to move this logic to 
`SegmentCacheBootstrapper` where we know certainly that it'd be secluded to the 
segment bootstrapping logic on data servers.
   
   



##########
server/src/main/java/org/apache/druid/segment/loading/SegmentLoaderConfig.java:
##########
@@ -84,11 +97,31 @@ public List<StorageLocationConfig> getLocations()
     return locations;
   }
 
+  /**
+   * @deprecated Use {@link #getStartupCacheLoadStrategy()} instead.
+   * Removal of this method in the future will requires a change in {@link 
#getStartupCacheLoadStrategy()}
+   * to default to {@link LoadAllEagerlyStrategy#STRATEGY_NAME} when {@link 
#startupLoadStrategy} is null.
+   */
+  @Deprecated
   public boolean isLazyLoadOnStart()
   {
     return lazyLoadOnStart;
   }
 
+  public String getStartupCacheLoadStrategy()

Review Comment:
   This API contract seems weird...it's returning the strategy name rather than 
the concrete strategy itself. What do you think about inlining the logic from 
`factorize()` and returning `HistoricalStartupCacheLoadStrategy`? 



##########
server/src/main/java/org/apache/druid/server/coordination/startup/HistoricalStartupCacheLoadStrategy.java:
##########
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.coordination.startup;
+
+import org.apache.druid.timeline.DataSegment;
+
+public interface HistoricalStartupCacheLoadStrategy
+{
+  boolean shouldLoadLazily(DataSegment segment);

Review Comment:
   Please add javadocs on this interface and the method



##########
server/src/main/java/org/apache/druid/server/coordination/startup/HistoricalStartupCacheLoadStrategyFactory.java:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.coordination.startup;
+
+import org.apache.druid.error.DruidException;
+import org.apache.druid.segment.loading.SegmentLoaderConfig;
+
+public class HistoricalStartupCacheLoadStrategyFactory
+{
+  public static HistoricalStartupCacheLoadStrategy 
factorize(SegmentLoaderConfig config)
+  {
+    String strategyName = config.getStartupCacheLoadStrategy();
+    switch (strategyName) {
+      case LoadAllLazilyStrategy.STRATEGY_NAME:
+        return new LoadAllLazilyStrategy();
+      case LoadAllEagerlyStrategy.STRATEGY_NAME:
+        return new LoadAllEagerlyStrategy();
+      case LoadEagerlyBeforePeriod.STRATEGY_NAME:
+        return new LoadEagerlyBeforePeriod(config.getStartupLoadPeriod());
+      default:
+        throw DruidException.forPersona(DruidException.Persona.OPERATOR)
+                            .ofCategory(DruidException.Category.UNSUPPORTED)
+                            .build("Unknown configured Historical Startup 
Loading Strategy[%s]", strategyName);

Review Comment:
   This could just be a simpler helper method in `SegmentLocalCacheManager` 
where this is being used. I think calling it `factorize()` is not needed.



##########
server/src/main/java/org/apache/druid/server/coordination/startup/LoadEagerlyBeforePeriod.java:
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.coordination.startup;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.timeline.DataSegment;
+import org.joda.time.DateTime;
+import org.joda.time.Interval;
+import org.joda.time.Period;
+
+public class LoadEagerlyBeforePeriod implements 
HistoricalStartupCacheLoadStrategy
+{
+  private static final Logger log = new Logger(LoadEagerlyBeforePeriod.class);
+  public static final String STRATEGY_NAME = "loadEagerlyBeforePeriod";
+
+  private final Interval eagerLoadingWindow;

Review Comment:
   nit: `eagerLoadingInterval`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to