Re: [PR] Speed up node boot times by parallelizing buffer acquisition (druid)

via GitHub Fri, 20 Feb 2026 21:03:01 -0800


kfaraz commented on code in PR #19025:
URL: https://github.com/apache/druid/pull/19025#discussion_r2835836876



##########
processing/src/main/java/org/apache/druid/query/DruidProcessingConfig.java:
##########
@@ -202,5 +206,10 @@ public boolean isNumMergeBuffersConfigured()
   {
     return numMergeBuffersConfigured;
   }
+
+  public boolean isParallelPoolInit()

Review Comment:
   1-line javadoc might be helpful here.



##########
docs/configuration/index.md:
##########
@@ -1636,6 +1637,7 @@ Druid uses Jetty to serve HTTP requests.
 |`druid.processing.numTimeoutThreads`|The number of processing threads to have 
available for handling per-segment query timeouts. Setting this value to `0` 
removes the ability to service per-segment timeouts, irrespective of 
`perSegmentTimeout` query context parameter. As these threads are just 
servicing timers, it's recommended to set this value to some small percent 
(e.g. 5%) of the total query processing cores available to the historical.|0|
 |`druid.processing.fifo`|If the processing queue should treat tasks of equal 
priority in a FIFO manner|`true`|
 |`druid.processing.tmpDir`|Path where temporary files created while processing 
a query should be stored. If specified, this configuration takes priority over 
the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`|
+|`druid.processing.parallelPoolInit`|(EXPERIMENTAL) Allows all 
merge/processing pools to be allocated in parallel on process launch. This 
significantly speeds up node launch times.|`false`|

Review Comment:
   +1, since the property is listed under the respective types, we might as 
well say "speeds up Broker launch times" and "speeds up Historical launch 
times".



##########
docs/configuration/index.md:
##########
@@ -1380,6 +1380,7 @@ Processing properties set on the Middle Manager are 
passed through to Peons.
 |`druid.processing.fifo`|Enables the processing queue to treat tasks of equal 
priority in a FIFO manner.|`true`|
 |`druid.processing.tmpDir`|Path where temporary files created while processing 
a query should be stored. If specified, this configuration takes priority over 
the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`|
 |`druid.processing.intermediaryData.storage.type`|Storage type for 
intermediary segments of data shuffle between native parallel index tasks. <br 
/>Set to `local` to store segment files in the local storage of the Middle 
Manager or Indexer. <br />Set to `deepstore` to use configured deep storage for 
better fault tolerance during rolling updates. When the storage type is 
`deepstore`, Druid stores the data in the `shuffle-data` directory under the 
configured deep storage path. Druid does not support automated cleanup for the 
`shuffle-data` directory. You can set up cloud storage lifecycle rules for 
automated cleanup of data at the `shuffle-data` prefix location.|`local`|
+|`druid.processing.parallelPoolInit`|(EXPERIMENTAL) Allows all 
merge/processing pools to be allocated in parallel on process launch. This 
significantly speeds up node launch times.|`false`|

Review Comment:
   Please include this update in the other places too, and also call out the 
potential caveat with the locking/starvation if the node doesn't have enough 
resources.



##########
processing/src/main/java/org/apache/druid/query/DruidProcessingConfig.java:
##########
@@ -86,6 +89,7 @@ public DruidProcessingConfig(
     this.tmpDir = Configs.valueOrDefault(tmpDir, 
System.getProperty("java.io.tmpdir"));
     this.buffer = Configs.valueOrDefault(buffer, new 
DruidProcessingBufferConfig());
     this.indexes = Configs.valueOrDefault(indexes, new 
DruidProcessingIndexesConfig());
+    this.parallelPoolInit = parallelPoolInit != null && parallelPoolInit;

Review Comment:
   Nit: easier to follow
   ```suggestion
       this.parallelPoolInit = Configs.valueOrDefault(parallelPoolInit, false);
   ```



##########
docs/configuration/index.md:
##########
@@ -1380,6 +1380,7 @@ Processing properties set on the Middle Manager are 
passed through to Peons.
 |`druid.processing.fifo`|Enables the processing queue to treat tasks of equal 
priority in a FIFO manner.|`true`|
 |`druid.processing.tmpDir`|Path where temporary files created while processing 
a query should be stored. If specified, this configuration takes priority over 
the default `java.io.tmpdir` path.|path represented by `java.io.tmpdir`|
 |`druid.processing.intermediaryData.storage.type`|Storage type for 
intermediary segments of data shuffle between native parallel index tasks. <br 
/>Set to `local` to store segment files in the local storage of the Middle 
Manager or Indexer. <br />Set to `deepstore` to use configured deep storage for 
better fault tolerance during rolling updates. When the storage type is 
`deepstore`, Druid stores the data in the `shuffle-data` directory under the 
configured deep storage path. Druid does not support automated cleanup for the 
`shuffle-data` directory. You can set up cloud storage lifecycle rules for 
automated cleanup of data at the `shuffle-data` prefix location.|`local`|
+|`druid.processing.parallelPoolInit`|(EXPERIMENTAL) Allows all 
merge/processing pools to be allocated in parallel on process launch. This 
significantly speeds up node launch times.|`false`|

Review Comment:
   (Maybe also include an estimate of what qualifies as a "big" number of 
"large" buffers so that operators know when to turn this on)
   ```suggestion
   |`druid.processing.parallelPoolInit`|(EXPERIMENTAL) Allows all 
merge/processing pools to be allocated in parallel on process launch. This may 
significantly speed up node launch times if allocating several large 
buffers.|`false`|
   ```



##########
website/.spelling:
##########
@@ -1883,6 +1883,7 @@ minTopNThreshold
 parallelMergeInitialYieldRows
 parallelMergeParallelism
 parallelMergeSmallBatchRows
+parallelPoolInit

Review Comment:
   Did the spellcheck fail? I was under the impression that back-quoted stuff 
was exempt from spelling checks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Speed up node boot times by parallelizing buffer acquisition (druid)

Reply via email to