samarthjain commented on a change in pull request #7088: Improve parallelism of zookeeper based segment change processing
URL: https://github.com/apache/incubator-druid/pull/7088#discussion_r277127360
 
 

 ##########
 File path: docs/content/configuration/index.md
 ##########
 @@ -1254,9 +1254,9 @@ These Historical configurations can be defined in the `historical/runtime.proper
 |`druid.segmentCache.dropSegmentDelayMillis`|How long a process delays before completely dropping segment.|30000 (30 seconds)|
 |`druid.segmentCache.infoDir`|Historical processes keep track of the segments they are serving so that when the process is restarted they can reload the same segments without waiting for the Coordinator to reassign. This path defines where this metadata is kept. Directory will be created if needed.|${first_location}/info_dir|
 |`druid.segmentCache.announceIntervalMillis`|How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing.|5000 (5 seconds)|
-|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load concurrently from from deep storage.|10|
-|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of threads for executing callback actions associated with loading or dropping of segments.|2|
-|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads to use for monitoring deletion of zk nodes|1|
+|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load concurrently from deep storage. Note that loading a segment involves downloading it from deep storage, decompressing it, and memory-mapping it, so the work is not purely I/O bound. Depending on CPU and network load, this value can be increased.|Number of cores|
+|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of threads for executing callback actions associated with loading or dropping of segments. Consider increasing this number if the cluster is lagging behind in balancing segments across Historical nodes.|2|
+|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads used to monitor deletion of ZooKeeper nodes. Tasks in this pool are scheduled to run `druid.coordinator.load.timeout` after a segment is added to the queue. Increase this number if segments are not being loaded or dropped even after `druid.coordinator.load.timeout`, since they may not be getting re-assigned to other Historicals' queues quickly enough.|1|
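
As a hedged illustration of the settings in the table above, a `runtime.properties` snippet might look like the following; the values are placeholders for illustration, not recommendations, and the defaults after this change are number of cores, 2, and 1 respectively:

```properties
# Illustrative values only -- tune based on CPU, network, and cluster size.
druid.segmentCache.numLoadingThreads=16
druid.coordinator.loadqueuepeon.curator.numCallbackThreads=4
druid.coordinator.loadqueuepeon.curator.numMonitorThreads=2
```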
 
 Review comment:
   To be honest, this is a fairly advanced setting, and the operator would need to know the nitty-gritty details of segment assignment and loading.
   
   The ZooKeeper node created for processing a segment load/drop should be deleted within `druid.coordinator.load.timeout`. If the node doesn't get deleted, it means the Historical failed to process the request. When such a timeout happens, the queue peon needs to mark the change request as failed and invoke the `failAssign` method. Invoking `failAssign` effectively tells the Coordinator to assign the change request to another Historical. With several concurrent change requests in flight, it is possible that the tasks that invoke `failAssign` do not get run fast enough. Such a scenario is very unlikely, though, which is why the default number of threads is 1.
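   
   To make the mechanism above concrete, here is a simplified, illustrative Java sketch (not Druid's actual load queue peon code; the class and hook names are made up): after a change request node is written to ZooKeeper, a task is scheduled on the monitor pool to fire `druid.coordinator.load.timeout` later, and if the node is still present the request is failed so the Coordinator can reassign it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Illustrative sketch only -- not Druid's actual implementation.
// Models the behavior described above: after a change request node is written to
// ZooKeeper, a monitor task fires druid.coordinator.load.timeout later; if the node
// is still there, the request is failed so the Coordinator can reassign it.
public class LoadRequestMonitorSketch
{
  private final ScheduledExecutorService monitorExec;
  private final long loadTimeoutMillis;

  public LoadRequestMonitorSketch(int numMonitorThreads, long loadTimeoutMillis)
  {
    // Pool sized by druid.coordinator.loadqueuepeon.curator.numMonitorThreads (default 1).
    this.monitorExec = Executors.newScheduledThreadPool(numMonitorThreads);
    this.loadTimeoutMillis = loadTimeoutMillis;
  }

  // zkNodeExists and failAssign are hypothetical hooks standing in for the real
  // Curator existence check and the failAssign call mentioned above.
  public void monitor(String segmentId, Predicate<String> zkNodeExists, Consumer<String> failAssign)
  {
    monitorExec.schedule(
        () -> {
          if (zkNodeExists.test(segmentId)) {
            // The Historical did not delete the node within the timeout: mark the
            // request failed so it can be assigned to another Historical.
            failAssign.accept(segmentId);
          }
        },
        loadTimeoutMillis,
        TimeUnit.MILLISECONDS
    );
  }
}
```

   With many in-flight requests, a single monitor thread could fall behind on these timeout checks, which is the (unlikely) scenario where raising `numMonitorThreads` above its default of 1 would help.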

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
