samarthjain commented on a change in pull request #7088: Improve parallelism of zookeeper based segment change processing
URL: https://github.com/apache/incubator-druid/pull/7088#discussion_r277127360
########## File path: docs/content/configuration/index.md ##########
@@ -1254,9 +1254,9 @@ These Historical configurations can be defined in the `historical/runtime.proper
 |`druid.segmentCache.dropSegmentDelayMillis`|How long a process delays before completely dropping segment.|30000 (30 seconds)|
 |`druid.segmentCache.infoDir`|Historical processes keep track of the segments they are serving so that when the process is restarted they can reload the same segments without waiting for the Coordinator to reassign. This path defines where this metadata is kept. Directory will be created if needed.|${first_location}/info_dir|
 |`druid.segmentCache.announceIntervalMillis`|How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing.|5000 (5 seconds)|
-|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load concurrently from from deep storage.|10|
-|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of threads for executing callback actions associated with loading or dropping of segments.|2|
-|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads to use for monitoring deletion of zk nodes|1|
+|`druid.segmentCache.numLoadingThreads`|How many segments to drop or load concurrently from deep storage. Note that loading a segment involves downloading it from deep storage, decompressing it, and memory-mapping it, so the work is not purely I/O-bound. Depending on CPU and network load, this value can be increased.|Number of cores|
+|`druid.coordinator.loadqueuepeon.curator.numCallbackThreads`|Number of threads for executing callback actions associated with loading or dropping of segments. Consider increasing this number if the cluster is lagging behind with respect to balancing segments across Historical processes.|2|
+|`druid.coordinator.loadqueuepeon.curator.numMonitorThreads`|Number of threads to use for monitoring deletion of ZooKeeper nodes. Tasks in this pool are scheduled to run `druid.coordinator.load.timeout` after a segment is added to the queue. Increase this number if segments are not getting loaded or dropped even after `druid.coordinator.load.timeout` has elapsed, since it is possible they are not being reassigned to the queue of another Historical soon enough.|1|

Review comment:
To be honest, this is a fairly advanced setting, and the operator would need to know the nitty-gritty details of segment assignment and loading. The ZooKeeper node created for processing a segment load/drop should be deleted within `druid.coordinator.load.timeout`. If the node doesn't get deleted, it means the Historical failed to process the request. When such a timeout happens, the load queue peon needs to mark the change request as failed and invoke the `failAssign` method. Invoking `failAssign` effectively tells the Coordinator to assign the change request to another Historical. With several concurrent change requests in flight, it is possible that the tasks that invoke `failAssign` do not get run fast enough. Such a scenario, though, is very unlikely, which is why the default number of threads is 1.
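To make the "not purely I/O-bound" point in the table concrete, here is a minimal Java sketch of sizing a segment-loading pool by core count, the new default for `druid.segmentCache.numLoadingThreads`. This is an illustration only, not the actual Druid loading code; `loadSegment` and the segment ids are hypothetical stand-ins.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch: size the segment-loading pool by core count, matching the
// new default for druid.segmentCache.numLoadingThreads. Loading a segment
// downloads it from deep storage, decompresses it, and memory-maps it, so the
// threads do CPU work as well as I/O.
public class SegmentLoadingPoolSketch {
  public static void main(String[] args) {
    int numLoadingThreads = Runtime.getRuntime().availableProcessors();
    ExecutorService loadingPool = Executors.newFixedThreadPool(numLoadingThreads);

    for (String segmentId : List.of("segment-1", "segment-2")) { // placeholder ids
      loadingPool.submit(() -> loadSegment(segmentId));
    }
    loadingPool.shutdown();
  }

  private static void loadSegment(String segmentId) {
    // Hypothetical stand-in for the download + decompress + mmap work
    // done per segment.
  }
}
```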
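The timeout flow described in the review comment could be sketched as below. This is a simplification, not the actual `CuratorLoadQueuePeon` implementation: `failAssign` is a stand-in for the method the comment names, and only the Curator call (`checkExists().forPath(...)`) is real API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;

// Sketch of the monitor pool: after announcing a change request, schedule a
// check at druid.coordinator.load.timeout. The Historical deletes the ZK node
// once it has processed the request; if the node still exists at the deadline,
// the request is treated as failed so it can be reassigned.
class LoadQueueMonitorSketch {
  private final CuratorFramework curator;
  private final long loadTimeoutMillis; // druid.coordinator.load.timeout
  // Pool size corresponds to druid.coordinator.loadqueuepeon.curator.numMonitorThreads (default 1).
  private final ScheduledExecutorService monitorPool = Executors.newScheduledThreadPool(1);

  LoadQueueMonitorSketch(CuratorFramework curator, long loadTimeoutMillis) {
    this.curator = curator;
    this.loadTimeoutMillis = loadTimeoutMillis;
  }

  void announceChangeRequest(String zkNodePath) {
    monitorPool.schedule(() -> {
      try {
        if (curator.checkExists().forPath(zkNodePath) != null) {
          // Node was not deleted in time: the Historical failed to process it.
          failAssign(zkNodePath);
        }
      } catch (Exception e) {
        failAssign(zkNodePath);
      }
    }, loadTimeoutMillis, TimeUnit.MILLISECONDS);
  }

  private void failAssign(String zkNodePath) {
    // Stand-in: in Druid this marks the change request as failed so the
    // Coordinator can assign it to another Historical.
  }
}
```

If many change requests are in flight and these scheduled checks back up, segments can sit unprocessed past the timeout, which is the scenario where raising `numMonitorThreads` helps.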