clintropolis commented on issue #8578: parallel broker merges on fork join pool
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-548141222

### Hyper-threading and default ForkJoinPool parallelism and per query parallelism configuration values

Another side note before I dig into benchmark results: I want to call out that I modified the default values for `druid.processing.merge.pool.parallelism` and `druid.processing.merge.pool.defaultMaxQueryParallelism` to be 0.75 and 0.5 times `Runtime.getRuntime().availableProcessors()` instead of 1.5 and 1.0 respectively, on the assumption that there are 2 hyper-threads per physical core. This run of `ParallelMergeCombiningSequenceBenchmark` on an `m5.4xl`, which has 8 physical cores but reports 16 through `Runtime.getRuntime().availableProcessors()` since that is the 'vCPU' count, illustrates the reason:

```
Benchmark                                     (concurrentSequenceConsumers)  (inputSequenceType)  (numSequences)  (rowsPerSequence)                          (strategy)  Mode  Cnt     Score    Error  Units
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000  combiningMergeSequence-same-thread  avgt   20  2975.944 ±  3.406  ms/op
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-1-10ms-256-1024  avgt   20  2903.598 ±  7.425  ms/op
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-4-10ms-256-1024  avgt   20   918.689 ±  7.172  ms/op
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-6-10ms-256-1024  avgt   20   549.228 ±  1.545  ms/op
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-8-10ms-256-1024  avgt   20   394.795 ±  1.110  ms/op
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000        parallelism-12-10ms-256-1024  avgt   20   707.393 ±  2.392  ms/op
ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000        parallelism-16-10ms-256-1024  avgt   20  2069.897 ± 35.939  ms/op
```

This is of course because hyper-threads are not _real_ cores, so when fully utilizing a CPU to do work, counting them as real cores results in slower overall throughput. This is in line with the fine print on hyper-threading, I think. The assumption behind the new defaults might be wrong, but it held true on both my laptop and AWS, so it maybe isn't unreasonable. I updated the user-facing documentation on these settings to describe this assumption, so that if an operator's hardware differs, having more hyper-threads per core or none at all, the values can be adjusted accordingly.
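To make the default computation concrete, here is a small sketch of how the scaled defaults described above could be derived. The class and method names are illustrative only, not Druid's actual code; the factors 0.75 and 0.5 and the 2-hyper-threads-per-core assumption come from the discussion above.

```java
// Illustrative sketch (not Druid's actual implementation) of deriving the
// merge pool defaults from the reported processor count. On hyper-threaded
// hardware, availableProcessors() returns logical cores (vCPUs), roughly
// 2x the physical core count, so the defaults scale *down* from it.
public class MergePoolDefaults
{
  // druid.processing.merge.pool.parallelism default: 0.75 x logical cores,
  // which lands near 1.5 x physical cores under the 2-threads-per-core assumption.
  public static int defaultParallelism(int availableProcessors)
  {
    return (int) Math.ceil(availableProcessors * 0.75);
  }

  // druid.processing.merge.pool.defaultMaxQueryParallelism default:
  // 0.5 x logical cores, i.e. roughly the physical core count.
  public static int defaultMaxQueryParallelism(int availableProcessors)
  {
    return Math.max(1, (int) (availableProcessors * 0.5));
  }

  public static void main(String[] args)
  {
    // An m5.4xl reports 16 vCPUs for its 8 physical cores.
    int vcpus = Runtime.getRuntime().availableProcessors();
    System.out.println("parallelism = " + defaultParallelism(vcpus));
    System.out.println("maxQueryParallelism = " + defaultMaxQueryParallelism(vcpus));
  }
}
```

For the m5.4xl case in the benchmark (16 reported processors), this yields a pool parallelism of 12 and a per-query cap of 8, matching the sweet spot at `parallelism-8` in the results above.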