clintropolis commented on issue #8578: parallel broker merges on fork join pool
URL: https://github.com/apache/incubator-druid/pull/8578#issuecomment-548141222
 
 
   ### Hyper-threading and default ForkJoinPool parallelism and per-query parallelism configuration values
   
   Another side note before I dig into the benchmark results: I changed the default values of `druid.processing.merge.pool.parallelism` and `druid.processing.merge.pool.defaultMaxQueryParallelism` from 1.5 and 1.0 times `Runtime.getRuntime().availableProcessors()` to 0.75 and 0.5 times respectively, on the assumption that there are 2 hyper-threads per physical core.
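   
   To make that concrete, here is a minimal sketch (not the actual Druid code; the rounding is my own assumption) of how the new defaults scale with the JVM-reported processor count:
   
   ```
   // Illustrative sketch only: deriving the new defaults from the JVM-reported
   // processor count, assuming 2 hyper-threads per physical core. Rounding is assumed.
   public class MergePoolDefaults
   {
     public static void main(String[] args)
     {
       // On an m5.4xl this reports 16 (the vCPU count), even though there are only 8 physical cores.
       int availableProcessors = Runtime.getRuntime().availableProcessors();
   
       // druid.processing.merge.pool.parallelism: 0.75 x available processors,
       // i.e. roughly 1.5 x physical cores if there are 2 hyper-threads per core.
       int poolParallelism = (int) Math.ceil(availableProcessors * 0.75);
   
       // druid.processing.merge.pool.defaultMaxQueryParallelism: 0.5 x available processors,
       // i.e. roughly 1.0 x physical cores under the same assumption.
       int defaultMaxQueryParallelism = Math.max(1, (int) (availableProcessors * 0.5));
   
       System.out.println("pool parallelism: " + poolParallelism);
       System.out.println("default max query parallelism: " + defaultMaxQueryParallelism);
     }
   }
   ```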
   
   This run of `ParallelMergeCombiningSequenceBenchmark` on an `m5.4xl`, which has 8 physical cores but reports 16 through `Runtime.getRuntime().availableProcessors()` since that is the 'vCPU' count, illustrates the reason for this.
   
   ```
   Benchmark                                     (concurrentSequenceConsumers)  (inputSequenceType)  (numSequences)  (rowsPerSequence)                          (strategy)  Mode  Cnt     Score    Error  Units
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000  combiningMergeSequence-same-thread  avgt   20  2975.944 ±  3.406  ms/op
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-1-10ms-256-1024  avgt   20  2903.598 ±  7.425  ms/op
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-4-10ms-256-1024  avgt   20   918.689 ±  7.172  ms/op
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-6-10ms-256-1024  avgt   20   549.228 ±  1.545  ms/op
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000         parallelism-8-10ms-256-1024  avgt   20   394.795 ±  1.110  ms/op
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000        parallelism-12-10ms-256-1024  avgt   20   707.393 ±  2.392  ms/op
   ParallelMergeCombiningSequenceBenchmark.exec                              1         non-blocking             128              75000        parallelism-16-10ms-256-1024  avgt   20  2069.897 ± 35.939  ms/op
   ```
   
   This is of course because hyper-threads are not _real_ cores, so when a workload fully utilizes the CPU, counting them as real cores results in lower overall throughput: parallelism 8, matching the physical core count, is the sweet spot at ~395 ms/op, while parallelism 16, matching the vCPU count, is over 5x slower at ~2070 ms/op. This is in line with the usual fine print on hyper-threading, I think.
   
   This assumption behind the default values might be wrong, but it held true for both my laptop and AWS, so it doesn't seem unreasonable. I updated the user-facing documentation for these settings to call out this assumption, so that if an operator's hardware differs, with more hyper-threads per core or none at all, the values can be adjusted accordingly.
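   
   For example, on a broker with 16 physical cores and no hyper-threading, an operator could keep the same effective sizing relative to physical cores by setting the properties explicitly in `runtime.properties` (hypothetical numbers, assuming the roughly 1.5x/1.0x physical-core targets implied by the new defaults):
   
   ```
   # hypothetical overrides for 16 physical cores with no hyper-threading
   druid.processing.merge.pool.parallelism=24
   druid.processing.merge.pool.defaultMaxQueryParallelism=16
   ```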
