gsmiller commented on issue #15905:
URL: https://github.com/apache/lucene/issues/15905#issuecomment-4184071194

   Hmm... actually option 3 might be worse than I thought. I put together a 
micro-benchmark (#15933) and keeping track of the parallel ordinal array adds 
quite a bit of overhead. Here's the raw output of the benchmark:
   ```
   Benchmark                                       (numDocIds)  (numLeaves)   Mode  Cnt     Score    Error   Units
   PartitionByLeafBenchmark.arraysSortOnly                 100            5  thrpt   15  1213.898 ± 23.702  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                 100           50  thrpt   15  1146.033 ± 30.084  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                 100          200  thrpt   15   849.219 ± 10.757  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                1000            5  thrpt   15    84.469 ±  0.637  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                1000           50  thrpt   15    65.685 ±  3.071  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly                1000          200  thrpt   15    88.932 ±  5.194  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly               10000            5  thrpt   15     5.031 ±  0.394  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly               10000           50  thrpt   15     5.562 ±  0.467  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly               10000          200  thrpt   15     5.691 ±  0.560  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly              100000            5  thrpt   15     0.265 ±  0.008  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly              100000           50  thrpt   15     0.265 ±  0.007  ops/ms
   PartitionByLeafBenchmark.arraysSortOnly              100000          200  thrpt   15     0.265 ±  0.007  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals          100            5  thrpt   15   911.904 ± 31.288  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals          100           50  thrpt   15   866.220 ± 22.404  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals          100          200  thrpt   15   690.300 ± 17.029  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals         1000            5  thrpt   15    63.449 ±  3.054  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals         1000           50  thrpt   15    64.223 ±  2.953  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals         1000          200  thrpt   15    69.654 ±  0.704  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals        10000            5  thrpt   15     3.443 ±  0.234  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals        10000           50  thrpt   15     3.471 ±  0.283  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals        10000          200  thrpt   15     3.200 ±  0.322  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals       100000            5  thrpt   15     0.174 ±  0.006  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals       100000           50  thrpt   15     0.174 ±  0.005  ops/ms
   PartitionByLeafBenchmark.introSortWithOrdinals       100000          200  thrpt   15     0.166 ±  0.011  ops/ms
   ```
   
   And a summary of the results (overhead is the relative drop in throughput
   versus plain `Arrays.sort`):

   | numDocIds | numLeaves | Arrays.sort (ops/ms) | IntroSorter+ordinals (ops/ms) | Overhead |
   |-----------|-----------|----------------------|-------------------------------|----------|
   | 100       | 5         | 1214                 | 912                           | ~25%     |
   | 100       | 50        | 1146                 | 866                           | ~24%     |
   | 100       | 200       | 849                  | 690                           | ~19%     |
   | 1,000     | 5         | 84                   | 63                            | ~25%     |
   | 1,000     | 50        | 66                   | 64                            | ~3%      |
   | 1,000     | 200       | 89                   | 70                            | ~22%     |
   | 10,000    | 5         | 5.0                  | 3.4                           | ~32%     |
   | 10,000    | 50        | 5.6                  | 3.5                           | ~38%     |
   | 10,000    | 200       | 5.7                  | 3.2                           | ~44%     |
   | 100,000   | 5         | 0.265                | 0.174                         | ~34%     |
   | 100,000   | 50        | 0.265                | 0.174                         | ~34%     |
   | 100,000   | 200       | 0.265                | 0.166                         | ~37%     |
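
For anyone skimming, here is roughly what the two variants boil down to. This is only a simplified stand-in using plain JDK code, not the actual benchmark from #15933: the class name `PartitionSketch`, both method names, and the hand-rolled quicksort are illustrative assumptions. The point it tries to show is that the ordinal-tracking variant has to mirror every swap into a second array.

```java
import java.util.Arrays;

// Hypothetical sketch; names and the quicksort are illustrative, not Lucene code.
final class PartitionSketch {

  /** Variant 1: sort the doc IDs only. Original positions are lost. */
  static void sortOnly(int[] docIds) {
    Arrays.sort(docIds);
  }

  /**
   * Variant 2: sort the doc IDs and mirror every swap into a parallel array,
   * so that ordinals[i] still records where docIds[i] originally came from.
   * The sort is not stable, so equal doc IDs may end up in either order.
   */
  static void sortWithOrdinals(int[] docIds, int[] ordinals) {
    quickSort(docIds, ordinals, 0, docIds.length - 1);
  }

  private static void quickSort(int[] docIds, int[] ordinals, int lo, int hi) {
    while (lo < hi) {
      int pivot = docIds[lo + (hi - lo) / 2];
      int i = lo;
      int j = hi;
      while (i <= j) {
        while (docIds[i] < pivot) i++;
        while (docIds[j] > pivot) j--;
        if (i <= j) {
          swap(docIds, i, j);
          swap(ordinals, i, j); // the extra work that plain sorting avoids
          i++;
          j--;
        }
      }
      // Recurse into the smaller half and loop on the larger one.
      if (j - lo < hi - i) {
        quickSort(docIds, ordinals, lo, j);
        lo = i;
      } else {
        quickSort(docIds, ordinals, i, hi);
        hi = j;
      }
    }
  }

  private static void swap(int[] a, int i, int j) {
    int tmp = a[i];
    a[i] = a[j];
    a[j] = tmp;
  }

  public static void main(String[] args) {
    int[] docIds = {42, 7, 19, 3};
    int[] ordinals = {0, 1, 2, 3};
    sortWithOrdinals(docIds, ordinals);
    System.out.println(Arrays.toString(docIds));   // [3, 7, 19, 42]
    System.out.println(Arrays.toString(ordinals)); // [3, 1, 2, 0]
  }
}
```

Lucene's `IntroSorter` is driven through a `swap(i, j)` callback, which is where a parallel array would naturally be kept in sync, and that indirection is not something `Arrays.sort` on a primitive `int[]` has to pay for.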
   
   That said... I wonder how much we'd care about this in practice? This 
operation would probably be done once per search, and I suspect the difference 
in performance here wouldn't really show up in a meaningful way relative to 
the other work being done. But maybe I'm wrong? I dunno. WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

