gsmiller commented on issue #15934:
URL: https://github.com/apache/lucene/issues/15934#issuecomment-4199711524
I forgot to mention that the above two benchmarks were on a MacBook Pro (M3
chip). For fun, I ran benchmarks on two Linux boxes as well (x86 and ARM).
Capturing benchmark results from all three machines here (some of this is
duplicate from previous comments, but wanted to capture it in a single place).
Leading with an AI-produced summary of results and dropping the raw jmh output
at the bottom for reference.
**Approaches**
Both approaches use Arrays#sort for sorting. The difference is only in the
partition step (mapping sorted docIDs to index segments):
- **Linear**: scan sorted doc IDs, advance leaf pointer on boundary crossings
- **BinarySearch**: iterate leaf boundaries, binary search for each in
sorted doc IDs (with O(1) empty-leaf skip and early termination)
**General Observations**
These observations generally hold true across all hardware.
- At 10K+ doc IDs, the sort dominates — partition strategy makes no/small
measurable difference.
- At smaller doc counts with few leaves, binary search is faster.
- At small doc counts with many leaves (50+), linear scan wins.
**Benchmark Hardware**
| | MacBook Pro | x86 Linux | ARM Linux
|
|----------------|-------------------|------------------------|----------------------|
| CPU | Apple M3 Pro | Xeon Platinum 8175M | Neoverse-N1
(Graviton2) |
| L1d cache | 64 KB | 32 KB | 64 KB
|
| L2 cache | 4 MB | 1 MB | 1 MB
|
**MacBook Pro Results**
| numDocIds | numLeaves | Linear (ops/ms) | BinarySearch (ops/ms) |
Difference |
|-----------|-----------|------------------|-----------------------|------------|
| 100 | 5 | 1,195 | 1,444 | BS +21%
✅ |
| 100 | 10 | 1,069 | 1,210 | BS +13%
✅ |
| 100 | 20 | 805 | 1,141 | BS +42%
✅ |
| 100 | 50 | 1,199 | 885 | Linear
+35% ❌ |
| 100 | 200 | 742 | 549 | Linear
+35% ❌ |
| 1,000 | 5 | 80 | 101 | BS +26%
✅ |
| 1,000 | 10 | 84 | 101 | BS +21%
✅ |
| 1,000 | 20 | 101 | 99 | ~tie
|
| 1,000 | 50 | 98 | 97 | ~tie
|
| 1,000 | 200 | 91 | 80 | Linear
+14% ❌ |
| 10,000 | 5 | 5.6 | 6.0 | ~tie
|
| 10,000 | 10 | 5.8 | 6.1 | ~tie
|
| 10,000 | 20 | 5.2 | 6.4 | ~tie
|
| 10,000 | 50 | 5.4 | 5.4 | ~tie
|
| 10,000 | 200 | 5.6 | 5.6 | ~tie
|
| 100,000 | 5 | 0.252 | 0.262 | ~tie
|
| 100,000 | 10 | 0.261 | 0.247 | ~tie
|
| 100,000 | 20 | 0.258 | 0.262 | ~tie
|
| 100,000 | 50 | 0.255 | 0.261 | ~tie
|
| 100,000 | 200 | 0.257 | 0.257 | ~tie
|
**x86 Linux Results**
| numDocIds | numLeaves | Linear (ops/ms) | BS (ops/ms) | Difference |
|-----------|-----------|------------------|-------------|------------|
| 100 | 5 | 1,452 | 1,845 | BS +27% ✅ |
| 100 | 10 | 1,454 | 1,412 | ~tie |
| 100 | 20 | 1,186 | 1,131 | ~tie |
| 100 | 50 | 845 | 758 | Linear +12% ❌ |
| 100 | 200 | 493 | 375 | Linear +31% ❌ |
| 1,000 | 5 | 140 | 169 | BS +21% ✅ |
| 1,000 | 10 | 127 | 166 | BS +31% ✅ |
| 1,000 | 20 | 132 | 154 | BS +17% ✅ |
| 1,000 | 50 | 128 | 150 | BS +17% ✅ |
| 1,000 | 200 | 112 | 94 | Linear +20% ❌ |
| 10,000 | 5 | 10.6 | 11.8 | BS +11% ✅ |
| 10,000 | 10 | 11.0 | 12.0 | BS +9% ✅ |
| 10,000 | 20 | 11.1 | 11.7 | BS +6% ✅ |
| 10,000 | 50 | 11.4 | 11.8 | ~tie |
| 10,000 | 200 | 10.0 | 10.7 | BS +7% ✅ |
| 100,000 | 5 | 0.781 | 0.767 | ~tie |
| 100,000 | 10 | 0.775 | 0.824 | BS +6% ✅ |
| 100,000 | 20 | 0.780 | 0.833 | BS +7% ✅ |
| 100,000 | 50 | 0.791 | 0.845 | BS +7% ✅ |
| 100,000 | 200 | 0.778 | 0.810 | BS +4% ✅ |
**ARM Linux Results**
| numDocIds | numLeaves | Linear (ops/ms) | BS (ops/ms) | Difference |
|-----------|-----------|------------------|-------------|------------|
| 100 | 5 | 512 | 571 | BS +12% ✅ |
| 100 | 10 | 497 | 492 | ~tie |
| 100 | 20 | 448 | 415 | Linear +8% ❌ |
| 100 | 50 | 399 | 296 | Linear +35% ❌ |
| 100 | 200 | 266 | 208 | Linear +28% ❌ |
| 1,000 | 5 | 34 | 38 | BS +12% ✅ |
| 1,000 | 10 | 34 | 36 | BS +5% ✅ |
| 1,000 | 20 | 31 | 37 | BS +19% ✅ |
| 1,000 | 50 | 33 | 34 | ~tie |
| 1,000 | 200 | 32 | 27 | Linear +19% ❌ |
| 10,000 | | ~1.65 | ~1.69 | ~tie |
| 100,000 | | ~0.132 | ~0.131 | ~tie |
**Full Benchmark Output**
*MacBook Pro*
```
Benchmark (numDocIds)
(numLeaves) Mode Cnt Score Error Units
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
5 thrpt 15 1444.092 ± 79.023 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
10 thrpt 15 1209.812 ± 145.064 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
20 thrpt 15 1141.308 ± 78.308 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
50 thrpt 15 885.316 ± 63.670 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
200 thrpt 15 549.317 ± 31.922 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
5 thrpt 15 101.334 ± 6.847 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
10 thrpt 15 101.028 ± 2.527 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
20 thrpt 15 99.230 ± 7.228 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
50 thrpt 15 97.103 ± 7.316 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
200 thrpt 15 79.708 ± 2.740 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
5 thrpt 15 5.995 ± 0.387 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
10 thrpt 15 6.134 ± 0.801 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
20 thrpt 15 6.402 ± 0.561 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
50 thrpt 15 5.425 ± 0.267 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
200 thrpt 15 5.632 ± 0.548 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
5 thrpt 15 0.262 ± 0.005 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
10 thrpt 15 0.247 ± 0.008 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
20 thrpt 15 0.262 ± 0.008 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
50 thrpt 15 0.261 ± 0.005 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
200 thrpt 15 0.257 ± 0.008 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
5 thrpt 15 1194.994 ± 78.870 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
10 thrpt 15 1068.638 ± 52.030 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
20 thrpt 15 805.194 ± 42.816 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
50 thrpt 15 1198.694 ± 104.503 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
200 thrpt 15 741.903 ± 40.591 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
5 thrpt 15 80.241 ± 3.126 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
10 thrpt 15 83.524 ± 5.633 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
20 thrpt 15 101.379 ± 1.832 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
50 thrpt 15 98.004 ± 4.570 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
200 thrpt 15 91.082 ± 3.440 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
5 thrpt 15 5.556 ± 0.474 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
10 thrpt 15 5.763 ± 0.511 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
20 thrpt 15 5.240 ± 0.415 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
50 thrpt 15 5.410 ± 0.390 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
200 thrpt 15 5.632 ± 0.530 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
5 thrpt 15 0.252 ± 0.005 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
10 thrpt 15 0.261 ± 0.004 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
20 thrpt 15 0.258 ± 0.002 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
50 thrpt 15 0.255 ± 0.006 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
200 thrpt 15 0.257 ± 0.010 ops/ms
```
*x86 Linux*
```
Benchmark (numDocIds)
(numLeaves) Mode Cnt Score Error Units
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
5 thrpt 15 1844.918 ± 176.890 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
10 thrpt 15 1412.154 ± 324.330 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
20 thrpt 15 1131.319 ± 77.487 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
50 thrpt 15 757.905 ± 96.660 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
200 thrpt 15 375.193 ± 2.563 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
5 thrpt 15 169.033 ± 6.231 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
10 thrpt 15 166.043 ± 4.689 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
20 thrpt 15 153.623 ± 21.208 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
50 thrpt 15 149.702 ± 6.099 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
200 thrpt 15 93.763 ± 1.001 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
5 thrpt 15 11.756 ± 0.062 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
10 thrpt 15 12.018 ± 0.295 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
20 thrpt 15 11.703 ± 0.246 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
50 thrpt 15 11.843 ± 0.394 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
200 thrpt 15 10.685 ± 0.180 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
5 thrpt 15 0.767 ± 0.101 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
10 thrpt 15 0.824 ± 0.020 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
20 thrpt 15 0.833 ± 0.018 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
50 thrpt 15 0.845 ± 0.029 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
200 thrpt 15 0.810 ± 0.085 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
5 thrpt 15 1451.994 ± 197.622 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
10 thrpt 15 1453.638 ± 227.219 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
20 thrpt 15 1185.798 ± 223.198 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
50 thrpt 15 845.484 ± 29.405 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
200 thrpt 15 492.919 ± 15.587 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
5 thrpt 15 140.073 ± 5.135 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
10 thrpt 15 127.405 ± 5.352 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
20 thrpt 15 132.117 ± 9.990 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
50 thrpt 15 128.341 ± 3.279 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
200 thrpt 15 112.499 ± 0.516 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
5 thrpt 15 10.605 ± 0.483 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
10 thrpt 15 10.997 ± 0.491 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
20 thrpt 15 11.051 ± 0.469 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
50 thrpt 15 11.441 ± 0.592 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
200 thrpt 15 9.976 ± 1.625 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
5 thrpt 15 0.781 ± 0.020 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
10 thrpt 15 0.775 ± 0.022 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
20 thrpt 15 0.780 ± 0.022 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
50 thrpt 15 0.791 ± 0.025 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
200 thrpt 15 0.778 ± 0.023 ops/ms
```
*ARM Linux*
```
Benchmark (numDocIds)
(numLeaves) Mode Cnt Score Error Units
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
5 thrpt 15 570.680 ± 106.464 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
10 thrpt 15 491.883 ± 58.971 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
20 thrpt 15 414.814 ± 28.513 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
50 thrpt 15 296.198 ± 10.976 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100
200 thrpt 15 207.925 ± 17.364 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
5 thrpt 15 38.223 ± 4.308 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
10 thrpt 15 35.501 ± 2.138 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
20 thrpt 15 37.333 ± 4.049 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
50 thrpt 15 33.573 ± 0.910 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 1000
200 thrpt 15 26.707 ± 0.242 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
5 thrpt 15 1.693 ± 0.025 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
10 thrpt 15 1.701 ± 0.017 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
20 thrpt 15 1.687 ± 0.016 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
50 thrpt 15 1.683 ± 0.027 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 10000
200 thrpt 15 1.673 ± 0.027 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
5 thrpt 15 0.131 ± 0.002 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
10 thrpt 15 0.131 ± 0.002 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
20 thrpt 15 0.133 ± 0.002 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
50 thrpt 15 0.132 ± 0.002 ops/ms
PartitionByLeafBenchmark.arraysSortBinarySearchPartition 100000
200 thrpt 15 0.130 ± 0.003 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
5 thrpt 15 511.695 ± 70.070 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
10 thrpt 15 496.877 ± 29.344 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
20 thrpt 15 447.930 ± 96.207 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
50 thrpt 15 398.911 ± 8.159 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100
200 thrpt 15 266.050 ± 23.467 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
5 thrpt 15 33.924 ± 3.368 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
10 thrpt 15 33.691 ± 1.337 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
20 thrpt 15 31.348 ± 1.457 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
50 thrpt 15 32.600 ± 0.374 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 1000
200 thrpt 15 31.894 ± 2.135 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
5 thrpt 15 1.637 ± 0.014 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
10 thrpt 15 1.667 ± 0.019 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
20 thrpt 15 1.670 ± 0.023 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
50 thrpt 15 1.657 ± 0.020 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 10000
200 thrpt 15 1.641 ± 0.007 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
5 thrpt 15 0.133 ± 0.001 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
10 thrpt 15 0.130 ± 0.001 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
20 thrpt 15 0.133 ± 0.002 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
50 thrpt 15 0.131 ± 0.001 ops/ms
PartitionByLeafBenchmark.arraysSortOnly 100000
200 thrpt 15 0.132 ± 0.002 ops/ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]