neoremind commented on PR #16145:
URL: https://github.com/apache/lucene/pull/16145#issuecomment-4594402925

   I spent some time putting together a [JMH benchmark](https://github.com/apache/lucene/compare/main...neoremind:lucene:verify_16145?expand=1) to simulate the scenario and vet this change.
   
   The setup is an 8G file on an EC2 m5.2xlarge (8 vCPU, 16G memory, io2 EBS with ~338us 4K random-read latency, verified with `fio` using direct I/O to bypass the page cache), running openjdk "25.0.2" 2026-01-20.
   
   The idea: each read (4k/8k/16k) randomly picks between a small hot region (16MB, which stays cached) and a random offset across the full 8G file. I assume this roughly mimics the HNSW scenario, where some nodes are warm while others are deep in the graph and cold. The kernel page cache is cleared before each iteration. Each iteration runs for 6s, which is not enough time to fully warm the page cache.
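
The hot/cold choice described above could be sketched as follows. This is a minimal illustration, not the code from the linked benchmark: the class `HotColdOffsetPicker`, its parameters, the placement of the hot region at the start of the file, and the alignment of offsets to the read length are all assumptions.

```java
/** Hypothetical model of the hot/cold offset choice; not the benchmark's actual code. */
final class HotColdOffsetPicker {
  private final long fileSizeBytes;
  private final long hotRegionBytes;
  private final int coldReadPct;
  private final java.util.SplittableRandom random;

  HotColdOffsetPicker(long fileSizeBytes, long hotRegionBytes, int coldReadPct, long seed) {
    this.fileSizeBytes = fileSizeBytes;
    this.hotRegionBytes = hotRegionBytes;
    this.coldReadPct = coldReadPct;
    this.random = new java.util.SplittableRandom(seed);
  }

  /** Picks the start offset of the next read of readLength bytes. */
  long nextOffset(int readLength) {
    // With probability coldReadPct/100, read anywhere in the file (most likely cold);
    // otherwise stay inside the hot region, assumed here to sit at the start of the file.
    boolean cold = random.nextInt(100) < coldReadPct;
    long regionBytes = cold ? fileSizeBytes : hotRegionBytes;
    // Assumption: offsets are aligned to the read length, so a read never runs past the region.
    long slots = regionBytes / readLength;
    return random.nextLong(slots) * readLength;
  }

  public static void main(String[] args) {
    // 8G file, 16MB hot region, 10% cold reads, 4k reads.
    HotColdOffsetPicker picker = new HotColdOffsetPicker(8L << 30, 16L << 20, 10, 42L);
    for (int i = 0; i < 5; i++) {
      System.out.println(picker.nextOffset(4096));
    }
  }
}
```

Because the cold branch draws from the whole file, a small fraction of "cold" reads (16MB out of 8G) still lands in the hot region, which is negligible for the numbers below.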
   
   ### Benchmark results (focus on 4k read)
   
   **With this PR, compare all hint modes:**
   
   | coldReadPct | noPrefetch | Normal | Sequential | **Random** |
   |---|---|---|---|---|
   | 10% | 81 us | 82 us | 86 us | **40 us** |
   | 50% | 400 us | 334 us | 328 us | **188 us** |
   | 90% | 734 us | 349 us | 345 us | **348 us** |
   
   **With PR vs. without PR comparison:**
   
   | coldReadPct | Without PR | With PR | Speed up |
   |---|---|---|---|
   | 10% | 66 us | 40 us | **1.7x faster** |
   | 50% | 270 us | 188 us | **1.4x faster** |
   | 90% | 361 us | 348 us | almost the same |
   
   **Fully warm page cache (no page cache clearing before each iteration), with PR applied:**
   
   | coldReadPct | noPrefetch / Normal / Sequential | Random |
   |---|---|---|
   | 10% | ~0.30 us | 1.37 us |
   | 50% | ~0.47 us | 1.54 us |
   | 90% | ~0.61 us | 1.64 us |
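
As a quick check of the ~1.1us overhead figure, the difference between `prefetchRandomThenRead` and `noPrefetchReadRandom` can be computed from the 4096-byte rows of the fully warm JMH run in the details section. The class and method names below are hypothetical; only the scores come from the benchmark output.

```java
/** Hypothetical helper: derives the warm-cache overhead of the RANDOM prefetch path. */
final class WarmOverhead {
  /** Extra cost per read of prefetch-then-read over a plain read, in microseconds. */
  static double overheadUs(double prefetchRandomUs, double noPrefetchUs) {
    return prefetchRandomUs - noPrefetchUs;
  }

  public static void main(String[] args) {
    // {coldReadPct, noPrefetchReadRandom us/op, prefetchRandomThenRead us/op}, 4096-byte reads.
    double[][] rows = {
      {10, 0.286, 1.366},
      {50, 0.472, 1.541},
      {90, 0.600, 1.641},
    };
    for (double[] row : rows) {
      System.out.printf("%.0f%%: +%.3f us/op%n", row[0], overheadUs(row[2], row[1]));
    }
  }
}
```

The three differences come out between roughly 1.04us and 1.08us, so the overhead looks close to constant regardless of the hot/cold mix when everything is cached.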
   
   ### Key findings
   
   1. The change does help the random access pattern. At 10%/50% cold reads, it is ~1.7x/~1.4x faster than without the PR (40us/188us vs. 66us/270us). The reason is that, without the PR, the backoff logic with its shared counter skips `madvise` when warm hits are frequent over some period, making the prefetch a no-op. It is worth noting that even without this PR, the `RANDOM` hint already helps compared to noPrefetch and `NORMAL`, because the existing backoff does let some `madvise` calls through by chance. This PR makes `RANDOM` access faster by prefetching consistently.
   
   2. At 90% cold there is no improvement, because cold reads keep resetting the shared counter to 0, so the backoff never kicks in. The problem only arises when warm hits push the shared counter away from 0 before a cold read arrives.
   
   3. In the fully warm page cache case there is indeed overhead, about 1.1us per read call. This aligns with @mikemccand's point that `isLoaded()` is somewhat costly. But this is the tradeoff: as long as there are any cold pages, the savings from prefetching outweigh this 1.1us overhead, and the net win grows with more cold reads. As @mikemccand and @jimczi point out, if we consider removing `isLoaded()`, we also have to verify the overhead of the `isLoaded()` probe vs. always calling `madvise`.
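
The backoff behaviour described in findings 1 and 2 can be modelled roughly as below. This is a simplified sketch and not the Lucene code: the class `PrefetchBackoffModel`, its fields, and the power-of-two probing schedule are assumptions for illustration, and the `pageLoaded` argument stands in for the result of the `isLoaded()` probe.

```java
/** Hypothetical model of a counter-based prefetch backoff; not the Lucene implementation. */
final class PrefetchBackoffModel {
  private int consecutiveWarmHits; // stands in for the shared counter
  int madviseCalls;                // how many prefetches actually reached madvise

  /** Returns true if this prefetch request would issue madvise. */
  boolean prefetch(boolean pageLoaded) {
    int n = consecutiveWarmHits++;
    // Assumed schedule: only probe when the counter is zero or a power of two.
    if ((n & (n - 1)) != 0) {
      return false; // backed off: neither probe nor madvise, even if the page is cold
    }
    if (pageLoaded) {
      return false; // warm hit: the counter keeps growing
    }
    consecutiveWarmHits = 0; // cold page: reset the backoff
    madviseCalls++;
    return true;
  }

  public static void main(String[] args) {
    PrefetchBackoffModel model = new PrefetchBackoffModel();
    // Three warm hits move the counter to 3, so the next cold read is skipped.
    model.prefetch(true);
    model.prefetch(true);
    model.prefetch(true);
    System.out.println("cold read prefetched: " + model.prefetch(false));
  }
}
```

In this model a run of warm hits makes a following cold read miss its prefetch, while a stream of cold reads resets the counter every time and never backs off, which matches the 10%/50% vs. 90% pattern in the results.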
   
   Note that this is a microbenchmark. I think it would be more sound to vet this with a real-world workload such as an HNSW scenario, but the direction is positive.
   
   JMH result details
   <details>
   <summary>candidate benchmark (clear page cache - mimic cold page access)</summary>
   
   ```
   Benchmark                                     (coldReadPct)                   (dataDir)              (fileName)  (fileSizeGB)  (hotRegionMB)  (prefetchLength)  Mode  Cnt    Score      Error  Units
   PrefetchBenchmark.noPrefetchReadRandom                   10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3   80.835 ±   70.514  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3   80.708 ±  119.700  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3   82.326 ±   80.527  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  399.633 ±  419.646  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  408.596 ±  326.960  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  405.251 ±  474.788  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  734.384 ± 1010.659  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  734.215 ±  932.702  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  739.150 ±  701.159  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3   82.132 ±  121.437  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3   85.855 ±  107.871  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3   85.510 ±   73.271  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  334.346 ±  131.156  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  347.240 ±  478.895  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  373.239 ±  842.645  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  348.674 ±  311.629  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  382.573 ±  104.376  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  423.491 ±  202.213  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3   39.944 ±   15.283  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3   43.664 ±   20.228  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3   50.725 ±    7.693  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  188.247 ±   55.046  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  207.542 ±  143.460  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  236.605 ±  120.293  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  348.257 ±  206.218  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  368.074 ±  325.322  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  431.100 ±  241.599  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3   86.163 ±   74.133  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3   84.713 ±   85.901  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3   84.914 ±  105.107  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  327.904 ±  134.254  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  338.804 ±  127.809  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  354.642 ±   39.018  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  345.348 ±  148.671  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  370.560 ±  364.124  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  429.385 ±  311.389  us/op
   ```
   </details>
   
   <details>
   <summary>candidate benchmark (all warm pages in page cache)</summary>
   
   Ran `cat file > /dev/null` before the benchmark, with page cache clearing disabled.
   
   ```
   Benchmark                                     (coldReadPct)                   (dataDir)              (fileName)  (fileSizeGB)  (hotRegionMB)  (prefetchLength)  Mode  Cnt  Score   Error  Units
   PrefetchBenchmark.noPrefetchReadRandom                   10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.286 ± 0.098  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.463 ± 0.040  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  0.832 ± 0.055  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.472 ± 0.076  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.733 ± 0.119  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.286 ± 0.049  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.600 ± 0.106  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.948 ± 0.100  us/op
   PrefetchBenchmark.noPrefetchReadRandom                   90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.654 ± 0.221  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.296 ± 0.028  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.468 ± 0.007  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  0.851 ± 0.043  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.471 ± 0.097  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.758 ± 0.061  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.325 ± 0.219  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.612 ± 0.063  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.951 ± 0.014  us/op
   PrefetchBenchmark.prefetchNormalThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.643 ± 0.033  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  1.366 ± 0.512  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  1.560 ± 0.299  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.967 ± 0.644  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  1.541 ± 0.040  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  1.815 ± 0.177  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  2.424 ± 0.078  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  1.641 ± 0.056  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  1.986 ± 0.107  us/op
   PrefetchBenchmark.prefetchRandomThenRead                 90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  2.699 ± 0.150  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.295 ± 0.015  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.490 ± 0.097  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  0.839 ± 0.038  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.484 ± 0.077  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.746 ± 0.021  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.286 ± 0.043  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  0.608 ± 0.081  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  0.945 ± 0.005  us/op
   PrefetchBenchmark.prefetchSequentialThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  1.654 ± 0.252  us/op
   ```
   </details>
   
   <details>
   <summary>baseline benchmark</summary>
   
   Without this PR's changes. Only `prefetchRandomThenRead` is tested, for an apples-to-apples comparison.
   ```
   Benchmark                                 (coldReadPct)                   (dataDir)              (fileName)  (fileSizeGB)  (hotRegionMB)  (prefetchLength)  Mode  Cnt    Score      Error  Units
   PrefetchBenchmark.prefetchRandomThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3   66.179 ±   38.100  us/op
   PrefetchBenchmark.prefetchRandomThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3   98.342 ±   38.340  us/op
   PrefetchBenchmark.prefetchRandomThenRead             10  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  164.219 ±   14.844  us/op
   PrefetchBenchmark.prefetchRandomThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  270.401 ±  138.404  us/op
   PrefetchBenchmark.prefetchRandomThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  368.736 ±  564.566  us/op
   PrefetchBenchmark.prefetchRandomThenRead             50  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  643.159 ± 1996.410  us/op
   PrefetchBenchmark.prefetchRandomThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              4096  avgt    3  360.715 ±  227.521  us/op
   PrefetchBenchmark.prefetchRandomThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16              8192  avgt    3  394.207 ±  381.675  us/op
   PrefetchBenchmark.prefetchRandomThenRead             90  /home/ec2-user/environment  prefetch_bench_data_8G             8             16             16384  avgt    3  446.483 ±  270.331  us/op
   ```
   </details>
   
   <details>
   <summary>fio test</summary>
   
   Average: 338 µs per 4K random read
   P50: 310 µs
   P90: 383 µs
   P99: 840 µs
   ```
   $ sudo fio --name=randread --ioengine=libaio --direct=1 --bs=4k \
   >   --iodepth=1 --rw=randread --size=1G --runtime=10 --time_based \
   >   --filename=/home/ec2-user/environment/prefetch_bench_data_8G
   randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
   fio-3.32
   Starting 1 process
   Jobs: 1 (f=1): [r(1)][100.0%][r=11.4MiB/s][r=2930 IOPS][eta 00m:00s]
   randread: (groupid=0, jobs=1): err= 0: pid=65728: Mon Jun  1 10:04:31 2026
     read: IOPS=2954, BW=11.5MiB/s (12.1MB/s)(115MiB/10001msec)
       slat (nsec): min=2224, max=25262, avg=2487.67, stdev=450.97
       clat (usec): min=215, max=5466, avg=335.40, stdev=133.15
        lat (usec): min=217, max=5469, avg=337.88, stdev=133.15
       clat percentiles (usec):
        |  1.00th=[  265],  5.00th=[  277], 10.00th=[  281], 20.00th=[  293],
        | 30.00th=[  297], 40.00th=[  306], 50.00th=[  310], 60.00th=[  318],
        | 70.00th=[  326], 80.00th=[  343], 90.00th=[  383], 95.00th=[  457],
        | 99.00th=[  840], 99.50th=[ 1074], 99.90th=[ 2114], 99.95th=[ 2474],
        | 99.99th=[ 4752]
      bw (  KiB/s): min=11248, max=12104, per=100.00%, avg=11829.89, stdev=237.18, samples=19
      iops        : min= 2812, max= 3026, avg=2957.47, stdev=59.30, samples=19
     lat (usec)   : 250=0.09%, 500=96.27%, 750=2.25%, 1000=0.76%
     lat (msec)   : 2=0.52%, 4=0.09%, 10=0.02%
     cpu          : usr=0.45%, sys=1.48%, ctx=29548, majf=0, minf=12
     IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
        submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
        complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
        issued rwts: total=29548,0,0,0 short=0,0,0,0 dropped=0,0,0,0
        latency   : target=0, window=0, percentile=100.00%, depth=1
   
   Run status group 0 (all jobs):
      READ: bw=11.5MiB/s (12.1MB/s), 11.5MiB/s-11.5MiB/s (12.1MB/s-12.1MB/s), io=115MiB (121MB), run=10001-10001msec
   
   Disk stats (read/write):
     nvme0n1: ios=29257/106, merge=0/37, ticks=9714/54, in_queue=9769, util=96.35%
   ```
   </details>
   

