[PR] [WIP] Concurrent merging with join set algorithm [lucene]

via GitHub Sun, 21 Sep 2025 20:05:38 -0700


zhaih opened a new pull request, #15208:
URL: https://github.com/apache/lucene/pull/15208


   ### Description
   In #14331, @mayya-sharipova introduced a smart algorithm that computes a join 
set and uses it to accelerate graph merging. This PR tries to make that 
algorithm work with concurrent graph merging.
   
   ### Progress
   I made a first version that naively lets each thread handle one segment and 
then handles the rest of the nodes as before. The benchmark results are mixed 
and interesting: the changed code only merges faster in the 10M-doc case 
(force_merge of 782.60 s vs. 1022.34 s for baseline). That makes some sense: 
since each thread handles one segment, the merge cannot use all the threads 
when there are not enough segments.
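
To illustrate the limitation described above, here is a minimal sketch of a one-task-per-segment scheme on a fixed thread pool. It is not the code in this PR and does not use any Lucene API: `SegmentPerThreadSketch`, `mergeOneSegment`, and `runPhaseOne` are hypothetical names, and the "merge" is a stand-in that only records which worker ran it. The point is that the number of busy workers can never exceed `min(numSegments, numThreads)`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentPerThreadSketch {

  /** Hypothetical stand-in for merging one segment's graph; returns the worker's name. */
  static String mergeOneSegment(int segmentOrd) {
    return Thread.currentThread().getName();
  }

  /** Upper bound on workers that can be busy when each task is one whole segment. */
  static int maxBusyWorkers(int numSegments, int numThreads) {
    return Math.min(numSegments, numThreads);
  }

  /** Submits one task per segment and returns how many distinct workers ran a task. */
  static int runPhaseOne(int numSegments, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (int seg = 0; seg < numSegments; seg++) {
        final int ord = seg;
        Callable<String> task = () -> mergeOneSegment(ord);
        futures.add(pool.submit(task));
      }
      Set<String> workers = new HashSet<>();
      for (Future<String> f : futures) {
        workers.add(f.get()); // wait for each per-segment task to finish
      }
      return workers.size();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // With 2 segments and 8 threads, at most 2 workers ever have work to do.
    System.out.println("bound = " + maxBusyWorkers(2, 8));
    System.out.println("used  = " + runPhaseOne(2, 8));
  }
}
```

Under this scheme, extra threads only help once the remaining nodes are processed, which is one possible reason the gain shows up only at the largest index size.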
   
   ### Benchmark
   #### cand
   ```
   recall  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.784        0.254   0.249        0.980   1000000   100      50       64        250         no     28.48      35108.66           98.94             1          435.66       381.470      381.470       HNSW
    0.761        0.266   0.261        0.981   2000000   100      50       64        250         no     61.42      32562.15          231.87             1          875.69       762.939      762.939       HNSW
    0.738        0.281   0.276        0.982   5000000   100      50       64        250         no    182.75      27359.33          624.20             1         2213.94      1907.349     1907.349       HNSW
    0.716        0.310   0.295        0.952  10000000   100      50       64        250         no    394.47      25350.47          782.60             1         4456.40      3814.697     3814.697       HNSW
   ```
   
   #### baseline
   ```
   recall  latency(ms)  netCPU  avgCpuCount      nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
    0.817        0.284   0.270        0.951   1000000   100      50       64        250         no     28.31      35318.22           66.91             1          444.55       381.470      381.470       HNSW
    0.794        0.303   0.288        0.950   2000000   100      50       64        250         no     63.60      31447.53          158.45             1          894.32       762.939      762.939       HNSW
    0.774        0.306   0.301        0.984   5000000   100      50       64        250         no    180.95      27632.40          253.59             1         2274.45      1907.349     1907.349       HNSW
    0.755        0.334   0.329        0.985  10000000   100      50       64        250         no    433.43      23071.94         1022.34             1         4591.37      3814.697     3814.697       HNSW
   ```
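
For a quick comparison of merge time across the two runs, the small sketch below divides the cand `force_merge(s)` value by the baseline value for each index size. The numbers are copied from the tables above; the class and method names (`ForceMergeRatio`, `ratio`) are made up for illustration. A ratio below 1 means cand merged faster.

```java
public class ForceMergeRatio {

  // force_merge(s) values from the tables above, ordered by nDoc: 1M, 2M, 5M, 10M.
  static final String[] N_DOC = {"1M", "2M", "5M", "10M"};
  static final double[] CAND = {98.94, 231.87, 624.20, 782.60};
  static final double[] BASELINE = {66.91, 158.45, 253.59, 1022.34};

  /** cand / baseline force-merge time for row i; below 1.0 means cand is faster. */
  static double ratio(int i) {
    return CAND[i] / BASELINE[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < N_DOC.length; i++) {
      System.out.printf("%s: cand/baseline force_merge = %.2f%n", N_DOC[i], ratio(i));
    }
  }
}
```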


