mayya-sharipova edited a comment on pull request #728:
URL: https://github.com/apache/lucene/pull/728#issuecomment-1058148842


   I've benchmarked this change with ann-benchmarks on glove-100-angular (M: 16, efConstruction: 100).
   
   - candidate: this PR, with RAMBufferSizeMB explicitly set to **16 MB**, plus a force merge at the end.
   - baseline: main branch with RAMBufferSizeMB left unset, so it defaults to **16 MB**, plus a force merge at the end.
   
   **Results**
   - Indexing the candidate took 586 secs, around **10 mins**
   - Indexing the baseline took 1099 secs, around **18 mins**
   - Search performance is roughly the same (details below).
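
For reference, the speedup implied by the two indexing times can be worked out as follows. This is a minimal sketch that uses only the two wall-clock numbers reported above; the variable names are illustrative.

```python
# Wall-clock indexing times reported in the results above (seconds).
candidate_secs = 586
baseline_secs = 1099

# How many times faster the candidate is, and the relative time saved.
speedup = baseline_secs / candidate_secs
reduction_pct = 100 * (baseline_secs - candidate_secs) / baseline_secs

print(f"speedup: {speedup:.2f}x, indexing time reduced by {reduction_pct:.1f}%")
```

So the candidate indexes roughly 1.9 times faster on this dataset, which is a bit under half the baseline's indexing time saved.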
   
   
   <details>
    <summary>Details on the search performance </summary>
   
   |             | baseline recall | baseline QPS | candidate recall | candidate QPS |
   | ----------- | --------------: | -----------: | ---------------: | ------------: |
   | n_cands=10  |           0.486 |     3995.468 |            0.463 |      3636.417 |
   | n_cands=20  |           0.532 |     3261.435 |            0.529 |      3356.358 |
   | n_cands=40  |           0.608 |     2685.442 |            0.603 |      2494.603 |
   | n_cands=80  |           0.683 |     1874.002 |            0.682 |      1884.534 |
   | n_cands=120 |           0.723 |     1474.137 |            0.721 |      1445.883 |
   | n_cands=200 |           0.766 |     1048.531 |            0.766 |      1070.614 |
   | n_cands=400 |           0.819 |      554.110 |            0.819 |       639.026 |
   | n_cands=600 |           0.844 |      464.523 |            0.845 |       435.123 |
   | n_cands=800 |           0.861 |      355.228 |            0.862 |       329.773 |
   
   </details>
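
To put numbers on the search comparison, the per-row differences between candidate and baseline can be computed directly from the table. The sketch below hard-codes the table values; the `compare` helper and its output format are illustrative, not part of ann-benchmarks.

```python
# (n_cands, baseline recall, baseline QPS, candidate recall, candidate QPS),
# copied from the search performance table.
rows = [
    (10, 0.486, 3995.468, 0.463, 3636.417),
    (20, 0.532, 3261.435, 0.529, 3356.358),
    (40, 0.608, 2685.442, 0.603, 2494.603),
    (80, 0.683, 1874.002, 0.682, 1884.534),
    (120, 0.723, 1474.137, 0.721, 1445.883),
    (200, 0.766, 1048.531, 0.766, 1070.614),
    (400, 0.819, 554.110, 0.819, 639.026),
    (600, 0.844, 464.523, 0.845, 435.123),
    (800, 0.861, 355.228, 0.862, 329.773),
]


def compare(rows):
    """Return (n_cands, recall delta, QPS change in %) for candidate vs baseline."""
    out = []
    for n_cands, b_recall, b_qps, c_recall, c_qps in rows:
        recall_delta = c_recall - b_recall
        qps_change_pct = 100 * (c_qps - b_qps) / b_qps
        out.append((n_cands, recall_delta, qps_change_pct))
    return out


deltas = compare(rows)
for n_cands, recall_delta, qps_change_pct in deltas:
    print(f"n_cands={n_cands:<4} recall {recall_delta:+.3f}  QPS {qps_change_pct:+.1f}%")

# Rows with the largest absolute differences.
worst_recall = max(deltas, key=lambda d: abs(d[1]))
worst_qps = max(deltas, key=lambda d: abs(d[2]))
```

The largest recall gap is 0.023 at n_cands=10, and the QPS differences go in both directions rather than consistently favoring one side.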
   
   <details>
    <summary>Candidate indexing details </summary>
   
   Indexing output
   
    ```txt
   IW 0 [2022-03-03T14:30:49.413950Z; main]: init: create=true reader=null
      ramBufferSizeMB=16.0
       maxBufferedDocs=-1
   IW 0 [2022-03-03T14:30:49.424202Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   Done indexing 1183514 documents; now flush
   IW 0 [2022-03-03T14:30:50.824200Z; main]: now flush at close
   IW 0 [2022-03-03T14:30:50.824401Z; main]:   start flush: applyAllDeletes=true
   IW 0 [2022-03-03T14:30:50.824515Z; main]:   index before flush
   DW 0 [2022-03-03T14:30:50.824557Z; main]: startFullFlush
   DW 0 [2022-03-03T14:30:50.827209Z; main]: anyChanges? numDocsInRam=1183514 deletes=false hasTickets:false pendingChangesInFullFlush: false
   DWPT 0 [2022-03-03T14:30:50.831053Z; main]: flush postings as segment _0 numDocs=1183514
   HNSW 0 [2022-03-03T14:30:52.334343Z; main]: build graph from 1183514 vectors
   ...
   HNSW 0 [2022-03-03T14:40:31.049504Z; main]: built 1180000 in 5585/578724 ms
   ...
   IW 0 [2022-03-03T14:40:33.492318Z; main]: 582671 msec to write vectors
   IFD 0 [2022-03-03T14:40:34.655718Z; main]: 20 msec to checkpoint
   Indexed 1183514 documents in 585s
   Force merge index in luceneknn-100-16-100.train-16-100.index
   IFD 1 [2022-03-03T14:40:34.671943Z; main]: 0 msec to checkpoint
   Built index in 586.944657087326
   ```
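
As a sanity check on the log above, the time spent building the HNSW graph can be derived from the timestamps and compared with the total build time. This sketch reads the second number in `built 1180000 in 5585/578724 ms` as cumulative elapsed milliseconds, which is an assumption about the log format; the helper name `parse_ts` is illustrative.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an info-stream timestamp; the trailing 'Z' is dropped for fromisoformat."""
    return datetime.fromisoformat(ts.rstrip("Z"))


# Timestamps copied from the HNSW lines in the log above.
graph_start = parse_ts("2022-03-03T14:30:52.334343Z")  # "build graph from 1183514 vectors"
graph_last = parse_ts("2022-03-03T14:40:31.049504Z")   # "built 1180000 in 5585/578724 ms"

elapsed_ms = (graph_last - graph_start).total_seconds() * 1000
reported_ms = 578724            # assumed to be cumulative milliseconds
total_secs = 586.944657087326   # "Built index in ..."

# Fraction of the total build time spent on graph construction so far.
share = (reported_ms / 1000) / total_secs

print(f"elapsed from timestamps: {elapsed_ms:.0f} ms, reported: {reported_ms} ms")
print(f"graph construction share of total build time: {share:.1%}")
```

Under that reading, the timestamps agree with the reported figure to within a few milliseconds, and graph construction accounts for nearly all of the candidate's indexing time.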
   
   **Files in the index**
   
   ```txt
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0.fdm
    10080 -rw-r--r--  1 mayyasharipova  staff   4.6M  3 Mar 14:30 _0.fdt
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene90FieldsIndex-doc_ids_0.tmp
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene90FieldsIndexfile_pointers_1.tmp
   929304 -rw-r--r--  1 mayyasharipova  staff   451M  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec
   924624 -rw-r--r--  1 mayyasharipova  staff   451M  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec_temp_3.tmp
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vem
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vex
   953168 -rw-r--r--  1 mayyasharipova  staff   451M  3 Mar 14:30 _0_knn_buffered_vectors_temp_2.tmp
        0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 write.lock
   ```
   
   **Files after force merge**
   ```txt
         8 -rw-r--r--  1 mayyasharipova  staff   297B  3 Mar 14:40 _0.cfe
   1105112 -rw-r--r--  1 mayyasharipova  staff   538M  3 Mar 14:40 _0.cfs
         8 -rw-r--r--  1 mayyasharipova  staff   376B  3 Mar 14:40 _0.si
         8 -rw-r--r--  1 mayyasharipova  staff   154B  3 Mar 14:40 segments_1
         0 -rw-r--r--  1 mayyasharipova  staff     0B  3 Mar 14:30 write.lock
   ```
   
   
   </details>
   
   
   <details>
    <summary>Baseline indexing details</summary>
   
   Indexing output
   
    ```txt
   Built index in 1099.0846738815308
   ```
   
   **Files after force merge**
   
   ```txt
   drwxr-xr-x  12 mayyasharipova  staff   384B  3 Mar 15:14 .
   drwxr-xr-x  42 mayyasharipova  staff   1.3K  3 Mar 15:14 ..
   -rw-r--r--   1 mayyasharipova  staff   201B  3 Mar 15:03 _w.fdm
   -rw-r--r--   1 mayyasharipova  staff   4.6M  3 Mar 15:03 _w.fdt
   -rw-r--r--   1 mayyasharipova  staff   3.5K  3 Mar 15:03 _w.fdx
   -rw-r--r--   1 mayyasharipova  staff   192B  3 Mar 15:14 _w.fnm
   -rw-r--r--   1 mayyasharipova  staff   532B  3 Mar 15:14 _w.si
   -rw-r--r--   1 mayyasharipova  staff   451M  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vec
   -rw-r--r--   1 mayyasharipova  staff   309K  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vem
   -rw-r--r--   1 mayyasharipova  staff    82M  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vex
   -rw-r--r--   1 mayyasharipova  staff   154B  3 Mar 15:14 segments_2
   -rw-r--r--   1 mayyasharipova  staff     0B  3 Mar 14:56 write.lock
   ```
   
   </details>

