mayya-sharipova edited a comment on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1058148842
I've benchmarked the results with ann-benchmarks on glove-100-angular (M:16, efConstruction:100)

- baseline: main branch where we unset RAMBufferSizeMB, which defaults to **16 MB**, with force merge at the end of indexing.
- candidate: this PR, where RAMBufferSizeMB is set to the default **16 MB**, with force merge at the end of indexing.

**Results**

- Indexing baseline took 1099 secs, around **18 mins**
- Indexing candidate took 586 secs, around **10 mins**
- Search performance is the same.

<details>
<summary>Details on the search performance</summary>

|             | baseline recall | baseline QPS | candidate recall | candidate QPS |
| ----------- | --------------: | -----------: | ---------------: | ------------: |
| n_cands=10  |           0.486 |     3995.468 |            0.463 |      3636.417 |
| n_cands=20  |           0.532 |     3261.435 |            0.529 |      3356.358 |
| n_cands=40  |           0.608 |     2685.442 |            0.603 |      2494.603 |
| n_cands=80  |           0.683 |     1874.002 |            0.682 |      1884.534 |
| n_cands=120 |           0.723 |     1474.137 |            0.721 |      1445.883 |
| n_cands=200 |           0.766 |     1048.531 |            0.766 |      1070.614 |
| n_cands=400 |           0.819 |      554.110 |            0.819 |       639.026 |
| n_cands=600 |           0.844 |      464.523 |            0.845 |       435.123 |
| n_cands=800 |           0.861 |      355.228 |            0.862 |       329.773 |

</details>

<details>
<summary>Candidate indexing details</summary>

Indexing output

```txt
IW 0 [2022-03-03T14:30:49.413950Z; main]: init: create=true reader=null ramBufferSizeMB=16.0 maxBufferedDocs=-1
IW 0 [2022-03-03T14:30:49.424202Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
Done indexing 1183514 documents; now flush
IW 0 [2022-03-03T14:30:50.824200Z; main]: now flush at close
IW 0 [2022-03-03T14:30:50.824401Z; main]: start flush: applyAllDeletes=true
IW 0 [2022-03-03T14:30:50.824515Z; main]: index before flush
DW 0 [2022-03-03T14:30:50.824557Z; main]: startFullFlush
DW 0 [2022-03-03T14:30:50.827209Z; main]: anyChanges? numDocsInRam=1183514 deletes=false hasTickets:false pendingChangesInFullFlush: false
DWPT 0 [2022-03-03T14:30:50.831053Z; main]: flush postings as segment _0 numDocs=1183514
HNSW 0 [2022-03-03T14:30:52.334343Z; main]: build graph from 1183514 vectors
...
HNSW 0 [2022-03-03T14:40:31.049504Z; main]: built 1180000 in 5585/578724 ms
...
IW 0 [2022-03-03T14:40:33.492318Z; main]: 582671 msec to write vectors
IFD 0 [2022-03-03T14:40:34.655718Z; main]: 20 msec to checkpoint
Indexed 1183514 documents in 585s
Force merge index in luceneknn-100-16-100.train-16-100.index
IFD 1 [2022-03-03T14:40:34.671943Z; main]: 0 msec to checkpoint
Built index in 586.944657087326
```

**Files in the index**

```txt
     0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 _0.fdm
 10080 -rw-r--r-- 1 mayyasharipova staff 4.6M 3 Mar 14:30 _0.fdt
     0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 _0_Lucene90FieldsIndex-doc_ids_0.tmp
     0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 _0_Lucene90FieldsIndexfile_pointers_1.tmp
929304 -rw-r--r-- 1 mayyasharipova staff 451M 3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec
924624 -rw-r--r-- 1 mayyasharipova staff 451M 3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec_temp_3.tmp
     0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vem
     0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vex
953168 -rw-r--r-- 1 mayyasharipova staff 451M 3 Mar 14:30 _0_knn_buffered_vectors_temp_2.tmp
     0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 write.lock
```

**Files after force merge**

```txt
      8 -rw-r--r-- 1 mayyasharipova staff 297B 3 Mar 14:40 _0.cfe
1105112 -rw-r--r-- 1 mayyasharipova staff 538M 3 Mar 14:40 _0.cfs
      8 -rw-r--r-- 1 mayyasharipova staff 376B 3 Mar 14:40 _0.si
      8 -rw-r--r-- 1 mayyasharipova staff 154B 3 Mar 14:40 segments_1
      0 -rw-r--r-- 1 mayyasharipova staff   0B 3 Mar 14:30 write.lock
```

</details>

<details>
<summary>Baseline indexing details</summary>

Indexing output

```txt
Built index in 1099.0846738815308
```

**Files after force merge**

```txt
drwxr-xr-x 12 mayyasharipova staff 384B 3 Mar 15:14 .
drwxr-xr-x 42 mayyasharipova staff 1.3K 3 Mar 15:14 ..
-rw-r--r--  1 mayyasharipova staff 201B 3 Mar 15:03 _w.fdm
-rw-r--r--  1 mayyasharipova staff 4.6M 3 Mar 15:03 _w.fdt
-rw-r--r--  1 mayyasharipova staff 3.5K 3 Mar 15:03 _w.fdx
-rw-r--r--  1 mayyasharipova staff 192B 3 Mar 15:14 _w.fnm
-rw-r--r--  1 mayyasharipova staff 532B 3 Mar 15:14 _w.si
-rw-r--r--  1 mayyasharipova staff 451M 3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vec
-rw-r--r--  1 mayyasharipova staff 309K 3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vem
-rw-r--r--  1 mayyasharipova staff  82M 3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vex
-rw-r--r--  1 mayyasharipova staff 154B 3 Mar 15:14 segments_2
-rw-r--r--  1 mayyasharipova staff   0B 3 Mar 14:56 write.lock
```

</details>
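
For readers who want to check the arithmetic behind the numbers above, here is a small sketch that derives the indexing speedup and the spread of the search metrics. It uses only the values reported in this comment; the function names (`speedup`, `max_recall_gap`, `qps_ratio_range`) are made up for illustration and are not part of ann-benchmarks or Lucene.

```python
# Indexing times reported in the comment, in seconds.
BASELINE_SECS = 1099
CANDIDATE_SECS = 586

# Rows copied from the search performance table:
# (n_cands, baseline recall, baseline QPS, candidate recall, candidate QPS)
ROWS = [
    (10, 0.486, 3995.468, 0.463, 3636.417),
    (20, 0.532, 3261.435, 0.529, 3356.358),
    (40, 0.608, 2685.442, 0.603, 2494.603),
    (80, 0.683, 1874.002, 0.682, 1884.534),
    (120, 0.723, 1474.137, 0.721, 1445.883),
    (200, 0.766, 1048.531, 0.766, 1070.614),
    (400, 0.819, 554.110, 0.819, 639.026),
    (600, 0.844, 464.523, 0.845, 435.123),
    (800, 0.861, 355.228, 0.862, 329.773),
]


def speedup(baseline_secs, candidate_secs):
    """How many times faster the candidate indexed than the baseline."""
    return baseline_secs / candidate_secs


def max_recall_gap(rows):
    """Largest absolute recall difference between baseline and candidate."""
    return max(abs(base_recall - cand_recall)
               for _, base_recall, _, cand_recall, _ in rows)


def qps_ratio_range(rows):
    """Smallest and largest candidate/baseline QPS ratio over all rows."""
    ratios = [cand_qps / base_qps for _, _, base_qps, _, cand_qps in rows]
    return min(ratios), max(ratios)


low, high = qps_ratio_range(ROWS)
print(f"baseline: {BASELINE_SECS / 60:.1f} min, candidate: {CANDIDATE_SECS / 60:.1f} min")
print(f"indexing speedup: {speedup(BASELINE_SECS, CANDIDATE_SECS):.2f}x")
print(f"max recall gap: {max_recall_gap(ROWS):.3f}")
print(f"candidate/baseline QPS ratio: {low:.2f} to {high:.2f}")
```

On these numbers the candidate indexes about 1.88x faster. The largest recall gap is 0.023 (at n_cands=10), and the candidate/baseline QPS ratio ranges from roughly 0.91 to 1.15. Each configuration was run once here, so this says nothing about run-to-run variance.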