mayya-sharipova edited a comment on pull request #728: URL: https://github.com/apache/lucene/pull/728#issuecomment-1058148842
I've benchmarked the results with ann-benchmarks on glove-100-angular (M:16, efConstruction:100).

- baseline: main branch, where RAMBufferSizeMB is left unset and so defaults to **16 MB**, with segments force-merged to 1.
- candidate: this PR, where RAMBufferSizeMB is likewise set to **16 MB**, also with a force merge at the end.

**Indexing**
- baseline took 1099 secs, around **18 mins**
- candidate took 586 secs, around **10 mins**
- search performance is the same.

2022-03-03T15:01:49.958373Z; main IW 1 [2022-03-03T15:14:33.924666Z; main]

<details> <summary>Details on the search performance </summary>
</details>

<details> <summary>Details on the candidate </summary>

Indexing output
```txt
IW 0 [2022-03-03T14:30:49.413950Z; main]: init: create=true reader=null ramBufferSizeMB=16.0 maxBufferedDocs=-1
IW 0 [2022-03-03T14:30:49.424202Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
Done indexing 1183514 documents; now flush
IW 0 [2022-03-03T14:30:50.824200Z; main]: now flush at close
IW 0 [2022-03-03T14:30:50.824401Z; main]: start flush: applyAllDeletes=true
IW 0 [2022-03-03T14:30:50.824515Z; main]: index before flush
DW 0 [2022-03-03T14:30:50.824557Z; main]: startFullFlush
DW 0 [2022-03-03T14:30:50.827209Z; main]: anyChanges? numDocsInRam=1183514 deletes=false hasTickets:false pendingChangesInFullFlush: false
DWPT 0 [2022-03-03T14:30:50.831053Z; main]: flush postings as segment _0 numDocs=1183514
HNSW 0 [2022-03-03T14:30:52.334343Z; main]: build graph from 1183514 vectors
...
HNSW 0 [2022-03-03T14:40:31.049504Z; main]: built 1180000 in 5585/578724 ms
...
IW 0 [2022-03-03T14:40:33.492318Z; main]: 582671 msec to write vectors
IFD 0 [2022-03-03T14:40:34.655718Z; main]: 20 msec to checkpoint
Indexed 1183514 documents in 585s
Force merge index in luceneknn-100-16-100.train-16-100.index
IFD 1 [2022-03-03T14:40:34.671943Z; main]: 0 msec to checkpoint
Built index in 586.944657087326
```

**Files in the index**
```txt
     0 -rw-r--r--  1 mayyasharipova  staff    0B  3 Mar 14:30 _0.fdm
 10080 -rw-r--r--  1 mayyasharipova  staff  4.6M  3 Mar 14:30 _0.fdt
     0 -rw-r--r--  1 mayyasharipova  staff    0B  3 Mar 14:30 _0_Lucene90FieldsIndex-doc_ids_0.tmp
     0 -rw-r--r--  1 mayyasharipova  staff    0B  3 Mar 14:30 _0_Lucene90FieldsIndexfile_pointers_1.tmp
929304 -rw-r--r--  1 mayyasharipova  staff  451M  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec
924624 -rw-r--r--  1 mayyasharipova  staff  451M  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vec_temp_3.tmp
     0 -rw-r--r--  1 mayyasharipova  staff    0B  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vem
     0 -rw-r--r--  1 mayyasharipova  staff    0B  3 Mar 14:30 _0_Lucene91HnswVectorsFormat_0.vex
953168 -rw-r--r--  1 mayyasharipova  staff  451M  3 Mar 14:30 _0_knn_buffered_vectors_temp_2.tmp
     0 -rw-r--r--  1 mayyasharipova  staff    0B  3 Mar 14:30 write.lock
```
</details>

<details> <summary>Details on the baseline </summary>

Indexing output
```txt
Built index in 1099.0846738815308
```

**Files in the index**
```txt
drwxr-xr-x  12 mayyasharipova  staff  384B  3 Mar 15:14 .
drwxr-xr-x  42 mayyasharipova  staff  1.3K  3 Mar 15:14 ..
-rw-r--r--   1 mayyasharipova  staff  201B  3 Mar 15:03 _w.fdm
-rw-r--r--   1 mayyasharipova  staff  4.6M  3 Mar 15:03 _w.fdt
-rw-r--r--   1 mayyasharipova  staff  3.5K  3 Mar 15:03 _w.fdx
-rw-r--r--   1 mayyasharipova  staff  192B  3 Mar 15:14 _w.fnm
-rw-r--r--   1 mayyasharipova  staff  532B  3 Mar 15:14 _w.si
-rw-r--r--   1 mayyasharipova  staff  451M  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vec
-rw-r--r--   1 mayyasharipova  staff  309K  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vem
-rw-r--r--   1 mayyasharipova  staff   82M  3 Mar 15:14 _w_Lucene91HnswVectorsFormat_0.vex
-rw-r--r--   1 mayyasharipova  staff  154B  3 Mar 15:14 segments_2
-rw-r--r--   1 mayyasharipova  staff    0B  3 Mar 14:56 write.lock
```
</details>

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
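
As a quick sanity check on the indexing numbers reported in this comment, the following Python sketch recomputes the minutes, the speedup, and the share of candidate indexing time spent writing vectors. The timings are copied from the log output in the comment. The function and variable names are illustrative assumptions and are not part of ann-benchmarks or Lucene.

```python
# Timings copied from the log output quoted in the comment.
baseline_secs = 1099.0846738815308   # "Built index in ..." on main
candidate_secs = 586.944657087326    # "Built index in ..." with this PR
candidate_write_vectors_ms = 582671  # "582671 msec to write vectors"


def summarize(baseline: float, candidate: float, write_vectors_ms: int) -> dict:
    """Derive minutes, speedup, and relative figures from raw timings."""
    return {
        "baseline_min": baseline / 60,
        "candidate_min": candidate / 60,
        # How many times faster the candidate builds the index.
        "speedup": baseline / candidate,
        # Percentage of the baseline indexing time that is saved.
        "time_saved_pct": 100 * (baseline - candidate) / baseline,
        # Share of the candidate run spent in the vector-writing step.
        "write_vectors_share_pct": 100 * (write_vectors_ms / 1000) / candidate,
    }


summary = summarize(baseline_secs, candidate_secs, candidate_write_vectors_ms)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these numbers the candidate is a little under 2x faster, and almost all of its indexing time falls in the vector-writing step reported by the log.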