mayya-sharipova commented on PR #992: URL: https://github.com/apache/lucene/pull/992#issuecomment-1178060346
@jtibshirani Thanks for another set of comments, I will work on addressing them.

----

Meanwhile, I have run another set of benchmarks on a different dataset: sift-128-euclidean (M: 16, efConstruction: 100). Similar results were observed here:

- the whole indexing + flush takes approximately the same time (533 s in baseline vs. 538 s in candidate)
- baseline: indexing is fast, but flush takes 532 s
- candidate: indexing takes most of the time, and flush is very fast (1.8 s)

### Baseline (main branch):

```bash
IW 0 [2022-07-07T18:27:08.982483Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
Done indexing 1000000 documents; now flush
IW 0 [2022-07-07T18:27:09.935570Z; main]: now flush at close
IW 0 [2022-07-07T18:27:09.936155Z; main]: start flush: applyAllDeletes=true
IW 0 [2022-07-07T18:27:09.936850Z; main]: index before flush
DW 0 [2022-07-07T18:27:09.936917Z; main]: startFullFlush
DW 0 [2022-07-07T18:27:09.941606Z; main]: anyChanges? numDocsInRam=1000000 deletes=false hasTickets:false pendingChangesInFullFlush: false
DWPT 0 [2022-07-07T18:27:09.951278Z; main]: flush postings as segment _1 numDocs=1000000
IW 0 [2022-07-07T18:27:09.952530Z; main]: 0 msec to write norms
IW 0 [2022-07-07T18:27:09.952902Z; main]: 0 msec to write docValues
IW 0 [2022-07-07T18:27:09.953073Z; main]: 0 msec to write points
HNSW 0 [2022-07-07T18:27:11.094024Z; main]: build graph from 1000000 vectors
HNSW 0 [2022-07-07T18:35:55.150931Z; main]: built 990000 in 6450/524148 ms
IW 0 [2022-07-07T18:36:01.320864Z; main]: 531459 msec to write vectors
IW 0 [2022-07-07T18:36:01.336914Z; main]: 15 msec to finish stored fields
IW 0 [2022-07-07T18:36:01.337204Z; main]: 0 msec to write postings and finish vectors
IW 0 [2022-07-07T18:36:01.337924Z; main]: 0 msec to write fieldInfos
DWPT 0 [2022-07-07T18:36:02.197589Z; main]: flush time 532338.523458 msec
Indexed 1000000 documents in 533s
```

### Candidate (this PR with the changes so far):

```bash
IW 0 [2022-07-07T17:44:01.642762Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
Done indexing 1000000 documents; now flush
IW 0 [2022-07-07T17:52:58.049830Z; main]: now flush at close
IW 0 [2022-07-07T17:52:58.050277Z; main]: start flush: applyAllDeletes=true
IW 0 [2022-07-07T17:52:58.050726Z; main]: index before flush
DW 0 [2022-07-07T17:52:58.050776Z; main]: startFullFlush
DW 0 [2022-07-07T17:52:58.056958Z; main]: anyChanges? numDocsInRam=1000000 deletes=false hasTickets:false pendingChangesInFullFlush: false
DWPT 0 [2022-07-07T17:52:58.066937Z; main]: flush postings as segment _0 numDocs=1000000
IW 0 [2022-07-07T17:52:58.068554Z; main]: 0 msec to write norms
IW 0 [2022-07-07T17:52:58.068864Z; main]: 0 msec to write docValues
IW 0 [2022-07-07T17:52:58.068958Z; main]: 0 msec to write points
IW 0 [2022-07-07T17:52:59.017719Z; main]: 947 msec to write vectors
IW 0 [2022-07-07T17:52:59.038544Z; main]: 19 msec to finish stored fields
IW 0 [2022-07-07T17:52:59.039281Z; main]: 0 msec to write postings and finish vectors
IW 0 [2022-07-07T17:52:59.043069Z; main]: 3 msec to write fieldInfos
DWPT 0 [2022-07-07T17:52:59.915562Z; main]: flush time 1848.19675 msec
Indexed 1000000 documents in 538s
```
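For anyone who wants to reproduce the indexing-versus-flush split from the logs, here is a minimal Python sketch. It is not part of the PR or of the benchmark tooling: the helper names (`parse_ts`, `split_times`) are hypothetical, and it assumes that the first timestamped line marks the start of indexing and that `now flush at close` marks its end. The sample strings are three lines copied from each log above.

```python
import re
from datetime import datetime

# Bracketed timestamp of an infoStream line, e.g.
# "IW 0 [2022-07-07T17:52:58.049830Z; main]: now flush at close"
TS = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z;")
FLUSH = re.compile(r"flush time ([\d.]+) msec")

BASELINE = """\
IW 0 [2022-07-07T18:27:08.982483Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
IW 0 [2022-07-07T18:27:09.935570Z; main]: now flush at close
DWPT 0 [2022-07-07T18:36:02.197589Z; main]: flush time 532338.523458 msec
"""

CANDIDATE = """\
IW 0 [2022-07-07T17:44:01.642762Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
IW 0 [2022-07-07T17:52:58.049830Z; main]: now flush at close
DWPT 0 [2022-07-07T17:52:59.915562Z; main]: flush time 1848.19675 msec
"""


def parse_ts(line):
    """Return the timestamp of a log line as a datetime, or None if absent."""
    m = TS.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def split_times(log):
    """Return (seconds before the flush started, flush seconds) for one run.

    Assumption: the first timestamped line is treated as the start of indexing.
    """
    first = flush_start = flush_ms = None
    for line in log.splitlines():
        ts = parse_ts(line)
        if ts is not None and first is None:
            first = ts
        if "now flush at close" in line:
            flush_start = ts
        m = FLUSH.search(line)
        if m:
            flush_ms = float(m.group(1))
    return (flush_start - first).total_seconds(), flush_ms / 1000.0


if __name__ == "__main__":
    for name, log in (("baseline", BASELINE), ("candidate", CANDIDATE)):
        indexing_s, flush_s = split_times(log)
        print(f"{name}: indexing {indexing_s:.1f} s, flush {flush_s:.1f} s, "
              f"total {indexing_s + flush_s:.1f} s")
```

Under these assumptions the baseline spends roughly 1 s before the flush and about 532 s in it, while the candidate spends roughly 536 s before the flush and under 2 s in it, which matches the reported totals of 533 s and 538 s.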