mayya-sharipova commented on PR #992:
URL: https://github.com/apache/lucene/pull/992#issuecomment-1178060346

   @jtibshirani Thanks for another set of comments; I will work on addressing 
them.
   ----
   
   Meanwhile, I ran another set of benchmarks on a different dataset, 
sift-128-euclidean, with M:16 and efConstruction:100.
   The results are similar to the earlier ones:
   
   - total time for indexing + flush is approximately the same (533 s in baseline vs. 538 s in candidate)
   - baseline: indexing is fast, but flush takes 532 s
   - candidate: indexing takes most of the time, and flush is very fast (1.8 s)
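
To make the split between the two phases explicit, here is a minimal sketch of the arithmetic. It assumes that the total reported by `Indexed ... documents in ...s` is simply indexing time plus the DWPT `flush time`; the function and variable names are illustrative and not part of the benchmark tooling.

```python
# Figures copied from the logs in this comment, converted to seconds.
runs = {
    "baseline": {"total_s": 533.0, "flush_s": 532.338523458},
    "candidate": {"total_s": 538.0, "flush_s": 1.84819675},
}


def split_phases(total_s, flush_s):
    """Return (indexing seconds, flush share of total).

    Assumption: total = indexing + flush, so indexing = total - flush.
    """
    indexing_s = total_s - flush_s
    return indexing_s, flush_s / total_s


for name, run in runs.items():
    indexing_s, flush_share = split_phases(run["total_s"], run["flush_s"])
    print(f"{name}: indexing ~{indexing_s:.1f} s, flush is {flush_share:.1%} of total")
```

Under that assumption, nearly all of the baseline time is spent in flush, while in the candidate nearly all of it is spent during indexing.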
   
   ### Baseline (main branch): 
   ```text
   IW 0 [2022-07-07T18:27:08.982483Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   Done indexing 1000000 documents; now flush
   IW 0 [2022-07-07T18:27:09.935570Z; main]: now flush at close
   IW 0 [2022-07-07T18:27:09.936155Z; main]:   start flush: applyAllDeletes=true
   IW 0 [2022-07-07T18:27:09.936850Z; main]:   index before flush
   DW 0 [2022-07-07T18:27:09.936917Z; main]: startFullFlush
   DW 0 [2022-07-07T18:27:09.941606Z; main]: anyChanges? numDocsInRam=1000000 deletes=false hasTickets:false pendingChangesInFullFlush: false
   DWPT 0 [2022-07-07T18:27:09.951278Z; main]: flush postings as segment _1 numDocs=1000000
   IW 0 [2022-07-07T18:27:09.952530Z; main]: 0 msec to write norms
   IW 0 [2022-07-07T18:27:09.952902Z; main]: 0 msec to write docValues
   IW 0 [2022-07-07T18:27:09.953073Z; main]: 0 msec to write points
   HNSW 0 [2022-07-07T18:27:11.094024Z; main]: build graph from 1000000 vectors
   
   HNSW 0 [2022-07-07T18:35:55.150931Z; main]: built 990000 in 6450/524148 ms
   IW 0 [2022-07-07T18:36:01.320864Z; main]: 531459 msec to write vectors
   IW 0 [2022-07-07T18:36:01.336914Z; main]: 15 msec to finish stored fields
   IW 0 [2022-07-07T18:36:01.337204Z; main]: 0 msec to write postings and finish vectors
   IW 0 [2022-07-07T18:36:01.337924Z; main]: 0 msec to write fieldInfos
   
   DWPT 0 [2022-07-07T18:36:02.197589Z; main]: flush time 532338.523458 msec
   Indexed 1000000 documents in 533s
   ```
   
   ### Candidate (this PR with the changes so far): 
   
   ```text
   IW 0 [2022-07-07T17:44:01.642762Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   Done indexing 1000000 documents; now flush
   IW 0 [2022-07-07T17:52:58.049830Z; main]: now flush at close
   IW 0 [2022-07-07T17:52:58.050277Z; main]:   start flush: applyAllDeletes=true
   IW 0 [2022-07-07T17:52:58.050726Z; main]:   index before flush
   DW 0 [2022-07-07T17:52:58.050776Z; main]: startFullFlush
   DW 0 [2022-07-07T17:52:58.056958Z; main]: anyChanges? numDocsInRam=1000000 deletes=false hasTickets:false pendingChangesInFullFlush: false
   DWPT 0 [2022-07-07T17:52:58.066937Z; main]: flush postings as segment _0 numDocs=1000000
   IW 0 [2022-07-07T17:52:58.068554Z; main]: 0 msec to write norms
   IW 0 [2022-07-07T17:52:58.068864Z; main]: 0 msec to write docValues
   IW 0 [2022-07-07T17:52:58.068958Z; main]: 0 msec to write points
   IW 0 [2022-07-07T17:52:59.017719Z; main]: 947 msec to write vectors
   IW 0 [2022-07-07T17:52:59.038544Z; main]: 19 msec to finish stored fields
   IW 0 [2022-07-07T17:52:59.039281Z; main]: 0 msec to write postings and finish vectors
   IW 0 [2022-07-07T17:52:59.043069Z; main]: 3 msec to write fieldInfos
   
   DWPT 0 [2022-07-07T17:52:59.915562Z; main]: flush time 1848.19675 msec
   Indexed 1000000 documents in 538s
   ```
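
The time spent before flush can also be estimated directly from the log timestamps. The sketch below assumes that the first infoStream line (`MMapDirectory.UNMAP_SUPPORTED=true`) is emitted roughly when indexing starts, which is an approximation; the helper names are hypothetical and the regular expression only targets the timestamp format seen in the logs above.

```python
import re
from datetime import datetime

# Matches the timestamp in infoStream lines such as
# "IW 0 [2022-07-07T17:52:58.049830Z; main]: now flush at close".
TS_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z;")


def parse_ts(line):
    """Return the timestamp embedded in a log line, or None if there is none."""
    match = TS_RE.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def elapsed_seconds(first_line, second_line):
    """Seconds between the timestamps of two log lines."""
    start, end = parse_ts(first_line), parse_ts(second_line)
    if start is None or end is None:
        raise ValueError("both lines must contain a timestamp")
    return (end - start).total_seconds()


# Lines copied from the candidate log above.
candidate_start = "IW 0 [2022-07-07T17:44:01.642762Z; main]: MMapDirectory.UNMAP_SUPPORTED=true"
candidate_flush = "IW 0 [2022-07-07T17:52:58.049830Z; main]: now flush at close"

print(f"candidate time before flush: {elapsed_seconds(candidate_start, candidate_flush):.1f} s")
```

Applying the same helper to the corresponding two baseline lines gives the baseline's time before flush for comparison.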


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

