mayya-sharipova commented on PR #992:
URL: https://github.com/apache/lucene/pull/992#issuecomment-1190803247

   @jpountz I ran another set of benchmarks on the dataset 
   **sift-128-euclidean M:16 efConstruction:100 with index sort on 
SortField.Type.LONG**. For this run I added an extra index sort field: a 
`NumericDocValuesField` with random long values. 
   
   Observed results:
   
   - indexing plus flush is faster overall on the candidate (548s in 
candidate vs. 654s in baseline, about 16% less time)
   - baseline: indexing is fast, but flush takes 653s
   - candidate: indexing takes most of the time, and flush is very fast (3.9s)
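The split between indexing and flush can be checked against the logs further down. The following is a minimal sketch of that arithmetic, not part of the PR: the class and method names are hypothetical, the totals are taken from the `Indexed 1000000 documents in ...s` lines, and the flush times from the `flush time ... msec` lines. Because the totals are rounded to whole seconds, the results are approximate.

```java
import java.util.Locale;

// Arithmetic helper for the figures quoted in this comment; hypothetical, not part of the PR.
final class FlushSplit {

    /** Time spent outside flush, in seconds, given a total in seconds and a flush time in msec. */
    static double nonFlushSeconds(double totalSeconds, double flushMillis) {
        return totalSeconds - flushMillis / 1000.0;
    }

    /** Reduction in total time, as a percentage of the baseline total. */
    static double percentFaster(double baselineSeconds, double candidateSeconds) {
        return 100.0 * (baselineSeconds - candidateSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        // Inputs copied from the baseline and candidate logs.
        double baselineNonFlush = nonFlushSeconds(654, 653120.381917);
        double candidateNonFlush = nonFlushSeconds(548, 3931.69475);
        System.out.printf(Locale.ROOT, "baseline non-flush time:  %.1fs%n", baselineNonFlush);
        System.out.printf(Locale.ROOT, "candidate non-flush time: %.1fs%n", candidateNonFlush);
        System.out.printf(Locale.ROOT, "candidate is %.1f%% faster overall%n", percentFaster(654, 548));
    }
}
```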
   
   
   Comparison with the [unsorted 
case](https://github.com/apache/lucene/pull/992#issuecomment-1178060346) 
benchmarked earlier:
   - baseline: total indexing time increased from 533s to 654s
   - candidate: total indexing time increased from 538s to 548s 
      - in particular, reconstructing the graph using new ordinals doesn't 
seem to take much time: 866 ms (about 0.9s)
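To illustrate what "reconstructing the graph using new ordinals" involves, here is a minimal sketch. It is not the code in this PR: the class name, the `oldToNew` array, and the single-level adjacency-list layout are assumptions made for illustration, and a real HNSW graph has several levels. The idea is that a graph built in insertion order is rewritten once at flush, so that each node and each neighbor reference uses the ordinal assigned by the index sort.

```java
import java.util.Arrays;

// Illustrative sketch only; hypothetical names, not Lucene's actual implementation.
final class GraphOrdinalRemap {

    /**
     * Rewrites adjacency lists keyed by old (insertion-order) ordinals so that
     * they are keyed by new (index-sorted) ordinals.
     * oldToNew[o] is the new ordinal of the node whose old ordinal is o.
     */
    static int[][] remap(int[][] neighborsByOldOrd, int[] oldToNew) {
        int n = neighborsByOldOrd.length;
        int[][] neighborsByNewOrd = new int[n][];
        for (int oldOrd = 0; oldOrd < n; oldOrd++) {
            int[] oldNeighbors = neighborsByOldOrd[oldOrd];
            int[] newNeighbors = new int[oldNeighbors.length];
            for (int i = 0; i < oldNeighbors.length; i++) {
                newNeighbors[i] = oldToNew[oldNeighbors[i]];
            }
            // Sorted here only so the output is deterministic and easy to read.
            Arrays.sort(newNeighbors);
            neighborsByNewOrd[oldToNew[oldOrd]] = newNeighbors;
        }
        return neighborsByNewOrd;
    }

    public static void main(String[] args) {
        int[][] graph = { {1, 2}, {0}, {0} }; // node 0 links to 1 and 2, and back
        int[] oldToNew = { 2, 0, 1 };         // sort order moves old node 0 to position 2
        System.out.println(Arrays.deepToString(remap(graph, oldToNew)));
        // prints [[2], [2], [0, 1]]
    }
}
```

The remap is a single linear pass over the nodes and their neighbor lists, with no distance computations, which is consistent with it taking well under a second for 1,000,000 vectors.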
   
   
   **Baseline**
   
   ```bash
   IW 0 [2022-07-20T21:00:49.727575Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   Done indexing 1000000 documents; now flush
   IW 0 [2022-07-20T21:00:51.099538Z; main]: now flush at close
   IW 0 [2022-07-20T21:00:51.100162Z; main]:   start flush: applyAllDeletes=true
   IW 0 [2022-07-20T21:00:51.100936Z; main]:   index before flush
   DW 0 [2022-07-20T21:00:51.101006Z; main]: startFullFlush
   DW 0 [2022-07-20T21:00:51.107445Z; main]: anyChanges? numDocsInRam=1000000 deletes=false hasTickets:false pendingChangesInFullFlush: false
   DWPT 0 [2022-07-20T21:00:51.119428Z; main]: flush postings as segment _3 numDocs=1000000
   IW 0 [2022-07-20T21:00:51.715470Z; main]: 0 msec to write norms
   IW 0 [2022-07-20T21:00:51.852081Z; main]: 136 msec to write docValues
   IW 0 [2022-07-20T21:00:51.852305Z; main]: 0 msec to write points
   HNSW 0 [2022-07-20T21:00:53.264684Z; main]: build graph from 1000000 vectors
   
   HNSW 0 [2022-07-20T21:11:34.590292Z; main]: built 990000 in 7288/641320 ms
   IW 0 [2022-07-20T21:11:42.662461Z; main]: 650804 msec to write vectors
   IW 0 [2022-07-20T21:11:43.334377Z; main]: 671 msec to finish stored fields
   IW 0 [2022-07-20T21:11:43.334611Z; main]: 0 msec to write postings and finish vectors
   IW 0 [2022-07-20T21:11:43.336506Z; main]: 0 msec to write fieldInfos
   
   DWPT 0 [2022-07-20T21:11:44.244388Z; main]: flush time 653120.381917 msec
   IW 0 [2022-07-20T21:11:44.247650Z; main]: publishFlushedSegment _3(10.0.0):c1000000:[indexSort=<long: "sortkey">]:...
   
   Indexed 1000000 documents in 654s
   ```
   
   **Candidate**
   
   ```bash
   IW 0 [2022-07-20T18:35:41.879858Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
   Done indexing 1000000 documents; now flush
   IW 0 [2022-07-20T18:44:46.109074Z; main]: now flush at close
   IW 0 [2022-07-20T18:44:46.109804Z; main]:   start flush: applyAllDeletes=true
   IW 0 [2022-07-20T18:44:46.110587Z; main]:   index before flush
   DW 0 [2022-07-20T18:44:46.110689Z; main]: startFullFlush
   DW 0 [2022-07-20T18:44:46.115672Z; main]: anyChanges? numDocsInRam=1000000 deletes=false hasTickets:false pendingChangesInFullFlush: false
   DWPT 0 [2022-07-20T18:44:46.126626Z; main]: flush postings as segment _2 numDocs=1000000
   IW 0 [2022-07-20T18:44:46.741747Z; main]: 0 msec to write norms
   IW 0 [2022-07-20T18:44:46.864200Z; main]: 121 msec to write docValues
   IW 0 [2022-07-20T18:44:46.864364Z; main]: 0 msec to write points
   IndexWriter 0 [2022-07-20T18:44:47.609637Z; main]: starting reconstructing graph ordinals 63362025298959
   IndexWriter 0 [2022-07-20T18:44:48.476035Z; main]: finished reconstructing graph ordinals 63362892156709
   IW 0 [2022-07-20T18:44:48.481920Z; main]: 1617 msec to write vectors
   IW 0 [2022-07-20T18:44:49.166673Z; main]: 683 msec to finish stored fields
   IW 0 [2022-07-20T18:44:49.167432Z; main]: 0 msec to write postings and finish vectors
   IW 0 [2022-07-20T18:44:49.174701Z; main]: 6 msec to write fieldInfos
   
   IFD 0 [2022-07-20T18:44:50.072852Z; main]: now checkpoint "_2(10.0.0):c1000000:[indexSort=<long: "sortkey">]:..
   
   DWPT 0 [2022-07-20T18:44:50.058801Z; main]: flush time 3931.69475 msec
   Indexed 1000000 documents in 548s
   ```
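The 866 ms figure for reconstructing graph ordinals can be derived from the two `reconstructing graph ordinals` lines in the candidate log. The sketch below is a hypothetical helper, not part of the PR, and it assumes the trailing number on each of those lines is a nanosecond clock reading (for example from `System.nanoTime()`), which the log does not state. The same interval is cross-checked against the wall-clock timestamps on those lines.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for reading the candidate log; not part of the PR.
final class ReconstructTiming {

    /** Elapsed whole milliseconds between two nanosecond clock readings. */
    static long nanosToMillis(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000L;
    }

    /** Elapsed whole milliseconds between two ISO-8601 UTC timestamps. */
    static long millisBetween(String startIso, String endIso) {
        return Duration.between(Instant.parse(startIso), Instant.parse(endIso)).toMillis();
    }

    public static void main(String[] args) {
        // Trailing values on the "starting"/"finished reconstructing graph ordinals" lines,
        // assumed to be nanosecond readings.
        System.out.println(nanosToMillis(63362025298959L, 63362892156709L) + " ms (clock readings)");
        // Timestamps of the same two log lines.
        System.out.println(millisBetween("2022-07-20T18:44:47.609637Z",
                                         "2022-07-20T18:44:48.476035Z") + " ms (timestamps)");
    }
}
```

Both calculations give 866 ms, so the reconstruction accounts for roughly half of the 1617 msec reported for writing vectors in the candidate run.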


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

