mayya-sharipova commented on pull request #728:
URL: https://github.com/apache/lucene/pull/728#issuecomment-1059210295


   @rmuir Thanks a lot for your review and explanation of IndexWriter 
behavior.
   
   > If IndexWriter shouldn't buffer vectors, then can it simply stream vectors 
to the codec api?  This would be similar to how StoredFields and TermVectors 
work today (see e.g. StoredFieldsConsumer).
   
   That's a great suggestion. I can explore how we can stream vectors directly 
to the codec and build HNSW graphs on the fly.
   
   > I'm suspicious of the reported performance improvement based on looking at 
your benchmark output, I don't think its realistic. Looks like nothing else was 
indexed in any other way (docvalues/postings/etc), nobody ever called reopen() 
to force any flushes, so with the benchmark you ran, IW just wrote one big 
segment, avoiding all merging. So everything looks fantastic on paper, but this 
isn't realistic. ...It is easy to run into the same trap when benchmarking e.g. 
stored fields and other things. But it isn't really a performance improvement.
   
   Thanks for your comment. With this patch I am addressing the scenario of 
initial data loading, which is common in the vector search domain. In that 
scenario there is only bulk indexing, with no background searches. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


