[GitHub] [lucene] jtibshirani edited a comment on pull request #728: LUCENE-10194 Buffer KNN vectors on disk

GitBox Thu, 03 Mar 2022 15:46:56 -0800


jtibshirani edited a comment on pull request #728:
URL: https://github.com/apache/lucene/pull/728#issuecomment-1058674634



   Great that we're exploring this! I had a couple of high-level thoughts:
   * If a user had 100 vector fields, then now we might have 100+ files being 
written concurrently, multiplied by the number of segments we're writing at the 
same time. It seems like this could cause problems -- should we only use this 
strategy if there are a relatively small number of vector fields? Having 100 
vector fields sounds far-fetched, but I could imagine it happening as users 
experiment with ways to model long text documents.
   * It feels wasteful to be writing the vectors to a temp file in 
`IndexingChain`, then immediately reading them back and writing them to a 
temp file again in `Lucene91HnswVectorsWriter`. I wonder if we could make a 
top-level `OffHeapVectorValues` class that's more broadly visible, so that 
`Lucene91HnswVectorsWriter` could just check whether it's dealing with 
file-backed vector values and avoid creating another temp file?
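
To make the second point concrete, here is a rough, self-contained sketch of the "check whether the values are already file-backed" idea. It deliberately does not use Lucene's classes: `VectorSource`, `HeapVectors`, `FileBackedVectors`, `spill` and `ensureFileBacked` are all hypothetical names invented for illustration, and the flat little-endian file layout is an assumption, not what the PR or `Lucene91HnswVectorsWriter` actually does.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class VectorSpillSketch {

  /** Hypothetical stand-in for a field's vectors; not a Lucene interface. */
  interface VectorSource {
    int size();
    int dimension();
    float[] vector(int ord) throws IOException;
  }

  /** Vectors buffered on the heap. */
  static final class HeapVectors implements VectorSource {
    private final float[][] vectors;
    HeapVectors(float[][] vectors) { this.vectors = vectors; }
    public int size() { return vectors.length; }
    public int dimension() { return vectors.length == 0 ? 0 : vectors[0].length; }
    public float[] vector(int ord) { return vectors[ord]; }
  }

  /** Vectors already stored in a flat file of little-endian floats (assumed layout). */
  static final class FileBackedVectors implements VectorSource {
    final Path path;
    private final int size;
    private final int dimension;

    FileBackedVectors(Path path, int size, int dimension) {
      this.path = path;
      this.size = size;
      this.dimension = dimension;
    }

    public int size() { return size; }
    public int dimension() { return dimension; }

    public float[] vector(int ord) throws IOException {
      ByteBuffer buf =
          ByteBuffer.allocate(dimension * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
      long start = (long) ord * dimension * Float.BYTES;
      try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
        // A positional read may return fewer bytes than requested, so loop.
        while (buf.hasRemaining()) {
          if (ch.read(buf, start + buf.position()) < 0) {
            throw new EOFException("vector " + ord + " is past the end of " + path);
          }
        }
      }
      buf.flip();
      float[] out = new float[dimension];
      buf.asFloatBuffer().get(out);
      return out;
    }
  }

  /** Copies every vector into a new temp file under dir. */
  static FileBackedVectors spill(VectorSource source, Path dir) throws IOException {
    int size = source.size();
    int dim = source.dimension();
    ByteBuffer buf =
        ByteBuffer.allocate(size * dim * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (int ord = 0; ord < size; ord++) {
      for (float v : source.vector(ord)) {
        buf.putFloat(v);
      }
    }
    Path tmp = Files.createTempFile(dir, "vectors", ".tmp");
    Files.write(tmp, buf.array());
    return new FileBackedVectors(tmp, size, dim);
  }

  /** Reuses an existing file-backed source instead of writing a second temp file. */
  static FileBackedVectors ensureFileBacked(VectorSource source, Path dir) throws IOException {
    if (source instanceof FileBackedVectors) {
      return (FileBackedVectors) source; // already on disk: skip the extra copy
    }
    return spill(source, dir);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("vector-spill");
    VectorSource heap = new HeapVectors(new float[][] {{1f, 2f}, {3f, 4f}});
    FileBackedVectors spilled = ensureFileBacked(heap, dir);
    FileBackedVectors reused = ensureFileBacked(spilled, dir);
    System.out.println("reused existing file: " + (spilled == reused));
  }
}
```

The only part that matters for the discussion is `ensureFileBacked`: if the type is visible to the writer, one `instanceof` check is enough to skip the second copy. Whether that fits the real class hierarchy is exactly the open question.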


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


