[ 
https://issues.apache.org/jira/browse/LUCENE-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova reassigned LUCENE-10194:
----------------------------------------

    Assignee: Mayya Sharipova

> Should IndexWriter buffer KNN vectors on disk?
> ----------------------------------------------
>
>                 Key: LUCENE-10194
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10194
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Mayya Sharipova
>            Priority: Minor
>
> VectorValuesWriter buffers data in memory, like we do for all data structures 
> that are computed on flush. But I wonder if this is the right trade-off.
> The use case I have in mind is someone trying to load a dataset of vectors 
> into Lucene. Given that HNSW graphs are super expensive to create, we'd 
> ideally load that dataset into a single segment rather than many small 
> segments that then need to be merged together, which in turn re-creates the 
> HNSW graph.
> Yet buffering vectors in memory is expensive. For instance, assuming 256 
> dimensions, each vector consumes 1kB of memory (256 float values at 4 bytes 
> each). Should we consider buffering vectors on disk to reduce the chances of 
> having to create new segments only because the RAM buffer is full?
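
To make the arithmetic in the description concrete, here is a rough sketch of
how many vectors fit in the indexing RAM buffer before a flush is forced. It
assumes 4-byte float components and ignores all per-vector and per-document
overhead, so real numbers would be lower. The class and method names are
hypothetical, not Lucene APIs; 16 MB is the default value of
IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB.

```java
// Back-of-the-envelope estimate only; not part of Lucene.
class VectorBufferEstimate {

    // Raw size of one vector, assuming one 4-byte float per dimension.
    static long bytesPerVector(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    // Upper bound on the number of vectors a RAM buffer of the given size
    // can hold, ignoring every other structure that shares the buffer.
    static long vectorsThatFit(double ramBufferMB, int dimensions) {
        long bufferBytes = (long) (ramBufferMB * 1024 * 1024);
        return bufferBytes / bytesPerVector(dimensions);
    }

    public static void main(String[] args) {
        int dims = 256;            // dimension count used in the description
        double ramBufferMB = 16.0; // Lucene's default RAM buffer size
        System.out.println("bytes per vector: " + bytesPerVector(dims));
        System.out.println("vectors per flush (upper bound): "
                + vectorsThatFit(ramBufferMB, dims));
    }
}
```

Under these assumptions a default-sized buffer holds on the order of ten
thousand vectors, so a dataset of millions of vectors would be split across
many flushed segments unless the buffer is enlarged or vectors are spilled
elsewhere.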



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
