Heya Michael,

> the first one I traced was referenced by vector writers involved in a merge 
> (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?

Yes, that is holding the raw floats before flush. You should see
nearly the same overhead there as when indexing raw vectors. I would
be surprised if Lucene99FlatVectorsWriter caused a significant memory
usage difference between quantized and non-quantized indexing.

The flow is this:

 - Lucene99FlatVectorsWriter gets the float[] vector, makes a copy of
it (it does this unconditionally), and passes the copy on to the next
part of the chain.
 - If quantizing, the next part of the chain is
Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
REFERENCE to the array; it doesn't copy it. The float vector array is
then passed to the HNSW indexer (if it's being used), which also does
NOT copy it, but keeps a reference.
 - If not quantizing but indexing, Lucene99FlatVectorsWriter passes
the array directly to the HNSW indexer, which does not copy it, but
does add it to the HNSW graph.
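To make the copy-then-reference behavior concrete, here is a toy
sketch (not actual Lucene code; the class and field names are made
up). The point is that only the first stage makes a defensive copy,
and every downstream consumer shares that single array instance:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the flow described above: one defensive copy at
// the "flat" stage, references everywhere else. Names are invented.
class FlowSketch {
    // The flat writer buffers its own copy, so the caller can reuse
    // its input array without corrupting what gets flushed.
    static final List<float[]> flatBuffer = new ArrayList<>();
    // Downstream stages (quantized writer, HNSW indexer) keep
    // references only; no additional float[] copies are made.
    static final List<float[]> quantizedRefs = new ArrayList<>();
    static final List<float[]> hnswRefs = new ArrayList<>();

    static void addVector(float[] v) {
        float[] copy = v.clone();   // the single unconditional copy
        flatBuffer.add(copy);
        quantizedRefs.add(copy);    // reference, not a copy
        hnswRefs.add(copy);         // reference, not a copy
    }

    public static void main(String[] args) {
        addVector(new float[] {1f, 2f, 3f});
        // All stages share the same instance: one float[] on heap.
        System.out.println(flatBuffer.get(0) == quantizedRefs.get(0));
        System.out.println(flatBuffer.get(0) == hnswRefs.get(0));
    }
}
```

So the per-vector heap cost is roughly one float[] copy regardless of
whether quantization is enabled, which is why I wouldn't expect the
flat writer itself to explain the difference you're seeing.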

> I wonder if there is an opportunity to move some of this off-heap?

I think we could do some things off-heap in the ScalarQuantizer,
maybe even during "flush", but we would have to adjust the interfaces
somewhat so that the ScalarQuantizer can know where the vectors are
stored after the initial flush. Right now there is no way for it to
know the file or the file handle.

> I can imagine that when we requantize we need to scan all the vectors to 
> determine the new quantization settings?

We shouldn't be scanning every vector. We do take a sample, though
that sample can be large. There may be an opportunity for off-heap
work there, though I don't know how we could do that before flush. I
could see the off-heap idea helping on merge.
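For intuition, here is a rough sketch of estimating quantization
bounds from a random sample of vectors instead of a full scan. This
is illustrative only: the method name, sample size, and quantile
logic are made up and Lucene's actual ScalarQuantizer sampling
differs in the details.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: estimate lower/upper clipping bounds for scalar
// quantization from a sample of vectors, not the whole set.
class SamplingSketch {
    // Returns {lowerBound, upperBound} covering `confidence` (e.g.
    // 0.9f) of the sampled component values.
    static float[] estimateBounds(float[][] vectors, int sampleSize,
                                  float confidence) {
        Random rnd = new Random(42); // fixed seed for repeatability
        int n = Math.min(sampleSize, vectors.length);
        int dim = vectors[0].length;
        float[] sample = new float[n * dim];
        int k = 0;
        // Sample vectors with replacement; copy their components.
        for (int i = 0; i < n; i++) {
            for (float x : vectors[rnd.nextInt(vectors.length)]) {
                sample[k++] = x;
            }
        }
        Arrays.sort(sample);
        // Clip (1 - confidence) / 2 of the mass from each tail.
        float cut = (1f - confidence) / 2f;
        int lo = (int) (cut * (sample.length - 1));
        int hi = (int) ((1f - cut) * (sample.length - 1));
        return new float[] {sample[lo], sample[hi]};
    }
}
```

Even so, the sampled components live on heap while the bounds are
computed, which is where an off-heap (or file-backed) variant could
help during merge.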

> Maybe we could do two passes - merge the float vectors while recalculating, 
> and then re-scan to do the actual quantization?

I am not sure what you mean here by "merge the float vectors". If you
mean simply reading the individual float vector files and combining
them into a single file, we already do that separately from
quantizing.

Thank you for digging into this. Glad others are experimenting!

Ben

On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov <msoko...@gmail.com> wrote:
>
> Hi folks. I've been experimenting with our new scalar quantization
> support - yay, thanks for adding it! I'm finding that when I index a
> large number of large vectors, enabling quantization (vs simply
> indexing the full-width floats) requires more heap - I keep getting
> OOMs and have to increase heap size. I took a heap dump, and not
> surprisingly I found some big arrays of floats and bytes, and the
> first one I traced was referenced by vector writers involved in a
> merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
> expected? I wonder if there is an opportunity to move some of this
> off-heap?  I can imagine that when we requantize we need to scan all
> the vectors to determine the new quantization settings?  Maybe we
> could do two passes - merge the float vectors while recalculating, and
> then re-scan to do the actual quantization?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
