Re: scalar quantization heap usage during merge

2024-07-04 Thread Gautam Worah
Thanks for the PR, Ben. I'll try to take a look in the next couple of days;
I'm on leave for now.

I got the setup working yesterday and thought I'd share some learnings.
I changed LiveIndexWriterConfig#ramBufferSizeMB to 2048 and that made
things work.
I was even able to keep merging enabled, and could create ~10 GB segments
and a big index.
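
Concretely, the change was just this (standard IndexWriterConfig API; `analyzer`
and `directory` stand in for whatever writer setup you already have):

    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    // was 4096; 2048 keeps the flushed segments small enough that flushing one doesn't blow the heap
    iwc.setRAMBufferSizeMB(2048);
    // the per-thread hard cap stays at IndexWriterConfig.DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB (1945)
    IndexWriter writer = new IndexWriter(directory, iwc);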

Here is what I think is happening:
With larger segments, i.e. when ramBufferSizeMB was 4096 (and the per-thread
hard cap DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB at 1945), Lucene flushes the
segment with the biggest RAM usage once the combined RAM usage across writer
threads exceeds 4096 MB (in FlushByRamOrCountsPolicy#onChange). Flushing that
big segment was enough to cause OOMs, probably because the flush itself is
very memory intensive (loading copies of big arrays before writing, etc.)
while incoming request processing is also taking heap.
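
My paraphrase of the trigger, as a pseudo-Java sketch (the method names are
approximate, not the actual FlushByRamOrCountsPolicy source):

    long limitBytes = (long) (config.getRAMBufferSizeMB() * 1024 * 1024);   // 4096 MB in my runs
    long usedBytes = control.activeBytes() + control.getDeleteBytesUsed();  // summed across writer threads
    if (usedBytes >= limitBytes) {
      // flush whichever writer thread is buffering the most; with a 4 GB buffer and a 1945 MB
      // per-thread hard cap, that one segment can be close to 2 GB of buffered state
      markLargestWriterPending(control, state);
    }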

When I changed the ramBufferSizeMB to 2048, the segments my code was
quantizing over were small and the flushes were frequent.
Now, when merges happen, IIUC, we don't load these big arrays; instead we
just recompute the quantized vectors (if needed) in
mergeQuantizedByteVectorValues.

That operation is probably easier on memory (it takes the quantiles of the
segments being merged as input and recomputes the new quantiles from them),
even for big segments, and does not cause OOMs. That is how I avoided OOMs
even when merged segments were ~9.9 GB in size.
Heap size could probably be reduced further by testing out smaller
ramBufferSizeMB values.
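
My mental model of why that merge path is light on heap, as a sketch
(SegmentQuantiles and its fields are made up; the real logic lives in the
scalar-quantized vectors writer around mergeQuantizedByteVectorValues):

    // Doc-count-weighted recombination of per-segment quantiles -- no raw float[] vectors needed on heap.
    // SegmentQuantiles is a hypothetical holder for (lowerQuantile, upperQuantile, vectorCount).
    float lowerSum = 0, upperSum = 0;
    long total = 0;
    for (SegmentQuantiles s : perSegmentQuantiles) {
      lowerSum += s.lowerQuantile * s.vectorCount;
      upperSum += s.upperQuantile * s.vectorCount;
      total += s.vectorCount;
    }
    float mergedLower = lowerSum / total;  // quantiles for the merged segment; individual vectors
    float mergedUpper = upperSum / total;  // only get re-quantized if these drift far enough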

This is a rough edge though: having to tune a value until things work is not
the best experience we can provide for our users.
I'll open spinoff issues if I come up with ideas for improvements.

Thanks again for working on this awesome piece of functionality :)

Best,
Gautam Worah.


On Wed, Jul 3, 2024 at 8:04 AM Benjamin Trent  wrote:

> Hey Gautam & Michael,
>
> I opened a PR that will help slightly. It should reduce the heap usage
> by a smallish factor. But, I would still expect the cost to be
> dominated by the `float[]` vectors held in memory before flush.
>
> https://github.com/apache/lucene/pull/13538
>
> The other main overhead is the creation of the ScalarQuantizer. Since
> it requires sorting floating point arrays, I am not 100% sure how we
> can get around the cost. I think we will always need to copy the
> arrays to get the quantiles via `FloatSelector`.
>
> Since the ScalarQuantizer needs to make copies & we can determine what
> the size of those copies are, we should be able to give a better
> estimate of all the memory required for flush (right now the cost of
> building the ScalarQuantizer, because its memory usage is short lived,
> isn't provided).
>
> I can open a different PR for that bug fix.
>
> On Wed, Jul 3, 2024 at 12:43 AM Gautam Worah 
> wrote:
> >
> > Hi Ben,
> >
> > I am working on something very close to what Michael Sokolov has done.
> > I see OOMs on the Writer when it tries to index 130M 8 bit / 4 bit
> quantized vectors on a single big box with a 40 GB heap, with HNSW disabled.
> > I've tried indexing all the vectors as plain vectors converted to floats
> converted to BinaryDocValues and that worked fine.
> > I tried smaller heap sizes starting with 20 GB but they all failed. 40
> GB heap is already quite a bit and hence the deep dive..
> > The Writer process is not doing any other RAM heavy things so I am
> assuming that the memory is dominated by vectors.
> > The vectors originally had 768 dimensions.
> >
> > My process was initially failing when it reached an index size of about
> ~40 GB. OOM stack failures were close to when merges were happening.
> > I tried reducing the number of concurrent merges that were allowed, the
> number of segments that can be merged at once, and that helped, but only a
> little. I was still seeing OOMs.
> > Then, I adopted a NoMergePolicy and was able to build a ~240 GB
> quantized index, but that too, OOMs out before indexing all the docs.
> > The ramBufferSizeMB is 4096, so roughly speaking it has 2.5 GB ish
> segments, and multiple smaller ones per flush.
> >
> > I am assuming the quantization on flush is causing the failures. Which
> operation during flush is taking up so much memory? I don't know ..
> > I don't think the quantization factor (bits) affects the memory much.
> >
> > Do we do the quantile calculation, eventual quantization in a streaming
> fashion?
> > Are there any other things that jump out to you as memory
> bottlenecks/methods that you think would be memory hungry?
> >
> > I have the heap dump and am also analyzing it myself.
> >
> > >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
> > of it (does this no matter what)
> >
> > This maybe could've caused the problems.. but if the ramBufferSizeMB is
> small and merges are disabled, it's hard to imagine how 40 GB could've been
> consumed.
> >
> > Best,
> > Gautam Worah.
> >
> >
> > On Wed, Jun 12, 2024 at 9:42 AM Benjamin Trent 
> wrote:
> 

Re: scalar quantization heap usage during merge

2024-07-03 Thread Benjamin Trent
Hey Gautam & Michael,

I opened a PR that will help slightly. It should reduce the heap usage
by a smallish factor. But, I would still expect the cost to be
dominated by the `float[]` vectors held in memory before flush.

https://github.com/apache/lucene/pull/13538

The other main overhead is the creation of the ScalarQuantizer. Since
it requires sorting floating point arrays, I am not 100% sure how we
can get around the cost. I think we will always need to copy the
arrays to get the quantiles via `FloatSelector`.
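
To make that copy concrete, a simplified sketch of the step (the real code
selects rather than fully sorts, and `sampled` stands in for the flattened
sample of vector values):

    float[] copy = java.util.Arrays.copyOf(sampled, sampled.length);  // the unavoidable extra allocation
    java.util.Arrays.sort(copy);                                      // selection avoids the full sort; the copy it can't
    double ci = 0.99;                                                 // e.g. a 0.99 confidence interval
    float lowerQuantile = copy[(int) Math.floor((1 - ci) / 2 * (copy.length - 1))];
    float upperQuantile = copy[(int) Math.ceil((1 + ci) / 2 * (copy.length - 1))];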

Since the ScalarQuantizer needs to make copies, and we can determine how
large those copies are, we should be able to give a better estimate of all
the memory required for flush (right now the cost of building the
ScalarQuantizer isn't included in that estimate, because its memory usage
is short-lived).
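
Back-of-the-envelope for what that estimate could include (variable names
here are illustrative, not actual writer fields):

    long bufferedVectorBytes = (long) numBufferedVectors * dims * Float.BYTES;  // float[] copies held until flush
    long quantizerScratch    = (long) sampledVectors * dims * Float.BYTES;      // short-lived copy made to compute quantiles
    long flushHeapEstimate   = bufferedVectorBytes + quantizerScratch;          // what the accounting could report up front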

I can open a different PR for that bug fix.

On Wed, Jul 3, 2024 at 12:43 AM Gautam Worah  wrote:
>
> Hi Ben,
>
> I am working on something very close to what Michael Sokolov has done.
> I see OOMs on the Writer when it tries to index 130M 8 bit / 4 bit quantized 
> vectors on a single big box with a 40 GB heap, with HNSW disabled.
> I've tried indexing all the vectors as plain vectors converted to floats 
> converted to BinaryDocValues and that worked fine.
> I tried smaller heap sizes starting with 20 GB but they all failed. 40 GB 
> heap is already quite a bit and hence the deep dive..
> The Writer process is not doing any other RAM heavy things so I am assuming 
> that the memory is dominated by vectors.
> The vectors originally had 768 dimensions.
>
> My process was initially failing when it reached an index size of about ~40 
> GB. OOM stack failures were close to when merges were happening.
> I tried reducing the number of concurrent merges that were allowed, the 
> number of segments that can be merged at once, and that helped, but only a 
> little. I was still seeing OOMs.
> Then, I adopted a NoMergePolicy and was able to build a ~240 GB quantized 
> index, but that too, OOMs out before indexing all the docs.
> The ramBufferSizeMB is 4096, so roughly speaking it has 2.5 GB ish segments, 
> and multiple smaller ones per flush.
>
> I am assuming the quantization on flush is causing the failures. Which 
> operation during flush is taking up so much memory? I don't know ..
> I don't think the quantization factor (bits) affects the memory much.
>
> Do we do the quantile calculation, eventual quantization in a streaming 
> fashion?
> Are there any other things that jump out to you as memory bottlenecks/methods 
> that you think would be memory hungry?
>
> I have the heap dump and am also analyzing it myself.
>
> >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
> of it (does this no matter what)
>
> This maybe could've caused the problems.. but if the ramBufferSizeMB is small 
> and merges are disabled, it's hard to imagine how 40 GB could've been 
> consumed.
>
> Best,
> Gautam Worah.
>
>
> On Wed, Jun 12, 2024 at 9:42 AM Benjamin Trent  wrote:
>>
>> Michael,
>>
>> Empirically, I am not surprised there is an increase in heap usage. We
>> do have extra overhead with the scalar quantization on flush. There
>> may also be some additional heap usage on merge.
>>
>> I just don't think it is via: Lucene99FlatVectorsWriter
>>
>> On Wed, Jun 12, 2024 at 11:55 AM Michael Sokolov  wrote:
>> >
>> >  Empirically I thought I saw the need to increase JVM heap with this,
>> > but let me do some more testing to narrow down what is going on. It's
>> > possible the same heap requirements exist for the non-quantized case
>> > and I am just seeing some random vagary of the merge process happening
>> > to tip over a limit. It's also possible I messed something up in
>> > https://github.com/apache/lucene/pull/13469 which I am trying to use
>> > in order to index quantized vectors without building an HNSW graph.
>> >
>> > On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent  
>> > wrote:
>> > >
>> > > Heya Michael,
>> > >
>> > > > the first one I traced was referenced by vector writers involved in a 
>> > > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this 
>> > > > expected?
>> > >
>> > > Yes, that is holding the raw floats before flush. You should see
>> > > nearly the exact same overhead there as you would indexing raw
>> > > vectors. I would be surprised if there is a significant memory usage
>> > > difference due to Lucene99FlatVectorsWriter when using quantized vs.
>> > > not.
>> > >
>> > > The flow is this:
>> > >
>> > >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
>> > > of it (does this no matter what) and passes on to the next part of the
>> > > chain
>> > >  - If quantizing, the next part of the chain is
>> > > Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
>> > > REFERENCE to the array, it doesn't copy it. The float vector array is
>> > > then passed to the HNSW indexer (if its being used), which also does
> >> > > NOT copy, but keeps a reference.

Re: scalar quantization heap usage during merge

2024-07-02 Thread Gautam Worah
Hi Ben,

I am working on something very close to what Michael Sokolov has done.
I see OOMs on the Writer when it tries to index 130M 8 bit / 4 bit
quantized vectors on a single big box with a 40 GB heap, with HNSW disabled.
I've tried indexing all the vectors as plain vectors (floats converted to
BinaryDocValues) and that worked fine.
I tried smaller heap sizes starting with 20 GB but they all failed. A 40 GB
heap is already quite a bit, hence the deep dive.
The Writer process is not doing any other RAM-heavy things, so I am assuming
that the memory is dominated by vectors.
The vectors originally had 768 dimensions.

My process was initially failing when it reached an index size of about 40
GB. The OOM stack traces were from around the times merges were happening.
I tried reducing the number of concurrent merges allowed and the number of
segments that can be merged at once; that helped, but only a little. I was
still seeing OOMs.
Then I adopted a NoMergePolicy and was able to build a ~240 GB quantized
index, but that too OOMed before indexing all the docs.
The ramBufferSizeMB is 4096, so roughly speaking it produces ~2.5 GB
segments, plus multiple smaller ones per flush.
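
For scale, my rough arithmetic (assuming 4 bytes per float and ignoring
object and codec overhead):

    long perVector    = 768L * Float.BYTES;        // 3,072 bytes, ~3 KB per raw vector
    long perBuffer    = 4096L * 1024 * 1024;       // ramBufferSizeMB = 4096
    long perFlush     = perBuffer / perVector;     // ~1.4M vectors buffered before a flush
    long totalRawData = 130_000_000L * perVector;  // ~400 GB of raw floats over the whole run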

I am assuming the quantization on flush is causing the failures, but which
operation during flush is taking up so much memory? I don't know.
I don't think the quantization factor (bits) affects the memory much.

Do we do the quantile calculation and eventual quantization in a streaming
fashion?
Are there any other things that jump out to you as memory
bottlenecks/methods that you think would be memory hungry?

I have the heap dump and am also analyzing it myself.

>  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
of it (does this no matter what)

This may have caused the problems, but if the ramBufferSizeMB is
small and merges are disabled, it's hard to imagine how 40 GB could have
been consumed.

Best,
Gautam Worah.


On Wed, Jun 12, 2024 at 9:42 AM Benjamin Trent 
wrote:

> Michael,
>
> Empirically, I am not surprised there is an increase in heap usage. We
> do have extra overhead with the scalar quantization on flush. There
> may also be some additional heap usage on merge.
>
> I just don't think it is via: Lucene99FlatVectorsWriter
>
> On Wed, Jun 12, 2024 at 11:55 AM Michael Sokolov 
> wrote:
> >
> >  Empirically I thought I saw the need to increase JVM heap with this,
> > but let me do some more testing to narrow down what is going on. It's
> > possible the same heap requirements exist for the non-quantized case
> > and I am just seeing some random vagary of the merge process happening
> > to tip over a limit. It's also possible I messed something up in
> > https://github.com/apache/lucene/pull/13469 which I am trying to use
> > in order to index quantized vectors without building an HNSW graph.
> >
> > On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent 
> wrote:
> > >
> > > Heya Michael,
> > >
> > > > the first one I traced was referenced by vector writers involved in
> a merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?
> > >
> > > Yes, that is holding the raw floats before flush. You should see
> > > nearly the exact same overhead there as you would indexing raw
> > > vectors. I would be surprised if there is a significant memory usage
> > > difference due to Lucene99FlatVectorsWriter when using quantized vs.
> > > not.
> > >
> > > The flow is this:
> > >
> > >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
> > > of it (does this no matter what) and passes on to the next part of the
> > > chain
> > >  - If quantizing, the next part of the chain is
> > > Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
> > > REFERENCE to the array, it doesn't copy it. The float vector array is
> > > then passed to the HNSW indexer (if its being used), which also does
> > > NOT copy, but keeps a reference.
> > >  - If not quantizing but indexing, Lucene99FlatVectorsWriter will pass
> > > it directly to the hnsw indexer, which does not copy it, but does add
> > > it to the HNSW graph
> > >
> > > > I wonder if there is an opportunity to move some of this off-heap?
> > >
> > > I think we could do some things off-heap in the ScalarQuantizer. Maybe
> > > even during "flush", but we would have to adjust the interfaces some
> > > so that the scalarquantizer can know where the vectors are being
> > > stored after the initial flush. Right now there is no way to know the
> > > file nor file handle.
> > >
> > > > I can imagine that when we requantize we need to scan all the
> vectors to determine the new quantization settings?
> > >
> > > We shouldn't be scanning every vector. We do take a sampling, though
> > > that sampling can be large. There is here an opportunity for off-heap
> > > action if possible. Though I don't know how we could do that before
> > > flush. I could see the off-heap idea helping on merge.
> > >
> > > > Maybe we could do two passes - merge the float vectors while
> > > > recalculating, and then re-scan to do the actual quantization?

Re: scalar quantization heap usage during merge

2024-06-12 Thread Benjamin Trent
Michael,

Empirically, I am not surprised there is an increase in heap usage. We
do have extra overhead with the scalar quantization on flush. There
may also be some additional heap usage on merge.

I just don't think it is via: Lucene99FlatVectorsWriter

On Wed, Jun 12, 2024 at 11:55 AM Michael Sokolov  wrote:
>
>  Empirically I thought I saw the need to increase JVM heap with this,
> but let me do some more testing to narrow down what is going on. It's
> possible the same heap requirements exist for the non-quantized case
> and I am just seeing some random vagary of the merge process happening
> to tip over a limit. It's also possible I messed something up in
> https://github.com/apache/lucene/pull/13469 which I am trying to use
> in order to index quantized vectors without building an HNSW graph.
>
> On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent  wrote:
> >
> > Heya Michael,
> >
> > > the first one I traced was referenced by vector writers involved in a 
> > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?
> >
> > Yes, that is holding the raw floats before flush. You should see
> > nearly the exact same overhead there as you would indexing raw
> > vectors. I would be surprised if there is a significant memory usage
> > difference due to Lucene99FlatVectorsWriter when using quantized vs.
> > not.
> >
> > The flow is this:
> >
> >  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
> > of it (does this no matter what) and passes on to the next part of the
> > chain
> >  - If quantizing, the next part of the chain is
> > Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
> > REFERENCE to the array, it doesn't copy it. The float vector array is
> > then passed to the HNSW indexer (if its being used), which also does
> > NOT copy, but keeps a reference.
> >  - If not quantizing but indexing, Lucene99FlatVectorsWriter will pass
> > it directly to the hnsw indexer, which does not copy it, but does add
> > it to the HNSW graph
> >
> > > I wonder if there is an opportunity to move some of this off-heap?
> >
> > I think we could do some things off-heap in the ScalarQuantizer. Maybe
> > even during "flush", but we would have to adjust the interfaces some
> > so that the scalarquantizer can know where the vectors are being
> > stored after the initial flush. Right now there is no way to know the
> > file nor file handle.
> >
> > > I can imagine that when we requantize we need to scan all the vectors to 
> > > determine the new quantization settings?
> >
> > We shouldn't be scanning every vector. We do take a sampling, though
> > that sampling can be large. There is here an opportunity for off-heap
> > action if possible. Though I don't know how we could do that before
> > flush. I could see the off-heap idea helping on merge.
> >
> > > Maybe we could do two passes - merge the float vectors while 
> > > recalculating, and then re-scan to do the actual quantization?
> >
> > I am not sure what you mean here by "merge the float vectors". If you
> > mean simply reading the individual float vector files and combining
> > them into a single file, we already do that separately from
> > quantizing.
> >
> > Thank you for digging into this. Glad others are experimenting!
> >
> > Ben
> >
> > On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov  wrote:
> > >
> > > Hi folks. I've been experimenting with our new scalar quantization
> > > support - yay, thanks for adding it! I'm finding that when I index a
> > > large number of large vectors, enabling quantization (vs simply
> > > indexing the full-width floats) requires more heap - I keep getting
> > > OOMs and have to increase heap size. I took a heap dump, and not
> > > surprisingly I found some big arrays of floats and bytes, and the
> > > first one I traced was referenced by vector writers involved in a
> > > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
> > > expected? I wonder if there is an opportunity to move some of this
> > > off-heap?  I can imagine that when we requantize we need to scan all
> > > the vectors to determine the new quantization settings?  Maybe we
> > > could do two passes - merge the float vectors while recalculating, and
> > > then re-scan to do the actual quantization?

Re: scalar quantization heap usage during merge

2024-06-12 Thread Michael Sokolov
 Empirically I thought I saw the need to increase JVM heap with this,
but let me do some more testing to narrow down what is going on. It's
possible the same heap requirements exist for the non-quantized case
and I am just seeing some random vagary of the merge process happening
to tip over a limit. It's also possible I messed something up in
https://github.com/apache/lucene/pull/13469 which I am trying to use
in order to index quantized vectors without building an HNSW graph.

On Wed, Jun 12, 2024 at 10:24 AM Benjamin Trent  wrote:
>
> Heya Michael,
>
> > the first one I traced was referenced by vector writers involved in a merge 
> > (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?
>
> Yes, that is holding the raw floats before flush. You should see
> nearly the exact same overhead there as you would indexing raw
> vectors. I would be surprised if there is a significant memory usage
> difference due to Lucene99FlatVectorsWriter when using quantized vs.
> not.
>
> The flow is this:
>
>  - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
> of it (does this no matter what) and passes on to the next part of the
> chain
>  - If quantizing, the next part of the chain is
> Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
> REFERENCE to the array, it doesn't copy it. The float vector array is
> then passed to the HNSW indexer (if its being used), which also does
> NOT copy, but keeps a reference.
>  - If not quantizing but indexing, Lucene99FlatVectorsWriter will pass
> it directly to the hnsw indexer, which does not copy it, but does add
> it to the HNSW graph
>
> > I wonder if there is an opportunity to move some of this off-heap?
>
> I think we could do some things off-heap in the ScalarQuantizer. Maybe
> even during "flush", but we would have to adjust the interfaces some
> so that the scalarquantizer can know where the vectors are being
> stored after the initial flush. Right now there is no way to know the
> file nor file handle.
>
> > I can imagine that when we requantize we need to scan all the vectors to 
> > determine the new quantization settings?
>
> We shouldn't be scanning every vector. We do take a sampling, though
> that sampling can be large. There is here an opportunity for off-heap
> action if possible. Though I don't know how we could do that before
> flush. I could see the off-heap idea helping on merge.
>
> > Maybe we could do two passes - merge the float vectors while recalculating, 
> > and then re-scan to do the actual quantization?
>
> I am not sure what you mean here by "merge the float vectors". If you
> mean simply reading the individual float vector files and combining
> them into a single file, we already do that separately from
> quantizing.
>
> Thank you for digging into this. Glad others are experimenting!
>
> Ben
>
> On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov  wrote:
> >
> > Hi folks. I've been experimenting with our new scalar quantization
> > support - yay, thanks for adding it! I'm finding that when I index a
> > large number of large vectors, enabling quantization (vs simply
> > indexing the full-width floats) requires more heap - I keep getting
> > OOMs and have to increase heap size. I took a heap dump, and not
> > surprisingly I found some big arrays of floats and bytes, and the
> > first one I traced was referenced by vector writers involved in a
> > merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
> > expected? I wonder if there is an opportunity to move some of this
> > off-heap?  I can imagine that when we requantize we need to scan all
> > the vectors to determine the new quantization settings?  Maybe we
> > could do two passes - merge the float vectors while recalculating, and
> > then re-scan to do the actual quantization?



Re: scalar quantization heap usage during merge

2024-06-12 Thread Benjamin Trent
Heya Michael,

> the first one I traced was referenced by vector writers involved in a merge 
> (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this expected?

Yes, that is holding the raw floats before flush. You should see
nearly the exact same overhead there as you would indexing raw
vectors. I would be surprised if there is a significant memory usage
difference due to Lucene99FlatVectorsWriter when using quantized vs.
not.

The flow is this:

 - Lucene99FlatVectorsWriter gets the float[] vector and makes a copy
of it (it does this no matter what) and passes it on to the next part of the
chain
 - If quantizing, the next part of the chain is
Lucene99ScalarQuantizedVectorsWriter.FieldsWriter, which only keeps a
REFERENCE to the array; it doesn't copy it. The float vector array is
then passed to the HNSW indexer (if it's being used), which also does
NOT copy it, but keeps a reference.
 - If not quantizing but indexing, Lucene99FlatVectorsWriter will pass
it directly to the HNSW indexer, which does not copy it, but does add
it to the HNSW graph
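
In code terms, that copy/reference behavior looks roughly like this
(hypothetical class and field names, just to make the point concrete; these
are not the actual writer classes):

    import java.util.ArrayList;
    import java.util.List;

    class FlatFieldWriterSketch {
      final List<float[]> buffered = new ArrayList<>();  // these copies are what the heap dump shows before flush
      final QuantizedFieldWriterSketch next;             // stand-in for the quantized writer / HNSW indexer

      FlatFieldWriterSketch(QuantizedFieldWriterSketch next) { this.next = next; }

      void addValue(float[] vector) {
        float[] copy = vector.clone();  // the one real copy, made whether or not we quantize
        buffered.add(copy);
        next.addValue(copy);            // downstream writers keep a reference only
      }
    }

    class QuantizedFieldWriterSketch {
      final List<float[]> refs = new ArrayList<>();      // references, not second copies
      void addValue(float[] vector) { refs.add(vector); }
    }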

> I wonder if there is an opportunity to move some of this off-heap?

I think we could do some things off-heap in the ScalarQuantizer, maybe
even during "flush", but we would have to adjust the interfaces somewhat
so that the ScalarQuantizer can know where the vectors are being
stored after the initial flush. Right now there is no way to know the
file or the file handle.
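
The kind of thing I have in mind, as a hedged sketch assuming we did expose
the flushed file to the quantizer (the file name and offset variables do not
exist today; seek/readFloats are the plain Directory/IndexInput calls):

    try (IndexInput in = directory.openInput(flatVectorDataFileName, IOContext.READONCE)) {
      in.seek(vectorDataStartOffset);      // hypothetical: today the quantizer can't know this
      float[] scratch = new float[dims];   // only one vector on heap at a time
      for (int ord = 0; ord < numVectors; ord++) {
        in.readFloats(scratch, 0, dims);   // stream a vector off disk
        // feed `scratch` into the sampling / quantile computation instead of a big in-heap buffer
      }
    }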

> I can imagine that when we requantize we need to scan all the vectors to 
> determine the new quantization settings?

We shouldn't be scanning every vector. We do take a sample, though
that sample can be large. There is an opportunity for off-heap action
here, though I don't know how we could do that before flush. I could
see the off-heap idea helping on merge.
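
As a sketch of the sampling (the 25,000 cap and the readVector() accessor are
stand-ins, not the actual constants or APIs):

    int sampleSize = Math.min(totalVectors, 25_000);   // assumed cap, not the actual constant
    java.util.Random random = new java.util.Random(42);
    float[] sampled = new float[sampleSize * dims];
    for (int i = 0; i < sampleSize; i++) {
      int ord = random.nextInt(totalVectors);                          // random vector ordinal
      System.arraycopy(readVector(ord), 0, sampled, i * dims, dims);   // readVector() is a stand-in accessor
    }
    // quantiles are then selected from `sampled` (copy + select at the confidence-interval indices)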

> Maybe we could do two passes - merge the float vectors while recalculating, 
> and then re-scan to do the actual quantization?

I am not sure what you mean here by "merge the float vectors". If you
mean simply reading the individual float vector files and combining
them into a single file, we already do that separately from
quantizing.

Thank you for digging into this. Glad others are experimenting!

Ben

On Wed, Jun 12, 2024 at 8:57 AM Michael Sokolov  wrote:
>
> Hi folks. I've been experimenting with our new scalar quantization
> support - yay, thanks for adding it! I'm finding that when I index a
> large number of large vectors, enabling quantization (vs simply
> indexing the full-width floats) requires more heap - I keep getting
> OOMs and have to increase heap size. I took a heap dump, and not
> surprisingly I found some big arrays of floats and bytes, and the
> first one I traced was referenced by vector writers involved in a
> merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
> expected? I wonder if there is an opportunity to move some of this
> off-heap?  I can imagine that when we requantize we need to scan all
> the vectors to determine the new quantization settings?  Maybe we
> could do two passes - merge the float vectors while recalculating, and
> then re-scan to do the actual quantization?



scalar quantization heap usage during merge

2024-06-12 Thread Michael Sokolov
Hi folks. I've been experimenting with our new scalar quantization
support - yay, thanks for adding it! I'm finding that when I index a
large number of large vectors, enabling quantization (vs simply
indexing the full-width floats) requires more heap - I keep getting
OOMs and have to increase heap size. I took a heap dump, and not
surprisingly I found some big arrays of floats and bytes, and the
first one I traced was referenced by vector writers involved in a
merge (Lucene99FlatVectorsWriter.FieldsWriter.vectors). Is this
expected? I wonder if there is an opportunity to move some of this
off-heap?  I can imagine that when we requantize we need to scan all
the vectors to determine the new quantization settings?  Maybe we
could do two passes - merge the float vectors while recalculating, and
then re-scan to do the actual quantization?
