What you said about increasing dimensions requiring a bigger RAM buffer on
merge is wrong. That's the point I was trying to make. Your concerns about
merge costs are not wrong, but your conclusion that we need to limit
dimensions is not justified.

You complain that HNSW sucks and doesn't scale, but when I show that it
scales linearly with dimension, you just ignore that and complain about
something else entirely.
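For concreteness, here is a back-of-envelope sketch of why per-vector memory grows linearly with dimension. The numbers are my own rough assumptions for illustration, not measured Lucene figures: I assume float vectors (4 bytes per dimension) and a graph fan-out of 16, which matches Lucene's default maxConn; the graph overhead per vector is a constant, so the total is dominated by the linear vector payload.

```java
// Rough per-vector memory estimate for an HNSW index (assumptions, not
// measured Lucene numbers): vector storage plus graph neighbor lists.
public class HnswMemorySketch {
    // Assumed graph fan-out; 16 matches Lucene's default maxConn.
    static final int MAX_CONN = 16;

    // Estimated bytes per vector: dims floats (4 bytes each) plus roughly
    // 2 * maxConn int neighbor ids (4 bytes each) on the base layer.
    static long bytesPerVector(int dims) {
        return 4L * dims + 4L * 2 * MAX_CONN;
    }

    public static void main(String[] args) {
        System.out.println("256 dims:  " + bytesPerVector(256) + " bytes/vector");
        System.out.println("1024 dims: " + bytesPerVector(1024) + " bytes/vector");
        // Quadrupling dims roughly quadruples the vector payload while the
        // graph overhead stays constant -- growth is linear in dims, not
        // quadratic.
    }
}
```

Under these assumptions, going from 256 to 1024 dimensions takes you from about 1.1 KB to about 4.2 KB per vector: a constant factor per added dimension, which is exactly what "linear in dimension" means for the RAM buffer.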

You demand that people run all kinds of tests to prove you wrong, but when
they do, you don't listen; and you won't put in the work yourself, or you
complain that it's too hard.

Then you complain about people not meeting you halfway. Wow

On Sat, Apr 8, 2023, 12:40 PM Robert Muir <rcm...@gmail.com> wrote:

> On Sat, Apr 8, 2023 at 8:33 AM Michael Wechner
> <michael.wech...@wyona.com> wrote:
> >
> > What exactly do you consider reasonable?
>
> Let's begin a real discussion by being HONEST about the current
> status. Please put political correctness and your own company's wishes
> aside; we know it's not in a good state.
>
> Current status is that the one guy who wrote the code can set a
> multi-gigabyte RAM buffer and index a small dataset with 1024
> dimensions in HOURS (I didn't ask what hardware).
>
> My concern is everyone else except the one guy; I want it to be
> usable. Increasing dimensions just means an even bigger multi-gigabyte
> RAM buffer and a bigger heap to avoid OOM on merge.
> It is also a permanent backwards-compatibility decision: we have to
> support it once we do this, and we can't just say "oops" and flip it
> back.
>
> It is unclear to me whether the multi-gigabyte RAM buffer is really to
> avoid merges because they are so slow and it would be DAYS otherwise,
> or if it's to avoid merges so it doesn't hit OOM.
> Also from personal experience, it takes trial and error (meaning
> experiencing OOM on merge!!!) before you get those heap values correct
> for your dataset. This usually means starting over, which is
> frustrating and wastes more time.
>
> Jim mentioned some ideas about the memory usage in IndexWriter; seems
> to me like it's a good idea. Maybe the multi-gigabyte RAM buffer can be
> avoided in this way and performance improved by writing bigger
> segments with Lucene's defaults. But this doesn't mean we can simply
> ignore the horrors of what happens on merge. Merging needs to scale so
> that indexing really scales.
>
> At least it shouldn't spike RAM on trivial data amounts and cause OOM,
> and it definitely shouldn't burn hours and hours of CPU in O(n^2)
> fashion when indexing.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>
