To be clear, Robert, I agree with you about not bumping it just to 2048 or
some other insufficiently motivated constant.

But I disagree on the performance perspective:
I am absolutely in favour of working to improve the current performance,
but I think that work is disconnected from this limit.

Not all users need billions of vectors, and maybe tomorrow a new chip will be
released that speeds up the processing 100x or whatever...

As far as I know, the limit is not used to initialise or optimise any data
structure; it's only used to raise an exception.
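
To make that concrete: conceptually the check is nothing more than a guard
clause that throws. Something like the sketch below (illustrative only --
the class and constant names are made up, this is not the actual Lucene
source):

// Illustrative sketch only -- not the real Lucene code. The point is that
// the cap is a plain guard clause, not a sizing hint for any data structure.
public final class VectorDimensionCheck {

  // Placeholder for whatever the configured maximum is.
  static final int MAX_DIMENSIONS = 1024;

  static void checkDimension(int dimension) {
    if (dimension > MAX_DIMENSIONS) {
      throw new IllegalArgumentException(
          "vector dimension " + dimension
              + " exceeds the maximum of " + MAX_DIMENSIONS);
    }
    // Nothing here allocates arrays, buffers or graphs based on MAX_DIMENSIONS.
  }
}

So raising or removing the number should not, by itself, change how any
internal structure is sized.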

I don't see a big problem in allowing 10k-dimensional vectors, for example,
even if the majority of people won't be able to use them because they are
slow on an average computer.
If we gain just one new user, that's better than zero.
Or well, if it's a reputation thing, then it's a completely different
discussion, I guess.
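
By the way, for anyone who wants to reproduce the kind of test Robert
suggests below (indexing a few million high-dimensional vectors), a
throwaway snippet along these lines is enough. It's only a sketch against
the 9.x-era API as I remember it -- the index path, doc count and field
name are placeholders, so adjust to your setup:

import java.nio.file.Paths;
import java.util.Random;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.store.FSDirectory;

public class VectorIndexingBench {
  public static void main(String[] args) throws Exception {
    int dims = 1024;          // dimension under test
    int numDocs = 2_000_000;  // "a few million"
    Random random = new Random(42);

    try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/vector-bench"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < numDocs; i++) {
        // Random data just to exercise indexing and merging at this dimension.
        float[] vector = new float[dims];
        for (int j = 0; j < dims; j++) {
          vector[j] = random.nextFloat();
        }
        Document doc = new Document();
        doc.add(new KnnVectorField("vec", vector, VectorSimilarityFunction.EUCLIDEAN));
        writer.addDocument(doc);
      }
      writer.forceMerge(1); // merging is usually where the pain shows up
    }
  }
}

Random vectors are of course not representative of real embeddings, but it
is enough to get a feel for indexing and merge times at a given dimension
on your own hardware.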


On Thu, 6 Apr 2023, 16:47 Robert Muir, <rcm...@gmail.com> wrote:

> Well, I'm asking people to actually try testing with such high dimensions.
> Based on my own experience, I consider it unusable. It seems other
> folks may have run into trouble too. If the project committers can't
> even really use vectors with such high dimension counts, then it's not
> in an OK state for users, and we shouldn't bump the limit.
>
> I'm happy to discuss/compromise etc, but simply bumping the limit
> without addressing the underlying usability/scalability is a real
> no-go: it is not really solving anything, nor is it giving users any
> freedom or allowing them to do something they couldn't do before.
> Because if it still doesn't work, it still doesn't work.
>
> We all need to be on the same page, grounded in reality, not fantasy:
> if we set a limit of 1024 or 2048, you should actually be able to index
> vectors with that many dimensions, and it should actually work and scale.
>
> On Thu, Apr 6, 2023 at 11:38 AM Alessandro Benedetti
> <a.benede...@sease.io> wrote:
> >
> > As I said earlier, a max limit limits usability.
> > It's not forcing users with small vectors to pay the performance penalty
> > of big vectors, it's literally preventing some users from using
> > Lucene/Solr/Elasticsearch at all.
> > As far as I know, the max limit is only used to raise an exception; it's
> > not used to initialise or optimise data structures (please correct me if
> > I'm wrong).
> >
> > Improving the algorithm performance is a separate discussion.
> > I don't see how the fact that indexing billions of vectors, whatever
> > their dimension, is slow correlates with a usability parameter.
> >
> > What about potential users who need only a few high-dimensional vectors?
> >
> > As I said before, I am a big +1 for NOT just raising it blindly, but I
> > believe we need to remove the limit or size it in a way that's not a
> > problem for either users or internal data structure optimizations, if any.
> >
> >
> > On Wed, 5 Apr 2023, 18:54 Robert Muir, <rcm...@gmail.com> wrote:
> >>
> >> I'd ask anyone voting +1 to raise this limit to at least try to index
> >> a few million vectors with 756 or 1024 dimensions, which is allowed today.
> >>
> >> IMO, based on how painful it is, the limit already seems too high.
> >> I realize that will sound controversial, but please at least try
> >> it out!
> >>
> >> voting +1 without at least doing this is really the
> >> "weak/unscientifically minded" approach.
> >>
> >> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner
> >> <michael.wech...@wyona.com> wrote:
> >> >
> >> > Thanks for your feedback!
> >> >
> >> > I agree that it should not crash.
> >> >
> >> > So far we have not experienced crashes ourselves, but we have not
> >> > indexed millions of vectors.
> >> >
> >> > I will try to reproduce the crash; maybe this will help us move forward.
> >> >
> >> > Thanks
> >> >
> >> > Michael
> >> >
> >> > Am 05.04.23 um 18:30 schrieb Dawid Weiss:
> >> > >> Can you describe your crash in more detail?
> >> > > I can't. That experiment was a while ago and a quick test to see if I
> >> > > could index rather large-ish USPTO (patent office) data as vectors.
> >> > > Couldn't do it then.
> >> > >
> >> > >> How much RAM?
> >> > > My indexing jobs run with rather smallish heaps to give space for I/O
> >> > > buffers. Think 4-8GB at most. So yes, it could have been the problem.
> >> > > I recall segment merging grew slower and slower and then simply
> >> > > crashed. Lucene should work with low heap requirements, even if it
> >> > > slows down. Throwing RAM at the indexing/segment merging problem
> >> > > is... I don't know - not elegant?
> >> > >
> >> > > Anyway. My main point was to remind folks about how Apache works -
> >> > > code is merged in when there are no vetoes. If Rob (or anybody else)
> >> > > remains unconvinced, he or she can block the change. (I didn't invent
> >> > > those rules).
> >> > >
> >> > > D.
> >> > >
