As I said earlier, a max limit limits usability. It's not forcing users with small vectors to pay the performance penalty of big vectors; it's literally preventing some users from using Lucene/Solr/Elasticsearch at all. As far as I know, the max limit is only used to raise an exception; it's not used to initialise or optimise data structures (please correct me if I'm wrong).
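To make that concrete, here is a minimal sketch of where the limit bites. It assumes the Lucene 9.x API (KnnFloatVectorField and friends) and, as I understand it, a hard-coded cap of 1024; the exact point where validation fires is my assumption, so treat this as illustrative rather than authoritative:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class VectorDimLimitDemo {
  public static void main(String[] args) throws Exception {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      // 1536 dims, e.g. an OpenAI-style embedding -- above the assumed 1024 cap.
      float[] vector = new float[1536];
      Document doc = new Document();
      // On Lucene 9.x this is expected to throw IllegalArgumentException,
      // because 1536 exceeds the hard-coded maximum. The constant is pure
      // validation; no data structure is sized or tuned from it.
      doc.add(new KnnFloatVectorField("embedding", vector,
          VectorSimilarityFunction.COSINE));
      writer.addDocument(doc);
    }
  }
}

In other words, the user with a handful of 1536-dim vectors gets an exception up front, not a slow index.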
Improving the algorithm's performance is a separate discussion. I don't see how the fact that indexing billions of vectors, whatever their dimension, is slow relates to a usability parameter. What about potential users who need just a few high-dimensional vectors? As I said before, I am a big +1 for NOT just raising it blindly, but I believe we need to remove the limit or size it so that it's a problem neither for users nor for internal data structure optimizations, if any.

On Wed, 5 Apr 2023, 18:54 Robert Muir, <rcm...@gmail.com> wrote:

> I'd ask anyone voting +1 to raise this limit to at least try to index
> a few million vectors with 756 or 1024, which is allowed today.
>
> IMO based on how painful it is, it seems the limit is already too
> high. I realize that will sound controversial, but please at least try
> it out!
>
> Voting +1 without at least doing this is really the
> "weak/unscientifically minded" approach.
>
> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner
> <michael.wech...@wyona.com> wrote:
> >
> > Thanks for your feedback!
> >
> > I agree that it should not crash.
> >
> > So far we did not experience crashes ourselves, but we did not index
> > millions of vectors.
> >
> > I will try to reproduce the crash; maybe this will help us to move
> > forward.
> >
> > Thanks
> >
> > Michael
> >
> > On 05.04.23 at 18:30, Dawid Weiss wrote:
> > >> Can you describe your crash in more detail?
> > > I can't. That experiment was a while ago and a quick test to see if I
> > > could index rather large-ish USPTO (patent office) data as vectors.
> > > Couldn't do it then.
> > >
> > >> How much RAM?
> > > My indexing jobs run with rather smallish heaps to give space for I/O
> > > buffers. Think 4-8GB at most. So yes, it could have been the problem.
> > > I recall segment merging grew slower and slower and then simply
> > > crashed. Lucene should work with low heap requirements, even if it
> > > slows down. Throwing RAM at the indexing/segment-merging problem
> > > is... I don't know - not elegant?
> > >
> > > Anyway. My main point was to remind folks about how Apache works -
> > > code is merged in when there are no vetoes. If Rob (or anybody else)
> > > remains unconvinced, he or she can block the change. (I didn't invent
> > > those rules.)
> > >
> > > D.