Re: [Proposal] Remove max number of dimensions for KNN vectors

Jack Conradson Wed, 05 Apr 2023 13:50:47 -0700

I don't want to get too far off topic, but I think one of the problems here
is that HNSW doesn't really fit well as a Lucene data structure. Given how
it behaves, it would be better supported as a live, in-memory data structure
than as tiny per-segment graphs that are written to disk and then need to
be merged. I wonder if a better approach would be to explore other
algorithms that are designed to be on-disk instead of in-memory, even if
they require k-means clustering as a trade-off. Maybe with an on-disk
algorithm we could have good enough performance for a higher-dimensional
limit.


On Wed, Apr 5, 2023 at 10:54 AM Robert Muir <rcm...@gmail.com> wrote:

> I'd ask anyone voting +1 to raise this limit to at least try to index
> a few million vectors with 756 or 1024, which is allowed today.
>
> IMO, based on how painful it is, the limit already seems too high. I
> realize that will sound controversial, but please at least try it
> out!
>
> voting +1 without at least doing this is really the
> "weak/unscientifically minded" approach.
>
> On Wed, Apr 5, 2023 at 12:52 PM Michael Wechner
> <michael.wech...@wyona.com> wrote:
> >
> > Thanks for your feedback!
> >
> > I agree that it should not crash.
> >
> > So far we did not experience crashes ourselves, but we did not index
> > millions of vectors.
> >
> > I will try to reproduce the crash, maybe this will help us to move
> > forward.
> >
> > Thanks
> >
> > Michael
> >
> > Am 05.04.23 um 18:30 schrieb Dawid Weiss:
> > >> Can you describe your crash in more detail?
> > > I can't. That experiment was a while ago and a quick test to see if I
> > > could index rather large-ish USPTO (patent office) data as vectors.
> > > Couldn't do it then.
> > >
> > >> How much RAM?
> > > My indexing jobs run with rather smallish heaps to give space for I/O
> > > buffers. Think 4-8GB at most. So yes, it could have been the problem.
> > > I recall segment merging grew slower and slower and then simply
> > > crashed. Lucene should work with low heap requirements, even if it
> > > slows down. Throwing RAM at the indexing/segment merging problem
> > > is... I don't know - not elegant?
> > >
> > > Anyway. My main point was to remind folks about how Apache works -
> > > code is merged in when there are no vetoes. If Rob (or anybody else)
> > > remains unconvinced, he or she can block the change. (I didn't invent
> > > those rules).
> > >
> > > D.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: dev-h...@lucene.apache.org
> > >
