Re: [Proposal] Remove max number of dimensions for KNN vectors

jim ferenczi Fri, 07 Apr 2023 14:28:50 -0700

The inference time (and cost) to generate these big vectors must be quite
large too ;).
Regarding the RAM buffer, we could drastically reduce its size by writing
the vectors to disk instead of keeping them on the heap. With 1k dimensions,
these vectors fill the RAM buffer quite rapidly.
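
As a rough back-of-the-envelope sketch of why the buffer fills so fast: the
snippet below counts how many raw float32 vectors fit in a given buffer. It
assumes 4 bytes per dimension, treats the "1994" buffer size quoted below as
megabytes, and ignores the HNSW graph and any other per-document overhead, so
the real numbers would be lower. The function name is made up for illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def vectors_per_buffer(dims: int, buffer_mb: int) -> int:
    """Upper bound on raw vectors that fit in a buffer of buffer_mb megabytes."""
    bytes_per_vector = dims * BYTES_PER_FLOAT32
    return (buffer_mb * 1024 * 1024) // bytes_per_vector


if __name__ == "__main__":
    for dims in (1024, 2048):
        for buffer_mb in (16, 1994):  # default size vs. the size used in the test
            count = vectors_per_buffer(dims, buffer_mb)
            print(f"{dims}d, {buffer_mb} MB buffer: at most {count} vectors")
```

Under these assumptions the default 16 MB buffer holds at most about 4k
vectors at 1024 dimensions before a flush is needed.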


On Fri, 7 Apr 2023 at 21:59, Robert Muir wrote:

> On Fri, Apr 7, 2023 at 7:47 AM Michael Sokolov wrote:
> >
> > 8M 1024d float vectors indexed in 1h48m (16G heap, IW buffer size=1994)
> > 4M 2048d float vectors indexed in 1h44m (w/ 4G heap, IW buffer size=1994)
> >
> > Robert, since you're the only on-the-record veto here, does this
> > change your thinking at all, or if not could you share some test
> > results that didn't go the way you expected? Maybe we can find some
> > mitigation if we focus on a specific issue.
> >
>
> My scale concerns are both space and time. What does the execution
> time look like if you don't set an insanely large IW RAM buffer? The
> default is 16MB. Just concerned we're shoving some problems under the
> rug :)
>
> Even with the yuge RAM buffer, we're still talking about almost 2 hours
> to index 4M documents with these 2k vectors, whereas with typical
> Lucene indexing you'd measure this in seconds; it's nothing.
