Sure, I'm adding HNSW support to Cassandra. (Lots more detail on the dev@cassandra list.)

HnswGraph says "The graph may be searched by multiple threads
concurrently," but OnHeapHnswGraph has a field cur that gets modified by
seek, which is called by Searcher. Bug, or outdated comment?

On Thu, Apr 20, 2023 at 1:45 PM Michael Sokolov <msoko...@gmail.com> wrote:

> Right RAVectorValues is just fronting an array of vectors and it
> doesn't have any intermediate storage or other state (like a file
> pointer) so it can support many simultaneous callers. Other
> implementations of the interface work differently; see
> OffHeapByteVectorValues, which is representing vectors in the index
> and implemented using I/O calls.
>
> If you shared some context about your interest here, we might be able
> to help you better.
>
> On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis <jbel...@gmail.com> wrote:
> >
> > It looks like I misunderstood how the Builder works, and the RAVV
> > provided to the constructor does not need to contain any values up
> > front. Specifically, Lucene95HnswVectorsWriter.FieldWriter adds
> > vectors incrementally to the RAVV that it gives to the builder as
> > addValue is called.
> >
> > On Wed, Apr 19, 2023 at 1:37 PM Michael Sokolov <msoko...@gmail.com> wrote:
> >>
> >> That class is intended for use by the Lucene index writer - it's not
> >> designed as a general purpose class for re-use outside that context.
> >> And IndexWriter writes documents to disk in bulk.
> >>
> >> On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >
> >> > Thanks, Michael!
> >> >
> >> > Looking at the paper by Malkov and Yashunin, it looks like the
> >> > algorithm allows for building the hnsw graph incrementally. Why
> >> > does our implementation require specifying all the vectors up
> >> > front to HnswGraphBuilder.create?
> >> >
> >> > On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov <msoko...@gmail.com> wrote:
> >> >>
> >> >> These vector values have internal buffers they use to return the
> >> >> vectors. In order to compare two vectors we need to use two
> >> >> independent sources so that one doesn't overwrite this internal
> >> >> state when fetching the second vector.
> >> >>
> >> >> Sorry I forgot the second question and can't see it on my phone. Brb
> >> >>
> >> >> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>>
> >> >>> HI all, a couple questions on how HNSW works:
> >> >>>
> >> >>> 1. What is driving the requirement for two copies of the input
> >> >>> vectors? It looks like the RAVV implementations do shallow
> >> >>> copies, so the vector from A is the same that would be returned
> >> >>> by B. What am I missing?
> >> >>>
> >> >>> 2. What is the intended behavior when adding identical vectors
> >> >>> to a HNSW? It looks like when I supply 10 identical vectors,
> >> >>> they all get added to the graph, but when I search for the
> >> >>> nearest neighbors, I only get one of them in the result set.
> >> >>>
> >> >>> --
> >> >>> Jonathan Ellis
> >> >>> co-founder, http://www.datastax.com
> >> >>> @spyced
> >> >
> >> > --
> >> > Jonathan Ellis
> >> > co-founder, http://www.datastax.com
> >> > @spyced
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
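
P.S. For anyone following the quoted point about internal buffers, here is a minimal sketch of the failure mode as I understand it. The class and method names (BufferedVectors, vectorValue, copy, dot) are hypothetical and for illustration only; this is not Lucene's code, just an assumed model of a vector source that reuses one scratch buffer per instance.

```java
public class SharedBufferDemo {
    /** Hypothetical vector source that reuses one scratch buffer for every read. */
    static final class BufferedVectors {
        private final float[][] stored;
        private final float[] scratch;

        BufferedVectors(float[][] stored) {
            this.stored = stored;
            this.scratch = new float[stored[0].length];
        }

        /** Copies vector ord into the shared scratch buffer and returns that buffer. */
        float[] vectorValue(int ord) {
            System.arraycopy(stored[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }

        /** Independent view over the same data, with its own scratch buffer. */
        BufferedVectors copy() {
            return new BufferedVectors(stored);
        }
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[][] data = {{1f, 0f}, {0f, 1f}};
        BufferedVectors a = new BufferedVectors(data);
        BufferedVectors b = a.copy();

        // One source: the second read overwrites the first,
        // so vector 1 ends up being compared with itself.
        float single = dot(a.vectorValue(0), a.vectorValue(1));

        // Two sources: each read lands in its own buffer.
        float paired = dot(a.vectorValue(0), b.vectorValue(1));

        System.out.println("single source: " + single + ", two sources: " + paired);
    }
}
```

With a purely array-backed source that hands back the stored arrays directly, the two reads would not interfere, which matches the observation that the on-heap RAVV case does not strictly need the second copy.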