Sure, I'm adding HNSW support to Cassandra.  (Lots more detail on the
dev@cassandra list.)

HnswGraph says "The graph may be searched by multiple threads
concurrently," but OnHeapHnswGraph has a field cur that gets modified by
seek, which is called by Searcher.  Bug, or outdated comment?

On Thu, Apr 20, 2023 at 1:45 PM Michael Sokolov <msoko...@gmail.com> wrote:

> Right RAVectorValues is just fronting an array of vectors and it
> doesn't have any intermediate storage or other state (like a file
> pointer) so it can support many simultaneous callers. Other
> implementations of the interface work differently; see
> OffHeapByteVectorValues, which is representing vectors in the index
> and implemented using I/O calls.
>
> If you shared some context about your interest here, we might be able
> to help you better.
>
> On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis <jbel...@gmail.com> wrote:
> >
> > It looks like I misunderstood how the Builder works, and the RAVV
> provided to the constructor does not need to contain any values up front.
> Specifically, Lucene95HnswVectorsWriter.FieldWriter adds vectors
> incrementally to the RAVV that it gives to the builder as addValue is
> called.
> >
> > On Wed, Apr 19, 2023 at 1:37 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
> >>
> >> That class is intended for use by the Lucene index writer - it's not
> >> designed as a general purpose class for re-use outside that context.
> >> And IndexWriter writes documents to disk in bulk.
> >>
> >> On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis <jbel...@gmail.com>
> wrote:
> >> >
> >> > Thanks, Michael!
> >> >
> >> > Looking at the paper by Malkov and Yashunin, it looks like the
> algorithm allows for building the hnsw graph incrementally.  Why does our
> implementation require specifying all the vectors up front to
> HnswGraphBuilder.create?
> >> >
> >> > On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov <msoko...@gmail.com>
> wrote:
> >> >>
> >> >> These vector values have internal buffers they use to return the
> vectors. In order to compare two vectors we need to use two independent
> sources so that one doesn't overwrite this internal state when fetching the
> second vector.
> >> >>
> >> >> Sorry I forgot the second question and can't see it on my phone. Brb
> >> >>
> >> >> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com>
> wrote:
> >> >>>
> >> >>> HI all, a couple questions on how HNSW works:
> >> >>>
> >> >>> 1. What is driving the requirement for two copies of the input
> vectors?  It looks like the RAVV implementations do shallow copies, so the
> vector from A is the same that would be returned by B.  What am I missing?
> >> >>>
> >> >>> 2. What is the intended behavior when adding identical vectors to a
> HNSW?  It looks like when I supply 10 identical vectors, they all get added
> to the graph, but when I search for the nearest neighbors, I only get one
> of them in the result set.
> >> >>>
> >> >>> --
> >> >>> Jonathan Ellis
> >> >>> co-founder, http://www.datastax.com
> >> >>> @spyced
> >> >
> >> >
> >> >
> >> > --
> >> > Jonathan Ellis
> >> > co-founder, http://www.datastax.com
> >> > @spyced
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: dev-h...@lucene.apache.org
> >>
> >
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

Reply via email to