Correct. They will be ordered closest-first. Unfortunately, farthest-first ordering will not be possible in the near or medium term. The HNSW index gets to log(n) search time by keeping only a subset of the closest neighbors for each vector, so it has no efficient way to walk toward the most distant ones. You would need a separate index built with an inverse-cosine similarity metric, and it is not possible today to use a custom metric function.
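To make the ordering concrete, here is a minimal sketch in plain Python. It is not Cassandra or Lucene code: the names `cosine_similarity` and `rank` are hypothetical, and it does an exact scan over every row rather than an HNSW search. It only shows that closest-first is a descending sort on cosine similarity and farthest-first is the ascending sort, which is cheap on a small result set you have already fetched but requires touching every vector in general.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(rows, query, farthest_first=False):
    """Exact O(n) scan: score every (text, vector) row, then sort by score."""
    scored = [(text, cosine_similarity(vec, query)) for text, vec in rows]
    # Closest-first puts the highest similarity first; farthest-first reverses it.
    return sorted(scored, key=lambda pair: pair[1], reverse=not farthest_first)

# Hypothetical sample data, for illustration only.
rows = [
    ("same direction", [1.0, 0.0]),
    ("orthogonal", [0.0, 1.0]),
    ("opposite", [-1.0, 0.0]),
]
query = [2.0, 0.0]

closest = rank(rows, query)
farthest = rank(rows, query, farthest_first=True)
```

Because the scan is exact, this is only reasonable for small tables or for re-sorting rows that an ANN query has already returned.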
(This has been GA for over a year in Elastic and Solr, and so far nobody has needed farthest-first badly enough to add it as an option to the underlying Lucene library.)

You can get the distances back today, like this:

    SELECT my_text, similarity_cosine(my_embedding, ?) FROM my_table ORDER BY my_embedding ANN OF ? LIMIT 2

Then just pass the query vector into both bind variables.

On Fri, Jun 16, 2023 at 7:09 AM Andrew Cobley (Staff) <a.e.cob...@dundee.ac.uk> wrote:

> Hi,
>
> I’ve got a question and a request about this CEP.
>
> In the example:
>
> SELECT * FROM test.foo WHERE j ANN OF [3.4, 7.8, 9.1] limit 1;
>
> I presume that limit n will return the nth nearest neighbours?
>
> If that’s the case, what order will they be in? Is it possible to reverse
> the order?
>
> Secondly, would it be possible to return the calculated distances? This
> might be particularly important if there are n returned neighbours.
>
> Andy
> ------------------------------
> *From:* Patrick McFadin <pmcfa...@gmail.com>
> *Sent:* 15 June 2023 01:03
> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
> *Subject:* Re: [VOTE] CEP-30 ANN Vector Search
>
> CAUTION: This email originated from outside the University of Dundee. Do
> not click links or open attachments unless you recognise the sender's email
> address and know the content is safe.
>
> Andy,
>
> Good to see you on the ML again! CEP-30 is slated for release with 5.0
> later in the year. Until then, you'll need to do a local build or try it
> out in a preview in Astra. A few of us have been talking about creating a
> preview docker image since there is some interest in having it run in
> k8ssandra. In any case, this is very alpha code and should be treated as
> such. Reporting errors or unusual results would be greatly appreciated!
>
> Patrick
>
> On Wed, Jun 14, 2023 at 7:10 AM Andrew Cobley (Staff) <a.e.cob...@dundee.ac.uk> wrote:
>
> Hi All,
>
> Great news that this has gone through. I'm wondering if we have a timescale
> for this making it to Beta or release? I’m asking because we have a project
> that would benefit from this approach.
>
> Andy
>
> *From:* Jonathan Ellis <jbel...@gmail.com>
> *Date:* Tuesday, 30 May 2023 at 14:44
> *To:* dev <dev@cassandra.apache.org>
> *Subject:* Re: [VOTE] CEP-30 ANN Vector Search
>
> Thanks, all. Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> Let's make this official.
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
> The University of Dundee is a registered Scottish Charity, No: SC015096

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced