Correct.  They will be ordered closest-first.

Unfortunately, farthest-first won't be possible in the near or medium term.
The HNSW index gets to log(n) search time by keeping only a subset of the
closest neighbors for each vector.  So you'd need a separate index with
an inverse-cosine similarity metric, and it's not possible today to use a
custom metric function.
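
If the table is small enough to scan in full, the ordering can be done in
application code instead.  A minimal Python sketch, assuming the
(text, vector) rows are already in memory; the function names are
hypothetical, and the textbook cosine formula used here may be scaled
differently from the score the database reports:

```python
import math


def cosine_similarity(a, b):
    """Textbook cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def farthest_first(query, rows, limit):
    """Brute-force O(n) scan: return the `limit` least similar rows.

    `rows` is an iterable of (text, vector) pairs (hypothetical shape).
    """
    scored = [(text, cosine_similarity(vec, query)) for text, vec in rows]
    scored.sort(key=lambda pair: pair[1])  # lowest similarity first
    return scored[:limit]
```

This does no index lookup at all, so it only makes sense when a full scan
is affordable.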

(This has been GA for over a year in Elastic and Solr and so far nobody has
needed farthest-first badly enough to add this as an option to the
underlying Lucene library.)

You can get the distances back today, like this:

SELECT my_text, similarity_cosine(my_embedding, ?)
FROM my_table
ORDER BY my_embedding ANN OF ? LIMIT 2

Then just pass the query vector into both bind variables.
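
From application code, that could look like the following sketch.  It
assumes the DataStax Python driver's Session.prepare / Session.execute
calls and a driver version that can serialize a list of floats to the
vector column; the helper names are hypothetical.

```python
ANN_QUERY = (
    "SELECT my_text, similarity_cosine(my_embedding, ?) "
    "FROM my_table "
    "ORDER BY my_embedding ANN OF ? LIMIT 2"
)


def ann_bind_values(query_vector):
    """Return one bind value per '?' marker: the same vector, twice."""
    vec = list(query_vector)
    return [vec, vec]


def run_ann_query(session, query_vector):
    """Prepare and run the query; `session` is assumed to be a driver Session."""
    prepared = session.prepare(ANN_QUERY)
    return session.execute(prepared, ann_bind_values(query_vector))
```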

On Fri, Jun 16, 2023 at 7:09 AM Andrew Cobley (Staff) <
a.e.cob...@dundee.ac.uk> wrote:

> Hi,
>
> I’ve got a question and a request about this CEP
>
> In the example:
>
> SELECT * FROM test.foo WHERE j ANN OF [3.4, 7.8, 9.1] limit 1;
>
>
> I presume that limit n will return the n nearest neighbours?
>
> If that’s the case, what order will they be in? Is it possible to reverse
> the order?
>
> Secondly, would it be possible to return the calculated distances?  This
> might be particularly important if there are n returned neighbours.
>
> Andy
> ------------------------------
> *From:* Patrick McFadin <pmcfa...@gmail.com>
> *Sent:* 15 June 2023 01:03
> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
> *Subject:* Re: [VOTE] CEP-30 ANN Vector Search
>
>
>
>
> CAUTION: This email originated from outside the University of Dundee. Do
> not click links or open attachments unless you recognise the sender's email
> address and know the content is safe.
> Andy,
>
> Good to see you on the ML again! CEP-30 is slated for release with 5.0
> later in the year. Until then, you'll need to do a local build or try it
> out in a preview in Astra. A few of us have been talking about creating a
> preview docker image since there is some interest in having it run in
> k8ssandra. In any case, this is very alpha code and should be treated as
> such. Reporting errors or unusual results would be greatly appreciated!
>
> Patrick
>
>
>
> On Wed, Jun 14, 2023 at 7:10 AM Andrew Cobley (Staff) <
> a.e.cob...@dundee.ac.uk> wrote:
>
> Hi All,
>
>
>
> Great news that this has gone through. I’m wondering if we have a timescale
> for this making it to Beta or release?  I’m asking because we have a project
> that would benefit from this approach.
>
>
>
> Andy
>
>
>
>
>
> *From: *Jonathan Ellis <jbel...@gmail.com>
> *Date: *Tuesday, 30 May 2023 at 14:44
> *To: *dev <dev@cassandra.apache.org>
> *Subject: *Re: [VOTE] CEP-30 ANN Vector Search
>
>
>
> Thanks, all.  Closing the vote as accepted with 8 binding +1 (including
> me) and 11 non-binding votes.
>
>
>
> On Thu, May 25, 2023 at 10:45 AM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> Let's make this official.
>
>
> CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes
>
>
>
> POC that demonstrates all the big rocks, including distributed queries:
> https://github.com/datastax/cassandra/tree/cep-vsearch
>
>
> --
>
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
>
>
>
> The University of Dundee is a registered Scottish Charity, No: SC015096
>
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
