[ 
https://issues.apache.org/jira/browse/SOLR-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Benedetti updated SOLR-17815:
----------------------------------------
    Description: 
ACORN is an interesting approach to optimised filtered vector search: 
https://arxiv.org/abs/2403.04871
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and 
Structured Data
Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia

h1. LUCENE IMPLEMENTATION
This was implemented in Lucene with https://github.com/apache/lucene/pull/14160
Specifically in org.apache.lucene.util.hnsw.FilteredHnswGraphSearcher
that can be used in Solr via org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw

    /**
     * Create a new Hnsw strategy
     *
     * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
     *     100 where 0 means never use filtered search and 100 means always use filtered search.
     */
    public Hnsw(int filteredSearchThreshold) {
      if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
        throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
      }
      this.filteredSearchThreshold = filteredSearchThreshold;
    }
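
To make the threshold semantics concrete, here is a minimal, self-contained sketch that does not depend on Lucene. The class name ThresholdSketch and the method useFilteredSearch are hypothetical names chosen for illustration. The comparison rule (use the ACORN-style filtered search when the percentage of documents passing the filter is below the threshold) is an assumption and may differ from what Lucene actually does internally; only the 0..100 range check and the "0 means never, 100 means always" reading come from the Javadoc above.

```java
// Hypothetical sketch, not Lucene code: it mirrors the range check of the
// constructor quoted above and illustrates one possible reading of the threshold.
final class ThresholdSketch {
  private final int filteredSearchThreshold;

  ThresholdSketch(int filteredSearchThreshold) {
    // Same validation as the quoted Hnsw constructor.
    if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
      throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
    }
    this.filteredSearchThreshold = filteredSearchThreshold;
  }

  // ASSUMPTION: the decision compares the percentage of documents that pass
  // the filter against the threshold. The real rule may be different.
  boolean useFilteredSearch(double percentPassingFilter) {
    if (filteredSearchThreshold == 0) {
      return false; // 0 means never use filtered search
    }
    if (filteredSearchThreshold == 100) {
      return true; // 100 means always use filtered search
    }
    return percentPassingFilter < filteredSearchThreshold;
  }
}
```

Under these assumptions, a threshold of 60 would select the filtered search for a filter matching 30% of the documents and the regular search for a filter matching 80%.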

h1. DEFAULT
Once Solr upgrades to Lucene 10.x, ACORN with a threshold of '60' becomes the default.
ACORN can be disabled at the KnnSearchStrategy level by passing '0' as the threshold.

 public static class Hnsw extends KnnSearchStrategy {
    public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);

h1. SCOPE OF THIS ISSUE

This issue should study when ACORN is and is not useful, and whether the default is good enough for Solr.
If it is not, the expected result of this task is a detailed motivation and the implementation of a parameter that lets users disable or regulate the ACORN behavior.
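
As an illustration of what such a parameter could look like, the sketch below resolves an optional, user-supplied threshold and falls back to the default of 60 when none is given. It is a self-contained example under stated assumptions, not Solr or Lucene code: the class ThresholdParam, the method resolve, and the idea of receiving the value as a raw string are all hypothetical.

```java
// Hypothetical sketch: resolving an optional, user-supplied ACORN threshold.
final class ThresholdParam {
  // Mirrors the default value of 60 quoted in this issue.
  static final int DEFAULT_FILTERED_SEARCH_THRESHOLD = 60;

  // Returns the default when the parameter is absent or empty, otherwise the
  // parsed value after checking it is an integer in the range 0..100.
  static int resolve(String rawValue) {
    if (rawValue == null || rawValue.trim().isEmpty()) {
      return DEFAULT_FILTERED_SEARCH_THRESHOLD;
    }
    final int value;
    try {
      value = Integer.parseInt(rawValue.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "filteredSearchThreshold must be an integer, got: " + rawValue, e);
    }
    if (value < 0 || value > 100) {
      throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
    }
    return value;
  }
}
```

With this sketch, resolve("0") would correspond to disabling ACORN, and a missing parameter keeps the default of 60.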


  was:
Once Solr upgrades to Lucene 10.x, ACORN with a threshold of '60' becomes the default.
ACORN can be disabled at the KnnSearchStrategy level.

 public static class Hnsw extends KnnSearchStrategy {
    public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);

This issue should study when ACORN is and is not useful; if the default is not good enough for Solr, we should make it configurable and explain to users how to use it.

More info will follow and be better organised soon:


An interesting paper in this area: https://arxiv.org/abs/2403.04871
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and 
Structured Data

Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia

This was implemented in Lucene in https://github.com/apache/lucene/pull/14160

Specifically in org.apache.lucene.util.hnsw.FilteredHnswGraphSearcher
that can be used in Solr via org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw

    /**
     * Create a new Hnsw strategy
     *
     * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
     *     100 where 0 means never use filtered search and 100 means always use filtered search.
     */
    public Hnsw(int filteredSearchThreshold) {
      if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
        throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
      }
      this.filteredSearchThreshold = filteredSearchThreshold;
    }

We may pass an additional parameter to disable it (filteredSearchThreshold=0); the default is:
 public static final int DEFAULT_FILTERED_SEARCH_THRESHOLD = 60;


> Add parameter to regulate for ACORN-based filtering in vector search?
> ---------------------------------------------------------------------
>
>                 Key: SOLR-17815
>                 URL: https://issues.apache.org/jira/browse/SOLR-17815
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>            Reporter: Alessandro Benedetti
>            Priority: Major
>


