[https://issues.apache.org/jira/browse/SOLR-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18017039#comment-18017039]

Anna commented on SOLR-17815:
-----------------------------

[ACORN|https://arxiv.org/pdf/2403.04871] is an algorithm designed to make 
hybrid searches, which combine a filter with a vector search, more efficient. 
It addresses the performance limitations of both pre-filtering and 
post-filtering by modifying both the construction of the HNSW graph and the 
search on it.


In the [Lucene implementation|https://github.com/apache/lucene/pull/14160], 
only the search part has been integrated; graph construction is unchanged 
(it remains a classic HNSW).
The idea is to traverse the subgraph of the index induced by the set of nodes 
that pass the query filter. Specifically, an expanded neighbour search (two-hop 
expansion) is performed, which allows more of the potentially valid nodes 
(those that match the filter) to be reached.
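To make the two-hop idea concrete, here is a deliberately simplified sketch. It is not Lucene's FilteredHnswGraphSearcher: it assumes a single-layer graph stored as a Map<Integer, int[]> adjacency list, ignores vector distances entirely, and only shows which filter-passing nodes a traversal can reach. It uses one simple variant of two-hop expansion (an assumption, not necessarily Lucene's exact rule): when a neighbour fails the filter, the traversal looks through it to that neighbour's own neighbours instead of treating it as a dead end. Class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntPredicate;

// Illustrative only: single-layer graph, no scoring, not Lucene's implementation.
public class TwoHopExpansionSketch {

  // Neighbours of `node` that pass the filter. With twoHop=true, a neighbour that
  // fails the filter is not a dead end: its own neighbours are checked as well.
  static List<Integer> candidates(
      int node, Map<Integer, int[]> graph, IntPredicate accept, boolean twoHop, Set<Integer> seen) {
    List<Integer> out = new ArrayList<>();
    for (int n : graph.getOrDefault(node, new int[0])) {
      if (accept.test(n)) {
        if (seen.add(n)) out.add(n);
      } else if (twoHop) {
        for (int nn : graph.getOrDefault(n, new int[0])) {
          if (accept.test(nn) && seen.add(nn)) out.add(nn);
        }
      }
    }
    return out;
  }

  // Breadth-first walk over the filtered subgraph from an entry node that passes
  // the filter; returns every filter-passing node that was reached (sorted).
  static Set<Integer> reachable(
      int entry, Map<Integer, int[]> graph, IntPredicate accept, boolean twoHop) {
    Set<Integer> seen = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    seen.add(entry);
    queue.add(entry);
    while (!queue.isEmpty()) {
      int node = queue.poll();
      queue.addAll(candidates(node, graph, accept, twoHop, seen));
    }
    return new TreeSet<>(seen);
  }

  public static void main(String[] args) {
    // Path graph 0-1-2-3-4; the filter accepts only even node ids.
    Map<Integer, int[]> graph = Map.of(
        0, new int[] {1},
        1, new int[] {0, 2},
        2, new int[] {1, 3},
        3, new int[] {2, 4},
        4, new int[] {3});
    IntPredicate even = n -> n % 2 == 0;
    System.out.println(reachable(0, graph, even, false)); // [0]
    System.out.println(reachable(0, graph, even, true));  // [0, 2, 4]
  }
}
```

On this path graph the filter rejects every other node, so the plain one-hop walk over the induced subgraph stops at the entry node, while the two-hop walk reaches every node that passes the filter.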


[Empirical tests|https://docs.google.com/spreadsheets/d/1gk1uybtqleVtDUfhWXActyhW8q_lgG1mlMrOohnJRJA/edit?gid=0#gid=0]
have shown that this algorithm brings a significant advantage in terms of both 
search time and recall when the filter excludes more than 40% of the documents 
[https://www.elastic.co/search-labs/blog/filtered-hnsw-knn-search]. 
If, on the other hand, the filter excludes less than 40% of the documents, 
the algorithm performs comparably to HNSW. For this reason, Lucene sets 
{{DEFAULT_FILTERED_SEARCH_THRESHOLD}} to 60 and uses ACORN only when the 
percentage of documents that pass the filter is below 60%.
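The threshold rule above reduces to a small piece of arithmetic. The following standalone sketch re-implements it for illustration; it is not the Lucene source. It assumes a strict "less than" comparison (as the comment describes) and computes the ratio as filteredDocCount / graph size, mirroring the Lucene snippet quoted in the issue below. The class name and the decide helper are hypothetical.

```java
// Hypothetical standalone re-implementation of the threshold decision; not Lucene code.
public class FilteredSearchThresholdSketch {

  // Use ACORN-style filtered search only when the percentage of documents passing
  // the filter is strictly below the threshold (0 = never, 100 = almost always).
  static boolean useFilteredSearch(float ratioPassingFilter, int threshold) {
    if (threshold < 0 || threshold > 100) {
      throw new IllegalArgumentException("threshold must be >= 0 and <= 100");
    }
    return ratioPassingFilter * 100 < threshold;
  }

  // Ratio computed as filteredDocCount / graphSize; an empty filter never triggers it.
  static boolean decide(int filteredDocCount, int graphSize, int threshold) {
    return filteredDocCount > 0
        && useFilteredSearch((float) filteredDocCount / graphSize, threshold);
  }

  public static void main(String[] args) {
    int graphSize = 1_000_000;
    System.out.println(decide(300_000, graphSize, 60)); // true: 30% pass, ACORN
    System.out.println(decide(800_000, graphSize, 60)); // false: 80% pass, classic HNSW
    System.out.println(decide(300_000, graphSize, 0));  // false: threshold 0 disables ACORN
  }
}
```

With the default of 60, a filter that removes 70% of the documents (30% pass) switches to filtered search, while one that removes only 20% (80% pass) stays on the classic HNSW path.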

I think this reasoning holds for most Solr use cases, so it may make sense to 
hide this parameter from Solr administrators to simplify configuration and 
use. We would still need to check whether there are edge cases where the 
default is not appropriate and users should instead be allowed to modify this 
behaviour.
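If an override were exposed for such edge cases, one low-complexity option would be to keep it optional: nothing changes unless the user sets it. The sketch below assumes a hypothetical request parameter named filteredSearchThreshold (not taken from Solr) and a plain Map of request parameters; it falls back to Lucene's default of 60 and applies the same 0 to 100 validation as the KnnSearchStrategy.Hnsw constructor quoted below.

```java
import java.util.Map;

// Hypothetical parameter handling; the parameter name and class are illustrative only.
public class AcornParamSketch {

  // Lucene's DEFAULT_FILTERED_SEARCH_THRESHOLD, as discussed above.
  static final int DEFAULT_FILTERED_SEARCH_THRESHOLD = 60;

  // Hypothetical request parameter name, not an existing Solr parameter.
  static final String PARAM = "filteredSearchThreshold";

  // Returns the Lucene default when the user sets nothing, otherwise the user's
  // value, validated to the [0, 100] range accepted by KnnSearchStrategy.Hnsw.
  static int resolveThreshold(Map<String, String> params) {
    String raw = params.get(PARAM);
    if (raw == null || raw.isBlank()) {
      return DEFAULT_FILTERED_SEARCH_THRESHOLD;
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(PARAM + " must be an integer, got: " + raw, e);
    }
    if (value < 0 || value > 100) {
      throw new IllegalArgumentException(PARAM + " must be >= 0 and <= 100");
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(resolveThreshold(Map.of()));           // 60 (default, hidden from users)
    System.out.println(resolveThreshold(Map.of(PARAM, "0"))); // 0 (ACORN disabled)
  }
}
```

In an actual integration, the resolved value would presumably be passed to new KnnSearchStrategy.Hnsw(threshold) and then to the KnnFloatVectorQuery constructor that accepts a strategy. That step is omitted here because it needs Lucene on the classpath.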

> Add parameter to regulate for ACORN-based filtering in vector search?
> ---------------------------------------------------------------------
>
>                 Key: SOLR-17815
>                 URL: https://issues.apache.org/jira/browse/SOLR-17815
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>            Reporter: Alessandro Benedetti
>            Priority: Major
>
> ACORN is an interesting approach to optimised filtered vector search: 
> https://arxiv.org/abs/2403.04871
> ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and 
> Structured Data
> Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia
> h1. LUCENE IMPLEMENTATION
> This was implemented in Lucene with 
> https://github.com/apache/lucene/pull/14160,
> specifically in org.apache.lucene.util.hnsw.FilteredHnswGraphSearcher,
> which can be used in Solr via 
> org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw:
> {code:java}
> /**
>  * Create a new Hnsw strategy
>  *
>  * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
>  *     100 where 0 means never use filtered search and 100 means always use filtered search.
>  */
> public Hnsw(int filteredSearchThreshold) {
>   if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
>     throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
>   }
>   this.filteredSearchThreshold = filteredSearchThreshold;
> }
> {code}
> h1. DEFAULT
> ACORN with a threshold of '60' becomes the default when we upgrade to Lucene 10.x.
> {code:java}
> /**
>  * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
>  * given field. <code>target</code> vector.
>  *
>  * @param field a field that has been indexed as a {@link KnnFloatVectorField}.
>  * @param target the target of the search
>  * @param k the number of documents to find
>  * @param filter a filter applied before the vector search
>  * @throws IllegalArgumentException if <code>k</code> is less than 1
>  */
> public KnnFloatVectorQuery(String field, float[] target, int k, Query filter) {
>   this(field, target, k, filter, DEFAULT);
> }
> {code}
> Note the DEFAULT argument. It is defined as:
> {code:java}
> public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
> {code}
> where Hnsw is org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw,
> so that is the default search strategy.
> Now, what does the '60' threshold mean?
> {code:java}
>  * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
>  *     100 where 0 means never use filtered search and 100 means always use filtered search.
> {code}
> So with 0 there is no ACORN search at all;
> with any value greater than 0, Lucene enables or disables ACORN based on this 
> condition:
> {code:java}
> // decision delegated to org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw#useFilteredSearch
> if (acceptOrds != null
>     // We can only use filtered search if we know the maxConn
>     && graph.maxConn() != HnswGraph.UNKNOWN_MAX_CONN
>     && filteredDocCount > 0
>     && hnswStrategy.useFilteredSearch((float) filteredDocCount / graph.size())) {
>   innerSearcher =
>       FilteredHnswGraphSearcher.create(knnCollector.k(), graph, filteredDocCount, acceptOrds);
> }
> {code}
> ACORN can be disabled at the KnnSearchStrategy level by passing '0' as 
> the threshold. The default strategy is defined as:
> {code:java}
> public static class Hnsw extends KnnSearchStrategy {
>   public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
>   // ...
> }
> {code}
> h1. SCOPE OF THIS ISSUE
> This issue should study when ACORN is useful and whether the default is 
> good enough for Solr.
> If it is not, the expected result of this task is a detailed motivation and the 
> implementation of a parameter that lets users disable or regulate the ACORN 
> behavior.
> Having flexibility is great, but it may not be worth the additional 
> complexity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
