[ https://issues.apache.org/jira/browse/SOLR-17815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Benedetti updated SOLR-17815:
----------------------------------------
Description:
ACORN is an interesting approach to optimised filtered vector search:
https://arxiv.org/abs/2403.04871
ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data
Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia
h1. LUCENE IMPLEMENTATION
It was implemented in Lucene by https://github.com/apache/lucene/pull/14160, specifically in org.apache.lucene.util.hnsw.FilteredHnswGraphSearcher, which Solr can use through org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw:
{code:java}
/**
 * Create a new Hnsw strategy
 *
 * @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
 *     100 where 0 means never use filtered search and 100 means always use filtered search.
 */
public Hnsw(int filteredSearchThreshold) {
  if (filteredSearchThreshold < 0 || filteredSearchThreshold > 100) {
    throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
  }
  this.filteredSearchThreshold = filteredSearchThreshold;
}
{code}
h1. DEFAULT
ACORN with a threshold of 60 becomes the default once we upgrade to Lucene 10.x:
{code:java}
/**
 * Find the <code>k</code> nearest documents to the target vector according to the vectors in the
 * given field. <code>target</code> vector.
 *
 * @param field a field that has been indexed as a {@link KnnFloatVectorField}.
 * @param target the target of the search
 * @param k the number of documents to find
 * @param filter a filter applied before the vector search
 * @throws IllegalArgumentException if <code>k</code> is less than 1
 */
public KnnFloatVectorQuery(String field, float[] target, int k, Query filter) {
  this(field, target, k, filter, DEFAULT);
}
{code}
Note the DEFAULT argument. It is defined as
{{public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);}}
where Hnsw is org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw, so this is the default search strategy.
What does the threshold of 60 mean? The constructor Javadoc says:
{code:java}
* @param filteredSearchThreshold threshold for filtered search, a percentage value from 0 to
*     100 where 0 means never use filtered search and 100 means always use filtered search.
{code}
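To make the percentage concrete, here is a minimal, stdlib-only Java sketch of how such a threshold could be applied. It assumes (from our reading of Lucene, not from this issue) that filtered search is chosen when ratioPassingFilter * 100 < filteredSearchThreshold, where the ratio is the fraction of graph nodes that pass the filter. The class and method names are hypothetical.

{code:java}
/**
 * Hypothetical model of the filteredSearchThreshold semantics.
 * ASSUMPTION: filtered (ACORN) search is used when ratioPassingFilter * 100 < threshold.
 */
public class FilteredSearchThresholdSketch {

  /** Assumed default, matching the 60 discussed in this issue. */
  static final int DEFAULT_THRESHOLD = 60;

  /** Same range check as the Hnsw constructor quoted above. */
  static void validate(int threshold) {
    if (threshold < 0 || threshold > 100) {
      throw new IllegalArgumentException("filteredSearchThreshold must be >= 0 and <= 100");
    }
  }

  /** Assumed decision rule: ACORN only when the filter is selective enough. */
  static boolean useFilteredSearch(int threshold, float ratioPassingFilter) {
    validate(threshold);
    return ratioPassingFilter * 100 < threshold;
  }

  public static void main(String[] args) {
    int graphSize = 1_000_000;
    int[] passingCounts = {300_000, 800_000};
    for (int passing : passingCounts) {
      float ratio = (float) passing / graphSize;
      System.out.printf("threshold=60, %d/%d pass -> ACORN=%b%n",
          passing, graphSize, useFilteredSearch(DEFAULT_THRESHOLD, ratio));
    }
    // A threshold of 0 never selects filtered search, whatever the ratio.
    System.out.println("threshold=0, 1% pass -> ACORN=" + useFilteredSearch(0, 0.01f));
  }
}
{code}

Under that assumption, with the default of 60 a filter matching 30% of a 1,000,000-vector graph would go through ACORN, while one matching 80% would not. A threshold of 100 would select ACORN for any filter that excludes at least one node.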
So with 0, ACORN search is never used. With any value greater than 0, Lucene decides whether to use ACORN based on this condition:
{code:java}
// Decision point that calls
// org.apache.lucene.search.knn.KnnSearchStrategy.Hnsw#useFilteredSearch
if (acceptOrds != null
    // We can only use filtered search if we know the maxConn
    && graph.maxConn() != HnswGraph.UNKNOWN_MAX_CONN
    && filteredDocCount > 0
    && hnswStrategy.useFilteredSearch((float) filteredDocCount / graph.size())) {
  // ACORN-style filtered graph exploration
  innerSearcher =
      FilteredHnswGraphSearcher.create(knnCollector.k(), graph, filteredDocCount, acceptOrds);
}
{code}
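The condition above can be modelled without Lucene as a pure function, which makes it easy to see which queries would take the ACORN path. This is a hypothetical sketch: SearcherChoice, chooseSearcher and the UNKNOWN_MAX_CONN sentinel value are illustrative placeholders, hasFilter stands in for acceptOrds != null, and the threshold rule (ratio * 100 < threshold) is an assumption about Lucene's useFilteredSearch.

{code:java}
/**
 * Hypothetical, stdlib-only model of the gating condition that selects
 * FilteredHnswGraphSearcher (ACORN) over the standard HNSW searcher.
 */
public class AcornGateSketch {

  enum SearcherChoice { FILTERED_ACORN, STANDARD_HNSW }

  /** Placeholder sentinel standing in for HnswGraph.UNKNOWN_MAX_CONN. */
  static final int UNKNOWN_MAX_CONN = -1;

  static SearcherChoice chooseSearcher(
      boolean hasFilter, int maxConn, int filteredDocCount, int graphSize, int threshold) {
    boolean acorn =
        hasFilter // models acceptOrds != null
            && maxConn != UNKNOWN_MAX_CONN // maxConn must be known
            && filteredDocCount > 0
            // ASSUMED rule: fraction passing the filter, as a percentage, below the threshold
            && ((float) filteredDocCount / graphSize) * 100 < threshold;
    return acorn ? SearcherChoice.FILTERED_ACORN : SearcherChoice.STANDARD_HNSW;
  }

  public static void main(String[] args) {
    System.out.println(chooseSearcher(true, 16, 200_000, 1_000_000, 60)); // FILTERED_ACORN
    System.out.println(chooseSearcher(true, 16, 900_000, 1_000_000, 60)); // STANDARD_HNSW
    System.out.println(chooseSearcher(false, 16, 200_000, 1_000_000, 60)); // STANDARD_HNSW
    System.out.println(chooseSearcher(true, 16, 200_000, 1_000_000, 0)); // STANDARD_HNSW
  }
}
{code}

The last call shows why a threshold of 0 disables ACORN: the final clause can never be true, so every filtered query falls back to the standard searcher.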
ACORN can be disabled at the KnnSearchStrategy level by passing 0 as the threshold ({{new Hnsw(0)}}) instead of relying on the DEFAULT instance:
{code:java}
public static class Hnsw extends KnnSearchStrategy {
  public static final Hnsw DEFAULT = new Hnsw(DEFAULT_FILTERED_SEARCH_THRESHOLD);
  // ...
}
{code}
h1. SCOPE OF THIS ISSUE
This issue should study when ACORN is useful and whether the Lucene default is good enough for Solr.
If it is not, the expected outcome of this task is a detailed motivation plus the implementation of a parameter that lets users disable or tune the ACORN behavior.
Flexibility is valuable, but the additional complexity may not be necessary.
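If a parameter turns out to be warranted, one opt-in shape is sketched below. This is purely hypothetical: the parameter name filteredSearchThreshold and the helper resolveThreshold are NOT an existing Solr API, a plain Map stands in for Solr's request parameters to keep the sketch self-contained, and the default of 60 is assumed to match Lucene's.

{code:java}
import java.util.Map;

/**
 * Hypothetical Solr-side knob: maps a request value onto Lucene's 0..100
 * filteredSearchThreshold, falling back to the assumed Lucene default.
 */
public class FilteredSearchThresholdParamSketch {

  /** Hypothetical parameter name, not an existing Solr parameter. */
  static final String PARAM = "filteredSearchThreshold";

  /** Assumed Lucene default, used when the parameter is absent. */
  static final int LUCENE_DEFAULT = 60;

  static int resolveThreshold(Map<String, String> params) {
    String raw = params.get(PARAM);
    if (raw == null) {
      return LUCENE_DEFAULT; // keep Lucene's behavior unless the user opts out or tunes it
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(PARAM + " must be an integer, got: " + raw, e);
    }
    // Same range as the Hnsw constructor, so invalid values fail early with a clear message.
    if (value < 0 || value > 100) {
      throw new IllegalArgumentException(PARAM + " must be >= 0 and <= 100");
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(resolveThreshold(Map.of())); // 60
    System.out.println(resolveThreshold(Map.of(PARAM, "0"))); // 0, ACORN disabled
    System.out.println(resolveThreshold(Map.of(PARAM, "90"))); // 90
  }
}
{code}

Falling back to the Lucene default when the parameter is absent keeps the feature opt-in, which limits the extra complexity for users who never touch it.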
> Add parameter to regulate for ACORN-based filtering in vector search?
> ---------------------------------------------------------------------
>
> Key: SOLR-17815
> URL: https://issues.apache.org/jira/browse/SOLR-17815
> Project: Solr
> Issue Type: New Feature
> Components: vector-search
> Reporter: Alessandro Benedetti
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)