adamjq opened a new pull request, #4492:
URL: https://github.com/apache/solr/pull/4492

   https://issues.apache.org/jira/browse/SOLR-18267
   
   # Description
   
   There are certain use cases, such as highly selective filters on large 
datasets, where it can be more efficient to perform a brute-force KNN search as 
a post-filter, instead of during ANN search.
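   
   To illustrate the idea only (this is not code from the PR), here is a 
minimal sketch of exact KNN restricted to the documents that pass a filter. 
The class `FilteredExactKnn` and its `topK` method are hypothetical names; a 
plain `float[][]` stands in for the stored vectors and an `IntPredicate` 
stands in for the `fq`. Cosine similarity is assumed as the scoring function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

/** Hypothetical illustration: brute-force KNN over filter-matching documents. */
final class FilteredExactKnn {

  /** Cosine similarity; returns 0 if either vector has zero norm. */
  static double cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0;
    }
    return dot / Math.sqrt(normA * normB);
  }

  /**
   * Scores only the documents accepted by the filter and returns the IDs of
   * the k best, highest score first. vectors[docId] is that document's vector.
   */
  static List<Integer> topK(float[] query, float[][] vectors, IntPredicate filter, int k) {
    double[] scores = new double[vectors.length];
    Comparator<Integer> byScore = Comparator.comparingDouble(d -> scores[d]);
    // Min-heap on score, so the weakest of the current top k is evicted first.
    PriorityQueue<Integer> heap = new PriorityQueue<>(byScore);
    for (int docId = 0; docId < vectors.length; docId++) {
      if (!filter.test(docId)) {
        continue; // filtered-out documents are never scored
      }
      scores[docId] = cosine(query, vectors[docId]);
      heap.add(docId);
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort(byScore.reversed());
    return result;
  }
}
```

   The work is proportional to the number of documents that pass the filter, 
so a highly selective filter keeps the exact scan cheap and no graph is needed.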
   
   Solr currently supports this use case with the [vectorSimilarity 
Function](https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#vectorsimilarity-function)
 and an `fq`, but still requires an HNSW graph to be built during indexing when 
using `DenseVectorField`, even if the graph is never used during search. The 
goal of this feature is to avoid paying the cost of building and rebuilding the 
HNSW graph during ingestion when ANN search isn't used.
   
   # Solution
   
   This PR introduces a new `knnAlgorithm=flat` option to DenseVectorField that 
uses 
[Lucene99FlatVectorsFormat](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsFormat.java).
 This stores vectors in the index (.vec/.vemf files) without building the HNSW 
graph (.vex/.vem files).
   
   `Lucene99FlatVectorsFormat` is not registered in Lucene's SPI, so this PR 
includes a wrapper class, `SolrFlatVectorFormat`, that delegates to 
`Lucene99FlatVectorsFormat` as a workaround. Other Lucene-based engines use a 
similar pattern, wrapping `Lucene99FlatVectorsFormat` to provide a flat vector 
format for exact KNN search.
   
   ## Limitations
   
   This PR currently doesn't support:
   - `knnAlgorithm=flat` for quantized variants
   - search across flat dense vector fields using the `knn` query parser. Only 
vectorSimilarity is initially supported.
   
   Both features could be shipped as follow-ups.
   
   AI Disclosure: Claude was used to assist with this PR. All code has been 
reviewed and tested by me.
   
   # Tests
   
   Unit tests for Dense Vector Fields and quantized variants.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to 
Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my 
code conforms to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [X] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended, not available for 
branches on forks living under an organisation)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   - [X] I have added documentation for the [Reference 
Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   - [X] I have added a [changelog 
entry](https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc) for my 
change
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

