adamjq opened a new pull request, #4492: URL: https://github.com/apache/solr/pull/4492
https://issues.apache.org/jira/browse/SOLR-18267

# Description

There are certain use cases, such as highly selective filters on large datasets, where it can be more efficient to perform a brute-force KNN search as a post-filter instead of during ANN search. Solr currently supports this use case with the [vectorSimilarity Function](https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#vectorsimilarity-function) and an `fq`, but `DenseVectorField` still requires an HNSW graph to be built during indexing, even if the graph is never used during search. The goal of this feature is to avoid paying the cost of building and rebuilding the HNSW graph during ingestion when ANN search isn't used.

# Solution

This PR introduces a new `knnAlgorithm=flat` option to `DenseVectorField` that uses [Lucene99FlatVectorsFormat](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsFormat.java). This stores vectors in the index (.vec/.vemf files) without building the HNSW graph (.vex/.vem files).

`Lucene99FlatVectorsFormat` is not registered in Lucene's SPI, so this PR includes a wrapper class, `SolrFlatVectorFormat`, that delegates to `Lucene99FlatVectorsFormat` as a workaround. Other Lucene-based engines use a similar pattern, wrapping `Lucene99FlatVectorsFormat` to provide a flat vector format for exact KNN search.

## Limitations

This PR currently doesn't support:

- `knnAlgorithm=flat` for quantized variants
- search across flat dense vector fields using the `knn` query parser. Only `vectorSimilarity` is initially supported.

Both features could be shipped as follow-ups.

AI Disclosure: Claude was used to assist with this PR. All code has been reviewed and tested by me.

# Tests

Unit tests for Dense Vector Fields and quantized variants.
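
For readers unfamiliar with the exact search mentioned in the description, here is a rough sketch of brute-force KNN applied only to documents that pass a filter. It is not code from this PR and does not use any Lucene or Solr API; the names `ExactKnnSketch`, `Hit`, `cosine`, and `topK` are hypothetical, and cosine similarity is an assumed choice of metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative only: exact KNN over a filtered subset, no graph involved. */
public class ExactKnnSketch {

    /** A document id paired with its similarity score (hypothetical type). */
    public static final class Hit {
        public final int docId;
        public final double score;

        public Hit(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }

        public double score() {
            return score;
        }
    }

    /** Cosine similarity; returns 0 if either vector has zero magnitude. */
    public static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /**
     * Scores every document accepted by the filter against the query vector
     * and returns the k best hits, highest score first.
     */
    public static List<Hit> topK(float[][] vectors, IntPredicate filter, float[] query, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int docId = 0; docId < vectors.length; docId++) {
            if (filter.test(docId)) {
                hits.add(new Hit(docId, cosine(vectors[docId], query)));
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}, {-1f, 0f}};
        // The filter excludes document 0, standing in for a selective fq.
        for (Hit hit : topK(vectors, id -> id != 0, new float[] {1f, 0f}, 2)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

The cost of this loop grows with the number of documents that pass the filter, not with the size of the index, which is why a highly selective filter can make exact search competitive with ANN.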
# Checklist

Please review the following and check all that apply:

- [X] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
- [X] I have created a Jira issue and added the issue ID to my pull request title.
- [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
- [X] I have developed this patch against the `main` branch.
- [X] I have run `./gradlew check`.
- [X] I have added tests for my changes.
- [X] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
- [X] I have added a [changelog entry](https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc) for my change
