[
https://issues.apache.org/jira/browse/SOLR-18207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075656#comment-18075656
]
Sanjay Dutt edited comment on SOLR-18207 at 4/23/26 10:03 AM:
--------------------------------------------------------------
Indexed 1 million float vectors with 1024 dimensions each.
||file extension||vector not stored||vector stored||diff||
|fdt|4.4772 KB|5.54 GB|+5.54 GB|
|cfs|2.46 GB|5.67 GB|+3.21 GB|
|vec|3.98 GB|4.11 GB|+0.14 GB|
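As a rough sanity check on the figures above, the raw payload of the vectors can be estimated from the vector count, the dimension, and the element width. The sketch below assumes 4-byte (32-bit) floats; the function name is illustrative only. On-disk file sizes include index structures and other overhead, so the reported numbers are not expected to match this estimate exactly.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw payload size of the vectors alone, ignoring any index overhead.

    bytes_per_value=4 assumes 32-bit floats (an assumption for this sketch).
    """
    return num_vectors * dims * bytes_per_value


total = raw_vector_bytes(1_000_000, 1024)
print(f"{total / 1e9:.2f} GB")      # decimal gigabytes -> 4.10 GB
print(f"{total / 2**30:.2f} GiB")   # binary gibibytes  -> 3.81 GiB
```

One raw copy of the vectors is therefore on the order of 4 GB, which gives a sense of the cost of keeping a second copy in stored fields.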
was (Author: JIRAUSER305513):
Indexed million vectors
||file extension||vector not stored ||vector stored||diff||
|fdt|4.4772 KB|5.54 GB|+5.54|
|cfs|2.46 GB|5.67 GB|+3.21|
|vec|3.98 GB|4.11 GB|+0.14|
> Add derived stored retrieval for DenseVectorField to avoid duplicate vector
> storage
> -----------------------------------------------------------------------------------
>
> Key: SOLR-18207
> URL: https://issues.apache.org/jira/browse/SOLR-18207
> Project: Solr
> Issue Type: Task
> Reporter: Sanjay Dutt
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Solr DenseVectorField currently stores vector data twice when stored="true":
> once in Lucene’s vector index for kNN/search and again in stored fields for
> retrieval. This increases index size significantly for large vector workloads.
> This change adds an opt-in mode for DenseVectorField that preserves
> stored-field semantics for normal document retrieval while avoiding the
> redundant stored-field copy of the vector payload. Instead, Solr reconstructs
> the returned vector value from Lucene vector data at fetch time.
> Key points:
> * Adds an opt-in field type/property for derived vector retrieval.
> * Avoids writing redundant stored vector bytes at index time.
> * Extends document fetch to materialize vector values from Lucene vector
> readers.
> * Keeps existing behavior unchanged unless the new option is enabled.
> * Documents the fetch-time tradeoff and recommends caution for hot paths
> that return vectors frequently, especially fl=*.
> {code:xml}
> <fieldType name="knn_vector_derived"
>            class="solr.DenseVectorField"
>            vectorDimension="1024"
>            similarityFunction="cosine"
>            knnAlgorithm="hnsw"
>            indexed="true"
>            useVectorValuesAsStored="true"/>
> {code}
> Initial scope:
> * Single-valued vector fields only.
> * Multivalued derived vector retrieval is not supported in this change.
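
The behavior described above can be illustrated with a toy model. This is a minimal sketch, not Solr or Lucene code: ToySegment and its dictionaries are hypothetical stand-ins for a segment's vector data and stored fields, and only the flag name useVectorValuesAsStored comes from the issue description.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical stand-in for an index segment (not a Solr/Lucene class)."""
    use_vector_values_as_stored: bool
    vector_index: dict = field(default_factory=dict)   # doc id -> vector, always written
    stored_fields: dict = field(default_factory=dict)  # doc id -> stored field values

    def add(self, doc_id: int, title: str, vector: list) -> None:
        # The vector always goes into the vector index for kNN search.
        self.vector_index[doc_id] = list(vector)
        stored = {"title": title}
        if not self.use_vector_values_as_stored:
            # Default mode: a second copy is written to stored fields.
            stored["vector"] = list(vector)
        self.stored_fields[doc_id] = stored

    def fetch(self, doc_id: int) -> dict:
        doc = dict(self.stored_fields[doc_id])
        if self.use_vector_values_as_stored:
            # Derived mode: rebuild the returned value from vector data at fetch time.
            doc["vector"] = list(self.vector_index[doc_id])
        return doc


default_seg = ToySegment(use_vector_values_as_stored=False)
derived_seg = ToySegment(use_vector_values_as_stored=True)
for seg in (default_seg, derived_seg):
    seg.add(1, "doc one", [0.1, 0.2, 0.3])

# Both modes return the same document to the caller.
print(default_seg.fetch(1) == derived_seg.fetch(1))   # True
# Only the default mode keeps a stored copy of the vector.
print("vector" in derived_seg.stored_fields[1])       # False
```

In this model the saving comes from the skipped stored copy, and the cost moves to fetch(), which does an extra lookup for every document that returns the vector. That is the fetch-time tradeoff the description cautions about for hot paths.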
--
This message was sent by Atlassian Jira
(v8.20.10#820010)