[ 
https://issues.apache.org/jira/browse/SOLR-18207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Dutt updated SOLR-18207:
-------------------------------
    Component/s: vector-search

> Add derived stored retrieval for DenseVectorField to avoid duplicate vector 
> storage
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-18207
>                 URL: https://issues.apache.org/jira/browse/SOLR-18207
>             Project: Solr
>          Issue Type: Task
>          Components: vector-search
>            Reporter: Sanjay Dutt
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Solr's DenseVectorField currently stores vector data twice when 
> stored="true": once in Lucene’s vector index for kNN search and again in 
> stored fields for retrieval. This significantly increases index size for 
> large vector workloads.
>
> This change adds an opt-in mode for DenseVectorField that preserves 
> stored-field semantics for normal document retrieval while avoiding the 
> redundant stored-field copy of the vector payload. Instead, Solr reconstructs 
> the returned vector value from Lucene vector data at fetch time.
> Key points:
>  * Adds an opt-in field type/property for derived vector retrieval.
>  * Avoids writing redundant stored vector bytes at index time.
>  * Extends document fetch to materialize vector values from Lucene vector 
> readers.
>  * Keeps existing behavior unchanged unless the new option is enabled.
>  * Documents the fetch-time tradeoff and recommends caution on hot paths 
> that return vectors frequently, especially requests that use fl=*.
> {code:xml}
> <fieldType name="knn_vector_derived"
>            class="solr.DenseVectorField"
>            vectorDimension="1024"
>            similarityFunction="cosine"
>            knnAlgorithm="hnsw"
>            indexed="true"
>            useVectorValuesAsStored="true"/>{code}
> Initial scope:
>  * Single-valued vector fields only.
>  * Multivalued derived vector retrieval is not supported in this change.
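
To make the storage tradeoff described above concrete, here is a minimal toy sketch in Java. It is not Solr or Lucene code: the class DerivedVectorSketch, its methods, and the two in-memory maps are hypothetical stand-ins for the vector index and stored fields, and the byte counts ignore compression, encoding, and per-document overhead. Only the flag name useVectorValuesAsStored is taken from the example field type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only; every name here is hypothetical and unrelated to the real patch. */
final class DerivedVectorSketch {
    // Stand-in for the vector data Lucene keeps for kNN search.
    private final Map<Integer, float[]> vectorIndex = new HashMap<>();
    // Stand-in for stored fields, which normally hold the copy returned with documents.
    private final Map<Integer, float[]> storedFields = new HashMap<>();
    private final boolean useVectorValuesAsStored;

    DerivedVectorSketch(boolean useVectorValuesAsStored) {
        this.useVectorValuesAsStored = useVectorValuesAsStored;
    }

    /** Index time: the vector index always gets the vector; the stored copy is skipped in derived mode. */
    void index(int docId, float[] vector) {
        vectorIndex.put(docId, vector.clone());
        if (!useVectorValuesAsStored) {
            storedFields.put(docId, vector.clone());
        }
    }

    /** Fetch time: derived mode reads the value back from the vector index instead of stored fields. */
    float[] fetch(int docId) {
        float[] source = useVectorValuesAsStored ? vectorIndex.get(docId) : storedFields.get(docId);
        return source == null ? null : source.clone();
    }

    /** Raw float32 payload held across both stores, ignoring compression and overhead. */
    long rawVectorBytes() {
        long floats = 0;
        for (float[] v : vectorIndex.values()) floats += v.length;
        for (float[] v : storedFields.values()) floats += v.length;
        return floats * Float.BYTES;
    }

    public static void main(String[] args) {
        float[] vector = new float[1024]; // same dimension as the example field type
        vector[0] = 1.0f;

        DerivedVectorSketch classic = new DerivedVectorSketch(false);
        DerivedVectorSketch derived = new DerivedVectorSketch(true);
        classic.index(0, vector);
        derived.index(0, vector);

        System.out.println(classic.rawVectorBytes()); // 8192
        System.out.println(derived.rawVectorBytes()); // 4096
        System.out.println(Arrays.equals(classic.fetch(0), derived.fetch(0))); // true
    }
}
```

In this model both modes return the same value on fetch, but derived mode holds half the raw vector payload. The real fetch-time cost of reading from Lucene vector readers is not modeled here.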



