Sanjay Dutt created SOLR-18207:
----------------------------------

             Summary: Add derived stored retrieval for DenseVectorField to 
avoid duplicate vector storage
                 Key: SOLR-18207
                 URL: https://issues.apache.org/jira/browse/SOLR-18207
             Project: Solr
          Issue Type: Task
            Reporter: Sanjay Dutt


Solr DenseVectorField currently stores vector data twice when stored="true": 
once in Lucene’s vector index for kNN/search and again in stored fields for 
retrieval. This increases index size significantly for large vector workloads.
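
To give a rough sense of scale, the raw payload of one copy of the vectors can be estimated as documents x dimensions x bytes per element. The sketch below assumes 4-byte float elements. The corpus size and dimension are hypothetical numbers chosen for illustration, not figures from this issue. Actual on-disk size also depends on the stored-field encoding, compression, and any index structures, none of which are modeled here.

```java
public final class VectorStorageEstimate {

    /** Raw bytes for one copy of float vectors: 4 bytes per dimension per document. */
    public static long rawVectorBytes(long numDocs, int dimension) {
        return numDocs * dimension * (long) Float.BYTES;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // hypothetical corpus size
        int dimension = 768;        // hypothetical embedding width

        long oneCopy = rawVectorBytes(numDocs, dimension);
        double gib = 1L << 30;

        // With stored="true" today, roughly this payload is kept twice.
        System.out.printf("one copy:   %.1f GiB%n", oneCopy / gib);
        System.out.printf("two copies: %.1f GiB%n", 2 * oneCopy / gib);
    }
}
```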

This change adds an opt-in mode for DenseVectorField that preserves 
stored-field semantics for normal document retrieval while avoiding the 
redundant stored-field copy of the vector payload. Instead, Solr reconstructs 
the returned vector value from Lucene vector data at fetch time.

Key points:
 * Adds an opt-in field type/property for derived vector retrieval.
 * Avoids writing redundant stored vector bytes at index time.
 * Extends document fetch to materialize vector values from Lucene vector 
readers.
 * Keeps existing behavior unchanged unless the new option is enabled.
 * Documents the fetch-time tradeoff and recommends caution for hot paths that 
return vectors frequently, especially fl=*.
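
The fetch-time behavior in the key points above can be pictured with a toy model. This is only a sketch: plain in-memory maps stand in for Lucene stored fields and Lucene vector data, and every name in it (DerivedVectorFetchSketch, add, fetch, vectorReads) is hypothetical rather than part of Solr or of the proposed patch. It shows the vector being written once, rebuilt only when a fetch asks for it, and why fl=* pays that cost on every document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only: maps stand in for Lucene stored fields and vector data. */
public final class DerivedVectorFetchSketch {
    private final Map<Integer, Map<String, Object>> storedFields = new HashMap<>();
    private final Map<Integer, float[]> vectorData = new HashMap<>();
    private final String vectorField;
    private int vectorReads = 0; // counts fetch-time reconstructions

    public DerivedVectorFetchSketch(String vectorField) {
        this.vectorField = vectorField;
    }

    /** Index a document: the vector goes to vector data only, never to stored fields. */
    public void add(int docId, Map<String, Object> otherFields, float[] vector) {
        storedFields.put(docId, new LinkedHashMap<>(otherFields));
        vectorData.put(docId, vector.clone());
    }

    /** Fetch a document; requested == null models fl=* (return every field). */
    public Map<String, Object> fetch(int docId, Set<String> requested) {
        Map<String, Object> doc = new LinkedHashMap<>();
        Map<String, Object> stored = storedFields.getOrDefault(docId, Map.of());
        for (Map.Entry<String, Object> e : stored.entrySet()) {
            if (requested == null || requested.contains(e.getKey())) {
                doc.put(e.getKey(), e.getValue());
            }
        }
        boolean wantsVector = requested == null || requested.contains(vectorField);
        if (wantsVector && vectorData.containsKey(docId)) {
            vectorReads++; // the extra per-document cost paid at fetch time
            doc.put(vectorField, vectorData.get(docId).clone());
        }
        return doc;
    }

    public int vectorReads() {
        return vectorReads;
    }

    public static void main(String[] args) {
        DerivedVectorFetchSketch index = new DerivedVectorFetchSketch("embedding");
        index.add(1, Map.of("id", "doc-1"), new float[] {0.1f, 0.2f, 0.3f});

        index.fetch(1, Set.of("id")); // vector not requested, so no reconstruction
        index.fetch(1, null);         // fl=* pulls the vector as well
        System.out.println("vector reads: " + index.vectorReads());
    }
}
```

In this model, a request that names only the fields it needs never touches the vector data, which is the reason for the caution about fl=* on hot paths.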

Initial scope:
 * Single-valued vector fields only; derived retrieval for multivalued vector 
fields is not supported in this change.