[jira] [Commented] (SOLR-17775) Optimize ValueSourceAugmenter

David Smiley (Jira) Sun, 01 Jun 2025 22:53:06 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955566#comment-17955566
 ]


David Smiley commented on SOLR-17775:
-------------------------------------

In practice, how much faster is this optimization for you?  I would not 
have guessed the gain to be considerable.  I suppose this saves the 
ValueSource.getValues call so that it's per segment instead of per doc.  Note 
that caching the values will use up some memory... albeit maybe not too much 
if a ValueSource is just loading a number, say.

Perhaps it would make more sense for Solr to process the documents (and thus 
all DocTransformers and also field value retrieval) in doc ID order, and only 
then sort them into the desired order?  See DocsStreamer.

I could see doing this with a chunking strategy to cap risks of using too much 
memory.
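
A rough sketch of what chunked doc-ID-order processing could look like, in 
plain Java with no Solr or Lucene classes. Everything here is hypothetical 
(ChunkedDocOrderSketch, process, loadDoc); it is not how DocsStreamer works 
today. It only shows the shape of the idea: within each chunk, visit documents 
in ascending doc ID, but hand results back in the original ranked order, so 
that at most chunkSize results are buffered at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

class ChunkedDocOrderSketch {
    /**
     * rankedDocIds: global doc IDs in the order the response must be written.
     * loadDoc: hypothetical stand-in for field retrieval plus transformers.
     */
    static <T> List<T> process(int[] rankedDocIds, int chunkSize, IntFunction<T> loadDoc) {
        List<T> out = new ArrayList<>(rankedDocIds.length);
        for (int start = 0; start < rankedDocIds.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, rankedDocIds.length);

            // Rank positions of this chunk, reordered by ascending doc ID.
            Integer[] positions = new Integer[end - start];
            for (int i = 0; i < positions.length; i++) {
                positions[i] = start + i;
            }
            Arrays.sort(positions, Comparator.comparingInt((Integer p) -> rankedDocIds[p]));

            // Buffer holds at most chunkSize results, indexed by rank within the chunk.
            List<T> buffer = new ArrayList<>(Collections.<T>nCopies(end - start, null));
            for (int p : positions) {
                buffer.set(p - start, loadDoc.apply(rankedDocIds[p]));
            }

            // Emit the chunk in the original ranked order.
            out.addAll(buffer);
        }
        return out;
    }
}
```

A smaller chunkSize caps memory more tightly but gives shorter runs of 
sequential access per segment, so the right value would need measuring.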

> Optimize ValueSourceAugmenter
> -----------------------------
>
>                 Key: SOLR-17775
>                 URL: https://issues.apache.org/jira/browse/SOLR-17775
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Yura
>            Priority: Minor
>
> h3. Problem
> ValueSourceAugmenter currently calculates function values on-demand during 
> transform(), performing expensive binary searches and reader lookups for each 
> document individually.
> h3. Solution
> Pre-calculate function values for all result set documents during 
> setContext() by:
>  * Collecting and sorting document IDs from DocList
>  * Sequential iteration through sorted documents to calculate values once per 
> reader segment
>  * Storing results in hash map for O(1) lookup during transform()
>  * Fallback to on-demand calculation for documents outside the pre-calculated 
> set (RTG cases)
> h3. Performance Benefit
> Replaces repeated "find document at position N" operations (binary search per 
> document) with efficient "get next document" iteration (sequential processing 
> within reader segments), significantly reducing lookup overhead.
> h3. Compatibility
> Maintains full backward compatibility through fallback mechanism for edge 
> cases.
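
For illustration, the pre-calculation described in the Solution section can be 
sketched in plain Java as below. This is not the patch itself: Segment, 
SegmentValues and precompute are hypothetical stand-ins (Segment for a reader 
segment, SegmentValues for what ValueSource.getValues returns), and the 
"function" is a toy that returns 10 times the global doc ID. It assumes the 
segments are ordered by docBase and cover every doc ID in the result set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PrecomputeSketch {
    /** Hypothetical stand-in for the per-segment values object. */
    interface SegmentValues {
        double valueAt(int segmentLocalDoc);
    }

    /** Hypothetical segment covering global doc IDs [docBase, docBase + maxDoc). */
    static final class Segment {
        final int docBase;
        final int maxDoc;
        int valuesCreated = 0; // counts the per-segment setup calls

        Segment(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }

        SegmentValues getValues() {
            valuesCreated++;
            return local -> (docBase + local) * 10.0; // toy function value
        }
    }

    /** Computes values for all result docs, creating values once per touched segment. */
    static Map<Integer, Double> precompute(int[] resultDocIds, Segment[] segments) {
        int[] sorted = resultDocIds.clone();
        Arrays.sort(sorted);

        Map<Integer, Double> cache = new HashMap<>();
        int seg = 0;
        SegmentValues values = null;
        for (int doc : sorted) {
            // Doc IDs ascend, so the segment index only ever moves forward;
            // no per-document binary search is needed.
            while (doc >= segments[seg].docBase + segments[seg].maxDoc) {
                seg++;
                values = null;
            }
            if (values == null) {
                values = segments[seg].getValues();
            }
            cache.put(doc, values.valueAt(doc - segments[seg].docBase));
        }
        return cache;
    }
}
```

During transform(), a lookup would then be a cache.get(docId), with a miss 
falling back to the on-demand path (the RTG case mentioned above). The map 
holds one boxed entry per result document, which is the memory cost raised in 
the comment.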



