[
https://issues.apache.org/jira/browse/SOLR-17775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955820#comment-17955820
]
Yura commented on SOLR-17775:
-----------------------------
[~dsmiley], the improvement was quite significant, and it is especially
noticeable when you request more than 100 rows. I don’t think it adds much
memory usage. The extra memory is needed only for the retrieved document IDs,
of which there are usually few (<1000).
The main gain is not in saving calls to {{getValues}}, but in retrieving
values in order. Lucene data structures are optimized for iteration, not random
seek. Internally, this approach uses {{DocIterator}} jumps from doc[N-1] to
doc[N], instead of jumping from 0 to doc[N]. This is much cheaper.
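To illustrate the difference, here is a toy model in plain Java. It is only a sketch: {{ForwardOnlyValues}} and {{IterationCostSketch}} are hypothetical names, not Lucene or Solr classes, and the step counter is a rough stand-in for real lookup cost (actual Lucene iterators can skip ahead more cheaply than one document at a time).

```java
import java.util.Arrays;

/** Toy forward-only per-document value iterator (hypothetical, not a Lucene class). */
final class ForwardOnlyValues {
    private final long[] values; // one value per doc ID
    private int position = -1;   // current doc ID; can only move forward
    private long steps = 0;      // docs stepped over so far, a rough cost proxy

    ForwardOnlyValues(long[] values) {
        this.values = values;
    }

    /** Moves forward to target and returns its value; seeking backwards is rejected. */
    long advanceAndGet(int target) {
        if (target < position) {
            throw new IllegalStateException("cannot seek backwards");
        }
        steps += target - position;
        position = target;
        return values[target];
    }

    long steps() {
        return steps;
    }
}

final class IterationCostSketch {
    /** A fresh iterator per document: every lookup starts again from the beginning. */
    static long costWithFreshIteratorPerDoc(long[] values, int[] docIds) {
        long total = 0;
        for (int doc : docIds) {
            ForwardOnlyValues it = new ForwardOnlyValues(values);
            it.advanceAndGet(doc);
            total += it.steps();
        }
        return total;
    }

    /** Sorted doc IDs with one iterator: each lookup moves from doc[N-1] to doc[N]. */
    static long costWithSortedSinglePass(long[] values, int[] docIds) {
        int[] sorted = docIds.clone();
        Arrays.sort(sorted);
        ForwardOnlyValues it = new ForwardOnlyValues(values);
        for (int doc : sorted) {
            it.advanceAndGet(doc);
        }
        return it.steps();
    }
}
```

In this model the single sorted pass never steps over more documents than the distance to the largest requested doc ID, while the per-document approach pays the full distance from the start for every lookup.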
> Optimize ValueSourceAugmenter
> -----------------------------
>
> Key: SOLR-17775
> URL: https://issues.apache.org/jira/browse/SOLR-17775
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Yura
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. Problem
> ValueSourceAugmenter currently calculates function values on-demand during
> transform(), performing expensive binary searches and reader lookups for each
> document individually.
> h3. Solution
> Pre-calculate function values for all result set documents during
> setContext() by:
> * Collecting and sorting document IDs from the DocList
> * Iterating sequentially through the sorted documents, so that values are
> calculated in one pass per reader segment
> * Storing the results in a hash map for O(1) lookup during transform()
> * Falling back to on-demand calculation for documents outside the
> pre-calculated set (RTG cases)
> h3. Performance Benefit
> Replaces repeated "find document at position N" operations (binary search per
> document) with efficient "get next document" iteration (sequential processing
> within reader segments), significantly reducing lookup overhead.
> h3. Compatibility
> Maintains full backward compatibility through fallback mechanism for edge
> cases.
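The scheme in the quoted description can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the code from the pull request: {{PrecomputedValuesSketch}}, {{precompute}}, {{valueFor}} and the per-segment evaluators are hypothetical stand-ins for setContext(), transform() and the real function values, and segments are modelled only by their starting doc IDs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

final class PrecomputedValuesSketch {
    private final Map<Integer, Double> cache = new HashMap<>();
    // Assumption: first doc ID of each segment, ascending, with segmentStarts[0] == 0.
    private final int[] segmentStarts;
    // Assumption: one evaluator per segment, taking a segment-local doc ID.
    private final IntToDoubleFunction[] segmentFunctions;
    int fallbackCalls = 0; // how often the on-demand path was taken

    PrecomputedValuesSketch(int[] segmentStarts, IntToDoubleFunction[] segmentFunctions) {
        this.segmentStarts = segmentStarts;
        this.segmentFunctions = segmentFunctions;
    }

    /** Plays the role of setContext(): sort the doc IDs, then walk them in order. */
    void precompute(int[] resultDocIds) {
        int[] sorted = resultDocIds.clone();
        Arrays.sort(sorted);
        int seg = 0;
        for (int doc : sorted) {
            // Doc IDs are ascending, so the segment index only ever moves forward.
            while (seg + 1 < segmentStarts.length && doc >= segmentStarts[seg + 1]) {
                seg++;
            }
            cache.put(doc, segmentFunctions[seg].applyAsDouble(doc - segmentStarts[seg]));
        }
    }

    /** Plays the role of transform(): hash lookup first, on-demand fallback otherwise. */
    double valueFor(int doc) {
        Double cached = cache.get(doc);
        if (cached != null) {
            return cached;
        }
        fallbackCalls++;
        return computeOnDemand(doc);
    }

    /** Per-document path: binary search for the segment, then evaluate. */
    private double computeOnDemand(int doc) {
        int idx = Arrays.binarySearch(segmentStarts, doc);
        int seg = idx >= 0 ? idx : -idx - 2; // segment whose start is <= doc
        return segmentFunctions[seg].applyAsDouble(doc - segmentStarts[seg]);
    }
}
```

Documents that were part of the result set are served from the map, and anything else (for example an RTG lookup) still gets a correct value through the slower per-document path.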