[
https://issues.apache.org/jira/browse/SOLR-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924296#comment-15924296
]
David Smiley commented on SOLR-10273:
-------------------------------------
bq. Is there a way to check this while building the Document from the
SolrInputDocument instead (maybe cheaper?)
I briefly contemplated having DocumentBuilder internally collect the Lucene
IndexableField instances into some other internal Doc-like inner class that
could maintain the largest value as it goes. But that seems over-engineered,
and the post-process scanning code later seems pretty quick to me.
bq. For multi-valued fields, perhaps we should be using the sum of the multiple
fields? As a generalization we could also consider sorting by size, not just
picking out the largest single field.
In the entire Lucene+Solr codebase, the only places where
StoredFieldVisitor.Status.STOP is actually used are the Unified/Postings
highlighters, and only when one field is being highlighted. So if there were an
overall large document (>16KB), and if we didn't move the 2nd largest value to
the end, and if you wanted to highlight on this 2nd largest value alone, and if
there were some additional sizable fields in between this 2nd largest value and
the last one... then yes, we're doing more work. I don't think it's worth
bothering with right now?
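To illustrate why field order matters once a visitor returns STOP, here is a minimal stand-alone sketch. It is not Lucene code: the Status enum only mirrors the three values of StoredFieldVisitor.Status, and SingleFieldVisitor and visit() are hypothetical names that model a reader walking stored fields in order for a single requested field.

```java
import java.util.ArrayList;
import java.util.List;

final class StopSketch {
    // Mirrors the three outcomes of Lucene's StoredFieldVisitor.Status.
    enum Status { YES, NO, STOP }

    /** Hypothetical visitor that wants exactly one field and then stops. */
    static final class SingleFieldVisitor {
        private final String wanted;
        private boolean found = false;

        SingleFieldVisitor(String wanted) {
            this.wanted = wanted;
        }

        Status needsField(String name) {
            if (found) {
                return Status.STOP; // nothing after the wanted field is needed
            }
            if (name.equals(wanted)) {
                found = true;
                return Status.YES;
            }
            return Status.NO;
        }
    }

    /** Walks field names in stored order; returns those passed over or read before STOP. */
    static List<String> visit(List<String> storedOrder, SingleFieldVisitor visitor) {
        List<String> touched = new ArrayList<>();
        for (String name : storedOrder) {
            if (visitor.needsField(name) == Status.STOP) {
                break; // remaining fields are never examined
            }
            touched.add(name);
        }
        return touched;
    }
}
```

In this model, requesting "title" from the order (id, title, body) never reaches "body", while the order (id, body, title) has to pass over "body" first. The second case is the extra work described above for a sizable field sitting before the one being highlighted.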
BTW, when I commit this patch, I'll change the min size threshold to 4KB.
> Re-order largest field values last in Lucene Document
> -----------------------------------------------------
>
> Key: SOLR-10273
> URL: https://issues.apache.org/jira/browse/SOLR-10273
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: 6.5
>
> Attachments: SOLR_10273_DocumentBuilder_move_longest_to_last.patch
>
>
> (part of umbrella issue SOLR-10117)
> In Solr's {{DocumentBuilder}}, at the very end, we should move the field
> value(s) associated with the largest field (assuming "stored") to be last.
> Lucene's default stored value codec can avoid reading and decompressing the
> last field value when it's not requested. (As of LUCENE-6898).
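The reordering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: StoredValue is a hypothetical stand-in for a stored Lucene field, moveLargestLast is an invented name, and string length in chars is used as a crude proxy for value size.

```java
import java.util.ArrayList;
import java.util.List;

final class LargestFieldLastSketch {
    /** Hypothetical stand-in for a stored field value; not Lucene's IndexableField. */
    static final class StoredValue {
        final String field;
        final String value;

        StoredValue(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    /**
     * Returns a copy of doc in which every value of the field owning the single
     * largest value is moved to the end, keeping relative order otherwise.
     * If no value has at least minSize chars, the order is left unchanged.
     */
    static List<StoredValue> moveLargestLast(List<StoredValue> doc, int minSize) {
        String largestField = null;
        int largestSize = minSize - 1; // only values of at least minSize qualify
        for (StoredValue v : doc) {
            if (v.value.length() > largestSize) {
                largestSize = v.value.length();
                largestField = v.field;
            }
        }
        if (largestField == null) {
            return new ArrayList<>(doc);
        }
        List<StoredValue> head = new ArrayList<>();
        List<StoredValue> tail = new ArrayList<>();
        for (StoredValue v : doc) {
            if (v.field.equals(largestField)) {
                tail.add(v); // all values of a multi-valued field move together
            } else {
                head.add(v);
            }
        }
        head.addAll(tail);
        return head;
    }
}
```

A single scan after the document is built keeps this simple; moving all values of a multi-valued field together matches the "field value(s)" wording in the description.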
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)