[
https://issues.apache.org/jira/browse/LUCENE-7500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044427#comment-16044427
]
David Smiley commented on LUCENE-7500:
--------------------------------------
bq. WeightedSpanTermExtractor
It's kinda complicated. So... See {{SrndTermQuery.visitMatchingTerms}} (part
of the surround query parser). It calls
{{MultiFields.getTerms(reader,fieldName)}} which works in terms of
{{MultiFields.getFields(reader)}} -- at least it does today without the patch.
And MultiFields.getFields today gets the Fields off the LeafReader. With that
gone, in the patch getFields needs to consult FieldInfos to see which fields
have terms. In the latest patch (2nd iteration), I improved
MultiFields.getTerms to not go through getFields first, which is a nice
optimization in its own right. So it's no longer pertinent for the surround
query parser whether the highlighter's phrase handling has a leaf reader that
implements getFieldInfos or not (I could put that UnsupportedOperationException
back in). It's hard to say for sure what very advanced queries I've never seen
before might require... but the highlighters throwing an exception here
guarantees queries that use getFieldInfos won't work, whereas letting
getFieldInfos through at least gives them a chance to work. I think until such use
cases present themselves, we should just let getFieldInfos delegate. Perhaps
the most correct option is to synthesize a FieldInfos that properly
"looks" like what this filtered leaf reader is exposing. It's dubious whether
we should bother writing this code though.
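
To make the three options concrete (throw, delegate, synthesize), here is a rough sketch. Note that it does not use the real Lucene classes: {{SimpleLeafReader}}, {{BaseReader}} and the three filter readers are hypothetical stand-ins, and {{fieldsWithTerms()}} merely plays the role that getFieldInfos would play. It also assumes the filtered reader exposes a single field, which is a simplification.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a leaf reader; NOT the Lucene API.
interface SimpleLeafReader {
    // Plays the role of getFieldInfos(): which fields claim to have terms.
    Set<String> fieldsWithTerms();
    // Plays the role of "getTerms(field) != null".
    boolean hasTerms(String field);
}

// The unfiltered reader over the whole (pretend) index.
final class BaseReader implements SimpleLeafReader {
    private final Set<String> fields;
    BaseReader(Set<String> fields) { this.fields = new LinkedHashSet<>(fields); }
    public Set<String> fieldsWithTerms() { return new LinkedHashSet<>(fields); }
    public boolean hasTerms(String field) { return fields.contains(field); }
}

// Option 1: throw. Any query that asks for field metadata is guaranteed to fail.
final class ThrowingFilterReader implements SimpleLeafReader {
    private final SimpleLeafReader in;
    private final String onlyField;
    ThrowingFilterReader(SimpleLeafReader in, String onlyField) {
        this.in = in;
        this.onlyField = onlyField;
    }
    public Set<String> fieldsWithTerms() { throw new UnsupportedOperationException(); }
    public boolean hasTerms(String field) {
        return field.equals(onlyField) && in.hasTerms(field);
    }
}

// Option 2: delegate. Simple, but the metadata may list fields the filter hides.
final class DelegatingFilterReader implements SimpleLeafReader {
    private final SimpleLeafReader in;
    private final String onlyField;
    DelegatingFilterReader(SimpleLeafReader in, String onlyField) {
        this.in = in;
        this.onlyField = onlyField;
    }
    public Set<String> fieldsWithTerms() { return in.fieldsWithTerms(); }
    public boolean hasTerms(String field) {
        return field.equals(onlyField) && in.hasTerms(field);
    }
}

// Option 3: synthesize. The metadata matches exactly what the filter exposes.
final class SynthesizingFilterReader implements SimpleLeafReader {
    private final SimpleLeafReader in;
    private final String onlyField;
    SynthesizingFilterReader(SimpleLeafReader in, String onlyField) {
        this.in = in;
        this.onlyField = onlyField;
    }
    public Set<String> fieldsWithTerms() {
        return in.hasTerms(onlyField)
            ? Collections.singleton(onlyField)
            : Collections.emptySet();
    }
    public boolean hasTerms(String field) {
        return field.equals(onlyField) && in.hasTerms(field);
    }
}
```

With a base reader over "title" and "body" filtered down to "body", the delegating variant still reports "title" in its metadata even though it has no terms for it, while the synthesizing variant reports only "body". That mismatch is the cost of delegating; the extra class is the cost of synthesizing.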
> Remove Fields.java in lieu of LeafReader.getTerms(fieldName)
> ------------------------------------------------------------
>
> Key: LUCENE-7500
> URL: https://issues.apache.org/jira/browse/LUCENE-7500
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: David Smiley
> Assignee: David Smiley
> Fix For: master (7.0)
>
> Attachments: LUCENE_7500_avoid_leafReader_fields.patch,
> LUCENE_7500_avoid_leafReader_fields.patch,
> LUCENE_7500_Remove_LeafReader_fields.patch,
> LUCENE_7500_Remove_LeafReader_fields.patch
>
>
> {{Fields}} seems like a pointless intermediary between the {{LeafReader}} and
> {{Terms}}. Why not have {{LeafReader.getTerms(fieldName)}} instead? One loses
> the ability to get the count and iterate over indexed fields, but it's not
> clear what real use-cases are for that and such rare needs could figure that
> out with FieldInfos.
> [~mikemccand] pointed out that we'd probably need to re-introduce a
> {{TermVectors}} class since TV's are row-oriented not column-oriented. IMO
> they should be column-oriented but that'd be a separate issue.
> _(p.s. I'm lacking time to do this w/i the next couple months so if someone
> else wants to tackle it then great)_