[
https://issues.apache.org/jira/browse/LUCENE-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alan Woodward updated LUCENE-7627:
----------------------------------
Attachment: LUCENE-7627.patch
The immediate problem we faced in marple can be fixed by adding
SortedDocValues#termsEnum(CompiledAutomaton) and
SortedSetDocValues#termsEnum(CompiledAutomaton) methods, as in this patch.
I think the next step could be to address the TODO in
CompiledAutomaton#getTermsEnum(Terms), adding an optional startTerm parameter.
Then Terms.intersect() can delegate to that method instead of throwing
exceptions.
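To make the startTerm idea concrete, here is a minimal, Lucene-free sketch of the delegation. SortedTerms and Compiled are hypothetical stand-ins for Terms and CompiledAutomaton, terms are plain Strings rather than BytesRefs, and the two-argument getTermsEnum is the proposed method, not an existing one. It assumes the same contract as Terms.intersect(): a non-null startTerm means only terms strictly greater than it are returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Predicate;

final class StartTermSketch {

    // Hypothetical stand-in for CompiledAutomaton: just a predicate over terms.
    static final class Compiled {
        final Predicate<String> accepts;

        Compiled(Predicate<String> accepts) {
            this.accepts = accepts;
        }

        // Models the proposed getTermsEnum(Terms, startTerm).
        // A null startTerm means "start from the first term"; otherwise only
        // terms strictly greater than startTerm are considered.
        List<String> getTermsEnum(SortedTerms terms, String startTerm) {
            NavigableSet<String> candidates =
                (startTerm == null) ? terms.all : terms.all.tailSet(startTerm, false);
            List<String> out = new ArrayList<>();
            for (String term : candidates) {
                if (accepts.test(term)) {
                    out.add(term);
                }
            }
            return out;
        }
    }

    // Hypothetical stand-in for Terms: a sorted set of strings.
    // (Lucene orders terms by unsigned bytes; String order matches for ASCII.)
    static final class SortedTerms {
        final TreeSet<String> all = new TreeSet<>();

        SortedTerms(String... terms) {
            for (String term : terms) {
                all.add(term);
            }
        }

        // intersect() simply delegates, so there is no automaton type left to
        // reject with an exception.
        List<String> intersect(Compiled compiled, String startTerm) {
            return compiled.getTermsEnum(this, startTerm);
        }
    }

    public static void main(String[] args) {
        SortedTerms terms = new SortedTerms("apple", "apricot", "avocado", "banana");
        Compiled startsWithA = new Compiled(t -> t.startsWith("a"));
        System.out.println(terms.intersect(startsWithA, null));
        System.out.println(terms.intersect(startsWithA, "apple"));
    }
}
```

The point of the sketch is only that once the type-aware method accepts a start term, intersect() needs no type check of its own.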
That leaves AutomatonTermsEnum, which I think can be fixed by making the
constructor private, and adding factory methods to do the right thing depending
on the automaton type. Plus some javadocs which point out that if you have a
Terms, then calling intersect directly on the Terms instance is likely to be
more efficient than calling iterator() and passing that to ATE.
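A rough sketch of the private-constructor-plus-factory shape, again without depending on Lucene. Compiled and FilteredTerms are hypothetical stand-ins for CompiledAutomaton and AutomatonTermsEnum, the create() name is an assumption, and terms are modelled as a list of Strings. Only the four type names (NONE, ALL, SINGLE, NORMAL) are taken from CompiledAutomaton.AUTOMATON_TYPE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class AutomatonFilterSketch {

    // Same four cases as CompiledAutomaton.AUTOMATON_TYPE.
    enum Type { NONE, ALL, SINGLE, NORMAL }

    // Hypothetical stand-in for CompiledAutomaton.
    static final class Compiled {
        final Type type;
        final String singleTerm;          // used only when type == SINGLE
        final Predicate<String> matcher;  // used only when type == NORMAL

        private Compiled(Type type, String singleTerm, Predicate<String> matcher) {
            this.type = type;
            this.singleTerm = singleTerm;
            this.matcher = matcher;
        }

        static Compiled none() { return new Compiled(Type.NONE, null, null); }
        static Compiled all() { return new Compiled(Type.ALL, null, null); }
        static Compiled single(String term) { return new Compiled(Type.SINGLE, term, null); }
        static Compiled normal(Predicate<String> matcher) { return new Compiled(Type.NORMAL, null, matcher); }
    }

    // Hypothetical stand-in for AutomatonTermsEnum: the filtering logic is only
    // valid for NORMAL, so the constructor is private and callers go through create().
    static final class FilteredTerms {
        private final List<String> terms;
        private final Predicate<String> matcher;

        private FilteredTerms(List<String> terms, Predicate<String> matcher) {
            this.terms = terms;
            this.matcher = matcher;
        }

        private List<String> collect() {
            List<String> out = new ArrayList<>();
            for (String term : terms) {
                if (matcher.test(term)) {
                    out.add(term);
                }
            }
            return out;
        }

        // Factory method: picks the right strategy for each automaton type
        // instead of failing at runtime on anything that is not NORMAL.
        static List<String> create(List<String> terms, Compiled compiled) {
            switch (compiled.type) {
                case NONE:
                    return Collections.<String>emptyList();
                case ALL:
                    return terms;
                case SINGLE:
                    return terms.contains(compiled.singleTerm)
                        ? Collections.singletonList(compiled.singleTerm)
                        : Collections.<String>emptyList();
                case NORMAL:
                    return new FilteredTerms(terms, compiled.matcher).collect();
                default:
                    throw new IllegalStateException("unknown type: " + compiled.type);
            }
        }
    }
}
```

With this shape the incorrect usage cannot be written at all, because the NORMAL-only constructor is unreachable from outside the class.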
> Improve TermsEnum automaton filtering APIs
> ------------------------------------------
>
> Key: LUCENE-7627
> URL: https://issues.apache.org/jira/browse/LUCENE-7627
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Attachments: LUCENE-7627.patch
>
>
> To filter a TermsEnum by a CompiledAutomaton, we currently have a number of
> different possibilities:
> * Terms.intersect(CompiledAutomaton, BytesRef) - efficient, but only works on
> NORMAL type automata
> * CompiledAutomaton.getTermsEnum(Terms) - efficient, works on all automaton
> types, but requires a Terms instead of a TermsEnum, so it is no use for e.g.
> SortedDocValues.termsEnum()
> * AutomatonTermsEnum - takes a TermsEnum, so it's more general than the Terms
> methods above, but again only works on NORMAL automata
> It's easy to do the wrong thing here, and at the moment we only guard against
> incorrect usage via runtime checks (see e.g. LUCENE-7576,
> https://github.com/flaxsearch/marple/issues/24). We should try to clean
> this up.