Hm, how about something like this, on CompiledAutomaton:
  public TermsEnum getTermsEnum(TermsEnum te) throws IOException {
    switch (type) {
      case NONE:
        return TermsEnum.EMPTY;
      case ALL:
        return te;
      case SINGLE:
        return new SingleTermsEnum(te, term);
      case NORMAL:
        return new AutomatonTermsEnum(te, this);
      default:
        // unreachable
        throw new RuntimeException("unhandled case");
    }
  }
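
To show the dispatch on its own, here is a rough, self-contained sketch that does not depend on Lucene. Everything in it is a hypothetical stand-in: plain Iterator<String> plays the role of TermsEnum, a Predicate<String> plays the role of the ByteRunAutomaton, and the class and method names are made up for illustration. It also filters eagerly, whereas the real enums seek lazily. The point it tries to make is that the SINGLE case never touches the matcher, so a fixed-string automaton with no run automaton is fine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the proposed dispatch; not Lucene API.
final class AutomatonDispatchSketch {

    // Mirrors the four cases handled in the switch above.
    enum Type { NONE, ALL, SINGLE, NORMAL }

    // Stand-in for getTermsEnum(TermsEnum): picks a strategy per automaton type.
    // `matcher` may be null for SINGLE, just as a fixed-string automaton
    // has no run automaton to consult.
    static Iterator<String> filter(Type type, Iterator<String> terms,
                                   String single, Predicate<String> matcher) {
        switch (type) {
            case NONE:
                return Collections.emptyIterator();     // nothing matches
            case ALL:
                return terms;                           // everything matches, no wrapping
            case SINGLE:
                return matching(terms, single::equals); // exact term only
            case NORMAL:
                return matching(terms, matcher);        // general automaton case
            default:
                throw new IllegalStateException("unhandled case: " + type);
        }
    }

    // Eagerly collects the accepted terms (a simplification of a filtering enum).
    static Iterator<String> matching(Iterator<String> terms, Predicate<String> accept) {
        List<String> out = new ArrayList<>();
        while (terms.hasNext()) {
            String t = terms.next();
            if (accept.test(t)) {
                out.add(t);
            }
        }
        return out.iterator();
    }

    // Drains an iterator into a list so results are easy to inspect.
    static List<String> toList(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("apple", "apricot", "banana");
        // SINGLE: the matcher is null and is never used.
        System.out.println(toList(filter(Type.SINGLE, dict.iterator(), "banana", null)));
        // NORMAL: a prefix predicate stands in for the compiled automaton.
        System.out.println(toList(filter(Type.NORMAL, dict.iterator(), null,
                t -> t.startsWith("ap"))));
    }
}
```

In marple the caller would then presumably just pass the docvalues TermsEnum to the new method and iterate the result, without needing to know which case applies.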
Alan Woodward
www.flax.co.uk
> On 6 Jan 2017, at 19:16, Michael McCandless <[email protected]> wrote:
>
> These automaton intersection APIs are frustrating with all the special
> case handling... Ideas welcome!
>
> We've had similar challenges with them in the past, when a user
> invoked Terms.intersect directly instead of via CompiledAutomaton:
> https://issues.apache.org/jira/browse/LUCENE-7576
>
> The problem is CompiledAutomaton specializes certain cases (all
> strings match, no strings match, single term) and sidesteps
> Terms.intersect for those cases.
>
> We should fix AutomatonTermsEnum public ctor w/ the same checks
> (insist on a NORMAL case) so you don't hit assert failures, or, worse
> ... I'll do that.
>
> I think a new CompiledAutomaton.intersect taking TermsEnum would be
> tricky in general because it relies on the (efficient) Terms.intersect
> to handle the NORMAL case well, but we can't invoke that from a
> TermsEnum.
>
> In the SINGLE case, could you use SingleTermsEnum, passing the
> TermsEnum from your doc values, and the term from the
> CompiledAutomaton? Would that suffice as a workaround?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Jan 6, 2017 at 11:17 AM, Alan Woodward <[email protected]> wrote:
>> We’ve hit an issue while developing marple, where we want to have the
>> ability to filter the values from a SortedDocValues terms dictionary.
>> Normally you’d create a CompiledAutomaton from the filter string, and then
>> call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms
>> instance, we instead have a TermsEnum.
>>
>> Using AutomatonTermsEnum to wrap the TermsEnum works in most cases here, but
>> if the CompiledAutomaton in question is a fixed string, then we get
>> assertion failures, because ATE uses the compiled automaton’s internal
>> ByteRunAutomaton for filtering, and fixed-string automata don’t have one.
>>
>> Is there a work-around that I’m missing here? Or should I maybe open a JIRA
>> to add a #getTermsEnum(TermsEnum) method to CompiledAutomaton?
>>
>> Alan Woodward
>> www.flax.co.uk
>>
>>
>