[ 
https://issues.apache.org/jira/browse/LUCENE-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368791#comment-17368791
 ] 

Jim Ferenczi commented on LUCENE-9204:
--------------------------------------

Thanks for sharing [~mgibney]! 

> However, this still leaves SynonymGraphTokenFilterFactory and 
> WordDelimiterGraphTokenFilterFactory (in Elasticsearch) as potentially 
> triggering this kind of expansion (in a manner identical to what's reported 
> in the above-referenced thread from the solr users list).

Only phrase queries are affected, though. For this type of query I expect both 
the number of expansions and the number of terms to be low. We generate the 
query lazily, so the max clause count check ensures that we don't build the 
full query if it's gigantic.
We could also revive the optimization that we implemented with Spans, using 
Intervals instead. However, it's not as easy as it looks, since Intervals use a 
different scoring mechanism.
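
As a rough illustration of the lazy generation point above, here is a minimal 
sketch. It is not Lucene's actual code: the class LazyExpansionSketch, its 
methods, the TooManyExpansions exception, and the simplified "list of 
alternatives per position" token graph are all hypothetical stand-ins. It only 
shows why checking a clause limit while consuming a lazy iterator avoids 
building an exponentially large expansion.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch; none of these names come from Lucene. */
final class LazyExpansionSketch {

    /** Stand-in for a "too many clauses" failure. */
    static final class TooManyExpansions extends RuntimeException {
        TooManyExpansions(int limit) {
            super("more than " + limit + " expansions");
        }
    }

    /**
     * Lazily enumerates every path through a simplified token graph, where
     * alternatives.get(i) holds the terms allowed at position i.
     * Only one path is materialized per call to next().
     */
    static Iterator<List<String>> paths(List<List<String>> alternatives) {
        final int n = alternatives.size();
        final int[] cursor = new int[n]; // odometer over the alternatives
        return new Iterator<List<String>>() {
            // A position with no alternatives means there is no path at all.
            private boolean done = alternatives.stream().anyMatch(List::isEmpty);

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public List<String> next() {
                if (done) {
                    throw new NoSuchElementException();
                }
                List<String> path = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    path.add(alternatives.get(i).get(cursor[i]));
                }
                // Advance the odometer, rightmost position first.
                int pos = n - 1;
                while (pos >= 0 && ++cursor[pos] == alternatives.get(pos).size()) {
                    cursor[pos--] = 0;
                }
                done = pos < 0; // every position wrapped around: no paths left
                return path;
            }
        };
    }

    /**
     * Collects expanded phrases, failing as soon as one more than maxClauses
     * would be needed instead of enumerating the whole expansion first.
     */
    static List<List<String>> expand(List<List<String>> alternatives, int maxClauses) {
        List<List<String>> phrases = new ArrayList<>();
        Iterator<List<String>> it = paths(alternatives);
        while (it.hasNext()) {
            if (phrases.size() == maxClauses) {
                throw new TooManyExpansions(maxClauses);
            }
            phrases.add(it.next());
        }
        return phrases;
    }
}
```

With 20 positions of two alternatives each there are 2^20 possible phrases, but 
expand(alternatives, 16) in this sketch materializes only 16 of them before 
failing.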

> This is a spooky result!  I did not know our IntervalQuery for the 
> disjunctive case had exponential cost in the number of clauses.

This only happens in a special case where duplicate terms appear at different 
positions. It's not ideal, but in this situation we favored correctness, which 
is always an issue with positional queries. I also wonder if we could test 
something that's not an edge case but a more realistic query with duplicate 
terms. Right now we are comparing the performance of queries that return 
different result sets, so it's difficult to conclude anything.



> Move span queries to the queries module
> ---------------------------------------
>
>                 Key: LUCENE-9204
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9204
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: main (9.0)
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We have a slightly odd situation currently, with two parallel query 
> structures for building complex positional queries: the long-standing span 
> queries, in core; and interval queries, in the queries module.  Given that 
> interval queries solve at least some of the problems we've had with Spans, I 
> think we should be pushing users more towards these implementations.  It's 
> counter-intuitive to do that when Spans are in core though.  I've opened this 
> issue to discuss moving the spans package as a whole to the queries module.


