[jira] [Comment Edited] (SOLR-13190) Fuzzy search treated as server error instead of client error when terms are too complex

Andy Webb (Jira) Wed, 18 Dec 2019 05:19:06 -0800


    [ https://issues.apache.org/jira/browse/SOLR-13190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16999118#comment-16999118 ]


Andy Webb edited comment on SOLR-13190 at 12/18/19 1:18 PM:
------------------------------------------------------------

Thanks Mike!

I've [tried adding|https://github.com/apache/lucene-solr/pull/1098] a 
{{maxQueryLength}} option to {{Direct(Solr)SpellChecker}} which can be set to 
prevent long terms from being spellchecked. It's a simple change, largely a 
cut-and-paste of {{minQueryLength}}, and as far as I can see it would 
prevent us from seeing the exceptions. It could default to 0, i.e. "no limit", 
to maintain the existing default behaviour unless it's deliberately set. Would 
this be a reasonable change to make to Lucene/Solr, or do you think there might 
be a better approach?
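
For illustration, the length check could look roughly like the sketch below. This is not the code from the pull request: the class and method names are hypothetical, and measuring length in code points is an assumption. The only behaviour taken from the proposal is that a {{maxQueryLength}} of 0 means "no limit".

```java
// Hypothetical sketch of a min/max query-length guard.
// Not the actual DirectSpellChecker or pull-request code.
final class QueryLengthGuard {

    private QueryLengthGuard() {
    }

    /**
     * Returns true if the term should be passed on to the spellchecker.
     *
     * @param term           the query term to check
     * @param minQueryLength terms shorter than this are skipped
     * @param maxQueryLength terms longer than this are skipped; 0 means "no limit"
     */
    static boolean shouldSpellcheck(String term, int minQueryLength, int maxQueryLength) {
        // Assumption: length is counted in Unicode code points, not UTF-16 units.
        int length = term.codePointCount(0, term.length());
        if (length < minQueryLength) {
            return false; // too short to be worth correcting
        }
        if (maxQueryLength > 0 && length > maxQueryLength) {
            return false; // long term: skip it rather than build a huge automaton
        }
        return true;
    }
}
```

With {{maxQueryLength}} left at 0 the second check never fires, so existing configurations would behave as before.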


> Fuzzy search treated as server error instead of client error when terms are 
> too complex
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-13190
>                 URL: https://issues.apache.org/jira/browse/SOLR-13190
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: master (9.0)
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We've seen a fuzzy search end up breaking the automaton and getting reported 
> as a server error. This usage should be improved by:
> 1) reporting it as a client error, because it's similar to something like a 
> "too many boolean clauses" query in how an operator should deal with it
> 2) reporting which field is causing the error, since that currently must be 
> deduced from adjacent query logs and can be difficult if there are multiple 
> terms in the search
> This trigger was added to defend against adversarial regexes but somehow hits 
> fuzzy terms as well. I don't understand enough about the automaton mechanisms 
> to really know how to approach a fix there, but improving the operability is 
> a good first step.
> Relevant stack trace:
> {noformat}
> org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13632 states and 21348 transitions would result in more than 10000 states.
>       at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:746)
>       at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:69)
>       at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32)
>       at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:247)
>       at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:133)
>       at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143)
>       at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:154)
>       at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:78)
>       at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:58)
>       at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67)
>       at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:310)
>       at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:667)
>       at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:442)
>       at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:200)
>       at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1604)
>       at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1420)
>       at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:567)
>       at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1435)
>       at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:374)
>       at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:2559)
> {noformat}
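
The two improvements requested in the issue description (report a client error, and name the offending field) could be sketched along the following lines. To keep the example free of dependencies, both exception classes are hypothetical stand-ins: {{TooComplexException}} plays the role of Lucene's {{TooComplexToDeterminizeException}}, and {{ClientErrorException}} plays the role of whatever Solr would use to signal an HTTP 400. The helper name {{rewriteForField}} is also an assumption, not an existing API.

```java
import java.util.function.Supplier;

// Hypothetical sketch: translate a "too complex" automaton failure into a
// client error (HTTP 400) that names the field being queried.
final class FuzzyErrorMapping {

    private FuzzyErrorMapping() {
    }

    /** Stand-in for Lucene's TooComplexToDeterminizeException. */
    static final class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    /** Stand-in for an exception that is reported to the caller as a client error. */
    static final class ClientErrorException extends RuntimeException {
        final int status;

        ClientErrorException(int status, String message, Throwable cause) {
            super(message, cause);
            this.status = status;
        }
    }

    /**
     * Runs the given rewrite step for a field. If the automaton is too complex,
     * the failure is rethrown as a 400-style error that includes the field name,
     * keeping the original exception as the cause.
     */
    static <T> T rewriteForField(String field, Supplier<T> rewrite) {
        try {
            return rewrite.get();
        } catch (TooComplexException e) {
            throw new ClientErrorException(400,
                "Query term for field '" + field + "' is too complex: " + e.getMessage(), e);
        }
    }
}
```

Keeping the original exception as the cause means the full stack trace is still available in the server logs, while the caller sees which field to fix.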



