[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108892#comment-16108892
 ] 

Robert Muir commented on LUCENE-7914:
-------------------------------------

ok, thanks.

As far as finite stuff, we might be able to remove the requirement that this is 
provided to AutomatonTermsEnum at all (see 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java#L200-L202)

It is kind of a leftover from when its intersection was less efficient (it 
handled infinite and finite DFAs differently). Nowadays it handles the case 
where its actually in a looping portion and drives that part completely by the 
terms dictionary. 

I'm not sure if the current "if finite" guards to this loop detection really 
save us CPU for complex finite DFAs (e.g. fuzzy, spellcheck). That would be the 
thing to experiment with to see if this boolean could be removed...

> Add safeguards to RegExp.toAutomaton and Operations
> ---------------------------------------------------
>
>                 Key: LUCENE-7914
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7914
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>         Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{10000}
> {code}
> The example above <b>creates</b> a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to