[ https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110223#comment-16110223 ]
Robert Muir commented on LUCENE-7914: ------------------------------------- Patch looks good to me. I think there are some spurious imports ({{import org.apache.lucene.util.fst.Util}}) added in some of the tests that can be removed. > Add safeguards to RegExp.toAutomaton and Operations > --------------------------------------------------- > > Key: LUCENE-7914 > URL: https://issues.apache.org/jira/browse/LUCENE-7914 > Project: Lucene - Core > Issue Type: Bug > Reporter: Jim Ferenczi > Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, > LUCENE-7914.patch, LUCENE-7914.patch > > > When creating an automaton from a regexp, some operators can create more > states than maxDeterminizedStates: > {code} > a{10000} > {code} > The example above <b>creates</b> a single path with 10k states already > determinized so maxDeterminizedStates is not checked. > Some operations on automaton like Operations.isFinite or > Operations.topoSortStates are recursive and the maximum level of recursion > depends on the longest path in the automaton. So a large automaton like above > can exceed java's stack. > In most of the cases we are covered by maxDeterminizedStates but there will > always be adversarial cases where a large automaton is created from a small > input so I think we should also have safeguards in the recursive methods. > I've attached a patch that adds a max recursion level to Operations.isFinite > and Operations.topoSortStates in order to limit stack overflows. The limit is > set to 1000 so any automaton with a path bigger than 1000 would throw an > IllegalStateException. > The patch also uses maxDeterminizedStates to limit the number of states that > a repeat operator can create and throw a TooComplex..Exception when this > limit is reached. > Finally the patch adds the ability to skip Operations.isFinite on > AutomatonQuery and uses this as an optimization for PrefixQuery that uses > infinite automatons only. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org