[ https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195065#comment-14195065 ]
Nik Everett commented on LUCENE-6046:
-------------------------------------

I'll certainly add the regexp string to the exception message. And I'll merge the toStringTree from your patch into mine if you'd like. Yeah - QueryParserBase should have this option too.

The thing I found most useful for debugging this was to call toDot on the automata before and after normalization. I just looked at it and went, oh, of course you have to do it that way. No wonder the states explode. And then I read https://en.wikipedia.org/wiki/Powerset_construction and remembered it from my rusty CS degree.

> RegExp.toAutomaton high memory use
> ----------------------------------
>
>                 Key: LUCENE-6046
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6046
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 4.10.1
>            Reporter: Lee Hinman
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-6046.patch, LUCENE-6046.patch
>
>
> When creating an automaton from an org.apache.lucene.util.automaton.RegExp,
> it's possible for the automaton to use so much memory it exceeds the maximum
> array size for java.
> The following caused an OutOfMemoryError with a 32gb heap:
> {noformat}
> new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
> {noformat}
> When increased to a 60gb heap, the following exception is thrown:
> {noformat}
>   1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
>   1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
>   1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
>   1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
>   1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
>   1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
> {noformat}
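The state explosion described in the comment can be reproduced without Lucene. What follows is a minimal sketch in plain Java, not Lucene's API: `PowersetBlowup`, `step` and `determinize` are hypothetical names. It runs the powerset (subset) construction on a hand-coded NFA for `(a|b)*a(a|b){n}`, a textbook case where a loop over a character class is followed by a bounded repetition of an overlapping class. That shape is loosely similar to `[^]]*alt=[^]|}]{50,200}` in the report, though the real Lucene automaton is of course different. The sketch also assumes a hypothetical cap on the number of DFA states that fails fast with an exception naming the pattern, in the spirit of adding the regexp string to the exception message.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of powerset construction blow-up; not Lucene code. */
public class PowersetBlowup {

    /**
     * One NFA step for (a|b)*a(a|b){n} over the alphabet {a, b}.
     * NFA states are 0..n+1: state 0 is the (a|b)* loop, n+1 is accepting.
     */
    public static Set<Integer> step(Set<Integer> states, char c, int n) {
        Set<Integer> next = new TreeSet<>();
        for (int s : states) {
            if (s == 0) {
                next.add(0);                // stay in the (a|b)* loop
                if (c == 'a') next.add(1);  // or guess this 'a' starts the suffix
            } else if (s <= n) {
                next.add(s + 1);            // (a|b){n}: any symbol advances
            }
            // state n+1 (accepting) has no outgoing transitions
        }
        return next;
    }

    /**
     * Subset construction: each DFA state is a set of NFA states.
     * Returns the number of reachable DFA states, or throws once more than
     * maxStates would be needed (a hypothetical guard, with the pattern in the message).
     */
    public static int determinize(String pattern, int n, int maxStates) {
        Set<Integer> start = new TreeSet<>(Set.of(0));
        Set<Set<Integer>> seen = new HashSet<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> cur = work.poll();
            for (char c : new char[] {'a', 'b'}) {
                Set<Integer> next = step(cur, c, n);
                if (!seen.contains(next)) {
                    if (seen.size() >= maxStates) {
                        throw new IllegalArgumentException("determinizing regexp " + pattern
                            + " would need more than " + maxStates + " states");
                    }
                    seen.add(next);
                    work.add(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            String pattern = "(a|b)*a(a|b){" + n + "}";
            System.out.println(pattern + " -> " + determinize(pattern, n, 1 << 20) + " DFA states");
        }
    }
}
```

Running `main` prints 2, 4, 8, ... up to 2048 DFA states for n = 0..10: every extra repetition doubles the DFA, because each subset has to remember which of the last n+1 symbols were an `a`. With repetition bounds of 50 to 200, growth of this kind would be far beyond any heap, which is consistent with the OutOfMemoryError in the report. Capping the state count turns that into a fast, readable failure instead.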