[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194484#comment-14194484
 ] 

Nik Everett commented on LUCENE-6046:
-------------------------------------

I'm working on a first cut of something that does that.  A better regex 
implementation would be great, but the biggest thing to me is being able to 
limit the amount of work the determinize operation performs.  It's such a 
costly operation that I don't think it should ever be fully hidden from the 
user.  Something like having determinize throw a checked exception when it 
performs too much work would make you keenly aware whenever you might be 
straying into exponential territory.
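
To make the idea concrete, here is a rough sketch of a work-limited 
determinize. It is not Lucene code: the class BoundedDeterminize, the 
exception TooMuchWorkException, and the maxStates parameter are all 
hypothetical names made up for illustration. It runs a plain subset 
construction over a hard-coded NFA for (a|b)*a(a|b){n}, a textbook case 
whose DFA needs 2^(n+1) states, and gives up with a checked exception once 
the state budget is exceeded.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; none of these names are Lucene API. */
final class BoundedDeterminize {

    /** Checked, so every caller has to decide what to do when the budget runs out. */
    static final class TooMuchWorkException extends Exception {
        TooMuchWorkException(int maxStates) {
            super("determinize gave up after creating more than " + maxStates + " states");
        }
    }

    /**
     * One NFA step for (a|b)*a(a|b){n} over the alphabet {a, b}.
     * NFA states are 0..n+1: state 0 loops on both symbols and also moves
     * to state 1 on 'a'; states 1..n advance on either symbol; n+1 accepts.
     */
    static BitSet step(BitSet current, int symbol, int n) {
        BitSet next = new BitSet(n + 2);
        for (int s = current.nextSetBit(0); s >= 0; s = current.nextSetBit(s + 1)) {
            if (s == 0) {
                next.set(0);
                if (symbol == 'a') {
                    next.set(1);
                }
            } else if (s <= n) {
                next.set(s + 1);
            }
        }
        return next;
    }

    /** Subset construction that counts DFA states and stops once maxStates is exceeded. */
    static int countDfaStates(int n, int maxStates) throws TooMuchWorkException {
        BitSet start = new BitSet(n + 2);
        start.set(0);
        Set<BitSet> seen = new HashSet<>();
        Deque<BitSet> work = new ArrayDeque<>();
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            BitSet current = work.remove();
            for (int symbol : new int[] {'a', 'b'}) {
                BitSet next = step(current, symbol, n);
                if (seen.add(next)) {
                    // The budget check sits exactly where new DFA states are created.
                    if (seen.size() > maxStates) {
                        throw new TooMuchWorkException(maxStates);
                    }
                    work.add(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 30}) {
            try {
                System.out.println("n=" + n + ": " + countDfaStates(n, 10_000) + " DFA states");
            } catch (TooMuchWorkException e) {
                System.out.println("n=" + n + ": " + e.getMessage());
            }
        }
    }
}
```

Because the exception is checked, a caller cannot compile without either 
handling the blow-up or declaring it, which is the "keenly aware" part. A 
real implementation might prefer to count transitions or bytes rather than 
states; the state count is just the simplest budget to show.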

> RegExp.toAutomaton high memory use
> ----------------------------------
>
>                 Key: LUCENE-6046
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6046
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 4.10.1
>            Reporter: Lee Hinman
>            Assignee: Michael McCandless
>            Priority: Minor
>
> When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
> it's possible for the automaton to use so much memory that it exceeds the 
> maximum array size for Java.
> The following caused an OutOfMemoryError with a 32 GB heap:
> {noformat}
> new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
> {noformat}
> With the heap increased to 60 GB, the following exception is thrown:
> {noformat}
>   1> java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
>   1>     __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
>   1>     org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
>   1>     org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
>   1>     org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
>   1>     org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
>   1>     org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
>   1>     org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
> {noformat}


