[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200018#comment-14200018
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1637054 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1637054 ]

LUCENE-6046: let this test determinize massive automata

 RegExp.toAutomaton high memory use
 --

 Key: LUCENE-6046
 URL: https://issues.apache.org/jira/browse/LUCENE-6046
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.10.1
Reporter: Lee Hinman
Assignee: Michael McCandless
Priority: Minor
 Fix For: 4.10.3, 5.0, Trunk

 Attachments: LUCENE-6046.patch, LUCENE-6046.patch, LUCENE-6046.patch


 When creating an automaton from an org.apache.lucene.util.automaton.RegExp, 
 it's possible for the automaton to use so much memory that it exceeds the 
 maximum array size for Java.
 The following caused an OutOfMemoryError with a 32 GB heap:
 {noformat}
 new RegExp("\\[\\[(Datei|File|Bild|Image):[^]]*alt=[^]|}]{50,200}").toAutomaton();
 {noformat}
 When increased to a 60 GB heap, the following exception is thrown:
 {noformat}
 java.lang.IllegalArgumentException: requested array size 2147483624 exceeds maximum array in java (2147483623)
 __randomizedtesting.SeedInfo.seed([7BE81EF678615C32:95C8057A4ABA5B52]:0)
 org.apache.lucene.util.ArrayUtil.oversize(ArrayUtil.java:168)
 org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:295)
 org.apache.lucene.util.automaton.Automaton$Builder.addTransition(Automaton.java:639)
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:741)
 org.apache.lucene.util.automaton.MinimizationOperations.minimizeHopcroft(MinimizationOperations.java:62)
 org.apache.lucene.util.automaton.MinimizationOperations.minimize(MinimizationOperations.java:51)
 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:477)
 org.apache.lucene.util.automaton.RegExp.toAutomaton(RegExp.java:426)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200019#comment-14200019
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1637055 from [~rcmuir] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1637055 ]

LUCENE-6046: let this test determinize massive automata




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200021#comment-14200021
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1637056 from [~rcmuir] in branch 'dev/branches/lucene_solr_4_10'
[ https://svn.apache.org/r1637056 ]

LUCENE-6046: let this test determinize massive automata




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200079#comment-14200079
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1637078 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1637078 ]

LUCENE-6046: remove det state limit for all AutomatonTestUtil.randomAutomaton 
since they can become biggish




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200086#comment-14200086
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1637080 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1637080 ]

LUCENE-6046: remove det state limit for all AutomatonTestUtil.randomAutomaton 
since they can become biggish




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14200090#comment-14200090
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1637082 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_10'
[ https://svn.apache.org/r1637082 ]

LUCENE-6046: remove det state limit for all AutomatonTestUtil.randomAutomaton 
since they can become biggish




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197970#comment-14197970
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636830 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1636830 ]

LUCENE-6046: let this test determinize massive automata




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197972#comment-14197972
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636831 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636831 ]

LUCENE-6046: let this test determinize massive automata




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197974#comment-14197974
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636832 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_10'
[ https://svn.apache.org/r1636832 ]

LUCENE-6046: let this test determinize massive automata




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196070#comment-14196070
 ] 

Nik Everett commented on LUCENE-6046:
-

A couple of updates:
This affects version 4.9 as well, and probably all versions, but its impact is 
likely minor enough that the fix is only worth adding to the 4.10 line.

I found a few test cases that need lots and lots of states.  Any time you feed 
a couple hundred random Unicode words to the automata you'll end up needing 
more than ten thousand states.  I've updated those tests to ask for a million 
states, and they caught a few places where I hadn't been as diligent in piping 
maxDeterminizedStates through.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196295#comment-14196295
 ] 

Michael McCandless commented on LUCENE-6046:


Thanks Nik, your patch looks great; I'll fold in some more minor things from my 
patch and commit!




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196723#comment-14196723
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636716 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1636716 ]

LUCENE-6046: add maxDeterminizedStates to determinize to prevent exhausting 
CPU/RAM when the automaton is too difficult to determinize
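
For readers following the thread, here is a minimal, self-contained sketch of 
the idea behind a determinization state limit. It is not Lucene's code: the 
class name, the method countDeterminizedStates, and TooManyStatesException are 
all hypothetical, and the NFA is the textbook pattern (a|b)*a(a|b){n}, chosen 
only because its DFA is known to need 2^(n+1) states. The sketch runs a plain 
subset construction and gives up as soon as the number of discovered DFA 
states exceeds the cap, instead of continuing until memory runs out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class DeterminizeLimitSketch {

    /** Hypothetical stand-in for a "too complex to determinize" failure; not a Lucene class. */
    static class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int limit) {
            super("determinization would need more than " + limit + " states");
        }
    }

    /**
     * Subset construction over the NFA for (a|b)*a(a|b){n}.
     * NFA states are 0..n+1; a set of NFA states is encoded as a bitmask in a long.
     * Returns the number of reachable DFA states, or throws once the cap is exceeded.
     */
    static int countDeterminizedStates(int n, int maxStates) {
        if (n < 0 || n > 60) {
            throw new IllegalArgumentException("n must be in [0, 60]");
        }
        Set<Long> seen = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        long start = 1L; // the set {0}
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            long current = work.poll();
            for (char c : new char[] {'a', 'b'}) {
                long next = step(current, c, n);
                if (seen.add(next)) {
                    // Bail out early rather than keep allocating states.
                    if (seen.size() > maxStates) {
                        throw new TooManyStatesException(maxStates);
                    }
                    work.add(next);
                }
            }
        }
        return seen.size();
    }

    /** Applies one input character to a set of NFA states. */
    static long step(long set, char c, int n) {
        long next = 0L;
        for (int s = 0; s <= n; s++) { // state n+1 is accepting and has no outgoing transitions
            if ((set & (1L << s)) == 0) {
                continue;
            }
            if (s == 0) {
                next |= 1L;            // state 0 loops on both 'a' and 'b'
                if (c == 'a') {
                    next |= 1L << 1;   // guess that this 'a' starts the counted tail
                }
            } else {
                next |= 1L << (s + 1); // states 1..n advance on any character
            }
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(countDeterminizedStates(3, 10_000)); // prints 16
        try {
            countDeterminizedStates(20, 10_000); // 2^21 states would be needed
        } catch (TooManyStatesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The cap turns an open-ended memory problem into a prompt, catchable failure. 
The value 10,000 used in main is only illustrative here, echoing the "ten 
thousand states" figure mentioned elsewhere in this thread.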




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196770#comment-14196770
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636728 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636728 ]

LUCENE-6046: add maxDeterminizedStates to determinize to prevent exhausting 
CPU/RAM when the automaton is too difficult to determinize




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196947#comment-14196947
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636758 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1636758 ]

LUCENE-6046: fix test failure, add maxDeterminizedStates to AutomatonQuery and 
WildcardQuery too




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196949#comment-14196949
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636759 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1636759 ]

LUCENE-6046: fix test failure, add maxDeterminizedStates to AutomatonQuery and 
WildcardQuery too




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196981#comment-14196981
 ] 

ASF subversion and git services commented on LUCENE-6046:
-

Commit 1636762 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_10'
[ https://svn.apache.org/r1636762 ]

LUCENE-6046: add maxDeterminizedStates to determinize to prevent exhausting 
CPU/RAM when the automaton is too difficult to determinize




Re: [jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-04 Thread Nikolas Everett
Thanks Mike!
On Nov 4, 2014 5:29 PM, ASF subversion and git services (JIRA) 
j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196981#comment-14196981
 ]

 ASF subversion and git services commented on LUCENE-6046:
 -

 Commit 1636762 from [~mikemccand] in branch 'dev/branches/lucene_solr_4_10'
 [ https://svn.apache.org/r1636762 ]

 LUCENE-6046: add maxDeterminizedStates to determinize to prevent
 exhausting CPU/RAM when the automaton is too difficult to determinize





[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194395#comment-14194395
 ] 

Michael McCandless commented on LUCENE-6046:


Hmm, two bugs here.

First off, RegExp.toAutomaton is an inherently costly method: wasteful of RAM 
and CPU, doing minimize after each recursive operation, in order to build a DFA 
in the end. It's unfortunately quite easy to concoct regular expressions that 
make it consume ridiculous resources.  I'll look at this example and see if we 
can improve it, but in the end it will always have its adversarial cases 
unless we give up on making the resulting automaton deterministic, which would 
be a very big change.

Maybe we should add adversary defenses to it, e.g. you set a limit on the 
number of states it's allowed to create, and it throws a RegExpTooHardException 
if it would exceed that?
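
To make the state-limit idea concrete, here is a minimal sketch, not Lucene code: a textbook subset construction over a toy two-letter NFA that gives up as soon as it would create more DFA states than the caller allows. Everything in it is an assumption for illustration: the LimitedDeterminize class, the bitmask NFA encoding, and the exception, whose name is simply borrowed from the suggestion above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exception; the name follows the suggestion above, not an existing class. */
class RegExpTooHardException extends RuntimeException {
    RegExpTooHardException(int maxStates) {
        super("determinizing would create more than " + maxStates + " states");
    }
}

final class LimitedDeterminize {
    /**
     * Toy NFA over a two-letter alphabet: delta[state][letter] is a bitmask of
     * successor states, and state 0 is the start state (so at most 64 NFA states).
     * Returns the number of DFA states, or throws once maxStates would be exceeded.
     */
    static int determinize(long[][] delta, int maxStates) {
        Map<Long, Integer> dfaIds = new HashMap<>(); // set of NFA states -> DFA state id
        Deque<Long> work = new ArrayDeque<>();
        dfaIds.put(1L, 0);
        work.push(1L);
        while (!work.isEmpty()) {
            long set = work.pop();
            for (int letter = 0; letter < 2; letter++) {
                long next = 0L;
                for (int s = 0; s < delta.length; s++) {
                    if ((set & (1L << s)) != 0) {
                        next |= delta[s][letter];
                    }
                }
                // An empty successor set is the dead state; it is not materialized here.
                if (next != 0 && !dfaIds.containsKey(next)) {
                    if (dfaIds.size() >= maxStates) {
                        throw new RegExpTooHardException(maxStates);
                    }
                    dfaIds.put(next, dfaIds.size());
                    work.push(next);
                }
            }
        }
        return dfaIds.size();
    }
}
```

A caller would pick the limit and treat the exception as "this regexp is too expensive", rather than waiting for an OutOfMemoryError.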

Second off, ArrayUtil.oversize has the wrong (too large) value for 
MAX_ARRAY_LENGTH, which is a bug from LUCENE-5844.  Which JVM did you run this 
test on?




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194397#comment-14194397
 ] 

Dawid Weiss commented on LUCENE-6046:
-

Just a note -- Russ Cox wrote a series of excellent articles about different 
approaches to implementing regexp scanners: 
http://swtch.com/~rsc/regexp/regexp1.html

(There is no clear winner -- both DFAs and NFAs have advantages.)




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Lee Hinman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194400#comment-14194400
 ] 

Lee Hinman commented on LUCENE-6046:


[~mikemccand] I ran it with the following JVM:

{noformat}
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
{noformat}




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194406#comment-14194406
 ] 

Michael McCandless commented on LUCENE-6046:


bq.  Russ Cox wrote a series of excellent articles about different approaches 
of implementing regexp scanners. 

Thanks Dawid, these are great.

Switching to NFA-based matching would be a very large change ... I don't think 
we should pursue it here.  The Terms.intersect implementation for block tree is 
already very complex ... though I suppose if we could hide the on-the-fly 
subset construction (and convert the regexp to a Thompson NFA) under an API, then 
the Terms.intersect implementation wouldn't have to change much.
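
For illustration only, this is roughly what matching without determinizing up front looks like for a pattern of the form [ac]*a[ac]{n}: the matcher carries the current set of NFA states as a bitmask and advances it one character at a time, which is on-the-fly subset construction in miniature. It is a toy under stated assumptions (two-letter alphabet, n below 62, hypothetical class name), not a sketch of how Terms.intersect would use it.

```java
final class NfaSimulation {
    /** Returns true if input matches [ac]*a[ac]{n}; assumes 0 <= n < 62. */
    static boolean matches(String input, int n) {
        long current = 1L;                     // bit i set = NFA state i is active
        long accept = 1L << (n + 1);           // state n+1 is the only accepting state
        long advancing = ((1L << n) - 1) << 1; // states 1..n step forward on either letter
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch != 'a' && ch != 'c') {
                return false;                  // outside the alphabet: no transition exists
            }
            long next = 1L;                    // state 0 loops on both letters
            if (ch == 'a') {
                next |= 2L;                    // on 'a', state 0 may also move to state 1
            }
            next |= (current & advancing) << 1;
            current = next;
        }
        return (current & accept) != 0;
    }
}
```

Memory stays at one machine word no matter how long the input is; the price is that every character pays for the set update, which is the trade-off the Russ Cox articles discuss.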

Still, there will always be adversarial cases no matter which approach we 
choose.  I think for this issue we should allow passing a "how much work are 
you willing to do" limit to RegExp.toAutomaton, and it throws an exception when 
it would exceed that.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194413#comment-14194413
 ] 

Dawid Weiss commented on LUCENE-6046:
-

I didn't mean to imply we should change the regexp implementation! :) This was 
just a pointer in case somebody wished to understand why regexps can explode. I 
actually wish there was an NFA-based regexp implementation for Java (with a 
low memory footprint) because this would make concatenating thousands of 
regexps (e.g., for pattern detection) much easier.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194412#comment-14194412
 ] 

Michael McCandless commented on LUCENE-6046:


bq. Michael McCandless I ran it with the following JVM:

Thanks [~dakrone].

I was wrong about the second bug: there is no bug in ArrayUtil.oversize.  That 
exception just means RegExp is trying to create a too-big array ... so just the 
one bug :)




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Lee Hinman (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194436#comment-14194436
 ] 

Lee Hinman commented on LUCENE-6046:


bq. I think for this issue we should allow passing a "how much work are you 
willing to do" limit to RegExp.toAutomaton, and it throws an exception when it 
would exceed that.

For what it's worth, I think this would be a good solution for us, much better 
than silently (from the user's perspective) freezing and then hitting an OOME.




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194484#comment-14194484
 ] 

Nik Everett commented on LUCENE-6046:
-

I'm working on a first cut of something that does that.  A better regex 
implementation would be great, but the biggest thing to me is being able to 
limit the amount of work the determinize operation performs.  It's such a costly 
operation that I don't think it should ever really be abstracted away from the 
user.  Something like having determinize throw a checked exception when it 
performs too much work would make you keenly aware whenever you might be 
straying into exponential territory.
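
A rough sketch of the checked-exception shape, with every name invented for illustration (WorkBudget, TooMuchWorkException and BudgetDemo are not Lucene classes): the expensive operation charges a budget as it goes, and because the exception is checked, the compiler makes each caller decide what to do when the budget runs out.

```java
/** Hypothetical checked exception: the operation used up its work budget. */
class TooMuchWorkException extends Exception {
    TooMuchWorkException(String message) {
        super(message);
    }
}

/** Hypothetical budget that an expensive operation draws down as it works. */
final class WorkBudget {
    private long remaining;

    WorkBudget(long units) {
        this.remaining = units;
    }

    void charge(long units) throws TooMuchWorkException {
        if (units > remaining) {
            throw new TooMuchWorkException("work budget exhausted");
        }
        remaining -= units;
    }
}

final class BudgetDemo {
    /** Stand-in for an expensive operation: charges one unit per step. */
    static int doWork(int steps, WorkBudget budget) throws TooMuchWorkException {
        for (int i = 0; i < steps; i++) {
            budget.charge(1);
        }
        return steps;
    }

    /** Callers cannot ignore the failure mode; here it is turned into a message. */
    static String describe(int steps, long budgetUnits) {
        try {
            return "done after " + doWork(steps, new WorkBudget(budgetUnits)) + " steps";
        } catch (TooMuchWorkException e) {
            return "rejected: " + e.getMessage();
        }
    }
}
```

Whether a checked or an unchecked exception is the better fit is a design choice; the sketch only shows what the checked variant forces on callers.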




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194585#comment-14194585
 ] 

Michael McCandless commented on LUCENE-6046:


OK I boiled down the adversarial regexp to this simpler still-adversarial 
version: [ac]*a[ac]{50,200}

I suspect this is a legitimate adversary and not a bug in our RegExp/automaton 
impl, i.e. the number of states in the DFA for this is exponential as a 
function of the 50/200.
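
The growth can be checked on a small scale with a sketch like the following. It is an illustration under assumptions, not the Lucene code path: it uses the exact-count variant [ac]*a[ac]{n} rather than the {50,200} range, a hand-built NFA encoded as bitmasks, and a hypothetical class name, and it simply counts the distinct state sets that subset construction reaches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class SubsetBlowup {
    /**
     * Counts the DFA states that subset construction creates for [ac]*a[ac]{n}.
     * NFA states: 0 loops on both letters and also moves to 1 on 'a'; states 1..n
     * advance on either letter; state n+1 accepts. Assumes 0 <= n < 62.
     */
    static int countDfaStates(int n) {
        long advancing = ((1L << n) - 1) << 1; // bitmask of NFA states 1..n
        Set<Long> seen = new HashSet<>();
        Deque<Long> work = new ArrayDeque<>();
        seen.add(1L);                          // start set {0}
        work.push(1L);
        while (!work.isEmpty()) {
            long set = work.pop();
            for (int letter = 0; letter < 2; letter++) { // 0 = 'a', 1 = 'c'
                long next = 1L;                 // state 0 stays active
                if (letter == 0) {
                    next |= 2L;                 // 'a' also activates state 1
                }
                next |= (set & advancing) << 1; // chain states step forward
                if (seen.add(next)) {
                    work.push(next);
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 12; n++) {
            System.out.println("n=" + n + " -> " + countDfaStates(n) + " DFA states");
        }
    }
}
```

In this toy the state set acts as a shift register over the last n+1 letters, so the count is 2^(n+1) and doubles with every extra repetition. At n = 50 that would be 2^51 state sets, far beyond what an int-indexed Java array can hold.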




[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194592#comment-14194592
 ] 

Nik Everett commented on LUCENE-6046:
-

Oh yeah, it's totally running into 2^n territory legitimately here.  This is 
totally something that'd be rejected by a framework to prevent explosive growth 
during determinization.
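
A rough sketch of what such a guard could look like: a subset construction 
that gives up as soon as it would create more than maxStates DFA states.  
Everything here (BoundedDeterminize, TooManyStatesException, the int[][][] NFA 
layout, the adversary helper) is hypothetical and only meant to show the idea; 
it is not the code from any patch on this issue.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: subset construction with a hard cap on DFA states. */
final class BoundedDeterminize {

  /** Hypothetical exception type used to reject over-budget automata. */
  static final class TooManyStatesException extends RuntimeException {
    TooManyStatesException(int maxStates) {
      super("determinizing would create more than " + maxStates + " states");
    }
  }

  /**
   * nfa[state][symbol] lists the target states; state 0 is the start state.
   * Returns the number of DFA states, or throws once the budget is exceeded.
   */
  static int determinize(int[][][] nfa, int maxStates) {
    int alphabetSize = nfa[0].length;
    Map<BitSet, Integer> ids = new HashMap<>();
    Deque<BitSet> work = new ArrayDeque<>();
    BitSet start = new BitSet();
    start.set(0);
    ids.put(start, 0);
    work.add(start);
    while (!work.isEmpty()) {
      BitSet current = work.remove();
      for (int symbol = 0; symbol < alphabetSize; symbol++) {
        BitSet next = new BitSet();
        for (int s = current.nextSetBit(0); s >= 0; s = current.nextSetBit(s + 1)) {
          for (int target : nfa[s][symbol]) {
            next.set(target);
          }
        }
        if (next.isEmpty() || ids.containsKey(next)) {
          continue; // dead end, or a subset we have already numbered
        }
        if (ids.size() >= maxStates) {
          throw new TooManyStatesException(maxStates); // bail out early
        }
        ids.put(next, ids.size());
        work.add(next);
      }
    }
    return ids.size();
  }

  /** NFA for [ac]*a[ac]{n}; symbol 0 is 'a', symbol 1 is 'c'. */
  static int[][][] adversary(int n) {
    int[][][] nfa = new int[n + 2][2][];
    nfa[0][0] = new int[] {0, 1};
    nfa[0][1] = new int[] {0};
    for (int s = 1; s <= n; s++) {
      nfa[s][0] = new int[] {s + 1};
      nfa[s][1] = new int[] {s + 1};
    }
    nfa[n + 1][0] = new int[0];
    nfa[n + 1][1] = new int[0];
    return nfa;
  }
}
```

With a budget of 10,000, adversary(3) determinizes fine while adversary(50) is 
rejected after a bounded amount of work instead of exhausting the heap.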


[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194716#comment-14194716
 ] 

Nik Everett commented on LUCENE-6046:
-

Oh - I'm still running the Solr tests against this.  I imagine they'll pass as 
they've been running fine for 30 minutes now, but I should throw that out there 
in case someone gets them to fail with this before I do.


[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195033#comment-14195033
 ] 

Nik Everett commented on LUCENE-6046:
-

Oh no!  I wrote a very similar patch!  Sorry to duplicate effort there.  

I found that 10,000 states wasn't quite enough to handle some of the tests so I 
went with 1,000,000 as the default.  It's pretty darn huge but it does get the 
job done.


[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195047#comment-14195047
 ] 

Michael McCandless commented on LUCENE-6046:


Whoops, sorry, I didn't see you had a patch here!  Thank you.

I like your patch: it's good to make all hidden usages of determinize visible.  
Let's start from your patch and merge anything from mine in?  E.g. I think we 
can collapse minimizeHopcroft into just minimize...

bq. I found that 10,000 states wasn't quite enough to handle some of the tests 
so I went with 1,000,000 as the default. It's pretty darn huge but it does get 
the job done.

Whoa, which tests needed 1M max states?  I worry about passing a 1M-state 
automaton to minimize...


[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195053#comment-14195053
 ] 

Michael McCandless commented on LUCENE-6046:


I like the test simplifications, and removing dead code from 
Operations.determinize.

Can we fix the exception thrown from RegExp to include the offending regular 
expression, and fix the test to confirm the message contains it?  Maybe also 
add RegExp.toStringTree?  I found it very useful while debugging the original 
regexp :)
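
Something along these lines, as a sketch only: the low-level failure knows 
nothing about the regexp, so the caller that does know it wraps the failure and 
puts the pattern into the message.  All names below (RegexpInMessage, 
TooComplexException, determinizeStub) are placeholders invented for this 
example, not the names used in the patch.

```java
/** Illustrative only: carry the offending regexp in the exception message. */
final class RegexpInMessage {

  /** Hypothetical exception that reports which regexp blew the state budget. */
  static final class TooComplexException extends RuntimeException {
    TooComplexException(String regexp, int maxStates, Throwable cause) {
      super("Determinizing /" + regexp + "/ would require more than "
          + maxStates + " states", cause);
    }
  }

  /** Stand-in for a determinize step that always exceeds its budget. */
  static void determinizeStub(int maxStates) {
    throw new IllegalStateException("state limit " + maxStates + " exceeded");
  }

  /** The caller knows the original pattern, so it adds it while rethrowing. */
  static void toAutomaton(String regexp, int maxStates) {
    try {
      determinizeStub(maxStates);
    } catch (IllegalStateException e) {
      throw new TooComplexException(regexp, maxStates, e);
    }
  }

  public static void main(String[] args) {
    try {
      toAutomaton("[ac]*a[ac]{50,200}", 10000);
    } catch (TooComplexException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A test can then catch the exception and assert that getMessage() contains the 
pattern string it passed in.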

I think QueryParserBase should also have a set/get for this option?


[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195056#comment-14195056
 ] 

Nik Everett commented on LUCENE-6046:
-

TestDeterminizeLexicon wants to make an automaton that accepts 5000 random 
strings.  So 10,000 isn't enough states for it.  I'll drop the default limit to 
10,000 again and just feed a million to that test case.


[jira] [Commented] (LUCENE-6046) RegExp.toAutomaton high memory use

2014-11-03 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195065#comment-14195065
 ] 

Nik Everett commented on LUCENE-6046:
-

I'll certainly add the regexp string to the exception message.  And I'll merge 
the toStringTree from your patch into mine if you'd like.

Yeah - QueryParserBase should have this option too.

The thing I found most useful for debugging this was to call toDot on the 
automata before and after determinization.  I just looked at it and went, oh, of 
course you have to do it that way.  No wonder the states explode.  And then I 
read https://en.wikipedia.org/wiki/Powerset_construction and remembered it from 
my rusty CS degree.
