[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121363#comment-16121363
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit 25ee0141c19b7f6156c2c439a59fad97f383b8cc in lucene-solr's branch 
refs/heads/branch_7_0 from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=25ee014 ]

LUCENE-7914: AnalyzingSuggesterTest#testRandomRealisticKeys: trim big titles to 
make sure that they can pass the max recursion level in Operations#topsortState.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121362#comment-16121362
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit a0aa0a0d83168186a84d49ec811e098dbcc37076 in lucene-solr's branch 
refs/heads/branch_7_0 from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a0aa0a0 ]

LUCENE-7914: Add a maximum recursion level in automaton recursive functions 
(Operations.isFinite and Operations.topsortState) to prevent large automaton to 
overflow the stack.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-10 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121364#comment-16121364
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit e16df81cb9a1e687295237ce45339be2e7185519 in lucene-solr's branch 
refs/heads/branch_7_0 from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e16df81 ]

 LUCENE-7914: Fix TestSuggestField#testRealisticKeys: trim big titles to make 
sure that they can pass the max recursion level in Operations#topsortState.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118269#comment-16118269
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit 806a485576d20bb39781d1bd73cdc78f690c40e9 in lucene-solr's branch 
refs/heads/branch_7x from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=806a485 ]

 LUCENE-7914: Fix TestSuggestField#testRealisticKeys: trim big titles to make 
sure that they can pass the max recursion level in Operations#topsortState.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118268#comment-16118268
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit 2a8930cf838b323eeadba240eb7141ec1f14ca6d in lucene-solr's branch 
refs/heads/master from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2a8930c ]

 LUCENE-7914: Fix TestSuggestField#testRealisticKeys: trim big titles to make 
sure that they can pass the max recursion level in Operations#topsortState.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-08 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118037#comment-16118037
 ] 

Adrien Grand commented on LUCENE-7914:
--

+1

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-08 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118027#comment-16118027
 ] 

Jim Ferenczi commented on LUCENE-7914:
--

I've merged this to master and 7.x but since it's an internal change I'd also 
like to merge in 7.0. Are there any objections?

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116178#comment-16116178
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit f059ad065553460412f247c61b887a0c80406c0e in lucene-solr's branch 
refs/heads/branch_7x from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f059ad06 ]

LUCENE-7914: AnalyzingSuggesterTest#testRandomRealisticKeys: trim big titles to 
make sure that they can pass the max recursion level in Operations#topsortState.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-07 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116175#comment-16116175
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit 6b3ea4d0c24adad9a270e43a3b5e07aac8317bee in lucene-solr's branch 
refs/heads/master from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6b3ea4d ]

LUCENE-7914: AnalyzingSuggesterTest#testRandomRealisticKeys: trim big titles to 
make sure that they can pass the max recursion level in Operations#topsortState.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114209#comment-16114209
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit 3610c8d1b8927fdb61667ea4c49f983bbe13a404 in lucene-solr's branch 
refs/heads/branch_7x from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3610c8d ]

LUCENE-7914: Add a maximum recursion level in automaton recursive functions 
(Operations.isFinite and Operations.topsortState) to prevent large automaton to 
overflow the stack.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114200#comment-16114200
 ] 

ASF subversion and git services commented on LUCENE-7914:
-

Commit 7dde798473d1a8640edafb41f28ad25d17f25a2d in lucene-solr's branch 
refs/heads/master from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7dde798 ]

LUCENE-7914: Add a maximum recursion level in automaton recursive functions 
(Operations.isFinite and Operations.topsortState) to prevent large automaton to 
overflow the stack.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Fix For: master (8.0), 7.1
>
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-02 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110389#comment-16110389
 ] 

Adrien Grand commented on LUCENE-7914:
--

+1 FYI tests seem to still expect an ISE.

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110223#comment-16110223
 ] 

Robert Muir commented on LUCENE-7914:
-

Patch looks good to me. I think there are some spurious imports ({{import 
org.apache.lucene.util.fst.Util}}) added in some of the tests that can be 
removed.

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108918#comment-16108918
 ] 

Adrien Grand commented on LUCENE-7914:
--

+1 I like those safeguards, they are not invasive. I'd throw an IAE rather than 
an ISE when the recursion depth is exceeded since the problem is about the 
input?

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108895#comment-16108895
 ] 

Robert Muir commented on LUCENE-7914:
-

And these comments above are just for discussion and probably not something we 
should do on this one, we should split out a different issue probably if we 
want to get rid of the finite tracking.

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108892#comment-16108892
 ] 

Robert Muir commented on LUCENE-7914:
-

ok, thanks.

As far as finite stuff, we might be able to remove the requirement that this is 
provided to AutomatonTermsEnum at all (see 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java#L200-L202)

It is kind of a leftover from when its intersection was less efficient (it 
handled infinite and finite DFAs differently). Nowadays it handles the case 
where its actually in a looping portion and drives that part completely by the 
terms dictionary. 

I'm not sure if the current "if finite" guards to this loop detection really 
save us CPU for complex finite DFAs (e.g. fuzzy, spellcheck). That would be the 
thing to experiment with to see if this boolean could be removed...

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, 
> LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108799#comment-16108799
 ] 

Robert Muir commented on LUCENE-7914:
-

I don't understand the serialization/deserialization stuff in the tests. Lucene 
queries no longer support this serialization for years, is this intentional?

> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> infinite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations

2017-08-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108729#comment-16108729
 ] 

Robert Muir commented on LUCENE-7914:
-

Sorry, but prefixquery is definitely infinite. Besides this logic being wrong, 
I really don't like the "trillean" and additional ctor added to the public API.


> Add safeguards to RegExp.toAutomaton and Operations
> ---
>
> Key: LUCENE-7914
> URL: https://issues.apache.org/jira/browse/LUCENE-7914
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Jim Ferenczi
> Attachments: LUCENE-7914.patch
>
>
> When creating an automaton from a regexp, some operators can create more 
> states than maxDeterminizedStates:
> {code}
> a{1}
> {code}
> The example above creates a single path with 10k states already 
> determinized so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or 
> Operations.topoSortStates are recursive and the maximum level of recursion 
> depends on the longest path in the automaton. So a large automaton like above 
> can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will 
> always be adversarial cases where a large automaton is created from a small 
> input so I think we should also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite 
> and Operations.topoSortStates in order to limit stack overflows. The limit is 
> set to 1000 so any automaton with a path bigger than 1000 would throw an 
> IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that 
> a repeat operator can create and throw a TooComplex..Exception when this 
> limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on 
> AutomatonQuery and uses this as an optimization for PrefixQuery that uses 
> finite automatons only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org