[
https://issues.apache.org/jira/browse/LUCENE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-3363:
--------------------------------
Attachment: LUCENE-3363.patch
patch for the blocktree branch:
The issue with this automaton:
* its already minimal! so all of this work is stupid.
* it only has 2096 states, but sigma=3544, this is what causes the hopcroft
blowup.
Originally, AutomatonQuery minimized the automaton in its ctor, but I'm not
sure we should do this, the input automaton could be large and if someone wants
to do this, they should do it themselves?
I think my original motivation was to try to fend off any adversaries (some
crazy worst-case crap that would make the query slow)... but I think this is
obselete.
The patch changes this to determinize() + removeDeadTransitions() + reduce(),
the first 2 operations really being all we need, but reduce() might help speed
up the intersection.
Note: RegExp already minimizes "incrementally" during its parsing, but this is
one op at a time, so I think there is no problem here. I tested removing this
too and replacing it with det + removeDead + reduce, but it slowed down regex
parsing considerably, so I think we should continue to use minimize here.
Additionally, I optimized wildcardquery here to use the optimized concatenate()
to avoid useless determinize() calls when the LHS is a string, before it was
using the concatenate(List) method.
> minimizeHopcroft OOMEs on smallish (2096 states, finite) automaton
> ------------------------------------------------------------------
>
> Key: LUCENE-3363
> URL: https://issues.apache.org/jira/browse/LUCENE-3363
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/other
> Reporter: Michael McCandless
> Fix For: 3.4, 4.0
>
> Attachments: LUCENE-3363.patch
>
>
> Not sure what's up w/ this... if you check out the blocktree branch
> (LUCENE-3030) and comment out the @Ignore in
> TestTermsEnum2.testFiniteVersusInfinite then this should hit OOME: {[ant
> test-core -Dtestcase=TestTermsEnum2 -Dtestmethod=testFiniteVersusInfinite
> -Dtests.seed=-2577608857970454726:-2463580050179334504}}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]