Simon Willnauer created LUCENE-4556:
---------------------------------------
Summary: FuzzyTermsEnum creates tons of objects
Key: LUCENE-4556
URL: https://issues.apache.org/jira/browse/LUCENE-4556
Project: Lucene - Core
Issue Type: Improvement
Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
Fix For: 4.1, 5.0
I ran into this problem in production using the DirectSpellchecker. The number
of objects created by the spellchecker shoots through the roof very
quickly. We ran about 130 queries and ended up with > 2M transitions / states.
We spend 50% of the time in GC just because of transitions. Other parts of the
system behave just fine here.
I talked briefly to Robert and gave a POC a shot, providing a
LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case and
build an array-based structure converted into UTF-8 directly instead of going
through the object-based APIs. This involved quite a few changes, but they
are all package-private at this point. I have a patch that still has a fair
number of nocommits, but it shows that it's possible and IMO worth the trouble
to make this really usable in production. All tests pass with the patch - it's
a start....
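
To illustrate what an array-based structure over UTF-8 bytes can look like,
here is a minimal sketch. It is not the patch and not Lucene's API: the class
ArrayDfa and its methods are hypothetical names, and for brevity it only
accepts one exact word (edit distance 0) rather than a real Levenshtein
automaton. The point is that every transition lives in a single int[] table
indexed by state and byte value, so no per-state or per-transition objects are
allocated.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; not part of Lucene or of the patch. */
final class ArrayDfa {
    private static final int ALPHABET = 256; // one column per possible byte value
    private static final int DEAD = -1;      // marks "no transition"

    // next[state * ALPHABET + byteValue] holds the target state, or DEAD.
    private final int[] next;
    private final boolean[] accept;

    private ArrayDfa(int[] next, boolean[] accept) {
        this.next = next;
        this.accept = accept;
    }

    /** Builds a DFA that accepts exactly the UTF-8 bytes of the given word. */
    static ArrayDfa forExactWord(String word) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        int numStates = bytes.length + 1;
        int[] next = new int[numStates * ALPHABET];
        Arrays.fill(next, DEAD);
        for (int i = 0; i < bytes.length; i++) {
            // State i moves to state i + 1 on the i-th byte of the word.
            next[i * ALPHABET + (bytes[i] & 0xFF)] = i + 1;
        }
        boolean[] accept = new boolean[numStates];
        accept[bytes.length] = true;
        return new ArrayDfa(next, accept);
    }

    /** Runs the input bytes through the table without allocating anything. */
    boolean run(byte[] input) {
        int state = 0;
        for (byte b : input) {
            state = next[state * ALPHABET + (b & 0xFF)];
            if (state == DEAD) {
                return false;
            }
        }
        return accept[state];
    }
}
```

A dense table like this trades memory per state for allocation-free lookups; a
real implementation would likely use a more compact encoding of byte ranges.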