[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

James Dyer (JIRA) Fri, 03 Jun 2011 10:37:45 -0700

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

Here's another patch.  This time PossibilityIterator is guaranteed not to 
save or return more than the number of collations the user requested with 
"maxCollationTries".

Changing this also invalidated some of the tests in 
SpellCheckCollatorTest.java.  My research indicates this is because many of 
the possibilities end up with the same score, so this is not indicative of a 
new bug.  I changed the test to be less brittle in this regard.

While I generally like both of these last two patches, I am still unsure of the 
wisdom of this last change.  It is true that this last change ensures we will 
never store more Collations than the app might possibly use.  On the other 
hand, the Collations ought to enter the PQ somewhat sorted already.  Having it 
churn all of the low-ranking ones in and out introduces a lot of extra 
add/remove operations in the common cases in return for saving a bit of memory 
in the rarer cases.
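
To make the trade-off concrete, here is a minimal sketch of a size-bounded 
priority queue that retains only the best N candidates.  It is not the code 
from the patch: the class BoundedTopN, the Candidate type, and the convention 
that a lower rank is better are all assumptions made for illustration, and it 
uses java.util.PriorityQueue rather than whatever queue the patch uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoundedTopN {
    /** Hypothetical stand-in for a ranked collation candidate; lower rank is better. */
    public static final class Candidate {
        public final String text;
        public final int rank;

        public Candidate(String text, int rank) {
            this.text = text;
            this.rank = rank;
        }
    }

    /** Returns at most maxSize of the lowest-rank candidates, best first. */
    public static List<Candidate> topN(Iterable<Candidate> candidates, int maxSize) {
        List<Candidate> result = new ArrayList<>();
        if (maxSize <= 0) {
            return result;
        }
        // Max-heap on rank: the head is the worst candidate currently retained.
        PriorityQueue<Candidate> pq = new PriorityQueue<>(
            maxSize, Comparator.comparingInt((Candidate c) -> c.rank).reversed());
        for (Candidate c : candidates) {
            if (pq.size() < maxSize) {
                pq.add(c);                       // still filling up
            } else if (c.rank < pq.peek().rank) {
                pq.poll();                       // evict the current worst
                pq.add(c);                       // one remove plus one add
            }
            // Otherwise the candidate is rejected after a single comparison.
        }
        result.addAll(pq);
        result.sort(Comparator.comparingInt(c -> c.rank));
        return result;
    }
}
```

Memory stays proportional to maxSize regardless of how many candidates are 
generated.  The price is that each candidate good enough to displace a 
retained one costs a remove plus an add on the heap.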

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a 
> ranked list of *every* possible correction combination.  But if returning 
> several corrections per term, and if several words are misspelled, the 
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime 
> "spellcheck.collate" is used.  It is not necessary to use any features that 
> were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking 
> our Solr servers down with "infinite" GC loops.  It was pretty easy for this 
> to happen as occasionally a user will accidentally paste the URL into the 
> Search box on our app.  This URL results in a search with ~12 misspelled 
> words.  We have "spellcheck.count" set to 15. 
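
For a sense of scale, the numbers in the description can be turned into a 
rough upper bound.  The sketch below assumes every one of the 12 misspelled 
words receives the full 15 suggestions and that one combination is built per 
choice of suggestion for each word; the class and method names are 
hypothetical, and the real count depends on how many suggestions each term 
actually gets.

```java
import java.math.BigInteger;

public class CollationCount {
    /** Upper bound on combinations: suggestionsPerTerm raised to misspelledTerms. */
    public static BigInteger combinations(int suggestionsPerTerm, int misspelledTerms) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }

    public static void main(String[] args) {
        // 15 suggestions for each of 12 words (assumed worst case).
        System.out.println(combinations(15, 12)); // prints 129746337890625
    }
}
```

That is roughly 1.3 * 10^14 combinations in the worst case, far more than can 
be held in memory as a ranked list.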

