[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

James Dyer (JIRA) Thu, 02 Jun 2011 10:50:39 -0700

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

Right...because the elements are sorted already I don't have to go back to the 
100th element to compare.  I can just look at the last element using peek() as 
you suggest.

This version uses the more sophisticated methods of the original patch but 
accomplishes it with less code.  Also, we're using nanoTime() instead of 
currentTimeMillis() to reduce any overhead, and are checking the clock only 
once every 10000 iterations.

From code comments:

Three performance & memory-usage safeguards:
  1. Quit if the RankedPossibilities queue grows larger than 10000.
  2. If the RankedPossibilities queue is bigger than 1000, only add competitive 
possibilities.
  3. Check the clock periodically to be sure we haven't taken more than 50ms.  
If so, quit immediately.
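
As a rough illustration of how the three safeguards above might fit together, here is a minimal sketch. It is not the code from SOLR-2462.patch: the class name CollationSafeguards, the method collect(), the use of plain Integer ranks (lower is better), and the max-heap ordering are all assumptions made for the example. Only the thresholds (10000, 1000, 50ms, clock check every 10000 iterations) and the use of peek() and nanoTime() come from the comment above.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.PriorityQueue;

/** Hypothetical sketch of the three safeguards; not the actual patch. */
final class CollationSafeguards {
    static final int MAX_QUEUE_SIZE = 10000;        // safeguard 1
    static final int COMPETITIVE_THRESHOLD = 1000;  // safeguard 2
    static final long MAX_NANOS = 50_000_000L;      // safeguard 3: 50ms
    static final int CLOCK_CHECK_INTERVAL = 10000;  // iterations between clock reads

    /** Collects candidate ranks (lower = better) until a safeguard trips. */
    static PriorityQueue<Integer> collect(Iterator<Integer> candidates) {
        // Assumption: a max-heap, so peek() returns the worst rank held so far.
        PriorityQueue<Integer> queue = new PriorityQueue<>(Comparator.reverseOrder());
        long start = System.nanoTime();
        long iterations = 0;
        while (candidates.hasNext()) {
            // 1. Quit if the queue has grown larger than the hard limit.
            if (queue.size() > MAX_QUEUE_SIZE) {
                break;
            }
            // 3. Read the clock only once every CLOCK_CHECK_INTERVAL iterations.
            if (++iterations % CLOCK_CHECK_INTERVAL == 0
                    && System.nanoTime() - start > MAX_NANOS) {
                break;
            }
            int rank = candidates.next();
            // 2. Past the threshold, skip anything no better than the worst entry.
            if (queue.size() > COMPETITIVE_THRESHOLD && rank >= queue.peek()) {
                continue;
            }
            queue.add(rank);
        }
        return queue;
    }
}
```

In this sketch a single peek() is enough to decide whether a candidate is competitive, and the clock is consulted rarely enough that the timing check itself adds little overhead.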
                

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a 
> ranked list of *every* possible correction combination.  But if returning 
> several corrections per term, and if several words are misspelled, the 
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime 
> "spellcheck.collate" is used.  It is not necessary to use any features that 
> were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking 
> our Solr servers down with "infinite" GC loops.  It was pretty easy for this 
> to happen as occasionally a user will accidentally paste the URL into the 
> Search box on our app.  This URL results in a search with ~12 misspelled 
> words.  We have "spellcheck.count" set to 15. 
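
To give a sense of scale for the scenario described above, the following sketch computes an upper bound on the number of correction combinations. It assumes every misspelled term receives the full "spellcheck.count" suggestions, which may not hold in practice; the class and method names are hypothetical.

```java
/** Hypothetical helper: upper bound on collation combinations. */
final class CollationCombinations {
    /** Returns suggestionsPerTerm raised to the power misspelledTerms. */
    static long combinations(int misspelledTerms, int suggestionsPerTerm) {
        long total = 1;
        for (int i = 0; i < misspelledTerms; i++) {
            // multiplyExact throws ArithmeticException on overflow instead of wrapping.
            total = Math.multiplyExact(total, (long) suggestionsPerTerm);
        }
        return total;
    }
}
```

Under that assumption, combinations(12, 15) is 15^12 = 129,746,337,890,625, far more than a fully ranked list can hold in memory.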

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
