[ http://issues.apache.org/jira/browse/LUCENE-537?page=comments#action_12372746 ]
Karl Wettin commented on LUCENE-537:
------------------------------------

This just came up on Lucene-users and might explain what I thought was a
thread-safety problem. I'll take a look and update my refactored code some
time soon.

From: [EMAIL PROTECTED]
Subject: Spellchecker bug (or feature?)
Date: Saturday 1 Apr 2006 00:20:08 GMT+02:00
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org

Not sure if this is the right place to report this issue:

The accuracy value, which can be set via setAccuracy(), is being modified in
SpellChecker.java when a word is checked. As a result, the "min" may be pushed
very high, and later requests will get no suggestions.

One workaround would be to call setAccuracy() each time before a word is
checked. I'm not sure if this is a feature (intended behavior) or a bug.

By the way, I'm using spellchecker 1.9.1 that comes with Lucene 1.9.1.

Thanks,
Xiaocheng

> Refactor of spell check
> -----------------------
>
>          Key: LUCENE-537
>          URL: http://issues.apache.org/jira/browse/LUCENE-537
>      Project: Lucene - Java
>         Type: Improvement
>     Reporter: Karl Wettin
>  Attachments: lucene_spellcheck.tar.gz
>
> I use the same ngram index for multiple categories, but only want to spell
> check per category. The old implementation did not support this as it used
> docFreq as controller source.
>
> The spell check returns suggestions with score and not just the suggested
> word.
>
> TokenFrequencyVector replaces the IndexReader used for docFreq.
>
> LuceneTokenFrequencyVector wraps an IndexReader and works just as the old
> implementation.
>
> LuceneQueryDictionary creates an ngram dictionary based on a query and not
> the whole index.
>
> MultiTokenFrequencyVector treats a number of TokenFrequencyVector:s as one,
> i.e. for use when spell checking in multiple contexts.
>
> TokenFrequencyVectorMap is a HashMap facade. Comes with a static factory to
> create the vector based on the tokens in a specific field from a search.
>
> I use the TokenFrequencyVectorMap to build one vector per category and
> instantiate a MultiTokenFrequencyVector for each user query. Could probably
> save a couple of clock ticks by buffering MultiVectors rather than creating
> new ones all the time.
>
> Also, it seems the ngram code might not be thread safe. This also applies to
> the source in CVS. I have not succeeded in proving it when testing, only in
> the live environment. Each instance of Spellchecker only suggests once. And
> it takes quite some resources to create new instances of the spellchecker as
> it is designed today. Might get back on that subject.
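
The accuracy drift described in the forwarded message can be illustrated with a small sketch. This is not the code from SpellChecker.java; the class SketchSpellChecker, its leakThreshold flag, and the precomputed candidate scores are all hypothetical and exist only to show the pattern. The assumption is that the checker raises its "min" while filling a bounded suggestion queue and writes that raised value back to the field set by setAccuracy(), instead of keeping it in a local variable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical stand-in for illustration only; this is not Lucene's SpellChecker. */
class SketchSpellChecker {
    private float minAccuracy = 0.5f;    // threshold configured via setAccuracy()
    private final boolean leakThreshold; // true reproduces the reported behaviour

    SketchSpellChecker(boolean leakThreshold) {
        this.leakThreshold = leakThreshold;
    }

    void setAccuracy(float min) { this.minAccuracy = min; }

    float getAccuracy() { return minAccuracy; }

    /**
     * Returns up to numSug candidate words whose score is at least the
     * accuracy threshold, best first. Scores are supplied by the caller
     * (assumed precomputed) to keep the sketch short. Assumes numSug >= 1.
     */
    List<String> suggest(Map<String, Float> scoredCandidates, int numSug) {
        float min = minAccuracy; // per-call working copy of the threshold
        // Min-heap on score: the weakest kept suggestion sits on top.
        PriorityQueue<Map.Entry<String, Float>> queue = new PriorityQueue<>(
                (a, b) -> Float.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Float> candidate : scoredCandidates.entrySet()) {
            if (candidate.getValue() < min) {
                continue; // below the current bar
            }
            queue.add(candidate);
            if (queue.size() > numSug) {
                queue.poll();                  // drop the weakest suggestion
                min = queue.peek().getValue(); // raise the bar for this call
                if (leakThreshold) {
                    minAccuracy = min;         // the bug: the raised bar outlives the call
                }
            }
        }
        List<String> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            result.add(0, queue.poll().getKey()); // weakest is polled first, so prepend
        }
        return result;
    }
}
```

With leakThreshold set to true, a first call that overflows the queue leaves the stored accuracy at the score of its best hit, so a later word whose candidates all score lower gets no suggestions unless setAccuracy() is called again first, which matches the workaround in the report. With leakThreshold set to false the raised bar stays local to the call, and the configured accuracy survives between requests.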