Samuel,

Do you think you could write a failing unit test and open a JIRA issue? Or, at the least, open a JIRA issue with all the details, without a test?

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Samuel García Martínez [mailto:samuelgmarti...@gmail.com]
Sent: Thursday, February 21, 2013 2:33 AM
To: java-user@lucene.apache.org
Subject: Re: possible bug on Spellchecker
Importance: Low

I'm using Solr 3.6, and DirectSpellChecker is available only in v4+. Moreover, on "big" indexes I prefer using a sidekick index rather than iterating over the term dictionary.

On Thu, Feb 21, 2013 at 8:19 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Any reason that you are not using the DirectSpellChecker?
>
> See:
> http://lucene.apache.org/core/4_0_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Samuel García Martínez
> Sent: Wednesday, February 20, 2013 3:34 PM
> To: java-user@lucene.apache.org
> Subject: possible bug on Spellchecker
>
> Hi all,
>
> Debugging the Solr spellchecker (IndexBasedSpellChecker, which delegates
> to the Lucene SpellChecker), I think I found a bug when the input is a
> 6-letter word:
> - george
> - anthem
> - argued
> - fluent
>
> Due to getMin() and getMax(), the grams indexed for these terms are of
> sizes 3 and 4.
> So, the fields would be something like this:
> - for "*george*"
>   - start3: "geo"
>   - start4: "geor"
>   - end3: "rge"
>   - end4: "orge"
>   - 3: "geo", "eor", "org", "rge"
>   - 4: "geor", "eorg", "orge"
> - for "*anthem*"
>   - start3: "ant"
>   - start4: "anth"
>   - end3: "hem"
>   - end4: "them"
>
> The problem shows up when the user swaps the 3rd and 4th characters,
> misspelling the word like this:
> - geroge
> - anhtem
>
> The queries generated for these terms are (SHOULD boolean queries):
> - for "*geroge*"
>   - start3: "ger"
>   - start4: "gero"
>   - end3: "oge"
>   - end4: "roge"
>   - 3: "ger", "ero", "rog", "oge"
>   - 4: "gero", "erog", "roge"
> - for "*anhtem*"
>   - start3: "anh"
>   - start4: "anht"
>   - end3: "tem"
>   - end4: "htem"
>   - 3: "anh", "nht", "hte", "tem"
>   - 4: "anht", "nhte", "htem"
>
> So, as you can see, this kind of misspelling never matches the suitable
> suggestions, although the string distance score is 0.95555556.
>
> I think getMin(int l) and getMax(int l) should return 2 and 3,
> respectively, for l==6. Debugging other values, I did not find any
> problem with any kind of misspelling.
>
> Any thoughts about this?
>
> --
> Regards,
> Samuel García
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--
Regards,
Samuel García.
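
As a possible starting point for the failing test requested at the top of this thread, the gram overlap described in the quoted report can be checked with a small standalone sketch. It does not call Lucene at all: the class `GramOverlapSketch` and its methods are hypothetical names, the field layout (start<N>, end<N>, <N>) is taken from the report, and the gram sizes are passed in as parameters (3 and 4 as reported, 2 and 3 as proposed) rather than derived from getMin()/getMax().

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene or Solr.
final class GramOverlapSketch {

    // Field-qualified grams of every size in [min, max], following the
    // start<N> / end<N> / <N> layout described in the report.
    static Set<String> grams(String word, int min, int max) {
        Set<String> out = new LinkedHashSet<>();
        for (int n = min; n <= max; n++) {
            if (word.length() < n) {
                continue; // word too short for this gram size
            }
            out.add("start" + n + ":" + word.substring(0, n));
            out.add("end" + n + ":" + word.substring(word.length() - n));
            for (int i = 0; i + n <= word.length(); i++) {
                out.add(n + ":" + word.substring(i, i + n));
            }
        }
        return out;
    }

    // Terms that an indexed word and a query word have in common.
    static Set<String> shared(String indexed, String query, int min, int max) {
        Set<String> common = new LinkedHashSet<>(grams(indexed, min, max));
        common.retainAll(grams(query, min, max));
        return common;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"george", "geroge"}, {"anthem", "anhtem"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1]
                + ", sizes 3-4: " + shared(p[0], p[1], 3, 4));
            System.out.println(p[0] + " vs " + p[1]
                + ", sizes 2-3: " + shared(p[0], p[1], 2, 3));
        }
    }
}
```

With sizes 3 and 4, neither pair shares a single term, which is consistent with the report that no SHOULD clause can match. With sizes 2 and 3, both pairs share at least the leading and trailing 2-grams, so the correct word would at least become a candidate.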