Re: possible bug on Spellchecker

Samuel García Martínez Thu, 21 Feb 2013 13:32:51 -0800

Here it is https://issues.apache.org/jira/browse/LUCENE-4793 :)



On Thu, Feb 21, 2013 at 9:02 PM, Samuel García Martínez <samuelgmarti...@gmail.com> wrote:

> Yes, of course I can. I'll try to open it tonight (European time) or
> tomorrow as soon as I get to the office.
>
>
> On Thu, Feb 21, 2013 at 4:14 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
>
>> Samuel,
>>
>> Do you think you could write a failing unit test and open a JIRA issue?
>>  Or at the least open a JIRA issue with all the details without a test?
>>
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: Samuel García Martínez [mailto:samuelgmarti...@gmail.com]
>> Sent: Thursday, February 21, 2013 2:33 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: possible bug on Spellchecker
>> Importance: Low
>>
>> I'm using Solr 3.6, and DirectSpellChecker is only available from v4
>> onwards. Moreover, for "big" indexes I prefer using a sidekick index rather
>> than iterating over the term dictionary.
>>
>>
>> On Thu, Feb 21, 2013 at 8:19 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>>
>> > Any reason that you are not using the DirectSpellChecker?
>> >
>> > See:
>> > http://lucene.apache.org/core/4_0_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html
>> >
>> > -- Jack Krupansky
>> >
>> > -----Original Message----- From: Samuel García Martínez
>> > Sent: Wednesday, February 20, 2013 3:34 PM
>> > To: java-user@lucene.apache.org
>> > Subject: possible bug on Spellchecker
>> >
>> >
>> > Hi all,
>> >
>> > While debugging the Solr spellchecker (IndexBasedSpellChecker, which
>> > delegates to the Lucene SpellChecker), I think I found a bug when the input
>> > is a 6-letter word:
>> >  - george
>> >  - anthem
>> >  - argued
>> >  - fluent
>> >
>> > Because of getMin() and getMax(), the grams indexed for these terms have
>> > sizes 3 and 4. So the fields would look something like this:
>> >  - for "*george*"
>> >
>> >     - start3: "geo"
>> >     - start4: "geor"
>> >     - end3: "rge"
>> >     - end4: "orge"
>> >     - 3: "geo", "eor", "org", "rge"
>> >     - 4: "geor", "eorg", "orge"
>> >  - for "*anthem*"
>> >
>> >     - start3: "ant"
>> >     - start4: "anth"
>> >     - end3: "hem"
>> >     - end4: "them"
>> >
>> > The problem shows up when the user swaps the 3rd and 4th characters,
>> > misspelling the word like this:
>> >  - geroge
>> >  - anhtem
>> >
>> > The queries generated for these terms are (SHOULD boolean queries):
>> > - for "*geroge*"
>> >
>> >  - start3: "ger"
>> >  - start4: "gero"
>> >  - end3: "oge"
>> >  - end4: "roge"
>> >  - 3: "ger", "ero", "rog", "oge"
>> >  - 4: "gero", "erog", "roge"
>> > - for "*anhtem*"
>> >
>> >  - start3: "anh"
>> >  - start4: "anht"
>> >  - end3: "tem"
>> >  - end4: "htem"
>> >  - 3: "anh", "nht", "hte", "tem"
>> >  - 4: "anht", "nhte", "htem"
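The gram lists above can be reproduced with a small standalone sketch. This is not the Lucene SpellChecker code: the class `GramFields` and the method `fields` are hypothetical names, the field labels simply mirror the ones used in the lists (startN, endN, N), and the sizes 3 and 4 are the values the message reports for 6-letter words.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GramFields {
    // Hypothetical helper: maps each field label used in the message
    // (startN, endN, N) to the grams of `word` for every size in [min, max].
    static Map<String, List<String>> fields(String word, int min, int max) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (int n = min; n <= max; n++) {
            if (word.length() < n) {
                continue; // word is too short to have grams of this size
            }
            out.put("start" + n, List.of(word.substring(0, n)));
            out.put("end" + n, List.of(word.substring(word.length() - n)));
            List<String> grams = new ArrayList<>();
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
            out.put(String.valueOf(n), grams);
        }
        return out;
    }

    public static void main(String[] args) {
        // Sizes 3 and 4 are assumed from the message (6-letter input).
        for (String w : new String[] {"george", "geroge", "anthem", "anhtem"}) {
            System.out.println(w + " -> " + fields(w, 3, 4));
        }
    }
}
```

Comparing the output for an indexed word with the output for its misspelling, field by field, shows whether any SHOULD clause of the query can match.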
>> >
>> > So, as you can see, this kind of misspelling never matches the suitable
>> > suggestions, even though the similarity score for each pair is 0.95555556.
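The message does not say which measure produced the score 0.95555556. The figure coincides with the Jaro-Winkler similarity of both pairs, so the sketch below assumes that measure. It is a plain textbook implementation (scaling factor 0.1, common prefix capped at 4 characters), not Lucene's own class, and the name `JaroWinklerSketch` is hypothetical.

```java
public class JaroWinklerSketch {
    // Jaro similarity: based on the number of matching characters within a
    // sliding window and the number of matched characters that are out of order.
    static double jaro(String s, String t) {
        if (s.equals(t)) {
            return 1.0;
        }
        int ls = s.length();
        int lt = t.length();
        if (ls == 0 || lt == 0) {
            return 0.0;
        }
        int window = Math.max(0, Math.max(ls, lt) / 2 - 1);
        boolean[] sMatched = new boolean[ls];
        boolean[] tMatched = new boolean[lt];
        int matches = 0;
        for (int i = 0; i < ls; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(lt - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!tMatched[j] && s.charAt(i) == t.charAt(j)) {
                    sMatched[i] = true;
                    tMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both strings' matched characters in order and count mismatches.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < ls; i++) {
            if (!sMatched[i]) {
                continue;
            }
            while (!tMatched[k]) {
                k++;
            }
            if (s.charAt(i) != t.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / ls + m / lt + (m - transpositions) / m) / 3.0;
    }

    // Winkler adjustment: boost the Jaro score for a shared prefix (max 4 chars).
    static double jaroWinkler(String s, String t) {
        double j = jaro(s, t);
        int limit = Math.min(4, Math.min(s.length(), t.length()));
        int prefix = 0;
        while (prefix < limit && s.charAt(prefix) == t.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("george", "geroge"));
        System.out.println(jaroWinkler("anthem", "anhtem"));
    }
}
```

For both pairs all six characters match, one transposition is counted, and the shared prefix has length 2, which gives a Jaro score of 17/18 and a Jaro-Winkler score of about 0.9556.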
>> >
>> > I think getMin(int l) and getMax(int l) should return 2 and 3,
>> > respectively, for l==6. Debugging other values, I did not find any problem
>> > with any kind of misspelling.
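To see what the proposed change would do for these examples, the sketch below counts the field-qualified grams that an indexed word and its misspelling have in common, first with sizes 3 and 4 and then with the proposed sizes 2 and 3. It is an illustration only: `GramOverlap`, `grams` and `shared` are hypothetical names, and the "field:gram" encoding is an assumption made to keep startN, endN and plain N-grams apart, not how Lucene stores them.

```java
import java.util.HashSet;
import java.util.Set;

public class GramOverlap {
    // Field-qualified grams of `word` for every size in [min, max],
    // e.g. "start3:geo", "end3:rge", "gram3:eor".
    static Set<String> grams(String word, int min, int max) {
        Set<String> out = new HashSet<>();
        for (int n = min; n <= max; n++) {
            if (word.length() < n) {
                continue; // word is too short to have grams of this size
            }
            out.add("start" + n + ":" + word.substring(0, n));
            out.add("end" + n + ":" + word.substring(word.length() - n));
            for (int i = 0; i + n <= word.length(); i++) {
                out.add("gram" + n + ":" + word.substring(i, i + n));
            }
        }
        return out;
    }

    // Grams that the indexed word and the typed word have in the same field.
    static Set<String> shared(String indexed, String typed, int min, int max) {
        Set<String> common = new HashSet<>(grams(indexed, min, max));
        common.retainAll(grams(typed, min, max));
        return common;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"george", "geroge"}, {"anthem", "anhtem"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + "/" + p[1]
                + " sizes 3-4: " + shared(p[0], p[1], 3, 4)
                + " sizes 2-3: " + shared(p[0], p[1], 2, 3));
        }
    }
}
```

With sizes 3 and 4 both pairs share nothing, so no clause of the query can hit the correct word. With sizes 2 and 3 the 2-grams at the start and end of each word survive the swap, so the correct word becomes a candidate again.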
>> >
>> > Any thoughts about this?
>> >
>> > --
>> > Regards,
>> > Samuel García
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
>>
>> --
>> Regards,
>> Samuel García.
>>
>>
>>
>>
>
>
> --
> Regards,
> Samuel García.
>



-- 
Regards,
Samuel García.
