If it adds the clauses as Occur.SHOULD, it means they should appear, but
do not have to appear, so the query works in OR mode.
Looking at suggestSimilar, it looks like it computes the edit distance
between the requested word and each suggestion. If the score is lower than
the minimum score, it may skip the word.
Could you try indexing a document with the word "abcde" and requesting
suggestions for "abce"? It may be, as in your example, that if the
requested word is too different from the actual word (for example, if their
lengths are very different), it will fail to achieve what you need.
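
To make the two effects above concrete, here is a minimal sketch in plain Java
(no Lucene dependency). It is not the spellchecker's actual code: the class
name NgramSketch, the helper names, and the normalization used in score()
(1 - distance / longer length) are assumptions made for illustration only.
It shows that OR-style matching still finds "abcde" for "abce", and that an
edit-distance score drops sharply when the two strings differ a lot in length.

```java
import java.util.ArrayList;
import java.util.List;

public class NgramSketch {

    // Split a word into overlapping n-grams, e.g. "abcde" -> abc, bcd, cde.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // OR mode (like Occur.SHOULD): one shared n-gram is enough to match.
    static boolean matchesAny(List<String> query, List<String> indexed) {
        for (String gram : query) {
            if (indexed.contains(gram)) {
                return true;
            }
        }
        return false;
    }

    // AND mode (like Occur.MUST): every query n-gram must be present.
    static boolean matchesAll(List<String> query, List<String> indexed) {
        return indexed.containsAll(query);
    }

    // Plain Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization (not taken from Lucene): 1.0 means identical,
    // values near 0.0 mean almost nothing in common.
    static float score(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0f : 1.0f - (float) editDistance(a, b) / longest;
    }

    public static void main(String[] args) {
        List<String> indexed = ngrams("abcde", 3);
        System.out.println("OR  match for abce: " + matchesAny(ngrams("abce", 3), indexed));
        System.out.println("AND match for abce: " + matchesAll(ngrams("abce", 3), indexed));
        System.out.println("score(abce, abcde): " + score("abce", "abcde"));

        String product = "HP NC4400 EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK";
        System.out.println("score(short, full): " + score("HP NC4400 EY605EA", product));
    }
}
```

With a scheme like this, the short product string shares many ngrams with the
full one, so the OR query would retrieve it, but a score based on edit
distance could still fall below a minimum-score cutoff and the candidate would
be dropped.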

On Thu, Feb 14, 2008 at 12:25 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:

> Hello Shai,
>
> That's right, Speller is in contrib. It is named spellchecker. Basically
> it is a special index that stores the words as ngrams.
> I looked at the code to see how it queries the index, and basically it
> makes ngrams and adds each ngram to a boolean query.
>
> Here is how it adds to the boolean query. I could not find out whether it
> is AND or OR.
>
> Best.
>
>  private static void add(BooleanQuery q, String name, String value, float boost) {
>    Query tq = new TermQuery(new Term(name, value));
>    tq.setBoost(boost);
>    q.add(new BooleanClause(tq, BooleanClause.Occur.SHOULD));
>  }
>
>  private static void add(BooleanQuery q, String name, String value) {
>    q.add(new BooleanClause(new TermQuery(new Term(name, value)), BooleanClause.Occur.SHOULD));
>  }
>
> On Thu, Feb 14, 2008 at 8:44 AM, Shai Erera <[EMAIL PROTECTED]> wrote:
>
> > Is this Speller class a Lucene class? I didn't find it in the main code
> > stream, maybe it's part of contrib?
> >
> > Anyway, it still depends on how it is implemented (OR or AND). For example,
> > someone indexed a document with the word "abcde" and the index keeps the
> > ngrams "abc", "bcd" and "cde". Then somebody types in "abc"; what would
> > the speller suggest? What would the speller suggest for "abce"?
> > If it works in OR mode, I assume it would suggest "abcde" for both, as
> > "abc" appears in both. But if it works in AND mode, then for the first it
> > will suggest "abcde" but for the second it won't, because the ngrams
> > produced are "abc" and "bce", and "bce" does not appear in "abcde".
> >
> > Am I right? If not, can you elaborate more on the Speller class you use?
> >
> > On Wed, Feb 13, 2008 at 8:19 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Shai,
> > >
> > > The class that does the matching is Speller.
> > > It is not query based; rather, there is a method called
> > > suggestSimilar(String word, int numSug), where numSug is the number of
> > > suggestions. The words are kept in the index as ngrams. For example,
> > > abcde is kept as abc bcd cde.
> > > So this is not a normal query like the ones we all know.
> > >
> > > Best regards,
> > > C.B.
> > >
> > >
> > > On Feb 13, 2008 7:00 PM, Shai Erera <[EMAIL PROTECTED]> wrote:
> > >
> > > > What is the default operator of your QueryParser? Is it AND_OPERATOR
> > > > or OR_OPERATOR? If it's OR, then it's strange. If it's AND, then once
> > > > you add more terms than what exists, it won't find anything.
> > > >
> > > > On Feb 13, 2008 6:54 PM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hello;
> > > > >
> > > > > I am trying to make a product matcher based on Lucene's ngram-based
> > > > > suggest.
> > > > > I made some changes so that instead of giving the speller a
> > > > > dictionary, I feed it a List<String>.
> > > > >
> > > > > For example, let's say I have "HP NC4400 EY605EA CORE 2 DUO T5600
> > > > > 1.83GHz/512MB/80GB/12.1'' NOTEBOOK"
> > > > > and I index it with the speller using an ngram approach.
> > > > >
> > > > > It works quite well when using the suggest feature if the user
> > > > > submits something similar, meaning the string length is roughly
> > > > > equal and a word or two might be mistyped or even missing: Lucene
> > > > > finds it.
> > > > > However, when the user submits the same product with a much shorter
> > > > > or much longer string, for example "HP NC4400 EY605EA" or "HP NC4400
> > > > > EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK WITH
> > > > > WINDOWS XP AND GIFT MOUSE", the suggester won't work.
> > > > >
> > > > > I am not sure about the ngram approach any more.
> > > > >
> > > > > Any ideas/recommendations/help greatly appreciated.
> > > > >
> > > > > Best Regards,
> > > > > C.B.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Shai Erera
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Shai Erera
> >
>



-- 
Regards,

Shai Erera
