Hello Shai, Thats right, Speller is in the contrib.it is named spellchecker. Basically it is a special index that stores the words as ngrams. I looked at the code to see how it is querying the index and basically it makes ngrams and adds each ngram to a boolean query.
Here is how it adds to the boolean query. I could not find out whether it is AND or OR Best. private static void add(BooleanQuery q, String name, String value, float boost) { Query tq = new TermQuery(new Term(name, value)); tq.setBoost(boost); q.add(new BooleanClause(tq, BooleanClause.Occur.SHOULD)); } private static void add(BooleanQuery q, String name, String value) { q.add(new BooleanClause(new TermQuery(new Term(name, value)), BooleanClause.Occur.SHOULD)); } On Thu, Feb 14, 2008 at 8:44 AM, Shai Erera <[EMAIL PROTECTED]> wrote: > Is this Speller class a Lucene class? I didn't find it in the main code > stream, maybe it's part of contrib? > > Anyway, still it depends how it is implemented (OR or AND). For example, > someone indexed a document with the word "abcde" and the index keeps the > ngrams "abc", "bcd" and "cde". Then somebody types in "abc", what would > the > speller suggest? What would the speller suggest for "abce"? > If it works in an OR mode, I assume it would suggest "abcde" for both, as > "abc" appears in both. But if it works in AND mode, then for the first it > will suggest "abcde" but for the second it won't suggest it because the > ngrams produced are "abc" and "bce" .. and "bce" does not appear in > "abcde". > > Am I right? If not, can you elaborate more on the Speller class you use? > > On Wed, Feb 13, 2008 at 8:19 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > > > Hello Shai, > > > > The class that does the matching is Speller. > > It does not work query based but rather there is a method called - > > suggestSimilar(String word, int numSug); where the numSug is number of > > suggestions. The words are kept in the index as ngrams. For example > abcde > > is > > kept as abc bcd cde. > > So this is not normal query like we all know. > > > > Best regards, > > C.B. > > > > > > On Feb 13, 2008 7:00 PM, Shai Erera <[EMAIL PROTECTED]> wrote: > > > > > What is the default Operator of your QueryParser? Is it AND_OPERATOR > or > > > OR_OPERATOR. If it's OR ... then it's strange. If it's AND, then once > > you > > > add more terms than what exists, it won't find anything. > > > > > > On Feb 13, 2008 6:54 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > > > > > > > Hello; > > > > > > > > I am trying to make a product matcher based on lucene's ngram based > > > > suggest. > > > > I did some changes so that instead of giving the speller a > dictionary > > I > > > > feed > > > > it with a List<String>. > > > > > > > > For example lets say I have "HP NC4400 EY605EA CORE 2 DUO T5600 > > > > 1.83GHz/512MB/80GB/12.1'' > > > > NOTEBOOK" > > > > and I index it with speller using an ngram approach. > > > > > > > > It works quite well - when using the suggest feature, for example if > > the > > > > user submits something similar. similar as in the string lenght is > > > > relatively equal, a word or two might be mistyped - or even missing, > > > > lucene > > > > finds it. > > > > However - when the user submits the same product - but with much > less > > or > > > > much more string length - for example "HP NC4400 EY605EA" or "HP > > NC4400 > > > > EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK WITH > > WINDOWS > > > > XP > > > > AND GIFT MOUSE" - the suggester wont work. > > > > > > > > I am not sure about the ngrams approach any more. > > > > > > > > Any ideas/recomendations/help greatly appreciated. > > > > > > > > Best Regards, > > > > C.B. > > > > > > > > > > > > > > > > -- > > > Regards, > > > > > > Shai Erera > > > > > > > > > -- > Regards, > > Shai Erera >