Hello, I am trying to build a product matcher based on Lucene's n-gram-based suggest feature. I made some changes so that, instead of giving the speller a dictionary, I feed it a List<String>.
For example, let's say I have "HP NC4400 EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK" and I index it with the speller using an n-gram approach. The suggest feature works quite well when the user submits something similar, meaning the string length is roughly the same and a word or two is mistyped or even missing: Lucene still finds it. However, when the user submits the same product with a much shorter or much longer string, for example "HP NC4400 EY605EA" or "HP NC4400 EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK WITH WINDOWS XP AND GIFT MOUSE", the suggester won't work. I am no longer sure about the n-gram approach.

Any ideas, recommendations, or help would be greatly appreciated.

Best regards,
C.B.
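
P.S. To make the length problem concrete, here is a minimal sketch in plain Java. It is not Lucene's scoring code; the class NgramOverlap and its methods are hypothetical names made up for illustration, and the choice of character trigrams is an assumption. It compares a symmetric n-gram overlap (Jaccard), which penalizes a length mismatch, with a one-sided "containment" score that only asks how much of the query appears in the indexed string.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
public class NgramOverlap {

    // Collect the distinct character n-grams of a lowercased string.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        String t = s.toLowerCase();
        for (int i = 0; i + n <= t.length(); i++) {
            grams.add(t.substring(i, i + n));
        }
        return grams;
    }

    // Symmetric overlap: shared n-grams divided by all n-grams of both strings.
    static double jaccard(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        Set<String> inter = new HashSet<>(ga);
        inter.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // One-sided overlap: fraction of the query's n-grams found in the indexed string.
    static double containment(String query, String indexed, int n) {
        Set<String> gq = ngrams(query, n);
        if (gq.isEmpty()) {
            return 0.0;
        }
        Set<String> inter = new HashSet<>(gq);
        inter.retainAll(ngrams(indexed, n));
        return (double) inter.size() / gq.size();
    }

    public static void main(String[] args) {
        String indexed =
            "HP NC4400 EY605EA CORE 2 DUO T5600 1.83GHz/512MB/80GB/12.1'' NOTEBOOK";
        String shortQuery = "HP NC4400 EY605EA";

        System.out.println("jaccard     = " + jaccard(shortQuery, indexed, 3));
        System.out.println("containment = " + containment(shortQuery, indexed, 3));
    }
}
```

With the short query, every trigram of the query occurs in the indexed product, so the containment score is 1.0, while the symmetric score is low because most of the indexed string's trigrams have no counterpart in the query. If the speller's final ranking behaves like the symmetric measure (an assumption on my part), that would explain why queries of very different length are dropped.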