Hello Everyone,
I am looking for a way to build queries based on *word N-grams*.
After some searching, I think I need to define an analyzer as follows:
    public static Analyzer wordNgramAnalyzer(final int minShingle, final int maxShingle) {
        return new Analyzer() {
            @Override
            public TokenStream tokenStream(String fieldName, Reader reader) {
                // Split on whitespace, then join adjacent words into shingles (word N-grams).
                return new ShingleFilter(new WhitespaceTokenizer(reader), minShingle, maxShingle);
            }
        };
    }
This analyzer should produce unigram, bigram, trigram, ... tokens, which I
can use both at index time and at query time.
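To make clear which tokens I expect, here is a small plain-Java sketch that does not use Lucene at all. The class name WordNgrams and the method shingles are made up for illustration, and the sketch assumes that unigrams are emitted alongside the longer N-grams and that words are separated by whitespace only. What ShingleFilter actually emits may differ in order or detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the word N-grams of a text.
final class WordNgrams {

    // Returns every run of minSize..maxSize adjacent words, grouped by start position.
    static List<String> shingles(String text, int minSize, int maxSize) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            // Grow the N-gram one word at a time until it would run past the end.
            for (int n = minSize; n <= maxSize && start + n <= words.length; n++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the unigrams, bigrams and trigrams of a four-word sentence.
        System.out.println(shingles("please divide this sentence", 1, 3));
    }
}
```

For the input "a b c" with sizes 1 to 2, this sketch yields the tokens "a", "a b", "b", "b c", "c".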
So, can anyone please tell me:
1) Is this the right approach to indexing and querying word N-grams?
2) Is there any way to assign weights to the N-grams, so that at query time
trigram tokens get a higher weight than unigram tokens? (Something like the
final Lucene score being an interpolation of the unigram score, the bigram
score, the trigram score, and so on.)
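To make the second question concrete, this is the kind of combination I have in mind, written as plain Java rather than Lucene code. The class InterpolatedScore, the per-order scores, and the weights are all made-up values for illustration. I do not know whether Lucene computes anything like this internally.

```java
// Hypothetical illustration, not Lucene API: a weighted sum of per-order scores.
final class InterpolatedScore {

    // scores[i] is the score from (i+1)-gram matches; weights[i] is its interpolation weight.
    static double interpolate(double[] scores, double[] weights) {
        if (scores.length != weights.length) {
            throw new IllegalArgumentException("scores and weights must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < scores.length; i++) {
            total += weights[i] * scores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 2.0, 4.0};     // made-up unigram, bigram, trigram scores
        double[] weights = {0.25, 0.25, 0.5};  // made-up weights; the trigram counts most
        // 0.25 * 1.0 + 0.25 * 2.0 + 0.5 * 4.0 = 2.75
        System.out.println(interpolate(scores, weights));
    }
}
```

Here the weights sum to 1, so the result stays within the range of the individual scores, and raising the trigram weight makes trigram matches dominate the final score.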
Any help is much appreciated.
Thanks
--
-Regards,
Rajen Chatterjee.