How to make word-N-gram based query and interpolate each N-gram score to obtain final Lucene score

Rajen Chatterjee Mon, 11 Jan 2016 00:44:43 -0800

Hello Everyone,

I am looking for a way to build *word-N-gram* based queries.
After some searching, I think I have to define an analyzer as follows:


    public static Analyzer wordNgramAnalyzer(final int minShingle, final int maxShingle) {
        return new Analyzer() {
            @Override
            public TokenStream tokenStream(String fieldName, Reader reader) {
                // ShingleFilter emits the unigrams by default, plus word
                // n-grams (shingles) of minShingle to maxShingle words.
                return new ShingleFilter(new WhitespaceTokenizer(reader),
                        minShingle, maxShingle);
            }
        };
    }

This analyzer should produce unigram, bigram, trigram, ... tokens, which I
can use at indexing time as well as at query time.
So, can anyone please tell me:
1) Is this the right approach to index and query word N-grams?
2) Is there any way to assign weights to the N-grams, so that at query time
trigram tokens get a higher weight than unigram tokens? (Something like the
final Lucene score being an interpolation of the unigram score, bigram
score, trigram score, and so on.)
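
To make question 2 concrete, here is a minimal plain-Java sketch of the kind
of interpolation I have in mind. It does not use Lucene at all: the class
name, the overlap-based per-order score, and the weights are placeholders of
my own, only meant to illustrate "final score = weighted sum of the per-order
scores".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NgramInterpolation {

    // Returns all word n-grams of exactly n words, each joined by one space.
    public static List<String> wordNgrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n < 1) {
            return grams;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    // Placeholder per-order score (an assumption, not a Lucene score):
    // the fraction of the query's n-grams that also occur in the document.
    public static double overlap(String query, String doc, int n) {
        List<String> queryGrams = wordNgrams(query, n);
        if (queryGrams.isEmpty()) {
            return 0.0;
        }
        Set<String> docGrams = new HashSet<>(wordNgrams(doc, n));
        int hits = 0;
        for (String gram : queryGrams) {
            if (docGrams.contains(gram)) {
                hits++;
            }
        }
        return (double) hits / queryGrams.size();
    }

    // Linear interpolation: weights[k] is the hypothetical weight
    // given to the (k+1)-gram score.
    public static double interpolatedScore(String query, String doc, double[] weights) {
        double score = 0.0;
        for (int k = 0; k < weights.length; k++) {
            score += weights[k] * overlap(query, doc, k + 1);
        }
        return score;
    }

    public static void main(String[] args) {
        // Example weights favouring trigrams over bigrams over unigrams.
        double[] weights = {0.2, 0.3, 0.5};
        System.out.println(
            interpolatedScore("the quick brown fox", "a quick brown fox jumps", weights));
    }
}
```

What I would like is the same effect inside Lucene, with the real per-order
scores in place of the overlap fraction used here.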

Any help is much appreciated.

Thanks

-- 
-Regards,
 Rajen Chatterjee.
