Re: [jira] Created: (LUCENE-1229) NGramTokenFilter optimization in query phase

Hiroaki Kawai Fri, 14 Mar 2008 02:47:03 -0700

Thanks for your reply.

> > I found that NGramTokenFilter-ed token stream could be optimized in query.
> >
> > A standard 1,2 NGramTokenFilter will generate a token stream from "abcde" 
> > as follows:
> > a ab b bc c cd d de e
> >
> > When we index "abcde", we'll use all of the tokens.
> >
> > But when we query, we only need:
> > ab cd de
> >   
> I don't understand: why do you index something that you will not query?
> Why don't you use a bigram?


Good point. :-) Consider the following case:
1. We store (index) "abcde".

2. We query with "a", and want "abcde" to be a hit.

3. We query with "ab", and want "abcde" to be a hit.

4. We query with "de", and want "abcde" to be a hit.

5. Of course, we query with "abcde", and want "abcde" to be a hit.


I mean, whether a given gram is actually needed at query time depends
on the query string. The tokens required in the index phase and in the
query phase may differ.
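
To make the index side of this concrete, here is a minimal sketch in plain
Java. It does not use the Lucene API; the class name IndexGramSketch and the
method indexGrams are made up for illustration, and the output order simply
follows the token list quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists every gram of
// length minGram..maxGram, ordered by start position.
class IndexGramSketch {
    static List<String> indexGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int size = minGram;
                 size <= maxGram && start + size <= text.length();
                 size++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> indexed = indexGrams("abcde", 1, 2);
        System.out.println(indexed);
        // Each short query from cases 2-4 is itself one indexed token.
        for (String q : new String[] {"a", "ab", "de"}) {
            System.out.println(q + " indexed: " + indexed.contains(q));
        }
    }
}
```

The queries "a", "ab" and "de" each correspond to exactly one indexed token,
which is why the 1-grams have to be indexed even though a longer query does
not need them.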

Of course, you CAN query "abcde" with ALL of the tokens
(a ab b bc c cd d de e), but that is not necessary.
We can omit some of the tokens and still find the documents that
contain "abcde".

A bigram-only index might work as well, but it can't match a 3-gram
token in a single index lookup, so we may want to index 3-grams as well,
for example.
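
One possible way to pick the reduced query tokens is sketched below in plain
Java. This is only an illustration under my own assumptions, not the patch
itself: QueryGramSketch and queryGrams are hypothetical names, the index is
assumed to contain every gram from length 1 up to maxGram, and the selection
rule (step through the query in maxGram-sized chunks, then add one gram
aligned to the end) is just one choice that reproduces "ab cd de".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: chooses a small set of
// grams that together cover every character of the query.
class QueryGramSketch {
    static List<String> queryGrams(String query, int maxGram) {
        List<String> grams = new ArrayList<>();
        int n = query.length();
        if (n == 0) {
            return grams;
        }
        if (n <= maxGram) {
            // Short query: assumed to be indexed as a single gram.
            grams.add(query);
            return grams;
        }
        int start = 0;
        while (start + maxGram <= n) {
            grams.add(query.substring(start, start + maxGram));
            start += maxGram;
        }
        if (start < n) {
            // Leftover tail: use the last full-size gram instead.
            grams.add(query.substring(n - maxGram));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(queryGrams("abcde", 2)); // [ab, cd, de]
        System.out.println(queryGrams("abc", 2));   // [ab, bc]
        System.out.println(queryGrams("abc", 3));   // [abc]
    }
}
```

With maxGram 2 the query "abc" needs two tokens, while with maxGram 3 it is a
single token, which is the reason for indexing 3-grams mentioned above.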




