Hi again.

You can find additional info regarding this Bigram index here:
http://asbjorn.fellinghaug.com/blog/master-thesis/

The source code was available, from the same site but it has 
disappeared. However, it can be downloaded from the computer 
science department at NTNU in Norway:
http://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429

Hope this helps.

Hayes, Peter:
> Thanks for your input.  I will try and apply your suggestion.
> 
> Thanks,
> Peter 
> 
> -----Original Message-----
> From: Asbjørn A. Fellinghaug [mailto:asbj...@fellinghaug.com] 
> Sent: Thursday, January 15, 2009 3:25 AM
> To: java-user@lucene.apache.org
> Subject: Re: Google finance-like suggestible search field
> 
> 
> Hi.
> 
> Such 'autocompletion' features with Lucene could be provided with n-gram
> tokenizers, as Erick states. I made a 'Bigram' analyzer for my master
> thesis, when I was doing some research on how to enhance phrase
> searching. This Analyzer considers pair of words as single terms.
> 
> Basically, what the Bigram analyzer does is to index stopwords combined
> with the "previous" word, and with the "next" word. Single stopwords
> would not be indexed, as they demand a lot of resources during searches.
> Only combination of prev+stopword and stopword+nextword would be
> indexed. This saves a lot during searching.
> 
> Consider this sentence: "fetch me a beer honey" (where 'a' and 'me' is
> stopwords). The Bigram analyzer would index these 'Tokens':
>     'fetch', 'fetch me', 'me a', 'a beer', 'honey'.
> 
> Erick Erickson:
> > You could look at the n-gram tokenizers (I confess I haven't used them
> > so I'm not all *that* familiar with them). Or you could make a rule like
> > "no autocomplete until the user types 3 characters" if that would work.
> > 
> > Instead of forming a query, you might try using TermEnum, or
> > WildCardTermEnum
> > or even RegexTermEnum to quickly get the list of terms for your
> > autocomplete. The
> > nice part about this approach is that you could quit after a suitable number
> > of
> > terms were found rather than get them all. As I remember, WildCardTermEnum
> > is
> > faster than RegexTermEnum, but don't hold me to that. So I'd try
> > WildCardTermEnum
> > first, I think you'll find it much more suitable than forming
> > 
> > Best
> > Erick
> 
> -- 
> Asbjørn A. Fellinghaug
> asbj...@fellinghaug.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 
> 

-- 
Asbjørn A. Fellinghaug
asbj...@fellinghaug.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to