Turning PrefixQuery into a TermQuery

Steffen Heinrich Wed, 11 Apr 2007 13:33:25 -0700

Hello Lucene users,

I'm rather new to lucene and java but have done work with other 
search engines some time before.
Right now I'm trying my hands (and luck) on a 'search as you type'-
sort of high performance search a la GoogleSuggest.


There meanwhile are on the net, a number of examples for such script-
driven forms that are suggesting new possible input to the user with 
every keystroke. Mostly for some sort of products.
Some instances are even combining the input from two or more text 
fields or giving fault tolerant feedback.

According to occasional references on this list some people have 
already tried to implement such a search with lucene but did they 
succeed?

My first idea was to run every completed token of the request 
(current user input) through a spellchecker and expand an incomplete 
token to a PrefixQuery.

Example:
artist:'beetles'
title:'yellow submar'

Alternative terms for 'beetles' and 'yellow' would be looked up by 
Spellcheckers for their respective fields and 'submar' being the last 
token of the active textfield with no trailing whitespace would be 
turned into a PrefixQuery.
And of course the performance considerations have to be of major 
concern with these searches.

I'm currently dealing with the problem that short prefixes are 
resulting in BooleanQuery$TooManyClauses exceptions.
That's why I've thought of discarding them in favour of extra fields 
with the first bigrams and trigrams of every indexed token.

artist:'the beatles'
artist_start:'be bea'    ('the' being a stopword)
title:'yellow submarine'
title_start:'ye yel su sub'

Every PrefixQuery of length 2 or 3 could thus be turned into a simple 
TermQuery on the appropriate field. (With searches for 1-letter-
prefixes altogether discarded.)

I've seen Otis Gospodnetic suggesting the very same strategy in a 
former thread but I have no idea about how I could possibly add these 
extra fields.

Normally an IndexWriter uses only one default Analyzer for all its 
tokenizing businesses. And while it is appearantly possible to supply 
a certain other instance when adding a specific document there seems 
to be no way to use different analyzers on different fields within 
one document.

Could this be done in one pass at all, or do I have to copy all 
documents from one index to a new one, parsing field tokens and 
adding new fields on the 2nd go?

I'd appreciate every hint and suggestion on what classes and methods 
to write for this purpose because I definitely lack a knack when it 
comes to OOP.


Thank You In Advance,
Steffen


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Turning PrefixQuery into a TermQuery

Reply via email to