Partial token matches

Eric Isakson Wed, 26 Apr 2006 09:20:43 -0700

Hi All,

Just wanted to throw out something I'm working on. It is working well for me, 
but I wanted to see if anyone can suggest any other alternatives that might 
perform better than what I'm doing now.


I have a field in my index that contains keywords (back-of-the-book index 
terms) and a UI feature that lets the user find documents whose keywords 
contain a string the user supplies, even if it is only part of a keyword. So a 
particular document in my index might have the token "informat" in the 
keywords field, the user may enter "form" in the UI, and I should get a match.

My old implementation does not use Lucene; it just calls String.matches with a 
regular expression like ".*form.*". I reimplemented it with Lucene and tokenize 
the field into every suffix of each keyword, so for "informat" I get the tokens

informat
nformat
format
ormat
rmat
mat
at
t
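The suffix expansion above can be sketched in plain Java. This is a minimal sketch, not the author's analyzer: in Lucene this logic would likely live in a custom TokenFilter, but to stay self-contained the example only shows the string manipulation. The class name `SuffixTokens` is hypothetical, and lowercasing with `Locale.ROOT` is an assumption standing in for whatever case folding the real analyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SuffixTokens {
    /**
     * Returns every non-empty suffix of the token, lowercased so that
     * lookups can be case-insensitive. (Hypothetical helper; assumes
     * Locale.ROOT lowercasing matches the real analyzer's case folding.)
     */
    public static List<String> suffixes(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        // Start one character later each time: "informat", "nformat", ...
        for (int i = 0; i < lower.length(); i++) {
            out.add(lower.substring(i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints informat, nformat, format, ormat, rmat, mat, at, t (one per line)
        for (String s : suffixes("Informat")) {
            System.out.println(s);
        }
    }
}
```

A keyword of length n yields n tokens with n(n+1)/2 characters in total, which fits the stated trade-off of spending index size to buy search speed.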

Then I use a prefix query to find hits. This works because any substring of a 
keyword is a prefix of one of its suffixes: "form" is a prefix of "format". Both 
implementations ignore case in the search, and the hit order is controlled by 
another field that I sort on, so relevance ranking is not important in this use 
case. Search-time performance is crucial; index build time and index size are 
not really important. The index is created statically at application startup 
(or possibly delivered to the application) and is not updated while the 
application is using it.
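To see why a prefix query is enough, the following sketch models the search side with a `TreeMap` standing in for Lucene's sorted term dictionary: because terms are kept in sorted order, all suffixes that start with the query form one contiguous run that can be scanned from the query string onward. The names `SuffixPrefixIndex`, `add`, `search` and `regexMatches` are hypothetical, and this illustrates the idea rather than Lucene's API; `regexMatches` reproduces the original case-insensitive `String.matches` baseline for comparison.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class SuffixPrefixIndex {
    // Hypothetical stand-in for a sorted term dictionary: each suffix maps
    // to the ids of documents whose keywords contain it.
    private final NavigableMap<String, Set<Integer>> terms = new TreeMap<>();

    /** Indexes every lowercased suffix of every keyword for one document. */
    public void add(int docId, List<String> keywords) {
        for (String keyword : keywords) {
            String lower = keyword.toLowerCase(Locale.ROOT);
            for (int i = 0; i < lower.length(); i++) {
                terms.computeIfAbsent(lower.substring(i), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /**
     * Substring search done as a prefix scan: a string occurs inside a keyword
     * exactly when it is a prefix of one of that keyword's suffixes.
     */
    public Set<Integer> search(String partial) {
        String prefix = partial.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    /** The regex baseline: case-insensitive ".*form.*"-style matching. */
    public static boolean regexMatches(String keyword, String partial) {
        return keyword.toLowerCase(Locale.ROOT)
                .matches(".*" + Pattern.quote(partial.toLowerCase(Locale.ROOT)) + ".*");
    }

    public static void main(String[] args) {
        SuffixPrefixIndex index = new SuffixPrefixIndex();
        index.add(1, List.of("informat", "Library"));
        index.add(2, List.of("formula"));
        index.add(3, List.of("output"));
        System.out.println(index.search("FORM")); // [1, 2]
        System.out.println(index.search("rar"));  // [1]
        System.out.println(index.search("xyz"));  // []
    }
}
```

With this layout the lookup cost depends on how many terms match rather than on how many keywords exist, whereas the regex baseline presumably has to run against every keyword on each search.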

Thanks for any suggestions,
Eric
