One of our main use-cases for search is to find objects based on partial name 
matches.  I've implemented this using n-grams and it works pretty well.  
However, we're currently using trigrams, and that causes an interesting problem
when searching for something like "abc ab", since we first split on whitespace
and then, for each "word", construct a PhraseQuery containing all of its
trigrams.
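
Roughly, the scheme described above looks like the following plain-Java sketch. It makes some assumptions. It does not use Lucene's own n-gram tokenizer. The class and method names (TrigramQuerySketch, ngrams, phraseTermsPerWord) are hypothetical. Each word's PhraseQuery is shown only as its ordered list of trigram terms. The sketch shows how a two-character word ends up with no terms at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrigramQuerySketch {

    // Hypothetical helper: every contiguous substring of length n, in order.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Split the search text on whitespace and map each word to the trigram
    // terms that would go into its PhraseQuery (simplified, not Lucene's API).
    static Map<String, List<String>> phraseTermsPerWord(String search) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String word : search.trim().split("\\s+")) {
            result.put(word, ngrams(word, 3));
        }
        return result;
    }

    public static void main(String[] args) {
        // "ab" is shorter than 3 characters, so its term list is empty.
        System.out.println(phraseTermsPerWord("abc ab")); // {abc=[abc], ab=[]}
    }
}
```
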
Obviously we cannot get a trigram out of "ab".  So our choices would seem to be
either to discard this part of the search term, which seems unwise, or to
reduce the minimum n-gram size.  But I'm slightly concerned about the resulting
bloat in both the number of Terms stored in the index and the number contained in
queries.  Is this something I should be concerned about?  It just "feels" like 
a query for the word "abcdef" shouldn't require a PhraseQuery of 15 terms
(assuming n-gram sizes 1 through 3).  Is this the best way to do partial word matches?
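
This plain-Java sketch illustrates the term-count arithmetic. The class and method names are hypothetical. It is not Lucene's tokenizer, which may emit grams in a different order. A word of length L yields max(0, L - n + 1) grams of size n. So "abcdef" with sizes 1 through 3 gives 6 + 5 + 4 = 15 terms. With trigrams alone it gives 4.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramTermCount {

    // Hypothetical helper: every contiguous n-gram of the word for each size
    // n in [minGram, maxGram], grouped by size (all 1-grams, then 2-grams, ...).
    static List<String> ngrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Same count without building the grams: a word of length L has
    // max(0, L - n + 1) grams of size n.
    static int countGrams(int length, int minGram, int maxGram) {
        int total = 0;
        for (int n = minGram; n <= maxGram; n++) {
            total += Math.max(0, length - n + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        String word = "abcdef";
        System.out.println(ngrams(word, 3, 3).size()); // 4 (trigrams only)
        System.out.println(ngrams(word, 1, 3).size()); // 15 (sizes 1 through 3)
        System.out.println(countGrams(2, 3, 3));       // 0: "ab" has no trigram
    }
}
```

Lowering the minimum size is what makes short words like "ab" searchable. The sketch also shows the cost: every additional size adds roughly one more term per character of each indexed and queried word.
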
Thanks in advance.

-Tommy

