On Thursday 11 March 2004 06:15, Tomcat Programmer wrote:
> I have a situation where I need to be able to find
> incomplete word matches, for example a search for the
> string 'ape' would return matches for 'grapes'
> 'naples' 'staples' etc. I have been searching the
> archives of this user list and can't seem to find any
> example of someone doing this.
>
> At one point I recall finding someone's site (on
> Google) who indicated that their search engine was
> Lucene, and they offered the capability of doing this
> type of matching. However I can't seem to find that
> site again to save my life!
>
> Has anyone been successful in implementing this type
> of matching with Lucene? If so, would you be able to
> share some insight as to how you did it?
I haven't actually done this, but my first attempt would be to index all the suffixes of each word in a separate field and use a PrefixQuery to search that field. For example, you would index google as:

google oogle ogle gle le e

all at the same position. To search for the substring ogl, you would query ogl* on that field. This works because any substring of a word is a prefix of one of its suffixes.

To save space you might impose a minimum substring length; the minimum query length should preferably be the same. Your index will grow quite a bit, though it's difficult to say by how much: a word of n characters produces n suffixes.

You can do this by providing your own TokenStream for the field that returns each suffix as a Token with a getPositionIncrement() of zero, right after the normal full Token (google) with an increment of 1. See also:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/analysis/Token.html

Paul
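The suffix-indexing idea can be illustrated without Lucene at all. What follows is a hypothetical, self-contained Java sketch, not Lucene API: the class name SuffixIndexSketch, the MIN_LEN value of 3, and the use of a TreeMap as a stand-in for Lucene's sorted term dictionary are all assumptions made for illustration. Each word is stored under every suffix of at least MIN_LEN characters, and a substring query becomes a prefix scan (query*) over the sorted suffixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical, Lucene-free sketch of suffix indexing: every suffix of every
 * word (down to MIN_LEN characters) is stored as a term, and a substring
 * query is answered by a prefix lookup over those sorted terms.
 */
public class SuffixIndexSketch {
    // Assumed minimum suffix/query length; trades index size against recall.
    static final int MIN_LEN = 3;

    // Sorted term dictionary (stand-in for a Lucene field): suffix -> words.
    private final TreeMap<String, Set<String>> terms = new TreeMap<>();

    /** Returns the word's suffixes, from the full word down to MIN_LEN chars. */
    static List<String> suffixes(String word) {
        List<String> out = new ArrayList<>();
        for (int i = 0; word.length() - i >= MIN_LEN; i++) {
            out.add(word.substring(i));
        }
        return out;
    }

    /** Indexes a word under each of its suffixes. */
    void add(String word) {
        String w = word.toLowerCase();
        for (String s : suffixes(w)) {
            terms.computeIfAbsent(s, k -> new TreeSet<>()).add(w);
        }
    }

    /** Substring search, expressed as the prefix query "query*" on suffixes. */
    Set<String> search(String query) {
        String q = query.toLowerCase();
        Set<String> hits = new TreeSet<>();
        if (q.length() < MIN_LEN) {
            return hits; // queries shorter than the indexed suffixes are rejected
        }
        // Terms starting with q form a contiguous run beginning at q in sorted order.
        for (Map.Entry<String, Set<String>> e : terms.tailMap(q, true).entrySet()) {
            if (!e.getKey().startsWith(q)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        SuffixIndexSketch index = new SuffixIndexSketch();
        for (String w : new String[] {"grapes", "shapes", "paper", "naples", "google"}) {
            index.add(w);
        }
        System.out.println(suffixes("google"));  // [google, oogle, ogle, gle]
        System.out.println(index.search("ape")); // [grapes, paper, shapes]
        System.out.println(index.search("ogl")); // [google]
    }
}
```

Note that "naples" is indexed but not returned for ape, since it contains "apl" rather than "ape". In a real Lucene index the suffix list would presumably come from a custom TokenStream emitting each suffix at position increment zero, and the lookup would be a PrefixQuery on the suffix field.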