Hi Martin, On 07/22/2008 at 5:48 AM, mpermar wrote: > I want to index some incoming text. In this case what I want > to do is just detect keywords in that text. Therefore I want > to discard everything that is not in the keywords set. This > sounds to me pretty much like the reverse of using stop words, > that is it I want to use a set of "accepted" words. > > So I planned to create a new filter that just checks that > incoming words are in the "acceptable set" and discards them > otherwise. Are you aware of any analyzer/filter out there that > uses this approach? Is there any other better way to do this?
Solr has KeepWordFilter - it sounds exactly like what you want: Javadoc: <http://lucene.apache.org/solr/api/org/apache/solr/analysis/KeepWordFilter.html> Source: <http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/KeepWordFilter.java?view=markup> Depending on your requirements and the nature of your keywords list, you might consider applying this filter only to queries, rather than at index time. That way, the keyword list can change without having to re-index. Steve --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]