Ah, sorry. My last post was a ProfanitySelector rather than a ProfanityFilter! This fixes it:
<CachedFilter>
  <BooleanFilter>
    <Clause occurs="mustNot">
      <TermsFilter fieldName="content">
        naughty1 naughty2 xxx
      </TermsFilter>
    </Clause>
  </BooleanFilter>
</CachedFilter>

----- Original Message ----
From: mark harwood <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 7 March, 2007 4:05:56 PM
Subject: Re: Negative Filtering (such as for profanity)

Sounds like the sort of filter that could be usefully cached.
You can do all this in Java code, or the XML query parser (in contrib) might be a quick and simple way to externalize the profanity settings in a stylesheet which is actually used at query time, e.g.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/Document">
<FilteredQuery>
  <Query>
    <UserQuery><xsl:value-of select="content"/></UserQuery>
  </Query>
  <Filter>
    <CachedFilter>
      <TermsFilter fieldName="content">
        naughty1 naughty2 xxx
      </TermsFilter>
    </CachedFilter>
  </Filter>
</FilteredQuery>
</xsl:template>
</xsl:stylesheet>

The above example also automatically adds caching to the results of the profanity filter.
Your app code to use this would then look like this:

init()
    //parse and cache the stylesheet
    QueryTemplateManager qtm=new QueryTemplateManager(getClass().getResourceAsStream("query.xsl"));
....
runQuery()
    //get the user input
    Properties userInput=new Properties();
    userInput.setProperty("content",httpRequest.getParameter("queryCriteria"));
    //Transform the user input into a Lucene XML query
    org.w3c.dom.Document doc=qtm.getQueryAsDOM(userInput);
    //Parse the XML query using the XML parser
    Query q=xmlQueryBuilder.getQuery(doc.getDocumentElement());
    //run query as normal

Cheers
Mark

----- Original Message ----
From: Greg Gershman <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 7 March, 2007 3:07:45 PM
Subject: Negative Filtering (such as for profanity)

I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc).
The problem I have run into is that a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)). I believe the bit set returned by a purely negative QueryFilter is therefore empty, so no matter how many results the initial query returns, the result after filtering is always zero documents.

I was wondering if anyone had suggestions as to how else to do this. I've considered simply amending the query string submitted by the user to include a pre-generated String that would exclude the profane terms, but I consider this an inelegant solution.

I had also thought about creating a new sub-class of QueryFilter, NegativeQueryFilter. Basically, it would work just like a QueryFilter, taking a positive query (so, I would pass it an OR'ed list of profane words), and then the resulting bits would simply be flipped. I think this would work, unless I'm missing something. I'm going to experiment with it; I'd appreciate anyone's thoughts on this.

Thanks,
Greg

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
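
For reference, the bit-flipping idea behind Greg's proposed NegativeQueryFilter can be sketched in plain Java. This is only an illustration, not Lucene code: the class and method names (NegativeFilterSketch, invert, applyFilter) are made up for this example, and java.util.BitSet stands in for whatever bit set a real filter would return. It assumes document ids run from 0 to maxDoc-1.

```java
import java.util.BitSet;

// Hypothetical stand-in for a "negative" filter: no Lucene classes are used.
public class NegativeFilterSketch {

    /** Returns the documents in [0, maxDoc) that are NOT set in matches. */
    static BitSet invert(BitSet matches, int maxDoc) {
        BitSet allowed = (BitSet) matches.clone(); // leave the caller's bits untouched
        allowed.flip(0, maxDoc);                   // flip every bit in [0, maxDoc)
        return allowed;
    }

    /** Keeps only the query hits that also pass the inverted filter. */
    static BitSet applyFilter(BitSet queryHits, BitSet allowed) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 6;

        // Documents that matched the OR'ed list of profane words.
        BitSet profane = new BitSet(maxDoc);
        profane.set(1);
        profane.set(4);

        // Documents that matched the user's query.
        BitSet hits = new BitSet(maxDoc);
        hits.set(0);
        hits.set(1);
        hits.set(2);
        hits.set(4);

        BitSet allowed = invert(profane, maxDoc);
        System.out.println(applyFilter(hits, allowed)); // prints {0, 2}
    }
}
```

The positive match set is built first and then flipped, so the filter never has to evaluate a purely negative query, which is the case Greg found returns nothing.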