Not sure if this is helpful given your proposed solution, but could you do something on the indexing side, such as:

1. Remove the profanity from the token stream, much like a stopword. This would also mean stripping it from the display text.
2. If your TokenFilter comes across a profanity, somehow mark the document as containing a profanity via a "profanity" Field (not sure if there is a way, in Lucene, to add another Field while you are in the analysis phase, but you could also have it update a table in a db or something). Then, at search time, you could just say (regular query) +profanity:false
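The two steps above can be sketched roughly as follows. This is plain Java, not Lucene's TokenFilter API: the class name ProfanityScrubber, its Result type, and the lower-cased word list are all made up for illustration. It only shows the shape of the idea: drop the blocked tokens and remember whether any were seen, so the caller can store that flag in a "profanity" field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: models the index-time idea only.
public class ProfanityScrubber {

    /** Outcome for one document: the tokens to index plus a flag for a "profanity" field. */
    public static final class Result {
        public final List<String> tokens;
        public final boolean profanity;

        Result(List<String> tokens, boolean profanity) {
            this.tokens = tokens;
            this.profanity = profanity;
        }
    }

    // Assumption: the blocked words are supplied already lower-cased.
    private final Set<String> blocked;

    public ProfanityScrubber(Set<String> blocked) {
        this.blocked = blocked;
    }

    /** Drops blocked tokens (like stopwords) and records whether any were found. */
    public Result scrub(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        boolean found = false;
        for (String token : tokens) {
            if (blocked.contains(token.toLowerCase(Locale.ROOT))) {
                found = true;   // mark the document, do not index the token
            } else {
                kept.add(token);
            }
        }
        return new Result(kept, found);
    }
}
```

The caller would index Result.tokens and write Result.profanity into the document (or a db table), so that a search can require profanity:false. Note that every clean document would need the field set to false for that clause to match.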

HTH,
Grant

On Mar 7, 2007, at 10:07 AM, Greg Gershman wrote:

I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND [EMAIL PROTECTED] AND etc). The problem I have run into is that a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)). As a result, I believe the bit set returned by a purely negative QueryFilter is empty, so no matter how many results are returned by the initial query, the result after filtering is always zero documents.

I was wondering if anyone had suggestions as to how else to do this. I've considered simply amending the query string submitted by the user to include a pre-generated String that would exclude the profane terms, but I consider this an inelegant solution. I had also thought about creating a new sub-class of QueryFilter, NegativeQueryFilter. Basically, it would work just like a QueryFilter, taking a positive query (so, I would pass it an OR'ed list of profane words), and then the resulting bits would simply be flipped. I think this would work, unless I'm missing something. I'm going to experiment with it; I'd appreciate anyone's thoughts on this.
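A minimal sketch of the bit-flipping idea, using only java.util.BitSet rather than a real QueryFilter subclass. The class NegativeFilterSketch and its method names are hypothetical, and maxDoc stands in for the number of document ids in the index. The complement is built by setting every id in range and clearing the ones that matched the positive (profane) query.

```java
import java.util.BitSet;

// Hypothetical illustration of the NegativeQueryFilter idea; not Lucene code.
public class NegativeFilterSketch {

    /** Returns the doc ids in [0, maxDoc) that are NOT set in matches. */
    public static BitSet negate(BitSet matches, int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        allowed.set(0, maxDoc);   // start with every document allowed
        allowed.andNot(matches);  // then remove the ones that matched
        return allowed;
    }

    /** Keeps only the hits that the filter allows. */
    public static BitSet applyFilter(BitSet hits, BitSet allowed) {
        BitSet result = (BitSet) hits.clone();
        result.and(allowed);
        return result;
    }
}
```

One thing to check when trying this against a real index: ids of deleted documents would also end up set in the flipped bits. That should be harmless as long as the bits are only ever intersected with the hits of a regular query, but it is an assumption worth verifying.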

Thanks,

Greg






--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


