What is the best practice around filtering out search results with curse words

varun kumar Tue, 28 Apr 2015 16:01:24 -0700

Hey All,
I want to filter out documents containing hate words from my search results. 
Currently we add a bool filter listing all of those words to every search 
query. This results in tons of slow queries, since the list of hate words is 
long (so much hatred around :( ).


I was wondering what the best practices are for this kind of spam/hate word 
filtering.

Here is what we are considering:
1. Pre-process: scan each doc prior to indexing and either mark it as bad or 
do not index it.
    Problem: the documents are indexed from several processes, and it is 
difficult to enforce the rule on any new component someone writes.

2. Create a percolator and run it periodically (not sure of the best 
frequency and timing) to tag all documents with bad words as "badDoc": 
true, and then have a filter on that field in all the queries.
    Problem: not sure of the performance impact of periodically running the 
percolator; secondly, the same problem of discipline applies, since all 
queries have to exclude badDoc.
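
To make the two options concrete, here is a minimal sketch of the part they 
share: tagging a doc with a "badDoc" flag and excluding flagged docs at query 
time. It is only an illustration under assumptions. The blocklist contents, 
the field names "title" and "body", and the helper names are hypothetical, and 
no ES client is called; the functions only build plain dicts. The query uses 
the bool query's must/must_not clauses with a term query on badDoc.

```python
import re

# Hypothetical blocklist; the real list would be loaded from wherever it is kept.
BLOCKLIST = {"badword", "slur"}

# Very simple tokenizer (assumption): lowercase runs of letters, digits, apostrophes.
_TOKEN = re.compile(r"[a-z0-9']+")


def has_blocked_word(text, blocklist=BLOCKLIST):
    """Return True if any whole token of `text` is in the blocklist."""
    return any(token in blocklist for token in _TOKEN.findall(text.lower()))


def tag_document(doc, text_fields=("title", "body")):
    """Return a copy of `doc` with a boolean badDoc flag set.

    `text_fields` are hypothetical field names; adjust to the real mapping.
    """
    tagged = dict(doc)
    tagged["badDoc"] = any(
        has_blocked_word(str(doc.get(field, ""))) for field in text_fields
    )
    return tagged


def exclude_bad_docs(user_query):
    """Wrap a query clause so that docs flagged badDoc: true are excluded."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "must_not": [{"term": {"badDoc": True}}],
            }
        }
    }
```

If every search went through one wrapper like exclude_bad_docs, the 
per-query discipline problem would shrink to a single shared helper, though 
that still has to be enforced on each component that queries the index.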

Personally I would favor a pure ES solution. I am sure this is not a new 
problem, so I am seeking expert guidance and best practices. 
Any guidance/links would be helpful!

Thanks and Regards
Varun



