Hi Steve,

On 28/11/2011 19:43, Steven A Rowe wrote:
> I assume that when you refer to "the impact of stop words," you're concerned
> about query-time performance?  You should consider the possibility that performance
> without removing stop words is good enough that you won't have to take any steps to
> address the issue.
Not too fussed about query-time performance; certainly no-one has complained so far. It's more the sheer number of junk pages we get when searching on phrases that contain stop words - they can return hundreds of thousands of results, and the pedants among our user base insist on paging through the lot :-|

I'd much rather contain the stop words using an n-gram based approach and offer a smaller but more accurate result set.


> That said, there are two filters in Solr 3.X[1] that would do the equivalent of what you
> have outlined: CommonGramsFilter
> <http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsFilter.html>
> and CommonGramsQueryFilter
> <http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsQueryFilter.html>.
We use Lucene directly, but I'll take a look - thanks.

> [1] You can use these filters with a Lucene 3.X application by including the
> (same-versioned) solr-core jar as a dependency.

> Steve
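
For anyone unfamiliar with the technique, here is a minimal, dependency-free Java sketch of the "common grams" idea behind CommonGramsFilter and CommonGramsQueryFilter, based on our reading of their documented behaviour. It is not the Solr/Lucene API: the class and method names, the "_" separator, the pre-tokenised lowercase input and the stop-word set are all hypothetical, and the real filters operate on a TokenStream with position information rather than on plain lists. The index side keeps every word and adds a bigram wherever a common word touches a neighbour. The query side emits those bigrams and keeps a single word only if no bigram already covers it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the common-grams technique; not the Solr/Lucene API.
public class CommonGramsSketch {

    // Separator chosen for this sketch to join a common word to its neighbour.
    static final String SEP = "_";

    // True if words i and i+1 both exist and at least one of them is a common word.
    static boolean formsGram(List<String> words, Set<String> common, int i) {
        return i + 1 < words.size()
                && (common.contains(words.get(i)) || common.contains(words.get(i + 1)));
    }

    // Index side: keep every word, plus a bigram wherever a common word touches a neighbour.
    public static List<String> indexTokens(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i));
            if (formsGram(words, common, i)) {
                out.add(words.get(i) + SEP + words.get(i + 1));
            }
        }
        return out;
    }

    // Query side: emit the bigrams, plus any single word not already covered by a bigram.
    public static List<String> queryTokens(List<String> words, Set<String> common) {
        int n = words.size();
        boolean[] covered = new boolean[n];
        for (int i = 0; i < n; i++) {
            if (formsGram(words, common, i)) {
                covered[i] = true;
                covered[i + 1] = true;
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (formsGram(words, common, i)) {
                out.add(words.get(i) + SEP + words.get(i + 1));
            } else if (!covered[i]) {
                out.add(words.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "in", "a", "of"));
        List<String> words = Arrays.asList("the", "rain", "in", "spain", "falls", "mainly");
        System.out.println(indexTokens(words, common));
        System.out.println(queryTokens(words, common));
    }
}
```

Running main prints [the, the_rain, rain, rain_in, in, in_spain, spain, falls, mainly] for the index side and [the_rain, rain_in, in_spain, falls, mainly] for the query side. Because the phrase is matched on pairs such as "in_spain" rather than on "in" on its own, the stop words still constrain the match instead of being thrown away.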

--

Rgds.
*Dawn Raison*
