Hi Steve,
On 28/11/2011 19:43, Steven A Rowe wrote:
> I assume that when you refer to "the impact of stop words," you're concerned
> about query-time performance? You should consider the possibility that performance
> without removing stop words is good enough that you won't have to take any steps to
> address the issue.
Not too fussed about query-time performance; certainly no-one has
complained so far. It's more the sheer number of junk pages we get
searching on phrases that contain stop words - it can lead to hundreds
of thousands of results, and the pedants among our userbase insist on
paging through the lot :-|
I'd much rather contain the stop words using a *gram-based approach and
offer a less populous but more accurate resultset.
> That said, there are two filters in Solr 3.X[1] that would do the equivalent of what you
> have outlined:
> CommonGramsFilter<http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsFilter.html>
> and
> CommonGramsQueryFilter<http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsQueryFilter.html>.
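For anyone following along, here is a rough illustration of the idea behind that pair of filters. It is a plain Python sketch, not the Solr code: the function names and the tiny stop-word list are made up for the example, and it ignores token positions and offsets, which the real filters do track. The index side keeps every word and adds a joined bigram wherever a common word is involved; the query side prefers those bigrams and keeps a single word only when it is not part of one.

```python
# Hypothetical stop-word list, for illustration only.
COMMON_WORDS = {"the", "in", "of", "a"}


def index_tokens(words, common=COMMON_WORDS):
    """Index side: emit every word, plus a bigram for each adjacent
    pair in which at least one word is common."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i + 1 < len(words) and (word in common or words[i + 1] in common):
            out.append(word + "_" + words[i + 1])
    return out


def query_tokens(words, common=COMMON_WORDS):
    """Query side: emit bigrams where possible; keep a single word
    only when it is not a member of any bigram."""
    out = []
    prev_in_bigram = False
    for i, word in enumerate(words):
        has_next = i + 1 < len(words)
        forms_bigram = has_next and (word in common or words[i + 1] in common)
        if forms_bigram:
            out.append(word + "_" + words[i + 1])
        elif not prev_in_bigram:
            out.append(word)
        prev_in_bigram = forms_bigram
    return out


if __name__ == "__main__":
    phrase = "the rain in spain falls mainly".split()
    print(index_tokens(phrase))
    print(query_tokens(phrase))
```

With this sketch, a phrase search for "rain in spain" is looked up as the terms rain_in and in_spain, which are far rarer in an index than the bare word "in".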
We use Lucene directly, but I'll take a look - Thanks.
> You can use these filters with a Lucene 3.X application by including the
> (same-versioned) solr-core jar as a dependency.
> Steve
--
Rgds.
*Dawn Raison*