You can easily use just the CommonGrams filters from Solr in your pure Lucene project.
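To make the idea concrete, here is a minimal, dependency-free Java sketch of what the common-grams approach does to a token stream. It is not the Solr API. The class and method names (CommonGramsSketch, indexTokens, queryTokens) are hypothetical, the input is assumed to be already tokenized and lower-cased, positions are ignored, and the query-time rule is a simplified take on what CommonGramsQueryFilter does. In a real Lucene 3.x application you would wire Solr's CommonGramsFilter (index time) and CommonGramsQueryFilter (query time) into your analyzers instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone illustration of "common grams"; NOT Solr's CommonGramsFilter.
final class CommonGramsSketch {

    // Assumed separator for joined bigrams, e.g. "of_the".
    static final String SEPARATOR = "_";

    // True when a bigram should be formed from words i and i+1:
    // at least one of the two adjacent words is a common word.
    private static boolean gramStartsAt(List<String> words, int i, Set<String> common) {
        return i + 1 < words.size()
                && (common.contains(words.get(i)) || common.contains(words.get(i + 1)));
    }

    // Index time: keep every unigram, and also add a bigram wherever
    // either word of an adjacent pair is common.
    public static List<String> indexTokens(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(words.get(i));
            if (gramStartsAt(words, i, common)) {
                out.add(words.get(i) + SEPARATOR + words.get(i + 1));
            }
        }
        return out;
    }

    // Query time (simplified): where a bigram starts, use it instead of the unigram,
    // so a phrase query hits rare grams like "of_the" rather than huge "of"/"the" postings.
    public static List<String> queryTokens(List<String> words, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            out.add(gramStartsAt(words, i, common)
                    ? words.get(i) + SEPARATOR + words.get(i + 1)
                    : words.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of", "in"));
        List<String> words = Arrays.asList("president", "of", "the", "united", "states");
        // prints [president, president_of, of, of_the, the, the_united, united, states]
        System.out.println(indexTokens(words, common));
        // prints [president_of, of_the, the_united, united, states]
        System.out.println(queryTokens(words, common));
    }
}
```

The design trade-off is a somewhat larger index (extra bigram terms around common words) in exchange for phrase queries that no longer have to intersect the enormous posting lists of words like "of" and "the", which is the effect Dawn is after: fewer, more precise phrase matches without dropping stop words entirely.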
There are a couple of useful docs on stop words and common grams et al. at

http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2

--
Ian.

On Mon, Nov 28, 2011 at 8:31 PM, Dawn Zoë Raison <d...@digitorial.co.uk> wrote:
> Hi Steve,
>
> On 28/11/2011 19:43, Steven A Rowe wrote:
>>
>> I assume that when you refer to "the impact of stop words," you're
>> concerned about query-time performance? You should consider the possibility
>> that performance without removing stop words is good enough that you won't
>> have to take any steps to address the issue.
>
> Not too fussed about query-time performance; certainly no-one has complained
> so far. It's more the sheer number of junk pages we get searching on phrases
> that contain stop words - it can lead to hundreds of thousands of results,
> and the pedants among our userbase insist on paging through the lot :-|
>
> I'd much rather contain the stop words using a *gram based approach and
> offer a less populous but more accurate resultset.
>
>>
>> That said, there are two filters in Solr 3.X[1] that would do the
>> equivalent of what you have outlined:
>> CommonGramsFilter<http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsFilter.html>
>> and
>> CommonGramsQueryFilter<http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsQueryFilter.html>.
>
> We use lucene directly, but I'll take a look - Thanks.
>
>>
>> You can use these filters with a Lucene 3.X application by including the
>> (same-versioned) solr-core jar as a dependency.
>>
>> Steve
>
> --
>
> Rgds.
> *Dawn Raison*