Hi Paul,

Lucene's QueryParser splits the query on whitespace and then sends the 
individual words one at a time to the analyzer.  Every analysis component that 
needs to see more than one word at once, including ShingleFilter and 
SynonymFilter, is broken by this.  (There is a JIRA issue open for the 
QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>.)
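
To make the effect concrete, here is a minimal plain-Java sketch.  It does not use any Lucene classes: analyze() is a hypothetical stand-in for an analyzer containing ShingleFilter with a maximum shingle size of 2, and analyzeWordByWord() imitates the split-then-analyze behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; no Lucene classes are used here.
class ShingleSplitDemo {

    // Hypothetical stand-in for a shingling analyzer (max shingle size 2):
    // emits each word, followed by the bigram that starts at that word.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<String>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    // Imitates the parser behaviour: split on whitespace first,
    // then analyze every word in isolation.
    static List<String> analyzeWordByWord(String query) {
        List<String> out = new ArrayList<String>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        for (String word : trimmed.split("\\s+")) {
            out.addAll(analyze(word));
        }
        return out;
    }

    public static void main(String[] args) {
        // Each word is analyzed alone, so no shingle can ever be formed.
        System.out.println(analyzeWordByWord("please divide this"));
        // prints [please, divide, this]

        // The whole string is analyzed at once, so bigrams appear.
        System.out.println(analyze("please divide this"));
        // prints [please, please divide, divide, divide this, this]
    }
}
```

The index side sees the second form and the query side the first, which is why the shingles in the index are never matched by an unquoted query.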

There is a workaround involving PositionFilter described on the Solr wiki: 
<http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>.
Essentially, include PositionFilter after ShingleFilter in your analyzer, 
then wrap queries in quotes before sending them to QueryParser.  Quoted text 
is handed to the analyzer in one piece, so ShingleFilter gets to see all of 
the words.
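
The quote-wrapping step could look roughly like the sketch below.  The class and method names are hypothetical, and the escaping rule (a backslash in front of embedded backslashes and double quotes) is my assumption about what the query syntax needs inside a quoted phrase; check it against the parser version you use.

```java
// Hypothetical helper; not part of Lucene.
class PhraseQuoting {

    // Escapes backslashes and double quotes in the user's input, then wraps
    // the result in double quotes so it is parsed as a single phrase.
    static String asPhrase(String userInput) {
        String escaped = userInput
                .replace("\\", "\\\\")   // backslash first, so later escapes are not doubled
                .replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // The returned string is what you would pass to QueryParser.parse().
        System.out.println(asPhrase("please divide this"));
    }
}
```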

CommonGramsFilter does what you describe: it emits shingles only for word 
pairs that contain a stopword.  In Lucene/Solr 3.x, though, it's in Solr 
(solr-core-3.X.jar, to be exact), not Lucene; you can use it in your 
application by including the solr-core jar as a dependency.  In trunk, which 
will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the 
analyzers-common module.
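
For illustration, the idea behind shingling only around stopwords can be sketched in plain Java as below.  This is a simplified stand-in, not the real CommonGramsFilter: the class name, the method, and the underscore separator are assumptions of the sketch, and positions and offsets are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration; the real filter works on a TokenStream.
class CommonGramsSketch {

    // Emits every token, plus a bigram for each adjacent pair in which at
    // least one of the two words is in the common-word (stopword) set.
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<String>(Arrays.asList("the"));
        System.out.println(commonGrams(Arrays.asList("the", "quick", "brown", "fox"), common));
        // prints [the, the_quick, quick, brown, fox]
    }
}
```

Compared with shingling every pair, this keeps the index growth limited to pairs involving the words that are expensive to search on their own.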

Steve

> -----Original Message-----
> From: Paul Taylor [mailto:paul_t...@fastmail.fm]
> Sent: Tuesday, February 21, 2012 8:07 AM
> To: java-user@lucene.apache.org
> Subject: Can I just add ShingleFilter to my analyzer used for indexing and
> searching
> 
> I'm trying out ShingleFilter, and the way it is documented implies that you
> can just add it to your analyzer and that's it, with no side effects
> except a larger index, but I have read other posts implying you have to
> modify the way you parse user queries. Could anyone confirm or deny?
> 
> Also, is there an easy way to use a ShingleFilter only for common stop
> words, or is that pointless?
> 
> Paul
> 
