[ https://issues.apache.org/jira/browse/LUCENE-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758056#action_12758056 ]
Jason Rutherglen commented on LUCENE-1917:
------------------------------------------

I'm going to port SOLR-908 rather than reuse ShingleFilter, as ShingleFilter seems to be built tightly for its own use case.

> ShingleFilter include words
> ---------------------------
>
>                 Key: LUCENE-1917
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1917
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> By default, ShingleFilter creates shingles (i.e. combines tokens
> into a single token) from all tokens. For purposes such as
> indexing stop words as shingles without creating shingles out
> of every word, we can supply an include-words CharArraySet to
> ShingleFilter that determines which tokens to shingle.
> This is similar to Nutch CommonGrams and SOLR-908. SOLR-908
> does not use the new token attribute API, and I figured this
> functionality is better suited to being part of Lucene.

--
This message is automatically generated by JIRA.
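The "include words" idea can be illustrated with a minimal, Lucene-free sketch, assuming plain Java over a `List<String>` rather than Lucene's TokenStream/attribute API. Every token is kept, and a two-token shingle is added only when at least one token of an adjacent pair is in the include set (roughly the CommonGrams approach). The class name `CommonGramsSketch`, the method `shingle`, the `"_"` separator, and the use of `Set<String>` in place of `CharArraySet` are all hypothetical choices for illustration, not the API of ShingleFilter or SOLR-908.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of include-words shingling (not Lucene code):
 * unigrams are always emitted; a bigram is emitted only for adjacent
 * pairs where at least one token is an include word (e.g. a stop word).
 */
class CommonGramsSketch {

    // Hypothetical separator; the issue does not specify one.
    static final String SEPARATOR = "_";

    static List<String> shingle(List<String> tokens, Set<String> includeWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // the single token is always kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                // Only shingle pairs that touch an include word.
                if (includeWords.contains(current) || includeWords.contains(next)) {
                    out.add(current + SEPARATOR + next);
                }
            }
        }
        return out;
    }
}
```

With the include set `{"the"}`, the tokens `the quick brown fox` yield `the, the_quick, quick, brown, fox`: only the pair touching the stop word is shingled, so the index does not grow by a shingle for every adjacent word pair.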