[ https://issues.apache.org/jira/browse/SOLR-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013624#comment-13013624 ]

Hoss Man commented on SOLR-2450:
--------------------------------

People can name their stopwords file anything they want, and that's just with
the default StopFilterFactory; it doesn't even account for the possibility of
other filter factories that implement similar functionality.

One thing you could probably do, assuming you wanted to stick with just
worrying about the stock StopFilterFactory, is to query the IndexSchema for the
analyzer of the fieldTypes you are interested in (presumably via some
configured list of field names), then test those analyzers to see if they are
analysis-chain based. If they are, check whether they contain the
StopFilterFactory, and if they do, THEN you can get the list of words (or, at
the very least, the file the words came from).
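The walk described above can be sketched in Java as follows. To stay self-contained, this sketch does NOT use Solr's classes: `Analyzer`, `ChainAnalyzer`, `FilterFactory`, `StopFactory`, `OtherFactory`, `OpaqueAnalyzer` and `StopWordCollector` are hypothetical stand-ins, and the `Map` plays the role of the IndexSchema lookup from field name to analyzer. In Solr the chain-based analyzer is presumably a TokenizerChain, but the exact calls for reading its factories should be checked against AnalysisRequestHandlerBase rather than taken from this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// HYPOTHETICAL stand-ins, not Solr's real classes: they only model the shape
// "field -> analyzer -> (maybe) chain of filter factories".
interface FilterFactory {}

// Plays the role of StopFilterFactory: it carries the loaded stop words.
final class StopFactory implements FilterFactory {
    final Set<String> words;
    StopFactory(Set<String> words) { this.words = words; }
}

// Any other filter in the chain (e.g. lowercasing); ignored by the walk.
final class OtherFactory implements FilterFactory {}

interface Analyzer {}

// An analysis-chain-based analyzer exposes its filter factories.
final class ChainAnalyzer implements Analyzer {
    final List<FilterFactory> filters;
    ChainAnalyzer(FilterFactory... filters) { this.filters = Arrays.asList(filters); }
}

// An analyzer that is not chain based, so there is nothing to inspect.
final class OpaqueAnalyzer implements Analyzer {}

final class StopWordCollector {
    // fieldAnalyzers stands in for the IndexSchema lookup; fields is the
    // (hypothetical) configured list of field names used for clustering.
    static Set<String> collect(Map<String, Analyzer> fieldAnalyzers, List<String> fields) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : fields) {
            Analyzer analyzer = fieldAnalyzers.get(field);
            // Skip unknown fields and analyzers that are not chain based.
            if (!(analyzer instanceof ChainAnalyzer)) {
                continue;
            }
            // Look for stop filter factories anywhere in the chain.
            for (FilterFactory factory : ((ChainAnalyzer) analyzer).filters) {
                if (factory instanceof StopFactory) {
                    result.addAll(((StopFactory) factory).words);
                }
            }
        }
        return result;
    }
}
```

Collecting into a set means a word configured in the stop filters of several fields is kept only once; fields that are missing or not chain based simply contribute nothing.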

AnalysisRequestHandlerBase should have an example of walking an analysis chain 
to see what factories are in it.

> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>
>                 Key: SOLR-2450
>                 URL: https://issues.apache.org/jira/browse/SOLR-2450
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Stanislaw Osinski
>            Assignee: Stanislaw Osinski
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>
> While using only Solr's stop words for clustering isn't a good idea (compared 
> to indexing, clustering needs more aggressive stop word removal to get 
> reasonable cluster labels), it would be good if Carrot2 used both its own and 
> Solr's stop words.
> I'm not sure what the best way to implement this would be, though. My first
> thought was to simply load {{stopwords.txt}} from the Solr config dir and merge
> them with Carrot2's. But then, maybe a better approach would be to get the
> stop words from the StopFilter being used? Ideally, we should also consider
> the per-field stop filters configured on the fields used for clustering.
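The "load stopwords.txt and merge" idea from the description can be sketched in Java like this. It assumes the usual stopwords.txt layout (one word per line, blank lines and lines starting with `#` ignored) and lowercases everything so that the merge is case-insensitive; `StopWordMerge`, `parseWordList` and `merge` are hypothetical names, not part of Solr or Carrot2.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class StopWordMerge {
    // Parses stopwords.txt-style content (assumed format): one word per line,
    // blank lines and lines starting with '#' are skipped.
    static Set<String> parseWordList(String text) {
        Set<String> words = new TreeSet<>();
        for (String line : text.split("\\R")) {
            String w = line.trim();
            if (w.isEmpty() || w.startsWith("#")) {
                continue;
            }
            words.add(w.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Union of Carrot2's and Solr's lists, lowercased so "The" and "the" collapse.
    static Set<String> merge(Set<String> carrot2Words, Set<String> solrWords) {
        Set<String> merged = new TreeSet<>();
        for (String w : carrot2Words) merged.add(w.toLowerCase(Locale.ROOT));
        for (String w : solrWords) merged.add(w.toLowerCase(Locale.ROOT));
        return merged;
    }
}
```

Reading the file could be as simple as `Files.readString(path)` (Java 11+) on a path resolved from the Solr config directory; as the comment above points out, though, the file name is not fixed, so this only helps when you already know which file a filter uses.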

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

