[ https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118689#comment-15118689 ]
Chris A. Mattmann commented on NUTCH-2206:
------------------------------------------

+1 please commit

> Provide example scoring.similarity.stopword.file
> ------------------------------------------------
>
>                 Key: NUTCH-2206
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2206
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, scoring
>    Affects Versions: 1.11
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.12
>
>         Attachments: NUTCH-2206.patch, NUTCH-2206.patch
>
>
> The scoring-similarity plugin does not provide an example file for the
> property scoring.similarity.stopword.file.
> This is an issue for a number of reasons, namely:
> * A user does not know what the file is meant to look like, and
> * We always check for this file and will [throw an exception if it is not found|https://github.com/apache/nutch/blob/trunk/src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/DocumentVector.java#L79-L80],
> so the problem may not be picked up by the user until much later.
> I suggest a simple fix: include the [standard English stop words taken from Lucene's StopAnalyzer|https://github.com/apache/lucene-solr/blob/3f38aba02ce37c6422875d8824ee034d42d635b9/solr/contrib/morphlines-core/src/test-files/solr/collection1/conf/lang/stopwords_en.txt].
> The comments in that file will help people to easily customize the list to
> whatever they require.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)