[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-25 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116491#comment-15116491
 ] 

Lewis John McGibbney commented on NUTCH-2206:
-

CC [~sujenshah]

> Provide example scoring.similarity.stopword.file
> 
>
> Key: NUTCH-2206
> URL: https://issues.apache.org/jira/browse/NUTCH-2206
> Project: Nutch
> Issue Type: Bug
> Components: plugin, scoring
> Affects Versions: 1.11
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.12
>
>
> The scoring-similarity plugin does not provide an example file for the 
> property scoring.similarity.stopword.file.
> This is a problem for two reasons:
>  * A user does not know what the file is meant to look like, and
>  * We always check for this file and will [throw an exception if it is not 
> found|https://github.com/apache/nutch/blob/trunk/src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/DocumentVector.java#L79-L80],
>  which the user may not notice until much later.
> I suggest a simple fix: include the [standard English stop words 
> taken from Lucene's 
> StopAnalyzer|https://github.com/apache/lucene-solr/blob/3f38aba02ce37c6422875d8824ee034d42d635b9/solr/contrib/morphlines-core/src/test-files/solr/collection1/conf/lang/stopwords_en.txt].
>  The comments in that file will help people customize the list to whatever 
> they require. 
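
For illustration only, here is a minimal Java sketch of the behaviour the description argues for: read a stop word file with one word per line, and fail with a clear message as soon as the file is missing. This is not the plugin's actual code. The class StopWordLoader and its methods parse and load are hypothetical names, and treating lines that start with '#' as comments is an assumption based on the Lucene/Solr stop word file linked above.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch.
final class StopWordLoader {

  private StopWordLoader() {
  }

  // Parses one stop word per line. Blank lines and lines starting with '#'
  // are skipped (assumed comment convention). Words are lower-cased.
  static Set<String> parse(Reader source) throws IOException {
    Set<String> stopWords = new LinkedHashSet<>();
    try (BufferedReader reader = new BufferedReader(source)) {
      String line;
      while ((line = reader.readLine()) != null) {
        String word = line.trim().toLowerCase(Locale.ROOT);
        if (word.isEmpty() || word.startsWith("#")) {
          continue;
        }
        stopWords.add(word);
      }
    }
    return stopWords;
  }

  // Fails fast with a message that names the property, so a missing file
  // is reported at load time rather than much later.
  static Set<String> load(Path file) throws IOException {
    if (!Files.isReadable(file)) {
      throw new FileNotFoundException(
          "Stop word file not found or not readable: " + file
              + " (check scoring.similarity.stopword.file)");
    }
    return parse(Files.newBufferedReader(file, StandardCharsets.UTF_8));
  }
}
```

Calling StopWordLoader.load with the path configured for scoring.similarity.stopword.file would return the set of stop words, or throw immediately if the file is absent.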



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117800#comment-15117800
 ] 

Lewis John McGibbney commented on NUTCH-2206:
-

We should most likely also provide the nutch-default.xml properties for 
reference; otherwise no one will know how to use them. If you can update 
nutch-default.xml with them, then I am +1 to committing. 

> Attachments: NUTCH-2206.patch


[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Sujen Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117810#comment-15117810
 ] 

Sujen Shah commented on NUTCH-2206:
---

Oh yes, will do it now; I missed it in the patch. 



[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118286#comment-15118286
 ] 

Lewis John McGibbney commented on NUTCH-2206:
-

+1 [~sujenshah], thanks 

> Attachments: NUTCH-2206.patch, NUTCH-2206.patch


[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-26 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118689#comment-15118689
 ] 

Chris A. Mattmann commented on NUTCH-2206:
--

+1 please commit



[jira] [Commented] (NUTCH-2206) Provide example scoring.similarity.stopword.file

2016-01-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119782#comment-15119782
 ] 

Hudson commented on NUTCH-2206:
---

FAILURE: Integrated in Nutch-trunk #3342 (See 
[https://builds.apache.org/job/Nutch-trunk/3342/])
Added missing stopword file for NUTCH-2206 (sujen: 
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1727126])
* trunk/conf/stopwords.txt.template
NUTCH-2206 Provide example scoring.similarity.stopword.file (sujen: 
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1727122])
* trunk/CHANGES.txt
* trunk/conf/nutch-default.xml

