Jenkins build is back to normal : Nutch-trunk #3403

2016-12-13 Thread Apache Jenkins Server
See 



Build failed in Jenkins: Nutch-trunk #3402

2016-12-13 Thread Apache Jenkins Server
See 

Changes:

[snagel] NUTCH-2337 urlnormalizer-basic to strip empty port, closes #160 - make

--
[...truncated 11300 lines...]
  [javadoc] import org.apache.lucene.analysis.en.PorterStemFilter;
  [javadoc] ^
  [javadoc] 
:29:
 error: package org.apache.lucene.analysis.standard does not exist
  [javadoc] import org.apache.lucene.analysis.standard.ClassicTokenizer;
  [javadoc]   ^
  [javadoc] 
:30:
 error: package org.apache.lucene.analysis.standard does not exist
  [javadoc] import org.apache.lucene.analysis.standard.StandardAnalyzer;
  [javadoc]   ^
  [javadoc] 
:31:
 error: package org.apache.lucene.analysis.util does not exist
  [javadoc] import org.apache.lucene.analysis.util.CharArraySet;
  [javadoc]   ^
  [javadoc] 
:37:
 error: cannot find symbol
  [javadoc] public class LuceneAnalyzerUtil extends Analyzer{ 
  [javadoc] ^
  [javadoc]   symbol: class Analyzer
  [javadoc] 
:22:
 error: package org.apache.lucene.analysis does not exist
  [javadoc] import org.apache.lucene.analysis.Tokenizer;
  [javadoc]  ^
  [javadoc] 
:23:
 error: package org.apache.lucene.analysis does not exist
  [javadoc] import org.apache.lucene.analysis.TokenStream;
  [javadoc]  ^
  [javadoc] 
:24:
 error: package org.apache.lucene.analysis.core does not exist
  [javadoc] import org.apache.lucene.analysis.core.LowerCaseFilter;
  [javadoc]   ^
  [javadoc] 
:25:
 error: package org.apache.lucene.analysis.core does not exist
  [javadoc] import org.apache.lucene.analysis.core.StopFilter;
  [javadoc]   ^
  [javadoc] 
:26:
 error: package org.apache.lucene.analysis.en does not exist
  [javadoc] import org.apache.lucene.analysis.en.EnglishMinimalStemFilter;
  [javadoc] ^
  [javadoc] 
:27:
 error: package org.apache.lucene.analysis.en does not exist
  [javadoc] import org.apache.lucene.analysis.en.PorterStemFilter;
  [javadoc] ^
  [javadoc] 
:28:
 error: package org.apache.lucene.analysis.standard does not exist
  [javadoc] import org.apache.lucene.analysis.standard.ClassicTokenizer;
  [javadoc]   ^
  [javadoc] 
:29:
 error: package org.apache.lucene.analysis.standard does not exist
  [javadoc] import org.apache.lucene.analysis.standard.StandardAnalyzer;
  [javadoc]   ^
  [javadoc] 
:30:
 error: package org.apache.lucene.analysis.standard does not exist
  [javadoc] import org.apache.lucene.analysis.standard.StandardTokenizer;
  [javadoc]   ^
  [javadoc] 


[jira] [Commented] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745226#comment-15745226
 ] 

Hudson commented on NUTCH-2337:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3402 (See 
[https://builds.apache.org/job/Nutch-trunk/3402/])
NUTCH-2337 urlnormalizer-basic to strip empty port, closes #160 - make (snagel: 
rev f351790d7f496561aeae5e214d1b33975ca34cf2)
* (edit) 
src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
* (edit) 
src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java


> urlnormalizer-basic to strip empty port
> ---
>
> Key: NUTCH-2337
> URL: https://issues.apache.org/jira/browse/NUTCH-2337
> Project: Nutch
> Issue Type: Bug
> Components: plugin
> Affects Versions: 2.3.1, 1.12
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 2.4, 1.13
>
>
> Basic URL normalizer should strip an empty port from the URL, that's not the 
> case at present:
> {noformat}
> echo "http://example.com:/" \
>   | nutch plugin urlnormalizer-basic org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> http://example.com:/
> {noformat}
> The result should be {{http://example.com/}}
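
The expected behavior can be illustrated with a minimal Java sketch. It is not the code committed for NUTCH-2337: the class name `EmptyPortSketch`, the method `stripEmptyPort`, and the regular expression are assumptions made for illustration only. The pattern removes a ':' that ends the authority part of a URL when no port digits follow it.

```java
import java.util.regex.Pattern;

// Hypothetical illustration, not the BasicURLNormalizer implementation.
class EmptyPortSketch {

  // Scheme, "://", authority, then a ':' that is directly followed by the
  // start of the path, query or fragment, or by the end of the string.
  private static final Pattern EMPTY_PORT =
      Pattern.compile("^([a-zA-Z][a-zA-Z0-9+.-]*://[^/?#]*):(?=[/?#]|$)");

  /** Removes an empty port; every other URL is returned unchanged. */
  public static String stripEmptyPort(String url) {
    return EMPTY_PORT.matcher(url).replaceFirst("$1");
  }

  public static void main(String[] args) {
    // prints http://example.com/
    System.out.println(stripEmptyPort("http://example.com:/"));
    // prints http://example.com:8080/ (a non-empty port is kept)
    System.out.println(stripEmptyPort("http://example.com:8080/"));
  }
}
```

A string-level pattern is used here only to keep the sketch short; a real normalizer would more likely work on the parsed URL components.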



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745211#comment-15745211
 ] 

Hudson commented on NUTCH-2337:
---

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1576 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1576/])
NUTCH-2337 urlnormalizer-basic to strip empty port - make sure that URLs 
(snagel: rev 6e3c34db16e385b0dadbe6444c2685283c863350)
* (edit) 
src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* (edit) 
src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java




[jira] [Updated] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2337:
---
Affects Version/s: 2.3.1



[jira] [Resolved] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2337.

   Resolution: Fixed
Fix Version/s: 2.4

Committed to trunk f351790 and 2.x 6e3c34d. Thanks!

> urlnormalizer-basic to strip empty port
> ---
>
> Key: NUTCH-2337
> URL: https://issues.apache.org/jira/browse/NUTCH-2337
> Project: Nutch
> Issue Type: Bug
> Components: plugin
> Affects Versions: 1.12
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 2.4, 1.13
>
>
> Basic URL normalizer should strip an empty port from the URL, that's not the 
> case at present:
> {noformat}
> echo "http://example.com:/" \
>   | nutch plugin urlnormalizer-basic org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> http://example.com:/
> {noformat}
> The result should be {{http://example.com/}}





[jira] [Commented] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745118#comment-15745118
 ] 

ASF GitHub Bot commented on NUTCH-2337:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/160


> urlnormalizer-basic to strip empty port
> ---
>
> Key: NUTCH-2337
> URL: https://issues.apache.org/jira/browse/NUTCH-2337
> Project: Nutch
> Issue Type: Bug
> Components: plugin
> Affects Versions: 1.12
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.13
>
>
> Basic URL normalizer should strip an empty port from the URL, that's not the 
> case at present:
> {noformat}
> echo "http://example.com:/" \
>   | nutch plugin urlnormalizer-basic org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> http://example.com:/
> {noformat}
> The result should be {{http://example.com/}}





[GitHub] nutch pull request #160: NUTCH-2337 urlnormalizer-basic to strip empty port

2016-12-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/160


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-12-13 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745073#comment-15745073
 ] 

Sebastian Nagel commented on NUTCH-2046:


A statement in change log and release notes that the behavior has changed 
should be sufficient.
In the long term, an optional argument is cleaner than a required 
position-dependent argument which can take a magic form.
But it's more important to agree on one solution and get it done finally: my +1

> The crawl script should be able to skip an initial injection.
> -
>
> Key: NUTCH-2046
> URL: https://issues.apache.org/jira/browse/NUTCH-2046
> Project: Nutch
> Issue Type: Improvement
> Components: crawldb, injector
> Affects Versions: 1.10
> Reporter: Luis Lopez
> Assignee: Lewis John McGibbney
> Labels: crawl, injection
> Fix For: 1.13
>
> Attachments: crawl.patch
>
>
> When our crawl gets really big a new injection takes considerable time as it 
> updates crawldb, the crawl script should be able to skip the injection and go 
> directly to the generate call.





[jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-12-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744889#comment-15744889
 ] 

ASF GitHub Bot commented on NUTCH-2046:
---

GitHub user jnioche opened a pull request:

https://github.com/apache/nutch/pull/161

Fix for NUTCH-2046 contributed by jnioche

This makes the seed argument optional and is an alternative to the solution 
proposed in [https://issues.apache.org/jira/browse/NUTCH-2046]. The latter is 
also acceptable and has the advantage of not breaking compatibility. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jnioche/nutch NUTCH-2046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/161.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #161


commit 7b0103fe62c9b0e479bb03e7b9575522adcf68b8
Author: Julien Nioche 
Date:   2016-12-13T11:03:08Z

fix for NUTCH-2046 contributed by jnioche






[GitHub] nutch pull request #161: Fix for NUTCH-2046 contributed by jnioche

2016-12-13 Thread jnioche
GitHub user jnioche opened a pull request:

https://github.com/apache/nutch/pull/161

Fix for NUTCH-2046 contributed by jnioche

This makes the seed argument optional and is an alternative to the solution 
proposed in [https://issues.apache.org/jira/browse/NUTCH-2046]. The latter is 
also acceptable and has the advantage of not breaking compatibility. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jnioche/nutch NUTCH-2046

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/161.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #161


commit 7b0103fe62c9b0e479bb03e7b9575522adcf68b8
Author: Julien Nioche 
Date:   2016-12-13T11:03:08Z

fix for NUTCH-2046 contributed by jnioche






[jira] [Commented] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

2016-12-13 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744738#comment-15744738
 ] 

Sebastian Nagel commented on NUTCH-2338:


Hi Markus,
thanks! See the comments on NUTCH-2320 which apply to this patch as well. Two 
further points:

- When a normalizer is given, the telnet arguments are ignored and input is 
read from stdin:
{noformat}
% nutch org.apache.nutch.net.URLNormalizerChecker -normalizer 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer -listen 1234 
-keepClientCnxOpen
Checking URLNormalizer 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
{noformat}
(other terminal)
{noformat}
% telnet localhost 1234
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
{noformat}
- If an invalid URL is passed via telnet, no exception is shown and nothing is 
returned. That's probably better than exiting with an error (the behavior when 
URLNormalizerChecker is reading from stdin), but it may make it difficult to 
localize external (network) problems.


> URLNormalizerChecker to run as TCP Telnet service
> -
>
> Key: NUTCH-2338
> URL: https://issues.apache.org/jira/browse/NUTCH-2338
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 1.12
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.13
>
> Attachments: NUTCH-2338.patch
>
>
> Similar to NUTCH-2320, but for the normalizer checker.





[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-12-13 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744691#comment-15744691
 ] 

Sebastian Nagel commented on NUTCH-2320:


Hi Markus,
generally +1 - the telnet service works and it's good to have it!

Wouldn't it be good to bundle the functionality shared by URLFilterChecker, 
URLNormalizerChecker and IndexingFiltersChecker in a generic checker class and 
let all others inherit from it? It's cleaner to avoid the duplications, and 
would make it easier to port the same functionality to other "checkers" in the 
future.
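
A rough sketch of what such a shared base class might look like follows. All names here (`AbstractCheckerSketch`, `process`, `processStream`, `runServer`, `TrimCheckerSketch`) are hypothetical and are not taken from the attached patches; the sketch only assumes that every checker reads one line, processes it, and writes one line back, either on stdin/stdout or over a TCP socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: the line-oriented I/O is shared, the check is not.
abstract class AbstractCheckerSketch {

  /** The only part a concrete checker has to provide. */
  protected abstract String process(String line);

  /** Reads lines from in and writes one result line per input line to out. */
  public void processStream(InputStream in, OutputStream out) {
    try {
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
      Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
      String line;
      while ((line = reader.readLine()) != null) {
        writer.write(process(line));
        writer.write('\n');
        writer.flush(); // answer each line immediately, as a telnet user expects
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** stdin mode. */
  public void runStdin() {
    processStream(System.in, System.out);
  }

  /** TCP mode: serves one client at a time; error handling is omitted. */
  public void runServer(int port) throws IOException {
    try (ServerSocket server = new ServerSocket(port)) {
      while (true) {
        try (Socket client = server.accept()) {
          processStream(client.getInputStream(), client.getOutputStream());
        }
      }
    }
  }
}

// Placeholder subclass; a real checker would call filters or normalizers here.
class TrimCheckerSketch extends AbstractCheckerSketch {
  @Override
  protected String process(String line) {
    return line.trim();
  }
}
```

With this layout both modes go through `processStream`, so a checker started with a specific normalizer would behave the same way on stdin and over telnet.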

Also two trivial improvements:
- could use {{StandardCharsets.UTF_8}} instead of {{Charset.forName("UTF-8")}}, 
no need to catch exceptions (IllegalCharsetNameException or 
IllegalArgumentException)
- the code isn't formatted via eclipse-codeformat.xml
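
To illustrate the first point, here is a small stand-alone comparison. The class and method names are made up for this example; only `Charset.forName` and `StandardCharsets.UTF_8` are the JDK calls in question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class comparing the two ways to obtain UTF-8.
class CharsetSketch {

  /** Lookup by name: may throw unchecked exceptions for a bad or unknown name. */
  public static byte[] encodeByName(String s) {
    return s.getBytes(Charset.forName("UTF-8"));
  }

  /** Constant: no lookup at runtime, so nothing to catch. */
  public static byte[] encodeByConstant(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] bytes = encodeByConstant("caf\u00e9");
    // prints 5 (the accented character takes two bytes in UTF-8)
    System.out.println(bytes.length);
    // Decoding with the constant likewise needs no exception handling.
    System.out.println(new String(bytes, StandardCharsets.UTF_8));
  }
}
```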

> URLFilterChecker to run as TCP Telnet service
> -
>
> Key: NUTCH-2320
> URL: https://issues.apache.org/jira/browse/NUTCH-2320
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.12
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-2320.patch, NUTCH-2320.patch
>
>
> Allow testing URL filters for web applications, just like the indexing 
> filters checker.


