[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848358#comment-15848358
 ] 

Markus Jelsma commented on NUTCH-2349:
--

Thanks!

> urlnormalizer-basic NPE for ill-formed URL "http:/"
> ---------------------------------------------------
>
>                 Key: NUTCH-2349
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2349
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.4, 1.13
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>             Fix For: 2.4, 1.13
>
>
> NUTCH-2337 introduced a potential (though rare) NullPointerException when an 
> ill-formed URL is normalized (just the protocol followed by "{{:}}", "{{:/}}", 
> "{{://}}" or even more slashes):
> {noformat}
> % echo "http:/" \
>   | runtime/local/bin/nutch org.apache.nutch.net.URLNormalizerChecker \
>      -normalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> Checking URLNormalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:120)
>   at org.apache.nutch.net.URLNormalizerChecker.checkOne(URLNormalizerChecker.java:72)
>   at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:110)
> {noformat}
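
The failure mode described above (a URL with a protocol but no authority) can be illustrated with a small, self-contained sketch. This is not the code committed for NUTCH-2349: the class and method names are hypothetical, and java.net.URI is used here as an assumption for the sake of a dependency-free example. It only shows the general idea of checking for a missing host before dereferencing it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper, not Nutch code: guards against URLs that lack an authority. */
final class AuthorityCheckSketch {

    /** Returns the lowercased host, or null if the input cannot be parsed or has no host. */
    static String hostOrNull(String urlString) {
        final URI uri;
        try {
            uri = new URI(urlString);
        } catch (URISyntaxException e) {
            return null; // input rejected by the URI parser
        }
        // getHost() is null when the URL has no authority, as in "http:/".
        String host = uri.getHost();
        if (host == null || host.isEmpty()) {
            return null;
        }
        return host.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String u : new String[] {"http:/", "http://Example.COM/index.html"}) {
            System.out.println(u + " -> " + hostOrNull(u));
        }
    }
}
```

Without the null check, calling toLowerCase() on the missing host is the kind of dereference that produces a NullPointerException for input such as "http:/".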



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present

2017-02-01 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848339#comment-15848339
 ] 

Markus Jelsma commented on NUTCH-2355:
--

Hello Sebastian,

# Right now we can only pass the Cookie via metadata injection and transferral.
# I really have no idea how it behaves internally; the cookie policy manual is 
not very clear. I just have it add a Cookie header using the metadata value. It 
does work for both protocol plugins.
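
The behaviour described above (add a Cookie request header when the metadata carries a "Cookie" field) can be sketched as follows. This is not the attached patch: the class, the method, and the plain Map standing in for the crawl datum metadata are all hypothetical, chosen only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not the NUTCH-2355 patch: copies a "Cookie" metadata value into the request headers. */
final class CookieHeaderSketch {

    static final String COOKIE = "Cookie";

    /** Builds request headers; the metadata map is an assumed stand-in for the crawl datum metadata. */
    static Map<String, String> buildRequestHeaders(Map<String, String> datumMetadata) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "*/*");
        String cookie = datumMetadata.get(COOKIE);
        if (cookie != null && !cookie.isEmpty()) {
            // Only set the header when the metadata field is present and non-empty.
            headers.put(COOKIE, cookie);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(COOKIE, "session=abc123");
        System.out.println(buildRequestHeaders(metadata));
    }
}
```

Whether such a header overrides cookies that an HTTP client library manages internally is exactly the open question in this thread; the sketch does not answer it.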



> Protocol plugins to set cookie if Cookie metadata field is present
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2355
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2355
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.12
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.13
>
>         Attachments: NUTCH-2355.patch
>
>






[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848282#comment-15848282
 ] 

Hudson commented on NUTCH-2349:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3405 (See 
[https://builds.apache.org/job/Nutch-trunk/3405/])
NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority - check (snagel: 
rev 1a718e0cc9a0c381e40f4bf8351e26f73522)
* (edit) 
src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* (edit) 
src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java




[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present

2017-02-01 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848277#comment-15848277
 ] 

Sebastian Nagel commented on NUTCH-2355:


Hi Markus,
useful for sure, e.g., if a server uses cookies for authentication or to 
deliver customized content. Two questions:
# the "Cookie" metadata field of the crawl datum is set during injection (and 
optionally transferred to outlinks). Or is there another way to use it?
# protocol-httpclient handles cookies internally. Does the metadata cookie 
overwrite internal cookies?



[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848273#comment-15848273
 ] 

Hudson commented on NUTCH-2349:
---

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1581 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1581/])
NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority - check (snagel: 
rev 700857d16c9e1517ddb9868ed41171d91e5c9116)
* (edit) 
src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
* (edit) 
src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java




[jira] [Commented] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-02-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848272#comment-15848272
 ] 

Hudson commented on NUTCH-2347:
---

SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1581 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1581/])
NUTCH-2347 Logger is used instead of printing Throwable. (kamaci: rev 
8dbf8083aa63fbd881c18fc8824981b4c84c9c02)
* (edit) src/java/org/apache/nutch/protocol/RobotRulesParser.java
* (edit) src/java/org/apache/nutch/parse/NutchSitemapParser.java
* (edit) src/java/org/apache/nutch/util/URLUtil.java
* (edit) src/java/org/apache/nutch/crawl/WebTableReader.java
* (edit) src/java/org/apache/nutch/host/HostDbReader.java
* (edit) src/java/org/apache/nutch/tools/DmozParser.java
* (edit) src/java/org/apache/nutch/util/GenericWritableConfigurable.java
* (edit) src/java/org/apache/nutch/parse/ParseUtil.java
* (edit) src/java/org/apache/nutch/util/NutchTool.java


> Use Logger Instead of Printing Throwable
> ----------------------------------------
>
>                 Key: NUTCH-2347
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2347
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.3.1
>            Reporter: Furkan KAMACI
>            Assignee: Furkan KAMACI
>            Priority: Minor
>             Fix For: 2.4
>
>
> Loggers should be used instead of printing Throwable.
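
As an illustration of the change requested above, the following sketch logs a caught exception through a logger instead of printing it. It is not taken from the NUTCH-2347 commit: the class and method are hypothetical, and java.util.logging is swapped in for the SLF4J logger that Nutch uses so that the example has no external dependencies.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: report a caught Throwable through a logger, not printStackTrace(). */
final class LogThrowableSketch {

    private static final Logger LOG = Logger.getLogger(LogThrowableSketch.class.getName());

    /** Parses a port number, logging the failure and returning the fallback on bad input. */
    static int parsePort(String value, int fallback) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            // Replaces e.printStackTrace(): the log record carries level, message and stack trace.
            LOG.log(Level.WARNING, "Invalid port '" + value + "', using " + fallback, e);
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("8080", 80));
        System.out.println(parsePort("not-a-number", 80));
    }
}
```

With SLF4J the corresponding call would pass the exception as the last argument, for example LOG.error("message", e), so the stack trace ends up in the configured log output rather than on standard error.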





[jira] [Resolved] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-02-01 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2347.

Resolution: Fixed

Merged into 2.x, thanks [~kamaci]!



[jira] [Commented] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848255#comment-15848255
 ] 

ASF GitHub Bot commented on NUTCH-2347:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/173




[GitHub] nutch pull request #173: NUTCH-2347 Logger is used instead of printing Throw...

2017-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/173


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-02-01 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2345.

Resolution: Duplicate

Thanks [~Mgupta]! The fix is included in NUTCH-2352.

> FetchItemQueue logs are logged with wrong class name
> ----------------------------------------------------
>
>                 Key: NUTCH-2345
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2345
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.11, 1.12
>         Environment: Any
>            Reporter: Monika Gupta
>            Assignee: Furkan KAMACI
>            Priority: Minor
>             Fix For: 1.13
>
>
> I ran bin/nutch fetch and noticed that the log statements of class 
> FetchItemQueue.java are logged in logs/hadoop.log with the wrong class name, 
> FetchItemQueues.
> Refer to the execution log:
> 2017-01-06 15:31:25,562 INFO  fetcher.FetchItemQueues -   maxThreads= 1
> 2017-01-06 15:31:28,565 INFO  fetcher.FetchItemQueues -   inProgress= 0
> The issue is in the logger declaration of class FetchItemQueue.java. 
> Currently it is:
> private static final Logger LOG = LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = LoggerFactory.getLogger(FetchItemQueue.class);
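
The effect reported above can be reproduced in miniature: the name under which a statement is logged comes from the class handed to the logger factory, not from the class that contains the logging call. The sketch below is hypothetical and uses java.util.logging in place of SLF4J to stay dependency-free; the two nested classes merely borrow the names from the issue.

```java
import java.util.logging.Logger;

/** Hypothetical sketch: a logger's name follows the class passed to the factory. */
final class LoggerNameSketch {

    /** Stand-in for the class from the issue; declares a logger named after itself. */
    static final class FetchItemQueue {
        static final Logger LOG = Logger.getLogger(FetchItemQueue.class.getName());
    }

    /** Stand-in for the similarly named class whose name was passed by mistake. */
    static final class FetchItemQueues {
        static final Logger LOG = Logger.getLogger(FetchItemQueues.class.getName());
    }

    public static void main(String[] args) {
        // The two names differ only by the trailing "s", which makes the copy-paste slip easy to miss.
        System.out.println(FetchItemQueue.LOG.getName());
        System.out.println(FetchItemQueues.LOG.getName());
    }
}
```

Passing the enclosing class itself to the factory, as in the correction quoted above, keeps log output attributable to the right source file.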





[jira] [Resolved] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2349.

Resolution: Fixed
  Assignee: Sebastian Nagel

Committed to 1.x and 2.x.



[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848236#comment-15848236
 ] 

ASF GitHub Bot commented on NUTCH-2349:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/169




[GitHub] nutch pull request #169: NUTCH-2349 urlnormalizer-basic: NPE for URLs withou...

2017-02-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/169

