[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848358#comment-15848358 ]
Markus Jelsma commented on NUTCH-2349:
Thanks!
> urlnormalizer-basic NPE for ill-formed URL "http:/"
> ---
>
> Key: NUTCH-2349
> URL: https://issues.apache.org/jira/browse/NUTCH-2349
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 2.4, 1.13
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Fix For: 2.4, 1.13
>
> NUTCH-2337 introduced a potential (though rare) NullPointerException when an
> ill-formed URL (just the protocol followed by "{{:}}", "{{:/}}", "{{://}}"
> or even more slashes) is normalized:
> {noformat}
> % echo "http:/" \
>   | runtime/local/bin/nutch org.apache.nutch.net.URLNormalizerChecker \
>     -normalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> Checking URLNormalizer org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:120)
>     at org.apache.nutch.net.URLNormalizerChecker.checkOne(URLNormalizerChecker.java:72)
>     at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:110)
> {noformat}
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
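For readers unfamiliar with this class of bug: the commit message for NUTCH-2349 mentions a check for URLs without authority. The following is only a minimal sketch of such a guard, not the actual BasicURLNormalizer code. The class and method names (AuthorityGuard, authorityOrEmpty) are hypothetical; the sketch relies only on java.net.URL, whose getAuthority() may return null when a URL has no authority part.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: illustrates a null check on the
// authority before any further string handling is attempted.
class AuthorityGuard {

    /**
     * Returns the lower-cased authority of the given URL, or an empty string
     * if the URL has no authority part (getAuthority() may return null).
     */
    static String authorityOrEmpty(String urlString) {
        URL url;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a parseable URL: " + urlString, e);
        }
        String authority = url.getAuthority();
        if (authority == null) {
            // Without this check, authority.toLowerCase(...) would throw an NPE.
            return "";
        }
        return authority.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http:", "http:/", "http://Example.COM/a"}) {
            System.out.println("'" + s + "' -> '" + authorityOrEmpty(s) + "'");
        }
    }
}
```

How an authority-less URL should be treated afterwards (rejected, passed through unchanged, or normalized) is a design decision that this sketch does not settle.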
[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present
[ https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848339#comment-15848339 ]
Markus Jelsma commented on NUTCH-2355:
Hello Sebastian,
# Right now we can only pass the Cookie via metadata injection and transferral.
# I really have no idea how it behaves internally; the cookie policy manual is not very clear. I just have it add a Cookie header using the metadata value. It works for both protocol plugins.
> Protocol plugins to set cookie if Cookie metadata field is present
> --
>
> Key: NUTCH-2355
> URL: https://issues.apache.org/jira/browse/NUTCH-2355
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.12
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-2355.patch
>
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
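The behaviour described in this comment (add a Cookie request header when the "Cookie" metadata field is present) can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: plain java.util.Map instances stand in for the crawl datum metadata and for the outgoing request headers, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not Nutch code: plain maps stand in for the
// crawl datum metadata and for the HTTP request headers.
class CookieFromMetadata {

    static final String COOKIE = "Cookie";

    /**
     * Returns a copy of the base headers with a Cookie header added
     * when the metadata carries a non-empty "Cookie" value.
     */
    static Map<String, String> withCookie(Map<String, String> baseHeaders,
                                          Map<String, String> metadata) {
        Map<String, String> headers = new LinkedHashMap<>(baseHeaders);
        String cookie = metadata.get(COOKIE);
        if (cookie != null && !cookie.isEmpty()) {
            // The metadata value is passed through unchanged as the header value.
            headers.put(COOKIE, cookie);
        }
        return headers;
    }
}
```

Whether a cookie set this way should override cookies that a protocol plugin manages internally is left open here, as it is in the thread.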
[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848282#comment-15848282 ] Hudson commented on NUTCH-2349: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3405 (See [https://builds.apache.org/job/Nutch-trunk/3405/]) NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority - check (snagel: rev 1a718e0cc9a0c381e40f4bf8351e26f73522) * (edit) src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java * (edit) src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present
[ https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848277#comment-15848277 ] Sebastian Nagel commented on NUTCH-2355: Hi Markus, useful for sure, e.g., if a server uses cookies for authentication or to deliver customized content. Two questions: # the "Cookie" metadata field of the crawl datum is set during injection (and optionally transferred to outlinks). Or is there another way to use it? # protocol-httpclient handles cookies internally. Does the metadata cookie overwrite internal cookies?
[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848273#comment-15848273 ] Hudson commented on NUTCH-2349: --- SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1581 (See [https://builds.apache.org/job/Nutch-nutchgora/1581/]) NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority - check (snagel: rev 700857d16c9e1517ddb9868ed41171d91e5c9116) * (edit) src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java * (edit) src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
[jira] [Commented] (NUTCH-2347) Use Logger Instead of Printing Throwable
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848272#comment-15848272 ]
Hudson commented on NUTCH-2347:
SUCCESS: Integrated in Jenkins build Nutch-nutchgora #1581 (See [https://builds.apache.org/job/Nutch-nutchgora/1581/])
NUTCH-2347 Logger is used instead of printing Throwable. (kamaci: rev 8dbf8083aa63fbd881c18fc8824981b4c84c9c02)
* (edit) src/java/org/apache/nutch/protocol/RobotRulesParser.java
* (edit) src/java/org/apache/nutch/parse/NutchSitemapParser.java
* (edit) src/java/org/apache/nutch/util/URLUtil.java
* (edit) src/java/org/apache/nutch/crawl/WebTableReader.java
* (edit) src/java/org/apache/nutch/host/HostDbReader.java
* (edit) src/java/org/apache/nutch/tools/DmozParser.java
* (edit) src/java/org/apache/nutch/util/GenericWritableConfigurable.java
* (edit) src/java/org/apache/nutch/parse/ParseUtil.java
* (edit) src/java/org/apache/nutch/util/NutchTool.java
> Use Logger Instead of Printing Throwable
> ---
>
> Key: NUTCH-2347
> URL: https://issues.apache.org/jira/browse/NUTCH-2347
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 2.3.1
> Reporter: Furkan KAMACI
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 2.4
>
> Loggers should be used instead of printing Throwable.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
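As a generic illustration of the change requested in NUTCH-2347 (pass the Throwable to a logger instead of printing it), here is a small sketch. It is not taken from the commit: it uses java.util.logging in place of the SLF4J loggers used in Nutch so that it stays self-contained, and the class, the parsePort method and the capturing handler are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical example, not Nutch code. java.util.logging is used here
// instead of SLF4J only to avoid an external dependency.
class LogThrowableExample {

    private static final Logger LOG =
        Logger.getLogger(LogThrowableExample.class.getName());

    /** Parses a port number; on failure logs the exception and returns -1. */
    static int parsePort(String value) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            // Instead of e.printStackTrace(): hand the Throwable to the logger,
            // so that level, logger name and configured handlers all apply.
            LOG.log(Level.WARNING, "Invalid port: " + value, e);
            return -1;
        }
    }

    /** Keeps log records in memory so the logged Throwable can be inspected. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Attaches a capturing handler to this class's logger and returns it. */
    static CapturingHandler attachCapture() {
        CapturingHandler handler = new CapturingHandler();
        LOG.addHandler(handler);
        return handler;
    }
}
```

With SLF4J the corresponding call would typically pass the exception as the last argument of the logging method, for example LOG.warn(message, e).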
[jira] [Resolved] (NUTCH-2347) Use Logger Instead of Printing Throwable
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2347. Resolution: Fixed Merged into 2.x, thanks [~kamaci]!
[jira] [Commented] (NUTCH-2347) Use Logger Instead of Printing Throwable
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848255#comment-15848255 ] ASF GitHub Bot commented on NUTCH-2347: --- Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/173
[GitHub] nutch pull request #173: NUTCH-2347 Logger is used instead of printing Throw...
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/173
[jira] [Resolved] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2345.
Resolution: Duplicate
Thanks [~Mgupta]! The fix is included in NUTCH-2352.
> FetchItemQueue logs are logged with wrong class name
> ---
>
> Key: NUTCH-2345
> URL: https://issues.apache.org/jira/browse/NUTCH-2345
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.11, 1.12
> Environment: Any
> Reporter: Monika Gupta
> Assignee: Furkan KAMACI
> Priority: Minor
> Fix For: 1.13
>
> I ran bin/nutch fetch and noticed that the log statements of class
> FetchItemQueue.java appear in logs/hadoop.log under the wrong class name,
> FetchItemQueues.
> Refer to the execution log:
> 2017-01-06 15:31:25,562 INFO fetcher.FetchItemQueues - maxThreads= 1
> 2017-01-06 15:31:28,565 INFO fetcher.FetchItemQueues - inProgress= 0
> The issue is in the logger declaration of class FetchItemQueue.java.
> Currently it is:
> private static final Logger LOG = LoggerFactory.getLogger(FetchItemQueues.class);
> Correction: it should be:
> private static final Logger LOG = LoggerFactory.getLogger(FetchItemQueue.class);
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
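The logger naming problem reported in NUTCH-2345 can be shown in isolation. The sketch below is an assumption-laden illustration rather than the Nutch source: it uses java.util.logging instead of SLF4J to stay self-contained, and the two nested classes merely borrow the names from the report.

```java
import java.util.logging.Logger;

// Hypothetical illustration, not the Nutch classes themselves.
class LoggerNaming {

    static class FetchItemQueues {
        static final Logger LOG = Logger.getLogger(FetchItemQueues.class.getName());
    }

    static class FetchItemQueue {
        // The reported bug corresponds to obtaining the logger with
        // FetchItemQueues.class here, which attributes every log line of this
        // class to the other one. Each class should name the logger after itself.
        static final Logger LOG = Logger.getLogger(FetchItemQueue.class.getName());
    }

    public static void main(String[] args) {
        System.out.println(FetchItemQueue.LOG.getName());
        System.out.println(FetchItemQueues.LOG.getName());
    }
}
```

Because such declarations are usually copied from a neighbouring class, the class literal is the one part that is easy to forget to update.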
[jira] [Resolved] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2349. Resolution: Fixed Assignee: Sebastian Nagel Committed to 1.x and 2.x.
[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848236#comment-15848236 ] ASF GitHub Bot commented on NUTCH-2349: --- Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/169
[GitHub] nutch pull request #169: NUTCH-2349 urlnormalizer-basic: NPE for URLs withou...
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/169