[jira] [Commented] (NUTCH-3044) Generator: NPE when extracting the host part of a URL fails

2024-05-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850039#comment-17850039
 ] 

Hudson commented on NUTCH-3044:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #163 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/163/])
NUTCH-3044 Generator: NPE when extracting the host part of a URL fails (snagel: 
[https://github.com/apache/nutch/commit/4b263533a9cdea208383fdbb0a8cc0b537423d7f])
* (edit) src/java/org/apache/nutch/crawl/Generator.java
NUTCH-3044 Generator: NPE when extracting the host part of a URL fails (snagel: 
[https://github.com/apache/nutch/commit/4729786e4d7f9e1136580ceb191274862d03ba5b])
* (edit) src/test/org/apache/nutch/crawl/TestGenerator.java
NUTCH-3044 Generator: NPE when extracting the host part of a URL fails (snagel: 
[https://github.com/apache/nutch/commit/b153279ad5844b32560ecf62a8e7f83f8ecbd43c])
* (edit) src/java/org/apache/nutch/crawl/Generator.java
* (edit) src/test/org/apache/nutch/crawl/TestGenerator.java


> Generator: NPE when extracting the host part of a URL fails
> ---
>
> Key: NUTCH-3044
> URL: https://issues.apache.org/jira/browse/NUTCH-3044
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.21
>
>
> When extracting the host part of a URL fails, the Generator job fails because 
> of an NPE in the SelectorReducer. This issue is reproducible if the CrawlDb 
> contains a malformed URL, for example, a URL with an unsupported scheme 
> (smb://).
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:439)
>   at org.apache.nutch.crawl.Generator$SelectorReducer.reduce(Generator.java:300)
> {noformat}
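
As an illustration of the failure mode described above (not the actual change made in Generator.java), here is a minimal Java sketch. It assumes a hypothetical helper, hostOrNull, that returns null when the host cannot be extracted; the caller checks for null and skips the URL instead of dereferencing it. The class name and the sample URLs are made up for the example. With a stock JDK, java.net.URL rejects the smb scheme with a MalformedURLException because no protocol handler is registered for it.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical sketch; names are illustrative and not taken from Nutch.
final class HostGuardSketch {

  /**
   * Returns the lowercased host of the given URL string, or null if the
   * URL cannot be parsed or has no host part.
   */
  static String hostOrNull(String urlString) {
    try {
      String host = new URL(urlString).getHost();
      return host.isEmpty() ? null : host.toLowerCase(Locale.ROOT);
    } catch (MalformedURLException e) {
      // Thrown, for example, for schemes without a registered handler (smb://).
      return null;
    }
  }

  public static void main(String[] args) {
    String[] urls = {
      "https://nutch.apache.org/",
      "smb://fileserver/share/doc.txt"
    };
    for (String url : urls) {
      String host = hostOrNull(url);
      if (host == null) {
        // Without this check, any method call on host would throw an NPE.
        System.out.println("skipping URL without usable host: " + url);
        continue;
      }
      System.out.println(host + " <- " + url);
    }
  }
}
```

Whether to skip such URLs, count them, or log them is a design choice; the sketch only shows that the null result has to be handled before the host is used.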



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-3055) README: fix Github "hub" commands

2024-05-28 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850040#comment-17850040
 ] 

Hudson commented on NUTCH-3055:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #163 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/163/])
NUTCH-3055 README: fix Github "hub" commands (snagel: 
[https://github.com/apache/nutch/commit/ca03d9b76485b7c9d50dff2c3946bb8189daf5e1])
* (edit) README.md


> README: fix Github "hub" commands
> -
>
> Key: NUTCH-3055
> URL: https://issues.apache.org/jira/browse/NUTCH-3055
> Project: Nutch
> Issue Type: Bug
> Components: documentation
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Trivial
> Fix For: 1.21
>
>
> The [README.md|https://github.com/apache/nutch/blob/master/README.md] 
> contains [GitHub hub|https://hub.github.com/] commands, but written with "git" 
> as the command (executable) name, which presumably relies on an alias or 
> similar setup. If hub isn't installed, these commands fail with 
> {{git: 'pull-request' is not a git command. See 'git --help'.}} or similar.
> We should use the command "hub" instead.





[jira] [Resolved] (NUTCH-3055) README: fix Github "hub" commands

2024-05-28 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-3055.

Resolution: Fixed






[jira] [Commented] (NUTCH-3055) README: fix Github "hub" commands

2024-05-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850005#comment-17850005
 ] 

ASF GitHub Bot commented on NUTCH-3055:
---

sebastian-nagel merged PR #818:
URL: https://github.com/apache/nutch/pull/818









Re: [PR] NUTCH-3055 README: fix Github "hub" commands [nutch]

2024-05-28 Thread via GitHub


sebastian-nagel merged PR #818:
URL: https://github.com/apache/nutch/pull/818


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (NUTCH-3044) Generator: NPE when extracting the host part of a URL fails

2024-05-28 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-3044.

Resolution: Fixed






[jira] [Commented] (NUTCH-3044) Generator: NPE when extracting the host part of a URL fails

2024-05-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850004#comment-17850004
 ] 

ASF GitHub Bot commented on NUTCH-3044:
---

sebastian-nagel merged PR #815:
URL: https://github.com/apache/nutch/pull/815









Re: [PR] NUTCH-3044 Generator: NPE when extracting the host part of a URL fails [nutch]

2024-05-28 Thread via GitHub


sebastian-nagel merged PR #815:
URL: https://github.com/apache/nutch/pull/815

