Re: [PR] WIP StatsD metrics example [nutch]

2024-03-14 Thread via GitHub


lewismc commented on PR #712:
URL: https://github.com/apache/nutch/pull/712#issuecomment-1998875276

   Closing this PR out. StatsD is widely used, but open-source Java SDKs/agents 
are few and far between.
   When I get around to properly instrumenting Nutch I will probably suggest 
that we use [Apache SkyWalking](https://skywalking.apache.org/).
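For context on what a StatsD client has to do: it sends plain-text lines of the form `<name>:<value>|<type>` (for example `c` for counters, `ms` for timers, `g` for gauges) as UDP datagrams, conventionally to port 8125. The JDK-only sketch below illustrates that wire format. The `StatsdSketch` class and the `nutch.fetcher.*` metric names are hypothetical and are not taken from PR #712.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal StatsD emitter; an illustrative sketch, not a Nutch or PR #712 API. */
public class StatsdSketch implements AutoCloseable {
  private final InetAddress host;
  private final int port;
  private final DatagramSocket socket;

  public StatsdSketch(String hostname, int port) throws IOException {
    this.host = InetAddress.getByName(hostname); // resolve first so a lookup failure leaks no socket
    this.port = port;
    this.socket = new DatagramSocket(); // bound to an ephemeral local port chosen by the OS
  }

  /** Builds one StatsD line in the form <name>:<value>|<type>. */
  public static String format(String name, long value, String type) {
    return name + ":" + value + "|" + type;
  }

  /** Sends a single line as one UDP datagram; StatsD does not acknowledge receipt. */
  private void send(String line) throws IOException {
    byte[] payload = line.getBytes(StandardCharsets.UTF_8);
    socket.send(new DatagramPacket(payload, payload.length, host, port));
  }

  /** Counter increment ("c" type). */
  public void increment(String name) throws IOException {
    send(format(name, 1, "c"));
  }

  /** Timer sample in milliseconds ("ms" type). */
  public void timing(String name, long millis) throws IOException {
    send(format(name, millis, "ms"));
  }

  @Override
  public void close() {
    socket.close();
  }

  public static void main(String[] args) throws IOException {
    // Assumes a StatsD daemon listening on localhost:8125 (the conventional default port).
    try (StatsdSketch statsd = new StatsdSketch("localhost", 8125)) {
      statsd.increment("nutch.fetcher.pages");    // hypothetical metric name
      statsd.timing("nutch.fetcher.latency", 42); // hypothetical metric name
    }
  }
}
```

Because UDP is fire-and-forget, the sender never waits for an acknowledgement from the metrics backend; the trade-off is that lost packets are silently dropped.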


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@nutch.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] WIP StatsD metrics example [nutch]

2024-03-14 Thread via GitHub


lewismc closed pull request #712: WIP StatsD metrics example
URL: https://github.com/apache/nutch/pull/712





[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827307#comment-17827307
 ] 

ASF GitHub Bot commented on NUTCH-3036:
---

lewismc closed pull request #807: NUTCH-3036 Upgrade 
org.seleniumhq.selenium:selenium-java dependency i…
URL: https://github.com/apache/nutch/pull/807




> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827308#comment-17827308
 ] 

ASF GitHub Bot commented on NUTCH-3036:
---

lewismc opened a new pull request, #807:
URL: https://github.com/apache/nutch/pull/807

   WIP for https://issues.apache.org/jira/browse/NUTCH-3036. Testing is ongoing. 
I’ll also check for additional deprecation notices in the build log.
   I’m testing this on a MacBook Pro (Apple M1 Pro) running macOS Sonoma 14.4.
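For the deprecation check mentioned above, a small filter over the captured build output can help. This sketch assumes javac's usual wording: per-use warnings containing `[deprecation]` when compiling with `-Xlint:deprecation`, and otherwise the summary note `Some input files use or override a deprecated API.` The `DeprecationScan` class and the default `build.log` file name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: extracts javac deprecation diagnostics from a captured build log. */
public class DeprecationScan {

  /** Returns the (trimmed) lines that look like javac deprecation warnings or notes. */
  public static List<String> deprecationLines(List<String> logLines) {
    return logLines.stream()
        .filter(line -> line.contains("[deprecation]")
            || line.contains("use or override a deprecated API"))
        .map(String::trim)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) throws IOException {
    // Assumes the Ant build output was redirected to a file first (hypothetical default: build.log).
    Path log = Paths.get(args.length > 0 ? args[0] : "build.log");
    deprecationLines(Files.readAllLines(log)).forEach(System.out::println);
  }
}
```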




> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





Re: [PR] NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… [nutch]

2024-03-14 Thread via GitHub


lewismc closed pull request #807: NUTCH-3036 Upgrade 
org.seleniumhq.selenium:selenium-java dependency i…
URL: https://github.com/apache/nutch/pull/807





Re: [PR] NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… [nutch]

2024-03-14 Thread via GitHub


lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998718730

   There are some tangential proposed changes (such as improvements to logging) 
in this PR, but they only concern the relevant class files. 





[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827306#comment-17827306
 ] 

ASF GitHub Bot commented on NUTCH-3036:
---

lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998718730

   There are some tangential proposed changes (such as improvements to logging) 
in this PR, but they only concern the relevant class files. 




> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827305#comment-17827305
 ] 

ASF GitHub Bot commented on NUTCH-3035:
---

lewismc commented on PR #808:
URL: https://github.com/apache/nutch/pull/808#issuecomment-1998717443

   Hi @sebastian-nagel, did you perform this task manually?




> Update license and notice file for release of 1.20 
> ---
>
> Key: NUTCH-3035
> URL: https://issues.apache.org/jira/browse/NUTCH-3035
> Project: Nutch
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.20
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Close to the release of 1.20 the license and notice files should be updated 
> to contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
> NUTCH-2981.
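One way to partly automate this task is sketched below: it lists the jar files bundled in a binary release directory and reports those whose artifact name does not appear anywhere in a LICENSE text. This is a rough sketch under assumptions: the directory layout, the `<artifact>-<version>.jar` naming convention, and the `LicenseCoverageCheck` class are hypothetical, and a name match only hints that a license entry exists; it does not replace reviewing each dependency's actual license and NOTICE.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical check: which bundled jars are not mentioned in a LICENSE file? */
public class LicenseCoverageCheck {

  // Assumed naming convention: <artifact>-<version>.jar, e.g. selenium-java-4.18.1.jar
  private static final Pattern JAR_NAME = Pattern.compile("(.+?)-\\d[^/]*\\.jar");

  /** Strips the version suffix from a jar file name (heuristic); returns the name unchanged if no match. */
  public static String artifactOf(String jarFileName) {
    Matcher m = JAR_NAME.matcher(jarFileName);
    return m.matches() ? m.group(1) : jarFileName;
  }

  /** Returns the sorted artifact names that do not occur anywhere in the license text. */
  public static List<String> unmentioned(List<String> jarFileNames, String licenseText) {
    return jarFileNames.stream()
        .map(LicenseCoverageCheck::artifactOf)
        .distinct()
        .filter(artifact -> !licenseText.contains(artifact))
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: LicenseCoverageCheck <lib-dir> <LICENSE-file>");
      return;
    }
    List<String> jars;
    try (Stream<Path> files = Files.walk(Paths.get(args[0]))) {
      jars = files.map(p -> p.getFileName().toString())
          .filter(name -> name.endsWith(".jar"))
          .collect(Collectors.toList());
    }
    String license = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
    unmentioned(jars, license).forEach(a -> System.out.println("not in LICENSE: " + a));
  }
}
```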





[jira] [Work stopped] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3036 stopped by Lewis John McGibbney.
---
> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





Re: [PR] NUTCH-3035 Update license and notice file for release of 1.20 [nutch]

2024-03-14 Thread via GitHub


lewismc commented on PR #808:
URL: https://github.com/apache/nutch/pull/808#issuecomment-1998717443

   Hi @sebastian-nagel, did you perform this task manually?





[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827304#comment-17827304
 ] 

ASF GitHub Bot commented on NUTCH-3035:
---

sebastian-nagel opened a new pull request, #808:
URL: https://github.com/apache/nutch/pull/808

   Update the license and notice files of dependencies included as binary jar 
files in the binary release.




> Update license and notice file for release of 1.20 
> ---
>
> Key: NUTCH-3035
> URL: https://issues.apache.org/jira/browse/NUTCH-3035
> Project: Nutch
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.20
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Close to the release of 1.20 the license and notice files should be updated 
> to contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
> NUTCH-2981.





[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827303#comment-17827303
 ] 

ASF GitHub Bot commented on NUTCH-3035:
---

lewismc closed pull request #808: NUTCH-3035 Update license and notice file for 
release of 1.20
URL: https://github.com/apache/nutch/pull/808




> Update license and notice file for release of 1.20 
> ---
>
> Key: NUTCH-3035
> URL: https://issues.apache.org/jira/browse/NUTCH-3035
> Project: Nutch
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.20
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Close to the release of 1.20 the license and notice files should be updated 
> to contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
> NUTCH-2981.





[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827302#comment-17827302
 ] 

ASF GitHub Bot commented on NUTCH-3036:
---

lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998714969

   [Further guidance on browser compatibility/supported 
platforms](https://firefox-source-docs.mozilla.org/testing/geckodriver/Support.html)
   
   Along the way I discovered that **_full screenshots_** are now handled 
differently, so we need to rethink how to do this. For example, the 
[FirefoxDriver has a pretty elegant way of doing 
this](https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/firefox/FirefoxDriver.html#getFullPageScreenshotAs(org.openqa.selenium.OutputType)),
 but the approach differs on other browsers.
   For the time being, each browser can take a screenshot of the visible 
viewport (i.e. a partial web page). This is satisfactory, but there is room for improvement.




> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





Re: [PR] NUTCH-3035 Update license and notice file for release of 1.20 [nutch]

2024-03-14 Thread via GitHub


lewismc closed pull request #808: NUTCH-3035 Update license and notice file for 
release of 1.20
URL: https://github.com/apache/nutch/pull/808





Re: [PR] NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… [nutch]

2024-03-14 Thread via GitHub


lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998714969

   [Further guidance on browser compatibility/supported 
platforms](https://firefox-source-docs.mozilla.org/testing/geckodriver/Support.html)
   
   Along the way I discovered that **_full screenshots_** are now handled 
differently, so we need to rethink how to do this. For example, the 
[FirefoxDriver has a pretty elegant way of doing 
this](https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/firefox/FirefoxDriver.html#getFullPageScreenshotAs(org.openqa.selenium.OutputType)),
 but the approach differs on other browsers.
   For the time being, each browser can take a screenshot of the visible 
viewport (i.e. a partial web page). This is satisfactory, but there is room for improvement.





[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827301#comment-17827301
 ] 

ASF GitHub Bot commented on NUTCH-3036:
---

lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998711992

   PR ready for review. Tested on
   * MacBook Pro
   * Apple M1 Pro
   * macOS Sonoma 14.4
   * Firefox 115.X (compatible with the current version of Selenium)




> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





Re: [PR] NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… [nutch]

2024-03-14 Thread via GitHub


lewismc commented on PR #807:
URL: https://github.com/apache/nutch/pull/807#issuecomment-1998711992

   PR ready for review. Tested on
   * MacBook Pro
   * Apple M1 Pro
   * macOS Sonoma 14.4
   * Firefox 115.X (compatible with the current version of Selenium)





[jira] [Updated] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-14 Thread Joe Gilvary (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Gilvary updated NUTCH-3032:
---
Patch Info: Patch Available

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code, without needing to write their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index-time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  
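A minimal sketch of the adapter idea, assuming the plugin instantiates a user-supplied class by its fully qualified name and calls it once per document at indexing time. The `IndexingHook` interface, the class names, and the use of a plain `Map` in place of Nutch's `NutchDocument` are all hypothetical; none of this is taken from the attached NUTCH-3032.patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch of an indexing adapter that delegates to an end user's POJO. */
public class PojoAdapterSketch {

  /** Hypothetical contract the end user's class would implement. */
  public interface IndexingHook {
    /** Returns the (possibly modified) fields, or null to skip indexing this document. */
    Map<String, String> apply(Map<String, String> fields);
  }

  /** Example user POJO: skips documents without a title and lower-cases the host field. */
  public static class TitleRequiredHook implements IndexingHook {
    @Override
    public Map<String, String> apply(Map<String, String> fields) {
      if (!fields.containsKey("title")) {
        return null; // index-time decision: do not index this document
      }
      Map<String, String> out = new HashMap<>(fields); // leave the caller's map untouched
      out.computeIfPresent("host", (key, value) -> value.toLowerCase(Locale.ROOT));
      return out;
    }
  }

  /** Instantiates the user's class from its fully qualified name (e.g. read from configuration). */
  public static IndexingHook load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      return (IndexingHook) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  /** Runs every document through the hook and keeps those it did not reject. */
  public static List<Map<String, String>> filter(IndexingHook hook, List<Map<String, String>> docs) {
    List<Map<String, String>> kept = new ArrayList<>();
    for (Map<String, String> doc : docs) {
      Map<String, String> result = hook.apply(doc);
      if (result != null) {
        kept.add(result);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    IndexingHook hook = load(TitleRequiredHook.class.getName());
    Map<String, String> withTitle = new HashMap<>();
    withTitle.put("title", "Apache Nutch");
    withTitle.put("host", "NUTCH.Apache.org");
    Map<String, String> withoutTitle = new HashMap<>();
    withoutTitle.put("host", "example.org");
    List<Map<String, String>> docs = new ArrayList<>();
    docs.add(withTitle);
    docs.add(withoutTitle);
    // Only the document with a title survives, with its host lower-cased.
    System.out.println(filter(hook, docs).size() + " of " + docs.size() + " documents kept");
  }
}
```

Loading by class name keeps the plugin itself generic: the end user only compiles a plain class against the small interface and puts it on the classpath, which is the "no need to write your own indexing plugin" property described above.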





[jira] [Comment Edited] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-14 Thread Joe Gilvary (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825873#comment-17825873
 ] 

Joe Gilvary edited comment on NUTCH-3032 at 3/14/24 11:05 PM:
--

-Done!-

Updated the patch file on 2024-03-14 because it contained an extraneous file from the 
tests that wasn't actually used by the tests I included.


was (Author: JIRAUSER304553):
Done!

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code, without needing to write their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index-time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  





[jira] [Updated] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-14 Thread Joe Gilvary (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Gilvary updated NUTCH-3032:
---
Attachment: NUTCH-3032.patch

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code, without needing to write their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index-time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  





[jira] [Updated] (NUTCH-3032) Indexing plugin as an adapter for end user's own POJO instances

2024-03-14 Thread Joe Gilvary (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Gilvary updated NUTCH-3032:
---
Attachment: (was: NUTCH-3032.patch)

> Indexing plugin as an adapter for end user's own POJO instances
> ---
>
> Key: NUTCH-3032
> URL: https://issues.apache.org/jira/browse/NUTCH-3032
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer
>Reporter: Joe Gilvary
>Priority: Major
>  Labels: indexing
> Attachments: NUTCH-3032.patch
>
>
> It could be helpful to let end users manipulate information at indexing time 
> with their own code, without needing to write their own indexing plugin. I 
> mentioned this on the dev mailing list 
> (https://www.mail-archive.com/dev@nutch.apache.org/msg31190.html) with some 
> description of my work in progress.
> One potential use is to address some of the same concerns that NUTCH-585 
> discusses regarding an alternative approach to picking and choosing which 
> content to index, but this approach would allow making index-time decisions, 
> rather than setting the configuration for all content at the start of the 
> indexing run.
>  





[jira] [Commented] (NUTCH-3035) Update license and notice file for release of 1.20

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827223#comment-17827223
 ] 

ASF GitHub Bot commented on NUTCH-3035:
---

sebastian-nagel opened a new pull request, #808:
URL: https://github.com/apache/nutch/pull/808

   Update the license and notice files of dependencies included as binary jar 
files in the binary release.




> Update license and notice file for release of 1.20 
> ---
>
> Key: NUTCH-3035
> URL: https://issues.apache.org/jira/browse/NUTCH-3035
> Project: Nutch
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.20
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Close to the release of 1.20 the license and notice files should be updated 
> to contain all (third-party) licenses of all dependencies. Cf. NUTCH-2290 and 
> NUTCH-2981.





[PR] NUTCH-3035 Update license and notice file for release of 1.20 [nutch]

2024-03-14 Thread via GitHub


sebastian-nagel opened a new pull request, #808:
URL: https://github.com/apache/nutch/pull/808

   Update the license and notice files of dependencies included as binary jar 
files in the binary release.





[jira] [Commented] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827208#comment-17827208
 ] 

ASF GitHub Bot commented on NUTCH-3036:
---

lewismc opened a new pull request, #807:
URL: https://github.com/apache/nutch/pull/807

   WIP for https://issues.apache.org/jira/browse/NUTCH-3036. Testing is ongoing. 
I’ll also check for additional deprecation notices in the build log.
   I’m testing this on a MacBook Pro (Apple M1 Pro) running macOS Sonoma 14.4.




> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





[jira] [Work started] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-3036 started by Lewis John McGibbney.
---
> Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium
> 
>
> Key: NUTCH-3036
> URL: https://issues.apache.org/jira/browse/NUTCH-3036
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, selenium
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Major
> Fix For: 1.20
>
>
> lib-selenium currently packages org.seleniumhq.selenium:selenium-java 
> *v4.7.2* but *v4.18.1* is available on Maven Central.
> This ticket will upgrade the Java dependency and validate that both 
> protocol-selenium and protocol-interactiveselenium work as expected in local 
> mode and via Selenium Grid.





[jira] [Created] (NUTCH-3036) Upgrade org.seleniumhq.selenium:selenium-java dependency in lib-selenium

2024-03-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created NUTCH-3036:
---

 Summary: Upgrade org.seleniumhq.selenium:selenium-java dependency 
in lib-selenium
 Key: NUTCH-3036
 URL: https://issues.apache.org/jira/browse/NUTCH-3036
 Project: Nutch
  Issue Type: Improvement
  Components: selenium, plugin
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 1.20


lib-selenium currently packages org.seleniumhq.selenium:selenium-java *v4.7.2* 
but *v4.18.1* is available on Maven Central.

This ticket will upgrade the Java dependency and validate that both 
protocol-selenium and protocol-interactiveselenium work as expected in local 
mode and via Selenium Grid.





[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-14 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827093#comment-17827093
 ] 

Hudson commented on NUTCH-3008:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #154 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/154/])
NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues 
(snagel: 
[https://github.com/apache/nutch/commit/367988dfd63751e05e10c93c4c32bd9f7c47b634])
* (edit) 
src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java
* (edit) src/plugin/indexer-elastic/howto_upgrade_es.md
* (edit) src/plugin/indexer-elastic/plugin.xml
* (edit) src/plugin/indexer-elastic/ivy.xml


> indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
> ---
>
> Key: NUTCH-3008
> URL: https://issues.apache.org/jira/browse/NUTCH-3008
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer, plugin
>Affects Versions: 1.19
>Reporter: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Downgrade to ES 7.10.2 (licensed under the Apache License 2.0) as an alternative solution to 
> address the licensing issues of the indexer-elastic plugin.





[jira] [Resolved] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2960.

Resolution: Won't Fix

The license issue is addressed by NUTCH-3008.

> indexer-elastic: remove plugin from binary package to address licensing issues
> --
>
> Key: NUTCH-2960
> URL: https://issues.apache.org/jira/browse/NUTCH-2960
> Project: Nutch
>  Issue Type: Bug
>Affects Versions: 1.19
>Reporter: Sebastian Nagel
>Priority: Major
>
> The license of Elasticsearch changed with v7.11.0, and from that version on it is 
> (if I understand correctly) no longer compatible with the Apache license. Accordingly, 
> we should no longer ship Elastic jars with the binary package.
> It should be possible to keep the indexer-elastic plugin in the source 
> package as an [optional|https://www.apache.org/legal/resolved.html#optional] 
> dependency (indexer-solr is the default indexing backend and more are 
> available).
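Because the licensing question here is tied to a version boundary (7.10.2 being the Apache-2.0-licensed release that NUTCH-3008 pins, and 7.11.0 the first release under the changed license, per NUTCH-2960), a build or release check could compare version numbers numerically. The `EsLicenseCheck` helper below is a hypothetical sketch, not part of Nutch; it is a pure version comparison, not a license lookup. Plain string comparison would get this wrong, since `"7.9.3".compareTo("7.11.0")` is positive.

```java
/** Hypothetical helper: compares an Elasticsearch version with the 7.10.2 pin from NUTCH-3008. */
public class EsLicenseCheck {

  // Per NUTCH-3008/NUTCH-2960: 7.10.2 is Apache-2.0 licensed; 7.11.0 and later changed license.
  private static final int[] LAST_APACHE_RELEASE = {7, 10, 2};

  /** Parses "major.minor.patch[-qualifier]" into three numbers; missing parts count as 0. */
  public static int[] parse(String version) {
    String[] parts = version.split("\\.");
    int[] nums = new int[3];
    for (int i = 0; i < Math.min(parts.length, 3); i++) {
      String digits = parts[i].replaceAll("\\D.*$", ""); // drop suffixes such as "-SNAPSHOT"
      nums[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }
    return nums;
  }

  /** Component-wise numeric comparison (unlike String.compareTo, 7.9.3 sorts before 7.11.0). */
  public static int compare(int[] a, int[] b) {
    for (int i = 0; i < 3; i++) {
      if (a[i] != b[i]) {
        return Integer.compare(a[i], b[i]);
      }
    }
    return 0;
  }

  /** True if the version is at or below 7.10.2 (a version check only, not a license lookup). */
  public static boolean isAtOrBelowLastApacheRelease(String esVersion) {
    return compare(parse(esVersion), LAST_APACHE_RELEASE) <= 0;
  }

  public static void main(String[] args) {
    for (String v : new String[] {"7.9.3", "7.10.2", "7.11.0", "8.12.2"}) {
      System.out.println(v + (isAtOrBelowLastApacheRelease(v) ? " <= 7.10.2" : " > 7.10.2"));
    }
  }
}
```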





[jira] [Closed] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel closed NUTCH-2960.
--

> indexer-elastic: remove plugin from binary package to address licensing issues
> --
>
> Key: NUTCH-2960
> URL: https://issues.apache.org/jira/browse/NUTCH-2960
> Project: Nutch
>  Issue Type: Bug
>Affects Versions: 1.19
>Reporter: Sebastian Nagel
>Priority: Major
>
> The license of Elasticsearch changed with v7.11.0, and from that version on it is 
> (if I understand correctly) no longer compatible with the Apache license. Accordingly, 
> we should no longer ship Elastic jars with the binary package.
> It should be possible to keep the indexer-elastic plugin in the source 
> package as an [optional|https://www.apache.org/legal/resolved.html#optional] 
> dependency (indexer-solr is the default indexing backend and more are 
> available).





[jira] [Updated] (NUTCH-2960) indexer-elastic: remove plugin from binary package to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2960:
---
Fix Version/s: (was: 1.20)

> indexer-elastic: remove plugin from binary package to address licensing issues
> --
>
> Key: NUTCH-2960
> URL: https://issues.apache.org/jira/browse/NUTCH-2960
> Project: Nutch
>  Issue Type: Bug
>Affects Versions: 1.19
>Reporter: Sebastian Nagel
>Priority: Major
>
> The license of Elasticsearch changed with v7.11.0, and from that version on it is 
> (if I understand correctly) no longer compatible with the Apache license. Accordingly, 
> we should no longer ship Elastic jars with the binary package.
> It should be possible to keep the indexer-elastic plugin in the source 
> package as an [optional|https://www.apache.org/legal/resolved.html#optional] 
> dependency (indexer-solr is the default indexing backend and more are 
> available).





[jira] [Resolved] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-3008.

Resolution: Fixed

> indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
> ---
>
> Key: NUTCH-3008
> URL: https://issues.apache.org/jira/browse/NUTCH-3008
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer, plugin
>Affects Versions: 1.19
>Reporter: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Downgrade to ES 7.10.2 (licensed under the Apache License 2.0) as an alternative solution to 
> address the licensing issues of the indexer-elastic plugin.





[jira] [Commented] (NUTCH-3008) indexer-elastic: downgrade to ES 7.10.2 to address licensing issues

2024-03-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827079#comment-17827079
 ] 

ASF GitHub Bot commented on NUTCH-3008:
---

sebastian-nagel merged PR #806:
URL: https://github.com/apache/nutch/pull/806




> indexer-elastic: downgrade to ES 7.10.2 to address licensing issues
> ---
>
> Key: NUTCH-3008
> URL: https://issues.apache.org/jira/browse/NUTCH-3008
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer, plugin
>Affects Versions: 1.19
>Reporter: Sebastian Nagel
>Priority: Major
> Fix For: 1.20
>
>
> Downgrade to ES 7.10.2 (licensed under the Apache License 2.0) as an alternative solution to 
> address the licensing issues of the indexer-elastic plugin.





Re: [PR] NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues [nutch]

2024-03-14 Thread via GitHub


sebastian-nagel merged PR #806:
URL: https://github.com/apache/nutch/pull/806





Jenkins build is back to normal : Nutch » Nutch-trunk #153

2024-03-14 Thread Apache Jenkins Server
See 




[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827060#comment-17827060
 ] 

Hudson commented on NUTCH-3029:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #153 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/153/])
NUTCH-3029 (markus: 
[https://github.com/apache/nutch/commit/98902236d782615ea1b8676a477bfa735499810a])
* (edit) src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java


> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.20
>
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).





[jira] [Commented] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Markus Jelsma (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827048#comment-17827048
 ] 

Markus Jelsma commented on NUTCH-3029:
--

A Javadoc comment describing the throws clause is also required these days.

   a8ec17ca8..98902236d  master -> master

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.20
>
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).





[jira] [Resolved] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-3029.

Resolution: Implemented

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.20
>
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).





[jira] [Closed] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel closed NUTCH-3029.
--

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.20
>
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).





[jira] [Reopened] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reopened NUTCH-3029:

  Assignee: Sebastian Nagel  (was: Markus Jelsma)

Reopening to update "Fix Version/s" (add 1.20) so that the issue appears in the release 
notes.

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Sebastian Nagel
>Priority: Minor
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).





[jira] [Updated] (NUTCH-3029) Host specific max. and min. intervals in adaptive scheduler

2024-03-14 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-3029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-3029:
---
Fix Version/s: 1.20

> Host specific max. and min. intervals in adaptive scheduler
> ---
>
> Key: NUTCH-3029
> URL: https://issues.apache.org/jira/browse/NUTCH-3029
> Project: Nutch
>  Issue Type: New Feature
>Affects Versions: 1.19, 1.20
>Reporter: Martin Djukanovic
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.20
>
> Attachments: adaptive-host-specific-intervals.txt.template, 
> new_adaptive_fetch_schedule-1.patch
>
>
> This patch implements custom max. and min. refetching intervals for specific 
> hosts, in the AdaptiveFetchSchedule class. The intervals are set up in a .txt 
> configuration file (template also attached).


