[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690779#comment-17690779
 ] 

Hudson commented on NUTCH-2980:
---

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #94 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/94/])
NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit (snagel: 
[https://github.com/apache/nutch/commit/383aeca5d30342b29b6ee6e05f8f3052c62d7303])
* (edit) src/plugin/lib-htmlunit/ivy.xml
* (edit) src/plugin/lib-selenium/plugin.xml
* (edit) README.md
* (edit) 
src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java
* (edit) 
src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultClickAllAjaxLinksHandler.java
* (edit) src/plugin/lib-htmlunit/plugin.xml
* (edit) src/plugin/lib-selenium/ivy.xml


> Upgrade Selenium Java to 4.7.2
> --
>
> Key: NUTCH-2980
> URL: https://issues.apache.org/jira/browse/NUTCH-2980
> Project: Nutch
>  Issue Type: Improvement
>  Components: plugin, protocol
>Affects Versions: 1.19
>Reporter: Kamil Mroczek
>Priority: Major
> Fix For: 1.20
>
>
> Selenium version is quite old and had some issues processing a website. Once 
> I switched to the latest version I was able to scrape that website. Good to 
> keep it up to date since we were already 1 major release behind.
> Upgrading Selenium Java from 3.141.59 to 4.7.2 and Selenium HTMLUnit from 
> 2.35.1 to 4.7.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690763#comment-17690763
 ] 

ASF GitHub Bot commented on NUTCH-2980:
---

sebastian-nagel commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1435694153

   Finally, I was able to test it successfully. The reason for the earlier 
failures was that on recent Ubuntu systems Firefox and Chromium are installed 
as snap packages. This adds extra sandboxing and requires that `TMPDIR` points 
to a folder the snap packages are allowed to write to (they cannot write to 
the default `/tmp/`).
   
   Thanks, @KamilMroczek ! 
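
A minimal sketch of a pre-flight check for the `TMPDIR` requirement described above. The class name `SnapTmpDirCheck` and the helper `isOutsideSystemTmp` are hypothetical and not part of Nutch or Selenium. The check only verifies that `TMPDIR` is set, does not point into `/tmp`, and is writable by the current JVM process; it cannot confirm what the snap sandbox itself permits.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch: sanity-checks TMPDIR before
// starting a snap-packaged browser through Selenium.
public class SnapTmpDirCheck {

  // Returns true if tmpDir is set and does not live under /tmp.
  // Path.startsWith compares whole path components, so "/tmpfoo" is
  // treated as outside "/tmp".
  static boolean isOutsideSystemTmp(String tmpDir) {
    if (tmpDir == null || tmpDir.isEmpty()) {
      return false;
    }
    Path p = Paths.get(tmpDir).toAbsolutePath().normalize();
    return !p.startsWith(Paths.get("/tmp"));
  }

  public static void main(String[] args) {
    String tmpDir = System.getenv("TMPDIR");
    if (!isOutsideSystemTmp(tmpDir)) {
      System.err.println("TMPDIR is unset or points into /tmp; "
          + "snap-packaged browsers may fail to start.");
    } else if (!Files.isWritable(Paths.get(tmpDir))) {
      // Writable from this JVM's point of view only (assumption: the
      // snap sandbox may still apply its own restrictions).
      System.err.println("TMPDIR is not writable: " + tmpDir);
    } else {
      System.out.println("TMPDIR looks usable: " + tmpDir);
    }
  }
}
```

Running the check in the same shell that later starts the crawl makes it easier to tell a sandboxing problem apart from a driver or browser version mismatch.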






[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-02-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690764#comment-17690764
 ] 

ASF GitHub Bot commented on NUTCH-2980:
---

sebastian-nagel merged PR #753:
URL: https://github.com/apache/nutch/pull/753






[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679448#comment-17679448
 ] 

ASF GitHub Bot commented on NUTCH-2980:
---

KamilMroczek commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1399257710

   OK, I was able to run it in local mode on macOS with Firefox and Chrome, 
and also on AWS Linux with Chrome.






[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679442#comment-17679442
 ] 

ASF GitHub Bot commented on NUTCH-2980:
---

sebastian-nagel commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1399245784

   Hi @KamilMroczek, indeed - keeping the licenses up-to-date is a difficult 
task. See also NUTCH-2290 and NUTCH-2981. 
   
   Unfortunately, so far I wasn't able to successfully test protocol-selenium 
and this PR. But this is on my side. Both Chrome and Firefox (recent browser 
and driver versions) show up (in headful mode) but for some reason the driver 
then times out with obscure error messages. It's reproducible from Python, so 
it's something with my system (maybe because I recently switched to use Wayland 
instead of X11). Will try it on a different system...






[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679189#comment-17679189
 ] 

ASF GitHub Bot commented on NUTCH-2980:
---

KamilMroczek commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1398516536

   Yeah, verifying the licenses was a bit of a pain. I found some tools to 
help with finding licenses for a batch of libraries, but none of those I could 
find supported our format.






[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2

2023-01-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678993#comment-17678993
 ] 

ASF GitHub Bot commented on NUTCH-2980:
---

KamilMroczek opened a new pull request, #753:
URL: https://github.com/apache/nutch/pull/753

   - Disabled the PhantomJS driver, as it was causing problems casting 
TakeScreenshot to HtmlUnitWebDriver and the project has been archived since 2018
   - Improved README setup instructions for IntelliJ
   
   The following libraries were added as part of the selenium-java and htmlunit 
upgrades. They are all Apache 2.0, MIT or EDL.
   
   async-http-client
   async-http-client-netty-utils
   auto-common
   auto-service
   auto-service-annotations
   checker-qual
   dec
   failsafe
   failureaccess
   htmlunit-xpath
   jakarta.activation
   jcommander
   jtoml
   listenablefuture
   netty-buffer
   netty-codec
   netty-codec-http
   netty-codec-socks
   netty-common
   netty-handler
   netty-handler-proxy
   netty-reactive-streams
   netty-resolver
   netty-transport
   netty-transport-classes-epoll
   netty-transport-classes-kqueue
   netty-transport-native-epoll
   netty-transport-native-kqueue
   netty-transport-native-unix-common
   opentelemetry-api
   opentelemetry-api-logs
   opentelemetry-context
   opentelemetry-exporter-common
   opentelemetry-exporter-logging
   opentelemetry-sdk
   opentelemetry-sdk-common
   opentelemetry-sdk-extension-autoconfigure
   opentelemetry-sdk-extension-autoconfigure-spi
   opentelemetry-sdk-logs
   opentelemetry-sdk-metrics
   opentelemetry-sdk-trace
   opentelemetry-semconv
   reactive-streams
   salvation2
   selenium-chromium-driver
   selenium-devtools-v106
   selenium-devtools-v107
   selenium-devtools-v108
   selenium-devtools-v85
   selenium-http
   selenium-json
   selenium-manager



