[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690779#comment-17690779 ]

Hudson commented on NUTCH-2980:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #94 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/94/])
NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit (snagel: [https://github.com/apache/nutch/commit/383aeca5d30342b29b6ee6e05f8f3052c62d7303])
* (edit) src/plugin/lib-htmlunit/ivy.xml
* (edit) src/plugin/lib-selenium/plugin.xml
* (edit) README.md
* (edit) src/plugin/lib-selenium/src/java/org/apache/nutch/protocol/selenium/HttpWebClient.java
* (edit) src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/handlers/DefaultClickAllAjaxLinksHandler.java
* (edit) src/plugin/lib-htmlunit/plugin.xml
* (edit) src/plugin/lib-selenium/ivy.xml

> Upgrade Selenium Java to 4.7.2
> ------------------------------
>
>                 Key: NUTCH-2980
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2980
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin, protocol
>    Affects Versions: 1.19
>            Reporter: Kamil Mroczek
>            Priority: Major
>             Fix For: 1.20
>
>
> Selenium version is quite old and had some issues processing a website. Once
> I switched to the latest version I was able to scrape that website. Good to
> keep it up to date since we were already 1 major release behind.
> Upgrading Selenium Java from 3.141.59 to 4.7.2 and Selenium HTMLUnit from
> 2.35.1 to 4.7.0.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690763#comment-17690763 ]

ASF GitHub Bot commented on NUTCH-2980:
---------------------------------------

sebastian-nagel commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1435694153

   Finally, I was able to test it successfully. The reason for the earlier failures was that on recent Ubuntu systems Firefox and Chromium are installed as snap packages. This adds extra sandboxing and requires that `TMPDIR` points to a folder the snap packages are allowed to write to (they cannot write to the default `/tmp/`).

   Thanks, @KamilMroczek!
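The `TMPDIR` requirement described in this comment can be checked before starting a crawl. The following is only a minimal sketch: `SnapTmpDirCheck` and `isUsableForSnap` are hypothetical names, not part of Nutch or Selenium, and the check uses only the Java standard library. It assumes that any existing, writable directory outside `/tmp` is acceptable, which may not hold on every system because the set of directories a snap package may write to depends on its confinement.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of Nutch or Selenium. */
class SnapTmpDirCheck {

  /**
   * Returns true if the given TMPDIR value names an existing, writable
   * directory that is not /tmp or a subdirectory of it.
   * Assumption: such a directory is writable for the snap-packaged browser.
   */
  static boolean isUsableForSnap(String tmpDir) {
    if (tmpDir == null || tmpDir.isEmpty()) {
      return false; // TMPDIR is unset, so the default /tmp would be used
    }
    Path dir = Paths.get(tmpDir).toAbsolutePath().normalize();
    if (dir.startsWith("/tmp")) {
      return false; // the comment reports that snaps cannot write to /tmp
    }
    return Files.isDirectory(dir) && Files.isWritable(dir);
  }

  public static void main(String[] args) {
    String tmpDir = System.getenv("TMPDIR");
    if (isUsableForSnap(tmpDir)) {
      System.out.println("TMPDIR looks usable: " + tmpDir);
    } else {
      System.out.println(
          "Set TMPDIR to a writable directory outside /tmp before starting the browser");
    }
  }
}
```

Running the class prints a hint when `TMPDIR` is unset or points into `/tmp`. It does not query snap itself, so a passing check is a necessary condition at best, not a guarantee.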
[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690764#comment-17690764 ]

ASF GitHub Bot commented on NUTCH-2980:
---------------------------------------

sebastian-nagel merged PR #753:
URL: https://github.com/apache/nutch/pull/753
[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679448#comment-17679448 ]

ASF GitHub Bot commented on NUTCH-2980:
---------------------------------------

KamilMroczek commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1399257710

   OK. I was able to run in local mode on macOS with Firefox and Chrome, and also on AWS Linux with Chrome.
[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679442#comment-17679442 ]

ASF GitHub Bot commented on NUTCH-2980:
---------------------------------------

sebastian-nagel commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1399245784

   Hi @KamilMroczek, indeed - keeping the licenses up-to-date is a difficult task. See also NUTCH-2290 and NUTCH-2981.

   Unfortunately, so far I wasn't able to successfully test protocol-selenium and this PR. But the problem is on my side. Both Chrome and Firefox (recent browser and driver versions) show up (in headful mode), but for some reason the driver then times out with obscure error messages. It's reproducible from Python, so it's something with my system (maybe because I recently switched to Wayland from X11). Will try it on a different system...
[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679189#comment-17679189 ]

ASF GitHub Bot commented on NUTCH-2980:
---------------------------------------

KamilMroczek commented on PR #753:
URL: https://github.com/apache/nutch/pull/753#issuecomment-1398516536

   Yeah, verifying the licenses was a bit of a pain. I found some tools that help with finding licenses for a batch of libraries, but none of them (that I could find) supported our format.
[jira] [Commented] (NUTCH-2980) Upgrade Selenium Java to 4.7.2
[ https://issues.apache.org/jira/browse/NUTCH-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678993#comment-17678993 ]

ASF GitHub Bot commented on NUTCH-2980:
---------------------------------------

KamilMroczek opened a new pull request, #753:
URL: https://github.com/apache/nutch/pull/753

   - Disabled the PhantomJS driver, as it was causing problems casting TakesScreenshot to HtmlUnitWebDriver and the project has been archived since 2018
   - Improved README setup instructions for IntelliJ

   The following libraries were added as part of the selenium-java and htmlunit upgrades. They are all licensed under Apache 2.0, MIT or EDL.

   async-http-client
   async-http-client-netty-utils
   auto-common
   auto-service
   auto-service-annotations
   checker-qual
   dec
   failsafe
   failureaccess
   htmlunit-xpath
   jakarta.activation
   jcommander
   jtoml
   listenablefuture
   netty-buffer
   netty-codec
   netty-codec-http
   netty-codec-socks
   netty-common
   netty-handler
   netty-handler-proxy
   netty-reactive-streams
   netty-resolver
   netty-transport
   netty-transport-classes-epoll
   netty-transport-classes-kqueue
   netty-transport-native-epoll
   netty-transport-native-kqueue
   netty-transport-native-unix-common
   opentelemetry-api
   opentelemetry-api-logs
   opentelemetry-context
   opentelemetry-exporter-common
   opentelemetry-exporter-logging
   opentelemetry-sdk
   opentelemetry-sdk-common
   opentelemetry-sdk-extension-autoconfigure
   opentelemetry-sdk-extension-autoconfigure-spi
   opentelemetry-sdk-logs
   opentelemetry-sdk-metrics
   opentelemetry-sdk-trace
   opentelemetry-semconv
   reactive-streams
   salvation2
   selenium-chromium-driver
   selenium-devtools-v106
   selenium-devtools-v107
   selenium-devtools-v108
   selenium-devtools-v85
   selenium-http
   selenium-json
   selenium-manager