Thanks, Markus. Let me submit the upgrade first to get my first commit in
and then go from there. The optimization of reducing the number of HTTP
requests will be useful, so I will look into that.

On Tue, Jan 17, 2023 at 1:56 PM Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello Kamil,
>
> Yes, the plugin needs some upgrading indeed. We use a modern version of it
> elsewhere and it works really well, at least better than HtmlUnit.
>
> Besides that, the plugin also needs some overhaul. It currently first
> downloads the URL with HttpClient, and then, depending on MIME-type, it may
> or may not forward the URL to Selenium so it can be downloaded again.
>
> There is a lot of code in the plugin that should be removed. I would also
> opt for merging the lib-selenium plugin with the protocol-selenium plugin.
> There is no obvious need for having it separated.
>
> These can be, of course, separate tasks.
>
> Regards,
> Markus
>
> On Tue, Jan 17, 2023 at 17:49, Kamil Mroczek <kamil@elio.earth> wrote:
>
>> Hello,
>>
>> I am sending a message to inquire whether I should submit a patch which
>> updates Selenium to the latest version. Although it is a major version
>> upgrade to the library, very few code changes were needed to update.
>>
>> For a preview of the changes I made you can look here
>> <https://github.com/Elio-Earth/nutch/commit/9960f14bce0f0d6cebc406556a298a7c8c2e6b9f>.
>> Although not used in the code anymore (it was commented out), PhantomJS
>> support has been removed from Selenium in the latest version. The commit
>> also removes Opera since it was commented out but I can leave that in if
>> needed. The build and tests pass. I have been using the Chrome driver
>> successfully with it and would just need to run a quick test with Firefox
>> to make sure it works too.
>>
>> I have only been using Nutch for about a month but have spent quite a bit
>> of time looking over different parts of the code to understand how to
>> configure it and change it.
>>
>> Kamil
>>
>
