> There must be a way, some how, some time.

There isn't:
https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/141

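For reference, the HEAD-first workaround discussed in the quoted thread below could look roughly like this. It is only a sketch under a few assumptions: it uses the JDK 11+ java.net.http.HttpClient rather than whatever HTTP client the plugin uses, the names HeadCheck, isSeleniumSuitable and fetchContentType are made up for illustration and are not part of Nutch, and it treats only text/html and application/xhtml+xml as suitable for Selenium.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Nutch or Selenium.
final class HeadCheck {

  /** Returns true if the Content-Type value names an HTML-like media type. */
  static boolean isSeleniumSuitable(String contentType) {
    if (contentType == null) {
      return false;
    }
    // Drop parameters such as "; charset=UTF-8" and compare case-insensitively.
    String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    return mediaType.equals("text/html") || mediaType.equals("application/xhtml+xml");
  }

  /** Sends a HEAD request and returns the Content-Type header, if the server sent one. */
  static Optional<String> fetchContentType(HttpClient client, String url)
      throws IOException, InterruptedException {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .method("HEAD", HttpRequest.BodyPublishers.noBody())
        .timeout(Duration.ofSeconds(10))
        .build();
    // A HEAD response has no body, so the body is simply discarded.
    HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
    return response.headers().firstValue("Content-Type");
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("usage: java HeadCheck <url>");
      return;
    }
    HttpClient client = HttpClient.newBuilder()
        .followRedirects(HttpClient.Redirect.NORMAL)
        .build();
    Optional<String> type = fetchContentType(client, args[0]);
    // No Content-Type header at all is treated as "not suitable" here (an assumption).
    boolean useSelenium = type.map(HeadCheck::isSeleniumSuitable).orElse(false);
    System.out.println(args[0] + " -> " + (useSelenium ? "Selenium" : "plain HTTP fetch"));
  }
}
```

Not every server answers HEAD the same way it answers GET, so this can only do "almost the same" as downloading the URL first, and it still costs one extra round trip per URL unless the check is made optional.
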
On Thu, 19 Jan 2023 at 15:23, Markus Jelsma <markus.jel...@openindex.io> wrote:

> > This makes some sense if you do not know anything about the URL.
> > - a HEAD request could do almost the same
> > - often one knows whether there are only HTML pages or also PDFs, zip files,
> >   and other stuff not suitable for Selenium. Could make the HEAD request
> >   optional.
>
> Ah crap, I forgot about that. With Selenium, it is still not possible to
> get the HTTP headers of the most recent request. And when requesting the
> page source for a non-text MIME-type URL, it will either return nothing or
> return the result of the previous 'successful' call.
>
> Besides doing a HEAD request first, there is no neat way to work with
> non-text/html URLs as we can using HtmlUnit. That at least returns the
> headers and the raw binary data.
>
> There must be a way, some how, some time.
>
> Thanks,
> Markus
>
> On Thu, 19 Jan 2023 at 11:38, Sebastian Nagel <wastl.na...@googlemail.com> wrote:
>
>> Hi Kamil, hi Markus,
>>
>> upgrading the Selenium plugin is very much appreciated!
>>
>> > Besides that, the plugin also needs some overhaul.
>>
>> Definitely.
>>
>> > It currently first downloads the URL with HttpClient, and then, depending on
>> > MIME-type, it may or may not forward the URL to Selenium so it can be
>> > downloaded again.
>>
>> This makes some sense if you do not know anything about the URL.
>> - a HEAD request could do almost the same
>> - often one knows whether there are only HTML pages or also PDFs, zip files,
>>   and other stuff not suitable for Selenium. Could make the HEAD request
>>   optional.
>>
>> > merging the lib-selenium plugin with the protocol-selenium plugin
>>
>> I guess lib-selenium is to share common components between protocol-selenium
>> and protocol-interactiveselenium. Maybe merge all three? Or skip
>> interactiveselenium for now.
>>
>> ~Sebastian
>>
>> On 1/17/23 19:56, Markus Jelsma wrote:
>> > Hello Kamil,
>> >
>> > Yes, the plugin needs some upgrading indeed. We use a modern version of it
>> > elsewhere and it works really well, at least better than HtmlUnit.
>> >
>> > Besides that, the plugin also needs some overhaul. It currently first downloads
>> > the URL with HttpClient, and then, depending on MIME-type, it may or may not
>> > forward the URL to Selenium so it can be downloaded again.
>> >
>> > There is a lot of code in the plugin that should be removed. I would also opt
>> > for merging the lib-selenium plugin with the protocol-selenium plugin. There is
>> > no obvious need for having it separated.
>> >
>> > These can be, of course, separate tasks.
>> >
>> > Regards,
>> > Markus
>> >
>> > On Tue, 17 Jan 2023 at 17:49, Kamil Mroczek <kamil@elio.earth> wrote:
>> >
>> > > Hello,
>> > >
>> > > I am sending a message to inquire whether I should submit a patch which
>> > > updates Selenium to the latest version. Although it is a major version
>> > > upgrade to the library, very few code changes were needed to update.
>> > >
>> > > For a preview of the changes I made you can look here
>> > > <https://github.com/Elio-Earth/nutch/commit/9960f14bce0f0d6cebc406556a298a7c8c2e6b9f>.
>> > > Although not used in the code anymore (it was commented out), PhantomJS
>> > > support has been removed from Selenium in the latest version. The commit
>> > > also removes Opera since it was commented out, but I can leave that in if
>> > > needed. The build and tests pass. I have been using the Chrome driver
>> > > successfully with it and would just need to run a quick test with Firefox
>> > > to make sure it works too.
>> > >
>> > > I have only been using Nutch for about a month but have spent quite a bit of
>> > > time looking over different parts of the code to understand how to configure
>> > > it and change it.
>> > >
>> > > Kamil