> There must be a way, somehow, some time.

There isn't:
https://github.com/seleniumhq/selenium-google-code-issue-archive/issues/141
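The workaround discussed in this thread is a HEAD request that decides whether a URL goes to Selenium at all. It can be sketched as follows. This is a minimal sketch, assuming Java 11+ and the JDK's java.net.http.HttpClient. The class and method names (HeadBeforeSelenium, headContentType, isRenderable, shouldUseSelenium) and the assumeHtml switch that makes the HEAD request optional are hypothetical. None of them are part of Nutch's protocol-selenium plugin.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Nutch: decides via a HEAD request
// whether a URL should be handed to Selenium or fetched as plain HTTP.
final class HeadBeforeSelenium {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Sends a HEAD request (no body is transferred) and returns the
    // Content-Type header of the final response, if the server sent one.
    static Optional<String> headContentType(URI uri)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .timeout(Duration.ofSeconds(10))
                .build();
        HttpResponse<Void> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
        return response.headers().firstValue("Content-Type");
    }

    // True for MIME types a browser renders as a page; parameters such as
    // "; charset=UTF-8" are ignored. PDFs, zip files etc. return false.
    static boolean isRenderable(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }

    // assumeHtml is a hypothetical switch for crawls known to contain only
    // HTML pages: it skips the HEAD request entirely. Without a usable
    // Content-Type the URL is not sent to Selenium.
    static boolean shouldUseSelenium(URI uri, boolean assumeHtml)
            throws IOException, InterruptedException {
        if (assumeHtml) {
            return true;
        }
        return headContentType(uri).map(HeadBeforeSelenium::isRenderable).orElse(false);
    }
}
```

The HEAD request costs one extra round trip per URL, which is why the sketch lets it be skipped. Some servers answer HEAD with 405 or without a Content-Type. In that case the sketch falls back to a plain HTTP fetch rather than guessing.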

On Thu, 19 Jan 2023 at 15:23, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> > This makes some sense if you do not know anything about the URL.
> > - a HEAD request could do almost the same
> > - often one knows whether there are only HTML pages or also PDFs, zip
> >   files, and other stuff not suitable for Selenium. The HEAD request
> >   could be made optional.
>
> Ah crap, I forgot about that. With Selenium, it is still not possible to
> get the HTTP headers of the most recent request. And when a non-text
> MIME-type URL is requested, asking for the page source returns either
> nothing or the source from the previous 'successful' call.
>
> Apart from doing a HEAD request first, there is no neat way to handle
> non-text/html URLs the way we can with HtmlUnit, which at least returns
> the headers and the raw binary data.
>
> There must be a way, somehow, some time.
>
> Thanks,
> Markus
>
> On Thu, 19 Jan 2023 at 11:38, Sebastian Nagel <wastl.na...@googlemail.com>
> wrote:
>
>> Hi Kamil, hi Markus,
>>
>> an upgrade of the Selenium plugin would be much appreciated!
>>
>>  > Besides that, the plugin also needs some overhaul.
>>
>> Definitely.
>>
>>  > It currently first downloads the URL with HttpClient, and then,
>>  > depending on MIME-type, it may or may not forward the URL to Selenium
>>  > so it can be downloaded again.
>>
>> This makes some sense if you do not know anything about the URL.
>> - a HEAD request could do almost the same
>> - often one knows whether there are only HTML pages or also PDFs, zip
>>   files, and other stuff not suitable for Selenium. The HEAD request
>>   could be made optional.
>>
>>  > merging the lib-selenium plugin with the protocol-selenium plugin
>>
>> I guess lib-selenium exists to share common components between
>> protocol-selenium and protocol-interactiveselenium. Maybe merge all
>> three? Or skip interactiveselenium for now.
>>
>> ~Sebastian
>>
>> On 1/17/23 19:56, Markus Jelsma wrote:
>> > Hello Kamil,
>> >
>> > Yes, the plugin indeed needs an upgrade. We use a modern version of it
>> > elsewhere and it works really well, at least better than HtmlUnit.
>> >
>> > Besides that, the plugin also needs some overhaul. It currently first
>> > downloads the URL with HttpClient, and then, depending on MIME-type, it
>> > may or may not forward the URL to Selenium so it can be downloaded again.
>> >
>> > There is a lot of code in the plugin that should be removed. I would also
>> > opt for merging the lib-selenium plugin into the protocol-selenium plugin.
>> > There is no obvious need to keep them separate.
>> >
>> > These can, of course, be separate tasks.
>> >
>> > Regards,
>> > Markus
>> >
>> > On Tue, 17 Jan 2023 at 17:49, Kamil Mroczek <kamil@elio.earth> wrote:
>> >
>> >     Hello,
>> >
>> >     I am writing to ask whether I should submit a patch that updates
>> >     Selenium to the latest version. Although it is a major version
>> >     upgrade of the library, very few code changes were needed.
>> >
>> >     For a preview of my changes, see
>> >     <https://github.com/Elio-Earth/nutch/commit/9960f14bce0f0d6cebc406556a298a7c8c2e6b9f>.
>> >     PhantomJS support, which the plugin no longer used (the code was
>> >     commented out), has been removed in the latest Selenium version. The
>> >     commit also removes Opera, since it was commented out as well, but I
>> >     can leave it in if needed. The build and tests pass. I have been using
>> >     the Chrome driver with it successfully and would just need to run a
>> >     quick test with Firefox to make sure that works too.
>> >
>> >     I have only been using Nutch for about a month, but I have spent
>> >     quite a bit of time looking over different parts of the code to
>> >     understand how to configure and change it.
>> >
>> >     Kamil
>> >
>>
>
