[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152140#comment-15152140 ]

Markus Jelsma commented on NUTCH-2191:
--------------------------------------

Hi - it works indeed. But new problems appear, as usual!

1. SSL does not work, due to:
{code}
2016-02-18 11:53:21,130 ERROR htmlunit.Http - Failed to get protocol output
java.lang.IllegalArgumentException: Cannot locate declared field org.apache.http.impl.client.HttpClientBuilder.sslContext
        at org.apache.commons.lang3.reflect.FieldUtils.readDeclaredField(FieldUtils.java:382)
        at com.gargoylesoftware.htmlunit.HttpWebConnection.createConnectionManager(HttpWebConnection.java:944)
        at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:161)
        at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1321)
        at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1238)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:346)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:415)
        at org.apache.nutch.protocol.htmlunit.HttpResponse.<init>(HttpResponse.java:103)
{code}

2. I don't know how yet, but since it uses Selenium, a browser opens every 
time I try a file! This is crazy; I didn't know this was even possible.
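
For reference, the exception in item 1 is the kind of failure you get when a 
private field is read by name via reflection and the class loaded at runtime 
does not declare a field with exactly that name. Below is a minimal sketch 
using only java.lang.reflect. ExampleBuilder and its field name are 
hypothetical stand-ins, not the real HttpClientBuilder, and whether a 
differently named field (for example from another HttpClient version on the 
classpath) is the actual cause here is an assumption, not something 
established in this issue.

```java
import java.lang.reflect.Field;

public class DeclaredFieldProbe {

    // Hypothetical stand-in for a builder class; NOT the real HttpClientBuilder.
    static class ExampleBuilder {
        // Illustrative field name, deliberately all lowercase.
        private Object sslcontext = "placeholder";
    }

    /** Returns true if clazz itself declares a field with exactly this name. */
    static boolean declaresField(Class<?> clazz, String name) {
        try {
            clazz.getDeclaredField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    /**
     * Reads a declared (possibly private) field by name. A missing field is
     * reported as IllegalArgumentException, similar in spirit to the error
     * in the log above.
     */
    static Object readDeclaredField(Object target, String name)
            throws IllegalAccessException {
        Class<?> clazz = target.getClass();
        final Field field;
        try {
            field = clazz.getDeclaredField(name);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(
                "Cannot locate declared field " + clazz.getName() + "." + name, e);
        }
        field.setAccessible(true); // needed because the field is private
        return field.get(target);
    }

    public static void main(String[] args) throws Exception {
        ExampleBuilder builder = new ExampleBuilder();

        // Field names are case-sensitive: only the exact spelling matches.
        System.out.println(declaresField(ExampleBuilder.class, "sslcontext"));
        System.out.println(declaresField(ExampleBuilder.class, "sslContext"));

        // Reading the existing field works; the misspelled one fails.
        System.out.println(readDeclaredField(builder, "sslcontext"));
        try {
            readDeclaredField(builder, "sslContext");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like declaresField against the class that is actually loaded at 
runtime is one way to tell whether the expected field exists at all, before 
digging further into the SSL setup.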

Markus

> Add protocol-htmlunit
> ---------------------
>
>                 Key: NUTCH-2191
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2191
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.12
>
>         Attachments: NUTCH-2191.patch, NUTCH-2191.patch
>
>
> HtmlUnit is, as opposed to other JavaScript-enabled headless browsers, a 
> portable library and should therefore be better suited to very large-scale 
> crawls. This issue is an attempt to implement protocol-htmlunit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
