[ 
https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642968#comment-16642968
 ] 

Sebastian Nagel commented on NUTCH-2648:
----------------------------------------

Hi [~markus17], it should work for protocol-http, protocol-httpclient and 
protocol-okhttp. I've tested it using parsechecker for all three plugins, here 
for httpclient:
{noformat}
% bin/nutch parsechecker -Dhttp.tls.certificates.check=true \
    -Dplugin.includes='protocol-httpclient|parse-tika' \
    https://ingevd.waarbenjij.nu/kaart/5000179/dag-4
...
Fetch failed with protocol status: exception(16), lastModified=0: 
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: 
PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
valid certification path to requested target

% bin/nutch parsechecker -Dhttp.tls.certificates.check=false \
    -Dplugin.includes='protocol-httpclient|parse-tika' 
...
Status: success(1,0)
...{noformat}
Regarding protocol-htmlunit and the two Selenium-based protocol plugins: 
this is now tracked in NUTCH-2649.

?? Maybe it's also an idea to add a dummy trust manager in Nutch's base code??

Yes, or in lib-http. While implementing this for protocol-okhttp, I thought 
about bundling the DummyTrustManager functionality in one place. But the code 
overlaps only partially, so I was lazy here and left it as is.
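
For illustration only, a minimal sketch of what a shared "dummy trust manager" 
helper could look like when built on plain JSSE. The class name 
TlsContextSketch, the nested AcceptAllTrustManager and the checkCertificates 
flag are hypothetical and not taken from the Nutch code base; the flag merely 
stands in for the value of http.tls.certificates.check.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper (not part of Nutch): builds an SSLContext that
 *  either validates certificates as usual or accepts all of them. */
final class TlsContextSketch {

  /** Trust manager that accepts every certificate chain unchecked. */
  static final class AcceptAllTrustManager implements X509TrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
      // intentionally empty: no validation is performed
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
      // intentionally empty: no validation is performed
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
      return new X509Certificate[0];
    }
  }

  /**
   * @param checkCertificates assumed to mirror http.tls.certificates.check
   * @return an initialized TLS context
   */
  static SSLContext createContext(boolean checkCertificates) {
    try {
      SSLContext context = SSLContext.getInstance("TLS");
      // Passing null trust managers makes JSSE fall back to the default
      // trust store, i.e. regular certificate validation.
      TrustManager[] trustManagers = checkCertificates
          ? null
          : new TrustManager[] { new AcceptAllTrustManager() };
      context.init(null, trustManagers, null);
      return context;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Failed to set up TLS context", e);
    }
  }
}
```

A plugin could then obtain its socket factory via 
createContext(...).getSocketFactory(). Whether this fits all protocol plugins 
is exactly the open question, since each HTTP client library wires in its 
trust manager differently.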

> Make configurable whether TLS/SSL certificates are checked by protocol plugins
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2648
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2648
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> (see discussion in NUTCH-2647)
> It should be possible to enable/disable TLS/SSL certificate validation 
> centrally for all http/https protocol plugins by a single configuration 
> property.
> Some use cases (e.g. crawling a site to detect insecure pages) may require that 
> TLS/SSL certificates are checked. Also, a broader, unrestricted web crawl may 
> skip sites with invalid certificates, as this can be an indicator of the 
> quality of a site.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
