[ https://issues.apache.org/jira/browse/NUTCH-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642968#comment-16642968 ]
Sebastian Nagel commented on NUTCH-2648:
----------------------------------------

Hi [~markus17],

it should work for protocol-http, protocol-httpclient and protocol-okhttp. I've tested it using parsechecker for all three plugins, here for httpclient:
{noformat}
% bin/nutch parsechecker -Dhttp.tls.certificates.check=true -Dplugin.includes='protocol-httpclient|parse-tika' https://ingevd.waarbenjij.nu/kaart/5000179/dag-4
...
Fetch failed with protocol status: exception(16), lastModified=0: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

% bin/nutch parsechecker -Dhttp.tls.certificates.check=false -Dplugin.includes='protocol-httpclient|parse-tika' ...
Status: success(1,0)
...{noformat}

Regarding protocol-htmlunit and the two selenium-based protocol plugins: that is now tracked in NUTCH-2649.

?? Maybe it's also an idea to add a dummy trust manager in Nutch's base code??

Yes, or in lib-http. While implementing this for protocol-okhttp, I thought about bundling the DummyTrustManager functionality. But the code overlaps only partially, so I was lazy here.

> Make configurable whether TLS/SSL certificates are checked by protocol plugins
> ------------------------------------------------------------------------------
>
>                 Key: NUTCH-2648
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2648
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> (see discussion in NUTCH-2647)
> It should be possible to enable/disable TLS/SSL certificate validation
> centrally for all http/https protocol plugins by a single configuration
> property.
> Some use cases (e.g. crawling a site to detect insecure pages) may require that
> TLS/SSL certificates are checked. Also, a broader, unrestricted web crawl may
> skip sites with invalid certificates, as this can be an indicator of the
> quality of a site.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
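
For illustration only, the "dummy trust manager" idea discussed in the comment could be sketched roughly as below, using only the standard JSSE API. This is not the code from Nutch or from any of its protocol plugins: the class name TlsCheckSketch, the nested TrustAllManager and the method createContext are hypothetical, and the boolean argument merely stands in for the value of http.tls.certificates.check.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;

/** Hypothetical sketch, not the actual Nutch implementation. */
public class TlsCheckSketch {

  /** Trust manager that accepts every certificate chain without validation. */
  static final class TrustAllManager implements X509TrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
      // intentionally empty: no validation is performed
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
      // intentionally empty: no validation is performed
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
      return new X509Certificate[0];
    }
  }

  /**
   * Builds an SSLContext. The parameter is assumed to carry the value of
   * the http.tls.certificates.check property.
   */
  static SSLContext createContext(boolean checkCertificates)
      throws GeneralSecurityException {
    SSLContext context = SSLContext.getInstance("TLS");
    // Passing null trust managers makes JSSE fall back to the default
    // (validating) trust managers of the installed security providers.
    TrustManager[] trustManagers = checkCertificates
        ? null
        : new TrustManager[] { new TrustAllManager() };
    context.init(null, trustManagers, null);
    return context;
  }
}
```

A plugin could then hand the socket factory of the returned context to its HTTP client. With checking disabled, a fetch like the second parsechecker call above would no longer fail on an untrusted certificate chain, at the cost of accepting any certificate.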