Sebastian Nagel created NUTCH-2653:
--------------------------------------

             Summary: ProtocolFactory.getProtocol(url) creates separate plugin 
instances for http/https
                 Key: NUTCH-2653
                 URL: https://issues.apache.org/jira/browse/NUTCH-2653
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher, protocol
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16


Fetcher creates two instances of the protocol-okhttp plugin, one to handle http 
requests, another for https. The plugin properties are logged during plugin 
instantiation when calling {{setConf(...)}}:
{noformat}
2018-10-11 13:28:34,417 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 40 fetching http://...
...
2018-10-11 13:28:35,099 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
2018-10-11 13:28:35,100 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.port = 8080
...
2018-10-11 13:28:36,864 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 87 fetching https://...
...
2018-10-11 13:28:36,864 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
2018-10-11 13:28:36,864 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.port = 8080
{noformat}

The question is whether this is the correct behavior for a plugin that supports 
multiple protocols (http and https). With separate instances, connection pooling 
and other network optimizations may not work as expected. Separate instances are, 
of course, correct when different plugins are required, e.g., for ftp or the 
local file system.
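
One possible direction, sketched below, is to cache plugin instances by plugin 
id rather than by URL scheme, so that http and https resolve to the same 
instance while ftp still gets its own. This is only an illustration and not the 
actual ProtocolFactory code: the class {{ProtocolCacheSketch}}, the nested 
{{Plugin}} type, the scheme-to-plugin mapping, and the instantiation counter are 
all hypothetical names introduced for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; NOT the real Nutch ProtocolFactory. */
public class ProtocolCacheSketch {

  /** Stand-in for a protocol plugin instance (hypothetical type). */
  public static final class Plugin {
    public final String id;

    Plugin(String id) {
      this.id = id;
    }
  }

  // Assumed mapping from URL scheme to the id of the plugin handling it.
  private final Map<String, String> schemeToPluginId = new HashMap<>();

  // Instances are cached per plugin id, not per scheme.
  private final Map<String, Plugin> instancesByPluginId = new HashMap<>();

  // Counts how many plugin instances were actually created.
  private int instantiations = 0;

  public ProtocolCacheSketch() {
    schemeToPluginId.put("http", "protocol-okhttp");
    schemeToPluginId.put("https", "protocol-okhttp");
    schemeToPluginId.put("ftp", "protocol-ftp");
  }

  /** Returns the shared plugin instance responsible for the given scheme. */
  public synchronized Plugin getProtocol(String scheme) {
    String pluginId = schemeToPluginId.get(scheme);
    if (pluginId == null) {
      throw new IllegalArgumentException("No plugin for scheme: " + scheme);
    }
    // Create the instance only on the first request for this plugin id.
    return instancesByPluginId.computeIfAbsent(pluginId, id -> {
      instantiations++;
      return new Plugin(id);
    });
  }

  public synchronized int instantiations() {
    return instantiations;
  }
}
```

With this keying, requests for http and https return the same object, so any 
state held by the plugin instance (such as a connection pool) would be shared, 
while a scheme served by a different plugin still gets a separate instance.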

(seen while reviewing the behavior of fetcher with fix for NUTCH-2625 applied)




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
