[ 
https://issues.apache.org/jira/browse/NUTCH-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554299#comment-16554299
 ] 

ASF GitHub Bot commented on NUTCH-2625:
---------------------------------------

sebastian-nagel opened a new pull request #368: NUTCH-2625 
ProtocolFactory.getProtocol(url) may create multiple plugin instances
URL: https://github.com/apache/nutch/pull/368
 
 
   - use the object cache object as the lock for the critical block (conditional 
creation of the plugin instance)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ProtocolFactory.getProtocol(url) may create multiple plugin instances
> ---------------------------------------------------------------------
>
>                 Key: NUTCH-2625
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2625
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The method ProtocolFactory.getProtocol(URL url) may unnecessarily create 
> multiple instances of a protocol plugin for the same configuration. The 
> following log snippets from a Fetcher running 100 FetcherThreads show that the 
> setConf(conf) method of the protocol-okhttp plugin is called 100 times (once 
> for each thread):
> {noformat}
> 2018-07-12 12:04:32,811 INFO [main] org.apache.nutch.fetcher.FetcherThread: 
> FetcherThread 1 Using queue mode : byHost
> ... (skipped 98 repeated messages)
> 2018-07-12 12:04:33,136 INFO [main] org.apache.nutch.fetcher.FetcherThread: 
> FetcherThread 1 Using queue mode : byHost
> ...
> 2018-07-12 12:04:37,493 INFO [FetcherThread] 
> org.apache.nutch.protocol.RobotRulesParser: robots.txt whitelist not 
> configured.
> 2018-07-12 12:04:37,493 INFO [FetcherThread] 
> org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
> ...
> 2018-07-12 12:04:37,494 INFO [FetcherThread] 
> org.apache.nutch.protocol.okhttp.OkHttp: http.enable.cookie.header = false
> ... (skipped 98 blocks of repeated messages)
> 2018-07-12 12:04:39,080 INFO [FetcherThread] 
> org.apache.nutch.protocol.RobotRulesParser: robots.txt whitelist not 
> configured.
> 2018-07-12 12:04:39,080 INFO [FetcherThread] 
> org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
> ...
> 2018-07-12 12:04:39,080 INFO [FetcherThread] 
> org.apache.nutch.protocol.okhttp.OkHttp: http.enable.cookie.header = false
> {noformat}
> The method ProtocolFactory.getProtocol(URL url) is synchronized; however, each 
> FetcherThread holds its own instance of the ProtocolFactory, so the lock does 
> not stop other threads from creating their own plugin instances.
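
To illustrate the locking problem described above, here is a minimal, self-contained sketch. It is not the Nutch code: the names SharedLockSketch, Plugin, Factory and CACHE are hypothetical stand-ins for the protocol plugin, the per-thread ProtocolFactory and the shared object cache. It assumes that all factories can reach the same cache object and shows that synchronizing on that shared object, rather than on the per-thread factory, leaves only one plugin instance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedLockSketch {
    // Counts how many plugin instances get constructed (illustration only).
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for a protocol plugin.
    static class Plugin {
        Plugin() {
            CREATED.incrementAndGet();
        }
    }

    // Hypothetical stand-in for the cache shared by all factories.
    static final Map<String, Plugin> CACHE = new HashMap<>();

    // Hypothetical stand-in for the factory; every thread creates its own.
    static class Factory {
        Plugin getProtocol(String scheme) {
            // Locking on "this" would not help: each thread has a different
            // Factory. The shared cache object is the same for all threads.
            synchronized (CACHE) {
                Plugin plugin = CACHE.get(scheme);
                if (plugin == null) {
                    plugin = new Plugin();
                    CACHE.put(scheme, plugin);
                }
                return plugin;
            }
        }
    }

    // Starts the given number of threads, each with its own Factory, and
    // returns how many Plugin instances were created in total.
    static int run(int threads) {
        synchronized (CACHE) {
            CACHE.clear();
        }
        CREATED.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> new Factory().getProtocol("http"));
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("plugin instances created: " + run(100));
    }
}
```

With the lock taken on the shared cache, the check and the creation form one critical block across all threads, so run(100) reports a single instance. If getProtocol were instead a synchronized instance method, each thread would lock only its own Factory and several threads could create a plugin at the same time.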



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
