[ 
https://issues.apache.org/jira/browse/NUTCH-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657917#comment-16657917
 ] 

ASF GitHub Bot commented on NUTCH-2625:
---------------------------------------

sebastian-nagel closed pull request #368: NUTCH-2625 ProtocolFactory.getProtocol(url) may create multiple plugin instances
URL: https://github.com/apache/nutch/pull/368
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/src/java/org/apache/nutch/protocol/ProtocolFactory.java b/src/java/org/apache/nutch/protocol/ProtocolFactory.java
index b39155b79..e25ad8f1d 100644
--- a/src/java/org/apache/nutch/protocol/ProtocolFactory.java
+++ b/src/java/org/apache/nutch/protocol/ProtocolFactory.java
@@ -87,7 +87,7 @@ public Protocol getProtocol(String urlString) throws ProtocolNotFound {
    * @throws ProtocolNotFound
    *           when Protocol can not be found for url
    */
-  public synchronized Protocol getProtocol(URL url)
+  public Protocol getProtocol(URL url)
       throws ProtocolNotFound {
     ObjectCache objectCache = ObjectCache.get(conf);
     try {
@@ -97,19 +97,21 @@ public synchronized Protocol getProtocol(URL url)
       }
 
       String cacheId = Protocol.X_POINT_ID + protocolName;
-      Protocol protocol = (Protocol) objectCache.getObject(cacheId);
-      if (protocol != null) {
+      synchronized (objectCache) {
+        Protocol protocol = (Protocol) objectCache.getObject(cacheId);
+        if (protocol != null) {
+          return protocol;
+        }
+
+        Extension extension = findExtension(protocolName);
+        if (extension == null) {
+          throw new ProtocolNotFound(protocolName);
+        }
+
+        protocol = (Protocol) extension.getExtensionInstance();
+        objectCache.setObject(cacheId, protocol);
         return protocol;
       }
-
-      Extension extension = findExtension(protocolName);
-      if (extension == null) {
-        throw new ProtocolNotFound(protocolName);
-      }
-
-      protocol = (Protocol) extension.getExtensionInstance();
-      objectCache.setObject(cacheId, protocol);
-      return protocol;
     } catch (PluginRuntimeException e) {
       throw new ProtocolNotFound(url.toString(), e.toString());
     }


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ProtocolFactory.getProtocol(url) may create multiple plugin instances
> ---------------------------------------------------------------------
>
>                 Key: NUTCH-2625
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2625
>             Project: Nutch
>          Issue Type: Improvement
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> The method ProtocolFactory.getProtocol(URL url) may unnecessarily create
> multiple instances of a protocol plugin for the same configuration. The
> following log excerpt from a Fetcher running 100 FetcherThreads shows that
> the setConf(conf) method of the protocol-okhttp plugin is called 100 times
> (once for each thread):
> {noformat}
> 2018-07-12 12:04:32,811 INFO [main] org.apache.nutch.fetcher.FetcherThread: FetcherThread 1 Using queue mode : byHost
> ... (skipped 98 repeated messages)
> 2018-07-12 12:04:33,136 INFO [main] org.apache.nutch.fetcher.FetcherThread: FetcherThread 1 Using queue mode : byHost
> ...
> 2018-07-12 12:04:37,493 INFO [FetcherThread] org.apache.nutch.protocol.RobotRulesParser: robots.txt whitelist not configured.
> 2018-07-12 12:04:37,493 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
> ...
> 2018-07-12 12:04:37,494 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.enable.cookie.header = false
> ... (skipped 98 blocks of repeated messages)
> 2018-07-12 12:04:39,080 INFO [FetcherThread] org.apache.nutch.protocol.RobotRulesParser: robots.txt whitelist not configured.
> 2018-07-12 12:04:39,080 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.proxy.host = null
> ...
> 2018-07-12 12:04:39,080 INFO [FetcherThread] org.apache.nutch.protocol.okhttp.OkHttp: http.enable.cookie.header = false
> {noformat}
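> The method ProtocolFactory.getProtocol(URL url) is synchronized; however,
> each FetcherThread holds its own instance of ProtocolFactory, so the
> method-level lock does not prevent several threads from creating the
> plugin at the same time.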
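
The locking problem described above can be reproduced outside Nutch with a
small sketch. Everything in it is hypothetical and only for illustration:
the classes Plugin and Factory, the methods getLockingFactory,
getLockingCache and run, and the plain map that stands in for the shared
ObjectCache. It is not Nutch code. Each worker thread owns its own Factory,
as each FetcherThread owns its own ProtocolFactory, and all factories share
one cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedCacheLockSketch {

  /** Hypothetical stand-in for a plugin instance; counts constructions. */
  static final class Plugin {
    Plugin(AtomicInteger created) {
      created.incrementAndGet();
    }
  }

  /** Hypothetical stand-in for a factory; each worker thread owns one. */
  static final class Factory {
    private final Map<String, Plugin> cache; // shared by all factories
    private final AtomicInteger created;

    Factory(Map<String, Plugin> cache, AtomicInteger created) {
      this.cache = cache;
      this.created = created;
    }

    /** Locks only this factory, so threads using other factories are not excluded. */
    synchronized Plugin getLockingFactory(String id) {
      Plugin p = cache.get(id);
      if (p == null) {
        p = new Plugin(created);
        cache.put(id, p);
      }
      return p;
    }

    /** Locks the shared cache, so check-then-create is atomic across all factories. */
    Plugin getLockingCache(String id) {
      synchronized (cache) {
        Plugin p = cache.get(id);
        if (p == null) {
          p = new Plugin(created);
          cache.put(id, p);
        }
        return p;
      }
    }
  }

  /** Starts one worker per factory and returns how many plugins were constructed. */
  static int run(int threads, boolean lockCache) {
    Map<String, Plugin> cache = new ConcurrentHashMap<>();
    AtomicInteger created = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    try {
      for (int i = 0; i < threads; i++) {
        Factory factory = new Factory(cache, created);
        workers[i] = new Thread(() -> {
          try {
            start.await(); // release all workers at once to provoke the race
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          if (lockCache) {
            factory.getLockingCache("http");
          } else {
            factory.getLockingFactory("http");
          }
        });
        workers[i].start();
      }
      start.countDown();
      for (Thread t : workers) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return created.get();
  }

  public static void main(String[] args) {
    System.out.println("locking the factory: " + run(100, false) + " instance(s)");
    System.out.println("locking the cache:   " + run(100, true) + " instance(s)");
  }
}
```

With the lock on the shared cache, exactly one instance is constructed. With
the lock on the factory, the count depends on thread timing and can be
anywhere between 1 and the number of threads.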



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
