Sebastian Nagel created NUTCH-2949:
--------------------------------------

             Summary: Tasks of a multi-threaded map runner may fail because of 
slow creation of URL stream handlers
                 Key: NUTCH-2949
                 URL: https://issues.apache.org/jira/browse/NUTCH-2949
             Project: Nutch
          Issue Type: Bug
          Components: net, plugin, protocol
    Affects Versions: 1.19
            Reporter: Sebastian Nagel
             Fix For: 1.19


While running a custom Nutch job ([code 
here|https://github.com/commoncrawl/nutch/blob/cc/src/java/org/apache/nutch/crawl/SitemapInjector.java]),
 many but not all task failed exceeding the the Hadoop task time-out 
(`mapreduce.task.timeout`) without generating any "heartbeat" (output, counter 
increments, log messages). Hadoop logs the stacks of all threads of the timed 
out task. That's the base for the excerpts below.

The job runs a MultithreadedMapper - most of the mapper threads (48 in total) 
are waiting for the URLStreamHandler in order to construct a java.net.URL 
object:

{noformat}
"Thread-11" #27 prio=5 os_prio=0 cpu=243.78ms elapsed=647.25s 
tid=0x00007f3eb5b0f800 nid=0x8e651 waiting for monitor entry  
[0x00007f3e84ef9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
        - waiting to lock <0x00000006a1bc0630> (a java.lang.String)
        at 
org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597)
        at 
org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95)
        at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:651)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:541)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:488)
        at 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179)
        at 
org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318)
        at 
org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at 
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274)
{noformat}

Only a single mapper thread is active:

{noformat}
"Thread-23" #39 prio=5 os_prio=0 cpu=5830.17ms elapsed=647.09s 
tid=0x00007f3eb5b42800 nid=0x8e661 in Object.wait()  [0x00007f3e842ec000]
   java.lang.Thread.State: RUNNABLE
        at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.15/Native
 Method)
        at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.15/NativeConstructorAccessorImpl.java:62)
        at 
jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.15/DelegatingConstructorAccessorImpl.java:45)
        at 
java.lang.reflect.Constructor.newInstance(java.base@11.0.15/Constructor.java:490)
        at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170)
        - locked <0x00000006a1bc0630> (a java.lang.String)
        at 
org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597)
        at 
org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95)
        at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:651)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:541)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:488)
        at 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179)
        at 
org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318)
        at 
org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at 
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274)
{noformat}

The stack of one thread shows a nested lock:

{noformat}
"Thread-26" #42 prio=5 os_prio=0 cpu=24.93ms elapsed=647.07s 
tid=0x00007f3eb5b49800 nid=0x8e664 waiting for monitor entry  
[0x00007f3e83fe7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
        - waiting to lock <0x00000006a1bc0630> (a java.lang.String)
        at 
org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597)
        at 
org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95)
        at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:651)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:541)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:488)
        at 
javax.crypto.JceSecurity.<clinit>(java.base@11.0.15/JceSecurity.java:239)
        at javax.crypto.Cipher.getInstance(java.base@11.0.15/Cipher.java:540)
        at 
sun.security.ssl.JsseJce.getCipher(java.base@11.0.15/JsseJce.java:190)
        at 
sun.security.ssl.SSLCipher.isTransformationAvailable(java.base@11.0.15/SSLCipher.java:509)
        at 
sun.security.ssl.SSLCipher.<init>(java.base@11.0.15/SSLCipher.java:498)
        at 
sun.security.ssl.SSLCipher.<clinit>(java.base@11.0.15/SSLCipher.java:81)
        at 
sun.security.ssl.CipherSuite.<clinit>(java.base@11.0.15/CipherSuite.java:65)
        at 
sun.security.ssl.SSLContextImpl.getApplicableSupportedCipherSuites(java.base@11.0.15/SSLContextImpl.java:348)
        at 
sun.security.ssl.SSLContextImpl$AbstractTLSContext.<clinit>(java.base@11.0.15/SSLContextImpl.java:580)
        at java.lang.Class.forName0(java.base@11.0.15/Native Method)
        at java.lang.Class.forName(java.base@11.0.15/Class.java:315)
        at 
java.security.Provider$Service.getImplClass(java.base@11.0.15/Provider.java:1918)
        at 
java.security.Provider$Service.newInstance(java.base@11.0.15/Provider.java:1894)
        at 
sun.security.jca.GetInstance.getInstance(java.base@11.0.15/GetInstance.java:236)
        at 
sun.security.jca.GetInstance.getInstance(java.base@11.0.15/GetInstance.java:164)
        at 
javax.net.ssl.SSLContext.getInstance(java.base@11.0.15/SSLContext.java:168)
        at org.apache.nutch.protocol.okhttp.OkHttp.<clinit>(OkHttp.java:97)
        at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.15/Native
 Method)
        at 
jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.15/NativeConstructorAccessorImpl.java:62)
        at 
jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.15/DelegatingConstructorAccessorImpl.java:45)
        at 
java.lang.reflect.Constructor.newInstance(java.base@11.0.15/Constructor.java:490)
        at 
org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170)
        - locked <0x00000006a1bc0410> (a java.lang.String)
        at 
org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597)
        at 
org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95)
        at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:651)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:541)
        at java.net.URL.<init>(java.base@11.0.15/URL.java:488)
        at 
org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179)
        at 
org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318)
        at 
org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325)
        at 
org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at 
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274)
{noformat}

It isn't a dead-lock because the lock is on two different String objects in 
different lines of code. However, a recursive barrier may make the performance 
impact of locks much worse. The stack also shows that Nutch's 
URLStreamHandlerFactory is also called from Java-internal methods. The 
cryptographic classes in the stack could also give a hint what's going wrong in 
NUTCH-2936.

I'm not yet clear what the solution could be:
- 
[Extension.getExtensionInstance()|https://github.com/apache/nutch/blob/568993b908702b9631a705c8122ba99af1b72f04/src/java/org/apache/nutch/plugin/Extension.java#L154]
 with nested synchronization is for sure not suitable to be called frequently
- maybe caching the URL stream handlers solves the problem
- but we should keep also NUTCH-2936 on the radar when looking for solutions



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to