https://issues.apache.org/bugzilla/show_bug.cgi?id=55907
Bug ID: 55907
Summary: Unterminated loop during Poller creation due to cyclic references in native allocator
Product: Tomcat Native
Version: 1.1.29
Hardware: Sun
OS: Solaris
Status: NEW
Severity: normal
Priority: P2
Component: Library
Assignee: dev@tomcat.apache.org
Reporter: rainer.j...@kippdata.de

The loop happens during poller thread startup. The subsequent shutdown then hangs, waiting for a native lock that the poller holds.

The loop is not reproducible; it was observed during one of many runs of TestAsyncContextImpl. The tcnative version used was 1.1.29 plus APR 1.4.8, on Solaris 10 Sparc with a 32-bit JDK 1.7.0_45.

I am adding the data to this issue primarily so that we can come back to it if the problem is observed in the wild.

In the logs the incomplete poller startup results in the message

[junit] 18-Dec-2013 20:50:28.194 SEVERE [http-apr-127.0.0.1-auto-27-Poller] org.apache.tomcat.util.net.AprEndpoint$Poller.run Poller failed with error [9] : [Bad file number]

about a second after the loop starts.

The looping thread is:

"http-apr-127.0.0.1-auto-27-Poller" daemon prio=3 tid=0x007c9400 nid=0x134 runnable [0xb407f000]
   java.lang.Thread.State: RUNNABLE
  apr_palloc (38a540, 28, 58a258, 28000, 42c8a0, 42c8a0) + 29c
  impl_pollset_create (44c918, 2000, 58a258, 44c950, 20000, 44c918) + b8
  apr_pollset_create_ex (b407f78c, 2000, 58a258, 2, 3, b51d4ec4) + c8
  Java_org_apache_tomcat_jni_Poll_create (7c9530, b407f78c, 2000, 2, 58a258, 0) + 38
  at org.apache.tomcat.jni.Poll.create(Native Method)
  at org.apache.tomcat.util.net.AprEndpoint.allocatePoller(AprEndpoint.java:804)
  at org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1969)
  at java.lang.Thread.run(Thread.java:744)
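For reference, the native path the Poller thread is on boils down to creating an APR pollset from a pool, which is what drives apr_palloc() into the allocator. Below is a minimal standalone sketch of that path using only public APR 1.4 calls; it is not tcnative's actual poll.c, and the flag value is illustrative.

/*
 * Sketch of the native call sequence the Poller thread is in:
 * create a pool, then a pollset of size 8192 from it. Not the real
 * tcnative code; the flag choice is illustrative only.
 */
#include <stdio.h>
#include "apr_general.h"
#include "apr_pools.h"
#include "apr_poll.h"

int main(void)
{
    apr_pool_t *pool;
    apr_pollset_t *pollset;
    apr_status_t rv;
    char errbuf[256];

    apr_initialize();
    apr_pool_create(&pool, NULL);

    /* Mirrors frame #3 below: size 8192, method APR_POLLSET_PORT
     * (event ports on Solaris); allocation happens from 'pool'. */
    rv = apr_pollset_create_ex(&pollset, 8192, pool,
                               APR_POLLSET_THREADSAFE, APR_POLLSET_PORT);
    if (rv != APR_SUCCESS) {
        fprintf(stderr, "pollset creation failed: %s\n",
                apr_strerror(rv, errbuf, sizeof(errbuf)));
    }

    apr_pool_destroy(pool);
    apr_terminate();
    return rv == APR_SUCCESS ? 0 : 1;
}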
The main thread later blocks during shutdown:

"main" prio=3 tid=0x00029800 nid=0x2 runnable [0xfdf7e000]
   java.lang.Thread.State: RUNNABLE
  lwp_park (0, 0, 0)
  apr_pool_destroy (75c680, fdf7e958, 0, 75c680, fdf7e9d4, 372d08) + c4
  at org.apache.tomcat.jni.Pool.destroy(Native Method)
  at org.apache.tomcat.util.net.AprEndpoint$Sendfile.destroy(AprEndpoint.java:2098)
  at org.apache.tomcat.util.net.AprEndpoint.stopInternal(AprEndpoint.java:704)
  at org.apache.tomcat.util.net.AbstractEndpoint.stop(AbstractEndpoint.java:740)
  at org.apache.coyote.AbstractProtocol.stop(AbstractProtocol.java:515)
  at org.apache.catalina.connector.Connector.stopInternal(Connector.java:1014)
  at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
  - locked <0xe69c1760> (a org.apache.catalina.connector.Connector)
  at org.apache.catalina.core.StandardService.stopInternal(StandardService.java:516)
  - locked <0xe69ca618> (a java.lang.Object)
  at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
  - locked <0xe69ca570> (a org.apache.catalina.core.StandardService)
  at org.apache.catalina.core.StandardServer.stopInternal(StandardServer.java:766)
  at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
  - locked <0xe69c9ec8> (a org.apache.catalina.core.StandardServer)
  at org.apache.catalina.startup.Tomcat.stop(Tomcat.java:350)
  at org.apache.catalina.startup.TomcatBaseTest.tearDown(TomcatBaseTest.java:163)
  at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
  at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:38)
  at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:523)
  at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1063)
  at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:914)

The debugger says:

#0  allocator_alloc (in_size=163840, allocator=0x38a540) at memory/unix/apr_pools.c:315
        node = 0x42c8a0
        ref = 0x42c8a0
        max_index = <optimized out>
        i = <optimized out>
        size = 167936
        index = 40
#1  apr_palloc (pool=pool@entry=0x58a258, in_size=in_size@entry=163840) at memory/unix/apr_pools.c:696
        active = 0x42c8a0
        node = <optimized out>
        mem = <optimized out>
        size = 163840
        free_index = <optimized out>
#2  0xb51bccc4 in impl_pollset_create (pollset=0x44c918, size=8192, p=0x58a258, flags=<optimized out>) at poll/unix/port.c:199
No locals.
#3  0xb51bc5a0 in apr_pollset_create_ex (ret_pollset=0xb407f78c, size=8192, p=0x58a258, flags=2, method=APR_POLLSET_PORT) at poll/unix/pollset.c:251
        rv = <optimized out>
        pollset = 0x44c918
        provider = 0xb51d4ec4
#4  0xb53e1330 in Java_org_apache_tomcat_jni_Poll_create (e=0x7c9530, o=<optimized out>, size=8192, pool=<optimized out>, flags=<optimized out>, default_timeout=-1000) at src/poll.c:126
        R = 8192
        p = 0x58a258
        pollset = <optimized out>
        tps = 0x0
        f = 2

The loop happens in memory/unix/apr_pools.c:315

    while ((node = *ref) != NULL && index > node->index)
        ref = &node->next;

because node->next is the same as node:

(gdb) print node
$4 = (apr_memnode_t *) 0x42c8a0
(gdb) print *node
$3 = {next = 0x42c8a0, ref = 0x42c8a0, index = 32, free_index = 0, first_avail = 0x44c8b8 "", endp = 0x44d8a0 ""}

The root cause must have happened earlier, when the free list was corrupted.
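To make the failure mode concrete, here is a minimal standalone sketch of that scan (simplified node type and an added iteration cap; this is not the real APR code) with a self-referencing free-list node like the one shown above: the requested size class (index 40) stays greater than the node's index (32), and following node->next keeps landing on the same node.

/*
 * Sketch only: simplified free-list node plus the index-ordered scan
 * from allocator_alloc(). A node whose next pointer refers back to
 * itself makes the scan revisit the same node forever; a cap is added
 * here so the demo terminates.
 */
#include <stdio.h>
#include <stddef.h>

typedef struct memnode_t memnode_t;
struct memnode_t {
    memnode_t *next;    /* next free node, ordered by index */
    size_t     index;   /* size class of this node */
};

int main(void)
{
    /* Matches the gdb data: next points back to the node itself, index = 32. */
    memnode_t corrupt;
    corrupt.next  = &corrupt;
    corrupt.index = 32;

    memnode_t *head = &corrupt;
    memnode_t **ref = &head;
    memnode_t *node;
    size_t index = 40;              /* requested size class (gdb: index = 40) */
    unsigned long steps = 0;

    /* Same shape as the loop at memory/unix/apr_pools.c:315. */
    while ((node = *ref) != NULL && index > node->index) {
        ref = &node->next;
        if (++steps == 1000000) {
            printf("still on node %p after %lu steps: the free list is cyclic\n",
                   (void *)node, steps);
            return 1;
        }
    }
    printf("scan finished after %lu steps\n", steps);
    return 0;
}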
Some additional data:

(gdb) print *allocator
$5 = {max_index = 1, max_free_index = 0, current_free_index = 0, mutex = 0x372d08, owner = 0x372cc8, free = {0x49c8f8, 0x58a240, 0x0 <repeats 18 times>}}

The associated pool hierarchy:

(gdb) print pool
$15 = (apr_pool_t *) 0x58a258
(gdb) print *pool
$6 = {parent = 0x179920, child = 0x0, sibling = 0x0, ref = 0x75c688, cleanups = 0x0, free_cleanups = 0x0, allocator = 0x38a540, subprocesses = 0x0, abort_fn = 0, user_data = 0x0, tag = 0x0, active = 0x42c8a0, self = 0x58a240, self_first_avail = 0x58a298 "", pre_cleanups = 0x0}
(gdb) print *pool->parent
$7 = {parent = 0x790028, child = 0x75c680, sibling = 0x0, ref = 0x180250, cleanups = 0x0, free_cleanups = 0x0, allocator = 0x38a540, subprocesses = 0x0, abort_fn = 0, user_data = 0x0, tag = 0x0, active = 0x179908, self = 0x179908, self_first_avail = 0x179960 "", pre_cleanups = 0x0}
(gdb) print *pool->parent->parent
$8 = {parent = 0x409898, child = 0x180248, sibling = 0x0, ref = 0x40989c, cleanups = 0x0, free_cleanups = 0x0, allocator = 0x38a540, subprocesses = 0x0, abort_fn = 0, user_data = 0x0, tag = 0x0, active = 0x790010, self = 0x790010, self_first_avail = 0x790068 "", pre_cleanups = 0x0}
(gdb) print *pool->parent->parent->parent
$9 = {parent = 0x372cc8, child = 0x790028, sibling = 0x7eac40, ref = 0x372ccc, cleanups = 0x40a150, free_cleanups = 0x0, allocator = 0x38a540, subprocesses = 0x0, abort_fn = 0, user_data = 0x0, tag = 0x0, active = 0x409880, self = 0x409880, self_first_avail = 0x4098d8 "/dev/urandom", pre_cleanups = 0x0}
(gdb) print *pool->parent->parent->parent->parent
$10 = {parent = 0x0, child = 0x409898, sibling = 0x0, ref = 0x0, cleanups = 0x372d28, free_cleanups = 0x0, allocator = 0x38a540, subprocesses = 0x0, abort_fn = 0, user_data = 0x0, tag = 0xb51c2808 "apr_global_pool", active = 0x372cb0, self = 0x372cb0, self_first_avail = 0x372d08 "", pre_cleanups = 0x0}

The blocked main thread is trying to acquire the mutex of the allocator in which the Poller thread is looping; that thread already holds the mutex.
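For context on why both threads meet on the same lock: every pool in the hierarchy above reports the same allocator (0x38a540), because child pools created without an explicit allocator inherit the parent's, all the way up to apr_global_pool and its mutex (0x372d08). The small sketch below (hypothetical pool names, standard APR calls only) demonstrates that sharing; it is an illustration, not code from tcnative or Tomcat.

/*
 * Sketch: pools created under a common parent share that parent's
 * allocator, and therefore its mutex. Pool names here are made up.
 */
#include <stdio.h>
#include "apr_general.h"
#include "apr_pools.h"
#include "apr_allocator.h"

int main(void)
{
    apr_pool_t *parent, *poller_pool, *sendfile_pool;

    apr_initialize();

    apr_pool_create(&parent, NULL);            /* uses the global allocator */
    apr_pool_create(&poller_pool, parent);     /* e.g. the Poller's pool    */
    apr_pool_create(&sendfile_pool, parent);   /* e.g. the Sendfile pool    */

    /* All three pools report the same allocator, so apr_palloc() in one
     * thread and apr_pool_destroy() in another serialize on the same
     * allocator mutex. In the bug, the Poller thread holds that mutex
     * inside allocator_alloc() and never leaves the free-list scan, so
     * the main thread's apr_pool_destroy() parks forever (lwp_park). */
    printf("parent allocator   %p\n", (void *)apr_pool_allocator_get(parent));
    printf("poller allocator   %p\n", (void *)apr_pool_allocator_get(poller_pool));
    printf("sendfile allocator %p\n", (void *)apr_pool_allocator_get(sendfile_pool));

    apr_pool_destroy(parent);
    apr_terminate();
    return 0;
}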