[ https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762138#comment-13762138 ]
ASF subversion and git services commented on SOLR-5215:
-------------------------------------------------------

Commit 1521236 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1521236 ]

SOLR-5215: Fix possibility of deadlock in ZooKeeper ConnectionManager.

> Deadlock in Solr Cloud ConnectionManager
> ----------------------------------------
>
>                 Key: SOLR-5215
>                 URL: https://issues.apache.org/jira/browse/SOLR-5215
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java, SolrCloud
>    Affects Versions: 4.2.1
>         Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>            Reporter: Ricardo Merizalde
>            Assignee: Mark Miller
>             Fix For: 4.5, 5.0
>
>         Attachments: SOLR-5215.patch
>
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that a thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which it already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but still holds the connectionUpdateLock)
> Then a thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but gets blocked while holding the ConnectionManager lock, preventing thread A from getting out of the wait state.
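
The lock interaction described in the report can be sketched in isolation. This is only an illustration, not the Solr code and not the committed patch: the class and field names (LockOrderSketch, managerLock, updateLock) are hypothetical stand-ins, and ReentrantLock with a timed tryLock replaces the synchronized blocks so that the program terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderSketch {
    // Hypothetical stand-in for the ConnectionManager monitor.
    static final ReentrantLock managerLock = new ReentrantLock();
    static final Condition connectedChanged = managerLock.newCondition();
    // Hypothetical stand-in for connectionUpdateLock.
    static final ReentrantLock updateLock = new ReentrantLock();

    /** Returns true if thread B obtained updateLock while thread A was waiting. */
    static boolean threadBGotUpdateLock() throws InterruptedException {
        CountDownLatch aHoldsBoth = new CountDownLatch(1);
        Thread a = new Thread(() -> {
            managerLock.lock();              // process(): take the manager lock
            try {
                updateLock.lock();           // update(): take the update lock
                try {
                    aHoldsBoth.countDown();
                    // waitForConnected(): waiting releases managerLock only;
                    // updateLock stays held for the whole wait.
                    long left = TimeUnit.MILLISECONDS.toNanos(1000);
                    while (left > 0) {
                        left = connectedChanged.awaitNanos(left);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    updateLock.unlock();
                }
            } finally {
                managerLock.unlock();
            }
        });
        a.start();
        aHoldsBoth.await();

        boolean got;
        managerLock.lock();  // thread B, process(): succeeds once A is waiting
        try {
            // Thread B, update(): with a plain synchronized block this would
            // block forever while still holding the manager lock.
            got = updateLock.tryLock(100, TimeUnit.MILLISECONDS);
            if (got) {
                updateLock.unlock();
            }
        } finally {
            managerLock.unlock();
        }
        a.join();
        return got;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("B acquired update lock: " + threadBGotUpdateLock());
    }
}
```

In this sketch B gives up after 100 ms and releases the manager lock, which lets A leave its wait. With intrinsic monitors there is no such escape: B keeps the manager monitor while blocked on the update lock, and A cannot re-enter the monitor to return from wait.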
>
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x0000000059965800 nid=0x3e81 waiting for monitor entry [0x0000000057169000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
>         - waiting to lock <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x000000005ad40000 nid=0x3e67 waiting for monitor entry [0x000000004dbd4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - waiting to lock <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x00002aac4c2f7000 nid=0x3d9a waiting for monitor entry [0x0000000042821000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - locked <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>
> Found one Java-level deadlock:
> =============================
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x000000005c7694b0 (object 0x00002aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x00002aac4c314978 (object 0x00002aab1b0e0f78, a java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x000000005c7694b0 (object 0x00002aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
>
>
> Java stack information for the threads listed above:
> ===================================================
> "http-0.0.0.0-8080-82-EventThread":
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
>         - waiting to lock <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> "http-0.0.0.0-8080-82-EventThread":
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - waiting to lock <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> "http-0.0.0.0-8080-82-EventThread":
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - locked <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org