[ https://issues.apache.org/jira/browse/HBASE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703477#comment-14703477 ]
Hudson commented on HBASE-14241:
--------------------------------

SUCCESS: Integrated in HBase-1.2-IT #99 (See [https://builds.apache.org/job/HBase-1.2-IT/99/])
HBASE-14241 Fix deadlock during cluster shutdown due to concurrent connection close (tedyu: rev 639018a857a5e58f56d1db45e3f2d0e6043e2650)
* hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientImpl.java


> Fix deadlock during cluster shutdown due to concurrent connection close
> -----------------------------------------------------------------------
>
>                 Key: HBASE-14241
>                 URL: https://issues.apache.org/jira/browse/HBASE-14241
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>            Reporter: Andrew Purtell
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>
>         Attachments: 14241-v2.txt, 14241-v3.txt, 14241-v4.txt, 14241-v5.txt, deadlock.txt.gz
>
>
> Caught while testing branch-1.0, shutting down TestMasterMetricsWrapper.
> Found one Java-level deadlock:
> =============================
> "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0":
>   waiting to lock monitor 0x00007f2a040051c8 (object 0x00000007e36108a8, a org.apache.hadoop.hbase.util.PoolMap),
>   which is held by "M:0;ip-10-32-130-237:55342"
> "M:0;ip-10-32-130-237:55342":
>   waiting to lock monitor 0x00007f2a04005118 (object 0x00000007e3610b00, a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
>   which is held by "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0"
> Full stack dump and deadlock debug output attached.
> Root cause:
> In RpcClientImpl#close(), we obtain lock on connections first:
> {code}
>     synchronized (connections) {
>       for (Connection conn : connections.values()) {
> {code}
> Then markClosed() tries to obtain lock on connection object:
> {code}
>         if (!conn.isAlive()) {
>           conn.markClosed(new InterruptedIOException("RpcClient is closing"));
>           conn.close();
> {code}
> Another thread, MetaServerShutdownHandler, calls RpcClientImpl$Connection#setupIOstreams() where:
> {code}
>       markClosed(e);
>       close();
> {code}
> Lock on connection object is obtained first, then lock on connections is attempted, leading to deadlock:
> {code}
>     synchronized (connections) {
>       connections.removeValue(remoteId, this);
>     }
> {code}
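
The lock-order inversion described in the root cause can be illustrated with a small standalone sketch. This is not the committed patch and does not use HBase classes: `Client`, `Conn`, and `demo()` are hypothetical stand-ins for RpcClientImpl, RpcClientImpl$Connection, and the two competing threads. It shows one common way to avoid such a deadlock, assuming it is acceptable to close connections after releasing the pool lock: take a snapshot of the pool under its lock, then close each connection outside it, so the pool lock is never held while a connection lock is requested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LockOrderSketch {

    /** Hypothetical stand-in for the client that owns the connection pool. */
    static class Client {
        private final Map<String, Conn> connections = new HashMap<>();

        Conn open(String remoteId) {
            synchronized (connections) {
                return connections.computeIfAbsent(remoteId, id -> new Conn(this, id));
            }
        }

        void remove(String remoteId, Conn conn) {
            synchronized (connections) {
                connections.remove(remoteId, conn);
            }
        }

        int size() {
            synchronized (connections) {
                return connections.size();
            }
        }

        /** Copies the pool under its lock, then closes connections without holding it. */
        void close() {
            List<Conn> snapshot;
            synchronized (connections) {
                snapshot = new ArrayList<>(connections.values());
            }
            for (Conn conn : snapshot) {
                conn.close(); // connection lock first, pool lock second, as in the other thread
            }
        }
    }

    /** Hypothetical stand-in for a pooled connection. */
    static class Conn {
        private final Client owner;
        private final String remoteId;
        private boolean closed;

        Conn(Client owner, String remoteId) {
            this.owner = owner;
            this.remoteId = remoteId;
        }

        synchronized void close() {
            if (closed) {
                return;
            }
            closed = true;
            owner.remove(remoteId, this); // takes the pool lock while holding the connection lock
        }
    }

    /** Runs a pool-wide close and per-connection closes concurrently; returns the pool size afterwards. */
    static int demo() {
        Client client = new Client();
        List<Conn> conns = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            conns.add(client.open("server-" + i));
        }
        Thread closer = new Thread(client::close);
        Thread handler = new Thread(() -> {
            for (Conn conn : conns) {
                conn.close();
            }
        });
        closer.start();
        handler.start();
        try {
            closer.join();
            handler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return client.size();
    }

    public static void main(String[] args) {
        System.out.println("connections left: " + demo());
    }
}
```

Both threads now acquire locks in the same order (connection, then pool), so neither can hold one lock while waiting on the other. The trade-off is that a connection opened after the snapshot is taken would not be closed by `Client.close()`; a real client would also need a flag that rejects new connections once shutdown begins.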