[ https://issues.apache.org/jira/browse/PHOENIX-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150003#comment-16150003 ]
Hudson commented on PHOENIX-4131:
---------------------------------

FAILURE: Integrated in Jenkins build Phoenix-master #1762 (See [https://builds.apache.org/job/Phoenix-master/1762/])
PHOENIX-4131 UngroupedAggregateRegionObserver.preClose() and (samarth: rev c2e85f2131669c381e61cc3d6982ab66e4ed63b9)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/query/BaseTest.java


> UngroupedAggregateRegionObserver.preClose() and doPostScannerOpen() can deadlock
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4131
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>             Fix For: 4.12.0
>
>         Attachments: PHOENIX-4131.patch
>
>
> On my local test run I saw that the tests were not completing because the
> mini cluster couldn't shut down.
> So I took a jstack and discovered the following deadlock:
> {code}
> "RS:0;samarthjai-wsm4:59006" #16265 prio=5 os_prio=31 tid=0x00007fafa6327000 nid=0x37b3f runnable [0x00007000115f5000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Object.wait(Native Method)
>         at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.preClose(UngroupedAggregateRegionObserver.java:1201)
>         - locked <0x000000072bc406b8> (a java.lang.Object)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$4.call(RegionCoprocessorHost.java:494)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1749)
>         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preClose(RegionCoprocessorHost.java:490)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2843)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegionIgnoreErrors(HRegionServer.java:2805)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.closeUserRegions(HRegionServer.java:2423)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1052)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:157)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:110)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:141)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:334)
>         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:139)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> {code}
> "RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=59006" #16246 daemon prio=5 os_prio=31 tid=0x00007fafae856000 nid=0x1abdb waiting for monitor entry [0x00007000102bc000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:734)
>         - waiting to lock <0x000000072bc406b8> (a java.lang.Object)
>         at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:236)
>         at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:281)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2629)
>         - locked <0x000000072b625a90> (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder)
>         at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2833)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> {code}
> preClose() has the object monitor and is waiting for scanReferencesCount to
> go down to 0. doPostScannerOpen() is trying to acquire the same lock so that
> it can reduce the scanReferencesCount to 0.
> I think this bug was introduced in PHOENIX-3111 to solve other deadlocks.
> FYI, [~rajeshbabu], [~sergey.soldatov], [~enis], [~lhofhansl].
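For readers who want the pattern in isolation: the issue describes a closer that waits for a scan reference count to reach zero while scanners need the same monitor to decrement it. The sketch below is a minimal, standalone illustration of one way to arrange such a count so the waiter cannot block the decrement and cannot wait forever. It is not the PHOENIX-4131 patch and not Phoenix code; `ScanGate`, `enter`, `exit`, `awaitDrained`, `activeScans`, and the timeout are all hypothetical names chosen for this example.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not the Phoenix coprocessor implementation. */
final class ScanGate {
    private final Object lock = new Object();
    private int activeScans = 0;      // stands in for the scan reference count
    private boolean closing = false;  // set once a close has been requested

    /** Registers a scan; returns false if a close is already in progress. */
    boolean enter() {
        synchronized (lock) {
            if (closing) {
                return false;
            }
            activeScans++;
            return true;
        }
    }

    /** Unregisters a scan and wakes any thread waiting for the count to drain. */
    void exit() {
        synchronized (lock) {
            activeScans--;
            lock.notifyAll();
        }
    }

    /**
     * Marks the gate as closing and waits up to timeoutMillis for active scans
     * to reach zero. Returns true if drained, false on timeout or interrupt.
     */
    boolean awaitDrained(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            closing = true;
            while (activeScans > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // bounded wait: give up instead of hanging shutdown
                }
                try {
                    // Object.wait releases the monitor while blocked, so exit()
                    // can acquire it, decrement the count, and notify.
                    lock.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

In this sketch a close path would call `awaitDrained(...)` and a scan path would wrap its work in `enter()` and a `finally { exit(); }`. The bounded wait and the `closing` flag are design choices of the example, made so that a stuck scan cannot hold up shutdown indefinitely and no new scans join after a close begins.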