Swapnil Bawaskar closed GEODE-1677.
-----------------------------------

> Persistent AsyncEventQueue with non-persistent data PR hangs during recovery
> ----------------------------------------------------------------------------
>
>                 Key: GEODE-1677
>                 URL: https://issues.apache.org/jira/browse/GEODE-1677
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Barry Oglesby
>            Assignee: Barry Oglesby
>             Fix For: 1.0.0-incubating
>
>
> This is the same bug as GEM-801.
>
> During recovery of a persistent {{AsyncEventQueue}} attached to a non-persistent data {{PartitionedRegion}}, a deadlock occurs.
>
> Here is the analysis, duplicated from GEM-801:
>
> *Member dataStoregemfire1_31558*
>
> This member has created its PR and is recovering its shadow PR (the async event queue). The {{ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR}} method has taken the write lock of the {{AbstractGatewaySender}}'s {{lifeCycleLock}}.
>
> The bgexec19832_31558.log thread dumps show:
> {noformat}
> "vm_0_thr_0_dataStore1_client-13_31558" #162 daemon prio=5 os_prio=0 tid=0x00007f406c01f800 nid=0x7fca waiting on condition [0x00007f40bd7c4000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f1a6db90> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>     at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:362)
>     at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:632)
>     at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1782)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:887)
>     - locked <0x00000000f1cd7070> (a com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue$ParallelGatewaySenderQueueMetaRegion)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1007)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3065)
>     at com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:559)
>     at com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:203)
>     at com.gemstone.gemfire.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:172)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:986)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3109)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2959)
>     at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2948)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:117)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:85)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:72)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at hydra.RegionHelper.createRegion(RegionHelper.java:52)
>     - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>     at parReg.wbcl.ParRegWBCLTest.HA_reinitializeRegion(ParRegWBCLTest.java:250)
>     at parReg.ParRegTest.HAController(ParRegTest.java:2063)
>     at parReg.wbcl.ParRegWBCLTest.HAController(ParRegWBCLTest.java:274)
>     at parReg.ParRegTest.HydraTask_HAController(ParRegTest.java:985)
> {noformat}
>
> As part of recovery, 5 buckets are waiting for their initial images:
> {noformat}
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102" #705 daemon prio=5 os_prio=0 tid=0x00007f406c16d000 nid=0x954 waiting on condition [0x00007f3fcdfdd000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99" #702 daemon prio=5 os_prio=0 tid=0x00007f406c169800 nid=0x951 waiting on condition [0x00007f3fce2e0000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93" #696 daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition [0x00007f3fce8e6000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64" #665 daemon prio=5 os_prio=0 tid=0x00007f406c13b000 nid=0x92e waiting on condition [0x00007f3fd55d5000]
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54" #655 daemon prio=5 os_prio=0 tid=0x00007f406c12e800 nid=0x924 waiting on condition [0x00007f3fd5fdf000]
> {noformat}
>
> Here is bucket 93's recovery thread:
> {noformat}
> "Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93" #696 daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition [0x00007f3fce8e6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f17baf68> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
>     at com.gemstone.gemfire.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:458)
>     at com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1352)
>     at com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1159)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.initialize(BucketRegion.java:263)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.createSubregion(LocalRegion.java:892)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:765)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:444)
>     - locked <0x00000000f17a9150> (a com.gemstone.gemfire.internal.cache.ProxyBucketRegion)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2982)
>     at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:446)
>     at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:403)
>     at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run2(PRHARedundancyProvider.java:1765)
>     at com.gemstone.gemfire.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:64)
>     at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run(PRHARedundancyProvider.java:1757)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
>
> The dataStoregemfire1_31558/system.log contains this warning showing the above thread is waiting for member dataStoregemfire1_client-13_31576:
> {noformat}
> [warning 2016/07/03 04:10:04.000 UTC dataStoregemfire1_client-13_31558 <Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93> tid=0x2b8]
> 15 seconds have elapsed while waiting for replies: <com.gemstone.gemfire.internal.cache.InitialImageOperation$ImageProcessor 4138
> waiting for 1 replies from [client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027]; waiting for 0 messages in-flight;
> region=/__PR/_B__dataStoreRegion_93; abort=false> on client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025
> whose current membership list is: [[client-13(31491:locator)<ec><v0>:1024,
> client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025,
> client-13(dataStoregemfire2_client-13_482:482)<ec><v3>:1026,
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027,
> client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028,
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029,
> client-13(dataStoregemfire2_client-13_460:460)<ec><v4>:1030]]
> {noformat}
>
> *Member dataStoregemfire1_client-13_31576*
>
> The bgexec16591_31576.log thread dumps show several blocked Pooled High Priority Message Processor threads waiting for entries while processing {{InitialImageOperations}}:
> {noformat}
> "Pooled High Priority Message Processor 11" #372 daemon prio=10 os_prio=0 tid=0x00007f609c047000 nid=0x581 waiting for monitor entry [0x00007f6090e4f000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.chunkEntries(InitialImageOperation.java:1857)
>     - waiting to lock <0x00000000f16ce110> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>     at com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.process(InitialImageOperation.java:1657)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:450)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:611)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$5$1.run(DistributionManager.java:922)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
>
> The P2P message reader that holds the entry lock is waiting for replies from dataStoregemfire1_client-13_31558:31558, as shown by the log warning and thread dump below:
> {noformat}
> [warning 2016/07/03 04:10:03.671 UTC dataStoregemfire1_client-13_31576 <P2P message reader for client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029 unshared ordered uid=416 dom #1 port=35558> tid=0x150]
> 15 seconds have elapsed while waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 4707
> waiting for 2 replies from [client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025,
> client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025]>
> on client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> whose current membership list is: [[client-13(31491:locator)<ec><v0>:1024,
> client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025,
> client-13(dataStoregemfire2_client-13_482:482)<ec><v3>:1026,
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027,
> client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028,
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029,
> client-13(dataStoregemfire2_client-13_460:460)<ec><v4>:1030]]
> {noformat}
> {noformat}
> "P2P message reader for client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029 unshared ordered uid=416 dom #1 port=35558" #336 daemon prio=10 os_prio=0 tid=0x00007f6169731800 nid=0x4d5 waiting on condition [0x00007f6093373000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f16cdfc8> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
>     at com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:743)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:622)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.distribute(AbstractUpdateOperation.java:71)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:634)
>     at com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
>     - locked <0x00000000f16ce110> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1275)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1250)
>     at com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.putEntryOnRemote(PartitionedRegionDataView.java:107)
>     at com.gemstone.gemfire.internal.cache.partitioned.PutMessage.operateOnPartitionedRegion(PutMessage.java:833)
>     at com.gemstone.gemfire.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:339)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
>     at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
>     at com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
>     at com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
>     at com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
>     at com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
>     at com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
>
> Back in bgexec19832_31558.log, the thread dumps show a number of P2P message reader threads for dataStoregemfire1_client-13_31576:31576 stuck waiting for the read lock of the {{AbstractGatewaySender}}'s {{lifeCycleLock}} here:
> {noformat}
> "P2P message reader for client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027 unshared ordered uid=537 dom #2 port=55007" #868 daemon prio=10 os_prio=0 tid=0x00007f40403a4000 nid=0xa23 waiting on condition [0x00007f3fc3442000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00000000f1cbe5f8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     at com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySender.distribute(AbstractGatewaySender.java:928)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.notifyGatewaySender(LocalRegion.java:6485)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.notifyGatewaySender(BucketRegion.java:654)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.basicPutPart2(LocalRegion.java:6022)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:644)
>     at com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
>     - locked <0x00000000f18891f0> (a com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>     at com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
>     at com.gemstone.gemfire.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:132)
>     at com.gemstone.gemfire.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5817)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:148)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:286)
>     at com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:255)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
>     at com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1092)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>     at com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
>     at com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
>     at com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
>     at com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
>     at com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
>     at com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
>     at com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
>     at com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
>     at com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
>
> These threads will never get the {{readLock}}, since the thread holding the {{writeLock}} is itself blocked.
>
> This deadlock only occurs when the {{AsyncEventQueue}} is persistent but its attached data region is not.
>
> The regions being recovered by the {{AsyncEventQueue}} recovery threads are the actual data regions. It's the dataStoreRegion that is being GIIed, not the {{AsyncEventQueue}} region:
> {noformat}
> [info 2016/07/03 04:09:48.504 UTC dataStoregemfire1_client-13_31558 <Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102> tid=0x2c1]
> Region _B__dataStoreRegion_102 requesting initial image from client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028
> [info 2016/07/03 04:09:48.968 UTC dataStoregemfire1_client-13_31558 <Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54> tid=0x28f]
> Region _B__dataStoreRegion_54 requesting initial image from client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029
> [info 2016/07/03 04:09:49.007 UTC dataStoregemfire1_client-13_31558 <Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99> tid=0x2be]
> Region _B__dataStoreRegion_99 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> [info 2016/07/03 04:09:49.202 UTC dataStoregemfire1_client-13_31558 <Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93> tid=0x2b8]
> Region _B__dataStoreRegion_93 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> [info 2016/07/03 04:09:49.206 UTC dataStoregemfire1_client-13_31558 <Recovery thread for bucket _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64> tid=0x299]
> Region _B__dataStoreRegion_64 requesting initial image from client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> {noformat}
>
> The code below is from the {{ProxyBucketRegion.recoverFromDisk}} method, which is executed during recovery of the {{AsyncEventQueue}} bucket. This is the source of the data region GII:
> {noformat}
> if (this.partitionedRegion.getDataPolicy().withPersistence()
>     && !colocatedRegion.getDataPolicy().withPersistence()) {
>   result = colocatedRegion.getDataStore()
>       .grabBucket(bid, getDistributionManager().getDistributionManagerId(),
>           true, true, false, null, true);
>   ...
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
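The lock interaction described above is a standard read/write-lock self-deadlock: one thread holds the write lock while waiting for work that can only finish after other threads acquire the read lock. Below is a minimal, self-contained sketch of that shape (class, method, and variable names are illustrative only, not Geode code; the real {{lifeCycleLock}} is likewise a {{java.util.concurrent.locks.ReentrantReadWriteLock}} per the thread dumps):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LifeCycleLockDeadlockSketch {

    /**
     * Models the hang: a "recovery" thread takes the write lock and then
     * waits for a reply (here, a latch) that only a "message reader" thread
     * could produce -- but the reader first needs the read lock.
     * Returns whether the reader side managed to take the read lock.
     */
    static boolean readerCanAcquireReadLock() throws InterruptedException {
        ReentrantReadWriteLock lifeCycleLock = new ReentrantReadWriteLock();
        CountDownLatch giiReply = new CountDownLatch(1);      // the reply the writer waits for
        CountDownLatch writeLockHeld = new CountDownLatch(1); // signals the write lock is taken

        Thread recovery = new Thread(() -> {
            lifeCycleLock.writeLock().lock();
            try {
                writeLockHeld.countDown();
                // Like addShadowPartitionedRegionForUserPR waiting on the GII:
                // this reply never arrives, so we time out after 1 second.
                giiReply.await(1, TimeUnit.SECONDS);
            } catch (InterruptedException ignored) {
            } finally {
                lifeCycleLock.writeLock().unlock();
            }
        });
        recovery.start();
        writeLockHeld.await();

        // Like the P2P message reader in AbstractGatewaySender.distribute:
        // it must take the read lock before it can ever send the reply.
        boolean acquired = lifeCycleLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            giiReply.countDown();
            lifeCycleLock.readLock().unlock();
        }
        recovery.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints false: the read lock is unavailable the whole time the
        // writer is parked, which is the cycle seen in the thread dumps.
        System.out.println("reader acquired readLock: " + readerCanAcquireReadLock());
    }
}
```

In the real hang there is no timeout on either side, so the cycle never breaks; the sketch uses {{await}}/{{tryLock}} timeouts only so it terminates.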