[ 
https://issues.apache.org/jira/browse/GEODE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swapnil Bawaskar closed GEODE-1677.
-----------------------------------

> Persistent AsyncEventQueue with non-persistent data PR hangs during recovery
> ----------------------------------------------------------------------------
>
>                 Key: GEODE-1677
>                 URL: https://issues.apache.org/jira/browse/GEODE-1677
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Barry Oglesby
>            Assignee: Barry Oglesby
>             Fix For: 1.0.0-incubating
>
>
> This is the same bug as GEM-801.
> During recovery of a persistent {{AsyncEventQueue}} on a non-persistent data 
> {{PartitionedRegion}}, a deadlock occurs.
> Here is analysis duplicated from GEM-801:
> *Member dataStoregemfire1_31558*
> This member has created its PR and is recovering its shadow PR (async event 
> queue). The {{ParallelGatewaySenderQueue 
> addShadowPartitionedRegionForUserPR}} method has taken the 
> {{AbstractGatewaySender's lifeCycleLock's writeLock}}.
> The bgexec19832_31558.log thread dumps show:
> {noformat}
> "vm_0_thr_0_dataStore1_client-13_31558" #162 daemon prio=5 os_prio=0 
> tid=0x00007f406c01f800 nid=0x7fca waiting on condition [0x00007f40bd7c4000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f1a6db90> (a 
> java.util.concurrent.CountDownLatch$Sync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>       at 
> com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:362)
>       at 
> com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:632)
>       at 
> com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1782)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:887)
>       - locked <0x00000000f1cd7070> (a 
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue$ParallelGatewaySenderQueueMetaRegion)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1007)
>       at 
> com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3065)
>       at 
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderQueue.java:559)
>       at 
> com.gemstone.gemfire.internal.cache.wan.parallel.ParallelGatewaySenderEventProcessor.addShadowPartitionedRegionForUserPR(ParallelGatewaySenderEventProcessor.java:203)
>       at 
> com.gemstone.gemfire.internal.cache.wan.parallel.ConcurrentParallelGatewaySenderQueue.addShadowPartitionedRegionForUserPR(ConcurrentParallelGatewaySenderQueue.java:172)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegion.postCreateRegion(PartitionedRegion.java:986)
>       at 
> com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3109)
>       at 
> com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2959)
>       at 
> com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:2948)
>       at hydra.RegionHelper.createRegion(RegionHelper.java:117)
>       - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>       at hydra.RegionHelper.createRegion(RegionHelper.java:85)
>       - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>       at hydra.RegionHelper.createRegion(RegionHelper.java:72)
>       - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>       at hydra.RegionHelper.createRegion(RegionHelper.java:52)
>       - locked <0x00000000f0d19030> (a java.lang.Class for hydra.RegionHelper)
>       at 
> parReg.wbcl.ParRegWBCLTest.HA_reinitializeRegion(ParRegWBCLTest.java:250)
>       at parReg.ParRegTest.HAController(ParRegTest.java:2063)
>       at parReg.wbcl.ParRegWBCLTest.HAController(ParRegWBCLTest.java:274)
>       at parReg.ParRegTest.HydraTask_HAController(ParRegTest.java:985)
> {noformat}
> As part of recovery, 5 buckets are waiting for their initial images:
> {noformat}
> "Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102" #705 
> daemon prio=5 os_prio=0 tid=0x00007f406c16d000 nid=0x954 waiting on condition 
> [0x00007f3fcdfdd000]
> "Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99" #702 
> daemon prio=5 os_prio=0 tid=0x00007f406c169800 nid=0x951 waiting on condition 
> [0x00007f3fce2e0000]
> "Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93" #696 
> daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition 
> [0x00007f3fce8e6000]
> "Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64" #665 
> daemon prio=5 os_prio=0 tid=0x00007f406c13b000 nid=0x92e waiting on condition 
> [0x00007f3fd55d5000]
> "Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54" #655 
> daemon prio=5 os_prio=0 tid=0x00007f406c12e800 nid=0x924 waiting on condition 
> [0x00007f3fd5fdf000]
> {noformat}
> Here is bucket 93's recovery thread:
> {noformat}
> "Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93" #696 
> daemon prio=5 os_prio=0 tid=0x00007f406c161800 nid=0x94b waiting on condition 
> [0x00007f3fce8e6000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f17baf68> (a 
> java.util.concurrent.CountDownLatch$Sync)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>       at 
> com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
>       at 
> com.gemstone.gemfire.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:458)
>       at 
> com.gemstone.gemfire.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1352)
>       at 
> com.gemstone.gemfire.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1159)
>       at 
> com.gemstone.gemfire.internal.cache.BucketRegion.initialize(BucketRegion.java:263)
>       at 
> com.gemstone.gemfire.internal.cache.LocalRegion.createSubregion(LocalRegion.java:892)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:765)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:444)
>       - locked <0x00000000f17a9150> (a 
> com.gemstone.gemfire.internal.cache.ProxyBucketRegion)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2982)
>       at 
> com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDisk(ProxyBucketRegion.java:446)
>       at 
> com.gemstone.gemfire.internal.cache.ProxyBucketRegion.recoverFromDiskRecursively(ProxyBucketRegion.java:403)
>       at 
> com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run2(PRHARedundancyProvider.java:1765)
>       at 
> com.gemstone.gemfire.internal.cache.partitioned.RecoveryRunnable.run(RecoveryRunnable.java:64)
>       at 
> com.gemstone.gemfire.internal.cache.PRHARedundancyProvider$4.run(PRHARedundancyProvider.java:1757)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The dataStoregemfire1_31558/system.log contains this warning showing the 
> above thread is waiting for member dataStoregemfire1_client-13_31576:
> {noformat}
> [warning 2016/07/03 04:10:04.000 UTC dataStoregemfire1_client-13_31558 
> <Recovery thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93> 
> tid=0x2b8] 15 seconds have elapsed while waiting for replies: 
> <com.gemstone.gemfire.internal.cache.InitialImageOperation$ImageProcessor 
> 4138 waiting for 1 replies from 
> [client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027]; waiting 
> for 0 messages in-flight; region=/__PR/_B__dataStoreRegion_93; abort=false> 
> on client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025 whose 
> current membership list is: [[client-13(31491:locator)<ec><v0>:1024, 
> client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025, 
> client-13(dataStoregemfire2_client-13_482:482)<ec><v3>:1026, 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027, 
> client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028, 
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029, 
> client-13(dataStoregemfire2_client-13_460:460)<ec><v4>:1030]]
> {noformat}
> *Member dataStoregemfire1_client-13_31576*
> The bgexec16591_31576.log thread dumps show several blocked Pooled High 
> Priority Message Processor threads waiting for entry locks while processing 
> {{InitialImageOperations}}:
> {noformat}
> "Pooled High Priority Message Processor 11" #372 daemon prio=10 os_prio=0 
> tid=0x00007f609c047000 nid=0x581 waiting for monitor entry 
> [0x00007f6090e4f000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at 
> com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.chunkEntries(InitialImageOperation.java:1857)
>       - waiting to lock <0x00000000f16ce110> (a 
> com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>       at 
> com.gemstone.gemfire.internal.cache.InitialImageOperation$RequestImageMessage.process(InitialImageOperation.java:1657)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:450)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:611)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager$5$1.run(DistributionManager.java:922)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The P2P message reader that holds the entry lock is waiting for replies from 
> dataStoregemfire1_client-13_31558:31558, as shown by the log warning and 
> thread dump below:
> {noformat}
> [warning 2016/07/03 04:10:03.671 UTC dataStoregemfire1_client-13_31576 <P2P 
> message reader for 
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029 unshared 
> ordered uid=416 dom #1 port=35558> tid=0x150] 15 seconds have elapsed while 
> waiting for replies: <DistributedCacheOperation$CacheOperationReplyProcessor 
> 4707 waiting for 2 replies from 
> [client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025, 
> client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025]> on 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027 whose current 
> membership list is: [[client-13(31491:locator)<ec><v0>:1024, 
> client-13(dataStoregemfire1_client-13_31558:31558)<ec><v6>:1025, 
> client-13(dataStoregemfire2_client-13_482:482)<ec><v3>:1026, 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027, 
> client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028, 
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029, 
> client-13(dataStoregemfire2_client-13_460:460)<ec><v4>:1030]]
> {noformat}
> {noformat}
> "P2P message reader for 
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029 unshared 
> ordered uid=416 dom #1 port=35558" #336 daemon prio=10 os_prio=0 
> tid=0x00007f6169731800 nid=0x4d5 waiting on condition [0x00007f6093373000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f16cdfc8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>       at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>       at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>       at 
> com.gemstone.gemfire.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:743)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:819)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:796)
>       at 
> com.gemstone.gemfire.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:886)
>       at 
> com.gemstone.gemfire.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:743)
>       at 
> com.gemstone.gemfire.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:622)
>       at 
> com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.distribute(AbstractUpdateOperation.java:71)
>       at 
> com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:634)
>       at 
> com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
>       - locked <0x00000000f16ce110> (a 
> com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>       at 
> com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1275)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegionDataStore.putLocally(PartitionedRegionDataStore.java:1250)
>       at 
> com.gemstone.gemfire.internal.cache.PartitionedRegionDataView.putEntryOnRemote(PartitionedRegionDataView.java:107)
>       at 
> com.gemstone.gemfire.internal.cache.partitioned.PutMessage.operateOnPartitionedRegion(PutMessage.java:833)
>       at 
> com.gemstone.gemfire.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:339)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
>       at 
> com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
>       at 
> com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
>       at 
> com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
>       at 
> com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
>       at 
> com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Back in bgexec19832_31558.log, the thread dumps show a number of P2P message 
> reader threads for dataStoregemfire1_client-13_31576:31576 stuck waiting for 
> the {{AbstractGatewaySender's lifeCycleLock's readLock}} here:
> {noformat}
> "P2P message reader for 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027 unshared 
> ordered uid=537 dom #2 port=55007" #868 daemon prio=10 os_prio=0 
> tid=0x00007f40403a4000 nid=0xa23 waiting on condition [0x00007f3fc3442000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000000f1cbe5f8> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>       at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>       at 
> com.gemstone.gemfire.internal.cache.wan.AbstractGatewaySender.distribute(AbstractGatewaySender.java:928)
>       at 
> com.gemstone.gemfire.internal.cache.LocalRegion.notifyGatewaySender(LocalRegion.java:6485)
>       at 
> com.gemstone.gemfire.internal.cache.BucketRegion.notifyGatewaySender(BucketRegion.java:654)
>       at 
> com.gemstone.gemfire.internal.cache.LocalRegion.basicPutPart2(LocalRegion.java:6022)
>       at 
> com.gemstone.gemfire.internal.cache.BucketRegion.basicPutPart2(BucketRegion.java:644)
>       at 
> com.gemstone.gemfire.internal.cache.AbstractRegionMap.basicPut(AbstractRegionMap.java:2736)
>       - locked <0x00000000f18891f0> (a 
> com.gemstone.gemfire.internal.cache.VersionedThinRegionEntryHeapStringKey2)
>       at 
> com.gemstone.gemfire.internal.cache.BucketRegion.virtualPut(BucketRegion.java:485)
>       at 
> com.gemstone.gemfire.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:132)
>       at 
> com.gemstone.gemfire.internal.cache.LocalRegion.basicUpdate(LocalRegion.java:5817)
>       at 
> com.gemstone.gemfire.internal.cache.AbstractUpdateOperation.doPutOrCreate(AbstractUpdateOperation.java:148)
>       at 
> com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.basicOperateOnRegion(AbstractUpdateOperation.java:286)
>       at 
> com.gemstone.gemfire.internal.cache.AbstractUpdateOperation$AbstractUpdateMessage.operateOnRegion(AbstractUpdateOperation.java:255)
>       at 
> com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.basicProcess(DistributedCacheOperation.java:1191)
>       at 
> com.gemstone.gemfire.internal.cache.DistributedCacheOperation$CacheOperationMessage.process(DistributedCacheOperation.java:1092)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:379)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionMessage.schedule(DistributionMessage.java:442)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager.scheduleIncomingMessage(DistributionManager.java:3519)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager.handleIncomingDMsg(DistributionManager.java:3142)
>       at 
> com.gemstone.gemfire.distributed.internal.DistributionManager$MyListener.messageReceived(DistributionManager.java:4341)
>       at 
> com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.dispatchMessage(GMSMembershipManager.java:1100)
>       at 
> com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager.handleOrDeferMessage(GMSMembershipManager.java:1028)
>       at 
> com.gemstone.gemfire.distributed.internal.membership.gms.mgr.GMSMembershipManager$MyDCReceiver.messageReceived(GMSMembershipManager.java:382)
>       at 
> com.gemstone.gemfire.distributed.internal.direct.DirectChannel.receive(DirectChannel.java:726)
>       at 
> com.gemstone.gemfire.internal.tcp.TCPConduit.messageReceived(TCPConduit.java:815)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.dispatchMessage(Connection.java:3961)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.processNIOBuffer(Connection.java:3545)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.runNioReader(Connection.java:1837)
>       at 
> com.gemstone.gemfire.internal.tcp.Connection.run(Connection.java:1706)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> These threads will never get the {{readLock}} since the thread holding the 
> {{writeLock}} is itself blocked waiting for bucket recovery to complete.
> This deadlock only occurs when the {{AsyncEventQueue}} is persistent, but its 
> attached data region is not.
> The regions being recovered by the {{AsyncEventQueue}} recovery threads are 
> the actual data regions. It is the dataStoreRegion that is being GIIed, not 
> the {{AsyncEventQueue}} region:
> {noformat}
> [info 2016/07/03 04:09:48.504 UTC dataStoregemfire1_client-13_31558 <Recovery 
> thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_102> 
> tid=0x2c1] Region _B__dataStoreRegion_102 requesting initial image from 
> client-13(dataStoregemfire1_client-13_31563:31563)<ec><v1>:1028
> [info 2016/07/03 04:09:48.968 UTC dataStoregemfire1_client-13_31558 <Recovery 
> thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_54> 
> tid=0x28f] Region _B__dataStoreRegion_54 requesting initial image from 
> client-13(dataStoregemfire2_client-13_31595:31595)<ec><v1>:1029
> [info 2016/07/03 04:09:49.007 UTC dataStoregemfire1_client-13_31558 <Recovery 
> thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_99> 
> tid=0x2be] Region _B__dataStoreRegion_99 requesting initial image from 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> [info 2016/07/03 04:09:49.202 UTC dataStoregemfire1_client-13_31558 <Recovery 
> thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_93> 
> tid=0x2b8] Region _B__dataStoreRegion_93 requesting initial image from 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> [info 2016/07/03 04:09:49.206 UTC dataStoregemfire1_client-13_31558 <Recovery 
> thread for bucket 
> _B__AsyncEventQueue__wbclQueue__PARALLEL__GATEWAY__SENDER__QUEUE_64> 
> tid=0x299] Region _B__dataStoreRegion_64 requesting initial image from 
> client-13(dataStoregemfire1_client-13_31576:31576)<ec><v1>:1027
> {noformat}
> The code below is from the {{ProxyBucketRegion recoverFromDisk}} method which 
> is executed during recovery of the {{AsyncEventQueue}} bucket. This is the 
> source of the data region GII:
> {noformat}
> if(this.partitionedRegion.getDataPolicy().withPersistence() && 
> !colocatedRegion.getDataPolicy().withPersistence()) {
>       result = colocatedRegion.getDataStore()
>       .grabBucket(bid, getDistributionManager().getDistributionManagerId(), 
>                       true, true, false, null, true);
>   ...
> {noformat}
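
The lock cycle described in the analysis above can be reduced to a small, self-contained Java sketch. This is only an illustration under stated assumptions, not Geode code and not the fix that was applied: the class name LifeCycleLockSketch, the recover method, and the latches are all hypothetical, and the whole remote chain (GII request, entry lock, reply to the put) is collapsed into a single CountDownLatch. One thread takes the write lock and then waits for recovery to finish, while the only thread that can signal completion first needs the read lock. Timeouts are used so the sketch reports the hang instead of actually hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LifeCycleLockSketch {

    /**
     * Simulates recovery and returns true if it completed, false if the wait timed out.
     * holdWriteLockWhileWaiting = true mimics the hang: the waiter keeps the write lock.
     * All names here are hypothetical stand-ins, not Geode APIs.
     */
    public static boolean recover(boolean holdWriteLockWhileWaiting) {
        ReentrantReadWriteLock lifeCycleLock = new ReentrantReadWriteLock();
        CountDownLatch recoveryDone = new CountDownLatch(1); // stands in for the recovery latch
        CountDownLatch waiterReady = new CountDownLatch(1);  // orders the two threads

        // Stand-in for the P2P message reader: it must take the read lock
        // before the work that lets recovery finish can proceed.
        Thread reader = new Thread(() -> {
            try {
                waiterReady.await();
                if (lifeCycleLock.readLock().tryLock(500, TimeUnit.MILLISECONDS)) {
                    try {
                        recoveryDone.countDown();
                    } finally {
                        lifeCycleLock.readLock().unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.setDaemon(true);
        reader.start();

        try {
            boolean completed;
            if (holdWriteLockWhileWaiting) {
                // Stand-in for the region-creation thread: write lock first, then wait.
                lifeCycleLock.writeLock().lock();
                try {
                    waiterReady.countDown();
                    completed = recoveryDone.await(1, TimeUnit.SECONDS);
                } finally {
                    lifeCycleLock.writeLock().unlock();
                }
            } else {
                // Same wait without the write lock: the reader is free to proceed.
                waiterReady.countDown();
                completed = recoveryDone.await(1, TimeUnit.SECONDS);
            }
            reader.join();
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write lock held while waiting: completed=" + recover(true));
        System.out.println("wait without the write lock:   completed=" + recover(false));
    }
}
```

In the first call the reader cannot obtain the read lock within its timeout, so the latch is never counted down and the wait expires. In the second call nothing holds the write lock and the wait returns promptly. The second branch only shows that the cycle needs the write lock to be held across the wait. It makes no claim about how the issue was actually resolved.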



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
