[ https://issues.apache.org/jira/browse/GEODE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Shu updated GEODE-5186:
----------------------------
    Affects Version/s: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0

> set operation in a client transaction could cause the transaction to hang
> -------------------------------------------------------------------------
>
>                 Key: GEODE-5186
>                 URL: https://issues.apache.org/jira/browse/GEODE-5186
>             Project: Geode
>          Issue Type: Bug
>          Components: transactions
>    Affects Versions: 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.2.1, 1.4.0, 1.5.0, 1.6.0
>            Reporter: Eric Shu
>            Priority: Major
>
> During an entry operation in a client transaction, the server connection
> could be lost. In this case, the client will fail over to another server
> and, if the original transaction host node is found, try to resume the
> transaction and retry the operation.
> If this operation happens to be a keySet operation (or another set
> operation) on a partitioned region, the transaction could hang due to a
> deadlock.
> In this scenario, the original tx host node holds its transactional lock
> while sending a FetchKeysMessage to the other nodes hosting the partitioned
> region data. The node to which the client transaction failed over holds its
> own transactional lock while sending a FetchKeysMessage to the transaction
> host node.
> Neither FetchKeysMessage can be processed, because processing these tx
> messages requires holding the transactional lock, and on each node that lock
> is already held by the thread handling the client message of the
> transaction.
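The lock cycle described above can be reproduced in miniature with plain java.util.concurrent primitives. The following is only a sketch under stated assumptions, not Geode code: the class TxLockDeadlockSketch, the Node type, and the methods keySet, processFetchKeys, and bothRequestsAnswered are hypothetical names invented for illustration, and timed waits are used so the program reports the stall instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the reported deadlock; none of these names are Geode APIs.
final class TxLockDeadlockSketch {

  /** Stand-in for one server node with a per-transaction lock. */
  static final class Node {
    final ReentrantLock txLock = new ReentrantLock();
    final ExecutorService messageProcessor = Executors.newSingleThreadExecutor();

    /** Message-processor side: must acquire the tx lock before it can reply. */
    void processFetchKeys(CountDownLatch reply, long timeoutMs) {
      messageProcessor.execute(() -> {
        try {
          if (txLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            try {
              reply.countDown();
            } finally {
              txLock.unlock();
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }

    /** Client-connection side: holds the tx lock while waiting for the peer's reply. */
    boolean keySet(Node peer, CountDownLatch bothLocked, long timeoutMs)
        throws InterruptedException {
      txLock.lock();
      try {
        // Make sure both nodes hold their locks before either request is sent.
        bothLocked.countDown();
        bothLocked.await();
        CountDownLatch reply = new CountDownLatch(1);
        peer.processFetchKeys(reply, timeoutMs);
        return reply.await(timeoutMs, TimeUnit.MILLISECONDS);
      } finally {
        txLock.unlock();
      }
    }
  }

  /** Returns true only if both nodes received a reply within the timeout. */
  static boolean bothRequestsAnswered(long timeoutMs) {
    Node txHost = new Node();
    Node failoverNode = new Node();
    CountDownLatch bothLocked = new CountDownLatch(2);
    ExecutorService clients = Executors.newFixedThreadPool(2);
    try {
      Future<Boolean> a = clients.submit(() -> txHost.keySet(failoverNode, bothLocked, timeoutMs));
      Future<Boolean> b = clients.submit(() -> failoverNode.keySet(txHost, bothLocked, timeoutMs));
      boolean first = a.get();
      boolean second = b.get();
      return first && second;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      clients.shutdown();
      txHost.messageProcessor.shutdown();
      failoverNode.messageProcessor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("both requests answered: " + bothRequestsAnswered(200));
  }
}
```

In this model at least one request always times out: a reply needs the peer's lock, and the peer releases that lock only after its own wait has ended. With untimed lock() and await() calls, as in the thread dump below, both sides would wait indefinitely.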
> {noformat}
> vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226) state=WAITING
>     waiting to lock <java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb>
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>     at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>     at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>     at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>     at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>     at java.lang.Thread.run(Thread.java:745)
>     Locked synchronizers:
>         java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4
>
> vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 ID=0x128(296) state=TIMED_WAITING
>     waiting to lock <java.util.concurrent.CountDownLatch$Sync@226dbb4>
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>     at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>     at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>     at org.apache.geode.internal.cache.TXStateStub.getBucketKeys(TXStateStub.java:644)
>     at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>     at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>     at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>     at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>     at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>     at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>     at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>     at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>     at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>     at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>     at java.lang.Thread.run(Thread.java:745)
>     Locked synchronizers:
>         java.util.concurrent.ThreadPoolExecutor$Worker@3ca60534
>         java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb
>
> vm_0_bridge1_latvia_25064:PartitionedRegion Message Processor4 ID=0x2b8(696) state=WAITING
>     waiting to lock <java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785>
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>     at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
>     at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
>     at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
>     at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
>     at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
>     at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
>     at java.lang.Thread.run(Thread.java:745)
>     Locked synchronizers:
>         java.util.concurrent.ThreadPoolExecutor$Worker@71b1b4c5
>
> vm_0_bridge1_latvia_25064:ServerConnection on port 24946 Thread 0 ID=0x29b(667) state=TIMED_WAITING
>     waiting to lock <java.util.concurrent.CountDownLatch$Sync@41e6d28f>
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
>     at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
>     at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
>     at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
>     at org.apache.geode.internal.cache.TXState.getBucketKeys(TXState.java:1852)
>     at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
>     at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
>     at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
>     at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
>     at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
>     at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
>     at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
>     at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
>     at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
>     at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
>     at java.lang.Thread.run(Thread.java:745)
>     Locked synchronizers:
>         java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785
>         java.util.concurrent.ThreadPoolExecutor$Worker@51e84752
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)