Eric Shu created GEODE-5186:
-------------------------------

             Summary: set operation in a client transaction could cause the 
transaction to hang
                 Key: GEODE-5186
                 URL: https://issues.apache.org/jira/browse/GEODE-5186
             Project: Geode
          Issue Type: Bug
          Components: transactions
            Reporter: Eric Shu


During an entry operation in a client transaction, the server connection could 
be lost. In this case, the client fails over to another server and, if the 
original transaction host node is found, tries to resume the transaction and 
retry the operation.

If this operation happens to be a keySet operation (or another set operation) 
on a partitioned region, the transaction could hang due to a deadlock.

The scenario is as follows. The original tx host node holds its transactional 
lock while sending FetchKeysMessage requests to the other nodes hosting the 
partitioned region data. The node to which the client transaction failed over 
also holds its transactional lock while sending a FetchKeysMessage to the 
transaction host node.

Neither of these two FetchKeysMessages can be processed, because processing a 
tx message requires holding the transactional lock, and those locks are already 
held by the threads handling the client message of the transaction on each node.
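
The cycle described above can be reproduced in miniature. The following is only 
a sketch of the locking pattern, not Geode code: Node, receiveFetchKeys, 
handleKeySet and bothRepliesArrive are hypothetical names, each node is reduced 
to one ReentrantLock plus a single message-processor thread, and tryLock with a 
timeout stands in for the plain lock() call so that the demo terminates instead 
of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TxLockDeadlockSketch {

  /** Hypothetical stand-in for a server node: one tx lock, one message-processor thread. */
  static final class Node {
    final ReentrantLock txLock = new ReentrantLock();
    final ExecutorService messageProcessor = Executors.newSingleThreadExecutor();

    /** Processes a remote request; like the tx message in the report, it needs txLock first. */
    Future<Boolean> receiveFetchKeys(long timeoutMs) {
      return messageProcessor.submit(() -> {
        // A plain lock() would park here indefinitely; the timeout is a demo-only assumption.
        if (!txLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
          return false;
        }
        try {
          return true; // the reply would be built here
        } finally {
          txLock.unlock();
        }
      });
    }
  }

  /** Client-facing thread: takes the local tx lock, then waits for the remote node's reply. */
  static boolean handleKeySet(Node local, Node remote, CountDownLatch bothLocked, long timeoutMs)
      throws Exception {
    local.txLock.lock();
    try {
      bothLocked.countDown();
      bothLocked.await(); // both nodes hold their lock before either sends a message
      return remote.receiveFetchKeys(timeoutMs).get();
    } finally {
      local.txLock.unlock();
    }
  }

  /** Returns true only if both nodes received a successful reply to their request. */
  static boolean bothRepliesArrive(long timeoutMs) throws Exception {
    Node txHost = new Node();
    Node failoverNode = new Node();
    CountDownLatch bothLocked = new CountDownLatch(2);
    ExecutorService clientThreads = Executors.newFixedThreadPool(2);
    try {
      Future<Boolean> fromTxHost =
          clientThreads.submit(() -> handleKeySet(txHost, failoverNode, bothLocked, timeoutMs));
      Future<Boolean> fromFailover =
          clientThreads.submit(() -> handleKeySet(failoverNode, txHost, bothLocked, timeoutMs));
      boolean replyToTxHost = fromTxHost.get();
      boolean replyToFailover = fromFailover.get();
      return replyToTxHost && replyToFailover;
    } finally {
      clientThreads.shutdown();
      txHost.messageProcessor.shutdown();
      failoverNode.messageProcessor.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("both replies arrived: " + bothRepliesArrive(200));
  }
}
```

In this sketch at least one of the two requests always times out, because each 
message processor needs a lock that is only released once the other side has 
replied. With an untimed lock(), as in the thread dumps below, neither side 
would ever make progress.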

{noformat}
vm_6_bridge7_latvia_25133:PartitionedRegion Message Processor10 ID=0xe2(226) state=WAITING
        waiting to lock <java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb>
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
        at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
        at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
        at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
        at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
        at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
        at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
        at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
java.util.concurrent.ThreadPoolExecutor$Worker@c84d7d4

vm_6_bridge7_latvia_25133:ServerConnection on port 23931 Thread 10 ID=0x128(296) state=TIMED_WAITING
        waiting to lock <java.util.concurrent.CountDownLatch$Sync@226dbb4>
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
        at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
        at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
        at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
        at org.apache.geode.internal.cache.TXStateStub.getBucketKeys(TXStateStub.java:644)
        at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
        at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
        at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
        at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
        at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
        at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
        at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
        at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
        at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
        at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
        at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
java.util.concurrent.ThreadPoolExecutor$Worker@3ca60534
java.util.concurrent.locks.ReentrantLock$NonfairSync@453d49bb

vm_0_bridge1_latvia_25064:PartitionedRegion Message Processor4 ID=0x2b8(696) state=WAITING
        waiting to lock <java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785>
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.geode.internal.cache.TXManagerImpl.getLock(TXManagerImpl.java:921)
        at org.apache.geode.internal.cache.TXManagerImpl.masqueradeAs(TXManagerImpl.java:881)
        at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
        at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:378)
        at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:444)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.ClusterDistributionManager.runUntilShutdown(ClusterDistributionManager.java:1121)
        at org.apache.geode.distributed.internal.ClusterDistributionManager.access$000(ClusterDistributionManager.java:109)
        at org.apache.geode.distributed.internal.ClusterDistributionManager$8$1.run(ClusterDistributionManager.java:945)
        at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
java.util.concurrent.ThreadPoolExecutor$Worker@71b1b4c5

vm_0_bridge1_latvia_25064:ServerConnection on port 24946 Thread 0 ID=0x29b(667) state=TIMED_WAITING
        waiting to lock <java.util.concurrent.CountDownLatch$Sync@41e6d28f>
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:61)
        at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
        at org.apache.geode.internal.cache.partitioned.FetchKeysMessage$FetchKeysResponse.waitForKeys(FetchKeysMessage.java:541)
        at org.apache.geode.internal.cache.PartitionedRegion.getBucketKeys(PartitionedRegion.java:4342)
        at org.apache.geode.internal.cache.TXState.getBucketKeys(TXState.java:1852)
        at org.apache.geode.internal.cache.TXStateProxyImpl.getBucketKeys(TXStateProxyImpl.java:730)
        at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.getNextBucketIter(PartitionedRegion.java:6066)
        at org.apache.geode.internal.cache.PartitionedRegion$KeysSet$KeysSetIterator.hasNext(PartitionedRegion.java:6024)
        at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
        at org.apache.geode.internal.cache.tier.sockets.command.KeySet.fillAndSendKeySetResponseChunks(KeySet.java:168)
        at org.apache.geode.internal.cache.tier.sockets.command.KeySet.cmdExecute(KeySet.java:126)
        at org.apache.geode.internal.cache.tier.sockets.BaseCommand.execute(BaseCommand.java:157)
        at org.apache.geode.internal.cache.tier.sockets.ServerConnection.doNormalMsg(ServerConnection.java:869)
        at org.apache.geode.internal.cache.tier.sockets.OriginalServerConnection.doOneMessage(OriginalServerConnection.java:77)
        at org.apache.geode.internal.cache.tier.sockets.ServerConnection.run(ServerConnection.java:1248)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$4$1.run(AcceptorImpl.java:644)
        at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
java.util.concurrent.locks.ReentrantLock$NonfairSync@33b1b785
java.util.concurrent.ThreadPoolExecutor$Worker@51e84752

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
