[ 
https://issues.apache.org/jira/browse/IGNITE-28242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Chugunov reassigned IGNITE-28242:
----------------------------------------

    Assignee: Sergey Chugunov

> Possible deadlock as GridCacheMapEntry lock is acquired before cp readlock
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-28242
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28242
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.17
>            Reporter: Sergey Chugunov
>            Assignee: Sergey Chugunov
>            Priority: Critical
>              Labels: ise
>             Fix For: 2.19
>
>
> In one instance a stack trace was observed (common parts are omitted):
> {code:java}
> "client-connector-#44" #83 prio=5 os_prio=0 cpu=5738716.41ms elapsed=1685639.62s tid=0x00007f9b90074000 nid=0x2349 waiting on condition  [0x00007f9b704e7000]
>    java.lang.Thread.State: WAITING (parking)
>       at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
>       - parking to wait for  <0x0000000658588260> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared([email protected]/AbstractQueuedSynchronizer.java:1009)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared([email protected]/AbstractQueuedSynchronizer.java:1324)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock([email protected]/ReentrantReadWriteLock.java:738)
>       at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointReadWriteLock.readLock(CheckpointReadWriteLock.java:69)
>       at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:117)
>       at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1602)
>       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:460) <<--<<-- cp readLock is requested here
>       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGet0(GridCacheMapEntry.java:670) <<--<<-- lock in this GridCacheMapEntry is acquired here
>       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGetVersioned(GridCacheMapEntry.java:608)
>       at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistRead(GridNearTxLocal.java:2370)
>       at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:1839)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$1.op(GridDhtColocatedCache.java:201)
>       at org.apache.ignite.internal.processors.cache.GridCacheAdapter$AsyncOp.op(GridCacheAdapter.java:5071)
>       at org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:3934)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:199)
>       at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAsync(GridCacheAdapter.java:4169)
>       at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1399)
> {code}
> At the same time, another thread was found in the thread dump holding the cp 
> readlock and waiting for the lock on a GridCacheMapEntry:
> {code}
> "client-connector-#49" #100 prio=5 os_prio=0 cpu=5745735.56ms elapsed=1685597.78s tid=0x00007f9b9409d800 nid=0x2390 waiting on condition  [0x00007f9b5fa0e000]
>    java.lang.Thread.State: WAITING (parking)
>       at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
>       - parking to wait for  <0x00000007ee8095e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued([email protected]/AbstractQueuedSynchronizer.java:917)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:1240)
>       at java.util.concurrent.locks.ReentrantLock.lock([email protected]/ReentrantLock.java:267)
>       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:4164)
>       at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.obsolete(GridCacheMapEntry.java:2088)
>       at org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:134)
>       at org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:70)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:96)
>       at org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:918)
>       at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:434)
>       at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.lockMultiple(IgniteTxManager.java:1943)
>       at org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.prepareTx(IgniteTxManager.java:1162)
>       at org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userPrepare(IgniteTxLocalAdapter.java:403)
> {code}
> It is possible that these two threads are blocking each other while the 
> checkpointer thread prevents one of them from acquiring the cp readlock and 
> thus from making progress.
> We need to investigate the order in which the locks are acquired and 
> eliminate the possibility of deadlocks here.
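
The interaction described above can be reproduced in isolation with plain java.util.concurrent locks. The sketch below is not Ignite code: the class LockOrderSketch and both of its methods are hypothetical names used only for illustration, and the locks are stand-ins for the cp lock and the GridCacheMapEntry lock. The first method shows why a thread that already holds an entry lock can stall on the cp readlock even though another reader holds it: on current OpenJDK implementations, a non-fair ReentrantReadWriteLock makes a new reader wait when a writer is first in the queue. The second method shows one conventional remedy, taking the cp readlock before the entry lock on every path; this is an assumption about a possible fix, not the change chosen for this ticket.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical illustration; none of these names exist in Ignite. */
final class LockOrderSketch {
    /**
     * Returns whether a new reader obtained the read lock within 200 ms
     * while another reader held it and a writer was already queued.
     */
    static boolean newReaderAdmittedBehindQueuedWriter() {
        // Non-fair lock, matching the NonfairSync seen in the thread dump.
        ReentrantReadWriteLock cp = new ReentrantReadWriteLock();
        AtomicBoolean admitted = new AtomicBoolean();

        // Plays the checkpointer: it queues for the write lock.
        Thread writer = new Thread(() -> {
            cp.writeLock().lock();
            cp.writeLock().unlock();
        });

        // Plays the thread that needs the read lock (it would hold an entry lock).
        Thread reader = new Thread(() -> {
            try {
                // The timed tryLock honours the queue; untimed tryLock() would barge.
                if (cp.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    admitted.set(true);
                    cp.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            cp.readLock().lock(); // this thread plays the current read lock holder
            try {
                writer.start();
                while (!cp.hasQueuedThreads())
                    Thread.sleep(1);
                Thread.sleep(50); // give the writer time to finish enqueueing

                reader.start();
                reader.join();
            } finally {
                cp.readLock().unlock(); // lets the writer proceed
            }
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return admitted.get();
    }

    /** Runs body with the locks taken in one fixed order: read lock, then entry lock. */
    static <T> T underCheckpointThenEntry(ReentrantReadWriteLock cp, ReentrantLock entry, Supplier<T> body) {
        cp.readLock().lock();
        try {
            entry.lock();
            try {
                return body.get();
            } finally {
                entry.unlock();
            }
        } finally {
            cp.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("new reader admitted: " + newReaderAdmittedBehindQueuedWriter());
        System.out.println(underCheckpointThenEntry(
            new ReentrantReadWriteLock(), new ReentrantLock(), () -> "ordered access ok"));
    }
}
```

With a fixed acquisition order, no thread can hold an entry lock while it waits for the cp readlock. A queued writer then delays readers only until the current holders release the lock, instead of closing a wait cycle.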



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
