[
https://issues.apache.org/jira/browse/IGNITE-28242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Chugunov reassigned IGNITE-28242:
----------------------------------------
Assignee: Sergey Chugunov
> Possible deadlock as GridCacheMapEntry lock is acquired before cp readlock
> --------------------------------------------------------------------------
>
> Key: IGNITE-28242
> URL: https://issues.apache.org/jira/browse/IGNITE-28242
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.17
> Reporter: Sergey Chugunov
> Assignee: Sergey Chugunov
> Priority: Critical
> Labels: ise
> Fix For: 2.19
>
>
> In one instance the following stack trace was observed (common parts are omitted):
> {code:java}
> "client-connector-#44" #83 prio=5 os_prio=0 cpu=5738716.41ms
> elapsed=1685639.62s tid=0x00007f9b90074000 nid=0x2349 waiting on condition
> [0x00007f9b704e7000]
> java.lang.Thread.State: WAITING (parking)
> at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> - parking to wait for <0x0000000658588260> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at
> java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared([email protected]/AbstractQueuedSynchronizer.java:1009)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared([email protected]/AbstractQueuedSynchronizer.java:1324)
> at
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock([email protected]/ReentrantReadWriteLock.java:738)
> at
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointReadWriteLock.readLock(CheckpointReadWriteLock.java:69)
> at
> org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointTimeoutLock.checkpointReadLock(CheckpointTimeoutLock.java:117)
> at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1602)
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:460)
> <<--<<-- cp readLock is requested here
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGet0(GridCacheMapEntry.java:670)
> <<--<<-- lock in this GridCacheMapEntry is acquired here
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGetVersioned(GridCacheMapEntry.java:608)
> at
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.enlistRead(GridNearTxLocal.java:2370)
> at
> org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.getAllAsync(GridNearTxLocal.java:1839)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache$1.op(GridDhtColocatedCache.java:201)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter$AsyncOp.op(GridCacheAdapter.java:5071)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOp(GridCacheAdapter.java:3934)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:199)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAsync(GridCacheAdapter.java:4169)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1399)
> {code}
> At the same time, another thread was found in the thread dump, holding the cp
> readlock and waiting for the lock on a GridCacheMapEntry:
> {code}
> "client-connector-#49" #100 prio=5 os_prio=0 cpu=5745735.56ms
> elapsed=1685597.78s tid=0x00007f9b9409d800 nid=0x2390 waiting on condition
> [0x00007f9b5fa0e000]
> java.lang.Thread.State: WAITING (parking)
> at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
> - parking to wait for <0x00000007ee8095e0> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at
> java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt([email protected]/AbstractQueuedSynchronizer.java:885)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued([email protected]/AbstractQueuedSynchronizer.java:917)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:1240)
> at
> java.util.concurrent.locks.ReentrantLock.lock([email protected]/ReentrantLock.java:267)
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.lockEntry(GridCacheMapEntry.java:4164)
> at
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.obsolete(GridCacheMapEntry.java:2088)
> at
> org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:134)
> at
> org.apache.ignite.internal.processors.cache.GridCacheConcurrentMapImpl.putEntryIfObsoleteOrAbsent(GridCacheConcurrentMapImpl.java:70)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridCachePartitionedConcurrentMap.putEntryIfObsoleteOrAbsent(GridCachePartitionedConcurrentMap.java:96)
> at
> org.apache.ignite.internal.processors.cache.GridCacheAdapter.entryEx(GridCacheAdapter.java:918)
> at
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.entryEx(GridDhtCacheAdapter.java:434)
> at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.lockMultiple(IgniteTxManager.java:1943)
> at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager.prepareTx(IgniteTxManager.java:1162)
> at
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userPrepare(IgniteTxLocalAdapter.java:403)
> {code}
> It is possible that these two threads are blocking each other, while the
> checkpointer thread prevents one of them from acquiring the cp readlock and
> thus from making progress.
> We need to investigate the order in which these locks are acquired and
> eliminate the possibility of deadlocks here.
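The lock-order concern described above can be illustrated with a minimal, self-contained sketch. It is not Ignite code: the class LockOrderSketch and its fields checkpointLock and entryLock are hypothetical stand-ins for the cp read/write lock and the per-entry lock. The sketch assumes one possible remedy, namely that every path takes the cp readlock before the entry lock, so a reader that parks behind a queued writer never holds an entry lock that another reader needs.
{code:java}
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    // Hypothetical stand-in for the checkpoint read/write lock.
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    // Hypothetical stand-in for the lock of a single cache entry.
    private final ReentrantLock entryLock = new ReentrantLock();

    // Guarded by entryLock.
    private long counter;

    /** Worker path: cp readlock first, entry lock second, released in reverse order. */
    void updateEntry() {
        checkpointLock.readLock().lock();
        try {
            entryLock.lock();
            try {
                counter++;
            } finally {
                entryLock.unlock();
            }
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Checkpointer stand-in: briefly takes the write lock, which waits for active readers. */
    void checkpoint() {
        checkpointLock.writeLock().lock();
        try {
            // A real checkpoint would do its work here.
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }

    /** Runs the workers concurrently with one checkpointer thread and returns the final counter. */
    long run(int workers, int iterations) throws InterruptedException {
        Thread[] threads = new Thread[workers + 1];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++)
                    updateEntry();
            });
        }
        threads[workers] = new Thread(() -> {
            for (int j = 0; j < iterations; j++)
                checkpoint();
        });
        for (Thread t : threads)
            t.start();
        for (Thread t : threads)
            t.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new LockOrderSketch().run(2, 1000));
    }
}
{code}
If updateEntry() instead took entryLock first and the cp readlock second, while another path held the cp readlock and waited for entryLock, a pending write-lock request could leave both readers parked. This is the ordering that the two stack traces suggest. Whether reordering the locks is the right fix for GridCacheMapEntry is for the investigation to decide.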
--
This message was sent by Atlassian Jira
(v8.20.10#820010)