[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442297#comment-16442297 ]
Andrew Mashenkov commented on IGNITE-8295:
------------------------------------------

After wrapping partStoreLock in checkpointLock, I got the following stack trace.
It seems we should truncate the partition file under checkpointLock.

java.lang.AssertionError: FullPageId [pageId=0001005700000003, effectivePageId=0000005700000003, grpId=2141373874]
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:730)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:624)
    at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:142)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:186)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:164)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3155)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2909)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2808)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)

> Possible deadlock on partition eviction.
> ----------------------------------------
>
>                 Key: IGNITE-8295
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8295
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Andrew Mashenkov
>            Assignee: Andrew Mashenkov
>            Priority: Major
>             Fix For: 2.6
>
>         Attachments: deadlock.stack
>
>
> GridCacheOffheapManager.recreateCacheDataStore() calls
> updatePartitionCounter() under partStoreLock, which may try to acquire
> checkpointReadLock.
> recreateCacheDataStore() can be called with checkpointReadLock held (from
> GridDhtPartitionsExchangeFuture.updatePartitionFullMap)
> or without it (the GridDhtPartitionEvictor thread calls
> evictPartitionAsync),
> so a checkpoint can cause a deadlock if it starts in between.
> It seems we should acquire checkpointReadLock before partStoreLock.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
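
The lock ordering proposed in the issue description (checkpointReadLock first, then partStoreLock) can be illustrated with a small standalone sketch. This is not Ignite code: the class LockOrderSketch, its fields, and its methods are hypothetical stand-ins, with a java.util.concurrent ReentrantReadWriteLock playing the role of the checkpoint lock and a ReentrantLock playing the role of partStoreLock. It only shows that when every caller takes the two locks in the same order, a concurrent checkpoint (modelled as a write-lock holder) cannot get wedged between them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderSketch {
    // Hypothetical stand-ins for the checkpoint lock and partStoreLock; not the real Ignite fields.
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    private final ReentrantLock partStoreLock = new ReentrantLock();
    private final AtomicInteger recreated = new AtomicInteger();
    private final AtomicInteger checkpoints = new AtomicInteger();

    /** Models recreateCacheDataStore(): checkpoint read lock first, then partStoreLock. */
    void recreateStore() {
        checkpointLock.readLock().lock();
        try {
            partStoreLock.lock();
            try {
                // Models updatePartitionCounter(): a nested read-lock acquisition.
                // It is reentrant here because this thread already holds the read lock.
                checkpointLock.readLock().lock();
                try {
                    recreated.incrementAndGet();
                } finally {
                    checkpointLock.readLock().unlock();
                }
            } finally {
                partStoreLock.unlock();
            }
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Models the start of a checkpoint: takes the write lock exclusively. */
    void checkpoint() {
        checkpointLock.writeLock().lock();
        try {
            checkpoints.incrementAndGet();
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }

    /** Runs two "store recreation" threads against one "checkpointer"; returns {recreated, checkpoints} or null on timeout. */
    static int[] run(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        pool.execute(() -> { for (int i = 0; i < iterations; i++) s.recreateStore(); }); // exchange-like caller
        pool.execute(() -> { for (int i = 0; i < iterations; i++) s.recreateStore(); }); // evictor-like caller
        pool.execute(() -> { for (int i = 0; i < iterations; i++) s.checkpoint(); });    // checkpointer
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS))
                return null; // would indicate that the threads are stuck
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return new int[] { s.recreated.get(), s.checkpoints.get() };
    }

    public static void main(String[] args) {
        int[] r = run(1000);
        System.out.println(r == null ? "timed out" : "recreated=" + r[0] + " checkpoints=" + r[1]);
    }
}
```

If the evictor-like caller instead took partStoreLock first and only then requested the read lock, one plausible interleaving is that a queued writer blocks that new read request while another reader waits for partStoreLock, which matches the shape of the deadlock described in the issue.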