Anton Kalashnikov created IGNITE-14197: ------------------------------------------
Summary: Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work Key: IGNITE-14197 URL: https://issues.apache.org/jira/browse/IGNITE-14197 Project: Ignite Issue Type: Bug Reporter: Anton Kalashnikov Assignee: Anton Kalashnikov In case of enabled write throttling, when, for example, node parks data streamer thread, it still holds checkpoint read lock and it leads to the long pauses on waiting for checkpoint lock: [2020-07-23 07:09:21,614][INFO ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of WAL without checkpoint'] All operations at this moment are blocked. Sometimes, it can lead to a complete disaster: Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855* {quote}“data-streamer-stripe-78-#175” #209 prio=5 os_prio=0 tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491) at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483) at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394) at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369) at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997) at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645) at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473) at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306) at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441) at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770) at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278) at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139) at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104) at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559) at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) at java.lang.Thread.run(Thread.java:748) {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)