Anton Kalashnikov created IGNITE-14197:
------------------------------------------

             Summary: Checkpoint thread can't take checkpoint write lock 
because it waits for parked threads to complete their work
                 Key: IGNITE-14197
                 URL: https://issues.apache.org/jira/browse/IGNITE-14197
             Project: Ignite
          Issue Type: Bug
            Reporter: Anton Kalashnikov
            Assignee: Anton Kalashnikov


In case of enabled write throttling, when, for example, node parks data 
streamer thread, it still holds checkpoint read lock and it leads to the long 
pauses on waiting for checkpoint lock:
[2020-07-23 07:09:21,614][INFO 
][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started 
[checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer 
[idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, 
*checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, 
checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, 
writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, 
pages=10516815, reason='too big size of WAL without checkpoint']
All operations at this moment are blocked.

Sometimes, it can lead to a complete disaster:
Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
{quote}“data-streamer-stripe-78-#175” #209 prio=5 os_prio=0 
tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
at 
org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
at 
org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
at 
org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
at 
org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to