[ https://issues.apache.org/jira/browse/IGNITE-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412810#comment-17412810 ]
Maxim Muzafarov commented on IGNITE-15429: ------------------------------------------ Cherry-picked to 2.11 > Taking a snapshot may increase the PME execution time by the > checkpointFrequency interval. > ------------------------------------------------------------------------------------------ > > Key: IGNITE-15429 > URL: https://issues.apache.org/jira/browse/IGNITE-15429 > Project: Ignite > Issue Type: Bug > Reporter: Pavel Pereslegin > Assignee: Pavel Pereslegin > Priority: Major > Labels: iep-43 > Fix For: 2.11 > > Time Spent: 20m > Remaining Estimate: 0h > > When a snapshot is taken, a checkpoint is forced on all server nodes. > In a rare case, when forcing a checkpoint, the start of the snapshot > operation is set to the planned (instead of the current) checkpoint. In this > case, the local partition exchange future does not finish until the next > checkpoint starts (by timeout), which significantly increases the execution > time of the exchange. > Log output on a node with a checkpoint frequency of 60 seconds > (_stage="Exchange done" (61542 ms)_). > {noformat} > 2021-08-31 23:30:04.792 [INFO > ][exchange-worker-#179][org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotFutureTask] > Snapshot operation is scheduled on local node and will be handled by the > checkpoint listener [sctx=SnapshotFutureTask > [pageStore=GridCacheSharedManagerAdapter [starting=true, stop=false], > srcNodeId=a49f4c59-a4d1-4b02-b416-ceede4ffc0ba, > snpName=20210831233001_snapshot, > tmpSnpWorkDir=/opt/ignite/ssd/data/something/snp/20210831233001_snapshot, > locBuff=java.lang.ThreadLocal$SuppliedThreadLocal@6fd06d5f, > ioFactory=org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIOFactory@54067b77, > cpEndFut=java.util.concurrent.CompletableFuture@382a989[Not completed], > startedFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, > hash=378462544], > tmpConsIdDir=/opt/ignite/ssd/data/something/snp/20210831233001_snapshot/db/something, > closeFut=null, err=null, started=true], topVer=AffinityTopologyVersion > [topVer=515, minorTopVer=2]] > 2021-08-31 23:30:05.444 [INFO > ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] > Checkpoint started [checkpointId=b0e9b43d-02f4-44fb-90c5-41dc6c294248, > startPtr=FileWALPointer [idx=13473, fileOff=1008077757, len=236399], > checkpointBeforeLockTime=352ms, checkpointLockWait=0ms, > checkpointListenersExecuteTime=325ms, checkpointLockHoldTime=331ms, > walCpRecordFsyncDuration=3ms, writeCheckpointEntryDuration=0ms, > splitAndSortCpPagesDuration=8ms, pages=15001, reason='timeout'] > 2021-08-31 23:30:05.671 [INFO > ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] > Checkpoint finished [cpId=b0e9b43d-02f4-44fb-90c5-41dc6c294248, > pages=15001,markPos=FileWALPointer [idx=13473, fileOff=1008077757, > len=236399], walSegmentsCleared=0, walSegmentsCovered=[], markDuration=343ms, > pagesWrite=128ms, fsync=99ms, total=922ms] > ...60 seconds later... > 2021-08-31 23:31:05.779 [INFO > ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotSender] > Resolved snapshot work directory: > /opt/ignite/sas/snapshot/20210831233001_snapshot/db/something > 2021-08-31 23:31:05.812 [INFO > ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.snapshot.SnapshotFutureTask] > Submit partition processing tasks with partition allocated lengths: ... > 2021-08-31 23:31:05.837 [INFO > ][db-checkpoint-thread-#236][org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager] > Skipping checkpoint (no pages were modified) > [checkpointBeforeLockTime=328ms, checkpointLockWait=0ms, > checkpointListenersExecuteTime=298ms, checkpointLockHoldTime=304ms, > reason='timeout'] > ... > 2021-08-31 23:31:06.315 [INFO > ][exchange-worker-#179][org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture] > Completed partition exchange > [localNode=1bd343d0-d6b5-4ea3-880b-ebda77bc48d7, > exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion > [topVer=515, minorTopVer=2], evt=DISCOVERY_CUSTOM_EVT, > evtNode=TcpDiscoveryNode [id=a49f4c59-a4d1-4b02-b416-ceede4ffc0ba, > consistentId=epk_rb_sylvanas40.ca.sbrf.ru, addrs=ArrayList [10.126.13.32, > 127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47500, > sylvanas40.ca.sbrf.ru/10.126.13.32:47500], discPort=47500, order=60, > intOrder=60, lastExchangeTime=1626477723350, loc=false, > ver=2.9.0#20210705-sha1:b32c94b2, isClient=false], rebalanced=true, > done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=515, > minorTopVer=2]] > 2021-08-31 23:31:06.315 [INFO > ][exchange-worker-#179][org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture] > Exchange timing [startVer=AffinityTopologyVersion [topVer=515, > minorTopVer=2], resVer=AffinityTopologyVersion [topVer=515, minorTopVer=2], > stage="Waiting in exchange queue" (0 ms), stage="Exchange parameters > initialization" (1 ms), stage="Determine exchange type" (24 ms), > stage="Preloading notification" (0 ms), stage="Wait partitions release > [latch=exchange]" (0 ms), stage="Wait partitions release latch > [latch=exchange]" (40 ms), stage="Wait partitions release [latch=exchange]" > (0 ms), stage="After states restored callback" (0 ms), stage="WAL history > reservation" (12 ms), stage="Exchange done" (61542 ms), stage="Total time" > (61619 ms)] > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)