Sergey, it is hard to pin down the problem, even with the logs. Please file a ticket and either include a link to this message in it or attach all the logs there as well. Thanks!

>From: "Sergey Korotkov" <serge.korot...@gmail.com>
>To: u...@ignite.apache.org
>Cc:
>Subject: [2.11.0]: 'B+Tree is corrupted' exception in GridCacheTtlManager.expire() and PartitionsEvictManager$PartitionEvictionTask.run() on node start
>Date: Thu, 18 Nov 2021 14:24:59 +0300
>
>Hello,
>
>We are having trouble with CorruptedTreeException ("B+Tree is corrupted") during node start after a cluster restart. It looks like the caches with an Expiry Policy configured are the source of the problem.
>
>I have attached the log from the problem node. The exact steps, with timestamps, are as follows. Before the deactivation, the cluster had been working fine for about 5 days.
>
>2021-11-08 10:54:44 cluster deactivation request
>2021-11-08 10:59:33 cluster deactivated
>2021-11-08 11:02:30 stop all nodes
>2021-11-08 11:02:39 start all nodes
>2021-11-08 11:03:14 auto-activation start
>2021-11-08 11:03:16 cluster activated
>2021-11-08 11:03:21 'B+Tree is corrupted' exception in GridCacheTtlManager.expire() on one of the nodes (see 10.12.86.29-ignite-2021-11-08.0.log):
>
>[2021-11-08 11:03:21,820][ERROR][ttl-cleanup-worker-#215][ROOT]{} Critical system error detected.
>Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=PendingRow []]]]]
>org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=PendingRow []]]
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1100)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1095)
>    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:3076)
>    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:3023)
>    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1255)
>    at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:246)
>    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.lambda$body$0(GridCacheSharedTtlCleanupManager.java:193)
>    at java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
>    at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:192)
>    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>    at java.lang.Thread.run(Thread.java:748)
>Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 3
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
>    ... 11 common frames omitted
>Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 3
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:123)
>    at org.apache.ignite.internal.processors.cache.tree.PendingRow.initKey(PendingRow.java:73)
>    at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:127)
>    at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:32)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5820)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5586)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.init(BPlusTree.java:5512)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1068)
>    ... 12 common frames omitted
>Caused by: java.lang.IllegalStateException: Item not found: 3
>    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:584)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:626)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
>    ... 22 common frames omitted
>
>The error message suggests running org.apache.ignite.development.utils.IgniteWalConverter to diagnose the problem. I have attached the output of this utility:
>
> - corruptedPages_2021-11-08_11-03-21_999.txt - file created by Ignite on crash
> - diag-2021-11-08.txt - output of the diagnostic utility
>
>The next day we tried to start this node again, and it still failed with 'B+Tree is corrupted', but in a different place:
>
>2021-11-09 12:37:34 'B+Tree is corrupted' exception in PartitionsEvictManager$PartitionEvictionTask.run() (see 10.12.86.29-ignite-2021-11-09.0.log):
>
>[2021-11-09 12:43:10,857][ERROR][rebalance-#344][ROOT]{} Critical system error detected.
>Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=null]]]]
>org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=null]]
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
>    at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.find(CacheDataTree.java:167)
>    at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.find(CacheDataTree.java:63)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1100)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1095)
>    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.cursor(IgniteCacheOffheapManagerImpl.java:2914)
>    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.cursor(GridCacheOffheapManager.java:2856)
>    at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.onHasNext(IgniteCacheOffheapManagerImpl.java:938)
>    at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
>    at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
>    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:987)
>    at org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:409)
>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>    at java.lang.Thread.run(Thread.java:748)
>Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 16
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
>    ... 16 common frames omitted
>Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 16
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
>    at org.apache.ignite.internal.processors.cache.tree.DataRow.<init>(DataRow.java:55)
>    at org.apache.ignite.internal.processors.cache.tree.CacheDataRowStore.dataRow(CacheDataRowStore.java:129)
>    at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:422)
>    at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:63)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5820)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5586)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.init(BPlusTree.java:5512)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1068)
>    ... 17 common frames omitted
>Caused by: java.lang.IllegalStateException: Item not found: 16
>    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:584)
>    at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:626)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
>    at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
>
>I have also attached the corrupted pages file and the diagnostic output for this case:
>
> - corruptedPages_2021-11-09_12-43-12_449.txt
> - diag-2021-11-09.txt
>
>In both cases, the corrupted pages belong to caches with an Expiry Policy configured.
>
>****
>
>What can be done about this? Is there a recommended way to stop and start an Ignite cluster that prevents this kind of data loss?
>
>****
>
>I see some similar, already fixed issues in Jira, such as https://issues.apache.org/jira/browse/IGNITE-12489 and https://issues.apache.org/jira/browse/IGNITE-14093, but it looks like something is still not working in 2.11.0.
>
>Thanks,
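If it helps when filing the ticket, a reproducer attempt for the restart sequence described above could be sketched roughly as follows. This is a hypothetical, single-node simplification and has not been verified to trigger the corruption: the class name TtlRestartRepro, the cache name "ttl-cache", the 30-second created-expiry TTL, the entry count and the 60-second wait are all assumptions, not values from the report. A single node also never rebalances or evicts partitions, so the second trace (PartitionsEvictManager) would need a multi-node run.

```java
import java.util.concurrent.TimeUnit;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

/** Hypothetical reproducer sketch: persistent cache with an expiry policy, then deactivate/stop/start. */
public class TtlRestartRepro {
    /** Hypothetical cache name; the real cache names are not in the report. */
    static final String CACHE = "ttl-cache";

    /** Node configuration with native persistence enabled for the default data region. */
    static IgniteConfiguration nodeConfig(String instanceName) {
        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        return new IgniteConfiguration()
            .setIgniteInstanceName(instanceName)
            .setDataStorageConfiguration(storage);
    }

    /** Cache with an assumed 30 s created-expiry TTL; eager TTL (the default) is set explicitly for clarity. */
    static CacheConfiguration<Integer, String> cacheConfig() {
        return new CacheConfiguration<Integer, String>(CACHE)
            .setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 30)))
            .setEagerTtl(true);
    }

    public static void main(String[] args) throws InterruptedException {
        Ignite ignite = Ignition.start(nodeConfig("repro-node"));
        ignite.cluster().state(ClusterState.ACTIVE);

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache(cacheConfig());
        for (int i = 0; i < 100_000; i++)
            cache.put(i, "value-" + i);

        // Mirror the reported sequence: deactivate the cluster, then stop the node.
        ignite.cluster().state(ClusterState.INACTIVE);
        Ignition.stopAll(false);

        // Restart on the same persistence files. In the report the cluster auto-activated;
        // activate explicitly here in case it does not.
        Ignite restarted = Ignition.start(nodeConfig("repro-node"));
        if (restarted.cluster().state() != ClusterState.ACTIVE)
            restarted.cluster().state(ClusterState.ACTIVE);

        // Give the TTL cleanup worker time to purge expired entries; then check the node log
        // for CorruptedTreeException.
        Thread.sleep(60_000);

        Ignition.stopAll(false);
    }
}
```

Running it twice in a row, so the second run starts on persistence files left by the first, may be closer to the reported situation than a single run.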