Sergey, the problem is not easy to identify, even with the logs. Please file a
ticket and add a link to this message, or duplicate all the logs there.
 
Thanks!
 
>From: "Sergey Korotkov" < serge.korot...@gmail.com >
>To:  u...@ignite.apache.org
>Cc:
>Subject: [2.11.0]: 'B+Tree is corrupted' exception in
>GridCacheTtlManager.expire() and
>PartitionsEvictManager$PartitionEvictionTask.run() on node start
>Date: Thu, 18 Nov 2021 14:24:59 +0300
>
>Hello,
>
>We are having trouble with 'CorruptedTreeException: B+Tree is corrupted'
>during node start after a cluster restart. It looks like the caches with
>an Expiry Policy configured are the source of the problem.
>
>I have attached the log from the problem node. The exact steps with
>timestamps are as follows. Before the deactivation, the cluster had been
>working fine for about 5 days.
>
>2021-11-08 10:54:44 cluster deactivate request
>
>2021-11-08 10:59:33 cluster deactivated
>
>2021-11-08 11:02:30 stop all nodes
>
>2021-11-08 11:02:39 start all nodes
>
>2021-11-08 11:03:14 auto-activation start
>
>2021-11-08 11:03:16 cluster activated
>
>2021-11-08 11:03:21 'B+Tree is corrupted' exception in
>GridCacheTtlManager.expire() on one of the nodes (see the
>10.12.86.29-ignite-2021-11-08.0.log):
>
>[2021-11-08 11:03:21,820][ERROR][ttl-cleanup-worker-#215][ROOT]{}
>Critical system error detected. Will be handled accordingly to
>configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false,
>timeout=0, super=AbstractFailureHandler
>[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
>SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
>[type=CRITICAL_ERROR, err=class
>o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree
>is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds:
>[lower=null, upper=PendingRow []]]]]
>org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on
>bounds: [lower=null, upper=PendingRow []]]
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1100)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1095)
>         at
>org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:3076)
>         at
>org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:3023)
>         at
>org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1255)
>         at
>org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:246)
>         at
>org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.lambda$body$0(GridCacheSharedTtlCleanupManager.java:193)
>         at
>java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
>         at
>org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:192)
>         at
>org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>         at java.lang.Thread.run(Thread.java:748)
>Caused by:
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
>java.lang.IllegalStateException: Item not found: 3
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
>         ... 11 common frames omitted
>Caused by:
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
>java.lang.IllegalStateException: Item not found: 3
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:123)
>         at
>org.apache.ignite.internal.processors.cache.tree.PendingRow.initKey(PendingRow.java:73)
>         at
>org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:127)
>         at
>org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:32)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5820)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5586)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.init(BPlusTree.java:5512)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1068)
>         ... 12 common frames omitted
>Caused by: java.lang.IllegalStateException: Item not found: 3
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:584)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:626)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
>         ... 22 common frames omitted
>
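A side note on the timeline above: the gaps between the steps can be worked
out with java.time. This is only a sketch. The class name RestartTimeline and
the step labels are made up for illustration; the timestamps are copied from
the message.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class RestartTimeline {
    // Timestamp format used in the step list of the report.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns the time elapsed between two timestamps given in the report's format. */
    static Duration between(String from, String to) {
        return Duration.between(LocalDateTime.parse(from, FMT), LocalDateTime.parse(to, FMT));
    }

    public static void main(String[] args) {
        // {timestamp, label}; the labels are shortened versions of the reported steps.
        String[][] steps = {
            {"2021-11-08 10:54:44", "cluster deactivate request"},
            {"2021-11-08 10:59:33", "cluster deactivated"},
            {"2021-11-08 11:02:30", "stop all nodes"},
            {"2021-11-08 11:02:39", "start all nodes"},
            {"2021-11-08 11:03:14", "auto-activation start"},
            {"2021-11-08 11:03:16", "cluster activated"},
            {"2021-11-08 11:03:21", "'B+Tree is corrupted' exception"},
        };

        // Print the gap between each pair of consecutive steps.
        for (int i = 1; i < steps.length; i++) {
            long sec = between(steps[i - 1][0], steps[i][0]).getSeconds();
            System.out.println(steps[i - 1][1] + " -> " + steps[i][1] + ": " + sec + " s");
        }
    }
}
```

From these timestamps, deactivation took 4 min 49 s, the nodes were started
9 s after they were stopped, and the first exception appeared 5 s after the
cluster was activated.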
>
>The error message suggests running
>org.apache.ignite.development.utils.IgniteWalConverter to diagnose the
>problem. I have attached the corrupted pages file and the output of this
>utility:
>
>  - corruptedPages_2021-11-08_11-03-21_999.txt - file created by Ignite
>on crash
>
>  - diag-2021-11-08.txt - output of the diagnostic utility.
>
>
>The next day we tried to start this node again, and it still failed with
>'B+Tree is corrupted', but in a different place:
>
>2021-11-09 12:37:34 'B+Tree is corrupted' exception in
>PartitionsEvictManager$PartitionEvictionTask.run() (see the
>10.12.86.29-ignite-2021-11-09.0.log)
>
>
>[2021-11-09 12:43:10,857][ERROR][rebalance-#344][ROOT]{} Critical system
>error detected. Will be handled accordingly to configured handler
>[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
>super=AbstractFailureHandler
>[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
>SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
>[type=CRITICAL_ERROR, err=class
>o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree
>is corrupted [pages(groupId, pageId)=[],
>msg=Runtime failure on bounds: [lower=null, upper=null]]]]
>org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on
>bounds: [lower=null, upper=null]]
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
>         at
>org.apache.ignite.internal.processors.cache.tree.CacheDataTree.find(CacheDataTree.java:167)
>         at
>org.apache.ignite.internal.processors.cache.tree.CacheDataTree.find(CacheDataTree.java:63)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1100)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1095)
>         at
>org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.cursor(IgniteCacheOffheapManagerImpl.java:2914)
>         at
>org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.cursor(GridCacheOffheapManager.java:2856)
>         at
>org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.onHasNext(IgniteCacheOffheapManagerImpl.java:938)
>         at
>org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
>         at
>org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
>         at
>org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:987)
>         at
>org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:409)
>         at
>java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>Caused by:
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
>java.lang.IllegalStateException: Item not found: 16
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
>         ... 16 common frames omitted
>Caused by:
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
>java.lang.IllegalStateException: Item not found: 16
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
>         at
>org.apache.ignite.internal.processors.cache.tree.DataRow.<init>(DataRow.java:55)
>         at
>org.apache.ignite.internal.processors.cache.tree.CacheDataRowStore.dataRow(CacheDataRowStore.java:129)
>         at
>org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:422)
>         at
>org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:63)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5820)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5586)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.init(BPlusTree.java:5512)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1068)
>         ... 17 common frames omitted
>Caused by: java.lang.IllegalStateException: Item not found: 16
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:584)
>         at
>org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:626)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
>         at
>org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
>
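Both traces above end in the same innermost exception type,
java.lang.IllegalStateException ("Item not found: ..."), wrapped in several
layers. When handling such a failure in code, the innermost cause can be
reached by following getCause(). The sketch below uses only the JDK: the class
name RootCause is hypothetical, and plain RuntimeException stands in for the
Ignite-internal wrapper exceptions, which are not reproduced here.

```java
public class RootCause {
    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null)
            cur = cur.getCause();
        return cur;
    }

    public static void main(String[] args) {
        // Stand-ins for the nested exceptions seen in the log (assumption:
        // RuntimeException replaces the Ignite-internal wrapper classes).
        Throwable inner = new IllegalStateException("Item not found: 16");
        Throwable middle = new RuntimeException(inner);
        Throwable outer = new RuntimeException("B+Tree is corrupted", middle);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

The loop terminates because Throwable.getCause() returns null at the end of
the chain.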
>
>I have also attached the corrupted pages file and the diagnostic output
>for this case:
>
>- corruptedPages_2021-11-09_12-43-12_449.txt
>
>- diag-2021-11-09.txt
>
>
>In both cases the pages belong to caches with an Expiry Policy configured.
>
>
>****
>
>What can be done about that? Is there a recommended way to stop and start
>an Ignite cluster to prevent such data loss problems?
>
>****
>
>I see some similar fixed issues in Jira, such as
>https://issues.apache.org/jira/browse/IGNITE-12489 and
>https://issues.apache.org/jira/browse/IGNITE-14093, but it looks like
>something is still not working in 2.11.0.
>
>
>Thanks, 
 
 
 
 
