Kirill Tkalenko created IGNITE-26037:
----------------------------------------

             Summary: Error saving FreeList metadata causing checkpointer to crash
                 Key: IGNITE-26037
                 URL: https://issues.apache.org/jira/browse/IGNITE-26037
             Project: Ignite
          Issue Type: Bug
            Reporter: Kirill Tkalenko
            Assignee: Kirill Tkalenko
             Fix For: 3.1


While analyzing a log, I found an error during the saving of FreeList metadata 
that crashed the Checkpointer, which in turn left the node inoperable. This 
needs to be investigated.

Scenario: a three-node cluster was loaded with a large amount of data, and all 
tables were in a zone with a replica count of 1. After all the data was loaded, 
the replica count was changed from 1 to 3, which triggered multiple rebalances 
via Raft snapshots. Some time later, this problem appeared.

{noformat}
2025-07-24 14:11:42:486 +0000 [ERROR][%node1%checkpoint-thread][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_TERMINATION]
org.apache.ignite.internal.failure.StackTraceCapturingException: IGN-CMN-65535 Unknown error TraceId:00fda422
        at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:183)
        at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:160)
        at org.apache.ignite.internal.pagememory.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
        at org.apache.ignite.internal.util.worker.IgniteWorker.run(IgniteWorker.java:89)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.ignite.internal.lang.IgniteInternalCheckedException: IGN-CMN-65535 java.lang.AssertionError: FullPageId [pageId=000100020000003c, effectivePageId=000000020000003c, groupId=38] TraceId:00fda422
        at org.apache.ignite.internal.pagememory.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:347)
        at org.apache.ignite.internal.pagememory.persistence.checkpoint.Checkpointer.body(Checkpointer.java:243)
        ... 2 more
Caused by: java.util.concurrent.CompletionException: java.lang.AssertionError: FullPageId [pageId=000100020000003c, effectivePageId=000000020000003c, groupId=38]
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
        at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874)
        at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
        at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:55)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        ... 1 more
Caused by: java.lang.AssertionError: FullPageId [pageId=000100020000003c, effectivePageId=000000020000003c, groupId=38]
        at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:819)
        at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:705)
        at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:677)
        at org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:202)
        at org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:250)
        at org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:380)
        at org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:322)
        at org.apache.ignite.internal.pagememory.freelist.FreeListImpl.saveMetadata(FreeListImpl.java:813)
        at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.saveFreeListMetadataBusy(PersistentPageMemoryMvPartitionStorage.java:579)
        at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$17(PersistentPageMemoryMvPartitionStorage.java:494)
        at org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busySafe(AbstractPageMemoryMvPartitionStorage.java:1052)
        at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$18(PersistentPageMemoryMvPartitionStorage.java:494)
        at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:51)
        ... 3 more
{noformat}
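
To help read the FullPageId in the assertion, here is a small sketch that splits 
the page ID from the log into its fields. It assumes the 64-bit layout that I 
believe Ignite's page ID utilities use (8-bit offset, 8-bit flag, 16-bit 
partition ID, 32-bit page index); the PageIdDecoder class and its method names 
are hypothetical and not part of Ignite. The only part the log itself confirms 
is that effectivePageId equals pageId with the upper 16 bits cleared.

```java
// Hypothetical helper, not Ignite code. Bit widths below are an assumed layout:
// [offset: 8][flag: 8][partition ID: 16][page index: 32].
public final class PageIdDecoder {
    private static final int PAGE_IDX_BITS = 32;
    private static final int PART_ID_BITS = 16;

    private PageIdDecoder() {
    }

    /** Clears the upper 16 bits, keeping partition ID and page index. */
    public static long effectivePageId(long pageId) {
        return pageId & ((1L << (PAGE_IDX_BITS + PART_ID_BITS)) - 1);
    }

    /** Lower 32 bits of the page ID. */
    public static int pageIndex(long pageId) {
        return (int) pageId;
    }

    /** Bits 32..47 of the page ID. */
    public static int partitionId(long pageId) {
        return (int) ((pageId >>> PAGE_IDX_BITS) & 0xFFFF);
    }

    /** Bits 48..55 of the page ID (page type flag under the assumed layout). */
    public static int flag(long pageId) {
        return (int) ((pageId >>> (PAGE_IDX_BITS + PART_ID_BITS)) & 0xFF);
    }

    public static void main(String[] args) {
        long pageId = 0x000100020000003cL; // value taken from the log above

        System.out.println("effectivePageId=" + String.format("%016x", effectivePageId(pageId)));
        System.out.println("flag=" + flag(pageId));
        System.out.println("partitionId=" + partitionId(pageId));
        System.out.println("pageIndex=" + pageIndex(pageId));
    }
}
```

Under these assumptions, the failing page decodes to flag 1, partition 2, page 
index 60 in group 38, which may help narrow down which partition's FreeList was 
being flushed when acquirePage failed.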


