Kirill Tkalenko created IGNITE-26037:
----------------------------------------
Summary: Error saving FreeList metadata causing checkpointer to crash
Key: IGNITE-26037
URL: https://issues.apache.org/jira/browse/IGNITE-26037
Project: Ignite
Issue Type: Bug
Reporter: Kirill Tkalenko
Assignee: Kirill Tkalenko
Fix For: 3.1
While analyzing the log, I found an error during saving of FreeList metadata. It crashed the Checkpointer, which in turn left the node inoperative. This needs to be investigated and fixed.
Scenario: a cluster of three nodes was loaded with a large amount of data, and all tables were in a zone with a replica count of 1. After all the data was loaded, the zone's replica count was changed from 1 to 3, which triggered multiple rebalances via Raft snapshots. Some time later, this error appeared.
{noformat}
2025-07-24 14:11:42:486 +0000 [ERROR][%node1%checkpoint-thread][FailureManager] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=SYSTEM_WORKER_TERMINATION]
org.apache.ignite.internal.failure.StackTraceCapturingException: IGN-CMN-65535 Unknown error TraceId:00fda422
	at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:183)
	at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:160)
	at org.apache.ignite.internal.pagememory.persistence.checkpoint.Checkpointer.body(Checkpointer.java:263)
	at org.apache.ignite.internal.util.worker.IgniteWorker.run(IgniteWorker.java:89)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.ignite.internal.lang.IgniteInternalCheckedException: IGN-CMN-65535 java.lang.AssertionError: FullPageId [pageId=000100020000003c, effectivePageId=000000020000003c, groupId=38] TraceId:00fda422
	at org.apache.ignite.internal.pagememory.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:347)
	at org.apache.ignite.internal.pagememory.persistence.checkpoint.Checkpointer.body(Checkpointer.java:243)
	... 2 more
Caused by: java.util.concurrent.CompletionException: java.lang.AssertionError: FullPageId [pageId=000100020000003c, effectivePageId=000000020000003c, groupId=38]
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:874)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
	at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:55)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	... 1 more
Caused by: java.lang.AssertionError: FullPageId [pageId=000100020000003c, effectivePageId=000000020000003c, groupId=38]
	at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:819)
	at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:705)
	at org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.acquirePage(PersistentPageMemory.java:677)
	at org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:202)
	at org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:250)
	at org.apache.ignite.internal.pagememory.freelist.PagesList.flushBucketsCache(PagesList.java:380)
	at org.apache.ignite.internal.pagememory.freelist.PagesList.saveMetadata(PagesList.java:322)
	at org.apache.ignite.internal.pagememory.freelist.FreeListImpl.saveMetadata(FreeListImpl.java:813)
	at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.saveFreeListMetadataBusy(PersistentPageMemoryMvPartitionStorage.java:579)
	at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$17(PersistentPageMemoryMvPartitionStorage.java:494)
	at org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busySafe(AbstractPageMemoryMvPartitionStorage.java:1052)
	at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$syncMetadataOnCheckpoint$18(PersistentPageMemoryMvPartitionStorage.java:494)
	at org.apache.ignite.internal.pagememory.persistence.checkpoint.AwaitTasksCompletionExecutor.lambda$execute$1(AwaitTasksCompletionExecutor.java:51)
	... 3 more
{noformat}
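The pageId and effectivePageId in the assertion message differ only in the bits above the low 48. As a rough aid for reading them, the sketch below decodes the logged value. It assumes an Ignite 2.x-style page ID layout (high to low: 8-bit rotation/offset, 8-bit flags, 16-bit partition ID, 32-bit page index, with the effective page ID being the low 48 bits). Ignite 3's own PageIdUtils may define this differently, and the class and method names here are hypothetical.

```java
// Hypothetical helper, NOT Ignite's PageIdUtils. Assumed layout (high to low bits):
//   [rotation/offset: 8][flags: 8][partition ID: 16][page index: 32]
public class PageIdDecode {
    static final int PAGE_IDX_SIZE = 32;
    static final int PART_ID_SIZE = 16;
    static final int FLAG_SIZE = 8;

    // Assumed "effective" part of the ID: page index + partition ID (low 48 bits).
    static final long EFFECTIVE_MASK = ~(-1L << (PAGE_IDX_SIZE + PART_ID_SIZE));

    static int pageIndex(long pageId) {
        return (int) (pageId & 0xFFFFFFFFL);
    }

    static int partitionId(long pageId) {
        return (int) ((pageId >>> PAGE_IDX_SIZE) & 0xFFFF);
    }

    static int flags(long pageId) {
        return (int) ((pageId >>> (PAGE_IDX_SIZE + PART_ID_SIZE)) & 0xFF);
    }

    static int rotation(long pageId) {
        // Unsigned shift leaves only the top 8 bits.
        return (int) (pageId >>> (PAGE_IDX_SIZE + PART_ID_SIZE + FLAG_SIZE));
    }

    static long effectivePageId(long pageId) {
        return pageId & EFFECTIVE_MASK;
    }

    public static void main(String[] args) {
        // pageId value copied from the AssertionError message above.
        long pageId = Long.parseUnsignedLong("000100020000003c", 16);

        System.out.printf("pageIdx=%d partId=%d flags=%d rotation=%d%n",
            pageIndex(pageId), partitionId(pageId), flags(pageId), rotation(pageId));
        System.out.printf("effectivePageId=%016x%n", effectivePageId(pageId));
    }
}
```

Under that assumption the program prints `pageIdx=60 partId=2 flags=1 rotation=0` and `effectivePageId=000000020000003c`. Masking to 48 bits reproduces the effectivePageId from the log, which is consistent with (but does not confirm) the assumed layout; the only extra bit set in pageId would be in the flags byte.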
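The nesting in the trace (IgniteInternalCheckedException wrapping a CompletionException wrapping the AssertionError) follows from the FreeList metadata save running as an asynchronous checkpoint task. The standalone sketch below uses only java.util.concurrent and hypothetical class and method names. It reproduces the CompletionException layer: a future completed exceptionally reaches a whenComplete dependent as a CompletionException whose cause is the original error. In the trace, AwaitTasksCompletionExecutor is the code that completes the future exceptionally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Illustrative only: shows how an error from an async task reaches a dependent stage.
public class CompletionChainDemo {
    static Throwable propagate(Throwable taskFailure) {
        CompletableFuture<Void> task = new CompletableFuture<>();

        // A dependent stage, similar to a whenComplete callback attached by waiting code.
        CompletableFuture<Void> dependent = task.whenComplete((res, err) -> { });

        // The executor reports the task failure by completing the future exceptionally.
        task.completeExceptionally(taskFailure);

        try {
            dependent.join();
            return null;
        } catch (CompletionException e) {
            // whenComplete re-encodes the original error as a CompletionException.
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable t = propagate(new AssertionError("FullPageId [...]"));
        System.out.println(t.getClass().getSimpleName() + " <- " + t.getCause().getClass().getSimpleName());
    }
}
```

Running it prints `CompletionException <- AssertionError`. This matches the completeExceptionally, postComplete, uniWhenComplete and encodeThrowable frames in the trace above. The wrapper layers are therefore only transport, and the AssertionError raised in PersistentPageMemory.acquirePage is the root cause to investigate.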