[ 
https://issues.apache.org/jira/browse/IGNITE-23212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Tkalenko updated IGNITE-23212:
-------------------------------------
    Component/s: persistence

> Page replacement doesn't work sometimes
> ---------------------------------------
>
>                 Key: IGNITE-23212
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23212
>             Project: Ignite
>          Issue Type: Bug
>          Components: persistence
>            Reporter: Ivan Bessonov
>            Assignee: Kirill Tkalenko
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0
>
>          Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Under a complex load, we sometimes see the following exception:
> {noformat}
> org.apache.ignite.lang.IgniteException: Error while executing 
> addWriteCommitted: [rowId=RowId [partitionId=13, 
> uuid=00000191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
>       at 
> java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) 
> ~[?:?]
>       at 
> org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.util.ViewUtils.copyExceptionWithCauseIfPossible(ViewUtils.java:91)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.util.ViewUtils.ensurePublicException(ViewUtils.java:71)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at org.apache.ignite.internal.util.ViewUtils.sync(ViewUtils.java:54) 
> ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:207)
>  ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.client.table.ClientKeyValueBinaryView.put(ClientKeyValueBinaryView.java:60)
>  ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
>       at site.ycsb.db.ignite3.IgniteClient.insert(IgniteClient.java:49) 
> [ignite3-binding-2024.15.jar:?]
>       at site.ycsb.DBWrapper.insert(DBWrapper.java:284) [core-2024.15.jar:?]
>       at site.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:657) 
> [core-2024.15.jar:?]
>       at site.ycsb.ClientThread.run(ClientThread.java:181) 
> [core-2024.15.jar:?]
>       at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: org.apache.ignite.lang.IgniteException: Error while executing 
> addWriteCommitted: [rowId=RowId [partitionId=13, 
> uuid=00000191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
>       at 
> java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:710) 
> ~[?:?]
>       at 
> org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
>  ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:549)
>  ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.client.TcpClientChannel.processNextMessage(TcpClientChannel.java:435)
>  ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
>       at 
> org.apache.ignite.internal.client.TcpClientChannel.lambda$onMessage$3(TcpClientChannel.java:277)
>  ~[ignite-client-3.0.0-SNAPSHOT.jar:?]
>       at 
> java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>  ~[?:?]
>       at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) 
> ~[?:?]
>       at 
> java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  ~[?:?]
>       at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) ~[?:?]
>       at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
> ~[?:?]
>       at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) 
> ~[?:?]
> Caused by: org.apache.ignite.lang.IgniteException: 
> org.apache.ignite.lang.IgniteException: IGN-CMN-65535 
> TraceId:60b4295a-8c18-4cdb-93e8-266bc9aaed88 Error while executing 
> addWriteCommitted: [rowId=RowId [partitionId=13, 
> uuid=00000191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
>       at 
> org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.lambda$mapToPublicException$2(IgniteExceptionMapperUtil.java:88)
>       at 
> org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapCheckingResultIsPublic(IgniteExceptionMapperUtil.java:141)
>       at 
> org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:137)
>       at 
> org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.mapToPublicException(IgniteExceptionMapperUtil.java:88)
>       at 
> org.apache.ignite.internal.lang.IgniteExceptionMapperUtil.lambda$convertToPublicFuture$3(IgniteExceptionMapperUtil.java:178)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>       at 
> org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplica$7(ReplicaService.java:249)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:586)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.send0(DefaultMessagingService.java:272)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:215)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:228)
>       at 
> org.apache.ignite.internal.network.MessagingService.respond(MessagingService.java:142)
>       at 
> org.apache.ignite.internal.replicator.ReplicaManager.lambda$handleReplicaRequest$5(ReplicaManager.java:460)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>       at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyCmdWithRetryOnSafeTimeReorderException$117(PartitionReplicaListener.java:2680)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at 
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
>       at 
> org.apache.ignite.internal.raft.RaftGroupServiceImpl.handleSmErrorResponse(RaftGroupServiceImpl.java:749)
>       at 
> org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$40(RaftGroupServiceImpl.java:613)
>       at 
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
>       at 
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
>       at 
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
>       at 
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:586)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.send0(DefaultMessagingService.java:272)
>       at 
> org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:215)
>       at 
> org.apache.ignite.internal.network.MessagingService.respond(MessagingService.java:115)
>       at 
> org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$NetworkRpcContext.sendResponse(IgniteRpcServer.java:245)
>       at 
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor.sendSMError(ActionRequestProcessor.java:302)
>       at 
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor.sendResponse(ActionRequestProcessor.java:257)
>       at 
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor$1.result(ActionRequestProcessor.java:187)
>       at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine$1$1.result(JraftServerImpl.java:770)
>       at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWriteBusy$2(PartitionListener.java:280)
>       at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
>       at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWriteBusy(PartitionListener.java:199)
>       at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.onWrite(PartitionListener.java:192)
>       at 
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:731)
>       at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:571)
>       at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:539)
>       at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:458)
>       at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:131)
>       at 
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:125)
>       at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:326)
>       at 
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:283)
>       at 
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
>       at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.ignite.internal.storage.StorageException: IGN-STORAGE-1 
> TraceId:60b4295a-8c18-4cdb-93e8-266bc9aaed88 Error while executing 
> addWriteCommitted: [rowId=RowId [partitionId=13, 
> uuid=00000191-eb5c-824c-7a07-fa1210b49ed8], tableId=10, partitionId=13]
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.lambda$addWriteCommitted$13(AbstractPageMemoryMvPartitionStorage.java:532)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:659)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.addWriteCommitted(AbstractPageMemoryMvPartitionStorage.java:515)
>       at 
> org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.addWriteCommitted(SnapshotAwarePartitionDataStorage.java:135)
>       at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.tryProcessRow(StorageUpdateHandler.java:162)
>       at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.lambda$handleUpdate$0(StorageUpdateHandler.java:114)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$0(PersistentPageMemoryMvPartitionStorage.java:180)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:659)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:170)
>       at 
> org.apache.ignite.internal.table.distributed.raft.snapshot.outgoing.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:76)
>       at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.handleUpdate(StorageUpdateHandler.java:109)
>       at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.handleUpdateCommand(PartitionListener.java:327)
>       at 
> org.apache.ignite.internal.table.distributed.raft.PartitionListener.lambda$onWriteBusy$2(PartitionListener.java:242)
>       ... 14 more
> Caused by: org.apache.ignite.internal.pagememory.tree.CorruptedTreeException: 
> IGN-CMN-65535 TraceId:60b4295a-8c18-4cdb-93e8-266bc9aaed88 B+Tree is 
> corrupted [groupId=10, pageIds=[563005788186369], groupName=10, msg=Runtime 
> failure on search row: 
> org.apache.ignite.internal.storage.pagememory.mv.VersionChainKey@7cda74e0]
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.corruptedTreeException(BplusTree.java:6670)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invoke(BplusTree.java:2119)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.lambda$addWriteCommitted$13(AbstractPageMemoryMvPartitionStorage.java:524)
>       ... 26 more
> Caused by: java.lang.AssertionError: Checkpoint must be in progress
>       at 
> org.apache.ignite.internal.pagememory.persistence.checkpoint.CheckpointManager.writePageToDeltaFilePageStore(CheckpointManager.java:312)
>       at 
> org.apache.ignite.internal.storage.pagememory.PersistentPageMemoryDataRegion.lambda$start$0(PersistentPageMemoryDataRegion.java:110)
>       at 
> org.apache.ignite.internal.pagememory.persistence.replacement.DelayedDirtyPageWrite.finishReplacement(DelayedDirtyPageWrite.java:104)
>       at 
> org.apache.ignite.internal.pagememory.persistence.PersistentPageMemory.allocatePageNoReuse(PersistentPageMemory.java:593)
>       at 
> org.apache.ignite.internal.pagememory.PageIdAllocator.allocatePage(PageIdAllocator.java:111)
>       at 
> org.apache.ignite.internal.pagememory.PageIdAllocator.allocatePage(PageIdAllocator.java:70)
>       at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.allocateDataPage(FreeListImpl.java:491)
>       at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.writeSinglePage(FreeListImpl.java:621)
>       at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.insertDataRow(FreeListImpl.java:500)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.insertRowVersion(AbstractPageMemoryMvPartitionStorage.java:412)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AddWriteCommittedInvokeClosure.insertCommittedRowVersion(AddWriteCommittedInvokeClosure.java:137)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AddWriteCommittedInvokeClosure.call(AddWriteCommittedInvokeClosure.java:99)
>       at 
> org.apache.ignite.internal.storage.pagememory.mv.AddWriteCommittedInvokeClosure.call(AddWriteCommittedInvokeClosure.java:43)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree$Invoke.invokeClosure(BplusTree.java:4268)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invokeDown(BplusTree.java:2187)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invokeDown(BplusTree.java:2171)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invokeDown(BplusTree.java:2171)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invokeDown(BplusTree.java:2171)
>       at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invoke(BplusTree.java:2093)
>       ... 27 more
> {noformat}
> h2. {color:#DE350B}Update:{color}
> The problem lies in an incorrect implementation of page replacement. First, 
> replacement can happen after the checkpoint has completed. Second, it can 
> happen after the fsync of files. Third, it can disrupt the operation of 
> delta files. This needs to be fixed.
> What needs to be done:
> # Get rid of delayed page replacement, since its implementation and use can 
> lead to bugs.
> # Perform page replacement only during a checkpoint and before the fsync of 
> delta files.
> # Start the fsync of delta files only after all page replacements have 
> completed, and prohibit page replacement from that point on.
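
The ordering described in items 2 and 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ignite fix: the class PageReplacementGate and all of its method names are hypothetical and do not exist in the Ignite code base. The sketch allows replacement only between checkpoint start and the fsync of delta files, and the fsync side first prohibits new replacements and then waits for the ones still in flight.

```java
/**
 * Hypothetical gate (not an Ignite class): page replacement is permitted only
 * while a checkpoint is in progress and before the fsync of delta files.
 */
final class PageReplacementGate {
    /** True only between checkpoint start and the fsync of delta files. */
    private boolean replacementAllowed;

    /** Number of page replacements that have started but not yet finished. */
    private int inFlight;

    /** Called by the checkpointer when a checkpoint starts. */
    synchronized void onCheckpointStarted() {
        replacementAllowed = true;
    }

    /** Returns false if replacement is currently prohibited; the caller must then not replace a page. */
    synchronized boolean tryBeginReplacement() {
        if (!replacementAllowed) {
            return false;
        }
        inFlight++;
        return true;
    }

    /** Must be called once after every successful tryBeginReplacement(). */
    synchronized void endReplacement() {
        if (inFlight <= 0) {
            throw new IllegalStateException("No replacement in progress");
        }
        inFlight--;
        if (inFlight == 0) {
            notifyAll(); // wake up a checkpointer waiting to start fsync
        }
    }

    /**
     * Called by the checkpointer right before the fsync of delta files:
     * prohibits new replacements, then waits for in-flight ones to finish.
     */
    synchronized void blockReplacementsBeforeFsync() {
        replacementAllowed = false;
        boolean interrupted = false;
        while (inFlight > 0) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting, restore the flag afterwards
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();
        }
    }

    /** Current number of in-flight replacements (for diagnostics). */
    synchronized int inFlight() {
        return inFlight;
    }
}
```

With such a gate, a replacement attempted outside the allowed window is refused up front instead of reaching the delta file write and failing with "Checkpoint must be in progress". What the caller does on refusal (for example, waiting for the next checkpoint) is left open here.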



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
