[ https://issues.apache.org/jira/browse/IGNITE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nikita Sivkov updated IGNITE-22005: ----------------------------------- Description: *Steps to reproduce:* Perform a long (about 2 hours) load test with a balance transfer scenario (see scenario pseudo code in attachments). *Expected result:* No errors happen. *Actual result:* Get error in server logs - {{Failed to process replica request}} {code:java} 2024-04-05 17:50:55:802 +0300 [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%JRaft-AppendEntries-Processor-2][NodeImpl] Node <193_part_15/poc-tester-SERVER-192.168.1.97-id-0> is not in active state, currTerm=2. 2024-04-05 17:50:55:805 +0300 [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%Raft-Group-Client-19][ReplicaManager] Failed to process replica request [request=TxFinishReplicaRequestImpl [commit=false, commitTimestampLong=0, enlistmentConsistencyToken=112218720633356321, groupId=123_part_21, groups=HashMap {141_part_13=poc-tester-SERVER-192.168.1.27-id-0, 139_part_9=poc-tester-SERVER-192.168.1.97-id-0, 193_part_3=poc-tester-SERVER-192.168.1.27-id-0, 19_part_23=poc-tester-SERVER-192.168.1.27-id-0, 117_part_17=poc-tester-SERVER-192.168.1.18-id-0, 45_part_9=poc-tester-SERVER-192.168.1.18-id-0, 39_part_3=poc-tester-SERVER-192.168.1.18-id-0, 77_part_4=poc-tester-SERVER-192.168.1.18-id-0, 105_part_4=poc-tester-SERVER-192.168.1.18-id-0, 123_part_21=poc-tester-SERVER-192.168.1.97-id-0, 103_part_9=poc-tester-SERVER-192.168.1.18-id-0, 161_part_15=poc-tester-SERVER-192.168.1.27-id-0, 103_part_22=poc-tester-SERVER-192.168.1.27-id-0, 89_part_10=poc-tester-SERVER-192.168.1.18-id-0, 39_part_19=poc-tester-SERVER-192.168.1.27-id-0, 149_part_13=poc-tester-SERVER-192.168.1.27-id-0, 97_part_24=poc-tester-SERVER-192.168.1.97-id-0, 83_part_9=poc-tester-SERVER-192.168.1.27-id-0, 209_part_10=poc-tester-SERVER-192.168.1.27-id-0, 185_part_5=poc-tester-SERVER-192.168.1.18-id-0, 117_part_9=poc-tester-SERVER-192.168.1.27-id-0, 105_part_22=poc-tester-SERVER-192.168.1.18-id-0}, timestampLong=112219170129903617, txId=018eaebd-88ba-0001-606d-622500000001]]. java.util.concurrent.CompletionException: org.apache.ignite.tx.TransactionException: IGN-TX-7 TraceId:cb1577e6-ec35-47f0-ab7d-56a0687344ed java.util.concurrent.TimeoutException at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932) at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyCmdWithRetryOnSafeTimeReorderException$126(PartitionReplicaListener.java:2806) at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: org.apache.ignite.tx.TransactionException: IGN-TX-7 TraceId:cb1577e6-ec35-47f0-ab7d-56a0687344ed java.util.concurrent.TimeoutException at org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$finishTransaction$70(PartitionReplicaListener.java:1867) at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) ... 16 more Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) ... 10 more Caused by: java.util.concurrent.TimeoutException ... 8 more 2024-04-05 17:50:55:807 +0300 [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%JRaft-Request-Processor-42][NodeImpl] Node <35_part_20/poc-tester-SERVER-192.168.1.97-id-0> is not in active state, currTerm=4.{code} was: *Steps to reproduce:* Perform a long (about 2 hours) load test with a balance transfer scenario (see scenario pseudo code in attachments). *Expected result:* No errors happen. *Actual result:* Get error in server logs - {{Failed to process delayed response}} {code:java} 2024-04-05 17:50:50:776 +0300 [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%JRaft-Request-Processor-1][NodeImpl] Node <27_part_23/poc-tester-SERVER-192.168.1.97-id-0> is not in active state, currTerm=2. 2024-04-05 17:50:50:778 +0300 [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%Raft-Group-Client-5][ReplicaManager] Failed to process delayed response [request=ReadWriteSingleRowReplicaRequestImpl [commitPartitionId=TablePartitionIdMessageImpl [partitionId=21, tableId=123], coordinatorId=3de6f999-7ab9-4405-aff0-ee0c7e4886ce, enlistmentConsistencyToken=112218720633356321, full=false, groupId=123_part_21, requestType=RW_UPSERT, schemaVersion=1, timestampLong=112219169796915211, transactionId=018eaebd-88ba-0001-606d-622500000001]] java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.util.concurrent.TimeoutException ... 8 more 2024-04-05 17:50:50:780 +0300 [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%JRaft-Request-Processor-27][NodeImpl] Node <99_part_6/poc-tester-SERVER-192.168.1.97-id-0> is not in active state, currTerm=3. {code} > 'Failed to process replica request' error under load with balance transfer > scenario > ----------------------------------------------------------------------------------- > > Key: IGNITE-22005 > URL: https://issues.apache.org/jira/browse/IGNITE-22005 > Project: Ignite > Issue Type: Bug > Affects Versions: 3.0.0-beta2 > Environment: Cluster of 3 nodes > Reporter: Nikita Sivkov > Priority: Major > Labels: ignite-3 > Attachments: transfer_ign3.yaml > > > *Steps to reproduce:* > Perform a long (about 2 hours) load test with a balance transfer scenario > (see scenario pseudo code in attachments). > *Expected result:* > No errors happen. > *Actual result:* > Get error in server logs - {{Failed to process replica request}} > {code:java} > 2024-04-05 17:50:55:802 +0300 > [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%JRaft-AppendEntries-Processor-2][NodeImpl] > Node <193_part_15/poc-tester-SERVER-192.168.1.97-id-0> is not in active > state, currTerm=2. > 2024-04-05 17:50:55:805 +0300 > [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%Raft-Group-Client-19][ReplicaManager] > Failed to process replica request [request=TxFinishReplicaRequestImpl > [commit=false, commitTimestampLong=0, > enlistmentConsistencyToken=112218720633356321, groupId=123_part_21, > groups=HashMap {141_part_13=poc-tester-SERVER-192.168.1.27-id-0, > 139_part_9=poc-tester-SERVER-192.168.1.97-id-0, > 193_part_3=poc-tester-SERVER-192.168.1.27-id-0, > 19_part_23=poc-tester-SERVER-192.168.1.27-id-0, > 117_part_17=poc-tester-SERVER-192.168.1.18-id-0, > 45_part_9=poc-tester-SERVER-192.168.1.18-id-0, > 39_part_3=poc-tester-SERVER-192.168.1.18-id-0, > 77_part_4=poc-tester-SERVER-192.168.1.18-id-0, > 105_part_4=poc-tester-SERVER-192.168.1.18-id-0, > 123_part_21=poc-tester-SERVER-192.168.1.97-id-0, > 103_part_9=poc-tester-SERVER-192.168.1.18-id-0, > 161_part_15=poc-tester-SERVER-192.168.1.27-id-0, > 103_part_22=poc-tester-SERVER-192.168.1.27-id-0, > 89_part_10=poc-tester-SERVER-192.168.1.18-id-0, > 39_part_19=poc-tester-SERVER-192.168.1.27-id-0, > 149_part_13=poc-tester-SERVER-192.168.1.27-id-0, > 97_part_24=poc-tester-SERVER-192.168.1.97-id-0, > 83_part_9=poc-tester-SERVER-192.168.1.27-id-0, > 209_part_10=poc-tester-SERVER-192.168.1.27-id-0, > 185_part_5=poc-tester-SERVER-192.168.1.18-id-0, > 117_part_9=poc-tester-SERVER-192.168.1.27-id-0, > 105_part_22=poc-tester-SERVER-192.168.1.18-id-0}, > timestampLong=112219170129903617, txId=018eaebd-88ba-0001-606d-622500000001]]. > java.util.concurrent.CompletionException: > org.apache.ignite.tx.TransactionException: IGN-TX-7 > TraceId:cb1577e6-ec35-47f0-ab7d-56a0687344ed > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) > at > java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) > at > java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:932) > at > java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyCmdWithRetryOnSafeTimeReorderException$126(PartitionReplicaListener.java:2806) > at > java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) > at > java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:550) > at > org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleErrorResponse$44(RaftGroupServiceImpl.java:653) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: org.apache.ignite.tx.TransactionException: IGN-TX-7 > TraceId:cb1577e6-ec35-47f0-ab7d-56a0687344ed > java.util.concurrent.TimeoutException > at > org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$finishTransaction$70(PartitionReplicaListener.java:1867) > at > java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930) > ... 16 more > Caused by: java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331) > at > java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632) > ... 10 more > Caused by: java.util.concurrent.TimeoutException > ... 8 more > 2024-04-05 17:50:55:807 +0300 > [WARNING][%poc-tester-SERVER-192.168.1.97-id-0%JRaft-Request-Processor-42][NodeImpl] > Node <35_part_20/poc-tester-SERVER-192.168.1.97-id-0> is not in active > state, currTerm=4.{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)