[ https://issues.apache.org/jira/browse/IGNITE-17959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640874#comment-17640874 ]
Alexander Lapin commented on IGNITE-17959: ------------------------------------------ [~Sergey Uttsel] LGTM > ReplicaUnavailableException: Replica is not ready. > -------------------------------------------------- > > Key: IGNITE-17959 > URL: https://issues.apache.org/jira/browse/IGNITE-17959 > Project: Ignite > Issue Type: Bug > Components: general > Affects Versions: 3.0.0-alpha5 > Reporter: Evgeny Stanilovsky > Assignee: Sergey Uttsel > Priority: Major > Labels: ignite-3 > Attachments: err.log, err2.log > > Time Spent: 2h 40m > Remaining Estimate: 0h > > h2. *Motivation* > Frequently in long running tests can be observed (full msg in attach) : > {noformat} > Caused by: > org.apache.ignite.internal.replicator.exception.ReplicaUnavailableException: > IGN-REP-5 TraceId:82267e0a-aca2-47a3-806e-7922ed61d6d3 Replica is not ready > [replicationGroupId=b5b3a2e5-1342-4a90-97b3-a46e9509a1d6_part_5, > nodeName=iist_n_1] > {noformat} > check for example test: ItIndexSpoolTest.test, numerous runs or run until > failure will highlight the problem. > Additionally we can observe (err2.log attached): > {noformat} > 2022-10-24 13:23:52:308 +0300 > [WARNING][%iist_n_1%Raft-Group-Client-4][RaftGroupServiceImpl] Recoverable > error during the request type=ActionRequestImpl occurred (will be retried on > the randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > at > java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > at > java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.util.concurrent.TimeoutException > ... 7 more > 2022-10-24 13:24:13:437 +0300 > [WARNING][%iist_n_1%Raft-Group-Client-2][RaftGroupServiceImpl] Recoverable > error during the request type=ActionRequestImpl occurred (will be retried on > the randomly selected node): > java.util.concurrent.CompletionException: > java.util.concurrent.TimeoutException > at > java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:367) > at > java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:376) > at > java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:1019) > at > java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) > at > java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) > at > java.base/java.util.concurrent.CompletableFuture$Timeout.run(CompletableFuture.java:2792) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.util.concurrent.TimeoutException > ... 7 more > {noformat} > h2. *Implementation Notes* > The root cause is that there is a race between a table creation and an > operation on the table. So it's possible that Replica isn't created in > ReplicaManager when rw/ro operation is invoked. It's need to retry the > operation several times if ReplicaUnavailableException is thrown. -- This message was sent by Atlassian Jira (v8.20.10#820010)