[ https://issues.apache.org/jira/browse/IGNITE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824772#comment-17824772 ]
Denis Chudov edited comment on IGNITE-21382 at 3/8/24 12:49 PM:
----------------------------------------------------------------

The problem is that NodeUtils#transferPrimary does not complete within 30 seconds. I propose rewriting this method without RaftGroupService#transferLeadership. The primary replica doesn't have to be colocated with the raft leader, and we can use this in tests.

We have StopLeaseProlongationMessage, which is intended to stop lease prolongation for a replica that has lost its ability to serve as a primary (or at least as a preferred one). NodeUtils#transferPrimary can be reworked so that it sends this message to the node that is the placement driver active actor (or, more simply for tests, to every node; the message will be ignored on the other nodes). The only problem is that StopLeaseProlongationMessage#redirectProposal is not handled correctly by the placement driver; this is a bug and should be fixed. After that, we will be able to propose any node as the new primary and so choose the new primary deliberately. Until IGNITE-18879 is done, the LeaseUpdater chooses the proposed leaseholder whenever one is present; it never enforces another node, so that possibility can be neglected. After that, IGNITE-20365 might be closed as well.

was (Author: denis chudov):
The problem is that NodeUtils#transferPrimary does not complete within 30 seconds. I propose rewriting this method without RaftGroupService#transferLeadership. The primary replica doesn't have to be colocated with the raft leader, and we can use this in tests.
We have StopLeaseProlongationMessage, which is intended to stop lease prolongation for a replica that has lost its ability to serve as a primary (or at least as a preferred one). NodeUtils#transferPrimary can be reworked so that it sends this message to the node that is the placement driver active actor (or, more simply for tests, to every node; the message will be ignored on the other nodes). The only problem is that StopLeaseProlongationMessage#redirectProposal is not handled correctly by the placement driver; this is a bug and should be fixed. After that, we will be able to propose any node as the new primary and so choose the new primary deliberately. After that, IGNITE-20365 might be closed as well.

> Test ItPrimaryReplicaChoiceTest.testPrimaryChangeLongHandling is flaky
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-21382
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21382
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test fails while waiting for the primary replica change. This issue is
> also reproduced locally, at least once in five runs.
> {code}
> assertThat(primaryChangeTask, willCompleteSuccessfully());
> {code}
> {noformat}
> java.lang.AssertionError: java.util.concurrent.TimeoutException
>     at org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.matchesSafely(CompletableFutureMatcher.java:78)
>     at org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.matchesSafely(CompletableFutureMatcher.java:35)
>     at org.hamcrest.TypeSafeMatcher.matches(TypeSafeMatcher.java:67)
>     at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:10)
>     at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>     at org.apache.ignite.internal.placementdriver.ItPrimaryReplicaChoiceTest.testPrimaryChangeLongHandling(ItPrimaryReplicaChoiceTest.java:179)
> {noformat}
> This test will be muted on TC to prevent future failures.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
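
For illustration only, the idea from the comment (broadcast a stop-prolongation message carrying a redirect proposal to every node, and let only the placement driver active actor act on it) might be modelled roughly as below. This is a toy, in-memory sketch and not Ignite code: the classes PrimaryTransferSketch, StopProlongation and Node, their fields, and the transferPrimary signature are all hypothetical stand-ins. It also assumes, as the comment states, that the proposed leaseholder is always chosen when one is present.

```java
import java.util.Arrays;
import java.util.List;

public class PrimaryTransferSketch {
    /** Hypothetical, simplified stand-in for StopLeaseProlongationMessage. */
    public static final class StopProlongation {
        final String redirectProposal; // node proposed as the new primary, may be null

        public StopProlongation(String redirectProposal) {
            this.redirectProposal = redirectProposal;
        }
    }

    /** Hypothetical node; only the placement driver active actor reacts to the message. */
    public static final class Node {
        final String name;
        final boolean activeActor;
        String leaseholder;  // meaningful only on the active actor
        int ignoredMessages; // counts messages dropped by non-actor nodes

        public Node(String name, boolean activeActor, String leaseholder) {
            this.name = name;
            this.activeActor = activeActor;
            this.leaseholder = leaseholder;
        }

        void receive(StopProlongation msg) {
            if (!activeActor) {
                ignoredMessages++; // other nodes simply ignore the message
                return;
            }
            // Assumption from the comment: the proposed leaseholder is chosen whenever it is present.
            if (msg.redirectProposal != null) {
                leaseholder = msg.redirectProposal;
            }
        }
    }

    /** Broadcasts the message to every node and returns the leaseholder seen by the active actor. */
    public static String transferPrimary(List<Node> cluster, String newPrimary) {
        StopProlongation msg = new StopProlongation(newPrimary);
        for (Node node : cluster) {
            node.receive(msg);
        }
        for (Node node : cluster) {
            if (node.activeActor) {
                return node.leaseholder;
            }
        }
        throw new IllegalStateException("no active actor in the cluster");
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
                new Node("node-0", true, "node-0"),
                new Node("node-1", false, null),
                new Node("node-2", false, null));
        System.out.println("new primary: " + transferPrimary(cluster, "node-2"));
    }
}
```

Broadcasting keeps the test helper simple because it does not need to discover which node currently hosts the active actor; the cost is a few ignored messages, which should be acceptable in tests.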