[ 
https://issues.apache.org/jira/browse/IGNITE-21382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824772#comment-17824772
 ] 

Denis Chudov edited comment on IGNITE-21382 at 3/8/24 12:49 PM:
----------------------------------------------------------------

The problem is that NodeUtils#transferPrimary does not complete within 30 
seconds. I propose rewriting this method so that it does not use 
RaftGroupService#transferLeadership. The primary replica doesn't have to be 
colocated with the raft leader, and we can rely on this in tests. We have 
StopLeaseProlongationMessage, which is intended to stop lease prolongation for 
a replica that has lost its ability to serve as a primary (or at least as a 
preferred one). NodeUtils#transferPrimary can be reworked to send this message 
to the node that is the placement driver active actor (or, which is simpler 
for tests, to every node; the message will be ignored on the other nodes).

The only problem is that StopLeaseProlongationMessage#redirectProposal is not 
handled by the placement driver correctly. This is a bug and should be fixed. 
After that we will be able to propose any node as the new primary and so 
choose the new primary deliberately. Until IGNITE-18879 is done, the 
LeaseUpdater chooses the proposed leaseholder whenever one is present; it 
never enforces another node, so that possibility can be neglected.

After that, IGNITE-20365 might be closed as well.
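
To illustrate the proposed flow, here is a minimal, self-contained sketch. It 
is a toy model, not Ignite code: the classes Node and StopLeaseProlongation 
and the transferPrimary helper below are hypothetical stand-ins for the real 
placement driver, StopLeaseProlongationMessage and NodeUtils#transferPrimary, 
and the real signatures may differ. It assumes the message is broadcast to 
every node, only the active actor reacts, and the redirect proposal is always 
accepted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransferPrimarySketch {
    /** Hypothetical stand-in for StopLeaseProlongationMessage (two fields only). */
    static final class StopLeaseProlongation {
        final String groupId;
        final String redirectProposal; // may be null: no preferred new primary

        StopLeaseProlongation(String groupId, String redirectProposal) {
            this.groupId = groupId;
            this.redirectProposal = redirectProposal;
        }
    }

    /** Hypothetical node model; only the active actor tracks leaseholders. */
    static final class Node {
        final String name;
        final boolean activeActor;
        final Map<String, String> leaseholders = new HashMap<>();

        Node(String name, boolean activeActor) {
            this.name = name;
            this.activeActor = activeActor;
        }

        /** Returns true if the message was handled, false if it was ignored. */
        boolean onMessage(StopLeaseProlongation msg) {
            if (!activeActor) {
                return false; // nodes that are not the active actor ignore the message
            }
            if (msg.redirectProposal != null) {
                // Assumption: the proposed leaseholder is always chosen when present.
                leaseholders.put(msg.groupId, msg.redirectProposal);
            } else {
                leaseholders.remove(msg.groupId);
            }
            return true;
        }
    }

    /** Broadcasts the message to every node and returns the resulting leaseholder. */
    static String transferPrimary(List<Node> cluster, String groupId, String newPrimary) {
        StopLeaseProlongation msg = new StopLeaseProlongation(groupId, newPrimary);
        String result = null;
        for (Node node : cluster) {
            if (node.onMessage(msg)) {
                result = node.leaseholders.get(groupId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("node-0", false), new Node("node-1", true), new Node("node-2", false));
        cluster.get(1).leaseholders.put("part-0", "node-0"); // current primary
        System.out.println(transferPrimary(cluster, "part-0", "node-2"));
    }
}
```

In this model the test helper does not need to know which node is the active 
actor, and no raft leadership transfer is involved.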


was (Author: denis chudov):
The problem is that NodeUtils#transferPrimary is not completed in 30 seconds. I 
would propose to rewrite this method without using 
RaftGroupService#transferLeadership. Primary replica doesn't have to be 
colocated with raft leader, and we can use this in tests. We have 
StopLeaseProlongationMessage that is intended to stop lease prolongation for 
replica which lost its ability to serve as a primary (at least, a preferred 
one), and NodeUtils#transferPrimary can be reworked in a way that it sends this 
message to corresponding node that is a placement driver active actor (or, 
which is more simple for tests - just to every node, this message will be 
ignored on other nodes).

The only problem is that StopLeaseProlongationMessage#redirectProposal is not 
handled by the placement driver correctly - this is a bug and should be fixed. 
After that we will obtain an ability to propose any node as the new primary and 
so choose the new primary deliberately.

After that, IGNITE-20365 might be closed as well.

> Test ItPrimaryReplicaChoiceTest.testPrimaryChangeLongHandling is flaky
> ----------------------------------------------------------------------
>
>                 Key: IGNITE-21382
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21382
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test fails while waiting for the primary replica change. This issue is 
> also reproduced locally, at least once in five runs.
> {code}
> assertThat(primaryChangeTask, willCompleteSuccessfully());
> {code}
> {noformat}
> java.lang.AssertionError: java.util.concurrent.TimeoutException
>   at org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.matchesSafely(CompletableFutureMatcher.java:78)
>   at org.apache.ignite.internal.testframework.matchers.CompletableFutureMatcher.matchesSafely(CompletableFutureMatcher.java:35)
>   at org.hamcrest.TypeSafeMatcher.matches(TypeSafeMatcher.java:67)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:10)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at org.apache.ignite.internal.placementdriver.ItPrimaryReplicaChoiceTest.testPrimaryChangeLongHandling(ItPrimaryReplicaChoiceTest.java:179)
> {noformat}
> This test will be muted on TC to prevent future failures.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
