[ https://issues.apache.org/jira/browse/IGNITE-22191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Chudov reassigned IGNITE-22191:
-------------------------------------

    Assignee: Denis Chudov  (was: Kirill Tkalenko)

Fix AssertionError in IndexBuildController#mvPartitionStorage
-------------------------------------------------------------

                Key: IGNITE-22191
                URL: https://issues.apache.org/jira/browse/IGNITE-22191
            Project: Ignite
         Issue Type: Improvement
           Reporter: Kirill Tkalenko
           Assignee: Denis Chudov
           Priority: Major
             Labels: ignite-3
            Fix For: 3.0.0-beta2

         Time Spent: 20m
 Remaining Estimate: 0h

Need to fix the error shown in the following stack trace:
{noformat}
[2024-05-07T15:31:12,023][ERROR][%irt_trtcfz_0%metastorage-watch-executor-3][FailureProcessor] Critical system error detected. Will be handled accordingly to configured handler [hnd=NoOpFailureHandler [], failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.AssertionError: 8_part_1]]
java.lang.AssertionError: 8_part_1
	at org.apache.ignite.internal.index.IndexBuildController.mvPartitionStorage(IndexBuildController.java:345) ~[ignite-index-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.index.IndexBuildController.scheduleBuildIndexAfterDisasterRecovery(IndexBuildController.java:314) ~[ignite-index-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.index.IndexBuildController.lambda$tryScheduleBuildIndexesForNewPrimaryReplica$14(IndexBuildController.java:219) ~[ignite-index-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:869) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.index.IndexBuildController.tryScheduleBuildIndexesForNewPrimaryReplica(IndexBuildController.java:208) ~[ignite-index-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.index.IndexBuildController.lambda$onPrimaryReplicaElected$11(IndexBuildController.java:187) ~[ignite-index-3.0.0-SNAPSHOT.jar:?]
	at java.base/java.util.concurrent.CompletableFuture.uniAcceptNow(CompletableFuture.java:753) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.uniAcceptStage(CompletableFuture.java:731) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.thenAccept(CompletableFuture.java:2108) ~[?:?]
	at org.apache.ignite.internal.index.IndexBuildController.lambda$onPrimaryReplicaElected$12(IndexBuildController.java:187) ~[ignite-index-3.0.0-SNAPSHOT.jar:?]
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?]
	at org.apache.ignite.internal.util.IgniteUtils.lambda$copyStateTo$9(IgniteUtils.java:1273) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2251) ~[?:?]
	at org.apache.ignite.internal.causality.BaseVersionedValue.copyState(BaseVersionedValue.java:315) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.causality.BaseVersionedValue.complete(BaseVersionedValue.java:201) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.causality.IncrementalVersionedValue.lambda$completeInternal$2(IncrementalVersionedValue.java:256) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2251) ~[?:?]
	at org.apache.ignite.internal.causality.IncrementalVersionedValue.completeInternal(IncrementalVersionedValue.java:256) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.causality.IncrementalVersionedValue.lambda$dependingOn$0(IncrementalVersionedValue.java:76) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.causality.BaseVersionedValue.lambda$notifyCompletionListeners$6(BaseVersionedValue.java:337) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:883) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2251) ~[?:?]
	at org.apache.ignite.internal.causality.BaseVersionedValue.notifyCompletionListeners(BaseVersionedValue.java:332) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
	at org.apache.ignite.internal.causality.BaseVersionedValue.complete(BaseVersionedValue.java:210) ~[ignite-core-3.0.0-SNAPSHOT.jar:?]
{noformat}

h2. What I managed to find out.

The error reproduces intermittently (not on every run) in *org.apache.ignite.internal.rebalance.ItRebalanceTest#testRebalanceTablesCounterForZone*.

What happens in the test:
# 3 nodes start.
# 3 tables are created, each with 3 partitions and 3 replicas.
# We wait for rebalancing to finish for all tables.
# We change the number of replicas for the zone to 2.
# We wait for rebalancing to finish for all tables.

Let's say we have nodes A, B and C.
The problem occurs when the number of replicas changes from 3 to 2: node A has already left the assignments for a partition, but an event then arrives saying that node A has been elected primary replica (lease holder) for that same partition, even though it is no longer in its assignments.
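To make the race concrete, here is a minimal, self-contained Java sketch. It is a toy model, not Ignite code: the class `IndexBuildRaceSketch` and its methods (`hostPartition`, `leaveAssignments`, `onPrimaryReplicaElected`, `onPrimaryReplicaElectedUnguarded`) are hypothetical, and the MV partition storage is represented by a placeholder object. It assumes, as the ticket describes, that leaving the assignments clears the local partition data and that a stale "primary replica elected" event can still be handled afterwards. The unguarded handler fails like the stack trace (an `AssertionError` whose message is the partition id, which `8_part_1` appears to be); the guarded handler shows the two checks this ticket discusses: skipping the index build when the storage is null (the interim fix) and ignoring the event when the local node is not in the assignments (the third proposed fix).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the IGNITE-22191 race. All names are hypothetical; this is not Ignite 3 code. */
public class IndexBuildRaceSketch {
    /** Name of the local node, e.g. "A". */
    private final String localNode;

    /** Partition id -> nodes currently in the partition assignments. */
    private final Map<String, Set<String>> assignments = new HashMap<>();

    /** Partition id -> placeholder for the local MV partition storage. */
    private final Map<String, Object> storages = new HashMap<>();

    public IndexBuildRaceSketch(String localNode) {
        this.localNode = localNode;
    }

    /** Records the assignments; creates local storage only if the local node is assigned. */
    public void hostPartition(String partitionId, Set<String> nodes) {
        assignments.put(partitionId, new HashSet<>(nodes));
        if (nodes.contains(localNode)) {
            storages.put(partitionId, new Object());
        }
    }

    /** Rebalance: the local node leaves the assignments and its partition data is cleared. */
    public void leaveAssignments(String partitionId) {
        Set<String> nodes = assignments.get(partitionId);
        if (nodes != null) {
            nodes.remove(localNode);
        }
        storages.remove(partitionId);
    }

    /** Mirrors the failing lookup with an explicit AssertionError carrying the partition id. */
    private Object mvPartitionStorage(String partitionId) {
        Object storage = storages.get(partitionId);
        if (storage == null) {
            throw new AssertionError(partitionId);
        }
        return storage;
    }

    /** Handler without guards: throws if the event is stale and the storage is gone. */
    public String onPrimaryReplicaElectedUnguarded(String partitionId) {
        mvPartitionStorage(partitionId);
        return "index build scheduled";
    }

    /** Guarded handler; checkAssignments toggles the check from proposed fix 3. */
    public String onPrimaryReplicaElected(String partitionId, boolean checkAssignments) {
        if (checkAssignments) {
            Set<String> nodes = assignments.get(partitionId);
            if (nodes == null || !nodes.contains(localNode)) {
                return "ignored: not in assignments";
            }
        }
        // Interim fix: skip the index build instead of asserting when the storage is gone.
        if (storages.get(partitionId) == null) {
            return "skipped: no storage";
        }
        return "index build scheduled";
    }

    public static void main(String[] args) {
        IndexBuildRaceSketch nodeA = new IndexBuildRaceSketch("A");
        nodeA.hostPartition("8_part_1", Set.of("A", "B", "C"));

        // Replicas go from 3 to 2: A leaves the assignments and its data is cleared,
        // then a stale "A was elected primary" event is handled.
        nodeA.leaveAssignments("8_part_1");

        try {
            nodeA.onPrimaryReplicaElectedUnguarded("8_part_1");
        } catch (AssertionError e) {
            System.out.println("unguarded: AssertionError: " + e.getMessage());
        }
        System.out.println("null check only: " + nodeA.onPrimaryReplicaElected("8_part_1", false));
        System.out.println("assignments check: " + nodeA.onPrimaryReplicaElected("8_part_1", true));
    }
}
```

Running `main` prints `unguarded: AssertionError: 8_part_1`, then `null check only: skipped: no storage`, then `assignments check: ignored: not in assignments`. Either guard prevents the assertion, but they differ in intent: the null check only tolerates the consequence of the stale event, while the assignments check rejects the stale event itself.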
The assertion fails because, according to the rebalancing logic, the partition data is cleared after a node leaves the assignments, so node A no longer has an MV partition storage for that partition by the time the election event is handled.

In this ticket, to make the tests pass, I will add a check that skips building indexes if the storage is *null*; the root cause will have to be fixed properly in IGNITE-22202.

h2. Ideas on how to fix the problem:
# Correct the rebalancing logic: if a node is the primary replica for a partition, do not remove it from the assignments.
# Correct the primary replica election logic: if a node is no longer in the partition assignments, do not elect it as the primary replica, or do not generate an event about it, or re-elect, or something else.
# Improve the logic in *IndexBuildController*: when a primary replica election event is received, ignore it if the node is not in the partition assignments.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)