[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations
[ https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-20279:
-----------------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> Reordering of altering zone operations
> --------------------------------------
>
>                 Key: IGNITE-20279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Sergey Uttsel
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is shown in the test, where several zone change operations occur.
> In this case, the operations can be reordered and left incomplete at the end
> of the test. The "Received update for replicas number" messages appear in the
> test log in the wrong order. The reproducer is based on
> ItRebalanceDistributedTest#testThreeQueuedRebalances:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
>     Node node = getNode(0);
>     createZone(node, ZONE_NAME, 1, 1);
>     createTable(node, ZONE_NAME, TABLE_NAME);
>     assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     waitPartitionAssignmentsSyncedToExpected(0, 2);
>     checkPartitionNodes(0, 2);
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations
[ https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Uttsel updated IGNITE-20279:
-----------------------------------
    Description:
The issue is shown in the test, where several zone change operations occur. In this case, the operations can be reordered and left incomplete at the end of the test. The "Received update for replicas number" messages appear in the test log in the wrong order. The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances:
{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
    Node node = getNode(0);
    createZone(node, ZONE_NAME, 1, 1);
    createTable(node, ZONE_NAME, TABLE_NAME);
    assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    waitPartitionAssignmentsSyncedToExpected(0, 2);
    checkPartitionNodes(0, 2);
}
{code}

  was:
The issue is shown in the test, where several zone change operations occur. In this case, the operations can be reordered and left incomplete at the end of the test. The "Received update for replicas number" messages appear in the test log in the wrong order. The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the test log:
{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
    Node node = getNode(0);
    createZone(node, ZONE_NAME, 1, 1);
    createTable(node, ZONE_NAME, TABLE_NAME);
    assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    waitPartitionAssignmentsSyncedToExpected(0, 2);
    checkPartitionNodes(0, 2);
}
{code}

> Reordering of altering zone operations
> --------------------------------------
>
>                 Key: IGNITE-20279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Sergey Uttsel
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is shown in the test, where several zone change operations occur.
> In this case, the operations can be reordered and left incomplete at the end
> of the test. The "Received update for replicas number" messages appear in the
> test log in the wrong order. The reproducer is based on
> ItRebalanceDistributedTest#testThreeQueuedRebalances:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
>     Node node = getNode(0);
>     createZone(node, ZONE_NAME, 1, 1);
>     createTable(node, ZONE_NAME, TABLE_NAME);
>     assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     waitPartitionAssignmentsSyncedToExpected(0, 2);
>     checkPartitionNodes(0, 2);
> }
> {code}
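
The description above says that a series of zone alterations can be reordered and can still be incomplete when the test ends. As a minimal sketch of the ordering the test expects, the following self-contained Java example chains asynchronous operations so that each starts only after the previous one has completed, and then waits for the last one. It is an illustration only, not Ignite code: `alterZoneAsync`, `runInOrder`, and the in-memory `applied` list are hypothetical stand-ins for the real `alterZone` helper and cluster state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SequentialAlterSketch {
    /**
     * Hypothetical stand-in for an asynchronous zone alteration.
     * It only records the requested replica count on a pool thread.
     */
    static CompletableFuture<Void> alterZoneAsync(ExecutorService pool, List<Integer> applied, int replicas) {
        return CompletableFuture.runAsync(() -> {
            synchronized (applied) {
                applied.add(replicas);
            }
        }, pool);
    }

    /** Runs the alterations strictly one after another and waits for the last one. */
    static List<Integer> runInOrder(int... replicaCounts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Integer> applied = new ArrayList<>();
        try {
            CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
            for (int replicas : replicaCounts) {
                // thenCompose starts the next operation only when the previous future is done.
                chain = chain.thenCompose(ignored -> alterZoneAsync(pool, applied, replicas));
            }
            // Wait for the final operation as well, so nothing is left in flight.
            chain.get(10, TimeUnit.SECONDS);
            synchronized (applied) {
                return new ArrayList<>(applied);
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runInOrder(2, 3, 2, 3, 2));
    }
}
```

Submitting the same operations to the pool without chaining would give no ordering guarantee, which is the kind of reordering the issue reports.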
[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations
[ https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Uttsel updated IGNITE-20279:
-----------------------------------
    Reviewer: Mirza Aliev

> Reordering of altering zone operations
> --------------------------------------
>
>                 Key: IGNITE-20279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Sergey Uttsel
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is shown in the test, where several zone change operations occur.
> In this case, the operations can be reordered and left incomplete at the end
> of the test. The "Received update for replicas number" messages appear in the
> test log in the wrong order. The reproducer is based on
> ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the
> test log:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
>     Node node = getNode(0);
>     createZone(node, ZONE_NAME, 1, 1);
>     createTable(node, ZONE_NAME, TABLE_NAME);
>     assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     waitPartitionAssignmentsSyncedToExpected(0, 2);
>     checkPartitionNodes(0, 2);
> }
> {code}
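
The quoted description notes that "Received update for replicas number" messages show up in the wrong order. One common way to make a listener tolerate out-of-order notifications is to tag each update with a monotonically increasing revision and ignore anything older than what has already been applied. The sketch below illustrates that general technique under stated assumptions; it is not the fix applied in Ignite, and `RevisionGuardSketch`, `onReplicasUpdate`, and the revision numbers are hypothetical.

```java
public class RevisionGuardSketch {
    /** Revision of the last applied update; 0 means nothing has been applied yet. */
    private long lastRevision;

    /** Replica count currently in effect (assumed initial value of 1). */
    private int replicas = 1;

    /**
     * Applies the update only if its revision is newer than the last applied one.
     *
     * @return true if the update was applied, false if it was stale and ignored.
     */
    public synchronized boolean onReplicasUpdate(long revision, int newReplicas) {
        if (revision <= lastRevision) {
            return false; // A newer update was already applied; drop the late arrival.
        }
        lastRevision = revision;
        replicas = newReplicas;
        return true;
    }

    public synchronized int replicas() {
        return replicas;
    }

    public static void main(String[] args) {
        RevisionGuardSketch guard = new RevisionGuardSketch();
        guard.onReplicasUpdate(1, 2);
        guard.onReplicasUpdate(3, 2); // Arrives before revision 2.
        guard.onReplicasUpdate(2, 3); // Stale, ignored.
        System.out.println(guard.replicas());
    }
}
```

With this guard the final state depends on the highest revision seen rather than on arrival order, so a late delivery of an older update cannot overwrite a newer one.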
[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations
[ https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Uttsel updated IGNITE-20279:
-----------------------------------
    Description:
The issue is shown in the test, where several zone change operations occur. In this case, the operations can be reordered and left incomplete at the end of the test. The "Received update for replicas number" messages appear in the test log in the wrong order. The reproducer is based on ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the test log:
{code:java}
@Test
void testThreeQueuedRebalances() throws Exception {
    Node node = getNode(0);
    createZone(node, ZONE_NAME, 1, 1);
    createTable(node, ZONE_NAME, TABLE_NAME);
    assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    alterZone(node, ZONE_NAME, 3);
    alterZone(node, ZONE_NAME, 2);
    waitPartitionAssignmentsSyncedToExpected(0, 2);
    checkPartitionNodes(0, 2);
}
{code}

  was:
The issue is shown in the test, where several zone change operations occur.
On my laptop, the test ({{ItRebalanceDistributedTest#testThreeQueuedRebalances}}) fails at least twice in 30 runs.
# The first issue that I see is that the test does not wait for the last zone change operation, alterZone(node, ZONE_NAME, 2), to complete. In this case, the operation can be incomplete at the end of the test.
# The second issue is that the next operation may start before the previous one has completed.

{noformat}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor] Error occurred when processing a watch event
org.apache.ignite.lang.IgniteInternalException: Raft group on the node is already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer [consistentId=irdt_ttqr_2, idx=0]]]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259) ~[main/:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
{noformat}

> Reordering of altering zone operations
> --------------------------------------
>
>                 Key: IGNITE-20279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Assignee: Sergey Uttsel
>            Priority: Major
>              Labels: ignite-3
>
> The issue is shown in the test, where several zone change operations occur.
> In this case, the operations can be reordered and left incomplete at the end
> of the test. The "Received update for replicas number" messages appear in the
> test log in the wrong order. The reproducer is based on
> ItRebalanceDistributedTest#testThreeQueuedRebalances. See exception in the
> test log:
> {code:java}
> @Test
> void testThreeQueuedRebalances() throws Exception {
>     Node node = getNode(0);
>     createZone(node, ZONE_NAME, 1, 1);
>     createTable(node, ZONE_NAME, TABLE_NAME);
>     assertTrue(waitForCondition(() -> getPartitionClusterNodes(node, 0).size() == 1, AWAIT_TIMEOUT_MILLIS));
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     alterZone(node, ZONE_NAME, 2);
>     alterZone(node, ZONE_NAME, 3);
>     a
[jira] [Updated] (IGNITE-20279) Reordering of altering zone operations
[ https://issues.apache.org/jira/browse/IGNITE-20279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladislav Pyatkov updated IGNITE-20279:
---------------------------------------
    Description:
The issue is shown in the test, where several zone change operations occur.
On my laptop, the test ({{ItRebalanceDistributedTest#testThreeQueuedRebalances}}) fails at least twice in 30 runs.
# The first issue that I see is that the test does not wait for the last zone change operation, alterZone(node, ZONE_NAME, 2), to complete. In this case, the operation can be incomplete at the end of the test.
# The second issue is that the next operation may start before the previous one has completed.

{noformat}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor] Error occurred when processing a watch event
org.apache.ignite.lang.IgniteInternalException: Raft group on the node is already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer [consistentId=irdt_ttqr_2, idx=0]]]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259) ~[main/:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
{noformat}

  was:
The issue is shown in the test, where several zone change operations occur.
On my laptop, the test (ItRebalanceDistributedTest#testThreeQueuedRebalances) fails at least twice in 30 runs.
# The first issue that I see is that the test does not wait for the last zone change operation, alterZone(node, ZONE_NAME, 2), to complete. In this case, the operation can be incomplete at the end of the test.
# The second issue is that the next operation may start before the previous one has completed.

{noformat}
2023-08-24T16:58:51,328][ERROR][%irdt_ttqr_2%tableManager-io-10][WatchProcessor] Error occurred when processing a watch event
org.apache.ignite.lang.IgniteInternalException: Raft group on the node is already started [nodeId=RaftNodeId [groupId=1_part_0, peer=Peer [consistentId=irdt_ttqr_2, idx=0]]]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNodeInternal(Loza.java:342) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:230) ~[main/:?]
    at org.apache.ignite.internal.raft.Loza.startRaftGroupNode(Loza.java:203) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.startPartitionRaftGroupNode(TableManager.java:2361) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$98(TableManager.java:2261) ~[main/:?]
    at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:922) ~[main/:?]
    at org.apache.ignite.internal.table.distributed.TableManager.lambda$handleChangePendingAssignmentEvent$99(TableManager.java:2259) ~[main/:?]
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:834) ~[?:?]
{noformat}

> Reordering of altering zone operations
> --------------------------------------
>
>                 Key: IGNITE-20279
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20279
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> The issue is shown in the test, where several zone change operations occur.
> On my laptop, the test
> ({{ItRebalanceDistributedTest#testThreeQueuedRebalances}}) fails at least
> twice in 30 runs.
> # The first issue that I see is that the test does not wait for the last
> zone change operation, alterZone(node, ZONE_NAME, 2), to complete. In this
> case, the oper