[ https://issues.apache.org/jira/browse/IGNITE-22774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mikhail Efremov updated IGNITE-22774:
-------------------------------------
    Epic Link: IGNITE-22313

> Remove raftClient map from ItTxTestCluster
> ------------------------------------------
>
>                 Key: IGNITE-22774
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22774
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Mikhail Efremov
>            Assignee: Mikhail Efremov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

*Description*

The goal is to remove the {{raftClient}} map from {{ItTxTestCluster}}: all raft clients are now created inside replicas through {{ReplicaManager#startReplica}}, so most usages of the map are effectively useless. The map is used in a single common case:
{code:title=|language=java|collapse=false}
assertSame(
        txTestCluster.raftClients.get(ACC_TABLE_NAME).get(0).clusterService(),
        txTestCluster.getLeader(ACC_TABLE_NAME).service()
);
{code}
The first argument gets the raft client for a table and its 0th partition, then extracts its {{ClusterService}}. The second argument, inside {{#getLeader}}, does a similar thing:
{code:title=|language=java|collapse=false}
protected Loza getLeader(String tableName) {
    var services = raftClients.get(tableName);

    Peer leader = services.get(0).leader();

    assertNotNull(leader);

    return raftServers.get(leader.consistentId());
}
{code}
That is, the second argument gets the {{ClusterService}} from the {{Loza}} instance of the node that hosts the leader's raft client.

But when clients are put into the map, only raft clients of the leaders of the partition's replication group are stored:
{code:title=|language=java|collapse=false}
if (startClient) {
    RaftGroupService service = RaftGroupServiceImpl
            .start(grpId, client, FACTORY, raftConfig, membersConf, true, executor, commandsMarshaller)
            .get(5, TimeUnit.SECONDS);

    clients.put(p, service);
} else {
    // Create temporary client to find a leader address.
    ClusterService tmpSvc = cluster.get(0);

    RaftGroupService service = RaftGroupServiceImpl
            .start(grpId, tmpSvc, FACTORY, raftConfig, membersConf, true, executor, commandsMarshaller)
            .get(5, TimeUnit.SECONDS);

    Peer leader = service.leader();

    service.shutdown();

    ClusterService leaderSrv = cluster.stream()
            .filter(node -> node.topologyService().localMember().name().equals(leader.consistentId()))
            .findAny()
            .orElseThrow();

    RaftGroupService leaderClusterSvc = RaftGroupServiceImpl
            .start(grpId, leaderSrv, FACTORY, raftConfig, membersConf, true, executor, commandsMarshaller)
            .get(5, TimeUnit.SECONDS);

    clients.put(p, leaderClusterSvc);
}
{code}
The first branch relates to tests that require a separate node as the transaction coordinator; the legacy approach demanded a separate raft client for it. This now works on the replication layer, and the first branch will be fully removed in [https://issues.apache.org/jira/browse/IGNITE-22691]. The second branch creates a temporary raft client just to find the leader peer, then obtains the leader's {{ClusterService}} and creates a raft client based on the found service. Again, after the before-mentioned ticket there won't be any such raft-client instantiations, but the logic after this branch still takes the leader's raft client and saves it to the map.

So the first argument of the assertion gets the leader's {{ClusterService}} from the map for the given table and its 0th partition, while the second argument gets the {{Loza}} instance for the leader's node consistent ID and asks it for a {{ClusterService}}. Both services are always identical because they belong to the same replication-layer leader node. Hence the assertion is redundant, and all the code around {{raftClients}} and {{#leader()}} checks the same thing.
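The redundancy can be illustrated with a self-contained toy sketch (the class names below are simplified stand-ins for the Ignite types discussed here, not the real API): once each replica owns its raft client, a separate per-table client map adds nothing, because the leader is already reachable through the replica itself.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in: a raft client that knows its group leader's consistent ID.
class RaftClient {
    private final String leaderConsistentId;

    RaftClient(String leaderConsistentId) {
        this.leaderConsistentId = leaderConsistentId;
    }

    String leader() {
        return leaderConsistentId;
    }
}

// Toy stand-in: a replica owns its raft client, as after ReplicaManager#startReplica.
class Replica {
    private final RaftClient raftClient;

    Replica(RaftClient raftClient) {
        this.raftClient = raftClient;
    }

    RaftClient raftClient() {
        return raftClient;
    }
}

// Toy stand-in: replicas keyed by replication group ID.
class ReplicaManager {
    private final Map<String, Replica> replicas = new HashMap<>();

    void startReplica(String grpId, Replica replica) {
        replicas.put(grpId, replica);
    }

    Replica replica(String grpId) {
        return replicas.get(grpId);
    }
}

public class RaftClientMapRemovalSketch {
    public static void main(String[] args) {
        // The replica created on start already carries the raft client...
        ReplicaManager manager = new ReplicaManager();
        manager.startReplica("ACCOUNTS_part_0", new Replica(new RaftClient("node-1")));

        // ...so the leader is reachable without any extra raftClients map.
        String leader = manager.replica("ACCOUNTS_part_0").raftClient().leader();
        System.out.println(leader); // prints "node-1"
    }
}
```

This mirrors the refactoring direction of the ticket: every lookup that previously went through {{raftClients}} can go through the replica manager instead.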
The only place not mentioned yet, which shouldn't be removed but rather rewritten, is {{getLeaderId}}. It is required by the {{ItTxDistributedTestThreeNodesThreeReplicas#testPrimaryReplicaDirectUpdateForExplicitTxn}} test and, as the method's name makes explicit, returns a {{Peer}}. The test internally blocks message handling on the RAFT replication layer before an entity is written to the RAFT log, and this must happen exactly on the RAFT group leader. Taking the {{raftClients}} removal into account, the method's body should be:
{code:title=|language=java|collapse=false}
int partId = 0;

return replicaManagers.get(extractConsistentId(cluster.get(partId)))
        .replica(new TablePartitionId(tables.get(tableName).tableId(), partId))
        .thenApply(replica -> replica.raftClient().leader())
        .join();
{code}
The call {{extractConsistentId(cluster.get(partId))}} is equivalent to {{cluster.get(partId).topologyService().localMember().name()}}, and {{tables}} is a mapping of table names to {{TableImpl}} instances that are inserted into the map in {{ItTxTestCluster#startTable}} right after instantiation.

The last place the map is used is the raft-client stopping process, which should be replaced (actually, this should have been done much earlier) with a replica stopping process for every possible table-partition pair:
{code:title=|language=java|collapse=false}
for (String consistentId : clusterServices.keySet()) {
    ReplicaManager replicaManager = replicaManagers.get(consistentId);

    if (replicaManager == null) {
        continue;
    }

    grpIds.forEach(id -> {
        try {
            replicaManager.stopReplica(id);
        } catch (NodeStoppingException e) {
            // no-op
        }
    });
}
{code}

*Motivation*

The map and the related code are redundant, don't work as intended, and should be removed.

*Definition of done*
# {{raftClients}} map is removed.
# {{getLeader}} and related assertions are removed.
# {{getLeaderId}} method is rewritten as provided above.
# Replicas are stopped instead of the raft clients from the map.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)