[ 
https://issues.apache.org/jira/browse/IGNITE-22774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Efremov updated IGNITE-22774:
-------------------------------------
    Epic Link: IGNITE-22313

> Remove raftClient map from ItTxTestCluster
> ------------------------------------------
>
>                 Key: IGNITE-22774
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22774
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Mikhail Efremov
>            Assignee: Mikhail Efremov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> The goal is to remove {{raftClient}} from {{ItTxTestCluster}} due to all 
> raft-clients are created inside replicas through 
> {{ReplicaManager#startReplica}}, and most of usages of {{raftClient}} are 
> effectively useless: this map is used in the single common case: 
> {code:title=|language=java|collapse=false}assertSame(
>     txTestCluster.raftClients.get(ACC_TABLE_NAME).get(0).clusterService(),
>     txTestCluster.getLeader(ACC_TABLE_NAME).service()
> );
> {code}
> Effectively, the first statement get raft-client for a table and it's 0th 
> partition, then extract it's {{ClusterService}}. The second statement inside 
> of {{#getLeader}} do the similar thing:
> {code:title=|language=java|collapse=false}protected Loza getLeader(String 
> tableName) {
>     var services = raftClients.get(tableName);
>     Peer leader = services.get(0).leader();
>     assertNotNull(leader);
>     return raftServers.get(leader.consistentId());
> }
> {code}
> Thus, the second statement means that we get a {{ClusterService}} from 
> {{Loza}} instance that related to leader's raft-client.
> But when we put clients inside the map, we put only raft-clients from leaders 
> of partition's replication group:
> {code:title=|language=java|collapse=false}if (startClient) {
>       RaftGroupService service = RaftGroupServiceImpl
>             .start(grpId, client, FACTORY, raftConfig, membersConf, true, 
> executor, commandsMarshaller)
>             .get(5, TimeUnit.SECONDS);
>     clients.put(p, service);
> } else {
>     // Create temporary client to find a leader address.
>     ClusterService tmpSvc = cluster.get(0);
>     RaftGroupService service = RaftGroupServiceImpl
>             .start(grpId, tmpSvc, FACTORY, raftConfig, membersConf, true, 
> executor, commandsMarshaller)
>             .get(5, TimeUnit.SECONDS);
>     Peer leader = service.leader();
>     service.shutdown();
>     ClusterService leaderSrv = cluster.stream()
>             .filter(cluster -> 
> cluster.topologyService().localMember().name().equals(leader.consistentId()))
>             .findAny()
>             .orElseThrow();
>     RaftGroupService leaderClusterSvc = RaftGroupServiceImpl
>             .start(grpId, leaderSrv, FACTORY, raftConfig, membersConf, true, 
> executor, commandsMarshaller)
>             .get(5, TimeUnit.SECONDS);
>     clients.put(p, leaderClusterSvc);
> }
> {code}
> The first branch relates to tests that requires separated node as transaction 
> coordinator and legacy approach demanded for separated raft-client. Now it 
> all works on replication layer and the first branch will be fully removed 
> from [https://issues.apache.org/jira/browse/IGNITE-22691]. Then, the second 
> branch create temporal raft-client just to find out a leader peer, and then 
> get leader's {{ClusterService}} and create an instance of raft-client based 
> on found service. Again, after before-mentioned ticket there wouldn't any 
> raft-clients instantiations, but still the logic after this branch is get a 
> leader's raft-client and save it to the map.
> Then, the first statement get a leader {{ClusterService}} from the map for 
> given table and 0th partition, and then the second statement get {{Loza}} 
> instance for leader's node consistent ID and and ask it for a 
> {{ClusterService}}. The both services are always identical because it's the 
> same replication layer leader node. Then, the assertion is redundand and all 
> the code with {{raftClients}} and {{#leader()}} are the same.
> The only place that didn't mention, but shouldn't be removed, but rewritten 
> is {{getLeaderId}} that is required by 
> {{ItTxDistributedTestThreeNodesThreeReplicas#testPrimaryReplicaDirectUpdateForExplicitTxn}}
>  test, returns more explicit by method's name {{Peer}} type. The test is 
> internally blocking message handling on RAFT replication layer before entity 
> will be written in RAFT-log, and then it must be done exactly on RAFT-group 
> leader. Then, taking into the account {{raftClients}} removal, the method's 
> body should be:
> {code:title=|language=java|collapse=false}
> int partId = 0;
> return replicaManagers.get(extractConsistentId(cluster.get(partId)))
>                 .replica(new 
> TablePartitionId(tables.get(tableName).tableId(), partId))
>                 .thenApply(replica -> replica.raftClient().leader())
>                 .join();
> {code}
> The function {{extractConsistentId(cluster.get(partId))}} is equivalent to 
> {{cluster.get(partId).topologyService().localMember().name()}} and {{tables}} 
> is a mapping of tables' names to {{TableImpl}} instances that are inserted to 
> the map in {{ItTxTestCluster#startTable}} right after instantiation.
> And the last place the map is used is raft-clients stopping process, that 
> should be replaced (actually it should be done much earlier) with replicas 
> stopping process for every possible table-partition pairs:
> {code:title=|language=java|collapse=false}
> for (String consistentId : clusterServices.keySet()) {
>     ReplicaManager replicaManager = replicaManagers.get(consistentId);
>     if (replicaManager == null) {
>         continue;
>     }
>     grpIds.forEach(id -> {
>         try {
>             replicaManager.stopReplica(id);
>         } catch (NodeStoppingException e) {
>             // no-op
>         }
>     });
> }
> {code}
> *Motivation*
> The map and related code are redundant, aren't working as intended and should 
> be removed.
> *Definition of done*
> # {{raftClients}} map is removed.
> # {{getLeader}} and related assertions are removed.
> # {{getLeaderId}} method is rewritten as provided above.
> # replicas are stopped instead of raft-clients from the map.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to