[ 
https://issues.apache.org/jira/browse/IOTDB-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091091#comment-17091091
 ] 

Houliang Qi commented on IOTDB-606:
-----------------------------------

The following operations can change the contents of the partition table:
1. Adding a node.
2. Removing a node.

A node needs to pull a MetaSnapshot mainly in the following cases:
1. The node has just been added to the cluster.
2. The node restarts after downtime, and its meta information lags far behind 
the leader's.
3. The node rejoins the cluster after a network partition, and its meta 
information lags far behind the leader's.

For 1, no request will reach the new node before the new partition table is 
applied, so the node can just apply the partition table directly.

For 2 and 3, if a request is routed to this node while its partition table is 
stale, the metadata obtained by the metamember or datamember is also wrong. In 
this case the operation will fail, so the upper layer should retry. The node 
can directly replace its partition table. Until the replacement is complete, 
all operations are blocked (in-flight traffic is drained).
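
To make the blocking step concrete, here is a minimal sketch of one way it 
could be done, using a read-write lock. This is not the existing IoTDB code: 
the class PartitionTableHolder, its methods, the version counter and the 
simple modulo routing are all hypothetical names and assumptions made only 
for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical holder for a partition table; not an actual IoTDB class. */
class PartitionTableHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private List<String> nodes = new ArrayList<>(); // assumed: table is a node list
    private long version = 0;                       // assumed: tables carry a version

    /** Normal requests take the read lock, so many can be routed concurrently. */
    String route(int slot) {
        lock.readLock().lock();
        try {
            if (nodes.isEmpty()) {
                // The caller (upper layer) is expected to retry later.
                throw new IllegalStateException("partition table not ready");
            }
            return nodes.get(Math.floorMod(slot, nodes.size()));
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Replaces the whole table with the one from a snapshot. The write lock
     * waits for in-flight requests to finish and blocks new ones until the
     * replacement is done.
     */
    boolean replace(List<String> snapshotNodes, long snapshotVersion) {
        lock.writeLock().lock();
        try {
            if (snapshotVersion <= version) {
                return false; // assumption: ignore snapshots that are not newer
            }
            nodes = new ArrayList<>(snapshotNodes);
            version = snapshotVersion;
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    long version() {
        lock.readLock().lock();
        try {
            return version;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, a request that arrives during the replacement simply waits 
on the read lock, and a request that was already running against the stale 
table finishes (and possibly fails, to be retried) before the new table is 
installed.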

The above considers adding or removing only one node at a time. Now consider 
adding or removing multiple nodes. Since all operations are performed 
sequentially on the leader, the leader has the newest partition table, and 
Raft guarantees that the partition table the leader gives to a follower is 
accurate. So for a follower, this case is the same as adding or removing a 
single node.

Please leave your opinion, thanks.

 

> [Distributed] Replace raw logs in MetaSnapshot
> ----------------------------------------------
>
>                 Key: IOTDB-606
>                 URL: https://issues.apache.org/jira/browse/IOTDB-606
>             Project: Apache IoTDB
>          Issue Type: Improvement
>            Reporter: Tian Jiang
>            Priority: Major
>              Labels: cluster, metadata, snapshot
>
> The current MetaSnapshot is using the simplest way, storing the raw committed 
> logs. It would be more efficient to replace the logs with compact structures 
> like the partition table and other objects that will be affected by meta logs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
