Hi all,

I’m currently working on https://github.com/apache/iotdb/pull/12914

One of the problems there was that in a cluster with 3 data-nodes, where 
dn_rpc_address is set to 0.0.0.0, the command “show datanodes” lists every node 
with an address of 0.0.0.0. So if someone wants to remove a data-node by its 
IP, the CLI will not find the corresponding node, as it thinks the address is 
0.0.0.0. Worse, if you use the CLI to delete 0.0.0.0, it deletes all nodes.

My initial fix for this was: if a data-node registers and reports its 
dn_rpc_address as 0.0.0.0, we use the IP address the registration request came 
from instead (that address obviously exists and belongs to the registering 
data-node).
The problem is that this will usually be the dn_internal_address rather than 
the dn_rpc_address.
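For illustration, the substitution could look roughly like this. This is a minimal sketch, not the actual ConfigNode registration code; the class and method names are hypothetical, only the java.net APIs are real:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RegistrationAddressResolver {

    // If the reported dn_rpc_address is the wildcard address (0.0.0.0),
    // fall back to the address the registration request came from.
    // Note: that peer address is typically the dn_internal_address.
    public static String resolveRegistrationAddress(String reported, InetAddress peer)
            throws UnknownHostException {
        InetAddress reportedAddress = InetAddress.getByName(reported);
        if (reportedAddress.isAnyLocalAddress()) {
            return peer.getHostAddress();
        }
        return reported;
    }
}
```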

Now we could use that instead, but I think it would reduce the usefulness of 
the “show datanodes” command, because the internal address may be on a 
cluster-internal network that clients can’t reach.
So if a client wants to know which other data-nodes exist in order to connect 
to one of them, this information might not be helpful.
However, 0.0.0.0 is not helpful either, as I see no way to connect to a 
data-node at 0.0.0.0:6667; that’s not a routable IP address.

So I think that, instead of keeping 0.0.0.0, we should possibly replace it 
with the list of public IP addresses the data-node possesses. In that case, 
“show datanodes” would no longer display a single IP address, but a list of 
IP addresses.
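Collecting that list could be done with the standard java.net APIs; a rough sketch (the method name is mine, and the exact filtering rules would need discussion):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NodeAddressLister {

    // Collect all addresses of this host that a client could plausibly use:
    // skip interfaces that are down, and skip loopback, link-local and
    // wildcard addresses.
    public static List<String> listReachableAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (!nif.isUp()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                if (addr.isLoopbackAddress() || addr.isLinkLocalAddress()
                        || addr.isAnyLocalAddress()) {
                    continue;
                }
                result.add(addr.getHostAddress());
            }
        }
        return result;
    }
}
```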

When removing a data-node (or sending any other command to it), we could then 
identify a particular node unambiguously, as only one data-node will have a 
given IP+port combination.
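The matching would then run against the whole address list rather than a single field; a hypothetical helper, just to illustrate the idea:

```java
import java.util.List;

public class EndpointMatcher {

    // A node matches the target endpoint if the target IP is one of the
    // node's addresses and the rpc port matches.
    public static boolean matches(List<String> nodeAddresses, int nodePort,
                                  String targetIp, int targetPort) {
        return nodePort == targetPort && nodeAddresses.contains(targetIp);
    }
}
```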

What do you think?

Chris
