[ https://issues.apache.org/jira/browse/HDDS-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972318#comment-16972318 ]
Stephen O'Donnell commented on HDDS-2446:
-----------------------------------------

I looked into the code a bit more to double-check this area. The only place outside of tests where a DatanodeInfo object gets created is via SCMNodeManager.register() -> nodeStateManager.addNode(), where the new DatanodeInfo is created.

As far as I can tell, nothing removes a registered node (DatanodeDetails or DatanodeInfo) from SCM except a restart - SCM will remember all nodes which have previously registered with it. If a node re-registers, the chain of calls above throws a NodeAlreadyExistsException on registration, which is caught, and success is still returned to the DN.

If a node goes dead, all of its containers will be purged, but if it re-registers without having gone dead, the containers will still be present, referencing the old DatanodeInfo object, which will not have changed. One thing we could do is purge the container list on re-registration, as the register command should carry a container report which must be processed anyway.

As an aside, I wonder if there is a bug in the re-registration process. The way SCM checks whether a node has already registered is to look it up by UUID. If a DN is stopped and changes its IP or hostname, but retains the UUID, then it will 'register' successfully, but the stored DatanodeDetails information will not be updated if any of it has changed.
{code}
public RegisteredCommand register(
    DatanodeDetails datanodeDetails, NodeReportProto nodeReport,
    PipelineReportsProto pipelineReportsProto) {
  InetAddress dnAddress = Server.getRemoteIp();
  if (dnAddress != null) {
    // Mostly called inside an RPC, update ip and peer hostname
    datanodeDetails.setHostName(dnAddress.getHostName());
    datanodeDetails.setIpAddress(dnAddress.getHostAddress());
  }
  try {
    String dnsName;
    String networkLocation;
    datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
    if (useHostname) {
      dnsName = datanodeDetails.getHostName();
    } else {
      dnsName = datanodeDetails.getIpAddress();
    }
    networkLocation = nodeResolve(dnsName);
    if (networkLocation != null) {
      datanodeDetails.setNetworkLocation(networkLocation);
    }
    nodeStateManager.addNode(datanodeDetails);
    clusterMap.add(datanodeDetails);
    addEntryTodnsToUuidMap(dnsName, datanodeDetails.getUuidString());
    // Updating Node Report, as registration is successful
    processNodeReport(datanodeDetails, nodeReport);
    LOG.info("Registered Data node : {}", datanodeDetails);
  } catch (NodeAlreadyExistsException e) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Datanode is already registered. Datanode: {}",
          datanodeDetails.toString());
    }
  }
  return RegisteredCommand.newBuilder().setErrorCode(ErrorCode.success)
      .setDatanode(datanodeDetails)
      .setClusterID(this.scmStorageConfig.getClusterID())
      .build();
}
{code}
We should probably open another Jira if this bug is indeed there, but we may need to look at re-registration for maintenance mode anyway, as that will involve a node going dead, NOT clearing its replicas out, and then registering again.
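To make the suspected re-registration issue concrete, here is a minimal, self-contained sketch. The classes below (DnDetails, ReRegistrationSketch) are hypothetical stand-ins for DatanodeDetails and the SCM node map, not the real Ozone code; they only model the behaviour described above, where the node map is keyed by UUID and an already-known UUID is treated as a successful no-op.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for DatanodeDetails: UUID identity plus a mutable hostname.
class DnDetails {
  final UUID uuid;
  String hostName;

  DnDetails(UUID uuid, String hostName) {
    this.uuid = uuid;
    this.hostName = hostName;
  }
}

public class ReRegistrationSketch {
  // Node map keyed by UUID, mirroring how nodeStateManager tracks nodes.
  private final Map<UUID, DnDetails> nodes = new HashMap<>();

  /** Register as described in the comment: an already-known UUID is a no-op. */
  public boolean register(DnDetails dn) {
    if (nodes.containsKey(dn.uuid)) {
      // NodeAlreadyExistsException path: caught, success returned to the DN,
      // but the stored details are NOT refreshed.
      return true;
    }
    nodes.put(dn.uuid, dn);
    return true;
  }

  public String storedHostName(UUID uuid) {
    return nodes.get(uuid).hostName;
  }

  public static void main(String[] args) {
    ReRegistrationSketch scm = new ReRegistrationSketch();
    UUID id = UUID.randomUUID();
    scm.register(new DnDetails(id, "host-a"));
    // DN restarts with a new hostname but the same UUID.
    scm.register(new DnDetails(id, "host-b"));
    // The stale hostname survives re-registration -- the suspected bug.
    System.out.println(scm.storedHostName(id)); // prints "host-a"
  }
}
{code}
If this model matches the real code path, one possible fix would be to refresh the stored details (hostname, IP, network location) on the NodeAlreadyExistsException path instead of ignoring them.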
> ContainerReplica should contain DatanodeInfo rather than DatanodeDetails
> ------------------------------------------------------------------------
>
>                 Key: HDDS-2446
>                 URL: https://issues.apache.org/jira/browse/HDDS-2446
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ContainerReplica object is used by the SCM to track containers reported
> by the datanodes. The current fields stored in ContainerReplica are:
> {code}
> final private ContainerID containerID;
> final private ContainerReplicaProto.State state;
> final private DatanodeDetails datanodeDetails;
> final private UUID placeOfBirth;
> {code}
> Now we have introduced decommission and maintenance mode, the replication
> manager (and potentially other parts of the code) need to know the status of
> the replica in terms of IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED etc to
> make replication decisions.
> The DatanodeDetails object does not carry this information, however the
> DatanodeInfo object extends DatanodeDetails and does carry the required
> information.
> As DatanodeInfo extends DatanodeDetails, any place which needs a
> DatanodeDetails can accept a DatanodeInfo instead.
> In this Jira I propose we change the DatanodeDetails stored in
> ContainerReplica to DatanodeInfo.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)