[ 
https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560153#comment-16560153
 ] 

Anu Engineer commented on HDDS-245:
-----------------------------------

The changes look excellent. I am a +1 on this patch. Some small comments follow; 
most of them are not actionable in this patch. As soon as you take a look, we can 
probably commit this patch.
 * *ContainerReportHandler.java#ContainerReportHandler*
 Add a {{Preconditions.checkNotNull}} check for {{containerMapping}}, 
{{node2ContainerMap}} and {{replicationActivityStatus}}.
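A minimal sketch of what I mean (the constructor signature here is just my assumption of how the fields are injected; adjust to the actual code):
{code:java}
// Hedged sketch: fail fast on null collaborators instead of hitting an NPE later.
// Uses com.google.common.base.Preconditions; parameter types assumed from the field names.
public ContainerReportHandler(Mapping containerMapping,
    Node2ContainerMap node2ContainerMap,
    ReplicationActivityStatus replicationActivityStatus) {
  this.containerMapping = Preconditions.checkNotNull(
      containerMapping, "containerMapping cannot be null");
  this.node2ContainerMap = Preconditions.checkNotNull(
      node2ContainerMap, "node2ContainerMap cannot be null");
  this.replicationActivityStatus = Preconditions.checkNotNull(
      replicationActivityStatus, "replicationActivityStatus cannot be null");
}
{code}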

 * *ContainerReportHandler.java#onMessage*
{code:java}
  //update state in container db and trigger close container events
      containerMapping.processContainerReports(datanodeOrigin, containerReport);
{code}
I don't think this is what we should do. This function follows the old state 
logic; IMHO, we should replace it completely. However, we can commit this 
patch and rewrite the {{containerMapping.processContainerReports}} function 
later. The current implementation makes the code a bit of a kludge. cc: 
[~nandakumar131].

 * *ContainerReportHandler.java#onMessage*
{code:java}
     Set<ContainerID> containerIds = containerReport.getReportsList().stream()
          .map(containerProto -> containerProto.getContainerID())
          .map(ContainerID::new)
          .collect(Collectors.toSet());
{code}
A nit: technically a container report should *only* contain unique container 
IDs, but I am not sure if that is enforced on the datanode side. This change 
can silently lose some IDs if duplicates are present. No action is needed here; 
I am just pointing out something that I thought about.
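If we ever wanted to surface the deduplication instead of losing it silently, something along these lines would do (purely illustrative, not needed for this patch; {{LOG}} is whatever logger the handler already has):
{code:java}
// Illustrative only: detect duplicate IDs that Collectors.toSet() would drop silently.
List<Long> reportedIds = containerReport.getReportsList().stream()
    .map(containerProto -> containerProto.getContainerID())
    .collect(Collectors.toList());

Set<ContainerID> containerIds = reportedIds.stream()
    .map(ContainerID::new)
    .collect(Collectors.toSet());

if (containerIds.size() != reportedIds.size()) {
  LOG.warn("Container report from datanode {} contains duplicate container IDs.",
      datanodeOrigin.getUuid());
}
{code}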

 * ContainerReportHandler.java#onMessage
 I am being generous today with my comments :). So I am going to share some 
history behind this code fragment.

{code:java}
  ReportResult reportResult = node2ContainerMap
          .processReport(datanodeOrigin.getUuid(), containerIds);

      //we have the report, so we can update the states for the next iteration.
      node2ContainerMap
          .updateDatanodeMap(datanodeOrigin.getUuid(), containerIds);
{code}
My original proposal (which is exactly what the code in this patch does) was that 
we would update the node report and return the result set. While reviewing the 
original design, [~nandakumar131] pointed out a subtle problem: it is possible 
that you update the Node2Container state but fail to update the state in 
ContainerStateManager.

If that happens, we will never detect the inconsistency, because the 
Node2Container mapping is used to evaluate whether we need to update the state of 
ContainerManager. So he proposed that we first compute the difference, then 
update ContainerStateManager, and then come back and update the Node2Container 
map.

Bottom line: He proposed that we do:
 # processReport
 # The loop you have for processing the missing containers and the new containers
 # Then updateDatanodeMap

So the change is trivial and it is just defensive coding; the actual probability 
of something like this happening is very low. The change is simply to move line 
94 down to line 107, i.e. something like the sketch below.
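Roughly, the resulting order would look like this (sketch only; the loop body is the one already in the patch):
{code:java}
// 1. Compute the diff against the previous report first.
ReportResult reportResult = node2ContainerMap
    .processReport(datanodeOrigin.getUuid(), containerIds);

// 2. Run the existing loop that handles the missing and newly reported
//    containers (updates ContainerStateManager, fires replication events).

// 3. Only after that succeeds, remember this report for the next iteration.
node2ContainerMap
    .updateDatanodeMap(datanodeOrigin.getUuid(), containerIds);
{code}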

 * A future feature request: if we get an IOException during report 
processing, can we have a metric for that? Not only a log entry, but a metric 
that can be used to monitor and flag the issue (a rough sketch follows after 
this list).
 * Node2ContainerMapping.java – Not sure why we are allocating new HashSets; 
isn't the calling function already giving us a set, so do we need to make a 
copy? (I just scrolled up and saw [~xyao]'s comments.) I am pretty sure they 
cannot be null, but it is not a big deal.
 * ReplicationActivityStatus: Great idea, and thanks for supporting this in 
this patch.
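
For the metrics idea above, a rough sketch of what I have in mind, using the Hadoop metrics2 library (the class and metric names are made up; registration with the metrics system is omitted):
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

@Metrics(about = "SCM ContainerReportHandler metrics", context = "ozone")
public class ContainerReportHandlerMetrics {

  // Incremented whenever report processing fails with an IOException, so the
  // failure rate can be graphed and alerted on instead of only being logged.
  @Metric("Number of container reports that failed during processing")
  private MutableCounterLong containerReportProcessingFailed;

  public void incrContainerReportProcessingFailed() {
    containerReportProcessingFailed.incr();
  }
}
{code}
The {{onMessage}} catch block for {{IOException}} would then call {{incrContainerReportProcessingFailed()}} in addition to logging.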

> Handle ContainerReports in the SCM
> ----------------------------------
>
>                 Key: HDDS-245
>                 URL: https://issues.apache.org/jira/browse/HDDS-245
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-245.001.patch, HDDS-245.002.patch, 
> HDDS-245.003.patch, HDDS-245.004.patch
>
>
> HDDS-242 provides a new class ContainerReportHandler which can handle the 
> ContainerReports from the SCMHeartbeatDispatcher.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping.
> HDDS-199 implements the ReplicationManager which can send commands to the 
> datanodes to copy containers.
> To wire all these components, we need to put implementation to the 
> ContainerReportHandler (created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode 
> events, update the containerStateMap and node2ContainerMap and calculate the 
> missing/duplicate containers and send the ReplicateCommand to the 
> ReplicateManager.


