[ https://issues.apache.org/jira/browse/HDDS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559403#comment-16559403 ]

Elek, Marton commented on HDDS-245:
-----------------------------------

Thank you very much [~xyao] for the review:

{quote}
ContainerReportHandler.java
Line 93-95: should we consolidate this inside node2ContainerMap.processReport()?
After second thoughts, I think a better place might be updating the 
DN2ContainerMap after we emit the replication request at line 108.
{quote}

emitReplicationRequestEvent is an async operation. I would like to be sure that 
the node2ContainerMap is up-to-date during the event processing (as the 
ReplicationManager may use information from the node2ContainerMap). That's the 
reason why I moved this line before the event is emitted.
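
A minimal sketch of the ordering I mean (method and field names are only 
approximations, not necessarily the exact ones in the patch):

{code}
// Sketch only -- names are approximate.
public void onMessage(ContainerReportFromDatanode reportFromDatanode,
    EventPublisher publisher) {

  DatanodeDetails datanodeOrigin = reportFromDatanode.getDatanodeDetails();
  Set<ContainerID> containerIds = getContainerIds(reportFromDatanode); // hypothetical helper

  ReportResult reportResult =
      node2ContainerMap.processReport(datanodeOrigin.getUuid(), containerIds);

  // Update the map synchronously, before the async event is emitted, so the
  // ReplicationManager already sees the up-to-date state when it processes
  // the event on its own thread.
  node2ContainerMap.updateDatanodeMap(datanodeOrigin.getUuid(), containerIds);

  emitReplicationRequestEvent(reportResult, publisher);
}
{code}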

updateDatanodeMap and processReport are two different methods with different 
responsibilities, and they can be tested at this level (processReport has a 
lot of low-level unit tests). If you think we need a single entry point, I can 
create a third method which calls both of them (see the sketch below).
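
Something like this thin wrapper would keep both methods independently 
testable (sketch only, the wrapper name is hypothetical):

{code}
// Hypothetical wrapper in Node2ContainerMap.
public ReportResult processReportAndUpdate(UUID datanodeId,
    Set<ContainerID> containers) throws SCMException {
  ReportResult result = processReport(datanodeId, containers);
  updateDatanodeMap(datanodeId, containers);
  return result;
}
{code}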

{quote}
Node2ContainerMap.java
Line 134/140: we need to check if currentSet is null before invoking removeAll to 
avoid an NPE, or use computeIfPresent?
{quote}

Correct me if I am wrong, but newContainers and missingContainers can never be 
null as they are initialized here. currentSet cannot be null because the 
existence of the key is checked at L124 (isKnownDatanode), and containers 
cannot be null because we have a precondition check at L122.
I would be happy to add more checks, but I am not sure which one would be reasonable...
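
For reference, this is roughly the chain of guards I rely on (simplified 
restatement, line numbers as in the patch, names approximate):

{code}
// Simplified restatement of the guards in Node2ContainerMap.processReport():
Preconditions.checkNotNull(containers, "Container list cannot be null"); // L122
if (!isKnownDatanode(datanodeId)) {                                      // L124
  // unknown datanode: we return early (NEW_DATANODE_FOUND),
  // so the lookup below never runs for a missing key
  return reportResultForNewDatanode(containers); // hypothetical shorthand
}
Set<ContainerID> currentSet = dn2ContainerMap.get(datanodeId); // cannot be null here
{code}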

{quote}
Node2ContainerMapTest.java should be renamed to TestNode2ContainerMap.java
{quote}

The rename should be included in the current patch. (I renamed the tests and 
realized that some of them were failing; those are fixed in the patch.) But I 
had patches in the wrong format earlier, so let me know if it can't be applied 
with the rename.

{code}
diff --git a/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java b/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
similarity index 91%
rename from hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
rename to hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
index 79f1b40db03..4796f4db85a 100644
--- a/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/Node2ContainerMapTest.java
+++ b/hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/states/TestNode2ContainerMap.java
@@ -38,7 +38,7 @@
{code}

{quote}
TestContainerReportHandler.java
Line 127-130: NIT: unused can be removed.
{quote}

Sure, removed.

{quote}
StorageContainerManager.java
Line 234: this can be removed, the handler has already been added on line 228.
{quote}

Good catch, removed.

{quote}
ReplicationActivityStatus.java
Line 61: I did not follow this. Can you elaborate on this or add some TODO to 
fix it later?
{quote}

The idea was to enable the replication with an async event. But you are right, 
it's strange that we can turn it on by event but can't turn it off. I changed 
the EventHandler<Void> to EventHandler<Boolean>, so now it can be 
turned on and off with an event where the payload is just a boolean.
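
Roughly what the changed handler looks like (simplified sketch, details may 
differ from the patch):

{code}
import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.hadoop.hdds.server.events.EventHandler;
import org.apache.hadoop.hdds.server.events.EventPublisher;

// Simplified sketch of ReplicationActivityStatus after the change.
public class ReplicationActivityStatus implements EventHandler<Boolean> {

  private final AtomicBoolean replicationEnabled = new AtomicBoolean(false);

  @Override
  public void onMessage(Boolean enabled, EventPublisher publisher) {
    // The boolean payload can both enable and disable replication,
    // unlike the earlier Void payload which could only enable it.
    replicationEnabled.set(enabled);
  }

  public boolean isReplicationEnabled() {
    return replicationEnabled.get();
  }
}
{code}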



[~ljain]: thank you for your review as well.

{quote}
ReportResult:58,59 - we can keep the missingContainers and newContainers as 
null.
{quote}

Sometimes (e.g. Node2ContainerMap L152) only the newContainers are added. I 
would like to be sure that the sets are never null. Do you think it's a problem 
because of GC pressure? I think (hope?) it's negligible, as we create a lot of 
hash sets in Node2ContainerMap anyway and very short-lived objects are handled 
very well by the GC (AFAIK).
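
This is the null-safety I would like to keep in ReportResult (sketch only, 
constructor and variable names are approximate):

{code}
// Sketch of the null-safety in the ReportResult builder (names approximate):
Set<ContainerID> nullSafeNewContainers =
    (newContainers == null) ? Collections.emptySet() : newContainers;
Set<ContainerID> nullSafeMissingContainers =
    (missingContainers == null) ? Collections.emptySet() : missingContainers;
return new ReportResult(status, nullSafeMissingContainers, nullSafeNewContainers);
{code}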

{quote}
2. ContainerMapping#getContainerWithPipeline needs to be updated for closed 
container case. For closed containers we need to fetch the datanodes from 
ContainerStateMap and return the appropriate pipeline information.
{quote}

Correct me if I am wrong, I am trying to rephrase: you say that 
SCMClientProtocolServer.getContainerWithPipeline won't work until we return the 
pipeline information based on ContainerStateMap.contReplicaMap. And (I guess) 
without that we can't read closed containers.
[~anu]: Do we have a separate issue for that?
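
As a sketch, my understanding of the suggestion is something like this (the 
helper and field names here are hypothetical):

{code}
// Hypothetical sketch of getContainerWithPipeline() for closed containers:
if (containerInfo.getState() == LifeCycleState.CLOSED) {
  // the original (write) pipeline may already be gone, so the datanodes
  // would come from the container -> replica mapping instead
  Set<DatanodeDetails> replicas =
      containerStateMap.getContainerReplicas(containerInfo.containerID());
  pipeline = pipelineFromReplicas(replicas); // hypothetical helper
}
return new ContainerWithPipeline(containerInfo, pipeline);
{code}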

{quote}
3. START_REPLICATION is currently not fired by any publisher. I guess it will 
be part of another jira?
{quote}

Yes, the event should be sent when we are leaving the chill mode. [~anu]: Do I 
need to create a separate jira, or will it be handled by HDDS-2?

{quote}
4. We are currently processing the report as soon as it is received. Are we 
handling the case when a container is added in one DN and has been removed from 
another DN? In such a case we might be sending out a false replicate event as 
replication count would still match the replication factor.
{quote}

Could you please give more info about this example? If the container is added 
_and_ removed (and both operations are finished), the replication factor should 
be fine. In case of in-progress replications, the ReplicationManager (which 
listens to these events) will recalculate the replication count, where the 
in-progress copies are also considered. So the calculation here is just a draft 
to sort the replication requests by priority; the ReplicationManager will 
recalculate the required replicas (rough sketch below).
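
A rough sketch of the recalculation I expect in the ReplicationManager 
(helper names are hypothetical):

{code}
// Rough sketch of the recalculation in the ReplicationManager (hypothetical names):
int healthyReplicas = getReplicas(containerId).size();
int inFlightCopies = getInFlightReplications(containerId);
int deficit = expectedReplicaCount - (healthyReplicas + inFlightCopies);
if (deficit > 0) {
  // copy commands are only sent for a real deficit; the priority calculated
  // in ContainerReportHandler is just a hint for ordering the request queue
  sendReplicateCommand(containerId, deficit);
}
{code}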

> Handle ContainerReports in the SCM
> ----------------------------------
>
>                 Key: HDDS-245
>                 URL: https://issues.apache.org/jira/browse/HDDS-245
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-245.001.patch, HDDS-245.002.patch, 
> HDDS-245.003.patch
>
>
> HDDS-242 provides a new class ContainerReportHandler which could handle the 
> ContainerReports from the SCMHeartbeatDispatcher.
> HDDS-228 introduces a new map to store the container -> datanode[] mapping
> HDDS-199 implements the ReplicationManager which can send commands to the 
> datanodes to copy containers.
> To wire all these components together, we need to add the implementation to 
> the ContainerReportHandler (created in HDDS-242).
> The ContainerReportHandler should process the new ContainerReportForDatanode 
> events, update the containerStateMap and node2ContainerMap and calculate the 
> missing/duplicate containers, and send the ReplicateCommand to the 
> ReplicationManager.


