[ 
https://issues.apache.org/jira/browse/HDFS-11493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980071#comment-15980071
 ] 

Weiwei Yang edited comment on HDFS-11493 at 4/22/17 6:34 PM:
-------------------------------------------------------------

Hi [~anu]

Thanks for posting the patch as well as the design doc, they look very nice. I 
haven't read through the code thoroughly yet, but here are some quick thoughts 
that I hope help.

*Node State*

Right now the node state only has HEALTHY, STALE, DEAD, UNKNOWN. Would it be 
useful to add the following states as well?
* MAINTENANCE: an admin could bring a node down and set it to the 
"MAINTENANCE" state for maintenance; in this case SCM doesn't treat the 
containers on this node as missing;
* DECOMMISSIONING and DECOMMISSIONED: an admin could gracefully decommission a 
node from a given pool
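
To illustrate the idea, here is a minimal sketch of an extended state enum. The enum name, the helper method, and the exact semantics are all hypothetical, not taken from the patch:

```java
// Hypothetical sketch of an extended node state enum; the real enum in the
// patch has only the first four values, and the names below are illustrative.
public enum NodeState {
    HEALTHY,          // heartbeats arriving on time
    STALE,            // heartbeats delayed beyond the stale interval
    DEAD,             // no heartbeats; containers treated as missing
    UNKNOWN,          // state not yet determined
    MAINTENANCE,      // admin-initiated downtime; containers NOT treated as missing
    DECOMMISSIONING,  // graceful removal in progress
    DECOMMISSIONED;   // node fully removed from its pool

    /** Whether SCM should count this node's containers as missing. */
    public boolean containersMissing() {
        // Only DEAD implies missing containers; a node in MAINTENANCE is
        // expected back, so its replicas are not re-replicated.
        return this == DEAD;
    }
}
```

The point of the extra states is exactly the `containersMissing()` distinction: MAINTENANCE suppresses re-replication while DEAD triggers it.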

*Pull Container Report*

The replication manager requests a pool of nodes to send container reports. 
Imagine there are 3 pools being processed in parallel; does that mean 
container reports from 24 * 3 = 72 nodes will arrive at SCM in a wave? Could 
that cause a network problem?

*Scm configuration*

Can we move the configuration properties 
OZONE_SCM_CONTAINER_REPORT_PROCCESSING_LAG, 
OZONE_SCM_MAX_CONTAINER_REPORT_THREADS and 
OZONE_SCM_MAX_WAIT_FOR_CONTAINER_REPORTS_SECONDS from {{OzoneConfigKeys}} to 
{{ScmConfigKeys}}?

*CommandQueue*

It looks like the command queue maintains a list of commands for each 
datanode; I suggest using finer-grained locking for synchronization. More 
specifically, if one thread wants to add a command for datanode A and another 
thread wants to add a command for datanode B, we probably don't want them to 
wait for each other.
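
A minimal sketch of what I mean, with one lock per datanode instead of one lock over the whole map. The class and method names are illustrative, not the ones in the patch, and commands are modeled as plain strings for brevity:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a per-datanode command queue where adding a command
// for datanode A never blocks a thread adding a command for datanode B.
public class CommandQueue {
    private final ConcurrentHashMap<String, Queue<String>> queues =
        new ConcurrentHashMap<>();

    public void addCommand(String datanodeId, String command) {
        // computeIfAbsent creates the per-node queue atomically; the
        // synchronized block then locks only that one node's queue.
        Queue<String> q =
            queues.computeIfAbsent(datanodeId, k -> new ArrayDeque<>());
        synchronized (q) {
            q.add(command);
        }
    }

    public String pollCommand(String datanodeId) {
        Queue<String> q = queues.get(datanodeId);
        if (q == null) {
            return null;
        }
        synchronized (q) {
            return q.poll();
        }
    }
}
```

With this shape, contention only occurs between threads touching the same datanode's queue.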

This is an in-memory queue; how do we make sure it doesn't run into an 
inconsistent state? Imagine the replication manager has just processed 
container reports from a pool and asked a datanode to replicate a container, 
and that replication is in progress. If SCM then crashes and restarts, how 
does SCM learn what the current state is?

The queue seems to be time-ordered; I think it would be better to support 
priorities as well. Commands may have different priorities, for example, 
replicating a container is usually higher priority than deleting a container 
replica; replicating a container may also have different priorities depending 
on how many replicas are still desired.
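
One way to keep both properties is to order by priority first and enqueue time second, so equal-priority commands still drain in time order. A sketch, with hypothetical names and priority values that are not from the patch:

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch of priority-then-time ordering for SCM commands.
public class PrioritizedCommands {
    static final class Command implements Comparable<Command> {
        final int priority;      // lower number = more urgent
        final long enqueueTime;  // tie-break: preserve time order
        final String description;

        Command(int priority, long enqueueTime, String description) {
            this.priority = priority;
            this.enqueueTime = enqueueTime;
            this.description = description;
        }

        @Override
        public int compareTo(Command other) {
            if (priority != other.priority) {
                return Integer.compare(priority, other.priority);
            }
            return Long.compare(enqueueTime, other.enqueueTime);
        }
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<Command> queue = new PriorityBlockingQueue<>();
        queue.add(new Command(5, 1L, "delete replica of c3"));
        queue.add(new Command(1, 2L, "replicate c1 (single replica left)"));
        queue.add(new Command(3, 3L, "replicate c2 (two replicas)"));
        // Although it arrived last but one, the most under-replicated
        // container's command is served first.
        System.out.println(queue.poll().description);
    }
}
```

The explicit enqueue-time tie-break matters because {{PriorityBlockingQueue}} makes no ordering guarantee among elements that compare equal.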

I will try to read more and comment. Thanks.



> Ozone: SCM:  Add the ability to handle container reports 
> ---------------------------------------------------------
>
>                 Key: HDFS-11493
>                 URL: https://issues.apache.org/jira/browse/HDFS-11493
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>         Attachments: container-replication-storage.pdf, 
> HDFS-11493-HDFS-7240.001.patch
>
>
> Once a datanode sends the container report it is SCM's responsibility to 
> determine if the replication levels are acceptable. If it is not, SCM should 
> initiate a replication request to another datanode. This JIRA tracks how SCM  
> handles a container report.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
