[ 
https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126297#comment-17126297
 ] 

runzhiwang edited comment on HDDS-3481 at 6/5/20, 1:25 AM:
-----------------------------------------------------------

[~nanda] [~arp] [~elek] Thanks for the review. I think your suggestions are 
reasonable: 
1. increase hdds.scm.replication.event.timeout, for example to 4 hours 
2. send the status of replicate/delete container commands from the datanode to SCM
Besides, maybe we should add another config, 
hdds.scm.start.replication.event.timeout (for example 5 minutes), to discover 
earlier a slow datanode that has not started replicating within 5 minutes. 

So I prefer the following solution, derived from your suggestions:
1. SCM asks D3 and D4 to replicate C1.
2. If hdds.scm.start.replication.event.timeout (5 minutes) expires and D3 has 
not started replicating C1 (SCM can learn this from the reported replicate 
status), SCM can ask D3 to cancel the replication and ask D5 to replicate C1.
3. Otherwise, if hdds.scm.replication.event.timeout (4 hours) expires and D3 
has not finished replicating, SCM can ask D3 to cancel the replication and ask 
D5 to replicate C1.
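
The two-stage check above could look roughly like the following sketch. This is only an illustration of the proposal, not the real ReplicationManager code: the class, enum, and method names are invented; only the two config names and their example values (5 minutes, 4 hours) come from the comment.

```java
import java.time.Duration;
import java.time.Instant;

public class ReplicationTimeoutSketch {

    // Example values from the comment; the config names are the proposed ones.
    static final Duration START_TIMEOUT = Duration.ofMinutes(5);  // hdds.scm.start.replication.event.timeout
    static final Duration FINISH_TIMEOUT = Duration.ofHours(4);   // hdds.scm.replication.event.timeout

    enum Decision { WAIT, CANCEL_AND_REASSIGN }

    /**
     * Decide what SCM should do with one inflight replicate command, given
     * when it was scheduled and whether the datanode has reported that it
     * started replicating (step 2 vs step 3 of the proposal).
     */
    static Decision decide(Instant scheduledAt, boolean datanodeStarted, Instant now) {
        Duration elapsed = Duration.between(scheduledAt, now);
        if (!datanodeStarted && elapsed.compareTo(START_TIMEOUT) > 0) {
            // D3 never started within 5 minutes: cancel on D3, reassign to D5.
            return Decision.CANCEL_AND_REASSIGN;
        }
        if (elapsed.compareTo(FINISH_TIMEOUT) > 0) {
            // D3 started but did not finish within 4 hours: cancel and reassign.
            return Decision.CANCEL_AND_REASSIGN;
        }
        return Decision.WAIT;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.EPOCH;
        System.out.println(decide(t0, false, t0.plus(Duration.ofMinutes(6))));
        System.out.println(decide(t0, true, t0.plus(Duration.ofMinutes(30))));
        System.out.println(decide(t0, true, t0.plus(Duration.ofHours(5))));
    }
}
```

The key point the sketch captures is that a datanode which never even starts is detected after minutes, while one that is slowly making progress is left alone for hours.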

[~nanda] Increasing hdds.scm.replication.event.timeout can solve this issue, 
but as you said, container C1 will have only one replica during this period if 
D3 and D4 cannot process the command.

What do you think?




> SCM ask 31 datanodes to replicate the same container
> ----------------------------------------------------
>
>                 Key: HDDS-3481
>                 URL: https://issues.apache.org/jira/browse/HDDS-3481
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Blocker
>              Labels: TriagePending
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>
> *What's the problem ?*
> As the image shows, SCM asked 31 datanodes to replicate container 2037 every 
> 10 minutes starting from 2020-04-17 23:38:51. Then at 2020-04-18 08:58:52, SCM 
> found that the replica count of container 2037 was 12, so it asked 11 
> datanodes to delete container 2037. 
>  !screenshot-1.png! 
>  !screenshot-2.png! 
> *What's the reason ?*
> SCM checks whether (container replica count + 
> inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) is less than 3. If it is, SCM asks 
> some datanode to replicate the container and adds the action to 
> inflightReplication.get(containerId). The replicate action timeout is 10 
> minutes; when an action times out, SCM removes it from 
> inflightReplication.get(containerId), as the image shows. Then (container 
> replica count + inflightReplication.get(containerId).size() - 
> inflightDeletion.get(containerId).size()) drops below 3 again, and SCM asks 
> yet another datanode to replicate the container.
> Because replicating a container takes a long time, it sometimes cannot finish 
> within 10 minutes, so a new datanode was asked to replicate the container 
> every 10 minutes, 31 in total. 19 of the 31 datanodes replicated the container 
> from the same source datanode, which also put heavy pressure on the source 
> datanode and made replication even slower. In fact, the first replication took 
> 4 hours to finish. 
>  !screenshot-4.png! 
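
The replica-count check described in the quoted report reduces to a single inequality, sketched below. The class and method names are invented for illustration; only the formula (replicas + inflight adds - inflight deletes < 3) comes from the report.

```java
public class ReplicaCountSketch {
    static final int REPLICATION_FACTOR = 3;

    // SCM schedules a new replicate command when actual replicas plus
    // inflight replications minus inflight deletions fall below 3.
    static boolean needsReplication(int replicaCount, int inflightAdds, int inflightDeletes) {
        return replicaCount + inflightAdds - inflightDeletes < REPLICATION_FACTOR;
    }

    public static void main(String[] args) {
        // One replica with two inflight replications: no new command needed.
        System.out.println(needsReplication(1, 2, 0));
        // Both inflight actions timed out and were dropped from the map, so
        // the count falls below 3 again and SCM asks yet another datanode.
        // Repeating every 10 minutes is exactly the loop in this bug.
        System.out.println(needsReplication(1, 0, 0));
    }
}
```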



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
