[ https://issues.apache.org/jira/browse/HDDS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
runzhiwang updated HDDS-3481:
-----------------------------
Description:
*What's the problem?*
As the screenshots show, starting at 2020-04-17 23:38:51, SCM asked 31 datanodes to replicate container 2037, sending a new request every 10 minutes. At 2020-04-18 08:58:52, SCM found that container 2037 had 12 replicas, so it asked 11 datanodes to delete their copies of container 2037.
!screenshot-1.png!
!screenshot-2.png!
*What's the reason?*
Replicating a container takes a long time. If a replication cannot finish within 10 minutes, SCM sees that the container still has fewer than 3 replicas and asks another datanode to replicate it. Moreover, 19 of the 31 datanodes replicated the container from the same source datanode, which put heavy pressure on that source datanode and made replication even slower. In fact, the first replication took 4 hours to finish.

was:
*What's the problem?*
As the image shows, SCM asked 31 datanodes to replicate container 2037, sending a new request every 10 minutes.
!screenshot-1.png!
*What's the reason?*
Replicating a container takes a long time. If a replication cannot finish within 10 minutes, SCM sees that the container still has fewer than 3 replicas and asks another datanode to replicate it. Moreover, 19 of the 31 datanodes replicated the container from the same source datanode, which put heavy pressure on that source datanode and made replication even slower. In fact, the first replication took 4 hours to finish.

> SCM ask 31 datanodes to replicate the same container
> ----------------------------------------------------
>
> Key: HDDS-3481
> URL: https://issues.apache.org/jira/browse/HDDS-3481
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
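The feedback loop in the reason above can be illustrated with a small discrete-time simulation. This is a hypothetical model, not Ozone's actual ReplicationManager: the class `ReplicationSketch`, the method `simulate`, the starting count of 2 healthy replicas, a fixed 4-hour copy time for every transfer, and the notion of an "in-flight timeout" after which SCM stops counting a running copy are all assumptions chosen to mirror the numbers in this report. It also ignores the extra load on a shared source datanode, which in the report made the copies slower still.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical discrete-time model of the re-replication loop described in
 * this issue. It is NOT Ozone's ReplicationManager; every constant and name
 * here is an illustrative assumption.
 */
public class ReplicationSketch {
    static final int REPLICATION_FACTOR = 3;
    static final int CHECK_INTERVAL_MIN = 10; // SCM re-checks every 10 minutes
    static final int COPY_TIME_MIN = 240;     // assume every copy takes 4 hours

    /**
     * Runs the periodic check until the container reaches the replication
     * factor and returns how many replicate commands were sent.
     *
     * @param initialReplicas    healthy replicas before the first check
     * @param inflightTimeoutMin how long a sent command still counts as a
     *                           pending replica before SCM forgets it
     */
    static int simulate(int initialReplicas, int inflightTimeoutMin) {
        List<Integer> sentAt = new ArrayList<>(); // send time of each command
        for (int t = 0; ; t += CHECK_INTERVAL_MIN) {
            int completed = initialReplicas;
            int pending = 0;
            for (int sent : sentAt) {
                int age = t - sent;
                if (age >= COPY_TIME_MIN) {
                    completed++;  // copy has finished on its target datanode
                } else if (age < inflightTimeoutMin) {
                    pending++;    // copy is still tracked as in flight
                }                 // otherwise: timed out and ignored by SCM
            }
            if (completed >= REPLICATION_FACTOR) {
                return sentAt.size();
            }
            if (completed + pending < REPLICATION_FACTOR) {
                sentAt.add(t);    // ask one more datanode to replicate
            }
        }
    }

    public static void main(String[] args) {
        // Timeout equal to the check interval: the running copy is forgotten
        // at every check, so a new datanode is asked each time.
        System.out.println("timeout=10min  -> commands sent: " + simulate(2, 10));
        // Timeout longer than the copy time: the running copy counts toward
        // the replication factor, so no extra commands are sent.
        System.out.println("timeout=360min -> commands sent: " + simulate(2, 360));
    }
}
```

Under these assumptions, forgetting the running copy after 10 minutes makes the model send 24 replicate commands before the first copy lands, and since each of those copies eventually completes, the container ends up with 26 replicas that would later have to be trimmed. Counting the running copy as pending for longer than the transfer takes sends a single command.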