[ https://issues.apache.org/jira/browse/HDDS-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072660#comment-17072660 ]
Yiqun Lin commented on HDDS-3241:
---------------------------------

{quote}
Fix me if I am wrong, but in this case the containers are not unknown, but additional replicas are detected (unless the full container is deleted in the meantime).
{quote}

I mean that sometimes a DN still contains stale containers that SCM has already deleted.

{quote}
I am not sure if I understood: if some of the containers are valid, but some others are invalid, containers can be deleted.
{quote}

If we start up a completely wrong SCM, I think it can almost never exit safemode. So I assume the unknown-container deletion behavior is safe. But as you mentioned, if only some of the containers are invalid, they can still be deleted.

> Invalid container reported to SCM should be deleted
> ---------------------------------------------------
>
>                 Key: HDDS-3241
>                 URL: https://issues.apache.org/jira/browse/HDDS-3241
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> For an invalid or outdated container reported by a Datanode, ContainerReportHandler in SCM only prints an error log and does not take any action.
> {noformat}
> 2020-03-15 05:19:41,072 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 37 from datanode 0d98dfab-9d34-46c3-93fd-6b64b65ff543{ip: xx.xx.xx.xx, host: lyq-xx.xx.xx.xx, networkLocation: /dc2/rack1, certSerialId: null}.
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: Container with id #37 not found.
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:542)
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.getContainerInfo(ContainerStateMap.java:188)
>         at org.apache.hadoop.hdds.scm.container.ContainerStateManager.getContainer(ContainerStateManager.java:484)
>         at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainer(SCMContainerManager.java:204)
>         at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:126)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 2020-03-15 05:19:41,073 ERROR org.apache.hadoop.hdds.scm.container.ContainerReportHandler: Received container report for an unknown container 38 from datanode 0d98dfab-9d34-46c3-93fd-6b64b65ff543{ip: xx.xx.xx.xx, host: lyq-xx.xx.xx.xx, networkLocation: /dc2/rack1, certSerialId: null}.
> org.apache.hadoop.hdds.scm.container.ContainerNotFoundException: Container with id #38 not found.
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.checkIfContainerExist(ContainerStateMap.java:542)
>         at org.apache.hadoop.hdds.scm.container.states.ContainerStateMap.getContainerInfo(ContainerStateMap.java:188)
>         at org.apache.hadoop.hdds.scm.container.ContainerStateManager.getContainer(ContainerStateManager.java:484)
>         at org.apache.hadoop.hdds.scm.container.SCMContainerManager.getContainer(SCMContainerManager.java:204)
>         at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:85)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:126)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:97)
>         at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:46)
>         at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {noformat}
> Actually, SCM should inform the Datanode to delete its outdated container. Otherwise, the Datanode will keep reporting this invalid container, and the dirty container data will stay on the Datanode forever. Sometimes we bring back a node after it has been repaired, and it may still store stale data; we should have a way to auto-clean that up.
> We could add a setting to control this auto-deletion behavior if the approach is considered a little risky.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
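The proposed behavior — when SCM receives a report for a container it does not know, send a delete command back to the Datanode instead of only logging, guarded by a config switch — could be sketched roughly as below. All names here (UnknownContainerHandler, the constructor flag) are illustrative assumptions, not the actual Ozone classes or the final patch:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed SCM-side handling; class and
// method names are made up for illustration and are NOT the Ozone API.
public class UnknownContainerHandler {

  // Mirrors the suggested setting that guards auto-deletion of
  // unknown containers (off by default would be the safer choice).
  private final boolean deleteUnknownContainers;

  // Container IDs that SCM currently knows about.
  private final Set<Long> knownContainers;

  public UnknownContainerHandler(boolean deleteUnknownContainers,
                                 Set<Long> knownContainers) {
    this.deleteUnknownContainers = deleteUnknownContainers;
    this.knownContainers = knownContainers;
  }

  /**
   * Process the container IDs from a Datanode report and return the
   * unknown IDs for which a delete command should be sent back,
   * instead of only logging a ContainerNotFoundException as today.
   */
  public List<Long> process(List<Long> reportedContainers) {
    List<Long> toDelete = new ArrayList<>();
    for (long id : reportedContainers) {
      if (!knownContainers.contains(id) && deleteUnknownContainers) {
        // Stale replica from SCM's point of view: schedule deletion
        // on the reporting Datanode.
        toDelete.add(id);
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    Set<Long> known = new HashSet<>(List.of(35L, 36L));
    UnknownContainerHandler handler =
        new UnknownContainerHandler(true, known);
    // Containers 37 and 38 match the unknown IDs from the log above.
    System.out.println(handler.process(List.of(35L, 37L, 38L)));
    // prints [37, 38]
  }
}
```

With the flag set to false the handler would return an empty list and preserve today's log-only behavior, which addresses the concern in the comment thread that a misconfigured SCM could otherwise delete valid data.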