[ https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nanda kumar updated HDDS-4336:
------------------------------
    Status: Patch Available  (was: Open)

> ContainerInfo does not persist BCSID leading to failed replicas reports
> -----------------------------------------------------------------------
>
>                 Key: HDDS-4336
>                 URL: https://issues.apache.org/jira/browse/HDDS-4336
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 1.1.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> If you create a container and then close it, the BCSID is synced on the
> datanodes, and the value is then updated in SCM by setting the "sequenceID"
> field on the containerInfo object for the container.
> If you later restart just SCM, the sequenceID becomes zero, and container
> reports for the replica then fail with a stack trace like:
> {code}
> Exception in thread "EventQueue-ContainerReportForContainerReportHandler" java.lang.AssertionError
>     at org.apache.hadoop.hdds.scm.container.ContainerInfo.updateSequenceId(ContainerInfo.java:176)
>     at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerStats(AbstractContainerReportHandler.java:108)
>     at org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:83)
>     at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:162)
>     at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:130)
>     at org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:50)
>     at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> The assertion fails because it does not allow the sequenceID to be changed
> on a CLOSED container:
> {code}
>   public void updateSequenceId(long sequenceID) {
>     assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
>     sequenceId = max(sequenceID, sequenceId);
>   }
> {code}
> The issue seems to be caused by the serialisation and deserialisation of the
> containerInfo object to protobuf, as sequenceId is never persisted or
> restored.
> However, I am also confused about how this ever worked, as this is a pretty
> significant problem.
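The failure described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the Ozone code: `Info`, `toRecord`, `fromRecord` and `processReplica` are hypothetical stand-ins, a `Map` stands in for the protobuf message, and the "only update when the replica's BCSID is ahead" check in `processReplica` is an assumption about the report handler rather than something stated in the issue. The guard is thrown as an exception instead of using `assert`, so that it fires without `-ea`.

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceIdRoundTrip {

  enum State { OPEN, QUASI_CLOSED, CLOSED }

  /** Hypothetical, simplified stand-in for ContainerInfo. */
  static final class Info {
    private final State state;
    private long sequenceId;

    Info(State state, long sequenceId) {
      this.state = state;
      this.sequenceId = sequenceId;
    }

    long getSequenceId() {
      return sequenceId;
    }

    /** Same guard as the quoted method, thrown explicitly instead of asserted. */
    void updateSequenceId(long reported) {
      if (state != State.OPEN && state != State.QUASI_CLOSED) {
        throw new IllegalStateException("sequenceId changed on " + state + " container");
      }
      sequenceId = Math.max(reported, sequenceId);
    }

    /** Stand-in for the serialised form; persistSequenceId=false models the bug. */
    Map<String, String> toRecord(boolean persistSequenceId) {
      Map<String, String> record = new HashMap<>();
      record.put("state", state.name());
      if (persistSequenceId) {
        record.put("sequenceId", Long.toString(sequenceId));
      }
      return record;
    }

    /** A missing field falls back to 0, which is what SCM sees after a restart. */
    static Info fromRecord(Map<String, String> record) {
      return new Info(State.valueOf(record.get("state")),
          Long.parseLong(record.getOrDefault("sequenceId", "0")));
    }
  }

  /** Assumed handler behaviour: only update when the replica's BCSID is ahead. */
  static boolean processReplica(Info info, long replicaBcsId) {
    try {
      if (info.getSequenceId() < replicaBcsId) {
        info.updateSequenceId(replicaBcsId);
      }
      return true;
    } catch (IllegalStateException e) {
      return false; // report processing failed for this replica
    }
  }

  public static void main(String[] args) {
    Info closed = new Info(State.CLOSED, 42L);
    Info lossy = Info.fromRecord(closed.toRecord(false)); // restart, field dropped
    Info fixed = Info.fromRecord(closed.toRecord(true));  // restart, field kept

    System.out.println("before restart:         " + processReplica(closed, 42L));
    System.out.println("restart, not persisted: " + processReplica(lossy, 42L));
    System.out.println("restart, persisted:     " + processReplica(fixed, 42L));
  }
}
```

Under these assumptions, only the container restored without its sequenceId fails: it comes back with 0, the replica reports 42, and the update is attempted on a CLOSED container. If the handler really does skip the update when the values already match, that may also be why the problem only shows up after an SCM restart.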