[ 
https://issues.apache.org/jira/browse/HDDS-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDDS-4336:
------------------------------
    Status: Patch Available  (was: Open)

> ContainerInfo does not persist BCSID leading to failed replicas reports
> -----------------------------------------------------------------------
>
>                 Key: HDDS-4336
>                 URL: https://issues.apache.org/jira/browse/HDDS-4336
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 1.1.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> If you create a container, and then close it, the BCSID is synced on the 
> datanodes and then the value is updated in SCM via setting the "sequenceID" 
> field on the containerInfo object for the container.
> If you later restart just SCM, the sequenceID becomes zero, and then 
> container reports for the replica fail with a stack trace like:
> {code}
> Exception in thread "EventQueue-ContainerReportForContainerReportHandler" 
> java.lang.AssertionError
>       at 
> org.apache.hadoop.hdds.scm.container.ContainerInfo.updateSequenceId(ContainerInfo.java:176)
>       at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.updateContainerStats(AbstractContainerReportHandler.java:108)
>       at 
> org.apache.hadoop.hdds.scm.container.AbstractContainerReportHandler.processContainerReplica(AbstractContainerReportHandler.java:83)
>       at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.processContainerReplicas(ContainerReportHandler.java:162)
>       at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:130)
>       at 
> org.apache.hadoop.hdds.scm.container.ContainerReportHandler.onMessage(ContainerReportHandler.java:50)
>       at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:81)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> The assertion here is failing, as it does not allow for the sequenceID to be 
> changed on a CLOSED container:
> {code}
>   public void updateSequenceId(long sequenceID) {
>     assert (isOpen() || state == HddsProtos.LifeCycleState.QUASI_CLOSED);
>     sequenceId = max(sequenceID, sequenceId);
>   }
> {code}
> The issue seems to be caused by the serialisation and deserialisation of the 
> containerInfo object to protobuf, as sequenceId never persisted or restored.
> However, I am also confused about how this ever worked, as this is a pretty 
> significant problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org

Reply via email to