[ https://issues.apache.org/jira/browse/HDDS-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926017#comment-16926017 ]
Hudson commented on HDDS-1843:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17262 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17262/])
HDDS-1843. Undetectable corruption after restart of a datanode. (shashikant: rev 469165e6f29a6e7788f218bdbbc3f7bacf26628b)
* (edit) hadoop-hdds/common/src/main/proto/DatanodeContainerProtocol.proto
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/ContainerDispatcher.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/HddsDispatcher.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/DispatcherContext.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/impl/BlockManagerImpl.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestSecureContainerServer.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/server/TestContainerServer.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
* (edit) hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/interfaces/Container.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/TestCSMMetrics.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java


> Undetectable corruption after restart of a datanode
> ---------------------------------------------------
>
> Key: HDDS-1843
> URL: https://issues.apache.org/jira/browse/HDDS-1843
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1843.000.patch
>
> Time Spent: 9h 50m
> Remaining Estimate: 0h
>
> Right now, all write chunks use buffered IO, i.e., the sync flag is disabled by
> default. Also, RocksDB metadata updates are applied to the RocksDB cache first on
> the datanode. If both the buffered chunk data and the corresponding metadata
> update are lost during a datanode restart, the resulting corruption cannot be
> detected in a reasonable time frame (not even by the container scanner) until
> a client IO fails or the Recon server detects it over time. To at least
> detect the problem, the Ratis snapshot on the datanode should sync the RocksDB
> file. That way, ContainerScanner will be able to detect this. We can also add a
> metric around the sync to measure how much throughput loss it incurs.
> Thanks [~msingh] for suggesting this.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
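
The proposal in the description (force the buffered metadata to disk whenever a snapshot is taken, and measure what that costs) can be illustrated with a small sketch. This is not the code from the patch: `SnapshotSyncSketch`, `takeSnapshot`, and the two counters are hypothetical names, and a plain `FileChannel` stands in for the RocksDB file. It only assumes that writes are buffered by default and that the snapshot is the point where durability is enforced.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a metadata store whose writes are buffered by default. */
public class SnapshotSyncSketch implements AutoCloseable {
  private final FileChannel channel;
  // Simple counters standing in for a "sync" metric (assumed names).
  private final AtomicLong syncCount = new AtomicLong();
  private final AtomicLong totalSyncNanos = new AtomicLong();

  public SnapshotSyncSketch(Path file) throws IOException {
    this.channel = FileChannel.open(file,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.APPEND);
  }

  /** Buffered write: data may sit in the OS page cache and be lost on a crash. */
  public void write(byte[] data) throws IOException {
    ByteBuffer buf = ByteBuffer.wrap(data);
    while (buf.hasRemaining()) {
      channel.write(buf);
    }
  }

  /** Snapshot hook: force content and file metadata to disk, and time the call. */
  public void takeSnapshot() throws IOException {
    long start = System.nanoTime();
    channel.force(true);
    totalSyncNanos.addAndGet(System.nanoTime() - start);
    syncCount.incrementAndGet();
  }

  public long getSyncCount() {
    return syncCount.get();
  }

  public long getTotalSyncNanos() {
    return totalSyncNanos.get();
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }
}
```

Dividing `getTotalSyncNanos()` by `getSyncCount()` gives an average sync latency, which is one way to estimate the throughput cost the description mentions.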