[ https://issues.apache.org/jira/browse/HDDS-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shashikant Banerjee reassigned HDDS-1843:
-----------------------------------------

    Assignee: Shashikant Banerjee  (was: Hrishikesh Gadre)

> Undetectable corruption after restart of a datanode
> ---------------------------------------------------
>
>                 Key: HDDS-1843
>                 URL: https://issues.apache.org/jira/browse/HDDS-1843
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.5.0
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Critical
>             Fix For: 0.5.0
>
>
> Right now, all chunk writes use buffered IO, i.e., the sync flag is disabled
> by default. RocksDB metadata updates on the datanode also go to the RocksDB
> cache first. If both the buffered chunk data and the corresponding metadata
> update are lost during a datanode restart, the resulting corruption cannot be
> detected in a reasonable time frame, not even by the container scanner. It
> surfaces only when a client IO fails or the Recon server detects it over time.
> To at least detect the problem, the Ratis snapshot on the datanode should sync
> the RocksDB file. That way, ContainerScanner will be able to detect the
> corruption. We can also add a metric around the sync to measure how much
> throughput loss it incurs.
> Thanks [~msingh] for suggesting this.
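
The proposal in the description (keep writes buffered, force a sync when a snapshot is taken, and measure what the sync costs) can be illustrated with a minimal sketch. This is not Ozone, Ratis, or RocksDB code: the class SnapshotSyncSketch, its methods, and the two counters are hypothetical names, and a plain file synced with FileChannel.force stands in for the RocksDB file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Ozone code: buffered writes plus a sync at snapshot time.
class SnapshotSyncSketch implements AutoCloseable {
    private final FileChannel channel;
    private long syncCount;       // hypothetical metric: number of syncs performed
    private long totalSyncNanos;  // hypothetical metric: total time spent syncing

    SnapshotSyncSketch(Path file) throws IOException {
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Buffered write: returns once the OS has accepted the bytes.
    // Nothing is forced to the storage device here, so a crash can lose the data.
    void writeChunk(byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    // Snapshot hook: force file content and file metadata to the storage device,
    // and record how long the sync took.
    void takeSnapshot() throws IOException {
        long start = System.nanoTime();
        channel.force(true);
        totalSyncNanos += System.nanoTime() - start;
        syncCount++;
    }

    long getSyncCount() { return syncCount; }

    long getTotalSyncNanos() { return totalSyncNanos; }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chunk", ".data");
        try (SnapshotSyncSketch store = new SnapshotSyncSketch(file)) {
            store.writeChunk(new byte[4096]);
            store.writeChunk(new byte[4096]);
            store.takeSnapshot();
            System.out.println("syncs=" + store.getSyncCount()
                    + " totalSyncNanos=" + store.getTotalSyncNanos());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Syncing only at snapshot time keeps the per-write path buffered, so the cost is paid once per snapshot rather than once per chunk. The accumulated sync time is the kind of figure the suggested metric would expose.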