[ https://issues.apache.org/jira/browse/HDFS-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604571#comment-16604571 ]
Gabor Bota commented on HDFS-13818:
-----------------------------------

Thanks for working on this [~adam.antal]. This feature is starting to look great.

I've noticed the following while looking into HDFS-13818.003.patch:
* The ASF license header is missing from the {{Corruption}} class (asflicense check).
* Please consider a more descriptive name for the {{Corruption}} class, such as {{PbImageCorruption}}.
* For {{Preconditions.checkState}} in {{Corruption}}: please add an error message describing what failed. We could also consider using {{assert}} for this purpose.
* It seems like {{CorruptionType}} could be an enum. Maybe we could even use a Set of those enums for the different kinds of corruption.
* Code structuring: {{OutputEntryBuilder}} could live in {{PBImageCorruptionDetector}}, since it is part of that class's logic, and we could use {{Corruption}} just for storing data.
* Please extend the docs in {{hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md}} with a description of this feature.
* Fix the checkstyle issue. There's a [link for it in the Hadoop QA's comment|https://builds.apache.org/job/PreCommit-HDFS-Build/24941/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt].

> Extend OIV to detect FSImage corruption
> ---------------------------------------
>
>                 Key: HDFS-13818
>                 URL: https://issues.apache.org/jira/browse/HDFS-13818
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>         Attachments: HDFS-13818.001.patch, HDFS-13818.002.patch,
> HDFS-13818.003.patch, HDFS-13818.003.patch,
> OIV_CorruptionDetector_processor.001.pdf,
> OIV_CorruptionDetector_processor.002.pdf
>
>
> A follow-up Jira for HDFS-13031: an improvement of the OIV is suggested for
> detecting corruptions like HDFS-13101 in an offline way.
> The reasoning is the following. Apart from a NN startup throwing the error,
> there is nothing in the customer's hand that could reassure him/her that the
> FSImage is good or corrupted.
> Although a real, full check of the FSImage is only possible in the NN, for
> stack traces associated with the observed corruption cases the solution of
> putting up a tertiary NN is a little bit of overkill. The OIV would be a
> handy choice: it already has functionality like loading the fsimage and
> constructing the folder structure, so we just have to add the option of
> detecting the null INodes. For example, the Delimited OIV processor can
> already use an in-disk MetadataMap, which reduces memory consumption. There
> may also be room for parallelizing: iterating through the INodes, for
> example, could be done in a distributed way, increasing efficiency, and we
> wouldn't need a high-memory, high-CPU setup just for checking the FSImage.
> The suggestion is to add a --detectCorruption option to the OIV which would
> check the FSImage for consistency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
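
The review suggestions above (make {{CorruptionType}} an enum, keep a Set of those values, and attach an error message to the state check) could look roughly like the sketch below. This is not code from the patch: the enum constants, the field names and the {{CorruptionSketch}} driver are hypothetical, and a plain Java check stands in for Guava's {{Preconditions.checkState}} so that the example has no dependencies.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical corruption kinds; these constant names are illustrative only.
enum CorruptionType {
  CORRUPT_NODE,
  MISSING_CHILD
}

// Plain data holder for one corrupted entry, using the name proposed in the review.
final class PbImageCorruption {
  private final long id;
  private final EnumSet<CorruptionType> types;

  PbImageCorruption(long id, Set<CorruptionType> initialTypes) {
    // State check with a message that says what failed (stand-in for checkState).
    if (initialTypes == null || initialTypes.isEmpty()) {
      throw new IllegalStateException(
          "Corruption entry " + id + " must have at least one corruption type");
    }
    this.id = id;
    this.types = EnumSet.copyOf(initialTypes);
  }

  // One entry can accumulate several kinds of corruption.
  void addType(CorruptionType type) {
    types.add(type);
  }

  long getId() {
    return id;
  }

  // Read-only view so callers cannot change the recorded types.
  Set<CorruptionType> getTypes() {
    return Collections.unmodifiableSet(types);
  }
}

// Hypothetical driver showing how a detector might record an entry.
final class CorruptionSketch {
  public static void main(String[] args) {
    PbImageCorruption c =
        new PbImageCorruption(16386L, EnumSet.of(CorruptionType.MISSING_CHILD));
    c.addType(CorruptionType.CORRUPT_NODE);
    System.out.println(c.getId() + " " + c.getTypes());
  }
}
```

With this shape the data class only stores state, so output formatting can stay in the detector, in line with the code-structuring point in the review.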