[ https://issues.apache.org/jira/browse/HDFS-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16591765#comment-16591765 ]
Adam Antal commented on HDFS-13818:
-----------------------------------

Thanks for the review, [~gabor.bota]. I'll incorporate the changes in the next patch.

As for the broader questions: PBImageTextWriter assumes that the fsimage is not corrupted. When a parent INode is not found, the INode is treated as a Reference (i.e. it originates from the INodeReferenceSection), but in a corrupted image a non-reference INode's parent can be missing as well, and such an INode is then mistakenly counted among the snapshots. I tried to work around this by printing the missing INodes in afterOutput(), after the original output() call, but some details (like the parentPath) are not written out even though the data is available. This needs further work; the cases where IgnoreSnapshotException is thrown should probably be split so that real snapshot references can be distinguished from corruptions (a minimal sketch of this split follows the quoted issue description below). Once that is done we will have a clearer view of the functionality, and I can start working on the memory footprint and the other questions.

Thanks for the offline discussion regarding the tests, [~zvenczel]. As I see it, amending the existing tests is sufficient for functional coverage, but going into more detail it would also be reasonable to add tests for PBImageDelimitedTextWriter (the existing Delimited processor), since both extend the same core. That may require extra work and could be addressed in a separate Jira. I will add some unit tests anyway, but for the sake of completeness I wonder whether we should go further.

I'll start working on the documentation as well once the open points have been cleared up. I have also uploaded some documentation of the existing functionality and may extend it in newer patches.

> Extend OIV to detect FSImage corruption
> ---------------------------------------
>
>                 Key: HDFS-13818
>                 URL: https://issues.apache.org/jira/browse/HDFS-13818
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>         Attachments: HDFS-13818.001.patch
>
>
> A follow-up Jira for HDFS-13031: an improvement of the OIV is suggested for
> detecting corruptions like HDFS-13101 in an offline way.
> The reasoning is the following. Apart from a NN startup throwing the error,
> there is nothing in the customer's hands that could tell them whether an
> FSImage is good or corrupted.
> Although a real, full check of the FSImage is only possible by the NN,
> standing up a tertiary NN just to reproduce the stack traces associated with
> the observed corruption cases is a bit of an overkill. The OIV would be a
> handy choice: it already has functionality for loading the fsimage and
> constructing the folder structure, so we only have to add the option of
> detecting the null INodes. For example, the Delimited OIV processor can
> already use an on-disk MetadataMap, which reduces memory consumption. There
> may also be room for parallelization: iterating through the INodes, for
> example, could be done in a distributed way, increasing efficiency, so we
> would not need a high-memory, high-CPU setup just to check the FSImage.
> The suggestion is to add a --detectCorruption option to the OIV which would
> check the FSImage for consistency.
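
To make the proposed split concrete, here is a minimal, self-contained sketch of the idea (the class, method and field names are hypothetical, not the actual PBImageTextWriter internals): an INode whose parent lookup fails is only reported as corrupt if its id is also absent from the data collected from the INodeReferenceSection; otherwise it is assumed to be a genuine snapshot reference.

{code:java}
// Minimal, self-contained sketch (hypothetical names, not the real
// PBImageTextWriter API): classify an INode whose parent could not be
// resolved as either a snapshot reference or a corruption.
import java.util.HashSet;
import java.util.Set;

public class MissingParentClassifier {

  /** Possible explanations for an unresolved parent INode. */
  enum MissingParentKind { SNAPSHOT_REFERENCE, CORRUPT }

  // Ids of INodes that appear in the INodeReferenceSection.
  private final Set<Long> referredIds = new HashSet<>();

  void recordReference(long referredId) {
    referredIds.add(referredId);
  }

  /**
   * Called for an INode whose parent id was not found among the loaded
   * directories. Only INodes that are also absent from the reference
   * section are reported as corrupt; the rest are assumed to come from
   * snapshots, matching the writer's current optimistic assumption.
   */
  MissingParentKind classify(long inodeId) {
    return referredIds.contains(inodeId)
        ? MissingParentKind.SNAPSHOT_REFERENCE
        : MissingParentKind.CORRUPT;
  }

  public static void main(String[] args) {
    MissingParentClassifier classifier = new MissingParentClassifier();
    classifier.recordReference(16400L); // an INode referred to from a snapshot
    System.out.println(classifier.classify(16400L)); // SNAPSHOT_REFERENCE
    System.out.println(classifier.classify(16401L)); // CORRUPT
  }
}
{code}

In the real writer this decision would presumably be taken at the points where IgnoreSnapshotException is currently thrown, and the entries classified as corrupt could then be written out in afterOutput() together with the details (such as parentPath) that are already available.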