[ https://issues.apache.org/jira/browse/HDFS-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548469#comment-14548469 ]
Daryn Sharp commented on HDFS-8402:
-----------------------------------

The block check feature was only added in 2.7.0. I'll elaborate further on why the exit code for the new feature is completely broken and, I'd say, incompatible with POSIX standards and user expectations.

The existing path scan output has a last line of "The filesystem ... is (HEALTHY|CORRUPT)". The fsck client checks only the last line to determine the exit code. Quite fragile, but it is what it is. The new block scan does not emit a final line, which this patch adds.

The block scan feature naively relies on the final line of "Block replica ... is (HEALTHY|CORRUPT)". That means fsck returns:
# Success even if any of the 1..n-1 scanned blocks have corrupt replicas. It gets worse.
# Success even if the last block's 1..n-1 replicas are corrupt. It gets worse.
# Success even if the last block's last replica is CORRUPT. Fsck checks "endsWith" CORRUPT, which doesn't match "Block replica .. is CORRUPT Reason:blah".
# Summary: Always success even if blocks are corrupt. How useful is that?

There are two options for fixing the exit code. One is to pattern match all lines to determine a block scan's exit code, which makes fsck even more fragile and doesn't fix old clients. The other is to emit a final line similar to the path scan's, which allows both old and new fsck clients (compatible?) to return the correct exit code. I chose the latter because it's a 2.7 feature and it's doubtful anyone is relying on the last line, since it means nothing.

(The test failure isn't related)

> Fsck exit codes are not reliable
> --------------------------------
>
>                 Key: HDFS-8402
>                 URL: https://issues.apache.org/jira/browse/HDFS-8402
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-8402.patch
>
>
> HDFS-6663 added the ability to check specific blocks.
> The exit code is non-deterministically based on the state (corrupt,
> healthy, etc) of the last displayed block's last storage location - instead
> of whether any of the checked blocks' storages are corrupt. Blocks with
> decommissioning or decommissioned nodes should not be flagged as an error.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
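
To illustrate the last-line problem described in the comment, here is a minimal Python sketch. It is not the actual fsck client code: the function names, the sample output lines, and the "Block scan is ..." summary text are all hypothetical, and the model assumes (as the comment describes) that anything not ending in CORRUPT is treated as success.

```python
def exit_code_from_last_line(output: str) -> int:
    """Simplified model of a client that inspects only the last line.

    Assumption: returns 1 when the last non-empty line ends with
    "CORRUPT" and 0 (success) otherwise.
    """
    lines = [ln for ln in output.splitlines() if ln.strip()]
    last = lines[-1] if lines else ""
    return 1 if last.endswith("CORRUPT") else 0


def with_summary_line(output: str) -> str:
    """Append a hypothetical final status line covering every replica line.

    The wording "Block scan is ..." is invented for this sketch; the
    real patch may emit different text.
    """
    corrupt = any(" is CORRUPT" in ln for ln in output.splitlines())
    status = "CORRUPT" if corrupt else "HEALTHY"
    return output.rstrip("\n") + f"\nBlock scan is {status}\n"


# Made-up block scan output; the last replica line carries a trailing reason.
block_scan = "\n".join([
    "Block replica on datanode A is HEALTHY",
    "Block replica on datanode B is CORRUPT Reason:blah",
])

# The trailing "Reason:blah" defeats the endswith check, so this reports success.
print(exit_code_from_last_line(block_scan))

# With a final summary line, the same last-line check reports failure.
print(exit_code_from_last_line(with_summary_line(block_scan)))
```

The point of the second option is that the client logic stays untouched: an old client that only knows how to check the last line still gets the right answer once the server-side output ends with an overall status.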