[ https://issues.apache.org/jira/browse/HDFS-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418942#comment-15418942 ]
Kihwal Lee commented on HDFS-9696:
----------------------------------

It turns out that HDFS-9406 is not related to this issue. The garbage snapshot filediffs with snapshotId=-1 were being generated by a bug that [~zero45] fixed in HDFS-7056.

{code}
   /** Is this inode in the latest snapshot? */
   public final boolean isInLatestSnapshot(final int latestSnapshotId) {
-    if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
+    if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
+        latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
       return false;
     }
{code}

[~shv] explained,
{quote}
(7) Plamen says this is because Snapshot.findLatestSnapshot() may return NO_SNAPSHOT_ID, which breaks recordModification() if you don't have that additional check. We see it when commitBlockSynchronization() is called for truncated block.
{quote}

We have traced the generation of these filediff entries to {{commitBlockSynchronization()}} activity while the NN was running 2.5. This stopped in 2.7 thanks to HDFS-7056. However, the garbage entries live on until the affected files are deleted. Can we have a sanity check during snapshot diff loading so that these entries can be discarded?

> Garbage snapshot records lingering forever
> ------------------------------------------
>
>                 Key: HDFS-9696
>                 URL: https://issues.apache.org/jira/browse/HDFS-9696
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> We have a cluster where the snapshot feature might have been tested years
> ago. HDFS does not have any snapshots now, but I see filediff records
> persisted in its fsimage. Since the cluster has been restarted many times and
> checkpointed over 100 times since then, these records must have been
> persisted and carried over ever since.
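
To illustrate the load-time sanity check requested in the comment, here is a minimal standalone sketch. It is not the actual fsimage loader code: {{DiffEntry}} and {{discardGarbage()}} are hypothetical names invented for this example, and the local {{NO_SNAPSHOT_ID}} constant is assumed to mirror {{Snapshot.NO_SNAPSHOT_ID}} (the garbage entries carry snapshotId=-1). The idea is simply to drop any filediff whose snapshot ID is the "no snapshot" marker while the diff list is being loaded.

```java
import java.util.ArrayList;
import java.util.List;

public class FileDiffSanityCheck {
    // Assumption: mirrors Snapshot.NO_SNAPSHOT_ID; the garbage filediffs have snapshotId=-1.
    static final int NO_SNAPSHOT_ID = -1;

    /** Hypothetical stand-in for one filediff record read from the fsimage. */
    static final class DiffEntry {
        final int snapshotId;
        final long fileSize;

        DiffEntry(int snapshotId, long fileSize) {
            this.snapshotId = snapshotId;
            this.fileSize = fileSize;
        }
    }

    /**
     * Returns the loaded entries minus those whose snapshot ID is NO_SNAPSHOT_ID.
     * The relative order of the surviving entries is preserved.
     */
    static List<DiffEntry> discardGarbage(List<DiffEntry> loaded) {
        List<DiffEntry> kept = new ArrayList<>();
        for (DiffEntry entry : loaded) {
            if (entry.snapshotId == NO_SNAPSHOT_ID) {
                // No real snapshot can own this diff, so skip it.
                continue;
            }
            kept.add(entry);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<DiffEntry> loaded = new ArrayList<>();
        loaded.add(new DiffEntry(NO_SNAPSHOT_ID, 1024L)); // garbage entry
        loaded.add(new DiffEntry(3, 2048L));              // legitimate entry
        System.out.println("kept " + discardGarbage(loaded).size()
                + " of " + loaded.size() + " entries");
    }
}
```

A real fix would presumably also log each discarded entry so operators can tell that an fsimage contained such records; that part is left out of the sketch.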