[ https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Lamb updated HDFS-7682:
-------------------------------
    Attachment: HDFS-7682.001.patch

Hi [~jingzhao],

Thanks for looking at this. isLastBlockComplete() covers the case where it's a snapshot path as well as a closed non-snapshot path. The file length is correct in both of those cases, so it's ok to use it. In the case of a still-being-written file, isLastBlockComplete() returns false and the code works just the same as it does today. The particular case this patch fixes is that a snapshotted file is frozen, so the file length, not the block lengths (which include the non-snapshotted portion), is the limit of what should be checksummed. I've added more assertions to the test to demonstrate this.

In other words, the behavior for non-snapshotted files that are still open (and possibly being appended to) is not changed by this patch, only that of snapshotted files, for which isLastBlockComplete() is a valid check. HDFS-5343 took a similar approach.

> {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes
> non-snapshotted content
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7682
>                 URL: https://issues.apache.org/jira/browse/HDFS-7682
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch
>
> DistributedFileSystem#getFileChecksum of a snapshotted file includes
> non-snapshotted content.
> The reason this happens is that DistributedFileSystem#getFileChecksum
> simply calculates the checksum over all of the CRCs from the blocks in the
> file. But in the case of a snapshotted file, we don't want the checksum to
> include data that was appended to the last block after the snapshot was
> taken.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
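The length-limiting idea discussed in the comment can be sketched as follows. This is a minimal standalone illustration with hypothetical names, not the actual HDFS patch: when the last block is complete (a snapshotted, and therefore frozen, file, or a closed file), the recorded file length bounds the byte range to checksum; for a file still being written, behavior is unchanged.

```java
// Hypothetical sketch of the checksum-range decision described above;
// names are illustrative, not the real HDFS-7682 code.
public class ChecksumRangeSketch {

    /** Number of bytes a getFileChecksum-style computation should cover. */
    static long bytesToChecksum(long fileLength,
                                long sumOfBlockLengths,
                                boolean lastBlockComplete) {
        if (lastBlockComplete) {
            // Snapshotted (frozen) or closed file: the file length is the
            // authoritative limit, excluding bytes appended to the last
            // block after the snapshot was taken.
            return Math.min(fileLength, sumOfBlockLengths);
        }
        // Still-being-written file: unchanged behavior, checksum all the
        // block bytes as before.
        return sumOfBlockLengths;
    }

    public static void main(String[] args) {
        // Snapshot recorded the file at 100 bytes; 20 more bytes were
        // appended to the last block afterwards.
        System.out.println(bytesToChecksum(100, 120, true));  // 100
        // Open file being appended to: all block bytes are covered.
        System.out.println(bytesToChecksum(100, 120, false)); // 120
    }
}
```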