[ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7682:
-------------------------------
    Attachment: HDFS-7682.001.patch

Hi [~jingzhao],

Thanks for looking at this.

isLastBlockComplete() covers the case where it's a snapshot path as well as a 
closed non-snapshot path. The file length is correct in both of those cases, so 
it's ok to use it. In the case of a still-being-written file, 
isLastBlockComplete() returns false and the code works just the same as it does 
today. The particular case that this patch fixes is that a snapshotted file 
is frozen, so the file length is the limit of what should be checksummed, not 
the block lengths (which include the non-snapshotted portion). I've added more 
assertions to the test to demonstrate this.

In other words, the behavior for non-snapshotted files that are still open (and 
possibly being appended to) is not changed by this patch. Only the behavior for 
snapshotted files changes, and for those isLastBlockComplete() is a valid check.
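
To make the length limit concrete, here is a minimal sketch of the idea. It is 
not the code in the patch: the class name, method name and parameters are all 
hypothetical, and it only computes how many bytes of each block would be 
covered, not the checksum itself. It assumes the caller already knows the 
block lengths, the file length, and the result of isLastBlockComplete().

```java
// Hypothetical illustration only; none of these names come from HDFS.
final class ChecksumRangeSketch {

    /**
     * Returns, for each block, how many of its bytes should be checksummed.
     *
     * If the last block is complete (a snapshot path or a closed file), the
     * file length is trusted and used as the overall limit. Otherwise the
     * file is still being written and every block is covered in full.
     */
    static long[] bytesToChecksum(long[] blockLengths,
                                  long fileLength,
                                  boolean lastBlockComplete) {
        long[] covered = new long[blockLengths.length];
        // No limit for a still-open file: same behavior as without the check.
        long remaining = lastBlockComplete ? fileLength : Long.MAX_VALUE;
        for (int i = 0; i < blockLengths.length; i++) {
            // Never cover more of a block than the file length still allows.
            covered[i] = Math.min(blockLengths[i], remaining);
            remaining -= covered[i];
        }
        return covered;
    }
}
```

For example, with blocks of 128 and 100 bytes and a snapshot file length of 
168, only the first 40 bytes of the last block are covered. The 60 bytes 
appended after the snapshot are left out.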

HDFS-5343 took a similar approach.


> {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
> non-snapshotted content
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7682
>                 URL: https://issues.apache.org/jira/browse/HDFS-7682
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch
>
>
> DistributedFileSystem#getFileChecksum of a snapshotted file includes 
> non-snapshotted content.
> This happens because DistributedFileSystem#getFileChecksum 
> simply calculates the checksum of all of the CRCs from the blocks in the 
> file. But in the case of a snapshotted file, we don't want to include data 
> in the checksum that was appended to the last block in the file after the 
> snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
