[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice

Sean Mackrory (JIRA) Fri, 30 Sep 2016 10:49:16 -0700

     [ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Mackrory updated HDFS-10797:
---------------------------------
    Attachment: HDFS-10797.007.patch

Thanks, [~xiaochen]. Removed the count variable and fixed the Javadoc. I can
actually remove a lot of code from INodeDirectory, since it no longer does any
counting in that loop. It just identifies INodes that might have been deleted
and need to be examined at the end, so we don't need to swap counts objects
around there. That examination now happens in
ContentSummaryComputationContext#tallyDeletedSnapshottedINodes. There I was
adding the counts from deletedNodes to counts, and adding the snapshotCounts
from those nodes to snapshotCounts. As you pointed out, I should have added
the counts to both, as INodeDirectory did before. TestDFSShell and
TestRenamesWithSnapshots now both pass.
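A rough Java sketch of the two-phase approach described above, under a byte-only model. During the walk, candidates from snapshot diffs are only recorded. A single tally at the end then skips any inode that was also reached through the current tree and adds the rest to both counts and snapshotCounts. DeferredTallySketch, Counts, countLiveFile, reportDeletedSnapshottedNode and tallyDeletedSnapshottedNodes are hypothetical stand-ins, not the real INodeDirectory or ContentSummaryComputationContext code.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of a deferred tally. Only bytes are tracked;
 * none of these names are real HDFS APIs.
 */
public class DeferredTallySketch {

  /** Stand-in for a counts object (bytes only). */
  static final class Counts {
    long bytes;
    void add(long b) { bytes += b; }
  }

  final Counts counts = new Counts();
  final Counts snapshotCounts = new Counts();
  // Inode ids already counted while walking the current tree.
  private final Set<Long> includedIds = new HashSet<>();
  // Possibly-deleted inodes reported by snapshot diffs, keyed by inode id.
  private final Map<Long, Long> deletedCandidates = new LinkedHashMap<>();

  /** Called for each file reached through the current tree. */
  void countLiveFile(long inodeId, long bytes) {
    if (includedIds.add(inodeId)) {
      counts.add(bytes);
    }
  }

  /** Called for a snapshot-diff deletion: nothing is counted yet. */
  void reportDeletedSnapshottedNode(long inodeId, long bytes) {
    deletedCandidates.putIfAbsent(inodeId, bytes);
  }

  /**
   * Called once after the walk. Candidates already seen in the current tree
   * (renamed or moved) are skipped; the rest go into BOTH counts objects.
   */
  void tallyDeletedSnapshottedNodes() {
    for (Map.Entry<Long, Long> e : deletedCandidates.entrySet()) {
      if (includedIds.add(e.getKey())) {
        counts.add(e.getValue());
        snapshotCounts.add(e.getValue());
      }
    }
  }

  public static void main(String[] args) {
    DeferredTallySketch ctx = new DeferredTallySketch();
    ctx.reportDeletedSnapshottedNode(1, 100); // old path of a renamed file
    ctx.reportDeletedSnapshottedNode(2, 50);  // file really deleted after the snapshot
    ctx.countLiveFile(1, 100);                // new path of the renamed file
    ctx.countLiveFile(3, 200);
    ctx.tallyDeletedSnapshottedNodes();
    System.out.println(ctx.counts.bytes + " " + ctx.snapshotCounts.bytes); // 350 50
  }
}
```

In this toy run the renamed file's old path is reported before its new path is visited. That ordering is one reason the check is left until the walk is complete rather than done inside the loop.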

> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10797
>                 URL: https://issues.apache.org/jira/browse/HDFS-10797
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>         Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, 
> HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, 
> HDFS-10797.006.patch, HDFS-10797.007.patch
>
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how
> much disk usage a snapshot accounts for by tallying up the files in the
> snapshot that have since been deleted (that way it won't overlap with regular
> files, whose disk usage is computed separately). However, that is determined
> from a diff that represents moved (to Trash or otherwise) or renamed files as
> a deletion plus a creation, and the created file's blocks may overlap with
> those already counted for the current tree. Only the deletion is taken into
> consideration, so those blocks get counted twice in the disk usage tally.
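A minimal Java sketch of the double count, under simplifying assumptions. Each file is identified by a stable inode id, only byte totals are tracked, and FileRef, naiveUsage and distinctUsage are hypothetical names, not HDFS APIs. A file renamed after the snapshot appears in the current tree under its new path and in the diff's deletion list under its old path, so summing both lists counts its bytes twice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model illustrating the double count; not HDFS code. */
public class RenameDoubleCountSketch {

  /** A file identified by a stable inode id, with the bytes its blocks use. */
  static final class FileRef {
    final long inodeId;
    final long bytes;

    FileRef(long inodeId, long bytes) {
      this.inodeId = inodeId;
      this.bytes = bytes;
    }
  }

  /** Naive tally: current-tree usage plus every file the diff reports as deleted. */
  static long naiveUsage(List<FileRef> live, List<FileRef> diffDeletions) {
    long total = 0;
    for (FileRef f : live) total += f.bytes;
    for (FileRef f : diffDeletions) total += f.bytes;
    return total;
  }

  /** Actual usage: each inode's blocks counted once, keyed by inode id. */
  static long distinctUsage(List<FileRef> live, List<FileRef> diffDeletions) {
    Map<Long, Long> bytesById = new HashMap<>();
    for (FileRef f : live) bytesById.put(f.inodeId, f.bytes);
    for (FileRef f : diffDeletions) bytesById.putIfAbsent(f.inodeId, f.bytes);
    long total = 0;
    for (long b : bytesById.values()) total += b;
    return total;
  }

  public static void main(String[] args) {
    // Inode 1 was renamed after the snapshot: the diff lists its old path as
    // deleted, while its new path is still in the current tree.
    List<FileRef> live = List.of(new FileRef(1, 100), new FileRef(3, 200));
    List<FileRef> deletions = List.of(new FileRef(1, 100), new FileRef(2, 50));
    System.out.println(naiveUsage(live, deletions));    // 450
    System.out.println(distinctUsage(live, deletions)); // 350
  }
}
```

The 100-byte difference between the two totals is exactly the renamed file's size.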



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

