[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Release Note: Disk usage summaries previously incorrectly counted files twice if they had been renamed (including files moved to Trash) since being snapshotted. Summaries now include current data plus snapshotted data that is no longer under the directory either due to deletion or being moved outside of the directory. (was: Disk usage summaries previously incorrectly counted files twice if they had been renamed since being snapshotted. Summaries now include current data plus snapshotted data that is no longer under in the directory either due to deletion or being moved outside of the directory.) > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.8.0 >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Release Note: Disk usage summaries previously incorrectly counted files twice if they had been renamed since being snapshotted. Summaries now include current data plus snapshotted data that is no longer under in the directory either due to deletion or being moved outside of the directory. Thanks [~xiaochen]. I added a release note... > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.8.0 >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-10797: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 2.8.0 Status: Resolved (was: Patch Available) Committed this to trunk, branch-2 and branch-2.8. Thanks Sean for the great work here and Jing for the helpful reviews! I think this is just a bug fix, and the change in {{du}} output is considered incorrect previously. So not marking this incompatible. [~mackrorysd], do you mind adding a short release note to the jira? Thanks > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.8.0 >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Fix For: 2.8.0, 3.0.0-alpha2 > > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-10797: - Component/s: snapshots > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.8.0 >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-10797: - Affects Version/s: 2.8.0 > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.8.0 >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.010.patch > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.010.patch > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch, HDFS-10797.010.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.009.patch Fixing style issues, and always using getCounts() to retrieve counts for thread safety (which I noticed actually is important here, due to the yield() function). > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch, > HDFS-10797.009.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.008.patch Attaching a patch that address the findbugs warnings. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.007.patch Thanks, [~xiaochen]. Removed the count variable and fixed the Javadoc. I can actually remove a lot of code from INodeDirectory since it's no longer doing any actual counting in that loop - it's just identifying INodes that might be deleted and need to be looked at at the end, so we don't need to be swapping around counts objects there. Where that's now being done is in ContentSummaryComputationContext#tallyDeletedSnapshottedINodes, and there I was adding the counts from deletedNodes and add it to counts, and taking snapshotCounts from those nodes and adding them to snapshotCounts. As you point out, I should have added counts to both, like INodeDirectory was doing before. TestDFSShell and TestRenamesWithSnapshots now both working... > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch, HDFS-10797.007.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.006.patch Great! During manual testing, I found a bug where if you snapshot a file, append to it and then rename it, space consumed suddenly decreases: the reference to the "deleted" node in the snapshot diff was getting counted first, and the current state of the file was then ignored. I changed the approach so summarization of any deleted nodes is deferred until the end - then we can choose whether or not to include them only once we have all the context we need. Also added a test case that would've caught the bug I found and cleaned up the code a bunch. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Status: Patch Available (was: Open) > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch, > HDFS-10797.006.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.005.patch Now attaching a patch that should give what I think we agree are the completely correct semantics: - du -s in a given directory will yield the space consumed by all snapshots and current files in the relevant directory, or below, regardless of subsequent renames or deletions. - du -s in a parent directory may yield a smaller value than the sum of all du -s results in child directories, if files have been snapshotted in one child directory and then move to another. There is overlap in the space consumed by each directory in this case. - In no cases is any INode counted twice in the context of a single du -s computation. In my opinion, these semantics are the most correct and least surprising to users, and they are consistent. If I understand your first reply correctly, I *think* you would agree with this, [~jingzhao]? And the implementation resolves the inconsistency we were trying to avoid. Let me know if I've misunderstood anything... Attaching the patch, but going to do a little more manual testing, tinkering and thinking about this before "submitting" again since this is a very different approach from my previous patches. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.004.patch Attaching the patch I meant to attach yesterday: this adds test cases for various inter-directory rename cases. In this patch, some of the test cases (specifically, the ones that check the space used by multiple directories) are expecting incorrect output. Also moved my logic outside of the loop so it applied to a large scope, and indeed - more of the test cases were now behaving correctly, but still not 100% correctly. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch, HDFS-10797.004.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Status: Open (was: Patch Available) > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.003.patch Thanks for the review, [~xiaochen]. I do think that test is worth adding. Attaching a patch with that (it passes) and a timeout. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch, > HDFS-10797.003.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.002.patch Adding a doc comment to the test and fixing the checkstyle issue. I don't believe the 2 test failures are related to my change. > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Assignee: Sean Mackrory Status: Patch Available (was: Open) > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Attachments: HDFS-10797.001.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10797) Disk usage summary of snapshots causes renamed blocks to get counted twice
[ https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Mackrory updated HDFS-10797: - Attachment: HDFS-10797.001.patch Attaching a patch that tries to identify files where the underlying inodes appear in both the DELETED and CREATED portion of the snapshot's diff, and does not count them toward the snapshot's space like it would a simply deleted file. Also added a test case that runs through scenarios like a chain of multiple renames, renaming a file and replacing the original file, and appends (even though they turned out to not have anything to do with the actual bug). > Disk usage summary of snapshots causes renamed blocks to get counted twice > -- > > Key: HDFS-10797 > URL: https://issues.apache.org/jira/browse/HDFS-10797 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Sean Mackrory > Attachments: HDFS-10797.001.patch > > > DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how > much disk usage is used by a snapshot by tallying up the files in the > snapshot that have since been deleted (that way it won't overlap with regular > files whose disk usage is computed separately). However that is determined > from a diff that shows moved (to Trash or otherwise) or renamed files as a > deletion and a creation operation that may overlap with the list of blocks. > Only the deletion operation is taken into consideration, and this causes > those blocks to get represented twice in the disk usage tallying. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org