[jira] [Updated] (HADOOP-11666) Revert the format change of du output introduced by HADOOP-6857
[ https://issues.apache.org/jira/browse/HADOOP-11666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong updated HADOOP-11666:
    Status: Patch Available (was: Open)

Revert the format change of du output introduced by HADOOP-6857
    Key: HADOOP-11666
    URL: https://issues.apache.org/jira/browse/HADOOP-11666
    Project: Hadoop Common
    Issue Type: Bug
    Affects Versions: 2.7.0
    Reporter: Akira AJISAKA
    Assignee: Byron Wong
    Attachments: HADOOP-6857-revert.patch

HADOOP-6857 made two changes to `du` at the same time:
* Fixed a bug when querying a snapshottable directory
* Changed the output format (an incompatible change)
This issue reverts the latter on branch-2 to keep compatibility. The bug fix stays.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong updated HADOOP-6857:
    Attachment: HADOOP-6857-revert.patch

Attached HADOOP-6857-revert.patch. It reverts only the changes that affected command-line output.

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Fix For: 2.7.0
    Attachments: HADOOP-6857-revert.patch, HADOOP-6857.patch, HADOOP-6857.patch, HADOOP-6857.patch, revert-HADOOP-6857-from-branch-2.patch, show-space-consumed.txt

Currently FsShell reports HDFS usage with the hadoop fs -dus path command. Since the replication level is set per file, it would be nice to add raw disk usage including the replication factor (maybe hadoop fs -dus -raw path?). This would allow resource usage to be assessed more accurately.

-- Alex K
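To make the request concrete, here is a minimal sketch of the difference between logical usage and raw usage when replication is set per file. It is only an illustration, not FsShell code: the FileEntry type, the usage function, and the paths and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one file's metadata (not a Hadoop class)."""
    path: str
    length: int        # logical size in bytes
    replication: int   # per-file replication factor


def usage(files):
    """Return (logical_bytes, raw_bytes) summed over the given files.

    raw_bytes multiplies each file's length by its own replication factor,
    because replication is a per-file setting.
    """
    logical = sum(f.length for f in files)
    raw = sum(f.length * f.replication for f in files)
    return logical, raw


# Made-up example: two files with different replication factors.
files = [
    FileEntry("/data/x", 100, 3),
    FileEntry("/data/y", 50, 2),
]
print(usage(files))
```

A single multiplier on the directory total would be wrong here, since the two files have different replication factors; the sum has to be taken file by file.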
[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345898#comment-14345898 ]

Byron Wong commented on HADOOP-6857:

Yes, I agree. The bug fix should remain, but the output changes can be reverted.

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Fix For: 2.7.0
    Attachments: HADOOP-6857.patch, HADOOP-6857.patch, HADOOP-6857.patch, revert-HADOOP-6857-from-branch-2.patch, show-space-consumed.txt
[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong updated HADOOP-6857:
    Attachment: HADOOP-6857.patch

Added a new patch to fix TestHDFSCLI. I don't think the failure in TestWebHDFSForHA is related to my changes. It passed locally for me.

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Attachments: HADOOP-6857.patch, HADOOP-6857.patch, HADOOP-6857.patch, show-space-consumed.txt
[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong updated HADOOP-6857:
    Target Version/s: 2.7.0
    Hadoop Flags: (was: Incompatible change)
    Status: Patch Available (was: Reopened)

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Attachments: HADOOP-6857.patch, HADOOP-6857.patch, show-space-consumed.txt
[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong updated HADOOP-6857:
    Attachment: HADOOP-6857.patch

Attached a new patch that addresses Scenario 2. Snapshot$Root should computeContentSummary based on its snapshotId rather than on the current state. Added a unit test to verify this case.

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Attachments: HADOOP-6857.patch, HADOOP-6857.patch, show-space-consumed.txt
[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172771#comment-14172771 ]

Byron Wong commented on HADOOP-6857:

*Scenario 2*: we still have the snapshottable directory /test with the same file a. We create a fresh snapshot ss1, then run {{hadoop fs -rm -skipTrash /test/a}}.

{{hadoop fs -du /test}} gives an empty output, as expected.
{{hadoop fs -du -s /test}} outputs:
{code}
41 123 /test
{code}
which makes sense, given that we know the snapshot exists.

However, when we run {{hadoop fs -du -s /test/.snapshot/ss1}}, we get:
{code}
0 0 /test/.snapshot/ss1
{code}
This is inconsistent with the numbers we get when we run {{hadoop fs -du /test/.snapshot/ss1}}:
{code}
41 123 /test/.snapshot/ss1/a
{code}

Upon further investigation, we see that running {{hadoop fs -du -s /test/.snapshot/anySnapshot}} reports on the current state of the real directory. This means that {{hadoop fs -du -s /test/.snapshot/anySnapshot}} is equivalent to running {{hadoop fs -du /test/}} and adding the numbers up, which is non-intuitive.

For example, let's add a 2-byte file /test/1 with replication factor 3 (/test/a is still deleted). Now {{hadoop fs -du -s /test/.snapshot/ss1}} gives us:
{code}
2 6 /test/.snapshot/ss1
{code}
whereas the result of {{hadoop fs -du /test/.snapshot/ss1}} remains the same:
{code}
41 123 /test/.snapshot/ss1/a
{code}

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Attachments: HADOOP-6857.patch, show-space-consumed.txt
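The numbers in Scenario 2 can be reproduced with a small model. This is a sketch for illustration only, not HDFS code: du, du_s, and the two dictionaries are hypothetical names, and each dictionary maps a path to a (length, replication) pair taken from the outputs quoted in the comment.

```python
def du(files):
    """Per-file rows in the style of `hadoop fs -du`.

    Each row is (length, length * replication, path).
    """
    return [(length, length * rep, path)
            for path, (length, rep) in sorted(files.items())]


def du_s(files):
    """Summary in the style of `hadoop fs -du -s`: the column sums of du()."""
    rows = du(files)
    return (sum(r[0] for r in rows), sum(r[1] for r in rows))


# State captured by snapshot ss1: the 41-byte file a with replication 3.
snapshot_ss1 = {"/test/a": (41, 3)}

# Current state of /test after `hadoop fs -rm -skipTrash /test/a`.
current = {}

# Reported behavior: the summary of the snapshot path is computed from the
# current state of the directory, not from the snapshot.
observed_before = du_s(current)

# Consistent behavior: the summary is computed from the snapshot's own
# contents, so it matches the per-file listing of the snapshot.
expected = du_s(snapshot_ss1)

# Add the 2-byte, replication-3 file /test/1 to the live directory.
current["/test/1"] = (2, 3)
observed_after = du_s(current)

print(observed_before, expected, observed_after)
```

In this model the expected summary for /test/.snapshot/ss1 stays fixed regardless of what happens in the live directory, which is the property the reported behavior violates.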
[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171831#comment-14171831 ]

Byron Wong commented on HADOOP-6857:

In the case when a directory /D and snapshot S are in the exact same state (e.g. a fresh snapshot has just been taken), everything works fine: the sum of the disk-consumed numbers reported by -du /D equals the disk-consumed number reported by -du -s /D.

When /D and S start to diverge (files renamed, deleted, etc.), the disk-consumed calculation takes the lastFileSize within the snapshots, finds the maximum replication factor for that file within the snapshots, multiplies the two, and adds the product to disk consumed. This inflates the total, so -du -s /D can report more than the sum of the numbers in -du /D.

I'd also like to point out that this implementation uses only the replication factor of a file, even when that factor is greater than the number of datanodes, which further inflates the -du calculation. For example, if we setrep a file to 10 when we only have 3 datanodes, -du will still compute fileLength * 10 and report that number.

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Attachments: HADOOP-6857.patch, show-space-consumed.txt
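The arithmetic described in this comment can be sketched as follows. This is a simplified model, not the HDFS implementation: all three function names are hypothetical, the 41-byte length is borrowed from the example file elsewhere in this thread, and max_physical_bytes assumes that a datanode holds at most one replica of a given block.

```python
def snapshot_charge(last_file_size, snapshot_replications):
    """Disk consumed charged for a file kept alive by snapshots.

    Models the description above: the last file size within the snapshots
    times the maximum replication factor seen for that file in any snapshot.
    """
    return last_file_size * max(snapshot_replications)


def reported_disk_consumed(file_length, replication):
    """What -du reports per the comment: length times the configured factor."""
    return file_length * replication


def max_physical_bytes(file_length, replication, datanodes):
    """Upper bound on bytes actually stored.

    Assumption: no more replicas can exist than there are datanodes.
    """
    return file_length * min(replication, datanodes)


# The setrep example from the comment: replication 10 on a 3-datanode cluster.
length = 41
reported = reported_disk_consumed(length, 10)
upper_bound = max_physical_bytes(length, 10, 3)
print(reported, upper_bound)

# A file whose replication was 3 in one snapshot and 10 in another is charged
# at the larger factor.
print(snapshot_charge(length, [3, 10]))
```

Under these assumptions the reported figure (410 bytes) is more than three times what the cluster could physically hold for that file (123 bytes), which is the inflation the comment points out.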
[jira] [Assigned] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong reassigned HADOOP-6857:
    Assignee: Byron Wong

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Assignee: Byron Wong
    Attachments: HADOOP-6857.patch, show-space-consumed.txt
[jira] [Updated] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Byron Wong updated HADOOP-6857:
    Attachment: HADOOP-6857.patch

Updated [~atm]'s patch so that it applies on top of trunk. I'd like to point out that creating a snapshot of a directory will increase the disk usage number reported by -du (-s). Is this what we want?

FsShell should report raw disk usage including replication factor
    Key: HADOOP-6857
    URL: https://issues.apache.org/jira/browse/HADOOP-6857
    Project: Hadoop Common
    Issue Type: Improvement
    Components: fs
    Reporter: Alex Kozlov
    Attachments: HADOOP-6857.patch, show-space-consumed.txt