[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171831#comment-14171831 ]
Byron Wong commented on HADOOP-6857:
------------------------------------

When a directory /D and a snapshot S are in exactly the same state (e.g., a fresh snapshot has just been taken), everything works fine: the sum of the disk-consumed numbers reported by -du /D equals the disk-consumed number reported by -du -s /D.

Once /D and S start deviating (files getting renamed, deleted, etc.), the disk-consumed calculation takes the last file size recorded within the snapshots, finds the maximum replication factor for that file within the snapshots, multiplies the two together, and increments disk consumed by that product. This inflates the total disk-consumed calculation, so -du -s /D > the sum of the numbers in -du /D.

I'd also like to point out that this implementation only takes the replication factor of a file into account, even when that replication factor is greater than the number of datanodes, which further inflates the -du calculation. For example, if we setrep 10 a file when we only have 3 datanodes, -du will still multiply fileLength * 10 and report that number.

> FsShell should report raw disk usage including replication factor
> -----------------------------------------------------------------
>
>                 Key: HADOOP-6857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6857
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Alex Kozlov
>            Assignee: Byron Wong
>         Attachments: HADOOP-6857.patch, show-space-consumed.txt
>
> Currently FsShell reports HDFS usage with the "hadoop fs -dus <path>" command. Since the replication level is set per file, it would be nice to add raw disk usage including the replication factor (maybe "hadoop fs -dus -raw <path>"?). This would allow assessing resource usage more accurately.

-- Alex K

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
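To make the inflation concrete, here is a minimal sketch of the two effects described above. All names (snapshotDiskConsumed, actualDiskConsumed, etc.) are illustrative, not the actual HDFS internals: the reported value multiplies the last file size by the maximum replication factor seen across snapshots, while the real on-disk usage can never exceed fileSize * numberOfDatanodes.

```java
public class DuInflationSketch {

    // What the snapshot-aware calculation effectively does: take the last
    // file size recorded in the snapshots and the *maximum* replication
    // factor for that file across snapshots, and multiply the two.
    static long snapshotDiskConsumed(long lastFileSize, int[] replicationInSnapshots) {
        int maxRep = 0;
        for (int r : replicationInSnapshots) {
            maxRep = Math.max(maxRep, r);
        }
        return lastFileSize * maxRep;
    }

    // The raw space a file can actually occupy is bounded by the cluster
    // size: a block cannot have more replicas than there are datanodes.
    static long actualDiskConsumed(long fileSize, int replication, int numDataNodes) {
        return fileSize * Math.min(replication, numDataNodes);
    }

    public static void main(String[] args) {
        long fileSize = 100L * 1024 * 1024;   // a 100 MB file
        int[] reps = {3, 10};                 // replication changed between snapshots

        // Reported: uses max replication (10), even though setrep 10 on a
        // 3-datanode cluster can place at most 3 replicas.
        long reported = snapshotDiskConsumed(fileSize, reps);
        long actual = actualDiskConsumed(fileSize, 10, 3);

        System.out.println("reported=" + reported + " actual=" + actual);
    }
}
```

Under these assumptions the reported number is fileSize * 10 while only fileSize * 3 bytes can physically exist, which matches the setrep example in the comment.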