[ https://issues.apache.org/jira/browse/HDFS-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466139#comment-15466139 ]
Akira Ajisaka commented on HDFS-10778:
--------------------------------------

Thanks [~linyiqun] for updating the patch. I tried your patch and got the following output:
{noformat}
[centos@ip-172-31-21-203 conf]$ hdfs oiv -p FileDistribution -format -step 30 -maxSize 300 -i /hadoop/dfs/name/current/fsimage_0000000000000000307
Processed 0 inodes.
Size Range        NumFiles
(0 B, 30 B]       2
(270 B, 300 B]    32
totalFiles = 34
totalDirectories = 13
totalBlocks = 34
totalSpace = 288198
maxFileSize = 160321
{noformat}
Actually, maxFileSize is 160321, but the output places that file in {{(270 B, 300 B\]}}. Would you fix it to output (270 B, maxFileSize]?

If an fsimage includes empty files, the output is as follows:
{noformat}
Size Range        NumFiles
(0 B, 0 B]        1
(0 B, 30 B]       2
{noformat}
I'm thinking \[0 B, 0 B\] is better than (0 B, 0 B\].

> Optimize the output result of FileDistribution processor in hdfs oiv command
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-10778
>                 URL: https://issues.apache.org/jira/browse/HDFS-10778
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.7.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>         Attachments: HDFS-10778.001.patch, HDFS-10778.002.patch,
> HDFS-10778.003.patch, HDFS-10778.004.patch, HDFS-10778.005.patch
>
>
> Currently, the output of the {{FileDistribution}} processor in the hdfs oiv
> command is not easy for users to understand. For example, this is an
> original output:
> {code}
> Size	NumFiles
> 0	22556
> 1048576	404971
> 2097152	29259
> 3145728	16937
> 4194304	9197
> 5242880	6889
> 6291456	4930
> 7340032	4070
> 8388608	299384
> 9437184	274623
> {code}
> Two aspects make this hard for users to understand.
> First, the size column shows only a raw number of bytes, which is not
> readable. It would be better to show it with a binary prefix.
> Second, the size column would be better shown as a size range.
> It will let users know that the value in the {{NumFiles}} column was
> counted from size A to size B.
> The expected output result should be this:
> {code}
> Size Range	NumFiles
> (0 B, 0 B]	1666332
> (0 B, 1 M]	778473
> (1 M, 2 M]	35125
> (2 M, 3 M]	13978
> (3 M, 4 M]	10158
> (4 M, 5 M]	6970
> {code}
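
The range labeling requested in the comment can be illustrated with a minimal Python sketch. This is not the code from any of the attached patches (the real processor is part of Hadoop and written in Java); the function names `format_size` and `distribution` are hypothetical, and the sketch assumes that empty files get their own closed bucket `[0 B, 0 B]` and that the last bucket is widened to the largest file size when files exceed `-maxSize`. For simplicity, `format_size` only shortens exact multiples of 1024.

```python
from collections import Counter

def format_size(n):
    """Render a byte count with a binary prefix, e.g. 1048576 -> '1 M'.
    Only exact multiples of 1024 are shortened (a simplification)."""
    units = ["B", "K", "M", "G", "T", "P"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"

def distribution(sizes, step, max_size):
    """Return (label, count) rows for half-open size ranges (low, high].
    Hypothetical helper: bucket 0 holds empty files; files larger than
    max_size are counted in the last bucket, whose upper bound is then
    reported as the largest file size."""
    last = -(-max_size // step)  # ceiling division: index of the last bucket
    counts = Counter(min(-(-s // step), last) for s in sizes)
    max_file = max(sizes, default=0)
    rows = []
    for idx in sorted(counts):
        if idx == 0:
            label = "[0 B, 0 B]"  # closed range for empty files
        else:
            low, high = (idx - 1) * step, idx * step
            if idx == last and max_file > high:
                high = max_file  # widen the overflow bucket
            label = f"({format_size(low)}, {format_size(high)}]"
        rows.append((label, counts[idx]))
    return rows

if __name__ == "__main__":
    for label, count in distribution([0, 10, 20, 280, 160321], 30, 300):
        print(f"{label}\t{count}")
    # [0 B, 0 B]         1
    # (0 B, 30 B]        2
    # (270 B, 160321 B]  2
```

With `step=30` and `max_size=300`, the 160321-byte file is still counted in the last bucket, but the label now reflects the real upper bound instead of `300 B`.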