Hi! I'm having trouble reconciling the numbers reported by 'hadoop dfs -dus' with the numbers reported by the namenode web interface.
I have a 4-node cluster, with 4TB of disk on each node.

    hadoop dfs -dus /
    hdfs://hdp01-01:9000/ 1691626356288

Numbers on the namenode web interface:

    Capacity      : 14.13 TB
    DFS Remaining : 1.41 TB
    DFS Used      : 11.88 TB

My default replication level is 3, but the bulk of my files have their replication level set to 2. So looking at the 'dfs -dus' number, in the worst case, I think I should be using 1691626356288*3=5074879068864 bytes, i.e. approx 5TB, not the 11.88 TB that the web interface reports.

fsck seems happy:

    Status: HEALTHY
     Total size: 1691626405661 B
     Total dirs: 11780
     Total files: 82137 (Files currently being written: 1)
     Total blocks (validated): 84054 (avg. block size 20125471 B)
     Minimally replicated blocks: 84054 (100.0 %)
     Over-replicated blocks: 6 (0.007138268 %)
     Under-replicated blocks: 1 (0.0011897114 %)
     Mis-replicated blocks: 0 (0.0 %)
     Default replication factor: 3
     Average block replication: 2.731268
     Corrupt blocks: 0
     Missing replicas: 6 (0.0026135363 %)
     Number of data-nodes: 4
     Number of racks: 1

This is on 0.18.3/Cloudera.

I've also verified that the bulk of the data on my disks is under the hadoop/dfs/data/current directory on each disk.

Clearly I'm misunderstanding something, or there's something weird going on. Hints?

Thanks,
\EF
-- 
Erik Forsberg <forsb...@opera.com>
Developer, Opera Software - http://www.opera.com/
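
For anyone who wants to re-check the arithmetic in the message above, here is a minimal Python sketch. It only restates the figures quoted in the post. The helper names (worst_case_bytes, to_tib) are made up for illustration, and it assumes that the "TB" shown by the web interface means 1024^4 bytes, which the post does not confirm.

```python
# Sanity check of the figures quoted in the post.
# ASSUMPTION: the web interface's "TB" is 1024**4 bytes (not stated in the post).
TIB = 1024 ** 4


def worst_case_bytes(logical_bytes, replication):
    """Raw bytes used if every file had the given replication factor."""
    return logical_bytes * replication


def to_tib(num_bytes):
    """Convert a byte count to units of 1024**4 bytes."""
    return num_bytes / TIB


dus_bytes = 1691626356288        # from 'hadoop dfs -dus /'
fsck_total_bytes = 1691626405661  # "Total size" from fsck
avg_replication = 2.731268       # "Average block replication" from fsck
default_replication = 3
reported_used = 11.88            # "DFS Used" from the web interface

# Upper bound: everything replicated 3 times.
worst = worst_case_bytes(dus_bytes, default_replication)

# Estimate based on the average replication that fsck reports.
estimated = fsck_total_bytes * avg_replication

# Difference between what the web interface reports and the upper bound.
gap = reported_used - to_tib(worst)

print("worst case bytes:", worst)
print("worst case      : %.2f" % to_tib(worst))
print("fsck estimate   : %.2f" % to_tib(estimated))
print("unexplained gap : %.2f" % gap)
```

Under that unit assumption, the gap between the reported "DFS Used" and the worst-case estimate stays at several TB, so the choice of units alone does not account for the discrepancy described in the post.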