Hi!

I'm having trouble reconciling the numbers reported by 'hadoop dfs
-dus' with the numbers reported by the namenode web interface.

I have a 4-node cluster with 4TB of disk on each node.

hadoop dfs -dus /
hdfs://hdp01-01:9000/   1691626356288

Numbers on the namenode web interface:

Capacity        :       14.13 TB
DFS Remaining   :       1.41 TB
DFS Used        :       11.88 TB
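
For cross-checking, the same totals plus a per-datanode breakdown
should also be available from the command line (unless I'm
misremembering the 0.18 tooling):

hadoop dfsadmin -report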

My default replication level is 3, but the bulk of my files have their
replication level set to 2. So, looking at the 'dfs -dus' number, even
in the worst case (everything replicated 3 times) I should be using
1691626356288*3 = 5074879068864 bytes, i.e. approx 5 TB, not the
11.88 TB the web interface reports.
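
In fact, using the average block replication that fsck reports below
(2.731268), the expected raw usage comes out even lower. A quick
sanity check with bc:

echo "1691626356288 * 2.731268" | bc
4620284934886.013184

That's roughly 4.6 TB of raw disk, so the gap to 11.88 TB is even
larger than my worst-case estimate suggests.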

fsck seems happy:

Status: HEALTHY
 Total size:    1691626405661 B
 Total dirs:    11780
 Total files:   82137 (Files currently being written: 1)
 Total blocks (validated):      84054 (avg. block size 20125471 B)
 Minimally replicated blocks:   84054 (100.0 %)
 Over-replicated blocks:        6 (0.007138268 %)
 Under-replicated blocks:       1 (0.0011897114 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.731268
 Corrupt blocks:                0
 Missing replicas:              6 (0.0026135363 %)
 Number of data-nodes:          4
 Number of racks:               1

This is on 0.18.3/Cloudera. 

I've also verified that the bulk of the data on my disks is under the
hadoop/dfs/data/current directory on each disk.
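
To illustrate the kind of check I mean, something along these lines on
each node (/data1 and /data2 are just stand-ins for the actual mount
points):

du -sh /data1/hadoop/dfs/data/current /data2/hadoop/dfs/data/current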

Clearly I'm misunderstanding something, or there's something weird
going on. Hints?

Thanks,
\EF
-- 
Erik Forsberg <forsb...@opera.com>
Developer, Opera Software - http://www.opera.com/
