On Thu, Jan 14, 2010 at 12:43 AM, Erik Forsberg <forsb...@opera.com> wrote:
> Hi!
>
> I'm having trouble reconciling the numbers reported by 'hadoop dfs
> -dus' with the numbers reported by the namenode web interface.
>
> I have a 4-node cluster, with 4TB of disk on each node.
>
> hadoop dfs -dus /
> hdfs://hdp01-01:9000/ 1691626356288
>
> Numbers on namenode web interface:
>
> Capacity      : 14.13 TB
> DFS Remaining : 1.41 TB
> DFS Used      : 11.88 TB
>
> My default replication level is 3, but the bulk of my files have their
> replication level set to two. So looking at the 'dfs -dus' number, in
> the worst case, I think I should be using 1691626356288*3=5074879068864
> bytes, i.e. approx 5TB, not 11.88 as the web interface reports.
>
> fsck seems happy:
>
> Status: HEALTHY
>  Total size: 1691626405661 B
>  Total dirs: 11780
>  Total files: 82137 (Files currently being written: 1)
>  Total blocks (validated): 84054 (avg. block size 20125471 B)
>  Minimally replicated blocks: 84054 (100.0 %)
>  Over-replicated blocks: 6 (0.007138268 %)
>  Under-replicated blocks: 1 (0.0011897114 %)
>  Mis-replicated blocks: 0 (0.0 %)
>  Default replication factor: 3
>  Average block replication: 2.731268
>  Corrupt blocks: 0
>  Missing replicas: 6 (0.0026135363 %)
>  Number of data-nodes: 4
>  Number of racks: 1
>
> This is on 0.18.3/Cloudera.
>
> I've also verified that the bulk of the data on my disks is under the
> hadoop/dfs/data/current directory on each disk.
>
> Clearly I'm misunderstanding something, or there's something weird
> going on. Hints?

Hey Erik,

Are there a lot of files in the tmp directories in dfs.data.dir on each
data node? What does du (on the host) report for these directories?
This might be HDFS-821. The dfsadmin -report output would be useful as
well.

Thanks,
Eli
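
As a rough sanity check on the figures quoted above, the expected raw usage can be estimated from the fsck output (total size times average block replication) and compared with the "DFS Used" value. The following is a minimal Python sketch, not an authoritative calculation: the variable names are illustrative, and it assumes the "TB" shown by the web interface means 2^40 bytes, which may not hold for every version.

```python
# Figures copied from the quoted 'dfs -dus', fsck and web interface output.
DUS_BYTES = 1691626356288          # hadoop dfs -dus /
FSCK_TOTAL_SIZE_BYTES = 1691626405661  # fsck "Total size"
AVG_BLOCK_REPLICATION = 2.731268   # fsck "Average block replication"
DEFAULT_REPLICATION = 3            # fsck "Default replication factor"
DFS_USED_REPORTED_TB = 11.88       # web interface "DFS Used"

# Assumption: the web interface reports binary terabytes (2**40 bytes).
BYTES_PER_TB = 1024 ** 4

# Worst case from the original message: every file replicated 3 times.
worst_case_bytes = DUS_BYTES * DEFAULT_REPLICATION

# Estimate based on the average replication that fsck actually reports.
expected_bytes = FSCK_TOTAL_SIZE_BYTES * AVG_BLOCK_REPLICATION

worst_case_tb = worst_case_bytes / BYTES_PER_TB
expected_tb = expected_bytes / BYTES_PER_TB

# Space reported as used that replication alone does not account for.
unexplained_tb = DFS_USED_REPORTED_TB - expected_tb

print(f"worst case (x3):        {worst_case_tb:.2f} TB")
print(f"expected (x avg repl.): {expected_tb:.2f} TB")
print(f"reported DFS Used:      {DFS_USED_REPORTED_TB:.2f} TB")
print(f"unexplained:            {unexplained_tb:.2f} TB")
```

Under these assumptions, replication accounts for well under half of the reported usage, which is why checking what else occupies dfs.data.dir on each data node (for example with du on the host) is a reasonable next step.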