On Thu, Jan 14, 2010 at 12:43 AM, Erik Forsberg <forsb...@opera.com> wrote:
> Hi!
>
> I'm having trouble figuring out the numbers reported by 'hadoop dfs
> -dus' versus the numbers reported by the namenode web interface.
>
> I have a 4-node cluster with 4 TB of disk on each node.
>
> hadoop dfs -dus /
> hdfs://hdp01-01:9000/   1691626356288
>
> Numbers on namenode web interface:
>
> Capacity        :       14.13 TB
> DFS Remaining   :       1.41 TB
> DFS Used        :       11.88 TB
>
> My default replication level is 3, but the bulk of my files have their
> replication level set to two. So looking at the 'dfs -dus' number, in
> the worst case, I think I should be using 1691626356288*3=5074879068864
> bytes, i.e. approx 5TB, not 11.88 as the web interface reports.
>
> fsck seems happy:
>
> Status: HEALTHY
>  Total size:    1691626405661 B
>  Total dirs:    11780
>  Total files:   82137 (Files currently being written: 1)
>  Total blocks (validated):      84054 (avg. block size 20125471 B)
>  Minimally replicated blocks:   84054 (100.0 %)
>  Over-replicated blocks:        6 (0.007138268 %)
>  Under-replicated blocks:       1 (0.0011897114 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     2.731268
>  Corrupt blocks:                0
>  Missing replicas:              6 (0.0026135363 %)
>  Number of data-nodes:          4
>  Number of racks:               1
>
> This is on 0.18.3/Cloudera.
>
> I've also verified that the bulk of the data on my disks is under the
> hadoop/dfs/data/current directory on each disk.
>
> Clearly I'm misunderstanding something, or there's something weird
> going on. Hints?
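
For reference, the size of the discrepancy can be estimated from the
quoted fsck figures. The sketch below multiplies the fsck "Total size"
by the "Average block replication" and compares the result with the
"DFS Used" value. It assumes the web interface reports binary terabytes
(1024^4 bytes), which is not confirmed anywhere in this thread; the
function and variable names are illustrative only.

```python
# Figures copied from the fsck and web-interface output quoted above.
TOTAL_SIZE_BYTES = 1691626405661   # fsck "Total size"
AVG_REPLICATION = 2.731268         # fsck "Average block replication"
DEFAULT_REPLICATION = 3            # fsck "Default replication factor"
DFS_USED_TB = 11.88                # web interface "DFS Used"

# Assumption: "TB" in the web interface means 1024**4 bytes.
BYTES_PER_TB = 1024 ** 4


def raw_usage_tb(logical_bytes, replication):
    """Raw disk usage in TB for a given logical size and replication factor."""
    return logical_bytes * replication / BYTES_PER_TB


expected_tb = raw_usage_tb(TOTAL_SIZE_BYTES, AVG_REPLICATION)
worst_case_tb = raw_usage_tb(TOTAL_SIZE_BYTES, DEFAULT_REPLICATION)
unexplained_tb = DFS_USED_TB - expected_tb

print(f"expected raw usage:   {expected_tb:.2f} TB")
print(f"worst case (x3):      {worst_case_tb:.2f} TB")
print(f"unexplained DFS Used: {unexplained_tb:.2f} TB")
```

Under that assumption the expected raw usage comes out at a little over
4 TB, so more than 7 TB of the reported "DFS Used" is not accounted for
by the files fsck can see.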

Hey Erik,

Are there a lot of files in the tmp directories in dfs.data.dir on
each data node? What does du (on the host) for these directories
report? This might be HDFS-821.  dfsadmin -report output would be
useful as well.
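
If running du by hand on every disk is tedious, a rough sketch like the
following could compare the tmp and current subdirectories under each
dfs.data.dir. The directory paths in DATA_DIRS are hypothetical
placeholders, not taken from this thread; substitute the values from
your own configuration. It sums apparent file sizes, so the totals may
differ slightly from what du reports.

```python
import os


def dir_size_bytes(path):
    """Sum the apparent sizes of all files under path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat so that symlinks are not followed
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def report(data_dirs, subdirs=("current", "tmp")):
    """Map each <data_dir>/<subdir> path to its total size in bytes."""
    sizes = {}
    for data_dir in data_dirs:
        for sub in subdirs:
            path = os.path.join(data_dir, sub)
            sizes[path] = dir_size_bytes(path)
    return sizes


if __name__ == "__main__":
    # Hypothetical dfs.data.dir entries; replace with your own.
    DATA_DIRS = ["/disk1/hadoop/dfs/data", "/disk2/hadoop/dfs/data"]
    for path, size in report(DATA_DIRS).items():
        print(f"{size / 1024 ** 3:10.2f} GB  {path}")
```

A large tmp total relative to current on any node would be consistent
with the issue described in HDFS-821.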

Thanks,
Eli
