[
https://issues.apache.org/jira/browse/HADOOP-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682916#action_12682916
]
Igor Bolotin commented on HADOOP-5523:
--------------------------------------
DF and DU sizes on the datanode match the figures reported by the dfsadmin
command very closely.
lsof reports about 1000 open files in the DFS data directories on the
problematic datanode, but the total size of those open files is only about 10GB.
Here is something interesting: fsck run before the datanode restart reports a
very significant number of over-replicated blocks (~10% of all blocks are
over-replicated):
Status: HEALTHY
Total size: 1472758591906 B (Total open files size: 29050588133 B)
Total dirs: 58431
Total files: 375703 (Files currently being written: 418)
Total blocks (validated): 387205 (avg. block size 3803562 B) (Total open file blocks (not validated): 595)
Minimally replicated blocks: 387205 (100.0 %)
Over-replicated blocks: 38782 (10.015883 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.1003888
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 7
Number of racks: 1
After the datanode restart, the over-replicated blocks are practically gone:
Status: HEALTHY
Total size: 1310669475298 B (Total open files size: 29535016933 B)
Total dirs: 59431
Total files: 377177 (Files currently being written: 387)
Total blocks (validated): 386661 (avg. block size 3389712 B) (Total open file blocks (not validated): 607)
Minimally replicated blocks: 386661 (100.0 %)
Over-replicated blocks: 272 (0.070345856 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0007036
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 7
Number of racks: 1
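
To put the two reports side by side, here is a rough sketch that turns the
fsck figures above into an estimate of how many surplus replicas the cluster
was holding. The function names are made up for illustration, and the byte
estimate assumes that every file uses the default replication factor and that
surplus replicas have the average block size, neither of which fsck confirms.

```python
# Figures copied from the two fsck summaries above.
BEFORE_RESTART = {
    "total_blocks": 387205,
    "avg_block_size": 3803562,       # bytes
    "over_replicated": 38782,
    "default_replication": 3,
    "avg_replication": 3.1003888,
}
AFTER_RESTART = {
    "total_blocks": 386661,
    "avg_block_size": 3389712,       # bytes
    "over_replicated": 272,
    "default_replication": 3,
    "avg_replication": 3.0007036,
}


def over_replicated_percent(report):
    """Share of validated blocks that fsck flags as over-replicated."""
    return 100.0 * report["over_replicated"] / report["total_blocks"]


def excess_replicas(report):
    """Estimated replicas beyond the default factor, summed over all blocks.

    ASSUMPTION: every file uses the default replication factor, so any
    surplus in the average comes from over-replication.
    """
    surplus = report["avg_replication"] - report["default_replication"]
    return surplus * report["total_blocks"]


def excess_bytes(report):
    """Rough disk cost of the surplus replicas.

    ASSUMPTION: surplus replicas have the average block size.
    """
    return excess_replicas(report) * report["avg_block_size"]


if __name__ == "__main__":
    for label, report in (("before", BEFORE_RESTART), ("after", AFTER_RESTART)):
        print(
            f"{label}: {over_replicated_percent(report):.4f}% over-replicated, "
            f"~{excess_replicas(report):.0f} extra replicas, "
            f"~{excess_bytes(report) / 1e9:.1f} GB"
        )
```

Under those assumptions the pre-restart report works out to roughly 38,871
extra replicas (about 148 GB cluster-wide), while the post-restart report
works out to about 272 extra replicas (under 1 GB). The estimate is
cluster-wide and says nothing about which datanode holds the surplus replicas.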
> Datanode stops cleaning disk space
> ----------------------------------
>
> Key: HADOOP-5523
> URL: https://issues.apache.org/jira/browse/HADOOP-5523
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.19.0
> Environment: Linux
> Reporter: Igor Bolotin
> Priority: Critical
>
> Here is the situation: a DFS cluster running Hadoop version 0.19.0. The
> cluster runs on multiple servers with practically identical hardware.
> Everything works perfectly well except for one thing: from time to time one
> of the datanodes (a different node every time) starts to consume more
> and more disk space. The node keeps going, and if we don't do anything it
> runs out of space completely (ignoring the 20GB reserved space settings).
> Once restarted, it cleans up the disk rapidly and goes back to approximately
> the same utilization as the rest of the datanodes in the cluster.