[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895208#comment-13895208
 ] 

Andrew Ash commented on HDFS-3570:
----------------------------------

Confirmed that the patch does what I expected: non-DFS used space is now 
being taken into account.  Here are my before and after stats from running the 
balancer with the default threshold (10%).  The delta between overloaded and 
underloaded nodes isn't exactly 10% because there has been more activity since 
the balancer finished, but I'm good to go on this.

Before:

IP     Capacity   Used   Non DFS used   Used %   Actual Use %
.33    3.22       0.51   1.39           15.84%   27.87%
.35    3.22       1.87   0.20           58.07%   61.92%
.37    3.22       1.79   0.36           55.59%   62.59%
.39    3.22       1.59   0.33           49.38%   55.02%
.41    3.22       0.18   1.91            5.59%   13.74%

After:

IP     Capacity   Used   Non DFS used   Used %   Actual Use %
.33    3.22       0.75   1.32           23.29%   39.47%
.35    3.22       1.64   0.17           50.93%   53.77%
.37    3.22       1.55   0.33           48.14%   53.63%
.39    3.22       1.47   0.31           45.65%   50.52%
.41    3.22       0.52   1.90           16.15%   39.39%
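
For anyone re-checking the numbers, here is a minimal sketch that recomputes 
the two percentage columns.  The "Actual Use %" formula, DFS used divided by 
(capacity minus non-DFS used), is inferred from the figures in the tables and 
is not taken from the patch itself; the function names are illustrative only.

```python
# Rows copied from the tables: (node, capacity, dfs_used, non_dfs_used).
# Capacity and usage are in the same (unstated) unit as the tables.
BEFORE = [
    (".33", 3.22, 0.51, 1.39),
    (".35", 3.22, 1.87, 0.20),
    (".37", 3.22, 1.79, 0.36),
    (".39", 3.22, 1.59, 0.33),
    (".41", 3.22, 0.18, 1.91),
]
AFTER = [
    (".33", 3.22, 0.75, 1.32),
    (".35", 3.22, 1.64, 0.17),
    (".37", 3.22, 1.55, 0.33),
    (".39", 3.22, 1.47, 0.31),
    (".41", 3.22, 0.52, 1.90),
]


def used_pct(capacity, dfs_used):
    """'Used %' column: DFS used as a share of total capacity."""
    return 100.0 * dfs_used / capacity


def actual_use_pct(capacity, dfs_used, non_dfs_used):
    """'Actual Use %' column (assumed formula, inferred from the tables):
    DFS used as a share of the capacity not taken by non-DFS data."""
    return 100.0 * dfs_used / (capacity - non_dfs_used)


def spread(rows):
    """Gap between the most and least utilized node, in percentage points."""
    values = [actual_use_pct(cap, used, non_dfs) for _, cap, used, non_dfs in rows]
    return max(values) - min(values)


if __name__ == "__main__":
    for label, rows in (("before", BEFORE), ("after", AFTER)):
        print(label)
        for node, cap, used, non_dfs in rows:
            print(f"  {node}  used={used_pct(cap, used):6.2f}%  "
                  f"actual={actual_use_pct(cap, used, non_dfs):6.2f}%")
        print(f"  spread in actual use: {spread(rows):.2f} points")
```

With these figures the gap between the most and least utilized node in 
"Actual Use %" narrows from roughly 49 to roughly 14 percentage points.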


Ready for merging!

> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-3570
>                 URL: https://issues.apache.org/jira/browse/HDFS-3570
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Assignee: Akira AJISAKA
>            Priority: Minor
>         Attachments: HDFS-3570.2.patch, HDFS-3570.aash.1.patch
>
>
> Report from a user here: 
> https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
>  post archived at http://pastebin.com/eVFkk0A0
> This user had a specific DN that had a large non-DFS usage among 
> dfs.data.dirs, and very little DFS usage (which is computed against total 
> possible capacity). 
> The Balancer apparently looks only at DFS usage and fails to consider that 
> non-DFS usage may also be high on a DN/cluster. Hence, if the DFS usage 
> reported by a DN is only 8%, it assumes the DN has plenty of free space for 
> more blocks, which isn't true, as this user's case shows. It went on 
> scheduling writes to the DN to balance it out, but the DN simply can't 
> accept any more blocks because of the state of its disks.
> I think it would be better if we _computed_ the actual utilization based on 
> {{(100-(actual remaining space))/(capacity)}}, as opposed to the current 
> {{(dfs used)/(capacity)}}. Thoughts?
> This isn't very critical, however, because it is very rare to see DN space 
> being used for non-DN data, but it does expose a valid bug.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
