[ https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625005#comment-17625005 ]
ASF GitHub Bot commented on HDFS-3570: -------------------------------------- ashutoshcipher commented on PR #5044: URL: https://github.com/apache/hadoop/pull/5044#issuecomment-1293292950 Thank you so much. > Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used > space > -------------------------------------------------------------------------------- > > Key: HDFS-3570 > URL: https://issues.apache.org/jira/browse/HDFS-3570 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover > Affects Versions: 2.0.0-alpha > Reporter: Harsh J > Assignee: Ashutosh Gupta > Priority: Minor > Labels: pull-request-available > Attachments: HDFS-3570.003.patch, HDFS-3570.2.patch, > HDFS-3570.aash.1.patch > > > Report from a user here: > https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ, > post archived at http://pastebin.com/eVFkk0A0 > This user had a specific DN that had a large non-DFS usage among > dfs.data.dirs, and very little DFS usage (which is computed against total > possible capacity). > Balancer apparently only looks at the usage, and ignores to consider that > non-DFS usage may also be high on a DN/cluster. Hence, it thinks that if a > DFS Usage report from DN is 8% only, its got a lot of free space to write > more blocks, when that isn't true as shown by the case of this user. It went > on scheduling writes to the DN to balance it out, but the DN simply can't > accept any more blocks as a result of its disks' state. > I think it would be better if we _computed_ the actual utilization based on > {{(100-(actual remaining space))/(capacity)}}, as opposed to the current > {{(dfs used)/(capacity)}}. Thoughts? > This isn't very critical, however, cause it is very rare to see DN space > being used for non DN data, but it does expose a valid bug. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org