Hi Bryan,

It is a bug that we calculate the average including marked-for-decomm.
DNs. Please do log a JIRA for this!
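
Roughly speaking, the fix should divide by the number of in-service DNs
rather than the total node count. A minimal sketch, using hypothetical names
(this is not the actual CDH4.2 source):

    // Hypothetical sketch of the corrected average-load computation.
    public class AvgLoadSketch {

        static double avgLoad(int totalClusterLoad, int numNodes,
                              int numDecommissionedNodes) {
            // Decommissioned DNs carry ~0 load, so leave them out of the
            // denominator; guard against an empty cluster.
            int inService = numNodes - numDecommissionedNodes;
            return inService > 0 ? (double) totalClusterLoad / inService : 0.0;
        }

        public static void main(String[] args) {
            // Using the numbers from Bryan's report below:
            System.out.println(avgLoad(250, 92, 0));   // current behaviour: ~2.71
            System.out.println(avgLoad(250, 92, 70));  // corrected:         ~11.36
        }
    }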

On Fri, Jan 17, 2014 at 4:07 AM, Bryan Beaudreault
<bbeaudrea...@hubspot.com> wrote:
> Running CDH4.2 version of HDFS, I may have found a bug in the
> dfs.namenode.replication.considerLoad feature. I would like to query here
> before entering a JIRA (a quick search was fruitless).
>
> I have an HBase cluster in which I recently replaced 70 smaller servers with
> 22 larger ones.  I added the 22 new servers to the cluster, moved all of the
> HBase regions onto them, major compacted to re-write the data locally, and
> then used the HDFS decommission process to decommission the 70 smaller
> servers.
>
> This all worked well, and HBase was happy.
>
> However, later on, after the decommission finished, I tried to write a file
> to HDFS from a node that does NOT have a DataNode running on it (HMaster).
> These writes failed because all 92 servers were being added to the excluded
> list. See https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1 for logs.
>
> Reading through the code, I found that the DefaultBlockPlacementPolicy
> calculates the load average of the cluster by doing:  TotalClusterLoad /
> numNodes.  However, numNodes includes decommissioned nodes (which have 0
> load).  Therefore, the average load is artificially low.  Example:
>
> TotalLoad = 250
> numNodes = 92
> decommissionedNodes = 70
>
> avgLoad = 250/92 = 2.71
> trueAvgLoad = 250 / (92 - 70) = 11.36
>
> Because of this math, all of our remaining 22 nodes were considered
> "overloaded", since each was carrying more than 2x 2.71 connections.
> Combined with the decommissioned nodes, which are already excluded, this
> resulted in every server in the cluster being excluded.
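>
> For illustration, here is a rough sketch of that check with the numbers above
> (hypothetical names; this is not the actual placement-policy code):
>
>     // Sketch of the "more than 2x the average load" exclusion check.
>     public class ConsiderLoadSketch {
>
>         static boolean excluded(double nodeLoad, double avgLoad) {
>             // A target is rejected when its load exceeds twice the average.
>             return nodeLoad > 2.0 * avgLoad;
>         }
>
>         public static void main(String[] args) {
>             double perNodeLoad = 250.0 / 22;  // ~11.36 xceivers per live node
>             System.out.println(excluded(perNodeLoad, 250.0 / 92)); // true  -> excluded
>             System.out.println(excluded(perNodeLoad, 250.0 / 22)); // false -> accepted
>         }
>     }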
>
> (Looking at my regionserver logs later, I also saw that a number of writes
> were unable to reach their required replication factor, though those did not
> fail this spectacularly.)
>
> Is this a bug or expected behavior?



-- 
Harsh J
