Brian Bockelman wrote:

On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:


I think you should file a JIRA on this. Most likely this is what is happening:

* Two out of 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as well, since the '# active writes' on it is larger than '2 * avg'.
* Even if only one other block is being written on the 3rd DN, its load of 1 is still greater than 2 * avg (i.e. 2 * 1/3). (Rough sketch of this check below.)
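
To make that concrete, here is a rough, simplified sketch of the kind of load check I mean. This is not the actual NameNode code; the class, method, and parameter names are made up for illustration:

  // Hypothetical sketch of the "2 * avg" load check, assuming the NN
  // tracks a count of active writes per DN and cluster-wide.
  class LoadCheckSketch {
    // A candidate DN is rejected when its active-write count
    // exceeds twice the cluster-wide average.
    static boolean underLoadThreshold(int activeWritesOnNode,
                                      int totalActiveWrites,
                                      int numDatanodes) {
      double avgLoad = (numDatanodes == 0)
          ? 0.0 : (double) totalActiveWrites / numDatanodes;
      return activeWritesOnNode <= 2.0 * avgLoad;
    }

    public static void main(String[] args) {
      // 3 DNs, one write already in flight on the 3rd DN:
      // avg = 1/3, threshold = 2/3, and 1 > 2/3, so the node is
      // skipped even though it is the only one with free space.
      System.out.println(underLoadThreshold(1, 1, 3)); // prints false
    }
  }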

To test this, write just one block to an idle cluster; it should succeed.

Writing from a client on the 3rd DN succeeds, since the local node is always favored.

This particular problem is not that severe on a large cluster but HDFS should do the sensible thing.


Hey Raghu,

If this analysis is right, I would add that it can happen even on large clusters! I've seen this error on our cluster when we're very full (>97%) and very few nodes have any free space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly -- meaning the smaller nodes fill up quickly until the balancer can catch up.

Yes. This would bite whenever a large portion of the nodes cannot accept blocks. In general, it can happen whenever fewer than half the nodes have any space left.
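
(Roughly: if only k out of n nodes can accept blocks and essentially all of the w active writes land on those k nodes, each of them carries about w/k writes while the cluster-wide average is w/n. w/k > 2 * (w/n) exactly when k < n/2, which is where the "less than half" threshold comes from. That assumes the writes are spread evenly over the writable nodes.)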

Raghu.
