Re: Could only be replicated to 0 nodes, instead of 1

Brian Bockelman Thu, 21 May 2009 13:33:03 -0700


On May 21, 2009, at 3:10 PM, Stas Oskin wrote:

Hi.
If this analysis is right, I would add it can happen even on large clusters!
I've seen this error at our cluster when we're very full (>97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly -- meaning the smaller nodes fill up quickly,
until the balancer can catch up.
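
To make the arithmetic behind this concrete, here is a rough simulation sketch. It assumes writes land on uniformly random nodes that still have room; real HDFS placement also takes racks and replication into account, which is ignored here. The cluster shape (2 large nodes, 20 small ones) and all numbers are made up for illustration.

```python
import random


def simulate_fill(capacities, n_blocks, seed=0):
    """Place n_blocks unit-sized blocks on uniformly random non-full nodes.

    Returns the fill fraction (used / capacity) of every node.
    """
    rng = random.Random(seed)
    used = [0] * len(capacities)
    for _ in range(n_blocks):
        candidates = [i for i, cap in enumerate(capacities) if used[i] < cap]
        if not candidates:
            break  # the whole cluster is full
        used[rng.choice(candidates)] += 1
    return [u / cap for u, cap in zip(used, capacities)]


# Hypothetical cluster: 2 large nodes (10x capacity) plus 20 small nodes.
capacities = [10_000] * 2 + [1_000] * 20
fill = simulate_fill(capacities, n_blocks=15_000)

print("large nodes:", [round(f, 2) for f in fill[:2]])
print("small nodes, min/max:", round(min(fill[2:]), 2), round(max(fill[2:]), 2))
```

In expectation every node receives about 15000 / 22, or roughly 682 blocks. That is a small fraction of a large node but most of a small one, even though the cluster as a whole is well under half full.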
A bit off topic: do you run the balancer manually, or do you have some scheduler
that does it?

crontab does it for us, once an hour. We're always importing data, so the cluster is always out of balance.


If the previous balancer didn't exit, the new one will simply exit.
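
The message does not say whether that exit is enforced by the balancer itself or by whatever cron invokes. If one wanted the same guarantee at the wrapper level, a minimal sketch using a non-blocking file lock could look like this. The lock path is hypothetical, and the command assumes `hadoop` is on the PATH.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/hdfs-balancer.lock"  # hypothetical location
BALANCER_CMD = ["hadoop", "balancer"]  # assumes `hadoop` is on the PATH


def run_exclusively(cmd, lock_path):
    """Run cmd only if nobody else holds the lock.

    Returns the command's exit code, or None if another run holds the lock
    and this one was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock stays held until the file is closed on leaving the block.
        return subprocess.call(cmd)


# An hourly cron job could end with:
#     sys.exit(run_exclusively(BALANCER_CMD, LOCK_PATH) or 0)
```

The lock is released when the wrapper exits for any reason, so a crashed run does not leave a stale lock behind the way a hand-rolled PID file can.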

The real trick has been to make sure the balancer doesn't get stuck -- a Nagios plugin makes sure that the stdout has been printed to in the last hour or so, otherwise it kills the running balancer. Stuck balancers have been an issue in the past.
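
The plugin itself isn't shown here, but the idea can be sketched roughly as follows. This assumes the balancer's stdout is redirected to a file and its PID is written to a PID file; both paths and the function names are hypothetical, and the one-hour threshold is taken from "the last hour or so" above.

```python
import os
import signal
import time

STALE_AFTER_SECONDS = 3600  # "the last hour or so"


def is_stale(log_path, max_age_seconds, now=None):
    """Return True if log_path was last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path) > max_age_seconds


def kill_if_stuck(log_path, pid_path, max_age_seconds=STALE_AFTER_SECONDS):
    """Send SIGTERM to the PID in pid_path if the log has gone quiet.

    Returns True if a signal was sent, False otherwise.
    """
    if not is_stale(log_path, max_age_seconds):
        return False
    with open(pid_path) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False  # the process is already gone
    return True
```

Checking the log's modification time is a cheap stand-in for "has anything been printed lately": a balancer that is alive but silent looks the same as one that is hung, which is the case worth killing.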


Brian
