Re: NameNode high availability

Steve Loughran Fri, 02 Oct 2009 02:50:50 -0700

Stas Oskin wrote:

Hi.


Could you share the way in which it didn't quite work? Would be valuable
information for the community.


The idea is to have a Xen machine dedicated to NN, and maybe to SNN, which
would be running over DRBD, as described here:
http://www.drbd.org/users-guide/ch-xen.html

The VM will be monitored by heart-beat, which would restart it on another
node when it fails.

I wanted to go that way as I thought it would be perfect for a small cluster,
as the node can then be re-used for other tasks.
Once the cluster grows reasonably, the VM could be migrated live to a
dedicated machine, with minimal downtime.

The problem is that it didn't work as expected. Xen over DRBD is just not
reliable as described. The most basic operation, live domain migration,
works in only 50% of cases. Most often the domain migration leaves the DRBD
device in read-only status, meaning the domain can't be cleanly shut down,
only killed. This in turn often leads to NN metadata corruption.

It's probably a quirk of virtualisation: all those clocks and things cause trouble for any HA protocol running round the cluster. I would not blame Xen, as VMware and VirtualBox are also tricky.

As you have a virtual infrastructure, why not have an image of the primary NN ready to bring up on demand when the NN goes down, pointed at a copy of the NN datasets?
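
A rough sketch of that bring-up-on-demand decision, in Python. Everything named here is an assumption for illustration: `is_alive` stands in for whatever probe you use against the primary NN, `start_standby` for whatever boots the prepared image, and `max_misses` is an arbitrary threshold. Requiring several consecutive misses is one way to avoid failing over on a single late reply, which matters when VM clocks and scheduling are jittery.

```python
from typing import Callable


class StandbyFailover:
    """Start a standby NameNode image after repeated failed health checks.

    Both callables are hypothetical placeholders: `is_alive` would probe the
    primary NN, `start_standby` would boot the prepared VM image.
    """

    def __init__(self, is_alive: Callable[[], bool],
                 start_standby: Callable[[], None],
                 max_misses: int = 3) -> None:
        if max_misses < 1:
            raise ValueError("max_misses must be at least 1")
        self.is_alive = is_alive
        self.start_standby = start_standby
        self.max_misses = max_misses
        self.misses = 0            # consecutive failed checks so far
        self.failed_over = False   # the standby is started at most once

    def tick(self) -> bool:
        """Run one health check; return True once failover has happened."""
        if self.failed_over:
            return True
        if self.is_alive():
            self.misses = 0        # a good reply resets the counter
            return False
        self.misses += 1
        if self.misses >= self.max_misses:
            self.start_standby()
            self.failed_over = True
        return self.failed_over
```

You would call `tick()` from a timer loop on a machine other than the one hosting the NN. This does nothing about fencing the old NN or keeping the copy of the NN datasets current; both would have to be handled separately.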
