Edward Capriolo wrote:


Just for reference: Linux-HA and some other tools deal with split-brain
decisions by requiring a quorum. A quorum involves having a third
party, or having more than 50% of the nodes agree.

An issue with linux-ha and hadoop is that linux-ha is only
supported/tested on clusters of up to 16 nodes.

Usually an odd number of nodes; that stops a 50%-50% split.
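The majority rule can be sketched in a few lines: with an odd number of voters, any network partition leaves exactly one side holding a strict majority, while an even number allows a tie in which neither side has quorum. A minimal illustration (the function names here are mine, not from Linux-HA or any other tool):

```python
def quorum_size(n_nodes):
    """Smallest vote count that is a strict majority of n_nodes."""
    return n_nodes // 2 + 1

def has_quorum(votes, n_nodes):
    """A partition may act as primary only if it holds a strict majority."""
    return votes >= quorum_size(n_nodes)

# 5 nodes: any split (e.g. 3 vs 2) gives exactly one side quorum.
assert has_quorum(3, 5) and not has_quorum(2, 5)

# 4 nodes: a 2-2 split leaves *neither* side with quorum. The cluster
# stalls, but it cannot split-brain (two sides both acting as primary).
assert not has_quorum(2, 4)
```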

That is not a hard
limit, but no one claims to have done it on 1000 or so nodes.

If the voting algorithm requires communication with every node, then there is an implicit limit.


You
could just install linux HA on a random sampling of 10 nodes across
your network. That would in theory create an effective quorum.




There are other HA approaches that do not involve DRBD. One is to store
your NameNode table on a SAN or an NFS server. Terracotta is another
option that you might want to look at. But no, at the moment there is
no fail-over built into Hadoop.

Storing the only copy of the NN data on NFS would make the NFS server an SPOF, and you still need to solve the problems of:
- detecting NN failure and deciding who else is in charge
- making another node the NN by giving it the same hostname/IP address as the one that went down.
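The first of those problems, detecting NN failure, is usually done with heartbeats plus a timeout. A rough sketch of how such a detector works, with invented names and timeout values (this is not Linux-HA's actual protocol):

```python
import time

HEARTBEAT_INTERVAL = 2.0   # seconds between expected heartbeats (assumed)
FAILURE_TIMEOUT = 10.0     # declare the NN dead after this long without one

class FailureDetector:
    """Declares the NameNode dead if no heartbeat arrives within the timeout."""

    def __init__(self, now=time.monotonic):
        self.now = now                 # injectable clock, for testing
        self.last_heartbeat = now()

    def heartbeat(self):
        """Record a heartbeat from the NameNode."""
        self.last_heartbeat = self.now()

    def is_failed(self):
        """True once the timeout has elapsed with no heartbeat."""
        return self.now() - self.last_heartbeat > FAILURE_TIMEOUT

# On detection, an HA tool would then fence the failed node and bring a
# standby up under the same hostname/IP, e.g. via a virtual-IP takeover.
```

The second half, reassigning the hostname/IP to the standby, is exactly the virtual-IP / resource-takeover machinery that the Linux-HA stack provides.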

That is what the linux HA stuff promises

-steve
