Hi. Just wanted to reflect my thoughts on this:
So far DRBD looks as a good enough solution. My only problem, is that it requires from me to operate dedicate machines (physical or virtual) for Hadoop Namenode, in active/passive configuration. I'm interesting in HADOOP-4539 mostly because it would enable me to run the Namenode together with other services, and it could open a way to Active/Active HA in Hadoop (as the next iteration of 4539). Regards. 2009/9/21 Steve Loughran <ste...@apache.org> > Edward Capriolo wrote: > > >> Just for reference. Linux HA and some other tools deal with the split >> brain decisions by requiring a quorum. A quorum involves having a >> third party or having more then 50% of the nodes agree. >> >> An issue with linux-ha and hadoop is that linux-ha is only >> supported/tested on clusters of up to 16 nodes. >> > > Usually odd numbers; stops a 50%-50% split. > > That is not a hard >> limit, but no one claims to have done it on 1000 or so nodes. >> > > If the voting algorithm requires communication with every node then there > is an implicit limit. > > > You >> could just install linux HA on a random sampling of 10 nodes across >> your network. That would in theory create an effective quorum. >> > > > > >> There are other HA approaches that do not involve DRBD. One is store >> your name node table on a SAN or and NFS server. Terracotta is another >> option that you might want to look at. But no, at the moment there is >> no fail-over built into hadoop. >> > > Storing the only copy of the NN data into NFS would make the NFS server an > SPOF, and you still need to solve the problems of -detecting NN failure and > deciding who else is in charge > -making another node the NN by giving it the same hostname/IPAddr as the > one that went down. > > That is what the linux HA stuff promises > > -steve >