I'm currently running your option (b) setup, and so far it has been reliable
for me.  I use DRBD together with several heartbeat/Linux-HA scripts that
handle the failover process, including a virtual IP for the namenode.  I
haven't had to deal with any unexpected real-world failures yet, but manual
failover testing has given consistent, reliable results.
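
For illustration only, here is a rough Python sketch of the detection step in
a failover setup like the one described above: probe the namenode port and
trigger a failover only after several consecutive failed checks. This is not
the actual heartbeat/Linux-HA configuration in use; the function names and the
threshold of 3 are assumptions made up for the example, and heartbeat does the
equivalent work itself in a real deployment.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(check_results, max_failures=3):
    """Return True once the most recent `max_failures` checks all failed.

    check_results is a list of booleans, oldest first (True = healthy).
    max_failures is a hypothetical threshold, not a heartbeat setting.
    """
    if len(check_results) < max_failures:
        return False
    return not any(check_results[-max_failures:])
```

Requiring several consecutive failures, rather than reacting to a single
missed check, is one way to avoid failing over on a brief network hiccup.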



-paul


On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:

> Dear Hadoop Community --
>
> I am wondering if it is already possible or in the plans to add capability
> for multiple master nodes. I'm in a situation where I have a master node
> that may potentially be in a less than ideal execution and networking
> environment. For this reason, it's possible that the master node could die
> at any time. On the other hand, the application must always be available. I
> have accessible to me other machines but I'm still unclear on the best
> method to add reliability.
>
> Here are a few options that I'm exploring:
> a) To create a completely secondary Hadoop cluster that we can flip to when
> we detect that the master node has died. This will double hardware costs, so
> if we originally have a 5 node cluster, then we would need to pull 5 more
> machines out of somewhere for this decision. This is not the preferable
> choice.
> b) Just mirror the master node via other always available software, such as
> DRBD for real time synchronization. Upon detection we could swap to the
> alternate node.
> c) Or if Hadoop had some functionality already in place, it would be
> fantastic to be able to take advantage of that. I don't know if anything
> like this is available but I could not find anything as of yet. It seems to
> me, however, that having multiple master nodes would be the direction Hadoop
> needs to go if it is to be useful in high availability applications. I was
> told there are some papers on Amazon's Elastic Computing that I'm about to
> look for that follow this approach.
>
> In any case, could someone with experience in solving this type of problem
> share how they approached this issue?
>
> Thanks!
>
