Dear Hadoop Community --

I am wondering whether Hadoop already supports multiple master nodes, or
whether that capability is planned. My master node may have to run in a
less-than-ideal execution and networking environment, so it could die at
any time. The application, on the other hand, must always be available. I
have other machines available to me, but I'm still unclear on the best way
to add reliability.

Here are a few options that I'm exploring:
a) Create a complete secondary Hadoop cluster that we can flip to when we
detect that the master node has died. This would double hardware costs: if
we originally have a 5-node cluster, we would need to find 5 more machines
somewhere. This is not the preferred choice.
b) Mirror the master node with separate high-availability software, such as
DRBD for real-time synchronization. Upon detecting a failure, we could swap
to the alternate node.
c) Use functionality already built into Hadoop, if any exists; it would be
fantastic to take advantage of that. I don't know whether anything like
this is available, and I have not found anything so far. It seems to me,
however, that multiple master nodes are the direction Hadoop needs to go if
it is to be useful in high-availability applications. I was told there are
some papers on Amazon's Elastic Computing that follow this approach, and I
am about to look for them.
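
To make the "detect, then swap" step in options (a) and (b) concrete, here is
a rough sketch of the kind of watchdog I have in mind. It is only an
illustration, not anything Hadoop provides: the class FailoverMonitor, the
helper tcp_probe, the host name, the port, and the promote_standby callback
are all hypothetical names of my own. It covers detection only; the actual
swap (promoting the DRBD secondary, moving an IP address, and so on) would
live in the callback.

```python
import socket
import time
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Hypothetical watchdog: fires on_failover once after max_failures
    consecutive failed probes of the master node."""

    def __init__(
        self,
        probe: Callable[[], bool],
        on_failover: Callable[[], None],
        max_failures: int = 3,
    ) -> None:
        self.probe = probe
        self.on_failover = on_failover
        self.max_failures = max_failures
        self.failures = 0
        self.failed_over = False

    def check_once(self) -> bool:
        """Run one probe; return True only if this call triggered failover."""
        if self.failed_over:
            return False
        if self.probe():
            # A healthy answer resets the count, so one dropped packet on a
            # flaky network does not accumulate toward a failover.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failed_over = True
            self.on_failover()
            return True
        return False

    def run(self, interval: float = 5.0) -> None:
        """Probe every `interval` seconds until failover has been triggered."""
        while not self.failed_over:
            self.check_once()
            if not self.failed_over:
                time.sleep(interval)


# Example wiring (host, port, and promote_standby are placeholders):
#   monitor = FailoverMonitor(
#       probe=lambda: tcp_probe("master.example.com", 9000),
#       on_failover=promote_standby,
#   )
#   monitor.run(interval=5.0)
```

Requiring several consecutive failures before swapping is a deliberate
choice here, since my network is unreliable and a single missed probe should
not be treated as a dead master.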

In any case, could someone with experience in solving this type of problem
share how they approached this issue?

Thanks!
