Good question!
I think we are at the point with HDFS when HA issues should top the list of 
priorities.
There is a plan for the near future to turn the secondary name-node into a
"warm" standby. Warm here means that the secondary keeps the namespace in sync
with the primary node, and when the primary dies the secondary only needs to
have the data-nodes switch to reporting to it. So you do not need to start a
new node as in cold standby, but the new name-node is not available for service
right away as it would be in the hot variant.
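
To make the cold/warm/hot distinction concrete, here is a rough Python sketch of the work each kind of standby still has to do after the primary dies. It is only an illustration of the description above, not HDFS code: the names Standby and failover_steps are made up, and the "load the namespace" and "point clients" steps are my assumptions about what a failover involves.

```python
from enum import Enum
from typing import List


class Standby(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


def failover_steps(mode: Standby) -> List[str]:
    """List what is left to do before the standby can serve clients.

    Hypothetical helper for illustration; the step names are assumptions.
    """
    steps = []
    if mode is Standby.COLD:
        # Cold: nothing is running yet, so a new node has to be started
        # and has to read the namespace before anything else can happen.
        steps.append("start a new name-node process")
        steps.append("load the namespace from the saved image and edits")
    if mode in (Standby.COLD, Standby.WARM):
        # Warm: the namespace is already in sync, but the data-nodes
        # still have to switch over and report to the new name-node.
        steps.append("have the data-nodes switch to report to the new name-node")
    # Hot: the standby already has everything, so only clients move.
    steps.append("point clients at the new name-node")
    return steps


if __name__ == "__main__":
    for mode in Standby:
        print(mode.value, "->", failover_steps(mode))
```

The shorter the list, the shorter the outage, which is why the warm variant sits between the other two.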
In the meantime, Paul's solution is the only choice I know of.
Paul, if you can, please share your experience.
Thanks,
--Konstantin

Ryan Shih wrote:
Thanks Paul. Sounds like that's the way to go then. We're just starting to
experiment a bit with DRBD so we'll give that a shot and see how it works
out.

On Tue, Jul 29, 2008 at 11:56 AM, paul <[EMAIL PROTECTED]> wrote:

I'm currently running with your option B setup and it seems to be reliable
for me (so far).  I use a combination of DRBD and various heartbeat/Linux-HA
scripts that handle the failover process, including a virtual IP for the
namenode.  I haven't had any real-world unexpected failures to deal with
yet, but all manual testing has had consistent and reliable results.



-paul


On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:

Dear Hadoop Community --

I am wondering if it is already possible, or in the plans, to add capability
for multiple master nodes. I'm in a situation where I have a master node
that may potentially be in a less than ideal execution and networking
environment. For this reason, it's possible that the master node could die
at any time. On the other hand, the application must always be available. I
have other machines accessible to me, but I'm still unclear on the best
method to add reliability.

Here are a few options that I'm exploring:
a) Create a completely secondary Hadoop cluster that we can flip to when
we detect that the master node has died. This will double hardware costs, so
if we originally have a 5 node cluster, then we would need to pull 5 more
machines out of somewhere for this option. This is not the preferred
choice.
b) Just mirror the master node via other always-available software, such as
DRBD for real-time synchronization. Upon detecting a failure we could swap to
the alternate node.
c) Or if Hadoop had some functionality already in place, it would be
fantastic to be able to take advantage of that. I don't know if anything
like this is available, but I could not find anything as of yet. It seems to
me, however, that having multiple master nodes would be the direction Hadoop
needs to go if it is to be useful in high availability applications. I was
told there are some papers on Amazon's Elastic Computing that follow this
approach, which I'm about to look for.

In any case, could someone with experience in solving this type of problem
share how they approached this issue?

Thanks!

