> Gotcha - I thought the long term goal for the BN was to eventually have it
> work as a "warm standby" that could convert into a NN without restart.

This is exactly the long-term goal: to evolve the BN into a StandbyNode
that can take over when the main NN dies, without restarting anything
else.
The only remaining step is to implement the fail-over mechanism.

--Konstantin

Todd Lipcon wrote:
On Wed, Aug 12, 2009 at 12:06 PM, Konstantin Shvachko <s...@yahoo-inc.com> wrote:

Stas,

There is no HA solution currently for Hadoop.
You can do things like Cloudera describes.
Their solution works with 2 real name-nodes.
No Backup node involved.

As for the Backup node, I don't really understand Todd's comment,
but the fact is that the Backup node (BN) is not a standby
node. The failover procedure is not implemented for the BN,
so neither clients nor data-nodes fail over anywhere
when the main name-node (NN) dies; they don't have a clue.


Gotcha - I thought the long term goal for the BN was to eventually have it
work as a "warm standby" that could convert into a NN without restart.

My mistake

-Todd


The purpose of the BN is
1) to keep an up-to-date image of the namespace in memory.
This does not include block locations.
BN does not know where file blocks are.
2) to make periodic checkpoints, like SecondaryNameNode did,
but more efficiently: the BN does not need to load the image
and edits from the NN, because its namespace is already up-to-date.
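
The two points above can be illustrated with a toy sketch. This is not
Hadoop code: the class BackupNamespace, the edit tuple layout
(txid, op, path, blocks), and the JSON image format are all hypothetical,
chosen only to show a namespace that stays current by replaying streamed
edits and can therefore checkpoint straight from memory.

```python
import json


class BackupNamespace:
    """Toy in-memory namespace kept current by replaying streamed edits.

    Hypothetical illustration only; none of these names exist in Hadoop.
    """

    def __init__(self):
        # path -> list of block IDs. Note: no block *locations* are stored.
        self.files = {}
        # Transaction ID of the last edit applied.
        self.last_txid = 0

    def apply_edit(self, edit):
        """Apply one streamed edit; edits must arrive in txid order."""
        txid, op, path, blocks = edit
        if txid != self.last_txid + 1:
            raise ValueError(
                f"gap in edit stream: expected {self.last_txid + 1}, got {txid}"
            )
        if op == "create":
            self.files[path] = list(blocks)
        elif op == "delete":
            self.files.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
        self.last_txid = txid

    def checkpoint(self):
        """Serialize the current namespace without reloading image or edits."""
        return json.dumps(
            {"txid": self.last_txid, "files": self.files}, sort_keys=True
        )


ns = BackupNamespace()
ns.apply_edit((1, "create", "/a", [101, 102]))
ns.apply_edit((2, "create", "/b", [103]))
ns.apply_edit((3, "delete", "/a", []))
image = ns.checkpoint()
```

Because every edit has already been applied, checkpoint() only has to
write out what is in memory, which is the efficiency gain over fetching
and merging the image and edits from the NN.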

There is a provision to transform the BN into a real standby node
with failover, but it has not been implemented yet.

Hope this clarifies things.

Thanks,
--Konstantin



Todd Lipcon wrote:

On Wed, Aug 12, 2009 at 3:42 AM, Stas Oskin <stas.os...@gmail.com> wrote:

 Hi.

You can also use a utility like Linux-HA (aka heartbeat) to handle IP
address failover. It will even send gratuitous ARPs to make sure the
new MAC address gets registered after a failover. Check out this blog for
info about a setup like this:

http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/

Hope that helps

 Thanks, exactly what I was looking for :).
 I presume that with the coming BN, there won't be a need for DRBD, am I
correct?


I haven't followed that development closely, but I believe that's the
case. The BackupNode will stream the FSEditLog writes as they occur while
replaying them into its own FSNamesystem. Then during a failover a real
NameNode starts on that FSNamesystem "ready to go". As for how the
BackupNode keeps track of block locations, I'm not sure - is there a
replication stream between BlockManagers too? Or is the cluster in a
broken state until all of the DNs have processed new block reports?
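
To make the block-location question above concrete, here is a toy sketch
of the second scenario, assuming locations are rebuilt purely from
data-node block reports after a failover. This is not Hadoop code:
BlockMap, process_block_report, and missing_blocks are hypothetical names
used only for illustration.

```python
class BlockMap:
    """Toy block-location map that starts empty and fills from block reports.

    Hypothetical illustration only; none of these names exist in Hadoop.
    """

    def __init__(self, namespace_blocks):
        # The namespace knows which blocks exist, but not where they live.
        self.locations = {block: set() for block in namespace_blocks}

    def process_block_report(self, datanode, reported_blocks):
        """Record which known blocks a data-node says it holds."""
        for block in reported_blocks:
            # Blocks absent from the namespace are ignored in this sketch.
            if block in self.locations:
                self.locations[block].add(datanode)

    def missing_blocks(self):
        """Blocks with no known location yet, i.e. not yet readable."""
        return sorted(b for b, nodes in self.locations.items() if not nodes)


bm = BlockMap([101, 102, 103])
before = bm.missing_blocks()   # every block is unlocated right after failover
bm.process_block_report("dn1", [101, 102])
bm.process_block_report("dn2", [102, 103])
after = bm.missing_blocks()    # empty once the reports cover every block
```

In this model the namespace is usable immediately, but reads of a given
block cannot be served until at least one data-node has reported it.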

-Todd


