Hi.

> * You need to have something, "a resource manager", keeping an eye on the NN
> from somewhere. Needless to say, that needs to be fairly HA too.
>

> * your NN image has to be ready to go
>

> * when the deployed NN goes away, bring up a new machine with the same
> image, hostname *and IP Address*. You can't always pull the latter off, it
> depends on the infrastructure. Without that, you'd need to bring up all the
> nodes with DNS caching set to a short time and update a DNS entry.
>
> This isn't real HA, it's recovery.
>

All this can be done with Heartbeat and Xen:

1) Heartbeat is P2P, so there is no SPOF here.
2) It is possible to start a Xen VM on another node if the node it was
running on has failed.
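
To make the failover idea concrete, here is a rough sketch of the decision
each Dom0 would make. This is not how Heartbeat is actually implemented; the
names PeerState, decide and deadtime are hypothetical and only illustrate
"if the peer has been silent for too long, start the NN VM here".

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    """Hypothetical record of when we last heard from the other Dom0."""
    last_heartbeat: float  # seconds, same clock as `now` below


def peer_is_dead(peer: PeerState, now: float, deadtime: float) -> bool:
    """The peer counts as failed once it has been silent longer than deadtime."""
    return (now - peer.last_heartbeat) > deadtime


def decide(peer: PeerState, now: float, deadtime: float,
           vm_running_locally: bool) -> str:
    """Return the action this Dom0 should take for the NN VM."""
    if vm_running_locally:
        # We already host the NN VM; nothing to take over.
        return "keep-running"
    if peer_is_dead(peer, now, deadtime):
        # The peer (and the VM it hosted) is assumed gone: start the VM here.
        return "start-vm"
    return "standby"
```

For example, with a 30 second deadtime, a peer last heard from 40 seconds
ago would lead the standby Dom0 to return "start-vm".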

The only question left is how to keep access to the NN/SNN meta-data in case
one of the NN VMs fails.

Maybe by keeping NFS exports on both Dom0s, and writing to them in parallel?

In this case, if one Dom0 fails and takes its VM with it, the other will come
up and read the NFS export from its own Dom0.
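
What I mean by writing to both exports is roughly the following. It is only
a sketch: mirrored_write is a made-up helper, the directories stand in for
the two NFS mount points, and it writes the copies one after another rather
than truly in parallel.

```python
import os
from pathlib import Path
from typing import Iterable, List


def mirrored_write(directories: Iterable[str], name: str,
                   data: bytes) -> List[Path]:
    """Write the same bytes to `name` under every directory.

    Each copy goes to a temporary file first and is then renamed into
    place, so a crash mid-write does not leave a half-written file under
    the final name. Any I/O error is raised to the caller.
    """
    written = []
    for directory in directories:
        target = Path(directory) / name
        tmp = target.with_name(target.name + ".tmp")
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to the export before renaming
        os.replace(tmp, target)   # rename over the final name
        written.append(target)
    return written
```

If I remember correctly, dfs.name.dir accepts a comma-delimited list of
directories and the NN replicates the name table to each of them, so the NN
itself may already be able to do this without any extra code.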

It is less clean than keeping the meta-data inside the VM as well, but it
might work, since DRBD won't be used here.

1) What do you think of this approach? Maybe there is a better solution?

2) What happens if the NN crashes in the middle of its work? Can it corrupt
the meta-data in a way that would require a manual restore of the SNN
checkpoint? That would mean the process still requires user intervention.

3) Any idea where that list about HA is, the one that was discussed last week?

Thanks again!
