On Wednesday, 19 February 2014 at 00:16:03 UTC, Tolga Cakiroglu
wrote:
TL;DR on the link, but how are they detecting that a CPU has failed?
Some information must be passed outside the CPU to do this. The
only solution that comes to my mind is that the main CPU changes a
variable in external memory at every step, and the backup CPU
checks it continuously to catch a failure immediately. But this
would already require about 50% of the CPU's power.
While thinking about this kind of backup system, it is really
great to read that some people are actually building them.
I'm assuming this has something to do with it:
https://en.wikipedia.org/wiki/Heartbeat_%28computing%29
In clustered servers, the active node sends a periodic signal
indicating it's still alive. This signal is referred to as a
heartbeat. There's a standby node waiting to take over should it
stop receiving this signal.
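
To make the idea concrete, here is a minimal sketch of the standby side in Python. It is only an illustration, not how any particular cluster or the system in the link does it: the class name HeartbeatMonitor, its methods, and the 3-second timeout are all made up for this example. The point is that the standby does not have to watch the active node continuously; it only records when the last heartbeat arrived and compares that against a timeout now and then, which should cost far less than the 50% guessed above.

```python
import time


class HeartbeatMonitor:
    """Standby-side view of the active node's heartbeat (hypothetical names)."""

    def __init__(self, timeout, now=None):
        # timeout: seconds of silence tolerated before taking over (assumed value).
        self.timeout = timeout
        self.last_beat = time.monotonic() if now is None else now

    def beat(self, now=None):
        # Called whenever a heartbeat arrives from the active node.
        self.last_beat = time.monotonic() if now is None else now

    def should_take_over(self, now=None):
        # True once no heartbeat has been seen for longer than the timeout.
        now = time.monotonic() if now is None else now
        return now - self.last_beat > self.timeout


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=3.0, now=0.0)
    monitor.beat(now=2.5)                      # active node is still alive
    print(monitor.should_take_over(now=5.0))   # 2.5 s of silence, within timeout
    print(monitor.should_take_over(now=6.0))   # 3.5 s of silence, past timeout
```

The `now` arguments are there only so the example can be run with fixed timestamps; in a real standby you would leave them out and call should_take_over() from a timer.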