On Saturday 27 September 2003 06:59, Tom Lane wrote:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> >> ... You can make this work, but the resource costs
> >> are steep.
> >
> > So, after 'n' seconds of waiting, we abandon the slave and the slave
> > abandons the master.
>
> [itch...] But you surely cannot guarantee that the slave and the master
> time out at exactly the same femtosecond. What happens when the comm
> link comes back online just when one has timed out and the other not?
> (Hint: in either order, it ain't good. Double plus ungood if, say, the
> comm link manages to deliver the master's "commit confirm" message a
> little bit after the master has timed out and decided to abort after all.)
>
> In my book, timeout-based solutions to this kind of problem are certain
> disasters.

I might be (well, am actually) a bit out of my depth here, but surely what
happens is if you have machines A, B, C and *any* of them thinks machine C has
a problem then it does. If C can still communicate with the others then it is
told to reinitialise/go away/start the sirens. If C can't communicate then
it's all a bit academic.

Granted, if you have intermittent problems on a link and set your timeouts
badly then you'll have a very brittle system, but if A thinks C has died, you
can't just reverse that decision.

--
  Richard Huxton
  Archonet Ltd
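
A rough sketch of the rule described above, in Python, in case it makes the idea easier to pick at. Everything here is hypothetical: the `Membership` class, the `report_failure`/`on_contact`/`rejoin` names, and in particular the assumption that all machines consult one shared membership table. It only shows the "a suspicion is never silently reversed" part. It says nothing about how A and B would agree on that table across an unreliable link, and it does not address messages that were already in flight when the timeout fired.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    EVICTED = "evicted"


class Membership:
    """Cluster view in which a suspicion, once acted on, is never reversed."""

    def __init__(self, nodes):
        self.status = {n: Status.ALIVE for n in nodes}

    def report_failure(self, reporter, suspect):
        # Any single live member's suspicion is enough to evict the suspect.
        # Reports from already-evicted members are ignored.
        if self.status[reporter] is Status.ALIVE:
            self.status[suspect] = Status.EVICTED

    def on_contact(self, node):
        # An evicted node that turns up again is not quietly readmitted;
        # it is told to reinitialise instead.
        if self.status[node] is Status.EVICTED:
            return "reinitialise"
        return "ok"

    def rejoin(self, node):
        # Hypothetical explicit step: the node has resynchronised from scratch.
        self.status[node] = Status.ALIVE
```

With machines A, B and C, a timeout on A's side followed by C's link coming back would go something like this:

```python
m = Membership(["A", "B", "C"])
m.report_failure("A", "C")   # A's timeout fires
m.on_contact("C")            # C reappears and is told to reinitialise
```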