On Sat, Nov 21, 2015 at 1:50 AM, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> 21.11.2015 03:38, Brian Campbell wrote:
>>
>> What I'm concerned about is the initial failure of crmd on master1
>> that led to master2 deciding to fence it, and then master2's failure
>> to fence master1 and thus getting stuck and not being able to manage
>> resources. It seems to have simply stopped doing anything, with no
>> logs indicating why it did so.
>>
>
> That's actually normal. If fencing is required but could not be performed
> cluster is stuck - no further actions can be completed in this state. So the
> root cause here seems to be unsuccessful fencing.
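To make the behaviour Andrei describes concrete, here is a toy model in Python. It is purely illustrative: the class and method names (ToyCluster, fence, run_action) are hypothetical, and this is not how Pacemaker is actually implemented. The only idea it captures is that once a node needs fencing, no further resource actions run until a fence of that node is confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical toy model, not Pacemaker code: resource actions are
    blocked while any node that requires fencing remains unfenced."""
    unclean_nodes: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def fence(self, node: str, fence_ok: bool) -> None:
        # Only a confirmed, successful fence clears the node's unclean state.
        if fence_ok:
            self.unclean_nodes.discard(node)
            self.log.append(f"fenced {node}")
        else:
            self.log.append(f"fencing {node} failed")

    def run_action(self, action: str) -> bool:
        # While an unclean node is still unfenced, nothing else may proceed.
        if self.unclean_nodes:
            self.log.append(
                f"blocked {action}: waiting to fence {sorted(self.unclean_nodes)}"
            )
            return False
        self.log.append(f"ran {action}")
        return True


if __name__ == "__main__":
    cluster = ToyCluster(unclean_nodes={"master1"})
    cluster.fence("master1", fence_ok=False)
    print(cluster.run_action("promote on master2"))  # False: cluster is stuck
    cluster.fence("master1", fence_ok=True)
    print(cluster.run_action("promote on master2"))  # True: fencing confirmed
```

In this model a failed fence leaves the cluster blocked indefinitely, which matches the "stuck" state described above; the open question in this thread is why the real fence attempt failed without logging a reason.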
Yes, that part I expect. The problem is that there's no indication of why fencing was unsuccessful. We had previously tested fencing and it was working; in fact, later in the logs we can see fencing work: after someone manually reboots master1, the cluster sees it as unclean and successfully fences it. So fencing failed without anything being logged about why, which makes it hard to figure out what needs to be fixed to make it more reliable in the future.

--
Brian

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org