On Thu, 2017-09-28 at 01:39 +0200, Adam Spiers wrote:
> Hi all,
>
> When I do a
>
>   pkill -9 -f pacemaker_remote
>
> to simulate failure of a remote node, sometimes I see things like:
>
>   08:29:32 d52-54-00-da-4e-05 pacemaker_remoted[5806]: error: No ipc providers available for uid 0 gid 0
>   08:29:32 d52-54-00-da-4e-05 pacemaker_remoted[5806]: error: Error in connection setup (5806-5805-15): Remote I/O error (121)
>
> ... and the node doesn't get fenced as expected. Other times it does.
>
> Is this my fault for using an invalid way of simulating failure, or
> some kind of bug?
>
> Sadly I don't have the exact version of pacemaker_remoted to hand, but
> I can provide it tomorrow if necessary. It's not the latest release,
> maybe not even the one immediately preceding it.
>
> Thanks!
> Adam

Before fencing, the cluster will try to re-establish the connection. If you've got pacemaker_remote enabled via systemd, systemd may be respawning it quickly enough that the cluster's reconnect succeeds.

Also, until a recent master branch commit, remote nodes would not get fenced if they were not running any resources. And of course, a fencing resource has to be configured for the remote node.

If none of those things is the reason, there may be a bug; in that case, a PE input file from the DC for that transition would be helpful.
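
For what it's worth, the checks above could be sketched roughly as the shell helpers below. This is only a sketch under a few assumptions: the systemd unit is named pacemaker_remote, the node name passed to show_fence_devices is whatever the cluster calls the remote node, and the PE input directory shown is a common default that may differ with version and packaging. The function names themselves are made up for illustration.

```shell
#!/bin/sh
# Sketch only: helper functions for the checks described above.
# Nothing runs until one of the functions is called.

# On the remote node: show whether systemd is set to restart the daemon.
# Assumes the unit is named "pacemaker_remote".
show_restart_policy() {
    systemctl show -p Restart pacemaker_remote
}

# On a cluster node: list fence devices able to fence the given node.
# "$1" is the remote node's name as the cluster knows it (placeholder).
show_fence_devices() {
    stonith_admin --list "$1"
}

# On a cluster node: one-shot status, to see whether the remote node
# was running any resources at the time.
show_cluster_status() {
    crm_mon -1
}

# On the DC: print the newest pe-input file in a directory.
# The default path is an assumption; it varies by version and packaging.
latest_pe_input() {
    dir="${1:-/var/lib/pacemaker/pengine}"
    ls -t "$dir"/pe-input-*.bz2 2>/dev/null | head -n 1
}
```

Each function is meant to be run on the host named in its comment, for example show_restart_policy on the remote node and latest_pe_input on the DC. The newest file is only a starting point; the DC's logs around the time of the failure are the better guide to which input file belongs to that transition.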