On 20/03/17 12:22 PM, Alexander Markov wrote:
> Hello guys,
>
> it looks like I'm missing something obvious, but I just don't get what
> has happened.
>
> I've got a number of stonith-enabled clusters within my big POWER boxes.
> My stonith devices are two HMCs (hardware management consoles) - separate
> servers from IBM that can reboot separate LPARs (logical partitions)
> within POWER boxes - one per datacenter.
>
> So my definition for stonith devices was pretty straightforward:
>
> primitive st_dc2_hmc stonith:ibmhmc \
>         params ipaddr=10.1.2.9
> primitive st_dc1_hmc stonith:ibmhmc \
>         params ipaddr=10.1.2.8
> clone cl_st_dc2_hmc st_dc2_hmc
> clone cl_st_dc1_hmc st_dc1_hmc
>
> Everything was OK when we tested failover. But today, during a power
> outage, we lost one DC completely. Shortly after that the cluster just
> literally hung itself trying to reboot the nonexistent node. No failover
> occurred. The nonexistent node was marked OFFLINE UNCLEAN and resources
> were marked "Started UNCLEAN" on the nonexistent node.
>
> UNCLEAN seems to flag a problem with the stonith configuration. So my
> question is: how can I avoid such behaviour?
>
> Thank you!

Please share your config along with the logs from the nodes that were
affected.

cheers,
digimer

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org