On 20/03/17 12:22 PM, Alexander Markov wrote:
> Hello guys,
> it looks like I'm missing something obvious, but I just don't get what has
> happened.
> I've got a number of stonith-enabled clusters within my big POWER boxes.
> My stonith devices are two HMCs (Hardware Management Consoles), separate
> servers from IBM that can reboot individual LPARs (logical partitions)
> within the POWER boxes, one in each datacenter.
> So my definition of the stonith devices was pretty straightforward:
> primitive st_dc2_hmc stonith:ibmhmc \
> params ipaddr=
> primitive st_dc1_hmc stonith:ibmhmc \
> params ipaddr=
> clone cl_st_dc2_hmc st_dc2_hmc
> clone cl_st_dc1_hmc st_dc1_hmc
> Everything was OK when we tested failover. But today, during a power outage,
> we lost one DC completely. Shortly after that, the cluster literally
> hung itself trying to reboot the nonexistent node. No failover
> occurred. The nonexistent node was marked OFFLINE UNCLEAN and its resources
> were marked "Started UNCLEAN" on the nonexistent node.
> UNCLEAN seems to flag a problem with the stonith configuration. So my
> question is: how can I avoid such behaviour?
> Thank you!

Please share your config along with the logs from the nodes that were



Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

Users mailing list: Users@clusterlabs.org

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
