On 02/27/2011 07:58 PM, David Morton wrote:
> I'm pretty sure the behavior outlined below is by design (and it does
> make sense logically), but I am wondering if there are additional checks
> that can be put in place to change the behavior.
>
> Situation:
> - Two node cluster with IPMI STONITH configured
> - Both servers running but with openais / pacemaker shut down
> - Start openais on one server only
> - The server that starts executes a STONITH reset of the other node
>
> I imagine this is due to an indeterminate state / no comms between
> nodes; the only way to move to a known state is then to bounce the other
> node. Is this correct?
>
> Is there any way to configure alternate means of confirming the openais
> / pacemaker service is not started and avoid a hard reset on the 'other'
> node? e.g. log in via ssh and enquire on service state, maybe even check
> key resources, etc.?
>
> Is the preferred method to always run openais / pacemaker on all nodes
> and manipulate rules to determine where resources run? Typically I
> would just shut down openais to force all resources to one node or the
> other, to simplify config creation and testing.
I can't address openais or pacemaker directly, but in corosync/rhcs (similar foundation) there is an option called 'post_join_delay'. When set to -1, the node will never fire a fence (STONITH) on startup; instead it will wait forever for the other node to join. Perhaps there is a similar option in openais/pacemaker?

-- 
Digimer
E-Mail: [email protected]
AN!Whitepapers: http://alteeve.com
Node Assassin: http://nodeassassin.org

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
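[Editor's note: for readers looking for the concrete knobs, the following is a sketch only. It assumes the rhcs/cman `post_join_delay` option mentioned above and Pacemaker's `startup-fencing` cluster property, which controls whether nodes that have not been seen at startup are fenced; check your versions' documentation before relying on either, since disabling startup fencing risks split-brain.]

```shell
# In rhcs/cman, post_join_delay lives in /etc/cluster/cluster.conf, e.g.:
#   <fence_daemon post_join_delay="-1"/>
# (-1 = wait indefinitely for peers to join before fencing)

# In Pacemaker, the closest analogue is the startup-fencing cluster
# property. Disabling it tells the cluster NOT to fence nodes it has
# never seen -- convenient for testing, dangerous in production:
crm_attribute --type crm_config --name startup-fencing --update false

# Verify the current value:
crm_attribute --type crm_config --name startup-fencing --query
```

A safer alternative for the testing workflow described above is to put a node into standby (`crm node standby <nodename>`) rather than stopping openais entirely, so membership is never lost and no fence is triggered.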
