On Wed, 2018-05-02 at 02:52 +0000, 范国腾 wrote:
> Hi,
> The cluster has three nodes: one is the master and two are slaves. We
> run “pcs cluster stop --all” to stop all of the nodes, then we run
> “pcs cluster start” on the master node. We find it is not able to
> start: the stonith resource could not be started, so none of the
> other resources could be started.
This is how quorum works. Only a cluster partition with quorum (at
least 2 nodes in your case) can run resources or fence other nodes.
That way, if there is a split while all nodes are live, the part of the
split with the most nodes wins.

> We tested this case on two cluster systems and the result is the
> same:
> - If we start all three nodes, the stonith resource can be started.
>   If we stop one node after it starts, the stonith resource is
>   migrated to another node and the cluster still works.
> - If we start only one or only two nodes, the stonith resource
>   cannot be started.

If you start two nodes, they should fence the third, then proceed to
run resources. However:

> (1) We create the stonith resource using this method in one system:
> pcs stonith create ipmi_node1 fence_ipmilan ipaddr="192.168.100.202"
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node1"
> pcs stonith create ipmi_node2 fence_ipmilan ipaddr="192.168.100.203"
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node2"
> pcs stonith create ipmi_node3 fence_ipmilan ipaddr="192.168.100.204"
> login="ADMIN" passwd="ADMIN" pcmk_host_list="node3"

IPMI fencing requires that the IPMI device be responding to requests.
If the third node does not have power, its IPMI will not respond, so
the fencing will fail, and the cluster will be unable to proceed.
Perhaps that is what happened when you tried the two-node test? The
customary way around this is to use either sbd or power fencing as a
fallback when IPMI fails.

> (2) We create the stonith resource using this method in another
> system:
> pcs stonith create scsi-stonith-device fence_scsi
> devices=/dev/mapper/fence pcmk_monitor_action=metadata
> pcmk_reboot_action=off pcmk_host_list="node1 node2 node3 node4" meta
> provides=unfencing

It is better to set the stonith-action cluster property to off than to
set pcmk_reboot_action. The reason is that the cluster may remap some
reboots to "off then on", in which case pcmk_reboot_action would not
get used.
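For illustration, the property-based approach, combined with a fallback
fencing level, might look like the following sketch. This reuses the
ipmi_node1 device name from this thread, but the fallback device name
is hypothetical, and none of this has been verified on a live cluster:

```shell
# Cluster-wide: have fence "reboot" requests carried out as "off",
# instead of setting pcmk_reboot_action on each stonith device
pcs property set stonith-action=off

# Fallback fencing for node1: try IPMI first (level 1); if that
# fails, try a second device (level 2). "fallback_fence_device" is
# a hypothetical stonith resource, e.g. an sbd- or power-based one.
pcs stonith level add 1 node1 ipmi_node1
pcs stonith level add 2 node1 fallback_fence_device
```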
pcmk_reboot_action is intended for when the fence agent has a reboot
command by some other name. One caveat: stonith-action applies only to
fencing initiated by the cluster. If some external software (e.g.
stonith_admin) explicitly initiates a reboot, it will still be a
reboot.

> The log is in the attachment.
> What prevents the stonith resource from being started if we only
> start some of the nodes?
-- 
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org