On 05/16/2016 12:22 PM, Dimitri Maziuk wrote:
> On 05/13/2016 04:31 PM, Ken Gaillot wrote:
>
>> That is definitely not a properly functioning cluster. Something
>> is going wrong at some level.
>
> Yeah, well... how do I find out what/where?
What happens after "pcs resource cleanup"? "pcs status" reports the time associated with each failure, so you can check whether you are seeing the same failure or a new one.

The system log is usually the best starting point, as it will have messages from pacemaker, corosync and the resource agents. Check the entries around the time of each failure for details or anything unusual.

Pacemaker also has a detail log (by default, /var/log/pacemaker.log). In general, this is more useful to developers than administrators, but if the system log doesn't help, it can sometimes shed a little more light.

> One question: in corosync.conf I have
>
> nodelist {
>   node {
>     ring0_addr: node1_name
>     nodeid: 1
>   }
>   node {
>     ring0_addr: node2_name
>     nodeid: 2
>   }
> }
>
> Could 'pcs cluster stop/start' reset the interface that resolves
> to nodeX_name? If so, that would answer why ssh connections get
> killed.

No, Pacemaker and pcs don't touch the interfaces (unless of course you explicitly add a cluster resource to do so, which wouldn't work anyway for the interface(s) that corosync itself needs to use).
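If it helps with the log digging, here is a rough Python sketch of the "look around the time of the failure" step: it keeps only the cluster-related syslog lines within a few minutes of a failure time taken from "pcs status". Everything in it is an assumption for illustration rather than part of Pacemaker or pcs: the function name lines_near_failure, the keyword list, the five-minute window, and the traditional "Mon DD HH:MM:SS" syslog timestamp (which carries no year, so the year is passed in, and which assumes English month names).

```python
from datetime import datetime, timedelta

# Assumed substrings to match; add the names of your own resource agents.
KEYWORDS = ("pacemaker", "corosync", "crmd", "lrmd", "pengine")


def lines_near_failure(lines, failure_time, window_minutes=5, year=2016):
    """Return cluster-related syslog lines close to failure_time.

    Assumes each line starts with a traditional syslog timestamp such as
    "May 16 12:22:01". Lines that do not parse are skipped.
    """
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            # The first 15 characters hold the timestamp; prepend the year.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # no leading syslog timestamp on this line
        in_window = abs(stamp - failure_time) <= window
        is_cluster = any(word in line.lower() for word in KEYWORDS)
        if in_window and is_cluster:
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open the system log (its path varies by distribution), pass the file object as lines, and pass the failure time from "pcs status" as a datetime. If nothing useful turns up, the same filter can be pointed at the detail log, provided its timestamps are in the same format.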