On 12/05/2011 12:21 PM, Muhammad Sharfuddin wrote:
>
> On Sun, 2011-12-04 at 23:47 +0100, Andreas Kurz wrote:
>> Hello,
>>
>> On 12/04/2011 09:29 PM, Muhammad Sharfuddin wrote:
>>> This cluster reboots(fenced) both nodes, if I disconnects network of any
>>> nodes(simulating network failure).
>>
>> Completely loss of network is indistinguishable for a cluster node to a
>> dead peer.
>>
>>>
>>> I want that if any node disconnects from network, resources running on
>>> that node should be moved/migrate
>>> to the other node(network connected node)
>>
>> Use ping RA for connectivity checks and use location constraints to move
>> resources according to network connectivity (to external ping targets)
>>
> so in case of having a ping RA with appropriate location rule, does at
> least make sure that if any one node lose the network connectivity(i.e
> both nodes cant see each other, while only one node is disconnected from
> network), the other healthy node(network connected node) wont reboot ...
> is it what you said ?

No ... in case of service network loss of one node, resources can move
to the other node if it has better connectivity. For this to work, the
nodes still need an extra communication path.

>
>>>
>>> How can I prevent this cluster to reboot(fence) the healthy node(i.e the
>>> node whose network is up/available/connected).
>>
>> Multiple-failure scenarios are challenging and possible solutions for a
>> cluster are limited. With enough effort by an administrator every
>> cluster can be "tested to death".
>>
>> You can only minimize the possibility of a split-brain:
>>
>> * use redundant cluster communication paths (limited to two with corosync)
> in my test I disconnected every communication path of one node(and both
> rebooted)

Did you clone the sbd resource? If yes, don't do that. Start it as a
primitive, so in case of a split brain at least one node needs to start
the stonith resource, which should give the other node an advantage ...
adding a start-delay should further increase that advantage.

>
>> * at least one communication path is direct connected
> directly connected communication path and ping RA with location rule..
> will prevent the reboot of healthy node(network connected node)

No, don't use the other node as ping target ... that's ccm business ...
directly connected networks are simply less error-prone than switched
networks ... except for administrative interventions ;-)

>
>> * use a quorum node
>>
> i.e I should add another node(quorum node) in this two node cluster.

Yes ... you can add a node in permanent standby mode, or starting
corosync without pacemaker on it should also work fine.

>
>> If you are using a network connected fencing device use this network
>> also for cluster communication.
>>
>> To prevent stonith death matches use power-off as stonith action or/and
>> don't start cluster services on system startup.
>>
> cluster does not start at system startup

fine

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

>
>> Regards,
>> Andreas
>>
> --
> Regards,
>
> Muhammad Sharfuddin
>
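
As a side note on the quorum-node suggestion above: the benefit comes down to simple majority arithmetic. The sketch below is only an illustration of that arithmetic, assuming one vote per node and a strict-majority rule; it is not Pacemaker or corosync code, and the function names `has_quorum` and `quorate_partitions` are made up for this example.

```python
from typing import List


def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """Strict majority: a partition is quorate only if it holds
    more than half of all votes in the cluster (assumed rule)."""
    return 2 * partition_votes > total_votes


def quorate_partitions(partitions: List[int]) -> List[bool]:
    """For a split described as a list of partition sizes (one vote
    per node, an assumption), report which partitions keep quorum."""
    total = sum(partitions)
    return [has_quorum(p, total) for p in partitions]


if __name__ == "__main__":
    # Two-node cluster, the nodes cannot see each other: 1 vs 1.
    print("2 nodes, split 1/1:", quorate_partitions([1, 1]))
    # Same failure with a third (quorum) node that still sees one peer: 2 vs 1.
    print("3 nodes, split 2/1:", quorate_partitions([2, 1]))
```

With two nodes, neither side of a 1/1 split holds a majority, so quorum cannot tell the healthy node from the isolated one. With a third vote, the side that still sees the quorum node keeps the majority and the isolated node does not.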
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems