On 12/05/2011 12:21 PM, Muhammad Sharfuddin wrote:
> 
> On Sun, 2011-12-04 at 23:47 +0100, Andreas Kurz wrote:
>> Hello,
>>
>> On 12/04/2011 09:29 PM, Muhammad Sharfuddin wrote:
>>> This cluster reboots (fences) both nodes if I disconnect the network of
>>> either node (simulating a network failure).
>>
>> To a cluster node, a complete loss of network connectivity is
>> indistinguishable from a dead peer.
>>
>>>
>>> I want the resources running on a node that disconnects from the network
>>> to be moved/migrated to the other (network-connected) node.
>>
>> Use the ping RA for connectivity checks, and use location constraints to
>> move resources according to network connectivity (to external ping targets).
>>
> So, with a ping RA and an appropriate location rule, can I at least be
> sure that if one node loses network connectivity (i.e. both nodes can't
> see each other, while only one node is disconnected from the network),
> the other, healthy (network-connected) node won't reboot ... is that what
> you said?

No ... if one node loses the service network, resources can move to the
other node if that node has better connectivity. For this to work, the
nodes still need an additional communication path.
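A toy model of that placement decision, as a sketch under stated assumptions and not Pacemaker's placement code: it assumes the ping RA publishes its default attribute `pingd` as multiplier times the number of reachable ping targets, a hypothetical `multiplier` of 1000, and location rules of the common form "-INFINITY if `pingd` is undefined or <= 0, otherwise prefer the node with the highest `pingd`". It only models where resources go; it says nothing about fencing, which is why the extra communication path is still needed.

```python
from typing import Dict, Optional

NEG_INF = float("-inf")


def ping_attribute(reachable_targets: int, multiplier: int = 1000) -> int:
    """Value the ping RA would publish: multiplier * number of reachable ping targets."""
    return multiplier * reachable_targets


def location_score(pingd: Optional[int]) -> float:
    """Assumed rules: -INFINITY if pingd is undefined or <= 0, else score = pingd."""
    if pingd is None or pingd <= 0:
        return NEG_INF
    return float(pingd)


def choose_node(pingd_by_node: Dict[str, Optional[int]]) -> Optional[str]:
    """Return the node with the best connectivity score, or None if no node may run the resource."""
    scores = {node: location_score(value) for node, value in pingd_by_node.items()}
    if not scores:
        return None
    best = max(scores, key=scores.__getitem__)
    return None if scores[best] == NEG_INF else best


if __name__ == "__main__":
    # node1 lost its service network (0 targets reachable), node2 still reaches 2 targets.
    print(choose_node({"node1": ping_attribute(0), "node2": ping_attribute(2)}))  # node2
```

The node names are hypothetical; with real clusters the attribute is maintained per node by the ping resource (usually cloned) and the rules live in a location constraint.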

>  
>>>
>>> How can I prevent this cluster from rebooting (fencing) the healthy node
>>> (i.e. the node whose network is up/available/connected)?
>>
>> Multiple-failure scenarios are challenging and possible solutions for a
>> cluster are limited. With enough effort by an administrator every
>> cluster can be "tested to death".
>>
>> You can only minimize the possibility of a split-brain:
>>  
>> * use redundant cluster communication paths (limited to two with corosync)
> In my test I disconnected every communication path of one node (and both
> nodes rebooted).

Did you clone the sbd resource? If so, don't. Configure it as a
primitive instead: in case of a split brain, at least one node then has
to start the stonith resource first, which should give the other node an
advantage ... adding a start-delay should increase that advantage further.
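The argument can be illustrated with a toy race model. This is only a sketch of the reasoning, not how Pacemaker or sbd actually schedule fencing: it assumes a node can fire the stonith device immediately if the stonith resource already runs there, and otherwise only after a hypothetical start time plus the start-delay; if both nodes fire at the same moment, both get fenced.

```python
from typing import Iterable, Set, Tuple


def fenced_nodes(
    nodes: Tuple[str, str],
    stonith_running_on: Iterable[str],
    start_delay: float = 0.0,
    start_time: float = 1.0,
) -> Set[str]:
    """Toy model of a two-node split brain: return the set of nodes that end up fenced."""
    running = set(stonith_running_on)
    # Time at which each node is able to shoot its peer (assumed model).
    fire_at = {
        node: 0.0 if node in running else start_time + start_delay
        for node in nodes
    }
    a, b = nodes
    if fire_at[a] == fire_at[b]:
        # Both shoot at the same time: both nodes go down.
        return {a, b}
    # The faster node survives and fences the slower one.
    return {b} if fire_at[a] < fire_at[b] else {a}


if __name__ == "__main__":
    # Cloned stonith resource: running on both nodes, so both shoot at once.
    print(sorted(fenced_nodes(("node1", "node2"), ["node1", "node2"])))  # ['node1', 'node2']
    # Primitive running on node1 only; node2 must start it first, after a start-delay.
    print(sorted(fenced_nodes(("node1", "node2"), ["node1"], start_delay=15.0)))  # ['node2']
```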

> 
>> * at least one communication path is direct connected
> So a directly connected communication path plus a ping RA with a location
> rule will prevent the reboot of the healthy (network-connected) node.

No, don't use the other node as a ping target ... that's the membership
layer's (CCM's) business ... directly connected networks are simply less
error-prone than switched networks ... except for administrative
interventions ;-)

> 
>> * use a quorum node
>>
> I.e., I should add another node (a quorum node) to this two-node cluster.

Yes ... you can add a node in permanent standby mode; starting corosync
without Pacemaker on it should also work fine.
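Why a third node helps follows from majority quorum. A minimal sketch, assuming one vote per node and the usual strict-majority rule (more than half of all votes); the node names are hypothetical. In a two-node cluster a single surviving node can never hold a majority, which is why such clusters are typically run with `no-quorum-policy=ignore`, and that in turn lets both halves of a split brain try to fence each other.

```python
from typing import Collection


def has_quorum(partition: Collection[str], all_nodes: Collection[str]) -> bool:
    """Strict majority: the partition needs more than half of all votes (one vote per node)."""
    return len(partition) > len(all_nodes) // 2


if __name__ == "__main__":
    two_nodes = {"node1", "node2"}
    # Two-node cluster split in half: neither side has quorum.
    print(has_quorum({"node1"}, two_nodes), has_quorum({"node2"}, two_nodes))  # False False

    three_nodes = {"node1", "node2", "quorum"}
    # node2 is isolated; node1 still sees the quorum node.
    print(has_quorum({"node1", "quorum"}, three_nodes))  # True
    print(has_quorum({"node2"}, three_nodes))  # False
```

With the third node, only the partition that still sees the quorum node keeps a majority, so the isolated node is the one that loses.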

> 
>> If you are using a network-connected fencing device, also use that network
>> for cluster communication.
>>
>> To prevent stonith death matches, use power-off as the stonith action
>> and/or don't start cluster services on system startup.
>>
> The cluster does not start at system startup.

fine

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now


> 
>> Regards,
>> Andreas
>>
>> _______________________________________________
>> Linux-HA mailing list
>> Linux-HA@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
> --
> Regards,
> 
> Muhammad Sharfuddin


