date:20170724

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Prasad, Shashank

> I don't think that having a hook that bypass stonith is the right way….

 

The intention is NOT to bypass STONITH. STONITH shall always remain active, and 
an integral part of the cluster. The discussion is about bailing out of 
situations when the STONITH itself fails due to fencing agent failures, and how 
one can automate the process of bailing out.

 

All that the surviving nodes in the cluster need to be informed is that the 
failed node has indeed failed, and therefore the suggestion for a hook.

 

The hook (lets’ say: STONITH-Failure-Recovery-Hook) under discussion will only 
be fired when Fencing Agent fails. STONITH-Failure-Recovery-Hook is realized 
via a script. The "${CRM_alert_rsc}" , "${CRM_alert_task}",   
"${CRM_alert_desc}"  "${CRM_alert_node}" in the Pacemaker Alert can use used to 
match up with STONITH resource and its failures, and invoke the 
STONITH-Failure-Recovery-Hook as appropriate.

 

I also agree with Klaus that a quorum device is a good strategy.

That needs 3rd node in the cluster. If such an option can be exercised, it 
should be.

 

Thanx.

 

 

 

From: Tomer Azran [mailto:tomer.az...@edp.co.il] 
Sent: Tuesday, July 25, 2017 3:00 AM
To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source 
clustering welcomed; Prasad, Shashank
Subject: RE: [ClusterLabs] Two nodes cluster issue

 

I tend to agree with Klaus – I don't think that having a hook that bypass 
stonith is the right way. It is better to not use stonith at all.

I think I will try to use an iScsi target on my qdevice and set SBD to use it.

I still don't understand why qdevice can't take the place SBD with shared 
storage; correct me if I'm wrong, but it looks like both of them are there for 
the same reason.

 

From: Klaus Wenninger [mailto:kwenn...@redhat.com] 
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
; Prasad, Shashank 
Subject: Re: [ClusterLabs] Two nodes cluster issue

 

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:

Sometimes IPMI fence devices use shared power of the node, and it 
cannot be avoided.

In such scenarios the HA cluster is NOT able to handle the power 
failure of a node, since the power is shared with its own fence device.

The failure of IPMI based fencing can also exist due to other reasons 
also.

 

A failure to fence the failed node will cause cluster to be marked 
UNCLEAN.

To get over it, the following command needs to be invoked on the 
surviving node.

 

pcs stonith confirm  --force

 

This can be automated by hooking a recovery script, when the the 
Stonith resource ‘Timed Out’ event.

To be more specific, the Pacemaker Alerts can be used for watch for 
Stonith timeouts and failures.

In that script, all that’s essentially to be executed is the 
aforementioned command.


If I get you right here you can disable fencing then in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).



Since the alerts are issued from ‘hacluster’ login, sudo permissions 
for ‘hacluster’ needs to be configured.

 

Thanx.

 

 

From: Klaus Wenninger [mailto:kwenn...@redhat.com 
 ] 
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source 
clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

 

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:

I personally think that power off node by switched pdu is more 
safe, or not?


True if that is working in you environment. If you can't do a physical 
setup
where you aren't simultaneously loosing connection to both your node and
the switch-device (or you just want to cover cases where that happens)
you have to come up with something else.





S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz  

www.feldhost.cz   - FeldHost™ – profesionální 
hostingové a serverové služby za adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446 

 

On 24 Jul 2017, at 17:27, Klaus Wenninger

Re: [ClusterLabs] why resources are restarted when a node rejoins a cluster?

2017-07-24 Thread Digimer

On 2017-07-24 11:04 PM, ztj wrote:

Hi all,
I have 2 Centos nodes with
heartbeat and pacemaker-1.1.13 installed, and almost
everything is working fine, I have only apache configured
for testing, when a node goes down the failover is done
correctly, but there's a problem when a node failbacks.

For example, let's say that
Node1 has the lead on apache resource, then I reboot
Node1, so Pacemaker detect it goes down, then apache is
promoted to the Node2 and it keeps there running fine,
that's fine, but when Node1 recovers and joins the cluster
again, apache is restarted on Node2 again.

Anyone knows, why resources
are restarted when a node rejoins a cluster? thanks

You sent this to the moderators, not the list.

Please don't use heartbeat, it is extremely deprecated. Please
switch to corosync.

To offer any other advice, you need to share your config and the
logs from both nodes. Please respond to the list, not
developers-ow...@clusterlabs.org.

digimer
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

43 matches

Mail list logo