[ClusterLabs] Re: SBD msgwait & partner reboot time
>>> Jorge Fábregas wrote on 08.09.2015 at 17:45 in message <55ef029c.3000...@gmail.com>:
> Hi,
>
> I've read about how important the relationship is between the different
> parameters of the SBD device (msgwait & watchdog timeout) and Pacemaker's
> stonith timeout. However, I've just encountered something that I never
> considered: the time elapsed until a node is fully up (after being
> fenced) versus msgwait.
>
> Two nodes: sles11a & sles11b. I fenced sles11a (via Hawk's interface,
> which triggers the sbd resource agent) and watched carefully
> /var/log/messages on sles11b:
>
> Sept 8 11:27:00 sles11b sbd: Writing reset to node slot sles11a
> Sept 8 11:27:00 sles11b sbd: Messaging delay: 40
>
> [sles11a is rebooting and it comes up in about 12 seconds]

Lucky you (for the fast reboot time), but you have a problem:

1) The msgwait has to be long enough to make (as close as possible to) 100% sure that the node is down when the time has expired. Then the cluster will perform recovery operations for the down node. If the node comes up earlier and has joined the cluster, things may be in some disorder.

2) The msgwait has to be long enough to make sure the SBD commands are delivered even if the disk needs some retries, or your storage system is slow while being online (this could mean you do an "online" firmware upgrade where the system won't respond for a few seconds).

My guess would be to increase the node boot time and to decrease the msgwait to something like 30 seconds. Usually you have SCSI timeouts of around one minute. Also remember that parts of the OS will retry I/O for some time before flagging an error to the application.

> [see a bunch of messages joining the cluster]
>
> [finally node sles11a is online at about 11:27:25]
>
> Sept 8 11:27:40 sles11b sbd: Message successfully delivered
>
> [sles11a is put offline!]
> Sept 8 11:27:41 pengine[4358]: warning: custom_action: Action
> p_stonith-sdb_monitor_0 on sles11a is unrunnable (pending)

This is when the node is up and online, but fencing still isn't confirmed?

> I've done it about 5 times and it happens every time.
>
> My values are: 20 (watchdog timeout) & 40 (msgwait). I know, I
> know... it's too much for my lab environment, but I'm just curious whether
> there's something wrong or if indeed msgwait NEEDS to ALWAYS be less
> than the reboot time.

If you want an exciting configuration, you could try to get the watchdog timeout down to 5 seconds or so, and shorten the msgwait (and possibly other depending parameters). But make sure support accepts such short values.

BTW: We have a msgwait close to 3 minutes, allowing the storage to be unresponsive for up to 60 seconds. The difference is a safety margin for possible retries... Our physical hosts hardly boot in less than 4 minutes.

Regards,
Ulrich

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
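[Editorial aside: the timing relationship discussed in this thread can be sketched numerically. The snippet below only illustrates the commonly cited rule of thumb — msgwait about twice the watchdog timeout, and Pacemaker's stonith-timeout comfortably above msgwait — and the function name and default margin are invented for the example, not taken from any vendor documentation.]

```python
def recommended_timeouts(watchdog_timeout_s, safety_margin_s=20):
    """Rule-of-thumb SBD timing sketch (illustrative, not authoritative).

    msgwait is commonly set to roughly twice the watchdog timeout, so a
    node that misses its disk poll is reliably reset before the message
    is considered delivered; stonith-timeout must exceed msgwait so
    Pacemaker doesn't give up on the fence operation too early.
    """
    msgwait = 2 * watchdog_timeout_s
    stonith_timeout = msgwait + safety_margin_s
    return msgwait, stonith_timeout

# The original poster's lab values: watchdog timeout 20 s -> msgwait 40 s,
# which is why a node rebooting in ~12 s rejoins long before msgwait expires.
print(recommended_timeouts(20))  # (40, 60)
```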
Re: [ClusterLabs] Coming in 1.1.14: Fencing topology based on node attribute
Hi,

Ken Gaillot writes:

> Pacemaker's upstream master branch has a new feature that will be part
> of the eventual 1.1.14 release.
>
> Fencing topology is used when a node requires multiple fencing devices
> (in combination or as fallbacks). Currently, topologies must be
> specified by node name (or a regular expression matching node names).
>
> The new feature allows topologies to be specified by node attribute.

Sounds like a really useful feature. :) I have implemented initial
support for this syntax in crmsh, so this will work fine in the next
version of crmsh. Examples of crmsh syntax below:

> Previously, if node1 was in rack #1, you'd have to register a fencing
> topology by its name, which at the XML level would look like:
>
>   <fencing-level index="1" target="node1" devices="apc01,apc02"/>

crm cfg fencing-topology node1: apc01,apc02

> With the new feature, you could instead register a topology for all
> hosts that have a node attribute "rack" whose value is "1":
>
>   <fencing-level index="1" target="rack=1" devices="apc01,apc02"/>

crm cfg fencing-topology rack=1: apc01,apc02

> You would assign that attribute to all nodes in that rack, e.g.:
>
>   crm_attribute --type nodes --node node1 --name rack --update 1

crm node attr node1 set rack 1

> The syntax accepts either '=' or ':' as the separator for the name/value
> pair, so target="rack:1" would work in the XML as well.
crm cfg fencing-topology rack:1: apc01,apc02

(admittedly perhaps not as clean as using '=', but it works)

Cheers,
Kristoffer

> --
> Ken Gaillot

--
// Kristoffer Grönlund
// kgronl...@suse.com
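[Editorial aside: the target-matching behavior described in this thread — a plain target matches a node name, while a "name=value" or "name:value" target matches a node attribute — can be modeled in a few lines. This is an illustrative sketch, not Pacemaker's actual implementation; the function name is made up for the example.]

```python
def target_matches(target, node_name, node_attrs):
    """Conceptual model of attribute-based fencing-topology targets.

    "rack=1" or "rack:1" matches any node whose attribute "rack" is "1";
    a target with no separator matches by node name.
    """
    for sep in ("=", ":"):  # the feature accepts either separator
        if sep in target:
            name, value = target.split(sep, 1)
            return node_attrs.get(name) == value
    return target == node_name

print(target_matches("rack=1", "node1", {"rack": "1"}))  # True
print(target_matches("node1", "node1", {}))              # True
print(target_matches("rack:2", "node1", {"rack": "1"}))  # False
```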
Re: [ClusterLabs] Coming in 1.1.14: Fencing topology based on node attribute
> On 9 Sep 2015, at 7:45 pm, Kristoffer Grönlund wrote:
>
> Hi,
>
> Ken Gaillot writes:
>
>> Pacemaker's upstream master branch has a new feature that will be part
>> of the eventual 1.1.14 release.
>>
>> Fencing topology is used when a node requires multiple fencing devices
>> (in combination or as fallbacks). Currently, topologies must be
>> specified by node name (or a regular expression matching node names).
>>
>> The new feature allows topologies to be specified by node attribute.
>
> Sounds like a really useful feature. :) I have implemented initial
> support for this syntax in crmsh,

Word of warning: I'm in the process of changing it to avoid overloading
the 'target' attribute and exposing quoting issues stemming from
people's use of '='

https://github.com/beekhof/pacemaker/commit/ea4fc1c

> so this will work fine in the next
> version of crmsh.
>
> Examples of crmsh syntax below:
>
>> Previously, if node1 was in rack #1, you'd have to register a fencing
>> topology by its name, which at the XML level would look like:
>>
>>   <fencing-level index="1" target="node1" devices="apc01,apc02"/>
>
> crm cfg fencing-topology node1: apc01,apc02
>
>> With the new feature, you could instead register a topology for all
>> hosts that have a node attribute "rack" whose value is "1":
>>
>>   <fencing-level index="1" target="rack=1" devices="apc01,apc02"/>
>
> crm cfg fencing-topology rack=1: apc01,apc02
>
>> You would assign that attribute to all nodes in that rack, e.g.:
>>
>>   crm_attribute --type nodes --node node1 --name rack --update 1
>
> crm node attr node1 set rack 1
>
>> The syntax accepts either '=' or ':' as the separator for the name/value
>> pair, so target="rack:1" would work in the XML as well.
> crm cfg fencing-topology rack:1: apc01,apc02
>
> (admittedly perhaps not as clean as using '=', but it works)
>
> Cheers,
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com
Re: [ClusterLabs] [ClusterLabs Developers] Problem with fence_virsh in RHEL 6 - selinux denial
I've created an rhbz: https://bugzilla.redhat.com/show_bug.cgi?id=1261711

digimer

On 08/09/15 11:04 PM, Digimer wrote:
> Here is my cluster.conf, in case it matters:
>
> [root@node1 ~]# cat /etc/cluster/cluster.conf
>
> [cluster.conf XML garbled in the archive; the recoverable fragments are:]
> fence methods:    port="an-a02n01" delay="15" action="reboot"
>                   port="an-a02n02" action="reboot"
> fence device:     ipaddr="192.168.122.1" login="root" passwd="it's a secret"
> resources:        script name="wait-for-drbd"
>                   fs force_unmount="1" fstype="gfs2" mountpoint="/shared" name="sharedfs"
> failover domains: restricted="1", ordered="0" and ordered="1", node priorities 1/2 and 2/1
> service:          exclusive="0" recovery="restart"
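[Editorial aside: the garbled block above is a standard RHEL 6 cluster.conf. A minimal sketch of its fence_virsh portion, plugging the surviving attribute values into the usual schema, might look like the following — the node names, nodeids, and device name are assumptions, not recovered from the original.]

```xml
<!-- Sketch only: names and ids are hypothetical. -->
<clusternodes>
  <clusternode name="an-a02n01" nodeid="1">
    <fence>
      <method name="virsh">
        <device name="virsh-host" port="an-a02n01" delay="15" action="reboot"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="an-a02n02" nodeid="2">
    <fence>
      <method name="virsh">
        <device name="virsh-host" port="an-a02n02" action="reboot"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- fence_virsh connects to the KVM host over SSH and calls virsh;
       port="..." is the libvirt domain (VM) name to fence. -->
  <fencedevice name="virsh-host" agent="fence_virsh"
               ipaddr="192.168.122.1" login="root" passwd="..."/>
</fencedevices>
```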