[ClusterLabs] Re: SBD msgwait & partner reboot time

2015-09-09 Thread Ulrich Windl
>>> Jorge Fábregas wrote on 08.09.2015 at 17:45 in message
<55ef029c.3000...@gmail.com>:
> Hi,
> 
> I've read about how important the relationship is between the different
> parameters of the SBD device (msgwait & watchdog timeout) & Pacemaker's
> stonith timeout.  However I've just encountered something that I never
> considered:  the time elapsed until a node is fully up (after being
> fenced) against msgwait.
> 
> Two nodes: sles11a & sles11b.  I fenced sles11a (via Hawk's interface
> that triggers the sbd resource agent) and watched carefully
> /var/log/messages on sles11b:
> 
> 
> Sept 8 11:27:00 sles11b  sbd: Writing reset to node slot sles11a
> Sept 8 11:27:00 sles11b  sbd: Messaging delay: 40
> 
> [sles11a is rebooting and it comes up in about 12 seconds]

Lucky you (for the fast reboot time), but you have a problem:
1) msgwait has to be long enough to make it (as close as possible to) 100%
certain that the node is down when the time has expired; only then will the
cluster perform recovery operations for the down node. If the node comes back
up earlier and rejoins the cluster, things may end up in some disorder.
2) msgwait has to be long enough to make sure the SBD commands are
delivered even if the disk needs some retries, or your storage system is
slow while still being online (think of an "online" firmware upgrade during
which the system won't respond for a few seconds).

My guess would be to increase the node boot time and to decrease msgwait
to something like 30 seconds.
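
For what it's worth, both values live in the SBD device header and can be
inspected and re-initialized with the sbd tool itself (the common guidance
being a msgwait of at least twice the watchdog timeout). A sketch, with
/dev/sdx standing in for your real SBD device; note that "create" rewrites
the header, so stop the cluster on all nodes first:

   # show the timeouts currently stored in the device header
   sbd -d /dev/sdx dump

   # re-initialize with a 10s watchdog timeout and a 30s msgwait
   sbd -d /dev/sdx -1 10 -4 30 create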

Usually you have SCSI timeouts around one minute. Also remember that parts of
the OS will retry I/O for some time before flagging an error to the
application.
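
(On Linux you can check the per-device SCSI command timeout directly; sdX
below is a placeholder for the disk in question:

   cat /sys/block/sdX/device/timeout

It prints the timeout in seconds, commonly 30 or 60.)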

> 
> [see a bunch of messages joining the cluster]
> 
> [finally node sles11a is online at about 11:27:25]
> 
> Sept 8 11:27:40 sles11b sbd: Message successfully delivered
> 
> [sles11a is put offline!]
> 
> Sept 8 11:27:41 pengine[4358]: warning: custom_action: Action
> p_stonith-sdb_monitor_0 on sles11a is unrunnable (pending)

This is when the node is up and online, but fencing still isn't confirmed?

> 
> I've done it about 5 times and it happens every time.
> 
> My values are: 20 (watchdog timeout) & 40 (msgwait).  I know, I
> know... it's too much for my lab environment, but I'm just curious whether
> there's something wrong or if indeed msgwait NEEDS to ALWAYS be less
> than the reboot time.

If you want an exciting configuration, you could try to get the watchdog
timeout down to 5 seconds or so, and shorten msgwait (and possibly other
dependent parameters) accordingly. But make sure support accepts such short
values.
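
And whatever msgwait you end up with, keep Pacemaker's stonith timeout
larger than it; e.g. in crmsh (value illustrative, pick msgwait plus a
margin):

   crm configure property stonith-timeout=60s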

BTW: we have a msgwait close to 3 minutes, allowing the storage to be
unresponsive for up to 60 seconds; the difference is a safety margin for
possible retries. Our physical hosts hardly ever boot in less than 4 minutes.

Regards,
Ulrich




Re: [ClusterLabs] Coming in 1.1.14: Fencing topology based on node attribute

2015-09-09 Thread Kristoffer Grönlund
Hi,

Ken Gaillot  writes:

> Pacemaker's upstream master branch has a new feature that will be part
> of the eventual 1.1.14 release.
>
> Fencing topology is used when a node requires multiple fencing devices
> (in combination or as fallbacks). Currently, topologies must be
> specified by node name (or a regular expression matching node names).
>
> The new feature allows topologies to be specified by node attribute.

Sounds like a really useful feature. :) I have implemented initial
support for this syntax in crmsh, so this will work fine in the next
version of crmsh.

Examples of crmsh syntax below:

> Previously, if node1 was in rack #1, you'd have to register a fencing
> topology by its name, which at the XML level would look like:
>
>    <fencing-topology>
>      <fencing-level index="1" target="node1" devices="apc01,apc02"/>
>    </fencing-topology>
>

crm cfg fencing-topology node1: apc01,apc02

>
> With the new feature, you could instead register a topology for all
> hosts that have a node attribute "rack" whose value is "1":
>
>    <fencing-topology>
>      <fencing-level index="1" target="rack=1" devices="apc01,apc02"/>
>    </fencing-topology>
>

crm cfg fencing-topology rack=1: apc01,apc02

>
> You would assign that attribute to all nodes in that rack, e.g.:
>
>crm_attribute --type nodes --node node1 --name rack --update 1
>

crm node attr node1 set rack 1
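
To tag every host in the rack the same call is just repeated per node, and
--query reads the value back; a sketch with illustrative node names:

   for n in node1 node2; do
       crm_attribute --type nodes --node "$n" --name rack --update 1
   done

   crm_attribute --type nodes --node node1 --name rack --query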

>
> The syntax accepts either '=' or ':' as the separator for the name/value
> pair, so target="rack:1" would work in the XML as well.

crm cfg fencing-topology rack:1: apc01,apc02

(admittedly perhaps not as clean as using '=', but it works)

Cheers,
Kristoffer

> -- 
> Ken Gaillot 
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Coming in 1.1.14: Fencing topology based on node attribute

2015-09-09 Thread Andrew Beekhof

> On 9 Sep 2015, at 7:45 pm, Kristoffer Grönlund  wrote:
> 
> Hi,
> 
> Ken Gaillot  writes:
> 
>> Pacemaker's upstream master branch has a new feature that will be part
>> of the eventual 1.1.14 release.
>> 
>> Fencing topology is used when a node requires multiple fencing devices
>> (in combination or as fallbacks). Currently, topologies must be
>> specified by node name (or a regular expression matching node names).
>> 
>> The new feature allows topologies to be specified by node attribute.
> 
> Sounds like a really useful feature. :) I have implemented initial
> support for this syntax in crmsh,

word of warning, i'm in the process of changing it to avoid overloading the
'target' attribute and exposing quoting issues stemming from people's use of '='

   https://github.com/beekhof/pacemaker/commit/ea4fc1c
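
(Roughly, a sketch of the post-change XML based on that commit: the pair is
split into dedicated attributes instead of being packed into 'target'. The
attribute names here are taken from the eventual 1.1.14 schema and should be
verified against the commit:

   <fencing-topology>
     <fencing-level index="1" target-attribute="rack" target-value="1"
                    devices="apc01,apc02"/>
   </fencing-topology>
)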



> so this will work fine in the next
> version of crmsh.
> 
> Examples of crmsh syntax below:
> 
>> Previously, if node1 was in rack #1, you'd have to register a fencing
>> topology by its name, which at the XML level would look like:
>> 
>>    <fencing-topology>
>>      <fencing-level index="1" target="node1" devices="apc01,apc02"/>
>>    </fencing-topology>
>> 
> 
> crm cfg fencing-topology node1: apc01,apc02
> 
>> 
>> With the new feature, you could instead register a topology for all
>> hosts that have a node attribute "rack" whose value is "1":
>> 
>>    <fencing-topology>
>>      <fencing-level index="1" target="rack=1" devices="apc01,apc02"/>
>>    </fencing-topology>
>> 
> 
> crm cfg fencing-topology rack=1: apc01,apc02
> 
>> 
>> You would assign that attribute to all nodes in that rack, e.g.:
>> 
>>   crm_attribute --type nodes --node node1 --name rack --update 1
>> 
> 
> crm node attr node1 set rack 1
> 
>> 
>> The syntax accepts either '=' or ':' as the separator for the name/value
>> pair, so target="rack:1" would work in the XML as well.
> 
> crm cfg fencing-topology rack:1: apc01,apc02
> 
> (admittedly perhaps not as clean as using '=', but it works)
> 
> Cheers,
> Kristoffer
> 
>> -- 
>> Ken Gaillot 
>> 
> 
> -- 
> // Kristoffer Grönlund
> // kgronl...@suse.com
> 




Re: [ClusterLabs] [ClusterLabs Developers] Problem with fence_virsh in RHEL 6 - selinux denial

2015-09-09 Thread Digimer
I've created an rhbz:

https://bugzilla.redhat.com/show_bug.cgi?id=1261711
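
To rule out the agent itself, it can be exercised directly from a root shell
(i.e. outside fenced's confined selinux context); a sketch using the values
from the cluster.conf quoted below:

   fence_virsh -a 192.168.122.1 -l root -p "it's a secret" -n an-a02n01 -o status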

digimer

On 08/09/15 11:04 PM, Digimer wrote:
> Here is my cluster.conf, in case it matters:
> 
> 
> [root@node1 ~]# cat /etc/cluster/cluster.conf
> <?xml version="1.0"?>
> <cluster name="an-anvil-02" config_version="1">
>   <cman expected_votes="1" two_node="1" />
>   <clusternodes>
>     <clusternode name="an-a02n01" nodeid="1">
>       <fence>
>         <method name="virsh">
>           <device name="virsh" port="an-a02n01" delay="15" action="reboot" />
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="an-a02n02" nodeid="2">
>       <fence>
>         <method name="virsh">
>           <device name="virsh" port="an-a02n02" action="reboot" />
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <fencedevices>
>     <fencedevice agent="fence_virsh" name="virsh"
>       ipaddr="192.168.122.1" login="root" passwd="it's a secret" />
>   </fencedevices>
>   <rm>
>     <resources>
>       <script file="/etc/init.d/wait-for-drbd" name="wait-for-drbd"/>