Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Andrei Borzenkov
On 25.05.2018 14:44, Klaus Wenninger wrote:
> On 05/25/2018 12:44 PM, Andrei Borzenkov wrote:
>> On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger  
>> wrote:
>>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
 Hi,

 I am checking the watchdog function of SBD (without a shared block-device).
 In a two-node cluster, if the cluster is stopped on one node, the watchdog is
 triggered on the remaining node.
 Is this the designed behavior?
>>> SBD without a shared block-device doesn't really make sense on
>>> a two-node cluster.
>>> The basic idea is - e.g. in the case of a networking problem -
>>> that the cluster splits up into a quorate and a non-quorate partition.
>>> The quorate partition stays up, while SBD guarantees a
>>> reliable watchdog-based self-fencing of the non-quorate partition
>>> within a defined timeout.
>> Does it require no-quorum-policy=suicide, or does it decide completely
>> independently? I.e. would it also fire with no-quorum-policy=ignore?
> 
> Ultimately it will fire in any case. But no-quorum-policy decides how
> long this will take. In the case of suicide the inquisitor will immediately
> stop tickling the watchdog. In all other cases the pacemaker-servant
> will stop pinging the inquisitor, which will make the servant time
> out after a default of 4 seconds, and then the inquisitor will
> stop tickling the watchdog.
> But that is just relevant if Corosync doesn't have 2-node enabled.
> See the comment below for that case.
> 
>>
>>> This idea of course doesn't work with just 2 nodes.
>>> Taking quorum info from the 2-node feature of corosync (automatically
>>> switching on wait-for-all) doesn't help in this case but instead
>>> would lead to split-brain.
>> So what you are saying is that SBD ignores quorum information from
>> corosync and makes its own decisions based on a pure count of nodes. Do
>> I understand it correctly?
> 
> Yes, but that is just true for this case where Corosync has 2-node
> enabled.
>
> In all other cases (be it clusters with more than 2 nodes
> or clusters with just 2 nodes but without 2-node enabled in
> Corosync) the pacemaker-servant takes quorum-info from
> pacemaker, which will probably come directly from Corosync
> nowadays.
> But as said if 2-node is configured with Corosync everything
> is different: The node-counting is then actually done
> by the cluster-servant and this one will stop pinging the
> inquisitor (instead of the pacemaker-servant) if it doesn't
> count more than 1 node.
> 

Is it conditional on having no shared device, or does it just check the two_node
value? If it always behaves this way, even with a real shared device present, it
means sbd is fundamentally incompatible with two_node, and that had better be
mentioned in the documentation.

> That all said, I've just realized that setting 2-node in Corosync
> shouldn't really be dangerous anymore, although it doesn't make
> the cluster especially useful either in the case of SBD without disk(s).
> 
> Regards,
> Klaus


[ClusterLabs] Why would a standby node be fenced? (was: How to set up fencing/stonith)

2018-05-25 Thread Casey & Gina
> On May 25, 2018, at 7:01 AM, Casey Allen Shobe  
> wrote:
> 
>> Actually, why is Pacemaker fencing the standby node just because a resource 
>> fails to start there?  I thought only the master should be fenced if it were 
>> assumed to be broken.

This is probably the most important thing to ask, since it concerns Pacemaker
itself rather than the PAF resource agent (which many may not be as fluent with
as Pacemaker), and it is perhaps the most indicative of me setting something up
incorrectly outside of that resource agent.

My understanding of fencing was that pacemaker would only fence a node if it 
was the master but had stopped responding, to avoid a split-brain situation.  
Why would pacemaker ever fence a standby node with no resources currently 
allocated to it?

Regards,
-- 
Casey


Re: [ClusterLabs] PAF not starting resource successfully after node reboot (was: How to set up fencing/stonith)

2018-05-25 Thread Casey Allen Shobe
Any advice about how to fix this?  I've been struggling to get things working 
for weeks now and I think this is the final stumbling block I need to figure 
out.

On May 23, 2018, at 2:22 PM, Casey & Gina  wrote:

>>> So now my concern is this - our VM's are distributed across 32 hosts.  One 
>>> condition we were hoping to handle was when one of those host machines 
>>> fails, due to bad memory or something else, as it is likely that not all of 
>>> the nodes within a cluster are residing on the same VM host (there may even 
>>> be some way to configure them to stay on separate hosts in ESX).  In this 
>>> case, a reset command will fail as well, I'd assume.  I had thought that 
>>> when the resource was fenced, it was done with an 'off' command, and that 
>>> the resources would be brought up on a standby node.  Is there a way to 
>>> make this work?
>> 
>> Configure your stonith agent to use "off" instead of "reset".
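
(For reference, with pcs this could look roughly like the following. The resource
name is a placeholder, and pcmk_reboot_action is the generic Pacemaker fencing
parameter that remaps what a "reboot" request actually does; the agent may also
have its own option for this:)

  pcs stonith update my-vcenter-fence pcmk_reboot_action=off
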
> 
> I tried a setup with RESETPOWERON="1" for the external/vcenter stonith 
> plugin.  It does seem to work better, but I end up with a node that can't 
> rejoin the cluster without being immediately rebooted, due to the PostgreSQL 
> resource failing.
> 
> I have pcsd set to auto-start at boot, but not pacemaker or corosync.  After 
> I power off the node in vSphere, the node is fenced and then powered back on. 
>  I see it show up in `pcs status` with PCSD Status of Online after a few 
> seconds but shown as OFFLINE in the list of nodes on top since pacemaker and 
> corosync are not running.  If I then do a `pcs cluster start` on the rebooted 
> node, it is again restarted.  So I cannot get it to rejoin the cluster.
> 
> The corosync log from another node in the cluster (pasted below) indicates 
> that PostgreSQL fails to start after pacemaker/corosync are restarted (on 
> d-gp2-dbpg0-1 in this case), but it does not seem to give any reason as to 
> why.  When I look on the failed node, I see that the PostgreSQL log is not 
> being appended, so it doesn't seem it's ever actually trying to start it.  
> I'm not sure where else I could try looking.
> 
> Strangely, if prior to running `pcs cluster start` on the rebooted node, I 
> sudo to postgres, copy the recovery.conf template to the data directory, and 
> use pg_ctl to start the database, it comes up just fine in standby mode.  
> Then if I do `pcs cluster start`, the node rejoins the cluster just fine 
> without any problem.
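
(For reference, the manual sequence just described is roughly the following; the
paths and template name are placeholders, adjust them to the actual PGDATA and
the PAF recovery template in use:)

  sudo -u postgres cp /path/to/recovery.conf.pcmk /path/to/pgdata/recovery.conf
  sudo -u postgres pg_ctl -D /path/to/pgdata start
  sudo pcs cluster start
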
> 
> Can you tell me why pacemaker is failing to start PostgreSQL in standby mode 
> based on the log data below, or how I can dig deeper into what is going on?  
> Is this due to some misconfiguration on my part?  I thought that PAF would 
> try to do exactly what I do manually, but it doesn't seem this is the case...
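
(A few generic places to look while digging, not PAF-specific; the resource name
below is just a placeholder for whatever the PostgreSQL resource is called here:)

  crm_mon -1 --failcounts              # failed actions and fail counts as Pacemaker recorded them
  pcs resource failcount show pgsqld   # per-resource fail count view
  journalctl -u pacemaker              # on systemd systems; otherwise the pacemaker/corosync log on the failed node
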
> 
> Actually, why is Pacemaker fencing the standby node just because the resource 
> fails to start there?  I thought only the master should be fenced if it were 
> assumed to be broken.
> 
> Thank you for any help you can provide,
> -- 
> Casey
> 
> 
> --
> [2157] d-gp2-dbpg0-2 corosync notice  [TOTEM ] A new membership (10.124.164.63:392) was formed. Members joined: 1
> May 22 23:57:19 [2189] d-gp2-dbpg0-2 pacemakerd: info: pcmk_quorum_notification: Membership 392: quorum retained (3)
> May 22 23:57:19 [2197] d-gp2-dbpg0-2 crmd: info: pcmk_quorum_notification: Membership 392: quorum retained (3)
> May 22 23:57:19 [2189] d-gp2-dbpg0-2 pacemakerd: notice: crm_update_peer_state_iter: pcmk_quorum_notification: Node d-gp2-dbpg0-1[1] - state is now member (was lost)
> May 22 23:57:19 [2197] d-gp2-dbpg0-2 crmd: notice: crm_update_peer_state_iter: pcmk_quorum_notification: Node d-gp2-dbpg0-1[1] - state is now member (was lost)
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/268)
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_perform_op: Diff: --- 0.35.51 2
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_perform_op: Diff: +++ 0.35.52 (null)
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_perform_op: +  /cib:  @num_updates=52
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_perform_op: +  /cib/status/node_state[@id='1']:  @crm-debug-origin=peer_update_callback
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=d-gp2-dbpg0-2/crmd/268, version=0.35.52)
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/272)
> May 22 23:57:19 [2192] d-gp2-dbpg0-2 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/273)
> May 22 23:57:19

Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Klaus Wenninger
On 05/25/2018 12:44 PM, Andrei Borzenkov wrote:
> On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger  wrote:
>> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>>> Hi,
>>>
>>> I am checking the watchdog function of SBD (without a shared block-device).
>>> In a two-node cluster, if the cluster is stopped on one node, the watchdog is
>>> triggered on the remaining node.
>>> Is this the designed behavior?
>> SBD without a shared block-device doesn't really make sense on
>> a two-node cluster.
>> The basic idea is - e.g. in the case of a networking problem -
>> that the cluster splits up into a quorate and a non-quorate partition.
>> The quorate partition stays up, while SBD guarantees a
>> reliable watchdog-based self-fencing of the non-quorate partition
>> within a defined timeout.
> Does it require no-quorum-policy=suicide, or does it decide completely
> independently? I.e. would it also fire with no-quorum-policy=ignore?

Ultimately it will fire in any case. But no-quorum-policy decides how
long this will take. In the case of suicide the inquisitor will immediately
stop tickling the watchdog. In all other cases the pacemaker-servant
will stop pinging the inquisitor, which will make the servant time
out after a default of 4 seconds, and then the inquisitor will
stop tickling the watchdog.
But that is just relevant if Corosync doesn't have 2-node enabled.
See the comment below for that case.
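
As a rough illustration of the knobs involved (the values are examples only):

  # pacemaker side: how quickly a non-quorate partition gives up
  pcs property set no-quorum-policy=suicide
  # sbd side, in /etc/sysconfig/sbd: how long the hardware watchdog waits
  # once it is no longer being tickled
  SBD_WATCHDOG_TIMEOUT=5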

>
>> This idea of course doesn't work with just 2 nodes.
>> Taking quorum info from the 2-node feature of corosync (automatically
>> switching on wait-for-all) doesn't help in this case but instead
>> would lead to split-brain.
> So what you are saying is that SBD ignores quorum information from
> corosync and makes its own decisions based on a pure count of nodes. Do
> I understand it correctly?

Yes, but that is just true for this case where Corosync has 2-node
enabled.

In all other cases (be it clusters with more than 2 nodes
or clusters with just 2 nodes but without 2-node enabled in
Corosync) pacemaker-servant takes quorum-info from
pacemaker, which will probably come directly from Corosync
nowadays.
But as said if 2-node is configured with Corosync everything
is different: The node-counting is then actually done
by the cluster-servant and this one will stop pinging the
inquisitor (instead of the pacemaker-servant) if it doesn't
count more than 1 node.
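
(A quick way to check whether a given cluster is actually running in this mode,
assuming corosync-cmapctl is available; otherwise just look at corosync.conf:)

  corosync-cmapctl | grep -i two_node
  grep -A3 '^quorum' /etc/corosync/corosync.conf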

That all said, I've just realized that setting 2-node in Corosync
shouldn't really be dangerous anymore, although it doesn't make
the cluster especially useful either in the case of SBD without disk(s).
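
As a side note for anyone testing watchdog-only SBD as in the original post:
besides the SBD_* settings shown there, Pacemaker also has to be told that it
may assume watchdog self-fencing, roughly like this (the timeout is only an
example and should exceed SBD_WATCHDOG_TIMEOUT; note that the original config
still shows stonith-enabled: false):

  pcs property set stonith-watchdog-timeout=10s
  pcs property set stonith-enabled=true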

Regards,
Klaus
>
>> What you can do - and what e.g. pcs does automatically - is enable
>> the auto-tie-breaker instead of two-node in corosync. But that
>> still doesn't give you a higher availability than the one of the
>> winner of auto-tie-breaker. (Maybe interesting if you are going
>> for a load-balancing-scenario that doesn't affect availability or
>> for a transient state while setting up a cluster node-by-node ...)
>> What you can do though is use qdevice to still have 'real-quorum'
>> info with just 2 full cluster-nodes.
>>
>> There was quite a lot of discussion round this topic on this
>> thread previously if you search the history.
>>
>> Regards,
>> Klaus
>>
>>> [vmrh75b]# cat /etc/corosync/corosync.conf
>>> (snip)
>>> quorum {
>>> provider: corosync_votequorum
>>> two_node: 1
>>> }
>>>
>>> [vmrh75b]# cat /etc/sysconfig/sbd
>>> # This file has been generated by pcs.
>>> SBD_DELAY_START=no
>>> ## SBD_DEVICE="/dev/vdb1"
>>> SBD_OPTS="-vvv"
>>> SBD_PACEMAKER=yes
>>> SBD_STARTMODE=always
>>> SBD_WATCHDOG_DEV=/dev/watchdog
>>> SBD_WATCHDOG_TIMEOUT=5
>>>
>>> [vmrh75b]# crm_mon -r1
>>> Stack: corosync
>>> Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with 
>>> quorum
>>> Last updated: Fri May 25 13:36:07 2018
>>> Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
>>>
>>> 2 nodes configured
>>> 0 resources configured
>>>
>>> Online: [ vmrh75a vmrh75b ]
>>>
>>> No resources
>>>
>>> [vmrh75b]# pcs property show
>>> Cluster Properties:
>>>  cluster-infrastructure: corosync
>>>  cluster-name: my_cluster
>>>  dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
>>>  have-watchdog: true
>>>  stonith-enabled: false
>>>
>>> [vmrh75b]# ps -ef | egrep "sbd|coro|pace"
>>> root      2169     1  0 13:34 ?        00:00:00 sbd: inquisitor
>>> root      2170  2169  0 13:34 ?        00:00:00 sbd: watcher: Pacemaker
>>> root      2171  2169  0 13:34 ?        00:00:00 sbd: watcher: Cluster
>>> root      2172     1  0 13:34 ?        00:00:00 corosync
>>> root      2179     1  0 13:34 ?        00:00:00 /usr/sbin/pacemakerd -f
>>> haclust+  2180  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-based
>>> root      2181  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
>>> root      2182  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-execd
>>> haclust+  2183  2179  0 13:34 ?        00:00:00 

Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Andrei Borzenkov
On Fri, May 25, 2018 at 10:08 AM, Klaus Wenninger  wrote:
> On 05/25/2018 07:31 AM, 井上 和徳 wrote:
>> Hi,
>>
>> I am checking the watchdog function of SBD (without a shared block-device).
>> In a two-node cluster, if the cluster is stopped on one node, the watchdog is
>> triggered on the remaining node.
>> Is this the designed behavior?
>
> SBD without a shared block-device doesn't really make sense on
> a two-node cluster.
> The basic idea is - e.g. in the case of a networking problem -
> that the cluster splits up into a quorate and a non-quorate partition.
> The quorate partition stays up, while SBD guarantees a
> reliable watchdog-based self-fencing of the non-quorate partition
> within a defined timeout.

Does it require no-quorum-policy=suicide, or does it decide completely
independently? I.e. would it also fire with no-quorum-policy=ignore?

> This idea of course doesn't work with just 2 nodes.
> Taking quorum info from the 2-node feature of corosync (automatically
> switching on wait-for-all) doesn't help in this case but instead
> would lead to split-brain.

So what you are saying is that SBD ignores quorum information from
corosync and makes its own decisions based on a pure count of nodes. Do
I understand it correctly?

> What you can do - and what e.g. pcs does automatically - is enable
> the auto-tie-breaker instead of two-node in corosync. But that
> still doesn't give you a higher availability than the one of the
> winner of auto-tie-breaker. (Maybe interesting if you are going
> for a load-balancing-scenario that doesn't affect availability or
> for a transient state while setting up a cluster node-by-node ...)
> What you can do though is use qdevice to still have 'real-quorum'
> info with just 2 full cluster-nodes.
>
> There was quite a lot of discussion round this topic on this
> thread previously if you search the history.
>
> Regards,
> Klaus
>
>>
>> [vmrh75b]# cat /etc/corosync/corosync.conf
>> (snip)
>> quorum {
>> provider: corosync_votequorum
>> two_node: 1
>> }
>>
>> [vmrh75b]# cat /etc/sysconfig/sbd
>> # This file has been generated by pcs.
>> SBD_DELAY_START=no
>> ## SBD_DEVICE="/dev/vdb1"
>> SBD_OPTS="-vvv"
>> SBD_PACEMAKER=yes
>> SBD_STARTMODE=always
>> SBD_WATCHDOG_DEV=/dev/watchdog
>> SBD_WATCHDOG_TIMEOUT=5
>>
>> [vmrh75b]# crm_mon -r1
>> Stack: corosync
>> Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with 
>> quorum
>> Last updated: Fri May 25 13:36:07 2018
>> Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
>>
>> 2 nodes configured
>> 0 resources configured
>>
>> Online: [ vmrh75a vmrh75b ]
>>
>> No resources
>>
>> [vmrh75b]# pcs property show
>> Cluster Properties:
>>  cluster-infrastructure: corosync
>>  cluster-name: my_cluster
>>  dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
>>  have-watchdog: true
>>  stonith-enabled: false
>>
>> [vmrh75b]# ps -ef | egrep "sbd|coro|pace"
>> root      2169     1  0 13:34 ?        00:00:00 sbd: inquisitor
>> root      2170  2169  0 13:34 ?        00:00:00 sbd: watcher: Pacemaker
>> root      2171  2169  0 13:34 ?        00:00:00 sbd: watcher: Cluster
>> root      2172     1  0 13:34 ?        00:00:00 corosync
>> root      2179     1  0 13:34 ?        00:00:00 /usr/sbin/pacemakerd -f
>> haclust+  2180  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-based
>> root      2181  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
>> root      2182  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-execd
>> haclust+  2183  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
>> haclust+  2184  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
>> haclust+  2185  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
>>
>> [vmrh75b]# pcs cluster stop vmrh75a
>> vmrh75a: Stopping Cluster (pacemaker)...
>> vmrh75a: Stopping Cluster (corosync)...
>>
>> [vmrh75b]# tail -F /var/log/messages
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: Our peer on the DC 
>> (vmrh75a) is dead
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition 
>> S_NOT_DC -> S_ELECTION
>> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition 
>> S_ELECTION -> S_INTEGRATION
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Node vmrh75a state is 
>> now lost
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Removing all vmrh75a 
>> attributes for peer loss
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Lost attribute writer 
>> vmrh75a
>> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Purged 1 peer with 
>> id=1 and/or uname=vmrh75a from the membership cache
>> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Node vmrh75a state 
>> is now lost
>> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Purged 1 peer with 
>> id=1 and/or uname=vmrh75a from the membership cache
>> May 25 13:37:00 vmrh75b 

Re: [ClusterLabs] Questions about SBD behavior

2018-05-25 Thread Klaus Wenninger
On 05/25/2018 07:31 AM, 井上 和徳 wrote:
> Hi,
>
> I am checking the watchdog function of SBD (without a shared block-device).
> In a two-node cluster, if the cluster is stopped on one node, the watchdog is
> triggered on the remaining node.
> Is this the designed behavior?

SBD without a shared block-device doesn't really make sense on
a two-node cluster.
The basic idea is - e.g. in the case of a networking problem -
that the cluster splits up into a quorate and a non-quorate partition.
The quorate partition stays up, while SBD guarantees a
reliable watchdog-based self-fencing of the non-quorate partition
within a defined timeout.
This idea of course doesn't work with just 2 nodes.
Taking quorum info from the 2-node feature of corosync (automatically
switching on wait-for-all) doesn't help in this case but instead
would lead to split-brain.
What you can do - and what e.g. pcs does automatically - is enable
the auto-tie-breaker instead of two-node in corosync. But that
still doesn't give you a higher availability than the one of the
winner of auto-tie-breaker. (Maybe interesting if you are going
for a load-balancing-scenario that doesn't affect availability or
for a transient state while setting up a cluster node-by-node ...)
What you can do though is use qdevice to still have 'real-quorum'
info with just 2 full cluster-nodes.
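
For reference, a sketch of both options (the values and host name are examples;
the qdevice variant assumes an additional host running corosync-qnetd):

  # corosync.conf: auto_tie_breaker instead of two_node
  quorum {
      provider: corosync_votequorum
      auto_tie_breaker: 1
  }

  # or add a quorum device via pcs:
  pcs quorum device add model net host=qnetd-host algorithm=ffsplit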

There was quite a lot of discussion round this topic on this
thread previously if you search the history.

Regards,
Klaus

>
> [vmrh75b]# cat /etc/corosync/corosync.conf
> (snip)
> quorum {
> provider: corosync_votequorum
> two_node: 1
> }
>
> [vmrh75b]# cat /etc/sysconfig/sbd
> # This file has been generated by pcs.
> SBD_DELAY_START=no
> ## SBD_DEVICE="/dev/vdb1"
> SBD_OPTS="-vvv"
> SBD_PACEMAKER=yes
> SBD_STARTMODE=always
> SBD_WATCHDOG_DEV=/dev/watchdog
> SBD_WATCHDOG_TIMEOUT=5
>
> [vmrh75b]# crm_mon -r1
> Stack: corosync
> Current DC: vmrh75a (version 2.0.0-0.1.rc4.el7-2.0.0-rc4) - partition with 
> quorum
> Last updated: Fri May 25 13:36:07 2018
> Last change: Fri May 25 13:35:22 2018 by root via cibadmin on vmrh75a
>
> 2 nodes configured
> 0 resources configured
>
> Online: [ vmrh75a vmrh75b ]
>
> No resources
>
> [vmrh75b]# pcs property show
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: my_cluster
>  dc-version: 2.0.0-0.1.rc4.el7-2.0.0-rc4
>  have-watchdog: true
>  stonith-enabled: false
>
> [vmrh75b]# ps -ef | egrep "sbd|coro|pace"
> root      2169     1  0 13:34 ?        00:00:00 sbd: inquisitor
> root      2170  2169  0 13:34 ?        00:00:00 sbd: watcher: Pacemaker
> root      2171  2169  0 13:34 ?        00:00:00 sbd: watcher: Cluster
> root      2172     1  0 13:34 ?        00:00:00 corosync
> root      2179     1  0 13:34 ?        00:00:00 /usr/sbin/pacemakerd -f
> haclust+  2180  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-based
> root      2181  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-fenced
> root      2182  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-execd
> haclust+  2183  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-attrd
> haclust+  2184  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-schedulerd
> haclust+  2185  2179  0 13:34 ?        00:00:00 /usr/libexec/pacemaker/pacemaker-controld
>
> [vmrh75b]# pcs cluster stop vmrh75a
> vmrh75a: Stopping Cluster (pacemaker)...
> vmrh75a: Stopping Cluster (corosync)...
>
> [vmrh75b]# tail -F /var/log/messages
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: Our peer on the DC 
> (vmrh75a) is dead
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition 
> S_NOT_DC -> S_ELECTION
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: notice: State transition 
> S_ELECTION -> S_INTEGRATION
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Node vmrh75a state is 
> now lost
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Removing all vmrh75a 
> attributes for peer loss
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Lost attribute writer 
> vmrh75a
> May 25 13:37:00 vmrh75b pacemaker-attrd[2183]: notice: Purged 1 peer with 
> id=1 and/or uname=vmrh75a from the membership cache
> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Node vmrh75a state is 
> now lost
> May 25 13:37:00 vmrh75b pacemaker-fenced[2181]: notice: Purged 1 peer with 
> id=1 and/or uname=vmrh75a from the membership cache
> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Node vmrh75a state is 
> now lost
> May 25 13:37:00 vmrh75b pacemaker-based[2180]: notice: Purged 1 peer with 
> id=1 and/or uname=vmrh75a from the membership cache
> May 25 13:37:00 vmrh75b pacemaker-controld[2185]: warning: Input 
> I_ELECTION_DC received in state S_INTEGRATION from do_election_check
> May 25 13:37:01 vmrh75b sbd[2171]:   cluster:  warning: set_servant_health: 
> Connected to corosync but requires both nodes present
> May 25 13:37:01 vmrh75b sbd[2171]:   cluster:  warning: