[ClusterLabs] Antw: Changes coming in Pacemaker 2.0.0

2018-01-10 Thread Ulrich Windl
Hi!

On the tool changes, I'd prefer --move and --un-move as pair over --move and 
--clear ("clear" is less expressive IMHO).
On "--reprobe -> --refresh": Why not simply "--probe"?
On "--crm_xml -> --xml-text": Why not simply "--xml" (XML IS text)?

Regards,
Ulrich


>>> Ken Gaillot  schrieb am 10.01.2018 um 23:10 in 
>>> Nachricht
<1515622250.4815.19.ca...@redhat.com>:
> Pacemaker 2.0 will be a major update whose main goal is to remove
> support for deprecated, legacy syntax, in order to make the code base
> more maintainable into the future. There will also be some changes to
> default configuration behavior, and the command-line tools.
> 
> I'm hoping to release the first release candidate in the next couple of
> weeks. We'll have a longer than usual rc phase to allow for plenty of
> testing.
> 
> A thoroughly detailed list of changes will be maintained on the
> ClusterLabs wiki:
> 
>   https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes 
> 
> These changes are not final, and we can restore functionality if there
> is a strong need for it. Most user-visible changes are complete (in the
> 2.0 branch on github); major changes are still expected, but primarily
> to the C API.
> 
> Some highlights:
> 
> * Only Corosync version 2 will be supported as the underlying cluster
> layer. Support for Heartbeat and Corosync 1 is removed. (Support for
> the new kronosnet layer will be added in a future version.)
> 
> * The record-pending cluster property now defaults to true, which
> allows status tools such as crm_mon to show operations that are in
> progress.
> 
> * So far, the code base has been reduced by about 17,000 lines of code.
> -- 
> Ken Gaillot 
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] 答复: Antw: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread 范国腾
Ulrich,

Thank you very much for the help. When we do the performance test, our 
application(pgsql-ha) will start more than 500 process to process the client 
request. Is it possible to make this issue?

Is it any workaround or method to make pacemaker not restart the resource in 
such situation? Now the system could not work if the client sends high call 
load but we could not control the client's behavior. 

Thanks


-邮件原件-
发件人: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de] 
发送时间: 2018年1月10日 18:20
收件人: users@clusterlabs.org
主题: [ClusterLabs] Antw: pacemaker reports monitor timeout while CPU is high

Hi!

I only can talk for myself: In former times with HP-UX, we had severe 
performance problems when the load was in the range of 8 to 14 (I/O waits not 
included, average for all logical CPUs), while in Linux we are getting problems 
with a load above 40 (or so) (I/O included, sum of all logical CPUs (which are 
24)). Also I/O waits cause cluster timeouts before CPU load actually matters 
(for us).
So with a load above 400 (not knowing your number of CPUs) it should not be 
that unusual. What is the number of threads in your system at that time?
It might be worth the efforts binding the cluster processes to specific CPUs 
and keep other tasks away from those, but I don't have experience with that.
I guess the "High CPU load detected" message triggers some internal suspend in 
the cluster engine (assuming the cluster engine caused the high load). Of 
course for "external " load the measure won't help...

Regards,
Ulrich


>>> ???  schrieb am 10.01.2018 um 10:40 in 
>>> Nachricht
<4dc98a5d9be144a78fb9a18721743...@ex01.highgo.com>:
> Hello,
> 
> This issue only appears when we run performance test and the CPU is high. 
> The cluster and log is as below. The Pacemaker will restart the Slave 
> Side pgsql-ha resource about every two minutes.
> 
> Take the following scenario for example:(when the pgsqlms RA is 
> called, we print the log “execute the command start (command)”. When 
> the command is

> returned, we print the log “execute the command stop (Command)
(result)”)
> 
> 1. We could see that pacemaker call “pgsqlms monitor” about every 15

> seconds. And it return $OCF_SUCCESS
> 
> 2. In calls monitor command again at 13:56:16, and then it reports 
> timeout error error 13:56:18. It is only 2 seconds but it reports 
> “timeout=1ms”
> 
> 3. In other logs, sometimes after 15 minutes, there is no “execute the

> command start monitor” printed and it reports timeout error directly.
> 
> Could you please tell how to debug or resolve such issue?
> 
> The log:
> 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> start

> monitor
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role start 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role stop 0 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> stop monitor 0 Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: 
> execute the command start

> monitor
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role start 
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop 0 
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command 
> stop monitor 0 Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU 
> load detected:
> 426.77
> Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command 
> start

> monitor
> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000 
> process (PID

> 5606) timed out
> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606 - 
> timed

> out after 1ms
> Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation for 
> pgsqld on db2: Timed Out | call=102 key=pgsqld_monitor_16000
timeout=1ms
> Jan 10 13:56:18 sds2 crmd[26096]:  notice: 
> db2-pgsqld_monitor_16000:102 [
> /tmp:5432 - accepting connections\n ]
> Jan 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph Jan 10 13:56:19 sds2 pengine[26095]: 
> warning: Processing failed op monitor for pgsqld:0 on db2: unknown 
> error (1) Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing 
> failed op start for

> pgsqld:1 on db1: unknown error (1)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away 
> from db1

> after 100 failures (max=100)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away 
> from db1

> after 100 failures (max=100)
> Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover 
> pgsqld:0#011(Slave
> db2)
> Jan 10 13:56:19 sds2 pengine[26095]:  notice: Calculated transition 
> 37, saving inputs in /var/lib/pacemaker/pengine/pe-input-1251.bz2
> 
> 
> The Cluster Configuration:
> 2 nodes and 13 resources configured
> 
> Online: [ db1 db2 ]
> 
> Full list of resources:
> 
> Clone Set: dlm-clone 

[ClusterLabs] 答复: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread 范国腾
Thank you, Ken.

We have set the timeout to be 10 seconds, but it reports timeout only after 2 
seconds. So it seems not work if I set higher timeouts.
Our application which is managed by pacemaker will start more than 500 process 
to run when running performance test. Does it affect the result? Which log 
could help us to analyze?

> monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)

-邮件原件-
发件人: Ken Gaillot [mailto:kgail...@redhat.com] 
发送时间: 2018年1月11日 0:54
收件人: Cluster Labs - All topics related to open-source clustering welcomed 

主题: Re: [ClusterLabs] pacemaker reports monitor timeout while CPU is high

On Wed, 2018-01-10 at 09:40 +, 范国腾 wrote:
> Hello,
>  
> This issue only appears when we run performance test and the CPU is 
> high. The cluster and log is as below. The Pacemaker will restart the 
> Slave Side pgsql-ha resource about every two minutes.
>  
> Take the following scenario for example:(when the pgsqlms RA is 
> called, we print the log “execute the command start (command)”. When 
> the command is returned, we print the log “execute the command stop
> (Command) (result)”)
> 1. We could see that pacemaker call “pgsqlms monitor” about every
> 15 seconds. And it return $OCF_SUCCESS 2. In calls monitor command 
> again at 13:56:16, and then it reports timeout error error 13:56:18. 
> It is only 2 seconds but it reports “timeout=1ms”
> 3. In other logs, sometimes after 15 minutes, there is no “execute 
> the command start monitor” printed and it reports timeout error 
> directly.
>  
> Could you please tell how to debug or resolve such issue?
>  
> The log:
>  
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> start monitor Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: 
> _confirm_role start Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: 
> _confirm_role stop
> 0
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command 
> stop monitor 0 Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: 
> execute the command start monitor Jan 10 13:55:52 sds2 
> pgsqlms(pgsqld)[5477]: INFO: _confirm_role start Jan 10 13:55:52 sds2 
> pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop
> 0
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command 
> stop monitor 0 Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU 
> load detected:
> 426.77
> Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command 
> start monitor Jan 10 13:56:18 sds2 lrmd[26093]: warning: 
> pgsqld_monitor_16000 process (PID 5606) timed out

There's something more going on than in this log snippet. Notice the process 
that timed out (5606) is not one of the processes that logged above (5240 and 
5477).

Generally, once load gets that high, it's very difficult to maintain 
responsiveness, and the expectation is that another node will fence it.
But it can often be worked around with high timeouts, and/or you can use rules 
to set higher timeouts or maintenance mode during times when high load is 
expected.

> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606
> - timed out after 1ms
> Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation 
> for pgsqld on db2: Timed Out | call=102
> key=pgsqld_monitor_16000 timeout=1ms Jan 10 13:56:18 sds2 
> crmd[26096]:  notice: db2-
> pgsqld_monitor_16000:102 [ /tmp:5432 - accepting connections\n ] Jan 
> 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph Jan 10 13:56:19 sds2 pengine[26095]: 
> warning: Processing failed op monitor for pgsqld:0 on db2: unknown 
> error (1) Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing 
> failed op start for pgsqld:1 on db1: unknown error (1) Jan 10 13:56:19 
> sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1 after 
> 100 failures (max=100) Jan 10 13:56:19 sds2 pengine[26095]: 
> warning: Forcing pgsql-ha away from db1 after 100 failures 
> (max=100) Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover 
> pgsqld:0#011(Slave db2) Jan 10 13:56:19 sds2 pengine[26095]:  notice: 
> Calculated transition 37, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-1251.bz2
>  
>  
> The Cluster Configuration:
> 2 nodes and 13 resources configured
>  
> Online: [ db1 db2 ]
>  
> Full list of resources:
>  
> Clone Set: dlm-clone [dlm]
>  Started: [ db1 db2 ]
> Clone Set: clvmd-clone [clvmd]
>  Started: [ db1 db2 ]
> ipmi_node1 (stonith:fence_ipmilan):    Started db2
> ipmi_node2 (stonith:fence_ipmilan):    Started db1 Clone Set: 
> clusterfs-clone [clusterfs]
>  Started: [ db1 db2 ]
> Master/Slave Set: pgsql-ha [pgsqld]>
>   Masters: [ db1 ]
> Slaves: [ db2 ]
> Resource Group: mastergroup
>  db1-vip    (ocf::heartbeat:IPaddr2):   Started
>  rep-vip    (ocf::heartbeat:IPaddr2):   Started Resource 
> Group: 

Re: [ClusterLabs] Changes coming in Pacemaker 2.0.0

2018-01-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jan 2018 16:10:50 -0600
Ken Gaillot  wrote:

> Pacemaker 2.0 will be a major update whose main goal is to remove
> support for deprecated, legacy syntax, in order to make the code base
> more maintainable into the future. There will also be some changes to
> default configuration behavior, and the command-line tools.
> 
> I'm hoping to release the first release candidate in the next couple of
> weeks.

Great news! Congrats.

> We'll have a longer than usual rc phase to allow for plenty of
> testing.
>
> A thoroughly detailed list of changes will be maintained on the
> ClusterLabs wiki:
> 
>   https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes
> 
> These changes are not final, and we can restore functionality if there
> is a strong need for it. Most user-visible changes are complete (in the
> 2.0 branch on github); major changes are still expected, but primarily
> to the C API.
> 
> Some highlights:
> 
> * Only Corosync version 2 will be supported as the underlying cluster
> layer. Support for Heartbeat and Corosync 1 is removed. (Support for
> the new kronosnet layer will be added in a future version.)

I thought (according to some conference slides from sept 2017) knet was mostly
related to corosync directly? Is there some visible impact on Pacemaker too?


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?

2018-01-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jan 2018 12:23:59 -0600
Ken Gaillot  wrote:
...
> My question is: has anyone used or tested this, or is anyone interested
> in this? We won't promote it to the default schema unless it is tested.
> 
> My feeling is that it is more likely to be confusing than helpful, and
> there are probably ways to achieve any reasonable use case with
> existing syntax.

For what it worth, I tried to implement such solution to dispatch mulitple
IP addresses to slaves in a 1 master 2 slaves cluster. This is quite time
consuming to wrap its head around sides effects with colocation, scores and
stickiness. My various tests shows everything sounds to behave correctly now,
but I don't feel really 100% confident about my setup.

I agree that there are ways to achieve such a use case with existing syntax.
But this is quite confusing as well. As instance, I experienced a master
relocation when messing with a slave to make sure its IP would move to the
other slave node...I don't remember exactly what was my error, but I could
easily dig for it if needed.

I feel like it fits in the same area that the usability of Pacemaker. Making it
easier to understand. See the recent discussion around the gocardless war story.

My tests was mostly for labs, demo and tutorial purpose. I don't have a
specific field use case. But if at some point this feature is promoted
officially as preview, I'll give it some testing and report here (barring the
fact I'm actually aware some feedback are requested ;)).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-10 Thread Adam Spiers
Ken Gaillot  wrote: 
The initial proposal, after discussion at last year's summit, was to 
use /var/log/cluster/pacemaker.log instead. That turned out to be slightly problematic: it broke some regression tests in a way that wasn't easily fixable, and more significantly, it raises the question of what package should own /var/log/cluster (which different distributions might want to answer differently). 


I thought one option aired at the summit to address this was 
/var/log/clusterlabs, but it's entirely possible my memory's playing 
tricks on me again. 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Choosing between Pacemaker 1.1 and Pacemaker 2.0

2018-01-10 Thread Ken Gaillot
Distribution packagers and users who build Pacemaker themselves will
need to choose between staying on the 1.1 line or moving to 2.0. A new
wiki page lists factors to consider:

https://wiki.clusterlabs.org/wiki/Choosing_Between_Pacemaker_1.1_and_2.
0
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-10 Thread Ken Gaillot
Starting with Pacemaker 2.0.0, the Pacemaker detail log will be kept by
default in /var/log/pacemaker/pacemaker.log (rather than
/var/log/pacemaker.log). This will keep /var/log cleaner.

Pacemaker will still prefer any log file specified in corosync.conf.

The initial proposal, after discussion at last year's summit, was to
use /var/log/cluster/pacemaker.log instead. That turned out to be slightly 
problematic: it broke some regression tests in a way that wasn't easily 
fixable, and more significantly, it raises the question of what package should 
own /var/log/cluster (which different distributions might want to answer 
differently).

So instead, the default log locations can be overridden when building
pacemaker. The ./configure script now has these two options:

--with-logdir
Where to keep pacemaker.log (default /var/log/pacemaker)

--with-bundledir
Where to keep bundle logs (default /var/log/pacemaker/bundles, which
hasn't changed)

Thus, if a packager wants to preserve the 1.1 locations, they can use:

./configure --with-logdir=/var/log

And if a packager wants to use /var/log/cluster as originally planned,
they can use:

./configure --with-logdir=/var/log/cluster --with-
bundledir=/var/log/cluster/bundles

and ensure that pacemaker depends on whatever package owns
/var/log/cluster.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 2.0.0: Reliable exit codes

2018-01-10 Thread Ken Gaillot
Every time you run a command on the command line or in a script, it
returns an exit status. These are most useful in scripts to check for
errors.

Currently, Pacemaker daemons and command-line tools return an
unreliable mishmash of exit status codes, sometimes including negative
numbers (which get bitwise-remapped to the 0-255 range) and/or C
library errno codes (which can vary across OSes).

The only thing scripts could rely on was 0 means success and nonzero
means error.

Beginning with Pacemaker 2.0.0, everything will return a well-defined
set of reliable exit status codes. These codes can be viewed using the
existing crm_error tool using the --exit parameter. For example:

crm_error --exit --list

will list all possible exit statuses, and

crm_error --exit 124

will show a textual description of what exit status 124 means.

This will mainly be of interest to users who script Pacemaker commands
and check the return value. If your scripts rely on the current exit
codes, you may need to update your scripts for 2.0.0.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Changes coming in Pacemaker 2.0.0

2018-01-10 Thread Ken Gaillot
Pacemaker 2.0 will be a major update whose main goal is to remove
support for deprecated, legacy syntax, in order to make the code base
more maintainable into the future. There will also be some changes to
default configuration behavior, and the command-line tools.

I'm hoping to release the first release candidate in the next couple of
weeks. We'll have a longer than usual rc phase to allow for plenty of
testing.

A thoroughly detailed list of changes will be maintained on the
ClusterLabs wiki:

  https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes

These changes are not final, and we can restore functionality if there
is a strong need for it. Most user-visible changes are complete (in the
2.0 branch on github); major changes are still expected, but primarily
to the C API.

Some highlights:

* Only Corosync version 2 will be supported as the underlying cluster
layer. Support for Heartbeat and Corosync 1 is removed. (Support for
the new kronosnet layer will be added in a future version.)

* The record-pending cluster property now defaults to true, which
allows status tools such as crm_mon to show operations that are in
progress.

* So far, the code base has been reduced by about 17,000 lines of code.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?

2018-01-10 Thread Ken Gaillot
The pacemaker-next schema contains experimental features for testing
before potential release. To use these features, someone must
explicitly set validate-with in their configuration to pacemaker-next
(or its legacy alias, pacemaker-1.1).

There is a feature that has been hanging around in there for a long
time: the ability to reference particular instances of a clone in
constraints, using "rsc-instance"/"with-rsc-instance" (colocation) or
"first-instance"/"then-instance" (ordering).

The originally proposed use case (back in 2009) was having separate IP
addresses, each associated with one copy of the clone:

https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2169

My question is: has anyone used or tested this, or is anyone interested
in this? We won't promote it to the default schema unless it is tested.

My feeling is that it is more likely to be confusing than helpful, and
there are probably ways to achieve any reasonable use case with
existing syntax.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Resource Demote Time Out Question

2018-01-10 Thread Ken Gaillot
On Wed, 2018-01-10 at 16:48 +0100, Ulrich Windl wrote:
> Hi!
> 
> Common pitfall: The default parameters in the RA's metadata are not
> the defaults being configured when you don't specify a value; instead
> they are suggestions for you when configuring (don't ask me why!).
> Instead there is a global default timeout being used when you don't
> specify one.
> I hope I put that correctly. You could verify by manually adding the
> default avlues from the metadata to "demote".
> 
> Regards,
> Ulrich

Yep. That would be in the section of the configuration with "op start
interval=0 timeout=120" ... you want "op demote interval=0 timeout="
with the desired value.

> 
> > > > Marc Smith  schrieb am 10.01.2018 um
> > > > 16:26 in
> 
> Nachricht
> 

Re: [ClusterLabs] corosync taking almost 30 secs to detect node failure in case of kernel panic

2018-01-10 Thread Ken Gaillot
On Wed, 2018-01-10 at 12:43 +0530, ashutosh tiwari wrote:
> Hi,
> 
> We have two node cluster running in active/standby mode and having
> IPMI fencing configured.

Be aware that using on-board IPMI as the only fencing method is
problematic -- if the host loses power, the IPMI will not respond, and
the cluster will be unable to recover.

> In case of kernel panic at Active node, standby node is detecting
> node failure in around 30 secs which leads to delay in standby node
> taking the active role.
> 
> we have totem token timeout as 1 msecs. 
> Please let us know in case there is any more configuration
> controlling membership detection.

The logs should show what's taking up the time. Corosync should
recognize the node is lost around the token timeout, then pacemaker has
to contact the IPMI and wait for a successful response before
recovering. It could be that the IPMI takes that long to respond, or
there may be something else causing issues.

> 
> s/w versions.
> 
> centos 6.7
> corosync-1.4.7-5.el6.x86_64
> pacemaker-1.1.14-8.el6.x86_64
> 
> Thanks and Regards,
> Ashutosh Tiwari
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: Resource Demote Time Out Question

2018-01-10 Thread Ulrich Windl
Hi!

Common pitfall: The default parameters in the RA's metadata are not the 
defaults being configured when you don't specify a value; instead they are 
suggestions for you when configuring (don't ask me why!). Instead there is a 
global default timeout being used when you don't specify one.
I hope I put that correctly. You could verify by manually adding the default 
avlues from the metadata to "demote".

Regards,
Ulrich


>>> Marc Smith  schrieb am 10.01.2018 um 16:26 in
Nachricht

[ClusterLabs] Resource Demote Time Out Question

2018-01-10 Thread Marc Smith
Hi,

I'm experiencing a time out on a demote operation and I'm not sure
which parameter / attribute needs to be updated to extend the time out
window.

I'm using Pacemaker 1.1.16 and Corosync 2.4.2.

Here are the set of log lines that show the issue (shutdown initiated,
then demote time out after 20 seconds):
--snip--
Jan 10 09:08:13 tgtnode2 pacemakerd[1096]:   notice: Caught 'Terminated' signal
Jan 10 09:08:13 tgtnode2 crmd[1104]:   notice: Caught 'Terminated' signal
Jan 10 09:08:13 tgtnode2 crmd[1104]:   notice: State transition S_IDLE
-> S_POLICY_ENGINE
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Scheduling Node
tgtnode2.parodyne.com for shutdown
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Promote
p_scst_zfs_vols:0^I(Slave -> Master tgtnode1.parodyne.com)
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Demote
p_scst_zfs_vols:1^I(Master -> Stopped tgtnode2.parodyne.com)
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Stop
p_dlm:1^I(tgtnode2.parodyne.com)
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Migrate
p_dummy_g_zfs^I(Started tgtnode2.parodyne.com ->
tgtnode1.parodyne.com)
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Move
p_zfs_pool_one^I(Started tgtnode2.parodyne.com ->
tgtnode1.parodyne.com)
Jan 10 09:08:13 tgtnode2 pengine[1103]:   notice: Calculated
transition 3, saving inputs in
/var/lib/pacemaker/pengine/pe-input-1441.bz2
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17449]: DEBUG:
scst_notify() -> Received a 'pre' / 'demote' notification.
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17449]: DEBUG:
p_scst_zfs_vols notify returned: 0
Jan 10 09:08:13 tgtnode2 crmd[1104]:   notice: Result of notify
operation for p_scst_zfs_vols on tgtnode2.parodyne.com: 0 (ok)
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_monitor() -> SCST version: 3.3.0-rc
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_monitor() -> Resource is running.
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_monitor() -> SCST local target group state: active
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_demote() -> Resource is currently running as Master.
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: INFO: Blocking
all 'zfs_vols' devices...
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG: Waiting
for devices to finish blocking...
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_demote() -> Setting target group 'zfs_vols_local' ALUA state to
'transitioning'...
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: INFO:
Collecting current configuration: done. -> Making requested changes.
-> Setting Target Group attribute 'state' to value 'transitioning' for
target group 'zfs_vols/zfs_vols_local': done. -> Done, 1 change(s)
made. All done.
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_demote() -> Setting target group 'zfs_vols_local' ALUA state to
'unavailable'...
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: INFO:
Collecting current configuration: done. -> Making requested changes.
-> Setting Target Group attribute 'state' to value 'unavailable' for
target group 'zfs_vols/zfs_vols_local': done. -> Done, 1 change(s)
made. All done.
Jan 10 09:08:13 tgtnode2 scst(p_scst_zfs_vols)[17473]: DEBUG:
scst_demote() -> Changing the group's devices to inactive...
Jan 10 09:08:33 tgtnode2 lrmd[1101]:  warning:
p_scst_zfs_vols_demote_0 process (PID 17473) timed out
Jan 10 09:08:33 tgtnode2 crmd[1104]:   notice: Transition aborted by
operation p_scst_zfs_vols_demote_0 'modify' on tgtnode2.parodyne.com:
Event failed
Jan 10 09:08:33 tgtnode2 crmd[1104]:   notice: Transition aborted by
status-2-fail-count-p_scst_zfs_vols doing create
fail-count-p_scst_zfs_vols=1: Transient attribute change
--snip--

So I'm getting a "time out" after 20 seconds of waiting in the demote
operation with this line: Jan 10 09:08:33 tgtnode2 lrmd[1101]:
warning: p_scst_zfs_vols_demote_0 process (PID 17473) timed out

The 20 second time out is consistent when testing this, so I'm sure
it's just a configuration thing, but it's not obvious to me which
parameter/attribute/setting needs to be modified.

The relevant metadata section from the RA referenced above:
--snip--
  










  
--snip--

And the primitive and clone (multi-state) actual cluster configuration
for the referenced resource:
--snip--
primitive p_scst_zfs_vols ocf:esos:scst \
params alua=true device_group=zfs_vols local_tgt_grp=zfs_vols_local
remote_tgt_grp=zfs_vols_remote m_alua_state=active
s_alua_state=unavailable use_trans_state=true set_dev_active=true \
op monitor interval=10 role=Master \
op monitor interval=20 role=Slave \
op start interval=0 timeout=120 \
op stop interval=0 timeout=90
ms ms_scst_zfs_vols p_scst_zfs_vols \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1

[ClusterLabs] Antw: Antw: corosync taking almost 30 secs to detect node failure in case of kernel panic

2018-01-10 Thread Ulrich Windl
_peer_state_iter: cman_event_callback: Node tigana[2] - state is
> now lost (was member)
> Jan 10 11:06:33 [19261] orana   crmd:   notice:
> crm_update_peer_state_iter: cman_event_callback: Node tigana[2] - state is
> now lost (was member)
> Jan 10 11:06:33 [19261] orana   crmd: info: peer_update_callback:
>  tigana is now lost (was member)
> Jan 10 11:06:33 [19261] orana   crmd:  warning: match_down_event:   No
> match for shutdown action on tigana
> Jan 10 11:06:33 [19261] orana   crmd:   notice: peer_update_callback:
>  Stonith/shutdown of tigana not matched
> Jan 10 11:06:33 [19261] orana   crmd: info: crm_update_peer_join:
>  peer_update_callback: Node tigana[2] - join-2 phase 4 -> 0
> Jan 10 11:06:33 [19261] orana   crmd: info:
> abort_transition_graph: Transition aborted: Node failure
> (source=peer_update_callback:240, 1)
> Jan 10 11:06:33 corosync [CPG   ] chosen downlist: sender r(0) ip(7.7.7.1)
> ; members(old:2 left:1)
> ++
> 
> this is the logs from standby node(new active).
> kernel panic was triggered at 11:06:00 at the other node and here totem
> change is reported at 11:06:31.
> 
> 30 secs is the cluster recheck timer.
> 
> Regards,
> Ashutosh
> 
> 
> On Wed, Jan 10, 2018 at 3:12 PM, <users-requ...@clusterlabs.org> wrote:
> 
>> Send Users mailing list submissions to
>> users@clusterlabs.org 
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>> or, via email, send a message with subject or body 'help' to
>> users-requ...@clusterlabs.org 
>>
>> You can reach the person managing the list at
>> users-ow...@clusterlabs.org 
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Users digest..."
>>
>>
>> Today's Topics:
>>
>>1. corosync taking almost 30 secs to detect node failure in case
>>   of kernel panic (ashutosh tiwari)
>>2. Antw: corosync taking almost 30 secs to detect node failure
>>   in case of kernel panic (Ulrich Windl)
>>3. pacemaker reports monitor timeout while CPU is high (???)
>>
>>
>> --
>>
>> Message: 1
>> Date: Wed, 10 Jan 2018 12:43:46 +0530
>> From: ashutosh tiwari <ashutosh.k...@gmail.com>
>> To: users@clusterlabs.org 
>> Subject: [ClusterLabs] corosync taking almost 30 secs to detect node
>> failure in case of kernel panic
>> Message-ID:
>> <CA+vEgjiKG_VGegT7Q+wCqn6merFNrvegiQs+RHRuxzE=muVb
>> 3...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> We have two node cluster running in active/standby mode and having IPMI
>> fencing configured.
>>
>> In case of kernel panic at Active node, standby node is detecting node
>> failure in around 30 secs which leads to delay in standby node taking the
>> active role.
>>
>> we have totem token timeout as 1 msecs.
>> Please let us know in case there is any more configuration controlling
>> membership detection.
>>
>> s/w versions.
>>
>> centos 6.7
>> corosync-1.4.7-5.el6.x86_64
>> pacemaker-1.1.14-8.el6.x86_64
>>
>> Thanks and Regards,
>> Ashutosh Tiwari
>> -- next part --
>> An HTML attachment was scrubbed...
>> URL: <http://lists.clusterlabs.org/pipermail/users/attachments/ 
>> 20180110/235f148d/attachment-0001.html>
>>
>> --
>>
>> Message: 2
>> Date: Wed, 10 Jan 2018 08:32:16 +0100
>> From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
>> To: <users@clusterlabs.org>
>> Subject: [ClusterLabs] Antw: corosync taking almost 30 secs to detect
>> node failure in case of kernel panic
>> Message-ID: <5a55c18002a100029...@gwsmtp1.uni-regensburg.de>
>> Content-Type: text/plain; charset=US-ASCII
>>
>> Hi!
>>
>> Maybe define "detecting node failure". Culkd it be your 30 seconds are
>> between detection and reaction? Logs would help here, too.
>>
>> Regards,
>> Ulrich
>>
>>
>> >>> ashutosh tiwari <ashutosh.k...@gmail.com> schrieb am 10.01.2018 um
>> 08:13 in
>> Nachricht
>> <CA+vEgjiKG_VGegT7Q+wCqn6merFNrvegiQs+RHRuxzE=muv...@mail.gmail.com>:
>> > Hi,
>> >
>> > We have two node cluster run

[ClusterLabs] Antw: corosync taking almost 30 secs to detect node failure in case of kernel panic

2018-01-10 Thread ashutosh tiwari
he logs from standby node(new active).
kernel panic was triggered at 11:06:00 at the other node and here totem
change is reported at 11:06:31.

30 secs is the cluster recheck timer.

Regards,
Ashutosh


On Wed, Jan 10, 2018 at 3:12 PM, <users-requ...@clusterlabs.org> wrote:

> Send Users mailing list submissions to
> users@clusterlabs.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@clusterlabs.org
>
> You can reach the person managing the list at
> users-ow...@clusterlabs.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
>
>
> Today's Topics:
>
>1. corosync taking almost 30 secs to detect node failure in case
>   of kernel panic (ashutosh tiwari)
>2. Antw: corosync taking almost 30 secs to detect node failure
>   in case of kernel panic (Ulrich Windl)
>3. pacemaker reports monitor timeout while CPU is high (???)
>
>
> --
>
> Message: 1
> Date: Wed, 10 Jan 2018 12:43:46 +0530
> From: ashutosh tiwari <ashutosh.k...@gmail.com>
> To: users@clusterlabs.org
> Subject: [ClusterLabs] corosync taking almost 30 secs to detect node
> failure in case of kernel panic
> Message-ID:
> <CA+vEgjiKG_VGegT7Q+wCqn6merFNrvegiQs+RHRuxzE=muVb
> 3...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> We have two node cluster running in active/standby mode and having IPMI
> fencing configured.
>
> In case of kernel panic at Active node, standby node is detecting node
> failure in around 30 secs which leads to delay in standby node taking the
> active role.
>
> we have totem token timeout as 1 msecs.
> Please let us know in case there is any more configuration controlling
> membership detection.
>
> s/w versions.
>
> centos 6.7
> corosync-1.4.7-5.el6.x86_64
> pacemaker-1.1.14-8.el6.x86_64
>
> Thanks and Regards,
> Ashutosh Tiwari
> -- next part --
> An HTML attachment was scrubbed...
> URL: <http://lists.clusterlabs.org/pipermail/users/attachments/
> 20180110/235f148d/attachment-0001.html>
>
> --
>
> Message: 2
> Date: Wed, 10 Jan 2018 08:32:16 +0100
> From: "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de>
> To: <users@clusterlabs.org>
> Subject: [ClusterLabs] Antw: corosync taking almost 30 secs to detect
> node failure in case of kernel panic
> Message-ID: <5a55c18002a100029...@gwsmtp1.uni-regensburg.de>
> Content-Type: text/plain; charset=US-ASCII
>
> Hi!
>
> Maybe define "detecting node failure". Culkd it be your 30 seconds are
> between detection and reaction? Logs would help here, too.
>
> Regards,
> Ulrich
>
>
> >>> ashutosh tiwari <ashutosh.k...@gmail.com> schrieb am 10.01.2018 um
> 08:13 in
> Nachricht
> <CA+vEgjiKG_VGegT7Q+wCqn6merFNrvegiQs+RHRuxzE=muv...@mail.gmail.com>:
> > Hi,
> >
> > We have two node cluster running in active/standby mode and having IPMI
> > fencing configured.
> >
> > In case of kernel panic at Active node, standby node is detecting node
> > failure in around 30 secs which leads to delay in standby node taking the
> > active role.
> >
> > we have totem token timeout as 1 msecs.
> > Please let us know in case there is any more configuration controlling
> > membership detection.
> >
> > s/w versions.
> >
> > centos 6.7
> > corosync-1.4.7-5.el6.x86_64
> > pacemaker-1.1.14-8.el6.x86_64
> >
> > Thanks and Regards,
> > Ashutosh Tiwari
>
>
>
>
> --
>
> Message: 3
> Date: Wed, 10 Jan 2018 09:40:51 +
> From: ??? <fanguot...@highgo.com>
> To: Cluster Labs - All topics related to open-source clustering
> welcomed<users@clusterlabs.org>
> Subject: [ClusterLabs] pacemaker reports monitor timeout while CPU is
> high
> Message-ID: <4dc98a5d9be144a78fb9a18721743...@ex01.highgo.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> This issue only appears when we run performance test and the CPU is high.
> The cluster and log is as below. The Pacemaker will restart the Slave Side
> pgsql-ha resource about every two minutes.
>
> Take the following scenario for example:?when the pgsqlms RA is called, we
> print the log ?execute th

[ClusterLabs] Antw: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread Ulrich Windl
Hi!

I only can talk for myself: In former times with HP-UX, we had severe
performance problems when the load was in the range of 8 to 14 (I/O waits not
included, average for all logical CPUs), while in Linux we are getting problems
with a load above 40 (or so) (I/O included, sum of all logical CPUs (which are
24)). Also I/O waits cause cluster timeouts before CPU load actually matters
(for us).
So with a load above 400 (not knowing your number of CPUs) it should not be
that unusual. What is the number of threads in your system at that time?
It might be worth the efforts binding the cluster processes to specific CPUs
and keep other tasks away from those, but I don't have experience with that.
I guess the "High CPU load detected" message triggers some internal suspend in
the cluster engine (assuming the cluster engine caused the high load). Of
course for "external " load the measure won't help...

Regards,
Ulrich


>>> ???  schrieb am 10.01.2018 um 10:40 in Nachricht
<4dc98a5d9be144a78fb9a18721743...@ex01.highgo.com>:
> Hello,
> 
> This issue only appears when we run performance test and the CPU is high. 
> The cluster and log is as below. The Pacemaker will restart the Slave Side 
> pgsql-ha resource about every two minutes.
> 
> Take the following scenario for example:(when the pgsqlms RA is called, we 
> print the log “execute the command start (command)”. When the command is

> returned, we print the log “execute the command stop (Command)
(result)”)
> 
> 1. We could see that pacemaker call “pgsqlms monitor” about every 15

> seconds. And it return $OCF_SUCCESS
> 
> 2. In calls monitor command again at 13:56:16, and then it reports 
> timeout error error 13:56:18. It is only 2 seconds but it reports 
> “timeout=1ms”
> 
> 3. In other logs, sometimes after 15 minutes, there is no “execute the

> command start monitor” printed and it reports timeout error directly.
> 
> Could you please tell how to debug or resolve such issue?
> 
> The log:
> 
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command start

> monitor
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role start
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role stop 0
> Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command stop 
> monitor 0
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command start

> monitor
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role start
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop 0
> Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command stop 
> monitor 0
> Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU load detected: 
> 426.77
> Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command start

> monitor
> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000 process (PID

> 5606) timed out
> Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606 - timed

> out after 1ms
> Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation for 
> pgsqld on db2: Timed Out | call=102 key=pgsqld_monitor_16000
timeout=1ms
> Jan 10 13:56:18 sds2 crmd[26096]:  notice: db2-pgsqld_monitor_16000:102 [ 
> /tmp:5432 - accepting connections\n ]
> Jan 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
> origin=abort_transition_graph
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing failed op monitor 
> for pgsqld:0 on db2: unknown error (1)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing failed op start for

> pgsqld:1 on db1: unknown error (1)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1

> after 100 failures (max=100)
> Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1

> after 100 failures (max=100)
> Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover pgsqld:0#011(Slave 
> db2)
> Jan 10 13:56:19 sds2 pengine[26095]:  notice: Calculated transition 37, 
> saving inputs in /var/lib/pacemaker/pengine/pe-input-1251.bz2
> 
> 
> The Cluster Configuration:
> 2 nodes and 13 resources configured
> 
> Online: [ db1 db2 ]
> 
> Full list of resources:
> 
> Clone Set: dlm-clone [dlm]
>  Started: [ db1 db2 ]
> Clone Set: clvmd-clone [clvmd]
>  Started: [ db1 db2 ]
> ipmi_node1 (stonith:fence_ipmilan):Started db2
> ipmi_node2 (stonith:fence_ipmilan):Started db1
> Clone Set: clusterfs-clone [clusterfs]
>  Started: [ db1 db2 ]
> Master/Slave Set: pgsql-ha [pgsqld]>
> 
>   Masters: [ db1 ]
> 
> Slaves: [ db2 ]
> Resource Group: mastergroup
>  db1-vip(ocf::heartbeat:IPaddr2):   Started
>  rep-vip(ocf::heartbeat:IPaddr2):   Started
> Resource Group: slavegroup
>  db2-vip(ocf::heartbeat:IPaddr2):   Started
> 
> 
> pcs resource show pgsql-ha
> Master: pgsql-ha
>   Meta Attrs: 

[ClusterLabs] pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread 范国腾
Hello,

This issue only appears when we run performance test and the CPU is high. The 
cluster and log is as below. The Pacemaker will restart the Slave Side pgsql-ha 
resource about every two minutes.

Take the following scenario for example:(when the pgsqlms RA is called, we 
print the log “execute the command start (command)”. When the command is 
returned, we print the log “execute the command stop (Command) (result)”)

1. We could see that pacemaker call “pgsqlms monitor” about every 15 
seconds. And it return $OCF_SUCCESS

2. In calls monitor command again at 13:56:16, and then it reports timeout 
error error 13:56:18. It is only 2 seconds but it reports “timeout=1ms”

3. In other logs, sometimes after 15 minutes, there is no “execute the 
command start monitor” printed and it reports timeout error directly.

Could you please tell how to debug or resolve such issue?

The log:

Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command start 
monitor
Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role start
Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: _confirm_role stop 0
Jan 10 13:55:35 sds2 pgsqlms(pgsqld)[5240]: INFO: execute the command stop 
monitor 0
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command start 
monitor
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role start
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: _confirm_role stop 0
Jan 10 13:55:52 sds2 pgsqlms(pgsqld)[5477]: INFO: execute the command stop 
monitor 0
Jan 10 13:56:02 sds2 crmd[26096]:  notice: High CPU load detected: 426.77
Jan 10 13:56:16 sds2 pgsqlms(pgsqld)[5606]: INFO: execute the command start 
monitor
Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000 process (PID 
5606) timed out
Jan 10 13:56:18 sds2 lrmd[26093]: warning: pgsqld_monitor_16000:5606 - timed 
out after 1ms
Jan 10 13:56:18 sds2 crmd[26096]:   error: Result of monitor operation for 
pgsqld on db2: Timed Out | call=102 key=pgsqld_monitor_16000 timeout=1ms
Jan 10 13:56:18 sds2 crmd[26096]:  notice: db2-pgsqld_monitor_16000:102 [ 
/tmp:5432 - accepting connections\n ]
Jan 10 13:56:18 sds2 crmd[26096]:  notice: State transition S_IDLE -> 
S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL 
origin=abort_transition_graph
Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing failed op monitor for 
pgsqld:0 on db2: unknown error (1)
Jan 10 13:56:19 sds2 pengine[26095]: warning: Processing failed op start for 
pgsqld:1 on db1: unknown error (1)
Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1 
after 100 failures (max=100)
Jan 10 13:56:19 sds2 pengine[26095]: warning: Forcing pgsql-ha away from db1 
after 100 failures (max=100)
Jan 10 13:56:19 sds2 pengine[26095]:  notice: Recover pgsqld:0#011(Slave db2)
Jan 10 13:56:19 sds2 pengine[26095]:  notice: Calculated transition 37, saving 
inputs in /var/lib/pacemaker/pengine/pe-input-1251.bz2


The Cluster Configuration:
2 nodes and 13 resources configured

Online: [ db1 db2 ]

Full list of resources:

Clone Set: dlm-clone [dlm]
 Started: [ db1 db2 ]
Clone Set: clvmd-clone [clvmd]
 Started: [ db1 db2 ]
ipmi_node1 (stonith:fence_ipmilan):Started db2
ipmi_node2 (stonith:fence_ipmilan):Started db1
Clone Set: clusterfs-clone [clusterfs]
 Started: [ db1 db2 ]
Master/Slave Set: pgsql-ha [pgsqld]>

  Masters: [ db1 ]

Slaves: [ db2 ]
Resource Group: mastergroup
 db1-vip(ocf::heartbeat:IPaddr2):   Started
 rep-vip(ocf::heartbeat:IPaddr2):   Started
Resource Group: slavegroup
 db2-vip(ocf::heartbeat:IPaddr2):   Started


pcs resource show pgsql-ha
Master: pgsql-ha
  Meta Attrs: interleave=true notify=true
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/local/pgsql/bin pgdata=/home/postgres/data
   Operations: start interval=0s timeout=160s (pgsqld-start-interval-0s)
   stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
   promote interval=0s timeout=130s (pgsqld-promote-interval-0s)
   demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
   monitor interval=15s role=Master timeout=10s 
(pgsqld-monitor-interval-15s)
   monitor interval=16s role=Slave timeout=10s 
(pgsqld-monitor-interval-16s)
   notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org