[ClusterLabs] Re: Changes coming in Pacemaker 2.0.0

2018-01-10 Thread Ulrich Windl
Hi! On the tool changes, I'd prefer --move and --un-move as a pair over --move and --clear ("clear" is less expressive IMHO). On "--reprobe -> --refresh": Why not simply "--probe"? On "--crm_xml -> --xml-text": Why not simply "--xml" (XML IS text)? Regards, Ulrich >>> Ken Gaillot
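
A minimal sketch of the command pair under discussion (the resource name "my_rsc" is a placeholder; --clear is the Pacemaker 2.0 spelling of the old --un-move):

    # Move a resource off its current node, which creates a location
    # constraint, then remove that constraint again afterwards:
    crm_resource --resource my_rsc --move
    crm_resource --resource my_rsc --clear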

[ClusterLabs] Re: Re: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread 范国腾
Ulrich, thank you very much for the help. When we run the performance test, our application (pgsql-ha) starts more than 500 processes to handle client requests. Could this be the cause of the issue? Is there any workaround or method to keep Pacemaker from restarting the resource in this situation?

[ClusterLabs] Re: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread 范国腾
Thank you, Ken. We have set the timeout to 10 seconds, but a timeout is reported after only 2 seconds, so setting a higher timeout does not seem to help. Our application, which is managed by Pacemaker, starts more than 500 processes during the performance test. Could that affect the result?
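
For context, a minimal sketch of where such a monitor timeout is declared in the CIB (the resource and operation IDs here are hypothetical):

    # Hypothetical operation entry for the pgsql resource; the timeout
    # attribute is what a higher monitor timeout would change (30s here
    # as an example value):
    cibadmin --modify --xml-text \
        '<op id="pgsqld-monitor-10s" name="monitor" interval="10s" timeout="30s"/>'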

Re: [ClusterLabs] Changes coming in Pacemaker 2.0.0

2018-01-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jan 2018 16:10:50 -0600 Ken Gaillot wrote: > Pacemaker 2.0 will be a major update whose main goal is to remove > support for deprecated, legacy syntax, in order to make the code base > more maintainable into the future. There will also be some changes to > default

Re: [ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?

2018-01-10 Thread Jehan-Guillaume de Rorthais
On Wed, 10 Jan 2018 12:23:59 -0600 Ken Gaillot wrote: ... > My question is: has anyone used or tested this, or is anyone interested > in this? We won't promote it to the default schema unless it is tested. > > My feeling is that it is more likely to be confusing than

Re: [ClusterLabs] Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-10 Thread Adam Spiers
Ken Gaillot wrote: The initial proposal, after discussion at last year's summit, was to use /var/log/cluster/pacemaker.log instead. That turned out to be slightly problematic: it broke some regression tests in a way that wasn't easily fixable, and more significantly, it

[ClusterLabs] Choosing between Pacemaker 1.1 and Pacemaker 2.0

2018-01-10 Thread Ken Gaillot
Distribution packagers and users who build Pacemaker themselves will need to choose between staying on the 1.1 line or moving to 2.0. A new wiki page lists factors to consider: https://wiki.clusterlabs.org/wiki/Choosing_Between_Pacemaker_1.1_and_2.0 -- Ken Gaillot

[ClusterLabs] Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-10 Thread Ken Gaillot
Starting with Pacemaker 2.0.0, the Pacemaker detail log will be kept by default in /var/log/pacemaker/pacemaker.log (rather than /var/log/pacemaker.log). This will keep /var/log cleaner. Pacemaker will still prefer any log file specified in corosync.conf. The initial proposal, after discussion
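
A sketch of the corosync.conf setting that takes precedence (the path and values are examples, not recommendations):

    # Excerpt from corosync.conf: if a log file is named here, Pacemaker
    # keeps using it instead of the new default location.
    logging {
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
    }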

[ClusterLabs] Coming in Pacemaker 2.0.0: Reliable exit codes

2018-01-10 Thread Ken Gaillot
Every time you run a command on the command line or in a script, it returns an exit status. These are most useful in scripts to check for errors. Currently, Pacemaker daemons and command-line tools return an unreliable mishmash of exit status codes, sometimes including negative numbers (which get
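
As an illustration, a minimal shell sketch of the pattern such scripts rely on (the resource name is a placeholder; with pre-2.0 tools the status values themselves are the unreliable part):

    # Run a query, then branch on its exit status; by convention 0
    # means success and any other value means some kind of failure.
    crm_resource --resource my_rsc --locate
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "crm_resource failed with exit status $rc" >&2
    fi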

[ClusterLabs] Changes coming in Pacemaker 2.0.0

2018-01-10 Thread Ken Gaillot
Pacemaker 2.0 will be a major update whose main goal is to remove support for deprecated, legacy syntax, in order to make the code base more maintainable into the future. There will also be some changes to default configuration behavior, and the command-line tools. I'm hoping to release the first

[ClusterLabs] Does anyone use clone instance constraints from pacemaker-next schema?

2018-01-10 Thread Ken Gaillot
The pacemaker-next schema contains experimental features for testing before potential release. To use these features, someone must explicitly set validate-with in their configuration to pacemaker-next (or its legacy alias, pacemaker-1.1). There is a feature that has been hanging around in there
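
For anyone willing to test it, a sketch of how a configuration opts in (sensible only on a test cluster):

    # Point validate-with at the experimental schema; cibadmin matches
    # the <cib> element and updates the attribute in place.
    cibadmin --modify --xml-text '<cib validate-with="pacemaker-next"/>'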

Re: [ClusterLabs] Re: Resource Demote Time Out Question

2018-01-10 Thread Ken Gaillot
On Wed, 2018-01-10 at 16:48 +0100, Ulrich Windl wrote: > Hi! > > Common pitfall: The default parameters in the RA's metadata are not > the defaults applied when you don't specify a value; they are merely > suggestions for you when configuring (don't ask me why!). > Instead, a global

Re: [ClusterLabs] corosync taking almost 30 secs to detect node failure in case of kernel panic

2018-01-10 Thread Ken Gaillot
On Wed, 2018-01-10 at 12:43 +0530, ashutosh tiwari wrote: > Hi, > > We have two node cluster running in active/standby mode and having > IPMI fencing configured. Be aware that using on-board IPMI as the only fencing method is problematic -- if the host loses power, the IPMI will not respond, and
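
A sketch (pcs syntax; node and device names are placeholders) of adding a second fencing level so IPMI is not the only method:

    # Try the IPMI device first; if it cannot be reached (e.g. because
    # the host lost power entirely), fall back to a second device.
    pcs stonith level add 1 node1 fence_ipmi_node1
    pcs stonith level add 2 node1 fence_backup_node1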

[ClusterLabs] Re: Resource Demote Time Out Question

2018-01-10 Thread Ulrich Windl
Hi! Common pitfall: The default parameters in the RA's metadata are not the defaults applied when you don't specify a value; they are merely suggestions for you when configuring (don't ask me why!). Instead, a global default timeout is used when you don't specify one. I
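
A sketch of that global default (crm shell one-liner; the value is an example only):

    # Set the cluster-wide fallback timeout that applies whenever an
    # operation does not declare its own timeout:
    crm configure op_defaults timeout=60s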

[ClusterLabs] Resource Demote Time Out Question

2018-01-10 Thread Marc Smith
Hi, I'm experiencing a time out on a demote operation, and I'm not sure which parameter / attribute needs to be updated to extend the time-out window. I'm using Pacemaker 1.1.16 and Corosync 2.4.2. Here is the set of log lines that show the issue (shutdown initiated, then demote time out after
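
For reference, a sketch of the per-operation setting involved (IDs and values are placeholders):

    # Give the demote action its own, longer timeout on the resource;
    # this is the attribute to raise when demote times out at shutdown.
    cibadmin --modify --xml-text \
        '<op id="my-ms-demote-0" name="demote" interval="0s" timeout="120s"/>'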

[ClusterLabs] Re: Re: corosync taking almost 30 secs to detect node failure in case of kernel panic

2018-01-10 Thread Ulrich Windl
running in active/standby mode and having IPMI >> fencing configured. >> >> In case of kernel panic at Active node, standby node is detecting node >> failure in around 30 secs which leads to delay in standby node taking the >> active role. >> >> we have totem token timeou
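
The setting under discussion, as a corosync.conf excerpt (value in milliseconds, example only):

    # The totem token timeout bounds how long corosync waits for the
    # token before declaring a node failed: larger values tolerate load
    # spikes better but slow down failure detection.
    totem {
        version: 2
        token: 10000
    }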

[ClusterLabs] Re: corosync taking almost 30 secs to detect node failure in case of kernel panic

2018-01-10 Thread ashutosh tiwari
> centos 6.7 > corosync-1.4.7-5.el6.x86_64 > pacemaker-1.1.14-8.el6.x86_64 > > Thanks and Regards, > Ashutosh Tiwari

[ClusterLabs] Re: pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread Ulrich Windl
Hi! I can only speak for myself: in former times with HP-UX, we had severe performance problems when the load was in the range of 8 to 14 (I/O waits not included, average over all logical CPUs), while on Linux we only get problems with a load above 40 (or so) (I/O included, sum over all logical

[ClusterLabs] pacemaker reports monitor timeout while CPU is high

2018-01-10 Thread 范国腾
Hello, this issue only appears when we run the performance test and the CPU load is high. The cluster configuration and log are below. Pacemaker restarts the slave-side pgsql-ha resource about every two minutes. Take the following scenario for example: (when the pgsqlms RA is called, we print the log