Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-09 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> - On May 8, 2017, at 9:20 PM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
>
>> Hi,
>> 
>> i remember that digimer often campaigns for a fence delay in a 2-node
>> cluster. E.g. here:
>> http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html
>> In my eyes it makes sense, so i try to establish that. I have two HP servers,
>> each with an ILO card.
>> I have to use the stonith:external/ipmi agent, the stonith:external/riloe
>> refused to work.
>> 
>> But i don't have a delay parameter there.
>> crm ra info stonith:external/ipmi:
>> 
>> ...
>> pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and
>> specify the maximum of random delay
>>     This prevents double fencing when using slow devices such as sbd.
>>     Use this to enable random delay for stonith actions and specify the
>>     maximum of random delay.
>> ...
>> 
>> This is the only delay parameter i can use. But a random delay does not
>> seem to be a reliable solution.
>> 
>> The stonith:ipmilan agent also provides just a random delay. Same with
>> the riloe agent.
>> 
>> How did anyone solve this problem ?
>> 
>> Or do i have to edit the RA (I will get practice in that :-))?
>> 
>> 
>
> crm ra info stonith:external/ipmi says there exists a parameter 
> pcmk_delay_max.
> Having a look in  /usr/lib64/stonith/plugins/external/ipmi i don't find 
> anything about delay.
> Also "crm_resource --show-metadata=stonith:external/ipmi" does not say 
> anything about a delay.
>
> Is this "pcmk_delay_max" not implemented ? From where does "crm ra info 
> stonith:external/ipmi" get this info ?
>

pcmk_delay_max is implemented by Pacemaker. crmsh gets the information
about available parameters by querying stonithd directly.
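
To make the concern about a purely random delay concrete, here is a
minimal sketch in Python. It is not Pacemaker code. It assumes both nodes
draw their delay independently and uniformly between 0 and pcmk_delay_max,
and that both fence operations succeed when the two delays differ by less
than the time the fence device needs to power off the peer. The name
fence_latency and the example numbers are hypothetical.

```python
def double_fence_probability(delay_max, fence_latency):
    """Chance that two nodes fence each other under the stated assumptions.

    delay_max:     upper bound of the random delay, in seconds
    fence_latency: hypothetical time the device needs to kill the peer
    """
    if delay_max <= 0:
        # No delay at all: both nodes fire at the same moment.
        return 1.0
    window = min(fence_latency, delay_max)
    # For two independent uniform draws on [0, delay_max],
    # P(|d1 - d2| < window) = 1 - (1 - window / delay_max) ** 2
    return 1.0 - (1.0 - window / delay_max) ** 2


# Example with made-up numbers: 15s maximum delay, 2s device latency.
print(f"{double_fence_probability(15.0, 2.0):.3f}")
```

Under these assumptions a larger maximum lowers the risk but never brings
it to zero, which is why a fixed delay on one node is often preferred in
2-node clusters.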

Cheers,
Kristoffer

>
> Bernd
>  
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker occasionally takes minutes to respond

2017-05-09 Thread Attila Megyeri
Actually I found some more details:

there are two resources: A and B

resource B depends on resource A (when the RA monitors B, it will fail if A is 
not running properly)

If I stop resource A, the next monitor operation of "B" will fail. 
Interestingly, this check happens immediately after A is stopped.

B is configured to restart if monitor fails. Start timeout is rather long, 180 
seconds. So pacemaker tries to restart B, and waits.

If I want to start "A", nothing happens until the start operation of "B" fails 
- typically several minutes.


Is this the right behavior?
It appears that pacemaker is blocked while resource B is being started, and I 
cannot really start its dependency...
Shouldn't it be possible to start a resource while another resource is also 
starting?


Thanks,
Attila


From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Tuesday, May 9, 2017 9:53 PM
To: users@clusterlabs.org; kgail...@redhat.com
Subject: [ClusterLabs] Pacemaker occasionally takes minutes to respond

Hi Ken, all,


We ran into an issue very similar to the one described in 
https://bugzilla.redhat.com/show_bug.cgi?id=1430112 /  [Intel 7.4 Bug] 
Pacemaker occasionally takes minutes to respond

But  in our case we are not using fencing/stonith at all.

Many times when I want to start/stop/cleanup a resource, it takes tens of 
seconds (or even minutes) till the command gets executed. The logs show nothing 
in that period, the redundant rings show no fault.

Could this be the same issue?

Any hints on how to troubleshoot this?
It is  pacemaker 1.1.10, corosync 2.3.3


Cheers,
Attila





[ClusterLabs] Pacemaker occasionally takes minutes to respond

2017-05-09 Thread Attila Megyeri
Hi Ken, all,


We ran into an issue very similar to the one described in 
https://bugzilla.redhat.com/show_bug.cgi?id=1430112 /  [Intel 7.4 Bug] 
Pacemaker occasionally takes minutes to respond

But  in our case we are not using fencing/stonith at all.

Many times when I want to start/stop/cleanup a resource, it takes tens of 
seconds (or even minutes) till the command gets executed. The logs show nothing 
in that period, the redundant rings show no fault.

Could this be the same issue?

Any hints on how to troubleshoot this?
It is  pacemaker 1.1.10, corosync 2.3.3


Cheers,
Attila





Re: [ClusterLabs] cloned resources ordering and remote nodes problem

2017-05-09 Thread Ken Gaillot
On 04/13/2017 08:49 AM, Radoslaw Garbacz wrote:
> Thank you, however in my case this parameter does not change the
> described behavior.
> 
> I have a more detailed example:
> order: res_A-clone -> res_B-clone -> res_C
> when "res_C" is not on the node where the "res_A" instance failed, it
> is not restarted; only all instances of "res_A" and "res_B" are.
> 
> I implemented a workaround by modifying "res_C": I made it a clone as
> well, and now it is restarted.
> 
> 
> My Pacemaker: 1.1.16-1.el6
> System: CentOS 6

I haven't been able to reproduce this. Can you attach a configuration
file that exhibits the problem?




Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-09 Thread Lentes, Bernd


- On May 8, 2017, at 9:20 PM, Bernd Lentes 
bernd.len...@helmholtz-muenchen.de wrote:

> Hi,
> 
> i remember that digimer often campaigns for a fence delay in a 2-node
> cluster. E.g. here:
> http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html
> In my eyes it makes sense, so i try to establish that. I have two HP servers,
> each with an ILO card.
> I have to use the stonith:external/ipmi agent, the stonith:external/riloe
> refused to work.
> 
> But i don't have a delay parameter there.
> crm ra info stonith:external/ipmi:
> 
> ...
> pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and
> specify the maximum of random delay
>     This prevents double fencing when using slow devices such as sbd.
>     Use this to enable random delay for stonith actions and specify the
>     maximum of random delay.
> ...
> 
> This is the only delay parameter i can use. But a random delay does not
> seem to be a reliable solution.
> 
> The stonith:ipmilan agent also provides just a random delay. Same with
> the riloe agent.
> 
> How did anyone solve this problem ?
> 
> Or do i have to edit the RA (I will get practice in that :-))?
> 
> 

crm ra info stonith:external/ipmi says there exists a parameter pcmk_delay_max.
Having a look in  /usr/lib64/stonith/plugins/external/ipmi i don't find 
anything about delay.
Also "crm_resource --show-metadata=stonith:external/ipmi" does not say anything 
about a delay.

Is this "pcmk_delay_max" not implemented ? From where does "crm ra info 
stonith:external/ipmi" get this info ?


Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671




Re: [ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)

2017-05-09 Thread Jan Pokorný
On 09/05/17 09:51 -0500, Ken Gaillot wrote:
> On 05/09/2017 02:44 AM, Handra Cs wrote:
>> I am currently trying to configure Pacemaker/Corosync. I managed to
>> install the required packages for the cluster configuration, however I
>> could not start the cluster service. Based on the log file, there was an
>> issue with the directory /var/lib/pacemaker/.
>> 
>> I have tried some suggestions from checking the GID of the root user and
>> ensuring the permission of the folder to be owned by hacluster:haclient,
>> unfortunately there was no luck.
>> 
>> I am currently using RedHat 6.8. Thank you in advance for the help.
> 
> That's odd. The 6.8 packages normally work right out of the box.
> Double-check that /var and /var/lib both exist, are owned by root, and
> have permissions drwxr-xr-x.

You can also check if you see any /var/lib/pacemaker entry in the
output of "rpm -qV pacemaker".  Note that while pacemaker creates
directories like /var/lib/pacemaker/cib with proper permissions and
ownership (early enough) on startup if they don't exist yet, it won't
touch these properties on subsequent starts if the dirs are present.
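
The ownership and permission checks suggested in this thread can also be
scripted. The following is a minimal sketch in Python that only reports
what is on disk. The helper names describe and is_root_owned_755 are made
up for illustration, and the path list simply repeats the directories
mentioned above.

```python
import os
import stat


def describe(path):
    """Return the ls-style mode string and numeric owner/group of a path."""
    st = os.stat(path)
    return {
        "path": path,
        "mode": stat.filemode(st.st_mode),  # e.g. "drwxr-xr-x"
        "uid": st.st_uid,
        "gid": st.st_gid,
    }


def is_root_owned_755(path):
    """True if path is a directory owned by uid 0 with mode drwxr-xr-x."""
    st = os.stat(path)
    return (
        stat.S_ISDIR(st.st_mode)
        and st.st_uid == 0
        and stat.S_IMODE(st.st_mode) == 0o755
    )


# Report on the directories discussed in this thread, if they exist.
for directory in ("/var", "/var/lib", "/var/lib/pacemaker"):
    if os.path.exists(directory):
        print(describe(directory))
```

is_root_owned_755 applies to /var and /var/lib only; /var/lib/pacemaker
and its subdirectories are expected to have different ownership
(hacluster:haclient is mentioned above), so for those the describe output
has to be compared by hand or against "rpm -qV pacemaker".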

> Maybe try removing the packages, removing /var/lib/pacemaker, then
> reinstalling. If that doesn't help, open a support ticket with Red
> Hat.
> 
>> 
>> Attached is the log file for your reference.

-- 
Poki




Re: [ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)

2017-05-09 Thread Ken Gaillot
On 05/09/2017 02:44 AM, Handra Cs wrote:
> Hi there,
> 
> I am currently trying to configure Pacemaker/Corosync. I managed to
> install the required packages for the cluster configuration, however I
> could not start the cluster service. Based on the log file, there was an
> issue with the directory /var/lib/pacemaker/.
> 
> I have tried some suggestions from checking the GID of the root user and
> ensuring the permission of the folder to be owned by hacluster:haclient,
> unfortunately there was no luck.
> 
> I am currently using RedHat 6.8. Thank you in advance for the help.

That's odd. The 6.8 packages normally work right out of the box.
Double-check that /var and /var/lib both exist, are owned by root, and
have permissions drwxr-xr-x. Maybe try removing the packages, removing
/var/lib/pacemaker, then reinstalling. If that doesn't help, open a
support ticket with Red Hat.

> 
> Attached is the log file for your reference.
> 
> Regards,
> Handra
> 
> -- Try the best, do the best, be the best --
> -- 
> Sent from Gmail for iOS Regards, Handra -- Try the best, do the best, be
> the best --




Re: [ClusterLabs] Pacemaker 1.1.17-rc1 now available

2017-05-09 Thread Ken Gaillot
On 05/09/2017 03:51 AM, Lars Ellenberg wrote:
> Yay!
> 
> On Mon, May 08, 2017 at 07:50:49PM -0500, Ken Gaillot wrote:
>> "crm_attribute --pattern" to update or delete all node
>> attributes matching a regular expression
> 
> Just a nit, but "pattern" usually is associated with "glob pattern".
> If it's not a "pattern" but a "regex",
> "--regex" would be more appropriate.
> 
>  :-)
> 
> Cheers,
> 
> Lars

How about "--match", with the help text saying "regular expression"?
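
The difference between a glob pattern and a regular expression that
prompted the naming question can be seen in a small Python sketch. It
uses Python's fnmatch and re modules only and does not model how
crm_attribute itself matches; the attribute names are hypothetical.

```python
import fnmatch
import re

# Hypothetical node attribute names, for illustration only.
names = ["fail-count-db", "fail-countdown", "last-failure-db"]

text = "fail-count-*"

# Read as a glob: "*" stands for any run of characters.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, text)]

# Read as a regular expression: "-*" means "zero or more hyphens".
regex_hits = [n for n in names if re.search(text, n)]

print("glob :", glob_hits)
print("regex:", regex_hits)
```

The same string selects different names depending on how it is read,
which is the reason the option name and help text should say which
syntax is expected.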




Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2017-05-09 Thread Vladislav Bogdanov

09.05.2017 00:56, Ken Gaillot wrote:

[...]

> Those messages indicate there is a real issue with the CPU load. When
> the cluster notices high load, it reduces the number of actions it will
> execute at the same time. This is generally a good idea, to avoid making
> the load worse.

[...]

> message, and 2.0 to get the "High CPU load" message. These are measured
> against the 1-minute system load average (the same number you would get
> with top, uptime, etc.).


Well, Linux loadavg actually has nothing to do with *CPU* load.

https://en.wikipedia.org/wiki/Load_(computing)

The most common example to prove that is a storage system (I see that 
with the in-kernel iSCSI target) with dedicated data disks/arrays, where 
loadavg can be very high (100-200 is not uncommon), but actual CPU usage 
(user+system) is not more than 20%. For such systems the load threshold 
plays a bad role, unnecessarily slowing down cluster reactions.
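
A minimal Python sketch of how CPU usage can be told apart from load
average on Linux: it computes busy time (user+nice+system) from two
samples of the aggregate "cpu" line of /proc/stat. On Linux the load
average also counts tasks in uninterruptible sleep, typically waiting
for I/O, which is how it can be high while the CPU is mostly idle. The
two sample lines below are invented to show the arithmetic; they are not
measurements from any real system.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate "cpu" line of /proc/stat."""
    values = [int(v) for v in line.split()[1:]]
    # Field order: user nice system idle iowait irq softirq steal [guest ...]
    user, nice, system = values[0], values[1], values[2]
    busy = user + nice + system
    # Guest time is already included in user/nice, so stop after "steal".
    total = sum(values[:8])
    return busy, total


def cpu_usage_percent(before, after):
    """Percentage of elapsed jiffies spent in user+nice+system."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return 100.0 * busy / total if total else 0.0


# Invented samples: most of the elapsed time is iowait, little is CPU work.
sample_before = "cpu  1000 0 500 8000 20000 0 0 0 0 0"
sample_after = "cpu  1100 0 550 8300 21050 0 0 0 0 0"

usage = cpu_usage_percent(parse_cpu_line(sample_before),
                          parse_cpu_line(sample_after))
print(f"CPU usage between samples: {usage:.1f}%")
```

On a live system the two lines would come from reading the first line of
/proc/stat twice, a second or so apart, and the result can be compared
with the first value returned by os.getloadavg().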


Best,
Vladislav




Re: [ClusterLabs] Pacemaker 1.1.17-rc1 now available

2017-05-09 Thread Lars Ellenberg
Yay!

On Mon, May 08, 2017 at 07:50:49PM -0500, Ken Gaillot wrote:
> "crm_attribute --pattern" to update or delete all node
> attributes matching a regular expression

Just a nit, but "pattern" usually is associated with "glob pattern".
If it's not a "pattern" but a "regex",
"--regex" would be more appropriate.

 :-)

Cheers,

Lars




[ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)

2017-05-09 Thread Handra Cs
Hi there,

I am currently trying to configure Pacemaker/Corosync. I managed to install
the required packages for the cluster configuration, however I could not
start the cluster service. Based on the log file, there was an issue with
the directory /var/lib/pacemaker/.

I have tried some suggestions from checking the GID of the root user and
ensuring the permission of the folder to be owned by hacluster:haclient,
unfortunately there was no luck.

I am currently using RedHat 6.8. Thank you in advance for the help.

Attached is the log file for your reference.

Regards,
Handra

-- Try the best, do the best, be the best --
-- 
Sent from Gmail for iOS Regards, Handra -- Try the best, do the best, be
the best --


corosync.log
Description: Binary data