Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?
"Lentes, Bernd" writes:

> On May 8, 2017, at 9:20 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote:
>
>> Hi,
>>
>> I remember that digimer often campaigns for a fence delay in a 2-node cluster.
>> E.g. here:
>> http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html
>> In my eyes it makes sense, so I am trying to set that up. I have two HP servers, each with an ILO card.
>> I have to use the stonith:external/ipmi agent; the stonith:external/riloe agent refused to work.
>>
>> But I don't have a delay parameter there.
>> crm ra info stonith:external/ipmi:
>>
>> ...
>> pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and specify the maximum of random delay
>>     This prevents double fencing when using slow devices such as sbd.
>>     Use this to enable random delay for stonith actions and specify the maximum of random delay.
>> ...
>>
>> This is the only delay parameter I can use. But a random delay does not seem to be a reliable solution.
>>
>> The stonith:ipmilan agent also provides just a random delay. Same with the riloe agent.
>>
>> How did anyone solve this problem?
>>
>> Or do I have to edit the RA (I will get practice in that :-))?
>
> crm ra info stonith:external/ipmi says there exists a parameter pcmk_delay_max.
> Having a look in /usr/lib64/stonith/plugins/external/ipmi, I don't find anything about delay.
> Also "crm_resource --show-metadata=stonith:external/ipmi" does not say anything about a delay.
>
> Is this "pcmk_delay_max" not implemented? From where does "crm ra info stonith:external/ipmi" get this info?

pcmk_delay_max is implemented by Pacemaker. crmsh gets the information about available parameters by querying stonithd directly.

Cheers,
Kristoffer

> Bernd
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

--
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
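Bernd's worry that a purely random delay is not a reliable tie-breaker can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Pacemaker's implementation: each node is assumed to draw its delay uniformly from [0, pcmk_delay_max], and `fence_time` is a hypothetical number of seconds a fence action needs before the victim is really off. If the two delays differ by less than that, both nodes may still shoot each other.

```python
import random


def double_fence_probability(delay_max, fence_time, trials=100_000, seed=0):
    """Estimate how often two nodes with independent random delays
    fire within `fence_time` seconds of each other.

    Assumptions (not taken from Pacemaker's source): each delay is drawn
    uniformly from [0, delay_max], and `fence_time` is a hypothetical
    duration a fence action needs to take effect.
    """
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        d1 = rng.uniform(0, delay_max)
        d2 = rng.uniform(0, delay_max)
        if abs(d1 - d2) < fence_time:
            collisions += 1
    return collisions / trials


def fixed_delay_collides(delay_a, delay_b, fence_time):
    """With fixed per-node delays the outcome is deterministic."""
    return abs(delay_a - delay_b) < fence_time


if __name__ == "__main__":
    p = double_fence_probability(delay_max=15, fence_time=3)
    print(f"random delay: estimated double-fence probability {p:.2f}")
    print("fixed delays 0s vs 15s collide:", fixed_delay_collides(0, 15, 3))
```

Under these assumptions the random variant leaves a non-trivial chance that both nodes fire close together, while a dedicated fixed delay on one node, which is what the question asks for, always gives the same winner.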
Re: [ClusterLabs] Pacemaker occasionally takes minutes to respond
Actually I found some more details:

There are two resources, A and B. Resource B depends on resource A (when the RA monitors B, it will fail if A is not running properly).

If I stop resource A, the next monitor operation of "B" will fail. Interestingly, this check happens immediately after A is stopped.

B is configured to restart if monitor fails. Start timeout is rather long, 180 seconds. So pacemaker tries to restart B, and waits.

If I want to start "A", nothing happens until the start operation of "B" fails - typically several minutes.

Is this the right behavior? It appears that pacemaker is blocked while resource B is being started, and I cannot really start its dependency... Shouldn't it be possible to start a resource while another resource is also starting?

Thanks,
Attila

From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Tuesday, May 9, 2017 9:53 PM
To: users@clusterlabs.org; kgail...@redhat.com
Subject: [ClusterLabs] Pacemaker occasionally takes minutes to respond

Hi Ken, all,

We ran into an issue very similar to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1430112 / [Intel 7.4 Bug] Pacemaker occasionally takes minutes to respond

But in our case we are not using fencing/stonith at all.

Many times when I want to start/stop/cleanup a resource, it takes tens of seconds (or even minutes) till the command gets executed. The logs show nothing in that period, the redundant rings show no fault.

Could this be the same issue?

Any hints on how to troubleshoot this?

It is pacemaker 1.1.10, corosync 2.3.3

Cheers,
Attila
[ClusterLabs] Pacemaker occasionally takes minutes to respond
Hi Ken, all,

We ran into an issue very similar to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1430112 / [Intel 7.4 Bug] Pacemaker occasionally takes minutes to respond

But in our case we are not using fencing/stonith at all.

Many times when I want to start/stop/cleanup a resource, it takes tens of seconds (or even minutes) till the command gets executed. The logs show nothing in that period, the redundant rings show no fault.

Could this be the same issue?

Any hints on how to troubleshoot this?

It is pacemaker 1.1.10, corosync 2.3.3

Cheers,
Attila
Re: [ClusterLabs] cloned resources ordering and remote nodes problem
On 04/13/2017 08:49 AM, Radoslaw Garbacz wrote:
> Thank you, however in my case this parameter does not change the described behavior.
>
> I have a more detailed example:
> order: res_A-clone -> res_B-clone -> res_C
> When "res_C" is not on the node where the "res_A" instance failed, it will not be restarted; only all instances of "res_A" and "res_B" will.
>
> I implemented a workaround by modifying "res_C": I made it cloned as well, and now it is restarted.
>
> My Pacemaker: 1.1.16-1.el6
> System: CentOS 6

I haven't been able to reproduce this. Can you attach a configuration file that exhibits the problem?
Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?
On May 8, 2017, at 9:20 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote:

> Hi,
>
> I remember that digimer often campaigns for a fence delay in a 2-node cluster.
> E.g. here:
> http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html
> In my eyes it makes sense, so I am trying to set that up. I have two HP servers, each with an ILO card.
> I have to use the stonith:external/ipmi agent; the stonith:external/riloe agent refused to work.
>
> But I don't have a delay parameter there.
> crm ra info stonith:external/ipmi:
>
> ...
> pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and specify the maximum of random delay
>     This prevents double fencing when using slow devices such as sbd.
>     Use this to enable random delay for stonith actions and specify the maximum of random delay.
> ...
>
> This is the only delay parameter I can use. But a random delay does not seem to be a reliable solution.
>
> The stonith:ipmilan agent also provides just a random delay. Same with the riloe agent.
>
> How did anyone solve this problem?
>
> Or do I have to edit the RA (I will get practice in that :-))?

crm ra info stonith:external/ipmi says there exists a parameter pcmk_delay_max.
Having a look in /usr/lib64/stonith/plugins/external/ipmi, I don't find anything about delay.
Also "crm_resource --show-metadata=stonith:external/ipmi" does not say anything about a delay.

Is this "pcmk_delay_max" not implemented? From where does "crm ra info stonith:external/ipmi" get this info?

Bernd

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671
Re: [ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)
On 09/05/17 09:51 -0500, Ken Gaillot wrote:
> On 05/09/2017 02:44 AM, Handra Cs wrote:
>> I am currently trying to configure Pacemaker/Corosync. I managed to install the required packages for the cluster configuration, however I could not start the cluster service. Based on the log file, there was an issue with the directory /var/lib/pacemaker/.
>>
>> I have tried some suggestions, from checking the GID of the root user to ensuring that the folder is owned by hacluster:haclient, but unfortunately there was no luck.
>>
>> I am currently using RedHat 6.8. Thank you in advance for the help.
>
> That's odd. The 6.8 packages normally work right out of the box.
> Double-check that /var and /var/lib both exist, are owned by root, and have permissions drwxr-xr-x.

You can also check if you see any /var/lib/pacemaker entry in the output of "rpm -qV pacemaker".

Note that while pacemaker creates directories like /var/lib/pacemaker/cib with proper permissions and ownership (early enough) on startup if they don't exist yet, it won't touch these properties on subsequent starts if the dirs are present.

> Maybe try removing the packages, removing /var/lib/pacemaker, then reinstalling. If that doesn't help, open a support ticket with Red Hat.
>
>> Attached is the log file for your reference.

--
Poki
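The check Ken describes (directory exists, owned by root, permissions drwxr-xr-x) could be scripted roughly as follows. This is a minimal sketch: `check_dir` is a hypothetical helper written for illustration, not part of Pacemaker or any cluster tooling, and it only inspects the owner uid and permission bits.

```python
import os
import stat


def check_dir(path, want_uid=0, want_mode=0o755):
    """Return a list of problems found for `path` (empty list means OK).

    `check_dir` is a hypothetical helper for illustration only.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path}: does not exist"]
    problems = []
    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path}: not a directory")
    if st.st_uid != want_uid:
        problems.append(f"{path}: owner uid {st.st_uid}, expected {want_uid}")
    if stat.S_IMODE(st.st_mode) != want_mode:
        problems.append(
            f"{path}: mode {stat.filemode(st.st_mode)}, expected {want_mode:o}"
        )
    return problems


if __name__ == "__main__":
    # /var and /var/lib are the two directories named in the thread.
    for d in ("/var", "/var/lib"):
        for line in check_dir(d) or [f"{d}: ok"]:
            print(line)
```

Because Pacemaker does not correct ownership or permissions on directories that already exist, as noted above, a check like this is mainly useful before deciding whether to remove /var/lib/pacemaker and reinstall.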
Re: [ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)
On 05/09/2017 02:44 AM, Handra Cs wrote:
> Hi there,
>
> I am currently trying to configure Pacemaker/Corosync. I managed to install the required packages for the cluster configuration, however I could not start the cluster service. Based on the log file, there was an issue with the directory /var/lib/pacemaker/.
>
> I have tried some suggestions, from checking the GID of the root user to ensuring that the folder is owned by hacluster:haclient, but unfortunately there was no luck.
>
> I am currently using RedHat 6.8. Thank you in advance for the help.

That's odd. The 6.8 packages normally work right out of the box. Double-check that /var and /var/lib both exist, are owned by root, and have permissions drwxr-xr-x.

Maybe try removing the packages, removing /var/lib/pacemaker, then reinstalling. If that doesn't help, open a support ticket with Red Hat.

> Attached is the log file for your reference.
>
> Regards,
> Handra
>
> -- Try the best, do the best, be the best --
> --
> Sent from Gmail for iOS
Re: [ClusterLabs] Pacemaker 1.1.17-rc1 now available
On 05/09/2017 03:51 AM, Lars Ellenberg wrote:
> Yay!
>
> On Mon, May 08, 2017 at 07:50:49PM -0500, Ken Gaillot wrote:
>> "crm_attribute --pattern" to update or delete all node attributes matching a regular expression
>
> Just a nit, but "pattern" usually is associated with "glob pattern".
> If it's not a "pattern" but a "regex", "--regex" would be more appropriate.
>
> :-)
>
> Cheers,
>
> Lars

How about "--match", with the help text saying "regular expression"?
Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected
09.05.2017 00:56, Ken Gaillot wrote:
> [...]
> Those messages indicate there is a real issue with the CPU load. When the cluster notices high load, it reduces the number of actions it will execute at the same time. This is generally a good idea, to avoid making the load worse.
> [...]
> message, and 2.0 to get the "High CPU load" message. These are measured against the 1-minute system load average (the same number you would get with top, uptime, etc.).

Well, the Linux loadavg actually has nothing to do with *CPU* load.
https://en.wikipedia.org/wiki/Load_(computing)

The most common example to prove that is a storage system (I see that with the in-kernel iSCSI target) with dedicated data disks/arrays, where loadavg can be very high (100-200 is not uncommon), but actual CPU usage (user+system) is not more than 20%.

For such systems the load threshold plays a bad role, unnecessarily slowing down cluster reactions.

Best,
Vladislav
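The gap Vladislav describes can be observed directly: on Linux the load average counts tasks in uninterruptible sleep (typically waiting on I/O) as well as runnable tasks, so it can be high while the CPUs are mostly idle. The following is a rough sketch that compares the 1-minute load average with a CPU busy fraction computed from two /proc/stat samples. The helper names are made up for illustration, and treating iowait as "not busy" is a choice made here, not something Pacemaker does.

```python
import os


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_fraction(before, after):
    """Fraction of CPU time spent not idle between two samples.

    Counter order is user, nice, system, idle, iowait, irq, softirq, steal.
    Time in idle and iowait is counted as not busy.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    not_busy = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return (total - not_busy) / total


if __name__ == "__main__":
    import time

    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    print("1-minute load average:", os.getloadavg()[0])
    print("CPU busy fraction:", round(cpu_busy_fraction(first, second), 3))
```

On an I/O-bound storage node like the one described, one would expect the first number to be large and the second to stay small, which is the mismatch the load threshold does not account for.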
Re: [ClusterLabs] Pacemaker 1.1.17-rc1 now available
Yay!

On Mon, May 08, 2017 at 07:50:49PM -0500, Ken Gaillot wrote:
> "crm_attribute --pattern" to update or delete all node attributes matching a regular expression

Just a nit, but "pattern" usually is associated with "glob pattern".
If it's not a "pattern" but a "regex", "--regex" would be more appropriate.

:-)

Cheers,

Lars
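Why the naming matters can be seen by feeding the same string to a glob matcher and a regular-expression matcher. The sketch below uses Python's standard fnmatch and re modules purely as an illustration; the attribute names are made-up examples, and whether crm_attribute anchors its match or searches anywhere in the name is not stated in this thread, so the unanchored re.search here is an assumption.

```python
import fnmatch
import re


def glob_matches(pattern, names):
    """Names matching `pattern` interpreted as a shell-style glob."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def regex_matches(pattern, names):
    """Names matching `pattern` interpreted as a regular expression.

    Uses an unanchored search; this is an assumption for illustration.
    """
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


if __name__ == "__main__":
    # Hypothetical node attribute names, for illustration only.
    names = ["fail-count-db", "fail-count-web", "last-failure-db", "failXcount"]
    print("glob  'fail*':", glob_matches("fail*", names))
    print("regex 'fail*':", regex_matches("fail*", names))
```

As a glob, "fail*" means "starts with fail"; as a regular expression it means "fai followed by zero or more l", which also matches inside "last-failure-db". A user who assumes glob semantics could therefore update or delete more attributes than intended.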
[ClusterLabs] Fwd: Unable to start cluster (Pacemaker/Corosync)
Hi there,

I am currently trying to configure Pacemaker/Corosync. I managed to install the required packages for the cluster configuration, however I could not start the cluster service. Based on the log file, there was an issue with the directory /var/lib/pacemaker/.

I have tried some suggestions, from checking the GID of the root user to ensuring that the folder is owned by hacluster:haclient, but unfortunately there was no luck.

I am currently using RedHat 6.8. Thank you in advance for the help.

Attached is the log file for your reference.

Regards,
Handra

-- Try the best, do the best, be the best --
--
Sent from Gmail for iOS