Re: [ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?
Sam Gardner wrote:
> I'm having some trouble on a few of my clusters in which the DRBD Slave
> resource does not want to come up after a reboot until I manually run
> resource cleanup.
>
> Setting 'start-failure-is-fatal=false' as a global cluster property and a
> failure-timeout works to resolve the issue, but I don't really want the start
> failure set everywhere.
>
> While I work on figuring out why the slave resource isn't coming up, is it
> possible to set 'start-failure-is-fatal=false' only on the DRBDSlave
> resource, or does this need a patch?

No, start-failure-is-fatal is a cluster-wide setting. But IIUC you could also set migration-threshold=1 cluster-wide (i.e. in rsc_defaults), and then override it to either 0 or something higher just for this resource.

You may find this interesting reading:

https://github.com/crowbar/crowbar-ha/pull/102/commits/de94e1e42ba52c2cdb496becbd73f07bc2501871

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
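The suggestion above could be sketched roughly as follows. This is only an illustration, assuming the cluster is managed with pcs (the thread does not say which shell is in use) and using the older `pcs resource defaults name=value` form; newer pcs releases may expect a different syntax. The function name, the default threshold of 0 and the 60s failure-timeout are placeholders, not values from the thread. Note that a default migration-threshold of 1 counts every kind of failure, not only start failures.

```shell
#!/bin/sh
# Hypothetical helper, not part of Pacemaker or pcs.
# Set PCS="echo pcs" for a dry run that only prints the commands.
#
# relax_start_failures RESOURCE [THRESHOLD] [FAILURE_TIMEOUT]
relax_start_failures() {
    rsc=$1
    threshold=${2:-0}        # 0 disables the threshold for this resource
    timeout=${3:-60s}        # assumed value; pick what suits the cluster
    pcs_cmd=${PCS:-pcs}

    # Cluster-wide: a failed start is counted instead of banning the node at once.
    $pcs_cmd property set start-failure-is-fatal=false || return 1

    # Default for all resources: the first failure already moves them away,
    # which approximates the old behaviour for everything else.
    $pcs_cmd resource defaults migration-threshold=1 || return 1

    # Override for the one resource that should be allowed to retry.
    $pcs_cmd resource meta "$rsc" migration-threshold="$threshold" failure-timeout="$timeout"
}

# Example (dry run):
#   PCS="echo pcs" relax_start_failures DRBDSlave 3 120s
```

The per-resource meta attribute wins over rsc_defaults, so only the named resource gets the relaxed handling.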
Re: [ClusterLabs] Pacemaker connectivity loss to ISP
Ended up setting up 2 static routes and then using those as the IPs designated in the ocf:ping host_list. Works like a charm.

-------- Original Message --------
Subject: Pacemaker connectivity loss to ISP
Local Time: March 24, 2016 12:12 PM
UTC Time: March 24, 2016 5:12 PM
From: s...@protonmail.com
To: users@clusterlabs.org

So I'm trying to figure out the best method to accomplish this. We have a 2 node cluster. We have multiple WANs connected to 2 different ISPs. Generally everything is forced out eth0, eth1 is the backup.

ISP1   ISP2        ISP2   ISP1
 |      |           |      |
 |      |           |      |
eth0   eth1        eth0   eth1
-----------        -----------
|   HA1   |        |   HA2   |
-----------        -----------

So in this scenario, if eth0 loses connectivity to its upstream router/gateway, we want it to fail over to HA2. I tried this by using the ethmonitor type, but it seems to only work if the cable is pulled from the actual interface itself or the switchport is shut down. We want it to fail over if it's unable to ping out to the web through eth0. So if connectivity is lost on the actual modem/gateway, it will fail over. I looked at using ocf:ping but it doesn't seem to allow me to specify an interface to use.

What would be the best method to do this, ocf:ping or heartbeat:ethmonitor?

Thanks
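The poster did not share the final configuration, so the following is only a sketch of what "static routes plus ocf:ping host_list" might look like. All addresses are documentation-range placeholders, the resource names `ping-wan` and `ClusterIP` are hypothetical, and pcs is assumed as the cluster shell. The idea is that a /32 route pins each ping target to eth0's gateway, so the ping agent effectively tests that uplink without needing an interface option.

```shell
#!/bin/sh
# Hypothetical helpers. Set IP="echo ip" / PCS="echo pcs" for a dry run.

# pin_ping_targets DEV GATEWAY TARGET...
# Route each ping target through one specific interface and gateway.
pin_ping_targets() {
    dev=$1
    gateway=$2
    shift 2
    ip_cmd=${IP:-ip}
    for target in "$@"; do
        $ip_cmd route replace "$target/32" via "$gateway" dev "$dev" || return 1
    done
}

# create_ping_monitor PROTECTED_RESOURCE TARGET...
# Clone a ping resource on all nodes and keep PROTECTED_RESOURCE off any
# node where no target answers (pingd is the agent's default attribute name).
create_ping_monitor() {
    protected=$1
    shift
    pcs_cmd=${PCS:-pcs}
    hosts="$*"
    $pcs_cmd resource create ping-wan ocf:pacemaker:ping \
        host_list="$hosts" dampen=5s multiplier=1000 \
        op monitor interval=10s || return 1
    $pcs_cmd resource clone ping-wan || return 1
    $pcs_cmd constraint location "$protected" rule score=-INFINITY \
        pingd lt 1 or not_defined pingd
}

# Example (dry run):
#   IP="echo ip" pin_ping_targets eth0 203.0.113.1 192.0.2.10 198.51.100.10
#   PCS="echo pcs" create_ping_monitor ClusterIP 192.0.2.10 198.51.100.10
```

The static routes have to be made persistent through the distribution's own network configuration; the `ip route` calls here only last until the next reboot.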
[ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?
I'm having some trouble on a few of my clusters in which the DRBD Slave resource does not want to come up after a reboot until I manually run resource cleanup.

Setting 'start-failure-is-fatal=false' as a global cluster property and a failure-timeout works to resolve the issue, but I don't really want the start failure set everywhere.

While I work on figuring out why the slave resource isn't coming up, is it possible to set 'start-failure-is-fatal=false' only on the DRBDSlave resource, or does this need a patch?

I'm running Pacemaker 1.1.12 and Corosync 1.4.8 on a RedHat 6-like system.

--
Sam Gardner
Trustwave
[ClusterLabs] Pacemaker connectivity loss to ISP
So I'm trying to figure out the best method to accomplish this. We have a 2 node cluster. We have multiple WANs connected to 2 different ISPs. Generally everything is forced out eth0, eth1 is the backup.

ISP1   ISP2        ISP2   ISP1
 |      |           |      |
 |      |           |      |
eth0   eth1        eth0   eth1
-----------        -----------
|   HA1   |        |   HA2   |
-----------        -----------

So in this scenario, if eth0 loses connectivity to its upstream router/gateway, we want it to fail over to HA2. I tried this by using the ethmonitor type, but it seems to only work if the cable is pulled from the actual interface itself or the switchport is shut down. We want it to fail over if it's unable to ping out to the web through eth0. So if connectivity is lost on the actual modem/gateway, it will fail over. I looked at using ocf:ping but it doesn't seem to allow me to specify an interface to use.

What would be the best method to do this, ocf:ping or heartbeat:ethmonitor?

Thanks
[ClusterLabs] Reply: Re: Reply: Re: Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
Jan Pokorný wrote on 24.03.2016 15:38:38:

> From: Jan Pokorný
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Date: 24.03.2016 15:50
> Subject: Re: [ClusterLabs] Reply: Re: Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
>
> On 24/03/16 14:38 +0100, philipp.achmuel...@arz.at wrote:
> > Jan Pokorný wrote on 24.03.2016 12:48:44:
> >
> >> From: Jan Pokorný
> >> To: Cluster Labs - All topics related to open-source clustering welcomed
> >> Date: 24.03.2016 12:50
> >> Subject: Re: [ClusterLabs] Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
> >>
> >> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
> >>> Jan Pokorný wrote on 23.03.2016 19:22:13:
> >>>
> >>>> From: Jan Pokorný
> >>>> To: users@clusterlabs.org
> >>>> Date: 23.03.2016 19:23
> >>>> Subject: Re: [ClusterLabs] pacemakerd: undefined symbol: crm_procfs_process_info
> >>>>
> >>>> On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
> >>>>> $ sudo pacemakerd -V
> >>>>> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: crm_procfs_process_info
> >>>>
> >>>> For a start, please provide output of:
> >>>>
> >>>> ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
> >>>> ldd $(rpm -E %{_sbindir})/pacemakerd
> >>>>
> >>>> Adjust the path per your actual installation, also depending
> >>>> how you got the pacemaker installed: from RPMs (assumed),
> >>>> by starting with the sources and compiling by hand, etc.
> >>>
> >>> I got sources from github and compiled by hand.
> >>>
> >>>> Note that if RPMs were indeed used, you should rather make sure
> >>>> that the same version of the packages arising from single
> >>>> SRPM is installed (pacemaker, pacemaker-libs, ...).
> >>>
> >>> On that hint - I removed all old source directories and started a new
> >>> download/compilation today.
> >>> After that everything works as expected - maybe I messed up some old
> >>> files in the working directory.
> >>
> >> Do you use "make install" as part of your procedure?
> >> Where I was headed is that either "ldconfig" invocation might be
> >> missing once the libraries are in place, or that /usr/lib* remnants
> >> take precedence over /usr/local/lib* files in run-time linking
> >> (provided that you use the default installation prefix).
> >
> > Yes, I use "make install" with default parameters to install to my
> > environment. Still not sure what happened yesterday - maybe some file
> > permission issues while syncing files in my environments.
>
> Additional syncing step might add this sort of fragility.
> Anyway, please keep an eye on this should it ever be reproduced.
> It's hard to claim native build/install arrangement is flawless
> in any case.
>

I will have a look at that for future installations.
Is there any documentation on the order in which I have to install all relevant cluster components when installing/compiling them from the sources in the ClusterLabs repository?

> --
> Jan (Poki)
> [Attachment "att7fk8o.dat" deleted by Philipp Achmüller/ARZ/AT]
Re: [ClusterLabs] Reply: Re: Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
On 24/03/16 14:38 +0100, philipp.achmuel...@arz.at wrote:
> Jan Pokorný wrote on 24.03.2016 12:48:44:
>
>> From: Jan Pokorný
>> To: Cluster Labs - All topics related to open-source clustering welcomed
>> Date: 24.03.2016 12:50
>> Subject: Re: [ClusterLabs] Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
>>
>> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
>>> Jan Pokorný wrote on 23.03.2016 19:22:13:
>>>
>>>> From: Jan Pokorný
>>>> To: users@clusterlabs.org
>>>> Date: 23.03.2016 19:23
>>>> Subject: Re: [ClusterLabs] pacemakerd: undefined symbol: crm_procfs_process_info
>>>>
>>>> On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
>>>>> $ sudo pacemakerd -V
>>>>> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: crm_procfs_process_info
>>>>
>>>> For a start, please provide output of:
>>>>
>>>> ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
>>>> ldd $(rpm -E %{_sbindir})/pacemakerd
>>>>
>>>> Adjust the path per your actual installation, also depending
>>>> how you got the pacemaker installed: from RPMs (assumed),
>>>> by starting with the sources and compiling by hand, etc.
>>>
>>> I got sources from github and compiled by hand.
>>>
>>>> Note that if RPMs were indeed used, you should rather make sure
>>>> that the same version of the packages arising from single
>>>> SRPM is installed (pacemaker, pacemaker-libs, ...).
>>>
>>> On that hint - I removed all old source directories and started a new
>>> download/compilation today.
>>> After that everything works as expected - maybe I messed up some old
>>> files in the working directory.
>>
>> Do you use "make install" as part of your procedure?
>> Where I was headed is that either "ldconfig" invocation might be
>> missing once the libraries are in place, or that /usr/lib* remnants
>> take precedence over /usr/local/lib* files in run-time linking
>> (provided that you use the default installation prefix).
>
> Yes, I use "make install" with default parameters to install to my
> environment. Still not sure what happened yesterday - maybe some file
> permission issues while syncing files in my environments.

Additional syncing step might add this sort of fragility.
Anyway, please keep an eye on this should it ever be reproduced.
It's hard to claim native build/install arrangement is flawless
in any case.

--
Jan (Poki)
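The two suspects named in this thread (a missing "ldconfig" run, or an older libcrmcommon under /usr/lib* shadowing the freshly built one under /usr/local/lib*) can be checked with standard tools. The sketch below is only one way to do it; the helper names are made up for illustration, and the library path in the example comment is an assumption about a default-prefix build.

```shell
#!/bin/sh
# Hypothetical helpers for checking which library a binary actually loads.

# resolved_library NAME
# Reads `ldd` output on stdin and prints the path that NAME resolves to.
resolved_library() {
    awk -v lib="$1" '$2 == "=>" && index($1, lib) == 1 { print $3 }'
}

# exports_symbol LIBRARY SYMBOL
# Succeeds if the shared library defines SYMBOL in its dynamic symbol table.
exports_symbol() {
    nm -D --defined-only "$1" 2>/dev/null |
        awk -v sym="$2" '{ name = $NF; sub(/@.*/, "", name) }
                         name == sym { found = 1 }
                         END { exit !found }'
}

# Example, on the affected host:
#   lib=$(ldd "$(command -v pacemakerd)" | resolved_library libcrmcommon.so)
#   echo "pacemakerd loads: $lib"
#   exports_symbol "$lib" crm_procfs_process_info ||
#       echo "$lib lacks the symbol; probably a stale copy"
#   ldconfig -p | grep libcrmcommon     # what the linker cache knows about
#   sudo ldconfig                       # refresh the cache after "make install"
```

If the resolved path points into /usr/lib* while the new build went to /usr/local/lib*, removing the old files or adjusting the linker search path are the usual remedies.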
[ClusterLabs] Reply: Re: Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
Jan Pokorný wrote on 24.03.2016 12:48:44:

> From: Jan Pokorný
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Date: 24.03.2016 12:50
> Subject: Re: [ClusterLabs] Reply: Re: pacemakerd: undefined symbol: crm_procfs_process_info
>
> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
> > Jan Pokorný wrote on 23.03.2016 19:22:13:
> >
> >> From: Jan Pokorný
> >> To: users@clusterlabs.org
> >> Date: 23.03.2016 19:23
> >> Subject: Re: [ClusterLabs] pacemakerd: undefined symbol: crm_procfs_process_info
> >>
> >> On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
> >>> $ sudo pacemakerd -V
> >>> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: crm_procfs_process_info
> >>
> >> For a start, please provide output of:
> >>
> >> ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
> >> ldd $(rpm -E %{_sbindir})/pacemakerd
> >>
> >> Adjust the path per your actual installation, also depending
> >> how you got the pacemaker installed: from RPMs (assumed),
> >> by starting with the sources and compiling by hand, etc.
> >
> > I got sources from github and compiled by hand.
> >
> >> Note that if RPMs were indeed used, you should rather make sure
> >> that the same version of the packages arising from single
> >> SRPM is installed (pacemaker, pacemaker-libs, ...).
> >
> > On that hint - I removed all old source directories and started a new
> > download/compilation today.
> > After that everything works as expected - maybe I messed up some old
> > files in the working directory.
>
> Do you use "make install" as part of your procedure?
> Where I was headed is that either "ldconfig" invocation might be
> missing once the libraries are in place, or that /usr/lib* remnants
> take precedence over /usr/local/lib* files in run-time linking
> (provided that you use the default installation prefix).

Yes, I use "make install" with default parameters to install to my environment. Still not sure what happened yesterday - maybe some file permission issues while syncing files in my environments.

The cluster migration is now completed and my cluster is running stable:

$ sudo pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: lnx0083a (version 1.1.14-535193a) - partition with quorum
 Last updated: Thu Mar 24 10:35:10 2016
 Last change: Thu Mar 24 10:34:59 2016 by root via cibadmin on lnx0083a
 4 nodes and 42 resources configured

>
> --
> Jan (Poki)
Re: [ClusterLabs] dlm reason for leaving the cluster changes when stopping gfs2-utils service
On Wed, Mar 23, 2016 at 6:33 PM, Ferenc Wágner wrote:
> (Please post only to the list, or at least keep it amongst the Cc-s.)
>
> Momcilo Medic writes:
>
>> On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner wrote:
>>> Momcilo Medic writes:
>>>
>>>> I have three hosts setup in my test environment.
>>>> They each have two connections to the SAN which has GFS2 on it.
>>>>
>>>> Everything works like a charm, except when I reboot a host.
>>>> Once it tries to stop gfs2-utils service it will just hang.
>>>
>>> Are you sure the OS reboot sequence does not stop the network or
>>> corosync before GFS and DLM?
>>
>> I specifically configured services to start in this order:
>> Corosync - DLM - GFS2-utils
>> and to shutdown in this order:
>> GFS2-utils - DLM - Corosync.
>>
>> I've accomplished this with:
>> update-rc.d -f corosync remove
>> update-rc.d -f corosync-notifyd remove
>> update-rc.d -f dlm remove
>> update-rc.d -f gfs2-utils remove
>> update-rc.d -f xendomains remove
>> update-rc.d corosync start 25 2 3 4 5 . stop 35 0 1 6 .
>> update-rc.d corosync-notifyd start 25 2 3 4 5 . stop 35 0 1 6 .
>> update-rc.d dlm start 30 2 3 4 5 . stop 30 0 1 6 .
>> update-rc.d gfs2-utils start 35 2 3 4 5 . stop 25 0 1 6 .
>> update-rc.d xendomains start 40 2 3 4 5 . stop 20 0 1 6 .
>
> I don't know your OS, the above may or may not work.
>
>> Also, the moment I was capturing logs, corosync and dlm were not
>> running as services, but in foreground debugging mode.
>> SSH connection did not break until I powered down the host so network
>> is not stopped either.
>
> At least you've got interactive debugging ability then. So try to find
> out why the Corosync membership broke down. The output of
> corosync-quorumtool and corosync-cpgtool might help. Also try pinging
> the Corosync ring0 addresses between the nodes.

Dear Feri,

Sorry for leaving the list out of my reply, it was a hasty mistake :)

Just so I put all the information out there:
I am using Ubuntu 14.04 across all hosts.

I've attached debugging logs in my first post. I cannot figure out what the key info there is.

Today, I'll try to use the tools you mentioned to see their output before and during the issue.

Kind regards,
Momcilo "Momo" Medic.
(fedorauser)
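For comparing the tool output "before and during the issue", a small capture helper along these lines might be handy. It is only a sketch: the function name, the label argument and the default output directory are invented here, and besides the two tools named in the thread it also calls corosync-cfgtool and dlm_tool, which are an addition and may not be installed everywhere (missing tools are simply noted in the output file).

```shell
#!/bin/sh
# Hypothetical helper: save membership-related tool output under a label,
# so that a "before" and a "during" capture can be diffed afterwards.
#
# snapshot_membership LABEL [OUTDIR]
snapshot_membership() {
    label=$1
    outdir=${2:-/tmp/membership-debug}
    mkdir -p "$outdir" || return 1

    for cmd in "corosync-quorumtool -s" "corosync-cpgtool" \
               "corosync-cfgtool -s" "dlm_tool ls"; do
        tool=${cmd%% *}                      # first word is the binary name
        out="$outdir/$label.$tool.txt"
        if command -v "$tool" >/dev/null 2>&1; then
            # Unquoted on purpose: split into command and arguments.
            $cmd > "$out" 2>&1 || echo "exit status: $?" >> "$out"
        else
            echo "$tool not found on this host" > "$out"
        fi
    done
    return 0
}

# Example:
#   snapshot_membership before
#   ... start the reboot / stop gfs2-utils, then from another session:
#   snapshot_membership during
#   diff -u /tmp/membership-debug/before.corosync-quorumtool.txt \
#           /tmp/membership-debug/during.corosync-quorumtool.txt
```

Pinging the ring0 addresses of the other nodes, as suggested above, is still a separate manual step because those addresses are specific to each setup.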