Re: [ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?

2016-03-24 Thread Adam Spiers
Sam Gardner  wrote:
> I'm having some trouble on a few of my clusters in which the DRBD Slave 
> resource does not want to come up after a reboot until I manually run 
> resource cleanup.
> 
> Setting 'start-failure-is-fatal=false' as a global cluster property and a 
> failure-timeout works to resolve the issue, but I don't really want the start 
> failure set everywhere.
> 
> While I work on figuring out why the slave resource isn't coming up, is it 
> possible to set 'start-failure-is-fatal=false'  only on the DRBDSlave 
> resource, or does this need a patch?

No, start-failure-is-fatal is a cluster-wide setting.  But IIUC you
could also set migration-threshold=1 cluster-wide (i.e. in
rsc_defaults), and then override it to either 0 or something higher
just for this resource.  You may find this interesting reading:

https://github.com/crowbar/crowbar-ha/pull/102/commits/de94e1e42ba52c2cdb496becbd73f07bc2501871
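As a concrete sketch of that rsc_defaults approach (using the pcs shell that appears later in this digest; the exact meta values are assumptions you should tune, not prescriptions):

```shell
# Cluster-wide default: a single start failure moves a resource away,
# roughly approximating start-failure-is-fatal=true per resource.
pcs resource defaults migration-threshold=1

# Per-resource override for the DRBD slave: migration-threshold=0
# disables the threshold for it entirely, and failure-timeout lets the
# failure expire so the start is retried automatically.
pcs resource update DRBDSlave meta migration-threshold=0 failure-timeout=60s
```

With this in place a start failure on DRBDSlave no longer bans it from the node, while every other resource keeps the strict one-strike behaviour.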

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker connectivity loss to ISP

2016-03-24 Thread S0ke
Ended up setting up 2 static routes and then using those as the IPs designated 
in the ocf:ping host_list. Works like a charm.
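For the record, that approach can be sketched like this (a hedged sketch: all addresses, gateways, and resource names below are placeholders, not taken from the original post):

```shell
# Pin one public ping target out each uplink with a host route, so each
# ISP path is probed independently of the default route.
ip route add 8.8.8.8/32 via 192.0.2.1 dev eth0      # via ISP1 gateway
ip route add 8.8.4.4/32 via 198.51.100.1 dev eth1   # via ISP2 gateway

# Probe both targets with a cloned ocf:pacemaker:ping resource; it
# records the result in the "pingd" node attribute on every node.
pcs resource create wan-ping ocf:pacemaker:ping \
    host_list="8.8.8.8 8.8.4.4" multiplier=1000 dampen=10s \
    op monitor interval=15s --clone

# Keep the service only on a node that can still reach at least one target.
pcs constraint location my-service rule score=-INFINITY \
    pingd lt 1 or not_defined pingd
```

With multiplier=1000 the pingd attribute scores 1000 per reachable target, so the rule bans only nodes that can reach none of them.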



 Original Message 
Subject: Pacemaker connectivity loss to ISP
Local Time: March 24, 2016 12:12 PM
UTC Time: March 24, 2016 5:12 PM
From: s...@protonmail.com
To: users@clusterlabs.org

So I'm trying to figure out the best method to accomplish this. We have a 
2-node cluster. We have multiple WANs connected to 2 different ISPs. Generally 
everything is forced out eth0; eth1 is the backup.

 ISP1  ISP2          ISP2  ISP1
   |     |             |     |
   |     |             |     |
 eth0  eth1          eth0  eth1
+-----------+       +-----------+
|    HA1    |       |    HA2    |
+-----------+       +-----------+
So in this scenario, if eth0 loses connectivity to its upstream router/gateway, 
we want it to fail over to HA2. I tried this with the ethmonitor resource type, 
but it only seems to work if the cable is pulled from the actual interface or 
the switch port is shut down. We want it to fail over if it's unable to ping 
out to the web through eth0, so that if connectivity is lost at the actual 
modem/gateway it will fail over. I looked at using ocf:ping, but it doesn't 
seem to let me specify an interface to use. What would be the best method to do 
this: ocf:ping or ocf:heartbeat:ethmonitor?

Thanks


[ClusterLabs] Set "start-failure-is-fatal=false" on only one resource?

2016-03-24 Thread Sam Gardner
I'm having some trouble on a few of my clusters in which the DRBD Slave 
resource does not want to come up after a reboot until I manually run resource 
cleanup.

Setting 'start-failure-is-fatal=false' as a global cluster property and a 
failure-timeout works to resolve the issue, but I don't really want the start 
failure set everywhere.

While I work on figuring out why the slave resource isn't coming up, is it 
possible to set 'start-failure-is-fatal=false'  only on the DRBDSlave resource, 
or does this need a patch?

I'm running Pacemaker 1.1.12 and Corosync 1.4.8 on a RedHat 6-like system.
--
Sam Gardner
Trustwave | SMART SECURITY ON DEMAND






[ClusterLabs] Pacemaker connectivity loss to ISP

2016-03-24 Thread S0ke
So I'm trying to figure out the best method to accomplish this. We have a 
2-node cluster. We have multiple WANs connected to 2 different ISPs. Generally 
everything is forced out eth0; eth1 is the backup.

 ISP1  ISP2          ISP2  ISP1
   |     |             |     |
   |     |             |     |
 eth0  eth1          eth0  eth1
+-----------+       +-----------+
|    HA1    |       |    HA2    |
+-----------+       +-----------+
So in this scenario, if eth0 loses connectivity to its upstream router/gateway, 
we want it to fail over to HA2. I tried this with the ethmonitor resource type, 
but it only seems to work if the cable is pulled from the actual interface or 
the switch port is shut down. We want it to fail over if it's unable to ping 
out to the web through eth0, so that if connectivity is lost at the actual 
modem/gateway it will fail over. I looked at using ocf:ping, but it doesn't 
seem to let me specify an interface to use. What would be the best method to do 
this: ocf:ping or ocf:heartbeat:ethmonitor?

Thanks


[ClusterLabs] Re: pacemakerd: undefined symbol: crm_procfs_process_info

2016-03-24 Thread philipp . achmueller
Jan Pokorný wrote on 24.03.2016 15:38:38:

> From: Jan Pokorný 
> To: Cluster Labs - All topics related to open-source clustering 
> welcomed 
> Date: 24.03.2016 15:50
> Subject: Re: [ClusterLabs] Re: pacemakerd: 
> undefined symbol: crm_procfs_process_info
> 
> On 24/03/16 14:38 +0100, philipp.achmuel...@arz.at wrote:
> > Jan Pokorný wrote on 24.03.2016 12:48:44:
> > 
> >> From: Jan Pokorný 
> >> To: Cluster Labs - All topics related to open-source clustering 
> >> welcomed 
> >> Date: 24.03.2016 12:50
> >> Subject: Re: [ClusterLabs] Re: pacemakerd: undefined 
> >> symbol: crm_procfs_process_info
> >> 
> >> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
> >>> Jan Pokorný wrote on 23.03.2016 19:22:13:
> >>> 
>  From: Jan Pokorný 
>  To: users@clusterlabs.org
>  Date: 23.03.2016 19:23
>  Subject: Re: [ClusterLabs] pacemakerd: undefined symbol: 
>  crm_procfs_process_info
>  
>  On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
> > $ sudo pacemakerd -V
> > pacemakerd: symbol lookup error: pacemakerd: undefined symbol: 
> > crm_procfs_process_info
>  
>  For a start, please provide output of:
>  
>  ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
>  ldd $(rpm -E %{_sbindir})/pacemakerd
>  
>  Adjust the path per your actual installation, also depending
>  on how you got Pacemaker installed: from RPMs (assumed),
>  by starting from the sources and compiling by hand, etc.
> >>> 
> >>> I got the sources from GitHub and compiled by hand.
> >>> 
>  Note that if RPMs were indeed used, you should rather make sure
>  that the same version of the packages arising from a single
>  SRPM is installed (pacemaker, pacemaker-libs, ...).
> >>> 
> >>> on that hint, I removed all old source directories and started a new 
> >>> download/compilation today.
> >>> after that, everything works as expected - maybe I messed up some old 
> >>> files in the working directory.
> >> 
> >> Do you use "make install" as part of your procedure?
> >> Where I was headed is that either an "ldconfig" invocation might be
> >> missing once the libraries are in place, or that /usr/lib* remnants
> >> take precedence over /usr/local/lib* files in run-time linking
> >> (provided that you use the default installation prefix).
> > 
> > Yes, I use "make install" with default parameters to install to my 
> > environment. Still not sure what happened yesterday - maybe some file 
> > permission issue while syncing files between my environments.
> 
> An additional syncing step might add this sort of fragility.
> Anyway, please keep an eye on this should it ever be reproduced.
> It's hard to claim the native build/install arrangement is flawless
> in any case.
>
 
I will have a look at that for future installations.
Is there any documentation on the order in which I have to install all 
relevant cluster components when installing/compiling them from the sources 
in the ClusterLabs repositories?
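I'm not aware of one canonical document, but the usual order is to build and install the libraries before the daemons that link against them: libqb first, then corosync, then pacemaker, then resource-agents, running ldconfig after each install. A rough sketch (assuming the source trees are already cloned side by side; prefixes and the component list are examples, not prescriptions):

```shell
# Build order matters: each component links against the previous ones.
for component in libqb corosync pacemaker resource-agents; do
    ( cd "${component}" \
      && ./autogen.sh \
      && ./configure --prefix=/usr \
      && make \
      && sudo make install \
      && sudo ldconfig )   # refresh the runtime linker cache after each install
done

# Sanity check for the original symbol-lookup problem: every library
# pacemakerd needs should resolve.
ldd /usr/sbin/pacemakerd | grep "not found"
```

If the grep prints anything, some library is still being picked up from a stale location.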

> -- 
> Jan (Poki)



Re: [ClusterLabs] pacemakerd: undefined symbol: crm_procfs_process_info

2016-03-24 Thread Jan Pokorný
On 24/03/16 14:38 +0100, philipp.achmuel...@arz.at wrote:
> Jan Pokorný wrote on 24.03.2016 12:48:44:
> 
>> From: Jan Pokorný 
>> To: Cluster Labs - All topics related to open-source clustering 
>> welcomed 
>> Date: 24.03.2016 12:50
>> Subject: Re: [ClusterLabs] Re: pacemakerd: undefined 
>> symbol: crm_procfs_process_info
>> 
>> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
>>> Jan Pokorný wrote on 23.03.2016 19:22:13:
>>> 
 From: Jan Pokorný 
 To: users@clusterlabs.org
 Date: 23.03.2016 19:23
 Subject: Re: [ClusterLabs] pacemakerd: undefined symbol: 
 crm_procfs_process_info
 
 On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
> $ sudo pacemakerd -V
> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: 
> crm_procfs_process_info
 
 For a start, please provide output of:
 
 ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
 ldd $(rpm -E %{_sbindir})/pacemakerd
 
 Adjust the path per your actual installation, also depending
 on how you got Pacemaker installed: from RPMs (assumed),
 by starting from the sources and compiling by hand, etc.
>>> 
>>> I got the sources from GitHub and compiled by hand.
>>> 
 Note that if RPMs were indeed used, you should rather make sure
 that the same version of the packages arising from a single
 SRPM is installed (pacemaker, pacemaker-libs, ...).
>>> 
>>> on that hint, I removed all old source directories and started a new 
>>> download/compilation today.
>>> after that, everything works as expected - maybe I messed up some old 
>>> files in the working directory.
>> 
>> Do you use "make install" as part of your procedure?
>> Where I was headed is that either an "ldconfig" invocation might be
>> missing once the libraries are in place, or that /usr/lib* remnants
>> take precedence over /usr/local/lib* files in run-time linking
>> (provided that you use the default installation prefix).
> 
> Yes, I use "make install" with default parameters to install to my 
> environment. Still not sure what happened yesterday - maybe some file 
> permission issue while syncing files between my environments.

An additional syncing step might add this sort of fragility.
Anyway, please keep an eye on this should it ever be reproduced.
It's hard to claim the native build/install arrangement is flawless
in any case.

-- 
Jan (Poki)




[ClusterLabs] Re: pacemakerd: undefined symbol: crm_procfs_process_info

2016-03-24 Thread philipp . achmueller
Jan Pokorný wrote on 24.03.2016 12:48:44:

> From: Jan Pokorný 
> To: Cluster Labs - All topics related to open-source clustering 
> welcomed 
> Date: 24.03.2016 12:50
> Subject: Re: [ClusterLabs] Re: pacemakerd: undefined 
> symbol: crm_procfs_process_info
> 
> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
> > Jan Pokorný wrote on 23.03.2016 19:22:13:
> > 
> >> From: Jan Pokorný 
> >> To: users@clusterlabs.org
> >> Date: 23.03.2016 19:23
> >> Subject: Re: [ClusterLabs] pacemakerd: undefined symbol: 
> >> crm_procfs_process_info
> >> 
> >> On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
> >>> $ sudo pacemakerd -V
> >>> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: 
> >>> crm_procfs_process_info
> >> 
> >> For a start, please provide output of:
> >> 
> >> ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
> >> ldd $(rpm -E %{_sbindir})/pacemakerd
> >> 
> >> Adjust the path per your actual installation, also depending
> >> on how you got Pacemaker installed: from RPMs (assumed),
> >> by starting from the sources and compiling by hand, etc.
> > 
> > I got the sources from GitHub and compiled by hand.
> > 
> >> Note that if RPMs were indeed used, you should rather make sure
> >> that the same version of the packages arising from a single
> >> SRPM is installed (pacemaker, pacemaker-libs, ...).
> > 
> > on that hint, I removed all old source directories and started a new 
> > download/compilation today.
> > after that, everything works as expected - maybe I messed up some old 
> > files in the working directory.
> 
> Do you use "make install" as part of your procedure?
> Where I was headed is that either an "ldconfig" invocation might be
> missing once the libraries are in place, or that /usr/lib* remnants
> take precedence over /usr/local/lib* files in run-time linking
> (provided that you use the default installation prefix).

Yes, I use "make install" with default parameters to install to my 
environment. Still not sure what happened yesterday - maybe some file 
permission issue while syncing files between my environments.
The cluster migration is now complete and my cluster is running stably:

$ sudo pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: lnx0083a (version 1.1.14-535193a) - partition with quorum
 Last updated: Thu Mar 24 10:35:10 2016
 Last change: Thu Mar 24 10:34:59 2016 by root via cibadmin on lnx0083a
 4 nodes and 42 resources configured

> 
> -- 
> Jan (Poki)



Re: [ClusterLabs] dlm: reason for leaving the cluster changes when stopping gfs2-utils service

2016-03-24 Thread Momcilo Medic
On Wed, Mar 23, 2016 at 6:33 PM, Ferenc Wágner  wrote:
> (Please post only to the list, or at least keep it amongst the Cc-s.)
>
> Momcilo Medic  writes:
>
>> On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner  wrote:
>>> Momcilo Medic  writes:
>>>
 I have three hosts setup in my test environment.
 They each have two connections to the SAN which has GFS2 on it.

 Everything works like a charm, except when I reboot a host.
 Once it tries to stop gfs2-utils service it will just hang.
>>>
>>> Are you sure the OS reboot sequence does not stop the network or
>>> corosync before GFS and DLM?
>>
>> I specifically configured services to start in this order:
>> Corosync - DLM - GFS2-utils
>> and to shutdown in this order:
>> GFS2-utils - DLM - Corosync.
>>
>> I've accomplished this with:
>>  update-rc.d -f corosync remove
>>  update-rc.d -f corosync-notifyd remove
>>  update-rc.d -f dlm remove
>>  update-rc.d -f gfs2-utils remove
>>  update-rc.d -f xendomains remove
>>  update-rc.d corosync start 25 2 3 4 5 . stop 35 0 1 6 .
>>  update-rc.d corosync-notifyd start 25 2 3 4 5 . stop 35 0 1 6 .
>>  update-rc.d dlm start 30 2 3 4 5 . stop 30 0 1 6 .
>>  update-rc.d gfs2-utils start 35 2 3 4 5 . stop 25 0 1 6 .
>>  update-rc.d xendomains start 40 2 3 4 5 . stop 20 0 1 6 .
>
> I don't know your OS, the above may or may not work.
>
>> Also, the moment I was capturing logs, corosync and dlm were not
>> running as services, but in foreground debugging mode.
>> SSH connection did not break until I powered down the host so network
>> is not stopped either.
>
> At least you've got interactive debugging ability then.  So try to find
> out why the Corosync membership broke down.  The output of
> corosync-quorumtool and corosync-cpgtool might help.  Also try pinging
> the Corosync ring0 addresses between the nodes.

Dear Feri,

Sorry for leaving the lists out of my reply, it was a hasty mistake :)
Just to put all the information out there: I am using Ubuntu 14.04
across all hosts.

I've attached debugging logs in my first post. I cannot figure out
what the key info there is.
Today, I'll use the tools you mentioned to see their output before
and during the issue.

Kind regards,
Momcilo "Momo" Medic.
(fedorauser)
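The diagnostics Feri suggested boil down to something like this (a sketch; the peer's ring0 address is a placeholder):

```shell
# Membership and quorum state as corosync sees it on this node.
sudo corosync-quorumtool -s

# Closed process groups: shows which daemons (e.g. dlm) are still
# registered with corosync.
sudo corosync-cpgtool

# Verify ring0 connectivity to a peer node (address is an example;
# use the ring0_addr values from your corosync.conf).
ping -c 3 192.168.1.12
```

Capturing this output on every node both before and while gfs2-utils hangs should show whether the corosync membership itself is breaking down.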
