Re: [Pacemaker] Reg. trigger when node failure occurs

2013-12-09 Thread Michael Schwartzkopff
Am Dienstag, 10. Dezember 2013, 12:19:25 schrieb ESWAR RAO:
> Hi All,
> 
> Can someone please let me know if there is a clean way to trigger a script
> from pacemaker when heartbeat on a node has stopped or a node failure has
> occurred, if I run HB+pacemaker on a 3-node setup?
> 
> Thanks
> Eswar
> 
> On Mon, Dec 9, 2013 at 5:16 PM, ESWAR RAO  wrote:
> > Hi All,
> > 
> > I have a 3 node ( node1, node2, node3 ) setup on which HB+pacemaker runs.
> > I have resources running on clone mode on node1 and node2.
> > 
> > Is there any way to get a trigger when a node failure occurs, i.e. can I
> > trigger a script if node3 (on which no resource runs) fails?

Yes. Run an ocf:pacemaker:ClusterMon resource and read man crm_mon for the
additional options that make it call an external script.
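
For example, something like this (a sketch in crmsh syntax; the script path
and the resource/clone names are placeholders) runs crm_mon on every node and
has it call your script whenever a cluster event happens:

primitive p_mon ocf:pacemaker:ClusterMon \
    params user="root" \
        extra_options="-E /usr/local/bin/node_event.sh" \
    op monitor interval="60s"
clone cl_mon p_mon

If I remember correctly, crm_mon passes the event details to the script in
CRM_notify_* environment variables.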

-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Reg. trigger when node failure occurs

2013-12-09 Thread ESWAR RAO
Hi All,

Can someone please let me know if there is a clean way to trigger a script
from pacemaker when heartbeat on a node has stopped or a node failure has
occurred, if I run HB+pacemaker on a 3-node setup?

Thanks
Eswar



On Mon, Dec 9, 2013 at 5:16 PM, ESWAR RAO  wrote:

> Hi All,
>
> I have a 3 node ( node1, node2, node3 ) setup on which HB+pacemaker runs.
> I have resources running on clone mode on node1 and node2.
>
> Is there any way to get a trigger when a node failure occurs, i.e. can I
> trigger a script if node3 (on which no resource runs) fails?
>
>
> Thanks
> Eswar
>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] is ccs as racy as it feels?

2013-12-09 Thread Brian J. Murrell
So, I'm trying to wrap my head around this need to migrate to
pacemaker+CMAN.  I've been looking at
http://clusterlabs.org/quickstart-redhat.html and
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Cluster_Administration/

It seems "ccs" is the tool to configure the CMAN part of things.

The first URL talks about using ccs to create a local configuration and
then "copy" that around to the rest of the cluster.  Yuck.

The first URL doesn't really cover how one builds up clusters (i.e. over
time) but assumes that you know what your cluster is going to look like
before you build that configuration and says nothing about what to do
when you decide to add new nodes at some later point.  I would guess
more "ccm -f /etc/cluster/cluster.conf" and some more copying around
again.  Does anything need to be prodded to get this new configuration
that was just copied?  I do hope just "prodding" and not a restart of
all services including pacemaker managed resources.
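
For reference, my reading of the first URL's workflow is roughly the
following sketch (cluster and node names are made up, and I may well have
some flags wrong):

  ccs -f /etc/cluster/cluster.conf --createcluster mycluster
  ccs -f /etc/cluster/cluster.conf --addnode node1
  ccs -f /etc/cluster/cluster.conf --addnode node2
  scp /etc/cluster/cluster.conf node2:/etc/cluster/
  # then presumably prod cman to reload the bumped config_version:
  cman_tool version -r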

The second URL talks about ricci for propagating the configuration
around.  But it seems to assume that all configuration is done from a
single node and then "sync'd" to the rest of the cluster with ricci in a
"last write wins" sort of work-flow.

So unlike pacemaker itself where any node can modify the configuration
of the CIB (raciness in tools like crm aside), having multiple nodes
using ccs feels quite dangerous in a "last-write-wins" kind of way.  Am
I correct?

This makes it quite difficult to dispatch the task of configuring the
cluster out to the nodes that will be participating in the cluster --
having them configure their own participation.  This distribution of
configuration tasks all works fine for pacemaker-proper (if you avoid
tools like crm) but feels like it's going to blow up when having
multiple nodes trying to add themselves and their own configuration to
the CMAN configuration -- all in parallel.

Am I correct about all of this?  I hope I am not, because if I am, this
all feels like a (very) huge step backward from the days when corosync+pacemaker
configuration could be carried out in parallel, on multiple nodes.  Instead,
one now has to designate particular (i.e. one per cluster) nodes as the single
configuration point and feed those designated nodes the configuration items
through a single-threaded work queue, all just to avoid races that didn't
exist using just corosync+pacemaker.

Cheers,
b.




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs ping connectivity rule

2013-12-09 Thread Martin Ševčík

relevant parts of the config:

primitive pingd ocf:pacemaker:ping \
params host_list="10.242.50.251 10.242.50.252" multiplier="1"

location l_best_connectivity g_ris \
rule $id="l_best_connectivity-rule" pingd: defined pingd

best regards,
m.

On 12/09/2013 09:43 AM, Bauer, Stefan (IZLBW Extern) wrote:

May I ask what your configuration snippet looks like?

Thank you

Stefan

-Ursprüngliche Nachricht-
Von: Martin Ševčík [mailto:sev...@esys.cz]
Gesendet: Freitag, 6. Dezember 2013 12:26
An: pacemaker@oss.clusterlabs.org
Betreff: Re: [Pacemaker] pcs ping connectivity rule

I installed crmsh and configured it via crm commands.

best regards,
m.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
   



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2013-12-09 Thread Brian J. Murrell
On Mon, 2013-12-09 at 09:28 +0100, Jan Friesse wrote:
> 
> Error 6 means "try again". This happens either when corosync is
> overloaded or when it is creating a new membership. Please take a look at
> /var/log/cluster/corosync.log to see if there is something strange there (+ make
> sure you have the newest corosync).

Would that same information be available in /var/log/messages given that I have
configured corosync as follows:

logging {
fileline: off
to_stderr: no
to_logfile: no
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}

If so, then the log snippet I posted in the prior message includes all
that corosync had to report.  Should I increase the amount of logging?
Any suggestions on an appropriate amount/flags, etc.?

> (+ make
> sure you have newest corosync).

corosync-1.4.1-15.el6_4.1.x86_64 as shipped by RH in EL6.

Is this new enough?  I know 2.x is also available but I don't think RH
is shipping that yet.  Hopefully their 1.4.1 is still supported.

Cheers,
b.



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Bauer, Stefan (IZLBW Extern)
Thank you for your time! It works now :)

Stefan

-Ursprüngliche Nachricht-
Von: Michael Schwartzkopff [mailto:m...@sys4.de] 
Gesendet: Montag, 9. Dezember 2013 14:26
An: The Pacemaker cluster resource manager
Betreff: Re: [Pacemaker] Ressources not moving to node with better connectivity 
- pingd

Am Montag, 9. Dezember 2013, 13:06:04 schrieb Bauer, Stefan:
> Why are some resources listed more than once in the output?
> What is the difference between group_color and native_color?
> If a resource has a value of -INFINITY is it because the cluster 
> already decided that this resource should not run on this host or it 
> can not run on this host due to other reasons?
> 
> I'm not quite sure how resource stickiness interferes with the
> internal decisions taken to migrate.

Everything is points. Stickiness is points. Constraints result in points.

With every event the cluster calculates the matrix over all nodes and all
resources. A resource will run on the node where it can collect the most points.

Beware of implicit constraints that give points, e.g. a colocation

col col_A_with_B inf: A B

will result in -inf points for resource A on all nodes where B is not
running.

Look at the output of crm_simulate -s -L, write down the matrix and understand 
it.
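
For example (a sketch; "myresource" stands for one of your resource IDs), you
can pull the allocation scores out of that output with:

  crm_simulate -sL | grep score
  crm_simulate -sL | grep myresource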

--
Mit freundlichen Grüßen,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044 Franziskanerstraße 15, 
81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 16:27:25 schrieb Dvorak Andreas:
> Hi,
> 
> thank you for the quick answers.
> I thought I would need to edit the corosync.conf file. So I do not need to?
> Where should I configure the heartbeat interconnect interfaces?
> 
> With corosync-cfgtool -s it shows the wrong ip? But where does that come
> from?
> 
> Andreas

in a cman setup corosync gets the interface configuration from cman, i.e.
from /etc/cluster/cluster.conf.

If you want to add more interfaces see:

https://fedorahosted.org/cluster/wiki/MultiHome

Please note that you have to get the hostnames right that map to the IP
addresses of the various interfaces. Also note that the hostname of the main
interface has to match the output of uname -n.
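
A minimal sketch of what that page describes (the "-hb" altnames are just
examples for hostnames that resolve to the second ring's addresses):

  <clusternodes>
    <clusternode name="sv2836" nodeid="1">
      <altname name="sv2836-hb"/>
    </clusternode>
    <clusternode name="sv2837" nodeid="2">
      <altname name="sv2837-hb"/>
    </clusternode>
  </clusternodes>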

Greetings,

-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread emmanuel segura
The IP is OK, but if you are using cman you need to edit
/etc/cluster/cluster.conf; that's what I think.


2013/12/9 Dvorak Andreas 

> Hi,
>
>
>
> thank you for the quick answers.
>
> I thought I would need to edit the corosync.conf file. So I do not need to?
>
> Where should I configure the heartbeat interconnect interfaces?
>
>
>
> With corosync-cfgtool -s it shows the wrong ip? But where does that come
> from?
>
>
>
> Andreas
>
>
>
> *Von:* emmanuel segura [mailto:emi2f...@gmail.com]
> *Gesendet:* Montag, 9. Dezember 2013 16:05
>
> *An:* The Pacemaker cluster resource manager
> *Betreff:* Re: [Pacemaker] cluster heartbeat is not used
>
>
>
> because in your corosync-cfgtool -s you are using bonding address
>
>
>
> 2013/12/9 Dvorak Andreas 
>
> Hi
>
>
>
> Here it is
>
> cat /proc/net/bonding/bond0
>
> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>
>
>
> Bonding Mode: fault-tolerance (active-backup)
>
> Primary Slave: None
>
> Currently Active Slave: em3
>
> MII Status: up
>
> MII Polling Interval (ms): 100
>
> Up Delay (ms): 0
>
> Down Delay (ms): 0
>
>
>
> Slave Interface: em3
>
> MII Status: up
>
> Speed: 1000 Mbps
>
> Duplex: full
>
> Link Failure Count: 0
>
> Permanent HW addr: c8:1f:66:d7:3b:fe
>
> Slave queue ID: 0
>
>
>
> Slave Interface: p3p3
>
> MII Status: up
>
> Speed: 1000 Mbps
>
> Duplex: full
>
> Link Failure Count: 0
>
> Permanent HW addr: 00:0a:f7:3e:ca:8e
>
> Slave queue ID: 0
>
>
>
> Andreas
>
>
>
> *Von:* emmanuel segura [mailto:emi2f...@gmail.com]
> *Gesendet:* Montag, 9. Dezember 2013 15:50
> *An:* The Pacemaker cluster resource manager
> *Betreff:* Re: [Pacemaker] cluster heartbeat is not used
>
>
>
> show  cat /proc/net/bonding/bond0
>
>
>
> 2013/12/9 Dvorak Andreas 
>
> Dear all,
>
>
>
> during failover tests I found out that I can put down the heartbeat
> interfaces  and the cluster ignores that. But if I put down bond0 the
> fencing is running.
>
> Can please somebody help me?
>
>
>
> bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
>
>   inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
>
>   inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
>
>   UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>
>
>
> em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
>
>   inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>
>
> p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
>
>   inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>
>
> I have the following corosync.conf
>
> # Please read the corosync.conf.5 manual page
>
> compatibility: whitetank
>
> totem {
>
> version: 2
>
> secauth: off
>
> threads: 0
>
> interface {
>
>ringnumber: 0
>
>bindnetaddr: 192.168.2.0
>
>mcastaddr: 226.94.1.1
>
>mcastport: 5405
>
>ttl: 1
>
> }
>
> interface {
>
> ringnumber: 1
>
> bindnetaddr: 192.168.3.0
>
> mcastaddr: 226.94.1.2
>
> mcastport: 5407
>
> ttl: 1
>
> }
>
> }
>
> logging {
>
> fileline: off
>
> to_stderr: no
>
> to_logfile: yes
>
> to_syslog: yes
>
> logfile: /var/log/cluster/corosync.log
>
> debug: off
>
> timestamp: on
>
> logger_subsys {
>
>subsys: AMF
>
>debug: off
>
> }
>
> }
>
> amf {
>
> mode: disabled
>
> }
>
>
>
> I have installed
>
> cluster-glue-libs.1.0.5.x86_64.CentOS
>
> corosynclib.1.4.1.x86_64.CentOS
>
> openais.1.1.1.x86_64.CentOS
>
> cman.3.0.12.1.x86_64.CentOS
>
> pacemaker-libs.1.1.8.x86_64.CentOS
>
> resource-agents.3.9.2.x86_64.CentOS
>
> corosync.1.4.1.x86_64.CentOS
>
> modcluster.0.16.2.x86_64.CentOS
>
> openaislib.1.1.1.x86_64.CentOS
>
> pacemaker-cluster-libs.1.1.8.x86_64.CentOS
>
> pacemaker.1.1.8.x86_64.CentOS
>
> pcs.0.9.26.noarch.CentOS
>
> libqb.0.14.2.x86_64.CentOS
>
> clusterlib.3.0.12.1.x86_64.CentOS
>
> ricci.0.16.2.x86_64.CentOS
>
> pacemaker-cli.1.1.8.x86_64.CentOS
>
> fence-virt.0.2.3.x86_64.CentOS
>
> ccs.0.16.2.x86_64.CentOS
>
>
>
> # corosync-cfgtool -s
>
> Printing ring status.
>
> Local node ID 1
>
> RING ID 0
>
> id= 10.15.28.36
>
> status   = ring 0 active with no faults
>
> # pcs status corosync
>
> Nodeid Name
>
>1   sv2836.muc.baag
>
>2  

Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread Dvorak Andreas
Hi,

thank you for the quick answers.
I thought I would need to edit the corosync.conf file. So I do not need to?
Where should I configure the heartbeat interconnect interfaces?

With corosync-cfgtool -s it shows the wrong ip? But where does that come from?

Andreas

Von: emmanuel segura [mailto:emi2f...@gmail.com]
Gesendet: Montag, 9. Dezember 2013 16:05
An: The Pacemaker cluster resource manager
Betreff: Re: [Pacemaker] cluster heartbeat is not used

because in your corosync-cfgtool -s you are using bonding address


2013/12/9 Dvorak Andreas 
mailto:andreas.dvo...@baaderbank.de>>
Hi

Here it is
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: em3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:1f:66:d7:3b:fe
Slave queue ID: 0

Slave Interface: p3p3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0a:f7:3e:ca:8e
Slave queue ID: 0

Andreas

Von: emmanuel segura [mailto:emi2f...@gmail.com]
Gesendet: Montag, 9. Dezember 2013 15:50
An: The Pacemaker cluster resource manager
Betreff: Re: [Pacemaker] cluster heartbeat is not used

show  cat /proc/net/bonding/bond0

2013/12/9 Dvorak Andreas 
mailto:andreas.dvo...@baaderbank.de>>
Dear all,

during failover tests I found out that I can put down the heartbeat interfaces  
and the cluster ignores that. But if I put down bond0 the fencing is running.
Can please somebody help me?

bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
  inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
  inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
  inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
  inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
  inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
  inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

I have the following corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
version: 2
secauth: off
threads: 0
interface {
   ringnumber: 0
   bindnetaddr: 192.168.2.0
   mcastaddr: 226.94.1.1
   mcastport: 5405
   ttl: 1
}
interface {
ringnumber: 1
bindnetaddr: 192.168.3.0
mcastaddr: 226.94.1.2
mcastport: 5407
ttl: 1
}
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
   subsys: AMF
   debug: off
}
}
amf {
mode: disabled
}

I have installed
cluster-glue-libs.1.0.5.x86_64.CentOS
corosynclib.1.4.1.x86_64.CentOS
openais.1.1.1.x86_64.CentOS
cman.3.0.12.1.x86_64.CentOS
pacemaker-libs.1.1.8.x86_64.CentOS
resource-agents.3.9.2.x86_64.CentOS
corosync.1.4.1.x86_64.CentOS
modcluster.0.16.2.x86_64.CentOS
openaislib.1.1.1.x86_64.CentOS
pacemaker-cluster-libs.1.1.8.x86_64.CentOS
pacemaker.1.1.8.x86_64.CentOS
pcs.0.9.26.noarch.CentOS
libqb.0.14.2.x86_64.CentOS
clusterlib.3.0.12.1.x86_64.CentOS
ricci.0.16.2.x86_64.CentOS
pacemaker-cli.1.1.8.x86_64.CentOS
fence-virt.0.2.3.x86_64.CentOS
ccs.0.16.2.x86_64.CentOS

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id= 10.15.28.36
status   = ring 0 active with no faults
# pcs status corosync
Nodeid Name
   1   sv2836.muc.baag
   2   sv2837.muc.baag

# crm_mon --one-shot -V
Last updated: Mon Dec  9 14:56:32 2013
Last change: Mon Dec  9 14:07:03 2013 via cibadmin on sv2836
Stack: cman
Current DC: sv2836 - partition with quorum
Version: 1.1.10-1.el6_4.4-368c726
2 Nodes configured
4 Resources configured

Online: [ sv2836 sv2837 ]

ClusterIP(ocf::heartbeat:IPaddr2):Started sv2836
 FIXRoute(ocf::baader:FIXRoute):   Started sv2836
 ipmi-fencing-sv2837(stonith:fence_ipmilan):Started sv2836
 ipmi-fencing-sv2836(stonith:fence_ipmilan):Started sv2837

Best re

Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread Masopust, Christian
Hi Emmanuel,

thanks for the hint, reading (again) the chapter about resource stickiness, I 
see and understand
the difference :)

br,
christian


Von: emmanuel segura [mailto:emi2f...@gmail.com]
Gesendet: Montag, 09. Dezember 2013 16:12
An: m...@sys4.de; The Pacemaker cluster resource manager
Betreff: Re: [Pacemaker] WG: configuration of stonith

I think they should be

pcs constraint location ipmi-fencing-sv2837 prefers sv2837=-INFINITY
pcs constraint location ipmi-fencing-sv2836 prefers sv2836=-INFINITY



2013/12/9 Michael Schwartzkopff mailto:m...@sys4.de>>
Am Montag, 9. Dezember 2013, 14:58:13 schrieben Sie:
> > > pcs stonith create ipmi-fencing-sv2837 fence_ipmilan
> >
> > pcmk_host_list="sv2837"
> >
> > > ipaddr=10.110.28.37 action="off" login=ipmi passwd=abc
> >
> > delay=15 op monitor
> >
> > > interval=60s pcs stonith create ipmi-fencing-sv2836 fence_ipmilan
> > > pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="off" login=ipmi
> > > passwd=abc delay=15 op monitor interval=60s
> > >
> > > pcs property set stonith-enabled=true
> > >
> > > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> > >
> > > pcs status
> > > 
> > >
> > > Full list of resources:
> > >  ClusterIP(ocf::heartbeat:IPaddr2):   Started sv2836
> > >  FIXRoute (ocf::baader:FIXRoute): Started sv2836
> >
> > >  ipmi-fencing-sv2837  (stonith:fence_ipmilan):
> > Started sv2836
> >
> > >  ipmi-fencing-sv2836  (stonith:fence_ipmilan):
> > Started sv2837
> >
> > This is not optimal. Nothing prevents the resource, that can
> > fence node sv2837
> > to run on host sv2837. You just say, that it should run on
> > node sv2836.
> >
> > Better would be something like
> >
> > crm configure location place-fencing-sv2837 -inf: sv2837
> >
> > or the equivalent in pcs.
> >
> > Greetings,
>
> Hi Michael,
>
> I thought that the lines above will do that:
> > > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
>
> Don't they?

I don't know pcs in depth and I could not find any detailed doc, so I stick
with crmsh.

As far as I can judge, your lines above tell the cluster that the resource
that can fence node sv2836 gets INF points if it runs on node sv2837. But
what happens if node sv2837 is down? Nothing prevents the resource from
starting on the node that it should fence. So if one node is down, both
fencing resources will run on the remaining node. Not very nice.

I suggest assigning -INF points to the resource that can fence node sv2836 if
it runs on node sv2836. Then it will run on sv2837, and if that node is not
available the resource cannot run at all. On the remaining node sv2836, only
the resource that can fence sv2837 will run.

For details see: http://clusterlabs.org/doc/crm_fencing.html

Mit freundlichen Grüßen,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, 
+49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

___
Pacemaker mailing list: 
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




--
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread Masopust, Christian
> > > 
> > > This is not optimal. Nothing prevents the resource, that can
> > > fence node sv2837
> > > to run on host sv2837. You just say, that it should run on
> > > node sv2836.
> > > 
> > > Better would be something like
> > > 
> > > crm configure location place-fencing-sv2837 -inf: sv2837
> > > 
> > > or the equivalent in pcs.
> > > 
> > > Greetings,
> > 
> > Hi Michael,
> > 
> > I thought that the lines above will do that:
> > > > pcs constraint location ipmi-fencing-sv2837 prefers 
> sv2836=INFINITY
> > > > pcs constraint location ipmi-fencing-sv2836 prefers 
> sv2837=INFINITY
> > 
> > Don't they?
> 
> I don't  know pcs in depth and I could not find any detailed 
> doc. So I stick 
> with crmsh.
> 
> As far as I can judge your lines above your tell the cluster, 
> that the 
> resource, that can fence node sv2836, gets INF points if it 
> runs on node 
> sv2837. But what happens if node sv2837 is down? Nothing 
> prevents the resource 
> starting on the node that it should fence. So if one node is 
> down both fencing 
> resources will run on the remaining node. No very nice.
> 
> I suggest to assign -INF points to the resource that can 
> fence node sv2836 if 
> it runs on node sv2836. So it will run on sv2837. If that node is not 
> available the resource cannot run. On the node (remaining) 
> node sv2836 only 
> the resource that can fence sv2837.
> 
> For details see: http://clusterlabs.org/doc/crm_fencing.html
> 
> Mit freundlichen Grüßen,
> 
> Michael Schwartzkopff

Aaaah I see... thanks a lot for the explanation!

br,
christian
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread emmanuel segura
I think they should be

pcs constraint location ipmi-fencing-sv2837 prefers sv2837=-INFINITY
pcs constraint location ipmi-fencing-sv2836 prefers sv2836=-INFINITY



2013/12/9 Michael Schwartzkopff 

> Am Montag, 9. Dezember 2013, 14:58:13 schrieben Sie:
> > > > pcs stonith create ipmi-fencing-sv2837 fence_ipmilan
> > >
> > > pcmk_host_list="sv2837"
> > >
> > > > ipaddr=10.110.28.37 action="off" login=ipmi passwd=abc
> > >
> > > delay=15 op monitor
> > >
> > > > interval=60s pcs stonith create ipmi-fencing-sv2836 fence_ipmilan
> > > > pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="off" login=ipmi
> > > > passwd=abc delay=15 op monitor interval=60s
> > > >
> > > > pcs property set stonith-enabled=true
> > > >
> > > > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > > > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> > > >
> > > > pcs status
> > > > 
> > > >
> > > > Full list of resources:
> > > >  ClusterIP(ocf::heartbeat:IPaddr2):   Started sv2836
> > > >  FIXRoute (ocf::baader:FIXRoute): Started sv2836
> > >
> > > >  ipmi-fencing-sv2837  (stonith:fence_ipmilan):
> > > Started sv2836
> > >
> > > >  ipmi-fencing-sv2836  (stonith:fence_ipmilan):
> > > Started sv2837
> > >
> > > This is not optimal. Nothing prevents the resource, that can
> > > fence node sv2837
> > > to run on host sv2837. You just say, that it should run on
> > > node sv2836.
> > >
> > > Better would be something like
> > >
> > > crm configure location place-fencing-sv2837 -inf: sv2837
> > >
> > > or the equivalent in pcs.
> > >
> > > Greetings,
> >
> > Hi Michael,
> >
> > I thought that the lines above will do that:
> > > > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > > > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> >
> > Don't they?
>
> I don't  know pcs in depth and I could not find any detailed doc. So I
> stick
> with crmsh.
>
> As far as I can judge your lines above your tell the cluster, that the
> resource, that can fence node sv2836, gets INF points if it runs on node
> sv2837. But what happens if node sv2837 is down? Nothing prevents the
> resource
> starting on the node that it should fence. So if one node is down both
> fencing
> resources will run on the remaining node. No very nice.
>
> I suggest to assign -INF points to the resource that can fence node sv2836
> if
> it runs on node sv2836. So it will run on sv2837. If that node is not
> available the resource cannot run. On the node (remaining) node sv2836 only
> the resource that can fence sv2837.
>
> For details see: http://clusterlabs.org/doc/crm_fencing.html
>
> Mit freundlichen Grüßen,
>
> Michael Schwartzkopff
>
> --
> [*] sys4 AG
>
> http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
> Franziskanerstraße 15, 81669 München
>
> Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
> Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
> Aufsichtsratsvorsitzender: Florian Kirstein
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 15:53:59 schrieb Dvorak Andreas:
> Hi
> 
> Here it is
> cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
> 
> Bonding Mode: fault-tolerance (active-backup)
> Primary Slave: None
> Currently Active Slave: em3
> MII Status: up
> MII Polling Interval (ms): 100
> Up Delay (ms): 0
> Down Delay (ms): 0
> 
> Slave Interface: em3
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: c8:1f:66:d7:3b:fe
> Slave queue ID: 0
> 
> Slave Interface: p3p3
> MII Status: up
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 00:0a:f7:3e:ca:8e
> Slave queue ID: 0
> 
> Andreas

You see that the MII Status is up.

So it seems that the bond does not catch the interface status from your device
drivers correctly. Please optimize the bond configuration so that the bond
interface really reflects the status of its slave interfaces. If everything else
fails you can fall back from miimon to ARP monitoring, sending ARPs out on the
interfaces.

See the https://www.kernel.org/doc/Documentation/networking/bonding.txt
for details.
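
For example, on RHEL/CentOS something like this in ifcfg-bond0 switches the
bond to ARP monitoring (the target IP is just an example; pick an address on
that network that is always reachable, such as the gateway):

  BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=10.15.28.1"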

-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 14:58:13 schrieben Sie:
> > > pcs stonith create ipmi-fencing-sv2837 fence_ipmilan
> > 
> > pcmk_host_list="sv2837"
> > 
> > > ipaddr=10.110.28.37 action="off" login=ipmi passwd=abc
> > 
> > delay=15 op monitor
> > 
> > > interval=60s pcs stonith create ipmi-fencing-sv2836 fence_ipmilan
> > > pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="off" login=ipmi
> > > passwd=abc delay=15 op monitor interval=60s
> > > 
> > > pcs property set stonith-enabled=true
> > > 
> > > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> > > 
> > > pcs status
> > > 
> > > 
> > > Full list of resources:
> > >  ClusterIP(ocf::heartbeat:IPaddr2):   Started sv2836
> > >  FIXRoute (ocf::baader:FIXRoute): Started sv2836
> > 
> > >  ipmi-fencing-sv2837  (stonith:fence_ipmilan):
> > Started sv2836
> > 
> > >  ipmi-fencing-sv2836  (stonith:fence_ipmilan):
> > Started sv2837
> > 
> > This is not optimal. Nothing prevents the resource, that can
> > fence node sv2837
> > to run on host sv2837. You just say, that it should run on
> > node sv2836.
> > 
> > Better would be something like
> > 
> > crm configure location place-fencing-sv2837 -inf: sv2837
> > 
> > or the equivalent in pcs.
> > 
> > Greetings,
> 
> Hi Michael,
> 
> I thought that the lines above will do that:
> > > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> 
> Don't they?

I don't know pcs in depth and I could not find any detailed doc, so I stick
with crmsh.

As far as I can judge, your lines above tell the cluster that the resource
that can fence node sv2836 gets INF points if it runs on node sv2837. But
what happens if node sv2837 is down? Nothing prevents the resource from
starting on the node that it should fence. So if one node is down, both
fencing resources will run on the remaining node. Not very nice.

I suggest assigning -INF points to the resource that can fence node sv2836 if
it runs on node sv2836. Then it will run on sv2837, and if that node is not
available the resource cannot run at all. On the remaining node sv2836, only
the resource that can fence sv2837 will run.

For details see: http://clusterlabs.org/doc/crm_fencing.html

Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread emmanuel segura
because in your corosync-cfgtool -s you are using bonding address




2013/12/9 Dvorak Andreas 

> Hi
>
>
>
> Here it is
>
> cat /proc/net/bonding/bond0
>
> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>
>
>
> Bonding Mode: fault-tolerance (active-backup)
>
> Primary Slave: None
>
> Currently Active Slave: em3
>
> MII Status: up
>
> MII Polling Interval (ms): 100
>
> Up Delay (ms): 0
>
> Down Delay (ms): 0
>
>
>
> Slave Interface: em3
>
> MII Status: up
>
> Speed: 1000 Mbps
>
> Duplex: full
>
> Link Failure Count: 0
>
> Permanent HW addr: c8:1f:66:d7:3b:fe
>
> Slave queue ID: 0
>
>
>
> Slave Interface: p3p3
>
> MII Status: up
>
> Speed: 1000 Mbps
>
> Duplex: full
>
> Link Failure Count: 0
>
> Permanent HW addr: 00:0a:f7:3e:ca:8e
>
> Slave queue ID: 0
>
>
>
> Andreas
>
>
>
> *Von:* emmanuel segura [mailto:emi2f...@gmail.com]
> *Gesendet:* Montag, 9. Dezember 2013 15:50
> *An:* The Pacemaker cluster resource manager
> *Betreff:* Re: [Pacemaker] cluster heartbeat is not used
>
>
>
> show  cat /proc/net/bonding/bond0
>
>
>
> 2013/12/9 Dvorak Andreas 
>
> Dear all,
>
>
>
> during failover tests I found out that I can put down the heartbeat
> interfaces  and the cluster ignores that. But if I put down bond0 the
> fencing is running.
>
> Can please somebody help me?
>
>
>
> bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
>
>   inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
>
>   inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
>
>   UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>
>
>
> em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
>
>   inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>
>
> p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
>
>   inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>
>
> I have the following corosync.conf
>
> # Please read the corosync.conf.5 manual page
>
> compatibility: whitetank
>
> totem {
>
> version: 2
>
> secauth: off
>
> threads: 0
>
> interface {
>
>ringnumber: 0
>
>bindnetaddr: 192.168.2.0
>
>mcastaddr: 226.94.1.1
>
>mcastport: 5405
>
>ttl: 1
>
> }
>
> interface {
>
> ringnumber: 1
>
> bindnetaddr: 192.168.3.0
>
> mcastaddr: 226.94.1.2
>
> mcastport: 5407
>
> ttl: 1
>
> }
>
> }
>
> logging {
>
> fileline: off
>
> to_stderr: no
>
> to_logfile: yes
>
> to_syslog: yes
>
> logfile: /var/log/cluster/corosync.log
>
> debug: off
>
> timestamp: on
>
> logger_subsys {
>
>subsys: AMF
>
>debug: off
>
> }
>
> }
>
> amf {
>
> mode: disabled
>
> }
>
>
>
> I have installed
>
> cluster-glue-libs.1.0.5.x86_64.CentOS
>
> corosynclib.1.4.1.x86_64.CentOS
>
> openais.1.1.1.x86_64.CentOS
>
> cman.3.0.12.1.x86_64.CentOS
>
> pacemaker-libs.1.1.8.x86_64.CentOS
>
> resource-agents.3.9.2.x86_64.CentOS
>
> corosync.1.4.1.x86_64.CentOS
>
> modcluster.0.16.2.x86_64.CentOS
>
> openaislib.1.1.1.x86_64.CentOS
>
> pacemaker-cluster-libs.1.1.8.x86_64.CentOS
>
> pacemaker.1.1.8.x86_64.CentOS
>
> pcs.0.9.26.noarch.CentOS
>
> libqb.0.14.2.x86_64.CentOS
>
> clusterlib.3.0.12.1.x86_64.CentOS
>
> ricci.0.16.2.x86_64.CentOS
>
> pacemaker-cli.1.1.8.x86_64.CentOS
>
> fence-virt.0.2.3.x86_64.CentOS
>
> ccs.0.16.2.x86_64.CentOS
>
>
>
> # corosync-cfgtool -s
>
> Printing ring status.
>
> Local node ID 1
>
> RING ID 0
>
> id= 10.15.28.36
>
> status   = ring 0 active with no faults
>
> # pcs status corosync
>
> Nodeid Name
>
>1   sv2836.muc.baag
>
>2   sv2837.muc.baag
>
>
>
> # crm_mon --one-shot -V
>
> Last updated: Mon Dec  9 14:56:32 2013
>
> Last change: Mon Dec  9 14:07:03 2013 via cibadmin on sv2836
>
> Stack: cman
>
> Current DC: sv2836 - partition with quorum
>
> Version: 1.1.10-1.el6_4.4-368c726
>
> 2 Nodes configured
>
> 4 Resources configured
>
>
>
> Online: [ sv2836 sv2837 ]
>
>
>
> ClusterIP(ocf::heartbeat:IPaddr2):Started sv2836
>
>  FIXRoute(ocf::baader:FIXRoute):   Started sv2836
>
>  ipmi-fencing-sv2837(stonith:fence_ipmilan):Started sv2836
>
>  ipmi-fencing-sv2836(stonith:fence_ipmilan):Started sv2837

Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread emmanuel segura
Why are you editing corosync.conf when you are using cman as the cluster stack?


2013/12/9 emmanuel segura 

> because in your corosync-cfgtool -s you are using bonding address
>
>
>
>
> 2013/12/9 Dvorak Andreas 
>
>> Hi
>>
>>
>>
>> Here it is
>>
>> cat /proc/net/bonding/bond0
>>
>> Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)
>>
>>
>>
>> Bonding Mode: fault-tolerance (active-backup)
>>
>> Primary Slave: None
>>
>> Currently Active Slave: em3
>>
>> MII Status: up
>>
>> MII Polling Interval (ms): 100
>>
>> Up Delay (ms): 0
>>
>> Down Delay (ms): 0
>>
>>
>>
>> Slave Interface: em3
>>
>> MII Status: up
>>
>> Speed: 1000 Mbps
>>
>> Duplex: full
>>
>> Link Failure Count: 0
>>
>> Permanent HW addr: c8:1f:66:d7:3b:fe
>>
>> Slave queue ID: 0
>>
>>
>>
>> Slave Interface: p3p3
>>
>> MII Status: up
>>
>> Speed: 1000 Mbps
>>
>> Duplex: full
>>
>> Link Failure Count: 0
>>
>> Permanent HW addr: 00:0a:f7:3e:ca:8e
>>
>> Slave queue ID: 0
>>
>>
>>
>> Andreas
>>
>>
>>
>> *Von:* emmanuel segura [mailto:emi2f...@gmail.com]
>> *Gesendet:* Montag, 9. Dezember 2013 15:50
>> *An:* The Pacemaker cluster resource manager
>> *Betreff:* Re: [Pacemaker] cluster heartbeat is not used
>>
>>
>>
>> show  cat /proc/net/bonding/bond0
>>
>>
>>
>> 2013/12/9 Dvorak Andreas 
>>
>> Dear all,
>>
>>
>>
>> during failover tests I found out that I can put down the heartbeat
>> interfaces  and the cluster ignores that. But if I put down bond0 the
>> fencing is running.
>>
>> Can please somebody help me?
>>
>>
>>
>> bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
>>
>>   inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
>>
>>   inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
>>
>>   UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>>
>>
>>
>> em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
>>
>>   inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
>>
>>   inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
>>
>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>
>>
>>
>> p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
>>
>>   inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
>>
>>   inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
>>
>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>
>>
>>
>> I have the following corosync.conf
>>
>> # Please read the corosync.conf.5 manual page
>>
>> compatibility: whitetank
>>
>> totem {
>>
>> version: 2
>>
>> secauth: off
>>
>> threads: 0
>>
>> interface {
>>
>>ringnumber: 0
>>
>>bindnetaddr: 192.168.2.0
>>
>>mcastaddr: 226.94.1.1
>>
>>mcastport: 5405
>>
>>ttl: 1
>>
>> }
>>
>> interface {
>>
>> ringnumber: 1
>>
>> bindnetaddr: 192.168.3.0
>>
>> mcastaddr: 226.94.1.2
>>
>> mcastport: 5407
>>
>> ttl: 1
>>
>> }
>>
>> }
>>
>> logging {
>>
>> fileline: off
>>
>> to_stderr: no
>>
>> to_logfile: yes
>>
>> to_syslog: yes
>>
>> logfile: /var/log/cluster/corosync.log
>>
>> debug: off
>>
>> timestamp: on
>>
>> logger_subsys {
>>
>>subsys: AMF
>>
>>debug: off
>>
>> }
>>
>> }
>>
>> amf {
>>
>> mode: disabled
>>
>> }
>>
>>
>>
>> I have installed
>>
>> cluster-glue-libs.1.0.5.x86_64.CentOS
>>
>> corosynclib.1.4.1.x86_64.CentOS
>>
>> openais.1.1.1.x86_64.CentOS
>>
>> cman.3.0.12.1.x86_64.CentOS
>>
>> pacemaker-libs.1.1.8.x86_64.CentOS
>>
>> resource-agents.3.9.2.x86_64.CentOS
>>
>> corosync.1.4.1.x86_64.CentOS
>>
>> modcluster.0.16.2.x86_64.CentOS
>>
>> openaislib.1.1.1.x86_64.CentOS
>>
>> pacemaker-cluster-libs.1.1.8.x86_64.CentOS
>>
>> pacemaker.1.1.8.x86_64.CentOS
>>
>> pcs.0.9.26.noarch.CentOS
>>
>> libqb.0.14.2.x86_64.CentOS
>>
>> clusterlib.3.0.12.1.x86_64.CentOS
>>
>> ricci.0.16.2.x86_64.CentOS
>>
>> pacemaker-cli.1.1.8.x86_64.CentOS
>>
>> fence-virt.0.2.3.x86_64.CentOS
>>
>> ccs.0.16.2.x86_64.CentOS
>>
>>
>>
>> # corosync-cfgtool -s
>>
>> Printing ring status.
>>
>> Local node ID 1
>>
>> RING ID 0
>>
>> id= 10.15.28.36
>>
>> status   = ring 0 active with no faults
>>
>> # pcs status corosync
>>
>> Nodeid Name
>>
>>1   sv2836.muc.baag
>>
>>2   sv2837.muc.baag
>>
>>
>>
>> # crm_mon --one-shot -V
>>
>> Last updated: Mon Dec  9 14:56:32 2013
>>
>> Last change: Mon Dec  9 14:07:03 2013 via cibadmin on sv2836
>>
>> Stack: cman
>>
>> Current DC: sv2836 - partition with quorum
>>
>> Version: 1.1.10-1.el6_4.4-368c726
>>
>> 2 Nodes c

Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread Masopust, Christian
> > pcs stonith create ipmi-fencing-sv2837 fence_ipmilan 
> pcmk_host_list="sv2837"
> > ipaddr=10.110.28.37 action="off" login=ipmi passwd=abc 
> delay=15 op monitor
> > interval=60s pcs stonith create ipmi-fencing-sv2836 fence_ipmilan
> > pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="off" login=ipmi
> > passwd=abc delay=15 op monitor interval=60s
> > 
> > pcs property set stonith-enabled=true
> > 
> > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> > 
> > pcs status
> > 
> > Full list of resources:
> >  ClusterIP  (ocf::heartbeat:IPaddr2):   Started sv2836
> >  FIXRoute   (ocf::baader:FIXRoute): Started sv2836
> >  ipmi-fencing-sv2837(stonith:fence_ipmilan):
> Started sv2836
> >  ipmi-fencing-sv2836(stonith:fence_ipmilan):
> Started sv2837
> 
> This is not optimal. Nothing prevents the resource, that can 
> fence node sv2837 
> to run on host sv2837. You just say, that it should run on 
> node sv2836.
> 
> Better would be something like
> 
> crm configure location place-fencing-sv2837 -inf: sv2837
> 
> or the equivalent in pcs.
> 
> Greetings,

Hi Michael,

I thought that the lines above will do that:

> > pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> > pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY

Don't they?

br,
christian

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread Dvorak Andreas
Hi

Here it is
cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: em3
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: em3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: c8:1f:66:d7:3b:fe
Slave queue ID: 0

Slave Interface: p3p3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0a:f7:3e:ca:8e
Slave queue ID: 0

Andreas

Von: emmanuel segura [mailto:emi2f...@gmail.com]
Gesendet: Montag, 9. Dezember 2013 15:50
An: The Pacemaker cluster resource manager
Betreff: Re: [Pacemaker] cluster heartbeat is not used

show  cat /proc/net/bonding/bond0

2013/12/9 Dvorak Andreas 
mailto:andreas.dvo...@baaderbank.de>>
Dear all,

during failover tests I found out that I can put down the heartbeat interfaces  
and the cluster ignores that. But if I put down bond0 the fencing is running.
Can please somebody help me?

bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
  inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
  inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
  inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
  inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
  inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
  inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

I have the following corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
version: 2
secauth: off
threads: 0
interface {
   ringnumber: 0
   bindnetaddr: 192.168.2.0
   mcastaddr: 226.94.1.1
   mcastport: 5405
   ttl: 1
}
interface {
ringnumber: 1
bindnetaddr: 192.168.3.0
mcastaddr: 226.94.1.2
mcastport: 5407
ttl: 1
}
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
   subsys: AMF
   debug: off
}
}
amf {
mode: disabled
}

I have installed
cluster-glue-libs.1.0.5.x86_64.CentOS
corosynclib.1.4.1.x86_64.CentOS
openais.1.1.1.x86_64.CentOS
cman.3.0.12.1.x86_64.CentOS
pacemaker-libs.1.1.8.x86_64.CentOS
resource-agents.3.9.2.x86_64.CentOS
corosync.1.4.1.x86_64.CentOS
modcluster.0.16.2.x86_64.CentOS
openaislib.1.1.1.x86_64.CentOS
pacemaker-cluster-libs.1.1.8.x86_64.CentOS
pacemaker.1.1.8.x86_64.CentOS
pcs.0.9.26.noarch.CentOS
libqb.0.14.2.x86_64.CentOS
clusterlib.3.0.12.1.x86_64.CentOS
ricci.0.16.2.x86_64.CentOS
pacemaker-cli.1.1.8.x86_64.CentOS
fence-virt.0.2.3.x86_64.CentOS
ccs.0.16.2.x86_64.CentOS

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id= 10.15.28.36
status   = ring 0 active with no faults
# pcs status corosync
Nodeid Name
   1   sv2836.muc.baag
   2   sv2837.muc.baag

# crm_mon --one-shot -V
Last updated: Mon Dec  9 14:56:32 2013
Last change: Mon Dec  9 14:07:03 2013 via cibadmin on sv2836
Stack: cman
Current DC: sv2836 - partition with quorum
Version: 1.1.10-1.el6_4.4-368c726
2 Nodes configured
4 Resources configured

Online: [ sv2836 sv2837 ]

ClusterIP(ocf::heartbeat:IPaddr2):Started sv2836
 FIXRoute(ocf::baader:FIXRoute):   Started sv2836
 ipmi-fencing-sv2837(stonith:fence_ipmilan):Started sv2836
 ipmi-fencing-sv2836(stonith:fence_ipmilan):Started sv2837

Best regards
Andreas Dvorak

___
Pacemaker mailing list: 
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



--
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs

Re: [Pacemaker] cluster heartbeat is not used

2013-12-09 Thread emmanuel segura
show  cat /proc/net/bonding/bond0


2013/12/9 Dvorak Andreas 

> Dear all,
>
>
>
> during failover tests I found out that I can put down the heartbeat
> interfaces  and the cluster ignores that. But if I put down bond0 the
> fencing is running.
>
> Can please somebody help me?
>
>
>
> bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
>
>   inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
>
>   inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
>
>   UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>
>
>
> em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
>
>   inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>
>
> p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
>
>   inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
>
>   inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
>
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>
>
>
> I have the following corosync.conf
>
> # Please read the corosync.conf.5 manual page
>
> compatibility: whitetank
>
> totem {
>
> version: 2
>
> secauth: off
>
> threads: 0
>
> interface {
>
>ringnumber: 0
>
>bindnetaddr: 192.168.2.0
>
>mcastaddr: 226.94.1.1
>
>mcastport: 5405
>
>ttl: 1
>
> }
>
> interface {
>
> ringnumber: 1
>
> bindnetaddr: 192.168.3.0
>
> mcastaddr: 226.94.1.2
>
> mcastport: 5407
>
> ttl: 1
>
> }
>
> }
>
> logging {
>
> fileline: off
>
> to_stderr: no
>
> to_logfile: yes
>
> to_syslog: yes
>
> logfile: /var/log/cluster/corosync.log
>
> debug: off
>
> timestamp: on
>
> logger_subsys {
>
>subsys: AMF
>
>debug: off
>
> }
>
> }
>
> amf {
>
> mode: disabled
>
> }
>
>
>
> I have installed
>
> cluster-glue-libs.1.0.5.x86_64.CentOS
>
> corosynclib.1.4.1.x86_64.CentOS
>
> openais.1.1.1.x86_64.CentOS
>
> cman.3.0.12.1.x86_64.CentOS
>
> pacemaker-libs.1.1.8.x86_64.CentOS
>
> resource-agents.3.9.2.x86_64.CentOS
>
> corosync.1.4.1.x86_64.CentOS
>
> modcluster.0.16.2.x86_64.CentOS
>
> openaislib.1.1.1.x86_64.CentOS
>
> pacemaker-cluster-libs.1.1.8.x86_64.CentOS
>
> pacemaker.1.1.8.x86_64.CentOS
>
> pcs.0.9.26.noarch.CentOS
>
> libqb.0.14.2.x86_64.CentOS
>
> clusterlib.3.0.12.1.x86_64.CentOS
>
> ricci.0.16.2.x86_64.CentOS
>
> pacemaker-cli.1.1.8.x86_64.CentOS
>
> fence-virt.0.2.3.x86_64.CentOS
>
> ccs.0.16.2.x86_64.CentOS
>
>
>
> # corosync-cfgtool -s
>
> Printing ring status.
>
> Local node ID 1
>
> RING ID 0
>
> id= 10.15.28.36
>
> status   = ring 0 active with no faults
>
> # pcs status corosync
>
> Nodeid Name
>
>1   sv2836.muc.baag
>
>2   sv2837.muc.baag
>
>
>
> # crm_mon --one-shot -V
>
> Last updated: Mon Dec  9 14:56:32 2013
>
> Last change: Mon Dec  9 14:07:03 2013 via cibadmin on sv2836
>
> Stack: cman
>
> Current DC: sv2836 - partition with quorum
>
> Version: 1.1.10-1.el6_4.4-368c726
>
> 2 Nodes configured
>
> 4 Resources configured
>
>
>
> Online: [ sv2836 sv2837 ]
>
>
>
> ClusterIP(ocf::heartbeat:IPaddr2):Started sv2836
>
>  FIXRoute(ocf::baader:FIXRoute):   Started sv2836
>
>  ipmi-fencing-sv2837(stonith:fence_ipmilan):Started sv2836
>
>  ipmi-fencing-sv2836(stonith:fence_ipmilan):Started sv2837
>
>
>
> Best regards
>
> Andreas Dvorak
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread Masopust, Christian

Hi Andreas,

as far as I can say (as a pacemaker novice), everything is fine besides
the equal delays for both stoniths... I would suggest configuring a delay
of 15s on only one of them.
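
Something like this should do it, if your pcs version supports updating a
stonith resource in place (otherwise recreate one of them without the delay):

  pcs stonith update ipmi-fencing-sv2836 delay=0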

br,
christian 

> -Ursprüngliche Nachricht-
> Von: Dvorak Andreas [mailto:andreas.dvo...@baaderbank.de] 
> Gesendet: Montag, 09. Dezember 2013 14:40
> An: 'pacemaker@oss.clusterlabs.org'
> Betreff: [Pacemaker] WG: configuration of stonith
> 
> Dear all
> 
> My problem with stonith is solved.
> 
> Here is what I did:
> 
> pcs stonith create ipmi-fencing-sv2837 fence_ipmilan 
> pcmk_host_list="sv2837" ipaddr=10.110.28.37 action="off" 
> login=ipmi passwd=abc delay=15 op monitor interval=60s
> pcs stonith create ipmi-fencing-sv2836 fence_ipmilan 
> pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="off" 
> login=ipmi passwd=abc delay=15 op monitor interval=60s
> 
> pcs property set stonith-enabled=true
> 
> pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> 
> pcs status
> 
> Full list of resources:
>  ClusterIP(ocf::heartbeat:IPaddr2):   Started sv2836 
>  FIXRoute (ocf::baader:FIXRoute): Started sv2836 
>  ipmi-fencing-sv2837  (stonith:fence_ipmilan):Started sv2836 
>  ipmi-fencing-sv2836  (stonith:fence_ipmilan):Started sv2837
> 
> Best regards,
> Andreas
> 
> -Ursprüngliche Nachricht-
> Von: Dvorak Andreas 
> Gesendet: Montag, 9. Dezember 2013 09:55
> An: 'The Pacemaker cluster resource manager'
> Betreff: Re: [Pacemaker] configuration of stonith
> 
> Dear all,
> 
> thank you for the answers.
> 
> Now I created two stonith resources
> pcs stonith create ipmi-fencing-sv2837 fence_ipmilan 
> pcmk_host_list="sv2837" ipaddr=10.110.28.37 action="reboot" 
> login=abc passwd=abc123 delay=15 op monitor interval=60s pcs 
> stonith create ipmi-fencing-sv2836 fence_ipmilan 
> pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="reboot" 
> login=abc passwd=abc123 delay=15 op monitor interval=60s
> 
> the current status is:
> pcs status
> Cluster name: fix-prod
> Last updated: Mon Dec  9 09:41:48 2013
> Last change: Mon Dec  9 09:40:03 2013 via cibadmin on sv2836
> Stack: cman
> Current DC: sv2837 - partition with quorum
> Version: 1.1.10-1.el6_4.4-368c726
> 2 Nodes configured
> 4 Resources configured
> 
> Online: [ sv2836 sv2837 ]
> 
> Full list of resources:
> 
>  ClusterIP(ocf::heartbeat:IPaddr2):   Started sv2836 
>  FIXRoute (ocf::baader:FIXRoute): Started sv2836 
>  ipmi-fencing-sv2837  (stonith:fence_ipmilan):Stopped 
>  ipmi-fencing-sv2836  (stonith:fence_ipmilan):Stopped 
> 
> Failed actions:
> ipmi-fencing-sv2837_start_0 on sv2837 'unknown error' 
> (1): call=276, status=Error, last-rc-change='Mon Dec  9 
> 09:39:55 2013', queued=17090ms, exec=0ms
> ipmi-fencing-sv2836_start_0 on sv2837 'unknown error' 
> (1): call=286, status=Error, last-rc-change='Mon Dec  9 
> 09:40:13 2013', queued=17085ms, exec=0ms
> ipmi-fencing-sv2837_start_0 on sv2836 'unknown error' 
> (1): call=369, status=Error, last-rc-change='Mon Dec  9 
> 09:40:13 2013', queued=17085ms, exec=0ms
> ipmi-fencing-sv2836_start_0 on sv2836 'unknown error' 
> (1): call=375, status=Error, last-rc-change='Mon Dec  9 
> 09:40:31 2013', queued=17090ms, exec=0ms
> 
> Do I need to tell the stonith resource where to run and how 
> can I do that?
> In the parameter pcmk_host_list I have the hostname of the other node.
> 
> Best regards,
> Andreas
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: 
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] ocf resource agents - pre and post scripts

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 15:24:06 schrieb Vladimir:
> Hello everyone,
> 
> Is there a built-in mechanizm in pacemaker to trigger a pre or post
> script or do the ocf resource agents bring something like that?
> 
> I could also create a kind of dummy resource after a primitive
> resource. But I asked my self if there is another/better way.
> 
> Thanks.

No hooks.
But OCF agents are bash scripts. You can insert your own code in the start() and 
stop() functions.

Just create a new directory under /usr/lib/ocf/resource.d and copy the 
original script into that dir. Then use the OCF agent from your new provider 
(= dir name).
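
A minimal sketch, assuming (purely as an example) that the agent you want to
wrap is ocf:heartbeat:IPaddr2 and the new provider is called "myprovider":

mkdir -p /usr/lib/ocf/resource.d/myprovider
cp /usr/lib/ocf/resource.d/heartbeat/IPaddr2 /usr/lib/ocf/resource.d/myprovider/
# edit start() and stop() in the copy to add your pre/post steps,
# then reference it as ocf:myprovider:IPaddr2 in the resource definition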


Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] ocf resource agents - pre and post scripts

2013-12-09 Thread Vladimir
Hello everyone,

Is there a built-in mechanizm in pacemaker to trigger a pre or post
script or do the ocf resource agents bring something like that? 

I could also create a kind of dummy resource after a primitive
resource. But I asked my self if there is another/better way.

Thanks.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] WG: configuration of stonith

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 14:40:29 schrieb Dvorak Andreas:
> Dear all
> 
> My problem with stonith is solved.
> 
> Here is what I did:
> 
> pcs stonith create ipmi-fencing-sv2837 fence_ipmilan pcmk_host_list="sv2837"
> ipaddr=10.110.28.37 action="off" login=ipmi passwd=abc delay=15 op monitor
> interval=60s pcs stonith create ipmi-fencing-sv2836 fence_ipmilan
> pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="off" login=ipmi
> passwd=abc delay=15 op monitor interval=60s
> 
> pcs property set stonith-enabled=true
> 
> pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
> pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY
> 
> pcs status
> 
> Full list of resources:
>  ClusterIP(ocf::heartbeat:IPaddr2):   Started sv2836
>  FIXRoute (ocf::baader:FIXRoute): Started sv2836
>  ipmi-fencing-sv2837  (stonith:fence_ipmilan):Started sv2836
>  ipmi-fencing-sv2836  (stonith:fence_ipmilan):Started sv2837

This is not optimal. Nothing prevents the resource that can fence node sv2837 
from running on host sv2837. You just say that it should run on node sv2836.

Better would be something like

crm configure location place-fencing-sv2837 -inf: sv2837

or the equivalent in pcs.
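
A rough pcs equivalent, assuming your pcs version supports "avoids" (the
negative counterpart of the "prefers" you used):

pcs constraint location ipmi-fencing-sv2837 avoids sv2837=INFINITY
pcs constraint location ipmi-fencing-sv2836 avoids sv2836=INFINITY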

Greetings,

-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] cluster heartbeat is not used

2013-12-09 Thread Dvorak Andreas
Dear all,

During failover tests I found out that I can take down the heartbeat interfaces 
and the cluster ignores that. But if I take down bond0, fencing is triggered.
Can somebody please help me?

bond0 Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FE
  inet addr:10.15.28.36  Bcast:10.15.255.255  Mask:255.255.0.0
  inet6 addr: fe80::ca1f:66ff:fed7:3bfe/64 Scope:Link
  UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

em4   Link encap:Ethernet  HWaddr C8:1F:66:D7:3B:FF
  inet addr:192.168.2.36  Bcast:192.168.2.255  Mask:255.255.255.0
  inet6 addr: fe80::ca1f:66ff:fed7:3bff/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

p3p4  Link encap:Ethernet  HWaddr 00:0A:F7:3E:CA:8F
  inet addr:192.168.3.36  Bcast:192.168.3.255  Mask:255.255.255.0
  inet6 addr: fe80::20a:f7ff:fe3e:ca8f/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

I have the following corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.2.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.3.0
                mcastaddr: 226.94.1.2
                mcastport: 5407
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}

I have installed
cluster-glue-libs.1.0.5.x86_64.CentOS
corosynclib.1.4.1.x86_64.CentOS
openais.1.1.1.x86_64.CentOS
cman.3.0.12.1.x86_64.CentOS
pacemaker-libs.1.1.8.x86_64.CentOS
resource-agents.3.9.2.x86_64.CentOS
corosync.1.4.1.x86_64.CentOS
modcluster.0.16.2.x86_64.CentOS
openaislib.1.1.1.x86_64.CentOS
pacemaker-cluster-libs.1.1.8.x86_64.CentOS
pacemaker.1.1.8.x86_64.CentOS
pcs.0.9.26.noarch.CentOS
libqb.0.14.2.x86_64.CentOS
clusterlib.3.0.12.1.x86_64.CentOS
ricci.0.16.2.x86_64.CentOS
pacemaker-cli.1.1.8.x86_64.CentOS
fence-virt.0.2.3.x86_64.CentOS
ccs.0.16.2.x86_64.CentOS

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id      = 10.15.28.36
status  = ring 0 active with no faults
# pcs status corosync
Nodeid Name
   1   sv2836.muc.baag
   2   sv2837.muc.baag

# crm_mon --one-shot -V
Last updated: Mon Dec  9 14:56:32 2013
Last change: Mon Dec  9 14:07:03 2013 via cibadmin on sv2836
Stack: cman
Current DC: sv2836 - partition with quorum
Version: 1.1.10-1.el6_4.4-368c726
2 Nodes configured
4 Resources configured

Online: [ sv2836 sv2837 ]

 ClusterIP            (ocf::heartbeat:IPaddr2):       Started sv2836
 FIXRoute             (ocf::baader:FIXRoute):         Started sv2836
 ipmi-fencing-sv2837  (stonith:fence_ipmilan):        Started sv2836
 ipmi-fencing-sv2836  (stonith:fence_ipmilan):        Started sv2837

Best regards
Andreas Dvorak
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] WG: configuration of stonith

2013-12-09 Thread Dvorak Andreas
Dear all

My problem with stonith is solved.

Here is what I did:

pcs stonith create ipmi-fencing-sv2837 fence_ipmilan pcmk_host_list="sv2837" 
ipaddr=10.110.28.37 action="off" login=ipmi passwd=abc delay=15 op monitor 
interval=60s
pcs stonith create ipmi-fencing-sv2836 fence_ipmilan pcmk_host_list="sv2836" 
ipaddr=10.110.28.36 action="off" login=ipmi passwd=abc delay=15 op monitor 
interval=60s

pcs property set stonith-enabled=true

pcs constraint location ipmi-fencing-sv2837 prefers sv2836=INFINITY
pcs constraint location ipmi-fencing-sv2836 prefers sv2837=INFINITY

pcs status

Full list of resources:
 ClusterIP  (ocf::heartbeat:IPaddr2):   Started sv2836 
 FIXRoute   (ocf::baader:FIXRoute): Started sv2836 
 ipmi-fencing-sv2837(stonith:fence_ipmilan):Started sv2836 
 ipmi-fencing-sv2836(stonith:fence_ipmilan):Started sv2837

Best regards,
Andreas

-Ursprüngliche Nachricht-
Von: Dvorak Andreas 
Gesendet: Montag, 9. Dezember 2013 09:55
An: 'The Pacemaker cluster resource manager'
Betreff: Re: [Pacemaker] configuration of stonith

Dear all,

thank you for the answers.

Now I created to stonith resources
pcs stonith create ipmi-fencing-sv2837 fence_ipmilan pcmk_host_list="sv2837" 
ipaddr=10.110.28.37 action="reboot" login=abc passwd=abc123 delay=15 op monitor 
interval=60s pcs stonith create ipmi-fencing-sv2836 fence_ipmilan 
pcmk_host_list="sv2836" ipaddr=10.110.28.36 action="reboot" login=abc 
passwd=abc123 delay=15 op monitor interval=60s

the current status is:
pcs status
Cluster name: fix-prod
Last updated: Mon Dec  9 09:41:48 2013
Last change: Mon Dec  9 09:40:03 2013 via cibadmin on sv2836
Stack: cman
Current DC: sv2837 - partition with quorum
Version: 1.1.10-1.el6_4.4-368c726
2 Nodes configured
4 Resources configured

Online: [ sv2836 sv2837 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started sv2836 
 FIXRoute   (ocf::baader:FIXRoute): Started sv2836 
 ipmi-fencing-sv2837(stonith:fence_ipmilan):Stopped 
 ipmi-fencing-sv2836(stonith:fence_ipmilan):Stopped 

Failed actions:
ipmi-fencing-sv2837_start_0 on sv2837 'unknown error' (1): call=276, 
status=Error, last-rc-change='Mon Dec  9 09:39:55 2013', queued=17090ms, 
exec=0ms
ipmi-fencing-sv2836_start_0 on sv2837 'unknown error' (1): call=286, 
status=Error, last-rc-change='Mon Dec  9 09:40:13 2013', queued=17085ms, 
exec=0ms
ipmi-fencing-sv2837_start_0 on sv2836 'unknown error' (1): call=369, 
status=Error, last-rc-change='Mon Dec  9 09:40:13 2013', queued=17085ms, 
exec=0ms
ipmi-fencing-sv2836_start_0 on sv2836 'unknown error' (1): call=375, 
status=Error, last-rc-change='Mon Dec  9 09:40:31 2013', queued=17090ms, 
exec=0ms

Do I need to tell the stonith resource where to run and how can I do that?
In the parameter pcmk_host_list I have the hostname of the other node.

Best regards,
Andreas

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org 
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 13:06:04 schrieb Bauer, Stefan:
> Why are some resources listed more than once in the output?
> What is the difference between group_color and native_color?
> If a resource has a value of -INFINITY is it because the cluster already
> decided that this resource should not run on this host or it can not run on
> this host due to other reasons?
> 
> I'm not quite sure, how a resource stickiness interferes with the internal
> decicions taken to migrate.

Everything is points. Stickiness is points. Constraints result in points.

With every event the cluster calculates the matrix over all nodes and all 
resources. A resource will run on the node where it can collect the most points.

Beware of implicit constraints that give points, e.g. a colocation

col col_A_with_B inf: A B

will result in -inf points for resource A on all nodes where B is not 
running.

Look at the output of crm_simulate -s -L, write down the matrix and understand 
it.

-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Bauer, Stefan (IZLBW Extern)
Why are some resources listed more than once in the output?
What is the difference between group_color and native_color?
If a resource has a value of -INFINITY, is it because the cluster has already 
decided that this resource should not run on this host, or can it not run on 
this host due to other reasons?

I'm not quite sure how resource stickiness interferes with the internal 
decisions taken to migrate.

Stefan


-Ursprüngliche Nachricht-
Von: Michael Schwartzkopff [mailto:m...@sys4.de] 

Test your config.

check the points by

# crm_simulate -s -L

and adjust your scoring system accordingly.


Mit freundlichen Grüßen,

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker very often STONITHs other node

2013-12-09 Thread Nikita Staroverov



Hello,

Thank you for your answer. I have two drbd - /dev/drbd1 and 
/dev/drbd2. And I use them as PVs for LVM which has one Volume Group 
hosting all the VMs.


So should I have as many DRBDs as VMs and get rid off LVM at all?

PS. If it is not a secret what are you recommended timeouts?

Thank you!

First, you must eliminate the timeouts on the stop operation. Maybe increasing 
the stop timeout will help.
My timeouts aren't a secret, of course. I use a 5-minute stop timeout 
per VM and define a longer one for VMs that can't stop gracefully in 5 minutes.
I also use KVM virtual domains that are always kicked off by the VirtualDomain 
OCF agent without any errors (except an info message in the logs).
IMHO, one DRBD per VM gives more flexibility in configuration; for 
example, you can spread VMs among the nodes in the cluster, which is especially 
useful in big clusters (6-8 nodes or so).
This setup also helps with DRBD write-after-write behavior. It gives 
more simultaneous writethrough operations and decreases I/O latency in the VM.
For example, if you have many nodes you can create a dedicated replication 
network between nodes and get better network throughput.
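
A minimal crm sketch of such a per-VM stop timeout (the resource name and the
libvirt config path are only placeholders):

primitive vm_example ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/example.xml" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="300s" \
    op monitor interval="30s" timeout="60s"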


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Reg. trigger when node failure occurs

2013-12-09 Thread ESWAR RAO
Hi All,

I have a 3-node (node1, node2, node3) setup on which HB+pacemaker runs.
I have resources running in clone mode on node1 and node2.

Is there any way to get a trigger when a node failure occurs, i.e., can I
trigger any script if node3 fails (on which no resource runs)?


Thanks
Eswar
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] monitor on-fail=ignore not restarting when resource reported as stopped

2013-12-09 Thread Lars Marowsky-Bree
On 2013-12-06T16:06:09, Patrick Hemmer  wrote:

Hi Patrick,

> > For a resource that pacemaker expects to be started, it's an error if it
> > is found to be stopped. Pacemaker can't tell if it is really cleanly
> > stopped, or died, or ...
> Oh, and I'll quote the OCF spec on this one:
> 
> 1 generic or unspecified error (current practice)
> The "monitor" operation shall return this for a crashed, hung or
> otherwise non-functional resource.
> 
> 7 program is not running
> Note: This is not the error code to be returned by a successful
> "stop" operation. A successful "stop" operation shall return 0.
> The "monitor" action shall return this value only for a
> _cleanly_ stopped resource. If in doubt, it should return 1.
> 
> So the OCF spec very clearly states that OCF_ERR_GENERIC means it's
> failed. OCF_NOT_RUNNING means it shut down cleanly. So yes, pacemaker
> can tell if it cleanly stopped.

Yes. I know. I wrote that.

But for a resource that pacemaker expects to be "started", both mean
that something happened and the resource is no longer in the target
state, e.g. recovery kicks in. In theory, the only action that
OCF_NOT_RUNNING would allow us to skip is the "stop" action before
starting it elsewhere, but we do that anyway as a safety measure.

There's also a difference in how pacemaker handles this in response to
the initial monitor_0 / probe.

> > If you want Pacemaker to recover failed resources, do not set
> > on-fail="ignore". I still don't quite get why you set that when you
> > obviously don't want the associated behaviour?

I still don't understand what you want.

You want "failures" (i.e., rc != 0 or 7) to be ignored, but "stopped" to
be restarted? You can't do that without modifying the resource agent.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 11:27:51 schrieb Bauer, Stefan:
> Hi Michael,
> 
> so that means, either increasing one value or lower the other right?
> Or is lowering resource_stickiness the only reasonable way?
> 
> I tried a multiplier of 1000 but no change in the behavior.
> 
> Stefan
> 
> -Ursprüngliche Nachricht-
> Von: Michael Schwartzkopff [mailto:m...@sys4.de]
> You resource_stickiness is too high in respect to the pingd points. You
> resource "earns" 700 points staying there where it is and only 300 points
> for moving. Reduce your resource_stickiness to a reasonable amount.

Test your config.

check the points by

# crm_simulate -s -L

and adjust your scoring system accordingly.


Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Lars Marowsky-Bree
On 2013-12-09T10:28:32, "Bauer, Stefan (IZLBW Extern)"  
wrote:

> location groupwithping cluster1 \
> rule $id="groupwithping-rule" pingd: defined pingd

I tend to prefer a -inf score for nodes where pingd is *not* defined or
zero.

(Downside is that when you lose all connectivity, all services will be
stopped.)
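
A sketch of that rule in crm syntax, adapting the groupwithping constraint
from the posted config (the essential change is the -inf rule):

location groupwithping cluster1 \
    rule $id="groupwithping-rule" -inf: not_defined pingd or pingd number:lte 0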


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Bauer, Stefan (IZLBW Extern)
Hi Michael,

So that means either increasing one value or lowering the other, right?
Or is lowering resource_stickiness the only reasonable way?

I tried a multiplier of 1000, but there was no change in the behavior.

Stefan

-Ursprüngliche Nachricht-
Von: Michael Schwartzkopff [mailto:m...@sys4.de] 
You resource_stickiness is too high in respect to the pingd points. You 
resource "earns" 700 points staying there where it is and only 300 points for 
moving. Reduce your resource_stickiness to a reasonable amount.

BTW: I hope 127.0.0.1 in the config of the ping resource is only for 
obfuscation on the list and no real configuration.


-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker very often STONITHs other node

2013-12-09 Thread Michał Margula

W dniu 09.12.2013 11:34, Nikita Staroverov pisze:


So, what happens? :)
Rivendell-B tried to stop XEN-acsystemy01, but couldn't do that due to
time out of operation. Failure on stop operation is fatal by default and
leading to stonith.
Rivendell-A caught this and fence rivendell-B.
You also have got some other problems, like clone-LVM not running (but
it is'nt fatal).

I think your servers is overloaded due to one DRBD for all VM's. You
must increase timeout of operations or do something with cluster
configuration.
As for me, i use configuration with one drbd per virtual machine drive,
moderated timeouts and 802.3ad bonding configuration without problems.



Hello,

Thank you for your answer. I have two DRBD devices - /dev/drbd1 and /dev/drbd2 - 
and I use them as PVs for LVM, which has one Volume Group hosting all the 
VMs.


So should I have as many DRBD devices as VMs and get rid of LVM altogether?

PS. If it is not a secret, what are your recommended timeouts?

Thank you!

--
Michał Margula, alche...@uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread emmanuel segura
Your location isn't enough, sorry for my English :)

location mynet mygrp_or_rsc \
rule $id="-rule" -inf: not_defined pingd or pingd number:lte 300



2013/12/9 Bauer, Stefan (IZLBW Extern) 

>  Pardon!
>
>
>
> node debian6-n1
>
> node debian6-n2
>
> primitive p_alias0 ocf:heartbeat:IPaddr2 \
>
> params ip="4.5.6.7" cidr_netmask="24" nic="eth0" \
>
> op start interval="0" timeout="20" \
>
> op stop interval="0" timeout="30" \
>
> op monitor interval="20"
>
> primitive p_conntrackd lsb:conntrackd-sync \
>
> op monitor interval="30s"
>
> primitive p_eth0 ocf:heartbeat:IPaddr2 \
>
> params ip="10.0.2.250" cidr_netmask="24" nic="eth0" \
>
> op start interval="0" timeout="20" \
>
> op stop interval="0" timeout="30" \
>
> op monitor interval="20"
>
> primitive p_openvpn lsb:openvpn \
>
> op start interval="0" timeout="20" \
>
> op stop interval="0" timeout="30" \
>
> op monitor interval="20"
>
> primitive p_ping ocf:pacemaker:ping \
>
> params host_list="7.4.5.6 127.0.0.1" multiplier="150" dampen="5s" \
>
> op start interval="0" timeout="60" \
>
> op stop interval="0" timeout="20" \
>
> op monitor interval="20" timeout="60"
>
> group cluster1 p_eth0 p_alias0 p_openvpn p_conntrackd \
>
> meta target-role="Started"
>
> clone pingclone p_ping \
>
> meta interleave="true"
>
> location groupwithping cluster1 \
>
> rule $id="groupwithping-rule" pingd: defined pingd
>
> colocation cluster inf: p_eth0 p_alias0 p_openvpn p_conntrackd
>
> property $id="cib-bootstrap-options" \
>
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>
> cluster-infrastructure="openais" \
>
> expected-quorum-votes="2" \
>
> no-quorum-policy="ignore" \
>
> stonith-enabled="false"
>
> rsc_defaults $id="rsc-options" \
>
> resource-stickiness="100"
>
>
>
> Stefan
>
>
>
> *Von:* emmanuel segura [mailto:emi2f...@gmail.com]
> *Gesendet:* Montag, 9. Dezember 2013 11:22
> *An:* The Pacemaker cluster resource manager
> *Betreff:* Re: [Pacemaker] Ressources not moving to node with better
> connectivity - pingd
>
>
>
> where is your config?
>
>
>
> 2013/12/9 Bauer, Stefan (IZLBW Extern) 
>
> Hi List,
>
>
>
> even though following well known documentations about a ping clone
> resource my resources are not moving to the node with the better
> connectivity:
>
>
>
> 2 Nodes configured, 2 expected votes
>
> 6 Resources configured.
>
> 
>
>
>
> Online: [ debian6-n2 debian6-n1 ]
>
>
>
> Resource Group: cluster1
>
>  p_eth0 (ocf::heartbeat:IPaddr2):   Started debian6-n2
>
>  p_alias0   (ocf::heartbeat:IPaddr2):   Started debian6-n2
>
>  p_openvpn  (lsb:openvpn):  Started debian6-n2
>
>  p_conntrackd   (lsb:conntrackd-sync):  Started debian6-n2
>
> Clone Set: pingclone [p_ping]
>
>  Started: [ debian6-n1 debian6-n2 ]
>
>
>
> Node Attributes:
>
> * Node debian6-n2:
>
> + pingd : 150   : Connectivity is
> degraded (Expected=300)
>
> * Node debian6-n1:
>
> + pingd : 300
>
>
>
> I would expect the resources to move to N1.
>
> Resource-stickiness is set to 100.
>
> 2 Pinghosts are configured – n2 can right now only reach a single pinghost.
>
>
>
> Resource  Score Node   Stickiness #Fail
> Migration-Threshold
>
> p_alias0  700   debian6-n2 1000
>
> p_alias0  -INFINITY debian6-n1 1000
>
> p_conntrackd  100   debian6-n2 1000
>
> p_conntrackd  -INFINITY debian6-n1 1000
>
> p_eth01650  debian6-n2 1000
>
> p_eth0300   debian6-n1 1000
>
> p_openvpn 300   debian6-n2 1000
>
> p_openvpn -INFINITY debian6-n1 1000
>
> p_ping:0  100   debian6-n1 1000
>
> p_ping:0  -INFINITY debian6-n2 1000
>
> p_ping:1  0 debian6-n1 1000
>
> p_ping:1  100   debian6-n2 1000
>
>
>
> Anybody see what the problem could be?
>
> To be honest I did not fully understood the deeper function of how the
> scores are calculated.
>
>
>
> Thank you.
>
>
>
> Stefan
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
>
>
> --
> esta es mi vida e me la vivo hasta que dios quiera
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.cluster

Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 10:28:32 schrieb Bauer, Stefan:
> Pardon!
> 
> node debian6-n1
> node debian6-n2
> primitive p_alias0 ocf:heartbeat:IPaddr2 \
> params ip="4.5.6.7" cidr_netmask="24" nic="eth0" \
> op start interval="0" timeout="20" \
> op stop interval="0" timeout="30" \
> op monitor interval="20"
> primitive p_conntrackd lsb:conntrackd-sync \
> op monitor interval="30s"
> primitive p_eth0 ocf:heartbeat:IPaddr2 \
> params ip="10.0.2.250" cidr_netmask="24" nic="eth0" \
> op start interval="0" timeout="20" \
> op stop interval="0" timeout="30" \
> op monitor interval="20"
> primitive p_openvpn lsb:openvpn \
> op start interval="0" timeout="20" \
> op stop interval="0" timeout="30" \
> op monitor interval="20"
> primitive p_ping ocf:pacemaker:ping \
> params host_list="7.4.5.6 127.0.0.1" multiplier="150" dampen="5s" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="20" \
> op monitor interval="20" timeout="60"
> group cluster1 p_eth0 p_alias0 p_openvpn p_conntrackd \
> meta target-role="Started"
> clone pingclone p_ping \
> meta interleave="true"
> location groupwithping cluster1 \
> rule $id="groupwithping-rule" pingd: defined pingd
> colocation cluster inf: p_eth0 p_alias0 p_openvpn p_conntrackd
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> no-quorum-policy="ignore" \
> stonith-enabled="false"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
> 
> Stefan
> 
> Von: emmanuel segura [mailto:emi2f...@gmail.com]
> Gesendet: Montag, 9. Dezember 2013 11:22
> An: The Pacemaker cluster resource manager
> Betreff: Re: [Pacemaker] Ressources not moving to node with better
> connectivity - pingd
> 
> where is your config?
> 
> 2013/12/9 Bauer, Stefan (IZLBW Extern)
> mailto:stefan.ba...@iz.bwl.de>> Hi List,
> 
> even though following well known documentations about a ping clone resource
> my resources are not moving to the node with the better connectivity:
> 
> 2 Nodes configured, 2 expected votes
> 6 Resources configured.
> 
> 
> Online: [ debian6-n2 debian6-n1 ]
> 
> Resource Group: cluster1
>  p_eth0 (ocf::heartbeat:IPaddr2):   Started debian6-n2
>  p_alias0   (ocf::heartbeat:IPaddr2):   Started debian6-n2
>  p_openvpn  (lsb:openvpn):  Started debian6-n2
>  p_conntrackd   (lsb:conntrackd-sync):  Started debian6-n2
> Clone Set: pingclone [p_ping]
>  Started: [ debian6-n1 debian6-n2 ]
> 
> Node Attributes:
> * Node debian6-n2:
> + pingd : 150   : Connectivity is
> degraded (Expected=300) * Node debian6-n1:
> + pingd : 300
> 
> I would expect the resources to move to N1.
> Resource-stickiness is set to 100.
> 2 Pinghosts are configured - n2 can right now only reach a single pinghost.
> 
> Resource  Score Node   Stickiness #Fail   
> Migration-Threshold p_alias0  700   debian6-n2 1000
> p_alias0  -INFINITY debian6-n1 1000
> p_conntrackd  100   debian6-n2 1000
> p_conntrackd  -INFINITY debian6-n1 1000
> p_eth01650  debian6-n2 1000
> p_eth0300   debian6-n1 1000
> p_openvpn 300   debian6-n2 1000
> p_openvpn -INFINITY debian6-n1 1000
> p_ping:0  100   debian6-n1 1000
> p_ping:0  -INFINITY debian6-n2 1000
> p_ping:1  0 debian6-n1 1000
> p_ping:1  100   debian6-n2 1000
> 
> Anybody see what the problem could be?
> To be honest I did not fully understood the deeper function of how the
> scores are calculated.
> 
> Thank you.
> 
> Stefan

Your resource_stickiness is too high with respect to the pingd points. Your 
resource "earns" 700 points by staying where it is and only 300 points for 
moving. Reduce your resource_stickiness to a reasonable amount.
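
For example, a minimal sketch (the value is only illustrative; the idea is that
the stickiness keeping the group in place must end up smaller than the pingd
difference, which is 150 points in your output):

crm configure rsc_defaults resource-stickiness=25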

BTW: I hope 127.0.0.1 in the config of the ping resource is only for 
obfuscation on the list and not the real configuration.


-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://ww

Re: [Pacemaker] Pacemaker very often STONITHs other node

2013-12-09 Thread Nikita Staroverov



Hello,

Still did not receive any hints from you. And you are definitely my 
only hope before I switch to Proxmox or (even worse) some commercial 
stuff.


At least, can you tell me if mode 4 could cause trouble with Corosync?

Thanks!


According to your logs, posted before, the reason was:

Nov 23 15:18:50 rivendell-B lrmd: [9526]: WARN: XEN-acsystemy01:stop 
process (PID 20760) timed out (try 1).  Killing with signal SIGTERM (15).
Nov 23 15:18:50 rivendell-B lrmd: [9526]: WARN: operation stop[115] on 
XEN-acsystemy01 for client 9529: pid 20760 timed out
Nov 23 15:18:50 rivendell-B crmd: [9529]: ERROR: process_lrm_event: LRM 
operation XEN-acsystemy01_stop_0 (115) Timed Out (timeout=24ms)


Then rivendell-A did its job:

Nov 23 15:18:45 rivendell-A crmd: [8840]: WARN: status_from_rc: Action 
117 (XEN-acsystemy01_stop_0) on rivendell-B failed (target: 0 vs. rc: 
-2): Error
Nov 23 15:18:45 rivendell-A crmd: [8840]: WARN: update_failcount: 
Updating failcount for XEN-acsystemy01 on rivendell-B after failed stop: 
rc=-2 (update=INFINITY, time=1385216325)
Nov 23 15:18:45 rivendell-A crmd: [8840]: info: abort_transition_graph: 
match_graph_event:277 - Triggered transition abort (complete=0, 
tag=lrm_rsc_op, id=XEN-acsystemy01_last_failure_0, 
magic=2:-2;117:5105:0:e3a546ba-30f9-4d69-803a-d27b

0ef626c4, cib=0.3259.139) : Event failed
Nov 23 15:18:45 rivendell-A crmd: [8840]: notice: run_graph:  
Transition 5105 (Complete=11, Pending=0, Fired=0, Skipped=28, 
Incomplete=2, Source=/var/lib/pengine/pe-input-69.bz2): Stopped
Nov 23 15:18:45 rivendell-A crmd: [8840]: notice: do_state_transition: 
State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ 
input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Nov 23 15:18:45 rivendell-A pengine: [8839]: notice: unpack_config: On 
loss of CCM Quorum: Ignore
Nov 23 15:18:45 rivendell-A pengine: [8839]: WARN: unpack_rsc_op: 
Processing failed op primitive-LVM:1_last_failure_0 on rivendell-B: not 
running (7)
Nov 23 15:18:45 rivendell-A pengine: [8839]: WARN: unpack_rsc_op: 
Processing failed op XEN-acsystemy01_last_failure_0 on rivendell-B: 
unknown exec error (-2)
Nov 23 15:18:45 rivendell-A pengine: [8839]: WARN: pe_fence_node: Node 
rivendell-B will be fenced to recover from resource failure(s)
Nov 23 15:18:45 rivendell-A pengine: [8839]: notice: 
common_apply_stickiness: clone-LVM can fail 99 more times on 
rivendell-B before being forced off
Nov 23 15:18:45 rivendell-A pengine: [8839]: notice: 
common_apply_stickiness: clone-LVM can fail 99 more times on 
rivendell-B before being forced off
Nov 23 15:18:45 rivendell-A pengine: [8839]: WARN: stage6: Scheduling 
Node rivendell-B for STONITH



So, what happens? :)
Rivendell-B tried to stop XEN-acsystemy01, but couldn't do so because the 
operation timed out. Failure of a stop operation is fatal by default and 
leads to stonith.

Rivendell-A caught this and fenced rivendell-B.
You also have some other problems, like clone-LVM not running (but 
that isn't fatal).


I think your servers are overloaded due to having one DRBD for all VMs. You 
must increase the operation timeouts or do something with the cluster 
configuration.
As for me, I use a configuration with one DRBD per virtual machine drive, 
moderate timeouts and an 802.3ad bonding configuration without problems.
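
As a minimal sketch of the first point, raising the stop timeout of the
resource that timed out (600s is only an example value):

crm configure edit XEN-acsystemy01
# in the editor, raise the stop timeout, e.g.:
#   op stop interval="0" timeout="600s"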


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Bauer, Stefan (IZLBW Extern)
Pardon!

node debian6-n1
node debian6-n2
primitive p_alias0 ocf:heartbeat:IPaddr2 \
params ip="4.5.6.7" cidr_netmask="24" nic="eth0" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="30" \
op monitor interval="20"
primitive p_conntrackd lsb:conntrackd-sync \
op monitor interval="30s"
primitive p_eth0 ocf:heartbeat:IPaddr2 \
params ip="10.0.2.250" cidr_netmask="24" nic="eth0" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="30" \
op monitor interval="20"
primitive p_openvpn lsb:openvpn \
op start interval="0" timeout="20" \
op stop interval="0" timeout="30" \
op monitor interval="20"
primitive p_ping ocf:pacemaker:ping \
params host_list="7.4.5.6 127.0.0.1" multiplier="150" dampen="5s" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="20" \
op monitor interval="20" timeout="60"
group cluster1 p_eth0 p_alias0 p_openvpn p_conntrackd \
meta target-role="Started"
clone pingclone p_ping \
meta interleave="true"
location groupwithping cluster1 \
rule $id="groupwithping-rule" pingd: defined pingd
colocation cluster inf: p_eth0 p_alias0 p_openvpn p_conntrackd
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

Stefan

Von: emmanuel segura [mailto:emi2f...@gmail.com]
Gesendet: Montag, 9. Dezember 2013 11:22
An: The Pacemaker cluster resource manager
Betreff: Re: [Pacemaker] Ressources not moving to node with better connectivity 
- pingd

where is your config?

2013/12/9 Bauer, Stefan (IZLBW Extern) 
mailto:stefan.ba...@iz.bwl.de>>
Hi List,

even though following well known documentations about a ping clone resource my 
resources are not moving to the node with the better connectivity:

2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ debian6-n2 debian6-n1 ]

Resource Group: cluster1
 p_eth0 (ocf::heartbeat:IPaddr2):   Started debian6-n2
 p_alias0   (ocf::heartbeat:IPaddr2):   Started debian6-n2
 p_openvpn  (lsb:openvpn):  Started debian6-n2
 p_conntrackd   (lsb:conntrackd-sync):  Started debian6-n2
Clone Set: pingclone [p_ping]
 Started: [ debian6-n1 debian6-n2 ]

Node Attributes:
* Node debian6-n2:
+ pingd : 150   : Connectivity is 
degraded (Expected=300)
* Node debian6-n1:
+ pingd : 300

I would expect the resources to move to N1.
Resource-stickiness is set to 100.
2 Pinghosts are configured - n2 can right now only reach a single pinghost.

Resource  Score Node   Stickiness #Fail
Migration-Threshold
p_alias0  700   debian6-n2 1000
p_alias0  -INFINITY debian6-n1 1000
p_conntrackd  100   debian6-n2 1000
p_conntrackd  -INFINITY debian6-n1 1000
p_eth01650  debian6-n2 1000
p_eth0300   debian6-n1 1000
p_openvpn 300   debian6-n2 1000
p_openvpn -INFINITY debian6-n1 1000
p_ping:0  100   debian6-n1 1000
p_ping:0  -INFINITY debian6-n2 1000
p_ping:1  0 debian6-n1 1000
p_ping:1  100   debian6-n2 1000

Anybody see what the problem could be?
To be honest I did not fully understood the deeper function of how the scores 
are calculated.

Thank you.

Stefan

___
Pacemaker mailing list: 
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



--
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread emmanuel segura
where is your config?


2013/12/9 Bauer, Stefan (IZLBW Extern) 

>  Hi List,
>
>
>
> even though following well known documentations about a ping clone
> resource my resources are not moving to the node with the better
> connectivity:
>
>
>
> 2 Nodes configured, 2 expected votes
>
> 6 Resources configured.
>
> 
>
>
>
> Online: [ debian6-n2 debian6-n1 ]
>
>
>
> Resource Group: cluster1
>
>  p_eth0 (ocf::heartbeat:IPaddr2):   Started debian6-n2
>
>  p_alias0   (ocf::heartbeat:IPaddr2):   Started debian6-n2
>
>  p_openvpn  (lsb:openvpn):  Started debian6-n2
>
>  p_conntrackd   (lsb:conntrackd-sync):  Started debian6-n2
>
> Clone Set: pingclone [p_ping]
>
>  Started: [ debian6-n1 debian6-n2 ]
>
>
>
> Node Attributes:
>
> * Node debian6-n2:
>
> + pingd : 150   : Connectivity is
> degraded (Expected=300)
>
> * Node debian6-n1:
>
> + pingd : 300
>
>
>
> I would expect the resources to move to N1.
>
> Resource-stickiness is set to 100.
>
> 2 Pinghosts are configured – n2 can right now only reach a single pinghost.
>
>
>
> Resource  Score Node   Stickiness #Fail
> Migration-Threshold
>
> p_alias0  700   debian6-n2 1000
>
> p_alias0  -INFINITY debian6-n1 1000
>
> p_conntrackd  100   debian6-n2 1000
>
> p_conntrackd  -INFINITY debian6-n1 1000
>
> p_eth01650  debian6-n2 1000
>
> p_eth0300   debian6-n1 1000
>
> p_openvpn 300   debian6-n2 1000
>
> p_openvpn -INFINITY debian6-n1 1000
>
> p_ping:0  100   debian6-n1 1000
>
> p_ping:0  -INFINITY debian6-n2 1000
>
> p_ping:1  0 debian6-n1 1000
>
> p_ping:1  100   debian6-n2 1000
>
>
>
> Anybody see what the problem could be?
>
> To be honest I did not fully understood the deeper function of how the
> scores are calculated.
>
>
>
> Thank you.
>
>
>
> Stefan
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>


-- 
esta es mi vida e me la vivo hasta que dios quiera
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Ressources not moving to node with better connectivity - pingd

2013-12-09 Thread Bauer, Stefan (IZLBW Extern)
Hi List,

even though I am following well-known documentation about a ping clone resource, 
my resources are not moving to the node with the better connectivity:

2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ debian6-n2 debian6-n1 ]

Resource Group: cluster1
 p_eth0 (ocf::heartbeat:IPaddr2):   Started debian6-n2
 p_alias0   (ocf::heartbeat:IPaddr2):   Started debian6-n2
 p_openvpn  (lsb:openvpn):  Started debian6-n2
 p_conntrackd   (lsb:conntrackd-sync):  Started debian6-n2
Clone Set: pingclone [p_ping]
 Started: [ debian6-n1 debian6-n2 ]

Node Attributes:
* Node debian6-n2:
+ pingd : 150   : Connectivity is 
degraded (Expected=300)
* Node debian6-n1:
+ pingd : 300

I would expect the resources to move to N1.
Resource-stickiness is set to 100.
2 ping hosts are configured - n2 can currently only reach a single ping host.

Resource  Score Node   Stickiness #Fail
Migration-Threshold
p_alias0  700   debian6-n2 1000
p_alias0  -INFINITY debian6-n1 1000
p_conntrackd  100   debian6-n2 1000
p_conntrackd  -INFINITY debian6-n1 1000
p_eth01650  debian6-n2 1000
p_eth0300   debian6-n1 1000
p_openvpn 300   debian6-n2 1000
p_openvpn -INFINITY debian6-n1 1000
p_ping:0  100   debian6-n1 1000
p_ping:0  -INFINITY debian6-n2 1000
p_ping:1  0 debian6-n1 1000
p_ping:1  100   debian6-n2 1000

Does anybody see what the problem could be?
To be honest, I have not fully understood how the scores 
are calculated.

Thank you.

Stefan
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] configuration of stonith

2013-12-09 Thread Michael Schwartzkopff
Am Montag, 9. Dezember 2013, 07:42:59 schrieb Masopust, Christian:
> > >> If you're using 1.1.10+,
> > >> 
> > >> pcs stonith create fence_pcmk1_ipmi fence_ipmilan \
> > >> 
> > >> pcmk_host_list="pcmk-1" ipaddr="pcmk-1.ipmi" \
> > >> action="reboot" login="admin" passwd="secret" delay=15 \
> > >> op monitor interval=60s
> > >> 
> > >> pcs stonith create fence_pcmk2_ipmi fence_ipmilan \
> > >> 
> > >> pcmk_host_list="pcmk-2" ipaddr="pcmk-2.ipmi" \
> > >> action="reboot" login="admin" passwd="secret" delay=15 \
> > >> op monitor interval=60s
> > >> 
> > >> is sufficient.
> > > 
> > > Hi,
> > > 
> > > just two questions about setting these stonith:
> > > 
> > > - shouldn't the delay's be different to avoid a stonith-battle?
> > 
> > As Emmanuel said, yes, it is needed to avoid dual-fencing in two-node
> > clusters, though the issue is not restricted to rhcs (or any HA
> > clustering that allows two nodes).
> > 
> > The node with the 'delay="15"' will have a 15 second
> > head-start, so in a
> > network partition triggered fence, the node with the delay
> > should always
> > live and the node without the delay will be immediately fenced.
> > 
> > > - when creating these stonith I see them both started on one single
> > > 
> > >   node. Don't I need some location constraints?  Such that
> > 
> > "fence_pcmk1"
> > 
> > >   only runs on pcmk2 and vice versa?
> > 
> > What version of pacemaker are you using?
> 
> Hi Digimer,
> 
> first when seeing this behaviour there was version 1.1.8. This weekend
> I've updated to 1.1.10 (latest available with CentOS 6.5) and now I see
> that fence_pcmk1 is started at pcmk1 and fence_pcmk2 at pcmk2.
> Is that correct? To my (probably wrong) understanding it should be
> vice-versa, shouldn't it?

fence_pcmk1 is the resource that can fence node pcmk1. If that resource runs 
on node pcmk1 it is useless, since node 2 has to be able to fence node 1.

Add location constraints that prevent the fencing resource that can fence node 
1 from running on node 1.
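
A minimal sketch with the resource names used earlier in this thread (the
constraint ids are placeholders; pcs users would use "constraint location ...
avoids ..." instead):

crm configure location l-fence-pcmk1 fence_pcmk1_ipmi -inf: pcmk-1
crm configure location l-fence-pcmk2 fence_pcmk2_ipmi -inf: pcmk-2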

-- 
Mit freundlichen Grüßen,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Sitz der Gesellschaft: München, Amtsgericht München: HRB 199263
Vorstand: Patrick Ben Koetter, Axel von der Ohe, Marc Schiffbauer
Aufsichtsratsvorsitzender: Florian Kirstein

signature.asc
Description: This is a digitally signed message part.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] configuration of stonith

2013-12-09 Thread Dvorak Andreas
Dear all,

thank you for the answers.

Now I created two stonith resources 
pcs stonith create ipmi-fencing-sv2837 fence_ipmilan pcmk_host_list="sv2837" 
ipaddr=10.110.28.37 action="reboot" login=abc passwd=abc123 delay=15 op monitor 
interval=60s
pcs stonith create ipmi-fencing-sv2836 fence_ipmilan pcmk_host_list="sv2836" 
ipaddr=10.110.28.36 action="reboot" login=abc passwd=abc123 delay=15 op monitor 
interval=60s

the current status is:
pcs status
Cluster name: fix-prod
Last updated: Mon Dec  9 09:41:48 2013
Last change: Mon Dec  9 09:40:03 2013 via cibadmin on sv2836
Stack: cman
Current DC: sv2837 - partition with quorum
Version: 1.1.10-1.el6_4.4-368c726
2 Nodes configured
4 Resources configured

Online: [ sv2836 sv2837 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started sv2836 
 FIXRoute   (ocf::baader:FIXRoute): Started sv2836 
 ipmi-fencing-sv2837(stonith:fence_ipmilan):Stopped 
 ipmi-fencing-sv2836(stonith:fence_ipmilan):Stopped 

Failed actions:
ipmi-fencing-sv2837_start_0 on sv2837 'unknown error' (1): call=276, 
status=Error, last-rc-change='Mon Dec  9 09:39:55 2013', queued=17090ms, 
exec=0ms
ipmi-fencing-sv2836_start_0 on sv2837 'unknown error' (1): call=286, 
status=Error, last-rc-change='Mon Dec  9 09:40:13 2013', queued=17085ms, 
exec=0ms
ipmi-fencing-sv2837_start_0 on sv2836 'unknown error' (1): call=369, 
status=Error, last-rc-change='Mon Dec  9 09:40:13 2013', queued=17085ms, 
exec=0ms
ipmi-fencing-sv2836_start_0 on sv2836 'unknown error' (1): call=375, 
status=Error, last-rc-change='Mon Dec  9 09:40:31 2013', queued=17090ms, 
exec=0ms

Do I need to tell the stonith resource where to run and how can I do that?
In the parameter pcmk_host_list I have the hostname of the other node.

Best regards,
Andreas

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] pcs ping connectivity rule

2013-12-09 Thread Bauer, Stefan (IZLBW Extern)
May I ask what your configuration snippet looks like? 

Thank you

Stefan

-Ursprüngliche Nachricht-
Von: Martin Ševčík [mailto:sev...@esys.cz] 
Gesendet: Freitag, 6. Dezember 2013 12:26
An: pacemaker@oss.clusterlabs.org
Betreff: Re: [Pacemaker] pcs ping connectivity rule

I installed crmsh and configured it via crm commands.

best regards,
m.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again

2013-12-09 Thread Jan Friesse
Brian J. Murrell (brian) napsal(a):
> I seem to have another instance where pacemaker fails to exit at the end
> of a shutdown.  Here's the log from the start of the "service pacemaker
> stop":
> 
> Dec  3 13:00:39 wtm-60vm8 crmd[14076]:   notice: do_state_transition: State 
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
> cause=C_IPC_MESSAGE origin=handle_response ]
> Dec  3 13:00:39 wtm-60vm8 crmd[14076]: info: do_te_invoke: Processing 
> graph 19 (ref=pe_calc-dc-1386093636-83) derived from 
> /var/lib/pengine/pe-input-40.bz2

...

> Dec  3 13:05:08 wtm-60vm8 pacemakerd[14067]:error: send_cpg_message: 
> Sending message via cpg FAILED: (rc=6) Try again
> Dec  3 13:05:08 wtm-60vm8 pacemakerd[14067]:   notice: pcmk_shutdown_worker: 
> Shutdown complete
> Dec  3 13:05:08 wtm-60vm8 pacemakerd[14067]: info: main: Exiting 
> pacemakerd
> 
> These types of shutdown failure issues seem to always end up with the series 
> of:
> 
> error: send_cpg_message: Sending message via cpg FAILED: (rc=6) Try again
> 
> Even though the above messages seem to indicate that pacemaker did
> finally exit it did not as can be seen looking at the process table:
> 
> 14032 ?Ssl0:01 corosync
> 14067 ?S  0:00 pacemakerd
> 14071 ?Ss 0:00  \_ /usr/libexec/pacemaker/cib
> 
> So what does this "sending message via cpg FAILED: (rc=6)" mean exactly?
> 

Error 6 means "try again". This happens either if corosync is
overloaded or if it is creating a new membership. Please take a look at
/var/log/cluster/corosync.log to see if there is something strange there (+ make
sure you have the newest corosync).
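
For example, a quick way to spot membership churn or overload around the
shutdown time (a sketch; adjust the log path to your logging configuration):

grep -i -e "membership" -e "retransmit" /var/log/cluster/corosync.log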

Regards,
  Honza

> Or any other ideas what happened to this shutdown to cause it to fail/hang 
> ultimately?
> 
> Cheers,
> b.
> 
> 
> 
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org