Re: [Pacemaker] Clustermon issue

2014-12-19 Thread Florian Crouzat

On 18/12/2014 16:21, Marco Querci wrote:

Hi all,
I have a pacemaker + corosync cluster installed on CentOS 6.5.
I have a ClusterMon resource with the external_agent param set up.
Before the last pacemaker update, event notifications to the external
script worked perfectly.
After the pacemaker update to version 1.1.11, the ClusterMon resource
continues to work but stops notifying the external agent.
I followed setup instructions found on the internet, but I can't figure
out why it doesn't work as expected.

Any help will be appreciated.
Many thanks.



Hello, please paste your full configuration here so we can understand
how you use the ClusterMon stuff.
Remember that on RHEL 6.x, SNMP support is not built in; but that's
probably why you use an external_agent. I just need to make sure by
reading your configuration.
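For reference, such a setup usually boils down to something like this in crm syntax (the agent path here is just an example, not necessarily your actual script):

primitive p_mon ocf:pacemaker:ClusterMon \
    params user="root" update="30" \
    extra_options="-E /usr/local/bin/crm_notify.sh" \
    op monitor interval="60s" timeout="20s"
clone cl_mon p_mon

Behind the scenes the RA just runs crm_mon with those extra options, so if notifications stopped after an update, running "crm_mon -E /usr/local/bin/crm_notify.sh" by hand is a quick way to see whether the regression is in crm_mon itself.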


Cheers,
Florian Crouzat

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to get external-agent to work?

2014-03-27 Thread Florian Crouzat
Le 26/03/2014 18:55, John Wei a écrit :
> I am trying to receive notification through an external program on
> CentOS 6.5 with pacemaker. I have tried crm_mon -E and ClusterMon RA,
> none of them seem to work.
> Does anyone have insight into what I might have done wrong?
> 

Maybe get information from this old howto I wrote a couple of months ago:
http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/

or provide us with more info, logs, configurations, etc.
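For testing, a minimal external agent can be as simple as a script that logs the CRM_notify_* environment variables crm_mon exports (variable names per the documentation; the path is arbitrary):

#!/bin/sh
# /usr/local/bin/crm_notify.sh -- log every cluster event we receive
logger -t crm_notify "node=${CRM_notify_node} rsc=${CRM_notify_rsc} task=${CRM_notify_task} desc=${CRM_notify_desc} rc=${CRM_notify_rc}"

If nothing shows up in syslog while events occur, the agent is never being invoked, and the problem is on the crm_mon/ClusterMon side rather than in your script.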

-- 
Cheers,
Florian Crouzat



Re: [Pacemaker] question on resource colocation

2013-12-18 Thread Florian Crouzat

On 18/12/2013 17:20, Brusq, Jerome wrote:

Dear all,

It is maybe a stupid question… Sorry I’m a new user of pacemaker.


Not at all.


I have 2 nodes and 5 resources (resource1, resource2, resource3,
resource4 and resource5). The 5 resources have to work together on the
same node.

I tried the group feature and colocation... but I always have the same
issue:

When node2 is powered off, node1 is active and all resources are up. If
service3 fails, pacemaker stops resource4 and resource5 as well (I have
read this is the normal behavior)...


This is because a group is a syntactic shortcut for colocation + ordering.
The important word here being "ordering". If you stop 3, then 4 and 5
will stop as they *require* 3 to be up.



Is there a way for pacemaker to restart only service3 ?


I think you would have to remove your group and create all the
colocation constraints with no ordering constraints. I'm not sure it's
possible to create a group of unordered resources.
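An untested sketch of that idea, using the resource names from the question (crm syntax):

colocation coloc2 inf: resource2 resource1
colocation coloc3 inf: resource3 resource1
colocation coloc4 inf: resource4 resource1
colocation coloc5 inf: resource5 resource1

With no order constraints, a failure of resource3 should only restart resource3; the others merely have to run on the same node as resource1.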




--
Cheers,
Florian Crouzat



Re: [Pacemaker] does adding a second ring actually work with cman?

2013-12-17 Thread Florian Crouzat

On 16/12/2013 17:28, Brian J. Murrell wrote:


So is there something I am misunderstanding or is this not actually
working?  The node I was trying this on is EL6.4.


Redundant ring protocol in CMAN works fine with RHEL >= 6.4 (I only
use it in active mode, though).


Is it possible that lotus-5vm8 (from DNS) and lotus-5vm8-ring1 (from
/etc/hosts) resolve to the same IP (10.128.0.206), which could maybe
confuse cman and make it decide that there is only one ring?


In my case, to be extra safe, and because I want everything to keep
working without the network (in case of admin interventions, failures,
whatever), I always put both entries in /etc/hosts, eg:


192.168.10.1 node01 node01.example.com
192.168.150.1 node01_alt node01_alt.example.com

--
Cheers,
Florian Crouzat



Re: [Pacemaker] howto group resources without having an order

2013-11-26 Thread Florian Crouzat

On 26/11/2013 10:49, Bauer, Stefan (IZLBW Extern) wrote:

The thing is, that resource sets are not configurable without editing the xml 
directly. True?


Not true at all.


"crm configure" only knows group, order and colocation.


Nope, eg (from memory; check the syntax, but it does exist):

order foo inf: ( A B ) C D ( E F G )
colocation bar inf: ( E F G ) D C ( B A )


I just don't want to mess with the raw xml files if there are other options.


You should never edit the XML directly, indeed.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] howto group resources without having an order

2013-11-26 Thread Florian Crouzat

On 26/11/2013 10:19, Bauer, Stefan (IZLBW Extern) wrote:

Hi,

thank you for your input. Unfortunately I want to go another path if
possible, so as not to have to change more parts of my configuration:


So basically you want to fix your non-working configuration without
changing your (non-working) configuration? Right, that seems reasonable.




I have setup so far:

group cluster1 p_eth0 p_conntrackd
location groupwithping cluster1 \
rule id="groupwithping-rule" pingd: defined pingd
colocation cluster inf: p_eth0 p_conntrackd

Now I cannot simply add p_openvpn1 + p_openvpn2 to the above colocation,
because then the ordering becomes active. Why does colocation even care
about the order of the resources?!



No you cannot.


If I change it to:

colocation cluster inf: p_eth0 p_conntrackd p_openvpn1 p_openvpn2
then I cannot start openvpn2 without having openvpn1 up.
This is not what I want.


What you want, I already told you.



Thank you.

Stefan


--
Cheers,
Florian Crouzat



Re: [Pacemaker] howto group resources without having an order

2013-11-26 Thread Florian Crouzat

On 26/11/2013 07:40, Bauer, Stefan (IZLBW Extern) wrote:

Dear Developers & Users,

I have 4 resources: p_eth0 p_conntrackd p_openvpn1 p_openvpn2

Right now, I use group and colocation to let p_eth0 and p_conntrackd
start in the right order (first eth0, then conntrackd).

I now want to also include p_openvpn1 + 2, but without having them in
any order. Meaning: running on the same cluster node, but independent
from each other.

I want openvpn2 not to depend on openvpn1 to start (that's the default
behavior, iirc, without groups/orders).

Any help is greatly appreciated.

Best regards

Stefan


Use resource sets (both for ordering [1] and collocation [2]),
and play with the values of the "sequential=" and "require-all=" parameters.

[1] - 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-ordering.html
[2] - 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-collocation.html
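As a sketch for this particular case (check the exact syntax against your crmsh version), a colocation set where the two VPNs are mutually independent could look like:

colocation c_all inf: ( p_openvpn1 p_openvpn2 ) p_conntrackd p_eth0

Resources inside the parentheses form a set with sequential="false", i.e. they are kept with the others but do not depend on each other.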







--
Cheers,
Florian Crouzat



Re: [Pacemaker] configuration files of cluster

2013-11-18 Thread Florian Crouzat

On 18/11/2013 15:58, Dvorak Andreas wrote:

Dear all

Now I have finished the configuration of my first pacemaker cluster and
would like to back up the configuration files.

But I do not find any.


The configuration and cluster state are shared across the cluster nodes
and physically stored in an XML file called the CIB (Cluster Information
Base).
At any given time, the CIB stores the exact state of the cluster (with
scores, roles, and such).




Can you please help me with what needs to be put in a backup?

With "pcs config" I get a nice output, but I would like to have that in
one or more files.


Assuming you just want to save your starting configuration and not the
state of the cluster or anything complex, saving the output of "pcs
config" is alright and can easily be restored. But you could also
create "shadow CIBs" and restart from them (~ save points).


In any case, I'd say, do not touch the XML files.
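Concretely, the options above look something like this (file names are arbitrary):

pcs config > /root/cluster-config.txt      # human-readable dump
cibadmin --query > /root/cib-backup.xml    # raw CIB in XML
crm_shadow --create mysavepoint            # shadow CIB as a save point

Note that the cibadmin dump includes the status section too, so for a pure configuration backup the "pcs config" output is usually what you want.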



My cluster installation

Pacemaker 1.1.8

cman 3.0.12.1

pcs 0.9.26

ccs 0.16.2

resource-agents 3.9.2

corosync 1.4.1

Best regards

Andreas Dvorak







--
Cheers,
Florian Crouzat



Re: [Pacemaker] running scripts in a resource

2013-11-15 Thread Florian Crouzat

On 15/11/2013 13:32, Dvorak Andreas wrote:

Dear all,

I would like to create a resource that sets up routes on a node.


I'm quite sure there are dedicated resource agents to manage routes,
*but* if you want to script things yourself, then create an LSB resource
(eg: lsb:my_routes) which adds routes in the start() section, checks for
them in the status() section and removes them in the stop() section.
Make sure your return codes are LSB compliant:
clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html
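A bare-bones sketch of such a script (the route itself is a placeholder; exit codes follow the LSB convention of 0 = running, 3 = stopped):

#!/bin/sh
# /etc/init.d/my_routes -- LSB-style wrapper around a static route
ROUTE="192.0.2.0/24 via 198.51.100.1"
case "$1" in
    start)  ip route add $ROUTE 2>/dev/null || true; exit 0 ;;
    stop)   ip route del $ROUTE 2>/dev/null || true; exit 0 ;;
    status) ip route show | grep -q "^192.0.2.0/24" && exit 0 || exit 3 ;;
    *)      echo "Usage: $0 {start|stop|status}"; exit 2 ;;
esac

Your three existing scripts map naturally onto the start/status/stop sections.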



I have got three scripts

check_routes.sh

delete_routes.sh

setup_routes.sh

but unfortunately I do not find documentation how to do that. Can
somebody please help me?




--
Cheers,
Florian Crouzat



Re: [Pacemaker] Offline Cluster edit

2013-10-15 Thread Florian Crouzat

On 15/10/2013 09:39, Robert Lindgren wrote:

I have a cluster that is offline, and I can't start it to do edits
(since IPs and such will conflict with the old cluster). What is the
preferred way of doing the edits (changing IPs) so that I can start the
cluster?


Can't you start just one of the nodes, unplugged from the network, with
the wrong configuration; edit and update the configuration with
pcs/crmsh; replug the network once it's all okay; then have the other
nodes rejoin the cluster (with the old conf)? They will automatically
update to the new conf as they see theirs is outdated, and there are no
conflicts in the entire process.
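In EL6 terms, that could look roughly like this (the resource name and IP are hypothetical):

# on one node only, unplugged from the network
service cman start && service pacemaker start
pcs resource update my_vip ip=10.0.0.50
# replug the network; the other nodes rejoin and pull the newer CIB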


--
Cheers,
Florian Crouzat



Re: [Pacemaker] create 2-node Active/Passive firewall cluster

2013-09-19 Thread Florian Crouzat

Le 19/09/2013 11:43, David Lang a écrit :

On Thu, 19 Sep 2013, Florian Crouzat wrote:


On 18/09/2013 20:34, Jeff Weber wrote:

I am looking to create a  2-node Active/Passive firewall cluster.  I am
an experienced Linux user, but new to HA clusters. I have scanned
"Clusters From Scratch" and "Pacemaker Explained".  I found these docs
helpful, but a bit overwhelming, being new to HA clusters.

My goals:
* create 2-node Active/Passive firewall cluster
* Each FW node has an external, and internal interface
* Cluster software presents external, internal VIPs
* VIPs must be co-located on same node
* One node is preferred for VIP locations
* If any interface fails on node currently hosting VIPs, VIPs move to
other node

For simplicity sake, I'll start by creating VIPs, and add firewall
plumbing to the VIPs in the future.

My config:
CentOS-6.3 based distro +
corosync-1.4.1-1
pacemaker-1.1.8-1
pcs-0.9.26-1
resource-agents-3.9.2-12
and all required dependencies

My questions:

This sounds like a common use case, but I could not find an
example/HOWTO.  Did I miss it?

Do I have the correct HA cluster packages, versions to start work?
Do I also need the cman?, ccs packages?

How many interfaces should each cluster node have?
 2 interfaces: internal, external
 or
 3 interfaces: internal, external, monitor

Do I need to configure corosync.conf/totem/interface/bindnetaddr, and if
so, bind to what net?

$1M question:
How to configure cluster to monitor all internal, external cluster
interfaces, and perform
failover?  Here's my estimate:

* create external VIP as IpAddr2 and bind to external interfaces
* create internal VIP as IpAddr2 and bind to internal interfaces
* co-locate both VIPs together
* specify a location constraint for preferred node

Any help would be appreciated,
thanks
Jeff



I have several two-node firewall clusters running pacemaker+cman
(since EL6.4) and they work perfectly. My setup is as follows:

Both nodes boot in a "passive" firewall state (via chkconfig). In this
state, only corosync traffic is allowed between nodes (plus admin access
on non-VIP IPs). From that state, they both start cman+pacemaker and,
via a location preference + 3 ping nodes, the node with the best score
starts the resources.
Resources are a group of 30+ IPaddr2, iptables and custom daemons such
as bind, postfix, ldirectord, etc. All resources are collocated and
ordered so they all run on the same node and start in the correct order
(first I get the VIPs, then I start the firewall, then I bind the
daemons, etc.)

VIPs are not really monitored, as pacemaker doesn't really do that: it
just checks that the IP is present with some sort of "sudo ip addr ls |
fgrep ..."; if you unplug the network cable, it won't see it. That's
where you define your ping nodes wisely, so that you can monitor the
connectivity of certain subnets/gateways from all nodes and decide which
node is the best connected in case of incident.

If you like, I can paste configuration files (cluster.conf + CIB)


I've been running active/failover firewall clusters with heartbeat since
about 2000, and I have one suggestion to make. If you can leave all
the daemons running all the time, the failover process is far more
robust (and faster, since you don't have daemons to start). If you set
net.ipv4.ip_nonlocal_bind you can even have the daemons start up bound
to VIP addresses that don't yet exist.

If you do not have to have the daemons bound to the VIP, the fact that
they are always running on the backup box gives you a quick way to check
if a failover would solve the problem or not by having a client connect
directly to the second box. The drawback is that someone may configure
something to point directly at a box and not at a VIP and you won't
detect it (without log analysis) until the box they point at actually
goes down.

David Lang


I never thought about that; it seems like it could be interesting,
especially with slow-(start|stop)ing daemons such as squid.


In my case, my daemons would be protected by the "passive firewall
state" my nodes are in when they don't host resources.


Thanks for bringing this up.
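For reference, the sysctl David mentions is set like this:

# let daemons bind() to VIPs that are not present on this node yet
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
sysctl -p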

--
Cheers,
Florian Crouzat



Re: [Pacemaker] very slow pacemaker/corosync shutdown

2013-09-19 Thread Florian Crouzat

On 19/09/2013 00:25, David Lang wrote:

I'm frequently running into a problem that shutting down
pacemaker/corosync takes a very long time (several minutes)


Just to be 100% sure: do you always respect the stop order, Pacemaker
*then* CMAN/corosync?
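On an EL6 cman-based stack, that means:

service pacemaker stop   # first, so resources are stopped/migrated cleanly
service cman stop        # then the membership layer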


--
Cheers,
Florian Crouzat



Re: [Pacemaker] create 2-node Active/Passive firewall cluster

2013-09-19 Thread Florian Crouzat

On 18/09/2013 20:34, Jeff Weber wrote:

I am looking to create a  2-node Active/Passive firewall cluster.  I am
an experienced Linux user, but new to HA clusters. I have scanned
"Clusters From Scratch" and "Pacemaker Explained".  I found these docs
helpful, but a bit overwhelming, being new to HA clusters.

My goals:
* create 2-node Active/Passive firewall cluster
* Each FW node has an external, and internal interface
* Cluster software presents external, internal VIPs
* VIPs must be co-located on same node
* One node is preferred for VIP locations
* If any interface fails on node currently hosting VIPs, VIPs move to
other node

For simplicity sake, I'll start by creating VIPs, and add firewall
plumbing to the VIPs in the future.

My config:
CentOS-6.3 based distro +
corosync-1.4.1-1
pacemaker-1.1.8-1
pcs-0.9.26-1
resource-agents-3.9.2-12
and all required dependencies

My questions:

This sounds like a common use case, but I could not find an
example/HOWTO.  Did I miss it?

Do I have the correct HA cluster packages, versions to start work?
Do I also need the cman?, ccs packages?

How many interfaces should each cluster node have?
 2 interfaces: internal, external
 or
 3 interfaces: internal, external, monitor

Do I need to configure corosync.conf/totem/interface/bindnetaddr, and if
so, bind to what net?

$1M question:
How to configure cluster to monitor all internal, external cluster
interfaces, and perform
failover?  Here's my estimate:

* create external VIP as IpAddr2 and bind to external interfaces
* create internal VIP as IpAddr2 and bind to internal interfaces
* co-locate both VIPs together
* specify a location constraint for preferred node

Any help would be appreciated,
thanks
Jeff



I have several two-node firewall clusters running pacemaker+cman (since
EL6.4) and they work perfectly. My setup is as follows:


Both nodes boot in a "passive" firewall state (via chkconfig). In this
state, only corosync traffic is allowed between nodes (plus admin access
on non-VIP IPs). From that state, they both start cman+pacemaker and,
via a location preference + 3 ping nodes, the node with the best score
starts the resources.
Resources are a group of 30+ IPaddr2, iptables and custom daemons such
as bind, postfix, ldirectord, etc. All resources are collocated and
ordered so they all run on the same node and start in the correct order
(first I get the VIPs, then I start the firewall, then I bind the
daemons, etc.)


VIPs are not really monitored, as pacemaker doesn't really do that: it
just checks that the IP is present with some sort of "sudo ip addr ls |
fgrep ..."; if you unplug the network cable, it won't see it. That's
where you define your ping nodes wisely, so that you can monitor the
connectivity of certain subnets/gateways from all nodes and decide which
node is the best connected in case of incident.


If you like, I can paste configuration files (cluster.conf + CIB)
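To illustrate the ping-node part of the setup (all IPs and names here are placeholders, not my real configuration):

primitive p_ping ocf:pacemaker:ping \
    params host_list="192.0.2.1 192.0.2.2 192.0.2.3" multiplier="1000" \
    op monitor interval="15s"
clone cl_ping p_ping
location loc_best_connected grp_firewall \
    rule pingd: defined pingd

The node that reaches the most ping nodes gets the highest score and therefore hosts the resource group.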

Cheers

--
Cheers,
Florian Crouzat



Re: [Pacemaker] CMAN nodes online

2013-09-16 Thread Florian Crouzat

On 16/09/2013 17:30, Gopalakrishnan N wrote:

Hey Guys,

The OS I am running is CentOS 6.4 (64bit) and I have disabled iptables
and SELinux.

My goal is to make Apache Tomcat HA. As a first step I thought of
testing with Apache.

My network setup is like this:
Node1 is connected to the switch.
Node2 is connected to the switch.

My cluster.conf file is as follows,
[root@test01 ~]# cat /etc/cluster/cluster.conf

[cluster.conf XML omitted: the element markup was stripped by the list archive]



And at some point I am able to see both nodes registered,
[root@test01 ~]# cman_tool nodes
Node  Sts   Inc   Joined   Name
1   M104   2013-09-17 02:01:11  test01
2   M108   2013-09-17 02:15:07  test02

And sometimes with crm_mon -1, I get the following:

[root@test01 ~]# crm_mon -1
Last updated: Tue Sep 17 02:47:27 2013
Last change: Tue Sep 17 00:42:43 2013 via crmd on test01
Stack: cman
Current DC: NONE
4 Nodes configured, 2 expected votes
0 Resources configured.


Node test01: UNCLEAN (offline)
Node test01.iopextech.com: UNCLEAN (offline)
Node test02: UNCLEAN (offline)
Node test02.iopextech.com: UNCLEAN (offline)

Thanks.



Ok, if you are still unhappy with CMAN, check your switch for multicast.

From here on I assume you are happy with cman; otherwise there would be
no reason to start pacemaker, but you did, so you get my point ;)


This kind of error often indicates that your /etc/hosts or DNS is wrong
and pacemaker has a hard time mapping hostnames to IPs, etc. (not sure
about the internals there...)
Get rid of the invalid nodes by editing the configuration; stop
pacemaker; fix your name resolution and retry.


In any case, I usually use FQDNs in my cluster.conf file, and I add them
and the corresponding ring IPs to my /etc/hosts file on all nodes. I
also use the same FQDNs in my pacemaker configuration.
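Concretely, something like this on every node (the IPs are invented for illustration):

# /etc/hosts
10.0.0.1  test01.iopextech.com test01
10.0.0.2  test02.iopextech.com test02

with the same FQDNs used as clusternode names in cluster.conf and as node names in the pacemaker configuration.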


Cheers







Re: [Pacemaker] CMAN nodes online

2013-09-16 Thread Florian Crouzat

On 16/09/2013 14:18, Gopalakrishnan N wrote:

Do I need to have a crossover cable between each node? Is it mandatory?


No, it isn't mandatory.
In your case, I'd check the network architecture and/or firewalling
regarding multicast. You probably either have wrong iptables rules
and/or a switch dropping your multicast corosync ring(s).


Also, please, as Andreas said: try to communicate with us in a more
efficient way: more context, more information and more output (pasted
somewhere).


We are happy to help people, but we shouldn't have to waste our time
trying to understand what people don't tell us because they are lazy.


ps: also use 'corosync-objctl' ; it's a good command to debug rings and 
configurations.
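e.g., dumping the runtime object database and filtering for the interesting bits:

corosync-objctl | grep -i totem     # ring/interface settings in effect
corosync-objctl | grep -i member    # members corosync actually sees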


Cheers,
Florian.




On Mon, Sep 16, 2013 at 8:01 PM, Gopalakrishnan N wrote:

Again, when I restarted pacemaker and cman, the nodes are
not online; back to square one.

node1 shows only node1 online, and node2 says node2 online. I don't
know what's happening in the background...

Any advice would be appreciated..

Thanks.


On Mon, Sep 16, 2013 at 6:47 PM, Gopalakrishnan N wrote:

Hi guys,

I got it; basically it took some time to propagate and now two
nodes are showing online...
Thanks.


On Mon, Sep 16, 2013 at 6:39 PM, Gopalakrishnan N wrote:

I have configured CMAN as per the link

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#_configuring_cman
but when I type "cman_tool nodes" only one node is online, even
though the cluster.conf is propagated to the other node as well.

What could be the reason? On node1, "cman_tool nodes" shows
only node1 online; on node2 it shows only node2 online.
How can I make both nodes show as online, even though the CMAN service
is running on both nodes?

Thanks in advance.

Regards,
Gopal










--
Cheers,
Florian Crouzat



Re: [Pacemaker] Pacemaker with Apache Tomcat

2013-09-11 Thread Florian Crouzat

On 09/09/2013 17:34, Gopalakrishnan N wrote:

Hi,

Any tutorial to install pacemaker with Apache Tomcat...

Regards,
Gopal




Yes: first set up a pacemaker cluster by following the guides (Clusters
from Scratch) at http://clusterlabs.org/doc/

Then ask any questions you may have about managing a tomcat resource.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] Does ocf:heartbeat:IPaddr2 support binding the virtual IP on a bond interface?

2013-08-29 Thread Florian Crouzat

On 28/08/2013 19:18, Xiaomin Zhang wrote:

Actually I don't know how to specify the bond interface to assign this
virtual IP.



$ sudo crm ra meta IPaddr2

Search for "nic" and make sure the underlying interface is up, as
pacemaker doesn't do "ifup" but creates aliases on already-configured
interfaces (cf. the prerequisites in the "nic" section).
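i.e., something along these lines (the address is a placeholder):

primitive vip_bond ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.10" cidr_netmask="24" nic="bond0" \
    op monitor interval="30s"

with bond0 itself configured and brought up by the OS, outside of pacemaker.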


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Using "avoids" location constraint

2013-07-08 Thread Florian Crouzat

On 08/07/2013 09:49, Andrew Morgan wrote:

I'm attempting to implement a 3 node cluster where only 2 nodes are
there to actually run the services and the 3rd is there to form a quorum
(so that the cluster stays up when one of the 2 'workload' nodes fails).

To this end, I added a location "avoids" constraint so that the services
(including drbd) don't get placed on the 3rd node (drbd3)...

pcs constraint location ms_drbd avoids drbd3.localdomain

the problem is that this constraint doesn't appear to be enforced and I
see failed actions where Pacemaker has attempted to start the services
on drbd3. In most cases I can just ignore the error but if I attempt to
migrate the services using "pcs move" then it causes a fatal startup
loop for drbd. If I migrate by adding an extra location constraint
preferring the other workload node then I can migrate ok.

I'm using Oracle Linux 6.4; drbd83-utils 8.3.11; corosync 1.4.1; cman
3.0.12.1; Pacemaker 1.1.8 & pcs 1.1.8



I'm no quorum-node expert, but I believe your initial design isn't optimal.
You could probably even run with only the two real nodes and
no-quorum-policy=ignore + fencing (for data integrity) [1].

This is what most (all?) people with two-node clusters do.

But if you really believe you need to stay quorate, then I think you
need to define your third node as a quorum node in corosync/cman (I'm
not sure how since EL6.4 and CMAN, and I cannot find a valid link).
IIRC, with such a definition you won't need the location constraints.



[1] 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_perform_a_failover.html#_quorum_and_two_node_clusters
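For the two-real-nodes approach, that boils down to:

pcs property set no-quorum-policy=ignore
pcs property set stonith-enabled=true    # fencing does the real protecting

assuming a working stonith device is configured, of course.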




--
Cheers,
Florian Crouzat



Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-07-01 Thread Florian Crouzat

On 29/06/2013 01:22, Andrew Beekhof wrote:


On 29/06/2013, at 12:22 AM, Digimer  wrote:


On 06/28/2013 06:21 AM, Andrew Beekhof wrote:


On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree  wrote:


On 2013-06-27T12:53:01, Digimer  wrote:


primitive fence_n01_psu1_off stonith:fence_apc_snmp \
   params ipaddr="an-p01" pcmk_reboot_action="off" port="1"
pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu1_on stonith:fence_apc_snmp \
   params ipaddr="an-p01" pcmk_reboot_action="on" port="1"
pcmk_host_list="an-c03n01.alteeve.ca"


So every device twice, including location constraints? I see potential
for optimization by improving how the fence code handles this ... That's
abhorrently complex. (And I'm not sure the 'action' parameter ought to
be overwritten.)


I'm not crazy about it either because it means the device is tied to a specific 
command.
But it seems to be something all the RHCS people try to do...


Maybe something in the rhcs water cooler made us all mad... ;)


Glad you got it working, though.


location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca

[...]

I'm not sure you need any of these location constraints, by the way. Did
you test if it works without them?


Again, this is after just one test. I will want to test it several more
times before I consider it reliable. Ideally, I would love to hear
Andrew or others confirm this looks sane/correct.


It looks correct, but not quite sane. ;-) That seems not to be
something you can address, though. I'm thinking that fencing topology
should be smart enough to, if multiple fencing devices are specified, to
know how to expand them to "first all off (if off fails anywhere, it's a
failure), then all on (if on fails, it is not a failure)". That'd
greatly simplify the syntax.


The RH agents have apparently already been updated to support multiple ports.
I'm really not keen on having the stonith-ng doing this.


This doesn't help people who have dual power rails/PDUs for power
redundancy.


I'm yet to be convinced that having two PDUs is helping those people in the 
first place.
If it were actually useful, I suspect more than two/three people would have 
asked for it in the last decade.


Well, it's probably because many people are still toying around with 
pacemaker and I assume that not many advanced RHCS users have yet tried 
to translate their current RHCS cluster to pacemaker. Digimer and I did, 
and we both failed to reproduce the equivalent configuration we had 
in our RHCS setup.


I suspect more and more people will hit this issue sooner or later.

Anyway, whatever will follow in terms of configuration primitives or API, 
thanks to Digimer's tests we now have something (even if inelegant) working :)
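
For readers landing here, the working (if inelegant) pattern condenses to 
per-action stonith primitives chained in a fencing topology, roughly as 
below. This is a sketch based on Digimer's config quoted above, with 
fence_n01_ipmi and the PSU2 primitives assumed to be defined analogously:

```
# one stonith primitive per PDU port and per action (off, then on)
primitive fence_n01_psu1_off stonith:fence_apc_snmp \
    params ipaddr="an-p01" pcmk_reboot_action="off" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
primitive fence_n01_psu1_on stonith:fence_apc_snmp \
    params ipaddr="an-p01" pcmk_reboot_action="on" port="1" \
        pcmk_host_list="an-c03n01.alteeve.ca"
# level 1: IPMI; level 2: both PSUs off, then both back on
fencing_topology an-c03n01.alteeve.ca: fence_n01_ipmi \
    fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on
```

Within a topology level the comma-separated devices are tried together, 
which is what gives the "all off, then all on" behaviour discussed above.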



--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm ptest does not show graphics

2013-06-20 Thread Florian Crouzat

Le 20/06/2013 14:42, Michael Schwartzkopff a écrit :

Am Donnerstag, 20. Juni 2013, 13:43:10 schrieb Florian Crouzat:



My original question was about crm.


It was about an old tool that doesn't work in your environment.

I was just trying to guide you in the right direction with a working, 
updated tool instead of fixing something prehistoric.

Now if you are not happy with my answers, then good luck.

Nothing to do here anymore.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm ptest does not show graphics

2013-06-20 Thread Florian Crouzat

Le 20/06/2013 13:06, Michael Schwartzkopff a écrit :

Am Donnerstag, 20. Juni 2013, 11:03:53 schrieb Florian Crouzat:
 > I usually use "crm_simulate -S -L -VVV".

 From inside the crm subshell???


Nope.

$ type -a crm_simulate
crm_simulate is /usr/sbin/crm_simulate

$ sudo yum whatprovides $(type -a crm_simulate)

[...]

pacemaker-cli-1.1.8-7.el6.x86_64 : Command line tools for controlling 
Pacemaker clusters

Repo: installed

--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm ptest does not show graphics

2013-06-20 Thread Florian Crouzat

Le 19/06/2013 10:31, Michael Schwartzkopff a écrit :

Hi,

When I enter:

# crm configure

(... do some changes ...)

and enter

# crm(live)# ptest



I believe ptest is not recommended anymore and might even be deprecated 
and replaced by "crm_simulate".

I usually use "crm_simulate -S -L -VVV".

Actually, ptest is documented under the 1.0 version[1] of the doc while 
crm_simulate is documented under the 1.1 version[2] which is the most 
up-to-date one.



[1] - 
http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-config-testing-changes.html
[2] - 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-config-testing-changes.html



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Can I use Pacemaker release 1.1.8 for production clusters?

2013-06-10 Thread Florian Crouzat

Le 10/06/2013 16:46, Michael Furman a écrit :

Hi all!

According to the Wiki http://clusterlabs.org/wiki/Releases



This page has not been updated since 10:49, 11 February 2011



Even numbered release series (eg. 0.6.x, 1.0.x) are recommended for
production clusters.

I need to install Pacemaker on Centos 6 machines.
Unfortunately, the main Centos repository contains only 1.1.8-7.el6 version.

Questions:
Can I use Pacemaker release 1.1.8 for production clusters (we want to
work with the Centos repository)?


I hope so


Do you expect to change existing features in 1.1.8?


I believe it's not really a tech preview anymore since EL6.4, so I'd 
expect things not to move much until RHEL7.




Do you have uncompleted features in 1.1.8?

What repository contains Pacemaker release 1.0.12?

Thanks for your help,

Michael



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 17:29, Dejan Muhamedagic a écrit :

Approximative syntax, do not blame me !
>
>* crm configure property maintenance-mode=true
>* crm resource stop R1 # it won't stop as it's in maintenance-mode

A recent crmsh knows that it can delete a resource if
maintenance-mode is set.



Oh. Okay, thanks :)

--
Cheers,
Florian Crouzat



Re: [Pacemaker] failcount,start/stop-failure in crm_mon

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 16:43, Vadym Chepkov a écrit :


On Jun 6, 2013, at 10:29 AM, Wolfgang Routschka 
 wrote:


Hi,

one question today about deleting start/stop error in crm_mon.

How can I delete failure/errors in crm_mon without having to restart/refresh 
resources?


crm resource cleanup some-resource


Beware that resources can move when failcounts are cleared.
E.g.: removing a +INF failcount makes the node eligible to host the 
resource again, and depending on your configuration, the resource could 
move back there instantly.
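
A common way to soften this (a sketch; the value 100 is arbitrary) is a 
default stickiness, so that a node becoming eligible again does not pull 
the resource back by itself:

```
# crm shell: make started resources prefer to stay where they are
crm configure rsc_defaults resource-stickiness=100
```

With a nonzero stickiness, a cleared failcount only matters at the next 
placement decision instead of triggering an immediate move.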



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 16:35, Andreas Mock a écrit :

Hi all,

is there a way to remove a resource from a group without
disturbing the other resources in the group.

The following example:
- G1 has R1 R2 R3
- All resources are started
- Stopping R1 would cause a stop of R2 R3
- So, the idea was:
* crm configure edit => remove R1 from the group while running
* stop resource
* delete resource

BUT: At some point (which we couldn't find out at
the moment) all remaining resources of the group are
restarted. It seems that the change of the implicit
dependency tree of the initial group forces a rebuild
of that tree including a restart of that group.
(Andrew: Is this assumption right?)

So, is there are way to add/remove resources from
group without disturbing the other resources.
It's clear to me that the resources would restart
when the node assignment after removing would change.

Hints welcome.



Approximative syntax, do not blame me !

* crm configure property maintenance-mode=true
* crm resource stop R1 # it won't stop as it's in maintenance-mode
* crm configure delete R1
* crm configure show # verify that all references to R1 are gone
* crm resource reprobe # the cluster double-checks the status of declared 
resources and sees that everything is fine and R1 doesn't exist anymore
* crm_mon -Arf1 # double check that everything is "started (unmanaged)" 
and R1 is gone
* crm_simulate -S -L -VVV # optional, to check what would happen when 
leaving maintenance-mode

* crm configure property maintenance-mode=false

If something goes wrong while in maintenance-mode, crm resource cleanup 
foo might be handy. Nothing should move, start or stop until you leave 
maintenance-mode anyway. I use this scenario very often, to add or 
remove IPaddr2 resources to a group of 30+ IPaddr2.



--
Cheers,
Florian Crouzat



Re: [Pacemaker] failed actions after resource creation

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 16:25, andreas graeper a écrit :

hi and thanks.
(better sentences: i will give my best)


Okay


on inactive node there is actually only /etc/init.d/nfs and neither
nfs-common nor nfs-kernel-server.
is monitor not only looking for the running service on active node, but
for the situation on inactive node, too ?


Well, the main goal of a cluster is the ability to move resources 
between members in case of failures of the hosting node, so yes, of 
course pacemaker checks the capacity of all cluster nodes to host the 
resources.



so i would have expected, that the missing nfs-kernel-server was
reported, too.


i guess, this can be handled only with a init-script 'nfs' (same name on
both nodes) that is starting/killing nfs-commo/nfs-kernel-server ?
or is there another solution ?


If you want to clusterize NFS, then have all your nodes NFS-ready; 
otherwise, I really don't see what your inactive node is good for.




what is monitor in case of resource managed by lsb-script doing ?
is it calling `service xxx status` ?


Yes.


what does the monitor expect on node where service is running / not
running ?


http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html
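
To make the last two answers concrete, here is a minimal sketch of what 
the monitor operation boils down to for an LSB resource (`check_lsb_status` 
is a hypothetical helper, not a pacemaker command): it runs `status` and 
maps the LSB exit codes, 0 meaning running and 3 meaning stopped.

```shell
#!/bin/sh
# Hedged sketch: what pacemaker's monitor op essentially does with an LSB
# script. Per the LSB spec, "status" must exit 0 when the service runs and
# 3 when it is stopped; any other code is what makes monitors misbehave.
check_lsb_status() {
    "$1" status >/dev/null 2>&1
    case "$?" in
        0) echo "running" ;;
        3) echo "stopped" ;;
        *) echo "non-compliant" ;;
    esac
}
```

A quick compliance check is therefore to run this against 
/etc/init.d/nfs on both nodes, once while the service is up and once 
while it is down.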



thanks in advance
andreas


You are welcome.




2013/6/6 Florian Crouzat mailto:gen...@floriancrouzat.net>>

Le 06/06/2013 15:49, andreas graeper a écrit :

  p_nfscommon_monitor_0 (node=linag, call=189, rc=5,
status=complete): not installed


Sounds obvious: "not installed". Node "linag" is missing some
daemons/scripts , probably nfs-related. Check your nfs packages and
configuration on both nodes, node1 should be missing something.


what can i do ?


Better sentences.

--
Cheers,
    Florian Crouzat





--
Cheers,
Florian Crouzat



Re: [Pacemaker] failed actions after resource creation

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 15:49, andreas graeper a écrit :

 p_nfscommon_monitor_0 (node=linag, call=189, rc=5,
status=complete): not installed


Sounds obvious: "not installed". Node "linag" is missing some 
daemons/scripts , probably nfs-related. Check your nfs packages and 
configuration on both nodes, node1 should be missing something.



what can i do ?


Better sentences.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] target: 7 vs. rc: 0 error

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 15:47, ESWAR RAO a écrit :

Please let me know if there's any chance that I can monitor the already
running resources ?


That sentence makes no sense to me.

As I said, you must not start resource from outside the cluster 
(runlevels). Let the resource-manager (pacemaker) start them and all 
your troubles will seem so faraway.


And if you want to /add/ a new resource to the cluster configuration 
while this very resource is already running outside the cluster, then 
you should activate maintenance-mode, define the new resource, reprobe 
so the cluster sees it is running, and exit maintenance-mode.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] target: 7 vs. rc: 0 error

2013-06-06 Thread Florian Crouzat

Le 06/06/2013 11:40, ESWAR RAO a écrit :

Hi All,

Can someone help me in my below setup:

I am trying to monitor RA on a 2 node cluster setup.
The resources are already running before start of HB+pacemaker.
+
#crm configure primitive oc_d1 lsb::testd1 meta allow-migrate="true"
migration-threshold="1" failure-timeout="30s" op monitor interval="3s"
#crm configure clone oc_d1_clone oc_d1

#crm configure primitive oc_d2 lsb::testd2 meta allow-migrate="true"
migration-threshold="3" failure-timeout="30s" op monitor interval="5s"
#crm configure clone oc_d2_clone oc_d2
+

But the resources are getting stopped on one node with below errors:

  [11519]: WARN: status_from_rc: Action 9 (oc_d1:0_monitor_0) on
ubuntu191 failed (target: 7 vs. rc: 0): Error
Jun 06 14:45:22 ubuntu191 pengine: [11712]: notice: LogActions: Stop
oc_d1:0  (ubuntu191)

  Clone Set: oc_d1_clone [oc_d1]
  Started: [ ubuntu190 ]
  Stopped: [ oc_d1:1 ]
  Clone Set: oc_d2_clone [oc_d2]
  Started: [ ubuntu190 ]
  Stopped: [ oc_d2:1 ]

Thanks
Eswar



1/ There is no question in your email.
2/ You should not start your HA resources in the runlevels if you intend 
to have them clusterized: pacemaker (aka: the resource manager) should 
start them.
3/ If you persist despite point 2, check what return code 7 means, and 
read the sources of your RA to find what could possibly return 7.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] group resource starting parallel

2013-06-06 Thread Florian Crouzat

Le 05/06/2013 16:23, Wolfgang Routschka a écrit :

Hi Guys,

one question about group  resource for starting parallel configuring with 
crmshell (Scientific Linux 64
with pacemaker 1.1.8-7, cman-3.0.12.1-49 and crmsh-1.2.5-55).

in my 2 node cluster I´ll configured a group with 40 ip-address resources for 
easy managing. Now I want that
start the resources parallel.

in my crmshell I cannot use the option "meta ordered=false"  - these option is 
no longer disponse for my information

Afte searching i found "resource sets" so I hope it´s correct for my way  to 
parallel my resources but I
can´t configure resource sets in crmshell in my opinion.

How can I configure my resources to start parallel?

Greetings Wolfgang


Well, as the primary author of pacemaker once said[1], "Unordered 
and/or uncolocated groups are an abomination."


I don't know if his position has changed, but a group being a syntactic 
shortcut for ordering+colocation, trying to make it behave otherwise 
might not be a good idea, even if I understand your need to address a 
group of 40 resources in a single command.


Question: are the couple of seconds (if not a single second) required to 
start 40 IPaddr2 resources sequentially too long to wait for you? Why 
/must/ you start them in parallel?



[1] - 
http://oss.clusterlabs.org/pipermail/pacemaker/2011-January/008969.html



--
Cheers,
Florian Crouzat



Re: [Pacemaker] lsb resource manager

2013-06-04 Thread Florian Crouzat

Le 04/06/2013 11:55, andreas graeper a écrit :

but pacemaker should realize that lsb:xxx did not start ?!
what is to do ? maybe the init scripts return is not correct ?!


Check your init-script against lsb compliance: 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html


--
Cheers,
Florian Crouzat



Re: [Pacemaker] more newbie questions

2013-06-04 Thread Florian Crouzat

Le 04/06/2013 06:44, Alex Samad - Yieldbroker a écrit :


but I am looking for info on the
op start
op monito
op stop

where can I find that. Googling show these in examples but doesn't explain them.


Operations are not tied to a resource agent but are generic: start, 
stop, monitor, and possibly promote/demote, etc.


More information in the official documentation, here 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html 




--
Cheers,
Florian Crouzat



Re: [Pacemaker] Shutdown of pacemaker service takes 20 minutes

2013-05-30 Thread Florian Crouzat

Le 30/05/2013 13:57, Johan Huysmans a écrit :


When my resource has received the stop command, it will stop, but this
takes some time.
When the status is monitored during shutdown of the resource this will
fail, as the resource is configured as on-fail="block",
the resource is set to unmanaged.

Is this a bug? or can I workaround this issue?



You should probably override the default timeout for the stop() 
operation with a custom value of your choice.


See table 5.3 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html#_monitoring_resources_for_failure


Eg:

primitive foo ocf:x:y \
op monitor on-fail="restart" interval="10s" \
op start interval="0" timeout="2min" \
    op stop interval="0" timeout="5min"


--
Cheers,
Florian Crouzat



Re: [Pacemaker] HA for apache, doesnot work with pacemaker

2013-05-29 Thread Florian Crouzat

Le 29/05/2013 11:48, Gopi Krishna B a écrit :

 Allow from 10.102.228.60


Maybe you need "Allow from 127.0.0.1", or ::1 (localhost).

Anyway, proceed step by step and validate your apache setup first, 
especially that the /status works fine.


Then you can try to add HA to it, and provide pacemaker logs once you 
are sure apache is properly configured but pacemaker still fails to 
bring it up.
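
For reference, a sketch of the status handler the apache RA polls, in 
Apache 2.2 syntax (mod_status must be loaded; adapt the Allow list to 
wherever the monitor runs from):

```
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1 ::1
</Location>
```

You can validate it outside the cluster with a plain 
curl http://127.0.0.1/server-status from the node itself.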


--
Cheers,
Florian Crouzat



Re: [Pacemaker] newbie question(s)

2013-05-24 Thread Florian Crouzat

Le 24/05/2013 15:36, Nick Khamis a écrit :


We are looking to put something together but
don't wan't to use cman for certain things and pacemaker for other. This
will be strictly and active/active using OpenAIS and pacemaker. Has that
been decoupled, stable and sorted out?



I don't have the slightest idea.
Sorry, I never used any of these products; I just run a couple of good 
ol' cman+pacemaker two-node clusters without shared storage.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] newbie question(s)

2013-05-24 Thread Florian Crouzat

Le 24/05/2013 04:15, Alex Samad - Yieldbroker a écrit :




-Original Message-
From: Florian Crouzat [mailto:gen...@floriancrouzat.net]
Sent: Thursday, 23 May 2013 6:27 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] newbie question(s)


[snip]


You could also wait for a failover where the VIP (or any resource) will fail to
properly stop; the cluster doesn't know what to do on stop failures (besides
fencing), so it freezes in this weird state => you have two slaves on your
network until an admin fixes it.


True, what would you use for quorum (it's a 2 node cluster)?




I never used quorum on any two-node cluster; by definition it makes no 
sense, so I always used no-quorum-policy=ignore.
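
For reference, the classic CMAN two-node special case that goes with 
this; a fragment of /etc/cluster/cluster.conf, with the rest of the file 
assumed:

```
<!-- tell CMAN a two-node cluster is quorate with a single vote -->
<cman two_node="1" expected_votes="1"/>
```

On the pacemaker side this pairs with 
crm configure property no-quorum-policy=ignore.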


If you really want quorum, you need a third player, either a dedicated 
quorum pseudo-node (don't know much about these) or a third node.


Hope it helps...

--
Cheers,
Florian Crouzat



Re: [Pacemaker] newbie question(s)

2013-05-23 Thread Florian Crouzat

Le 22/05/2013 02:13, Alex Samad - Yieldbroker a écrit :

> >Any help or suggestions muchly appreciated
> >

>
>Also, fencing!

Not sure that I need it. The app is always running on both nodes, it's just the 
ip address that is shared



A cluster without fencing is not a cluster, by definition. Proper 
fencing should use at least two different methods, e.g. IPMI + PDU.


Some people live with clusters without fencing; they mostly rely on 
luck, and luck is not very "High Availability".


If you want a proof that you need it, just drop all corosync network 
traffic => split brain, you have two masters on your network until an 
admin fixes it.


You could also wait for a failover where the VIP (or any resource) will 
fail to properly stop; the cluster doesn't know what to do on 
stop failures (besides fencing), so it freezes in this weird state => you 
have two slaves on your network until an admin fixes it.
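
As a starting point, here is a sketch of a single-method IPMI setup; all 
names, addresses and credentials below are assumptions, and a second 
method such as a PDU comes on top of this:

```
primitive fence-node1 stonith:fence_ipmilan \
    params pcmk_host_list="node1" ipaddr="10.0.0.1" \
        login="admin" passwd="secret" \
    op monitor interval="60s"
# never let a node run its own fencing device
location l-fence-node1 fence-node1 -inf: node1
property stonith-enabled="true"
```

One such primitive per node, each banned from the node it fences.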



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Post script question

2013-05-22 Thread Florian Crouzat

Le 22/05/2013 15:02, Daniel Gullin a écrit :

Hi, is there is any possibilities to use a “post-script” when a failover
has happened ? I have a corosync/pacemaker installation with two
services, filesystem and IP and two nodes.

The system should be in passive/active mode. When a failover has happen
the passive node should mount the shared disk and migrate the shared IP
as well, when that is

finished I want pacemake to run a script on the “new” active node… Could
I do this ?

Thanks

Daniel




Maybe there are more elegant ways I haven't heard about, but you can 
totally create an lsb resource firing a custom script of yours in its 
start() section and, by smart usage of a group or colocation+ordering, 
force this resource to be started after all the others.


Beware of the lsb compliance of your resource...
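
A minimal skeleton of such a script might look as follows. This is a 
sketch, not a drop-in file: the script name, the state file and the 
actual post-failover action are all assumptions.

```shell
#!/bin/sh
# Hypothetical /etc/init.d/post-failover: fires a custom action on start.
# The marker file stands in for "the post-failover job has run" (assumption).
STATEFILE="${STATEFILE:-/tmp/post-failover.ran}"

start() {
    # put your real post-failover action here (notifications, cache warmup...)
    touch "$STATEFILE"
}

stop() {
    rm -f "$STATEFILE"
}

status() {
    # LSB semantics pacemaker's monitor relies on: exit 0 = running, 3 = stopped
    if [ -f "$STATEFILE" ]; then return 0; else return 3; fi
}

case "$1" in
    start)  start ;;
    stop)   stop ;;
    status) status ;;
esac
```

Declared as e.g. primitive p_postscript lsb:post-failover and placed 
last in the group (or ordered after the IP and filesystem), its start() 
fires once everything else is up.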

--
Cheers,
Florian Crouzat



Re: [Pacemaker] newbie question(s)

2013-05-21 Thread Florian Crouzat
tat can fail 99 more times on dc1wwwrp02 before being forced off
May 21 09:09:28 dc1wwwrp01 pengine[2355]:   notice: process_pe_message: 
Transition 5487: PEngine Input stored in: /var/lib/pengine/pe-input-1485.bz2
May 21 09:09:28 dc1wwwrp01 crmd[2356]:   notice: do_state_transition: State 
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
cause=C_IPC_MESSAGE origin=handle_response ]
May 21 09:09:28 dc1wwwrp01 crmd[2356]: info: do_te_invoke: Processing graph 
5487 (ref=pe_calc-dc-1369091368-5548) derived from 
/var/lib/pengine/pe-input-1485.bz2
May 21 09:09:28 dc1wwwrp01 crmd[2356]:   notice: run_graph:  Transition 
5487 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pengine/pe-input-1485.bz2): Complete
May 21 09:09:28 dc1wwwrp01 crmd[2356]:   notice: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
May 21 09:12:35 dc1wwwrp01 cib[2351]: info: cib_stats: Processed 1 
operations (0.00us average, 0% utilization) in the last 10min
May 21 09:23:12 dc1wwwrp01 crm_resource[5165]:error: unpack_rsc_op: 
Preventing ybrpstat from re-starting on dc1wwwrp01: operation monitor failed 
'insufficient privileges' (rc=4)
May 21 09:23:12 dc1wwwrp01 crm_resource[5165]:error: unpack_rsc_op: 
Preventing ybrpstat from re-starting on dc1wwwrp02: operation monitor failed 
'insufficient privileges' (rc=4)
May 21 09:23:12 dc1wwwrp01 cib[2351]: info: cib_process_request: Operation 
complete: op cib_delete for section 
//node_state[@uname='dc1wwwrp01']//lrm_resource[@id='ybrpstat'] 
(origin=local/crmd/5589, version=0.101.36): ok (rc=0)
May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: delete_resource: Removing 
resource ybrpstat for 5165_crm_resource (internal) on dc1wwwrp01
May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: notify_deleted: Notifying 
5165_crm_resource on dc1wwwrp01 that ybrpstat was deleted
May 21 09:23:12 dc1wwwrp01 crmd[2356]:  warning: decode_transition_key: Bad 
UUID (crm-resource-5165) in sscanf result (3) for 0:0:crm-resource-5165
May 21 09:23:12 dc1wwwrp01 attrd[2354]:   notice: attrd_trigger_update: Sending flush 
op to all hosts for: fail-count-ybrpstat ()
May 21 09:23:12 dc1wwwrp01 cib[2351]: info: cib_process_request: Operation 
complete: op cib_delete for section 
//node_state[@uname='dc1wwwrp01']//lrm_resource[@id='ybrpstat'] 
(origin=local/crmd/5590, version=0.101.37): ok (rc=0)
May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: abort_transition_graph: 
te_update_diff:320 - Triggered transition abort (complete=1, tag=lrm_rsc_op, 
id=ybrpstat_last_0, magic=0:0;3:3572:0:c348b36c-f6dd-4a7d-ac5b-01a3b8ce3c34, 
cib=0.101.37) : Resource op removal
May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: abort_transition_graph: 
te_update_diff:320 - Triggered transition abort (complete=1, tag=lrm_rsc_op, 
id=ybrpstat_last_0, magic=0:0;3:3572:0:c348b36c-f6dd-4a7d-ac5b-01a3b8ce3c34, 
cib=0.101.37) : Resource op removal


From node2

===
May 21 09:20:03 dc1wwwrp02 lrmd: [2045]: info: rsc:ybrpip:16: monitor
May 21 09:23:12 dc1wwwrp02 lrmd: [2045]: info: cancel_op: operation monitor[16] 
on ocf::IPaddr2::ybrpip for client 2048, its parameters: 
CRM_meta_name=[monitor] cidr_netmask=[24] crm_feature_set=[3.0.6] 
CRM_meta_timeout=[2] CRM_meta_interval=[5000] ip=[10.32.21.10]  cancelled
May 21 09:23:12 dc1wwwrp02 lrmd: [2045]: info: rsc:ybrpip:20: stop
May 21 09:23:12 dc1wwwrp02 cib[2043]: info: apply_xml_diff: Digest 
mis-match: expected dcee73fe6518ac0d4b3429425d5dfc16, calculated 
4a39d2ad25d50af2ec19b5b24252aef8
May 21 09:23:12 dc1wwwrp02 cib[2043]:   notice: cib_process_diff: Diff 0.101.36 
-> 0.101.37 not applied to 0.101.36: Failed application of an update diff
May 21 09:23:12 dc1wwwrp02 cib[2043]: info: cib_server_process_diff: 
Requesting re-sync from peer
May 21 09:23:12 dc1wwwrp02 cib[2043]:   notice: cib_server_process_diff: Not 
applying diff 0.101.36 -> 0.101.37 (sync in progress)
May 21 09:23:12 dc1wwwrp02 cib[2043]:   notice: cib_server_process_diff: Not 
applying diff 0.101.37 -> 0.102.1 (sync in progress)
May 21 09:23:12 dc1wwwrp02 cib[2043]:   notice: cib_server_process_diff: Not 
applying diff 0.102.1 -> 0.102.2 (sync in progress)
May 21 09:23:12 dc1wwwrp02 cib[2043]:   notice: cib_server_process_diff: Not 
applying diff 0.102.2 -> 0.102.3 (sync in progress)
May 21 09:23:12 dc1wwwrp02 cib[2043]:   notice: cib_server_process_diff: Not 
applying diff 0.102.3 -> 0.102.4 (sync in progress)

Any help or suggestions muchly appreciated



Also, fencing!



Thanks
Alex





--
Cheers,
Florian Crouzat



Re: [Pacemaker] question about interface failover

2013-05-21 Thread Florian Crouzat

Le 18/05/2013 20:23, christopher barry a écrit :

On Fri, 2013-05-17 at 10:41 +0200, Florian Crouzat wrote:

Le 16/05/2013 21:45, christopher barry a écrit :

Greetings,

I've setup a new 2-node mysql cluster using
* drbd 8.3.1.3
* corosync 1.4.2
* pacemaker 117
on Debian Wheezy nodes.

failover seems to be working fine for everything except the ips manually
configured on the interfaces.


This sentence makes no sense to me.
The cluster will not failover something that is not clusterized (a
'manually' configured IP...)

What are you trying to achieve exactly ?
Also, could you pastebin the output of "crm_mon -Arf1" I find it more
easy to read.




see config here:
http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g
+g09RcJvhHbgrY1JuN7D+gA4=

If I bring down an interface, when the cluster restarts it, it only
starts it with the vip - the original ip and route have been removed.


Makes sense if you added the 'original' IP manually...
You should have non-VIP in /etc/sysconfig/network/ifcfg-*
But then again, please precise what you are trying to achieve.



not sure what to do to make sure the permanent ip and the routes get
restored. I'm not all that versed on the cluster commandline yet, and
I'm using LCMC for most of my usage.





(@howard2.rjmetrics.com)-(14:00 / Sat May 18)
[-][~]# crm_mon -Arf1

Last updated: Sat May 18 14:00:27 2013
Last change: Thu May 16 17:33:07 2013 via crm_attribute on
howard3.rjmetrics.com
Stack: openais
Current DC: howard3.rjmetrics.com - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ howard3.rjmetrics.com howard2.rjmetrics.com ]

Full list of resources:

  Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
  Masters: [ howard2.rjmetrics.com ]
  Slaves: [ howard3.rjmetrics.com ]
  Resource Group: g_mysql
  p_fs_mysql(ocf::heartbeat:Filesystem):Started
howard2.rjmetrics.com
  ClusterPrivateIP  (ocf::heartbeat:IPaddr2):   Started
howard2.rjmetrics.com
  ClusterPublicIP   (ocf::heartbeat:IPaddr2):   Started
howard2.rjmetrics.com
  p_mysql   (ocf::heartbeat:mysql): Started howard2.rjmetrics.com

Node Attributes:
* Node howard3.rjmetrics.com:
 + master-p_drbd_mysql:0: 1000
* Node howard2.rjmetrics.com:
 + master-p_drbd_mysql:1: 1

Migration summary:
* Node howard3.rjmetrics.com:
p_drbd_mysql:1: migration-threshold=100 fail-count=1
* Node howard2.rjmetrics.com:
ClusterPublicIP: migration-threshold=100 fail-count=1

Failed actions:
 p_drbd_mysql:1_promote_0 (node=howard3.rjmetrics.com, call=29,
rc=-2, status=Timed Out): unknown exec error
 ClusterPublicIP_monitor_3 (node=howard2.rjmetrics.com, call=122,
rc=7, status=complete): not running


howard2 and howard3 are the two clustered servers.

During testing, when I ifdown either eth0 or eth1, the cluster starts
the vip back up, but the other non-vip IPs and routes do not get
started. I'm running Debian, so these are configured
in /etc/network/interfaces. Saying 'manually' configured was misleading
on my part, sorry about that.


Mhh, I cannot reproduce right now, but I was pretty sure that IPaddr2 
used "ip addr add X.X.X.X/YY dev ZZ", so I was expecting that ifdowning 
device ZZ would prevent pacemaker from re-upping the VIP, as the 
underlying device no longer exists.
It's even shown by the fact that the non-VIP doesn't come up again: 
IPaddr2 doesn't ifup, it adds an alias to an existing device.

See "sudo crm ra meta IPaddr2" and search for "nic="
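
For reference, this is roughly what an IPaddr2 primitive pinned to a 
specific device looks like in crm shell syntax (the address, netmask and 
nic values here are illustrative placeholders):

    primitive vip ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24" nic="eth0" \
        op monitor interval="10s"

When nic is omitted, the agent deduces the interface from the routing 
table, which is another reason the underlying device must already be up.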

Anyway, "ifdown" is not a valid use case to test your cluster; it 
doesn't represent any plausible production failure scenario.




eth0 is the public interface, and eth1 is the private interface. eth2
and eth3 are bonded as bond0, use jumbo frames, and are crossover cabled
between the nodes.

The test I was doing was to pull cables from eth0 and eth1, which hung
the cluster. My assumption is that I need to add more configuration
elements to manage the other IPs and also setup some ping hosts that
when unreachable will initiate failover. What would help me I think is
an example config or pointers to how to add these elements.


Well, without digging much into your configuration: you need ping 
nodes, yes, so that the most connected node "wins", and you also need 
fencing, which is mandatory on any cluster.


Here's a sample configuration for ping nodes and a location constraint so 
that the most connected node hosts the resource "foo":



primitive ping-gw-sw1-sw2 ocf:pacemaker:ping \
params host_list="192.168.10.1 192.168.2.11 192.168.2.12" 
dampen="35s" attempts="2" timeout="2" multiplier="100" \

op monitor interval="
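
For reference, a complete ping-node setup usually pairs a cloned ping 
resource with a location rule on the pingd attribute. A sketch in crm 
syntax (host IPs and the resource name "foo" are illustrative):

    primitive ping-gw ocf:pacemaker:ping \
        params host_list="192.168.10.1 192.168.2.11 192.168.2.12" \
               dampen="35s" attempts="2" timeout="2" multiplier="100" \
        op monitor interval="15s"
    clone ping-gw-clone ping-gw
    location foo-on-connected-node foo \
        rule -inf: not_defined pingd or pingd lte 0

The rule pushes "foo" away from any node that cannot reach the ping 
targets at all; with multiplier set, a score comparison can instead 
prefer the best-connected node.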

Re: [Pacemaker] question about interface failover

2013-05-17 Thread Florian Crouzat

Le 16/05/2013 21:45, christopher barry a écrit :

Greetings,

I've set up a new 2-node mysql cluster using
* drbd 8.3.13
* corosync 1.4.2
* pacemaker 1.1.7
on Debian Wheezy nodes.

failover seems to be working fine for everything except the ips manually
configured on the interfaces.


This sentence makes no sense to me.
The cluster will not fail over something that is not clusterized (a 
'manually' configured IP...)

What are you trying to achieve exactly?
Also, could you pastebin the output of "crm_mon -Arf1"? I find it 
easier to read.





see config here:
http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g
+g09RcJvhHbgrY1JuN7D+gA4=

If I bring down an interface, when the cluster restarts it, it only
starts it with the vip - the original ip and route have been removed.


Makes sense if you added the 'original' IP manually...
You should have the non-VIP addresses in /etc/sysconfig/network-scripts/ifcfg-*
But then again, please clarify what you are trying to achieve.



Not sure what to do to make sure the permanent IP and the routes get
restored. I'm not all that versed in the cluster command line yet, and
I'm using LCMC for most of my usage.



--
Cheers,
Florian Crouzat

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-13 Thread Florian Crouzat

Le 09/05/2013 16:40, Steven Bambling a écrit :

I'm having some issues getting cluster monitoring set up and
configured on a 3-node multi-state cluster. I'm using Florian's blog
as an example:
http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/.



Btw, if any clarification is required on this blog post, let me know ;)

--
Cheers,
Florian Crouzat



Re: [Pacemaker] cman based cluster fencindevice fence_pcmk

2013-04-11 Thread Florian Crouzat

Le 11/04/2013 16:49, Wolfgang Routschka a écrit :

Hi all,
one question today about a cman-based cluster on RHEL 6 and clone systems
with the fencing device agent fence_pcmk.
In my scenario the stonith device is an IBM IPMI management
interface (IMM), so I want to use fence_ipmilan from the fence-agents package.
After reading the RHEL quickstart guide
http://clusterlabs.org/quickstart-redhat.html and Clusters from Scratch
8.2.3 Configuring CMAN Fencing
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02s03.html
I'm not sure how I can use fence_ipmilan for the stonith device resource,
because in my configuration the fencing device is fence_pcmk:
"ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk"
Greetings
Wolfgang



All "ccs" commands address CMAN.
With these, you only configure CMAN to delegate fencing to Pacemaker.
Then, with regular Pacemaker commands and by defining usual primitives 
(fence_ipmilan, fence_wti ...), you achieve resource-level fencing.


Do not mix the two concepts.
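
As an illustrative sketch only (IP, credentials and node names are 
placeholders), the resource-level part could then be a regular stonith 
primitive in crm syntax:

    primitive fence-node1 stonith:fence_ipmilan \
        params ipaddr="10.0.0.11" login="admin" passwd="secret" \
               lanplus="1" pcmk_host_list="node1" \
        op monitor interval="60s"
    location l-fence-node1 fence-node1 -inf: node1

The location constraint keeps a node from being the one to run its own 
fencing device.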

--
Cheers,
Florian Crouzat



Re: [Pacemaker] ping resource polling skew

2013-03-20 Thread Florian Crouzat

Le 20/03/2013 04:11, Quentin Smith a écrit :


Is there any way to get Pacemaker to delay resource transitions until at
least one full polling cycle has happened, so that in the event of an
outage of the ping target, resources stay put where they are running?


there is the "dampen" parameter: use a high value, like 3 or more
times the monitor interval, to give all nodes the chance to detect the
dead target(s); that should help.


Does that actually help in this case? My understanding is that the
dampen parameter will delay the attribute change for each host, but
those delays will still tick down separately for each node, resulting in
exactly the same behavior, just delayed by dampen seconds.



I have had the same questions, and I was quite surprised to see this 
issue wasn't really mentioned anywhere.

So far, I've been relying on the dampen parameter.
Here is my resource definition:

primitive ping-nq-sw-swsec ocf:pacemaker:ping \
params host_list="192.168.10.1 192.168.2.11 192.168.2.12" 
dampen="35s" attempts="2" timeout="2" multiplier="100" \

op monitor interval="15s"

As I understand it, a node cannot trigger any transition until 35s 
(dampen) has passed since that particular node lost a ping node.
And by setting a monitor interval of 15s, I can be sure that within those 
35s all nodes will have marked that ping node as dead and still share a 
common score => nothing moves (35s > 2*15s, so every node has pinged at 
least twice during the dampen delay).
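
The rule of thumb above (dampen strictly greater than two monitor 
intervals) can be sanity-checked with a trivial shell snippet using the 
numbers from this configuration:

```shell
# Check that dampen > 2 * monitor interval, so every node polls the
# ping targets at least twice before any attribute change takes effect.
dampen=35
interval=15
if [ "$dampen" -gt $((2 * interval)) ]; then
    echo "safe: all nodes converge within the dampen window"
else
    echo "risky: dampen is too short"
fi
```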


Hope that helps.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] Running a custom script

2013-03-12 Thread Florian Crouzat

Le 12/03/2013 12:17, Michael Smith a écrit :

Hi,

I think it’s quite simple but I can’t see how to do it.

I have a two node cluster serving up a samba share and allowing access
via sftp, these I have all working fine, but for the following.

In order to have samba and sftp working together on the same directories
I need to do a mount -o bind.

This needs to be done when the cluster fails over; it's a one-off event
and doesn't have any running process or require a start/stop script.
I have written the script but I now need it to run as a clustered
resource, once the file system has been mounted (drbd) and before I
start samba and set the virtual IP.

Hope you can help.

Thanks

Michael Smith




Define an LSB resource (see Pacemaker Explained); it can be anything from 
an init script to a custom script, as long as it's LSB compliant[1]. You 
can have it behave like any other shipped resource agent.


[1] - 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html
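
A minimal sketch of the control logic such a script needs — the 
bind-mount "service" and its paths are hypothetical, and the echoes stand 
in for the real mount/umount calls — might look like:

```shell
# Sketch of LSB-style start/stop/status handling for a hypothetical
# bind-mount service. Pacemaker's lsb: class calls the script with one
# of these arguments and relies on LSB exit codes (0 = OK/running,
# 3 = not running for "status").
bindmount_ctl() {
    case "$1" in
        start)
            # real script: mountpoint -q /srv/share || mount -o bind /data/share /srv/share
            echo "started"
            return 0
            ;;
        stop)
            # real script: umount /srv/share
            echo "stopped"
            return 0
            ;;
        status)
            # real script: mountpoint -q /srv/share && return 0
            echo "not running"
            return 3
            ;;
        *)
            echo "usage: start|stop|status"
            return 2
            ;;
    esac
}
```

The resource would then be declared as e.g. lsb:bindmount, and the 
start/stop ordering comes from its position in a group or from order 
constraints.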


--
Cheers,
Florian Crouzat



Re: [Pacemaker] set the timeout

2013-03-11 Thread Florian Crouzat

Le 11/03/2013 15:44, fatcha...@gmx.de a écrit :

Hi,

how can I adjust the timeout for the start of a mysqld or an httpd in a config?
It seems to me as if this is a little too short for our systems.
One of my configuration looks like this:



See 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/_resource_operations.html

It's the same as defining a timeout for a "monitor" operation.
You can do it for any operation.

Eg:

primitive foo lsb:bar \
op monitor on-fail="restart" interval="10s" \
op start interval="0" timeout="3min" \
op stop interval="0" timeout="1min"

Don't forget stop timeouts or you'll (probably) get fenced ;)

--
Cheers,
Florian Crouzat



Re: [Pacemaker] Monitor process, migrate only ip resources

2013-02-20 Thread Florian Crouzat

Le 20/02/2013 09:07, Grant Bagdasarian a écrit :

3)When I only stop the kamailio process on the primary node, the process
is restarted again. Which is also good, but I thought it would migrate
everything to the secondary node when the kamailio process stopped.



Pacemaker always tries to restart a failed resource on its original 
hosting node.

Only if it fails to do so will it migrate the resource(s) to another node.


If there is something wrong on the primary node which causes the
kamailio process to keep crashing and restarting, it could be a hazard
for our production environment.



I guess the question is: is the failcount incremented when a failed resource 
*is* restarted on the same node and no transition occurs?



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Monitor process, migrate only ip resources

2013-02-19 Thread Florian Crouzat

Le 19/02/2013 13:54, Grant Bagdasarian a écrit :

Hello,

I wish to monitor a certain running process and migrate floating IP
addresses when this process stops running.

My current configuration is as following:

crm(live)configure# show

node $id="8fe81814-6e85-454f-b77b-5783cc18f4c6" proxy1

node $id="ceb5c90f-ee6a-44b9-b722-78781f6a61ab" proxy2

primitive sip_ip ocf:heartbeat:IPaddr \

 params ip="10.0.0.1" cidr_netmask="255.255.255.0" nic="eth1" \

 op monitor interval="40s" timeout="20s"

primitive sip_ip_2 ocf:heartbeat:IPaddr \

 params ip="10.0.0.2" cidr_netmask="255.255.255.0" nic="eth1" \

 op monitor interval="40s" timeout="20s"

primitive sip_ip_3 ocf:heartbeat:IPaddr \

 params ip="10.0.0.3" cidr_netmask="255.255.255.0" nic="eth1" \

 op monitor interval="40s" timeout="20s"

location sip_ip_pref sip_ip 100: proxy1

location sip_ip_pref_2 sip_ip_2 101: proxy1

location sip_ip_pref_3 sip_ip_3 102: proxy1

property $id="cib-bootstrap-options" \

 dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \

 cluster-infrastructure="Heartbeat" \

 stonith-enabled="false"

Couple days ago our kamailio process stopped and the ip resources
weren’t migrated to our secondary node.


Of course, why would they?



The secondary node already has the kamailio process running.


First, remove kamailio from the runlevels so that it doesn't start on 
server boot and only the cluster manages it.




How do I configure ha so that the kamailio process is monitored every x
seconds and when it has stopped the three ip addresses are migrated to
the secondary node?


Then, you define a kamailio resource (possibly using lsb:kamailio if 
there is no real resource agent).


Finally, you create a group with your 3 IPs + kamailio.
Remember, groups are ordered and colocated sets of resources that start 
from left to right and are stopped in the opposite order.

Possibly, you define a location constraint for the group to prefer proxy1.

In the end, it might be something like:

...
primitive sip_ip ocf:heartbeat:IPaddr \
params ip="10.0.0.1" cidr_netmask="255.255.255.0" nic="eth1" \
op monitor interval="40s" timeout="20s"
primitive sip_ip_2 ocf:heartbeat:IPaddr \
params ip="10.0.0.2" cidr_netmask="255.255.255.0" nic="eth1" \
op monitor interval="40s" timeout="20s"
primitive sip_ip_3 ocf:heartbeat:IPaddr \
params ip="10.0.0.3" cidr_netmask="255.255.255.0" nic="eth1" \
op monitor interval="40s" timeout="20s"
primitive kamailio lsb:kamailio \
op monitor interval="40s" timeout="20s"
group SIP sip_ip sip_ip_2 sip_ip_3 kamailio
location SIP_prefer_proxy1 SIP 100: proxy1
...

Note that with such a config, kamailio is restarted when it is 
migrated, which I assume is something you want, so that it can bind to 
the sip_ip ...



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Online add a new node to cluster communicating by UDPU

2013-02-12 Thread Florian Crouzat

Le 12/02/2013 10:39, Lars Marowsky-Bree a écrit :

On 2013-02-12T10:24:29, Florian Crouzat  wrote:


There might be other ways, probably cleaner, but you can always:


"always" is relative. This doesn't work for services like
GFS2/OCFS2/cLVM2.

Regards,
 Lars



Yeah right, I don't use shared storage in my clusters, so my mind 
skipped that part, you are right.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Online add a new node to cluster communicating by UDPU

2013-02-12 Thread Florian Crouzat

Le 12/02/2013 10:10, Michal Fiala a écrit :

Hello,

is there a way to add a new node online to a corosync/pacemaker
cluster where the nodes communicate by unicast UDP?

Thanks

Michal



There might be other ways, probably cleaner, but you can always:
* put the cluster into maintenance-mode
* shutdown the cluster stack on all nodes (pacemaker + corosync)
* reconfigure the ring(s) on all nodes
* start corosync on all nodes
* test connectivity (corosync-objctl | fgrep members)
* start pacemaker
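
On a RHEL 6-style box, the steps above might translate into something 
like the following untested command sketch (service names and file paths 
depend on the distribution):

    crm configure property maintenance-mode=true     # freeze resource management
    service pacemaker stop && service corosync stop  # run on every node
    vi /etc/corosync/corosync.conf                   # add the new member {} entry to the udpu ring
    service corosync start                           # on every node, including the new one
    corosync-objctl | fgrep members                  # verify all nodes joined the ring
    service pacemaker start
    crm configure property maintenance-mode=false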

--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2013-02-07 Thread Florian Crouzat

Le 05/02/2013 02:36, Andrew Beekhof a écrit :

On Fri, Feb 1, 2013 at 8:21 PM, Florian Crouzat
 wrote:

Le 01/02/2013 03:48, Andrew Beekhof a écrit :


On Tue, Jan 22, 2013 at 3:18 AM, Florian Crouzat
 wrote:


Le 29/11/2012 22:10, Andrew Beekhof a écrit :


Not so fast :-)

crm_mon supports

  -E, --external-agent=value
 A program to run when resource operations take place.

  -e, --external-recipient=value A recipient for your program
(assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call
something that sends out snmp alerts ;-)





So, I took a first shot at writing an external-agent script that would
somehow reproduce the behavior of crm_mon when SNMP support is built-in.

Basically, to refocus the discussion, I've written this script because I
want to be alerted via SNMP on most of the cluster events but sadly my
version of crm_mon doesn't have SNMP support (RHEL6), so I cannot use
this
feature combined with ocf:pacemaker:ClusterMon but I can use crm_mon
ability
to trigger an external-agent (script, binary...)

Script: http://files.floriancrouzat.net/clusterMon.sh
It respects PCMK-MIB.txt.



Is that something you'd like to include with pacemaker?



Well, I'm really not sure it would be useful for anyone, as it is not generic at
all and highly oriented toward my needs.


Looks pretty configurable to me.


Maybe it would fit better in an "example" section of Pacemaker_Explained


That also works :)



Well, do as you please. I'm by no means able to choose :)

Not sure what you meant by "Is that something you'd like to include with 
pacemaker?" but feel free to commit the script to any "extra" or 
"script" folder in the pacemaker sources, maybe remove my name and add the 
required license(s) to it, or just paste the few bash lines I produced into 
the correct section of the correct documentation.


I did some polishing on the comments in the scripts since Chapter 7 of 
Pacemaker Explained has been published 
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-notification-external.html)



--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2013-02-01 Thread Florian Crouzat

Le 01/02/2013 03:48, Andrew Beekhof a écrit :

On Tue, Jan 22, 2013 at 3:18 AM, Florian Crouzat
 wrote:

Le 29/11/2012 22:10, Andrew Beekhof a écrit :


Not so fast :-)

crm_mon supports

 -E, --external-agent=value
A program to run when resource operations take place.

 -e, --external-recipient=value A recipient for your program
(assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call
something that sends out snmp alerts ;-)




So, I took a first shot at writing an external-agent script that would
somehow reproduce the behavior of crm_mon when SNMP support is built-in.

Basically, to refocus the discussion, I've written this script because I
want to be alerted via SNMP on most of the cluster events but sadly my
version of crm_mon doesn't have SNMP support (RHEL6), so I cannot use this
feature combined with ocf:pacemaker:ClusterMon but I can use crm_mon ability
to trigger an external-agent (script, binary...)

Script: http://files.floriancrouzat.net/clusterMon.sh
It respects PCMK-MIB.txt.


Is that something you'd like to include with pacemaker?


Well, I'm really not sure it would be useful for anyone, as it is not 
generic at all and highly oriented toward my needs.

Maybe it would fit better in an "example" section of Pacemaker_Explained



ps: yes I know, lots of comments and few actual lines of code but that's
just because
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#ch-notification
hasn't been published yet.


/me kicks of a rebuild now.


Thank you for that, hope my poor writing skills won't be noticed =)


--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2013-01-21 Thread Florian Crouzat

Le 29/11/2012 22:10, Andrew Beekhof a écrit :


Not so fast :-)

crm_mon supports

-E, --external-agent=value
   A program to run when resource operations take place.

-e, --external-recipient=value A recipient for your program
(assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call
something that sends out snmp alerts ;-)



So, I took a first shot at writing an external-agent script that would 
somehow reproduce the behavior of crm_mon when SNMP support is built-in.


Basically, to refocus the discussion, I've written this script because I 
want to be alerted via SNMP on most of the cluster events but sadly my 
version of crm_mon doesn't have SNMP support (RHEL6), so I cannot use 
this feature combined with ocf:pacemaker:ClusterMon but I can use 
crm_mon ability to trigger an external-agent (script, binary...)


Script: http://files.floriancrouzat.net/clusterMon.sh
It respects PCMK-MIB.txt.

ps: yes I know, lots of comments and few actual lines of code but that's 
just because 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#ch-notification 
hasn't been published yet.


Any directions, hints or corrections are welcome.

Florian.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2012-12-10 Thread Florian Crouzat

Le 07/12/2012 03:19, Andrew Beekhof a écrit :

On Fri, Dec 7, 2012 at 2:58 AM, Florian Crouzat
 wrote:

I cannot find any good place where to write this new content in Pacemaker
Explained. If advised (in terms of table of contents), I'll happily provide
a patch.


I reckon just before


https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Options.txt#L78

is as good as any :)



Actually, there was an obvious place: the undocumented Chapter 7, 
"Receiving Notification for Cluster Events".


Please find attached a git patch, it's basically my first time with both 
asciidoc and git, also, I'm not a native English speaker so please be 
indulgent.


--
Cheers,
Florian Crouzat
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt 
b/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt
index e69de29..ad99977 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt
@@ -0,0 +1,126 @@
+= Receiving Notification for Cluster Events =
+
+anchor:ch-notifications[Chapter 7, Receiving Notification for Cluster Events]
+indexterm:[Resource,Notification]
+
+A Pacemaker cluster is an event-driven system. In this context, an event is a
+resource failure or configuration change (not exhaustive).
+
+The +ocf:pacemaker:ClusterMon+ resource can monitor the cluster status and
+triggers alerts on each cluster event. This resource runs +crm_mon+ in the
+background at regular intervals (configurable) and uses +crm_mon+ capabilities
+to send emails (SMTP), SNMP traps or to execute an external program via the
++extra_options+ parameter.
+
+[NOTE]
+Depending on your system settings and compilation settings, SNMP or email
+alerts might be unavailable. Check +crm_mon --help+ output to see if these
+options are available to you. In any case, executing an external agent will
+always be available, and you can have this agent send emails, SNMP traps,
+or perform whatever action you develop.
+
+[[s-notification-snmp]]
+== Configuring SNMP Notifications ==
+indexterm:[Resource,Notification,SNMP]
+
+Requires an IP to send SNMP traps to, and an SNMP community.
+Pacemaker MIB is found in _/usr/share/snmp/mibs/PCMK-MIB.txt_
+
+.Configuring ClusterMon to send SNMP traps
+=====
+[source,XML]
+<clone id="ClusterMon-clone">
+   <primitive class="ocf" id="ClusterMon-SNMP" provider="pacemaker" type="ClusterMon">
+   <instance_attributes id="ClusterMon-instance_attributes">
+   <nvpair id="ClusterMon-instance_attributes-user" name="user" value="root"/>
+   <nvpair id="ClusterMon-instance_attributes-update" name="update" value="30"/>
+   <nvpair id="ClusterMon-instance_attributes-extra_options" name="extra_options" value="-S snmphost.example.com -C public"/>
+   </instance_attributes>
+   </primitive>
+</clone>
+=====
+
+[[s-notification-email]]
+== Configuring Email Notifications ==
+indexterm:[Resource,Notification,SMTP,Email]
+
+Requires a user to send mail alerts to. "Mail-From", SMTP relay and Subject 
prefix can also be configured.
+
+.Configuring ClusterMon to send email alerts
+=====
+[source,XML]
+<clone id="ClusterMon-clone">
+   <primitive class="ocf" id="ClusterMon-SMTP" provider="pacemaker" type="ClusterMon">
+   <instance_attributes id="ClusterMon-instance_attributes">
+   <nvpair id="ClusterMon-instance_attributes-user" name="user" value="root"/>
+   <nvpair id="ClusterMon-instance_attributes-update" name="update" value="30"/>
+   <nvpair id="ClusterMon-instance_attributes-extra_options" name="extra_options" value="-T pacemaker@example.com -F pacemaker@node2.example.com -P PACEMAKER -H mail.example.com"/>
+   </instance_attributes>
+   </primitive>
+</clone>
+=====
+
+[[s-notification-external]]
+== Configuring Notifications via External-Agent ==
+
+Requires a program (external-agent) to run when resource operations take
+place, and an external-recipient (IP address, Email address, URI…). When
+triggered, the external-agent is fed with dynamically filled environment
+variables describing precisely the cluster event that occurred. By making
+smart usage of these variables in your external-agent code, you can trigger
+any action.
+
+.Configuring ClusterMon to execute an external-agent
+=====
+[source,XML]
+<clone id="ClusterMon-clone">
+   <primitive class="ocf" id="ClusterMon-External" provider="pacemaker" type="ClusterMon">
+   <instance_attributes id="ClusterMon-instance_attributes">
+   <nvpair id="ClusterMon-instance_attributes-user" name="user" value="root"/>
+   <nvpair id="ClusterMon-instance_attributes-update" name="update" value="30"/>
+   <nvpair id="ClusterMon-instance_attributes-extra_options" name="extra_options" value="-E /usr/local/bin/example.sh -e 192.168.12.1"/>
+   </instance_attributes>
+   </primitive>
+</clone>
+=====
+
+.Environment Variables Passed to the External Agent
+[width="95%",cols="1m,2<",options="header",align="center"]
+|=========================================
+
+|Environment Variable
+|Description
+
+|CRM_notify_recipient
+| The static external-recipient from the resource definition.
+ indexterm:[Environment Variable,CRM_notify_recipient]
+
+|CRM_notify_node
+| The node on which the status change happened.
+ indexterm:[Environment Variable,CRM_notify_node]
+
+|CRM_notify_rsc
+| The name of the resource that changed the status.
+ indexterm:[Environment Variable,CRM_notify_rsc]
+
+|CRM_notify_task
+| The operation that caused the status change.
+ indexterm:[Environment Variable,CRM_notify_task]
+
+|CRM_notify_desc
+| The textual output relevant error code of the operation (if any) that caused 
the status change.
+ indexterm:[Environment Variable,CRM_notify_desc]
+
+|CRM_notify_rc
+| The return code of the operation.
+ indexterm:[Environment Variable,CRM_notify_rc]
+
+|CRM_notify_target_rc
+| The expected return code of the operation.
+ indexterm:[Environment Variable,CRM_notify_target_rc]
+
+|CRM_notify_status
+| The numerical representation of the status of the operation.
+ indexterm:[Environment Variable,CRM_notify_status]
+
+|=========================================
diff --git a/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml 
b/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml
index 54662d8..aa1eab4 100644
--- a/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.x

Re: [Pacemaker] crm_mon SNMP support

2012-12-06 Thread Florian Crouzat

Le 05/12/2012 01:38, Andrew Beekhof a écrit :



On Tuesday, December 4, 2012, Florian Crouzat wrote:

Le 03/12/2012 03:27, Andrew Beekhof a écrit :

On Sat, Dec 1, 2012 at 1:07 AM, Florian Crouzat
 wrote:

Le 29/11/2012 22:10, Andrew Beekhof a écrit :


Not so fast :-)

crm_mon supports

  -E, --external-agent=value
 A program to run when resource
operations take place.

  -e, --external-recipient=value A recipient for
your program
(assuming you want the program to send something to
someone).

so without recompiling, you can call a script - possibly
it could call
something that sends out snmp alerts ;-)




Oh, great!

I had a hard time understanding these two options and how
they relate, you
helped me on IRC but I'll reply here so there is a trace in
case someone is
also interested.


Thanks for that. I really need to make some time to document this.


If you have a suggestion as where this documentation should go, I
might propose a patch. I'm not sure crm_mon --help or man crm_mon
can be more verbose than they already are. Giving a full example and
mentioning the ENV variables to use in the external-agent etc is too
long for these brief doc.
What do you think?


The proper place would be "pacemaker explained"
Happily it lives in the source tree (doc/Pacemaker_Explained/en-US/Ch-*)
in asciidoc format.
A patch with the details above would  be most welcome :)


I cannot find any good place where to write this new content in 
Pacemaker Explained. If advised (in terms of table of contents), I'll 
happily provide a patch.




Here is my resource:

primitive ClusterMon ocf:pacemaker:ClusterMon \
  params user="root" update="30" extra_options="-E
/usr/local/bin/foo.sh -e 192.168.1.2" \
  op monitor on-fail="restart" interval="10" \
  meta target-role="Started"
clone ClusterMon-clone ClusterMon

Here is the content of my script:

$ cat /usr/local/bin/foo.sh

#!/bin/bash

(
echo CRM_notify_recipient $CRM_notify_recipient
echo CRM_notify_node $CRM_notify_node
echo CRM_notify_rsc $CRM_notify_rsc
echo CRM_notify_task $CRM_notify_task
echo CRM_notify_desc $CRM_notify_desc
echo CRM_notify_rc $CRM_notify_rc
echo CRM_notify_target_rc $CRM_notify_target_rc
echo CRM_notify_status $CRM_notify_status
echo
) > /tmp/pacemaker.log

Finally, this is the resulting log of one execution. The
script is executed
on each cluster operation/transition (monitor, stop, start) etc.

$ cat /tmp/pacemaker.log

CRM_notify_recipient 192.168.1.2
CRM_notify_node scoresby2.lyra-network.com
<http://scoresby2.lyra-network.com>
CRM_notify_rsc F
CRM_notify_task monitor
CRM_notify_desc ok
CRM_notify_rc 0
CRM_notify_target_rc 0
CRM_notify_status 0

One just has to do some scripting with these variables to
match its needs.
In my case, I guess I want a SNMP trap whenever
    CRM_notify_rc != 0.

        Thanks


    --
    Florian Crouzat



--
Cheers,
Florian Crouzat





--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2012-12-03 Thread Florian Crouzat

Le 03/12/2012 03:27, Andrew Beekhof a écrit :

On Sat, Dec 1, 2012 at 1:07 AM, Florian Crouzat
 wrote:

Le 29/11/2012 22:10, Andrew Beekhof a écrit :



Not so fast :-)

crm_mon supports

 -E, --external-agent=value
A program to run when resource operations take place.

 -e, --external-recipient=value A recipient for your program
(assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call
something that sends out snmp alerts ;-)




Oh, great!

I had a hard time understanding these two options and how they relate, you
helped me on IRC but I'll reply here so there is a trace in case someone is
also interested.


Thanks for that. I really need to make some time to document this.


If you have a suggestion as where this documentation should go, I might 
propose a patch. I'm not sure crm_mon --help or man crm_mon can be more 
verbose than they already are. Giving a full example and mentioning the 
ENV variables to use in the external-agent etc is too long for these 
brief doc.

What do you think?



Here is my resource:

primitive ClusterMon ocf:pacemaker:ClusterMon \
 params user="root" update="30" extra_options="-E
/usr/local/bin/foo.sh -e 192.168.1.2" \
 op monitor on-fail="restart" interval="10" \
 meta target-role="Started"
clone ClusterMon-clone ClusterMon

Here is the content of my script:

$ cat /usr/local/bin/foo.sh

#!/bin/bash

(
echo CRM_notify_recipient $CRM_notify_recipient
echo CRM_notify_node $CRM_notify_node
echo CRM_notify_rsc $CRM_notify_rsc
echo CRM_notify_task $CRM_notify_task
echo CRM_notify_desc $CRM_notify_desc
echo CRM_notify_rc $CRM_notify_rc
echo CRM_notify_target_rc $CRM_notify_target_rc
echo CRM_notify_status $CRM_notify_status
echo
) > /tmp/pacemaker.log

Finally, this is the resulting log of one execution. The script is executed
on each cluster operation/transition (monitor, stop, start) etc.

$ cat /tmp/pacemaker.log

CRM_notify_recipient 192.168.1.2
CRM_notify_node scoresby2.lyra-network.com
CRM_notify_rsc F
CRM_notify_task monitor
CRM_notify_desc ok
CRM_notify_rc 0
CRM_notify_target_rc 0
CRM_notify_status 0

One just has to do some scripting with these variables to match one's needs.
In my case, I guess I want an SNMP trap whenever CRM_notify_rc != 0.

Thanks


--
Florian Crouzat



--
Cheers,
Florian Crouzat



Re: [Pacemaker] Getting Started

2012-12-03 Thread Florian Crouzat

Le 03/12/2012 15:24, Brett Maton a écrit :

Hi List,

   I'm new to corosync / pacemaker so please forgive my ignorance!

   I currently have Postgres streaming replication between node1(master) and 
node2(slave, hot standby), the replication user authenticates to master using 
an md5 password.
   All good there...

   My goal use pacemaker / heartbeat to move VIP and promote node2 if node1 
fails, without using drdb or pg-pool.

   What I'm having trouble with is finding resources for learning what I need 
to configure with regards to corosync / pacemaker to implement failover.  All 
of the guides I've found use DRDB and/or a much more robust network 
configuration.

I'm currently using CentOS 6.3 with PostgreSQL 9.2

corosync-1.4.1-7.el6_3.1.x86_64
pacemaker-1.1.7-6.el6.x86_64

node1   192.168.0.1
node2   192.168.0.2
dbVIP   192.168.0.101

Any help and suggested reading appreciated.

Thanks in advance,
Brett



Well, if you don't need shared storage and only a VIP over which 
postgres runs, I guess the official guide should be good:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/

Forget the drdb stuff, and base your configuration on the httpd examples 
that collocates a VIP and an httpd daemon in an active/passive two nodes 
cluster. (Chapter 6).



--
Cheers,
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2012-11-30 Thread Florian Crouzat

Le 29/11/2012 22:10, Andrew Beekhof a écrit :


Not so fast :-)

crm_mon supports

-E, --external-agent=value
   A program to run when resource operations take place.

-e, --external-recipient=value A recipient for your program
(assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call
something that sends out snmp alerts ;-)



Oh, great!

I had a hard time understanding these two options and how they relate; 
you helped me on IRC, but I'll reply here so there is a trace in case 
someone else is also interested.


Here is my resource:

primitive ClusterMon ocf:pacemaker:ClusterMon \
params user="root" update="30" extra_options="-E 
/usr/local/bin/foo.sh -e 192.168.1.2" \

op monitor on-fail="restart" interval="10" \
meta target-role="Started"
clone ClusterMon-clone ClusterMon

Here is the content of my script:

$ cat /usr/local/bin/foo.sh

#!/bin/bash

(
echo CRM_notify_recipient $CRM_notify_recipient
echo CRM_notify_node $CRM_notify_node
echo CRM_notify_rsc $CRM_notify_rsc
echo CRM_notify_task $CRM_notify_task
echo CRM_notify_desc $CRM_notify_desc
echo CRM_notify_rc $CRM_notify_rc
echo CRM_notify_target_rc $CRM_notify_target_rc
echo CRM_notify_status $CRM_notify_status
echo
) > /tmp/pacemaker.log

Finally, this is the resulting log of one execution. The script is 
executed on each cluster operation/transition (monitor, stop, start) etc.


$ cat /tmp/pacemaker.log

CRM_notify_recipient 192.168.1.2
CRM_notify_node scoresby2.lyra-network.com
CRM_notify_rsc F
CRM_notify_task monitor
CRM_notify_desc ok
CRM_notify_rc 0
CRM_notify_target_rc 0
CRM_notify_status 0

One just has to do some scripting with these variables to match one's 
needs. In my case, I guess I want an SNMP trap whenever CRM_notify_rc != 0.

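For the archives, here is a minimal sketch of such a failure-only agent. The notify helper and the ALERT message format are illustrative, not part of Pacemaker; in real use Pacemaker exports the CRM_notify_* variables itself before invoking the agent, and the echo would be replaced by an snmptrap or mail invocation suited to your monitoring setup:

```shell
#!/bin/bash
# Sketch of an external agent that only reacts when an operation's return
# code differs from the expected one. The echo is a stand-in for a real
# alert command (snmptrap, mail, ...).
notify() {
    local rc="${CRM_notify_rc:-0}"
    local target="${CRM_notify_target_rc:-0}"
    if [ "$rc" -ne "$target" ]; then
        echo "ALERT ${CRM_notify_rsc} ${CRM_notify_task} on ${CRM_notify_node}: rc=$rc"
    fi
}

# Simulated invocations (Pacemaker would export these variables itself):
CRM_notify_rsc=F CRM_notify_task=monitor CRM_notify_node=node1 \
    CRM_notify_rc=0 CRM_notify_target_rc=0 notify   # successful monitor: silent
CRM_notify_rsc=F CRM_notify_task=monitor CRM_notify_node=node1 \
    CRM_notify_rc=7 CRM_notify_target_rc=0 notify   # failed monitor: one ALERT line
```

The point is simply to filter on rc vs. target_rc instead of alerting on every transition.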

Thanks


--
Florian Crouzat



Re: [Pacemaker] crm_mon SNMP support

2012-11-29 Thread Florian Crouzat

Le 29/11/2012 01:27, Andrew Beekhof a écrit :

On Wed, Nov 28, 2012 at 2:34 AM, Florian Crouzat
 wrote:

Hi,

I have in my current production configuration the following resource:

primitive SNMPMonitor ocf:heartbeat:ClusterMon \
 params pidfile="/var/run/crm_mon.pid" extra_options="-S 192.168.2.3
-C public" \
 op monitor on-fail="restart" interval="10s"

It was working a couple of months ago, and I haven't touched it since.
Apparently, I missed a couple of changelogs :/

I was investigating why I wasn't receiving SNMP traps anymore during the
last couples of migration/changes in the cluster state.
I found out that my version of crm_mon is compiled without SNMP (or email)
supports.

$ sudo crm_mon -$ && cat /etc/redhat-release
Pacemaker 1.1.6-3.el6
Written by Andrew Beekhof
CentOS release 6.2 (Final)

I found out the following changelogs:

 * Mon Sep 26 2011 Andrew Beekhof  1.1.6-2
  - Do not build in support for heartbeat, snmp, esmtp by default
  - Create a package for cluster unaware libraries to minimize our
footprint on non-cluster nodes
  - Better package descriptions

What are my options, knowing that I'm in a PCI-DSS environment that forbids
any compiler in production, and that I'd rather not maintain an
SNMP-enabled version of the package myself?


I'm not familiar with the term PCI-DSS... does that allow you to
rebuild src.rpm packages?
If so, just run:
rpmbuild --with snmp --rebuild pacemaker-.src.rpm


Yes I can (tm).
FYI, PCI-DSS defines the security requirements that you must follow 
whenever you handle credit card data (e.g. you work in the credit card 
industry). Amongst many other things, it forbids compilers in production.


Although, I could recompile in my lab, either from scratch with gcc/make 
or as you suggested. But I have so many things to keep 
up-to-date/running that I'm not sure I'll manage to keep pacemaker-cli 
up to date, and PCI-DSS also requires that you update every package 
within a month of each update/erratum.


I guess there will never be a pacemaker-cli-snmp package, so I don't 
have any options anymore except hiring someone to start packaging stuff =)


Thanks for your suggestions though.

--
Cheers,
Florian Crouzat



[Pacemaker] crm_mon SNMP support

2012-11-27 Thread Florian Crouzat

Hi,

I have in my current production configuration the following resource:

primitive SNMPMonitor ocf:heartbeat:ClusterMon \
params pidfile="/var/run/crm_mon.pid" extra_options="-S 
192.168.2.3 -C public" \

op monitor on-fail="restart" interval="10s"

It was working a couple of months ago, and I haven't touched it since.
Apparently, I missed a couple of changelogs :/

I was investigating why I wasn't receiving SNMP traps anymore during the 
last couples of migration/changes in the cluster state.
I found out that my version of crm_mon is compiled without SNMP (or 
email) supports.


$ sudo crm_mon -$ && cat /etc/redhat-release
Pacemaker 1.1.6-3.el6
Written by Andrew Beekhof
CentOS release 6.2 (Final)

I found out the following changelogs:

* Mon Sep 26 2011 Andrew Beekhof  1.1.6-2
 - Do not build in support for heartbeat, snmp, esmtp by default
 - Create a package for cluster unaware libraries to minimize our
   footprint on non-cluster nodes
 - Better package descriptions

What are my options, knowing that I'm in a PCI-DSS environment that 
forbids any compiler in production, and that I'd rather not maintain 
an SNMP-enabled version of the package myself?


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Recommendations in reducing failover response time

2012-11-08 Thread Florian Crouzat

Le 05/11/2012 17:05, Arturo Borrero Gonzalez a écrit :

Wich version of pacemaker/RAs are you using?

thanks,

Regards.



$ sudo rpm -qa | grep -e pacemaker -e corosync -e resource-agent
corosynclib-1.4.1-4.el6_2.3.x86_64
pacemaker-libs-1.1.6-3.el6.x86_64
pacemaker-cli-1.1.6-3.el6.x86_64
resource-agents-3.9.2-7.el6.x86_64
pacemaker-1.1.6-3.el6.x86_64
pacemaker-cluster-libs-1.1.6-3.el6.x86_64
corosync-1.4.1-4.el6_2.3.x86_64

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

--
Cheers,
Florian Crouzat



Re: [Pacemaker] Recommendations in reducing failover response time

2012-11-05 Thread Florian Crouzat
;
primitive p_xorp lsb:/etc/init.d/xorp \
 op monitor interval="5"
group g_ipv p_ipv_vlan27 p_ipv p_ipv_vlan7 p_ipv_vlan6 p_ipv_vlan54
p_ipv_vlan51 p_ipv_vlan31 p_ipv_vlan34 p_ipv_nat p_ipv_vlan23
p_ipv_vlan10 p_ipv_vlan28 p_ipv_openvpn p_ipv_v6 p_ipv_v6_vlan51
p_ipv_v6_vlan31 p_ipv_v6_vlan6 p_ipv_v6_vlan7 p_ipv_v6_vlan27
p_ipv_v6_vlan54 p_ipv_v6_vlan34
colocation dhcp-ipv inf: p_dhcp g_ipv
colocation firewall-ipv inf: p_firewall g_ipv
colocation openvpn-ipv inf: p_openvpn p_ipv
colocation radvd-ipv inf: p_radvd g_ipv
colocation xorp-ipv inf: p_xorp g_ipv
property $id="cib-bootstrap-options" \
 dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
 cluster-infrastructure="openais" \
 expected-quorum-votes="2" \
 stonith-enabled="false" \
 no-quorum-policy="ignore" \
 last-lrm-refresh="1352111523"
rsc_defaults $id="rsc-options" \
 resource-stickiness="100"





I think these colocation constraints are missing ordering constraints: 
for security reasons you should make sure that the VIPs are created 
first, then the firewall is started, and finally everything else; 
otherwise openvpn or radvd could start /before/ the firewall.


Colocation doesn't imply ordering (I had a hard time understanding that 
one...).


In my configuration, I aggregated all that stuff into one colocation set 
+ one ordering set. Maybe you could see if it fits best and/or helps you 
with the duration of the failover.


Eg:

group IPHA eth0.10HA eth0.11HA eth0.12HA eth0.13HA eth0.14HA eth0.15HA 
eth0.16HA eth0.18HA eth0.19HA eth0.20HA eth0.21HA eth0.22HA eth0.23-1HA 
eth0.23-254HA eth0.24HA eth0.26HA eth0.2HA eth0.3-10HA eth0.3-15HA 
eth0.3-30HA eth0.4HA eth0.5-10HA eth0.5-215HA eth0.5-230HA eth0.7HA 
eth0.8HA eth0HA eth1-2HA eth1-3HA eth1-pubHA eth1.2HA eth1.97HA 
eth1.98HA eth1.99HA eth0.27HA eth0.28HA eth0.29HA eth0.30HA \

meta target-role="Started" \
meta globally-unique="false" target-role="Started"

[...]

colocation c_foo inf: ( bind ldirectord ldirectordBDD 
ldirectordMasterSlave openvpn stunnel ) IPHA firewall
order o_foo inf: IPHA firewall ( bind ldirectord ldirectordBDD 
ldirectordMasterSlave openvpn stunnel )


Well, overall, sorry I didn't really help you; I just wanted to 
highlight some configuration tweaks, as I run almost the same cluster.


Cheers.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] lsb: could not parse meta-data

2012-10-22 Thread Florian Crouzat

Le 22/10/2012 14:23, vishal kumar a écrit :

Hi



Please suggest where I am going wrong.
Thanks for the help.



See: crm ra help meta
Then try something like: crm ra meta sshd lsb # parameter order matters

Anyway, you won't learn anything from the meta-data of an LSB init 
script, because it's just a script (not cluster-oriented, not a real 
resource agent): it's not multistate, nothing like that, only 
start/stop/monitor and the default mandatory settings.



--
Cheers,
Florian Crouzat



Re: [Pacemaker] resource can not be started

2012-10-18 Thread Florian Crouzat

Le 18/10/2012 03:13, Mia Lueng a écrit :

I just set "one" order rule:

order rg_apache_order : res_ip_apache res_apache

You mean the rule
group rg_apache2 res_apache res_ip_apache \
  meta target-role="Started"

also indicates an order rule ?


Yep, and a colocation, as I said.
It's all in the doc.

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Pacemaker_Explained/index.html#group-resources


--
Cheers,
Florian Crouzat



Re: [Pacemaker] resource can not be started

2012-10-17 Thread Florian Crouzat

Le 17/10/2012 11:56, Mia Lueng a écrit :


group rg_apache2 res_apache res_ip_apache \
meta target-role="Started"
order rg_apache_order : res_ip_apache res_apache


That doesn't look correct. A group is already a shortcut for an ordered 
colocation. Either use colocation + ordering, or a group, not "almost 
both".

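For illustration, a group is roughly shorthand for an explicit order plus colocation over its members, in member order (the constraint names below are made up):

```
# group rg_apache2 res_apache res_ip_apache  is roughly equivalent to:
order ord_apache inf: res_apache res_ip_apache
colocation col_apache inf: res_ip_apache res_apache
```

Note that the order implied by the group (res_apache first) is the opposite of the separate rg_apache_order rule, which is likely the source of the trouble.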

--
Cheers,
Florian Crouzat



Re: [Pacemaker] Resource agent IPaddr2 failed to start

2012-10-09 Thread Florian Crouzat

Le 09/10/2012 11:17, Soni Maula Harriz a écrit :



This is the configuration :
crm configure show
node cluster1
node cluster2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
 params ip="xxx.xxx.xxx.289" cidr_netmask="32" \
 op monitor interval="30s"
property $id="cib-bootstrap-options" \
 dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
 cluster-infrastructure="openais" \
 expected-quorum-votes="2" \
 stonith-enabled="false"




Off-topic advice: for a two-node cluster such as yours, you *must* set 
the property no-quorum-policy="ignore"


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Resource agent IPaddr2 failed to start

2012-10-09 Thread Florian Crouzat

Le 09/10/2012 10:39, Soni Maula Harriz a écrit :

Dear all,

I'm a newbie in clustering. I have been following the 'Cluster from
scratch' tutorial.
I use Centos 6.3 and install pacemaker and corosync from : yum install
pacemaker corosync

This is the version i got
Pacemaker 1.1.7-6.el6
Corosync Cluster Engine, version '1.4.1'

This time i have been this far : adding IPaddr2 resource
(http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_adding_a_resource.html)
everything goes well before adding IPaddr2 resource.
when i run 'crm status', it print out


Last updated: Tue Oct  9 14:58:30 2012
Last change: Tue Oct  9 13:53:41 2012 via cibadmin on cluster1
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ cluster1 cluster2 ]


Failed actions:
 ClusterIP_start_0 (node=cluster1, call=3, rc=6, status=complete):
not configured


This is the error i got from /var/log/message
Oct  8 17:07:16 cluster2 IPaddr2(ClusterIP)[15969]: ERROR:
[/usr/lib64/heartbeat/findif -C] failed
Oct  8 17:07:16 cluster2 crmd[15937]:  warning: status_from_rc: Action 4
(ClusterIP_start_0) on cluster2 failed (target: 0 vs. rc: 6): Error
Oct  8 17:07:16 cluster2 pengine[15936]:error: unpack_rsc_op:
Preventing ClusterIP from re-starting anywhere in the cluster :
operation start failed 'not configured' (rc=6)


I have been searching through the google, but can't find the right
solution for my problem.
I have stopped the firewall and disabled the SElinux.
Any help would be appreciated.


Just a wild guess: you haven't created the associated network interface.

Eg: for this resource to work:

primitive eth0.21HA ocf:heartbeat:IPaddr2 \
params ip="10.0.8.1" cidr_netmask="29" nic="eth0.21" \
op monitor on-fail="restart" interval="10s"

You *must* have brought up ("ifup-ed") the eth0.21 interface beforehand; 
otherwise Pacemaker cannot apply the VIP because there is no interface. 
You can of course create the eth0.21 interface with a fake IP such as 
127.10.10.10/32.


Quote of "sudo crm ra meta IPaddr2":

nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought online.

If left empty, the script will try and determine this from the routing 
table.


Do NOT specify an alias interface in the form eth0:1 or anything here; 
rather, specify the base interface only.


Prerequisite:

*There must be at least one static IP address, which is not managed by 
the cluster, assigned to the network interface.*


If you can not assign any static IP address on the interface, modify 
this kernel parameter: sysctl -w net.ipv4.conf.all.promote_secondaries=1 
(or per device)





--
Cheers,
Florian Crouzat



Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2012-08-24 Thread Florian Crouzat

Le 24/08/2012 01:36, Andrew Martin a écrit :

The dampen parameter tells the cluster to wait before making any decision, so 
that if the IP comes back online within the dampen period then no action is 
taken. Is this correct?


This is also my understanding of this parameter.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?

2012-08-23 Thread Florian Crouzat

Le 22/08/2012 18:23, Andrew Martin a écrit :

Hello,


I have a 3 node Pacemaker + Heartbeat cluster (two real nodes and 1 quorum node 
that cannot run resources) running on Ubuntu 12.04 Server amd64. This cluster 
has a DRBD resource that it mounts and then runs a KVM virtual machine from. I 
have configured the cluster to use ocf:pacemaker:ping with two other devices on 
the network (192.168.0.128, 192.168.0.129), and set constraints to move the 
resources to the most well-connected node (whichever node can see more of these 
two devices):

primitive p_ping ocf:pacemaker:ping \
params name="p_ping" host_list="192.168.0.128 192.168.0.129" multiplier="1000" 
attempts="8" debug="true" \
op start interval="0" timeout="60" \
op monitor interval="10s" timeout="60"
...

clone cl_ping p_ping \
meta interleave="true"

...
location loc_run_on_most_connected g_vm \
rule $id="loc_run_on_most_connected-rule" p_ping: defined p_ping


Today, 192.168.0.128's network cable was unplugged for a few seconds and then 
plugged back in. During this time, pacemaker recognized that it could not ping 
192.168.0.128 and restarted all of the resources, but left them on the same 
node. My understanding was that since neither node could ping 192.168.0.128 
during this period, pacemaker would do nothing with the resources (leave them 
running). It would only migrate or restart the resources if for example node2 
could ping 192.168.0.128 but node1 could not (move the resources to where 
things are better-connected). Is this understanding incorrect? If so, is there 
a way I can change my configuration so that it will only restart/migrate 
resources if one node is found to be better connected?

Can you tell me why these resources were restarted? I have attached the syslog 
as well as my full CIB configuration.

Thanks,

Andrew Martin



This is an interesting question and I'm also interested in answers.

I had the same observations, and there is also the case where the 
monitor() operations aren't synced across all nodes, so: "Node1 issues a 
monitor() on the ping resource and finds the ping node dead; node2 
hasn't pinged yet, so node1 moves things to node2, but node2 now issues 
a monitor() and also finds the ping node dead."


The only solution I found was to adjust the dampen parameter to at least 
2 * the monitor interval, so that I can be *sure* that all nodes have 
issued a monitor() and have all decreased their scores, so that when a 
decision occurs, nothing moves.


It's been a long time since I tested this; my cluster is very, very 
stable. I guess I should retry, to validate that it's still a working trick.




dampen (integer, [5s]): Dampening interval
The time to wait (dampening) for further changes to occur

Eg:

primitive ping-nq-sw-swsec ocf:pacemaker:ping \
params host_list="192.168.10.1 192.168.2.11 192.168.2.12" 
dampen="35s" attempts="2" timeout="2" multiplier="100" \

op monitor interval="15s"




--
Cheers,
Florian Crouzat



Re: [Pacemaker] ping - resource failover

2012-08-13 Thread Florian Crouzat

Le 13/08/2012 08:09, Nicolai Langfeldt a écrit :

On 2012-08-10 16:30, Josh wrote:

location location_www1-vip www1-vip \
 rule $id="location_www1-vip-rule" pingd: defined pingd


It will probably work better if you:
* Write the rule correctly ;-)


"pingd: defined pingd" is perfectly valid syntax, as explained in this post:
http://www.woodwose.net/thatremindsme/2011/04/the-pacemaker-ping-resource-agent/

I happen to use it just fine.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] group resource - altering default order

2012-06-14 Thread Florian Crouzat

Le 14/06/2012 14:55, Nicolaas Stuyt a écrit :

Hello,
Is there a way to affect a group's ordering behavior? The default appears
to be left to right which I can understand. However if I wish to create
a group of "like primitives" and not primitives that are dependent on
each other - for example a list of (ocf::heartbeat:Filesystem)
primitives - then I care less about the order. I was hoping to specify a
meta characteristic like "lazy" I guess; something like meta
order="lazy" or "not-applicable". This ordering behavior seems to exist
in colocation as well.
What I wish to achieve is to take a file system resource down - say for
maintenance and then allow it to come back when maintenance is completed
without affecting the other Filesystem primitives down the list.
To provide a visual illustration of this I created a cloned group of
dummies using (ocf::pacemaker:Dummy) and I have stopped just the last
dummy primitive:
Clone Set: cl_dummies [grp_dummies]
Resource Group: grp_dummies:0
dummy1:0 (ocf::pacemaker:Dummy): Started node1
dummy2:0 (ocf::pacemaker:Dummy): Started node1
dummy3:0 (ocf::pacemaker:Dummy): Started node1
dummy4:0 (ocf::pacemaker:Dummy): Started node1
dummy5:0 (ocf::pacemaker:Dummy): Stopped
Resource Group: grp_dummies:1
dummy1:1 (ocf::pacemaker:Dummy): Started node2
dummy2:1 (ocf::pacemaker:Dummy): Started node2
dummy3:1 (ocf::pacemaker:Dummy): Started node2
dummy4:1 (ocf::pacemaker:Dummy): Started node2
dummy5:1 (ocf::pacemaker:Dummy): Stopped
When all are running crm_mon presents as:
Clone Set: cl_dummies [grp_dummies]
Started: [ node1 node2]
What I want to achieve is to take a middle dummy primitive out but leave
the rest running. When I do that - for example on dummy3; dummies 4 and
5 also get taken out such that I am left with this:
Clone Set: cl_dummies [grp_dummies]
Resource Group: grp_dummies:0
dummy1:0 (ocf::pacemaker:Dummy): Started node1
dummy2:0 (ocf::pacemaker:Dummy): Started node1
dummy3:0 (ocf::pacemaker:Dummy): Stopped
dummy4:0 (ocf::pacemaker:Dummy): Stopped
dummy5:0 (ocf::pacemaker:Dummy): Stopped
Resource Group: grp_dummies:1
dummy1:1 (ocf::pacemaker:Dummy): Started node2
dummy2:1 (ocf::pacemaker:Dummy): Started node2
dummy3:1 (ocf::pacemaker:Dummy): Stopped
dummy4:1 (ocf::pacemaker:Dummy): Stopped
dummy5:1 (ocf::pacemaker:Dummy): Stopped
What I would like is this:
Clone Set: cl_dummies [grp_dummies]
Resource Group: grp_dummies:0
dummy1:0 (ocf::pacemaker:Dummy): Started node1
dummy2:0 (ocf::pacemaker:Dummy): Started node1
dummy3:0 (ocf::pacemaker:Dummy): Stopped
dummy4:0 (ocf::pacemaker:Dummy): Started node1
dummy5:0 (ocf::pacemaker:Dummy): Started node1
Resource Group: grp_dummies:1
dummy1:1 (ocf::pacemaker:Dummy): Started node2
dummy2:1 (ocf::pacemaker:Dummy): Started node2
dummy3:1 (ocf::pacemaker:Dummy): Stopped
dummy4:1 (ocf::pacemaker:Dummy): Started node2
dummy5:1 (ocf::pacemaker:Dummy): Started node2
A work around I have considered is to edit the group and resort the list
so that the Filesystem primitive I'm interested in working on is named
last but I wonder if I'll remember ;-) to do that the next time I need
to perform maintenance.
Or is there a better way to do this that I have not been introduced to yet?
Regards,
Nick


It's already been discussed a couple times.
See http://oss.clusterlabs.org/pipermail/pacemaker/2011-March/009435.html
and follow the links/answers.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Florian Crouzat

Le 08/06/2012 13:01, Juan M. Sierra a écrit :

Problem with state: UNCLEAN (OFFLINE)

Hello,

I'm trying to get up a directord service with pacemaker.

But, I found a problem with the unclean (offline) state. The initial
state of my cluster was this:

/Online: [ node2 node1 ]

node1-STONITH (stonith:external/ipmi): Started node2
node2-STONITH (stonith:external/ipmi): Started node1
Clone Set: Connected
Started: [ node2 node1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 node1 ]
ftp-vip (ocf::heartbeat:IPaddr): Started node1
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node1: pingd=2000
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100
/

and then I removed the power connection of node1; the state was the
following:

/Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
Online: [ node2 ]

node1-STONITH (stonith:external/ipmi): Started node2 FAILED
Clone Set: Connected
Started: [ node2 ]
Stopped: [ ping:1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 ]
Stopped: [ ldirectord:1 ]
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100
node1-STONITH: migration-threshold=100 fail-count=100

Failed actions:
node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete):
invalid parameter
node1-STONITH_monitor_6 (node=node2, call=11, rc=14,
status=complete): status: unknown
node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete):
unknown error
/

I was hoping that node2 would take over the ftp-vip resource, but it
didn't happen that way: node1 stayed in an unclean state and node2 didn't
take over its resources. When I restored the power connection to node1
and it recovered, node2 then took over the ftp-vip resource.

I've seen some similar conversations here. Please, could you show me
some idea about this subject or some thread where this is discussed?

Thanks a lot!

Regards,



It has been discussed for resource failover but I guess it's the same: 
http://oss.clusterlabs.org/pipermail/pacemaker/2012-May/014260.html


The motto here (I discovered it a couple of days ago) is "better a hung 
cluster than a corrupted one, especially with shared 
filesystems/resources".
So, node1 failed but node2 wasn't able to confirm its death because 
stonith apparently failed; the design choice is then for the cluster to 
hang while waiting for a way to learn the real state of node1 (at its 
reboot, in this case).



--
Cheers,
Florian Crouzat



Re: [Pacemaker] ldirectord ocf errors

2012-06-05 Thread Florian Crouzat

Le 04/06/2012 19:52, Jake Smith a écrit :



- Original Message -

From: "Dennis Jacobfeuerborn"
To: pacemaker@oss.clusterlabs.org
Sent: Monday, June 4, 2012 12:55:48 PM
Subject: [Pacemaker] ldirectord ocf errors

Hi,
I'm trying to get started with ldirectord on a Centos 6.2 system but
things
don't seem to work well at the moment.

I get the following when I try to set up a ocf:heartbeat:ldirectord
resource:



Where did you get that RA?
It's not part of the resource-agents-3.9.2-7.el6.x86_64 package for 
CentOS 6.2, and my system seems to be up to date with the default repos 
and EPEL. I currently have 3 ldirectord resources as LSB scripts, so I'm 
interested.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] R: Pacemaker installation - Failed dependencies

2012-05-22 Thread Florian Crouzat

Le 22/05/2012 12:03, Chiesa Stefano a écrit :

Hello Rene, thanks for your answer.
Yours is a "general consideration" anyway, right?

In the Pacemaker installation log below "heartbeat" is not mentioned.
So the failed dependencies are related to Pacemaker.

Even if in the future I'll use corosync I'll face the same errors, or not?

Sorry in case I didn't understand your answer..

Stefano.


Since EL 6.0, linux-ha has been part of the base repository.
You should remove your Clusterlabs repo file, I guess.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] ping resource

2012-05-11 Thread Florian Crouzat

On 11/05/2012 14:56, fatcha...@gmx.de wrote:


When I deactivate the interface card that is connected to the default gateway on 
one node, nothing happens.


Actually, something does happen: the ping score changes.
You can check its value with 'cibadmin -Q | grep pingd' or 
'crm_mon -Arf1 | fgrep ping'.



primitive ping-gateway ocf:pacemaker:ping \
 params host_list="192.168.xxx.1" multiplier="100"
clone pingclone ping-gateway \
 meta interleave="true"



Any suggestions are welcome


But since you haven't linked your resources to the ping score (e.g. with 
a location constraint), nothing moves.


I recommend you read these links:
* http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
* http://www.woodwose.net/thatremindsme/2011/04/the-pacemaker-ping-resource-agent/
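
The missing piece is usually a location constraint tying the resources to 
the pingd attribute maintained by the ping clone. A minimal sketch in crm 
syntax ("my-resource" is a placeholder name; "pingd" is the default 
attribute name used by ocf:pacemaker:ping):

```
# Hypothetical constraint: forbid my-resource on any node where the
# pingd attribute is undefined or 0, i.e. the gateway is unreachable.
location my-resource-on-connected-node my-resource \
    rule -inf: not_defined pingd or pingd lte 0
```

With multiplier="100" and a single host as above, pingd is 100 on a node 
that can reach the gateway and 0 otherwise, so the resource moves when 
connectivity is lost.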



--
Cheers,
Florian Crouzat



Re: [Pacemaker] How to stop a resource?

2012-03-15 Thread Florian Crouzat

On 15/03/2012 12:50, Tim Ward wrote:


So, does anyone have any ideas as to what is going on here, and/or how to 
actually stop and then delete something? - thanks!


$ crm configure
property maintenance-mode=true
commit
quit
$ /etc/init.d/myResource stop
$ crm resource
cleanup myResource # so that it sees it's actually stopped
cd
configure
edit # now you can delete myResource
commit
maintenance-mode=false
commit
quit

The maintenance-mode step might not be necessary, depending on your 
groups, colocations and orders.


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Surprisingly fast start of resources on cluster failover.

2012-03-13 Thread Florian Crouzat

On 07/03/2012 22:08, Lars Ellenberg wrote:


Did you also time it as
   time /etc/init.d/firewall > out.txt 2> err.txt


Yes, it's the same: 65 seconds, versus 24 from Pacemaker.
It only prints ~250 lines.


Other than above suggestion,
did you verify that it ends up doing the same thing
when started from pacemaker,
compared to when started by you from commandline?
Did you compare the results?



Absolutely the same; this is my core firewall, and it certainly works 
fine when started from pacemaker.


It's probably just a side effect of the server reboot, and/or the fact 
that it doesn't yet handle named, ldirectord, stunnel and such, because 
in the pacemaker resource group, "firewall" goes first.



--
Cheers,
Florian Crouzat



[Pacemaker] Surprisingly fast start of resources on cluster failover.

2012-03-06 Thread Florian Crouzat

Hi,

On a two-node active/passive cluster, I placed a location constraint of 
50 for #uname node1. As soon as it was applied, things moved from node2 
to node1: right.

I have a lsb init script defined as a resource:

$ crm configure show firewall
primitive firewall lsb:firewall \
    op monitor on-fail="restart" interval="10s" \
    op start interval="0" timeout="3min" \
    op stop interval="0" timeout="1min" \
    meta target-role="Started"

This LSB script takes a long time to start, at least 55 seconds when 
fired from my shell over ssh.

It logs a couple of things to std{out,err}.
I have Florian's rsyslog config: 
https://github.com/fghaas/pacemaker/blob/syslog/extra/rsyslog/pacemaker.conf.in


So, while node1 was taking over, I noticed in 
/var/log/pacemaker/lrmd.log that it only took 24 seconds to start that 
resource.


2012-03-06T07:20:11.844573+01:00 node1 lrmd: [9322]: info: 
rsc:firewall:129: start
2012-03-06T07:20:11.864758+01:00 node1 lrmd: [9322]: info: RA output: 
(firewall:start:stdout) Starting. Becoming active

[...]
2012-03-06T07:20:35.133591+01:00 node1 lrmd: [9322]: info: RA output: 
(firewall:start:stderr)  #033[33;01m*#033[0m New rules are now applied.


My question: how come pacemaker starts a resource twice as fast as I 
do from the CLI?



--
Florian Crouzat



Re: [Pacemaker] colocation quandery

2012-02-22 Thread Florian Crouzat

On 22/02/2012 17:06, Jean-Francois Malouin wrote:


can't wrap my head around it, so I created a test
environment to play around with the different scenarios. But...
is it possible to use crm to create the colocation as in Example 6.16 in

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-collocation.html

I've only been able to stuff the relevant xml bits into the cib using cibadmin.



Figure 6.4 is: colocation foo inf: A B ( C D E ) F G
The same syntax works for ordered sets.

--
Cheers,
Florian Crouzat



Re: [Pacemaker] iptables cluster

2012-02-13 Thread Florian Crouzat

On 13/02/2012 10:21, Karlis Kisis wrote:

Question #2:
The whole clustering thingy works by stopping the service on one node
and starting it on the other. In my case, I would not want iptables to
be stopped, but instead restarted with a "passive" config, like blocking
all traffic from outside (instead of dropping the firewall entirely). How
would I go about it? Custom scripts?


Yes.
In fact, I have such a setup: I created an LSB-compliant init script for 
iptables (/etc/init.d/firewall) and added an lsb:firewall resource.

 /etc/init.d/firewall start() runs /usr/local/firewall/firewall.sh
 /etc/init.d/firewall stop() runs /usr/local/firewall/firewall-passive.sh

As for the status() function, you'd have to decide on a way to know 
which state you are in.
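
A minimal sketch of such a wrapper (the flag-file state tracking is an 
assumption of mine; your firewall.sh / firewall-passive.sh scripts do the 
real work):

```sh
#!/bin/sh
# Hypothetical /etc/init.d/firewall: "stop" loads a passive ruleset
# instead of flushing the firewall. A flag file records which ruleset
# is currently loaded, for the status() action.
STATE=/var/run/firewall.active

case "$1" in
  start)
    /usr/local/firewall/firewall.sh && touch "$STATE"
    ;;
  stop)
    /usr/local/firewall/firewall-passive.sh && rm -f "$STATE"
    ;;
  status)
    # LSB status: exit 0 = running (active rules), 3 = stopped (passive)
    [ -f "$STATE" ] && exit 0 || exit 3
    ;;
  restart|force-reload)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart|force-reload}" >&2
    exit 2
    ;;
esac
```

The exit codes matter: Pacemaker's lsb class expects status to return 0 
when started and 3 when stopped, per the LSB spec.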


--
Cheers,
Florian Crouzat



Re: [Pacemaker] Cluster Shutdown Fails - causes nodes to hang

2012-01-30 Thread Florian Crouzat

On 30/01/2012 01:42, Gruen, Wolfgang wrote:


 > Issuing /etc/init.d/corosync stop or /etc/init.d/pacemaker stop causes


On a running cluster, you need to stop pacemaker first; it's mandatory 
AFAIK.



--
Cheers,
Florian Crouzat



Re: [Pacemaker] How to flush the arp cache of a router?

2012-01-26 Thread Florian Crouzat

On 26/01/2012 01:05, ge...@riseup.net wrote:

I tried to put it into a file
and made it executable, and used lsb: to call it (which didn't work).
Then I googled for hours to find out how to call scripts from within crm,
but had no success...


You should be able to run any script you want using an LSB RA.
Just make sure your init script is LSB compliant: 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html 
and this should work.
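
The interesting part of such a script is usually just a gratuitous ARP 
for the cluster IP. A hedged sketch, using arping from iputils (eth0 and 
192.0.2.10 are placeholders for the real interface and address):

```sh
#!/bin/sh
# Hypothetical LSB init script whose start() re-announces the VIP to
# the router's ARP cache. eth0 / 192.0.2.10 are placeholders.
STATE=/var/run/arp-announce.started

case "$1" in
  start)
    # -U: send unsolicited (gratuitous) ARP; -c 3: three announcements
    arping -U -I eth0 -c 3 192.0.2.10 && touch "$STATE"
    ;;
  stop)
    rm -f "$STATE"   # nothing to undo, just clear the state flag
    ;;
  status)
    [ -f "$STATE" ] && exit 0 || exit 3
    ;;
  *)
    echo "Usage: $0 {start|stop|status}" >&2
    exit 2
    ;;
esac
```

Note that ocf:heartbeat:IPaddr2 already sends gratuitous ARPs when it 
brings an address up, so a script like this should only be needed when 
that isn't enough for your router.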


Cheers,
Florian Crouzat



Re: [Pacemaker] Failed actions: ...not installed

2012-01-24 Thread Florian Crouzat

On 24/01/2012 13:21, Stallmann, Andreas wrote:


Any ideas on how to get this running?


If you want to debug what pacemaker does with your RA, I'd suggest 
inserting 'set -x' at the top of /usr/lib/ocf/resource.d/heartbeat/mysql 
and reading the (now flooded) logs.
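
Alternatively, the agent can be run by hand outside Pacemaker with the 
OCF environment set up. A sketch (the OCF_RESKEY_* names below are the 
usual mysql RA parameters, but verify them against your agent's 
meta-data output):

```sh
# Run the mysql RA's monitor action directly, with shell tracing.
# OCF_RESKEY_* variables stand in for the parameters from the CIB.
export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_binary=/usr/bin/mysqld_safe
export OCF_RESKEY_config=/etc/my.cnf
bash -x /usr/lib/ocf/resource.d/heartbeat/mysql monitor
echo "exit code: $?"   # OCF codes: 0 = running, 7 = not running, 5 = not installed
```

The ocf-tester utility shipped with resource-agents automates this kind 
of check for all of an agent's actions.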


Cheers,
Florian.
