Re: [Pacemaker] PM 1.1.5- make errors -SOLVED!

2011-11-28 Thread Nikita Michalko
Hi Andrew,

Problem solved:
for some unclear reason the permissions on the first server were only 400 (root only) for:

/usr/lib64/heartbeat

 - so a simple:
chmod 755 /usr/lib64/heartbeat

did the trick ...
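
(Exit code 126 from the shell means "found but not executable", which matches the attrd/crmd respawn failures quoted further down; a quick check along these lines confirms the permissions - same paths as in the log:)

ls -ld /usr/lib64/heartbeat
ls -l /usr/lib64/heartbeat/attrd /usr/lib64/heartbeat/crmd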

Thank you very much for your time!

Nikita Michalko  


On Monday, 28 November 2011 00:33:37, Andrew Beekhof wrote:
 On Fri, Nov 25, 2011 at 3:07 AM, Nikita Michalko
 
 michalko.sys...@a-i-p.com wrote:
  Hi Andrew,
 
  it didn't help: I #ifdef'd out the reference to
  `terminate_ais_connection' in the file
  Pacemaker-1-1-c86cb93c5a57/crmd/control.c, successfully compiled and
  installed PM, but after starting heartbeat I am faced with the following
  errors: heartbeat: [14007]: WARN: Managed /usr/lib64/heartbeat/attrd
  process 14270 exited with return code 126
  ERROR: Respawning client /usr/lib64/heartbeat/attrd: and so on for the
  other daemons/programs too: crmd, ccm, cib.
 
 Ok, so it did help - it let you compile it.  That it doesn't run is a
 different issue.
 
 What distro/arch is this running on?
 
  - see ha-log attached
 
  Interesting:
  on the second server everything runs like a charm ...
 
  OS: SLES11/SP1
 
  ha.cf:
  logfile /var/log/ha-log
  debugfile /var/log/ha-debug
  debug 0
  cluster HLUG708
  logfacility local1
  udpport 708
  ucast eth0 hlugl9
  ucast eth1 hlugl9
  ucast eth2 hlugl9
  coredumps true
  auto_failback on
  keepalive 5
  warntime 10
  deadtime 15
  initdead 120
  node hlugl8
  node hlugl9
  crm respawn
  autojoin other
 
  Any ideas ?
 
  TIA!
 
  Nikita Michalko
 
  Am Freitag, 21. Oktober 2011 03:12:31 schrieb Andrew Beekhof (On Friday, 21 October 2011 03:12:31, Andrew Beekhof wrote):
  Looks like do_ha_control() is calling corosync-specific functions when
  only support for heartbeat is being built.
  They'd just need to be #ifdef'd out.
 
 
  On Thu, Oct 20, 2011 at 9:54 PM, Nikita Michalko
 
  michalko.sys...@a-i-p.com wrote:
   Hi all,
  
   here is the next problem I need help with ;-(
   PM Version: 1.1.5 (Pacemaker-1-1-c86cb93c5a57.tar.bz2)
   - configured with:
    configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc
    --with-heartbeat --with-stonith --with-pacemaker
    --with-daemon-user=$CLUSTER_USER --with-daemon-group=$CLUSTER_GROUP
    --enable-fatal-warnings=no --with-ras-set=linux-ha
  
   After make I get the following error:
   ...
    gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../include -I../include -I../include
    -I../libltdl -I../libltdl -I/usr/include/glib-2.0
    -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -I/usr/include
    -I/usr/include/heartbeat -ggdb3 -O0 -fgnu89-inline -fstack-protector-all
    -Wall -Waggregate-return -Wbad-function-cast -Wcast-align
    -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2
    -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes
    -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing
    -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -MT te_callbacks.o
    -MD -MP -MF .deps/te_callbacks.Tpo -c -o te_callbacks.o te_callbacks.c
    mv -f .deps/te_callbacks.Tpo .deps/te_callbacks.Po
    /bin/sh ../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2
    -I/usr/include -I/usr/include/heartbeat -ggdb3 -O0 -fgnu89-inline
    -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast
    -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal
    -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline
    -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long
    -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings
    -o crmd main.o crmd.o corosync.o fsa.o control.o messages.o ccm.o
    callbacks.o election.o join_client.o join_dc.o subsystems.o cib.o
    pengine.o tengine.o lrm.o utils.o misc.o te_events.o te_actions.o
    te_utils.o te_callbacks.o -lhbclient -lccmclient -llrm
    ../lib/fencing/libstonithd.la ../lib/transition/libtransitioner.la
    ../lib/pengine/libpe_rules.la ../lib/cib/libcib.la
    ../lib/common/libcrmcluster.la ../lib/common/libcrmcommon.la -lplumb
    -lpils -lbz2 -lxslt -lxml2 -lc -lglib-2.0 -luuid -lrt -ldl -lglib-2.0 -lltdl
    libtool: link: gcc -std=gnu99 -g -O2 -I/usr/include
    -I/usr/include/heartbeat -ggdb3 -O0 -fgnu89-inline -fstack-protector-all
    -Wall -Waggregate-return -Wbad-function-cast -Wcast-align
    -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2
    -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes
    -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing
    -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -o .libs/crmd main.o
    crmd.o corosync.o fsa.o control.o messages.o ccm.o callbacks.o election.o
    join_client.o join_dc.o subsystems.o cib.o pengine.o tengine.o lrm.o
    utils.o misc.o te_events.o te_actions.o te_utils.o te_callbacks.o
    /usr/lib64/liblrm.so ../lib/fencing/.libs/libstonithd.so -L/usr/lib64
    -L/lib64 /usr/lib64/libstonith.so ../lib/transition/.libs/libtransitioner.so
  

Re: [Pacemaker] colocation issue with master-slave resources

2011-11-28 Thread Andreas Kurz
On 11/28/2011 04:51 AM, Patrick H. wrote:
 I'm trying to set up a colocation rule so that a couple of master-slave
 resources can't be master unless another resource is running on the same
 node, and am getting the exact opposite of what I want. The master-slave
 resources are getting promoted to master on the node which this other
 resource isn't running on.
 
 In the example below, 'stateful1:Master' and 'stateful2:Master' should
 be on the same node 'dummy' is on. It works just fine if I change the
 colocation around so that 'dummy' depends on the stateful resources
 being master, but I don't want that. I want dummy to be able to run no
 matter what, but the stateful resources not to be able to become master
 without dummy.
 
 
 # crm status
 
 Last updated: Mon Nov 28 03:47:04 2011
 Stack: cman
 Current DC: devlvs03 - partition with quorum
 Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
 2 Nodes configured, 2 expected votes
 6 Resources configured.
 
 
 Online: [ devlvs04 devlvs03 ]
 
  dummy  (ocf::pacemaker:Dummy):  Started devlvs03
  Master/Slave Set: stateful1-ms [stateful1]
  Masters: [ devlvs04 ]
  Slaves: [ devlvs03 ]
  Master/Slave Set: stateful2-ms [stateful2]
  Masters: [ devlvs04 ]
  Slaves: [ devlvs03 ]
 
 
 # crm configure show
 node devlvs03 \
 attributes standby=off
 node devlvs04 \
 attributes standby=off
 primitive dummy ocf:pacemaker:Dummy \
 meta target-role=Started
 primitive stateful1 ocf:pacemaker:Stateful
 primitive stateful2 ocf:pacemaker:Stateful
 ms stateful1-ms stateful1
 ms stateful2-ms stateful2
 colocation stateful1-colocation inf: stateful1-ms:Master dummy
 colocation stateful2-colocation inf: stateful2-ms:Master dummy

use dummy:Started ... the default is to use the same role as the left
resource, and Dummy will never be in the Master role ...
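
That is, with the same resources as above:

colocation stateful1-colocation inf: stateful1-ms:Master dummy:Started
colocation stateful2-colocation inf: stateful2-ms:Master dummy:Started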

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

 property $id=cib-bootstrap-options \
 dc-version=1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f \
 cluster-infrastructure=cman \
 expected-quorum-votes=2 \
 stonith-enabled=false \
 no-quorum-policy=ignore \
 last-lrm-refresh=1322450542
 
 


Re: [Pacemaker] Postgresql streaming replication failover - RA needed

2011-11-28 Thread Attila Megyeri
Hi Takatoshi,

I understand your point and I agree that the correct behavior is not to start 
replication when the data is inconsistent.
The only thing I do not really understand is how it could have happened:

1) nodes were in sync (psql1=PRI, psql2=STREAMING|SYNC)
2) I shut down node psql1 (by placing it into standby)
3) At this moment psql1's baseline became higher by 20? What could cause this? 
Probably the demote operation itself? There were no clients connected - and 
there was definitely no write operation to the db (except perhaps from the 
system side).

On the other hand - thank you very much for your contribution, the RA works 
very well and I really appreciate your work and help!

Bests,

Attil

-Original Message-
From: Takatoshi MATSUO [mailto:matsuo@gmail.com]
Sent: 28 November 2011, 2:10
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi Attila

The primary cannot send all WALs to the HotStandby even when the primary is 
shut down normally.
These logs show it.

 Nov 27 16:03:27 psql1 pgsql[12204]: INFO: My Timeline ID and
 Checkpoint : 14:2320 Nov 27 16:03:27 psql1 pgsql[12204]:
 INFO: psql2 master baseline : 14:2300

psql1's location was 2320 when it was demoted.
OTOH psql2's location was 2300 when it was promoted.

This means that psql1's data was newer than psql2's at that time.
The gap is 20.

As you said, you can start psql1's PostgreSQL manually, but PostgreSQL cannot 
detect this situation on its own.
If you start a HotStandby on psql1, data is replicated from after 2320.
That is an inconsistency.
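
(If you want to compare those numbers yourself, pg_controldata prints the
last checkpoint location and timeline on each node; the data directory below
is an assumption, adjust it to your PGDATA:)

pg_controldata /var/lib/pgsql/data | grep -E 'TimeLineID|checkpoint location'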

Thanks,
Takatoshi MATSUO


2011/11/28 Attila Megyeri amegy...@minerva-soft.com:
 Hi Takatoshi,

 I don't think it is an inconsistency problem - to me it looks like an RA bug.
 I think so because postgres starts properly outside pacemaker.

 When pacemaker starts postgres on node psql1 I see only:

 postgresql:0_start_0 (node=psql1, call=9, rc=1, status=complete):
 unknown error

 and the postgres log is empty - so I suppose that it does not even try to 
 start it.

 What I tested was:
 - I had a stable cluster, where psql1 was the master, psql2 was the
 slave
 - I put psql1 into standby mode. (node psql1 standby) to test
 failover
 - After a while psql2 became the PRI, which is very good
 - When I put psql1 back online, postgres wouldn't start anymore from 
 pacemaker (unknown error).


 I tried to start postgres manually from the shell and it worked fine; even the 
 monitor was able to see that it went into SYNC (obviously the master/slave 
 group was showing an improper state, as psql was started outside pacemaker).

 I don't think data inconsistency is the case, partly because there are no 
 clients connected, and partly because psql starts properly outside pacemaker.

 Here is what is relevant from the log:

 Nov 27 16:02:50 psql1 pgsql[11021]: DEBUG: PostgreSQL is running as a primary.
 Nov 27 16:02:51 psql1 pgsql[11021]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:02:53 psql1 pgsql[11142]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:02:53 psql1 pgsql[11142]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:02:55 psql1 pgsql[11272]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:02:55 psql1 pgsql[11272]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:02:57 psql1 pgsql[11368]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:02:57 psql1 pgsql[11368]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:03:00 psql1 pgsql[11463]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:03:00 psql1 pgsql[11463]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:03:00 psql1 pgsql[11556]:
 DEBUG: notify: pre for demote Nov 27 16:03:00 psql1 pgsql[11590]: INFO: 
 Stopping PostgreSQL on demote.
 Nov 27 16:03:02 psql1 pgsql[11590]: INFO: waiting for server to shut
 down. done server stopped Nov 27 16:03:02 psql1 pgsql[11590]: INFO: 
 Removing /var/lib/pgsql/PGSQL.lock.
 Nov 27 16:03:02 psql1 pgsql[11590]: INFO: PostgreSQL is down Nov 27
 16:03:02 psql1 pgsql[11590]: INFO: Changing pgsql-status on psql1 : PRI-STOP.
 Nov 27 16:03:02 psql1 pgsql[11590]: DEBUG: Created recovery.conf.
 host=10.12.1.28, user=postgres Nov 27 16:03:02 psql1 pgsql[11590]: INFO: 
 Setup all nodes as an async.
 Nov 27 16:03:02 psql1 pgsql[11732]: DEBUG: notify: post for demote Nov
 27 16:03:02 psql1 pgsql[11732]: DEBUG: post-demote called. Demote
 uname is psql1 Nov 27 16:03:02 psql1 pgsql[11732]: INFO: My Timeline
 ID and Checkpoint : 14:2320 Nov 27 16:03:02 psql1 pgsql[11732]: 
 WARNING: Can't get psql2 master baseline. Waiting...
 Nov 27 16:03:03 psql1 pgsql[11732]: INFO: psql2 master baseline :
 14:2300 Nov 27 16:03:03 psql1 pgsql[11732]: ERROR: My data is 
 inconsistent.
 Nov 27 16:03:03 psql1 pgsql[11867]: DEBUG: notify: pre for stop Nov 27
 16:03:03 psql1 pgsql[11969]: INFO: PostgreSQL is 

Re: [Pacemaker] Postgresql streaming replication failover - RA needed

2011-11-28 Thread Takatoshi MATSUO
Hi Attila

2011/11/28 Attila Megyeri amegy...@minerva-soft.com:
 Hi Takatoshi,

 I understand your point and I agree that the correct behavior is not to start 
 replication when the data is inconsistent.
 The only thing I do not really understand is how it could have happened:

 1) nodes were in sync (psql1=PRI, psql2=STREAMING|SYNC)
 2) I shut down node psql1 (by placing it into standby)
 3) At this moment psql1's baseline became higher by 20?  What could cause 
 this? Probably the demote operation itself? There were no clients connected - 
 and there was definitely no write operation to the db (except perhaps from 
 the system side).

Yes, PostgreSQL executes a CHECKPOINT when it is shut down cleanly on demote.
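
(For what it's worth, the usual way out of that state is to re-seed the old
primary from the new one before handing it back to Pacemaker. A rough sketch
only: the host comes from the recovery.conf line in the log above, the data
directory is an assumption, and this is generic streaming-replication
practice rather than something the RA does for you:)

# on psql1, as the postgres user, with the resource stopped/unmanaged
mv /var/lib/pgsql/data /var/lib/pgsql/data.old   # keep the old cluster just in case
pg_basebackup -h 10.12.1.28 -U postgres -D /var/lib/pgsql/data -P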

 On the other hand - thank you very much for your contribution, the RA works 
 very well and I really appreciate your work and help!

Not at all. Don't mention it.

Regards,
Takatoshi MATSUO


 Bests,

 Attil

 -Original Message-
 From: Takatoshi MATSUO [mailto:matsuo@gmail.com]
 Sent: 28 November 2011, 2:10
 To: The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

 Hi Attila

 The primary cannot send all WALs to the HotStandby even when the primary is 
 shut down normally.
 These logs show it.

 Nov 27 16:03:27 psql1 pgsql[12204]: INFO: My Timeline ID and
 Checkpoint : 14:2320 Nov 27 16:03:27 psql1 pgsql[12204]:
 INFO: psql2 master baseline : 14:2300

 psql1's location was 2320 when it was demoted.
 OTOH psql2's location was 2300 when it was promoted.

 This means that psql1's data was newer than psql2's at that time.
 The gap is 20.

 As you said, you can start psql1's PostgreSQL manually, but PostgreSQL cannot 
 detect this situation on its own.
 If you start a HotStandby on psql1, data is replicated from after 2320.
 That is an inconsistency.

 Thanks,
 Takatoshi MATSUO


 2011/11/28 Attila Megyeri amegy...@minerva-soft.com:
 Hi Takatoshi,

 I don't think it is an inconsistency problem - to me it looks like an RA bug.
 I think so because postgres starts properly outside pacemaker.

 When pacemaker starts postgres on node psql1 I see only:

 postgresql:0_start_0 (node=psql1, call=9, rc=1, status=complete):
 unknown error

 and the postgres log is empty - so I suppose that it does not even try to 
 start it.

 What I tested was:
 - I had a stable cluster, where psql1 was the master, psql2 was the
 slave
 - I put psql1 into standby mode. (node psql1 standby) to test
 failover
 - After a while psql2 became the PRI, which is very good
 - When I put psql1 back online, postgres wouldn't start anymore from 
 pacemaker (unknown error).


 I tried to start postgres manually from the shell and it worked fine; even the 
 monitor was able to see that it went into SYNC (obviously the master/slave 
 group was showing an improper state, as psql was started outside pacemaker).

 I don't think data inconsistency is the case, partly because there are no 
 clients connected, and partly because psql starts properly outside pacemaker.

 Here is what is relevant from the log:

 Nov 27 16:02:50 psql1 pgsql[11021]: DEBUG: PostgreSQL is running as a 
 primary.
 Nov 27 16:02:51 psql1 pgsql[11021]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:02:53 psql1 pgsql[11142]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:02:53 psql1 pgsql[11142]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:02:55 psql1 pgsql[11272]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:02:55 psql1 pgsql[11272]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:02:57 psql1 pgsql[11368]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:02:57 psql1 pgsql[11368]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:03:00 psql1 pgsql[11463]: DEBUG: 
 PostgreSQL is running as a primary.
 Nov 27 16:03:00 psql1 pgsql[11463]: DEBUG: node=psql2,
 state=STREAMING, sync_state=SYNC Nov 27 16:03:00 psql1 pgsql[11556]:
 DEBUG: notify: pre for demote Nov 27 16:03:00 psql1 pgsql[11590]: INFO: 
 Stopping PostgreSQL on demote.
 Nov 27 16:03:02 psql1 pgsql[11590]: INFO: waiting for server to shut
 down. done server stopped Nov 27 16:03:02 psql1 pgsql[11590]: INFO: 
 Removing /var/lib/pgsql/PGSQL.lock.
 Nov 27 16:03:02 psql1 pgsql[11590]: INFO: PostgreSQL is down Nov 27
 16:03:02 psql1 pgsql[11590]: INFO: Changing pgsql-status on psql1 : 
 PRI-STOP.
 Nov 27 16:03:02 psql1 pgsql[11590]: DEBUG: Created recovery.conf.
 host=10.12.1.28, user=postgres Nov 27 16:03:02 psql1 pgsql[11590]: INFO: 
 Setup all nodes as an async.
 Nov 27 16:03:02 psql1 pgsql[11732]: DEBUG: notify: post for demote Nov
 27 16:03:02 psql1 pgsql[11732]: DEBUG: post-demote called. Demote
 uname is psql1 Nov 27 16:03:02 psql1 pgsql[11732]: INFO: My Timeline
 ID and Checkpoint : 14:2320 Nov 27 16:03:02 psql1 pgsql[11732]: 
 WARNING: Can't get psql2 master baseline. Waiting...
 Nov 27 16:03:03 psql1 pgsql[11732]: INFO: 

Re: [Pacemaker] colocation issue with master-slave resources

2011-11-28 Thread Patrick H.

Sent: Mon Nov 28 2011 01:31:22 GMT-0700 (MST)
From: Andreas Kurz andr...@hastexo.com
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] colocation issue with master-slave resources

On 11/28/2011 04:51 AM, Patrick H. wrote:

I'm trying to set up a colocation rule so that a couple of master-slave
resources can't be master unless another resource is running on the same
node, and am getting the exact opposite of what I want. The master-slave
resources are getting promoted to master on the node which this other
resource isn't running on.

In the example below, 'stateful1:Master' and 'stateful2:Master' should
be on the same node 'dummy' is on. It works just fine if I change the
colocation around so that 'dummy' depends on the stateful resources
being master, but I don't want that. I want dummy to be able to run no
matter what, but the stateful resources not to be able to become master
without dummy.


# crm status

Last updated: Mon Nov 28 03:47:04 2011
Stack: cman
Current DC: devlvs03 - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ devlvs04 devlvs03 ]

  dummy  (ocf::pacemaker:Dummy):  Started devlvs03
  Master/Slave Set: stateful1-ms [stateful1]
  Masters: [ devlvs04 ]
  Slaves: [ devlvs03 ]
  Master/Slave Set: stateful2-ms [stateful2]
  Masters: [ devlvs04 ]
  Slaves: [ devlvs03 ]


# crm configure show
node devlvs03 \
 attributes standby=off
node devlvs04 \
 attributes standby=off
primitive dummy ocf:pacemaker:Dummy \
 meta target-role=Started
primitive stateful1 ocf:pacemaker:Stateful
primitive stateful2 ocf:pacemaker:Stateful
ms stateful1-ms stateful1
ms stateful2-ms stateful2
colocation stateful1-colocation inf: stateful1-ms:Master dummy
colocation stateful2-colocation inf: stateful2-ms:Master dummy

use dummy:Started ... the default is to use the same role as the left
resource, and Dummy will never be in the Master role ...

Regards,
Andreas
Tried that too (it just wasn't in the configuration at the time I sent the 
email); no effect.


[Pacemaker] Fencing libvirt/KVM nodes running on different hosts?

2011-11-28 Thread Andreas Ntaflos
Hi,

Scenario: two physical virtualisation hosts run various KVM-based
virtual machines, managed by Libvirt. Two VMs, one on each host, form a
Pacemaker cluster, say for a simple database server, using DRBD and a
virtual/cluster IP address. Using Ubuntu 10.04 and Pacemaker 1.1.6, with
Corosync 1.4.2 on the hosts and guests.

How do I implement node-level fencing in this scenario?

Can the rather new external/libvirt STONITH plugin be used here? It
seems to me it only supports a single hypervisor URI to connect to and
expects all VMs/nodes that can be fenced to be running on the same
hypervisor.

Looking at http://www.clusterlabs.org/wiki/Guest_Fencing it says that
fencing guests running on multiple hosts is not supported in
fence-virt/fence-virtd.

What are my options here? How do other people manage node-level
fencing/STONITH when the nodes are VMs and running on different physical
hosts (which seems like the sensible thing to do, considering a single
host is a SPOF)?

Sorta related question: are Pacemaker clusters based on virtual machines
(and Libvirt) really so uncommon that there isn't a quasi-definitive
answer to this? Like "If you use Libvirt, implement fencing by using
this or that STONITH plugin."

Thanks in advance,

Andreas



Re: [Pacemaker] How to build Pacemaker with Cman support?

2011-11-28 Thread Andrew Beekhof
2011/11/28 Богомолов Дмитрий Викторович beats...@mail.ru:
 Thanks for your reply!


 On 28 November 2011, 03:54, Andrew Beekhof and...@beekhof.net wrote:
 2011/11/28 Богомолов Дмитрий Викторович beats...@mail.ru:
  Hello.
  Addition. OS - Ubuntu 11.10
  I have installed libcman-dev, and now in config.log I can see

 I'm pretty sure the builds of pacemaker that come with ubuntu support
 cman already.
 No, they don't.
 I have tried to upgrade from these repositories: oneiric-proposed,
 ppa.launchpad.net/ubuntu-ha,
 ppa.launchpad.net/ubuntu-ha-maintainers
 No luck.
 I posted about it on the Ubuntu community forum; there is no answer.
 http://ubuntuforums.org/showthread.php?t=1885340
 And I found a bug report without an answer:
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=639548

 That's why I am now trying to build pacemaker from source.

 I selected Ubuntu for simplicity and the Oneiric release because it is the 
 most recent.

 I want to run Xen VMs on the cluster. I have tried an active/passive 
 configuration, but it's not exactly what I need, so now I am trying to get 
 an active/active configuration.

 
  configure:16634: checking for cman
 
  configure:16638: result: yes

Ok, but you originally posted:

configure:16634: checking for cman

configure:16638: result: no

So maybe something changed?
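
(A quick way to double-check what a configure run detected is the same
config.log snippet quoted above, plus the cman header that libcman-dev
should provide - the header path is an assumption:)

grep -A1 'checking for cman' config.log
ls /usr/include/libcman.h   # assumed path; should exist once libcman-dev is installed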

 
  But, after:
  make && make install
  service pacemaker start
  I still get this log event:
  ERROR: read_config: Corosync configured for CMAN but this build of 
  Pacemaker
  doesn't support it
  Please, help!
 
  Hello.
 
   I am trying to configure an Active/Active cluster with CMAN + Pacemaker, as
   described here:
   http://www.clusterlabs.org/doc/en-US..._from_Scratch/
   I set up CMAN, but when I start Pacemaker with this command:
  $sudo service pacemaker start
  I get this log event:
  ERROR: read_config: Corosync configured for CMAN but this build of 
  Pacemaker
  doesn't support it
 
   Now I am trying to build Pacemaker with CMAN support.
  
   I followed the instructions at http://www.clusterlabs.org/wiki/Install
  
   The only difference is in configuring Pacemaker:
  
   ./autogen.sh && ./configure --prefix=$PREFIX --with-lcrso-dir=$LCRSODIR
   --with-cman=yes
 
  But after installing pacemaker, I have the same error.
 
   When I look at config.log, I can see this:
 
  configure:16634: checking for cman
 
  configure:16638: result: no
 
   So please help: how do I build pacemaker with cman support?
 


Re: [Pacemaker] Fencing libvirt/KVM nodes running on different hosts?

2011-11-28 Thread Andrew Beekhof
On Tue, Nov 29, 2011 at 6:55 AM, Andreas Ntaflos
d...@pseudoterminal.org wrote:
 Hi,

 Scenario: two physical virtualisation hosts run various KVM-based
 virtual machines, managed by Libvirt. Two VMs, one on each host, form a
 Pacemaker cluster, say for a simple database server, using DRBD and a
 virtual/cluster IP address. Using Ubuntu 10.04 and Pacemaker 1.1.6, with
 Corosync 1.4.2 on the hosts and guests.

 How do I implement node-level fencing in this scenario?

 Can the rather new external/libvirt STONITH plugin be used here? It
 seems to me it only supports a single hypervisor URI to connect to and
 expects all VMs/nodes that can be fenced to be running on the same
 hypervisor.

 Looking at http://www.clusterlabs.org/wiki/Guest_Fencing it says that
 fencing guests running on multiple hosts is not supported in
 fence-virt/fence-virtd.

 What are my options here? How do other people manage node-level
 fencing/STONITH when the nodes are VMs and running on different physical
 hosts (which seems like the sensible thing to do, considering a single
 host is a SPOF)?

 Sorta related question: are Pacemaker clusters based on virtual machines
 (and Libvirt) really so uncommon that there isn't a quasi-definitive
 answer to this? Like "If you use Libvirt, implement fencing by using
 this or that STONITH plugin."

You could try fence_xvm or fence_virt from the RHCS set of stonith
agents (which pacemaker also supports).
I believe it also handles the case where the guest could be on one of
multiple hosts; personally I only use it with a single host.

Lon might have some documentation pointers...
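
(For reference, a rough sketch of what a multicast-based /etc/fence_virt.conf
on the hosts can look like - the section and key names here are from memory,
so treat them as assumptions and verify them against the fence_virtd
documentation:)

# sketch only - key names, key file and module path are assumptions
fence_virtd {
        listener = "multicast";
        backend = "libvirt";
        module_path = "/usr/lib64/fence-virt";
}
listeners {
        multicast {
                key_file = "/etc/cluster/fence_xvm.key";
                interface = "br0";
        }
}
backends {
        libvirt {
                uri = "qemu:///system";
        }
}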


 Thanks in advance,

 Andreas



Re: [Pacemaker] colocation issue with master-slave resources

2011-11-28 Thread Patrick H.

Sent: Mon Nov 28 2011 15:27:10 GMT-0700 (MST)
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org,
Andreas Kurz andr...@hastexo.com

Subject: Re: [Pacemaker] colocation issue with master-slave resources

Perhaps try an ordering constraint. I may have also fixed something
in this area for 1.1.6, so an upgrade might also help.

On Tue, Nov 29, 2011 at 1:38 AM, Patrick H. pacema...@feystorm.net wrote:

Sent: Mon Nov 28 2011 01:31:22 GMT-0700 (MST)
From: Andreas Kurz andr...@hastexo.com
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] colocation issue with master-slave resources

On 11/28/2011 04:51 AM, Patrick H. wrote:

I'm trying to set up a colocation rule so that a couple of master-slave
resources can't be master unless another resource is running on the same
node, and am getting the exact opposite of what I want. The master-slave
resources are getting promoted to master on the node which this other
resource isn't running on.

In the example below, 'stateful1:Master' and 'stateful2:Master' should
be on the same node 'dummy' is on. It works just fine if I change the
colocation around so that 'dummy' depends on the stateful resources
being master, but I don't want that. I want dummy to be able to run no
matter what, but the stateful resources not to be able to become master
without dummy.


# crm status

Last updated: Mon Nov 28 03:47:04 2011
Stack: cman
Current DC: devlvs03 - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
6 Resources configured.


Online: [ devlvs04 devlvs03 ]

  dummy  (ocf::pacemaker:Dummy):  Started devlvs03
  Master/Slave Set: stateful1-ms [stateful1]
  Masters: [ devlvs04 ]
  Slaves: [ devlvs03 ]
  Master/Slave Set: stateful2-ms [stateful2]
  Masters: [ devlvs04 ]
  Slaves: [ devlvs03 ]


# crm configure show
node devlvs03 \
 attributes standby=off
node devlvs04 \
 attributes standby=off
primitive dummy ocf:pacemaker:Dummy \
 meta target-role=Started
primitive stateful1 ocf:pacemaker:Stateful
primitive stateful2 ocf:pacemaker:Stateful
ms stateful1-ms stateful1
ms stateful2-ms stateful2
colocation stateful1-colocation inf: stateful1-ms:Master dummy
colocation stateful2-colocation inf: stateful2-ms:Master dummy

use dummy:Started ... the default is to use the same role as the left
resource, and Dummy will never be in the Master role ...

Regards,
Andreas

Tried that too (it just wasn't in the configuration at the time I sent the email); no
effect.


Upgraded to 1.1.6 and put in an ordering constraint, still no joy.

# crm status

Last updated: Mon Nov 28 23:09:37 2011
Last change: Mon Nov 28 23:08:34 2011 via cibadmin on devlvs03
Stack: cman
Current DC: devlvs03 - partition with quorum
Version: 1.1.6-1.el6-b379478e0a66af52708f56d0302f50b6f13322bd
2 Nodes configured, 2 expected votes
5 Resources configured.


Online: [ devlvs04 devlvs03 ]

 dummy  (ocf::pacemaker:Dummy):  Started devlvs03
 Master/Slave Set: stateful1-ms [stateful1]
 Masters: [ devlvs04 ]
 Slaves: [ devlvs03 ]
 Master/Slave Set: stateful2-ms [stateful2]
 Masters: [ devlvs04 ]
 Slaves: [ devlvs03 ]


# crm configure show
node devlvs03 \
attributes standby=off
node devlvs04 \
attributes standby=off
primitive dummy ocf:pacemaker:Dummy \
meta target-role=Started
primitive stateful1 ocf:pacemaker:Stateful
primitive stateful2 ocf:pacemaker:Stateful
ms stateful1-ms stateful1
ms stateful2-ms stateful2
colocation stateful1-colocation inf: stateful1-ms:Master dummy:Started
colocation stateful2-colocation inf: stateful2-ms:Master dummy:Started
order stateful1-start inf: dummy:start stateful1-ms:promote
order stateful2-start inf: dummy:start stateful2-ms:promote
property $id=cib-bootstrap-options \
dc-version=1.1.6-1.el6-b379478e0a66af52708f56d0302f50b6f13322bd \
cluster-infrastructure=cman \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1322450542



Re: [Pacemaker] Fencing libvirt/KVM nodes running on different hosts?

2011-11-28 Thread Vladislav Bogdanov
28.11.2011 22:55, Andreas Ntaflos wrote:
 Hi,
 
 Scenario: two physical virtualisation hosts run various KVM-based
 virtual machines, managed by Libvirt. Two VMs, one on each host, form a
 Pacemaker cluster, say for a simple database server, using DRBD and a
 virtual/cluster IP address. Using Ubuntu 10.04 and Pacemaker 1.1.6, with
 Corosync 1.4.2 on the hosts and guests.
 
 How do I implement node-level fencing in this scenario?

I use a set of:
* qpid server
* libvirt-qpid on each host which runs VMs
* fence-virtd with multicast listener
* fence-virtd-libvirt-qpid (patched; the patches were posted to the pacemaker
list on 03.10.2011 for those who need them)
* fence_xvm as a fencing agent

The major problem I recently discovered is that you can have only one
instance of the fence_xvm process running on a node at a time, because it
binds to a predefined port.
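
(On the Pacemaker side this typically ends up as one stonith resource per
guest, roughly like the following in crm syntax - the parameter names (port,
pcmk_host_list) are assumptions, so verify them against the fence_xvm agent
metadata:)

# sketch only - parameter names are assumptions, check the agent metadata
primitive st-vm1 stonith:fence_xvm \
        params port="vm1" pcmk_host_list="vm1"
property stonith-enabled="true"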

Best,
Vladislav



[Pacemaker] LCMC user guide

2011-11-28 Thread Rasto Levrinc
Hi,

Here is a new LCMC user guide, well sort of:

http://lcmc.sourceforge.net/lcmc-user-guide.html

LCMC is a great piece of software; the only thing missing was a user
guide. So I started to write one, but got bored after about two sentences
every time and simply couldn't do it. The good news is that I've created
a so-called annotated-screenshot book instead. It turned out that this
form is actually better than the intended user guide; it's much easier
to understand, remember and learn from.

This guide is a work in progress: the Pacemaker part is mostly done, the
DRBD and KVM parts are still missing, but I am (slowly) working on them.

Rasto

-- 
Dipl.-Ing. Rastislav Levrinc
rasto.levr...@gmail.com
Linux Cluster Management Console
http://lcmc.sf.net/
