Re: [Pacemaker] How to build Pacemaker with Cman support?
Hello.

29 November 2011, 02:24, from Andrew Beekhof:
> 2011/11/28 Богомолов Дмитрий Викторович :
> > Thanks for your reply!
> >
> > 28 November 2011, 03:54, from Andrew Beekhof:
> >> 2011/11/28 Богомолов Дмитрий Викторович :
> >> > Hello.
> >> > Addition. OS - Ubuntu 11.10
> >> > I have installed libcman-dev, and now in config.log I can see
> >>
> >> I'm pretty sure the builds of pacemaker that come with ubuntu support
> >> cman already.
> > No, they don't.
> > I have tried to upgrade from these repositories: oneiric-proposed,
> > ppa.launchpad.net/ubuntu-ha,
> > ppa.launchpad.net/ubuntu-ha-maintainers
> > No luck.
> > I posted about it on the Ubuntu community forum; there is no answer:
> > http://ubuntuforums.org/showthread.php?t=1885340
> > And I found an unanswered bug report:
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=639548
> >
> > That's why I am now trying to build pacemaker from sources.
> >
> > I selected Ubuntu for simplicity, and Oneiric because it is the most
> > recent release.
> >
> > I want to run Xen VMs on the cluster. I have tried an active/passive
> > configuration, but it's not exactly what I need, so now I am trying to
> > get an active/active configuration.
> >>
> >> > configure:16634: checking for cman
> >> > configure:16638: result: yes
> >>
> >> Ok, but you originally posted:
> >> configure:16634: checking for cman
> >> configure:16638: result: no
> >> So maybe something changed?

Yes. And I wrote about it. First I tried to build this way:

aptitude build-dep pacemaker
apt-get source pacemaker
./autogen.sh
./configure --enable-fatal-warnings=no --with-cman --with-lcrso-dir=/usr/libexec/lcrso --prefix=/usr
make
make install

This way I get in config.log:

configure:16634: checking for cman
configure:16638: result: no

Then I installed libcman-dev and rebuilt:

make clean
./autogen.sh
./configure --enable-fatal-warnings=no --with-cman --with-lcrso-dir=/usr/libexec/lcrso --prefix=/usr
make
make install

And now I get:

configure:16634: checking for cman
configure:16638: result: yes

But when I successfully start cman and then start pacemaker, pacemaker fails to start with:

ERROR: read_config: Corosync configured for CMAN but this build of Pacemaker doesn't support it

> >> > But, after:
> >> > make && make install
> >> > service pacemaker start
> >> > I still get this log event:
> >> > ERROR: read_config: Corosync configured for CMAN but this build of
> >> > Pacemaker doesn't support it
> >> > Please, help!
> >> >
> >> > Hello.
> >> >
> >> > I am trying to configure an Active/Active cluster with Cman + Pacemaker,
> >> > as described here:
> >> > http://www.clusterlabs.org/doc/en-US..._from_Scratch/
> >> > I set up Cman, but when I start Pacemaker with this command:
> >> > $sudo service pacemaker start
> >> > I get this log event:
> >> > ERROR: read_config: Corosync configured for CMAN but this build of
> >> > Pacemaker doesn't support it
> >> >
> >> > Now I am trying to build Pacemaker with Cman support.
> >> >
> >> > I followed the instructions at http://www.clusterlabs.org/wiki/Install
> >> >
> >> > The only difference for configuring Pacemaker:
> >> >
> >> > ./autogen.sh && ./configure --prefix=$PREFIX --with-lcrso-dir=$LCRSODIR --with-cman=yes
> >> >
> >> > But after installing pacemaker, I have the same error.
> >> >
> >> > When I look in config.log, I can see this:
> >> >
> >> > configure:16634: checking for cman
> >> > configure:16638: result: no
> >> >
> >> > So, help please: how do I build pacemaker with cman support?
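A clean rebuild along these lines avoids reusing stale configure results between runs (a sketch; libfence-dev is named here because the end of this thread reports it as the missing piece):

  sudo apt-get install libcman-dev libfence-dev
  make distclean || true      # start from a clean tree before re-running configure
  ./autogen.sh
  ./configure --enable-fatal-warnings=no --with-cman \
      --with-lcrso-dir=/usr/libexec/lcrso --prefix=/usr
  make
  sudo make install
  grep -A1 'checking for cman' config.log    # should now report: result: yes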
Re: [Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
30.11.2011 14:08, Vadim Bulst wrote:
> Hello,
>
> first of all I'd like to ask you a general question:
>
> Has somebody successfully set up a clvm cluster with pacemaker and run
> it in production?

I will say yes, once I finally resolve the remaining dlm & fencing issues.

> Now back to the concrete problem:
>
> I configured two interfaces for corosync:
>
> root@bbzclnode04:~# corosync-cfgtool -s
> Printing ring status.
> Local node ID 897624256
> RING ID 0
>     id     = 192.168.128.53
>     status = ring 0 active with no faults
> RING ID 1
>     id     = 192.168.129.23
>     status = ring 1 active with no faults
>
> RRP set to passive
>
> I also made some changes to my cib:
>
> node bbzclnode04
> node bbzclnode06
> node bbzclnode07
> primitive clvm ocf:lvm2:clvmd \
>     params daemon_timeout="30" \
>     meta target-role="Started"

Please instruct clvmd to use the corosync stack instead of openais (-I corosync): otherwise it uses the LCK service, which is not mature; I have observed major problems with it.

> primitive dlm ocf:pacemaker:controld \
>     meta target-role="Started"
> group dlm-clvm dlm clvm
> clone dlm-clvm-clone dlm-clvm \
>     meta interleave="true" ordered="true"
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="3" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     last-lrm-refresh="1322643084"
>
> I cleaned up and restarted the resources - nothing! :
>
> crm(live)resource# cleanup dlm-clvm-clone
> Cleaning up dlm:0 on bbzclnode04
> Cleaning up dlm:0 on bbzclnode06
> Cleaning up dlm:0 on bbzclnode07
> Cleaning up clvm:0 on bbzclnode04
> Cleaning up clvm:0 on bbzclnode06
> Cleaning up clvm:0 on bbzclnode07
> Cleaning up dlm:1 on bbzclnode04
> Cleaning up dlm:1 on bbzclnode06
> Cleaning up dlm:1 on bbzclnode07
> Cleaning up clvm:1 on bbzclnode04
> Cleaning up clvm:1 on bbzclnode06
> Cleaning up clvm:1 on bbzclnode07
> Cleaning up dlm:2 on bbzclnode04
> Cleaning up dlm:2 on bbzclnode06
> Cleaning up dlm:2 on bbzclnode07
> Cleaning up clvm:2 on bbzclnode04
> Cleaning up clvm:2 on bbzclnode06
> Cleaning up clvm:2 on bbzclnode07
> Waiting for 19 replies from the CRMd... OK
>
> crm_mon:
>
> Last updated: Wed Nov 30 10:15:09 2011
> Stack: openais
> Current DC: bbzclnode04 - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 3 Nodes configured, 3 expected votes
> 1 Resources configured.
>
> Online: [ bbzclnode04 bbzclnode06 bbzclnode07 ]
>
> Failed actions:
>     clvm:1_start_0 (node=bbzclnode06, call=11, rc=1, status=complete): unknown error
>     clvm:0_start_0 (node=bbzclnode04, call=11, rc=1, status=complete): unknown error
>     clvm:2_start_0 (node=bbzclnode07, call=11, rc=1, status=complete): unknown error
>
> When I look in the log, there is a message which tells me that maybe
> another clvmd process is already running - but that isn't so:
>
> "clvmd could not create local socket  Another clvmd is probably already running"
>
> Or is it a permission problem - writing to the filesystem? Is there a
> way to get rid of it?

You can try to run it manually under strace. It will show you what happens.

> Shall I use a different distro - or install from source?
> Am 24.11.2011 22:59, schrieb Andreas Kurz:
>> Hello,
>>
>> On 11/24/2011 10:12 PM, Vadim Bulst wrote:
>>> Hi Andreas,
>>>
>>> I changed my cib:
>>>
>>> node bbzclnode04
>>> node bbzclnode06
>>> node bbzclnode07
>>> primitive clvm ocf:lvm2:clvmd \
>>>     params daemon_timeout="30"
>>> primitive dlm ocf:pacemaker:controld
>>> group g_lock dlm clvm
>>> clone g_lock-clone g_lock \
>>>     meta interleave="true"
>>> property $id="cib-bootstrap-options" \
>>>     dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>>>     cluster-infrastructure="openais" \
>>>     expected-quorum-votes="3" \
>>>     no-quorum-policy="ignore" \
>>>     stonith-enabled="false" \
>>>     last-lrm-refresh="1322049979"
>>>
>>> but no luck at all.
>>
>> I assume you did at least a cleanup on clvm and it still does not work
>> ... next step would be to grep for ERROR in your cluster log and look
>> for other suspicious messages to find out why clvm is not that motivated
>> to start.
>>
>>> "And use Corosync 1.4.x with redundant rings and the automatic ring
>>> recovery feature enabled."
>>>
>>> I've got two interfaces per server - they are bonded together and bridged
>>> for virtualization. Only one untagged vlan. I tried to give a tagged
>>> VLAN bridge an address, but that didn't work. My network conf looks like that:
>>
>> One or two extra NICs are quite affordable today to build e.g. a direct
>> connection between the nodes (if possible)
>>
>> Regards,
>> Andreas
Re: [Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
Am 30.11.2011 12:22, schrieb Vladislav Bogdanov:
> [...]
> You can try to run it manually under strace. It will show you what happens.
Here we go:

root@bbzclnode07:~# strace clvmd -d -I cororsync
execve("/usr/sbin/clvmd", ["clvmd", "-d", "-I", "cororsync"], [/* 18 vars */]) = 0
brk(0)                                  = 0x12f7000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9f09dad000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25864, ...}) = 0
mmap(NULL, 25864, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f9f09da6000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\r\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14768, ...}) = 0
mmap(NULL, 2109704, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9f0998b000
mprotect(0x7f9f0998d000, 2097152, PROT_NONE) = 0
mmap(0x7f9f09b8d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f9f09b8d000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/libdevmapper-event.so.1.02.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \24\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=18704, ...}) = 0
mmap(NULL, 2113872, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f9f09786000
mprotect(0x7f9f0978a000, 2093056, PROT_NONE) = 0
mmap(0x7f9f09989000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f9f09989000
close(3)
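One detail worth noting before digging into the trace: the cluster-interface argument clvmd expects is spelled "-I corosync", while the command above was invoked with "-I cororsync", which clvmd is unlikely to accept as a valid interface name. A manual foreground debugging run would look like:

  clvmd -d -I corosync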
Re: [Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
@Vladislav

Where and how can I set that switch for the cluster manager if clvmd runs as a resource?

Am 30.11.2011 13:10, schrieb Vadim Bulst:
> Am 30.11.2011 12:22, schrieb Vladislav Bogdanov:
> [...]
[Pacemaker] lrmd hanging
So last night I was supposed to get a cluster running. Everything worked OK in a virtual environment using the same software, and in my experience I only had to install pacemaker and corosync (from the ubuntu 10.04 ppa) and get it rolling. What really happened was: I could use crm configure to set cluster properties like resource stickiness and quorum, and to disable stonith. When I tried to add primitives, crm just hung there, without returning an error or completing.

I noticed these two entries in the log, every time crm tries to configure something the first time:

Nov 30 05:33:26 server lrmd: [18102]: debug: on_msg_register:client lrmadmin [18159] registered
Nov 30 05:33:26 server lrmd: [18102]: debug: on_receive_cmd: the IPC to client [pid:18159] disconnected.

Also, when I stop corosync it sends a TERM signal to lrmd but lrmd doesn't exit, even after some minutes; I have to kill -9 it. I tried to strace lrmd but it's stuck on a futex, which really doesn't help a lot:

Process 32764 attached - interrupt to quit
futex(0xe070d8, FUTEX_WAIT_PRIVATE, 2, NULL^C

Does anyone have any idea what would make lrmd just hang?

[]s
core
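When a process is wedged in futex() like this, a stack trace usually says more than strace. A sketch, assuming gdb and debug symbols are available on the node:

  gdb -p $(pidof lrmd) -batch -ex 'thread apply all bt'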
Re: [Pacemaker] Strange failover behaviour SOLVED
Hi Andreas,

Thank you for your answer. I didn't know about the instability issues with DRBD 8.4. The reason why I compiled everything myself is that the versions shipped with CentOS 6 had some problems as well, like heartbeat processes taking up 100% CPU. I don't have this now.

I did try switching to the ocf way of stopping/starting httpd, and this seems to work perfectly.

Thanks again, another problem solved.

Hans

-----Original Message-----
From: Andreas Kurz [mailto:andr...@hastexo.com]
Sent: Wednesday, November 30, 2011 00:12
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Strange failover behaviour

On 11/29/2011 07:14 PM, Hans Lammerts wrote:
> Hi there,
>
> I have something strange I would like the community to give its opinion
> on. I can't figure out what is going wrong.
>
> I have a 2 node cluster (named cl1 and cl2). On this cluster I'm running
> MySQL, Apache, and Zarafa. Both nodes run CentOS 6.
> I have downloaded all the latest sources for DRBD, Cluster Glue, Resource
> Agents, Heartbeat and Pacemaker and compiled them. Everything seems to be OK.

BTW ... no need to compile Pacemaker/Glue/Agents ... it is shipped with CentOS 6 ... and use DRBD 8.4.0 only for test setups, there are some known stability issues.

> I believe my Pacemaker setup to be OK, but I may be mistaken. I will
> attach the config below.
>
> What I experience when I do a failover from cl1 to cl2 is that MySQL and
> Zarafa fail over without any problems, but httpd seems to be getting into
> a loop of starting and stopping.
> The error that is displayed is this:
>
> apache2_monitor_1 (node=cl2, call=502, rc=7, status=complete): not running

the cluster and apache logs should give you good hints on the problem ...

> If I remember to set the failcount of the apache2 resource to 0, httpd
> will eventually start after quite a number of retries:
>
> [root@cl2 httpd]# crm resource failcount apache2 show cl2
> scope=status name=fail-count-apache2 value=69
>
> If I forget to reset the failcount (something you should not need to do),
> the failcount will reach infinity at some time in the future, and httpd
> won't start. The number of times Pacemaker retries is also different
> every time.
>
> Wait, it gets stranger...
> Putting cl1 online again, the fallback is initiated, and this goes
> without any problems. So it looks like the problems reside only on the
> second cluster half. The hardware of cl2 is different from cl1, and it is
> the slower machine of the two.
> Yes, I made very sure every configuration file is the same on both nodes.
> And yes, I made sure the server-status section in httpd.conf is
> uncommented, as is the ExtendedStatus directive. Doing a
> wget -O - http://localhost/server-status?auto works perfectly.

you are using the lsb script ... this does a simple pid check, at least on the SL 6.1 test machines in my lab. Have you tried the ocf RA?

> Can anyone please tell me what the problem could be here?

Dig through your logs ... or hire someone to do it for you ;-)

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> Thanks.
> Version info:
>
> CentOS 6.0
> DRBD 8.4.0
> Glue 1.0.8
> Resource agents 3.9.2
> Heartbeat 3.0.5
> Pacemaker 1.0.11
>
> Pacemaker config:
>
> node $id="62b94e0a-532f-4f99-acdb-57d6052a5635" cl1 \
>     attributes standby="on"
> node $id="7444dfb4-2c9b-4130-83c4-c0cd3d7ec006" cl2 \
>     attributes standby="off"
> primitive apache2 lsb:httpd \
>     op monitor interval="10" timeout="30" \
>     op start interval="0" timeout="120" \
>     op stop interval="0" timeout="120" \
>     meta target-role="Started"
> primitive drbd_http ocf:linbit:drbd \
>     params drbd_resource="http" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="59s" role="Master" timeout="30s" \
>     op monitor interval="60s" role="Slave" timeout="30s"
> primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="mysql" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="59s" role="Master" timeout="30s" \
>     op monitor interval="60s" role="Slave" timeout="30s"
> primitive drbd_zarafa ocf:linbit:drbd \
>     params drbd_resource="zarafa" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="59s" role="Master" timeout="30s" \
>     op monitor interval="60s" role="Slave" timeout="30s"
> primitive http_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd1" directory="/var/www/html
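For comparison with the lsb:httpd primitive above, a minimal sketch of the ocf RA approach Andreas suggests (the configfile path and timeouts are illustrative, not taken from the thread):

  primitive apache2 ocf:heartbeat:apache \
      params configfile="/etc/httpd/conf/httpd.conf" \
          statusurl="http://localhost/server-status" \
      op monitor interval="10s" timeout="30s" \
      op start interval="0" timeout="120s" \
      op stop interval="0" timeout="120s" \
      meta target-role="Started"

Unlike the lsb script's simple pid check, the ocf:heartbeat:apache agent performs a real status-page fetch via statusurl, which is why the server-status section in httpd.conf matters here.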
Re: [Pacemaker] Regarding Stonith RAs
Hello Andreas,

"Pacemaker is not built with Heartbeat support on RHEL-6 and its derivatives."

How do I check this, and what steps do I need to take to resolve the issue?

Thanks and regards
Neha Chatrath

On Thu, Nov 24, 2011 at 5:38 PM, neha chatrath wrote:
> Hello,
>
> I could get the list of Stonith RAs by installing the cman, clvm, ricci,
> pacemaker and rgmanager RPMs provided by the CentOS 6 distribution.
> But unfortunately, after installing these packages, not all of the
> processes related to Pacemaker come up on starting the Heartbeat daemon.
> When I start the Heartbeat daemon, only the following processes are started:
>
> [root@p init.d]# ps -eaf |grep heartbeat
> root 3522    1 0 17:26 ?      00:00:00 heartbeat: master control process
> root 3525 3522 0 17:26 ?      00:00:00 heartbeat: FIFO reader
> root 3526 3522 0 17:26 ?      00:00:00 heartbeat: write: bcast eth1
> root 3527 3522 0 17:26 ?      00:00:00 heartbeat: read: bcast eth1
> root 3538 3381 0 17:26 pts/3  00:00:00 grep heartbeat
>
> In the log messages, the following error logs are observed:
> "Nov 24 17:26:19 p heartbeat: [3522]: debug: Signing on API client 3539 (ccm)
> Nov 24 17:26:19 p ccm: [3539]: info: Hostname: p
> Nov 24 17:26:19 p attrd: [3543]: info: Invoked: /usr/lib/heartbeat/attrd
> Nov 24 17:26:19 p stonith-ng: [3542]: info: Invoked: /usr/lib/heartbeat/stonithd
> Nov 24 17:26:19 p cib: [3540]: info: Invoked: /usr/lib/heartbeat/cib
> *Nov 24 17:26:19 p lrmd: [3541]: ERROR: socket_wait_conn_new: trying to
> create in /var/run/heartbeat/lrm_cmd_sock bind:: No such file or directory*
> Nov 24 17:26:19 p lrmd: [3541]: ERROR: main: can not create wait
> connection for command.
> Nov 24 17:26:19 p lrmd: [3541]: ERROR: Startup aborted (can't create comm
> channel). Shutting down.
> Nov 24 17:26:19 p heartbeat: [3522]: WARN: Managed /usr/lib/heartbeat/lrmd
> -r process 3541 exited with return code 100.
> Nov 24 17:26:19 p heartbeat: [3522]: ERROR: Client /usr/lib/heartbeat/lrmd
> -r exited with return code 100.
> Nov 24 17:26:19 p attrd: [3543]: info: crm_log_init_worker: Changed active
> directory to /var/lib/heartbeat/cores/hacluster
> Nov 24 17:26:19 p attrd: [3543]: info: main: Starting up
> Nov 24 17:26:19 p stonith-ng: [3542]: info: crm_log_init_worker: Changed
> active directory to /var/lib/heartbeat/cores/root
> Nov 24 17:26:19 p cib: [3540]: info: crm_log_init_worker: Changed active
> directory to /var/lib/heartbeat/cores/hacluster
> Nov 24 17:26:19 p attrd: [3543]: CRIT: get_cluster_type: This installation
> of Pacemaker does not support the '(null)' cluster infrastructure.
> Terminating.
> Nov 24 17:26:19 p stonith-ng: [3542]: CRIT: get_cluster_type: This
> installation of Pacemaker does not support the '(null)' cluster
> infrastructure. Terminating.
> Nov 24 17:26:19 p heartbeat: [3522]: WARN: Managed
> /usr/lib/heartbeat/attrd process 3543 exited with return code 100.
> Nov 24 17:26:19 p heartbeat: [3522]: ERROR: Client
> /usr/lib/heartbeat/attrd exited with return code 100.
> Nov 24 17:26:19 p heartbeat: [3522]: info: the send queue length from
> heartbeat to client ccm is set to 1024
> Nov 24 17:26:19 p heartbeat: [3522]: WARN: Managed
> /usr/lib/heartbeat/stonithd process 3542 exited with return code 100.
> Nov 24 17:26:19 p heartbeat: [3522]: ERROR: Client
> /usr/lib/heartbeat/stonithd exited with return code 100.
> *Nov 24 17:26:19 p cib: [3540]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
> /var/lib/heartbeat/crm/cib.xml.sig)*
> Nov 24 17:26:19 p cib: [3540]: debug: log_data_element: readCibXmlFile:
> [on-disk] validate-with="pacemaker-1.2" cib-last-written="Mon Nov 21 11:09:22 2011"
>
> ...
>
> Nov 24 17:26:19 p crmd: [3544]: info: crmd_init: Starting crmd
> Nov 24 17:26:19 p crmd: [3544]: debug: s_crmd_fsa: Processing I_STARTUP: [
> state=S_STARTING cause=C_STARTUP origin=crmd_init ]
> Nov 24 17:26:19 p crmd: [3544]: debug: do_fsa_action: actions:trace:// A_LOG
> Nov 24 17:26:19 p crmd: [3544]: debug: do_fsa_action: actions:trace:// A_STARTUP
> Nov 24 17:26:19 p crmd: [3544]: debug: do_startup: Registering Signal Handlers
> Nov 24 17:26:19 p crmd: [3544]: debug: do_startup: Creating CIB and LRM objects
> Nov 24 17:26:19 p crmd: [3544]: debug: do_fsa_action: actions:trace:// A_CIB_START
> Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
> Attempting to talk on: /var/run/crm/cib_rw
> Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
> Could not init comms on: /var/run/crm/cib_rw
> Nov 24 17:26:19 p crmd: [3544]: debug: cib_native_signon_raw: Connection
> to command channel failed
> Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
> Attempting to talk on: /var/run/crm/cib_callback
> Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch
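One way to check which stacks a given Pacemaker build supports is its feature list (a sketch; pacemakerd exists in Pacemaker 1.1.x builds, and the exact feature names vary by version):

  pacemakerd --features | grep -i heartbeat   # empty output suggests no heartbeat support

If support is missing, the usual ways out are installing a distribution build that includes it, or rebuilding from source with the heartbeat stack enabled (./configure --with-heartbeat).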
Re: [Pacemaker] 2 node cluster questions
Hi Mark,

Thanks for your help! Indeed, you get a race condition... that's why an external quorum daemon (such as the one supplied with HP's ServiceGuard) would be nice. It looks like this is what Linux-HA is heading for (http://theclusterguy.clusterlabs.org/post/907043024/introducing-the-pacemaker-master-control-process-for, setup number 4), but it's not there yet.

The only way to do it with SBD is to avoid automatic startup of a cluster node (e.g. disable the corosync and pacemaker init scripts), which avoids fencing the other node after being fenced. Or to use SBD on iSCSI storage... if you have no network connection, you cannot fence, and SBD will make the watchdog time out, so the node which lost network connectivity is reset. If network connectivity is still missing at reboot, it cannot fence the other node... otherwise it'll join the cluster. Any flaw here? That is more or less the same as with a quorum server... if it cannot reach the quorum server, the node can start up but should not fence the other (which has quorum).

Anyway, I must admit: a quorum server or a third node seems safer by design.

Rgds,
Dirk

From: mark - pacemaker list [mailto:m+pacema...@nerdish.us]
Sent: Friday, November 25, 2011 8:27 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] 2 node cluster questions

Hi Dirk,

On Fri, Nov 25, 2011 at 6:05 AM, Hellemans Dirk D wrote:

Hello everyone,

I've been reading a lot lately about using Corosync/OpenAIS in combination with Pacemaker: SuSE Linux documentation, the Pacemaker & Linux-HA websites, interesting blogs, mailing lists, etc. As I'm particularly interested in how well two-node clusters (located within the same server room) are handled, I was a bit confused by the fact that quorum disks/quorum servers are (not yet?) supported/used. Some suggested adding a third node which does not actively participate (e.g. only running corosync, or with heartbeat but in standby mode). That might be a solution but doesn't "feel" right, especially if you consider multiple two-node clusters... that would require a lot of extra "quorum only" nodes. Somehow SBD (storage based death) in combination with a hardware watchdog timer also seemed to provide a solution: run it on top of iSCSI storage and you end up with a fencing device and some sort of "network based quorum" as tiebreaker. If one node loses network connectivity, sbd + watchdog will make sure it's being fenced.

I'd love to hear your ideas about 2 node cluster setups. What is the best way to do it? Any chance we'll get quorum disks/quorum servers in the (near) future?

Our experience with a two-node SBD-based cluster wasn't good. After setup, we started on failure scenarios. The first test was to drop network connectivity for one of the nodes while both could still access storage. The nodes fenced each other (sort of like the STONITH deathmatch you can read about), killing all services and leaving us waiting for both nodes to boot back up. Obviously a complete failure of testing; we didn't even proceed with further checks. We took a standard PC and built it out as a third node, giving the cluster true quorum, and now it's rock-solid and absolutely correct in every failure scenario we throw at it.

For production use, the very real possibility of two nodes killing each other just wasn't worth the risk to us. If you go with two nodes and SBD, do a lot of testing.
No matter how much you test, though, if they lose visibility to each other on the network but can both still see the storage, you've got a race where the node that *should* be fenced (the one that has its network cables disconnected) can fence the node that is still 100% healthy and actively serving clients. Maybe there's a way to configure around that; I'd be interested in hearing how, if so.

Regards,
Mark

In addition, say you're not using sbd but an IPMI-based fencing solution. You lose network connectivity on one of the nodes (I know, they're redundant, but still... sh*t happens ;) Does Pacemaker know which of the two nodes lost network connectivity? E.g.: node 1 runs an Oracle database, node 2 nothing. Node 2 loses network connectivity (e.g. both NICs without signal because unplugged by an errant technician ;) )... => a split-brain situation occurs, but which node will be fenced? The one with Oracle running?? I really hope not... because in this case, the cluster can "see" there's no signal on the NICs of node 2.

It would be interesting to know more about how Pacemaker/corosync makes such decisions... how does it choose which node will be fenced in case of split brain? Is it chosen randomly? Is it the DC which decides? Based on NIC state? I did some quick testing with 2 VMs and at first sight it looks like Pacemaker/corosync always fences the correct node, i.e. the node where I unplugged the "virtual" cable. I'm curious!

Thanks a lot!
Re: [Pacemaker] How to build Pacemaker with Cman support?
Could you show the output of:

pacemakerd --features

Please also make sure that you don't have a pcmk file in /corosync/service.d/

Cheers,
Nick.

2011/11/30 Богомолов Дмитрий Викторович :
> [...]
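A quick way to act on both suggestions (the directory is assumed to be the usual /etc/corosync/service.d/; adjust if your layout differs):

  pacemakerd --features          # "cman" should appear in the list if support was compiled in
  ls /etc/corosync/service.d/    # a pcmk file here loads the Pacemaker plugin, which conflicts with CMAN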
Re: [Pacemaker] lrmd hanging
Hi,

On Wed, Nov 30, 2011 at 11:16:40AM -0200, coredump wrote:
> [...]
> Anyone has any idea what would make lrmd to just hang?

It's probably the support for the ubuntu-specific init system. That bug (in glib) has been fixed, but I don't know if there are fixed packages. Though cluster-glue apparently doesn't need to be updated, only glib. Best to open a bug report with ubuntu.

Thanks,

Dejan
Re: [Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
30.11.2011 15:51, Vadim Bulst wrote:
> @Vladislav
>
> Where and how can I set the switch for the cluster manager if it runs as
> a resource?

Ahm, I use my own RA for clvmd, and I don't remember if upstream has that possibility. Please find it attached. I can't say it is perfect - it is just a quick hack over an accidentally found one - but it does its job for me. Set 'avoid_lck' to 1.

Best,
Vladislav

> [...]
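Vladislav's RA itself is an attachment and not reproduced in the archive; the essential point is that whatever agent starts clvmd must pass the interface flag through. A hypothetical primitive illustrating the shape (the daemon_options parameter name is an assumption - check your RA's metadata for the real one):

  primitive clvm ocf:lvm2:clvmd \
      params daemon_timeout="30" daemon_options="-I corosync" \
      meta target-role="Started"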
Re: [Pacemaker] lrmd hanging
Hey Dejan, 2 questions:

1) My test environment was virtual, but it used the same versions as the server, and it worked.
2) Can you point me to a bug report about this glib bug?

On Wed, Nov 30, 2011 at 12:30, Dejan Muhamedagic wrote:
> [...]
Re: [Pacemaker] lrmd hanging
On 30.11.2011 14:16, coredump wrote:
> So last night I was supposed to get a cluster running, everything
> worked ok on a virtual environment using the same software and by my
> experience I only had to install pacemaker and corosync (from the
> ubuntu 10.04 ppa) and get it rolling. What really happened was: I

Which PPA are you using? This one:

https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa/

has everything you need for 10.04, including the glib and RHCS fixes.
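Adding that PPA on 10.04 looks roughly like this (a sketch; the fixed glib comes in as a dependency of the upgraded packages):

  sudo apt-get install python-software-properties   # provides add-apt-repository
  sudo add-apt-repository ppa:ubuntu-ha-maintainers/ppa
  sudo apt-get update
  sudo apt-get install pacemaker corosync cluster-glue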
Re: [Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
On 30.11.2011 13:10, Vadim Bulst wrote:
> I created the directory "/var/run/lvm" now. It wasn't there - work for
> the package maintainer.

Hm... that directory is used for file-based locking; clvmd shouldn't be using that. Did you set up cluster locking in /etc/lvm/lvm.conf (locking_type)?
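Cluster locking for clvmd is normally enabled in /etc/lvm/lvm.conf along these lines (a sketch for LVM2 of that era; locking_type = 3 selects the built-in clustered locking that goes through clvmd):

  global {
      locking_type = 3               # 1 = local file-based, 2 = external library, 3 = built-in clustered (clvmd)
      fallback_to_local_locking = 0  # fail rather than silently fall back to local locking
  }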
Re: [Pacemaker] colocation issue with master-slave resources
Sent: Mon Nov 28 2011 16:10:01 GMT-0700 (MST)
From: Patrick H.
To: The Pacemaker cluster resource manager, Andreas Kurz
Subject: Re: [Pacemaker] colocation issue with master-slave resources

Sent: Mon Nov 28 2011 15:27:10 GMT-0700 (MST)
From: Andrew Beekhof
To: The Pacemaker cluster resource manager, Andreas Kurz
Subject: Re: [Pacemaker] colocation issue with master-slave resources

Perhaps try an ordering constraint; I may have also fixed something in this area for 1.1.6, so an upgrade might also help.

On Tue, Nov 29, 2011 at 1:38 AM, Patrick H. wrote:

Sent: Mon Nov 28 2011 01:31:22 GMT-0700 (MST)
From: Andreas Kurz
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] colocation issue with master-slave resources

On 11/28/2011 04:51 AM, Patrick H. wrote:

I'm trying to set up a colocation rule so that a couple of master-slave resources can't be master unless another resource is running on the same node, and I am getting the exact opposite of what I want. The master-slave resources are getting promoted to master on the node which this other resource isn't running on. In the example below, 'stateful1:Master' and 'stateful2:Master' should be on the same node 'dummy' is on. It works just fine if I change the colocation around so that 'dummy' depends on the stateful resources being master, but I don't want that. I want dummy to be able to run no matter what, but the stateful resources not to be able to become master without dummy.

# crm status
Last updated: Mon Nov 28 03:47:04 2011
Stack: cman
Current DC: devlvs03 - partition with quorum
Version: 1.1.5-5.el6-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
6 Resources configured.

Online: [ devlvs04 devlvs03 ]

dummy (ocf::pacemaker:Dummy): Started devlvs03
Master/Slave Set: stateful1-ms [stateful1]
    Masters: [ devlvs04 ]
    Slaves: [ devlvs03 ]
Master/Slave Set: stateful2-ms [stateful2]
    Masters: [ devlvs04 ]
    Slaves: [ devlvs03 ]

# crm configure show
node devlvs03 \
    attributes standby="off"
node devlvs04 \
    attributes standby="off"
primitive dummy ocf:pacemaker:Dummy \
    meta target-role="Started"
primitive stateful1 ocf:pacemaker:Stateful
primitive stateful2 ocf:pacemaker:Stateful
ms stateful1-ms stateful1
ms stateful2-ms stateful2
colocation stateful1-colocation inf: stateful1-ms:Master dummy
colocation stateful2-colocation inf: stateful2-ms:Master dummy

use dummy:Started ... the default is to use the same role as the left resource, and Dummy will never be in role Master ...

Regards,
Andreas

Tried that too (just not in the configuration at the time I sent the email), no effect. Upgraded to 1.1.6 and put in an ordering constraint; still no joy.

# crm status
Last updated: Mon Nov 28 23:09:37 2011
Last change: Mon Nov 28 23:08:34 2011 via cibadmin on devlvs03
Stack: cman
Current DC: devlvs03 - partition with quorum
Version: 1.1.6-1.el6-b379478e0a66af52708f56d0302f50b6f13322bd
2 Nodes configured, 2 expected votes
5 Resources configured.
Online: [ devlvs04 devlvs03 ]

dummy (ocf::pacemaker:Dummy): Started devlvs03
Master/Slave Set: stateful1-ms [stateful1]
    Masters: [ devlvs04 ]
    Slaves: [ devlvs03 ]
Master/Slave Set: stateful2-ms [stateful2]
    Masters: [ devlvs04 ]
    Slaves: [ devlvs03 ]

# crm configure show
node devlvs03 \
    attributes standby="off"
node devlvs04 \
    attributes standby="off"
primitive dummy ocf:pacemaker:Dummy \
    meta target-role="Started"
primitive stateful1 ocf:pacemaker:Stateful
primitive stateful2 ocf:pacemaker:Stateful
ms stateful1-ms stateful1
ms stateful2-ms stateful2
colocation stateful1-colocation inf: stateful1-ms:Master dummy:Started
colocation stateful2-colocation inf: stateful2-ms:Master dummy:Started
order stateful1-start inf: dummy:start stateful1-ms:promote
order stateful2-start inf: dummy:start stateful2-ms:promote
property $id="cib-bootstrap-options" \
    dc-version="1.1.6-1.el6-b379478e0a66af52708f56d0302f50b6f13322bd" \
    cluster-infrastructure="cman" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1322450542"

Well, there is a really ugly workaround that solves this: if I convert 'dummy' to a master-slave resource and just have the slave do nothing, it does obey the colocation rule when I tell it to keep the Master roles on the same box.
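A sketch of that workaround in crm syntax (resource names follow the configuration above; using ocf:pacemaker:Stateful for the dummy is an assumption - any agent whose slave role is a no-op would do):

  primitive dummy ocf:pacemaker:Stateful
  ms dummy-ms dummy \
      meta master-max="1" clone-max="2"
  colocation stateful1-colocation inf: stateful1-ms:Master dummy-ms:Master
  colocation stateful2-colocation inf: stateful2-ms:Master dummy-ms:Master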
Re: [Pacemaker] CLVM & Pacemaker & Corosync on Ubuntu Oneiric Server
30.11.2011 19:27, Ante Karamatic wrote:
> On 30.11.2011 13:10, Vadim Bulst wrote:
>
>> I created the directory "/var/run/lvm" now. It wasn't there - work for
>> the package maintainer.
>
> Hm... that directory is used for file-based locking; clvmd shouldn't be
> using that. Did you set up cluster locking in /etc/lvm/lvm.conf
> (locking_type)?

bind(3, {sa_family=AF_FILE, path="/var/run/lvm/clvmd.sock"}, 110)

It tries to create a Unix socket there.
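So, independent of the lvm.conf locking type, the directory has to exist before clvmd can bind its control socket; a quick check (sketch):

  test -d /var/run/lvm || sudo mkdir -p /var/run/lvm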
Re: [Pacemaker] lrmd hanging
It turned out to be the libglib bug; fixed with the packages from the PPA. Thanks!

On Wed, Nov 30, 2011 at 14:13, Ante Karamatic wrote:
> [...]
[Pacemaker] [PATCH]Build error of pacemaker-1.0.12
Hi,

I ran into the following errors building pacemaker-1.0.12:

--
cc1: warnings being treated as errors
remote.c: In function 'create_tls_session':
remote.c:85: warning: passing argument 1 of 'gnutls_dh_set_prime_bits' from incompatible pointer type
gmake[2]: *** [remote.lo] Error 1
--

I am sending a patch for the error mentioned above.

Regards,
Tomo

[Attachment: remote.c.patch]
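The attached patch is binary in the archive and not reproduced here. Given the warning, the fix is almost certainly a dereference where a pointer to the session was passed instead of the session itself; a sketch of that shape (the bit count is illustrative, not taken from the patch):

  /* remote.c, create_tls_session(): the local session variable is a
   * pointer, but gnutls_dh_set_prime_bits() takes the session by value,
   * so dereference it:
   */
  gnutls_dh_set_prime_bits(*session, 1024);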
Re: [Pacemaker] How to build Pacemaker with Cman support?
I solved the problem: I needed to install libfence-dev. After that, the ./configure script includes the needed directives in the Makefile. Now I am experimenting with the active/active configuration.

Thanks, anyway, for your attention and advice.

30 November 2011, 13:37, from Богомолов Дмитрий Викторович :
> [...]