Re: [Pacemaker] [Openais] Linux HA on debian sparc
I recompiled my kernel without hugetlb, and the result is the same. My test program still prints:

PATH=/dev/shm/teste123XX
page size=2
fd=3
ADDR_ORIG:0xe000a000 ADDR:0x
Erro

And Pacemaker still fails because of the mmap error:

Could not initialize Cluster Configuration Database API instance error 2

To make sure that I have disabled hugetlb, here is my /proc/meminfo:

MemTotal:       33093488 kB
MemFree:        32855616 kB
Buffers:            5600 kB
Cached:            53480 kB
SwapCached:            0 kB
Active:            45768 kB
Inactive:          28104 kB
Active(anon):      18024 kB
Inactive(anon):     1560 kB
Active(file):      27744 kB
Inactive(file):    26544 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6104680 kB
SwapFree:        6104680 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         14936 kB
Mapped:             7736 kB
Shmem:              4624 kB
Slab:              39184 kB
SReclaimable:      10088 kB
SUnreclaim:        29096 kB
KernelStack:        7088 kB
PageTables:         1160 kB
Quicklists:        17664 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    22651424 kB
Committed_AS:     519368 kB
VmallocTotal: 1069547520 kB
VmallocUsed:       11064 kB
VmallocChunk: 1069529616 kB

2011/6/1 Steven Dake sd...@redhat.com:

On 06/01/2011 07:42 AM, william felipe_welter wrote:

Steven,

cat /proc/meminfo
...
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB
...

It definitely requires a kernel compile and setting the config option to off. I don't know the Debian way of doing this. The only reason you may need this option is if you have very large memory sizes, such as 48GB or more.

Regards
-steve

It's 4MB. How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the kernel at boot?)

2011/6/1 Steven Dake sd...@redhat.com mailto:sd...@redhat.com

On 06/01/2011 01:05 AM, Steven Dake wrote:
On 05/31/2011 09:44 PM, Angus Salkeld wrote:
On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:

Angus,

I made a test program (based on the code in coreipcc.c) and now I am sure there is a problem with the mmap system call on sparc.
Source code of my test program:

#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <stdio.h>

#define PATH_MAX 36

int main()
{
    int32_t fd;
    void *addr_orig;
    void *addr;
    char path[PATH_MAX];
    const char *file = "teste123XX";
    size_t bytes = 10024;

    snprintf (path, PATH_MAX, "/dev/shm/%s", file);
    printf("PATH=%s\n", path);

    fd = mkstemp (path);
    printf("fd=%d \n", fd);

    addr_orig = mmap (NULL, bytes, PROT_NONE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_SHARED, fd, 0);

    printf("ADDR_ORIG:%p ADDR:%p\n", addr_orig, addr);
    if (addr != addr_orig) {
        printf("Erro");
    }
}

Results on x86:
PATH=/dev/shm/teste123XX
fd=3
ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000

Results on sparc:
PATH=/dev/shm/teste123XX
fd=3
ADDR_ORIG:0xf7f72000 ADDR:0x

Note: 0x == MAP_FAILED (from man mmap):

RETURN VALUE
On success, mmap() returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately.

But I'm wondering: is it really necessary to call mmap twice? What is the reason for calling mmap twice, the second time using the address from the first?

Well, there are 3 calls to mmap():
1) one to allocate 2 * what you need (in pages)
2) maps the first half of the mem to a real file
3) maps the second half of the mem to the same file

The point is that when you write to an address past the end of the first half of memory, it is taken care of by the third mmap, which maps the address back to the top of the file for you. This means you don't have to worry about ring-buffer wrapping, which can be a headache.

-Angus

Interesting - this mmap operation doesn't work on sparc linux. Not sure how I can help here - the next step would be a follow-up with the sparc linux mailing list. I'll do that and cc you on the message - see if we get any response.

http://vger.kernel.org/vger-lists.html

2011/5/31 Angus Salkeld asalk...@redhat.com
Re: [Pacemaker] A question and demand to a resource placement strategy function
On 06/01/11 18:51, Yuusuke IIDA wrote:

Hi, Yan

Sorry for my slow reply.

(2011/05/13 15:06), Gao,Yan wrote:
I understand that you think the improvement for the non-default placement strategy makes sense for the default too. Though the default is somewhat intended not to be affected by any placement strategy, so that the behaviors of existing pengine test cases and users' deployments remain unchanged.

I think the function that balances resources by the number of started resources has a problem with the default setting. The Pacemaker-1.0 series shows the same behavior. If this correction settles it, I think the correction could also be applied to Pacemaker-1.0. Shouldn't this problem be fixed?

This would affect dozens of existing regression tests, although most of the changes are just the scores of clone instances, which are due to different resource allocation orders. Given 1.0 is in such a maintenance state, I'm not sure we should do that for 1.0. Andrew, what do you think about it? Perhaps we should fix the resource-number balancing for the default strategy in 1.1 at least? For the utilization strategy, load balancing is still done based on the number of resources allocated to a node. That might be a choice.

When I do not set capacity via the utilization settings in Pacemaker-1.1, the expected behavior is possible!

Best Regards,
Yuusuke IIDA

Regards,
Yan
--
Gao,Yan y...@novell.com
Software Engineer
China Server Team, SUSE.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] Make resources more important than others
Hi list,

I have a conceptual question about making Pacemaker treat some resources as a lot more important than others. I'm afraid the answer is going to come back "sorry, you can't", but I just want to confirm.

I have a simple 2-node cluster that runs two services (MySQL and a Java daemon) on top of network-attached storage that can float between both nodes. The CRM config looks a bit like:

node node1
node node2
primitive mysql lsb:mysql
primitive java lsb:java
primitive storage lsb:storage
colocation mysql_with_storage inf: mysql storage
colocation java_with_storage inf: java storage
order mysql_after_storage: storage:start mysql:start symmetrical=true
order java_after_storage: storage:start java:start symmetrical=true

I've got the interesting requirement that mysql is vastly more important than the java resource. The java has to run on the same server as mysql. Also, if mysql is stopped/unmanaged, the java should still be running on what would be the correct node for mysql. I've mostly achieved this by colocating the java and mysql with the underlying storage.

It gets tricky when we start simulating hard failures. If we simulate a hard error with mysql, the storage moves to another node, then mysql and java, and everything's great. What I don't want is the opposite to occur. I don't want any hard error with java to make the mysql move, as I consider the mysql more important than java, and I don't want to outage mysql if the java has a problem. I'd love for the mysql to stay running where it is and have an administrator come along and clean up what's wrong with the java.

I guess I want to somehow describe the java resource as not as important as mysql, and this is what I don't think is possible. I know I'm talking about very edge cases by simulating hard errors (eg: monitor return code 5), but it'd be nice to achieve.

Any thoughts?
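One technique that is sometimes suggested for "this resource matters less" setups (this is not from the thread; the score and constraint ID below are illustrative) is to give the less important resource a finite colocation score instead of inf:. An inf: constraint is mandatory, while a numeric score is only a preference that the policy engine can trade off against other scores, so a failing java would push less hard against mysql's placement:

```
# Hypothetical sketch in crm shell syntax: mysql is mandatorily tied to
# the storage, while java only strongly prefers to run alongside mysql.
colocation mysql_with_storage inf: mysql storage
colocation java_with_mysql 1000: java mysql
```

Whether this gives exactly the desired behavior for hard failures would need testing; migration-threshold and failure-timeout on the java resource also affect when the cluster tries to move things.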
Thanks in advance,
-Luke

--
Luke Bigum
Information Systems
luke.bi...@lmax.com | http://www.lmax.com
LMAX, Yellow Building, 1A Nicholas Road, London W11 4AN
[Pacemaker] which version of pacemaker prefer to use
Hello,

I have one question: which Pacemaker version is preferred, 1.1 or 1.0? 1.0 is marked as stable, but all documentation resources refer to version 1.1. I'm a little bit confused.
[Pacemaker] Could not connect to the CIB: Remote node did not respond
Hello,

After a network problem I can't edit my CIB configuration. Cluster with 2 nodes.

On one node:

crm(live)configure# commit
Could not connect to the CIB: Remote node did not respond
ERROR: creating tmp shadow __crmshell.19381 failed

cibadmin -Q
Call cib_query failed (-41): Remote node did not respond
null

cibadmin -Ql works. All cluster resources work fine. crm_mon works on both nodes and shows a Current DC. How do I fix this?

--
Viacheslav Biriukov
BR
http://biriukov.com
[Pacemaker] multiple cib in one pacemaker installation
Hello,

Is it possible to have multiple CIBs in one cluster installation? Let me describe what I want. For example, we have multiple independent resources (by a resource I mean here an IP address, a file system, plus a service like apache, nginx, etc.). As they are all placed in one CIB, the configuration (crm configure show) is very big and difficult to read (a lot of text). I think that if we could separate these independent resources into multiple CIBs, reading and editing the configuration would become much more convenient (something like a shadow CIB, but live, not a sandbox). If this is not implemented, it would be great if it could be added in future versions.
Re: [Pacemaker] Failover when storage fails
Just to update the list with the outcome of this issue: it's resolved in Pacemaker 1.1.5.

Cheers,
Max

-Original Message-
From: Max Williams [mailto:max.willi...@betfair.com]
Sent: 13 May 2011 09:55
To: The Pacemaker cluster resource manager (pacemaker@oss.clusterlabs.org)
Subject: Re: [Pacemaker] Failover when storage fails

Well, this is not what I am seeing here. Perhaps a bug? I also tried adding "op stop interval=0 timeout=10" to the LVM resources, but still, when the storage disappears, the cluster just stops where it is and those log entries (below) just get printed in a loop.

Cheers,
Max

-Original Message-
From: Tim Serong [mailto:tser...@novell.com]
Sent: 13 May 2011 04:22
To: The Pacemaker cluster resource manager (pacemaker@oss.clusterlabs.org)
Subject: Re: [Pacemaker] Failover when storage fails

On 5/12/2011 at 02:28 AM, Max Williams max.willi...@betfair.com wrote:
After further testing, even with stonith enabled the cluster still gets stuck in this state, presumably waiting on IO. I can get around it by setting on-fail=fence on the LVM resources, but shouldn't Pacemaker be smart enough to realise the host is effectively offline?

If you've got STONITH enabled, nodes should just get fenced when this occurs, without your having to specify on-fail=fence for the monitor op. What *should* happen is: the monitor fails or times out, then Pacemaker will try to stop the resource. If the stop also fails or times out, the node will be fenced. See:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-operations.html

Also, http://ourobengr.com/ha#causes is relevant here.

Regards,
Tim

Or am I missing some timeout value that would fix this situation?
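Tim's escalation path (failed monitor, then a stop attempt, then fencing if the stop also fails) assumes fencing is configured and enabled. A minimal hedged sketch of the relevant pieces in crm shell syntax (the resource name and timeouts here are illustrative, not taken from Max's config):

```
primitive my_lvm ocf:heartbeat:LVM \
        params volgrpname=VolGroup00 exclusive=yes \
        op monitor interval=10 timeout=10 \
        op stop interval=0 timeout=30
property stonith-enabled=true
```

With stonith-enabled=true, a stop failure or timeout escalates to fencing by default; setting on-fail=fence on the monitor op, as Max did, short-circuits this by fencing as soon as the monitor fails, without first attempting a stop.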
pacemaker-1.1.2-7.el6.x86_64
corosync-1.2.3-21.el6.x86_64
RHEL 6.0

Config:

node host001.domain \
        attributes standby=off
node host002.domain \
        attributes standby=off
primitive MyApp_IP ocf:heartbeat:IPaddr \
        params ip=192.168.104.26 \
        op monitor interval=10s
primitive MyApp_fs_graph ocf:heartbeat:Filesystem \
        params device=/dev/VolGroupB00/AppLV2 directory=/naab1 fstype=ext4 \
        op monitor interval=10 timeout=10
primitive MyApp_fs_landing ocf:heartbeat:Filesystem \
        params device=/dev/VolGroupB01/AppLV1 directory=/naab2 fstype=ext4 \
        op monitor interval=10 timeout=10
primitive MyApp_lvm_graph ocf:heartbeat:LVM \
        params volgrpname=VolGroupB00 exclusive=yes \
        op monitor interval=10 timeout=10 on-fail=fence depth=0
primitive MyApp_lvm_landing ocf:heartbeat:LVM \
        params volgrpname=VolGroupB01 exclusive=yes \
        op monitor interval=10 timeout=10 on-fail=fence depth=0
primitive MyApp_scsi_reservation ocf:heartbeat:sg_persist \
        params sg_persist_resource=scsi_reservation0 devs="/dev/dm-6 /dev/dm-7" required_devs_nof=2 reservation_type=1
primitive MyApp_init_script lsb:AppInitScript \
        op monitor interval=10 timeout=10
primitive fence_host001.domain stonith:fence_ipmilan \
        params ipaddr=192.168.16.148 passwd=password login=root pcmk_host_list=host001.domain pcmk_host_check=static-list \
        meta target-role=Started
primitive fence_host002.domain stonith:fence_ipmilan \
        params ipaddr=192.168.16.149 passwd=password login=root pcmk_host_list=host002.domain pcmk_host_check=static-list \
        meta target-role=Started
group MyApp_group MyApp_lvm_graph MyApp_lvm_landing MyApp_fs_graph MyApp_fs_landing MyApp_IP MyApp_init_script \
        meta target-role=Started migration-threshold=2 on-fail=restart failure-timeout=300s
ms ms_MyApp_scsi_reservation MyApp_scsi_reservation \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation MyApp_group_on_scsi_reservation inf: MyApp_group ms_MyApp_scsi_reservation:Master
order MyApp_group_after_scsi_reservation inf: ms_MyApp_scsi_reservation:promote MyApp_group:start
property $id=cib-bootstrap-options \
        dc-version=1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        no-quorum-policy=ignore \
        stonith-enabled=true \
        last-lrm-refresh=1305129673
rsc_defaults $id=rsc-options \
        resource-stickiness=1

From: Max Williams [mailto:max.willi...@betfair.com]
Sent: 11 May 2011 13:55
To: The Pacemaker cluster resource manager (pacemaker@oss.clusterlabs.org)
Subject: [Pacemaker] Failover when storage fails

Hi,
I want to configure pacemaker to failover a group of resources and sg_persist (master/slave) when there is a problem with the storage, but when I cause the iSCSI LUN to disappear, simulating a failure, the cluster always gets stuck in this state:

Last updated: Wed May 11 10:52:43 2011
Stack: openais
Current DC: host001.domain - partition with quorum
Re: [Pacemaker] [Openais] Linux HA on debian sparc
On 06/01/2011 11:05 PM, william felipe_welter wrote:

I recompiled my kernel without hugetlb, and the result is the same. My test program still prints:

PATH=/dev/shm/teste123XX
page size=2
fd=3
ADDR_ORIG:0xe000a000 ADDR:0x
Erro

And Pacemaker still fails because of the mmap error:

Could not initialize Cluster Configuration Database API instance error 2

Give the patch I posted recently a spin - corosync WFM with this patch on sparc64 with hugetlb set. Please report back results.

Regards
-steve

To make sure that I have disabled hugetlb, here is my /proc/meminfo:

MemTotal:       33093488 kB
MemFree:        32855616 kB
Buffers:            5600 kB
Cached:            53480 kB
SwapCached:            0 kB
Active:            45768 kB
Inactive:          28104 kB
Active(anon):      18024 kB
Inactive(anon):     1560 kB
Active(file):      27744 kB
Inactive(file):    26544 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6104680 kB
SwapFree:        6104680 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         14936 kB
Mapped:             7736 kB
Shmem:              4624 kB
Slab:              39184 kB
SReclaimable:      10088 kB
SUnreclaim:        29096 kB
KernelStack:        7088 kB
PageTables:         1160 kB
Quicklists:        17664 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    22651424 kB
Committed_AS:     519368 kB
VmallocTotal: 1069547520 kB
VmallocUsed:       11064 kB
VmallocChunk: 1069529616 kB

2011/6/1 Steven Dake sd...@redhat.com:

On 06/01/2011 07:42 AM, william felipe_welter wrote:

Steven,

cat /proc/meminfo
...
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 4096 kB
...

It definitely requires a kernel compile and setting the config option to off. I don't know the Debian way of doing this. The only reason you may need this option is if you have very large memory sizes, such as 48GB or more.

Regards
-steve

It's 4MB. How can I disable hugetlb? (By passing CONFIG_HUGETLBFS=n to the kernel at boot?)
2011/6/1 Steven Dake sd...@redhat.com mailto:sd...@redhat.com

On 06/01/2011 01:05 AM, Steven Dake wrote:
On 05/31/2011 09:44 PM, Angus Salkeld wrote:
On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:

Angus,

I made a test program (based on the code in coreipcc.c) and now I am sure there is a problem with the mmap system call on sparc.

Source code of my test program:

#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <stdio.h>

#define PATH_MAX 36

int main()
{
    int32_t fd;
    void *addr_orig;
    void *addr;
    char path[PATH_MAX];
    const char *file = "teste123XX";
    size_t bytes = 10024;

    snprintf (path, PATH_MAX, "/dev/shm/%s", file);
    printf("PATH=%s\n", path);

    fd = mkstemp (path);
    printf("fd=%d \n", fd);

    addr_orig = mmap (NULL, bytes, PROT_NONE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
                 MAP_FIXED | MAP_SHARED, fd, 0);

    printf("ADDR_ORIG:%p ADDR:%p\n", addr_orig, addr);
    if (addr != addr_orig) {
        printf("Erro");
    }
}

Results on x86:
PATH=/dev/shm/teste123XX
fd=3
ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000

Results on sparc:
PATH=/dev/shm/teste123XX
fd=3
ADDR_ORIG:0xf7f72000 ADDR:0x

Note: 0x == MAP_FAILED (from man mmap):

RETURN VALUE
On success, mmap() returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set appropriately.

But I'm wondering: is it really necessary to call mmap twice? What is the reason for calling mmap twice, the second time using the address from the first?

Well, there are 3 calls to mmap():
1) one to allocate 2 * what you need (in pages)
2) maps the first half of the mem to a real file
3) maps the second half of the mem to the same file

The point is that when you write to an address past the end of the first half of memory, it is taken care of by the third mmap, which maps the address back to the top of the file for you. This means you don't have to worry about ring-buffer wrapping, which can be a headache.
-Angus

Interesting - this mmap operation doesn't work on sparc linux. Not sure how I can help here - the next step would be a follow-up with the sparc linux
Re: [Pacemaker] [Openais] Linux HA on debian sparc
Well, now with this patch the pacemakerd process starts and brings up its other processes (crmd, lrmd, pengine), but after the pacemakerd process does a fork, the forked pacemakerd process dies due to signal 10, Bus error. And in the log, the Pacemaker processes (crmd, lrmd, pengine) can't connect to the openais plugin (possibly because of the death of the pacemakerd process). But this time, when the forked pacemakerd dies, it generates a core dump.

gdb -c /usr/var/lib/heartbeat/cores/root/ pacemakerd 7986 -se /usr/sbin/pacemakerd :

GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as sparc-linux-gnu.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/sbin/pacemakerd...done.
Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libuuid.so.1
Reading symbols from /usr/lib/libcoroipcc.so.4...done.
Loaded symbols for /usr/lib/libcoroipcc.so.4
Reading symbols from /usr/lib/libcpg.so.4...done.
Loaded symbols for /usr/lib/libcpg.so.4
Reading symbols from /usr/lib/libquorum.so.4...done.
Loaded symbols for /usr/lib/libquorum.so.4
Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
Loaded symbols for /usr/lib64/libcrmcommon.so.2
Reading symbols from /usr/lib/libcfg.so.4...done.
Loaded symbols for /usr/lib/libcfg.so.4
Reading symbols from /usr/lib/libconfdb.so.4...done.
Loaded symbols for /usr/lib/libconfdb.so.4
Reading symbols from /usr/lib64/libplumb.so.2...done.
Loaded symbols for /usr/lib64/libplumb.so.2
Reading symbols from /usr/lib64/libpils.so.2...done.
Loaded symbols for /usr/lib64/libpils.so.2
Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libbz2.so.1.0
Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libxslt.so.1
Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libglib-2.0.so.0
Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libltdl.so.7
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libpcre.so.3
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2

Core was generated by `pacemakerd'.
Program terminated with signal 10, Bus error.
#0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
339             switch (dispatch_data->id) {
(gdb) bt
#0  cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
#1  0xf6f100f0 in ?? ()
#2  0xf6f100f4 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I took a look at cpg.c and saw that dispatch_data was acquired by the coroipcc_dispatch_get() function (defined in lib/coroipcc.c):

do {
        error = coroipcc_dispatch_get (
                cpg_inst->handle,
                (void **)&dispatch_data,
                timeout);

Resumed log:
...
Jun 02 23:12:20 corosync [CPG   ] got mcast request on 0x62500
Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10 to pending