Re: [Pacemaker] help deciphering output
I have seen this behaviour on several virtualised environments: when the VM backup starts, the VM actually freezes for a (short?) period of time. I guess it then stops responding to the other cluster nodes, triggering an unexpected failover and/or fencing. I see this kind of behaviour on a VMware environment using Veeam backup, as well as on Proxmox (I don't know what backup tool).

That's actually an interesting topic I never thought about raising here. How can we avoid that? Increasing timeouts? I am afraid we would have to reach unacceptably high timeout values, and I am not even sure that would fix the problem. I think not every VM snapshot strategy would trigger this problem. Do you guys have any feedback on which backup/snapshot method best suits corosync clusters?

Regards

On 9 Oct 2014 01:24, Alex Samad - Yieldbroker alex.sa...@yieldbroker.com wrote:

One of my nodes died in a 2-node cluster. I gather something went wrong and it fenced/killed itself, but I am not sure what happened. I think the VM backups may have run around that time, and a snapshot of the VM could have happened, but there is nothing for me to put my finger on. Output from messages around that time; this is on devrp1:

Oct 8 23:31:38 devrp1 corosync[1670]: [TOTEM ] A processor failed, forming new configuration.
Oct 8 23:31:40 devrp1 corosync[1670]: [CMAN ] quorum lost, blocking activity
Oct 8 23:31:40 devrp1 corosync[1670]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Oct 8 23:31:40 devrp1 corosync[1670]: [QUORUM] Members[1]: 1
Oct 8 23:31:40 devrp1 corosync[1670]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 8 23:31:40 devrp1 corosync[1670]: [CPG ] chosen downlist: sender r(0) ip(10.172.214.51) ; members(old:2 left:1)
Oct 8 23:31:40 devrp1 corosync[1670]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 8 23:31:41 devrp1 kernel: dlm: closing connection to node 2
Oct 8 23:31:42 devrp1 crmd[2350]: notice: cman_event_callback: Membership 424: quorum lost
Oct 8 23:31:42 devrp1 corosync[1670]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 8 23:31:42 devrp1 corosync[1670]: [CMAN ] quorum regained, resuming activity
Oct 8 23:31:42 devrp1 corosync[1670]: [QUORUM] This node is within the primary component and will provide service.
Oct 8 23:31:42 devrp1 corosync[1670]: [QUORUM] Members[2]: 1 2
Oct 8 23:31:42 devrp1 corosync[1670]: [QUORUM] Members[2]: 1 2
Oct 8 23:31:42 devrp1 corosync[1670]: [CPG ] chosen downlist: sender r(0) ip(10.172.214.51) ; members(old:1 left:0)
Oct 8 23:31:42 devrp1 corosync[1670]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 8 23:31:42 devrp1 crmd[2350]: notice: crm_update_peer_state: cman_event_callback: Node devrp2[2] - state is now lost (was member)
Oct 8 23:31:42 devrp1 crmd[2350]: warning: reap_dead_nodes: Our DC node (devrp2) left the cluster
Oct 8 23:31:42 devrp1 crmd[2350]: notice: cman_event_callback: Membership 428: quorum acquired
Oct 8 23:31:42 devrp1 crmd[2350]: notice: crm_update_peer_state: cman_event_callback: Node devrp2[2] - state is now member (was lost)
Oct 8 23:31:42 devrp1 crmd[2350]: notice: do_state_transition: State transition S_NOT_DC - S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Oct 8 23:31:42 devrp1 corosync[1670]: cman killed by node 2 because we were killed by cman_tool or other application
Oct 8 23:31:42 devrp1 pacemakerd[2339]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:42 devrp1 stonith-ng[2346]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:42 devrp1 crmd[2350]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:42 devrp1 crmd[2350]: error: crmd_cs_destroy: connection terminated
Oct 8 23:31:43 devrp1 fenced[1726]: cluster is down, exiting
Oct 8 23:31:43 devrp1 fenced[1726]: daemon cpg_dispatch error 2
Oct 8 23:31:43 devrp1 attrd[2348]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:43 devrp1 attrd[2348]: crit: attrd_cs_destroy: Lost connection to Corosync service!
Oct 8 23:31:43 devrp1 attrd[2348]: notice: main: Exiting...
Oct 8 23:31:43 devrp1 attrd[2348]: notice: main: Disconnecting client 0x18cf240, pid=2350...
Oct 8 23:31:43 devrp1 pacemakerd[2339]: error: mcp_cpg_destroy: Connection destroyed
Oct 8 23:31:43 devrp1 cib[2345]: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Oct 8 23:31:43 devrp1 cib[2345]: error: cib_cs_destroy: Corosync connection lost! Exiting.
Oct 8 23:31:43 devrp1 stonith-ng[2346]: error: stonith_peer_cs_destroy: Corosync connection terminated
Oct 8 23:31:43 devrp1 dlm_controld[1752]:
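If increasing timeouts is the route taken, the knob that matters for these logs is the totem token timeout: a backup-induced freeze shorter than the token timeout never forms a new membership, so the dlm and fencing are never involved. The logs above are from a CMAN-based stack, where totem parameters live in cluster.conf. A sketch with purely illustrative values (the cluster name is hypothetical, and 20 seconds is an example, not a recommendation):

```
<!-- cluster.conf fragment (CMAN stack); values illustrative only.
     token is how long corosync waits for the token before declaring
     a node failed and forming a new configuration. -->
<cluster name="devcluster" config_version="2">
  <totem token="20000"/>
  <!-- ... rest of the existing configuration ... -->
</cluster>
```

On a corosync 2.x stack the equivalent would be `token: 20000` inside the `totem { }` block of corosync.conf. Note this only prevents the membership blip; once a node has actually been marked failed, raising the token after the fact does not stop the dlm from wanting it fenced.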
[Pacemaker] Raid RA Changes to Enable ms configuration -- need some assistance plz.
Hi all. I was hoping to get some help with my configuration; I'm kinda stuck at the moment. I've made some changes to the Raid1 RA to implement a master/slave (ms) style configuration (you can see a diff here http://pastebin.com/Q2nbF6Rg against the RA in the github repo). I've modeled my ms implementation heavily on the SCST RA from the ESOS project (thanks Marc!). Here is my full pacemaker config: http://pastebin.com/jw6WTpZz. I'm running on CentOS 6.5 using the Pacemaker and CMAN packages available from the distro. I'll explain a little bit about what I'm doing, then tell you folks where I think I need a bit of assistance or a push in the right direction.

The Raid1 RA changes I made are because I want to assemble the same RAID1 members on two different hosts; the slave should assemble the array in readonly mode. Using the SCST RA, I can offer two paths to the same LUN from two distinct systems, one standby, one active. In order to keep things good and consistent, the master of the MD resource also needs to be master of the SCST resource (I'm using the example here: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-colocation.html). So if you can't already tell, this is YAA to create a highly available open storage system :)

My understanding is that in order for N to start, N+1 must already be running. So my configuration (to me) reads that the ms_md0 master resource must be started and running before the ms_scst1 resource will be started (as master), and these services will be forced onto the same node. Please correct me if my understanding is incorrect. When both nodes are up and running, the master roles are not split, so I *think* my configuration is being honored, which leads me to my next issue: in my modified RA, I'm not sure I understand how to promote/demote properly. For example, when I put a node on standby, the remaining node doesn't get promoted. I'm not sure why, so I'm asking the experts.
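For reference, the "master follows master" relationship described above can also be expressed without resource sets, using role-qualified colocation and ordering constraints. A crm shell sketch (the `ms` resource names match the post; the primitive names `p_md0` and `p_scst1` are hypothetical stand-ins for whatever the real config uses):

```
# "ms_scst1's master must run where ms_md0's master runs,
#  and md0 must be promoted before scst1 is promoted"
ms ms_md0 p_md0 meta master-max=1 clone-max=2 notify=true
ms ms_scst1 p_scst1 meta master-max=1 clone-max=2 notify=true
colocation col_scst_with_md inf: ms_scst1:Master ms_md0:Master
order ord_md_before_scst inf: ms_md0:promote ms_scst1:promote
```

On the promotion question: a common reason a surviving node is never promoted is that the RA does not advertise a master preference. A master/slave RA is generally expected to call `crm_master` (e.g. `crm_master -l reboot -v 100`) from its monitor/notify actions on nodes eligible for promotion, and `crm_master -D` when they are not; without a preference set, Pacemaker has no candidate to promote. Whether that is the issue here would need checking against the modified RA's diff.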
I'd really appreciate any feedback, advice, etc. you folks can give.

Thanks,
Errol Neal

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[Pacemaker] Time out issue while stopping resource in pacemaker
Hi All, I ran into a timeout issue while failing over from the master to the peer server; I have a 2-node setup with 2 resources. Though it has been working all along, this is the first time I have seen this issue. It fails with the following error: 'error: process_lrm_event: LRM operation resourceB_stop_0 (40) Timed Out (timeout=2ms)'. Here is the complete log snippet from pacemaker; I'd appreciate your help on this.

Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: Diff: +++ 0.3.1 4e9bfa03cf2fef61843c18e127044d81
Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: -- cib admin_epoch=0 epoch=2 num_updates=8 /
Oct 9 14:57:38 server1 crmd[373]: notice: do_state_transition: State transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: ++ instance_attributes id=nodes-server1
Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: ++ nvpair id=nodes-server1-standby name=standby value=true /
Oct 9 14:57:38 server1 cib[368]: notice: cib:diff: ++ /instance_attributes
Oct 9 14:57:38 server1 pengine[372]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 9 14:57:38 server1 pengine[372]: notice: LogActions: Move ClusterIP#011(Started server1 - 172.28.0.64)
Oct 9 14:57:38 server1 pengine[372]: notice: LogActions: Move resourceB#011(Started server1 - 172.28.0.64)
Oct 9 14:57:38 server1 pengine[372]: notice: process_pe_message: Calculated Transition 11: /var/lib/pacemaker/pengine/pe-input-1710.bz2
Oct 9 14:57:58 server1 lrmd[370]: warning: child_timeout_callback: resourceB_stop_0 process (PID 17327) timed out
Oct 9 14:57:58 server1 lrmd[370]: warning: operation_finished: resourceB_stop_0:17327 - timed out after 2ms
Oct 9 14:57:58 server1 lrmd[370]: notice: operation_finished: resourceB_stop_0:17327 [ % Total% Received % Xferd Average Speed TimeTime Time Current ]
Oct 9 14:57:58 server1 lrmd[370]: notice: operation_finished: resourceB_stop_0:17327 [ Dload Upload Total SpentLeft Speed ]
Oct 9 14:57:58 server1 lrmd[370]: notice: operation_finished: resourceB_stop_0:17327 [ #015 0 00 00 0 0 0 --:--:-- --:--:-- --:--:-- 0#015 0 00 00 0 0 0 --:--:-- 0:00:01 --:--:-- 0#015 0 00 00 0 0 0 --:--:-- 0:00:02 --:--:-- 0#015 0 00 00 0 0 0 --:--:-- 0:00:03 --:--:-- 0#015 0 00 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0#015 0 00 00 0 0 0 --:--:-- 0:00:05 -
Oct 9 14:57:58 server1 crmd[373]: error: process_lrm_event: LRM operation resourceB_stop_0 (40) Timed Out (timeout=2ms)
Oct 9 14:57:58 server1 crmd[373]: warning: status_from_rc: Action 10 (resourceB_stop_0) on server1 failed (target: 0 vs. rc: 1): Error
Oct 9 14:57:58 server1 crmd[373]: warning: update_failcount: Updating failcount for resourceB on server1 after failed stop: rc=1 (update=INFINITY, time=1412891878)
Oct 9 14:57:58 server1 attrd[371]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-resourceB (INFINITY)
Oct 9 14:57:58 server1 crmd[373]: warning: update_failcount: Updating failcount for resourceB on server1 after failed stop: rc=1 (update=INFINITY, time=1412891878)
Oct 9 14:57:58 server1 crmd[373]: notice: run_graph: Transition 11 (Complete=2, Pending=0, Fired=0, Skipped=9, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1710.bz2): Stopped
Oct 9 14:57:58 server1 attrd[371]: notice: attrd_perform_update: Sent update 11: fail-count-resourceB=INFINITY

Thanks
Lax
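Two details in this log are worth noting. First, `timeout=2ms` strongly suggests the stop operation's timeout was configured as a bare `2` that got interpreted as milliseconds rather than the seconds presumably intended. Second, the `% Total % Received % Xferd ...` lines captured by lrmd look like a curl progress meter from the resource agent's stdout, i.e. the stop action appears to shell out to curl and genuinely needs several seconds. A crm shell sketch of an unambiguous op definition (resource name from the log; the agent name is a hypothetical placeholder):

```
# give stop an explicit timeout with units so it cannot be
# parsed as 2 milliseconds; values are illustrative
primitive resourceB ocf:heartbeat:SomeAgent \
    op monitor interval=10s timeout=30s \
    op stop interval=0 timeout=60s
```

After a failed stop with fencing disabled, the INFINITY fail-count shown above will also pin the resource until it is cleared (e.g. `crm resource cleanup resourceB`).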
Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
Hi Andrew, Okay! I will test your patch and let you know the result. Many thanks! Hideo Yamauchi.

- Original Message -
From: Andrew Beekhof and...@beekhof.net
To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Date: 2014/10/10, Fri 10:47
Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.

Perfect! Can you try this:

diff --git a/lib/services/services.c b/lib/services/services.c
index 8590b56..cb0f0ae 100644
--- a/lib/services/services.c
+++ b/lib/services/services.c
@@ -417,6 +417,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
     free(id);
     if (op == NULL) {
+        op->opaque->repeat_timer = 0;
         return FALSE;
     }
@@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
     } else {
         if (op->opaque->repeat_timer) {
             g_source_remove(op->opaque->repeat_timer);
+            op->opaque->repeat_timer = 0;
         }
         recurring_action_timer(op);
         return TRUE;
@@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
     if (dup->pid != 0) {
         if (op->opaque->repeat_timer) {
             g_source_remove(op->opaque->repeat_timer);
+            op->opaque->repeat_timer = 0;
         }
         recurring_action_timer(dup);
     }

On 10 Oct 2014, at 12:16 pm, renayama19661...@ybb.ne.jp wrote:

Hi Andrew, Setting up gdb on the Ubuntu environment is not going well yet, so I cannot acquire a trace from lrmd. Please wait a little longer for this. But I made lrmd terminate abnormally when g_source_remove() in cancel_recurring_action() returned FALSE.
----
gboolean
cancel_recurring_action(svc_action_t * op)
{
    crm_info("Cancelling operation %s", op->id);

    if (recurring_actions) {
        g_hash_table_remove(recurring_actions, op->id);
    }

    if (op->opaque->repeat_timer) {
        if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
            abort();
        }
    }
(snip)
----
core:
#0  0x7f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x7f30aa60ff79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7f30aa613388 in __GI_abort () at abort.c:89
#2  0x7f30aadcde77 in crm_abort (file=file@entry=0x7f30aae0152b "logging.c", function=function@entry=0x7f30aae028c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73, assert_condition=assert_condition@entry=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", do_core=do_core@entry=1, do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
#3  0x7f30aadf5ca7 in crm_glib_handler (log_domain=0x7f30aa35eb6e "GLib", flags=<optimized out>, message=0x19d2ad0 "Source ID 63 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
#4  0x7f30aa320ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x7f30aa320d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x7f30aa318c5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x7f30aabb2b55 in cancel_recurring_action (op=op@entry=0x19caa90) at services.c:363
#8  0x7f30aabb2bee in services_action_cancel (name=name@entry=0x19d0530 "dummy3", action=<optimized out>, interval=interval@entry=1) at services.c:385
#9  0x0040405a in cancel_op (rsc_id=rsc_id@entry=0x19d0530 "dummy3", action=action@entry=0x19cec10 "monitor", interval=1) at lrmd.c:1404
#10 0x0040614f in process_lrmd_rsc_cancel (client=0x19c8290, id=74, request=0x19ca8a0) at lrmd.c:1468
#11 process_lrmd_message (client=client@entry=0x19c8290, id=74, request=request@entry=0x19ca8a0) at lrmd.c:1507
#12 0x00402bac in lrmd_ipc_dispatch (c=0x19c79c0, data=<optimized out>, size=361) at main.c:148
#13 0x7f30aa07b4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
#14 0x7f30aadf209d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0x19c68a8) at mainloop.c:437
#15 0x7f30aa319ce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
---Type <return> to continue, or q <return> to quit---
#16 0x7f30aa31a048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x7f30aa31a30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00402774 in main (argc=<optimized out>, argv=0x7fffcdd90b88) at main.c:344
----

Best Regards,
Hideo Yamauchi.

- Original Message -
From: renayama19661...@ybb.ne.jp
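The crash mechanism in the backtrace is that newer GLib emits a critical warning ("Source ID ... was not found") when g_source_remove() is handed an ID that is no longer registered, and Pacemaker's glib handler promotes that warning to abort(). The patch's pattern is to zero the stored ID after every removal so a stale ID can never be removed twice. A stripped-down model of that pattern (this is a self-contained illustration, not GLib or Pacemaker code; all names here are invented for the example):

```c
#include <stdbool.h>

/* stand-in for GLib's source registry: one live source ID, 0 = none */
static unsigned registered_id;

/* mimics g_source_remove(): succeeds only while the ID is registered;
   with a stale ID, real GLib 2.40+ logs the critical warning that
   Pacemaker's handler turns into abort() */
static bool model_source_remove(unsigned id)
{
    if (id != 0 && id == registered_id) {
        registered_id = 0;
        return true;
    }
    return false;
}

struct op_opaque {
    unsigned repeat_timer;      /* stored timer source ID, 0 = none */
};

void register_timer(struct op_opaque *op, unsigned id)
{
    op->repeat_timer = id;
    registered_id = id;
}

/* buggy cancel: leaves the stale ID behind, so a second cancel path
   (e.g. kick vs. cancel racing) removes an unknown ID */
bool cancel_leaving_stale_id(struct op_opaque *op)
{
    if (op->repeat_timer) {
        return model_source_remove(op->repeat_timer);
    }
    return true;
}

/* fixed cancel, as in the patch: zero the ID after removal so any
   later cancel is a clean no-op */
bool cancel_zeroing_id(struct op_opaque *op)
{
    bool rc = true;

    if (op->repeat_timer) {
        rc = model_source_remove(op->repeat_timer);
        op->repeat_timer = 0;
    }
    return rc;
}
```

The second abort report later in the thread suggests there is at least one more code path that still removes a stale ID even after the three patched sites, which is consistent with this failure mode: every place that stores the ID has to follow the zero-after-remove discipline.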
Re: [Pacemaker] help deciphering output
On 9 Oct 2014, at 5:06 pm, Alexandre alxg...@gmail.com wrote:

I have seen this behaviour on several virtualised environments: when the VM backup starts, the VM actually freezes for a (short?) period of time. I guess it then stops responding to the other cluster nodes, triggering an unexpected failover and/or fencing.

Alas, the dlm is _really_ intolerant of any membership blips. Once a node is marked failed, the dlm wants it fenced. Even if it comes back 1ms later.

I see this kind of behaviour on a VMware environment using Veeam backup, as well as on Proxmox (I don't know what backup tool). That's actually an interesting topic I never thought about raising here. How can we avoid that? Increasing timeouts? I am afraid we would have to reach unacceptably high timeout values, and I am not even sure that would fix the problem. I think not every VM snapshot strategy would trigger this problem. Do you guys have any feedback on which backup/snapshot method best suits corosync clusters?

Regards

On 9 Oct 2014 01:24, Alex Samad - Yieldbroker alex.sa...@yieldbroker.com wrote:

One of my nodes died in a 2-node cluster. I gather something went wrong and it fenced/killed itself, but I am not sure what happened. I think the VM backups may have run around that time, and a snapshot of the VM could have happened, but there is nothing for me to put my finger on. Output from messages around that time; this is on devrp1:
Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
Hi Andrew, I applied the three corrections that you made and checked the behaviour. Just to make sure, I also added abort processing at every g_source_remove() call in services.c.

* I set the following abort in the four places that carry out g_source_remove():

if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
    abort();
}

As a result, the abort still occurred. The problem does not seem to be settled yet by your correction.

(gdb) where
#0  0x7fdd923e1f79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7fdd923e5388 in __GI_abort () at abort.c:89
#2  0x7fdd92b9fe77 in crm_abort (file=file@entry=0x7fdd92bd352b "logging.c", function=function@entry=0x7fdd92bd48c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73, assert_condition=assert_condition@entry=0xe20b80 "Source ID 40 was not found when attempting to remove it", do_core=do_core@entry=1, do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
#3  0x7fdd92bc7ca7 in crm_glib_handler (log_domain=0x7fdd92130b6e "GLib", flags=<optimized out>, message=0xe20b80 "Source ID 40 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
#4  0x7fdd920f2ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x7fdd920f2d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#6  0x7fdd920eac5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#7  0x7fdd92984b55 in cancel_recurring_action (op=op@entry=0xe19b90) at services.c:365
#8  0x7fdd92984bee in services_action_cancel (name=name@entry=0xe1d2d0 "dummy2", action=<optimized out>, interval=interval@entry=1) at services.c:387
#9  0x0040405a in cancel_op (rsc_id=rsc_id@entry=0xe1d2d0 "dummy2", action=action@entry=0xe10d90 "monitor", interval=1) at lrmd.c:1404
#10 0x0040614f in process_lrmd_rsc_cancel (client=0xe17290, id=74, request=0xe1be10) at lrmd.c:1468
#11 process_lrmd_message (client=client@entry=0xe17290, id=74, request=request@entry=0xe1be10) at lrmd.c:1507
#12 0x00402bac in lrmd_ipc_dispatch (c=0xe169c0, data=<optimized out>, size=361) at main.c:148
#13 0x7fdd91e4d4d9 in qb_ipcs_dispatch_connection_request () from /usr/lib/libqb.so.0
#14 0x7fdd92bc409d in gio_read_socket (gio=<optimized out>, condition=G_IO_IN, data=0xe158a8) at mainloop.c:437
#15 0x7fdd920ebce5 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
---Type <return> to continue, or q <return> to quit---
#16 0x7fdd920ec048 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x7fdd920ec30a in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x00402774 in main (argc=<optimized out>, argv=0x7fff22cac268) at main.c:344

Best Regards,
Hideo Yamauchi.

- Original Message -
From: renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp
To: Andrew Beekhof and...@beekhof.net; The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Date: 2014/10/10, Fri 10:55
Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.

[earlier quoted messages snipped]