Re: [Pacemaker] Time out issue while stopping resource in pacemaker
Andrew Beekhof writes:

> One does not imply the other. Stonith is arguably even more important
> for 2-node clusters.

Ok, will try it out.

>> One more thing, on another setup with the same configuration, while
>> running pacemaker I keep getting 'gfs_controld[10744]: daemon cpg_join
>> error retrying'. Even after I force-kill the pacemaker processes and
>> reboot the server and bring pacemaker back up, it keeps giving the
>> cpg_join error. Is there any way to fix this issue?
>
> That would be something for the gfs and/or corosync guys I'm afraid

Thanks for your help Andrew, will follow up with them.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Raid RA Changes to Enable ms configuration -- need some assistance plz.
On 14 Oct 2014, at 12:58 am, Errol Neal wrote:

> Andrew Beekhof writes:
>
>>> Here is my full pacemaker config:
>>>
>>> http://pastebin.com/jw6WTpZz
>>>
>>> My understanding is that in order for N to start, N+1 must already be
>>> running. So my configuration (to me) reads that the ms_md0 master
>>> resource must be started and running before the ms_scst1 resource will
>>> be started (as master), and these services will be forced onto the same
>>> node. Please correct me if my understanding is incorrect.
>>
>> I see only one ordering constraint, and that's between dlm_clone and
>> clvm_clone.
>> Colocation != ordering.
>
> Hi Andrew. I'm still learning, so forgive me.
> Are you saying I have an ordering issue?
> I'm not following.

Yes. If you want the cluster to start things in a particular order, then
you need to specify it.

> I also have these two lines:

These affect where things go, but not the order in which they are
started on the node.

> colocation ms_md0-ms_scst1 inf: ms_scst1:Master ( ms_md0:Master )
> colocation ms_md1-ms_scst2 inf: ms_scst2:Master ( ms_md1:Master )
>
>>> When both nodes are up and running, the master roles are not split, so
>>> I *think* my configuration is being honored, which leads me to my next
>>> issue.
>>>
>>> In my modified RA, I'm not sure I understand how to promote/demote
>>> properly. For example, when I put a node on standby, the remaining node
>>> doesn't get promoted. I'm not sure why, so I'm asking the experts.
>>>
>>> I'd really appreciate any feedback, advice, etc. you folks can give.
>
> This is the real issue IMO. The promotion is not occurring when it
> should.
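Since colocation only fixes placement, the missing piece here would be explicit order constraints. A sketch in crm shell syntax, reusing the resource names from the thread; this is an untested illustration, and the exact action keywords may need adapting to your Pacemaker version:

```
# Hedged sketch, not a verified config: promote ms_scst1 only after
# ms_md0 has been promoted, and likewise for the second pair.
order ord_md0-before-scst1 inf: ms_md0:promote ms_scst1:promote
order ord_md1-before-scst2 inf: ms_md1:promote ms_scst2:promote
```

Together with the existing colocations, this tells the cluster both where the masters may run and in which sequence they must be promoted.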
Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
Hi Andrew,

The problem was settled with your patch. Please merge the patch into
master.

If possible, please also confirm that there is no problem at other
points concerning g_timeout_add() and g_source_remove().

Many Thanks!
Hideo Yamauchi.

- Original Message -
> From: "renayama19661...@ybb.ne.jp"
> To: The Pacemaker cluster resource manager
> Date: 2014/10/10, Fri 15:34
> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>
> Hi Andrew,
>
> Thank you for the comments.
>
>> diff --git a/lib/services/services_linux.c b/lib/services/services_linux.c
>> index 961ff18..2279e4e 100644
>> --- a/lib/services/services_linux.c
>> +++ b/lib/services/services_linux.c
>> @@ -227,6 +227,7 @@ recurring_action_timer(gpointer data)
>>      op->stdout_data = NULL;
>>      free(op->stderr_data);
>>      op->stderr_data = NULL;
>> +    op->opaque->repeat_timer = 0;
>>
>>      services_action_async(op, NULL);
>>      return FALSE;
>
> I will confirm the correction again.
>
> Many Thanks!
> Hideo Yamauchi.
>
> - Original Message -
>> From: Andrew Beekhof
>> To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager
>> Date: 2014/10/10, Fri 15:19
>> Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.
>>
>> /me slaps forehead
>>
>> this one should work
>>
>> diff --git a/lib/services/services.c b/lib/services/services.c
>> index 8590b56..753e257 100644
>> --- a/lib/services/services.c
>> +++ b/lib/services/services.c
>> @@ -313,6 +313,7 @@ services_action_free(svc_action_t * op)
>>
>>      if (op->opaque->repeat_timer) {
>>          g_source_remove(op->opaque->repeat_timer);
>> +        op->opaque->repeat_timer = 0;
>>      }
>>      if (op->opaque->stderr_gsource) {
>>          mainloop_del_fd(op->opaque->stderr_gsource);
>> @@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
>>      } else {
>>          if (op->opaque->repeat_timer) {
>>              g_source_remove(op->opaque->repeat_timer);
>> +            op->opaque->repeat_timer = 0;
>>          }
>>          recurring_action_timer(op);
>>          return TRUE;
>> @@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
>>      if (dup->pid != 0) {
>>          if (op->opaque->repeat_timer) {
>>              g_source_remove(op->opaque->repeat_timer);
>> +            op->opaque->repeat_timer = 0;
>>          }
>>          recurring_action_timer(dup);
>>      }
>> diff --git a/lib/services/services_linux.c b/lib/services/services_linux.c
>> index 961ff18..2279e4e 100644
>> --- a/lib/services/services_linux.c
>> +++ b/lib/services/services_linux.c
>> @@ -227,6 +227,7 @@ recurring_action_timer(gpointer data)
>>      op->stdout_data = NULL;
>>      free(op->stderr_data);
>>      op->stderr_data = NULL;
>> +    op->opaque->repeat_timer = 0;
>>
>>      services_action_async(op, NULL);
>>      return FALSE;
>>
>> On 10 Oct 2014, at 4:45 pm, renayama19661...@ybb.ne.jp wrote:
>>
>>> Hi Andrew,
>>>
>>> I applied the three corrections that you made and checked the behaviour.
>>> I instrumented all the g_source_remove() call sites in services.c with
>>> "abort" processing just to make sure.
>>>
>>> * I set the following "abort" in the four places that call
>>>   g_source_remove:
>>>
>>>     if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
>>>         abort();
>>>     }
>>>
>>> As a result, "abort" still occurred.
>>>
>>> The problem does not seem to be settled yet by your correction.
>>>
>>> (gdb) where
>>> #0  0x7fdd923e1f79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>>> #1  0x7fdd923e5388 in __GI_abort () at abort.c:89
>>> #2  0x7fdd92b9fe77 in crm_abort (file=file@entry=0x7fdd92bd352b "logging.c",
>>>     function=function@entry=0x7fdd92bd48c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
>>>     assert_condition=assert_condition@entry=0xe20b80 "Source ID 40 was not found when attempting to remove it", do_core=do_core@entry=1,
>>>     do_fork=, do_fork@entry=1) at utils.c:1195
>>> #3  0x7fdd92bc7ca7 in crm_glib_handler (log_domain=0x7fdd92130b6e "GLib", flags=,
>>>     message=0xe20b80 "Source ID 40 was not found when attempting to remove it", user_data=) at logging.c:73
>>> #4  0x7fdd920f2ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #5  0x7fdd920f2d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #6  0x7fdd920eac5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>> #7  0x7fdd92984b55 in cancel_recurring_action (op=op@entry=0xe19b90) at serv
Re: [Pacemaker] Time out issue while stopping resource in pacemaker
On 14 Oct 2014, at 5:11 am, Lax wrote:

> Andrew Beekhof writes:
>
>> I'm guessing you don't have stonith?
>>
>> The underlying philosophy is that the services pacemaker manages need
>> to exit before pacemaker can.
>> If the service can't stop, it would be dishonest of pacemaker to do so.
>>
>> If you had fencing, it would have been able to clean up after a failed
>> stop and allow the rest of the cluster to continue.
>
> Thanks Andrew. I have a 2 node setup so had to turn off stonith.

One does not imply the other. Stonith is arguably even more important
for 2-node clusters.

> One more thing, on another setup with the same configuration, while
> running pacemaker I keep getting 'gfs_controld[10744]: daemon cpg_join
> error retrying'. Even after I force-kill the pacemaker processes and
> reboot the server and bring pacemaker back up, it keeps giving the
> cpg_join error. Is there any way to fix this issue?

That would be something for the gfs and/or corosync guys I'm afraid

> Thanks
> Lax
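For anyone following along: enabling fencing on a two-node cluster is mostly a matter of defining a fence device per node and flipping the property. A crm shell sketch; the agent, addresses, and credentials below are placeholders for illustration, not values from this thread:

```
# Illustrative only -- substitute your real fence agent and parameters.
primitive fence-node1 stonith:fence_ipmilan \
    params pcmk_host_list="node1" ipaddr="192.0.2.1" login="admin" passwd="secret"
primitive fence-node2 stonith:fence_ipmilan \
    params pcmk_host_list="node2" ipaddr="192.0.2.2" login="admin" passwd="secret"
# A node should never be responsible for fencing itself.
location l-fence-node1 fence-node1 -inf: node1
location l-fence-node2 fence-node2 -inf: node2
property stonith-enabled=true
```

With this in place, a failed stop gets resolved by fencing the node instead of blocking the whole cluster.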
Re: [Pacemaker] Bandwidth Requirement
On 13 Oct 2014, at 11:49 pm, Sahil Aggarwal wrote:

> Hello Andrew,
>
> Thanks for the response.
>
> Could you help with one more query?
>
> We generally use 2 to 10 nodes in a cluster with multicasting, and
> also use Postgres database replication under the cluster.
>
> Can you tell us the minimum bandwidth required, both with and without
> Postgres replication?

Sorry, I can't. There is no magic formula I can plug these values into.
You will have to measure the minimum, peak and average values.

During a period when there have been no config changes, failures, or
node up/down events, the bandwidth caused by pacemaker/corosync should
be near zero.

> On Mon, Oct 13, 2014 at 5:11 AM, Andrew Beekhof wrote:
>
>> it depends on how many nodes and resources you have, whether you're
>> using multicast, and how often things are going to be recovered or
>> moved in the cluster
>>
>> On 11 Oct 2014, at 12:24 am, Sahil Aggarwal wrote:
>>
>>> What is the network bandwidth requirement in HA using pacemaker and
>>> corosync?
>
> --
> Regards,
> Sahil
> Application Engineer (Research and Development Department) | Ph +91 9467607999
>
> DRISHTI-SOFT SOLUTIONS PVT. LTD.
> B2/450, Spaze iTech Park, Sohna Road, Sector 49, Gurgaon 122008
> T: +91-124-4771000; Extn. 1050  F: +91-124-4039120
>
> IVR l ACD l CTI l Reporting l CRM l Logger l Predictive Dialer l
> Multi-channel interactions l QM l Customization & Integrations
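One low-tech way to do the measurement Andrew suggests is to sample the kernel's per-interface byte counters around a window of interest. A sketch for a Linux host; `IFACE` and the sampling window are placeholders to adapt (point `IFACE` at the interface carrying corosync traffic -- `lo` is only a safe default for illustration):

```shell
#!/bin/sh
# Sample rx byte counters twice and report the average inbound rate.
IFACE=${IFACE:-lo}
WINDOW=2                               # seconds between samples
RX1=$(cat /sys/class/net/"$IFACE"/statistics/rx_bytes)
sleep "$WINDOW"
RX2=$(cat /sys/class/net/"$IFACE"/statistics/rx_bytes)
echo "avg inbound on $IFACE: $(( (RX2 - RX1) / WINDOW )) bytes/s"
```

Run it once during steady state and again while forcing a failover or resource move; the difference between the two readings is the recovery-traffic overhead the cluster actually needs.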
Re: [Pacemaker] Time out issue while stopping resource in pacemaker
Andrew Beekhof writes:

> I'm guessing you don't have stonith?
>
> The underlying philosophy is that the services pacemaker manages need
> to exit before pacemaker can.
> If the service can't stop, it would be dishonest of pacemaker to do so.
>
> If you had fencing, it would have been able to clean up after a failed
> stop and allow the rest of the cluster to continue.

Thanks Andrew. I have a 2 node setup so had to turn off stonith.

One more thing, on another setup with the same configuration, while
running pacemaker I keep getting 'gfs_controld[10744]: daemon cpg_join
error retrying'. Even after I force-kill the pacemaker processes and
reboot the server and bring pacemaker back up, it keeps giving the
cpg_join error. Is there any way to fix this issue?

Thanks
Lax
Re: [Pacemaker] communications problems in cluster
Hi!

Most likely related... I have node vm-vmwww with remote-node vmwww.
Both are reported online (vmwww:vm-vmwww) and vm-vmwww is reported as
'started on wings1'. However, when I try to clean up the faulty failed
action "vmwww_start_0 on wings1 'unknown error' (1): call=100,
status=Timed Out", here is what I get in the log:

Oct 13 18:25:43 wings1 crmd[3844]: warning: qb_ipcs_event_sendv: new_event_notification (3844-18918-16): Broken pipe (32)
Oct 13 18:25:43 wings1 crmd[3844]: error: do_lrm_invoke: no lrmd connection for remote node vmwww found on cluster node wings1. Can not process request.
Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
Oct 13 18:25:43 wings1 crmd[3844]: error: send_msg_via_ipc: Unknown Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.

I go to the VM, and try to run 'crm_mon':

Oct 13 18:27:06 vmwww pacemaker_remoted[3798]: error: ipc_proxy_accept: No ipc providers available for uid 0 gid 0
Oct 13 18:27:06 vmwww pacemaker_remoted[3798]: error: handle_new_connection: Error in connection setup (3798-3868-13): Remote I/O error (121)

ps aux | grep pace
root      3798  0.1  0.1  76396  2868 ?  S  18:16  0:00 pacemaker_remoted

netstat -nltp | grep 3121
tcp  0  0  0.0.0.0:3121  0.0.0.0:*  LISTEN  3798/pacemaker_remo

However I can telnet ok:

[root@wings1 ~]# telnet vmwww 3121
Trying 192.168.222.89...
Connected to vmwww.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

This is pretty weird...

Best regards,
Alex

2014-10-13 17:47 GMT+04:00 Саша Александров:

> Hi!
>
> I was building a cluster with pacemaker+pacemaker-remote (CentOS 6.5,
> everything from the official repo).
> While I had several resources, everything was fine. However, when I
> added more VMs (2 nodes and 10 VMs currently) I started to run into
> problems (see below).
> The strange thing is that when I start cman/pacemaker some time later,
> they seem to work fine for some time.
>
> Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30010, core=0)
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-30010-6): Bad file descriptor (9)
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
> Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
>
> Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30603, core=0)
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-30603-6): Bad file descriptor (9)
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
> Oct 13 17:03:57 wings1 crmd[31192]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
> Oct 13 17:03:57 wings1 cib
Re: [Pacemaker] Raid RA Changes to Enable ms configuration -- need some assistance plz.
Andrew Beekhof writes:

>> Here is my full pacemaker config:
>>
>> http://pastebin.com/jw6WTpZz
>>
>> My understanding is that in order for N to start, N+1 must already be
>> running. So my configuration (to me) reads that the ms_md0 master
>> resource must be started and running before the ms_scst1 resource will
>> be started (as master), and these services will be forced onto the same
>> node. Please correct me if my understanding is incorrect.
>
> I see only one ordering constraint, and that's between dlm_clone and
> clvm_clone.
> Colocation != ordering.

Hi Andrew. I'm still learning, so forgive me.
Are you saying I have an ordering issue?
I'm not following.

I also have these two lines:

colocation ms_md0-ms_scst1 inf: ms_scst1:Master ( ms_md0:Master )
colocation ms_md1-ms_scst2 inf: ms_scst2:Master ( ms_md1:Master )

>> When both nodes are up and running, the master roles are not split, so
>> I *think* my configuration is being honored, which leads me to my next
>> issue.
>>
>> In my modified RA, I'm not sure I understand how to promote/demote
>> properly. For example, when I put a node on standby, the remaining node
>> doesn't get promoted. I'm not sure why, so I'm asking the experts.
>>
>> I'd really appreciate any feedback, advice, etc. you folks can give.

This is the real issue IMO. The promotion is not occurring when it
should.
[Pacemaker] communications problems in cluster
Hi!

I was building a cluster with pacemaker+pacemaker-remote (CentOS 6.5,
everything from the official repo).
While I had several resources, everything was fine. However, when I
added more VMs (2 nodes and 10 VMs currently) I started to run into
problems (see below).
The strange thing is that when I start cman/pacemaker some time later,
they seem to work fine for some time.

Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30010, core=0)
Oct 13 17:03:54 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-30010-6): Bad file descriptor (9)
Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed

Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=30603, core=0)
Oct 13 17:03:57 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-30603-6): Bad file descriptor (9)
Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 crmd[31192]: notice: crm_add_logfile: Additional logging available in /var/log/cluster/corosync.log
Oct 13 17:03:57 wings1 cib[26446]: warning: qb_ipcs_event_sendv: new_event_notification (26446-30603-11): Broken pipe (32)
Oct 13 17:03:57 wings1 cib[26446]: warning: cib_notify_send_one: Notification of client crmd/fe944296-b3a1-4177-a94c-650568e8ff0a failed

.. So it keeps restarting; I even had to unmanage resources and stop
pacemaker/cman.

Oct 13 17:04:13 wings1 lrmd[26448]: warning: qb_ipcs_event_sendv: new_event_notification (26448-32444-6): Bad file descriptor (9)
Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 pacemakerd[26440]: notice: pcmk_child_exit: Child process crmd terminated with signal 13 (pid=32444, core=0)
Oct 13 17:04:13 wings1 pacemakerd[26440]: notice: pcmk_process_exit: Respawning failed child process: crmd
Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]: warning: send_client_notify: Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 cib[26446]: warning: qb_ipcs_event_sendv: new_event_notification (26446-32444-11): Broken pipe (32)
Oct 13 17:04:13 wings1 cib[26446]: warning: cib_notify_send_one: Notification of client crmd/ef727424-ce2b-4b3b-8749-82136dc72af8 failed

And one more thing (probably not related, but who knows) - I have
CentOS 7.0 on one of the VMs, and LRMD is unable to establish
communications with pacemaker_remote on that VM:

(node): Oct 13 17:31:43 wings1 crmd[3844]: error: lrmd_tls_send_recv: Remote lrmd
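When the lrmd cannot establish communications with pacemaker_remote even though the TCP port accepts connections, one routine thing to rule out is a key mismatch at the authentication layer. A sketch of that check; `/etc/pacemaker/authkey` is the usual default path, but verify it for your packages, and this is only one of several possible causes:

```shell
#!/bin/sh
# Print a digest of the pacemaker_remote key so it can be compared
# across hosts; every cluster node and guest must hold an identical file.
KEY=${KEY:-/etc/pacemaker/authkey}
if [ -f "$KEY" ]; then
    md5sum "$KEY"
else
    echo "no key at $KEY -- create one and copy it to all hosts"
fi
```

Run it on each cluster node and each VM and compare the digests; differing digests (or a missing file) would explain a handshake failure despite an open port.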
Re: [Pacemaker] Fencing of movable VirtualDomains
Andrew Beekhof writes:

[...]

> Is the ipaddr for each device really the same? If so, why not use a
> single 'resource'?

No, sorry, the IP addr was not the same.

> Also, 1.1.7 wasn't as smart as 1.1.12 when it came to deciding which
> fencing device to use.
>
> Likely you'll get the behaviour you want with a version upgrade.

I'll do that this week.

Regards.
--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF