Re: [Pacemaker] Fencing of movable VirtualDomains

2014-10-13 Thread Daniel Dehennin
Andrew Beekhof and...@beekhof.net writes:


[...]

 Is the ipaddr for each device really the same?  If so, why not use a
 single 'resource'?

No, sorry, the IP address was not the same.

 Also, 1.1.7 wasn't as smart as 1.1.12 when it came to deciding which fencing 
 device to use.

 Likely you'll get the behaviour you want with a version upgrade.

I'll do that this week.
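For the archive: in 1.1.12 the fencer matches devices to targets via each device's host list. A minimal crmsh sketch with one device per node (device names and addresses here are hypothetical, not from this thread):

```
primitive fence-node1 stonith:fence_ipmilan \
    params ipaddr=192.0.2.11 pcmk_host_list=node1
primitive fence-node2 stonith:fence_ipmilan \
    params ipaddr=192.0.2.12 pcmk_host_list=node2
```

With pcmk_host_list set, the cluster selects the device that can actually fence the target node.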

Regards.
-- 
Daniel Dehennin
Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] communications problems in cluster

2014-10-13 Thread Саша Александров
Hi!

I was building a cluster with pacemaker+pacemaker-remote  (CentOS 6.5,
everything from the official repo).
While I had only a few resources, everything was fine. However, when I added
more VMs (2 nodes and 10 VMs currently), I started to run into problems (see
below).
The strange thing is that when I restart cman/pacemaker some time later, they
seem to work fine for a while.

Oct 13 17:03:54 wings1 pacemakerd[26440]:   notice: pcmk_child_exit: Child
process crmd terminated with signal 13 (pid=30010, core=0)
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: qb_ipcs_event_sendv:
new_event_notification (26448-30010-6): Bad file descriptor (9)
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 pacemakerd[26440]:   notice: pcmk_process_exit:
Respawning failed child process: crmd
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed
Oct 13 17:03:54 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/665bd130-2630-454b-9102-3f17d2bd71f3 failed

Oct 13 17:03:57 wings1 pacemakerd[26440]:   notice: pcmk_child_exit: Child
process crmd terminated with signal 13 (pid=30603, core=0)
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: qb_ipcs_event_sendv:
new_event_notification (26448-30603-6): Bad file descriptor (9)
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 pacemakerd[26440]:   notice: pcmk_process_exit:
Respawning failed child process: crmd
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/820ac884-24ca-4fff-9dc8-0a09e82e0e0a failed
Oct 13 17:03:57 wings1 crmd[31192]:   notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 13 17:03:57 wings1 cib[26446]:  warning: qb_ipcs_event_sendv:
new_event_notification (26446-30603-11): Broken pipe (32)
Oct 13 17:03:57 wings1 cib[26446]:  warning: cib_notify_send_one:
Notification of client crmd/fe944296-b3a1-4177-a94c-650568e8ff0a failed

[...]

So it keeps restarting; I even had to unmanage the resources and stop
pacemaker/cman.

Oct 13 17:04:13 wings1 lrmd[26448]:  warning: qb_ipcs_event_sendv:
new_event_notification (26448-32444-6): Bad file descriptor (9)
Oct 13 17:04:13 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 pacemakerd[26440]:   notice: pcmk_child_exit: Child
process crmd terminated with signal 13 (pid=32444, core=0)
Oct 13 17:04:13 wings1 pacemakerd[26440]:   notice: pcmk_process_exit:
Respawning failed child process: crmd
Oct 13 17:04:13 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 lrmd[26448]:  warning: send_client_notify:
Notification of client crmd/ea7ab099-1005-450b-9e46-d9d13ea266e4 failed
Oct 13 17:04:13 wings1 cib[26446]:  warning: qb_ipcs_event_sendv:
new_event_notification (26446-32444-11): Broken pipe (32)
Oct 13 17:04:13 wings1 cib[26446]:  warning: cib_notify_send_one:
Notification of client crmd/ef727424-ce2b-4b3b-8749-82136dc72af8 failed



And one more thing (probably not related, but who knows): I have CentOS
7.0 on one of the VMs, and lrmd is unable to establish communications with
pacemaker_remote on that VM:

(node):
Oct 13 17:31:43 wings1 crmd[3844]:error: lrmd_tls_send_recv: Remote
lrmd 

Re: [Pacemaker] Raid RA Changes to Enable ms configuration -- need some assistance plz.

2014-10-13 Thread Errol Neal
Andrew Beekhof andrew@... writes:

  
  Here is my full pacemaker config:
  
  http://pastebin.com/jw6WTpZz
  
  My understanding is that in order for N to start, N+1 must already be
  running. So my configuration (to me) reads that the ms_md0 master
  resource must be started and running before the ms_scst1 resource will
  be started (as master), and these services will be forced onto the same
  node. Please correct me if my understanding is incorrect.
 
 I see only one ordering constraint, and that's between dlm_clone and
 clvm_clone.
 Colocation != ordering.

Hi Andrew. I'm still learning, so forgive me. 
Are you saying I have an ordering issue? 
I'm not following.

I also have these two lines:

colocation ms_md0-ms_scst1 inf: ms_scst1:Master ( ms_md0:Master )
colocation ms_md1-ms_scst2 inf: ms_scst2:Master ( ms_md1:Master )


 
  When both nodes are up and running, the master roles are not split, so I
  *think* my configuration is being honored, which leads me to my next issue.
  
  In my modified RA, I'm not sure I understand how to promote/demote
  properly. For example, when I put a node on standby, the remaining node
  doesn't get promoted. I'm not sure why, so I'm asking the experts.
  
  I'd really appreciate any feedback, advice, etc. you folks can give.


This is the real issue IMO: the promotion is not occurring when it should.




Re: [Pacemaker] communications problems in cluster

2014-10-13 Thread Саша Александров
Hi!

Most likely related...
I have node vm-vmwww with remote-node vmwww. Both are reported online
(vmwww:vm-vmwww) and vm-vmwww is reported as 'started on wings1'.
However, when I try to clean up the failed action vmwww_start_0 on wings1
('unknown error' (1): call=100, status=Timed Out), here is what I get in
the log:

Oct 13 18:25:43 wings1 crmd[3844]:  warning: qb_ipcs_event_sendv:
new_event_notification (3844-18918-16): Broken pipe (32)
Oct 13 18:25:43 wings1 crmd[3844]:error: do_lrm_invoke: no lrmd
connection for remote node vmwww found on cluster node wings1. Can not
process request.
Oct 13 18:25:43 wings1 crmd[3844]:error: send_msg_via_ipc: Unknown
Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
Oct 13 18:25:43 wings1 crmd[3844]:error: send_msg_via_ipc: Unknown
Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
Oct 13 18:25:43 wings1 crmd[3844]:error: send_msg_via_ipc: Unknown
Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.
Oct 13 18:25:43 wings1 crmd[3844]:error: send_msg_via_ipc: Unknown
Sub-system (d483a600-5535-4f0d-8ffd-2af391f5cb21)... discarding message.

I go to the VM and try to run 'crm_mon':

Oct 13 18:27:06 vmwww pacemaker_remoted[3798]:error: ipc_proxy_accept:
No ipc providers available for uid 0 gid 0
Oct 13 18:27:06 vmwww pacemaker_remoted[3798]:error:
handle_new_connection: Error in connection setup (3798-3868-13): Remote I/O
error (121)

ps aux | grep pace
root  3798  0.1  0.1  76396  2868 ?S18:16   0:00
pacemaker_remoted

netstat -nltp | grep 3121
tcp0  0 0.0.0.0:31210.0.0.0:*
LISTEN  3798/pacemaker_remo

However, I can telnet OK:

[root@wings1 ~]# telnet vmwww 3121
Trying 192.168.222.89...
Connected to vmwww.
Escape character is '^]'.
^]
telnet quit
Connection closed.

This is pretty weird...
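One thing worth checking (an editor's suggestion, not from the thread): a successful TCP connect to port 3121 only shows the port is open; pacemaker_remote also requires the same shared authkey (TLS-PSK) on the cluster node and the guest. A quick sketch, assuming the default key location:

```shell
# pacemaker_remote authenticates with a shared key (TLS-PSK); an open
# port 3121 does not prove the handshake can succeed. Compare the key
# on the cluster node and the guest - the checksums must match.
if [ -f /etc/pacemaker/authkey ]; then
    md5sum /etc/pacemaker/authkey   # run on wings1 and on vmwww, then compare
else
    echo "authkey missing - create the same key on both hosts"
fi
```

A mismatched or missing authkey typically shows up exactly as a connection-setup error on the remote side.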

Best regards,
Alex


2014-10-13 17:47 GMT+04:00 Саша Александров shurr...@gmail.com:

 [...]

Re: [Pacemaker] Time out issue while stopping resource in pacemaker

2014-10-13 Thread Andrew Beekhof

On 14 Oct 2014, at 5:11 am, Lax lk...@cisco.com wrote:

 Andrew Beekhof andrew@... writes:
 
 
 I'm guessing you don't have stonith?
 
 The underlying philosophy is that the services pacemaker manages need to
 exit before pacemaker can.
 If the service can't stop, it would be dishonest of pacemaker to do so.
 
 If you had fencing, it would have been able to clean up after a failed
 stop and allow the rest of the cluster to continue.
 
 Thanks Andrew. I have a 2 node setup so had to turn off stonith. 

One does not imply the other. Stonith is arguably even more important for 
2-node clusters.
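A common starting point for stonith on a 2-node cluster looks roughly like this (crmsh sketch; device names and addresses are hypothetical):

```
property stonith-enabled=true
property no-quorum-policy=ignore
primitive fence-nodeA stonith:fence_ipmilan \
    params ipaddr=192.0.2.21 pcmk_host_list=nodeA
primitive fence-nodeB stonith:fence_ipmilan \
    params ipaddr=192.0.2.22 pcmk_host_list=nodeB
```

no-quorum-policy=ignore is the usual 2-node setting, since losing either node always loses quorum.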

 
 One more thing, on another setup with same configuration while running
 pacemaker I keep getting 'gfs_controld[10744]: daemon cpg_join error
 retrying'. Even after I force kill the pacemaker processes and reboot the
 server and bring the pacemaker back up, it keeps giving cpg_join error. Is
 there any way to fix this issue?  

That would be something for the gfs and/or corosync guys, I'm afraid.

 
 Thanks
 Lax
 
 
 
 





Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, g_source_remove fails.

2014-10-13 Thread renayama19661014
Hi Andrew,

The problem is resolved with your patch.
Please merge the patch into master.

If possible, please also confirm that there are no problems at the other
call sites of g_timeout_add() and g_source_remove().


Many Thanks!
Hideo Yamauchi.



- Original Message -
 From: renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Cc: 
 Date: 2014/10/10, Fri 15:34
 Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of glib, 
 g_source_remove fails.
 
 Hi Andrew,
 
 Thank you for comments.
 
   diff --git a/lib/services/services_linux.c b/lib/services/services_linux.c
   index 961ff18..2279e4e 100644
   --- a/lib/services/services_linux.c
   +++ b/lib/services/services_linux.c
   @@ -227,6 +227,7 @@ recurring_action_timer(gpointer data)
        op->stdout_data = NULL;
        free(op->stderr_data);
        op->stderr_data = NULL;
   +    op->opaque->repeat_timer = 0;
   
        services_action_async(op, NULL);
        return FALSE;
 
 
 I confirm a correction again.
 
 
 
 Many Thanks!
 Hideo Yamauchi.
 
 
 
 - Original Message -
  From: Andrew Beekhof and...@beekhof.net
  To: renayama19661...@ybb.ne.jp; The Pacemaker cluster resource manager 
 pacemaker@oss.clusterlabs.org
  Cc: 
  Date: 2014/10/10, Fri 15:19
  Subject: Re: [Pacemaker] [Problem]When Pacemaker uses a new version of 
 glib, g_source_remove fails.
 
  /me slaps forhead
 
  this one should work
 
   diff --git a/lib/services/services.c b/lib/services/services.c
   index 8590b56..753e257 100644
   --- a/lib/services/services.c
   +++ b/lib/services/services.c
   @@ -313,6 +313,7 @@ services_action_free(svc_action_t * op)
   
        if (op->opaque->repeat_timer) {
            g_source_remove(op->opaque->repeat_timer);
   +        op->opaque->repeat_timer = 0;
        }
        if (op->opaque->stderr_gsource) {
            mainloop_del_fd(op->opaque->stderr_gsource);
   @@ -425,6 +426,7 @@ services_action_kick(const char *name, const char *action, int interval /* ms */
        } else {
            if (op->opaque->repeat_timer) {
                g_source_remove(op->opaque->repeat_timer);
   +            op->opaque->repeat_timer = 0;
            }
            recurring_action_timer(op);
            return TRUE;
   @@ -459,6 +461,7 @@ handle_duplicate_recurring(svc_action_t * op, void (*action_callback) (svc_actio
            if (dup->pid != 0) {
                if (op->opaque->repeat_timer) {
                    g_source_remove(op->opaque->repeat_timer);
   +                op->opaque->repeat_timer = 0;
                }
                recurring_action_timer(dup);
            }
   diff --git a/lib/services/services_linux.c b/lib/services/services_linux.c
   index 961ff18..2279e4e 100644
   --- a/lib/services/services_linux.c
   +++ b/lib/services/services_linux.c
   @@ -227,6 +227,7 @@ recurring_action_timer(gpointer data)
        op->stdout_data = NULL;
        free(op->stderr_data);
        op->stderr_data = NULL;
   +    op->opaque->repeat_timer = 0;
   
        services_action_async(op, NULL);
        return FALSE;
 
 
  On 10 Oct 2014, at 4:45 pm, renayama19661...@ybb.ne.jp wrote:
 
   Hi Andrew,
 
   I applied the three corrections you made and checked the behavior.
   I also instrumented every g_source_remove() call in services.c with an
   abort(), just to make sure.
    * In the four places that call g_source_remove(), I added the
      following check:

            if (g_source_remove(op->opaque->repeat_timer) == FALSE) {
                    abort();
            }
 
 
   As a result, abort still occurred.
 
 
   The problem does not seem to be settled yet by your correction.
 
 
   (gdb) where
   #0  0x7fdd923e1f79 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
   #1  0x7fdd923e5388 in __GI_abort () at abort.c:89
   #2  0x7fdd92b9fe77 in crm_abort (file=file@entry=0x7fdd92bd352b "logging.c",
       function=function@entry=0x7fdd92bd48c0 <__FUNCTION__.23262> "crm_glib_handler", line=line@entry=73,
       assert_condition=assert_condition@entry=0xe20b80 "Source ID 40 was not found when attempting to remove it",
       do_core=do_core@entry=1, do_fork=<optimized out>, do_fork@entry=1) at utils.c:1195
   #3  0x7fdd92bc7ca7 in crm_glib_handler (log_domain=0x7fdd92130b6e "GLib", flags=<optimized out>,
       message=0xe20b80 "Source ID 40 was not found when attempting to remove it", user_data=<optimized out>) at logging.c:73
   #4  0x7fdd920f2ae1 in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
   #5  0x7fdd920f2d72 in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
   #6  0x7fdd920eac5c in g_source_remove () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
   #7  0x7fdd92984b55 in cancel_recurring_action (op=op@entry=0xe19b90) at services.c:365
   #8  0x7fdd92984bee in services_action_cancel (name=name@entry=0xe1d2d0 "dummy2", action=<optimized out>,
       interval=interval@entry=1) at services.c:387
   #9  0x0040405a

Re: [Pacemaker] Raid RA Changes to Enable ms configuration -- need some assistance plz.

2014-10-13 Thread Andrew Beekhof

On 14 Oct 2014, at 12:58 am, Errol Neal en...@businessgrade.com wrote:

 Andrew Beekhof andrew@... writes:
 
 
 Here is my full pacemaker config:
 
 http://pastebin.com/jw6WTpZz
 
  My understanding is that in order for N to start, N+1 must already be
  running. So my configuration (to me) reads that the ms_md0 master
  resource must be started and running before the ms_scst1 resource will
  be started (as master), and these services will be forced onto the same
  node. Please correct me if my understanding is incorrect.
  
  I see only one ordering constraint, and that's between dlm_clone and
  clvm_clone.
  Colocation != ordering.
 
 Hi Andrew. I'm still learning, so forgive me. 
 Are you saying I have an ordering issue? 
 I'm not following.

Yes. If you want the cluster to start things in a particular order, then you 
need to specify it.

 
 I also have these two lines:

These affect where things go, but not the order in which they are started on 
the node.

 
 colocation ms_md0-ms_scst1 inf: ms_scst1:Master ( ms_md0:Master )
 colocation ms_md1-ms_scst2 inf: ms_scst2:Master ( ms_md1:Master )
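
To make the start order explicit, matching ordering constraints could be added along these lines (a crmsh sketch, not tested against this configuration):

```
order ord_md0_scst1 inf: ms_md0:promote ms_scst1:promote
order ord_md1_scst2 inf: ms_md1:promote ms_scst2:promote
```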
 
 
 
  When both nodes are up and running, the master roles are not split, so I
  *think* my configuration is being honored, which leads me to my next issue.
  
  In my modified RA, I'm not sure I understand how to promote/demote
  properly. For example, when I put a node on standby, the remaining node
  doesn't get promoted. I'm not sure why, so I'm asking the experts.
  
  I'd really appreciate any feedback, advice, etc. you folks can give.
  
  
  This is the real issue IMO: the promotion is not occurring when it should.
 
 





Re: [Pacemaker] Time out issue while stopping resource in pacemaker

2014-10-13 Thread Lax
Andrew Beekhof andrew@... writes:

 
 One does not imply the other. Stonith is arguably even more important for
 2-node clusters.

Ok, will try it out.

 
  
  One more thing, on another setup with same configuration while running
  pacemaker I keep getting 'gfs_controld[10744]: daemon cpg_join error
  retrying'. Even after I force kill the pacemaker processes and reboot the
  server and bring the pacemaker back up, it keeps giving cpg_join error. Is
  there any way to fix this issue?  
 
 That would be something for the gfs and/or corosync guys I'm afraid

Thanks for your help Andrew, will follow up with them. 


