[Pacemaker] Dual-primary DRBD problem: Promoted 0 instances of a possible 2 to master

2011-08-02 Thread Matt Anderson

Hi!

Sorry for the repost, but the links in my previous message expired; these new
ones shouldn't. I've also added the DC's log at the end of this message.

I've been trying to make a simple HA cluster with 3 servers (the 3rd server
is there only to maintain quorum if one node fails). The idea is to run two
virtual domains over dedicated DRBD devices in dual-primary mode (so that
live migration would be possible).

Things worked well for a while, but somewhere during my tests something
went wrong, and now the DRBD devices don't get promoted to primary by
Pacemaker. Pacemaker just keeps starting and stopping the devices in a loop.
If I start DRBD from the init script, both devices are started and
automatically synced. At first I had this problem with only one device, but
now it's the same with both devices under Pacemaker.

Pacemaker and DRBD write a lot of logs [1] [2] [3] (these were captured while
I was trying to start ms_drbd_www2), but I don't see in them any reason why
Pacemaker doesn't promote any masters.
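
For reference, the master/slave resource in question follows the usual
dual-primary pattern, i.e. roughly like this (the primitive name, DRBD
resource name and monitor intervals below are only illustrative; the exact
definition is in my config [5]):

  primitive p_drbd_www2 ocf:linbit:drbd \
          params drbd_resource="www2" \
          op monitor interval="29s" role="Master" \
          op monitor interval="31s" role="Slave"
  ms ms_drbd_www2 p_drbd_www2 \
          meta master-max="2" master-node-max="1" \
               clone-max="2" clone-node-max="1" notify="true"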

My guess is that this has something to do with my fencing rules in DRBD [4],
or else with my Pacemaker config [5]. I used to have STONITH enabled, but
since my STONITH devices share their power supply with the servers, I have
removed those settings from my Pacemaker config.
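
For what it's worth, the fencing-related part of a drbd.conf for a
Pacemaker-managed dual-primary setup usually looks something like the
following (shown from memory in DRBD 8.3 layout; my exact file is in [4],
and the script paths may differ per distribution):

  disk {
      fencing resource-and-stonith;
  }
  handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

As I understand it, crm-fence-peer.sh works by inserting a location
constraint into the CIB that blocks the master role for the outdated peer
until after-resync-target removes it again.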

I'm running Debian squeeze on amd64 with pacemaker (1.0.11-1~bpo60+1) and
corosync (1.3.0-3~bpo60+1) from backports.

Any ideas what's wrong and how to fix it?


[1] http://paste.debian.net/124836/ (DRBD log from one node)

[2] http://paste.debian.net/124838/ (pacemaker log from the same node as above)

[3] http://paste.debian.net/124839/ (pacemaker log from DC) 

[4] http://paste.debian.net/124845/ (DRBD common config)

[5] http://paste.debian.net/124846/ (pacemaker config)

Pacemaker log from DC [3]:

Jul 28 22:28:01 s3-1 cibadmin: [10292]: info: Invoked: cibadmin -Ql -o resources
Jul 28 22:28:01 s3-1 cibadmin: [10295]: info: Invoked: cibadmin -p -R -o resources
Jul 28 22:28:01 s3-1 cib: [21918]: info: log_data_element: cib:diff: [the XML +/- diff content was not preserved by the archive]
Jul 28 22:28:01 s3-1 cib: [21918]: info: cib_process_request: Operation complete: op cib_replace for section resources (origin=local/cibadmin/2, version=0.440.1): ok (rc=0)
Jul 28 22:28:01 s3-1 crmd: [21922]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jul 28 22:28:01 s3-1 crmd: [21922]: info: need_abort: Aborting on change to admin_epoch
Jul 28 22:28:01 s3-1 crmd: [21922]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 28 22:28:01 s3-1 crmd: [21922]: info: do_state_transition: All 3 cluster nodes are eligible to run resources.
Jul 28 22:28:01 s3-1 crmd: [21922]: info: do_pe_invoke: Query 1845: Requesting the current CIB: S_POLICY_ENGINE
Jul 28 22:28:01 s3-1 crmd: [21922]: info: do_pe_invoke_callback: Invoking the PE: query=1845, ref=pe_calc-dc-1311881281-3699, seq=190040, quorate=1
Jul 28 22:28:01 s3-1 pengine: [21921]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jul 28 22:28:01 s3-1 pengine: [21921]: info: determine_online_status: Node s3 is online
Jul 28 22:28:01 s3-1 pengine: [21921]: info: determine_online_status: Node s1 is online
Jul 28 22:28:01 s3-1 pengine: [21921]: info: determine_online_stat

Re: [Pacemaker] "crm resource restart" does not work on the DC node with crmd-transtion-delay="2s"

2011-08-02 Thread NAKAHIRA Kazutomo

Hi, Andrew

(2011/08/01 12:13), Andrew Beekhof wrote:

> 2011/7/27 NAKAHIRA Kazutomo:
>
>> Hi, all
>>
>> I configured crmd-transition-delay="2s" to address the following problem:
>>
>>   http://www.gossamer-threads.com/lists/linuxha/pacemaker/68504
>>   http://developerbugs.linux-foundation.org/show_bug.cgi?id=2528
>>
>> After that, the "crm resource restart" command no longer restarts
>> any resources on the DC node.
>> # "crm resource restart" works fine on non-DC nodes.
>> # Please see the attached hb_report, generated in a simple test environment.
>>
>> How can I use the "crm resource restart" command on the DC node
>> with crmd-transition-delay="2s"?
>
> Sounds like the shell isn't waiting long enough.
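
(For reference, crmd-transition-delay is an ordinary cluster property, so it
can be set with something like "crm configure property crmd-transition-delay=2s".)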


I understand that this problem is hard to resolve through configuration
alone and that the crm shell needs to be fixed. Would that be about right?

If so, I have made a patch for the crm shell that waits for
crmd-transition-delay before checking the DC node's status.

Please see the attached patch.

Best regards,





I confirmed that I can avoid this problem with the following procedure:
  1. "crm resource stop rsc-ID"
  2. wait crmd-transition-delay (2) seconds
  3. "crm resource start rsc-ID"
but this behavior (restart does not work on the DC node) may confuse users.
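
In shell terms, the workaround is simply:

  crm resource stop rsc-ID
  sleep 2        # crmd-transition-delay
  crm resource start rsc-ID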

Best regards,

--
NAKAHIRA Kazutomo
Infrastructure Software Technology Unit
NTT Open Source Software Center

# HG changeset patch
# User NAKAHIRA Kazutomo 
# Date 1312274729 -32400
# Branch stable-1.0
# Node ID 2b4a64c1bb737cfce61b1eaef0dca31d903d9b2e
# Parent  db98485d06ed3fe0fe236509f023e1bd4a5566f1
shell: crm_msec is deemed desirable to be located in the utils.py

diff -r db98485d06ed -r 2b4a64c1bb73 shell/modules/ra.py.in
--- a/shell/modules/ra.py.in	Fri May 06 13:47:43 2011 +0200
+++ b/shell/modules/ra.py.in	Tue Aug 02 17:45:29 2011 +0900
@@ -224,37 +224,6 @@
     depth = find_value(pl, "depth") or '0'
     role = find_value(pl, "role")
     return mk_monitor_name(role,depth)
-def crm_msec(t):
-    '''
-    See lib/common/utils.c:crm_get_msec().
-    '''
-    convtab = {
-        'ms': (1,1),
-        'msec': (1,1),
-        'us': (1,1000),
-        'usec': (1,1000),
-        '': (1000,1),
-        's': (1000,1),
-        'sec': (1000,1),
-        'm': (60*1000,1),
-        'min': (60*1000,1),
-        'h': (60*60*1000,1),
-        'hr': (60*60*1000,1),
-    }
-    if not t:
-        return -1
-    r = re.match("\s*(\d+)\s*([a-zA-Z]+)?", t)
-    if not r:
-        return -1
-    if not r.group(2):
-        q = ''
-    else:
-        q = r.group(2).lower()
-    try:
-        mult,div = convtab[q]
-    except:
-        return -1
-    return (int(r.group(1))*mult)/div
 def crm_time_cmp(a, b):
     return crm_msec(a) - crm_msec(b)
 
diff -r db98485d06ed -r 2b4a64c1bb73 shell/modules/utils.py
--- a/shell/modules/utils.py	Fri May 06 13:47:43 2011 +0200
+++ b/shell/modules/utils.py	Tue Aug 02 17:45:29 2011 +0900
@@ -199,6 +199,38 @@
     s = get_stdout(add_sudo(cmd), stderr_on)
     return s.split('\n')
 
+def crm_msec(t):
+    '''
+    See lib/common/utils.c:crm_get_msec().
+    '''
+    convtab = {
+        'ms': (1,1),
+        'msec': (1,1),
+        'us': (1,1000),
+        'usec': (1,1000),
+        '': (1000,1),
+        's': (1000,1),
+        'sec': (1000,1),
+        'm': (60*1000,1),
+        'min': (60*1000,1),
+        'h': (60*60*1000,1),
+        'hr': (60*60*1000,1),
+    }
+    if not t:
+        return -1
+    r = re.match("\s*(\d+)\s*([a-zA-Z]+)?", t)
+    if not r:
+        return -1
+    if not r.group(2):
+        q = ''
+    else:
+        q = r.group(2).lower()
+    try:
+        mult,div = convtab[q]
+    except:
+        return -1
+    return (int(r.group(1))*mult)/div
+
 def wait4dc(what = "", show_progress = True):
     '''
     Wait for the DC to get into the S_IDLE state. This should be
# HG changeset patch
# User NAKAHIRA Kazutomo 
# Date 1312275597 -32400
# Branch stable-1.0
# Node ID 422551903526667c380e39f1712b38f3c2b8f0a6
# Parent  2b4a64c1bb737cfce61b1eaef0dca31d903d9b2e
shell: waits crmd-transition-delay before DC status checking

diff -r 2b4a64c1bb73 -r 422551903526 shell/modules/utils.py
--- a/shell/modules/utils.py	Tue Aug 02 17:45:29 2011 +0900
+++ b/shell/modules/utils.py	Tue Aug 02 17:59
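
For reference, crm_msec() converts crm-style time strings into milliseconds,
mirroring lib/common/utils.c:crm_get_msec(); moving it into utils.py
presumably makes it available to wait4dc() for the second patch. A few
illustrative values (the import below is only for illustration and assumes
the shell's utils module is on the Python path):

  # illustration only: after the first patch, crm_msec() lives in
  # shell/modules/utils.py
  from utils import crm_msec

  crm_msec("2s")      # -> 2000  (seconds are converted to milliseconds)
  crm_msec("500ms")   # -> 500
  crm_msec("3")       # -> 3000  (a bare number defaults to seconds)
  crm_msec("bogus")   # -> -1    (unparsable input)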

[Pacemaker] Backup ring is marked faulty

2011-08-02 Thread Sebastian Kaps

Hi,

we're running a two-node cluster with redundant rings.
Ring 0 is a 10 Gb direct connection; ring 1 consists of two 1 Gb interfaces
per node that are bonded in active-backup mode and routed through two
independent switches. The ring 1 network is our "normal" 1 Gb LAN and should
only be used if the direct 10 Gb connection fails.
I often (once a day on average, I'd guess) see that ring 1 (and only that
one) is marked as FAULTY without any obvious reason.

Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Retransmit List: c76 c7a c7c c7e c80 c82 c84
Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Retransmit List: c82
Aug  2 08:56:15 node02 corosync[5752]:  [TOTEM ] Marking seqid 568416 ringid 1 interface x.y.z.1 FAULTY - administrative intervention required.


Whenever I see this, I check whether the other node's address can be pinged
(I have never seen any connectivity problems there), then re-enable the ring
with "corosync-cfgtool -r", and everything looks OK for a while (i.e. hours
or days).

How could I find out why this happens?
What do these "Retransmit List" or seqid (sequence id, I assume?) 
values tell me?
Is it safe to reenable the second ring when the partner node can be 
pinged successfully?
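
For completeness, the check/recover cycle I currently run by hand is:

  # show the status of both rings; a healthy ring reports something like
  # "ring 1 active with no faults"
  corosync-cfgtool -s

  # if ring 1 is marked FAULTY and the peer still answers pings, re-enable it
  corosync-cfgtool -r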


The totem section on our config looks like this:

totem {
   rrp_mode:   passive
   join:   60
   max_messages:   20
   vsftype:none
   consensus:  1
   secauth:on
   token_retransmits_before_loss_const:10
   threads:16
   token:  1
   version:2
   interface {
   bindnetaddr:192.168.1.0
   mcastaddr:  239.250.1.1
   mcastport:  5405
   ringnumber: 0
   }
   interface {
   bindnetaddr:x.y.z.0
   mcastaddr:  239.250.1.2
   mcastport:  5415
   ringnumber: 1
   }
   clear_node_high_bit:yes
}

--
Sebastian



Re: [Pacemaker] Backup ring is marked faulty

2011-08-02 Thread Steven Dake
Which version of corosync?

On 08/02/2011 07:35 AM, Sebastian Kaps wrote:
> Hi,
> 
> we're running a two-node cluster with redundant rings.
> [...]


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker