>>> Digimer <li...@alteeve.ca> wrote on 15.01.2014 at 19:38 in message <52d6d5a5.9020...@alteeve.ca>:
> On 15/01/14 05:55 AM, Ulrich Windl wrote:
>> Hi!
>>
>> I'm wondering what is going on with SLES11 SP3 and cluster communication: cLVM with mirroring seems to bring cluster communication down. I had this problem in SP2, and support told me that in SP3 (which was not available at that time) things should be better. Now I have SP3, but things aren't better.
>> Maybe it's time to take care of the problems, one by one.
>>
>> The cluster seems to have a longish retransmit list, but the list seems inconsistent in itself: the same items appear over and over (which would indicate that no transmission is possible), but then some items in the list change (which seems to indicate that some transfers must have succeeded). Looking at the list as a whole, I cannot make any sense of it.
>>
>> See for yourself:
>> [...]
>> Jan 15 08:55:29 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> Jan 15 08:55:30 o5 corosync[13636]: [TOTEM ] Retransmit List: d253 d255 d257 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d252 d254 d256
>> Jan 15 08:55:30 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> Jan 15 08:55:30 o5 corosync[13636]: [TOTEM ] Retransmit List: d253 d255 d257 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d252 d254 d256
>> Jan 15 08:55:31 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> Jan 15 08:55:31 o5 corosync[13636]: [TOTEM ] Retransmit List: d253 d255 d257 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d252 d254 d256
>> Jan 15 08:55:32 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> Jan 15 08:55:32 o5 corosync[13636]: [TOTEM ] Retransmit List: d253 d255 d257 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d252 d254 d256
>> Jan 15 08:55:32 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> Jan 15 08:55:33 o5 corosync[13636]: [TOTEM ] Retransmit List: d253 d255 d257 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d252 d254 d256
>> Jan 15 08:55:33 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> Jan 15 08:55:34 o5 corosync[13636]: [TOTEM ] Retransmit List: d253 d255 d257 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d252 d254 d256
>> Jan 15 08:55:34 o5 corosync[13636]: [TOTEM ] Retransmit List: d252 d254 d256 d241 d242 d243 d244 d245 d246 d247 d248 d249 d24a d24b d24c d24d d24e d24f d250 d251 d253 d255 d257
>> [...]
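A quick way to see whether the same TOTEM message ids really stay stuck, or whether entries rotate through the list, is to tally how often each id appears across all "Retransmit List" lines. A minimal sketch, assuming the corosync messages are piped in on stdin from wherever syslog puts them:

```shell
# tally_retransmits: count how often each TOTEM message id occurs in
# corosync "Retransmit List" lines read from stdin. Ids at the top of
# the tally were still awaiting retransmission in (almost) every
# snapshot; ids with low counts moved in and out of the list.
tally_retransmits() {
    grep -o 'Retransmit List: .*$' \
        | tr ' ' '\n' \
        | grep -E '^[0-9a-f]+$' \
        | sort | uniq -c | sort -rn
}
```

Usage would be e.g. `tally_retransmits < /var/log/messages` (the log path is an assumption; use wherever corosync logs on your system).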
>>
>> In between these messages I see some device-mapper messages; I'm unsure whether they are the culprit or the victim:
>> Jan 15 08:55:11 o5 kernel: [ 758.400071] device-mapper: dm-log-userspace: [35cRCORE] Request timed out: [15/186129] - retrying
>> Jan 15 08:55:13 o5 kernel: [ 760.300067] device-mapper: dm-log-userspace: [35cRCORE] Request timed out: [9/186130] - retrying
>> Jan 15 08:55:28 o5 kernel: [ 775.300067] device-mapper: dm-log-userspace: [35cRCORE] Request timed out: [9/186132] - retrying
>>
>> I hope these messages are not trying to tell me that there are 186 thousand requests pending ;-)
>> Apart from that, the messages above could benefit from including an actual device name.
>>
>> Regards,
>> Ulrich
>
> I can't speak to SUSE, but I've seen this with RHEL 6.1 when there was a short-lived bug caused by the hardware not being fast enough (sorry, the details are fuzzy), which was fixed. The reason I say this is that, not knowing the SP2/SP3 issue, my first thought is to look at the network stack.
>
> Can you elaborate on how 'clvmd with mirroring' triggers this? I've used clvmd a lot, but I've never looked at mirroring in LVM (though I know it's possible).
>
> If possible, can you share your cluster.conf (and crm configure show if using Pacemaker, too)?
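On the two bracketed numbers: they can be pulled out and compared across messages. In the excerpt above they come out as 15/186129, 9/186130, 9/186132; the second value increases monotonically, which suggests a per-request sequence counter rather than a count of pending requests (that interpretation is my assumption, not something the message format documents). A small extraction sketch:

```shell
# parse_dm_timeouts: extract the "[a/b]" pair from dm-log-userspace
# "Request timed out" kernel messages read on stdin, printing "a b"
# per message. Treating the second field as a sequence number is an
# assumption based on the numbering pattern in the log excerpt.
parse_dm_timeouts() {
    sed -n 's/.*Request timed out: \[\([0-9]*\)\/\([0-9]*\)\].*/\1 \2/p'
}
```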
The cluster configuration is a mess, because I had to disable most things due to the OCFS problem described yesterday; anyway, I'll try to present some details:
---
# grep -v '^ *$' /etc/corosync/corosync.conf | grep -v '^[ ]*#'
aisexec {
    group: root
    user: root
}
service {
    use_mgmtd: yes
    use_logd: yes
    ver: 0
    name: pacemaker
}
totem {
    rrp_mode: passive
    window_size: 100
    join: 60
    max_messages: 20
    vsftype: none
    token: 5000
    consensus: 6000
    secauth: on
    token_retransmits_before_loss_const: 10
    threads: 4
    transport: udp
    version: 2
    interface {
        bindnetaddr: 172.20.3.0
        mcastaddr: 239.192.3.9
        mcastport: 5405
        ringnumber: 0
    }
    interface {
        mcastaddr: 239.192.3.109
        mcastport: 5405
        bindnetaddr: 192.168.0.0
        ringnumber: 1
    }
    clear_node_high_bit: yes
}
logging {
    to_logfile: no
    to_syslog: yes
    debug: off
    timestamp: off
    to_stderr: no
    fileline: off
    syslog_facility: daemon
}
amf {
    mode: disabled
}
---
# gzip -9 </tmp/config | uuencode config.gz
begin 644 config.gz
[uuencoded config.gz data omitted]
end
---
# crm_mon -1Arf
Last updated: Thu Jan 16 08:11:26 2014
Last change: Wed Jan 15 14:34:52 2014 by root via crm_attribute on o2
Stack: classic openais (with plugin)
Current DC: o2 - partition with quorum
Version: 1.1.10-65bb87e
4 Nodes configured, 5 expected votes
51 Resources configured

Node o1: standby
Node o2: standby
Node o5: pending
OFFLINE: [ o4 ]

Full list of resources:

 Clone Set: cln_O2CB [prm_O2CB]
     Stopped: [ o1 o2 o4 o5 ]
 prm_stonith_sbd (stonith:external/sbd): Stopped
 Clone Set: cln_DLM [prm_DLM]
     Stopped: [ o1 o2 o4 o5
 ]
 Clone Set: cln_cLVMd [prm_cLVMd]
     Stopped: [ o1 o2 o4 o5 ]
 Clone Set: cln_LVM_cVG [prm_LVM_cVG]
     Stopped: [ o1 o2 o4 o5 ]
 [...]
---

The problem with cLVM mirroring is (besides excessive communication and logging) that there is no persistent "mirror write cache" (a bitmap of changed blocks) for a two-device setup, as opposed to MD-RAID (which is not usable for shared devices). So the few hundred GB in the volume are resynced all the time. As the disks are iSCSI, that takes at least one hour. During that time, simple commands like "lvs" (there is only one clustered LV) can take as long as two minutes. Without cLVM there is no disk performance issue.

Regards,
Ulrich

>
> digimer
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
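As a rough sanity check on the "at least one hour" figure: with no persistent write bitmap, a full resync has to copy the entire mirror. A back-of-envelope sketch (the 300 GiB size and 85 MiB/s iSCSI write rate are illustrative assumptions, not measured values):

```shell
# Full-resync time estimate for a mirror without a persistent write bitmap.
size_gib=300      # assumed mirror size ("a few hundred GB")
rate_mib_s=85     # assumed sustained iSCSI write rate
minutes=$(( size_gib * 1024 / rate_mib_s / 60 ))
echo "full resync: about $minutes minutes"
```

Under these assumptions the copy alone takes about an hour, before any cluster-locking overhead is counted.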