[DRBD-user] Help with ping timeout

Alastair Battrick Fri, 26 Apr 2013 04:50:22 -0700

Hello

We are running DRBD 8.3.12 in a dual-primary setup. On top of the 3 DRBD resources we run CLVM, with KVM virtual machines running from that. The cluster was set up following Alteeve's tutorial:

https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial

We have 5 virtual machines: 2 run Windows Server 2008 (one of them is SBS 2011) and the others run Linux. As far as I can tell, they all run fine most of the time.

The problem occurs when the SBS 2011 guest VM is restarted. This did not happen when the server was first installed, but it has happened on the last few reboots.


DRBD/KVM Host 1
Apr 25 21:24:42 oberon kernel: block drbd2: sock was shut down by peer

Apr 25 21:24:42 oberon kernel: block drbd2: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Apr 25 21:24:42 oberon kernel: block drbd2: short read expecting header on sock: r=0

Apr 25 21:24:42 oberon kernel: block drbd2: asender terminated
Apr 25 21:24:42 oberon kernel: block drbd2: Terminating asender thread
(Host 1 is STONITHed at this point)

DRBD/KVM Host 2
Apr 25 21:24:42 titania kernel: block drbd2: PingAck did not arrive in time.

Apr 25 21:24:42 titania kernel: block drbd2: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )

Apr 25 21:24:42 titania kernel: block drbd2: asender terminated
Apr 25 21:24:42 titania kernel: block drbd2: Terminating asender thread
Apr 25 21:24:42 titania kernel: block drbd2: Connection closed

Apr 25 21:24:42 titania kernel: block drbd2: conn( NetworkFailure -> Unconnected )


Host 2 continues and brings up the 2 VMs successfully.
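
To line up the two hosts' logs side by side, the state changes can be pulled out mechanically. Below is a minimal Python sketch. It assumes the syslog prefix shown above and DRBD 8.3's `name( old -> new )` notation. The name `parse_line` is hypothetical and is not part of any DRBD tool.

```python
import re

# One "name( old -> new )" state change as printed in the logs above,
# e.g. "conn( Connected -> BrokenPipe )". Whitespace is optional because
# mail wrapping sometimes drops it (e.g. "0-> 1").
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")

# Assumed syslog prefix: "Mon DD HH:MM:SS host kernel: block drbdN: message".
PREFIX = re.compile(r"^(\w{3} +\d+ [\d:]{8}) (\S+) kernel: block (drbd\d+): (.*)$")


def parse_line(line):
    """Return (timestamp, host, device, {field: (old, new)}), or None if not a DRBD line."""
    m = PREFIX.match(line.strip())
    if m is None:
        return None
    ts, host, dev, msg = m.groups()
    # findall yields (name, old, new) tuples for every transition on the line.
    changes = {name: (old, new) for name, old, new in TRANSITION.findall(msg)}
    return ts, host, dev, changes


if __name__ == "__main__":
    sample = [
        "Apr 25 21:24:42 oberon kernel: block drbd2: peer( Primary -> Unknown ) "
        "conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )",
        "Apr 25 21:24:42 titania kernel: block drbd2: PingAck did not arrive in time.",
    ]
    for line in sample:
        print(parse_line(line))
```

Lines without a `( old -> new )` pair, such as the PingAck message, still parse. They simply return an empty dictionary of changes.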

I assume the PingAck failing to arrive in time on host 2 is what causes the socket to be shut down on host 1?

The ping timeout is the default of 5 tenths of a second (500 ms). Why does it time out when this guest VM is rebooted?

The 2 host servers have a dedicated Intel 10 Gigabit AT2 adapter for DRBD.

I have a feeling this may have started after the guest Windows VM was given more memory (from about 15 GB to 20 GB), and I wonder whether Windows writes a large memory dump when rebooting, which pushes DRBD's replication too far?
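
For scale, here is a back-of-envelope sketch of that theory. It assumes, hypothetically and without measurement, that the guest writes out its whole RAM in one burst, that 1 GB = 10^9 bytes, and that DRBD gets the full 10 Gbit/s line rate. It shows only how large such a burst would be compared with the 500 ms ping timeout. It does not show that this is the cause.

```python
# Hypothetical assumptions (not measured): a full-RAM write burst,
# decimal gigabytes, and the full 10 Gbit/s line rate available to DRBD.
LINK_GBIT_PER_S = 10.0
PING_TIMEOUT_TENTHS = 5  # DRBD default ping-timeout, in tenths of a second


def seconds_to_replicate(gigabytes, link_gbit_per_s=LINK_GBIT_PER_S):
    """Lower bound on wire time for `gigabytes` of writes (1 GB = 10**9 bytes)."""
    return gigabytes * 8 / link_gbit_per_s


ping_timeout_s = PING_TIMEOUT_TENTHS / 10

if __name__ == "__main__":
    for ram_gb in (15, 20):
        t = seconds_to_replicate(ram_gb)
        print(f"{ram_gb} GB: at least {t:.1f} s on the wire, "
              f"about {t / ping_timeout_s:.0f}x the {ping_timeout_s} s ping timeout")
```

Even at line rate, a 20 GB burst would keep the link busy for at least 16 seconds. That is long enough to make a half-second ping window worth looking at.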

Simply raising the ping timeout seems like the wrong solution, but it is the only thing I can think of. Any suggestions are welcome.


Cheers
Alastair Battrick
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user
