> > I have about 40 drbd devices per node (primary and secondaries). Our
> > provider has lot of network issues, which sometimes cause drbd to
> > disconnect/reconnect very often : about 500 NetworkFailure in 1 hour
> > before the last crash :
> > # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
> > 483
>
> So you are using DRBD with ganeti in a cloud?
> Which cloud?

What do you mean by "which cloud"?

> The most interessting line is before that.
>
> > Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2
> > Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G W 2.6.30-2-amd64 #1 X8STi
> > Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
>
> Hard out of memory?
> did you google for "2.6.30 cache_alloc_refill",
> and checked that you are not affected by any of those?
Yes, I did, but there is not much out there.

Could it be that, because of the large number of NetworkFailure/reconnect
cycles, the system does not reclaim memory fast enough, so that when the
network or drbd driver asks for memory, the allocation fails and the
driver deactivates itself (especially if this happens in a special
context, such as an IRQ handler)?

Maxence

-- 
Maxence DUNNEWIND
Contact : [email protected]
Site : http://www.dunnewind.net
GPG : 18AE 61E4 D0B0 1C7C AAC9 E40D 4D39 68DB 0D2E B533
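
P.S. To check whether the disconnects really pile up in the hour before a
crash, the grep count quoted above can be generalised to a per-hour
breakdown. This is only a minimal sketch: it assumes the usual syslog
timestamp prefix ("Mar 30 00:52:48 ...") seen in the excerpt, and the
function name failures_per_hour is made up for illustration.

```python
import re
from collections import Counter

# Syslog prefix such as "Mar 30 00:52:48 "; the group captures "Mar 30 00".
# Assumption: /var/log/messages uses this classic timestamp format.
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}):\d{2}:\d{2}\s")

# Same string the grep in this thread searches for.
TRANSITION = "Connected -> NetworkFailure"


def failures_per_hour(lines):
    """Count Connected -> NetworkFailure transitions per 'Mon DD HH' bucket."""
    counts = Counter()
    for line in lines:
        if TRANSITION not in line:
            continue
        match = STAMP.match(line)
        if match:
            # Collapse padding so "Mar  3 01" and "Mar 3 01" share a bucket.
            counts[" ".join(match.group(1).split())] += 1
    return counts


# Usage (path taken from the grep command above):
#   with open("/var/log/messages", errors="replace") as log:
#       for hour, n in failures_per_hour(log).items():
#           print(hour, n)
```

Buckets come out in the order they first appear in the log, so a sudden
jump right before the crash timestamp should be easy to spot. It does not
prove the memory theory, it only shows whether the timing fits.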
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
