On 3/4/11 12:38 PM, William Seligman wrote:
> I've RTFM'ed and google'd on this problem. Now I ask the experts.

Now that I've joined this list, I looked at the archives directly. I see that
Cory Coager reported the same problem:

http://lists.linbit.com/pipermail/drbd-user/2011-March/015735.html

Lars Ellenberg suggested that the problem was due to a bad NIC. Maybe... but
what are the odds that two different systems have a bad NIC?

> Setup: Two systems; hypatia is primary, orestes is secondary. OS is
> Scientific Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version
> drbd-8.3.8.1-30.el5.
>
> Each has two partitions that are used for separate DRBD devices: /dev/md0
> (software RAID1) and /dev/sdd2. On both systems:
>
> partition /dev/md0  => device drbd1
> partition /dev/sdd2 => device drbd2
>
> The DRBD traffic goes over a single Ethernet cable that connects the two
> systems.
>
> For drbd1, the control hierarchy is Corosync->DRBD->LVM->Xen.
> For drbd2, the control is Corosync->DRBD->Just mount the thing.
>
> The complicated one is drbd1, but it seems to work just fine. The problem
> appears to be with drbd2, which doesn't do much of anything; it's a
> work/backup directory which I use to take infrequent (~two months)
> snapshots of the virtual machines on drbd1.
>
> Every ten seconds, the error messages at the end of this post appear in
> the log of the primary system and there are similar lines on the secondary
> system. It seems that drbd2 is losing its connection, re-establishing, and
> doing a re-sync.
>
> Everything works, most of the time. But once every few weeks there's
> enough of a delay that Corosync takes notice and STONITHs one of the
> systems, which is a big pain.
>
> I've tried:
> - switching from Protocol C to Protocol A
> - setting "net {ping-timeout 100;}"
> - throttling the connection by "syncer {rate 10M;}" (used to be 100M)
>
> Any ideas?
>
> Mar 4 12:26:25 hypatia kernel: block drbd2: meta connection shut down by peer.
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Mar 4 12:26:25 hypatia kernel: block drbd2: asender terminated
> Mar 4 12:26:25 hypatia kernel: block drbd2: Terminating asender thread
> Mar 4 12:26:25 hypatia kernel: block drbd2: sock was shut down by peer
> Mar 4 12:26:25 hypatia kernel: block drbd2: short read expecting header on sock: r=0
> Mar 4 12:26:25 hypatia kernel: block drbd2: Creating new current UUID
> Mar 4 12:26:25 hypatia kernel: block drbd2: Connection closed
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )
> Mar 4 12:26:25 hypatia kernel: block drbd2: receiver terminated
> Mar 4 12:26:25 hypatia kernel: block drbd2: Restarting receiver thread
> Mar 4 12:26:25 hypatia kernel: block drbd2: receiver (re)started
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Handshake successful: Agreed network protocol version 94
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFConnection -> WFReportParams )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Starting asender thread (from drbd2_receiver [7920])
> Mar 4 12:26:25 hypatia kernel: block drbd2: data-integrity-alg: <not-used>
> Mar 4 12:26:25 hypatia kernel: block drbd2: drbd_sync_handshake:
> Mar 4 12:26:25 hypatia kernel: block drbd2: self C4884637D2C418DF:922772A0478F5E1F:2DE51139CD7C3DF7:EB27F748FC21DC65 bits:0 flags:0
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer 922772A0478F5E1E:0000000000000000:2DE51139CD7C3DF6:EB27F748FC21DC65 bits:0 flags:0
> Mar 4 12:26:25 hypatia kernel: block drbd2: uuid_compare()=1 by rule 70
> Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate )
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent )
> Mar 4 12:26:25 hypatia kernel: block drbd2: Began resync as SyncSource (will sync 0 KB [0 bits set]).
> Mar 4 12:26:25 hypatia kernel: block drbd2: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> Mar 4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://[email protected]
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
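
To check how regular the disconnect/reconnect cycle in the quoted log really is, the conn( ... ) state changes could be pulled out of syslog with a short script. What follows is only a minimal sketch, assuming the line format shown in the log above. The names parse_transitions and disconnect_intervals are hypothetical, and the year is an assumption passed in by hand because syslog timestamps do not carry one. The sample lines are copied from the log above with the wrapped lines rejoined.

```python
import re
from datetime import datetime

# Matches syslog lines of the form quoted in this post, e.g.
#   Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"block (?P<dev>drbd\d+): (?P<msg>.*)$"
)
# Extracts the old and new connection state from a "conn( A -> B )" group.
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def parse_transitions(lines, year=2011):
    """Yield (timestamp, device, old_state, new_state) per conn() change.

    The year is an assumption supplied by the caller; syslog omits it.
    """
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        c = CONN_RE.search(m.group("msg"))
        if not c:
            continue
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        yield ts, m.group("dev"), c.group(1), c.group(2)


def disconnect_intervals(transitions, device):
    """Return seconds between successive Connected -> NetworkFailure events."""
    drops = [
        ts
        for ts, dev, old, new in transitions
        if dev == device and old == "Connected" and new == "NetworkFailure"
    ]
    return [(b - a).total_seconds() for a, b in zip(drops, drops[1:])]


# Lines taken from the log in this post (wrapped lines rejoined).
SAMPLE = [
    "Mar 4 12:26:25 hypatia kernel: block drbd2: peer( Secondary -> Unknown ) "
    "conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "Mar 4 12:26:25 hypatia kernel: block drbd2: asender terminated",
    "Mar 4 12:26:25 hypatia kernel: block drbd2: conn( NetworkFailure -> Unconnected )",
    "Mar 4 12:26:25 hypatia kernel: block drbd2: conn( Unconnected -> WFConnection )",
    "Mar 4 12:26:25 hypatia kernel: block drbd2: conn( SyncSource -> Connected ) "
    "pdsk( Inconsistent -> UpToDate )",
]

for ts, dev, old, new in parse_transitions(SAMPLE):
    print(ts.time(), dev, old, "->", new)
```

Fed a longer stretch of /var/log/messages, disconnect_intervals(list(parse_transitions(lines)), "drbd2") would show whether the drops really arrive every ten seconds or only roughly so, which might help separate a timer-driven cause from a flaky link.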
_______________________________________________ drbd-user mailing list [email protected] http://lists.linbit.com/mailman/listinfo/drbd-user
