On Wed, Dec 10, 2014 at 11:16:07AM +0100, Christoph Mitasch wrote:
> Hi,
>
> we recently had an issue with a stacked DRBD device (8.3.16) that
> started to block IO after switching from Ahead to SyncSource.
> ko-count is set to 6.
>
> Dec 8 03:35:08 node2 kernel: [668315.119697] block drbd20: helper command: /sbin/drbdadm before-resync-source minor-20 exit code 0 (0x0)
> Dec 8 03:35:08 node2 kernel: [668315.119706] block drbd20: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Dec 8 03:35:08 node2 kernel: [668315.119716] block drbd20: ASSERT( !(remote && send_oos) ) in /var/lib/dkms/drbd/8.3.16/build/drbd/drbd_req.c:1001
> Dec 8 03:35:08 node2 kernel: [668315.119729] block drbd20: Began resync as SyncSource (will sync 216 KB [54 bits set]).
> Dec 8 03:35:08 node2 kernel: [668315.120419] block drbd20: updated sync UUID 024B346E4B84E12B:86C8E56E6CD2BBDC:9D97BCB66EBE838D:3E5876F017C7CDBD
> Dec 8 03:35:49 node2 kernel: [668356.840611] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:35:49 node2 kernel: [668356.865459] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:35:49 node2 kernel: [668356.903126] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:35:49 node2 kernel: [668356.930498] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:00 node2 kernel: [668367.006241] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:00 node2 kernel: [668367.030987] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:10 node2 kernel: [668377.249395] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec 8 03:36:13 node2 kernel: [668380.608957] block drbd20: Remote failed to finish a request within ko-count * timeout
> Dec 8 03:36:13 node2 kernel: [668380.632397] block drbd20: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout )
> Dec 8 03:36:13 node2 kernel: [668380.632440] block drbd20: error receiving CsumRSRequest, l: 44!
> Dec 8 03:36:13 node2 kernel: [668380.645119] block drbd20: asender terminated
> Dec 8 03:36:13 node2 kernel: [668380.645131] block drbd20: Terminating drbd20_asender
> Dec 8 03:37:32 node2 kernel: [668459.482874] INFO: task jbd2/dm-4-8:9503 blocked for more than 120 seconds.
> Dec 8 03:37:32 node2 kernel: [668459.494628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 8 03:37:32 node2 kernel: [668459.518881] jbd2/dm-4-8 D ffffffff81806240 0 9503 2 0x00000000
> Dec 8 03:37:32 node2 kernel: [668459.518888] ffff881017c57ac0 0000000000000046 ffff881017c57a60 ffffffff8103ec29
> Dec 8 03:37:32 node2 kernel: [668459.542394] ffff881017c57fd8 ffff881017c57fd8 ffff881017c57fd8 00000000000137c0
> Dec 8 03:37:32 node2 kernel: [668459.565602] ffff8810197b4500 ffff88100a612e00 ffff881017c57a90 ffff88207fcb4080
> Dec 8 03:37:32 node2 kernel: [668459.588736] Call Trace:
>
> Is this a known problem and fixed in DRBD 8.4?

Probably?
I think I remember something about fixing state handling
getting stuck in "Timeout".

    Lars

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
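
For anyone triaging a similar log, here is a minimal Python sketch that pulls the connection-state changes and the "rs_left > rs_total" warnings out of kernel log lines shaped like the ones quoted above. The regular expressions and the helper names (`summarize`, `seconds_between`) are assumptions made for illustration and fitted only to these sample lines. This is not a DRBD tool and says nothing about the root cause.

```python
import re

# Assumed line shape, taken from the quoted log:
# "... kernel: [668315.119706] block drbd20: conn( Ahead -> SyncSource ) ..."
LINE_RE = re.compile(r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] block (?P<dev>drbd\d+): (?P<msg>.*)")
CONN_RE = re.compile(r"conn\( (?P<old>\w+) -> (?P<new>\w+) \)")
RS_RE = re.compile(r"rs_left=(?P<left>\d+) > rs_total=(?P<total>\d+)")


def summarize(lines):
    """Collect conn() state changes and rs_left/rs_total warnings as tuples."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a "block drbdN:" kernel message
        uptime, dev, msg = float(m["uptime"]), m["dev"], m["msg"]
        conn = CONN_RE.search(msg)
        if conn:
            events.append((uptime, dev, "conn", conn["old"], conn["new"]))
        rs = RS_RE.search(msg)
        if rs:
            events.append((uptime, dev, "rs_overrun", int(rs["left"]), int(rs["total"])))
    return events


def seconds_between(events, first_state, second_state):
    """Kernel-uptime seconds from entering first_state to entering second_state."""
    t_first = next(e[0] for e in events if e[2] == "conn" and e[4] == first_state)
    t_second = next(
        e[0] for e in events
        if e[2] == "conn" and e[4] == second_state and e[0] >= t_first
    )
    return t_second - t_first


# Three lines copied from the quoted log, with the mail wrapping undone.
sample = [
    "Dec 8 03:35:08 node2 kernel: [668315.119706] block drbd20: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent )",
    "Dec 8 03:35:49 node2 kernel: [668356.840611] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)",
    "Dec 8 03:36:13 node2 kernel: [668380.632397] block drbd20: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout )",
]

events = summarize(sample)
for event in events:
    print(event)
print(round(seconds_between(events, "SyncSource", "Timeout"), 1))  # 65.5
```

On these sample lines the sketch reports roughly 65.5 seconds between entering SyncSource and dropping to Timeout, which matches the bracketed kernel timestamps in the log.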