On Wed, Dec 10, 2014 at 11:16:07AM +0100, Christoph Mitasch wrote:
> Hi,
> 
> we recently had an issue with a stacked DRBD device (8.3.16) that
> started to block IO after switching from Ahead to SyncSource.
> ko-count is set to 6.
> 
> Dec  8 03:35:08 node2 kernel: [668315.119697] block drbd20: helper command: /sbin/drbdadm before-resync-source minor-20 exit code 0 (0x0)
> Dec  8 03:35:08 node2 kernel: [668315.119706] block drbd20: conn( Ahead -> SyncSource ) pdsk( Consistent -> Inconsistent )
> Dec  8 03:35:08 node2 kernel: [668315.119716] block drbd20: ASSERT( !(remote && send_oos) ) in /var/lib/dkms/drbd/8.3.16/build/drbd/drbd_req.c:1001
> Dec  8 03:35:08 node2 kernel: [668315.119729] block drbd20: Began resync as SyncSource (will sync 216 KB [54 bits set]).
> Dec  8 03:35:08 node2 kernel: [668315.120419] block drbd20: updated sync UUID 024B346E4B84E12B:86C8E56E6CD2BBDC:9D97BCB66EBE838D:3E5876F017C7CDBD
> Dec  8 03:35:49 node2 kernel: [668356.840611] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:35:49 node2 kernel: [668356.865459] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:35:49 node2 kernel: [668356.903126] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:35:49 node2 kernel: [668356.930498] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:36:00 node2 kernel: [668367.006241] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:36:00 node2 kernel: [668367.030987] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:36:10 node2 kernel: [668377.249395] block drbd20: cs:SyncSource rs_left=55 > rs_total=54 (rs_failed 0)
> Dec  8 03:36:13 node2 kernel: [668380.608957] block drbd20: Remote failed to finish a request within ko-count * timeout
> Dec  8 03:36:13 node2 kernel: [668380.632397] block drbd20: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout )
> Dec  8 03:36:13 node2 kernel: [668380.632440] block drbd20: error receiving CsumRSRequest, l: 44!
> Dec  8 03:36:13 node2 kernel: [668380.645119] block drbd20: asender terminated
> Dec  8 03:36:13 node2 kernel: [668380.645131] block drbd20: Terminating drbd20_asender
> Dec  8 03:37:32 node2 kernel: [668459.482874] INFO: task jbd2/dm-4-8:9503 blocked for more than 120 seconds.
> Dec  8 03:37:32 node2 kernel: [668459.494628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec  8 03:37:32 node2 kernel: [668459.518881] jbd2/dm-4-8     D ffffffff81806240     0  9503      2 0x00000000
> Dec  8 03:37:32 node2 kernel: [668459.518888]  ffff881017c57ac0 0000000000000046 ffff881017c57a60 ffffffff8103ec29
> Dec  8 03:37:32 node2 kernel: [668459.542394]  ffff881017c57fd8 ffff881017c57fd8 ffff881017c57fd8 00000000000137c0
> Dec  8 03:37:32 node2 kernel: [668459.565602]  ffff8810197b4500 ffff88100a612e00 ffff881017c57a90 ffff88207fcb4080
> Dec  8 03:37:32 node2 kernel: [668459.588736] Call Trace:
> 
> Is this a known problem and fixed in DRBD 8.4?

Probably?

I think I remember a fix for state handling
getting stuck in "Timeout".

        Lars

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
