Package: linux-image-3.2.0-0.bpo.1-amd64
Version: 3.2.1-2~bpo60+1
Severity: grave

Actually, all kernels from 2.6.32 up to (at least) 3.2.1 seem to be affected.

I run two identical machines: Debian 6 amd64, with the data disks in a software RAID1 mirror and LVM2 configured on top of /dev/mdX. The logical volumes (/dev/mapper devices) are replicated to the other machine using DRBD, and the DRBD devices serve as disks for virtual machines. Sometimes one machine degrades. Apparently the mdX_resync thread and blkback collide: in the traces below, md4_resync is waiting in raise_barrier() while the blkback threads are waiting in wait_barrier(), both in the raid1 module. So far I have only observed this on disk devices used by Windows HVM domUs.

Details: a domU running Windows Server 2003 (usually) or Windows Server 2008 R2 (rarely), both with GPLPV drivers but in different versions, gets stuck accessing its hard disk, which Xen provides from the DRBD device. Other Linux domUs on the same machine continue to work. The dom0 kern.log shows that the blkback threads have been blocked for more than 120 seconds. The Windows domU cannot be stopped with either xm shutdown or xm destroy. A graceful shutdown of the Xen host is not possible either; the only way out is "echo b > /proc/sysrq-trigger", which reboots immediately.

This problem reappears every few weeks, usually in the night from Saturday to Sunday. The Windows domUs previously ran without problems on older Xen hosts. They were then moved to the current pair of machines, which run Debian 6 with Xen in the versions from Debian stable, i.e. kernel 2.6.32 and Xen 4.0. After the problem arose, I gradually upgraded (Xen 4.0 -> 4.1, then kernel 2.6.32 -> 3.1 -> 3.2). Currently both machines run linux-image-3.2.0-0.bpo.1-amd64 and xen-hypervisor-4.1-amd64 4.1.1-1, including xen-tools-4.1; the problem persists.

Today I realized that the main trigger for the failure seems to be the periodic redundancy check from /etc/cron.d/mdadm, which runs at 0:57 on the first Sunday of each month. Last month the check did not trigger the error, but yesterday it hit me again after 58 days of uptime. For now I have worked around this by setting AUTOCHECK=false in /etc/default/mdadm, but other access patterns seem to trigger the failure too.

Here is a snippet from kern.log; this pattern reappears every 2 minutes until the machine is hard-reset:
May 6 01:06:24 lady kernel: [4979042.044157] INFO: task blkback.12.hdb:16636 blocked for more than 120 seconds.
May 6 01:06:24 lady kernel: [4979042.044255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 6 01:06:24 lady kernel: [4979042.044345] blkback.12.hdb D ffff880002c817e0 0 16636 2 0x00000000
May 6 01:06:24 lady kernel: [4979042.044355] ffff880002c817e0 0000000000000246 ffff880000000000 ffffffff8160d020
May 6 01:06:24 lady kernel: [4979042.044366] 0000000000013540 ffff880007dbffd8 ffff880007dbffd8 0000000000013540
May 6 01:06:24 lady kernel: [4979042.044375] ffff880002c817e0 ffff880007dbe010 ffffffff81013949 0000000107cacc78
May 6 01:06:24 lady kernel: [4979042.044385] Call Trace:
May 6 01:06:24 lady kernel: [4979042.044398] [<ffffffff81013949>] ? sched_clock+0x5/0x8
May 6 01:06:24 lady kernel: [4979042.044423] [<ffffffffa0103673>] ? wait_barrier+0x94/0xcd [raid1]
May 6 01:06:24 lady kernel: [4979042.044432] [<ffffffff81045e84>] ? try_to_wake_up+0x190/0x190
May 6 01:06:24 lady kernel: [4979042.044441] [<ffffffffa0104a94>] ? make_request+0x11d/0x1689 [raid1]
May 6 01:06:24 lady kernel: [4979042.044454] [<ffffffffa010cc7b>] ? __split_and_process_bio+0x520/0x532 [dm_mod]
May 6 01:06:24 lady kernel: [4979042.044463] [<ffffffff810068e5>] ? xen_force_evtchn_callback+0x9/0xa
May 6 01:06:24 lady kernel: [4979042.044469] [<ffffffff81006f92>] ? check_events+0x12/0x20
May 6 01:06:24 lady kernel: [4979042.044481] [<ffffffffa00d7a9b>] ? md_make_request+0xbe/0x1b1 [md_mod]
May 6 01:06:24 lady kernel: [4979042.044490] [<ffffffff8135ab25>] ? _raw_spin_unlock_irqrestore+0x10/0x11
May 6 01:06:24 lady kernel: [4979042.044498] [<ffffffff811a807e>] ? generic_make_request+0x8e/0xcd
May 6 01:06:24 lady kernel: [4979042.044504] [<ffffffff811a8196>] ? submit_bio+0xd9/0xf7
May 6 01:06:24 lady kernel: [4979042.044512] [<ffffffff8112caf5>] ? bio_alloc_bioset+0x44/0xb3
May 6 01:06:24 lady kernel: [4979042.044520] [<ffffffffa0410a5b>] ? dispatch_rw_block_io+0x49a/0x546 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044527] [<ffffffff810068e5>] ? xen_force_evtchn_callback+0x9/0xa
May 6 01:06:24 lady kernel: [4979042.044533] [<ffffffff81006f92>] ? check_events+0x12/0x20
May 6 01:06:24 lady kernel: [4979042.044539] [<ffffffff81006f7f>] ? xen_restore_fl_direct_reloc+0x4/0x4
May 6 01:06:24 lady kernel: [4979042.044546] [<ffffffff81006f7f>] ? xen_restore_fl_direct_reloc+0x4/0x4
May 6 01:06:24 lady kernel: [4979042.044552] [<ffffffff81004299>] ? xen_mc_flush+0x12b/0x158
May 6 01:06:24 lady kernel: [4979042.044558] [<ffffffff81006f7f>] ? xen_restore_fl_direct_reloc+0x4/0x4
May 6 01:06:24 lady kernel: [4979042.044565] [<ffffffffa0410d57>] ? __do_block_io_op+0x250/0x276 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044573] [<ffffffffa041112a>] ? xen_blkif_schedule+0x302/0x3d0 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044580] [<ffffffff810636c5>] ? wake_up_bit+0x20/0x20
May 6 01:06:24 lady kernel: [4979042.044587] [<ffffffffa0410e28>] ? print_stats+0x95/0x95 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044593] [<ffffffff81063289>] ? kthread+0x7a/0x82
May 6 01:06:24 lady kernel: [4979042.044600] [<ffffffff81362274>] ? kernel_thread_helper+0x4/0x10
May 6 01:06:24 lady kernel: [4979042.044607] [<ffffffff81360333>] ? int_ret_from_sys_call+0x7/0x1b
May 6 01:06:24 lady kernel: [4979042.044613] [<ffffffff8135aebc>] ? retint_restore_args+0x5/0x6
May 6 01:06:24 lady kernel: [4979042.044618] [<ffffffff81362270>] ? gs_change+0x13/0x13
May 6 01:06:24 lady kernel: [4979042.044623] INFO: task blkback.12.hdd:16637 blocked for more than 120 seconds.
May 6 01:06:24 lady kernel: [4979042.044714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 6 01:06:24 lady kernel: [4979042.044804] blkback.12.hdd D ffff880005384300 0 16637 2 0x00000000
May 6 01:06:24 lady kernel: [4979042.044811] ffff880005384300 0000000000000246 ffff880000000000 ffffffff8160d020
May 6 01:06:24 lady kernel: [4979042.044821] 0000000000013540 ffff8800061f7fd8 ffff8800061f7fd8 0000000000013540
May 6 01:06:24 lady kernel: [4979042.044830] ffff880005384300 ffff8800061f6010 ffff8800053846b8 00000001061f6000
May 6 01:06:24 lady kernel: [4979042.044840] Call Trace:
May 6 01:06:24 lady kernel: [4979042.044848] [<ffffffffa0103673>] ? wait_barrier+0x94/0xcd [raid1]
May 6 01:06:24 lady kernel: [4979042.044854] [<ffffffff81045e84>] ? try_to_wake_up+0x190/0x190
May 6 01:06:24 lady kernel: [4979042.044862] [<ffffffffa0104a94>] ? make_request+0x11d/0x1689 [raid1]
May 6 01:06:24 lady kernel: [4979042.044869] [<ffffffff8112caf5>] ? bio_alloc_bioset+0x44/0xb3
May 6 01:06:24 lady kernel: [4979042.044879] [<ffffffffa010cc7b>] ? __split_and_process_bio+0x520/0x532 [dm_mod]
May 6 01:06:24 lady kernel: [4979042.044885] [<ffffffff811a7ffe>] ? generic_make_request+0xe/0xcd
May 6 01:06:24 lady kernel: [4979042.044895] [<ffffffffa00d7a9b>] ? md_make_request+0xbe/0x1b1 [md_mod]
May 6 01:06:24 lady kernel: [4979042.044901] [<ffffffff811a807e>] ? generic_make_request+0x8e/0xcd
May 6 01:06:24 lady kernel: [4979042.044907] [<ffffffff811a8196>] ? submit_bio+0xd9/0xf7
May 6 01:06:24 lady kernel: [4979042.044913] [<ffffffff8112caf5>] ? bio_alloc_bioset+0x44/0xb3
May 6 01:06:24 lady kernel: [4979042.044920] [<ffffffffa0410a5b>] ? dispatch_rw_block_io+0x49a/0x546 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044927] [<ffffffff8136239b>] ? xen_hypervisor_callback+0x1b/0x20
May 6 01:06:24 lady kernel: [4979042.044933] [<ffffffff810068e5>] ? xen_force_evtchn_callback+0x9/0xa
May 6 01:06:24 lady kernel: [4979042.044939] [<ffffffff81006f92>] ? check_events+0x12/0x20
May 6 01:06:24 lady kernel: [4979042.044945] [<ffffffff81006f7f>] ? xen_restore_fl_direct_reloc+0x4/0x4
May 6 01:06:24 lady kernel: [4979042.044952] [<ffffffff81006f7f>] ? xen_restore_fl_direct_reloc+0x4/0x4
May 6 01:06:24 lady kernel: [4979042.044957] [<ffffffff81004299>] ? xen_mc_flush+0x12b/0x158
May 6 01:06:24 lady kernel: [4979042.044963] [<ffffffff81006f7f>] ? xen_restore_fl_direct_reloc+0x4/0x4
May 6 01:06:24 lady kernel: [4979042.044971] [<ffffffffa0410d57>] ? __do_block_io_op+0x250/0x276 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044978] [<ffffffffa041112a>] ? xen_blkif_schedule+0x302/0x3d0 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044984] [<ffffffff810636c5>] ? wake_up_bit+0x20/0x20
May 6 01:06:24 lady kernel: [4979042.044991] [<ffffffffa0410e28>] ? print_stats+0x95/0x95 [xen_blkback]
May 6 01:06:24 lady kernel: [4979042.044997] [<ffffffff81063289>] ? kthread+0x7a/0x82
May 6 01:06:24 lady kernel: [4979042.045002] [<ffffffff81362274>] ? kernel_thread_helper+0x4/0x10
May 6 01:06:24 lady kernel: [4979042.045008] [<ffffffff81360333>] ? int_ret_from_sys_call+0x7/0x1b
May 6 01:06:24 lady kernel: [4979042.045014] [<ffffffff8135aebc>] ? retint_restore_args+0x5/0x6
May 6 01:06:24 lady kernel: [4979042.045020] [<ffffffff81362270>] ? gs_change+0x13/0x13
May 6 01:06:24 lady kernel: [4979042.045031] INFO: task md4_resync:8724 blocked for more than 120 seconds.
May 6 01:06:24 lady kernel: [4979042.045102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 6 01:06:24 lady kernel: [4979042.045195] md4_resync D ffff8800088f69a0 0 8724 2 0x00000000
May 6 01:06:24 lady kernel: [4979042.045203] ffff8800088f69a0 0000000000000246 ffff880000000000 ffff880014cbf560
May 6 01:06:24 lady kernel: [4979042.045213] 0000000000013540 ffff880001a9dfd8 ffff880001a9dfd8 0000000000013540
May 6 01:06:24 lady kernel: [4979042.045222] ffff8800088f69a0 ffff880001a9c010 dead000000100100 0000000100200200
May 6 01:06:24 lady kernel: [4979042.045231] Call Trace:
May 6 01:06:24 lady kernel: [4979042.045239] [<ffffffffa010356e>] ? raise_barrier+0x126/0x15b [raid1]
May 6 01:06:24 lady kernel: [4979042.045245] [<ffffffff81045e84>] ? try_to_wake_up+0x190/0x190
May 6 01:06:24 lady kernel: [4979042.045253] [<ffffffffa0103c15>] ? sync_request+0x19b/0x71c [raid1]
May 6 01:06:24 lady kernel: [4979042.045265] [<ffffffffa00d8e74>] ? md_do_sync+0x78a/0xb98 [md_mod]
May 6 01:06:24 lady kernel: [4979042.045272] [<ffffffff810636c5>] ? wake_up_bit+0x20/0x20
May 6 01:06:24 lady kernel: [4979042.045282] [<ffffffffa00d9507>] ? md_thread+0x105/0x123 [md_mod]
May 6 01:06:24 lady kernel: [4979042.045292] [<ffffffffa00d9402>] ? md_rdev_init+0xea/0xea [md_mod]
May 6 01:06:24 lady kernel: [4979042.045298] [<ffffffff81063289>] ? kthread+0x7a/0x82
May 6 01:06:24 lady kernel: [4979042.045303] [<ffffffff81362274>] ? kernel_thread_helper+0x4/0x10
May 6 01:06:24 lady kernel: [4979042.045310] [<ffffffff81360333>] ? int_ret_from_sys_call+0x7/0x1b
May 6 01:06:24 lady kernel: [4979042.045315] [<ffffffff8135aebc>] ? retint_restore_args+0x5/0x6
May 6 01:06:24 lady kernel: [4979042.045321] [<ffffffff81362270>] ? gs_change+0x13/0x13
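You can try to reproduce the hang without waiting for the cron job. One way is to start a check by hand on a single array through the md sysfs interface, writing "check" to its sync_action file. Presumably the periodic job requests the same kind of check. The sketch below is a hypothetical helper: the function name start_md_check is made up for illustration. It assumes the affected array is md4, as in the log above, and that it runs as root in dom0. On an affected machine this may block blkback exactly as shown above, so only try it when a hard reset is acceptable.

```shell
# Hypothetical helper (not part of mdadm): start a redundancy check on one
# md array by writing "check" to its sysfs sync_action file. This is presumably
# the same kind of check that the monthly cron job from /etc/cron.d/mdadm starts.
# WARNING: on an affected dom0 this may hang blkback until a hard reset.
start_md_check() {
    md="${1:-md4}"                          # md4 is the array from the log above
    action="/sys/block/$md/md/sync_action"

    # Give up if the array does not exist or the file is not writable (not root).
    if [ ! -w "$action" ]; then
        echo "cannot write $action (no such array, or not root)" >&2
        return 1
    fi

    cat /proc/mdstat                        # array state before the check
    echo check > "$action"                  # ask md to start a check
    echo "$md: sync_action is now $(cat "$action")"
}
```

Usage, as root after sourcing the function: `start_md_check md4`. The check's progress then shows up in /proc/mdstat. A check that has not (yet) hung can be stopped again with `echo idle > /sys/block/md4/md/sync_action`.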