We seem to be hitting a similar problem on two different machines. When the monthly checkarray script runs, the data-check blocks on the RAID array that holds the LVM volumes.
Both machines have three disks, running raid10 across partitions. As /proc/mdstat shows, it has hit 0K/sec. Smartctl reports no errors for any of the drives. The machines are running squeeze with all packages up to date.

/dev/md2 holds an LVM partition that is used for Xen disks. Load average is above 100. The machines run ganeti, with xen-pvm, xen-hvm as the hypervisors and there is some drbd mirroring between these machines, for some of the logical volumes.

lvdisplay, vgdisplay, pvdisplay all hang when run. Most of the VMs show very high load as well (through ganglia and snmp reporting) but most are not accessible via ssh or xm console.

Is there any other information I can provide to help to debug this?

# cat /proc/mdstat
Personalities : [raid10]
md2 : active raid10 sda3[0] sdc3[2] sdb3[1]
      1448908608 blocks super 1.2 64K chunks 2 near-copies [3/3] [UUU]
      [====>................]  check = 23.1% (335167872/1448908608) finish=156694820.0min speed=0K/sec

md0 : active raid10 sda1[0] sdc1[2] sdb1[1]
      14644736 blocks super 1.2 512K chunks 2 near-copies [3/3] [UUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# cat /proc/`pidof mdadm`/status
Name:   mdadm
State:  S (sleeping)
Tgid:   3298
Pid:    3298
PPid:   1
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups:
VmPeak:    12832 kB
VmSize:    12768 kB
VmLck:         0 kB
VmHWM:       768 kB
VmRSS:       604 kB
VmData:      364 kB
VmStk:        88 kB
VmExe:       316 kB
VmLib:      1692 kB
VmPTE:        48 kB
Threads:        1
SigQ:   9/7244
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000002
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed:   1
Cpus_allowed_list:      0
Mems_allowed:   00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        5739
nonvoluntary_ctxt_switches:     12

[5391047.833632] md: data-check of RAID array md0
[5391047.833636] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[5391047.833639] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[5391047.833644] md: using 128k window, over a total of 14644736 blocks.
[5391048.277677] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
[5391235.279026] md: md0: data-check done.
[5391235.496633] md: data-check of RAID array md2
[5391235.496638] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[5391235.496641] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
[5391235.496647] md: using 128k window, over a total of 1448908608 blocks.
[5410976.055527] INFO: task kdmflush:1035 blocked for more than 120 seconds.
[5410976.055566] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[5410976.055619] kdmflush      D ffff880002818e08     0  1035      2 0x00000000
[5410976.055625]  ffff88002e1a5530 0000000000000246 0000000000000002 0000000000000010
[5410976.055633]  0000000000000000 ffff880002d81e80 000000000000f9e0 ffff88003e44dfd8
[5410976.055640]  0000000000015780 0000000000015780 ffff88003ec59530 ffff88003ec59828
[5410976.055647] Call Trace:
[5410976.055657]  [<ffffffff8100ece2>] ? check_events+0x12/0x20
[5410976.055665]  [<ffffffff811804bb>] ? generic_unplug_device+0x0/0x34
[5410976.055680]  [<ffffffffa020e6f0>] ? wait_barrier+0x9a/0xd7 [raid10]
[5410976.055685]  [<ffffffff8104b430>] ? default_wake_function+0x0/0x9
[5410976.055691]  [<ffffffff81040e42>] ? check_preempt_wakeup+0x0/0x268
[5410976.055698]  [<ffffffffa0210fa2>] ? make_request+0x16f/0x5cd [raid10]
[5410976.055703]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[5410976.055709]  [<ffffffff810e81c5>] ? kmem_cache_alloc+0x8c/0xf0
[5410976.055717]  [<ffffffffa01f5b9a>] ? md_make_request+0xb6/0xf1 [md_mod]
[5410976.055723]  [<ffffffff8100eccf>] ? xen_restore_fl_direct_end+0x0/0x1
[5410976.055728]  [<ffffffff8117f6b7>] ? generic_make_request+0x299/0x2f9
[5410976.055737]  [<ffffffffa021a308>] ? clone_bio+0x44/0xce [dm_mod]
[5410976.055745]  [<ffffffffa021b5e9>] ? __split_and_process_bio+0x2ac/0x56b [dm_mod]
[5410976.055753]  [<ffffffffa021ba38>] ? dm_wq_work+0x137/0x167 [dm_mod]
[5410976.055760]  [<ffffffff810628d3>] ? worker_thread+0x188/0x21d
[5410976.055768]  [<ffffffffa021b901>] ? dm_wq_work+0x0/0x167 [dm_mod]
[5410976.055773]  [<ffffffff81065f06>] ? autoremove_wake_function+0x0/0x2e
[5410976.055778]  [<ffffffff8106274b>] ? worker_thread+0x0/0x21d
[5410976.055783]  [<ffffffff81065c39>] ? kthread+0x79/0x81
[5410976.055788]  [<ffffffff81012baa>] ? child_rip+0xa/0x20
[5410976.055793]  [<ffffffff81011d61>] ? int_ret_from_sys_call+0x7/0x1b
[5410976.055798]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[5410976.055803]  [<ffffffff81012ba0>] ? child_rip+0x0/0x20

# free
             total       used       free     shared    buffers     cached
Mem:       1045340    1026512      18828          0      61688     401856
-/+ buffers/cache:     562968     482372
Swap:      3161640      29072    3132568

# mount
/dev/md0 on / type ext4 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
xenfs on /proc/xen type xenfs (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)

# fdisk -l

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000600d9

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        1216     9764864   fd  Linux raid autodetect
/dev/sda2            1216        1347     1053889+  82  Linux swap / Solaris
/dev/sda3            1348      121601   965940255   fd  Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0007482c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1        1216     9764864   fd  Linux raid autodetect
/dev/sdc2            1216        1347     1053889+  82  Linux swap / Solaris
/dev/sdc3            1348      121601   965940255   fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006adc2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        1216     9764864   fd  Linux raid autodetect
/dev/sdb2            1216        1347     1053889+  82  Linux swap / Solaris
/dev/sdb3            1348      121601   965940255   fd  Linux raid autodetect

# uname -a
Linux barwon 2.6.32-5-xen-amd64 #1 SMP Thu May 19 01:16:47 UTC 2011 x86_64 GNU/Linux

--
Marcus Furlong - VPAC Systems Administrator
http://www.vpac.org
+61 3 9925 4574