----- Original Message -----
From: "Brian Kelly" <[EMAIL PROTECTED]>
To: <linux-raid@vger.kernel.org>
Sent: Thursday, February 23, 2006 1:25 AM
Subject: Help Please! mdadm hangs when using nbd or gnbd
> Hail to the Great Linux RAID Gurus! I humbly seek any assistance you
> can offer.
>
> I am building a couple of 20 TB logical volumes from six storage nodes,
> each offering two 8 TB raw storage devices built with Broadcom RAIDCore
> BC4852 SATA cards. Each storage node (called leadstor1-6) needs to
> publish its two raw devices with iSCSI, nbd or gnbd over a gigabit
> network, which the head node (leadstor) combines into a RAID 5 volume
> using mdadm.
>
> My problem is that when using nbd or gnbd, the original build of the
> array on the head node quickly halts, as if a deadlock has occurred. I
> have this problem with RAID 1 and RAID 5 configurations regardless of
> the size of the storage nodes' published devices. Here's a
> demonstration with two 4 TB drives being mirrored using nbd:
>
> *** Begin Demonstration ***
>
> [EMAIL PROTECTED] nbd-2.8.3]# uname -a
> Linux leadstor.unidata.ucar.edu 2.6.15-1.1831_FC4smp #1 SMP Tue Feb 7 13:51:52 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
>
> >>> I start by preparing the system for nbd and md devices
>
> [EMAIL PROTECTED] ~]# modprobe nbd
> [EMAIL PROTECTED] ~]# cd /dev
> [EMAIL PROTECTED] dev]# ./MAKEDEV nb
> [EMAIL PROTECTED] dev]# ./MAKEDEV md
>
> >>> I then mount two 4 TB volumes from leadstor5 and leadstor6
>
> [EMAIL PROTECTED] dev]# cd /opt/nbd-2.8.3
> [EMAIL PROTECTED] nbd-2.8.3]# ./nbd-client leadstor5 2002 /dev/nb5
> Negotiation: ..size = 3899484160KB
> bs=1024, sz=3899484160
> [EMAIL PROTECTED] nbd-2.8.3]# ./nbd-client leadstor6 2002 /dev/nb6
> Negotiation: ..size = 3899484160KB
> bs=1024, sz=3899484160
>
> >>> I confirm the volumes are attached properly
>
> [EMAIL PROTECTED] nbd-2.8.3]# fdisk -l /dev/nb5
>
> Disk /dev/nb5: 3993.0 GB, 3993071779840 bytes
> 255 heads, 63 sectors/track, 485463 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/nb5 doesn't contain a valid partition table
> [EMAIL PROTECTED] nbd-2.8.3]# fdisk -l /dev/nb6
>
> Disk /dev/nb6: 3993.0 GB, 3993071779840 bytes
> 255 heads, 63 sectors/track, 485463 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
> Disk /dev/nb6 doesn't contain a valid partition table
>
> >>> I prepare the drives to be used in mdadm
>
> [EMAIL PROTECTED] nbd-2.8.3]# mdadm -V
> mdadm - v1.12.0 - 14 June 2005
> [EMAIL PROTECTED] nbd-2.8.3]# mdadm --zero-superblock /dev/nb5
> [EMAIL PROTECTED] nbd-2.8.3]# mdadm --zero-superblock /dev/nb6
>
> >>> I create a device to mirror the two volumes
>
> [EMAIL PROTECTED] nbd-2.8.3]# mdadm --create /dev/md2 -l 1 -n 2 /dev/nb5 /dev/nb6
> mdadm: array /dev/md2 started.
>
> >>> And watch the progress in /proc/mdstat
>
> [EMAIL PROTECTED] nbd-2.8.3]# date
> Wed Feb 22 16:18:55 MST 2006
> [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
> Personalities : [raid1]
> md2 : active raid1 nbd6[1] nbd5[0]
>       3899484096 blocks [2/2] [UU]
>       [>....................]  resync =  0.0% (1408/3899484096) finish=389948.2min speed=156K/sec
>
> md1 : active raid1 sdb3[1] sda3[0]
>       78188288 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[1] sda1[0]
>       128384 blocks [2/2] [UU]
>
> unused devices: <none>
>
> >>> But no more has been done a minute later
>
> [EMAIL PROTECTED] nbd-2.8.3]# date
> Wed Feb 22 16:19:49 MST 2006
> [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
> Personalities : [raid1]
> md2 : active raid1 nbd6[1] nbd5[0]
>       3899484096 blocks [2/2] [UU]
>       [>....................]  resync =  0.0% (1408/3899484096) finish=2599655.1min speed=23K/sec
>
> md1 : active raid1 sdb3[1] sda3[0]
>       78188288 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[1] sda1[0]
>       128384 blocks [2/2] [UU]
>
> unused devices: <none>
>
> >>> And later still, no more of the resync has been done
>
> [EMAIL PROTECTED] nbd-2.8.3]# date
> Wed Feb 22 16:20:38 MST 2006
> [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
> Personalities : [raid1]
> md2 : active raid1 nbd6[1] nbd5[0]
>       3899484096 blocks [2/2] [UU]
>       [>....................]  resync =  0.0% (1408/3899484096) finish=4679379.2min speed=13K/sec
>
> md1 : active raid1 sdb3[1] sda3[0]
>       78188288 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[1] sda1[0]
>       128384 blocks [2/2] [UU]
>
> unused devices: <none>
>
> >>> At this point, the resync is stuck and the system is idle. I have
> left it overnight, but it progresses no further. 100% of the time this
> test will stop at 1408 on the rebuild. With other configurations the
> number changes (for example, it was 1280 for a 6-column RAID 5), but it
> always halts at the same spot.
>
> >>> Nothing is logged in the system files
>
> [EMAIL PROTECTED] nbd-2.8.3]# tail -15 /var/log/messages
> Feb 22 15:48:35 leadstor kernel: parport: PnPBIOS parport detected.
> Feb 22 15:48:35 leadstor kernel: parport0: PC-style at 0x378, irq 7 [PCSPP]
> Feb 22 15:48:35 leadstor kernel: lp0: using parport0 (interrupt-driven).
> Feb 22 15:48:35 leadstor kernel: lp0: console ready
> Feb 22 15:48:37 leadstor fstab-sync[2585]: removed all generated mount points
> Feb 22 16:01:00 leadstor sshd(pam_unix)[3000]: session opened for user root by root(uid=0)
> Feb 22 16:06:10 leadstor kernel: nbd: registered device at major 43
> Feb 22 16:07:43 leadstor sshd(pam_unix)[3199]: session opened for user root by root(uid=0)
> Feb 22 16:18:51 leadstor kernel: md: bind<nbd5>
> Feb 22 16:18:51 leadstor kernel: md: bind<nbd6>
> Feb 22 16:18:51 leadstor kernel: raid1: raid set md2 active with 2 out of 2 mirrors
> Feb 22 16:18:51 leadstor kernel: md: syncing RAID array md2
> Feb 22 16:18:51 leadstor kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
> Feb 22 16:18:51 leadstor kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
> Feb 22 16:18:51 leadstor kernel: md: using 128k window, over a total of 3899484096 blocks.
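[Editor's note: one way to read the decaying speed figures above is as a back-of-the-envelope average, not the kernel's exact windowed algorithm. The resync counter never moves past 1408 KB, so the reported speed behaves like those 1408 KB divided by an ever-longer elapsed time since the 16:18:51 start, which is why the speed decays toward zero and the finish estimate balloons without any actual progress:]

```python
# Illustrative arithmetic only: with the resync counter frozen at 1408 KB,
# the reported speed behaves like completed-KB / total elapsed seconds.
# (The kernel actually averages over a sliding window; this is a sketch.)

def approx_speed_kbs(done_kb, elapsed_s):
    """Average resync speed in KB/s since the resync started."""
    return done_kb // elapsed_s

STUCK_KB = 1408  # frozen resync counter from /proc/mdstat (1 KB blocks)

# Samples from the demonstration, as seconds after the 16:18:51 start:
for when, elapsed in [("16:19:49", 58), ("16:20:38", 107), ("16:33:50", 899)]:
    print(when, approx_speed_kbs(STUCK_KB, elapsed), "K/sec")
# Yields 24, 13 and 1 K/sec, tracking the 23, 13 and 1 K/sec that
# /proc/mdstat reported: no data moved after the first few seconds.
```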
> >>> And one last check of the rebuild
>
> [EMAIL PROTECTED] nbd-2.8.3]# date
> Wed Feb 22 16:33:50 MST 2006
> [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
> Personalities : [raid1]
> md2 : active raid1 nbd6[1] nbd5[0]
>       3899484096 blocks [2/2] [UU]
>       [>....................]  resync =  0.0% (1408/3899484096) finish=38994826.8min speed=1K/sec
>
> md1 : active raid1 sdb3[1] sda3[0]
>       78188288 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[1] sda1[0]
>       128384 blocks [2/2] [UU]
>
> unused devices: <none>
>
> >>> Now if I try to abort the build, the command also hangs
>
> [EMAIL PROTECTED] nbd-2.8.3]# mdadm --misc --stop /dev/md2
> *never returns*
>
> >>> But I can connect with another shell and poke around
>
> Last login: Wed Feb 22 16:07:43 2006 from robin.unidata.ucar.edu
> [EMAIL PROTECTED] ~]# ps -eaf | grep md
> root       578     9  0 15:48 ?        00:00:00 [md1_raid1]
> root       579     9  0 15:48 ?        00:00:00 [md0_raid1]
> root      2181     1  0 15:48 ?        00:00:00 mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
> root      2258     1  0 15:48 ?        00:00:00 [krfcommd]
> root      3298     9  0 16:18 ?        00:00:00 [md2_raid1]
> root      3299     9  0 16:18 ?        00:00:00 [md2_resync]
> root      3384  3049  0 16:35 pts/1    00:00:00 mdadm --misc --stop /dev/md2
> root      3426  3399  0 16:37 pts/2    00:00:00 grep md
>
> >>> But all the md2 processes are wedged and cannot be killed
>
> [EMAIL PROTECTED] ~]# kill -9 3298 3299 3384
> [EMAIL PROTECTED] ~]# ps -eaf | grep md
> root       578     9  0 15:48 ?        00:00:00 [md1_raid1]
> root       579     9  0 15:48 ?        00:00:00 [md0_raid1]
> root      2181     1  0 15:48 ?        00:00:00 mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid
> root      2258     1  0 15:48 ?        00:00:00 [krfcommd]
> root      3298     9  0 16:18 ?        00:00:00 [md2_raid1]
> root      3299     9  0 16:18 ?        00:00:00 [md2_resync]
> root      3384  3049  0 16:35 pts/1    00:00:00 mdadm --misc --stop /dev/md2
> root      3431  3399  0 16:38 pts/2    00:00:00 grep md
>
> >>> So, to get rid of these processes I reboot the system, and I have to
> power down the box, since the shutdown process stops when unloading
> iptables or md.
>
> >>> The head node is running Fedora Core 4 with the latest 2.6.15 SMP
> kernel, since it was mentioned that some deadlock issues were fixed
> there. It has two Opteron CPUs at 1600 MHz and 2 GB of RAM. The
> storage nodes are FC4 2.6.14, but with a single CPU and 1 GB of RAM.
> All systems are using nbd-2.8.3, and the problem symptoms are the same
> when using gnbd from Red Hat's GFS cluster software. The systems
> interconnect over a dedicated gigabit copper network.
>
> *** End Demonstration ***
>
> This problem seems to exist on both uni-processor and SMP kernels.
> When I repeat the procedure on one of the uni-processor systems, the
> resync gets further but still hangs. Here's where it hung on leadstor1:
>
> [EMAIL PROTECTED] nbd-2.8.3]# uname -a
> Linux leadstor1.unidata.ucar.edu 2.6.14-1.1653_FC4 #1 Tue Dec 13 21:34:16 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
> [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
> Personalities : [raid1]
> md2 : active raid1 nbd6[1] nbd5[0]
>       3899484096 blocks [2/2] [UU]
>       [>....................]  resync =  0.0% (1409024/3899484096) finish=1936.4min speed=33548K/sec
>
> md1 : active raid1 sdb3[1] sda3[0]
>       78188288 blocks [2/2] [UU]
>
> md0 : active raid1 sdb1[1] sda1[0]
>       128384 blocks [2/2] [UU]
>
> unused devices: <none>
>
> In addition to nbd and gnbd, I have used iSCSI to mount the storage
> nodes' volumes. With iet-0.4.12b and open-iscsi-1.0-485, mdadm worked
> well. I'm trying other solutions because the head node would always
> crash before getting through a rebuild, which I suspect is a problem
> with open-iscsi, the hardware, or both. I was also hoping mdadm would
> handle failures better when using native block devices.
>
> I've spent the last few days trying different combinations to pinpoint
> the problem, but configuration seems to make no difference. Any FC4
> system trying to RAID 1 or RAID 5 any size of nbd volumes from any
> system(s) will hang. However, an array built without nbd works fine.
>
> So, I would like to get this nbd/mdadm configuration working, but I am
> uncertain where best to look next. I would think it best to determine
> where this hang is happening, but my code and kernel debugging skills
> are not the best. Would anyone have suggestions on good tests for me
> to run, or where else I should look?
>
> My thanks in advance, and my apologies if I'm missing something
> blatantly obvious.
>
> Brian

Hello,

I have used a similar system, and I have some ideas:

The general nbd deadlock is fixed in the 2.6.16 series! Is the head node an x86_64 system, or 32-bit?

Please try this setup with 1.99 TB nbd devices, and let me know whether it works. (I use my system like this: nbd-server 1230 /dev/md0 2097000)

Check these if the sync is stopped:

1. ps fax | grep nbd-client
2. dd if=/dev/nbX of=/dev/null bs=1M count=1 (or more), and check the dmesg messages after the dd
3. Make sure there is no network packet loss.

Cheers,
Janos

> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
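[Editor's note on Janos's 1.99 TB suggestion: 2^32 sectors of 512 bytes is exactly 2 TiB, so any code path that holds a sector count in a 32-bit integer overflows just past that point, and Brian's 3899484160 KB devices are well beyond it. This is one reading of why just-under-2-TB is the interesting test size; it is not a diagnosis confirmed anywhere in the thread:]

```python
# Sketch of the 2 TiB / 32-bit sector boundary (an interpretation of the
# 1.99 TB test suggestion, not a confirmed cause of this particular hang).

SECTOR_BYTES = 512
U32_SECTORS = 2**32                      # first sector count a u32 cannot hold
BOUNDARY_BYTES = U32_SECTORS * SECTOR_BYTES

device_kb = 3899484160                   # size nbd-client negotiated, in KB
device_sectors = device_kb * 1024 // SECTOR_BYTES

print(BOUNDARY_BYTES // 2**40)           # 2 -> the boundary is 2 TiB
print(device_sectors > U32_SECTORS)      # True -> these devices are past it
print(device_sectors % 2**32)            # what truncated 32-bit math would see
```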