On Thu, Jun 11, 2009 at 07:30:44PM -0400, Bill McGonigle wrote:
> I've got a problem I can reproduce easily enough, but really I fail to
> understand what's going wrong.
>
> I've got a 5.3 Dom0, which is running three guests. One is Fedora 10,
> that runs with local flat files, and works fine. One is Nexenta 2
> (opensolaris-based), and that runs off of physical partitions, and seems
> to work great. The third runs Fedora 11 and has for its disks,
> iSCSI devices that are exported from Nexenta (ZFS-backed).
>

So the iSCSI target is running on domU, using the physical partitions as ZFS
and exporting them as iSCSI LUNs?

> I have the Dom0 mapping the two iSCSI devices, one for /boot and one for
> /. They're showing up initially as /dev/sdc and /dev/sdd.
>
> If I go after the iSCSI devices in Dom0, with dd, for instance, they
> work fine all day. I can read and write the entire devices to and from
> local files without error. iSCSI seems to work properly in that regard.
> I'm getting about 38MB/s. I've scrubbed the disk pool and no errors
> were found and long SMART self-test passed on each of the disks.
>
> So, I specify those devices in the Xen config for the domU (tried both
> real device name and /dev/disk/by-path/ names) and the DomU boots and
> operates as I'd expect. Installation worked fine and typical operations
> (low volume) work. However, then I try to do something, which I'm
> assuming is more disk intensive, like running a yum update, and iSCSI
> seems to fall over.
>
> In the DomU, I'll see a lock-up, and then filesystem errors. e.g.:
>
> Installing : kernel [############################################
> ] 1/33EXT3-fs error (device xvda1) in ext3_ordered_writepage: IO failure
>
> In the Dom0, I'll see:
>
> sd 6:0:0:0: timing out command, waited 360s
> sd 6:0:0:0: SCSI error: return code = 0x06050000
> end_request: I/O error, dev sdc, sector 37319
> sd 7:0:0:0: timing out command, waited 360s
> sd 7:0:0:0: SCSI error: return code = 0x06000000
> end_request: I/O error, dev sdd, sector 29792137
> sd 7:0:0:0: timing out command, waited 360s
> sd 7:0:0:0: SCSI error: return code = 0x06000000
> end_request: I/O error, dev sdd, sector 29792313
>
> Both (all) iSCSI devices are failed. Under iostat I see activity to the
> iSCSI block devices, and the whole machine acts mostly I/O blocked (even
> the Fedora 10 DomU running on flat files will start throwing nagios into
> a tizzy).
> If I do 'service iscsi stop' everything picks right back up
> (though the DomU using them as its disks is obviously unhappy).
>
> When I start iscsi again I can pick right back up (after repairing
> filesystems in the DomU), and I can repeat the process at will.
> Sometimes the disks will come back as, e.g. sdd and sde, leaving me to
> think something still has a handle on sdc. But lsof shows nothing in dom0.
>
> One thing that stood out were some of the block and sector number errors
> being right on power of two boundaries:
>
> scsi 7:0:0:0: SCSI error: return code = 0x00010000
> end_request: I/O error, dev sdd, sector 32768
> Buffer I/O error on device sdd, logical block 4096
> Buffer I/O error on device sdd, logical block 4097
> Buffer I/O error on device sdd, logical block 4098
> Buffer I/O error on device sdd, logical block 4099
> Buffer I/O error on device sdd, logical block 4100
> Buffer I/O error on device sdd, logical block 4101
> Buffer I/O error on device sdd, logical block 4102
> Buffer I/O error on device sdd, logical block 4103
> Buffer I/O error on device sdd, logical block 4104
> Buffer I/O error on device sdd, logical block 4105
> scsi 7:0:0:0: rejecting I/O to dead device
>
> but as I opened with, I'm sort of at a loss as to what is actually
> causing the problem. Any suggestions for further troubleshooting and/or
> ideas about what's happening appreciated.
>

Try testing the iSCSI LUNs from dom0 first. Run some real benchmarks on the
LUNs, ltp disktest for example. Or create some filesystem (ext3) on them and
run some file-based benchmarks (to generate real load).

Do you have errors on the iSCSI target?

-- Pasi

_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
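
As a reading aid for the dom0 log lines quoted above, here is a minimal sketch that splits the "return code" values into their component bytes and converts the failing sector and block numbers into byte offsets. It assumes the 2.6-era Linux layout of the SCSI result word (driver byte, host byte, message byte, status byte, from high to low) and the constant names from that era's include/scsi/scsi.h as I recall them. The helper names decode_result and byte_offset are made up for illustration. The 4096-byte buffer block size is an inference from the log (sector 32768 paired with logical block 4096), not something the poster stated.

```python
# Sketch only: decode 2.6-era Linux SCSI result words seen in the dom0 log.
# Assumed layout: driver_byte << 24 | host_byte << 16 | msg_byte << 8 | status_byte

DRIVER_BYTE = {
    0x00: "DRIVER_OK", 0x01: "DRIVER_BUSY", 0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA", 0x04: "DRIVER_ERROR", 0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT", 0x07: "DRIVER_HARD", 0x08: "DRIVER_SENSE",
}

HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_result(code):
    """Split a 32-bit SCSI result word into named fields (hypothetical helper)."""
    return {
        "driver": DRIVER_BYTE.get((code >> 24) & 0xFF, "unknown"),
        "host": HOST_BYTE.get((code >> 16) & 0xFF, "unknown"),
        "msg": (code >> 8) & 0xFF,
        "status": code & 0xFF,
    }


def byte_offset(index, unit_size):
    """Byte offset of a sector or block index, given its unit size in bytes."""
    return index * unit_size


if __name__ == "__main__":
    # The three return codes that appear in the quoted log.
    for code in (0x06050000, 0x06000000, 0x00010000):
        print(hex(code), decode_result(code))

    # end_request reports 512-byte sectors; 4096-byte blocks are an assumption.
    print("sector 32768 ->", byte_offset(32768, 512), "bytes")
    print("block 4096   ->", byte_offset(4096, 4096), "bytes")
```

Read this way, the first two codes carry a timeout in the driver byte, which matches the "timing out command, waited 360s" lines, and 0x00010000 carries a no-connect host byte, which fits the later "rejecting I/O to dead device" message. Under the stated block-size assumption, sector 32768 and logical block 4096 are the same 16 MiB offset, so the power-of-two pattern may be one failed request reported in two units rather than two separate clues.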

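
The suggestion to put real load on the LUNs from dom0 can be approximated with a small write-then-verify pass. This is only a sketch and not a replacement for ltp disktest: it writes sequentially, uses buffered I/O (so the read-back may be served from the page cache rather than the target), and the function name write_verify and its parameters are invented for this example. Pointing it at a raw device such as /dev/sdc would overwrite that device, so the demo below only touches a temporary file.

```python
# Sketch only: sequential write + read-back verification to generate I/O load.
import hashlib
import os
import tempfile


def write_verify(path, block_size=1 << 20, blocks=64):
    """Write random blocks to path, fsync, read back; return mismatching block indices.

    I/O failures surface as OSError from write(), fsync() or read().
    """
    digests = []
    with open(path, "wb") as f:
        for _ in range(blocks):
            data = os.urandom(block_size)
            digests.append(hashlib.sha256(data).digest())
            f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache to the device

    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(digests):
            if hashlib.sha256(f.read(block_size)).digest() != expected:
                bad.append(i)
    return bad


if __name__ == "__main__":
    # Demo against a temporary file; a file on an ext3 filesystem created on
    # the LUN would be the closer match to the file-based test suggested above.
    fd, tmp = tempfile.mkstemp()
    os.close(fd)
    try:
        print("mismatching blocks:", write_verify(tmp, block_size=4096, blocks=16))
    finally:
        os.remove(tmp)
```

Running several of these in parallel against files on the LUN, while watching the dom0 kernel log and the target's own logs, is one way to check whether the timeouts appear without Xen in the path.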