On Thu, Jun 11, 2009 at 07:30:44PM -0400, Bill McGonigle wrote:
> I've got a problem I can reproduce easily enough, but really I fail to 
> understand what's going wrong.
> 
> I've got a 5.3 Dom0, which is running three guests.  One is Fedora 10, 
> that runs with local flat files, and works fine.  One is Nexenta 2 
> (opensolaris-based), and that runs off of physical partitions, and seems 
> to work great.  The third runs Fedora 11 and has, for its disks, 
> iSCSI devices that are exported from Nexenta (ZFS-backed).
> 

So the iSCSI target is running in a domU (the Nexenta guest), using the
physical partitions as ZFS storage and exporting them as iSCSI LUNs?

> I have the Dom0 mapping the two iSCSI devices, one for /boot and one for 
> /.  They're showing up initially as /dev/sdc and /dev/sdd.
> 
> If I go after the iSCSI devices in Dom0, with dd, for instance, they 
> work fine all day. I can read and write the entire devices to and from 
> local files without error.  iSCSI seems to work properly in that regard. 
> I'm getting about 38 MB/s.  I've scrubbed the disk pool and no errors 
> were found, and a long SMART self-test passed on each of the disks.
> 
> So, I specify those devices in the Xen config for the domU (tried both 
> real device name and /dev/disk/by-path/ names) and the DomU boots and 
> operates as I'd expect.  Installation worked fine and typical operations 
> (low volume) work.  However, then I try to do something, which I'm 
> assuming is more disk intensive, like running a yum update, and iSCSI 
> seems to fall over.
> 
> In the DomU, I'll see a lock-up, and then filesystem errors.  e.g.:
> 
>   Installing     : kernel [############################################ 
> ]  1/33EXT3-fs error (device xvda1) in ext3_ordered_writepage: IO failure
> 
> In the Dom0, I'll see:
> 
>   sd 6:0:0:0: timing out command, waited 360s
>   sd 6:0:0:0: SCSI error: return code = 0x06050000
>   end_request: I/O error, dev sdc, sector 37319
>   sd 7:0:0:0: timing out command, waited 360s
>   sd 7:0:0:0: SCSI error: return code = 0x06000000
>   end_request: I/O error, dev sdd, sector 29792137
>   sd 7:0:0:0: timing out command, waited 360s
>   sd 7:0:0:0: SCSI error: return code = 0x06000000
>   end_request: I/O error, dev sdd, sector 29792313
> 
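
For what it's worth, the return codes above can be split into their byte
fields. A minimal sketch, assuming the result-word layout (driver byte, host
byte, message byte, status byte, from high to low) and the constant tables of
the 2.6-era include/scsi/scsi.h; the tables are typed from memory, so check
them against the headers of the kernel you are actually running:

```python
# Assumed layout of the 32-bit SCSI result word in 2.6-era Linux kernels:
#   bits 24-31 driver byte, 16-23 host byte, 8-15 message byte, 0-7 status byte
# The name tables below are assumptions taken from include/scsi/scsi.h.
HOST_BYTE = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
    0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET",
}
DRIVER_BYTE = {
    0x00: "DRIVER_OK", 0x01: "DRIVER_BUSY", 0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA", 0x04: "DRIVER_ERROR", 0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT", 0x07: "DRIVER_HARD", 0x08: "DRIVER_SENSE",
}

def decode_scsi_result(code):
    """Split a SCSI result word into its four byte fields."""
    return {
        "status": code & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "host": HOST_BYTE.get((code >> 16) & 0xFF, "unknown"),
        "driver": DRIVER_BYTE.get((code >> 24) & 0xFF, "unknown"),
    }

if __name__ == "__main__":
    for code in (0x06050000, 0x06000000):
        print(hex(code), decode_scsi_result(code))
```

Under those assumptions both codes carry DRIVER_TIMEOUT in the driver byte,
which agrees with the "timing out command" lines, and the first one also has
DID_ABORT in the host byte. That says the commands never completed; it does
not say where they got stuck.
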
> Both (all) iSCSI devices are failed.  Under iostat I see activity to the 
> iSCSI block devices, and the whole machine acts mostly I/O blocked (even 
> the Fedora 10 DomU running on flat files will start throwing nagios into 
> a tizzy).  If I do 'service iscsi stop' everything picks right back up 
> (though the DomU using them as its disks is obviously unhappy).
> 
> When I start iscsi again I can pick right back up (after repairing 
> filesystems in the DomU), and I can repeat the process at will. 
> Sometimes the disks will come back as, e.g., sdd and sde, leading me to 
> think something still has a handle on sdc.  But lsof shows nothing in dom0.
> 
> One thing that stood out was that some of the block and sector numbers 
> in the errors fall right on power-of-two boundaries:
> 
>  scsi 7:0:0:0: SCSI error: return code = 0x00010000
>  end_request: I/O error, dev sdd, sector 32768
>  Buffer I/O error on device sdd, logical block 4096
>  Buffer I/O error on device sdd, logical block 4097
>  Buffer I/O error on device sdd, logical block 4098
>  Buffer I/O error on device sdd, logical block 4099
>  Buffer I/O error on device sdd, logical block 4100
>  Buffer I/O error on device sdd, logical block 4101
>  Buffer I/O error on device sdd, logical block 4102
>  Buffer I/O error on device sdd, logical block 4103
>  Buffer I/O error on device sdd, logical block 4104
>  Buffer I/O error on device sdd, logical block 4105
>  scsi 7:0:0:0: rejecting I/O to dead device
> 
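
The sector and logical block numbers in that log look like the same offset
expressed in two units. A small arithmetic sketch, assuming end_request
counts 512-byte sectors and the buffer layer is using 4 KiB blocks on sdd
(the 4 KiB figure is an assumption; it is simply the value that makes the
two numbers line up):

```python
SECTOR_BYTES = 512   # unit of the "sector" number in end_request messages
BLOCK_BYTES = 4096   # assumption: block size behind "logical block" numbers

def sector_to_block(sector, block_bytes=BLOCK_BYTES):
    """Convert a 512-byte sector number to a logical block number."""
    return sector * SECTOR_BYTES // block_bytes

def sector_to_mib(sector):
    """Byte offset of a sector, expressed in MiB."""
    return sector * SECTOR_BYTES / 2**20

if __name__ == "__main__":
    print(sector_to_block(32768))  # 4096
    print(sector_to_mib(32768))    # 16.0
```

So sector 32768 and logical block 4096 are both the 16 MiB mark on the
device, and blocks 4096 to 4105 are the ten 4 KiB blocks that start there.
The two power-of-two numbers are therefore one location, not two separate
coincidences.
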
> but as I opened with, I'm sort of at a loss as to what is actually 
> causing the problem.  Any suggestions for further troubleshooting and/or 
> ideas about what's happening appreciated.
> 

Try testing the iSCSI LUNs from dom0 first, with some real benchmarks on the
LUNs rather than just dd.

LTP disktest, for example. Or create a filesystem (ext3) on them and run
some file-based benchmarks to generate real load.
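
As a rough illustration of the file-based idea, here is a minimal sketch
that writes a batch of files, forces them out with fsync, and reads them
back to compare checksums. It is not a substitute for a real benchmark such
as disktest; the function name, file names and the /mnt/iscsi-test mount
point are hypothetical, and the read-back may be served from the page cache,
so it mostly exercises the write path:

```python
import hashlib
import os
import sys

CHUNK = 1 << 20  # write and read in 1 MiB pieces

def write_and_verify(directory, files=8, size_mb=4):
    """Write `files` files of `size_mb` MiB of random data into `directory`,
    fsync each one, then read them back. Returns the number of files whose
    SHA-256 differs from what was written."""
    expected = {}
    for i in range(files):
        path = os.path.join(directory, "load-%03d.bin" % i)
        digest = hashlib.sha256()
        with open(path, "wb") as f:
            for _ in range(size_mb):
                data = os.urandom(CHUNK)
                digest.update(data)
                f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to the block device
        expected[path] = digest.hexdigest()

    mismatches = 0
    for path, want in expected.items():
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(CHUNK), b""):
                digest.update(block)
        if digest.hexdigest() != want:
            mismatches += 1
    return mismatches

if __name__ == "__main__":
    # Hypothetical mount point of an ext3 filesystem created on one LUN.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/iscsi-test"
    print("mismatched files:", write_and_verify(target))
```

Run it against a scratch filesystem on one of the LUNs from dom0 while
watching dmesg for the same timeout messages; raise files and size_mb until
the load is comparable to what the domU generates during a yum update.
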

Do you see any errors on the iSCSI target itself (the Nexenta domU)?

-- Pasi
