Re: [Qemu-devel] scsi-generic and max request size
On 12/21/2010 11:05 PM, Benjamin Herrenschmidt wrote:
>>> So back to square 1 ... my vscsi (and virtio-blk too btw) can technically pass a max size to the guest, but we don't have a way to interrogate scsi-generic (and the underlying block driver) which is the main issue (that plus the fact that the ioctl seems to be broken in compat mode for /dev/sg specifically)...

>> Ah, the warm and fuzzy feeling of knowing I'm not alone in this ...
>> This is basically the same issue I brought up with the first submission round of my megasas emulation.

> heh.

>> As we're passing scatter-gather lists directly to the underlying device we might end up sending a request which is improperly formatted. The Linux block layer has three limits onto which a request has to be formatted:
>> - Max length of the scatter-gather list (max_sectors)
>> - Max overall request size (max_segments)

> Didn't you swap the 2 above ? max_sectors is the max overall req. size and max_segments the max number of SG elements afaik :-)

Yeah, could be. 'twas only meant for illustration anyway.

>> - Max length of individual sg elements (max_segment_size)
>>
>> Newer kernels export these limits; they have been exported with commit c77a5710b7e23847bfdb81fcaa10b585f65c960a. For older kernels, however, we're being left in the dark here.

> Well, first of all, sg is not there so that doesn't help with the scsi-generic problem much, then parsing sysfs... yuck.

Well, sort of. 'sg' doesn't have any block queue limits directly as the block queue is attached to the block device (surprise, surprise :-). But nevertheless any commands sent via SG_IO are being placed on the block queue, hence the same limits apply here, too.

>> So on newer kernels we probably could be doing a quick check on the block queue limits and reformat the I/O if required.

> Maybe but then, sg isn't there. We could I suppose use sr as an indication tho when we know it's a cdrom.

If it were me I would be using

>> Instead of reformatting we could be sending each element of an sg list individually. Thereby we would be introducing some slowdown as the sg lists have to be reassembled again by the lower layers, but we would be insulated from any sg list mismatch. However, this won't cover requests with too large sg elements. For those we could probably use some simple divide-by-two algorithm on the element to make them fit.

> How can we ? We need a single request to match a single sg list anyways no ?

Yes, true. That's what I was trying to illustrate here.

> Let's say you get a READ10 from the guest for 200KB and your underlying max_sectors is 128KB. How do you want to break that up ? The only way would be to make it two different READ10's and that's a can of worms especially if you start putting tags into the picture...

Precisely. Hence I didn't try to implement anything in that area :-)

>> But seeing we have to split the I/O requests anyway we might as well use the divide-by-two algorithm for the sg lists, too. Easiest would be if we could just transfer the available bits and push the request back to the guest as a partial completion. Sadly the I/O stack on the guest will choose to interpret this as an I/O error instead of retrying the remainder :-(
>>
>> So in the long run I fear we have to implement some sort of I/O request splitting in Qemu, using the values from sysfs.

> So in my case, I'm happy for the time being to continue doing bounce buffering and so my only problem at the moment is the max request size (aka max_sectors). Also I -can- tell the guest what my limitation is, it's part of the vscsi login protocol. I can look into doing DMA directly to the guest SG lists later maybe.
>
> However, I can't quite figure out how to reliably obtain that information in my driver since on one hand, the ioctl doesn't seem to work in mixed 32/64-bit environments, and on the other hand, sysfs doesn't seem to have anything for sg in /sys/class/block...
>
> Besides, those are both Linux-isms... so we'd have to be extra careful there too.

Yes. I've been bashing my head against this, too.

IMO the whole problem arises from the fact that we're deliberately destroying information here. Most modern HBAs are using separate codepaths for streaming/block I/O anyway, but when using 'scsi-generic' we are forced to discard this information. We have to fake a SCSI READ/WRITE command, and send it via SG_IO to the underlying device and keep fingers crossed that we're not exceeding any device limitations.

The whole problem would just go away if we could use the standard block read()/write() calls here. Then the iovec would be placed _as scatter-gather list_ on the request-queue and the block layer would take care of the whole issue.

I've tried to advocate this approach once, but (again) was being told that it's a misuse of scsi-generic and I should be using scsi-disk instead. However, since Alex Graf is facing similar problems with the AHCI HBA of his maybe we could retry again ...
Re: [Qemu-devel] scsi-generic and max request size
On Wed, Dec 22, 2010 at 02:54:54PM +0100, Hannes Reinecke wrote:
> Most modern HBAs are using separate codepaths for streaming/block I/O anyway,

That's not true at all. Every normal HBA just passes normal SCSI commands to the SCSI targets. It's just RAID adapters that take special commands, and the megaraid one is extremely special as it actually emulates a few SCSI commands even in RAID mode, which almost no other HBA does.

Strictly speaking we should not allow scsi-generic with megaraid_sas, except for the separate passthrough channels that the real hardware has for things like tape drives.

> However, since Alex Graf is facing similar problems with the AHCI HBA of his maybe we could retry again ...

AHCI is an ATA adapter, and should never be used with scsi-generic for disks. Only for the ATAPI-attached cdroms/tapes/etc it could be used, although it's quite pointless.
Re: [Qemu-devel] scsi-generic and max request size
On Wed, 2010-12-22 at 14:54 +0100, Hannes Reinecke wrote:
> Well, sort of. 'sg' doesn't have any block queue limits directly as the block queue is attached to the block device (surprise, surprise :-). But nevertheless any commands sent via SG_IO are being placed on the block queue, hence the same limits apply here, too.

Right, tho is there a simple way to map sg to the appropriate block driver to retrieve the info via sysfs ? It looks possible from a quick peek there but it also looks like an ungodly mess.

> If it were me I would be using

I think you meant to type more here :-)

>> However, I can't quite figure out how to reliably obtain that information in my driver since on one hand, the ioctl doesn't seem to work in mixed 32/64-bit environments, and on the other hand, sysfs doesn't seem to have anything for sg in /sys/class/block...
>>
>> Besides, those are both Linux-isms... so we'd have to be extra careful there too.

> Yes. I've been bashing my head against this, too.

Christoph, any suggestion there ?

> IMO the whole problem arises from the fact that we're deliberately destroying information here. Most modern HBAs are using separate codepaths for streaming/block I/O anyway, but when using 'scsi-generic' we are forced to discard this information. We have to fake a SCSI READ/WRITE command, and send it via SG_IO to the underlying device and keep fingers crossed that we're not exceeding any device limitations.

I wouldn't say it like that no. It's a transport problem. In my case I'm not faking anything, vscsi is just a transport (a variant of SRP).

The problem is that when 'emulating' a HW HBA, you have no way to express the intrinsic limitations of the underlying HBA, but that's not a problem I have with vscsi which is meant to be a transport and as such does have means to convey that sort of information (tho in my case, I have some issues due to assumptions/bugs in the existing ibm vscsi client driver but that's a different topic).

So I think there's a significant difference here between emulating a HW HBA and doing something like vscsi. The former has problems that cannot be easily solved I believe. The latter's problems, on the other hand, can be solved, the means to do so are there, but we have to deal with interface issues ... plumbing problems. The non-working compat ioctl is one, the fact that sg has no /sys/class/block (or /sys/block) entries is another, etc...

I.e., we are faced with a problem with Linux not exposing that information in an easy to retrieve way, and no proper cross-platform way to obtain it either.

> The whole problem would just go away if we could use the standard block read()/write() calls here. Then the iovec would be placed _as scatter-gather list_ on the request-queue and the block layer would take care of the whole issue.

That would be somewhat cheating with the concept of just being a SCSI transport layer :-) You would interpret some requests and turn them into something else. That would be interesting when your user starts using tags and makes assumptions about what's in flight and what not etc...

> I've tried to advocate this approach once, but (again) was being told that it's a misuse of scsi-generic and I should be using scsi-disk instead. However, since Alex Graf is facing similar problems with the AHCI HBA of his maybe we could retry again ...

Again, I'd say different problems :-) To some extent scsi-disk will solve the issues with basic read/write operations, but there are some more nasty SCSI commands that you want through for things like DVD burning for example, unless we start building higher level abstractions into the kernel. So you -still- end up acting somewhat as a SCSI transport layer, and potentially hit the problem with limits again.

Cheers,
Ben.

> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries Storage
> h...@suse.de +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)
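
To make the sysfs "plumbing" question in this message concrete, here is a minimal Python sketch of the lookup. It rests on assumptions that are not confirmed anywhere in this thread: that the queue limits appear as `max_sectors_kb`, `max_segments` and `max_segment_size` under `/sys/block/<disk>/queue/`, and that an sg node can be mapped to its block device through a `device/block/` directory under `/sys/class/scsi_generic/<sgN>/`. The function names and the `sysfs_root` parameter are hypothetical and exist only so the code can be exercised against a fake tree.

```python
from pathlib import Path

# Attribute names assumed from the kernel's queue sysfs interface.
QUEUE_ATTRS = ("max_sectors_kb", "max_segments", "max_segment_size")


def read_queue_limits(disk, sysfs_root="/sys"):
    """Read the block queue limits for one disk, e.g. "sr0".

    Attributes a kernel does not export are reported as None, which is
    the "left in the dark" case for older kernels.
    """
    queue = Path(sysfs_root) / "block" / disk / "queue"
    limits = {}
    for name in QUEUE_ATTRS:
        attr = queue / name
        limits[name] = int(attr.read_text()) if attr.exists() else None
    return limits


def block_name_for_sg(sg, sysfs_root="/sys"):
    """Guess the block device behind an sg node, e.g. "sg1" -> "sr0".

    Assumed layout: <root>/class/scsi_generic/<sg>/device/block/<disk>.
    Returns None when that directory is missing or empty.
    """
    block_dir = Path(sysfs_root) / "class" / "scsi_generic" / sg / "device" / "block"
    if not block_dir.is_dir():
        return None
    names = sorted(entry.name for entry in block_dir.iterdir())
    return names[0] if names else None
```

Devices without a block driver bound (a tape drive seen only through sg, for instance) would return None from the mapping step, so a caller would still need a fallback limit.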
Re: [Qemu-devel] scsi-generic and max request size
On Wed, 2010-12-22 at 14:27 +0100, Christoph Hellwig wrote:
> On Wed, Dec 22, 2010 at 02:54:54PM +0100, Hannes Reinecke wrote:
>> Most modern HBAs are using separate codepaths for streaming/block I/O anyway,
>
> That's not true at all. Every normal HBA just passes normal SCSI commands to the SCSI targets. It's just RAID adapters that take special commands, and the megaraid one is extremely special as it actually emulates a few SCSI commands even in RAID mode, which almost no other HBA does.
>
> Strictly speaking we should not allow scsi-generic with megaraid_sas, except for the separate passthrough channels that the real hardware has for things like tape drives.

Actually, I would put it differently here. scsi-generic is -fundamentally- busted for HBA HW emulation since you simply cannot convey the limits of the underlying real HBA.

If you are on top of usb-storage with a 120K max_sectors and try to emulate a piece of HBA with no such limitation, how in hell do you make your guest know not to give you more than 120K at a time, and what do you do if it does ? You're stuffed basically...

Hence, the only way scsi-generic can make sense imho, is for something like vscsi which I'm doing now, which is just a transport and does have the ability to convey to the client/guest some of those limitations... provided it can get to them in the first place (see the discussion, it's really non trivial, which makes /dev/sg even less useful even for normal userspace :-)

In the Megaraid case, the fact that it has this separate read/write channel on the contrary should make it -easier- to solve that problem, typically by allowing the emulation layer to construct sequences of READ/WRITE requests that match the limitations. I.e. it makes scsi-generic a possibility, while it would otherwise be (and is) broken in unfixable ways with other HBA emulation.

>> However, since Alex Graf is facing similar problems with the AHCI HBA of his maybe we could retry again ...
>
> AHCI is an ATA adapter, and should never be used with scsi-generic for disks. Only for the ATAPI-attached cdroms/tapes/etc it could be used, although it's quite pointless.

Right, but in that case (cdroms etc...) it would have the exact same problem. I'm not familiar with AHCI HW, and so I don't know whether there's a way for the HW to convey limits to the driver, but if not, then operating via scsi-generic would be busted the same way anything else is.

Basically, scsi-generic cannot work for emulating an HBA. In fact, I would go as far as saying that it's not possible to generically emulate an HBA that just passes through any SCSI command, simply due to the inability to convey those limits.

vscsi is a special case (and other paravirt drivers that may exist) because, being explicitly designed for acting as such transports, they -do- convey the necessary limit information. I don't know iscsi but I would be surprised if it didn't provide similar facilities.

So what we need here is a way for qemu to retrieve those reliably when using scsi-generic. That's the missing piece of the puzzle on my side.

Cheers,
Ben.
Re: [Qemu-devel] scsi-generic and max request size
On 22.12.2010, at 14:27, Christoph Hellwig wrote:
> On Wed, Dec 22, 2010 at 02:54:54PM +0100, Hannes Reinecke wrote:
>> Most modern HBAs are using separate codepaths for streaming/block I/O anyway,
>
> That's not true at all. Every normal HBA just passes normal SCSI commands to the SCSI targets. It's just RAID adapters that take special commands, and the megaraid one is extremely special as it actually emulates a few SCSI commands even in RAID mode, which almost no other HBA does.
>
> Strictly speaking we should not allow scsi-generic with megaraid_sas, except for the separate passthrough channels that the real hardware has for things like tape drives.
>
>> However, since Alex Graf is facing similar problems with the AHCI HBA of his maybe we could retry again ...
>
> AHCI is an ATA adapter, and should never be used with scsi-generic for disks. Only for the ATAPI-attached cdroms/tapes/etc it could be used, although it's quite pointless.

It's not 100% pointless - ATAPI passthrough is a feature requested by users. If we were to model ATAPI properly, it would end up using whatever SCSI layers we have below - which means ATAPI passthrough would be a mere matter of replacing the scsi-cdrom backend with a scsi-passthrough backend.

Now for the fun part. ATAPI can also do NCQ. So we actually end up having the exact same thing here as megasas, if we were to do ATAPI passthrough on CD-ROM with NCQ. NCQ goes through the normal read/write path of a block backend. Passthrough would do SG_IO.

Alex
Re: [Qemu-devel] scsi-generic and max request size
On 22.12.2010, at 22:59, Benjamin Herrenschmidt wrote:
> On Wed, 2010-12-22 at 14:54 +0100, Hannes Reinecke wrote:
>> Well, sort of. 'sg' doesn't have any block queue limits directly as the block queue is attached to the block device (surprise, surprise :-). But nevertheless any commands sent via SG_IO are being placed on the block queue, hence the same limits apply here, too.
>
> Right, tho is there a simple way to map sg to the appropriate block driver to retrieve the info via sysfs ? It looks possible from a quick peek there but it also looks like an ungodly mess.
>
>> If it were me I would be using
>
> I think you meant to type more here :-)
>
>>> However, I can't quite figure out how to reliably obtain that information in my driver since on one hand, the ioctl doesn't seem to work in mixed 32/64-bit environments, and on the other hand, sysfs doesn't seem to have anything for sg in /sys/class/block...
>>>
>>> Besides, those are both Linux-isms... so we'd have to be extra careful there too.
>>
>> Yes. I've been bashing my head against this, too.
>
> Christoph, any suggestion there ?
>
>> IMO the whole problem arises from the fact that we're deliberately destroying information here. Most modern HBAs are using separate codepaths for streaming/block I/O anyway, but when using 'scsi-generic' we are forced to discard this information. We have to fake a SCSI READ/WRITE command, and send it via SG_IO to the underlying device and keep fingers crossed that we're not exceeding any device limitations.
>
> I wouldn't say it like that no. It's a transport problem. In my case I'm not faking anything, vscsi is just a transport (a variant of SRP).
>
> The problem is that when 'emulating' a HW HBA, you have no way to express the intrinsic limitations of the underlying HBA, but that's not a problem I have with vscsi which is meant to be a transport and as such does have means to convey that sort of information (tho in my case, I have some issues due to assumptions/bugs in the existing ibm vscsi client driver but that's a different topic).
>
> So I think there's a significant difference here between emulating a HW HBA and doing something like vscsi. The former has problems that cannot be easily solved I believe. The latter's problems, on the other hand, can be solved, the means to do so are there, but we have to deal with interface issues ... plumbing problems. The non-working compat ioctl is one, the fact that sg has no /sys/class/block (or /sys/block) entries is another, etc...
>
> I.e., we are faced with a problem with Linux not exposing that information in an easy to retrieve way, and no proper cross-platform way to obtain it either.

Why would you care about cross-platform here? Not saying I fully understand what information exactly you're lacking. But it's either the SG_IO max request size, in which case you don't need any equivalent on other platforms, as it's not available anywhere else. Or it's something else, in which case you can just set it to some safe small default value and call it a day :).

Alex
Re: [Qemu-devel] scsi-generic and max request size
On Thu, 2010-12-23 at 00:23 +0100, Alexander Graf wrote:
>> The non-working compat ioctl is one, the fact that sg has no /sys/class/block (or /sys/block) entries is another, etc...
>>
>> I.e., we are faced with a problem with Linux not exposing that information in an easy to retrieve way, and no proper cross-platform way to obtain it either.
>
> Why would you care about cross-platform here? Not saying I fully understand what information exactly you're lacking. But it's either the SG_IO max request size, in which case you don't need any equivalent on other platforms, as it's not available anywhere else. Or it's something else, in which case you can just set it to some safe small default value and call it a day :).

Well, do we support something like scsi-generic on Windows or BSD hosts ? dunno... just asking :-) They -have- mechanisms (at least Windows does) to pass SCSI requests down the stack. In that case, they'll have similar limitations (at the very least the max request size).

So we'd want some way to expose that... but if scsi-generic today is Linux only, then I can try to add Linux-isms in there as a stop-gap to try to at least retrieve the max req. size which is the main issue for me right now... at least until I start trying to have the SG_IO read/write directly into guest memory without bouncing :-) At that point, the SG limits might become trouble as well.

Cheers,
Ben.
Re: [Qemu-devel] scsi-generic and max request size
On 23.12.2010, at 00:35, Benjamin Herrenschmidt wrote:
> On Thu, 2010-12-23 at 00:23 +0100, Alexander Graf wrote:
>>> The non-working compat ioctl is one, the fact that sg has no /sys/class/block (or /sys/block) entries is another, etc...
>>>
>>> I.e., we are faced with a problem with Linux not exposing that information in an easy to retrieve way, and no proper cross-platform way to obtain it either.
>>
>> Why would you care about cross-platform here? Not saying I fully understand what information exactly you're lacking. But it's either the SG_IO max request size, in which case you don't need any equivalent on other platforms, as it's not available anywhere else. Or it's something else, in which case you can just set it to some safe small default value and call it a day :).
>
> Well, do we support something like scsi-generic on Windows or BSD hosts ? dunno... just asking :-) They -have- mechanisms (at least Windows does) to pass SCSI requests down the stack. In that case, they'll have similar limitations (at the very least the max request size).
>
> So we'd want some way to expose that... but if scsi-generic today is Linux only, then I can try to add Linux-isms in there as a stop-gap to try to at least retrieve the max req. size which is the main issue for me right now... at least until I start trying to have the SG_IO read/write directly into guest memory without bouncing :-) At that point, the SG limits might become trouble as well.

This all belongs in the block layer. If you create a callback function or property in the block struct, Windows can implement its own limits when someone sits down to implement SG_IO on Windows.

Alex
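
The callback idea in this message, combined with the earlier "safe small default" suggestion, can be sketched in a few lines of Python. Everything named here is hypothetical and illustrative only: `HostBackend`, `LinuxSgBackend`, `effective_max_transfer` and the 64 KiB fallback are not QEMU APIs, and the real block layer is C, not Python.

```python
# Hypothetical conservative fallback for when the host cannot tell us.
DEFAULT_MAX_TRANSFER = 64 * 1024


class HostBackend:
    """Base backend: does not know its own transfer limit."""

    def max_transfer_bytes(self):
        # None means "unknown"; a platform-specific backend overrides this.
        return None


class LinuxSgBackend(HostBackend):
    """Backend that learned its limit from some Linux-specific query."""

    def __init__(self, limit_bytes):
        self._limit_bytes = limit_bytes

    def max_transfer_bytes(self):
        return self._limit_bytes


def effective_max_transfer(backend, device_limit=None):
    """Limit to advertise upward: backend limit, or the safe default,
    further clamped by any limit of the emulated device itself."""
    limit = backend.max_transfer_bytes()
    if limit is None:
        limit = DEFAULT_MAX_TRANSFER
    if device_limit is not None:
        limit = min(limit, device_limit)
    return limit
```

A transport such as vscsi, which can tell the guest its limit, would advertise the returned value; the design keeps the platform-specific query behind one overridable method.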
Re: [Qemu-devel] scsi-generic and max request size
On Thu, 2010-12-23 at 00:39 +0100, Alexander Graf wrote:
> This all belongs in the block layer. If you create a callback function or property in the block struct, Windows can implement its own limits when someone sits down to implement SG_IO on Windows.

Right and we do have generic ways it seems to interrogate those limits ... except they seem to be broken for sg :-)

Also I've spotted some oddities where the ioctl for the max request size sometimes does put_user as an int * and sometimes as a short * ... ooops...

Cheers,
Ben.
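
The int/short oddity can be handled explicitly by the caller. The Python sketch below rests on assumptions taken from a reading of the Linux sources rather than from this thread: that BLKSECTGET on a block device node stores an unsigned short counting 512-byte sectors, that on a /dev/sg node it stores an int counting bytes, and that the request number is 0x1267 (the x86 encoding; other architectures may differ). The helper names are hypothetical.

```python
import fcntl
import os
import stat
import struct

BLKSECTGET = 0x1267  # assumed: _IO(0x12, 103) as encoded on x86
SECTOR_SIZE = 512


def decode_blksectget(raw, is_block_device):
    """Convert the bytes written by BLKSECTGET into a byte count.

    Assumption: block nodes return an unsigned short in 512-byte
    sectors, sg nodes return an int in bytes.
    """
    if is_block_device:
        (sectors,) = struct.unpack_from("=H", raw)
        return sectors * SECTOR_SIZE
    (nbytes,) = struct.unpack_from("=i", raw)
    return nbytes


def query_max_transfer(path):
    """Ask the kernel for the max transfer size of `path`, in bytes."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        is_block = stat.S_ISBLK(os.fstat(fd).st_mode)
        buf = bytearray(4)  # large enough for either result type
        fcntl.ioctl(fd, BLKSECTGET, buf)  # fills buf in place
        return decode_blksectget(bytes(buf), is_block)
    finally:
        os.close(fd)
```

This does nothing about the compat-mode problem for 32-bit binaries on 64-bit kernels reported in the thread; it only shows how the two result types could be told apart.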
Re: [Qemu-devel] scsi-generic and max request size
On 23.12.2010, at 00:44, Benjamin Herrenschmidt wrote:
> On Thu, 2010-12-23 at 00:39 +0100, Alexander Graf wrote:
>> This all belongs in the block layer. If you create a callback function or property in the block struct, Windows can implement its own limits when someone sits down to implement SG_IO on Windows.
>
> Right and we do have generic ways it seems to interrogate those limits ... except they seem to be broken for sg :-)
>
> Also I've spotted some oddities where the ioctl for the max request size sometimes does put_user as an int * and sometimes as a short * ... ooops...

Congratulations on finding lots of Linux bugs :). Look at it this way: you'll most likely be the very first person actually using sg properly. So after you're done, others won't have to fix it :).

Alex
Re: [Qemu-devel] scsi-generic and max request size
On Thu, 2010-12-23 at 00:49 +0100, Alexander Graf wrote:
> Congratulations on finding lots of Linux bugs :). Look at it this way: you'll most likely be the very first person actually using sg properly. So after you're done, others won't have to fix it :).

Hahah, I doubt it :-) Makes me wonder whether sg can be used properly to be honest...

Cheers,
Ben.
Re: [Qemu-devel] scsi-generic and max request size
On 12/21/2010 04:52 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2010-12-21 at 14:38 +1100, ronnie sahlberg wrote:
>> Ben,
>> Since it is a scsi device you can try the Inquiry command with pagecode 0xb0 : Block Limits VPD page. That page shows optimal and maximum request sizes. This is for SBC, in the Vital Product Data chapter.
>>
>> Unfortunately this page is not mandatory so some devices might not understand it. :-(
>>
>> sg_inq --page=0x00 /dev/sg? will show you what inq pages your device supports.
>
> Well, that won't help much figuring what the limit is since in most cases the limit seems to come from the host linux HBA (ie, usb-storage for example artificially clamps the max request size to deal with bogus USB-ATA bridges).

Indeed. The request size is pretty much limited by the driver/scsi layer, so the above page won't help much here.

> As for using this to try to inform the guest OS as to what the limit is, this could be done by patching the result of that command on the fly in qemu, but that is nasty, and would only work if the guest OS actually uses the said command in the first place. AFAIK, neither sr.c nor sd.c does in Linux.

And you'll be getting yelled at by hch to boot.

> So back to square 1 ... my vscsi (and virtio-blk too btw) can technically pass a max size to the guest, but we don't have a way to interrogate scsi-generic (and the underlying block driver) which is the main issue (that plus the fact that the ioctl seems to be broken in compat mode for /dev/sg specifically)...

Ah, the warm and fuzzy feeling of knowing I'm not alone in this ...
This is basically the same issue I brought up with the first submission round of my megasas emulation.

As we're passing scatter-gather lists directly to the underlying device we might end up sending a request which is improperly formatted. The Linux block layer has three limits onto which a request has to be formatted:
- Max length of the scatter-gather list (max_sectors)
- Max overall request size (max_segments)
- Max length of individual sg elements (max_segment_size)

Newer kernels export these limits; they have been exported with commit c77a5710b7e23847bfdb81fcaa10b585f65c960a. For older kernels, however, we're being left in the dark here.

So on newer kernels we probably could be doing a quick check on the block queue limits and reformat the I/O if required.

Instead of reformatting we could be sending each element of an sg list individually. Thereby we would be introducing some slowdown as the sg lists have to be reassembled again by the lower layers, but we would be insulated from any sg list mismatch. However, this won't cover requests with too large sg elements. For those we could probably use some simple divide-by-two algorithm on the element to make them fit.

But seeing we have to split the I/O requests anyway we might as well use the divide-by-two algorithm for the sg lists, too. Easiest would be if we could just transfer the available bits and push the request back to the guest as a partial completion. Sadly the I/O stack on the guest will choose to interpret this as an I/O error instead of retrying the remainder :-(

So in the long run I fear we have to implement some sort of I/O request splitting in Qemu, using the values from sysfs.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries Storage
h...@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
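
The divide-by-two idea from this message could look roughly like the Python sketch below. It is an illustration under stated assumptions, not anything implemented in Qemu: all lengths are counted in 512-byte sectors so that halving never produces a misaligned piece, and the parameter names `max_segments`, `max_segment_sectors` and `max_sectors` are stand-ins for the three queue limits (element count, element size, overall request size).

```python
def halve_until_fits(length, limit):
    """Split one sg element in half, recursively, until each piece fits."""
    if limit < 1:
        raise ValueError("limit must be at least one sector")
    if length <= limit:
        return [length]
    first = length // 2
    return halve_until_fits(first, limit) + halve_until_fits(length - first, limit)


def split_sg_list(sg_lengths, max_segments, max_segment_sectors, max_sectors):
    """Regroup an sg list into requests that respect all three limits.

    Returns a list of requests, each a list of element lengths in sectors.
    """
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    # No single element may exceed the overall request size either.
    element_limit = min(max_segment_sectors, max_sectors)
    pieces = [p for n in sg_lengths for p in halve_until_fits(n, element_limit)]

    requests, current, size = [], [], 0
    for piece in pieces:
        # Start a new request when the element count or total size would overflow.
        if current and (len(current) == max_segments or size + piece > max_sectors):
            requests.append(current)
            current, size = [], 0
        current.append(piece)
        size += piece
    if current:
        requests.append(current)
    return requests
```

For example, `split_sg_list([400], 2, 128, 256)` yields two requests of two 100-sector elements each. Every resulting request would still have to be issued as its own SCSI command, so the concerns about tags and ordering raised in this thread are untouched by the sketch.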
Re: [Qemu-devel] scsi-generic and max request size
>> So back to square 1 ... my vscsi (and virtio-blk too btw) can technically pass a max size to the guest, but we don't have a way to interrogate scsi-generic (and the underlying block driver) which is the main issue (that plus the fact that the ioctl seems to be broken in compat mode for /dev/sg specifically)...
>
> Ah, the warm and fuzzy feeling of knowing I'm not alone in this ...
> This is basically the same issue I brought up with the first submission round of my megasas emulation.

heh.

> As we're passing scatter-gather lists directly to the underlying device we might end up sending a request which is improperly formatted. The Linux block layer has three limits onto which a request has to be formatted:
> - Max length of the scatter-gather list (max_sectors)
> - Max overall request size (max_segments)

Didn't you swap the 2 above ? max_sectors is the max overall req. size and max_segments the max number of SG elements afaik :-)

> - Max length of individual sg elements (max_segment_size)
>
> Newer kernels export these limits; they have been exported with commit c77a5710b7e23847bfdb81fcaa10b585f65c960a. For older kernels, however, we're being left in the dark here.

Well, first of all, sg is not there so that doesn't help with the scsi-generic problem much, then parsing sysfs... yuck.

> So on newer kernels we probably could be doing a quick check on the block queue limits and reformat the I/O if required.

Maybe but then, sg isn't there. We could I suppose use sr as an indication tho when we know it's a cdrom.

> Instead of reformatting we could be sending each element of an sg list individually. Thereby we would be introducing some slowdown as the sg lists have to be reassembled again by the lower layers, but we would be insulated from any sg list mismatch. However, this won't cover requests with too large sg elements. For those we could probably use some simple divide-by-two algorithm on the element to make them fit.

How can we ? We need a single request to match a single sg list anyways no ?

Let's say you get a READ10 from the guest for 200KB and your underlying max_sectors is 128KB. How do you want to break that up ? The only way would be to make it two different READ10's and that's a can of worms especially if you start putting tags into the picture...

> But seeing we have to split the I/O requests anyway we might as well use the divide-by-two algorithm for the sg lists, too. Easiest would be if we could just transfer the available bits and push the request back to the guest as a partial completion. Sadly the I/O stack on the guest will choose to interpret this as an I/O error instead of retrying the remainder :-(
>
> So in the long run I fear we have to implement some sort of I/O request splitting in Qemu, using the values from sysfs.

So in my case, I'm happy for the time being to continue doing bounce buffering and so my only problem at the moment is the max request size (aka max_sectors). Also I -can- tell the guest what my limitation is, it's part of the vscsi login protocol. I can look into doing DMA directly to the guest SG lists later maybe.

However, I can't quite figure out how to reliably obtain that information in my driver since on one hand, the ioctl doesn't seem to work in mixed 32/64-bit environments, and on the other hand, sysfs doesn't seem to have anything for sg in /sys/class/block...

Besides, those are both Linux-isms... so we'd have to be extra careful there too.

Cheers,
Ben.

> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries Storage
> h...@suse.de +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)
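
The READ10 example in this message (200KB requested, 128KB allowed) can be worked through in a short Python sketch. It only shows the arithmetic and the resulting CDBs; the function name `split_read10` is hypothetical, a 512-byte block size is assumed for the worked numbers, and the tag and ordering problems called a can of worms above are deliberately left out.

```python
import struct

READ_10 = 0x28        # READ(10) operation code
MAX_READ10_BLOCKS = 0xFFFF  # transfer length is a 16-bit field


def split_read10(lba, num_blocks, block_size, max_bytes):
    """Return the READ(10) CDBs needed to cover one oversized read."""
    max_blocks = min(max_bytes // block_size, MAX_READ10_BLOCKS)
    if max_blocks < 1:
        raise ValueError("limit is smaller than one block")
    cdbs = []
    while num_blocks > 0:
        chunk = min(num_blocks, max_blocks)
        # Layout: opcode, flags, 32-bit LBA, group, 16-bit length, control.
        cdbs.append(struct.pack(">BBIBHB", READ_10, 0, lba, 0, chunk, 0))
        lba += chunk
        num_blocks -= chunk
    return cdbs
```

With 512-byte blocks, `split_read10(0, 400, 512, 128 * 1024)` produces two commands: 256 blocks at LBA 0 and 144 blocks at LBA 256.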
[Qemu-devel] scsi-generic and max request size
Hi folks !

There's an odd problem I've encountered with my scsi host (basically a powerpc vscsi compatible with IBM PAPR).

When using /dev/sg (ie, scsi-generic), there seems to be no way I can find to retrieve the underlying driver's max request transfer size.

This can normally be obtained with the BLKSECTGET ioctl under Linux (I'm not familiar with other OSes here). However, this is a bit buggy as well, ie, afaik, this doesn't work with 32-bit binaries on 64-bit kernels (the compat ioctl doesn't seem to work on /dev/sg).

For now, qemu doesn't pass that from its bdev layer, which means that scsi-generic doesn't pass it to its own upper layer either.

What that means is twofold I suppose:

- For real SCSI HBAs, how do you limit the transfer size anyways ? You can't start breaking up user requests without taking risks with tags etc...

- For vscsi, I can expose the limit I want via the SRP interface, but scsi-generic doesn't tell me what it is :-)

This is a real problem in practice. I.e. the USB CD-ROM on this POWER7 blade limits transfers to 0x1e000 bytes for example and the Linux sr driver on the guest is going to try to give me bigger requests than that if I don't start limiting them, which will cause all sorts of errors.

Cheers,
Ben.
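
To put numbers on the 0x1e000-byte limit mentioned above, here is a small Python sketch of the check a transport would need once it knows the host limit. The helper names are hypothetical, and the 2048-byte block size is an assumption typical of CD-ROM media rather than something stated in this message.

```python
def max_blocks_per_request(limit_bytes, block_size):
    """Largest whole number of blocks that fits under the host limit."""
    return limit_bytes // block_size


def exceeds_limit(num_blocks, block_size, limit_bytes):
    """True if a guest request of num_blocks would be too big for the host."""
    return num_blocks * block_size > limit_bytes


HOST_LIMIT = 0x1E000   # 122880 bytes, the value reported in this message
CD_BLOCK_SIZE = 2048   # assumed CD-ROM logical block size
```

Under those assumptions the drive accepts at most 60 blocks (240 512-byte sectors) per command, so a 64-block read from the guest's sr driver would already be over the limit.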
Re: [Qemu-devel] scsi-generic and max request size
Ben,

Since it is a scsi device you can try the Inquiry command with pagecode 0xb0 : Block Limits VPD page. That page shows optimal and maximum request sizes. This is for SBC, in the Vital Product Data chapter.

Unfortunately this page is not mandatory so some devices might not understand it. :-(

sg_inq --page=0x00 /dev/sg? will show you what inq pages your device supports.

regards
ronnie sahlberg

On Tue, Dec 21, 2010 at 2:25 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
> Hi folks !
>
> There's an odd problem I've encountered with my scsi host (basically a powerpc vscsi compatible with IBM PAPR).
>
> When using /dev/sg (ie, scsi-generic), there seems to be no way I can find to retrieve the underlying driver's max request transfer size.
>
> This can normally be obtained with the BLKSECTGET ioctl under Linux (I'm not familiar with other OSes here). However, this is a bit buggy as well, ie, afaik, this doesn't work with 32-bit binaries on 64-bit kernels (the compat ioctl doesn't seem to work on /dev/sg).
>
> For now, qemu doesn't pass that from its bdev layer, which means that scsi-generic doesn't pass it to its own upper layer either.
>
> What that means is twofold I suppose:
>
> - For real SCSI HBAs, how do you limit the transfer size anyways ? You can't start breaking up user requests without taking risks with tags etc...
>
> - For vscsi, I can expose the limit I want via the SRP interface, but scsi-generic doesn't tell me what it is :-)
>
> This is a real problem in practice. I.e. the USB CD-ROM on this POWER7 blade limits transfers to 0x1e000 bytes for example and the Linux sr driver on the guest is going to try to give me bigger requests than that if I don't start limiting them, which will cause all sorts of errors.
>
> Cheers,
> Ben.
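
For readers who want to see what the Block Limits page suggested here contains, the following Python sketch builds the INQUIRY CDB and decodes the three transfer-related fields. The field offsets follow the SBC-3 layout as I understand it (granularity at bytes 6-7, maximum transfer length at bytes 8-11, optimal transfer length at bytes 12-15, all in logical blocks); the function names are hypothetical, and actually sending the CDB through SG_IO is not shown.

```python
import struct

INQUIRY = 0x12
BLOCK_LIMITS_PAGE = 0xB0


def inquiry_vpd_cdb(page, alloc_len=64):
    """Build a 6-byte INQUIRY CDB with the EVPD bit set for one VPD page."""
    return struct.pack(">BBBHB", INQUIRY, 0x01, page, alloc_len, 0)


def parse_block_limits(data):
    """Decode the transfer-size fields of a Block Limits VPD page.

    All values are in logical blocks; zero means the device does not
    report that limit.
    """
    if len(data) < 16 or data[1] != BLOCK_LIMITS_PAGE:
        raise ValueError("not a Block Limits VPD page")
    granularity, max_len, opt_len = struct.unpack_from(">HII", data, 6)
    return {
        "optimal_granularity": granularity,
        "max_transfer_blocks": max_len,
        "optimal_transfer_blocks": opt_len,
    }
```

The page describes the device's own limits, so it would not reveal a clamp imposed by the host driver.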
Re: [Qemu-devel] scsi-generic and max request size
On Tue, 2010-12-21 at 14:38 +1100, ronnie sahlberg wrote:
> Ben,
> Since it is a scsi device you can try the Inquiry command with pagecode 0xb0 : Block Limits VPD page. That page shows optimal and maximum request sizes. This is for SBC, in the Vital Product Data chapter.
>
> Unfortunately this page is not mandatory so some devices might not understand it. :-(
>
> sg_inq --page=0x00 /dev/sg? will show you what inq pages your device supports.

Well, that won't help much figuring what the limit is since in most cases the limit seems to come from the host linux HBA (ie, usb-storage for example artificially clamps the max request size to deal with bogus USB-ATA bridges).

As for using this to try to inform the guest OS as to what the limit is, this could be done by patching the result of that command on the fly in qemu, but that is nasty, and would only work if the guest OS actually uses the said command in the first place. AFAIK, neither sr.c nor sd.c does in Linux.

So back to square 1 ... my vscsi (and virtio-blk too btw) can technically pass a max size to the guest, but we don't have a way to interrogate scsi-generic (and the underlying block driver) which is the main issue (that plus the fact that the ioctl seems to be broken in compat mode for /dev/sg specifically)...

Cheers,
Ben.

> regards
> ronnie sahlberg
>
> On Tue, Dec 21, 2010 at 2:25 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote:
>> Hi folks !
>>
>> There's an odd problem I've encountered with my scsi host (basically a powerpc vscsi compatible with IBM PAPR).
>>
>> When using /dev/sg (ie, scsi-generic), there seems to be no way I can find to retrieve the underlying driver's max request transfer size.
>>
>> This can normally be obtained with the BLKSECTGET ioctl under Linux (I'm not familiar with other OSes here). However, this is a bit buggy as well, ie, afaik, this doesn't work with 32-bit binaries on 64-bit kernels (the compat ioctl doesn't seem to work on /dev/sg).
>>
>> For now, qemu doesn't pass that from its bdev layer, which means that scsi-generic doesn't pass it to its own upper layer either.
>>
>> What that means is twofold I suppose:
>>
>> - For real SCSI HBAs, how do you limit the transfer size anyways ? You can't start breaking up user requests without taking risks with tags etc...
>>
>> - For vscsi, I can expose the limit I want via the SRP interface, but scsi-generic doesn't tell me what it is :-)
>>
>> This is a real problem in practice. I.e. the USB CD-ROM on this POWER7 blade limits transfers to 0x1e000 bytes for example and the Linux sr driver on the guest is going to try to give me bigger requests than that if I don't start limiting them, which will cause all sorts of errors.
>>
>> Cheers,
>> Ben.