Re: How to resolve an issue in swiotlb environment?
On Wed, Jun 19, 2019 at 05:05:49PM -0400, Alan Stern wrote: > On Wed, 19 Jun 2019, shuah wrote: > > > I missed a lot of the thread info. and went looking for it and found the > > following summary of the problem: > > > > == > > The issue which prompted the commit this thread is about arose in a > > situation where the block layer set up a scatterlist containing buffer > > sizes something like: > > > > 4096 4096 1536 1024 > > > > and the maximum packet size was 1024. The situation was a little > > unusual, because it involved vhci-hcd (a virtual HCD). This doesn't > > matter much in normal practice because: > > > > Block devices normally have a block size of 512 bytes or more. > > Smaller values are very uncommon. So scatterlist element sizes > > are always divisible by 512. > > > > xHCI is the only USB host controller type with a maximum packet > > size larger than 512, and xHCI hardware can do full > > scatter-gather so it doesn't care what the buffer sizes are. > > > > So another approach would be to fix vhci-hcd and then trust that the > > problem won't arise again, for the reasons above. We would be okay so > > long as nobody tried to use a USB-SCSI device with a block size of 256 > > bytes or less. > > === > > > > Out of the summary, the following gives me pause: > > > > "xHCI hardware can do full scatter-gather so it doesn't care what the > > buffer sizes are." > > > > vhci-hcd won't be able to count on hardware being able to do full > > scatter-gather. It has to deal with a variety of hardware with > > varying speeds. > > Sure. But you can test whether the server's HCD is able to handle > scatter-gather transfers, and if it is then you can say that the > client-side vhci-hcd is able to handle them as well. Then all you > would have to do is preserve the scatterlist information describing the > transfer when you go between the client and the server. > > The point is to make sure that the client-side vhci-hcd doesn't claim > to be _less_ capable than the server-side actual HCD. That's what > leads to the problem described above. > > > "We would be okay so long as nobody tried to use a USB-SCSI device with > > a block size of 256 bytes or less." > > > > At least a USB Storage device, I test with says 512 block size. Can we > > count on not seeing a device with block size <= 256 bytes? > > Yes, we can. In fact, the SCSI core doesn't handle devices with block > size < 512. > > > In any case, I am looking into adding SG support vhci-hci at the moment. > > > > Looks like the following is the repo, I should be working with? > > > > git://git.infradead.org/users/hch/misc.git > > It doesn't matter. Your work should end up being independent of > Christoph's, so you can base it on any repo. I implemented SG support of vhci. I will send it as a patch. Please look at it and let me know if you have a feedback. Regards Suwan Kim
Re: How to resolve an issue in swiotlb environment?
On Wed, 19 Jun 2019, shuah wrote: > I missed a lot of the thread info. and went looking for it and found the > following summary of the problem: > > == > The issue which prompted the commit this thread is about arose in a > situation where the block layer set up a scatterlist containing buffer > sizes something like: > > 4096 4096 1536 1024 > > and the maximum packet size was 1024. The situation was a little > unusual, because it involved vhci-hcd (a virtual HCD). This doesn't > matter much in normal practice because: > > Block devices normally have a block size of 512 bytes or more. > Smaller values are very uncommon. So scatterlist element sizes > are always divisible by 512. > > xHCI is the only USB host controller type with a maximum packet > size larger than 512, and xHCI hardware can do full > scatter-gather so it doesn't care what the buffer sizes are. > > So another approach would be to fix vhci-hcd and then trust that the > problem won't arise again, for the reasons above. We would be okay so > long as nobody tried to use a USB-SCSI device with a block size of 256 > bytes or less. > === > > Out of the summary, the following gives me pause: > > "xHCI hardware can do full scatter-gather so it doesn't care what the > buffer sizes are." > > vhci-hcd won't be able to count on hardware being able to do full > scatter-gather. It has to deal with a variety of hardware with > varying speeds. Sure. But you can test whether the server's HCD is able to handle scatter-gather transfers, and if it is then you can say that the client-side vhci-hcd is able to handle them as well. Then all you would have to do is preserve the scatterlist information describing the transfer when you go between the client and the server. The point is to make sure that the client-side vhci-hcd doesn't claim to be _less_ capable than the server-side actual HCD. That's what leads to the problem described above. > "We would be okay so long as nobody tried to use a USB-SCSI device with > a block size of 256 bytes or less." > > At least a USB Storage device, I test with says 512 block size. Can we > count on not seeing a device with block size <= 256 bytes? Yes, we can. In fact, the SCSI core doesn't handle devices with block size < 512. > In any case, I am looking into adding SG support vhci-hci at the moment. > > Looks like the following is the repo, I should be working with? > > git://git.infradead.org/users/hch/misc.git It doesn't matter. Your work should end up being independent of Christoph's, so you can base it on any repo. Alan Stern
Re: How to resolve an issue in swiotlb environment?
Hi Alan, On 6/18/19 9:28 AM, shuah wrote: On 6/14/19 8:44 AM, Alan Stern wrote: On Thu, 13 Jun 2019, shuah wrote: Great! So all we have to do is fix vhci-hcd. Then we can remove all the virt_boundary_mask stuff from usb-storage and uas entirely. (I'm assuming wireless USB isn't a genuine issue. As far as I know, it is pretty much abandoned at this point.) Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too hard. It ought to be possible even without changing the network protocol. I will start taking a look at this. Is there a target release in plan to drop virt_boundary_mask stuff? Not yet. But since it doesn't do what we want anyway, this should be fixed quickly. I missed a lot of the thread info. and went looking for it and found the following summary of the problem: == The issue which prompted the commit this thread is about arose in a situation where the block layer set up a scatterlist containing buffer sizes something like: 4096 4096 1536 1024 and the maximum packet size was 1024. The situation was a little unusual, because it involved vhci-hcd (a virtual HCD). This doesn't matter much in normal practice because: Block devices normally have a block size of 512 bytes or more. Smaller values are very uncommon. So scatterlist element sizes are always divisible by 512. xHCI is the only USB host controller type with a maximum packet size larger than 512, and xHCI hardware can do full scatter-gather so it doesn't care what the buffer sizes are. So another approach would be to fix vhci-hcd and then trust that the problem won't arise again, for the reasons above. We would be okay so long as nobody tried to use a USB-SCSI device with a block size of 256 bytes or less. === Out of the summary, the following gives me pause: "xHCI hardware can do full scatter-gather so it doesn't care what the buffer sizes are." vhci-hcd won't be able to count on hardware being able to do full scatter-gather. It has to deal with a variety of hardware with varying speeds. "We would be okay so long as nobody tried to use a USB-SCSI device with a block size of 256 bytes or less." At least a USB Storage device, I test with says 512 block size. Can we count on not seeing a device with block size <= 256 bytes? In any case, I am looking into adding SG support vhci-hci at the moment. Looks like the following is the repo, I should be working with? git://git.infradead.org/users/hch/misc.git thanks, -- Shuah
Re: How to resolve an issue in swiotlb environment?
On 6/14/19 8:44 AM, Alan Stern wrote: On Thu, 13 Jun 2019, shuah wrote: Great! So all we have to do is fix vhci-hcd. Then we can remove all the virt_boundary_mask stuff from usb-storage and uas entirely. (I'm assuming wireless USB isn't a genuine issue. As far as I know, it is pretty much abandoned at this point.) Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too hard. It ought to be possible even without changing the network protocol. I will start taking a look at this. Is there a target release in plan to drop virt_boundary_mask stuff? Not yet. But since it doesn't do what we want anyway, this should be fixed quickly. Sounds good. I am working on it. thanks, -- Shuah
Re: How to resolve an issue in swiotlb environment?
On Thu, 13 Jun 2019, shuah wrote: > > Great! So all we have to do is fix vhci-hcd. Then we can remove all > > the virt_boundary_mask stuff from usb-storage and uas entirely. > > > > (I'm assuming wireless USB isn't a genuine issue. As far as I know, it > > is pretty much abandoned at this point.) > > > > Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too > > hard. It ought to be possible even without changing the network > > protocol. > > > > I will start taking a look at this. Is there a target release in plan > to drop virt_boundary_mask stuff? Not yet. But since it doesn't do what we want anyway, this should be fixed quickly. Alan Stern
Re: How to resolve an issue in swiotlb environment?
On 6/13/19 11:16 AM, Alan Stern wrote: On Thu, 13 Jun 2019, Christoph Hellwig wrote: On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote: Would it be okay to rely on the assumption that USB block devices never have block size < 512? (We could even add code to the driver to enforce this, although refusing to handle such devices at all might be worse than getting an occasional error.) sd.c only supports a few specific sector size, and none of them is < 512 bytes: if (sector_size != 512 && sector_size != 1024 && sector_size != 2048 && sector_size != 4096) { ... sdkp->capacity = 0; Great! So all we have to do is fix vhci-hcd. Then we can remove all the virt_boundary_mask stuff from usb-storage and uas entirely. (I'm assuming wireless USB isn't a genuine issue. As far as I know, it is pretty much abandoned at this point.) Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too hard. It ought to be possible even without changing the network protocol. I will start taking a look at this. Is there a target release in plan to drop virt_boundary_mask stuff? thanks, -- Shuah
Re: How to resolve an issue in swiotlb environment?
On Thu, Jun 13, 2019 at 01:16:32PM -0400, Alan Stern wrote: > On Thu, 13 Jun 2019, Christoph Hellwig wrote: > > > On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote: > > > Would it be okay to rely on the assumption that USB block devices never > > > have block size < 512? (We could even add code to the driver to > > > enforce this, although refusing to handle such devices at all might be > > > worse than getting an occasional error.) > > > > sd.c only supports a few specific sector size, and none of them is > > < 512 bytes: > > > > if (sector_size != 512 && > > sector_size != 1024 && > > sector_size != 2048 && > > sector_size != 4096) { > > ... > > sdkp->capacity = 0; > > Great! So all we have to do is fix vhci-hcd. Then we can remove all > the virt_boundary_mask stuff from usb-storage and uas entirely. > > (I'm assuming wireless USB isn't a genuine issue. As far as I know, it > is pretty much abandoned at this point.) It is, I need to just move it to staging and delete the thing. I don't know of any hardware anymore. thanks, greg k-h ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: How to resolve an issue in swiotlb environment?
On Thu, 13 Jun 2019, Christoph Hellwig wrote: > On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote: > > Would it be okay to rely on the assumption that USB block devices never > > have block size < 512? (We could even add code to the driver to > > enforce this, although refusing to handle such devices at all might be > > worse than getting an occasional error.) > > sd.c only supports a few specific sector size, and none of them is > < 512 bytes: > > if (sector_size != 512 && > sector_size != 1024 && > sector_size != 2048 && > sector_size != 4096) { > ... > sdkp->capacity = 0; Great! So all we have to do is fix vhci-hcd. Then we can remove all the virt_boundary_mask stuff from usb-storage and uas entirely. (I'm assuming wireless USB isn't a genuine issue. As far as I know, it is pretty much abandoned at this point.) Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too hard. It ought to be possible even without changing the network protocol. Alan Stern ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: How to resolve an issue in swiotlb environment?
Christoph, > sd.c only supports a few specific sector size, and none of them is > < 512 bytes: Yep, while sd.c in theory supported 256-byte logical blocks a while back, that code was removed since the block layer always operates on units of 512 bytes. -- Martin K. Petersen Oracle Linux Engineering ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: How to resolve an issue in swiotlb environment?
On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote: > Would it be okay to rely on the assumption that USB block devices never > have block size < 512? (We could even add code to the driver to > enforce this, although refusing to handle such devices at all might be > worse than getting an occasional error.) sd.c only supports a few specific sector size, and none of them is < 512 bytes: if (sector_size != 512 && sector_size != 1024 && sector_size != 2048 && sector_size != 4096) { ... sdkp->capacity = 0; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: How to resolve an issue in swiotlb environment?
Hi Christoph, > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 8:31 PM > > On Wed, Jun 12, 2019 at 08:52:21AM +, Yoshihiro Shimoda wrote: > > Hi Christoph, > > > > > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM > > > > > > First things first: > > > > > > Yoshihiro, can you try this git branch? The new bits are just the three > > > patches at the end, but they sit on top of a few patches already sent > > > out to the list, so a branch is probably either: > > > > > >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes > > > > Thank you for the patches! > > Unfortunately, the three patches could not resolve this issue. > > However, it's a hint to me, and then I found the root cause: > > - slave_configure() in drivers/usb/storage/scsiglue.c calls > >blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when > > USB_SPEED_SUPER or more. > > -- So that, even if your patches (also I fixed it a little [1]) could not > > resolve > > the issue because the max_sectors is overwritten by above code. > > > > So, I think we should fix the slave_configure() by using > > dma_max_mapping_size(). > > What do you think? If so, I can make such a patch. > > Yes, please do. Thank you for your comment. I sent a patch to related mailing lists and you. Best regards, Yoshihiro Shimoda ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: How to resolve an issue in swiotlb environment?
On Wed, 12 Jun 2019, Christoph Hellwig wrote: > On Wed, Jun 12, 2019 at 01:46:06PM +0200, Oliver Neukum wrote: > > > Thay is someething the virt_boundary prevents. But could still give > > > you something like: > > > > > > 1536 4096 4096 1024 > > > > > > or > > > 1536 16384 8192 4096 16384 512 > > > > That would kill the driver, if maxpacket were 1024. > > > > USB has really two kinds of requirements > > > > 1. What comes from the protocol > > 2. What comes from the HCD > > > > The protocol wants just multiples of maxpacket. XHCI can satisfy > > that in arbitrary scatter/gather. Other HCs cannot. > > We have no real way to enforce that for the other HCs unfortunately. > I can't really think of any better way to handle their limitations > except for setting max_segments to 1 or bounce buffering. Would it be okay to rely on the assumption that USB block devices never have block size < 512? (We could even add code to the driver to enforce this, although refusing to handle such devices at all might be worse than getting an occasional error.) As I mentioned before, the only HCD that sometimes ends up with maxpacket = 1024 but is unable to do full SG is vhci-hcd, and that one shouldn't be too hard to fix. Alan Stern
Re: How to resolve an issue in swiotlb environment?
On Wed, Jun 12, 2019 at 01:46:06PM +0200, Oliver Neukum wrote: > > Thay is someething the virt_boundary prevents. But could still give > > you something like: > > > > 1536 4096 4096 1024 > > > > or > > 1536 16384 8192 4096 16384 512 > > That would kill the driver, if maxpacket were 1024. > > USB has really two kinds of requirements > > 1. What comes from the protocol > 2. What comes from the HCD > > The protocol wants just multiples of maxpacket. XHCI can satisfy > that in arbitrary scatter/gather. Other HCs cannot. We have no real way to enforce that for the other HCs unfortunately. I can't really think of any better way to handle their limitations except for setting max_segments to 1 or bounce buffering.
Re: How to resolve an issue in swiotlb environment?
Am Mittwoch, den 12.06.2019, 09:30 +0200 schrieb Christoph Hellwig: > > So based on the above I'm a little confused about the actual requirement > again. Can you still split the SCSI command into multiple URBs? And Yes. The device sees only a number of packets over the wire. They can come from an arbitrary number of URBs with the two restrictions that - we cannot split a packet among URBs - every packet but the last must be a multiple of maxpacket > is the boundary for that split still the scatterlist entry as in the > description above? If so I don't really see how the virt_boundary > helps you at all. as it only guarnatees that in a bio, each subsequent > segment start as the advertised virt_boundary. It says nothing about > the size of each segment. That is problematic. > Thay is someething the virt_boundary prevents. But could still give > you something like: > > 1536 4096 4096 1024 > > or > 1536 16384 8192 4096 16384 512 That would kill the driver, if maxpacket were 1024. USB has really two kinds of requirements 1. What comes from the protocol 2. What comes from the HCD The protocol wants just multiples of maxpacket. XHCI can satisfy that in arbitrary scatter/gather. Other HCs cannot. Regards Oliver
Re: How to resolve an issue in swiotlb environment?
On Wed, Jun 12, 2019 at 08:52:21AM +, Yoshihiro Shimoda wrote: > Hi Christoph, > > > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM > > > > First things first: > > > > Yoshihiro, can you try this git branch? The new bits are just the three > > patches at the end, but they sit on top of a few patches already sent > > out to the list, so a branch is probably either: > > > >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes > > Thank you for the patches! > Unfortunately, the three patches could not resolve this issue. > However, it's a hint to me, and then I found the root cause: > - slave_configure() in drivers/usb/storage/scsiglue.c calls >blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when USB_SPEED_SUPER > or more. > -- So that, even if your patches (also I fixed it a little [1]) could not > resolve > the issue because the max_sectors is overwritten by above code. > > So, I think we should fix the slave_configure() by using > dma_max_mapping_size(). > What do you think? If so, I can make such a patch. Yes, please do.
RE: How to resolve an issue in swiotlb environment?
Hi Christoph, > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM > > First things first: > > Yoshihiro, can you try this git branch? The new bits are just the three > patches at the end, but they sit on top of a few patches already sent > out to the list, so a branch is probably either: > >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes Thank you for the patches! Unfortunately, the three patches could not resolve this issue. However, it's a hint to me, and then I found the root cause: - slave_configure() in drivers/usb/storage/scsiglue.c calls blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when USB_SPEED_SUPER or more. -- So that, even if your patches (also I fixed it a little [1]) could not resolve the issue because the max_sectors is overwritten by above code. So, I think we should fix the slave_configure() by using dma_max_mapping_size(). What do you think? If so, I can make such a patch. [1] In the "scsi: take the DMA max mapping size into account" patch, + shost->max_sectors = min_t(unsigned int, shost->max_sectors, + dma_max_mapping_size(dev) << SECTOR_SHIFT); it should be: + dma_max_mapping_size(dev) >> SECTOR_SHIFT); But, if we fix the slave_configure(), we don't need this patch, IIUC. Best regards, Yoshihiro Shimoda
Re: How to resolve an issue in swiotlb environment?
First things first: Yoshihiro, can you try this git branch? The new bits are just the three patches at the end, but they sit on top of a few patches already sent out to the list, so a branch is probably either: git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes Gitweb: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/scsi-virt-boundary-fixes And now on to the rest: > We would like to avoid the extra I/O overhead for host controllers that > can't handle SG. In fact, switching to sg_tablesize = 1 would probably > be considered a regression. Ok, makes sense. > > - set the virt boundary as-is for devices supporting "basic" scatterlist, > >although that still assumes they can rejiggle them because for example > >you could still get a smaller than expected first segment ala (assuming > >a 1024 byte packet size and thus 1023 virt_boundary_mask): > > > > | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 | > > > >as the virt_bondary does not guarantee that the first segment is > >the same size as all the mid segments. > > But that is exactly the problem we need to solve. So based on the above I'm a little confused about the actual requirement again. Can you still split the SCSI command into multiple URBs? And is the boundary for that split still the scatterlist entry as in the description above? If so I don't really see how the virt_boundary helps you at all. as it only guarnatees that in a bio, each subsequent segment start as the advertised virt_boundary. It says nothing about the size of each segment. > The issue which prompted the commit this thread is about arose in a > situation where the block layer set up a scatterlist containing buffer > sizes something like: > > 4096 4096 1536 1024 > > and the maximum packet size was 1024. The situation was a little > unusual, because it involved vhci-hcd (a virtual HCD). This doesn't > matter much in normal practice because: Thay is someething the virt_boundary prevents. But could still give you something like: 1536 4096 4096 1024 or 1536 16384 8192 4096 16384 512 > The ->sysdev field points to the device used for DMA mapping. It is > often the same as ->controller, but sometimes it is > ->controller->parent because of the peculiarities of some platforms. Thanks, taken into account in the above patches!
Re: How to resolve an issue in swiotlb environment?
On Tue, 11 Jun 2019, Christoph Hellwig wrote: > Hi Alan, > > thanks for the explanation. It seems like what usb wants is to: > > - set sg_tablesize to 1 for devices that can't handle scatterlist at all Hmmm. usb-storage (and possible other drivers too) currently handles such controllers by setting up an SG transfer as a series of separate URBs, one for each scatterlist entry. But this is not the same thing, for two reasons: It has less I/O overhead than setting sg_tablesize to 1 because it sets up the whole transfer as a single SCSI command, which requires much less time and traffic on the USB bus than sending multiple commands. It has that requirement about each scatterlist element except the last being a multiple of the maximum packet size in length. (This is because the USB protocol says that a transfer ends whenever a less-than-maximum-size packet is encountered.) We would like to avoid the extra I/O overhead for host controllers that can't handle SG. In fact, switching to sg_tablesize = 1 would probably be considered a regression. > - set the virt boundary as-is for devices supporting "basic" scatterlist, >although that still assumes they can rejiggle them because for example >you could still get a smaller than expected first segment ala (assuming >a 1024 byte packet size and thus 1023 virt_boundary_mask): > > | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 | > >as the virt_bondary does not guarantee that the first segment is >the same size as all the mid segments. But that is exactly the problem we need to solve. The issue which prompted the commit this thread is about arose in a situation where the block layer set up a scatterlist containing buffer sizes something like: 4096 4096 1536 1024 and the maximum packet size was 1024. The situation was a little unusual, because it involved vhci-hcd (a virtual HCD). This doesn't matter much in normal practice because: Block devices normally have a block size of 512 bytes or more. Smaller values are very uncommon. So scatterlist element sizes are always divisible by 512. xHCI is the only USB host controller type with a maximum packet size larger than 512, and xHCI hardware can do full scatter-gather so it doesn't care what the buffer sizes are. So another approach would be to fix vhci-hcd and then trust that the problem won't arise again, for the reasons above. We would be okay so long as nobody tried to use a USB-SCSI device with a block size of 256 bytes or less. > - do not set any limit on xhci > > But that just goes back to the original problem, and that is that with > swiotlb we are limited in the total dma mapping size, and recent block > layer changes in the way we handle the virt_boundary mean we now build > much larger requests by default. For SCSI ULDs to take that into > account I need to call dma_max_mapping_size() and use that as the > upper bound for the request size. My plan is to do that in scsi_lib.c, > but for that we need to expose the actual struct device that the dma > mapping is perfomed on to the scsi layer. If that device is different > from the sysfs hierchary struct device, which it is for usb the ULDD > needs to scsi_add_host_with_dma and pass the dma device as well. How > do I get at the dma device (aka the HCDs pci_dev or similar) from > usb-storage/uas? >From usb_stor_probe2(): us->pusb_dev->bus->sysdev. >From uas_probe(): udev->bus->sysdev. The ->sysdev field points to the device used for DMA mapping. It is often the same as ->controller, but sometimes it is ->controller->parent because of the peculiarities of some platforms. Alan Stern
RE: How to resolve an issue in swiotlb environment?
Hi Christoph, Alan, > From: Alan Stern, Sent: Tuesday, June 11, 2019 3:46 AM > > On Mon, 10 Jun 2019, Christoph Hellwig wrote: > > > Hi Yoshihiro, > > > > sorry for not taking care of this earlier, today is a public holiday > > here and thus I'm not working much over the long weekend. To Christoph: No worries. > > On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote: > > > I have another way to avoid the issue. But it doesn't seem that a good > > > way though... > > > According to the commit that adding blk_queue_virt_boundary() [3], > > > this is needed for vhci_hcd as a workaround so that if we avoid to call it > > > on xhci-hcd driver, the issue disappeared. What do you think? > > > JFYI, I pasted a tentative patch in the end of email [4]. > > > > Oh, I hadn't even look at why USB uses blk_queue_virt_boundary, and it > > seems like the usage is wrong, as it doesn't follow the same rules as > > all the others. I think your patch goes in the right direction, > > but instead of comparing a hcd name it needs to be keyed of a flag > > set by the driver (I suspect there is one indicating native SG support, > > but I can't quickly find it), and we need an alternative solution > > for drivers that don't see like vhci. I suspect just limiting the > > entire transfer size to something that works for a single packet > > for them would be fine. > > Christoph: > > In most of the different kinds of USB host controllers, the hardware is > not capable of assembling a packet out of multiple buffers at arbitrary > addresses. As a matter of fact, xHCI is the only kind that _can_ do > this. > > In some cases, the hardware can assemble packets provided each buffer > other than the last ends at a page boundary and each buffer other than > the first starts at a page boundary (Intel would say the buffers are > "virtually contiguous"), but this is a rather complex rule and we don't > want to rely on it. Plus, in other cases the hardware _can't_ do this. > > Instead, we want the SG buffers to be set up so that each one (except > the last) is an exact multiple of the maximum packet size. That way, > each packet can be assembled from the contents of a single buffer and > there's no problem. There is out of this topic though, if we prepare such an exact multiple of the maximum packet size (1024, 512 or 64), is it possible to cause trouble on IOMMU environment? IIUC, dma_map_sg() maps SG buffers as a single segment and then the segment buffer is not contiguous. > The maximum packet size depends on the type of USB connection. > Typical values are 1024, 512, or 64. It's always a power of two and > it's smaller than 4096. Therefore we simplify the problem even further > by requiring that each SG buffer in a scatterlist (except the last one) > be a multiple of the page size. (It doesn't need to be aligned on a > page boundary, as far as I remember.) > > That's why the blk_queue_virt_boundary usage was added to the USB code. > Perhaps it's not the right way of doing this; I'm not an expert on the > inner workings of the block layer. If you can suggest a better way to > express our requirement, that would be great. Since I'm also not familiar with the block layer, I could not find a better way... Best regards, Yoshihiro Shimoda > Alan Stern > > PS: There _is_ a flag saying whether an HCD supports SG. But what it > means is that the driver can handle an SG list that meets the > requirement above; it doesn't mean that the driver can reassemble the > data from an SG list into a series of bounce buffers in order to meet > the requirement. We very much want not to do that, especially since > the block layer should already be capable of doing it for us. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: How to resolve an issue in swiotlb environment?
Hi Alan, thanks for the explanation. It seems like what usb wants is to: - set sg_tablesize to 1 for devices that can't handle scatterlist at all - set the virt boundary as-is for devices supporting "basic" scatterlist, although that still assumes they can rejiggle them because for example you could still get a smaller than expected first segment ala (assuming a 1024 byte packet size and thus 1023 virt_boundary_mask): | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 | as the virt_bondary does not guarantee that the first segment is the same size as all the mid segments. - do not set any limit on xhci But that just goes back to the original problem, and that is that with swiotlb we are limited in the total dma mapping size, and recent block layer changes in the way we handle the virt_boundary mean we now build much larger requests by default. For SCSI ULDs to take that into account I need to call dma_max_mapping_size() and use that as the upper bound for the request size. My plan is to do that in scsi_lib.c, but for that we need to expose the actual struct device that the dma mapping is perfomed on to the scsi layer. If that device is different from the sysfs hierchary struct device, which it is for usb the ULDD needs to scsi_add_host_with_dma and pass the dma device as well. How do I get at the dma device (aka the HCDs pci_dev or similar) from usb-storage/uas?
Re: How to resolve an issue in swiotlb environment?
On Mon, 10 Jun 2019, Christoph Hellwig wrote: > Hi Yoshihiro, > > sorry for not taking care of this earlier, today is a public holiday > here and thus I'm not working much over the long weekend. > > On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote: > > I have another way to avoid the issue. But it doesn't seem that a good way > > though... > > According to the commit that adding blk_queue_virt_boundary() [3], > > this is needed for vhci_hcd as a workaround so that if we avoid to call it > > on xhci-hcd driver, the issue disappeared. What do you think? > > JFYI, I pasted a tentative patch in the end of email [4]. > > Oh, I hadn't even look at why USB uses blk_queue_virt_boundary, and it > seems like the usage is wrong, as it doesn't follow the same rules as > all the others. I think your patch goes in the right direction, > but instead of comparing a hcd name it needs to be keyed of a flag > set by the driver (I suspect there is one indicating native SG support, > but I can't quickly find it), and we need an alternative solution > for drivers that don't see like vhci. I suspect just limiting the > entire transfer size to something that works for a single packet > for them would be fine. Christoph: In most of the different kinds of USB host controllers, the hardware is not capable of assembling a packet out of multiple buffers at arbitrary addresses. As a matter of fact, xHCI is the only kind that _can_ do this. In some cases, the hardware can assemble packets provided each buffer other than the last ends at a page boundary and each buffer other than the first starts at a page boundary (Intel would say the buffers are "virtually contiguous"), but this is a rather complex rule and we don't want to rely on it. Plus, in other cases the hardware _can't_ do this. Instead, we want the SG buffers to be set up so that each one (except the last) is an exact multiple of the maximum packet size. That way, each packet can be assembled from the contents of a single buffer and there's no problem. The maximum packet size depends on the type of USB connection. Typical values are 1024, 512, or 64. It's always a power of two and it's smaller than 4096. Therefore we simplify the problem even further by requiring that each SG buffer in a scatterlist (except the last one) be a multiple of the page size. (It doesn't need to be aligned on a page boundary, as far as I remember.) That's why the blk_queue_virt_boundary usage was added to the USB code. Perhaps it's not the right way of doing this; I'm not an expert on the inner workings of the block layer. If you can suggest a better way to express our requirement, that would be great. Alan Stern PS: There _is_ a flag saying whether an HCD supports SG. But what it means is that the driver can handle an SG list that meets the requirement above; it doesn't mean that the driver can reassemble the data from an SG list into a series of bounce buffers in order to meet the requirement. We very much want not to do that, especially since the block layer should already be capable of doing it for us.
Re: How to resolve an issue in swiotlb environment?
Hi Yoshihiro, sorry for not taking care of this earlier, today is a public holiday here and thus I'm not working much over the long weekend. On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote: > I have another way to avoid the issue. But it doesn't seem that a good way > though... > According to the commit that adding blk_queue_virt_boundary() [3], > this is needed for vhci_hcd as a workaround so that if we avoid to call it > on xhci-hcd driver, the issue disappeared. What do you think? > JFYI, I pasted a tentative patch in the end of email [4]. Oh, I hadn't even look at why USB uses blk_queue_virt_boundary, and it seems like the usage is wrong, as it doesn't follow the same rules as all the others. I think your patch goes in the right direction, but instead of comparing a hcd name it needs to be keyed of a flag set by the driver (I suspect there is one indicating native SG support, but I can't quickly find it), and we need an alternative solution for drivers that don't see like vhci. I suspect just limiting the entire transfer size to something that works for a single packet for them would be fine.
RE: How to resolve an issue in swiotlb environment?
Hi Christoph, Alan, (add linux-usb ML on CC.) > From: Yoshihiro Shimoda, Sent: Friday, June 7, 2019 9:00 PM > > Hi Christoph, > > I think we should continue to discuss on this email thread instead of the > fixed DMA-API.txt patch [1] > > [1] > https://marc.info/?t=15598941221&r=1&w=2 > > > From: Yoshihiro Shimoda, Sent: Monday, June 3, 2019 3:42 PM > > > > Hi linux-block and iommu mailing lists, > > > > I have an issue that a USB SSD with xHCI on R-Car H3 causes "swiotlb is > > full" like below. > > > > [ 36.745286] xhci-hcd ee00.usb: swiotlb buffer is full (sz: > > 524288 bytes), total 32768 (slots), used 1338 > (slots) > > > > I have investigated this issue by using git bisect, and then I found the > > following commit: > > > > --- > > commit 09324d32d2a0843e66652a087da6f77924358e62 > > Author: Christoph Hellwig > > Date: Tue May 21 09:01:41 2019 +0200 > > > > block: force an unlimited segment size on queues with a virt boundary > > --- > > Thank you for your comment on other email thread [2] like below: > --- > Turns out it isn't as simple as I thought, as there doesn't seem to > be an easy way to get to the struct device used for DMA mapping > from USB drivers. I'll need to think a bit more how to handle that > best. > --- > > [2] > https://marc.info/?l=linux-doc&m=155989651620473&w=2 I have another way to avoid the issue. But it doesn't seem that a good way though... According to the commit that adding blk_queue_virt_boundary() [3], this is needed for vhci_hcd as a workaround so that if we avoid to call it on xhci-hcd driver, the issue disappeared. What do you think? JFYI, I pasted a tentative patch in the end of email [4]. --- [3] commit 747668dbc061b3e62bc1982767a3a1f9815fcf0e Author: Alan Stern Date: Mon Apr 15 13:19:25 2019 -0400 usb-storage: Set virt_boundary_mask to avoid SG overflows --- [4] diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c index 59190d8..277c6f7e 100644 --- a/drivers/usb/storage/scsiglue.c +++ b/drivers/usb/storage/scsiglue.c @@ -30,6 +30,8 @@ #include #include +#include +#include #include #include @@ -65,6 +67,7 @@ static const char* host_info(struct Scsi_Host *host) static int slave_alloc (struct scsi_device *sdev) { struct us_data *us = host_to_us(sdev->host); + struct usb_hcd *hcd = bus_to_hcd(us->pusb_dev->bus); int maxp; /* @@ -80,8 +83,10 @@ static int slave_alloc (struct scsi_device *sdev) * Bulk maxpacket value. Fortunately this value is always a * power of 2. Inform the block layer about this requirement. */ - maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0); - blk_queue_virt_boundary(sdev->request_queue, maxp - 1); + if (!strcmp(hcd->driver->description, "vhci_hcd")) { + maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0); + blk_queue_virt_boundary(sdev->request_queue, maxp - 1); + } /* * Some host controllers may have alignment requirements. --- Best regards, Yoshihiro Shimoda
RE: How to resolve an issue in swiotlb environment?
slots) [ 223.888940] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.899630] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.910318] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.921005] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.931695] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.942387] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.953077] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 223.963765] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.865664] swiotlb_tbl_map_single: 70409 callbacks suppressed [ 228.865668] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.882188] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.892878] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.903567] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.914256] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.924944] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.935636] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.946326] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.957015] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) [ 228.967705] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots) Regards, Biju > Subject: RE: How to resolve an issue in swiotlb environment? > > Hi Christoph, > > I think we should continue to discuss on this email thread instead of the > fixed > DMA-API.txt patch [1] > > [1] > https://marc.info/?t=15598941221&r=1&w=2 > > > From: Yoshihiro Shimoda, Sent: Monday, June 3, 2019 3:42 PM > > > > Hi linux-block and iommu mailing lists, > > > > I have an issue that a USB SSD with xHCI on R-Car H3 causes "swiotlb is > > full" > like below. > > > > [ 36.745286] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 524288 > bytes), total 32768 (slots), used 1338 (slots) > > > > I have investigated this issue by using git bisect, and then I found the > following commit: > > > > --- > > commit 09324d32d2a0843e66652a087da6f77924358e62 > > Author: Christoph Hellwig > > Date: Tue May 21 09:01:41 2019 +0200 > > > > block: force an unlimited segment size on queues with a virt > > boundary > > --- > > Thank you for your comment on other email thread [2] like below: > --- > Turns out it isn't as simple as I thought, as there doesn't seem to be an easy > way to get to the struct device used for DMA mapping from USB drivers. I'll > need to think a bit more how to handle that best. > --- > > [2] > https://marc.info/?l=linux-doc&m=155989651620473&w=2 > > I'm not sure this is a correct way, but the issue disappears if I applied a > patch > below to USB storage driver. Especially, WARNING happened on > blk_queue_max_segment_size(). > Maybe we need to expand the argument "struct device *" of > blk_queue_virt_boundary() to call dma_max_mapping_size()? > --- > diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c > index 59190d8..fa37b39 100644 > --- a/drivers/usb/storage/scsiglue.c > +++ b/drivers/usb/storage/scsiglue.c > @@ -28,6 +28,7 @@ > * status of a command. > */ > > +#include > #include > #include > > @@ -83,6 +84,15 @@ static int slave_alloc (struct scsi_device *sdev) > maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0); > blk_queue_virt_boundary(sdev->request_queue, maxp - 1); > > +{ > + struct device *dev = us->pusb_dev->bus->controller; > + > + dev_info(dev, "%s: size = %zu\n", __func__, > dma_max_mapping_size(dev)); > + blk_queue_max_segment_size(sdev->request_queue, > +dma_max_mappi
RE: How to resolve an issue in swiotlb environment?
Hi Christoph, I think we should continue to discuss on this email thread instead of the fixed DMA-API.txt patch [1] [1] https://marc.info/?t=15598941221&r=1&w=2 > From: Yoshihiro Shimoda, Sent: Monday, June 3, 2019 3:42 PM > > Hi linux-block and iommu mailing lists, > > I have an issue that a USB SSD with xHCI on R-Car H3 causes "swiotlb is full" > like below. > > [ 36.745286] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 524288 > bytes), total 32768 (slots), used 1338 (slots) > > I have investigated this issue by using git bisect, and then I found the > following commit: > > --- > commit 09324d32d2a0843e66652a087da6f77924358e62 > Author: Christoph Hellwig > Date: Tue May 21 09:01:41 2019 +0200 > > block: force an unlimited segment size on queues with a virt boundary > --- Thank you for your comment on other email thread [2] like below: --- Turns out it isn't as simple as I thought, as there doesn't seem to be an easy way to get to the struct device used for DMA mapping from USB drivers. I'll need to think a bit more how to handle that best. --- [2] https://marc.info/?l=linux-doc&m=155989651620473&w=2 I'm not sure this is a correct way, but the issue disappears if I applied a patch below to USB storage driver. Especially, WARNING happened on blk_queue_max_segment_size(). Maybe we need to expand the argument "struct device *" of blk_queue_virt_boundary() to call dma_max_mapping_size()? --- diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c index 59190d8..fa37b39 100644 --- a/drivers/usb/storage/scsiglue.c +++ b/drivers/usb/storage/scsiglue.c @@ -28,6 +28,7 @@ * status of a command. */ +#include #include #include @@ -83,6 +84,15 @@ static int slave_alloc (struct scsi_device *sdev) maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0); blk_queue_virt_boundary(sdev->request_queue, maxp - 1); +{ + struct device *dev = us->pusb_dev->bus->controller; + + dev_info(dev, "%s: size = %zu\n", __func__, dma_max_mapping_size(dev)); + blk_queue_max_segment_size(sdev->request_queue, + dma_max_mapping_size(dev)); +} + + /* * Some host controllers may have alignment requirements. * We'll play it safe by requiring 512-byte alignment always. --- Best regards, Yoshihiro Shimoda