Re: How to resolve an issue in swiotlb environment?

2019-06-21 Thread Suwan Kim
On Wed, Jun 19, 2019 at 05:05:49PM -0400, Alan Stern wrote:
> On Wed, 19 Jun 2019, shuah wrote:
> 
> > I missed a lot of the thread info. and went looking for it and found the
> > following summary of the problem:
> > 
> > ==
> > The issue which prompted the commit this thread is about arose in a
> > situation where the block layer set up a scatterlist containing buffer
> > sizes something like:
> > 
> > 4096 4096 1536 1024
> > 
> > and the maximum packet size was 1024.  The situation was a little
> > unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't
> > matter much in normal practice because:
> > 
> > Block devices normally have a block size of 512 bytes or more.
> > Smaller values are very uncommon.  So scatterlist element sizes
> > are always divisible by 512.
> > 
> > xHCI is the only USB host controller type with a maximum packet
> > size larger than 512, and xHCI hardware can do full
> > scatter-gather so it doesn't care what the buffer sizes are.
> > 
> > So another approach would be to fix vhci-hcd and then trust that the
> > problem won't arise again, for the reasons above.  We would be okay so
> > long as nobody tried to use a USB-SCSI device with a block size of 256
> > bytes or less.
> > ===
> > 
> > Out of the summary, the following gives me pause:
> > 
> > "xHCI hardware can do full scatter-gather so it doesn't care what the
> > buffer sizes are."
> > 
> > vhci-hcd won't be able to count on hardware being able to do full
> > scatter-gather. It has to deal with a variety of hardware with
> > varying speeds.
> 
> Sure.  But you can test whether the server's HCD is able to handle 
> scatter-gather transfers, and if it is then you can say that the 
> client-side vhci-hcd is able to handle them as well.  Then all you 
> would have to do is preserve the scatterlist information describing the 
> transfer when you go between the client and the server.
> 
> The point is to make sure that the client-side vhci-hcd doesn't claim
> to be _less_ capable than the server-side actual HCD.  That's what
> leads to the problem described above.
> 
> > "We would be okay so long as nobody tried to use a USB-SCSI device with
> > a block size of 256 bytes or less."
> > 
> > At least a USB Storage device, I test with says 512 block size. Can we
> > count on not seeing a device with block size <= 256 bytes?
> 
> Yes, we can.  In fact, the SCSI core doesn't handle devices with block 
> size < 512.
> 
> > In any case, I am looking into adding SG support vhci-hci at the moment.
> > 
> > Looks like the following is the repo, I should be working with?
> > 
> > git://git.infradead.org/users/hch/misc.git
> 
> It doesn't matter.  Your work should end up being independent of 
> Christoph's, so you can base it on any repo.

I implemented SG support of vhci. I will send it as a patch.
Please look at it and let me know if you have a feedback.

Regards

Suwan Kim


Re: How to resolve an issue in swiotlb environment?

2019-06-19 Thread Alan Stern
On Wed, 19 Jun 2019, shuah wrote:

> I missed a lot of the thread info. and went looking for it and found the
> following summary of the problem:
> 
> ==
> The issue which prompted the commit this thread is about arose in a
> situation where the block layer set up a scatterlist containing buffer
> sizes something like:
> 
>   4096 4096 1536 1024
> 
> and the maximum packet size was 1024.  The situation was a little
> unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't
> matter much in normal practice because:
> 
>   Block devices normally have a block size of 512 bytes or more.
>   Smaller values are very uncommon.  So scatterlist element sizes
>   are always divisible by 512.
> 
>   xHCI is the only USB host controller type with a maximum packet
>   size larger than 512, and xHCI hardware can do full
>   scatter-gather so it doesn't care what the buffer sizes are.
> 
> So another approach would be to fix vhci-hcd and then trust that the
> problem won't arise again, for the reasons above.  We would be okay so
> long as nobody tried to use a USB-SCSI device with a block size of 256
> bytes or less.
> ===
> 
> Out of the summary, the following gives me pause:
> 
> "xHCI hardware can do full scatter-gather so it doesn't care what the
> buffer sizes are."
> 
> vhci-hcd won't be able to count on hardware being able to do full
> scatter-gather. It has to deal with a variety of hardware with
> varying speeds.

Sure.  But you can test whether the server's HCD is able to handle 
scatter-gather transfers, and if it is then you can say that the 
client-side vhci-hcd is able to handle them as well.  Then all you 
would have to do is preserve the scatterlist information describing the 
transfer when you go between the client and the server.

The point is to make sure that the client-side vhci-hcd doesn't claim
to be _less_ capable than the server-side actual HCD.  That's what
leads to the problem described above.

> "We would be okay so long as nobody tried to use a USB-SCSI device with
> a block size of 256 bytes or less."
> 
> At least a USB Storage device, I test with says 512 block size. Can we
> count on not seeing a device with block size <= 256 bytes?

Yes, we can.  In fact, the SCSI core doesn't handle devices with block 
size < 512.

> In any case, I am looking into adding SG support vhci-hci at the moment.
> 
> Looks like the following is the repo, I should be working with?
> 
> git://git.infradead.org/users/hch/misc.git

It doesn't matter.  Your work should end up being independent of 
Christoph's, so you can base it on any repo.

Alan Stern



Re: How to resolve an issue in swiotlb environment?

2019-06-19 Thread shuah

Hi Alan,

On 6/18/19 9:28 AM, shuah wrote:

On 6/14/19 8:44 AM, Alan Stern wrote:

On Thu, 13 Jun 2019, shuah wrote:


Great!  So all we have to do is fix vhci-hcd.  Then we can remove all
the virt_boundary_mask stuff from usb-storage and uas entirely.

(I'm assuming wireless USB isn't a genuine issue.  As far as I know, it
is pretty much abandoned at this point.)

Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too
hard.  It ought to be possible even without changing the network
protocol.



I will start taking a look at this. Is there a target release in plan
to drop virt_boundary_mask stuff?


Not yet.  But since it doesn't do what we want anyway, this should be
fixed quickly.





I missed a lot of the thread info. and went looking for it and found the
following summary of the problem:

==
The issue which prompted the commit this thread is about arose in a
situation where the block layer set up a scatterlist containing buffer
sizes something like:

4096 4096 1536 1024

and the maximum packet size was 1024.  The situation was a little
unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't
matter much in normal practice because:

Block devices normally have a block size of 512 bytes or more.
Smaller values are very uncommon.  So scatterlist element sizes
are always divisible by 512.

xHCI is the only USB host controller type with a maximum packet
size larger than 512, and xHCI hardware can do full
scatter-gather so it doesn't care what the buffer sizes are.

So another approach would be to fix vhci-hcd and then trust that the
problem won't arise again, for the reasons above.  We would be okay so
long as nobody tried to use a USB-SCSI device with a block size of 256
bytes or less.
===

Out of the summary, the following gives me pause:

"xHCI hardware can do full scatter-gather so it doesn't care what the
buffer sizes are."

vhci-hcd won't be able to count on hardware being able to do full
scatter-gather. It has to deal with a variety of hardware with
varying speeds.

"We would be okay so long as nobody tried to use a USB-SCSI device with
a block size of 256 bytes or less."

At least a USB Storage device, I test with says 512 block size. Can we
count on not seeing a device with block size <= 256 bytes?

In any case, I am looking into adding SG support vhci-hci at the moment.

Looks like the following is the repo, I should be working with?

git://git.infradead.org/users/hch/misc.git

thanks,
-- Shuah


Re: How to resolve an issue in swiotlb environment?

2019-06-18 Thread shuah

On 6/14/19 8:44 AM, Alan Stern wrote:

On Thu, 13 Jun 2019, shuah wrote:


Great!  So all we have to do is fix vhci-hcd.  Then we can remove all
the virt_boundary_mask stuff from usb-storage and uas entirely.

(I'm assuming wireless USB isn't a genuine issue.  As far as I know, it
is pretty much abandoned at this point.)

Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too
hard.  It ought to be possible even without changing the network
protocol.



I will start taking a look at this. Is there a target release in plan
to drop virt_boundary_mask stuff?


Not yet.  But since it doesn't do what we want anyway, this should be
fixed quickly.



Sounds good. I am working on it.

thanks,
-- Shuah


Re: How to resolve an issue in swiotlb environment?

2019-06-14 Thread Alan Stern
On Thu, 13 Jun 2019, shuah wrote:

> > Great!  So all we have to do is fix vhci-hcd.  Then we can remove all
> > the virt_boundary_mask stuff from usb-storage and uas entirely.
> > 
> > (I'm assuming wireless USB isn't a genuine issue.  As far as I know, it
> > is pretty much abandoned at this point.)
> > 
> > Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too
> > hard.  It ought to be possible even without changing the network
> > protocol.
> > 
> 
> I will start taking a look at this. Is there a target release in plan
> to drop virt_boundary_mask stuff?

Not yet.  But since it doesn't do what we want anyway, this should be 
fixed quickly.

Alan Stern



Re: How to resolve an issue in swiotlb environment?

2019-06-13 Thread shuah

On 6/13/19 11:16 AM, Alan Stern wrote:

On Thu, 13 Jun 2019, Christoph Hellwig wrote:


On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote:

Would it be okay to rely on the assumption that USB block devices never
have block size < 512?  (We could even add code to the driver to
enforce this, although refusing to handle such devices at all might be
worse than getting an occasional error.)


sd.c only supports a few specific sector size, and none of them is
< 512 bytes:

if (sector_size != 512 &&
sector_size != 1024 &&
sector_size != 2048 &&
sector_size != 4096) {
...
sdkp->capacity = 0;


Great!  So all we have to do is fix vhci-hcd.  Then we can remove all
the virt_boundary_mask stuff from usb-storage and uas entirely.

(I'm assuming wireless USB isn't a genuine issue.  As far as I know, it
is pretty much abandoned at this point.)

Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too
hard.  It ought to be possible even without changing the network
protocol.



I will start taking a look at this. Is there a target release in plan
to drop virt_boundary_mask stuff?

thanks,
-- Shuah



Re: How to resolve an issue in swiotlb environment?

2019-06-13 Thread Greg KH
On Thu, Jun 13, 2019 at 01:16:32PM -0400, Alan Stern wrote:
> On Thu, 13 Jun 2019, Christoph Hellwig wrote:
> 
> > On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote:
> > > Would it be okay to rely on the assumption that USB block devices never 
> > > have block size < 512?  (We could even add code to the driver to 
> > > enforce this, although refusing to handle such devices at all might be 
> > > worse than getting an occasional error.)
> > 
> > sd.c only supports a few specific sector size, and none of them is
> > < 512 bytes:
> > 
> > if (sector_size != 512 &&
> > sector_size != 1024 &&
> > sector_size != 2048 &&
> > sector_size != 4096) {
> > ...
> > sdkp->capacity = 0;
> 
> Great!  So all we have to do is fix vhci-hcd.  Then we can remove all 
> the virt_boundary_mask stuff from usb-storage and uas entirely.
> 
> (I'm assuming wireless USB isn't a genuine issue.  As far as I know, it 
> is pretty much abandoned at this point.)

It is, I need to just move it to staging and delete the thing.  I don't
know of any hardware anymore.

thanks,

greg k-h
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to resolve an issue in swiotlb environment?

2019-06-13 Thread Alan Stern
On Thu, 13 Jun 2019, Christoph Hellwig wrote:

> On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote:
> > Would it be okay to rely on the assumption that USB block devices never 
> > have block size < 512?  (We could even add code to the driver to 
> > enforce this, although refusing to handle such devices at all might be 
> > worse than getting an occasional error.)
> 
> sd.c only supports a few specific sector size, and none of them is
> < 512 bytes:
> 
>   if (sector_size != 512 &&
>   sector_size != 1024 &&
>   sector_size != 2048 &&
>   sector_size != 4096) {
>   ...
>   sdkp->capacity = 0;

Great!  So all we have to do is fix vhci-hcd.  Then we can remove all 
the virt_boundary_mask stuff from usb-storage and uas entirely.

(I'm assuming wireless USB isn't a genuine issue.  As far as I know, it 
is pretty much abandoned at this point.)

Valentina and Shua: Adding SG support to vhci-hcd shouldn't be too
hard.  It ought to be possible even without changing the network
protocol.

Alan Stern

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to resolve an issue in swiotlb environment?

2019-06-13 Thread Martin K. Petersen


Christoph,

> sd.c only supports a few specific sector size, and none of them is
> < 512 bytes:

Yep, while sd.c in theory supported 256-byte logical blocks a while
back, that code was removed since the block layer always operates on
units of 512 bytes.

-- 
Martin K. Petersen  Oracle Linux Engineering
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to resolve an issue in swiotlb environment?

2019-06-13 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 10:43:11AM -0400, Alan Stern wrote:
> Would it be okay to rely on the assumption that USB block devices never 
> have block size < 512?  (We could even add code to the driver to 
> enforce this, although refusing to handle such devices at all might be 
> worse than getting an occasional error.)

sd.c only supports a few specific sector size, and none of them is
< 512 bytes:

if (sector_size != 512 &&
sector_size != 1024 &&
sector_size != 2048 &&
sector_size != 4096) {
...
sdkp->capacity = 0;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Yoshihiro Shimoda
Hi Christoph,

> From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 8:31 PM
> 
> On Wed, Jun 12, 2019 at 08:52:21AM +, Yoshihiro Shimoda wrote:
> > Hi Christoph,
> >
> > > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM
> > >
> > > First things first:
> > >
> > > Yoshihiro, can you try this git branch?  The new bits are just the three
> > > patches at the end, but they sit on top of a few patches already sent
> > > out to the list, so a branch is probably either:
> > >
> > >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes
> >
> > Thank you for the patches!
> > Unfortunately, the three patches could not resolve this issue.
> > However, it's a hint to me, and then I found the root cause:
> >  - slave_configure() in drivers/usb/storage/scsiglue.c calls
> >blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when 
> > USB_SPEED_SUPER or more.
> >  -- So that, even if your patches (also I fixed it a little [1]) could not 
> > resolve
> > the issue because the max_sectors is overwritten by above code.
> >
> > So, I think we should fix the slave_configure() by using 
> > dma_max_mapping_size().
> > What do you think? If so, I can make such a patch.
> 
> Yes, please do.

Thank you for your comment. I sent a patch to related mailing lists and you.

Best regards,
Yoshihiro Shimoda

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Alan Stern
On Wed, 12 Jun 2019, Christoph Hellwig wrote:

> On Wed, Jun 12, 2019 at 01:46:06PM +0200, Oliver Neukum wrote:
> > > Thay is someething the virt_boundary prevents.  But could still give
> > > you something like:
> > > 
> > >   1536 4096 4096 1024
> > > 
> > > or
> > >   1536 16384 8192 4096 16384 512
> > 
> > That would kill the driver, if maxpacket were 1024.
> > 
> > USB has really two kinds of requirements
> > 
> > 1. What comes from the protocol
> > 2. What comes from the HCD
> > 
> > The protocol wants just multiples of maxpacket. XHCI can satisfy
> > that in arbitrary scatter/gather. Other HCs cannot.
> 
> We have no real way to enforce that for the other HCs unfortunately.
> I can't really think of any better way to handle their limitations
> except for setting max_segments to 1 or bounce buffering.

Would it be okay to rely on the assumption that USB block devices never 
have block size < 512?  (We could even add code to the driver to 
enforce this, although refusing to handle such devices at all might be 
worse than getting an occasional error.)

As I mentioned before, the only HCD that sometimes ends up with
maxpacket = 1024 but is unable to do full SG is vhci-hcd, and that one
shouldn't be too hard to fix.

Alan Stern



Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 01:46:06PM +0200, Oliver Neukum wrote:
> > Thay is someething the virt_boundary prevents.  But could still give
> > you something like:
> > 
> > 1536 4096 4096 1024
> > 
> > or
> > 1536 16384 8192 4096 16384 512
> 
> That would kill the driver, if maxpacket were 1024.
> 
> USB has really two kinds of requirements
> 
> 1. What comes from the protocol
> 2. What comes from the HCD
> 
> The protocol wants just multiples of maxpacket. XHCI can satisfy
> that in arbitrary scatter/gather. Other HCs cannot.

We have no real way to enforce that for the other HCs unfortunately.
I can't really think of any better way to handle their limitations
except for setting max_segments to 1 or bounce buffering.


Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Oliver Neukum
Am Mittwoch, den 12.06.2019, 09:30 +0200 schrieb Christoph Hellwig:
> 
> So based on the above I'm a little confused about the actual requirement
> again.  Can you still split the SCSI command into multiple URBs?  And

Yes. The device sees only a number of packets over the wire. They can
come from an arbitrary number of URBs with the two restrictions that
- we cannot split a packet among URBs
- every packet but the last must be a multiple of maxpacket

> is the boundary for that split still the scatterlist entry as in the
> description above?  If so I don't really see how the virt_boundary
> helps you at all. as it only guarnatees that in a bio, each subsequent
> segment start as the advertised virt_boundary.  It says nothing about
> the size of each segment.

That is problematic.

> Thay is someething the virt_boundary prevents.  But could still give
> you something like:
> 
>   1536 4096 4096 1024
> 
> or
>   1536 16384 8192 4096 16384 512

That would kill the driver, if maxpacket were 1024.

USB has really two kinds of requirements

1. What comes from the protocol
2. What comes from the HCD

The protocol wants just multiples of maxpacket. XHCI can satisfy
that in arbitrary scatter/gather. Other HCs cannot.

Regards
Oliver



Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 08:52:21AM +, Yoshihiro Shimoda wrote:
> Hi Christoph,
> 
> > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM
> > 
> > First things first:
> > 
> > Yoshihiro, can you try this git branch?  The new bits are just the three
> > patches at the end, but they sit on top of a few patches already sent
> > out to the list, so a branch is probably either:
> > 
> >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes
> 
> Thank you for the patches!
> Unfortunately, the three patches could not resolve this issue.
> However, it's a hint to me, and then I found the root cause:
>  - slave_configure() in drivers/usb/storage/scsiglue.c calls
>blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when USB_SPEED_SUPER 
> or more.
>  -- So that, even if your patches (also I fixed it a little [1]) could not 
> resolve
> the issue because the max_sectors is overwritten by above code.
> 
> So, I think we should fix the slave_configure() by using 
> dma_max_mapping_size().
> What do you think? If so, I can make such a patch.

Yes, please do.


RE: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Yoshihiro Shimoda
Hi Christoph,

> From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM
> 
> First things first:
> 
> Yoshihiro, can you try this git branch?  The new bits are just the three
> patches at the end, but they sit on top of a few patches already sent
> out to the list, so a branch is probably either:
> 
>git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes

Thank you for the patches!
Unfortunately, the three patches could not resolve this issue.
However, it's a hint to me, and then I found the root cause:
 - slave_configure() in drivers/usb/storage/scsiglue.c calls
   blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when USB_SPEED_SUPER or 
more.
 -- So that, even if your patches (also I fixed it a little [1]) could not 
resolve
the issue because the max_sectors is overwritten by above code.

So, I think we should fix the slave_configure() by using dma_max_mapping_size().
What do you think? If so, I can make such a patch.

[1]
In the "scsi: take the DMA max mapping size into account" patch,
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_max_mapping_size(dev) << SECTOR_SHIFT);

it should be:
+   dma_max_mapping_size(dev) >> SECTOR_SHIFT);

But, if we fix the slave_configure(), we don't need this patch, IIUC.

Best regards,
Yoshihiro Shimoda



Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Christoph Hellwig
First things first:

Yoshihiro, can you try this git branch?  The new bits are just the three
patches at the end, but they sit on top of a few patches already sent
out to the list, so a branch is probably either:

   git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes

Gitweb:

   
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/scsi-virt-boundary-fixes

And now on to the rest:

> We would like to avoid the extra I/O overhead for host controllers that
> can't handle SG.  In fact, switching to sg_tablesize = 1 would probably
> be considered a regression.

Ok, makes sense.

> >  - set the virt boundary as-is for devices supporting "basic" scatterlist,
> >although that still assumes they can rejiggle them because for example
> >you could still get a smaller than expected first segment ala (assuming
> >a 1024 byte packet size and thus 1023 virt_boundary_mask):
> > 
> > | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |
> > 
> >as the virt_bondary does not guarantee that the first segment is
> >the same size as all the mid segments.
> 
> But that is exactly the problem we need to solve.

So based on the above I'm a little confused about the actual requirement
again.  Can you still split the SCSI command into multiple URBs?  And
is the boundary for that split still the scatterlist entry as in the
description above?  If so I don't really see how the virt_boundary
helps you at all. as it only guarnatees that in a bio, each subsequent
segment start as the advertised virt_boundary.  It says nothing about
the size of each segment.

> The issue which prompted the commit this thread is about arose in a
> situation where the block layer set up a scatterlist containing buffer
> sizes something like:
> 
>   4096 4096 1536 1024
> 
> and the maximum packet size was 1024.  The situation was a little 
> unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't 
> matter much in normal practice because:

Thay is someething the virt_boundary prevents.  But could still give
you something like:

1536 4096 4096 1024

or
1536 16384 8192 4096 16384 512

> The ->sysdev field points to the device used for DMA mapping.  It is
> often the same as ->controller, but sometimes it is
> ->controller->parent because of the peculiarities of some platforms.

Thanks, taken into account in the above patches!


Re: How to resolve an issue in swiotlb environment?

2019-06-11 Thread Alan Stern
On Tue, 11 Jun 2019, Christoph Hellwig wrote:

> Hi Alan,
> 
> thanks for the explanation.  It seems like what usb wants is to:
> 
>  - set sg_tablesize to 1 for devices that can't handle scatterlist at all

Hmmm.  usb-storage (and possible other drivers too) currently handles
such controllers by setting up an SG transfer as a series of separate
URBs, one for each scatterlist entry.  But this is not the same thing,
for two reasons:

It has less I/O overhead than setting sg_tablesize to 1 because 
it sets up the whole transfer as a single SCSI command, which 
requires much less time and traffic on the USB bus than sending 
multiple commands.

It has that requirement about each scatterlist element except
the last being a multiple of the maximum packet size in length.
(This is because the USB protocol says that a transfer ends
whenever a less-than-maximum-size packet is encountered.)

We would like to avoid the extra I/O overhead for host controllers that
can't handle SG.  In fact, switching to sg_tablesize = 1 would probably
be considered a regression.

>  - set the virt boundary as-is for devices supporting "basic" scatterlist,
>although that still assumes they can rejiggle them because for example
>you could still get a smaller than expected first segment ala (assuming
>a 1024 byte packet size and thus 1023 virt_boundary_mask):
> 
> | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |
> 
>as the virt_bondary does not guarantee that the first segment is
>the same size as all the mid segments.

But that is exactly the problem we need to solve.

The issue which prompted the commit this thread is about arose in a
situation where the block layer set up a scatterlist containing buffer
sizes something like:

4096 4096 1536 1024

and the maximum packet size was 1024.  The situation was a little 
unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't 
matter much in normal practice because:

Block devices normally have a block size of 512 bytes or more.
Smaller values are very uncommon.  So scatterlist element sizes
are always divisible by 512.

xHCI is the only USB host controller type with a maximum packet 
size larger than 512, and xHCI hardware can do full 
scatter-gather so it doesn't care what the buffer sizes are.

So another approach would be to fix vhci-hcd and then trust that the
problem won't arise again, for the reasons above.  We would be okay so
long as nobody tried to use a USB-SCSI device with a block size of 256
bytes or less.

>  - do not set any limit on xhci
> 
> But that just goes back to the original problem, and that is that with
> swiotlb we are limited in the total dma mapping size, and recent block
> layer changes in the way we handle the virt_boundary mean we now build
> much larger requests by default.  For SCSI ULDs to take that into
> account I need to call dma_max_mapping_size() and use that as the
> upper bound for the request size.  My plan is to do that in scsi_lib.c,
> but for that we need to expose the actual struct device that the dma
> mapping is perfomed on to the scsi layer.  If that device is different
> from the sysfs hierchary struct device, which it is for usb the ULDD
> needs to scsi_add_host_with_dma and pass the dma device as well.  How
> do I get at the dma device (aka the HCDs pci_dev or similar) from
> usb-storage/uas?

>From usb_stor_probe2(): us->pusb_dev->bus->sysdev.
>From uas_probe(): udev->bus->sysdev.

The ->sysdev field points to the device used for DMA mapping.  It is
often the same as ->controller, but sometimes it is
->controller->parent because of the peculiarities of some platforms.

Alan Stern



RE: How to resolve an issue in swiotlb environment?

2019-06-11 Thread Yoshihiro Shimoda
Hi Christoph, Alan,

> From: Alan Stern, Sent: Tuesday, June 11, 2019 3:46 AM
> 
> On Mon, 10 Jun 2019, Christoph Hellwig wrote:
> 
> > Hi Yoshihiro,
> >
> > sorry for not taking care of this earlier, today is a public holiday
> > here and thus I'm not working much over the long weekend.

To Christoph:

No worries.

> > On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote:
> > > I have another way to avoid the issue. But it doesn't seem that a good 
> > > way though...
> > > According to the commit that adding blk_queue_virt_boundary() [3],
> > > this is needed for vhci_hcd as a workaround so that if we avoid to call it
> > > on xhci-hcd driver, the issue disappeared. What do you think?
> > > JFYI, I pasted a tentative patch in the end of email [4].
> >
> > Oh, I hadn't even look at why USB uses blk_queue_virt_boundary, and it
> > seems like the usage is wrong, as it doesn't follow the same rules as
> > all the others.  I think your patch goes in the right direction,
> > but instead of comparing a hcd name it needs to be keyed of a flag
> > set by the driver (I suspect there is one indicating native SG support,
> > but I can't quickly find it), and we need an alternative solution
> > for drivers that don't see like vhci.  I suspect just limiting the
> > entire transfer size to something that works for a single packet
> > for them would be fine.
> 
> Christoph:
> 
> In most of the different kinds of USB host controllers, the hardware is
> not capable of assembling a packet out of multiple buffers at arbitrary
> addresses.  As a matter of fact, xHCI is the only kind that _can_ do
> this.
> 
> In some cases, the hardware can assemble packets provided each buffer
> other than the last ends at a page boundary and each buffer other than
> the first starts at a page boundary (Intel would say the buffers are
> "virtually contiguous"), but this is a rather complex rule and we don't
> want to rely on it.  Plus, in other cases the hardware _can't_ do this.
> 
> Instead, we want the SG buffers to be set up so that each one (except
> the last) is an exact multiple of the maximum packet size.  That way,
> each packet can be assembled from the contents of a single buffer and
> there's no problem.

There is out of this topic though, if we prepare such an exact multiple
of the maximum packet size (1024, 512 or 64), is it possible to cause
trouble on IOMMU environment? IIUC, dma_map_sg() maps SG buffers as
a single segment and then the segment buffer is not contiguous.

> The maximum packet size depends on the type of USB connection.
> Typical values are 1024, 512, or 64.  It's always a power of two and
> it's smaller than 4096.  Therefore we simplify the problem even further
> by requiring that each SG buffer in a scatterlist (except the last one)
> be a multiple of the page size.  (It doesn't need to be aligned on a
> page boundary, as far as I remember.)
> 
> That's why the blk_queue_virt_boundary usage was added to the USB code.
> Perhaps it's not the right way of doing this; I'm not an expert on the
> inner workings of the block layer.  If you can suggest a better way to
> express our requirement, that would be great.

Since I'm also not familiar with the block layer, I could not find a better
way...

Best regards,
Yoshihiro Shimoda

> Alan Stern
> 
> PS: There _is_ a flag saying whether an HCD supports SG.  But what it
> means is that the driver can handle an SG list that meets the
> requirement above; it doesn't mean that the driver can reassemble the
> data from an SG list into a series of bounce buffers in order to meet
> the requirement.  We very much want not to do that, especially since
> the block layer should already be capable of doing it for us.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to resolve an issue in swiotlb environment?

2019-06-11 Thread Christoph Hellwig
Hi Alan,

thanks for the explanation.  It seems like what usb wants is to:

 - set sg_tablesize to 1 for devices that can't handle scatterlist at all
 - set the virt boundary as-is for devices supporting "basic" scatterlist,
   although that still assumes they can rejiggle them because for example
   you could still get a smaller than expected first segment ala (assuming
   a 1024 byte packet size and thus 1023 virt_boundary_mask):

| 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |

   as the virt_bondary does not guarantee that the first segment is
   the same size as all the mid segments.
 - do not set any limit on xhci

But that just goes back to the original problem, and that is that with
swiotlb we are limited in the total dma mapping size, and recent block
layer changes in the way we handle the virt_boundary mean we now build
much larger requests by default.  For SCSI ULDs to take that into
account I need to call dma_max_mapping_size() and use that as the
upper bound for the request size.  My plan is to do that in scsi_lib.c,
but for that we need to expose the actual struct device that the dma
mapping is perfomed on to the scsi layer.  If that device is different
from the sysfs hierchary struct device, which it is for usb the ULDD
needs to scsi_add_host_with_dma and pass the dma device as well.  How
do I get at the dma device (aka the HCDs pci_dev or similar) from
usb-storage/uas?


Re: How to resolve an issue in swiotlb environment?

2019-06-10 Thread Alan Stern
On Mon, 10 Jun 2019, Christoph Hellwig wrote:

> Hi Yoshihiro,
> 
> sorry for not taking care of this earlier, today is a public holiday
> here and thus I'm not working much over the long weekend.
> 
> On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote:
> > I have another way to avoid the issue. But it doesn't seem that a good way 
> > though...
> > According to the commit that adding blk_queue_virt_boundary() [3],
> > this is needed for vhci_hcd as a workaround so that if we avoid to call it
> > on xhci-hcd driver, the issue disappeared. What do you think?
> > JFYI, I pasted a tentative patch in the end of email [4].
> 
> Oh, I hadn't even look at why USB uses blk_queue_virt_boundary, and it
> seems like the usage is wrong, as it doesn't follow the same rules as
> all the others.  I think your patch goes in the right direction,
> but instead of comparing a hcd name it needs to be keyed of a flag
> set by the driver (I suspect there is one indicating native SG support,
> but I can't quickly find it), and we need an alternative solution
> for drivers that don't see like vhci.  I suspect just limiting the
> entire transfer size to something that works for a single packet
> for them would be fine.

Christoph:

In most of the different kinds of USB host controllers, the hardware is
not capable of assembling a packet out of multiple buffers at arbitrary
addresses.  As a matter of fact, xHCI is the only kind that _can_ do 
this.

In some cases, the hardware can assemble packets provided each buffer
other than the last ends at a page boundary and each buffer other than
the first starts at a page boundary (Intel would say the buffers are
"virtually contiguous"), but this is a rather complex rule and we don't
want to rely on it.  Plus, in other cases the hardware _can't_ do this.

Instead, we want the SG buffers to be set up so that each one (except 
the last) is an exact multiple of the maximum packet size.  That way, 
each packet can be assembled from the contents of a single buffer and 
there's no problem.

The maximum packet size depends on the type of USB connection.  
Typical values are 1024, 512, or 64.  It's always a power of two and
it's smaller than 4096.  Therefore we simplify the problem even further
by requiring that each SG buffer in a scatterlist (except the last one)
be a multiple of the page size.  (It doesn't need to be aligned on a 
page boundary, as far as I remember.)

That's why the blk_queue_virt_boundary usage was added to the USB code.  
Perhaps it's not the right way of doing this; I'm not an expert on the
inner workings of the block layer.  If you can suggest a better way to
express our requirement, that would be great.

Alan Stern

PS: There _is_ a flag saying whether an HCD supports SG.  But what it
means is that the driver can handle an SG list that meets the
requirement above; it doesn't mean that the driver can reassemble the
data from an SG list into a series of bounce buffers in order to meet
the requirement.  We very much want not to do that, especially since
the block layer should already be capable of doing it for us.



Re: How to resolve an issue in swiotlb environment?

2019-06-10 Thread Christoph Hellwig
Hi Yoshihiro,

sorry for not taking care of this earlier, today is a public holiday
here and thus I'm not working much over the long weekend.

On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote:
> I have another way to avoid the issue. But it doesn't seem that a good way 
> though...
> According to the commit that adding blk_queue_virt_boundary() [3],
> this is needed for vhci_hcd as a workaround so that if we avoid to call it
> on xhci-hcd driver, the issue disappeared. What do you think?
> JFYI, I pasted a tentative patch in the end of email [4].

Oh, I hadn't even look at why USB uses blk_queue_virt_boundary, and it
seems like the usage is wrong, as it doesn't follow the same rules as
all the others.  I think your patch goes in the right direction,
but instead of comparing a hcd name it needs to be keyed of a flag
set by the driver (I suspect there is one indicating native SG support,
but I can't quickly find it), and we need an alternative solution
for drivers that don't see like vhci.  I suspect just limiting the
entire transfer size to something that works for a single packet
for them would be fine.


RE: How to resolve an issue in swiotlb environment?

2019-06-10 Thread Yoshihiro Shimoda
Hi Christoph, Alan,
(add linux-usb ML on CC.)

> From: Yoshihiro Shimoda, Sent: Friday, June 7, 2019 9:00 PM
> 
> Hi Christoph,
> 
> I think we should continue to discuss on this email thread instead of the 
> fixed DMA-API.txt patch [1]
> 
> [1]
> https://marc.info/?t=15598941221=1=2
> 
> > From: Yoshihiro Shimoda, Sent: Monday, June 3, 2019 3:42 PM
> >
> > Hi linux-block and iommu mailing lists,
> >
> > I have an issue that a USB SSD with xHCI on R-Car H3 causes "swiotlb is 
> > full" like below.
> >
> > [   36.745286] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 
> > 524288 bytes), total 32768 (slots), used 1338
> (slots)
> >
> > I have investigated this issue by using git bisect, and then I found the 
> > following commit:
> >
> > ---
> > commit 09324d32d2a0843e66652a087da6f77924358e62
> > Author: Christoph Hellwig 
> > Date:   Tue May 21 09:01:41 2019 +0200
> >
> > block: force an unlimited segment size on queues with a virt boundary
> > ---
> 
> Thank you for your comment on other email thread [2] like below:
> ---
> Turns out it isn't as simple as I thought, as there doesn't seem to
> be an easy way to get to the struct device used for DMA mapping
> from USB drivers.  I'll need to think a bit more how to handle that
> best.
> ---
> 
> [2]
> https://marc.info/?l=linux-doc=155989651620473=2

I have another way to avoid the issue. But it doesn't seem that a good way 
though...
According to the commit that adding blk_queue_virt_boundary() [3],
this is needed for vhci_hcd as a workaround so that if we avoid to call it
on xhci-hcd driver, the issue disappeared. What do you think?
JFYI, I pasted a tentative patch in the end of email [4].

---
[3]
commit 747668dbc061b3e62bc1982767a3a1f9815fcf0e
Author: Alan Stern 
Date:   Mon Apr 15 13:19:25 2019 -0400

usb-storage: Set virt_boundary_mask to avoid SG overflows
---
[4]
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 59190d8..277c6f7e 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -30,6 +30,8 @@
 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -65,6 +67,7 @@ static const char* host_info(struct Scsi_Host *host)
 static int slave_alloc (struct scsi_device *sdev)
 {
struct us_data *us = host_to_us(sdev->host);
+   struct usb_hcd *hcd = bus_to_hcd(us->pusb_dev->bus);
int maxp;
 
/*
@@ -80,8 +83,10 @@ static int slave_alloc (struct scsi_device *sdev)
 * Bulk maxpacket value.  Fortunately this value is always a
 * power of 2.  Inform the block layer about this requirement.
 */
-   maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0);
-   blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
+   if (!strcmp(hcd->driver->description, "vhci_hcd")) {
+   maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0);
+   blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
+   }
 
/*
 * Some host controllers may have alignment requirements.
---
Best regards,
Yoshihiro Shimoda



RE: How to resolve an issue in swiotlb environment?

2019-06-10 Thread Biju Das
Hi All,

Any update on the below issue. I am seeing similar issue on RZ/G2M board with 
Linux version 5.2.0-rc3.

root@hihope-rz-g2m:~# [   35.414177] usb 2-1: new SuperSpeed Gen 1 USB device 
number 2 using xhci-hcd
[   35.449402] usb-storage 2-1:1.0: USB Mass Storage device detected
[   35.455915] scsi host0: usb-storage 2-1:1.0
[   36.482585] scsi 0:0:0:0: Direct-Access SanDisk  Extreme  0001 
PQ: 0 ANSI: 6
[   36.491260] sd 0:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 
GB/59.6 GiB)
[   36.499823] sd 0:0:0:0: [sda] Write Protect is off
[   36.505474] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, 
doesn't support DPO or FUA
[   36.518074]  sda: sda1
[   36.523163] sd 0:0:0:0: [sda] Attached SCSI disk

root@hihope-rz-g2m:~# mkdir -p /tmp/rmnt/sda1
root@hihope-rz-g2m:~# mount -t auto /dev/sda1 /tmp/rmnt/sda1
root@hihope-rz-g2m:~# dd if=/dev/urandom of=/tmp/sda1-random bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.187696 s, 55.9 MB/s
root@hihope-rz-g2m:~# cp /tmp/sda1-random /tmp/rmnt/sda1/sda1-random
root@hihope-rz-g2m:~# [  218.861212] xhci-hcd ee00.usb: swiotlb buffer is 
full (sz: 1003520 bytes), total 32768 (slots), used 1088 (slots)
[  218.871885] xhci-hcd ee00.usb: overflow 0x00067430b000+1003520 of 
DMA mask  bus mask 0
[  218.881233] WARNING: CPU: 0 PID: 258 at kernel/dma/direct.c:43 
report_addr+0x38/0xa8
[  218.888974] Modules linked in: renesas_usb3 usb_dmac phy_rcar_gen3_usb3
[  218.895594] CPU: 0 PID: 258 Comm: usb-storage Not tainted 
5.2.0-rc3-00017-gc80b083-dirty #5
[  218.903940] Hardware name: HopeRun HiHope RZ/G2M with sub board (DT)
[  218.910291] pstate: 4005 (nZcv daif -PAN -UAO)
[  218.915078] pc : report_addr+0x38/0xa8
[  218.918821] lr : report_addr+0xa0/0xa8
[  218.922564] sp : 125fb970
[  218.925872] x29: 125fb970 x28: 
[  218.931180] x27:  x26: 1f020280
[  218.936487] x25: 8006394a82a8 x24: 
[  218.941794] x23: 0001 x22: 
[  218.947101] x21: 000f5000 x20: 11309000
[  218.952408] x19: 80063a600010 x18: 
[  218.957714] x17:  x16: 
[  218.963023] x15: 113096c8 x14: 4d4420666f203032
[  218.968331] x13: 35333030312b3030 x12: 3062303334373630
[  218.973638] x11: 3030303030307830 x10: 11309f20
[  218.978945] x9 : 112e3018 x8 : 0123
[  218.984252] x7 : 0005 x6 : 80063b578180
[  218.989559] x5 : 80063b578180 x4 : 
[  218.994865] x3 : 80063b57ef10 x2 : eed25f279b69f300
[  219.000172] x1 : eed25f279b69f300 x0 : 
[  219.005481] Call trace:
[  219.007923]  report_addr+0x38/0xa8
[  219.011321]  dma_direct_map_page+0x148/0x158
[  219.015586]  dma_direct_map_sg+0x78/0xe0
[  219.019510]  usb_hcd_map_urb_for_dma+0x2fc/0x468
[  219.024124]  xhci_map_urb_for_dma+0x54/0x68
[  219.028303]  usb_hcd_submit_urb+0x88/0x968
[  219.032394]  usb_submit_urb+0x3b0/0x570
[  219.036226]  usb_sg_wait+0x98/0x158
[  219.039711]  usb_stor_bulk_transfer_sglist.part.3+0x94/0x128
[  219.045366]  usb_stor_bulk_srb+0x48/0x88
[  219.049283]  usb_stor_Bulk_transport+0x10c/0x390
[  219.053896]  usb_stor_invoke_transport+0x3c/0x500
[  219.058595]  usb_stor_transparent_scsi_command+0xc/0x18
[  219.063816]  usb_stor_control_thread+0x1c4/0x260
[  219.068431]  kthread+0x124/0x128
[  219.071660]  ret_from_fork+0x10/0x18
[  219.075229] ---[ end trace dd9ef2a6b7fef860 ]---
[  219.080087] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.090810] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.101510] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.112209] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.122901] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.133591] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.144283] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.154973] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  219.165674] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  223.861717] swiotlb_tbl_map_single: 67451 callbacks suppressed
[  223.861721] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  223.878249] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 1003520 
bytes), total 32768 (slots), used 1088 (slots)
[  

RE: How to resolve an issue in swiotlb environment?

2019-06-07 Thread Yoshihiro Shimoda
Hi Christoph,

I think we should continue to discuss on this email thread instead of the fixed 
DMA-API.txt patch [1]

[1]
https://marc.info/?t=15598941221=1=2

> From: Yoshihiro Shimoda, Sent: Monday, June 3, 2019 3:42 PM
> 
> Hi linux-block and iommu mailing lists,
> 
> I have an issue that a USB SSD with xHCI on R-Car H3 causes "swiotlb is full" 
> like below.
> 
> [   36.745286] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 524288 
> bytes), total 32768 (slots), used 1338 (slots)
> 
> I have investigated this issue by using git bisect, and then I found the 
> following commit:
> 
> ---
> commit 09324d32d2a0843e66652a087da6f77924358e62
> Author: Christoph Hellwig 
> Date:   Tue May 21 09:01:41 2019 +0200
> 
> block: force an unlimited segment size on queues with a virt boundary
> ---

Thank you for your comment on other email thread [2] like below:
---
Turns out it isn't as simple as I thought, as there doesn't seem to
be an easy way to get to the struct device used for DMA mapping
from USB drivers.  I'll need to think a bit more how to handle that
best.
---

[2]
https://marc.info/?l=linux-doc=155989651620473=2

I'm not sure this is a correct way, but the issue disappears if I applied a 
patch below
to USB storage driver. Especially, WARNING happened on 
blk_queue_max_segment_size().
Maybe we need to expand the argument "struct device *" of 
blk_queue_virt_boundary() to
call dma_max_mapping_size()?
---
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 59190d8..fa37b39 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -28,6 +28,7 @@
  * status of a command.
  */
 
+#include 
 #include 
 #include 
 
@@ -83,6 +84,15 @@ static int slave_alloc (struct scsi_device *sdev)
maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0);
blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
 
+{
+   struct device *dev = us->pusb_dev->bus->controller;
+
+   dev_info(dev, "%s: size = %zu\n", __func__, dma_max_mapping_size(dev));
+   blk_queue_max_segment_size(sdev->request_queue,
+  dma_max_mapping_size(dev));
+}
+
+
/*
 * Some host controllers may have alignment requirements.
 * We'll play it safe by requiring 512-byte alignment always.
---

Best regards,
Yoshihiro Shimoda



How to resolve an issue in swiotlb environment?

2019-06-03 Thread Yoshihiro Shimoda
Hi linux-block and iommu mailing lists,

I have an issue that a USB SSD with xHCI on R-Car H3 causes "swiotlb is full" 
like below.

[   36.745286] xhci-hcd ee00.usb: swiotlb buffer is full (sz: 524288 
bytes), total 32768 (slots), used 1338 (slots)

I have investigated this issue by using git bisect, and then I found the 
following commit:

---
commit 09324d32d2a0843e66652a087da6f77924358e62
Author: Christoph Hellwig 
Date:   Tue May 21 09:01:41 2019 +0200

block: force an unlimited segment size on queues with a virt boundary
---

After the patch above was applied, the max_segment_size of USB SSD is set to 
UINT_MAX so that
it might be possible to use larger memory size than swiotlb's memory limit that 
is up to 256 KiB
as I investigated before [1]. Note that if the patch was not applied, the 
max_segment_size is set
to 65536.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/mmc/host/tmio_mmc_core.c?h=v5.2-rc3=e90e8da72ad694a16a4ffa6e5adae3610208f73b

How to resolve this issue? JFYI, I pasted a git bisect log and a kernel log as 
following:

--- git bisect log ---
git bisect start
# bad: [cd6c84d8f0cdc911df435bb075ba22ce3c605b07] Linux 5.2-rc2
git bisect bad cd6c84d8f0cdc911df435bb075ba22ce3c605b07
# good: [e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd] Linux 5.1
git bisect good e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd
# good: [f4d9a23d3dad0252f375901bf4ff6523a2c97241] sparc64: simplify 
reduce_memory() function
git bisect good f4d9a23d3dad0252f375901bf4ff6523a2c97241
# good: [4dbf09fea60d158e60a30c419e0cfa1ea138dd57] Merge tag 'mtd/for-5.2' of 
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mtd/linux
git bisect good 4dbf09fea60d158e60a30c419e0cfa1ea138dd57
# good: [22c58fd70ca48a29505922b1563826593b08cc00] Merge tag 'armsoc-soc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 22c58fd70ca48a29505922b1563826593b08cc00
# good: [4c7b63a32d54850a31a00f22131db417face70e4] Merge tag 
'linux-kselftest-5.2-rc1-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
git bisect good 4c7b63a32d54850a31a00f22131db417face70e4
# good: [d8848eefc1d541dd0e3ae175e09fb5c66f1d4de6] Merge tag 
'5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
git bisect good d8848eefc1d541dd0e3ae175e09fb5c66f1d4de6
# good: [9e567af4f0fc3bfa77462c87246ceb82896cdebc] treewide: Replace GPLv2 
boilerplate/reference with SPDX - rule 90
git bisect good 9e567af4f0fc3bfa77462c87246ceb82896cdebc
# good: [51816e9e113934281b44f1a352852ef7631e75ea] locking/lock_events: Use 
this_cpu_add() when necessary
git bisect good 51816e9e113934281b44f1a352852ef7631e75ea
# bad: [7fbc78e3155a0c464bd832efc07fb3c2355fe9bd] Merge tag 
'for-linus-20190524' of git://git.kernel.dk/linux-block
git bisect bad 7fbc78e3155a0c464bd832efc07fb3c2355fe9bd
# good: [e7bd3e248bc36451fdbf2a2e3a3c5a23cd0b1f6f] Merge tag 
'devicetree-fixes-for-5.2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect good e7bd3e248bc36451fdbf2a2e3a3c5a23cd0b1f6f
# good: [cb9e0e5006064a807b5d722c7e3c42f307193792] nvme-pci: use blk-mq mapping 
for unmanaged irqs
git bisect good cb9e0e5006064a807b5d722c7e3c42f307193792
# bad: [7996a8b5511a72465b0b286763c2d8f412b8874a] blk-mq: fix hang caused by 
freeze/unfreeze sequence
git bisect bad 7996a8b5511a72465b0b286763c2d8f412b8874a
# good: [eded341c085bebdd653f8086c02179098cb81748] block: don't decrement 
nr_phys_segments for physically contigous segments
git bisect good eded341c085bebdd653f8086c02179098cb81748
# bad: [200a9aff7b02feea30b01141b0df9bc19457a232] block: remove the segment 
size check in bio_will_gap
git bisect bad 200a9aff7b02feea30b01141b0df9bc19457a232
# bad: [09324d32d2a0843e66652a087da6f77924358e62] block: force an unlimited 
segment size on queues with a virt boundary
git bisect bad 09324d32d2a0843e66652a087da6f77924358e62
# first bad commit: [09324d32d2a0843e66652a087da6f77924358e62] block: force an 
unlimited segment size on queues with a virt boundary


--- kernel log ---
[0.00] Booting Linux on physical CPU 0x00 [0x411fd073]
[0.00] Linux version 5.1.0-12510-g09324d3 (shimoda@shimoda-RB02198) 
(gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 
56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #17 SMP 
PREEMPT Mon Jun 3 14:56:35 JST 2019
[0.00] Machine model: Renesas Salvator-X board based on r8a7795 ES2.0+
[0.00] printk: debug: ignoring loglevel setting.
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 32 MiB at 0x7e00
[0.00] NUMA: No NUMA configuration found
[0.00] NUMA: Faking a node at [mem 
0x4800-0x00073fff]
[0.00] NUMA: NODE_DATA [mem 0x73f7de840-0x73f7d]
[0.00] Zone ranges:
[0.00]   DMA32[mem 0x4800-0x]
[0.00]   Normal   [mem 0x0001-0x00073fff]
[