Re: [PATCH 5/6] xhci: Use device_iommu_mapped()

2018-12-17 Thread Mathias Nyman

On 11.12.2018 15:43, Joerg Roedel wrote:

From: Joerg Roedel 

Replace the dev->iommu_group check with a proper function
call that better reprensents its purpose.

Cc: Mathias Nyman 
Acked-by: Robin Murphy 
Signed-off-by: Joerg Roedel 
---
  drivers/usb/host/xhci.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index dae3be1b9c8f..8eacd2ed412b 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -244,7 +244,7 @@ static void xhci_zero_64b_regs(struct xhci_hcd *xhci)
 * an iommu. Doing anything when there is no iommu is definitely
 * unsafe...
 */
-   if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !dev->iommu_group)
+   if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
return;
  
  	xhci_info(xhci, "Zeroing 64bit base registers, expecting fault\n");




Acked-by: Mathias Nyman 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-20 Thread Mathias Nyman

On 19.07.2018 17:57, Alan Stern wrote:

On Thu, 19 Jul 2018, Mathias Nyman wrote:


xhci driver will set up all the endpoints for the new altsetting already in
usb_hcd_alloc_bandwidth().

New endpoints will be ready and rings running after this. I don't know the exact
history behind this, but I assume it is because xhci does all of the steps to
drop/add, disable/enable endpoints and check bandwidth in a single configure
endpoint command, that will return errors if there is not enough bandwidth.


That's right; Sarah and I spent some time going over this while she was
working on it.  But it looks like the approach isn't adequate.


This command is issued in hcd->driver->check_bandwidth()
This means that xhci doesn't really do much in hcd->driver->endpoint_disable or
hcd->driver->endpoint_enable

It also means that xhci driver assumes rings are empty when
hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
If there are URBs left on a endpoint ring that was dropped+added
(freed+reallocated) then those URBs will contain pointers to freed ring,
causing issues when usb_hcd_flush_endpoint() cancels those URBs.

usb_set_interface()
usb_hcd_alloc_bandwidth()
  hcd->driver->drop_endpoint()
  hcd->driver->add_endpoint() // allocates new rings
  hcd->driver->check_bandwidth() // issues configure endpoint command, free 
rings.
usb_disable_interface(iface, true)
  usb_disable_endpoint()
usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
usb_hcd_disable_endpoint()
  hcd->driver->endpoint_disable()  // xhci does nothing
usb_enable_interface(iface, true)
  usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci 
side.

As first aid I could try to implement checks that make sure the flushed URBs
trb pointers really are on the current endpoint ring, and also add some warning
if we are we are dropping endpoints with URBs still queued.

But we need to fix this properly as well.
xhci needs to be more in sync with usb core in usb_set_interface(), currently 
xhci
has the altssetting up and running when usb core hasn't event started flushing 
endpoints.


Absolutely.  The core tries to be compatible with host controller
drivers that either allocate bandwidth as it is requested or else
allocate bandwidth all at once when an altsetting is installed.

xhci-hcd falls into the second category.  However, this approach
requires the bandwidth verification for the new altsetting to be
performed before the old altsetting has been disabled, and the xHCI
hardware can't do this.

We may need to change the core so that the old endpoints are disabled
before the bandwidth check is done, instead of after.  Of course, this
leads to an awkward situation if the check fails -- we'd probably have
to go back and re-install the old altsetting.


That would help xhci a lot.

If we want to avoid the awkward altsetting re-install after bandwidth failure
then adding a extra endpoint flush before checking the bandwidth would already 
help a lot.

The endpoint disabling can then be remain after bandwidth checking.
Does that work for other host controllers?

-Mathias
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-07-20 Thread Mathias Nyman

On 19.07.2018 20:32, Sudip Mukherjee wrote:

Hi Mathias,

On Thu, Jul 19, 2018 at 06:42:19PM +0300, Mathias Nyman wrote:

As first aid I could try to implement checks that make sure the flushed URBs
trb pointers really are on the current endpoint ring, and also add some warning
if we are we are dropping endpoints with URBs still queued.


Yes, please. I think your first-aid will be a much better option than
the hacky patch I am using atm.



Attached a patch that checks canceled URB td/trb pointers.
I haven't tested it at all (well compiles and boots, but new code never 
exercised)

Does it work for you?


No, not exactly. :(

I can see your message getting printed.
[  249.518394] xhci_hcd :00:14.0: Canceled URB td not found on endpoint ring
[  249.518431] xhci_hcd :00:14.0: Canceled URB td not found on endpoint ring

But I can see the message from slub debug again:

[  348.279986] 
=
[  348.279993] BUG kmalloc-96 (Tainted: G U O   ): Poison overwritten
[  348.279995] 
-

[  348.279997] Disabling lock debugging due to kernel taint
[  348.28] INFO: 0xe5acda60-0xe5acda67. First byte 0x60 instead of 0x6b
[  348.280012] INFO: Allocated in xhci_ring_alloc.constprop.14+0x31/0x125 
[xhci_hcd] age=129264 cpu=0 pid=33

...

[  348.280095] INFO: Freed in xhci_ring_free+0xa7/0xc6 [xhci_hcd] age=98722 
cpu=0 pid=33

...

[  348.280158] INFO: Slab 0xf46e0fe0 objects=29 used=29 fp=0x  (null) 
flags=0x40008100
[  348.280160] INFO: Object 0xe5acda48 @offset=6728 fp=0xe5acd700

[  348.280164] Redzone e5acda40: bb bb bb bb bb bb bb bb
  
[  348.280167] Object e5acda48: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
 
[  348.280169] Object e5acda58: 6b 6b 6b 6b 6b 6b 6b 6b 60 da ac e5 60 da ac e5 
 `...`...


So poison is overwritten at e5acda58 with almost its own address, (reading 
backwards) e5 ac da 60, twice.
looks like something (32bit?)is pointing to itself twice, maybe a linked list 
node next and prev pointer
being set to point to itself as last item was removed from list.

The cancelled_td_list is part of struct xhci_virt_ep, so that should be fine.
But td_list is part of struct xhci_ring, which was freed. and we removed the 
URBs tds from the td_list when
flushing the ring after ring was freed

I changed the patch (attached) to make sure it doesn't touch the td_list when 
canceling a URB after
ring is freed.

How about this one, any improvements?

-Mathias  

 
>From ee48d9f9c2d82058489dcdc38faa34a3cbdb08d1 Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Thu, 19 Jul 2018 18:06:18 +0300
Subject: [PATCH v2] xhci: when dequeing a URB make sure it exists on the
 current endpoint ring.

If the endpoint ring has been reallocated since the URB was enqueued,
then URB may contain TD and TRB pointers to a already freed ring.
If this the case then manuallt return the URB without touching any of the
freed ring structure data.

Don't try to stop the ring. It would be useless.

This can happened if endpoint is not flushed before it is dropped and
re-added, which is the case in usb_set_interface() as xhci does
things in an odd order.

Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 711da33..7093341 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -37,6 +37,21 @@ static unsigned int quirks;
 module_param(quirks, uint, S_IRUGO);
 MODULE_PARM_DESC(quirks, "Bit flags for quirks to be enabled as default");
 
+static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
+{
+	struct xhci_segment *seg = ring->first_seg;
+
+	if (!td || !td->start_seg)
+		return false;
+	do {
+		if (seg == td->start_seg)
+			return true;
+		seg = seg->next;
+	} while (seg && seg != ring->first_seg);
+
+	return false;
+}
+
 /* TODO: copied from ehci-hcd.c - can this be refactored? */
 /*
  * xhci_handshake - spin reading hc until handshake completes or fails
@@ -1467,6 +1482,21 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 		goto done;
 	}
 
+	/*
+	 * check ring is not re-allocated since URB was enqueued. If it is, then
+	 * make sure none of the ring related pointers in this URB private data
+	 * are touched, such as td_list, otherwise we overwrite freed data
+	 */
+	if (!td_on_ring(_priv->td[0], ep_ring)) {
+		xhci_err(xhci, "Canceled URB td not found on endpoint ring");
+		for (i = urb_priv->num_tds_done; i < urb_priv->num_tds; i++) {
+			td = _priv->td[i];
+			if (!list_empty(>cancelled_td_list))
+list_del_init(>cancelled_td_list);
+		}
+		goto err_giveback;
+	}
+
 	if (xhci->xhc_state & XHCI_STATE_HALTED) {
 		xhci_dbg_t

Re: usb HC busted?

2018-07-19 Thread Mathias Nyman

As first aid I could try to implement checks that make sure the flushed URBs
trb pointers really are on the current endpoint ring, and also add some warning
if we are we are dropping endpoints with URBs still queued.


Yes, please. I think your first-aid will be a much better option than
the hacky patch I am using atm.



Attached a patch that checks canceled URB td/trb pointers.
I haven't tested it at all (well compiles and boots, but new code never 
exercised)

Does it work for you?



But we need to fix this properly as well.
xhci needs to be more in sync with usb core in usb_set_interface(), currently 
xhci
has the altssetting up and running when usb core hasn't event started flushing 
endpoints.


I am able to reproduce this on almost all cycles, so I can always test
the fix for you after you are fully back from your holiday.


Nice, thanks

-Mathias
>From a7d4af3129a91811c95ea642f6c916b1c1ca6d46 Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Thu, 19 Jul 2018 18:06:18 +0300
Subject: [PATCH] xhci: when dequeing a URB make sure it exists on the current
 endpoint ring.

If the endpoint ring has been reallocated since the URB was enqueued,
then URB may contain TD and TRB pointers to a already freed ring.
If this the case then manuallt return the URB, and don't try to stop
the ring. It would be useless.

This can happened if endpoint is not flushed before it is dropped and
re-added, which is the case in usb_set_interface() as xhci does
things in an odd order.

Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci.c | 43 ---
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 711da33..5bedab7 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -37,6 +37,21 @@ static unsigned int quirks;
 module_param(quirks, uint, S_IRUGO);
 MODULE_PARM_DESC(quirks, "Bit flags for quirks to be enabled as default");
 
+static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
+{
+	struct xhci_segment *seg = ring->first_seg;
+
+	if (!td || !td->start_seg)
+		return false;
+	do {
+		if (seg == td->start_seg)
+			return true;
+		seg = seg->next;
+	} while (seg && seg != ring->first_seg);
+
+	return false;
+}
+
 /* TODO: copied from ehci-hcd.c - can this be refactored? */
 /*
  * xhci_handshake - spin reading hc until handshake completes or fails
@@ -1467,19 +1482,16 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 		goto done;
 	}
 
+	/* check ring is not re-allocated since URB was enqueued */
+	if (!td_on_ring(_priv->td[0], ep_ring)) {
+		xhci_err(xhci, "Canceled URB td not found on endpoint ring");
+		goto err_unlink_giveback;
+	}
+
 	if (xhci->xhc_state & XHCI_STATE_HALTED) {
 		xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
-"HC halted, freeing TD manually.");
-		for (i = urb_priv->num_tds_done;
-		 i < urb_priv->num_tds;
-		 i++) {
-			td = _priv->td[i];
-			if (!list_empty(>td_list))
-list_del_init(>td_list);
-			if (!list_empty(>cancelled_td_list))
-list_del_init(>cancelled_td_list);
-		}
-		goto err_giveback;
+			   "HC halted, freeing TD manually.");
+		goto err_unlink_giveback;
 	}
 
 	i = urb_priv->num_tds_done;
@@ -1519,6 +1531,15 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 	spin_unlock_irqrestore(>lock, flags);
 	return ret;
 
+err_unlink_giveback:
+	for (i = urb_priv->num_tds_done; i < urb_priv->num_tds; i++) {
+		td = _priv->td[i];
+		if (!list_empty(>td_list))
+			list_del_init(>td_list);
+		if (!list_empty(>cancelled_td_list))
+			list_del_init(>cancelled_td_list);
+	}
+
 err_giveback:
 	if (urb_priv)
 		xhci_urb_free_priv(urb_priv);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: usb HC busted?

2018-07-19 Thread Mathias Nyman

On 17.07.2018 18:10, Sudip Mukherjee wrote:

Hi Alan, Greg,

On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote:

On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote:

Hi Alan,

On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote:

On Tue, 17 Jul 2018, Sudip Mukherjee wrote:


I did some more debugging. Tested with a KASAN enabled kernel and that
shows the problem. The report is attached.

To my understanding:

btusb_work() is calling usb_set_interface() with alternate = 0. which
again calls usb_hcd_alloc_bandwidth() and that frees the rings by
xhci_free_endpoint_ring().


That doesn't sound like the right thing to do.  The rings shouldn't be
freed until xhci_endpoint_disable() is called.

On the other hand, there doesn't appear to be any
xhci_endpoint_disable() routine, although a comment refers to it.
Maybe this is the real problem?


one of your old mail might help :)

https://www.spinics.net/lists/linux-usb/msg98123.html


Wrote too soon.

Is it the one you are looking for -
usb_disable_endpoint() is in drivers/usb/core/message.c


I think now I understand what the problem is.
usb_set_interface() calls usb_disable_interface() which again calls
usb_disable_endpoint(). This usb_disable_endpoint() gets the pointer
to 'ep', marks it as NULL and sends the pointer to usb_hcd_flush_endpoint().
After flushing the endpoints usb_disable_endpoint() calls
usb_hcd_disable_endpoint() which tries to do:
if (hcd->driver->endpoint_disable)
hcd->driver->endpoint_disable(hcd, ep);
but there is no endpoint_disable() callback in xhci, so the endpoint is
never marked as disabled. So, next time usb_hcd_flush_endpoint() is
called I get this corruption.
And this is exactly where I used to see the problem happening.

And, my hacky patch worked as I prevented it from calling
usb_disable_interface() in this particular case.



Back for a few days, looking at this

xhci driver will set up all the endpoints for the new altsetting already in
usb_hcd_alloc_bandwidth().

New endpoints will be ready and rings running after this. I don't know the exact
history behind this, but I assume it is because xhci does all of the steps to
drop/add, disable/enable endpoints and check bandwidth in a single configure
endpoint command, that will return errors if there is not enough bandwidth.
This command is issued in hcd->driver->check_bandwidth()
This means that xhci doesn't really do much in hcd->driver->endpoint_disable or
hcd->driver->endpoint_enable

It also means that xhci driver assumes rings are empty when
hcd->driver->check_bandwidth is called. It will bluntly free dropped rings.
If there are URBs left on a endpoint ring that was dropped+added
(freed+reallocated) then those URBs will contain pointers to freed ring,
causing issues when usb_hcd_flush_endpoint() cancels those URBs.

usb_set_interface()
  usb_hcd_alloc_bandwidth()
hcd->driver->drop_endpoint()
hcd->driver->add_endpoint() // allocates new rings
hcd->driver->check_bandwidth() // issues configure endpoint command, free 
rings.
  usb_disable_interface(iface, true)
usb_disable_endpoint()
  usb_hcd_flush_endpoint() // will access freed ring if URBs found!!
  usb_hcd_disable_endpoint()
hcd->driver->endpoint_disable()  // xhci does nothing
  usb_enable_interface(iface, true)
usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side.

As first aid I could try to implement checks that make sure the flushed URBs
trb pointers really are on the current endpoint ring, and also add some warning
if we are we are dropping endpoints with URBs still queued.

But we need to fix this properly as well.
xhci needs to be more in sync with usb core in usb_set_interface(), currently 
xhci
has the altssetting up and running when usb core hasn't event started flushing 
endpoints.

-Mathias
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-06-29 Thread Mathias Nyman

On 27.06.2018 14:59, Sudip Mukherjee wrote:

Can you share a bit more details on the platform you are using, and what types 
of test you are running.


It is a board based on "Intel(R) Atom(TM) CPU  E3840  @ 1.91GHz".
The usb device in question is a bluetooth device:

Bus 001 Device 012: ID 8087:07dc Intel Corp.




And the problem that we are seeing is with phone calls via bluetooth.


Does my test above trigger the case? (show "MATTU dmatest match!")


I have kept it for tonight, will see the results tomorrow morning.
And I am using that same device in the usb script to change "authrized".


No, your test did not trigger the error. :(

But, my last night's test (with an added debug to get some extra trace for 
addresses) showed the same error of -
"Looking for event-dma", but looking at the ftrace, I could not see it getting 
same address from dma_pool_zalloc().

Can you please have a look at the dmesg and ftrace at:
https://drive.google.com/open?id=1nMy_qVxOQzcZNYa9bw7az9WiS2MZzdKo



There is however freeing of the same dma address:

<...>-28448 [003]    492.025808: xhci_ring_free: ISOC f1ffb700: enq 
0x2d31bcc0(0x2d31b000) deq 0x2d31b000(0x2d31b000) 
segs 2 stream 0 free_trbs 305 bounce 17 cycle 0
<...>-28448 [003]    492.025818: xhci_ring_mem_detail: MATTU xhci segment free 
seg->dma @ 0x2d31b000
<...>-28448 [003]    492.025823: xhci_ring_mem_detail: MATTU xhci segment free 
seg->dma @ 0x2d31b000
<...>-28448 [003]    492.025826: xhci_ring_free: ISOC f1f9b380: enq 
0x2d31b140(0x2d31b000) deq 0x2d31b000(0x2d31b000) 
segs 2 stream 0 free_trbs 489 bounce 17 cycle 1
<...>-28448 [003]    492.025828: xhci_ring_mem_detail: MATTU xhci segment free 
seg->dma @ 0x2d31b000
<...>-28448 [003]    492.025830: xhci_ring_mem_detail: MATTU xhci segment free 
seg->dma @ 0x2d31b000

I'd guess it's still the same cause, maybe trace is not complete?

-Mathias
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-06-21 Thread Mathias Nyman

On 21.06.2018 03:53, Sudip Mukherjee wrote:

Hi Mathias, Andy,

On Thu, Jun 07, 2018 at 10:40:03AM +0300, Mathias Nyman wrote:

On 06.06.2018 19:45, Sudip Mukherjee wrote:

Hi Andy,

And we meet again. :)

On Wed, Jun 06, 2018 at 06:36:35PM +0300, Andy Shevchenko wrote:

On Wed, 2018-06-06 at 17:12 +0300, Mathias Nyman wrote:

On 04.06.2018 18:28, Sudip Mukherjee wrote:

On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote:





Odd and unlikely, but to me this looks like some issue in allocating
dma memory
from pool using dma_pool_zalloc()


Here's the story:
Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel.
All tracing points to dma_pool_zalloc() returning the same dma address
block on
consecutive calls.

In the failing case dma_pool_zalloc() is called 3 - 6us apart.

<...>-26362 [002]   1186.756739: xhci_ring_mem_detail: MATTU
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756745: xhci_ring_mem_detail: MATTU
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756748: xhci_ring_mem_detail: MATTU
xhci_segment_alloc dma @ 0x2d92b000

dma_pool_zalloc() is called from xhci_segment_alloc() in
drivers/usb/host/xhci-mem.c
see:
https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci-
mem.c#L52

prints above are custom traces added right after dma_pool_zalloc()


For better understanding it would be good to have dma_pool_free() calls
debugged as well.




Sudip has a full (394M unpacked) trace at:
https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx-






But then it gets stuck, for the whole ring2 dma_pool_zalloc() just returns the 
same dma address as the last segment for
ring1:0x2d92b000. Last part of trace snippet is just another ring being freed.


A gentle ping on this. Any idea on what the problem might be and any
possible fix?



I tried to reproduce it by quickly hacking xhci to allocate and free 50 
segments each time
we normally allocate one segment from dmapool.
I let it run for 3 days on a Atom based platform, but could not reproduce it.

xhci testhack can be found here:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git dmapool-test
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=dmapool-test

Tested by just leaving the following running for a few days:

while true; do echo 0 > authorized; sleep 3; echo 1 > authorized; sleep 3; done;
For some usb device (for example: /sys/bus/usb/devices/1-8)

Then grep logs for "MATTU dmatest match! "

Can you share a bit more details on the platform you are using, and what types 
of test you are running.
Does my test above trigger the case? (show "MATTU dmatest match!")

-Mathias

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: usb HC busted?

2018-06-07 Thread Mathias Nyman

On 06.06.2018 19:45, Sudip Mukherjee wrote:

Hi Andy,

And we meet again. :)

On Wed, Jun 06, 2018 at 06:36:35PM +0300, Andy Shevchenko wrote:

On Wed, 2018-06-06 at 17:12 +0300, Mathias Nyman wrote:

On 04.06.2018 18:28, Sudip Mukherjee wrote:

On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote:





Odd and unlikely, but to me this looks like some issue in allocating
dma memory
from pool using dma_pool_zalloc()

Adding people with DMA knowledge to cc, maybe someone knows what is
going on.

Here's the story:
Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel.
All tracing points to dma_pool_zalloc() returning the same dma address
block on
consecutive calls.

In the failing case dma_pool_zalloc() is called 3 - 6us apart.

<...>-26362 [002]   1186.756739: xhci_ring_mem_detail: MATTU
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756745: xhci_ring_mem_detail: MATTU
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756748: xhci_ring_mem_detail: MATTU
xhci_segment_alloc dma @ 0x2d92b000

dma_pool_zalloc() is called from xhci_segment_alloc() in
drivers/usb/host/xhci-mem.c
see:
https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci-
mem.c#L52

prints above are custom traces added right after dma_pool_zalloc()


For better understanding it would be good to have dma_pool_free() calls
debugged as well.


So, I am adding another trace event for dma_pool_free() and continuing
with the test. Is there anything else that I should be adding as debug?



The patch traced both dma_pool_zalloc() and dma_pool_free() calls from xhci,
no need to retry.

Sudip has a full (394M unpacked) trace at:
https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx-

Interesting part is:

<...>-26362 [002]   1186.756728: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d34d000
<...>-26362 [002]   1186.756735: xhci_ring_mem_detail: MATTU xhci segment 
alloc seg->dma @ 0x2d34d000
<...>-26362 [002]   1186.756739: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756740: xhci_ring_mem_detail: MATTU xhci segment 
alloc seg->dma @ 0x2d92b000
<...>-26362 [002]   1186.756743: xhci_ring_alloc: ISOC eefa0580: enq 
0x2d34d000(0x2d34d000) deq 0x2d34d000(0x2d34d000) 
segs 2 stream 0 free_trbs 509 bounce 17 cycle 1
<...>-26362 [002]   1186.756745: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756746: xhci_ring_mem_detail: MATTU xhci segment 
alloc seg->dma @ 0x2d92b000
<...>-26362 [002]   1186.756748: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756751: xhci_ring_mem_detail: MATTU xhci segment 
alloc seg->dma @ 0x2d92b000
<...>-26362 [002]   1186.756752: xhci_ring_alloc: ISOC f19d7c80: enq 
0x2d92b000(0x2d92b000) deq 0x2d92b000(0x2d92b000) 
segs 2 stream 0 free_trbs 509 bounce 17 cycle 1
<...>-26362 [002] d..1  1186.756761: xhci_queue_trb: CMD: Configure Endpoint 
Command: ctx 2ce96000 slot 7 flags d:C
<...>-26362 [002] d..1  1186.756762: xhci_inc_enq: CMD ed930b80: enq 
0x2d93adb0(0x2d93a000) deq 0x2d93ada0(0x2d93a000) 
segs 1 stream 0 free_trbs 253 bounce 0 \
cycle 1
<...>-26362 [002]   1186.757066: xhci_dbg_context_change: Successful 
Endpoint Configure command
<...>-26362 [002]   1186.757072: xhci_ring_free: ISOC eefd9380: enq 
0x2c482000(0x2c482000) deq 0x2c482000(0x2c482000) 
segs 2 stream 0 free_trbs 509 bounce0 cycle 1
<...>-26362 [002]   1186.757075: xhci_ring_mem_detail: MATTU xhci segment free 
seg->dma @ ee2d23c8
<...>-26362 [002]   1186.757078: xhci_ring_mem_detail: MATTU xhci segment free 
seg->dma @ c7a93488
<...>-26362 [002]   1186.757080: xhci_ring_free: ISOC eef0d800: enq 
0x2c50a000(0x2c50a000) deq 0x2c50a000(0x2c50a000) 
segs 2 stream 0 free_trbs 509 bounce0 cycle 1

What is shown is the allocation of two ISOC transfer rings, each ring has 2 
segments (two dma_pool_zalloc() calls per ring)
First ring looks normal, ring1 get dma memory at 0x2d34d000 for first ring 
segment, and dma memory at 0x2d92b000 for second segment.

But then it gets stuck, for the whole ring2 dma_pool_zalloc() just returns the 
same dma address as the last segment for
ring1:0x2d92b000. Last part of trace snippet is just another ring being freed.

Full testpatch looked like this:

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index e5ace89..7d343ad 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -44,10 +44,15 @@ static struct xhci_segme

Re: usb HC busted?

2018-06-06 Thread Mathias Nyman

On 04.06.2018 18:28, Sudip Mukherjee wrote:

On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote:


Log show two rings having the same TRB segment dma address, this will 
completely mess up the transfer:

While allocating rigs the enque pointers for the two rings are the same:

461.859315: xhci_ring_alloc: ISOC efa4e580: enq 
0x33386000(0x33386000) deq 
0x33386000(0x33386000) segs 2 stream 0 ...bs
461.859320: xhci_ring_alloc: ISOC f0ce1f00: enq 
0x33386000(0x33386000) deq 
0x33386000(0x33386000) segs 2 stream 0 ...

So something goes really wrong when allocating or setting up the rings in one 
of these functions:

To verify and rule out dma_pool_zalloc(), could you apply the attached patch 
and reproduce with new logs?


I spoke too soon in my yesterday's mail. We were able to reproduce it
on the automated tests. The log and the trace is at:
https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx-

Will request you to have a look at it.



Odd and unlikely, but to me this looks like some issue in allocating dma memory
from pool using dma_pool_zalloc()

Adding people with DMA knowledge to cc, maybe someone knows what is going on.

Here's the story:
Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel.
All tracing points to dma_pool_zalloc() returning the same dma address block on
consecutive calls.

In the failing case dma_pool_zalloc() is called 3 - 6us apart.

<...>-26362 [002]   1186.756739: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756745: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d92b000
<...>-26362 [002]   1186.756748: xhci_ring_mem_detail: MATTU 
xhci_segment_alloc dma @ 0x2d92b000

dma_pool_zalloc() is called from xhci_segment_alloc() in 
drivers/usb/host/xhci-mem.c
see:
https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci-mem.c#L52

prints above are custom traces added right after dma_pool_zalloc()
@@ -44,10 +44,15 @@ static struct xhci_segment *xhci_segment_alloc(struct 
xhci_hcd *xhci,
return NULL;
}
 
+	xhci_dbg_trace(xhci,  trace_xhci_ring_mem_detail,

+  "MATTU xhci_segment_alloc dma @ %pad", );
+

Any idea what's going on?
dma_pool_alloc() has a comment that it drops >lock if it needs to allocate
a page, can it be related?

Thanks
-Mathias

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu