Re: [PATCH] usb: gadget: u_ether: remove interrupt throttling
On Tue, Nov 01, 2016 at 01:29:59PM +0200, Felipe Balbi wrote: > According to Dave Miller "the networking stack has a > hard requirement that all SKBs which are transmitted > must have their completion signalled in a fininte > amount of time. This is because, until the SKB is > freed by the driver, it holds onto socket, > netfilter, and other subsystem resources." > > In summary, this means that using TX IRQ throttling > for the networking gadgets is, at least, complex and > we should avoid it for the time being. > > Cc: > Reported-by: Ville Syrjälä > Suggested-by: David Miller > Signed-off-by: Felipe Balbi Fixes laggy/hangy g_ether on my BYT FFRD8 tablet, caused by commit 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for LST bit") Tested-by: Ville Syrjälä > --- > drivers/usb/gadget/function/u_ether.c | 8 > 1 file changed, 8 deletions(-) > > diff --git a/drivers/usb/gadget/function/u_ether.c > b/drivers/usb/gadget/function/u_ether.c > index f4a640216913..119a2e5848e8 100644 > --- a/drivers/usb/gadget/function/u_ether.c > +++ b/drivers/usb/gadget/function/u_ether.c > @@ -589,14 +589,6 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb, > > req->length = length; > > - /* throttle high/super speed IRQ rate back slightly */ > - if (gadget_is_dualspeed(dev->gadget)) > - req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH || > -dev->gadget->speed == USB_SPEED_SUPER)) > && > - !list_empty(&dev->tx_reqs)) > - ? ((atomic_read(&dev->tx_qlen) % dev->qmult) != 0) > - : 0; > - > retval = usb_ep_queue(in, req, GFP_ATOMIC); > switch (retval) { > default: > -- > 2.10.1 -- Ville Syrjälä Intel OTC -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Revert "usb: dwc3: gadget: use allocated/queued reqs for LST bit"
On Fri, Oct 28, 2016 at 01:16:13PM +0300, Felipe Balbi wrote: > > Hi, > > Ville Syrjälä writes: > > On Thu, Oct 06, 2016 at 12:08:26PM +0300, Ville Syrjälä wrote: > >> On Thu, Oct 06, 2016 at 10:36:09AM +0300, Felipe Balbi wrote: > >> > > >> > Hi, > >> > > >> > Felipe Balbi writes: > >> > >> > Okay, I have found a regression on dwc3 and another patch follows: > >> > > >> > commit 5e1a2af3e46248c55098cdae643c4141851b703e > >> > Author: Felipe Balbi > >> > Date: Wed Oct 5 14:24:37 2016 +0300 > >> > > >> > usb: dwc3: gadget: properly account queued requests > >> > > >> > Some requests could be accounted for multiple > >> > times. Let's fix that so each and every requests is > >> > accounted for only once. > >> > > >> > Cc: # v4.8 > >> > Fixes: 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs > >> > for LST bit") > >> > Signed-off-by: Felipe Balbi > >> > > >> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > >> > index 07cc8929f271..3c3ced128c77 100644 > >> > --- a/drivers/usb/dwc3/gadget.c > >> > +++ b/drivers/usb/dwc3/gadget.c > >> > @@ -783,6 +783,7 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, > >> > req->trb = trb; > >> > req->trb_dma = dwc3_trb_dma_offset(dep, trb); > >> > req->first_trb_index = dep->trb_enqueue; > >> > +dep->queued_requests++; > >> > } > >> > > >> > dwc3_ep_inc_enq(dep); > >> > @@ -833,8 +834,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, > >> > > >> > trb->ctrl |= DWC3_TRB_CTRL_HWO; > >> > > >> > -dep->queued_requests++; > >> > - > >> > trace_dwc3_prepare_trb(dep, trb); > >> > } > >> > > >> > @@ -1861,8 +1860,11 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 > >> > *dwc, struct dwc3_ep *dep, > >> > unsigned ints_pkt = 0; > >> > unsigned inttrb_status; > >> > > >> > -dep->queued_requests--; > >> > dwc3_ep_inc_deq(dep); > >> > + > >> > +if (req->trb == trb) > >> > +dep->queued_requests--; > >> > + > >> > trace_dwc3_complete_trb(dep, trb); > >> > > >> > /* > >> > > >> > I have also built a branch which you can use for testing. Here's a pull > >> > request, once you tell me it works for you, then I can send proper > >> > patches out: > >> > > >> > The following changes since commit > >> > c8d2bc9bc39ebea8437fd974fdbc21847bb897a3: > >> > > >> > Linux 4.8 (2016-10-02 16:24:33 -0700) > >> > > >> > are available in the git repository at: > >> > > >> > git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git > >> > tmp-test-v4.8 > >> > > >> > for you to fetch changes up to c968b8d1effe64a7980802d1eef29f4e1922faca: > >> > > >> > usb: dwc3: gadget: properly account queued requests (2016-10-06 > >> > 10:16:37 +0300) > >> > > >> > > >> > Felipe Balbi (2): > >> > usb: gadget: function: u_ether: don't starve tx request queue > >> > usb: dwc3: gadget: properly account queued requests > >> > > >> > drivers/usb/dwc3/gadget.c | 7 --- > >> > drivers/usb/gadget/function/u_ether.c | 5 +++-- > >> > 2 files changed, 7 insertions(+), 5 deletions(-) > >> > >> Tried your branch, but unfortunately I'm still seeing the lags. New trace > >> attached. > > > > Just a reminder that the regressions is still there in 4.9-rc2. > > Unfortunateyly with all the stuff already piled on top, getting USB > > working on my device is no longer a simple matter of reverting the > > one commit. I had to revert 10 patches to get even a clean revert, but > > unfortunately that approach just lead to the transfer hanging immediately. &g
Re: [PATCH] Revert "usb: dwc3: gadget: use allocated/queued reqs for LST bit"
On Thu, Oct 06, 2016 at 12:08:26PM +0300, Ville Syrjälä wrote: > On Thu, Oct 06, 2016 at 10:36:09AM +0300, Felipe Balbi wrote: > > > > Hi, > > > > Felipe Balbi writes: > > > Okay, I have found a regression on dwc3 and another patch follows: > > > > commit 5e1a2af3e46248c55098cdae643c4141851b703e > > Author: Felipe Balbi > > Date: Wed Oct 5 14:24:37 2016 +0300 > > > > usb: dwc3: gadget: properly account queued requests > > > > Some requests could be accounted for multiple > > times. Let's fix that so each and every requests is > > accounted for only once. > > > > Cc: # v4.8 > > Fixes: 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for > > LST bit") > > Signed-off-by: Felipe Balbi > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > > index 07cc8929f271..3c3ced128c77 100644 > > --- a/drivers/usb/dwc3/gadget.c > > +++ b/drivers/usb/dwc3/gadget.c > > @@ -783,6 +783,7 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, > > req->trb = trb; > > req->trb_dma = dwc3_trb_dma_offset(dep, trb); > > req->first_trb_index = dep->trb_enqueue; > > + dep->queued_requests++; > > } > > > > dwc3_ep_inc_enq(dep); > > @@ -833,8 +834,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep, > > > > trb->ctrl |= DWC3_TRB_CTRL_HWO; > > > > - dep->queued_requests++; > > - > > trace_dwc3_prepare_trb(dep, trb); > > } > > > > @@ -1861,8 +1860,11 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 > > *dwc, struct dwc3_ep *dep, > > unsigned ints_pkt = 0; > > unsigned inttrb_status; > > > > - dep->queued_requests--; > > dwc3_ep_inc_deq(dep); > > + > > + if (req->trb == trb) > > + dep->queued_requests--; > > + > > trace_dwc3_complete_trb(dep, trb); > > > > /* > > > > I have also built a branch which you can use for testing. Here's a pull > > request, once you tell me it works for you, then I can send proper > > patches out: > > > > The following changes since commit c8d2bc9bc39ebea8437fd974fdbc21847bb897a3: > > > > Linux 4.8 (2016-10-02 16:24:33 -0700) > > > > are available in the git repository at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git tmp-test-v4.8 > > > > for you to fetch changes up to c968b8d1effe64a7980802d1eef29f4e1922faca: > > > > usb: dwc3: gadget: properly account queued requests (2016-10-06 10:16:37 > > +0300) > > > > > > Felipe Balbi (2): > > usb: gadget: function: u_ether: don't starve tx request queue > > usb: dwc3: gadget: properly account queued requests > > > > drivers/usb/dwc3/gadget.c | 7 --- > > drivers/usb/gadget/function/u_ether.c | 5 +++-- > > 2 files changed, 7 insertions(+), 5 deletions(-) > > Tried your branch, but unfortunately I'm still seeing the lags. New trace > attached. Just a reminder that the regressions is still there in 4.9-rc2. Unfortunateyly with all the stuff already piled on top, getting USB working on my device is no longer a simple matter of reverting the one commit. I had to revert 10 patches to get even a clean revert, but unfortunately that approach just lead to the transfer hanging immediately. So what I ended up doing is reverting all of this: git log --no-merges --format=oneline 55a0237f8f47957163125e20ee9260538c5c341c^..HEAD -- drivers/usb/dwc3/ include/linux/ulpi/interface.h drivers/usb/Kconfig drivers/usb/core/Kconfig which comes out at whopping 70 commits. Having to carry that around is going to be quite a pain especially as more stuff might be piled on top. /me thinks a stable backport of any fix (assuming one is found eventually) is going to be quite the challenge... -- Ville Syrjälä Intel OTC -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Revert "usb: dwc3: gadget: use allocated/queued reqs for LST bit"
On Tue, Oct 04, 2016 at 03:23:12PM +0300, Felipe Balbi wrote: > > Hi, > > Ville Syrjälä writes: > > On Mon, Oct 03, 2016 at 07:08:43PM +0300, Felipe Balbi wrote: > >> > >> Hi, > >> > >> ville.syrj...@linux.intel.com writes: > >> > From: Ville Syrjälä > >> > > >> > This reverts commit 55a0237f8f47957163125e20ee9260538c5c341c. > >> > > >> > commit 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for > >> > LST bit") causes my BYT FFRD8 with g_ether to behave poorly. ssh/scp > >> > is very sluggish and can even stall entirely. Revert cures it. > >> > > >> > Cc: Felipe Balbi > >> > Cc: sta...@vger.kernel.org > >> > Signed-off-by: Ville Syrjälä > >> > >> Reverting that commit is not the best idea. Let's start with the usual: > >> > >> - Kernel version > >> - dmesg on both sides (host and device) > >> - dwc3 tracepoints: > >> > >> # mkdir -p /t > >> # mount -t tracefs none /t > >> # echo 8192 > /t/buffer_size_kb > >> # echo 1 > /t/events/dwc3/enable > >> # echo 0 > /t/events/dwc3/dwc3_readl/enable > >> # echo 0 > /t/events/dwc3/dwc3_writel/enable > >> > >> This should be enough to tell me what's really going on. > > > > I've attached the (compressed due to size) dmesgs/traces from the device. > > For my test I just did 'for i in `seq 1 5` ; dmesg ; done' in an ssh > > session, and I did get the lag in the bad run, so presumably the trace > > should have caught it. Both were on 4.8 (+ the revert for the good run). > > > > Host side dmesg just showes me: > > usb 3-3.4: new high-speed USB device number 25 using xhci_hcd > > cdc_ether 3-3.4:1.0 usb0: register 'cdc_ether' at usb-:00:14.0-3.4, > > CDC Ethernet Device, ... > > when I boot the device. Nothing extra appears when things lag around. > > I'm running 4.5.something on the host. > > Looking at your logs what I can see is that "revert.trace" doesn't > respect gadget driver's IRQ throttling, which means you get an interrupt > for each and every request, while "bad.trace" shows dwc3 respecting > "no_interrupt" flag passed in by g_ether. > > The only explanation I have for this behavior when we start respecting > gadget driver's flags, is that there's a possible race in u_ether.c > where it can go for a long time without an interrupt due to that > throttling. > > This only happens with IN endpoints (the ones which are throttled) and > the behavior can be seen in log snippet below: > > 27782: irq/17-dwc3-2524 [000] d... 666.891969: tx_complete: req > 88016da1a6c0 length 1514/1514 flags ZsI tx_qlen 4 qmult 5 > 27802: ksoftirqd/0-3 [000] .Ns1 666.892047: eth_start_xmit: req > 88016da1a6c0 length 0/1514 flags Zsi tx_qlen 5 qmult 5 > 27808: ksoftirqd/0-3 [000] .Ns2 666.892062: eth_start_xmit: req > 88016da1a300 length 0/1514 flags ZsI tx_qlen 6 qmult 5 > 27814: ksoftirqd/0-3 [000] .Ns2 666.892069: eth_start_xmit: req > 88016da1ab40 length 0/1514 flags Zsi tx_qlen 7 qmult 5 > 27820: ksoftirqd/0-3 [000] .Ns2 666.892075: eth_start_xmit: req > 88016da1acc0 length 0/1514 flags Zsi tx_qlen 8 qmult 5 > 27826: ksoftirqd/0-3 [000] .Ns2 666.892082: eth_start_xmit: req > 88016da1ae40 length 0/1514 flags Zsi tx_qlen 9 qmult 5 > 27832: ksoftirqd/0-3 [000] .Ns2 666.892088: eth_start_xmit: req > 88016da1a780 length 0/1514 flags Zsi tx_qlen 10 qmult 5 > 27884: irq/17-dwc3-2524 [000] d... 666.892213: tx_complete: req > 88016da1a840 length 1514/1514 flags Zsi tx_qlen 9 qmult 5 > 27889: irq/17-dwc3-2524 [000] d... 666.892215: tx_complete: req > 88016da1a600 length 1514/1514 flags Zsi tx_qlen 8 qmult 5 > 27897:sshd-2684 [002] ..s1 666.892223: eth_start_xmit: req > 88016da1a840 length 0/1514 flags Zsi tx_qlen 9 qmult 5 > 27898: irq/17-dwc3-2524 [000] d... 666.892223: tx_complete: req > 88016da1a180 length 1514/1514 flags Zsi tx_qlen 8 qmult 5 > 27903: irq/17-dwc3-2524 [000] d... 666.892225: tx_complete: req > 88016da1a240 length 1514/1514 flags Zsi tx_qlen 7 qmult 5 > 27910:sshd-2684 [002] ..s1 666.892230: eth_start_xmit: req > 88016da1a180 length 0/1514 flags Zsi tx_qlen 8 qmult 5 > 27913: irq/17-dwc3-2524 [000] d... 666.892231: tx_complete: req > 88016da1a6c0 length 1514/1514 flags Zsi tx_qlen 7 qmult 5 &
Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"
On Mon, Sep 07, 2015 at 09:56:06AM +0300, Heikki Krogerus wrote: > Hi, > > On Tue, Sep 01, 2015 at 06:37:54PM +0300, Ville Syrjälä wrote: > > On Tue, Sep 01, 2015 at 10:17:59AM -0500, Felipe Balbi wrote: > > > Hi, > > > > > > On Tue, Sep 01, 2015 at 05:39:28PM +0300, Ville Syrjälä wrote: > > > > On Tue, Sep 01, 2015 at 08:59:02AM -0500, Felipe Balbi wrote: > > > > > Hi, > > > > > > > > > > On Tue, Sep 01, 2015 at 04:17:00PM +0300, Ville Syrjälä wrote: > > > > > > On Mon, Aug 31, 2015 at 01:50:10PM -0500, Felipe Balbi wrote: > > > > > > > Hi, > > > > > > > > > > > > > > On Mon, Aug 31, 2015 at 08:25:10PM +0300, Ville Syrjälä wrote: > > > > > > > > On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote: > > > > > > > > > On Mon, Aug 31, 2015 at 07:48:28PM +0300, > > > > > > > > > ville.syrj...@linux.intel.com wrote: > > > > > > > > > > From: Ville Syrjälä > > > > > > > > > > > > > > > > > > > > This reverts commit > > > > > > > > > > 8f2c9544aba636134303105ecb164190a39dece4. > > > > > > > > > > > > > > > > > > > > As it breaks g_ether on my Baytrail FFRD8 device. > > > > > > > > > > Everything starts out > > > > > > > > > > fine, but after a bit of data has been transferred it just > > > > > > > > > > stops > > > > > > > > > > flowing. > > > > > > > > > > > > > > > > > > > > Note that I do get a bunch of these "NOHZ: > > > > > > > > > > local_softirq_pending 08" > > > > > > > > > > when booting the machine, but I'm not really sure if > > > > > > > > > > they're related > > > > > > > > > > to this problem. > > > > > > > > > > > > > > > > > > I have a feeling your problem is elsewhere. We *are* > > > > > > > > > completing one TRB > > > > > > > > > at a time. By reverting that commit you're just masking the > > > > > > > > > real problem > > > > > > > > > and I'd rather get that one fixed. > > > > > > > > > > > > > > > > > > How do you reproduce your issue ? > > > > > > > > > > > > > > > > Just boot the system, it gets an IP from dnsmasq on my host, > > > > > > > > then I ssh > > > > > > > > into it and do something to produce a bit of console output, > > > > > > > > after which > > > > > > > > g_ether is dead. Eg. 'dmesg' a few times is enough to kill it. > > > > > > > > > > > > > > which kernel version ? > > > > > > > > > > > > Anything since the patch went in, so 4.1-rc > > > > > > > > > > > > > Running as USB2 or USB3 ? > > > > > > > > > > > > speed:480, so USB2 I presume? > > > > > > > > > > > > > Have you tried > > > > > > > linux-next ? > > > > > > > > > > > > Tried it now (next-20150901). Equally bad as the rest. > > > > > > > > > > > > > I just did 1000 dmesg iterations over ssh with g_ether and > > > > > > > saw no issues. > > > > > > > > > > > > > > Can you enable dwc3 tracepoints and try again ? (use some very > > > > > > > large > > > > > > > trace buffer, something around 2 or 4 MiB should be enough). > > > > > > > > > > > > Attached one trace from linux-next, and another one with the revert > > > > > > on > > > > > > top. > > > > > > > > > > are you sure these come from next ? > > > > > > > > Yep. > > > > > > > > > It makes zero sense :-) Here's an > > > > > odd snippet: > > > > > > > > > > | sshd-1719 [000] d.s342.579785: dwc3_ep_q
Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"
On Tue, Sep 01, 2015 at 10:17:59AM -0500, Felipe Balbi wrote: > Hi, > > On Tue, Sep 01, 2015 at 05:39:28PM +0300, Ville Syrjälä wrote: > > On Tue, Sep 01, 2015 at 08:59:02AM -0500, Felipe Balbi wrote: > > > Hi, > > > > > > On Tue, Sep 01, 2015 at 04:17:00PM +0300, Ville Syrjälä wrote: > > > > On Mon, Aug 31, 2015 at 01:50:10PM -0500, Felipe Balbi wrote: > > > > > Hi, > > > > > > > > > > On Mon, Aug 31, 2015 at 08:25:10PM +0300, Ville Syrjälä wrote: > > > > > > On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote: > > > > > > > On Mon, Aug 31, 2015 at 07:48:28PM +0300, > > > > > > > ville.syrj...@linux.intel.com wrote: > > > > > > > > From: Ville Syrjälä > > > > > > > > > > > > > > > > This reverts commit 8f2c9544aba636134303105ecb164190a39dece4. > > > > > > > > > > > > > > > > As it breaks g_ether on my Baytrail FFRD8 device. Everything > > > > > > > > starts out > > > > > > > > fine, but after a bit of data has been transferred it just stops > > > > > > > > flowing. > > > > > > > > > > > > > > > > Note that I do get a bunch of these "NOHZ: > > > > > > > > local_softirq_pending 08" > > > > > > > > when booting the machine, but I'm not really sure if they're > > > > > > > > related > > > > > > > > to this problem. > > > > > > > > > > > > > > I have a feeling your problem is elsewhere. We *are* completing > > > > > > > one TRB > > > > > > > at a time. By reverting that commit you're just masking the real > > > > > > > problem > > > > > > > and I'd rather get that one fixed. > > > > > > > > > > > > > > How do you reproduce your issue ? > > > > > > > > > > > > Just boot the system, it gets an IP from dnsmasq on my host, then I > > > > > > ssh > > > > > > into it and do something to produce a bit of console output, after > > > > > > which > > > > > > g_ether is dead. Eg. 'dmesg' a few times is enough to kill it. > > > > > > > > > > which kernel version ? > > > > > > > > Anything since the patch went in, so 4.1-rc > > > > > > > > > Running as USB2 or USB3 ? > > > > > > > > speed:480, so USB2 I presume? > > > > > > > > > Have you tried > > > > > linux-next ? > > > > > > > > Tried it now (next-20150901). Equally bad as the rest. > > > > > > > > > I just did 1000 dmesg iterations over ssh with g_ether and > > > > > saw no issues. > > > > > > > > > > Can you enable dwc3 tracepoints and try again ? (use some very large > > > > > trace buffer, something around 2 or 4 MiB should be enough). > > > > > > > > Attached one trace from linux-next, and another one with the revert on > > > > top. > > > > > > are you sure these come from next ? > > > > Yep. > > > > > It makes zero sense :-) Here's an > > > odd snippet: > > > > > > | sshd-1719 [000] d.s342.579785: dwc3_ep_queue: ep1in: > > > req 880077afa540 length 822/1514 ==> 0 > > > | sshd-1719 [000] d.s342.580075: dwc3_ep_queue: ep1in: > > > req 880077afa6c0 length 0/334 ==> -108 > > > | systemd-network-1618 [003] d.s342.754796: dwc3_ep_queue: ep1in: > > > req 880077afa780 length 0/120 ==> -108 > > > > > > your requests are queued with -ESHUTDOWN!! > > > > Looking at the code the tracepoint is before the request is queued, so > > maybe there's just stale junk in req->status before it gets overwritten > > by __dwc3_gadget_ep_queue()? > > right, something touched usb_request.status before and the request has > been recycled. > > > > more requests are queued and that's it. No further traffic. It just > > > stopped working. No further IRQs, nothing. > > > > > > mine looks very much different (see attached). I don't have any > > > -ESHUTDOWNs. How did you load g_ether ? Did you pass any extra options ? > > > > g_ether is builtin, and I just pass g_ether.dev_addr= via kernel > > cmdline. > > > > > Which IP version are you running ? > > > > ipv4 > > I mean the SNPS IP :-) (it's 2.10a, see below) It's all Greek to me :) > > > GSBUSCFG0 = 0x0006 > > GSBUSCFG1 = 0x0f00 > > GTXTHRCFG = 0x230a > > GRXTHRCFG = 0x2280 > > GCTL = 0x45802002 > > GEVTEN = 0x > > GSTS = 0x3e82 > > GSNPSID = 0x5533210a > > this could be a bug with 2.10a where completion IRQs are missed. Any > chance you can look for you Errata document and see if any exist ? I'm > using 2.40a. Ugh. USB isn't my thing, so I'm definitely not going to start hunting down any obscure docs. Cc:ing Mathias and Heikki since it looks like they've touched this beast before. You guys have any docs and/or clue as to what's happening here? -- Ville Syrjälä Intel OTC -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"
On Tue, Sep 01, 2015 at 08:59:02AM -0500, Felipe Balbi wrote: > Hi, > > On Tue, Sep 01, 2015 at 04:17:00PM +0300, Ville Syrjälä wrote: > > On Mon, Aug 31, 2015 at 01:50:10PM -0500, Felipe Balbi wrote: > > > Hi, > > > > > > On Mon, Aug 31, 2015 at 08:25:10PM +0300, Ville Syrjälä wrote: > > > > On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote: > > > > > On Mon, Aug 31, 2015 at 07:48:28PM +0300, > > > > > ville.syrj...@linux.intel.com wrote: > > > > > > From: Ville Syrjälä > > > > > > > > > > > > This reverts commit 8f2c9544aba636134303105ecb164190a39dece4. > > > > > > > > > > > > As it breaks g_ether on my Baytrail FFRD8 device. Everything starts > > > > > > out > > > > > > fine, but after a bit of data has been transferred it just stops > > > > > > flowing. > > > > > > > > > > > > Note that I do get a bunch of these "NOHZ: local_softirq_pending 08" > > > > > > when booting the machine, but I'm not really sure if they're related > > > > > > to this problem. > > > > > > > > > > I have a feeling your problem is elsewhere. We *are* completing one > > > > > TRB > > > > > at a time. By reverting that commit you're just masking the real > > > > > problem > > > > > and I'd rather get that one fixed. > > > > > > > > > > How do you reproduce your issue ? > > > > > > > > Just boot the system, it gets an IP from dnsmasq on my host, then I ssh > > > > into it and do something to produce a bit of console output, after which > > > > g_ether is dead. Eg. 'dmesg' a few times is enough to kill it. > > > > > > which kernel version ? > > > > Anything since the patch went in, so 4.1-rc > > > > > Running as USB2 or USB3 ? > > > > speed:480, so USB2 I presume? > > > > > Have you tried > > > linux-next ? > > > > Tried it now (next-20150901). Equally bad as the rest. > > > > > I just did 1000 dmesg iterations over ssh with g_ether and > > > saw no issues. > > > > > > Can you enable dwc3 tracepoints and try again ? (use some very large > > > trace buffer, something around 2 or 4 MiB should be enough). > > > > Attached one trace from linux-next, and another one with the revert on > > top. > > are you sure these come from next ? Yep. > It makes zero sense :-) Here's an > odd snippet: > > | sshd-1719 [000] d.s342.579785: dwc3_ep_queue: ep1in: req > 880077afa540 length 822/1514 ==> 0 > | sshd-1719 [000] d.s342.580075: dwc3_ep_queue: ep1in: req > 880077afa6c0 length 0/334 ==> -108 > | systemd-network-1618 [003] d.s342.754796: dwc3_ep_queue: ep1in: req > 880077afa780 length 0/120 ==> -108 > > your requests are queued with -ESHUTDOWN!! Looking at the code the tracepoint is before the request is queued, so maybe there's just stale junk in req->status before it gets overwritten by __dwc3_gadget_ep_queue()? > > | -0 [000] d.h342.877628: dwc3_readl: addr > c940040c value 0004 > | -0 [000] d.h342.877635: dwc3_readl: addr > c9400408 value 0100 > | -0 [000] d.h342.877638: dwc3_writel: addr > c9400408 value 8100 > | irq/22-dwc3-753 [002] d..242.878300: dwc3_event: event 00c4 > > so you had an IRQ, fine. > > | irq/22-dwc3-753 [002] d..242.878312: dwc3_gadget: ep1out: reason > Transfer Not Active > | irq/22-dwc3-753 [002] d..242.878328: dwc3_gadget_ep_cmd: ep1out: > cmd 'Start Transfer' [6] params 77ad9030 > > a transfer is started. > > | irq/22-dwc3-753 [002] d..242.878332: dwc3_writel: addr > c9400828 value > | irq/22-dwc3-753 [002] d..242.878336: dwc3_writel: addr > c9400824 value 77ad9030 > | irq/22-dwc3-753 [002] d..242.878339: dwc3_writel: addr > c9400820 value > | irq/22-dwc3-753 [002] d..242.878342: dwc3_writel: addr > c940082c value 0406 > | irq/22-dwc3-753 [002] d..242.878345: dwc3_readl: addr > c940082c value 00050006 > | irq/22-dwc3-753 [002] d..242.878348: dwc3_gadget: Command > Complete --> 0 > | irq/22-dwc3-753 [002] d..242.
Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"
On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote: > On Mon, Aug 31, 2015 at 07:48:28PM +0300, ville.syrj...@linux.intel.com wrote: > > From: Ville Syrjälä > > > > This reverts commit 8f2c9544aba636134303105ecb164190a39dece4. > > > > As it breaks g_ether on my Baytrail FFRD8 device. Everything starts out > > fine, but after a bit of data has been transferred it just stops > > flowing. > > > > Note that I do get a bunch of these "NOHZ: local_softirq_pending 08" > > when booting the machine, but I'm not really sure if they're related > > to this problem. > > I have a feeling your problem is elsewhere. We *are* completing one TRB > at a time. By reverting that commit you're just masking the real problem > and I'd rather get that one fixed. > > How do you reproduce your issue ? Just boot the system, it gets an IP from dnsmasq on my host, then I ssh into it and do something to produce a bit of console output, after which g_ether is dead. Eg. 'dmesg' a few times is enough to kill it. Here's what I have in my .config: CONFIG_USB_OTG=y CONFIG_USB_OTG_FSM=y CONFIG_USB_DWC3=y CONFIG_USB_DWC3_GADGET=y CONFIG_USB_DWC3_PCI=y CONFIG_USB_PHY=y CONFIG_NOP_USB_XCEIV=y CONFIG_USB_GPIO_VBUS=y CONFIG_USB_GADGET=y CONFIG_USB_GADGET_VBUS_DRAW=2 CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS=2 CONFIG_USB_LIBCOMPOSITE=y CONFIG_USB_U_ETHER=y CONFIG_USB_F_ECM=y CONFIG_USB_F_SUBSET=y CONFIG_USB_ETH=y -- Ville Syrjälä Intel OTC -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
On Wed, May 07, 2014 at 04:48:48PM +0300, Mathias Nyman wrote: > On 05/06/2014 02:41 PM, Ville Syrjälä wrote: > > On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote: > >> Hmmm... very odd. I unfortunately don't have a machine that can easily > >> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME > >> in S3 (essentially the same code path), and I didn't run into any > >> problems. > >> > >> How exactly does your machine fail on resume? Is it a kernel crash or > >> just a hang? Can you try getting some debug output (by setting 'echo N > >>> /sys/module/printk/parameters/console_suspend' and trying to catch > >> the crash on the screen or a serial line, or maybe through pstore)? I > >> really don't see much that could go wrong with this patch, so without > >> more info it will be hard to understand your problem. > >> > >> Also, I noticed that you have two HID devices plugged in during > >> suspend. Does it make a difference if you have different devices (e.g. > >> a mass storage stick) or none at all? > > > > Looks like it doesn't like it when there's anything plugged into the > > "SS" ports. I tried with just a HID keyboard or with just a hub. In > > both cases it fails to resume. If I have nothing connected to the "SS" > > ports then it resumes just fine. > > > > I managed to catch something with ramoops. Looks like it's hitting > > POISON_FREE when trying to delete some list entry. > > > > > <4>[ 107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from > > BW list! > > <4>[ 107.047574] general protection fault: [#1] PREEMPT SMP > > I took a look at the xhci_mem_cleanup() function and to me it looks > like it tries to access a list_head that is already freed. > > The struct list_head xhci->devs[].eps[].bw_endpoint_list is added to an > endpoint > list in xhci->rh_bw[].bw_table.interval_bw[].endpoints > > xhci_mem_cleanup() frees all devices (the allocated xhci->devs[], containing > the > bw_endpoint_list) before it starts to loop through, and delete entries from > the > xhci->rh_bw[].bw_table.interval_bw[].endpoints list. > > I can't see how this relates to Julius patch though, and I'm not sure yet why > it > only triggers when devices are connected to SS ports. Maybe just unlucky > timing? I think the non-SS ports are connected to the EHCI controllers rather than the XHCI controllers. So that explains at least one detail. And I guess timing is as good an excuse as any why this gets exposed by the patch in question. > > Does this help?: Indeed it does. The machine just survived a dozen or so suspend+resume cycles without a hitch. The bug was 100% reproducible on this machine, so the fix seems solid. Tested-by: Ville Syrjälä > > diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c > index c089668..b1a8a5f 100644 > --- a/drivers/usb/host/xhci-mem.c > +++ b/drivers/usb/host/xhci-mem.c > @@ -1822,6 +1822,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci) > kfree(cur_cd); > } > > + num_ports = HCS_MAX_PORTS(xhci->hcs_params1); > + for (i = 0; i < num_ports; i++) { > + struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table; > + for (j = 0; j < XHCI_MAX_INTERVAL; j++) { > + struct list_head *ep = &bwt->interval_bw[j].endpoints; > + while (!list_empty(ep)) > + list_del_init(ep->next); > + } > + } > + > for (i = 1; i < MAX_HC_SLOTS; ++i) > xhci_free_virt_device(xhci, i); > > @@ -1857,16 +1867,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci) > if (!xhci->rh_bw) > goto no_bw; > > - num_ports = HCS_MAX_PORTS(xhci->hcs_params1); > - for (i = 0; i < num_ports; i++) { > - struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table; > - for (j = 0; j < XHCI_MAX_INTERVAL; j++) { > - struct list_head *ep = &bwt->interval_bw[j].endpoints; > - while (!list_empty(ep)) > - list_del_init(ep->next); > - } > - } > - > for (i = 0; i < num_ports; i++) { > struct xhci_tt_bw_info *tt, *n; > list_for_each_entry_safe(tt, n, &xhci->rh_bw[i].tts, > tt_list) { -- Ville Syrjälä Intel OTC -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f
106.911605] smpboot: Booting Node 0 Processor 7 APIC 0x7 <6>[ 106.923408] kvm: enabling virtualization on CPU7 <6>[ 106.928161] microcode: CPU7 sig=0x306a9, pf=0x2, revision=0x19 <6>[ 106.934883] CPU7 is up <6>[ 106.957959] ACPI: Waking up from system sleep state S4 <6>[ 106.990680] PM: noirq restore of devices complete after 11.474 msecs <6>[ 106.993975] PM: early restore of devices complete after 3.024 msecs <4>[ 107.046519] usb usb3: root hub lost power or was reset <4>[ 107.046549] usb usb1: root hub lost power or was reset <4>[ 107.046694] usb usb4: root hub lost power or was reset <4>[ 107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from BW list! <4>[ 107.047574] general protection fault: [#1] PREEMPT SMP <4>[ 107.047768] usb usb2: root hub lost power or was reset <4>[ 107.048277] <4>[ 107.049138] Modules linked in: x86_pkg_temp_thermal coretemp kvm_intel kvm aesni_intel aes_x86_64 glue_helper iTCO_wdt lrw i915 snd_hda_codec_hdmi <4>[ 107.050193] usb usb5: root hub lost power or was reset <4>[ 107.050196] usb usb6: root hub lost power or was reset <7>[ 107.050410] snd_hda_intel :00:1b.0: irq 44 for MSI/MSI-X <7>[ 107.050453] ehci-pci :00:1a.0: cache line size of 64 is not supported <7>[ 107.051666] ehci-pci :00:1d.0: cache line size of 64 is not supported <7>[ 107.052951] xhci_hcd :04:00.0: irq 45 for MSI/MSI-X <7>[ 107.053018] xhci_hcd :04:00.0: irq 46 for MSI/MSI-X <7>[ 107.053078] xhci_hcd :04:00.0: irq 47 for MSI/MSI-X <7>[ 107.053137] xhci_hcd :04:00.0: irq 48 for MSI/MSI-X <7>[ 107.053197] xhci_hcd :04:00.0: irq 49 for MSI/MSI-X <7>[ 107.054698] xhci_hcd :04:00.0: irq 50 for MSI/MSI-X <7>[ 107.054763] xhci_hcd :04:00.0: irq 51 for MSI/MSI-X <7>[ 107.054830] xhci_hcd :04:00.0: irq 52 for MSI/MSI-X <4>[ 107.057882] snd_hda_codec_realtek hid_generic snd_hda_codec_generic gf128mul snd_hda_intel ablk_helper usbhid cryptd snd_hda_controller hid cfbfillrect snd_hda_codec cfbimgblt snd_hwdep snd_pcm i2c_algo_bit cfbcopyarea snd_timer drm_kms_helper psmouse snd drm lpc_ich pcspkr i2c_i801 mfd_core soundcore wmi evdev <4>[ 107.062030] CPU: 2 PID: 756 Comm: kworker/u16:2 Tainted: GW 3.15.0-rc4-hang #16 <5>[ 107.062068] sd 0:0:0:0: [sda] Starting disk <4>[ 107.062992] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0100.2013.0517.0942 05/17/2013 <4>[ 107.063454] Workqueue: events_unbound async_run_entry_fn <4>[ 107.063892] task: 88007cb58000 ti: 88007b6fc000 task.ti: 88007b6fc000 <4>[ 107.064311] RIP: 0010:[] [] __list_del_entry+0x29/0xd0 <4>[ 107.064942] RSP: :88007b6fdb48 EFLAGS: 00010a83 <4>[ 107.065446] RAX: 6b6b6b6b6b6b6b6b RBX: 88007b0ba1e8 RCX: dead00200200 <4>[ 107.065962] RDX: 6b6b6b6b6b6b6b6b RSI: 88007cb58750 RDI: 88007ac40318 <4>[ 107.066479] RBP: 88007b6fdb48 R08: R09: 0001 <4>[ 107.067024] R10: R11: R12: 0003 <4>[ 107.067528] R13: 88007b108000 R14: 88007b0ba160 R15: 88007ac40318 <4>[ 107.068051] FS: () GS:88007fc8() knlGS: <4>[ 107.068560] CS: 0010 DS: ES: CR0: 80050033 <4>[ 107.068901] CR2: CR3: 01a0e000 CR4: 001407e0 <4>[ 107.069247] Stack: <4>[ 107.069563] 88007b6fdbb8 81423d08 0001 88007cad2290 <4>[ 107.070156] 88007b6fdb88 812e4815 1580 <4>[ 107.070929] 88007b6fdbb8 88007b108000 88007b0f7978 88007b0f7978 <4>[ 107.071788] Call Trace: <4>[ 107.072260] [] xhci_mem_cleanup+0x428/0x610 <4>[ 107.072756] [] ? pci_disable_msi+0x45/0x50 <4>[ 107.073193] [] xhci_resume+0x210/0x3a0 <4>[ 107.073618] [] ? usb_enable_intel_xhci_ports+0xbd/0xd0 <4>[ 107.074245] [] xhci_pci_resume+0x36/0x50 <4>[ 107.074837] [] resume_common+0xa1/0x150 <4>[ 107.075460] [] ? pci_pm_default_resume+0x50/0x50 <4>[ 107.076103] [] hcd_pci_restore+0x13/0x20 <4>[ 107.076692] [] pci_pm_restore+0x88/0xf0 <4>[ 107.077282] [] ? device_resume+0x6c/0x1a0 <4>[ 107.077902] [] dpm_run_callback+0x3a/0xe0 <4>[ 107.078489] [] ? _raw_spin_unlock_irq+0x30/0x60 <4>[ 107.079119] [] device_resume+0xae/0x1a0 <4>[ 107.079741] [] async_resume+0x21/0x50 <4>[ 107.080171] [] async_run_entry_fn+0x46/0x140 <4>[ 107.080792] [] process_one_work+0x1f4/0x530 <4>[ 107.081387] [] ? process_one_work+0x189/0x530 <4>[ 107.082019] [] worker_thread+0x11c/0x370 <4&g