Re: [PATCH] usb: gadget: u_ether: remove interrupt throttling

2016-11-01 Thread Ville Syrjälä
On Tue, Nov 01, 2016 at 01:29:59PM +0200, Felipe Balbi wrote:
> According to Dave Miller "the networking stack has a
> hard requirement that all SKBs which are transmitted
> must have their completion signalled in a fininte
> amount of time. This is because, until the SKB is
> freed by the driver, it holds onto socket,
> netfilter, and other subsystem resources."
> 
> In summary, this means that using TX IRQ throttling
> for the networking gadgets is, at least, complex and
> we should avoid it for the time being.
> 
> Cc: 
> Reported-by: Ville Syrjälä 
> Suggested-by: David Miller 
> Signed-off-by: Felipe Balbi 

Fixes laggy/hangy g_ether on my BYT FFRD8 tablet, caused by
commit 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for LST bit")

Tested-by: Ville Syrjälä 

> ---
>  drivers/usb/gadget/function/u_ether.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/drivers/usb/gadget/function/u_ether.c 
> b/drivers/usb/gadget/function/u_ether.c
> index f4a640216913..119a2e5848e8 100644
> --- a/drivers/usb/gadget/function/u_ether.c
> +++ b/drivers/usb/gadget/function/u_ether.c
> @@ -589,14 +589,6 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb,
>  
>   req->length = length;
>  
> - /* throttle high/super speed IRQ rate back slightly */
> - if (gadget_is_dualspeed(dev->gadget))
> - req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH ||
> -dev->gadget->speed == USB_SPEED_SUPER)) 
> &&
> - !list_empty(&dev->tx_reqs))
> - ? ((atomic_read(&dev->tx_qlen) % dev->qmult) != 0)
> - : 0;
> -
>   retval = usb_ep_queue(in, req, GFP_ATOMIC);
>   switch (retval) {
>   default:
> -- 
> 2.10.1

-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "usb: dwc3: gadget: use allocated/queued reqs for LST bit"

2016-10-28 Thread Ville Syrjälä
On Fri, Oct 28, 2016 at 01:16:13PM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Ville Syrjälä  writes:
> > On Thu, Oct 06, 2016 at 12:08:26PM +0300, Ville Syrjälä wrote:
> >> On Thu, Oct 06, 2016 at 10:36:09AM +0300, Felipe Balbi wrote:
> >> > 
> >> > Hi,
> >> > 
> >> > Felipe Balbi  writes:
> >> 
> >> > Okay, I have found a regression on dwc3 and another patch follows:
> >> > 
> >> > commit 5e1a2af3e46248c55098cdae643c4141851b703e
> >> > Author: Felipe Balbi 
> >> > Date:   Wed Oct 5 14:24:37 2016 +0300
> >> > 
> >> > usb: dwc3: gadget: properly account queued requests
> >> > 
> >> > Some requests could be accounted for multiple
> >> > times. Let's fix that so each and every requests is
> >> > accounted for only once.
> >> > 
> >> > Cc:  # v4.8
> >> > Fixes: 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs 
> >> > for LST bit")
> >> > Signed-off-by: Felipe Balbi 
> >> > 
> >> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> >> > index 07cc8929f271..3c3ced128c77 100644
> >> > --- a/drivers/usb/dwc3/gadget.c
> >> > +++ b/drivers/usb/dwc3/gadget.c
> >> > @@ -783,6 +783,7 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep,
> >> >  req->trb = trb;
> >> >  req->trb_dma = dwc3_trb_dma_offset(dep, trb);
> >> >  req->first_trb_index = dep->trb_enqueue;
> >> > +dep->queued_requests++;
> >> >  }
> >> >  
> >> >  dwc3_ep_inc_enq(dep);
> >> > @@ -833,8 +834,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep,
> >> >  
> >> >  trb->ctrl |= DWC3_TRB_CTRL_HWO;
> >> >  
> >> > -dep->queued_requests++;
> >> > -
> >> >  trace_dwc3_prepare_trb(dep, trb);
> >> >  }
> >> >  
> >> > @@ -1861,8 +1860,11 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 
> >> > *dwc, struct dwc3_ep *dep,
> >> >  unsigned ints_pkt = 0;
> >> >  unsigned inttrb_status;
> >> >  
> >> > -dep->queued_requests--;
> >> >  dwc3_ep_inc_deq(dep);
> >> > +
> >> > +if (req->trb == trb)
> >> > +dep->queued_requests--;
> >> > +
> >> >  trace_dwc3_complete_trb(dep, trb);
> >> >  
> >> >  /*
> >> > 
> >> > I have also built a branch which you can use for testing. Here's a pull
> >> > request, once you tell me it works for you, then I can send proper
> >> > patches out:
> >> > 
> >> > The following changes since commit 
> >> > c8d2bc9bc39ebea8437fd974fdbc21847bb897a3:
> >> > 
> >> >   Linux 4.8 (2016-10-02 16:24:33 -0700)
> >> > 
> >> > are available in the git repository at:
> >> > 
> >> >   git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git 
> >> > tmp-test-v4.8
> >> > 
> >> > for you to fetch changes up to c968b8d1effe64a7980802d1eef29f4e1922faca:
> >> > 
> >> >   usb: dwc3: gadget: properly account queued requests (2016-10-06 
> >> > 10:16:37 +0300)
> >> > 
> >> > 
> >> > Felipe Balbi (2):
> >> >   usb: gadget: function: u_ether: don't starve tx request queue
> >> >   usb: dwc3: gadget: properly account queued requests
> >> > 
> >> >  drivers/usb/dwc3/gadget.c | 7 ---
> >> >  drivers/usb/gadget/function/u_ether.c | 5 +++--
> >> >  2 files changed, 7 insertions(+), 5 deletions(-)
> >> 
> >> Tried your branch, but unfortunately I'm still seeing the lags. New trace
> >> attached.
> >
> > Just a reminder that the regressions is still there in 4.9-rc2.
> > Unfortunateyly with all the stuff already piled on top, getting USB
> > working on my device is no longer a simple matter of reverting the
> > one commit. I had to revert 10 patches to get even a clean revert, but
> > unfortunately that approach just lead to the transfer hanging immediately.
&g

Re: [PATCH] Revert "usb: dwc3: gadget: use allocated/queued reqs for LST bit"

2016-10-27 Thread Ville Syrjälä
On Thu, Oct 06, 2016 at 12:08:26PM +0300, Ville Syrjälä wrote:
> On Thu, Oct 06, 2016 at 10:36:09AM +0300, Felipe Balbi wrote:
> > 
> > Hi,
> > 
> > Felipe Balbi  writes:
> 
> > Okay, I have found a regression on dwc3 and another patch follows:
> > 
> > commit 5e1a2af3e46248c55098cdae643c4141851b703e
> > Author: Felipe Balbi 
> > Date:   Wed Oct 5 14:24:37 2016 +0300
> > 
> > usb: dwc3: gadget: properly account queued requests
> > 
> > Some requests could be accounted for multiple
> > times. Let's fix that so each and every requests is
> > accounted for only once.
> > 
> > Cc:  # v4.8
> > Fixes: 55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for 
> > LST bit")
> > Signed-off-by: Felipe Balbi 
> > 
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index 07cc8929f271..3c3ced128c77 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -783,6 +783,7 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep,
> > req->trb = trb;
> > req->trb_dma = dwc3_trb_dma_offset(dep, trb);
> > req->first_trb_index = dep->trb_enqueue;
> > +   dep->queued_requests++;
> > }
> >  
> > dwc3_ep_inc_enq(dep);
> > @@ -833,8 +834,6 @@ static void dwc3_prepare_one_trb(struct dwc3_ep *dep,
> >  
> > trb->ctrl |= DWC3_TRB_CTRL_HWO;
> >  
> > -   dep->queued_requests++;
> > -
> > trace_dwc3_prepare_trb(dep, trb);
> >  }
> >  
> > @@ -1861,8 +1860,11 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 
> > *dwc, struct dwc3_ep *dep,
> > unsigned ints_pkt = 0;
> > unsigned inttrb_status;
> >  
> > -   dep->queued_requests--;
> > dwc3_ep_inc_deq(dep);
> > +
> > +   if (req->trb == trb)
> > +   dep->queued_requests--;
> > +
> > trace_dwc3_complete_trb(dep, trb);
> >  
> > /*
> > 
> > I have also built a branch which you can use for testing. Here's a pull
> > request, once you tell me it works for you, then I can send proper
> > patches out:
> > 
> > The following changes since commit c8d2bc9bc39ebea8437fd974fdbc21847bb897a3:
> > 
> >   Linux 4.8 (2016-10-02 16:24:33 -0700)
> > 
> > are available in the git repository at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git tmp-test-v4.8
> > 
> > for you to fetch changes up to c968b8d1effe64a7980802d1eef29f4e1922faca:
> > 
> >   usb: dwc3: gadget: properly account queued requests (2016-10-06 10:16:37 
> > +0300)
> > 
> > 
> > Felipe Balbi (2):
> >   usb: gadget: function: u_ether: don't starve tx request queue
> >   usb: dwc3: gadget: properly account queued requests
> > 
> >  drivers/usb/dwc3/gadget.c | 7 ---
> >  drivers/usb/gadget/function/u_ether.c | 5 +++--
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> Tried your branch, but unfortunately I'm still seeing the lags. New trace
> attached.

Just a reminder that the regressions is still there in 4.9-rc2.
Unfortunateyly with all the stuff already piled on top, getting USB
working on my device is no longer a simple matter of reverting the
one commit. I had to revert 10 patches to get even a clean revert, but
unfortunately that approach just lead to the transfer hanging immediately.

So what I ended up doing is reverting all of this:
git log --no-merges --format=oneline 
55a0237f8f47957163125e20ee9260538c5c341c^..HEAD -- drivers/usb/dwc3/ 
include/linux/ulpi/interface.h drivers/usb/Kconfig drivers/usb/core/Kconfig

which comes out at whopping 70 commits. Having to carry that around
is going to be quite a pain especially as more stuff might be piled on
top.

/me thinks a stable backport of any fix (assuming one is found
eventually) is going to be quite the challenge...

-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "usb: dwc3: gadget: use allocated/queued reqs for LST bit"

2016-10-04 Thread Ville Syrjälä
On Tue, Oct 04, 2016 at 03:23:12PM +0300, Felipe Balbi wrote:
> 
> Hi,
> 
> Ville Syrjälä  writes:
> > On Mon, Oct 03, 2016 at 07:08:43PM +0300, Felipe Balbi wrote:
> >> 
> >> Hi,
> >> 
> >> ville.syrj...@linux.intel.com writes:
> >> > From: Ville Syrjälä 
> >> >
> >> > This reverts commit 55a0237f8f47957163125e20ee9260538c5c341c.
> >> >
> >> > commit  55a0237f8f47 ("usb: dwc3: gadget: use allocated/queued reqs for
> >> > LST bit") causes my BYT FFRD8 with g_ether to behave poorly. ssh/scp
> >> > is very sluggish and can even stall entirely. Revert cures it.
> >> >
> >> > Cc: Felipe Balbi 
> >> > Cc: sta...@vger.kernel.org
> >> > Signed-off-by: Ville Syrjälä 
> >> 
> >> Reverting that commit is not the best idea. Let's start with the usual:
> >> 
> >> - Kernel version
> >> - dmesg on both sides (host and device)
> >> - dwc3 tracepoints:
> >> 
> >> # mkdir -p /t
> >> # mount -t tracefs none /t
> >> # echo 8192 > /t/buffer_size_kb
> >> # echo 1 > /t/events/dwc3/enable
> >> # echo 0 > /t/events/dwc3/dwc3_readl/enable
> >> # echo 0 > /t/events/dwc3/dwc3_writel/enable
> >> 
> >> This should be enough to tell me what's really going on.
> >
> > I've attached the (compressed due to size) dmesgs/traces from the device.
> > For my test I just did 'for i in `seq 1 5` ; dmesg ; done' in an ssh
> > session, and I did get the lag in the bad run, so presumably the trace
> > should have caught it. Both were on 4.8 (+ the revert for the good run).
> >
> > Host side dmesg just showes me:
> >  usb 3-3.4: new high-speed USB device number 25 using xhci_hcd
> >  cdc_ether 3-3.4:1.0 usb0: register 'cdc_ether' at usb-:00:14.0-3.4, 
> > CDC Ethernet Device, ...
> > when I boot the device. Nothing extra appears when things lag around.
> > I'm running 4.5.something on the host.
> 
> Looking at your logs what I can see is that "revert.trace" doesn't
> respect gadget driver's IRQ throttling, which means you get an interrupt
> for each and every request, while "bad.trace" shows dwc3 respecting
> "no_interrupt" flag passed in by g_ether.
> 
> The only explanation I have for this behavior when we start respecting
> gadget driver's flags, is that there's a possible race in u_ether.c
> where it can go for a long time without an interrupt due to that
> throttling.
> 
> This only happens with IN endpoints (the ones which are throttled) and
> the behavior can be seen in log snippet below:
> 
>   27782: irq/17-dwc3-2524  [000] d...   666.891969: tx_complete: req 
> 88016da1a6c0 length 1514/1514 flags ZsI tx_qlen 4 qmult 5
>   27802: ksoftirqd/0-3 [000] .Ns1   666.892047: eth_start_xmit: req 
> 88016da1a6c0 length 0/1514 flags Zsi tx_qlen 5 qmult 5
>   27808: ksoftirqd/0-3 [000] .Ns2   666.892062: eth_start_xmit: req 
> 88016da1a300 length 0/1514 flags ZsI tx_qlen 6 qmult 5
>   27814: ksoftirqd/0-3 [000] .Ns2   666.892069: eth_start_xmit: req 
> 88016da1ab40 length 0/1514 flags Zsi tx_qlen 7 qmult 5
>   27820: ksoftirqd/0-3 [000] .Ns2   666.892075: eth_start_xmit: req 
> 88016da1acc0 length 0/1514 flags Zsi tx_qlen 8 qmult 5
>   27826: ksoftirqd/0-3 [000] .Ns2   666.892082: eth_start_xmit: req 
> 88016da1ae40 length 0/1514 flags Zsi tx_qlen 9 qmult 5
>   27832: ksoftirqd/0-3 [000] .Ns2   666.892088: eth_start_xmit: req 
> 88016da1a780 length 0/1514 flags Zsi tx_qlen 10 qmult 5
>   27884: irq/17-dwc3-2524  [000] d...   666.892213: tx_complete: req 
> 88016da1a840 length 1514/1514 flags Zsi tx_qlen 9 qmult 5
>   27889: irq/17-dwc3-2524  [000] d...   666.892215: tx_complete: req 
> 88016da1a600 length 1514/1514 flags Zsi tx_qlen 8 qmult 5
>   27897:sshd-2684  [002] ..s1   666.892223: eth_start_xmit: req 
> 88016da1a840 length 0/1514 flags Zsi tx_qlen 9 qmult 5
>   27898: irq/17-dwc3-2524  [000] d...   666.892223: tx_complete: req 
> 88016da1a180 length 1514/1514 flags Zsi tx_qlen 8 qmult 5
>   27903: irq/17-dwc3-2524  [000] d...   666.892225: tx_complete: req 
> 88016da1a240 length 1514/1514 flags Zsi tx_qlen 7 qmult 5
>   27910:sshd-2684  [002] ..s1   666.892230: eth_start_xmit: req 
> 88016da1a180 length 0/1514 flags Zsi tx_qlen 8 qmult 5
>   27913: irq/17-dwc3-2524  [000] d...   666.892231: tx_complete: req 
> 88016da1a6c0 length 1514/1514 flags Zsi tx_qlen 7 qmult 5
&

Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"

2015-09-28 Thread Ville Syrjälä
On Mon, Sep 07, 2015 at 09:56:06AM +0300, Heikki Krogerus wrote:
> Hi,
> 
> On Tue, Sep 01, 2015 at 06:37:54PM +0300, Ville Syrjälä wrote:
> > On Tue, Sep 01, 2015 at 10:17:59AM -0500, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > On Tue, Sep 01, 2015 at 05:39:28PM +0300, Ville Syrjälä wrote:
> > > > On Tue, Sep 01, 2015 at 08:59:02AM -0500, Felipe Balbi wrote:
> > > > > Hi,
> > > > > 
> > > > > On Tue, Sep 01, 2015 at 04:17:00PM +0300, Ville Syrjälä wrote:
> > > > > > On Mon, Aug 31, 2015 at 01:50:10PM -0500, Felipe Balbi wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > On Mon, Aug 31, 2015 at 08:25:10PM +0300, Ville Syrjälä wrote:
> > > > > > > > On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote:
> > > > > > > > > On Mon, Aug 31, 2015 at 07:48:28PM +0300, 
> > > > > > > > > ville.syrj...@linux.intel.com wrote:
> > > > > > > > > > From: Ville Syrjälä 
> > > > > > > > > > 
> > > > > > > > > > This reverts commit 
> > > > > > > > > > 8f2c9544aba636134303105ecb164190a39dece4.
> > > > > > > > > > 
> > > > > > > > > > As it breaks g_ether on my Baytrail FFRD8 device. 
> > > > > > > > > > Everything starts out
> > > > > > > > > > fine, but after a bit of data has been transferred it just 
> > > > > > > > > > stops
> > > > > > > > > > flowing.
> > > > > > > > > > 
> > > > > > > > > > Note that I do get a bunch of these "NOHZ: 
> > > > > > > > > > local_softirq_pending 08"
> > > > > > > > > > when booting the machine, but I'm not really sure if 
> > > > > > > > > > they're related
> > > > > > > > > > to this problem.
> > > > > > > > > 
> > > > > > > > > I have a feeling your problem is elsewhere. We *are* 
> > > > > > > > > completing one TRB
> > > > > > > > > at a time. By reverting that commit you're just masking the 
> > > > > > > > > real problem
> > > > > > > > > and I'd rather get that one fixed.
> > > > > > > > > 
> > > > > > > > > How do you reproduce your issue ?
> > > > > > > > 
> > > > > > > > Just boot the system, it gets an IP from dnsmasq on my host, 
> > > > > > > > then I ssh
> > > > > > > > into it and do something to produce a bit of console output, 
> > > > > > > > after which
> > > > > > > > g_ether is dead. Eg. 'dmesg' a few times is enough to kill it.
> > > > > > > 
> > > > > > > which kernel version ?
> > > > > > 
> > > > > > Anything since the patch went in, so 4.1-rc
> > > > > > 
> > > > > > > Running as USB2 or USB3 ?
> > > > > > 
> > > > > > speed:480, so USB2 I presume?
> > > > > > 
> > > > > > > Have you tried
> > > > > > > linux-next ?
> > > > > > 
> > > > > > Tried it now (next-20150901). Equally bad as the rest.
> > > > > > 
> > > > > > > I just did 1000 dmesg iterations over ssh with g_ether and
> > > > > > > saw no issues.
> > > > > > > 
> > > > > > > Can you enable dwc3 tracepoints and try again ? (use some very 
> > > > > > > large
> > > > > > > trace buffer, something around 2 or 4 MiB should be enough).
> > > > > > 
> > > > > > Attached one trace from linux-next, and another one with the revert 
> > > > > > on
> > > > > > top.
> > > > > 
> > > > > are you sure these come from next ?
> > > > 
> > > > Yep.
> > > > 
> > > > > It makes zero sense :-) Here's an
> > > > > odd snippet:
> > > > > 
> > > > > | sshd-1719  [000] d.s342.579785: dwc3_ep_q

Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"

2015-09-01 Thread Ville Syrjälä
On Tue, Sep 01, 2015 at 10:17:59AM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Tue, Sep 01, 2015 at 05:39:28PM +0300, Ville Syrjälä wrote:
> > On Tue, Sep 01, 2015 at 08:59:02AM -0500, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > On Tue, Sep 01, 2015 at 04:17:00PM +0300, Ville Syrjälä wrote:
> > > > On Mon, Aug 31, 2015 at 01:50:10PM -0500, Felipe Balbi wrote:
> > > > > Hi,
> > > > > 
> > > > > On Mon, Aug 31, 2015 at 08:25:10PM +0300, Ville Syrjälä wrote:
> > > > > > On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote:
> > > > > > > On Mon, Aug 31, 2015 at 07:48:28PM +0300, 
> > > > > > > ville.syrj...@linux.intel.com wrote:
> > > > > > > > From: Ville Syrjälä 
> > > > > > > > 
> > > > > > > > This reverts commit 8f2c9544aba636134303105ecb164190a39dece4.
> > > > > > > > 
> > > > > > > > As it breaks g_ether on my Baytrail FFRD8 device. Everything 
> > > > > > > > starts out
> > > > > > > > fine, but after a bit of data has been transferred it just stops
> > > > > > > > flowing.
> > > > > > > > 
> > > > > > > > Note that I do get a bunch of these "NOHZ: 
> > > > > > > > local_softirq_pending 08"
> > > > > > > > when booting the machine, but I'm not really sure if they're 
> > > > > > > > related
> > > > > > > > to this problem.
> > > > > > > 
> > > > > > > I have a feeling your problem is elsewhere. We *are* completing 
> > > > > > > one TRB
> > > > > > > at a time. By reverting that commit you're just masking the real 
> > > > > > > problem
> > > > > > > and I'd rather get that one fixed.
> > > > > > > 
> > > > > > > How do you reproduce your issue ?
> > > > > > 
> > > > > > Just boot the system, it gets an IP from dnsmasq on my host, then I 
> > > > > > ssh
> > > > > > into it and do something to produce a bit of console output, after 
> > > > > > which
> > > > > > g_ether is dead. Eg. 'dmesg' a few times is enough to kill it.
> > > > > 
> > > > > which kernel version ?
> > > > 
> > > > Anything since the patch went in, so 4.1-rc
> > > > 
> > > > > Running as USB2 or USB3 ?
> > > > 
> > > > speed:480, so USB2 I presume?
> > > > 
> > > > > Have you tried
> > > > > linux-next ?
> > > > 
> > > > Tried it now (next-20150901). Equally bad as the rest.
> > > > 
> > > > > I just did 1000 dmesg iterations over ssh with g_ether and
> > > > > saw no issues.
> > > > > 
> > > > > Can you enable dwc3 tracepoints and try again ? (use some very large
> > > > > trace buffer, something around 2 or 4 MiB should be enough).
> > > > 
> > > > Attached one trace from linux-next, and another one with the revert on
> > > > top.
> > > 
> > > are you sure these come from next ?
> > 
> > Yep.
> > 
> > > It makes zero sense :-) Here's an
> > > odd snippet:
> > > 
> > > | sshd-1719  [000] d.s342.579785: dwc3_ep_queue: ep1in: 
> > > req 880077afa540 length 822/1514 ==> 0
> > > | sshd-1719  [000] d.s342.580075: dwc3_ep_queue: ep1in: 
> > > req 880077afa6c0 length 0/334 ==> -108
> > > |  systemd-network-1618  [003] d.s342.754796: dwc3_ep_queue: ep1in: 
> > > req 880077afa780 length 0/120 ==> -108
> > > 
> > > your requests are queued with -ESHUTDOWN!!
> > 
> > Looking at the code the tracepoint is before the request is queued, so
> > maybe there's just stale junk in req->status before it gets overwritten
> > by __dwc3_gadget_ep_queue()?
> 
> right, something touched usb_request.status before and the request has
> been recycled.
> 
> > > more requests are queued and that's it. No further traffic. It just
> > > stopped working. No further IRQs, nothing.
> > > 
> > > mine looks very much different (see attached). I don't have any
> > > -ESHUTDOWNs. How did you load g_ether ? Did you pass any extra options ?
> > 
> > g_ether is builtin, and I just pass g_ether.dev_addr= via kernel 
> > cmdline.
> > 
> > > Which IP version are you running ?
> > 
> > ipv4
> 
> I mean the SNPS IP :-) (it's 2.10a, see below)

It's all Greek to me :)

> 
> > GSBUSCFG0 = 0x0006
> > GSBUSCFG1 = 0x0f00
> > GTXTHRCFG = 0x230a
> > GRXTHRCFG = 0x2280
> > GCTL = 0x45802002
> > GEVTEN = 0x
> > GSTS = 0x3e82
> > GSNPSID = 0x5533210a
> 
> this could be a bug with 2.10a where completion IRQs are missed. Any
> chance you can look for you Errata document and see if any exist ? I'm
> using 2.40a.

Ugh. USB isn't my thing, so I'm definitely not going to start hunting down
any obscure docs.

Cc:ing Mathias and Heikki since it looks like they've touched this beast
before. You guys have any docs and/or clue as to what's happening here?

-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"

2015-09-01 Thread Ville Syrjälä
On Tue, Sep 01, 2015 at 08:59:02AM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Tue, Sep 01, 2015 at 04:17:00PM +0300, Ville Syrjälä wrote:
> > On Mon, Aug 31, 2015 at 01:50:10PM -0500, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > On Mon, Aug 31, 2015 at 08:25:10PM +0300, Ville Syrjälä wrote:
> > > > On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote:
> > > > > On Mon, Aug 31, 2015 at 07:48:28PM +0300, 
> > > > > ville.syrj...@linux.intel.com wrote:
> > > > > > From: Ville Syrjälä 
> > > > > > 
> > > > > > This reverts commit 8f2c9544aba636134303105ecb164190a39dece4.
> > > > > > 
> > > > > > As it breaks g_ether on my Baytrail FFRD8 device. Everything starts 
> > > > > > out
> > > > > > fine, but after a bit of data has been transferred it just stops
> > > > > > flowing.
> > > > > > 
> > > > > > Note that I do get a bunch of these "NOHZ: local_softirq_pending 08"
> > > > > > when booting the machine, but I'm not really sure if they're related
> > > > > > to this problem.
> > > > > 
> > > > > I have a feeling your problem is elsewhere. We *are* completing one 
> > > > > TRB
> > > > > at a time. By reverting that commit you're just masking the real 
> > > > > problem
> > > > > and I'd rather get that one fixed.
> > > > > 
> > > > > How do you reproduce your issue ?
> > > > 
> > > > Just boot the system, it gets an IP from dnsmasq on my host, then I ssh
> > > > into it and do something to produce a bit of console output, after which
> > > > g_ether is dead. Eg. 'dmesg' a few times is enough to kill it.
> > > 
> > > which kernel version ?
> > 
> > Anything since the patch went in, so 4.1-rc
> > 
> > > Running as USB2 or USB3 ?
> > 
> > speed:480, so USB2 I presume?
> > 
> > > Have you tried
> > > linux-next ?
> > 
> > Tried it now (next-20150901). Equally bad as the rest.
> > 
> > > I just did 1000 dmesg iterations over ssh with g_ether and
> > > saw no issues.
> > > 
> > > Can you enable dwc3 tracepoints and try again ? (use some very large
> > > trace buffer, something around 2 or 4 MiB should be enough).
> > 
> > Attached one trace from linux-next, and another one with the revert on
> > top.
> 
> are you sure these come from next ?

Yep.

> It makes zero sense :-) Here's an
> odd snippet:
> 
> | sshd-1719  [000] d.s342.579785: dwc3_ep_queue: ep1in: req 
> 880077afa540 length 822/1514 ==> 0
> | sshd-1719  [000] d.s342.580075: dwc3_ep_queue: ep1in: req 
> 880077afa6c0 length 0/334 ==> -108
> |  systemd-network-1618  [003] d.s342.754796: dwc3_ep_queue: ep1in: req 
> 880077afa780 length 0/120 ==> -108
> 
> your requests are queued with -ESHUTDOWN!!

Looking at the code the tracepoint is before the request is queued, so
maybe there's just stale junk in req->status before it gets overwritten
by __dwc3_gadget_ep_queue()?

> 
> |   -0 [000] d.h342.877628: dwc3_readl: addr 
> c940040c value 0004
> |   -0 [000] d.h342.877635: dwc3_readl: addr 
> c9400408 value 0100
> |   -0 [000] d.h342.877638: dwc3_writel: addr 
> c9400408 value 8100
> |  irq/22-dwc3-753   [002] d..242.878300: dwc3_event: event 00c4
> 
> so you had an IRQ, fine.
> 
> |  irq/22-dwc3-753   [002] d..242.878312: dwc3_gadget: ep1out: reason 
> Transfer Not Active
> |  irq/22-dwc3-753   [002] d..242.878328: dwc3_gadget_ep_cmd: ep1out: 
> cmd 'Start Transfer' [6] params  77ad9030 
> 
> a transfer is started.
> 
> |  irq/22-dwc3-753   [002] d..242.878332: dwc3_writel: addr 
> c9400828 value 
> |  irq/22-dwc3-753   [002] d..242.878336: dwc3_writel: addr 
> c9400824 value 77ad9030
> |  irq/22-dwc3-753   [002] d..242.878339: dwc3_writel: addr 
> c9400820 value 
> |  irq/22-dwc3-753   [002] d..242.878342: dwc3_writel: addr 
> c940082c value 0406
> |  irq/22-dwc3-753   [002] d..242.878345: dwc3_readl: addr 
> c940082c value 00050006
> |  irq/22-dwc3-753   [002] d..242.878348: dwc3_gadget: Command 
> Complete --> 0
> |  irq/22-dwc3-753   [002] d..242.

Re: [PATCH] Revert "usb: dwc3: gadget: drop unnecessary loop when cleaning up TRBs"

2015-08-31 Thread Ville Syrjälä
On Mon, Aug 31, 2015 at 11:54:13AM -0500, Felipe Balbi wrote:
> On Mon, Aug 31, 2015 at 07:48:28PM +0300, ville.syrj...@linux.intel.com wrote:
> > From: Ville Syrjälä 
> > 
> > This reverts commit 8f2c9544aba636134303105ecb164190a39dece4.
> > 
> > As it breaks g_ether on my Baytrail FFRD8 device. Everything starts out
> > fine, but after a bit of data has been transferred it just stops
> > flowing.
> > 
> > Note that I do get a bunch of these "NOHZ: local_softirq_pending 08"
> > when booting the machine, but I'm not really sure if they're related
> > to this problem.
> 
> I have a feeling your problem is elsewhere. We *are* completing one TRB
> at a time. By reverting that commit you're just masking the real problem
> and I'd rather get that one fixed.
> 
> How do you reproduce your issue ?

Just boot the system, it gets an IP from dnsmasq on my host, then I ssh
into it and do something to produce a bit of console output, after which
g_ether is dead. Eg. 'dmesg' a few times is enough to kill it.

Here's what I have in my .config:
CONFIG_USB_OTG=y
CONFIG_USB_OTG_FSM=y
CONFIG_USB_DWC3=y
CONFIG_USB_DWC3_GADGET=y
CONFIG_USB_DWC3_PCI=y
CONFIG_USB_PHY=y
CONFIG_NOP_USB_XCEIV=y
CONFIG_USB_GPIO_VBUS=y
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_VBUS_DRAW=2
CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS=2
CONFIG_USB_LIBCOMPOSITE=y
CONFIG_USB_U_ETHER=y
CONFIG_USB_F_ECM=y
CONFIG_USB_F_SUBSET=y
CONFIG_USB_ETH=y

-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-07 Thread Ville Syrjälä
On Wed, May 07, 2014 at 04:48:48PM +0300, Mathias Nyman wrote:
> On 05/06/2014 02:41 PM, Ville Syrjälä wrote:
> > On Mon, May 05, 2014 at 12:32:22PM -0700, Julius Werner wrote:
> >> Hmmm... very odd. I unfortunately don't have a machine that can easily
> >> do S4 at hand, but I did test this on an IVB with XHCI_RESET_ON_RESUME
> >> in S3 (essentially the same code path), and I didn't run into any
> >> problems.
> >>
> >> How exactly does your machine fail on resume? Is it a kernel crash or
> >> just a hang? Can you try getting some debug output (by setting 'echo N
> >>> /sys/module/printk/parameters/console_suspend' and trying to catch
> >> the crash on the screen or a serial line, or maybe through pstore)? I
> >> really don't see much that could go wrong with this patch, so without
> >> more info it will be hard to understand your problem.
> >>
> >> Also, I noticed that you have two HID devices plugged in during
> >> suspend. Does it make a difference if you have different devices (e.g.
> >> a mass storage stick) or none at all?
> >
> > Looks like it doesn't like it when there's anything plugged into the
> > "SS" ports. I tried with just a HID keyboard or with just a hub. In
> > both cases it fails to resume. If I have nothing connected to the "SS"
> > ports then it resumes just fine.
> >
> > I managed to catch something with ramoops. Looks like it's hitting
> > POISON_FREE when trying to delete some list entry.
> >
> 
> > <4>[  107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from 
> > BW list!
> > <4>[  107.047574] general protection fault:  [#1] PREEMPT SMP
> 
> I took a look at the xhci_mem_cleanup() function and to me it looks
> like it tries to access a list_head that is already freed.
> 
> The struct list_head xhci->devs[].eps[].bw_endpoint_list is added to an 
> endpoint 
> list in xhci->rh_bw[].bw_table.interval_bw[].endpoints
> 
> xhci_mem_cleanup() frees all devices (the allocated xhci->devs[], containing 
> the 
> bw_endpoint_list) before it starts to loop through, and delete entries from 
> the 
> xhci->rh_bw[].bw_table.interval_bw[].endpoints list.
> 
> I can't see how this relates to Julius patch though, and I'm not sure yet why 
> it 
> only triggers when devices are connected to SS ports. Maybe just unlucky 
> timing?

I think the non-SS ports are connected to the EHCI controllers rather
than the XHCI controllers. So that explains at least one detail. And I
guess timing is as good an excuse as any why this gets exposed by the
patch in question.

> 
> Does this help?:

Indeed it does. The machine just survived a dozen or so suspend+resume
cycles without a hitch. The bug was 100% reproducible on this machine,
so the fix seems solid.

Tested-by: Ville Syrjälä 

> 
> diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
> index c089668..b1a8a5f 100644
> --- a/drivers/usb/host/xhci-mem.c
> +++ b/drivers/usb/host/xhci-mem.c
> @@ -1822,6 +1822,16 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  kfree(cur_cd);
>  }
> 
> +   num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
> +   for (i = 0; i < num_ports; i++) {
> +   struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
> +   for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
> +   struct list_head *ep = &bwt->interval_bw[j].endpoints;
> +   while (!list_empty(ep))
> +   list_del_init(ep->next);
> +   }
> +   }
> +
>  for (i = 1; i < MAX_HC_SLOTS; ++i)
>  xhci_free_virt_device(xhci, i);
> 
> @@ -1857,16 +1867,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
>  if (!xhci->rh_bw)
>  goto no_bw;
> 
> -   num_ports = HCS_MAX_PORTS(xhci->hcs_params1);
> -   for (i = 0; i < num_ports; i++) {
> -   struct xhci_interval_bw_table *bwt = &xhci->rh_bw[i].bw_table;
> -   for (j = 0; j < XHCI_MAX_INTERVAL; j++) {
> -   struct list_head *ep = &bwt->interval_bw[j].endpoints;
> -   while (!list_empty(ep))
> -   list_del_init(ep->next);
> -   }
> -   }
> -
>  for (i = 0; i < num_ports; i++) {
>  struct xhci_tt_bw_info *tt, *n;
>  list_for_each_entry_safe(tt, n, &xhci->rh_bw[i].tts, 
> tt_list) {

-- 
Ville Syrjälä
Intel OTC
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [regression 3.15-rc3] Resume from s4 broken by 1f81b6d22a5980955b01e08cf27fb745dc9b686f

2014-05-06 Thread Ville Syrjälä
 106.911605] smpboot: Booting Node 0 Processor 7 APIC 0x7
<6>[  106.923408] kvm: enabling virtualization on CPU7
<6>[  106.928161] microcode: CPU7 sig=0x306a9, pf=0x2, revision=0x19
<6>[  106.934883] CPU7 is up
<6>[  106.957959] ACPI: Waking up from system sleep state S4
<6>[  106.990680] PM: noirq restore of devices complete after 11.474 msecs
<6>[  106.993975] PM: early restore of devices complete after 3.024 msecs
<4>[  107.046519] usb usb3: root hub lost power or was reset
<4>[  107.046549] usb usb1: root hub lost power or was reset
<4>[  107.046694] usb usb4: root hub lost power or was reset
<4>[  107.047230] xhci_hcd :00:14.0: Slot 1 endpoint 2 not removed from BW 
list!
<4>[  107.047574] general protection fault:  [#1] PREEMPT SMP 
<4>[  107.047768] usb usb2: root hub lost power or was reset
<4>[  107.048277] 
<4>[  107.049138] Modules linked in: x86_pkg_temp_thermal coretemp kvm_intel 
kvm aesni_intel aes_x86_64 glue_helper iTCO_wdt lrw i915 snd_hda_codec_hdmi
<4>[  107.050193] usb usb5: root hub lost power or was reset
<4>[  107.050196] usb usb6: root hub lost power or was reset
<7>[  107.050410] snd_hda_intel :00:1b.0: irq 44 for MSI/MSI-X
<7>[  107.050453] ehci-pci :00:1a.0: cache line size of 64 is not supported
<7>[  107.051666] ehci-pci :00:1d.0: cache line size of 64 is not supported
<7>[  107.052951] xhci_hcd :04:00.0: irq 45 for MSI/MSI-X
<7>[  107.053018] xhci_hcd :04:00.0: irq 46 for MSI/MSI-X
<7>[  107.053078] xhci_hcd :04:00.0: irq 47 for MSI/MSI-X
<7>[  107.053137] xhci_hcd :04:00.0: irq 48 for MSI/MSI-X
<7>[  107.053197] xhci_hcd :04:00.0: irq 49 for MSI/MSI-X
<7>[  107.054698] xhci_hcd :04:00.0: irq 50 for MSI/MSI-X
<7>[  107.054763] xhci_hcd :04:00.0: irq 51 for MSI/MSI-X
<7>[  107.054830] xhci_hcd :04:00.0: irq 52 for MSI/MSI-X
<4>[  107.057882]  snd_hda_codec_realtek hid_generic snd_hda_codec_generic 
gf128mul snd_hda_intel ablk_helper usbhid cryptd snd_hda_controller hid 
cfbfillrect snd_hda_codec cfbimgblt snd_hwdep snd_pcm i2c_algo_bit cfbcopyarea 
snd_timer drm_kms_helper psmouse snd drm lpc_ich pcspkr i2c_i801 mfd_core 
soundcore wmi evdev
<4>[  107.062030] CPU: 2 PID: 756 Comm: kworker/u16:2 Tainted: GW 
3.15.0-rc4-hang #16
<5>[  107.062068] sd 0:0:0:0: [sda] Starting disk
<4>[  107.062992] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0100.2013.0517.0942 05/17/2013
<4>[  107.063454] Workqueue: events_unbound async_run_entry_fn
<4>[  107.063892] task: 88007cb58000 ti: 88007b6fc000 task.ti: 
88007b6fc000
<4>[  107.064311] RIP: 0010:[]  [] 
__list_del_entry+0x29/0xd0
<4>[  107.064942] RSP: :88007b6fdb48  EFLAGS: 00010a83
<4>[  107.065446] RAX: 6b6b6b6b6b6b6b6b RBX: 88007b0ba1e8 RCX: 
dead00200200
<4>[  107.065962] RDX: 6b6b6b6b6b6b6b6b RSI: 88007cb58750 RDI: 
88007ac40318
<4>[  107.066479] RBP: 88007b6fdb48 R08:  R09: 
0001
<4>[  107.067024] R10:  R11:  R12: 
0003
<4>[  107.067528] R13: 88007b108000 R14: 88007b0ba160 R15: 
88007ac40318
<4>[  107.068051] FS:  () GS:88007fc8() 
knlGS:
<4>[  107.068560] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  107.068901] CR2:  CR3: 01a0e000 CR4: 
001407e0
<4>[  107.069247] Stack:
<4>[  107.069563]  88007b6fdbb8 81423d08 0001 
88007cad2290
<4>[  107.070156]  88007b6fdb88 812e4815 1580 

<4>[  107.070929]  88007b6fdbb8 88007b108000 88007b0f7978 
88007b0f7978
<4>[  107.071788] Call Trace:
<4>[  107.072260]  [] xhci_mem_cleanup+0x428/0x610
<4>[  107.072756]  [] ? pci_disable_msi+0x45/0x50
<4>[  107.073193]  [] xhci_resume+0x210/0x3a0
<4>[  107.073618]  [] ? usb_enable_intel_xhci_ports+0xbd/0xd0
<4>[  107.074245]  [] xhci_pci_resume+0x36/0x50
<4>[  107.074837]  [] resume_common+0xa1/0x150
<4>[  107.075460]  [] ? pci_pm_default_resume+0x50/0x50
<4>[  107.076103]  [] hcd_pci_restore+0x13/0x20
<4>[  107.076692]  [] pci_pm_restore+0x88/0xf0
<4>[  107.077282]  [] ? device_resume+0x6c/0x1a0
<4>[  107.077902]  [] dpm_run_callback+0x3a/0xe0
<4>[  107.078489]  [] ? _raw_spin_unlock_irq+0x30/0x60
<4>[  107.079119]  [] device_resume+0xae/0x1a0
<4>[  107.079741]  [] async_resume+0x21/0x50
<4>[  107.080171]  [] async_run_entry_fn+0x46/0x140
<4>[  107.080792]  [] process_one_work+0x1f4/0x530
<4>[  107.081387]  [] ? process_one_work+0x189/0x530
<4>[  107.082019]  [] worker_thread+0x11c/0x370
<4&g