Hello Felipe,

I am not using DWC3.

I have new information about the problem.
The context was this:
  1) device boots
  2) some USB transfers happen, all are OK
  3) a device app runs to completion (USB is quiescent during this time; no
     USB transfers are required)
  4) the controlling PC starts a 4 KByte USB transfer to the device, but this
     transfer does not finish. Only 3 KBytes are ACK'd by the device.
     (A USB analyzer shows the host trying to send more, but the device
     persistently NAKs.)

  If step (3) is omitted, everything works fine.

The new information is that step (3) consumes a lot of memory. The theory is
that the OS is evicting the user-space USB code pages from RAM and reusing that
RAM for NFS files (the root file system is NFS mounted).

Continuing with this theory:
After the USB-related code pages are no longer in RAM, a USB transaction
happens.
Now there is a race condition between:
1)      The USB transaction-complete interrupt and the OS call into the
user-space USB-related code (functionfs, aio, etc.)
2)      The Linux paging system trying to page the user-space USB-related code
back into RAM

The theory is that (1) can happen before (2). 
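
One way to test this theory might be to count major page faults (faults that
need I/O to resolve) in the user-space USB process around step (4). The sketch
below is only an illustration: major_faults_so_far() is a hypothetical helper
name, not part of my current code, and it relies on the standard getrusage()
call.

```c
#include <sys/resource.h>

/* Hypothetical helper: returns the number of major page faults this
 * process has taken so far, or -1 if getrusage() fails. */
long major_faults_so_far(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    /* ru_majflt counts page faults that required I/O to service. */
    return ru.ru_majflt;
}
```

If the value jumps between the end of step (3) and the stalled transfer in
step (4), that would be consistent with the code pages having been evicted. If
it does not change, the cause is probably elsewhere.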

As I understand it, the techniques below may solve the problem: 
* USB user-space code calls mlockall()
* Change system swappiness
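
For the mlockall() option, a minimal sketch of what I have in mind is below.
lock_process_memory() is a hypothetical helper name; the intent is to call it
once at start-up of the user-space USB process, before any functionfs or aio
setup. I am assuming the process has sufficient privilege (CAP_IPC_LOCK or a
large enough RLIMIT_MEMLOCK), otherwise the call fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel to keep all pages of this process
 * resident in RAM. MCL_CURRENT covers pages mapped now, MCL_FUTURE covers
 * pages mapped later (heap growth, new mappings, libraries loaded later).
 * Returns 0 on success, or a negative errno value on failure. */
int lock_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        int err = errno;

        fprintf(stderr, "mlockall failed: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

This locks the whole address space, including shared library code, so the
memory footprint of the process should be checked before relying on it.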


AP 



> -----Original Message-----
> From: Felipe Balbi [mailto:felipe.ba...@linux.intel.com]
> Sent: Tuesday, October 31, 2017 4:19 AM
> To: PURCELL,ANDY (K-Loveland,ex1) <andy_purc...@keysight.com>; linux-
> u...@vger.kernel.org
> Subject: Re: Linux and usb device drivers using functionfs
> 
> 
> Hi,
> 
> andy_purc...@keysight.com writes:
> > I have implemented a USB device function using Linux functionfs and
> > now there is a problem being reported.
> >
> > I need to ask this group for advice.
> >
> > The problem is this:
> > 1) device boots
> >
> > 2) some usb transfers happen, all are OK
> >
> > 3) a device app runs to completion (USB quiescent during this time, no
> > USB transfers required)
> >
> > 4) the controlling PC starts a 4 KByte USB transfer to the device, but
> > this transfer does not finish. Only 3 Kbytes are ACK'd by the device.
> >
> >      (A USB analyzer shows the host trying to send more, but the
> >      device persistently NAK's)
> >
> > If step (3) is omitted, everything works fine. It is reliable - 15/15
> > times it is OK.
> >
> > The USB device function is implemented with functionfs and aio. Most
> > of the implementation is in user space.
> >
> > An off-the-shelf low level Linux driver is being used.
> >
> > Regression tests show no problems with various sized USB transfers for
> > over 24 hours.
> 
> Okay, let's try to figure out what's going on. Are you using dwc3, by any
> chance? If you are, can you capture tracepoints of the failing case?
> 
> While it could be something on the application side, I want to be sure the
> controller is behaving properly.
> 
> For details on how to capture tracepoints, see [1] below.
> 
> > A colleague has investigated and has asserted user space is not the
> > right way to do things.
> >
> > He says:
> >
> > "It appeared that running the <device app> was enough to swap the usb
> > code out that it wasn't able to swap back in quick enough to respond
> > to the USB traffic in a timely fashion" .... "This is the major
> > drawback to user space drivers as opposed to kernel drivers.  Kernel
> > drivers pages are locked into memory while user space can be swapped
> > out.  There were numerous articles about this, but the best one I
> > found was:
> >
> > http://www.makelinux.net/ldd3/chp-2-sect-9 "
> >
> >     Linux Device Drivers, 3rd Edition, By Jonathan Corbet,
> >     Greg Kroah-Hartman, Alessandro Rubini : February 2005
> >
> > "The pertinent part is:
> >
> > o    Response time is slower, because a context switch is required to
> > transfer information or actions between the client and the hardware.
> >
> > o    Worse yet, if the driver has been swapped to disk, response time
> > is unacceptably long. Using the mlock system call might help, but
> > usually you'll need to lock many memory pages, because a user-space
> > program depends on a lot of library code. mlock, too, is limited to
> > privileged users.
> >
> > Some articles I read stated that the swap could take seconds."
> >
> >
> > QUESTIONS:
> >
> > - Did I make a mistake using user space and functionfs?
> >   (I thought state-of-the-art way to do usb function drivers was to
> >   use functionfs...)
> 
> right, unless you can use some of the in-tree functions, it doesn't make
> sense to rely on an ever-changing internal API :-)
> 
> > - Should I add calls to mlock() to try to fix?
> 
> that's an easy enough test, yes :-)
> 
> > Any advice is appreciated.
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/dwc3.rst#n113
> 
> --
> balbi
