On Fri, Oct 29, 2021 at 4:27 PM Eugene Bordenkircher
<eugene_bordenkirc...@selinc.com> wrote:
>
> Typing Greg's email correct this time.  My apologies.
>
> Eugene
>
> -----Original Message-----
> From: Eugene Bordenkircher
> Sent: Friday, October 29, 2021 10:14 AM
> To: linux-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Cc: leoyang...@nxp.com; ba...@kernel.org; gre...@linuxfoundataion.org
> Subject: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to 
> unrecoverable loop.
>
> Hello all,
>
> We've discovered a situation where the FSL udc driver 
> (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating over the 
> request queue, but the queue has been corrupted at some point so it loops 
> infinitely.  I believe we have narrowed into the offending code, but we are 
> in need of assistance trying to find an appropriate fix for the problem.  The 
> identified code appears to be in all versions of the Linux kernel the driver 
> exists in.
>
> The problem appears to be when handling a USB_REQ_GET_STATUS request.  The 
> driver gets this request and then calls the ch9getstatus() function.  In this 
> function, it starts a request by "borrowing" the per device status_req, 
> filling it in, and then queuing it with a call to list_add_tail() to add the 
> request to the endpoint queue.  Right before it exits the function however, 
> it's calling ep0_prime_status(), which is filling out that same status_req 
> structure and then queuing it with another call to list_add_tail() to add the 
> request to the endpoint queue.  This adds two instances of the exact same 
> LIST_HEAD to the endpoint queue, which breaks the list since the prev and 
> next pointers end up pointing to the wrong things.  This ends up causing a 
> hard loop the next time nuke() gets called, which happens on the next setup 
> IRQ.
>

I agree with you that this looks problematic.  It was probably
introduced by commit f79a60b8785 ("usb: fsl_udc_core: prime status
stage once data stage has primed"), which did not take into account
that status_req is already re-used for the DATA phase.

I think the proper fix, given the above change, is to allocate a
separate request for the data phase so that it no longer shares
status_req with the status phase.

> I'm not sure what the appropriate fix to this problem is, mostly due to my 
> lack of expertise in USB and this driver stack.  The code has been this way 
> in the kernel for a very long time, which suggests that it has been working, 
> unless USB_REQ_GET_STATUS requests are never made.  This further suggests 
> that there is something else going on that I don't understand.  Deleting the 
> call to ep0_prime_status() and the following ep0stall() call appears, on the 
> surface, to get the device working again, but may have side effects that I'm 
> not seeing.
>
> I'm hopeful someone in the community can help provide some information on 
> what I may be missing or help come up with a solution to the problem.  A big 
> thank you to anyone who would like to help out.
>
> Eugene
