Hi, On Sat, Oct 28, 2017 at 8:51 AM, Stefan Wahren <[email protected]> wrote: > Hi Doug, > > [add Felipe since this should go through his tree]
Ah. Sorry Felipe! I know you've landed some dwc2 stuff in the past but for some reason get_maintainer didn't ID you so I thought maybe you weren't doing it anymore. Please let me know if you'd like me to send this to you again with collected Reviewed-by and Tested-by tags. > i'm back from vacation and happy to see this patch. > >> Douglas Anderson <[email protected]> hat am 26. Oktober 2017 um 22:22 >> geschrieben: >> >> >> On rk3288-veyron devices on Chrome OS it was found that plugging in an >> Arduino-based USB device could cause the system to lockup, especially >> if the CPU Frequency was at one of the slower operating points (like >> 100 MHz / 200 MHz). >> >> Upon tracing, I found that the following was happening: >> * The USB device (full speed) was connected to a high speed hub and >> then to the rk3288. Thus, we were dealing with split transactions, >> which is all handled in software on dwc2. >> * Userspace was initiating a BULK IN transfer >> * When we sent the SSPLIT (to start the split transaction), we got an >> ACK. Good. Then we issued the CSPLIT. >> * When we sent the CSPLIT, we got back a NAK. We immediately (from >> the interrupt handler) started to retry and sent another SSPLIT. >> * The device kept NAKing our CSPLIT, so we kept ping-ponging between >> sending a SSPLIT and a CSPLIT, each time sending from the interrupt >> handler. >> * The handling of the interrupts was (because of the low CPU speed and >> the inefficiency of the dwc2 interrupt handler) was actually taking >> _longer_ than it took the other side to send the ACK/NAK. Thus we >> were _always_ in the USB interrupt routine. >> * The fact that USB interrupts were always going off was preventing >> other things from happening in the system. This included preventing >> the system from being able to transition to a higher CPU frequency. >> >> As I understand it, there is no requirement to retry super quickly >> after a NAK, we just have to retry sometime in the future. Thus one >> solution to the above is to just add a delay between getting a NAK and >> retrying the transmission. If this delay is sufficiently long to get >> out of the interrupt routine then the rest of the system will be able >> to make forward progress. Even a 25 us delay would probably be >> enough, but we'll be extra conservative and try to delay 1 ms (the >> exact amount depends on HZ and the accuracy of the jiffy and how close >> the current jiffy is to ticking, but could be as much as 20 ms or as >> little as 1 ms). >> >> Presumably adding a delay like this could impact the USB throughput, >> so we only add the delay with repeated NAKs. >> >> NOTE: Upon further testing of a pl2303 serial adapter, I found that >> this fix may help with problems there. Specifically I found that the >> pl2303 serial adapters tend to respond with a NAK when they have >> nothing to say and thus we end with this same sequence. > > This patch fixes "dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for > 146s" on bcm2835. After applying your patch on linux-next plus "mm: drop > migrate type checks from has_unmovable_pages" i'm able to open 3 PL2303 > serial adapters parallel on a Raspberry Pi 1 B without any issues. Woohoo! It only took a year from your report of this issue to getting a fix. ;) It sounds as if Julius found something the other day too (dealing better with stalls), so any day now we'll have a mainline kernel that can keep up with "blazingly fast" UART connections. :-P > Tested-by: Stefan Wahren <[email protected]> Thank you so much for the testing! > Please see my comment below > >> >> Signed-off-by: Douglas Anderson <[email protected]> >> Reviewed-by: Julius Werner <[email protected]> >> --- >> >> Changes in v2: >> - Address http://crosreview.com/737520 feedback >> >> drivers/usb/dwc2/core.h | 1 + >> drivers/usb/dwc2/hcd.c | 7 ++++ >> drivers/usb/dwc2/hcd.h | 9 +++++ >> drivers/usb/dwc2/hcd_intr.c | 20 +++++++++++ >> drivers/usb/dwc2/hcd_queue.c | 81 >> +++++++++++++++++++++++++++++++++++++++++--- >> 5 files changed, 114 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h >> index 8367d4f985c1..c0807e558726 100644 >> --- a/drivers/usb/dwc2/core.h >> +++ b/drivers/usb/dwc2/core.h >> @@ -962,6 +962,7 @@ struct dwc2_hsotg { >> } flags; >> >> struct list_head non_periodic_sched_inactive; >> + struct list_head non_periodic_sched_waiting; >> struct list_head non_periodic_sched_active; >> struct list_head *non_periodic_qh_ptr; >> struct list_head periodic_sched_inactive; >> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c >> index c2631145f404..ed92ae78dcf9 100644 >> --- a/drivers/usb/dwc2/hcd.c >> +++ b/drivers/usb/dwc2/hcd.c >> @@ -653,6 +653,10 @@ static void dwc2_dump_channel_info(struct dwc2_hsotg >> *hsotg, >> list_for_each_entry(qh, &hsotg->non_periodic_sched_inactive, >> qh_list_entry) >> dev_dbg(hsotg->dev, " %p\n", qh); >> + dev_dbg(hsotg->dev, " NP waiting sched:\n"); >> + list_for_each_entry(qh, &hsotg->non_periodic_sched_waiting, >> + qh_list_entry) >> + dev_dbg(hsotg->dev, " %p\n", qh); >> dev_dbg(hsotg->dev, " NP active sched:\n"); >> list_for_each_entry(qh, &hsotg->non_periodic_sched_active, >> qh_list_entry) >> @@ -1812,6 +1816,7 @@ static void dwc2_qh_list_free(struct dwc2_hsotg *hsotg, >> static void dwc2_kill_all_urbs(struct dwc2_hsotg *hsotg) >> { >> dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->non_periodic_sched_inactive); >> + dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->non_periodic_sched_waiting); >> dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->non_periodic_sched_active); >> dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->periodic_sched_inactive); >> dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->periodic_sched_ready); >> @@ -4989,6 +4994,7 @@ static void dwc2_hcd_free(struct dwc2_hsotg *hsotg) >> >> /* Free memory for QH/QTD lists */ >> dwc2_qh_list_free(hsotg, &hsotg->non_periodic_sched_inactive); >> + dwc2_qh_list_free(hsotg, &hsotg->non_periodic_sched_waiting); >> dwc2_qh_list_free(hsotg, &hsotg->non_periodic_sched_active); >> dwc2_qh_list_free(hsotg, &hsotg->periodic_sched_inactive); >> dwc2_qh_list_free(hsotg, &hsotg->periodic_sched_ready); >> @@ -5151,6 +5157,7 @@ int dwc2_hcd_init(struct dwc2_hsotg *hsotg) >> >> /* Initialize the non-periodic schedule */ >> INIT_LIST_HEAD(&hsotg->non_periodic_sched_inactive); >> + INIT_LIST_HEAD(&hsotg->non_periodic_sched_waiting); >> INIT_LIST_HEAD(&hsotg->non_periodic_sched_active); >> >> /* Initialize the periodic schedule */ >> diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h >> index 11c3c145b793..af4f4360e8b7 100644 >> --- a/drivers/usb/dwc2/hcd.h >> +++ b/drivers/usb/dwc2/hcd.h >> @@ -313,12 +313,16 @@ struct dwc2_hs_transfer_time { >> * descriptor and indicates original XferSize value >> for the >> * descriptor >> * @unreserve_timer: Timer for releasing periodic reservation. >> + * @wait_timer: Timer used to wait before re-queuing. >> * @dwc2_tt: Pointer to our tt info (or NULL if no tt). >> * @ttport: Port number within our tt. >> * @tt_buffer_dirty True if clear_tt_buffer_complete is pending >> * @unreserve_pending: True if we planned to unreserve but haven't yet. >> * @schedule_low_speed: True if we have a low/full speed component (either >> the >> * host is in low/full speed mode or do_split). >> + * @want_wait: We should wait before re-queuing; only matters for >> non- >> + * periodic transfers and is ignored for periodic ones. >> + * @wait_timer_cancel: Set to true to cancel the wait_timer. >> * >> * A Queue Head (QH) holds the static characteristics of an endpoint and >> * maintains a list of transfers (QTDs) for that endpoint. A QH structure >> may >> @@ -353,11 +357,14 @@ struct dwc2_qh { >> u32 desc_list_sz; >> u32 *n_bytes; >> struct timer_list unreserve_timer; >> + struct timer_list wait_timer; >> struct dwc2_tt *dwc_tt; >> int ttport; >> unsigned tt_buffer_dirty:1; >> unsigned unreserve_pending:1; >> unsigned schedule_low_speed:1; >> + unsigned want_wait:1; >> + unsigned wait_timer_cancel:1; >> }; >> >> /** >> @@ -388,6 +395,7 @@ struct dwc2_qh { >> * @n_desc: Number of DMA descriptors for this QTD >> * @isoc_frame_index_last: Last activated frame (packet) index, used in >> * descriptor DMA mode only >> + * @num_naks: Number of NAKs received on this QTD. >> * @urb: URB for this transfer >> * @qh: Queue head for this QTD >> * @qtd_list_entry: For linking to the QH's list of QTDs >> @@ -418,6 +426,7 @@ struct dwc2_qtd { >> u8 error_count; >> u8 n_desc; >> u16 isoc_frame_index_last; >> + u16 num_naks; >> struct dwc2_hcd_urb *urb; >> struct dwc2_qh *qh; >> struct list_head qtd_list_entry; >> diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c >> index 28a8210710b1..aad3abd2ddf2 100644 >> --- a/drivers/usb/dwc2/hcd_intr.c >> +++ b/drivers/usb/dwc2/hcd_intr.c >> @@ -52,6 +52,12 @@ >> #include "core.h" >> #include "hcd.h" >> >> +/* >> + * If we get this many NAKs on a split transaction we'll slow down >> + * retransmission. A 1 here means delay after the first NAK. >> + */ >> +#define DWC2_NAKS_BEFORE_DELAY 3 >> + >> /* This function is for debug only */ >> static void dwc2_track_missed_sofs(struct dwc2_hsotg *hsotg) >> { >> @@ -1200,11 +1206,25 @@ static void dwc2_hc_nak_intr(struct dwc2_hsotg >> *hsotg, >> /* >> * Handle NAK for IN/OUT SSPLIT/CSPLIT transfers, bulk, control, and >> * interrupt. Re-start the SSPLIT transfer. >> + * >> + * Normally for non-periodic transfers we'll retry right away, but to >> + * avoid interrupt storms we'll wait before retrying if we've got >> + * several NAKs. If we didn't do this we'd retry directly from the >> + * interrupt handler and could end up quickly getting another >> + * interrupt (another NAK), which we'd retry. >> + * >> + * Note that in DMA mode software only gets involved to re-send NAKed >> + * transfers for split transactions, so we only need to apply this >> + * delaying logic when handling splits. In non-DMA mode presumably we >> + * might want a similar delay if someone can demonstrate this problem >> + * affects that code path too. >> */ >> if (chan->do_split) { >> if (chan->complete_split) >> qtd->error_count = 0; >> qtd->complete_split = 0; >> + qtd->num_naks++; >> + qtd->qh->want_wait = qtd->num_naks >= DWC2_NAKS_BEFORE_DELAY; >> dwc2_halt_channel(hsotg, chan, qtd, DWC2_HC_XFER_NAK); >> goto handle_nak_done; >> } >> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c >> index 3ae8b1bbaa55..bea0aadd756e 100644 >> --- a/drivers/usb/dwc2/hcd_queue.c >> +++ b/drivers/usb/dwc2/hcd_queue.c >> @@ -57,6 +57,9 @@ >> /* Wait this long before releasing periodic reservation */ >> #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5)) >> >> +/* If we get a NAK, wait this long before retrying */ >> +#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1)) >> + >> /** >> * dwc2_periodic_channel_available() - Checks that a channel is available >> for a >> * periodic transfer >> @@ -1439,6 +1442,55 @@ static void dwc2_deschedule_periodic(struct >> dwc2_hsotg *hsotg, >> list_del_init(&qh->qh_list_entry); >> } >> >> +/** >> + * dwc2_wait_timer_fn() - Timer function to re-queue after waiting >> + * >> + * As per the spec, a NAK indicates that "a function is temporarily unable >> to >> + * transmit or receive data, but will eventually be able to do so without >> need >> + * of host intervention". >> + * >> + * That means that when we encounter a NAK we're supposed to retry. >> + * >> + * ...but if we retry right away (from the interrupt handler that saw the >> NAK) >> + * then we can end up with an interrupt storm (if the other side keeps >> NAKing >> + * us) because on slow enough CPUs it could take us longer to get out of the >> + * interrupt routine than it takes for the device to send another NAK. That >> + * leads to a constant stream of NAK interrupts and the CPU locks. >> + * >> + * ...so instead of retrying right away in the case of a NAK we'll set a >> timer >> + * to retry some time later. This function handles that timer and moves the >> + * qh back to the "inactive" list, then queues transactions. >> + * >> + * @data: Pointer to a qh to re-schedule. >> + */ >> +static void dwc2_wait_timer_fn(unsigned long data) >> +{ >> + struct dwc2_qh *qh = (struct dwc2_qh *)data; >> + struct dwc2_hsotg *hsotg = qh->hsotg; >> + unsigned long flags; >> + >> + spin_lock_irqsave(&hsotg->lock, flags); >> + >> + /* >> + * We'll set wait_timer_cancel to true if we want to cancel this >> + * operation in dwc2_hcd_qh_unlink(). >> + */ >> + if (!qh->wait_timer_cancel) { >> + enum dwc2_transaction_type tr_type; >> + >> + qh->want_wait = false; >> + >> + list_move(&qh->qh_list_entry, >> + &hsotg->non_periodic_sched_inactive); >> + >> + tr_type = dwc2_hcd_select_transactions(hsotg); >> + if (tr_type != DWC2_TRANSACTION_NONE) >> + dwc2_hcd_queue_transactions(hsotg, tr_type); >> + } >> + >> + spin_unlock_irqrestore(&hsotg->lock, flags); >> +} >> + >> /** >> * dwc2_qh_init() - Initializes a QH structure >> * >> @@ -1468,6 +1520,7 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, >> struct dwc2_qh *qh, >> qh->hsotg = hsotg; >> setup_timer(&qh->unreserve_timer, dwc2_unreserve_timer_fn, >> (unsigned long)qh); >> + setup_timer(&qh->wait_timer, dwc2_wait_timer_fn, (unsigned long)qh); > > Kees Cook started to replace setup_timer() with the new timer_setup(). I'm > not sure if it's okay to use the deprecated function and replace all of them > later. I'm happy to do it either way. Personally I'd rather see my patch land as-is and then have a followup patch that converts to timer_setup(). That makes it easier for people like me who are always backporting to older kernels. I'm also happy to either post that patch myself or have someone else do it. -Doug

