Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
On 05/16/2018, 03:36 PM, Jiri Slaby wrote: > So I assume it must be a problem of making usb->disconnect without prior > ndo->close (or alike). So according to my debug messages, I think this should workaround the problem: --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3962,7 +3962,8 @@ static int rtl8152_close(struct net_device *netdev) #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + if (!test_bit(RTL8152_UNPLUG, &tp->flags)) + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule); napi is deleted in usb->disconnect, then unregister_netdev is called which invokes netdev->ndo_stop, i.e. rtl8152_close above. And rtl8152_close tries to napi_disable, but that is already deleted. The patch does not solve the race between disconnect and ndo_stop AFAICS. It needs locking, IMO. I am not familiar enough with the code, but it looks like ->disconnect can happen any time while ->stop is in progress. thanks, -- js suse labs
Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
On 05/16/2018, 03:29 PM, Jiri Slaby wrote: > On 05/16/2018, 02:07 PM, Hayes Wang wrote: >> Oliver Neukum [mailto:oneu...@suse.com] >>> Sent: Wednesday, May 16, 2018 6:10 PM >> [...] Besides, I find a similar issue as following. https://www.spinics.net/lists/netdev/msg493512.html >>> >>> Well, if we have an imbalance in NAPI it should strike whereever >>> it is used, not just in suspend(). Is there debugging for NAPI >>> we could activate? >> >> No. The driver doesn't embed such debugging about it. > > Despite of that, Oliver, I have a kernel with a debug patch (attached) > at (suse-only link): > https://build.suse.de/project/show/home:jirislaby:stable-drm > >> And I don't find an imbalance between napi_disable() and napi_enable(). > > There is none, apparently (the warns never triggered). BUt still the > driver sucks wrt both power mgmt and dock plug/unplug. Since I am using > the patch (it upper-bounds the napi_disable loop count) and the udev > rule below, I can really use the nic. BTW the added warning to napi_disable indeed triggers: > xzgrep -a -B 2 kernel:.*WARNING.*napi messages-20180* > messages-20180503.xz:2018-04-27T09:57:00.048922+02:00 anemoi2 kernel: > [158616.363052] [ cut here ] > messages-20180503.xz:2018-04-27T09:57:00.048979+02:00 anemoi2 kernel: > [158616.363070] NAPI_STATE_SCHED never cleared > messages-20180503.xz:2018-04-27T09:57:00.048988+02:00 anemoi2 kernel: > [158616.363120] WARNING: CPU: 1 PID: 14365 at ../net/core/dev.c:5665 > napi_disable+0x3d/0x80 And since I do 'ip l set dev ethX down' before unplugging the dock with the NIC, I have not seen a single occurrence. So I assume it must be a problem of making usb->disconnect without prior ndo->close (or alike). thanks, -- js suse labs
Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
On 05/16/2018, 02:07 PM, Hayes Wang wrote: > Oliver Neukum [mailto:oneu...@suse.com] >> Sent: Wednesday, May 16, 2018 6:10 PM > [...] >>> Besides, I find a similar issue as following. >>> https://www.spinics.net/lists/netdev/msg493512.html >> >> Well, if we have an imbalance in NAPI it should strike whereever >> it is used, not just in suspend(). Is there debugging for NAPI >> we could activate? > > No. The driver doesn't embed such debugging about it. Despite of that, Oliver, I have a kernel with a debug patch (attached) at (suse-only link): https://build.suse.de/project/show/home:jirislaby:stable-drm > And I don't find an imbalance between napi_disable() and napi_enable(). There is none, apparently (the warns never triggered). BUt still the driver sucks wrt both power mgmt and dock plug/unplug. Since I am using the patch (it upper-bounds the napi_disable loop count) and the udev rule below, I can really use the nic. $ cat /etc/udev/rules.d/10-disable-r8152-pm.rules ACTION=="add", SUBSYSTEM=="usb", ATTR{idProduct}=="8153", ATTR{idVendor}=="0bda", TEST=="power/control", ATTR{power/control}="on" thanks, -- js suse labs --- drivers/net/usb/r8152.c | 62 +++- net/core/dev.c | 14 +- 2 files changed, 53 insertions(+), 23 deletions(-) --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -704,6 +704,8 @@ struct r8152 { unsigned long flags; struct usb_device *udev; struct napi_struct napi; + int napi_stat; + void *napi_last_en, *napi_last_dis; struct usb_interface *intf; struct net_device *netdev; struct urb *intr_urb; @@ -775,6 +777,31 @@ static unsigned int agg_buf_sz = 16384; #define RTL_LIMITED_TSO_SIZE (agg_buf_sz - sizeof(struct tx_desc) - \ VLAN_ETH_HLEN - ETH_FCS_LEN) +static void my_napi_enable(struct r8152 *tp) +{ + if (tp->napi_stat == 0) { + napi_enable(&tp->napi); + tp->napi_stat++; + tp->napi_last_en = (void *)_RET_IP_; + return; + } + + WARN(1, "napi_stat=%d\n", tp->napi_stat); +} + +static void my_napi_disable(struct r8152 *tp) +{ + if (tp->napi_stat == 1) { + napi_disable(&tp->napi); + tp->napi_stat--; + tp->napi_last_dis = (void *)_RET_IP_; + return; + } + + WARN(1, "napi_stat=%d last_dis=%pF last_en=%pF\n", + tp->napi_stat, tp->napi_last_en, tp->napi_last_dis); +} + static int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data) { @@ -3787,7 +3814,6 @@ static bool rtl8153_in_nway(struct r8152 static void set_carrier(struct r8152 *tp) { struct net_device *netdev = tp->netdev; - struct napi_struct *napi = &tp->napi; u8 speed; speed = rtl8152_get_speed(tp); @@ -3796,12 +3822,12 @@ static void set_carrier(struct r8152 *tp if (!netif_carrier_ok(netdev)) { tp->rtl_ops.enable(tp); netif_stop_queue(netdev); - napi_disable(napi); + my_napi_disable(tp); netif_carrier_on(netdev); rtl_start_rx(tp); clear_bit(RTL8152_SET_RX_MODE, &tp->flags); _rtl8152_set_rx_mode(netdev); - napi_enable(&tp->napi); + my_napi_enable(tp); netif_wake_queue(netdev); netif_info(tp, link, netdev, "carrier on\n"); } else if (netif_queue_stopped(netdev) && @@ -3811,9 +3837,9 @@ static void set_carrier(struct r8152 *tp } else { if (netif_carrier_ok(netdev)) { netif_carrier_off(netdev); - napi_disable(napi); + my_napi_disable(tp); tp->rtl_ops.disable(tp); - napi_enable(napi); + my_napi_enable(tp); netif_info(tp, link, netdev, "carrier off\n"); } } @@ -3934,7 +3960,7 @@ static int rtl8152_open(struct net_devic res); goto out_unlock; } - napi_enable(&tp->napi); + my_napi_enable(tp); mutex_unlock(&tp->control); @@ -3962,7 +3988,7 @@ static int rtl8152_close(struct net_devi #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + my_napi_disable(tp); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule); @@ -4230,7 +4256,7 @@ static int rtl8152_pre_reset(struct usb_ return 0; netif_stop_queue(netdev); - napi_disable(&tp->napi); + my_napi_disable(tp); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule); @@ -4264,7 +4290,7 @@ static int rtl8152_post_reset(struct usb mutex_unlock(&tp->control); } - napi_enable(&tp->napi); + my_napi_enable(tp); netif_wake_queue(netdev); usb_submit_urb(tp->intr_urb, GFP_KERNEL); @@ -4302,10 +4328,8 @@ static int rtl8152_runtime_resume(struct struct net_device *netdev = tp->netdev; if (netif_running(netdev) && netdev->flags & IFF_UP) { - struct napi_struct *napi = &tp->napi; - tp->rtl_ops.autosuspend_en(tp, false); - napi_disable(napi); + my_napi_disable(tp); set_bit(WORK_ENABLE, &tp->flags); if (netif_carrier_ok(netdev)) { @@ -4318,7 +4342,7 @@ static int rtl8152_runtime_resume(struct } } - napi_enable(napi); + my_napi_enable(tp); clear_bit(SELECTIVE_SUSPEND
RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
Oliver Neukum [mailto:oneu...@suse.com] > Sent: Wednesday, May 16, 2018 6:10 PM [...] > > Besides, I find a similar issue as following. > > https://www.spinics.net/lists/netdev/msg493512.html > > Well, if we have an imbalance in NAPI it should strike whereever > it is used, not just in suspend(). Is there debugging for NAPI > we could activate? No. The driver doesn't embed such debugging about it. And I don't find an imbalance between napi_disable() and napi_enable(). Best Regards, Hayes
Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
Am Mittwoch, den 16.05.2018, 10:00 + schrieb Hayes Wang: > Oliver Neukum [mailto:oneu...@suse.com] > > Sent: Wednesday, May 16, 2018 4:27 PM > > [...] > > > > > > Would usb_autopm_get_interface() take a long time? > > > The driver would wake the device if it has suspended. > > > I have no idea about how usb_autopm_get_interface() works, so I don't know > > > > how to help. > > > > Hi, > > > > it basically calls r8152_resume() and makes a control request to the > > hub. I think we are spinning in rtl8152_runtime_resume(), but where? > > It has a lot of NAPI stuff. Any suggestions on how to instrument or > > trace this? > > Is rtl8152_runtime_resume() called? I don't see the name in the trace. Good question. I see nothing else that could produce a live lock. > > I guess the relative API in rtl8152_runtime_resume() are > ops->disable= rtl8153_disable; > ops->autosuspend_en = rtl8153_runtime_enable; > > And I don't find any possible dead lock in rtl8152_runtime_resume(). > > Besides, I find a similar issue as following. > https://www.spinics.net/lists/netdev/msg493512.html Well, if we have an imbalance in NAPI it should strike whereever it is used, not just in suspend(). Is there debugging for NAPI we could activate? Regards Oliver
RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
Oliver Neukum [mailto:oneu...@suse.com] > Sent: Wednesday, May 16, 2018 4:27 PM [...] > > > > Would usb_autopm_get_interface() take a long time? > > The driver would wake the device if it has suspended. > > I have no idea about how usb_autopm_get_interface() works, so I don't know > how to help. > > Hi, > > it basically calls r8152_resume() and makes a control request to the > hub. I think we are spinning in rtl8152_runtime_resume(), but where? > It has a lot of NAPI stuff. Any suggestions on how to instrument or > trace this? Is rtl8152_runtime_resume() called? I don't see the name in the trace. I guess the relative API in rtl8152_runtime_resume() are ops->disable= rtl8153_disable; ops->autosuspend_en = rtl8153_runtime_enable; And I don't find any possible dead lock in rtl8152_runtime_resume(). Besides, I find a similar issue as following. https://www.spinics.net/lists/netdev/msg493512.html Best Regards, Hayes
Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
Am Mittwoch, den 16.05.2018, 03:37 + schrieb Hayes Wang: > Oliver Neukum [mailto:oneu...@suse.com] > > > > Hi, > > > > I got reports about hangs with this trace: > > > > May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more > > than 60 seconds. > > May 13 01:36:55 neroon kernel: Tainted: G U > > 4.17.0-rc4-1.g8257a00-vanilla #1 > > May 13 01:36:55 neroon kernel: "echo 0 > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > May 13 01:36:55 neroon kernel: kworker/0:0 D0 4 2 > > 0x8000 > > May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152] > > May 13 01:36:55 neroon kernel: Call Trace: > > May 13 01:36:55 neroon kernel: ? __schedule+0x289/0x880 > > May 13 01:36:55 neroon kernel: schedule+0x2f/0x90 > > May 13 01:36:55 neroon kernel: rpm_resume+0xf9/0x7a0 > > May 13 01:36:55 neroon kernel: ? wait_woken+0x80/0x80 > > May 13 01:36:55 neroon kernel: rpm_resume+0x547/0x7a0 > > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 > > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x34/0x70 > > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 > > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x34/0x70 > > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 > > May 13 01:36:55 neroon kernel: __pm_runtime_resume+0x3a/0x50 > > May 13 01:36:55 neroon kernel: usb_autopm_get_interface+0x1d/0x50 [usbcore] > > Would usb_autopm_get_interface() take a long time? > The driver would wake the device if it has suspended. > I have no idea about how usb_autopm_get_interface() works, so I don't know > how to help. Hi, it basically calls r8152_resume() and makes a control request to the hub. I think we are spinning in rtl8152_runtime_resume(), but where? It has a lot of NAPI stuff. Any suggestions on how to instrument or trace this? Regards Oliver
RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
Oliver Neukum [mailto:oneu...@suse.com] > > Hi, > > I got reports about hangs with this trace: > > May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more > than 60 seconds. > May 13 01:36:55 neroon kernel: Tainted: G U > 4.17.0-rc4-1.g8257a00-vanilla #1 > May 13 01:36:55 neroon kernel: "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > May 13 01:36:55 neroon kernel: kworker/0:0 D0 4 2 > 0x8000 > May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152] > May 13 01:36:55 neroon kernel: Call Trace: > May 13 01:36:55 neroon kernel: ? __schedule+0x289/0x880 > May 13 01:36:55 neroon kernel: schedule+0x2f/0x90 > May 13 01:36:55 neroon kernel: rpm_resume+0xf9/0x7a0 > May 13 01:36:55 neroon kernel: ? wait_woken+0x80/0x80 > May 13 01:36:55 neroon kernel: rpm_resume+0x547/0x7a0 > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x34/0x70 > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x34/0x70 > May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 > May 13 01:36:55 neroon kernel: __pm_runtime_resume+0x3a/0x50 > May 13 01:36:55 neroon kernel: usb_autopm_get_interface+0x1d/0x50 [usbcore] Would usb_autopm_get_interface() take a long time? The driver would wake the device if it has suspended. I have no idea about how usb_autopm_get_interface() works, so I don't know how to help. > May 13 01:36:55 neroon kernel: rtl_work_func_t+0x3c/0x27c [r8152] > May 13 01:36:55 neroon kernel: process_one_work+0x1d4/0x3f0 > May 13 01:36:55 neroon kernel: worker_thread+0x2b/0x3d0 > May 13 01:36:55 neroon kernel: ? process_one_work+0x3f0/0x3f0 > May 13 01:36:55 neroon kernel: kthread+0x113/0x130 > May 13 01:36:55 neroon kernel: ? > kthread_create_worker_on_cpu+0x50/0x50 > May 13 01:36:55 neroon kernel: ret_from_fork+0x3a/0x50 Best Regards, Hayes
Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
Hi, I got reports about hangs with this trace: May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more than 60 seconds. May 13 01:36:55 neroon kernel: Tainted: G U 4.17.0-rc4-1.g8257a00-vanilla #1 May 13 01:36:55 neroon kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 13 01:36:55 neroon kernel: kworker/0:0 D0 4 2 0x8000 May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152] May 13 01:36:55 neroon kernel: Call Trace: May 13 01:36:55 neroon kernel: ? __schedule+0x289/0x880 May 13 01:36:55 neroon kernel: schedule+0x2f/0x90 May 13 01:36:55 neroon kernel: rpm_resume+0xf9/0x7a0 May 13 01:36:55 neroon kernel: ? wait_woken+0x80/0x80 May 13 01:36:55 neroon kernel: rpm_resume+0x547/0x7a0 May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x34/0x70 May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x34/0x70 May 13 01:36:55 neroon kernel: ? __switch_to_asm+0x40/0x70 May 13 01:36:55 neroon kernel: __pm_runtime_resume+0x3a/0x50 May 13 01:36:55 neroon kernel: usb_autopm_get_interface+0x1d/0x50 [usbcore] May 13 01:36:55 neroon kernel: rtl_work_func_t+0x3c/0x27c [r8152] May 13 01:36:55 neroon kernel: process_one_work+0x1d4/0x3f0 May 13 01:36:55 neroon kernel: worker_thread+0x2b/0x3d0 May 13 01:36:55 neroon kernel: ? process_one_work+0x3f0/0x3f0 May 13 01:36:55 neroon kernel: kthread+0x113/0x130 May 13 01:36:55 neroon kernel: ? kthread_create_worker_on_cpu+0x50/0x50 May 13 01:36:55 neroon kernel: ret_from_fork+0x3a/0x50 Any idea? Regards Oliver