Re: [Bug 215129] New: Linux kernel hangs during power down
On 25.11.2021 01:46, Jakub Kicinski wrote:
> Adding Kalle and Hainer.
>
> On Wed, 24 Nov 2021 14:45:05 -0800 Stephen Hemminger wrote:
>> Begin forwarded message:
>>
>> Date: Wed, 24 Nov 2021 21:14:53 +
>> From: bugzilla-dae...@bugzilla.kernel.org
>> To: step...@networkplumber.org
>> Subject: [Bug 215129] New: Linux kernel hangs during power down
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>
>>             Bug ID: 215129
>>            Summary: Linux kernel hangs during power down
>>            Product: Networking
>>            Version: 2.5
>>     Kernel Version: 5.15
>>           Hardware: All
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>           Assignee: step...@networkplumber.org
>>           Reporter: martin.sto...@gmail.com
>>         Regression: No
>>
>> Created attachment 299703
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=299703&action=edit
>> Kernel log after timeout occured
>>
>> On my system the kernel is waiting for a task during shutdown which doesn't
>> complete.
>>
>> The commit which causes this behavior is:
>> [f32a213765739f2a1db319346799f130a3d08820] ethtool: runtime-resume netdev
>> parent before ethtool ioctl ops
>>
>> This bug causes also that the system gets unresponsive after starting Steam:
>> https://steamcommunity.com/app/221410/discussions/2/3194736442566303600/
>>

I think the reference to ath10k_pci is misleading, Kalle isn't needed here.
The actual issue is an RTNL deadlock in igb_resume(). See log snippet:

Nov 24 18:56:19 MartinsPc kernel:  igb_resume+0xff/0x1e0 [igb 21bf6a00cb1f20e9b0e8434f7f8748a0504e93f8]
Nov 24 18:56:19 MartinsPc kernel:  pci_pm_runtime_resume+0xa7/0xd0
Nov 24 18:56:19 MartinsPc kernel:  ? pci_pm_freeze_noirq+0x110/0x110
Nov 24 18:56:19 MartinsPc kernel:  __rpm_callback+0x41/0x120
Nov 24 18:56:19 MartinsPc kernel:  ? pci_pm_freeze_noirq+0x110/0x110
Nov 24 18:56:19 MartinsPc kernel:  rpm_callback+0x35/0x70
Nov 24 18:56:19 MartinsPc kernel:  rpm_resume+0x567/0x810
Nov 24 18:56:19 MartinsPc kernel:  __pm_runtime_resume+0x4a/0x80
Nov 24 18:56:19 MartinsPc kernel:  dev_ethtool+0xd4/0x2d80

We have at least two places in net core where runtime_resume() is called
under RTNL. This conflicts with the current structure in a few Intel drivers
that have something like the following in their resume path:

	rtnl_lock();
	if (!err && netif_running(netdev))
		err = __igb_open(netdev, true);

	if (!err)
		netif_device_attach(netdev);
	rtnl_unlock();

Other drivers don't do this, so the question is whether taking RTNL is
actually needed here. Some discussion was started [0], but it ended w/o
a tangible result and since then it has been surprisingly quiet.

[0] https://www.spinics.net/lists/netdev/msg736880.html

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
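The deadlock pattern described in this message can be sketched in plain userspace C. This is only a toy model under stated assumptions: every demo_* name is hypothetical and is not a kernel API, and the lock is reduced to a flag so that the nested acquisition fails visibly instead of hanging. In the kernel, RTNL is a non-recursive mutex, so the second rtnl_lock() from the same task would block forever rather than return an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive lock such as RTNL (hypothetical type). */
struct demo_lock {
	bool held;
};

/* Take the lock if it is free; report failure if it is already held. */
static bool demo_trylock(struct demo_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void demo_unlock(struct demo_lock *lock)
{
	lock->held = false;
}

/* Models a driver resume callback that takes the lock itself. */
static int demo_resume(struct demo_lock *rtnl)
{
	if (!demo_trylock(rtnl))
		return -1;	/* a real blocking lock would hang here */
	/* ... reopen and reattach the device ... */
	demo_unlock(rtnl);
	return 0;
}

/* Models a core path that runtime-resumes the device under the lock. */
static int demo_core_path(struct demo_lock *rtnl)
{
	int ret;

	if (!demo_trylock(rtnl))
		return -1;
	ret = demo_resume(rtnl);	/* nested acquisition attempt */
	demo_unlock(rtnl);
	return ret;
}

/* Resume works standalone but fails when entered with the lock held. */
static void demo_self_check(void)
{
	struct demo_lock rtnl = { false };

	assert(demo_resume(&rtnl) == 0);
	assert(demo_core_path(&rtnl) == -1);
}
```

Calling demo_resume() on its own succeeds, while reaching it through demo_core_path() fails, which mirrors the dev_ethtool to igb_resume chain in the trace above.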
Re: [Bug 215129] New: Linux kernel hangs during power down
Adding Kalle and Hainer.

On Wed, 24 Nov 2021 14:45:05 -0800 Stephen Hemminger wrote:
> Begin forwarded message:
>
> Date: Wed, 24 Nov 2021 21:14:53 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: step...@networkplumber.org
> Subject: [Bug 215129] New: Linux kernel hangs during power down
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>
>             Bug ID: 215129
>            Summary: Linux kernel hangs during power down
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 5.15
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: step...@networkplumber.org
>           Reporter: martin.sto...@gmail.com
>         Regression: No
>
> Created attachment 299703
>   --> https://bugzilla.kernel.org/attachment.cgi?id=299703&action=edit
> Kernel log after timeout occured
>
> On my system the kernel is waiting for a task during shutdown which doesn't
> complete.
>
> The commit which causes this behavior is:
> [f32a213765739f2a1db319346799f130a3d08820] ethtool: runtime-resume netdev
> parent before ethtool ioctl ops
>
> This bug causes also that the system gets unresponsive after starting Steam:
> https://steamcommunity.com/app/221410/discussions/2/3194736442566303600/
>
[PATCH v3] ath10k: Fix the MTU size on QCA9377 SDIO
On an imx6dl-pico-pi board with a QCA9377 SDIO chip, simply trying to
connect via ssh to another machine causes:

[ 55.824159] ath10k_sdio mmc1:0001:1: failed to transmit packet, dropping: -12
[ 55.832169] ath10k_sdio mmc1:0001:1: failed to submit frame: -12
[ 55.838529] ath10k_sdio mmc1:0001:1: failed to push frame: -12
[ 55.905863] ath10k_sdio mmc1:0001:1: failed to transmit packet, dropping: -12
[ 55.913650] ath10k_sdio mmc1:0001:1: failed to submit frame: -12
[ 55.919887] ath10k_sdio mmc1:0001:1: failed to push frame: -12

This leads to an ssh connection failure.

One user inspected the size of frames on Wireshark and reported
the following:

"I was able to narrow the issue down to the mtu. If I set the mtu for
the wlan0 device to 1486 instead of 1500, the issue does not happen.

The size of frames that I see on Wireshark is exactly 1500 after
setting it to 1486."

Clearing the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE flag avoids the problem
and the ssh command works successfully after that.

Introduce a 'credit_size_workaround' field to ath10k_hw_params for
the QCA9377 SDIO, so that the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE flag
is not set in this case.

Tested with QCA9377 SDIO with firmware WLAN.TF.1.1.1-00061-QCATFSWPZ-1.

Fixes: 2f918ea98606 ("ath10k: enable alt data of TX path for sdio")
Signed-off-by: Fabio Estevam
---
Changes since v2:
- Set the credit_size_workaround field as true for QCA9377 SDIO.

 drivers/net/wireless/ath/ath10k/core.c | 4 +++-
 drivers/net/wireless/ath/ath10k/hw.h   | 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 72a366aa9f60..8a325ae97b0e 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -571,6 +571,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
 		.ast_skid_limit = 0x10,
 		.num_wds_entries = 0x20,
 		.uart_pin_workaround = true,
+		.credit_size_workaround = true,
 		.dynamic_sar_support = false,
 	},
 	{
@@ -715,6 +716,7 @@ static void ath10k_send_suspend_complete(struct ath10k *ar)
 
 static int ath10k_init_sdio(struct ath10k *ar, enum ath10k_firmware_mode mode)
 {
+	bool mtu_workaround = ar->hw_params.credit_size_workaround;
 	int ret;
 	u32 param = 0;
 
@@ -732,7 +734,7 @@ static int ath10k_init_sdio(struct ath10k *ar, enum ath10k_firmware_mode mode)
 
 	param |= HI_ACS_FLAGS_SDIO_REDUCE_TX_COMPL_SET;
 
-	if (mode == ATH10K_FIRMWARE_MODE_NORMAL)
+	if (mode == ATH10K_FIRMWARE_MODE_NORMAL && !mtu_workaround)
 		param |= HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE;
 	else
 		param &= ~HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE;
diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 6b03c7787e36..591ef7416b61 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -618,6 +618,9 @@ struct ath10k_hw_params {
 	 */
 	bool uart_pin_workaround;
 
+	/* Workaround for the credit size calculation */
+	bool credit_size_workaround;
+
 	/* tx stats support over pktlog */
 	bool tx_stats_over_pktlog;
 
-- 
2.25.1
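The effect of the changed condition in ath10k_init_sdio() can be checked in isolation with a small standalone sketch. It assumes nothing about the real driver beyond the logic visible in the diff: the DEMO_* bit values and the demo_* names are hypothetical placeholders, not the definitions from the ath10k headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define DEMO_FLAG_REDUCE_TX_COMPL	(1u << 0)
#define DEMO_FLAG_ALT_DATA_CREDIT	(1u << 1)

/* Hypothetical stand-in for the firmware mode enum. */
enum demo_fw_mode {
	DEMO_MODE_NORMAL,
	DEMO_MODE_OTHER,
};

/*
 * Mirrors the flag handling from the patch: the alternate data credit
 * size flag is set only in normal mode and only when the per-chip
 * workaround is not requested; otherwise it is cleared.
 */
static uint32_t demo_acs_flags(uint32_t param, enum demo_fw_mode mode,
			       bool credit_size_workaround)
{
	param |= DEMO_FLAG_REDUCE_TX_COMPL;

	if (mode == DEMO_MODE_NORMAL && !credit_size_workaround)
		param |= DEMO_FLAG_ALT_DATA_CREDIT;
	else
		param &= ~DEMO_FLAG_ALT_DATA_CREDIT;

	return param;
}

/* Chips without the workaround keep the flag; chips with it lose it. */
static void demo_self_check(void)
{
	assert(demo_acs_flags(0, DEMO_MODE_NORMAL, false) ==
	       (DEMO_FLAG_REDUCE_TX_COMPL | DEMO_FLAG_ALT_DATA_CREDIT));
	assert(demo_acs_flags(0, DEMO_MODE_NORMAL, true) ==
	       DEMO_FLAG_REDUCE_TX_COMPL);
}
```

Keeping the decision in a per-chip hw_params field means other SDIO chips in the table, which leave the field at its default of false, still get the flag set in normal mode.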
Re: [PATCH] ath10k: Clean the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE flag
Hi Kalle and Wen,

On 24/11/2021 05:05, Kalle Valo wrote:

> Thanks, I was worried it's something like this. One way to solve this
> would be to add a new field to ath10k_hw_params so that the workaround
> is done only on QCA9377 SDIO.

Thanks for the feedback, appreciate it.

I have done as suggested in v2.

Thanks a lot,

Fabio Estevam
-- 
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-60 Fax: (+49)-8142-66989-80 Email: feste...@denx.de
[PATCH v2] ath10k: Fix the MTU size on QCA9377 SDIO
On an imx6dl-pico-pi board with a QCA9377 SDIO chip, simply trying to
connect via ssh to another machine causes:

[ 55.824159] ath10k_sdio mmc1:0001:1: failed to transmit packet, dropping: -12
[ 55.832169] ath10k_sdio mmc1:0001:1: failed to submit frame: -12
[ 55.838529] ath10k_sdio mmc1:0001:1: failed to push frame: -12
[ 55.905863] ath10k_sdio mmc1:0001:1: failed to transmit packet, dropping: -12
[ 55.913650] ath10k_sdio mmc1:0001:1: failed to submit frame: -12
[ 55.919887] ath10k_sdio mmc1:0001:1: failed to push frame: -12

This leads to an ssh connection failure.

One user inspected the size of frames on Wireshark and reported
the following:

"I was able to narrow the issue down to the mtu. If I set the mtu for
the wlan0 device to 1486 instead of 1500, the issue does not happen.

The size of frames that I see on Wireshark is exactly 1500 after
setting it to 1486."

Clearing the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE flag avoids the problem
and the ssh command works successfully after that.

Introduce a 'credit_size_workaround' field to ath10k_hw_params for
the QCA9377 SDIO, so that the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE flag
is not set in this case.

Tested with QCA9377 SDIO with firmware WLAN.TF.1.1.1-00061-QCATFSWPZ-1.

Fixes: 2f918ea98606 ("ath10k: enable alt data of TX path for sdio")
Signed-off-by: Fabio Estevam
---
Changes since v1:
- Restrict the workaround only for QCA9377 SDIO

 drivers/net/wireless/ath/ath10k/core.c | 3 ++-
 drivers/net/wireless/ath/ath10k/hw.h   | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 72a366aa9f60..5a936e643d7a 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -715,6 +715,7 @@ static void ath10k_send_suspend_complete(struct ath10k *ar)
 
 static int ath10k_init_sdio(struct ath10k *ar, enum ath10k_firmware_mode mode)
 {
+	bool mtu_workaround = ar->hw_params.credit_size_workaround;
 	int ret;
 	u32 param = 0;
 
@@ -732,7 +733,7 @@ static int ath10k_init_sdio(struct ath10k *ar, enum ath10k_firmware_mode mode)
 
 	param |= HI_ACS_FLAGS_SDIO_REDUCE_TX_COMPL_SET;
 
-	if (mode == ATH10K_FIRMWARE_MODE_NORMAL)
+	if (mode == ATH10K_FIRMWARE_MODE_NORMAL && !mtu_workaround)
 		param |= HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE;
 	else
 		param &= ~HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE;
diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 6b03c7787e36..591ef7416b61 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -618,6 +618,9 @@ struct ath10k_hw_params {
 	 */
 	bool uart_pin_workaround;
 
+	/* Workaround for the credit size calculation */
+	bool credit_size_workaround;
+
 	/* tx stats support over pktlog */
 	bool tx_stats_over_pktlog;
 
-- 
2.25.1
Re: [PATCH] ath10k: Clean the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE flag
Wen Gong writes:

> On 11/24/2021 3:46 PM, Kalle Valo wrote:
>> Fabio Estevam writes:
>>
>>> Hi Kalle,
>>>
>>> On Mon, Nov 15, 2021 at 3:06 PM Fabio Estevam wrote:
>>>> Hi Kalle,
>>>>
>>>> On Wed, Sep 15, 2021 at 1:05 PM Fabio Estevam wrote:
>>>>> On an imx6dl-pico-pi board with a QCA9377 SDIO chip, the following
>>>>> errors are observed when the board works in STA mode:
>>>>>
>>>>> Simply running "ssh user@192.168.0.1" causes:
>>>>>
>>>>> [ 55.824159] ath10k_sdio mmc1:0001:1: failed to transmit packet, dropping: -12
>>>>> [ 55.832169] ath10k_sdio mmc1:0001:1: failed to submit frame: -12
>>>>> [ 55.838529] ath10k_sdio mmc1:0001:1: failed to push frame: -12
>>>>> [ 55.905863] ath10k_sdio mmc1:0001:1: failed to transmit packet, dropping: -12
>>>>> [ 55.913650] ath10k_sdio mmc1:0001:1: failed to submit frame: -12
>>>>> [ 55.919887] ath10k_sdio mmc1:0001:1: failed to push frame: -12
>>>>>
>>>>> and it is not possible to connect via ssh to the other machine.
>>>>>
>>>>> One user inspected the size of frames on Wireshark and reported
>>>>> the followig:
>>>>>
>>>>> "I was able to narrow the issue down to the mtu. If I set the mtu for
>>>>> the wlan0 device to 1486 instead of 1500, the issue does not happen.
>>>>>
>>>>> The size of frames that I see on Wireshark is exactly 1500 after
>>>>> setting it to 1486."
>>>>>
>>>>> Clearing the HI_ACS_FLAGS_ALT_DATA_CREDIT_SIZE avoids the problem and
>>>>> the ssh command works successfully after that.
>>>>>
>>>>> Tested with QCA9377 SDIO with firmware WLAN.TF.1.1.1-00061-QCATFSWPZ-1.
>>>>>
>>>>> Fixes: 2f918ea98606 ("ath10k: enable alt data of TX path for sdio")
>>>>> Signed-off-by: Fabio Estevam
>>>> A gentle ping on this one.
>>> Any comments, please? Without this fix, we can not log via ssh to other
>>> machine.
>> I don't have much time for ath10k nowadays, so expect long delays in
>> reviews.
>>
>> I'm worried that this breaks QCA6174 SDIO support. Wen, what do you
>> think of this? Is this because of differences between firmware versions
>> or what?
>
> it is added by below commit, if disable it, will significant effect
> performance.

Thanks, I was worried it's something like this. One way to solve this
would be to add a new field to ath10k_hw_params so that the workaround
is done only on QCA9377 SDIO.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches