Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
On 3/22/21 6:20 PM, Brian Norris wrote:
> On Mon, Mar 22, 2021 at 4:58 PM Ben Greear wrote:
>> On 7/22/20 6:00 AM, Felix Fietkau wrote:
>>> On 2020-07-22 14:55, Johannes Berg wrote:
>>>> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
>>>>> I'm considering testing a different approach (with mt76 initially):
>>>>> - Add a mac80211 rx function that puts processed skbs into a list instead of handing them to the network stack directly.
>>>> Would this be *after* all the mac80211 processing, i.e. in place of the rx-up-to-stack?
>>> Yes, it would run all the rx handlers normally and then put the resulting skbs into a list instead of calling netif_receive_skb or napi_gro_frags.
>> Whatever came of this? I realized I'm running Felix's patch since his mt76 driver needs it. Any chance it will go upstream?
> If you're asking about $subject (moving NAPI/RX to a thread), this landed upstream recently:
> http://git.kernel.org/linus/adbb4fb028452b1b0488a1a7b66ab856cdf20715
> It needs a bit of coaxing to work on a WiFi driver (including: WiFi drivers tend to have a different netdev for NAPI than they expose to /sys/class/net/), but it's there.
>
> I'm not sure if people had something else in mind in the stuff you're quoting though.

No, I got it confused with something Felix did:
https://github.com/greearb/mt76/blob/master/patches/0001-net-add-support-for-threaded-NAPI-polling.patch

Maybe the NAPI/RX to a thread thing superseded Felix's patch?

Thanks,
Ben

> Brian

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
On 7/22/20 6:00 AM, Felix Fietkau wrote:
> On 2020-07-22 14:55, Johannes Berg wrote:
>> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
>>> I'm considering testing a different approach (with mt76 initially):
>>> - Add a mac80211 rx function that puts processed skbs into a list instead of handing them to the network stack directly.
>> Would this be *after* all the mac80211 processing, i.e. in place of the rx-up-to-stack?
> Yes, it would run all the rx handlers normally and then put the resulting skbs into a list instead of calling netif_receive_skb or napi_gro_frags.

Whatever came of this? I realized I'm running Felix's patch since his mt76 driver needs it. Any chance it will go upstream?

Thanks,
Ben

> - Felix

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: EAP AP/VLAN: multicast not send to client
On 2/8/21 12:32 PM, Sven Eckelmann wrote:
> On Sunday, 7 February 2021 18:42:42 CET Ben Greear wrote:
>> Somewhere along the way I fixed up raw transmit in my firmware, so possibly only then will vlans really have a chance of working.
> The first step was to disable the check which conditionally enables AP_VLAN and just enable it all the time.

I appreciate the work you put into this. Looks like it is at least not a regression in code that I added, but I guess I'll need to fix whatever bug/feature upstream added to get it working. I think I'll have a way to set up a testbed for this sometime soon, as part of work on another project, so I'll try to debug it then.

Thanks,
Ben

> I've started testing with firmware-5-full-community-commit-0317-cf4991294.bin but it doesn't provide the raw support + per packet swcrypto. So I've tried to switch to firmware-5-full-community-commit-1187-774502ee5.bin but it has exactly the same problem with raw mode - but at least advertises WMI_SERVICE_PER_PACKET_SW_ENCRYPT.
>
> So my first target was to figure out what was the first firmware with WMI_SERVICE_PER_PACKET_SW_ENCRYPT. So you would guess that bisect would be suitable for this - but no, the first step directly found a crashing version. I should not complain so much -- just have to skip more and have no extra test results regarding the mcast support for them.
> Here is the log until I found the first one which is supposed to support WMI_SERVICE_PER_PACKET_SW_ENCRYPT:
>
> # has_sw_encrypt: firmware-5-full-community-commit-1187-774502ee5.bin
> # no_sw_encrypt: firmware-5-full-community-commit-0317-cf4991294.bin
> # skip: firmware-5-full-community-commit-0775-bb7462f22.bin
> # skip: firmware-5-full-community-commit-0782-c66b3495b.bin
> # no_sw_encrypt: firmware-5-full-community-commit-0533-4597878a6.bin
> # no_sw_encrypt: firmware-5-full-community-commit-0885-2d9cfe00b.bin
> # no_sw_encrypt: firmware-5-full-community-commit-1045-817be3ee8.bin
> # has_sw_encrypt: firmware-5-full-community-commit-1112-68b46f73e.bin
> # no_sw_encrypt: firmware-5-full-community-commit-1077-44c74a25a.bin
> # has_sw_encrypt: firmware-5-full-community-commit-1093-3c7065550.bin
> # no_sw_encrypt: firmware-5-full-community-commit-1085-c1d37213a.bin
> # no_sw_encrypt: firmware-5-full-community-commit-1089-1fbfebf26.bin
> # has_sw_encrypt: firmware-5-full-community-commit-1091-3aa26dbdd.bin
> # no_sw_encrypt: firmware-5-full-community-commit-1090-7cfbf3e6a.bin
> # first has_sw_encrypt commit: firmware-5-full-community-commit-1091-3aa26dbdd.bin
>
> None of the firmware versions seem to have working multicast tx.
> And here are some (not so random) picked ones (just so nobody can say that I didn't check in the other direction):
>
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0425-a422b044f.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0371-157623ac0.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0344-8b9e4442a.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0331-5259fada9.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0324-e6723f0f6.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0321-814d9dc06.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0319-ef95e743e.bin
> # no_per_patcket_sw_encrypt_no_mcast: firmware-5-full-community-commit-0318-51cd44bdd.bin
>
> I didn't do a complete sweep of the builds but at the moment it looks a little bit like there might not be a single one which supports multicast over this setup. If you think there is a specific firmware version which I should test then just say which version.
>
> So I've decided to try the ath10k firmware blobs from Kalle's repository to provide at least something useful for someone who also has this problem and searches for a compatible version:
>
> firmware blob                         | works | PER_PACKET_SW_ENCRYPT
> --------------------------------------+-------+----------------------
> 3.2/firmware-5.bin_10.4-3.2-00080     | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-4     | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-5     | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-7     | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00015 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00018 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00023 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00024 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00026 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00028 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00029 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00031 | N     | N
> 3.2.1/firmware-5.bin_10.4-3.2.1-00033 | N
Re: EAP AP/VLAN: multicast not send to client
On 2/7/21 9:13 AM, Sven Eckelmann wrote:
> On Sunday, 7 February 2021 17:50:11 CET Ben Greear wrote:
>> Here are the images:
>> http://www.candelatech.com/downloads/ath10k-4019-10-4b/bisect/
> Thanks, will try to have a look at them tomorrow evening.
>
> Can you confirm which QCA ath10k version was used as the base for this one? I've read somewhere on your page 10.4.3.3-25 - which doesn't seem to be in Kalle's repository. And my original plan was to test the relevant QCA firmware first and check if the problem might already be in the base version which you've used for your builds.
>
> But maybe I will just start with the oldest one in your tree and check if the problem is also there and based on the result decide how to continue.

I don't know exactly how qca versioning works, but my notes are that the initial upstream wave-2 code was 3.5.3-00050, from back in 2018.

The first commit in the series should be very similar to stock FW, though perhaps missing some feature flags. Maybe try forcing the driver to try to allow vlans...if it is missing feature flag only, that might work around it.

Somewhere along the way I fixed up raw transmit in my firmware, so possibly only then will vlans really have a chance of working.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: EAP AP/VLAN: multicast not send to client
On 2/2/21 5:57 AM, Sven Eckelmann wrote:
> On Tuesday, 2 February 2021 14:27:01 CET Ben Greear wrote:
>> Sven, I can build you a series of firmware if you have interest in bisecting to see if this is a regression?
> If it is ok for you then I can go through various firmware builds. But it could be that I can only start with the bisect at the end of the week. At least today, I will have no time after work.
>
> Kind regards,
> Sven

Here are the images:
http://www.candelatech.com/downloads/ath10k-4019-10-4b/bisect/

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: bss channel survey timed out and then hardware became unavailable on IPQ40xx.
kernel: [ 245.289214] [] (cfg80211_netdev_notifier_call [cfg80211]) from [] (notifier_call_chain+0x2c/0x6c)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.300349] [] (notifier_call_chain) from [] (raw_notifier_call_chain+0x18/0x20)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.310938] [] (raw_notifier_call_chain) from [] (__dev_close_many+0x44/0xc8)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.320140] [] (__dev_close_many) from [] (dev_close_many+0x60/0xdc)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.328902] [] (dev_close_many) from [] (dev_close+0x34/0x48)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.337136] [] (dev_close) from [] (cfg80211_shutdown_all_interfaces+0x58/0xa8 [cfg80211])
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.344631] [] (cfg80211_shutdown_all_interfaces [cfg80211]) from [] (ieee80211_reconfig+0x790/0xb64 [mac80211])
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.354494] [] (ieee80211_reconfig [mac80211]) from [] (ieee80211_restart_work+0xa8/0xb4 [mac80211])
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.366483] [] (ieee80211_restart_work [mac80211]) from [] (process_one_work+0x280/0x410)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.377258] [] (process_one_work) from [] (worker_thread+0x330/0x560)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.387062] [] (worker_thread) from [] (kthread+0x134/0x13c)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.395221] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.402846] ---[ end trace fb0fc35d89e4ff3e ]---
Tue Jan 26 18:09:34 2021 kern.info kernel: [ 245.409750] group-ap2: HW problem - can not stop rx aggregation for 06:ef:c0:01:33:6b tid 0
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.414504] [ cut here ]

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: EAP AP/VLAN: multicast not send to client
On 2/2/21 2:12 AM, Sven Eckelmann wrote:
> On Tuesday, 2 February 2021 10:12:45 CET Sebastian Gottschall wrote:
>> mmh. I have an idea: try the following (this is a patch in my tree) and check also the wmi services for this service flag which might be a difference between these firmwares
>>
>> --- a/drivers/net/wireless/ath/ath10k/mac.c
>> +++ b/drivers/net/wireless/ath/ath10k/mac.c
>> @@ -9003,10 +9003,10 @@ int ath10k_mac_register(struct ath10k *ar)
> [...]
>
> Thanks for the idea. But this has no effect on the problem.
>
> I have also attached the services and feature information (from ath10k-ct's perspective to have hopefully a more complete look at the differences). And it seems both have WMI_SERVICE_PER_PACKET_SW_ENCRYPT and Ben's firmware also ATH10K_FW_FEATURE_CONSUME_BLOCK_ACK_CT (which would also have "enabled" this code section).
>
> The biggest difference (which would affect also the non-ct ath10k) would be in wmi_services. Ben Greear's firmware doesn't support:
>
> * WMI_SERVICE_PEER_CACHING
> * WMI_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS
> * WMI_SERVICE_HOST_DFS_CHECK_SUPPORT
> * WMI_SERVICE_TPC_STATS_FINAL

Sven, I can build you a series of firmware if you have interest in bisecting to see if this is a regression?

I'll get started on the builds, looks like last time I did a full build of 4019 commits was a while back...

Thanks,
Ben

> Kind regards,
> Sven

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: RX rate is wrong in 5.10? (bisected to: mac80211: receive and process S1G beacons)
On 1/4/21 4:25 PM, Thomas Pedersen wrote:
> Hi Ben,
>
> On 2021-01-04 16:18, Ben Greear wrote:
>> On 1/4/21 8:18 AM, Ben Greear wrote:
>>> Hello,
>>>
>>> I noticed that RX rate is always 6Mbps when I use -ct firmware and -ct driver in 5.10, and on stock 5.10.0 driver and stock firmware, rx-rate does not show up at all in 'iw dev wlan1 station dump'.
>>>
>>> I'm using 9984 NIC...
>>>
>>> Anyone else see this?
>>
>> After a bisect, the first bad commit shows this:
>>
>> commit 09a740ce352e1a1d16b9984115514ba9a4f4704b (refs/bisect/bad)
>> Author: Thomas Pedersen
>> Date: Mon Sep 21 19:28:14 2020 -0700
>>
>>     mac80211: receive and process S1G beacons
>>
>>     S1G beacons are 802.11 Extension Frames, so the fixed header part differs from regular beacons.
>>
>>     Add a handler to process S1G beacons and abstract out the fetching of BSSID and element start locations in the beacon body handler.
>>
>>     Signed-off-by: Thomas Pedersen
>>     Link: https://lore.kernel.org/r/20200922022818.15855-14-tho...@adapt-ip.com
>>     [don't rename, small coding style cleanups]
>>     Signed-off-by: Johannes Berg
>>
>> From a glance through the diff, I'm at a loss as to why it causes the symptom. I manually double-checked the bisect, and it appears correct.
>>
>> What I see is that in the commit before this, I see a useful rx rate (1.3Gbps for instance) in 'iw dev wlan1 station dump', but in this bad commit, both show 6Mbps rate. (Tx rate in ath10k is probably broken for other reasons, so I only bisected the rx side issue.)
>>
>> I'm using ath10k 9984 radio with firmware 10.4-3.9.0.2-00070 in station mode. AP is an ath11k Hawkeye...
>>
>> I'm using a 1Mbps UDP packet 'download' stream to make sure I'm seeing rates for data frames and not just management frames.
>
> Sorry about that.
>
>> Any idea what might be the issue?
>
> It may be fixed by
> https://patchwork.kernel.org/project/linux-wireless/patch/1607483189-3891-1-git-send-email-wg...@codeaurora.org/

Yes, that fixes it. Looks like it is already in 5.10.4 stable, so I'll upgrade to that.

Thanks for the quick hint.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: RX rate is wrong in 5.10? (bisected to: mac80211: receive and process S1G beacons)
On 1/4/21 8:18 AM, Ben Greear wrote:
> Hello,
>
> I noticed that RX rate is always 6Mbps when I use -ct firmware and -ct driver in 5.10, and on stock 5.10.0 driver and stock firmware, rx-rate does not show up at all in 'iw dev wlan1 station dump'.
>
> I'm using 9984 NIC...
>
> Anyone else see this?

After a bisect, the first bad commit shows this:

commit 09a740ce352e1a1d16b9984115514ba9a4f4704b (refs/bisect/bad)
Author: Thomas Pedersen
Date: Mon Sep 21 19:28:14 2020 -0700

    mac80211: receive and process S1G beacons

    S1G beacons are 802.11 Extension Frames, so the fixed header part differs from regular beacons.

    Add a handler to process S1G beacons and abstract out the fetching of BSSID and element start locations in the beacon body handler.

    Signed-off-by: Thomas Pedersen
    Link: https://lore.kernel.org/r/20200922022818.15855-14-tho...@adapt-ip.com
    [don't rename, small coding style cleanups]
    Signed-off-by: Johannes Berg

From a glance through the diff, I'm at a loss as to why it causes the symptom. I manually double-checked the bisect, and it appears correct.

What I see is that in the commit before this, I see a useful rx rate (1.3Gbps for instance) in 'iw dev wlan1 station dump', but in this bad commit, both show 6Mbps rate. (Tx rate in ath10k is probably broken for other reasons, so I only bisected the rx side issue.)

I'm using ath10k 9984 radio with firmware 10.4-3.9.0.2-00070 in station mode. AP is an ath11k Hawkeye...

I'm using a 1Mbps UDP packet 'download' stream to make sure I'm seeing rates for data frames and not just management frames.

Any idea what might be the issue?

Thanks,
Ben
RX rate is wrong in 5.10?
Hello,

I noticed that RX rate is always 6Mbps when I use -ct firmware and -ct driver in 5.10, and on stock 5.10.0 driver and stock firmware, rx-rate does not show up at all in 'iw dev wlan1 station dump'.

I'm using 9984 NIC...

Anyone else see this?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: skb_cb corruption in ath10k
On 12/21/20 3:55 PM, Ben Greear wrote:
> Hello,
>
> I'm trying to figure out what changed in the last few kernels that is making:
>
>     struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
>     if (info->control.flags & IEEE80211_TX_CTRL_RATE_INJECT)
>             /* why is code here all of a sudden */
>
> in data frames in ath10k, when, to the best of my knowledge, nothing should be setting that up in the stack.
>
> My guess is that something is stepping on the cb field somewhere in ath10k, but I am not sure where that might be at this point.
>
> And it also appears mac80211 or maybe supplicant is setting the rate-inject flag on some mgt frames, but I think that is a separate concern at this point.
>
> If anyone has any ideas of likely points, please let me know.

This issue was me being confused about how the ath10k skb_cb sits in the same memory as the ieee80211 skb_cb. I just needed to reorder the ath10k-skb-cb struct a bit to not clobber the control.flags area.

I also see no reason not to naturally pack that struct so that the pointers are 8-byte aligned. Any idea why it is force-packed currently instead of using proper padding?

Thanks,
Ben
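The overlay problem described above can be sketched in plain C. This is only an illustration under loud assumptions: both structs, their field order, and the flags offset (`OTHER_FLAGS_OFF`) are hypothetical and are not the real `ath10k_skb_cb` or `ieee80211_tx_info` layouts. `uint64_t` stands in for 8-byte pointers, and `__attribute__((packed))` is the GCC/Clang spelling of a force-packed struct. The point is that a packed driver view can put a field on top of the word the other layer reads, and can leave 8-byte members misaligned, while a reordered, naturally padded view avoids both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CB_SIZE 48 /* size of the skb->cb scratch area in the kernel */

/* HYPOTHETICAL position of the flags word that the other layer reads
 * through its own view of cb. The real control.flags offset differs. */
#define OTHER_FLAGS_OFF 16
#define OTHER_FLAGS_LEN 4

/* Hypothetical driver view of cb, force-packed. */
struct drv_cb_packed {
	uint64_t paddr;
	uint8_t  flags;
	uint8_t  eid;
	uint16_t msdu_id;
	uint16_t airtime_est;
	uint64_t vif;   /* offset 14: misaligned, and covers bytes 16..19 */
	uint64_t txq;
} __attribute__((packed));

/* Same fields, reordered and naturally padded, with an explicit hole
 * where the other layer's flags word is assumed to live. */
struct drv_cb_natural {
	uint64_t paddr;
	uint64_t vif;
	uint32_t reserved;  /* never written by the driver */
	uint8_t  flags;
	uint8_t  eid;
	uint16_t msdu_id;
	uint64_t txq;
	uint16_t airtime_est;
};

static_assert(sizeof(struct drv_cb_packed) <= CB_SIZE, "must fit in cb");
static_assert(sizeof(struct drv_cb_natural) <= CB_SIZE, "must fit in cb");

/* True if byte ranges [a_off, a_off+a_len) and [b_off, b_off+b_len) overlap. */
static inline bool overlaps(size_t a_off, size_t a_len,
			    size_t b_off, size_t b_len)
{
	return a_off < b_off + b_len && b_off < a_off + a_len;
}

/* True if writing `field` of `type` would touch the assumed flags word. */
#define FIELD_CLOBBERS(type, field)                                   \
	overlaps(offsetof(type, field), sizeof(((type *)0)->field),   \
		 OTHER_FLAGS_OFF, OTHER_FLAGS_LEN)
```

With these made-up layouts, `FIELD_CLOBBERS(struct drv_cb_packed, vif)` is true, while no live field of `struct drv_cb_natural` overlaps the assumed flags word. The same kind of check could be turned into compile-time assertions next to a real struct, using the real offsets.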
skb_cb corruption in ath10k
Hello,

I'm trying to figure out what changed in the last few kernels that is making:

    struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
    if (info->control.flags & IEEE80211_TX_CTRL_RATE_INJECT)
            /* why is code here all of a sudden */

in data frames in ath10k, when, to the best of my knowledge, nothing should be setting that up in the stack.

My guess is that something is stepping on the cb field somewhere in ath10k, but I am not sure where that might be at this point.

And it also appears mac80211 or maybe supplicant is setting the rate-inject flag on some mgt frames, but I think that is a separate concern at this point.

If anyone has any ideas of likely points, please let me know.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: [PATCH v2] ath10k: Per-chain rssi should sum the secondary channels
On 12/21/20 10:30 AM, Kalle Valo wrote:
> gree...@candelatech.com wrote:
>> From: Ben Greear
>>
>> This makes per-chain RSSI be more consistent between HT20, HT40, HT80. Instead of doing precise log math for adding dbm, I did a rough estimate, it seems to work good enough.
>>
>> Tested on ath10k-ct 9984 firmware.
>>
>> Signed-off-by: Ben Greear
> Commented out code etc so I assume this is an RFC. Has anyone tested this with upstream firmware?

I probably tweaked this patch since sending. My wave-1 didn't work with this approach, and in the end, to get a valid RSSI, I ended up reading the per-chain noise-floor periodically and storing that so I could use proper noise-floor instead of just -95.

I am not sure upstream firmware can support that, so probably not worth adding just the sum logic unless someone can figure out how to get the noise floor out of the firmware...

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
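For readers wondering what a "rough estimate" of adding dBm values can look like without floating point, here is a minimal sketch. It is not the code from the patch, which is not shown in this thread. It assumes whole-dB integer inputs and uses the identity sum = max + 10*log10(1 + 10^(-diff/10)), with the correction term rounded to the nearest dB (about 3.0 dB at diff 0, 1.8 dB at diff 3, 0.5 dB at diff 9, 0.4 dB at diff 10). The function names `dbm_add` and `dbm_sum` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate power sum of two levels given in whole dBm.
 * The correction 10*log10(1 + 10^(-diff/10)) is rounded to whole dB. */
int dbm_add(int a, int b)
{
	int hi = a > b ? a : b;
	int diff = a > b ? a - b : b - a;

	if (diff <= 1)
		return hi + 3;  /* 3.0 .. 2.5 dB */
	if (diff <= 3)
		return hi + 2;  /* 2.1 .. 1.8 dB */
	if (diff <= 9)
		return hi + 1;  /* 1.5 .. 0.5 dB */
	return hi;              /* below 0.5 dB: weaker term is ignored */
}

/* Fold dbm_add() over n >= 1 readings, e.g. the primary and secondary
 * channel readings of one chain. Rounding error can accumulate. */
int dbm_sum(const int *dbm, size_t n)
{
	int total;
	size_t i;

	assert(dbm != NULL && n > 0);
	total = dbm[0];
	for (i = 1; i < n; i++)
		total = dbm_add(total, dbm[i]);
	return total;
}
```

As a sanity check, four equal readings of -60 dBm sum to -54 dBm here, which matches the exact value of about -53.98 dBm. Note this only combines the readings; it says nothing about the noise-floor reference discussed in the reply above.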
Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery
On 12/17/20 2:24 PM, Brian Norris wrote:
> On Tue, Dec 15, 2020 at 10:23:33AM -0800, Ben Greear wrote:
>> On 12/15/20 9:21 AM, Youghandhar Chintala wrote:
>>> From: Rakesh Pillai
>>>
>>> Currently in case of target hardware restart, we just reconfig and re-enable the security keys and enable the network queues to start data traffic back from where it was interrupted.
>> Are there any known mac80211 radios/drivers that *can* support seamless restarts?
>>
>> If not, then could we just always enable this feature in mac80211?
> I'm quite sure that iwlwifi intentionally supports a seamless restart. From my experience with dealing with user reports, I don't recall any issues where restart didn't function as expected, unless there was some deeper underlying failure (e.g., hardware/power failure; driver bugs / lockups).
>
> I don't have very good stats for ath10k/QCA6174, but it survives our testing OK and I again don't recall any user-reported complaints in this area. I'd say this is a weaker example though, as I don't have as clear of data. (By contrast, ath10k/WCN399x, which Rakesh, et al, are patching here, does not pass our tests at all, and clearly fails to recover from "seamless" restarts, as noted in patch 3.)
>
> I'd also note that we don't operate in AP mode -- only STA -- and IIRC Ben, you've complained about AP mode in the past.

I complain about all sorts of things, but I'm usually running station mode :)

Do you actually see iwlwifi stations stay associated through firmware crashes?

Anyway, happy to hear some have seamless recovery, and in that case, I have no objections to the patch.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery
On 12/15/20 9:21 AM, Youghandhar Chintala wrote:
> From: Rakesh Pillai
>
> Currently in case of target hardware restart, we just reconfig and re-enable the security keys and enable the network queues to start data traffic back from where it was interrupted.

Are there any known mac80211 radios/drivers that *can* support seamless restarts?

If not, then could we just always enable this feature in mac80211?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: [PATCH v3] ath10k: add flag to protect napi operation to avoid dead loop hang
On 12/9/20 1:24 AM, Kalle Valo wrote:
> Wen Gong writes:
>> On 2020-09-08 00:22, Kalle Valo wrote:
>>> Just like with the recent firmware restart patch, isn't ar->napi_enabled racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer? Or are we holding a lock? But then that should be documented with lockdep_assert_held().
>> yes, ath10k_hif_start is only called from ath10k_core_start, it has "lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only called from ath10k_core_stop, it also has "lockdep_assert_held(&ar->conf_mutex)". So two threads cannot both enter ath10k_hif_start/ath10k_hif_stop at the same time.
> Ok, but every function depending on a lock being held should still call lockdep_assert_held(), that way we can catch the bug if locking changes later. So it's not enough that ath10k_core_stop() has lockdep_assert_held(), also these napi functions should have it.
>
> I actually decided to switch to using ATH10K_FLAG_NAPI_ENABLED with set_bit() & co, simpler locking that way and no lockdep_assert_held() needed anymore. Please check my changes in the pending branch, I have only compile tested them:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending&id=e0a466d296bd862080f7796b41349f9f586272c9

Why do you not need locking? You can't just check a bit is set and then do work and set it later without locking, two concurrent CPU threads can pass the first check and both get into the logic below it?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
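The distinction being argued above (a separate check followed by a later set, versus one atomic test-and-set) can be sketched in userspace with C11 atomics. This is an analogue only, not ath10k code and not the contents of the pending-branch commit: `atomic_exchange()` plays the role that `test_and_set_bit()` and `test_and_clear_bit()` play in the kernel, and the helper names are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Racy pattern: the check and the store are two separate steps, so two
 * CPUs can both see "not enabled" and both go on to do the enable work. */
bool racy_try_enable(bool *enabled)
{
	assert(enabled != NULL);
	if (*enabled)
		return false;
	/* a second caller can pass the check above before this store */
	*enabled = true;
	return true;
}

/* Atomic pattern: read the old value and set the flag in one step.
 * Exactly one of any number of concurrent callers gets "true" back and
 * is the one allowed to do the enable work. */
bool atomic_try_enable(atomic_bool *enabled)
{
	assert(enabled != NULL);
	return !atomic_exchange(enabled, true);
}

/* Counterpart for the stop path: only the caller that actually clears
 * a set flag gets "true" back. */
bool atomic_try_disable(atomic_bool *enabled)
{
	assert(enabled != NULL);
	return atomic_exchange(enabled, false);
}
```

Even with the atomic variant, the flag only decides who does the work; it does not make a concurrent start wait for a stop that is still in progress. If that ordering matters, a lock (or the existing mutex mentioned in the thread) is still needed.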
Re: [PATCH,v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"
Hello Zhi,

Do you know of any ways to detect in the driver what platforms need your patch and what ones break with it?

Otherwise, we're stuck with external config (which is what I added so far as work-around).

Thanks,
Ben

On 9/8/20 9:02 PM, Ben Greear wrote:
> Please see this bug report, and feel free to ask the reporter for more details if you don't find everything you need there. Seems a basic ping test reproduces packet loss in their case...
>
> https://github.com/greearb/ath10k-ct/issues/153
>
> I don't actually have the platform in question.
>
> Thanks,
> Ben
>
> On 9/8/20 7:04 PM, Zhi Chen wrote:
>> Hi Ben,
>>
>> Thanks for your information. The DMA issue is host related. We never hit this issue with X86 platform. And it was only seen in stress cases with 50+ STAs (association and disassociation repeatedly). What's the host platform you are using? And how was the issue reproduced?
>>
>> Thanks,
>> Zhi
>>
>> On 2020-09-09 01:48, Ben Greear wrote:
>>> Hello,
>>>
>>> Just FYI: I added this patch to my ath10k-ct driver, and a user reported it causes regressions on his particular 9888 system when using ath10k-ct wave-2 firmware:
>>>
>>> [ 21.204868] ath10k_pci :00:00.0: qca9888 hw2.0 target 0x0100 chip_id 0x sub :
>>> [ 21.214437] ath10k_pci :00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
>>> [ 21.233298] ath10k_pci :00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36
>>> [ 21.596684] ath10k_pci :00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02
>>> [ 23.546156] ath10k_pci :00:00.0: unsupported HTC service id: 1536
>>>
>>> I'll revert this for the 9888 chipset (at least) in my driver, possibly you need to do similar.
>>>
>>> https://github.com/greearb/ath10k-ct/issues/153
>>>
>>> Thanks,
>>> Ben
>>>
>>> On 1/13/20 8:35 PM, Zhi Chen wrote:
>>>> This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
>>>>
>>>> PCIe hang issue was observed on multiple platforms. The issue was reproduced when DUT was configured as AP and associated with 50+ STAs.
>>>>
>>>> For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size of the RD/WR access to the HOST MEM.
>>>> 0 - No split, RAW read/write transfer size from MAC is put out on bus as burst length
>>>> 1 - Split at 256 byte boundary
>>>> 2,3 - Reserved
>>>>
>>>> With a PCIe protocol analyzer, we can see a DMA read crossing a 4KB boundary when the issue happened. This violates the PCIe spec and caused PCIe to get stuck. So revert the default value from 0 to 1.
>>>>
>>>> Tested:
>>>> IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
>>>> QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
>>>> Synaptics AS370 + QCA9888 with firmware 10.4-3.9.0.2--00040
>>>>
>>>> Signed-off-by: Zhi Chen
>>>> ---
>>>> v2: restored 10.2 register configuration
>>>> v3: modified commit message
>>>> v4: resolved conflicts
>>>> ---
>>>>  drivers/net/wireless/ath/ath10k/hw.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
>>>> index 21b7a2a..775fd62 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/hw.h
>>>> +++ b/drivers/net/wireless/ath/ath10k/hw.h
>>>> @@ -816,7 +816,7 @@ ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
>>>>  #define TARGET_10_4_TX_DBG_LOG_SIZE 1024
>>>>  #define TARGET_10_4_NUM_WDS_ENTRIES 32
>>>> -#define TARGET_10_4_DMA_BURST_SIZE 0
>>>> +#define TARGET_10_4_DMA_BURST_SIZE 1
>>>>  #define TARGET_10_4_MAC_AGGR_DELIM 0
>>>>  #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
>>>>  #define TARGET_10_4_VOW_CONFIG 0
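To make the 4KB argument in the quoted commit message concrete, here is a small arithmetic sketch. It models one plausible reading of "Split at 256 byte boundary" (a transfer is cut wherever it would pass a 256-byte-aligned address); how the QCA hardware really splits bursts is an assumption here, and the function names are made up. Because 4096 is a multiple of 256, any transfer cut at every 256-byte boundary is also cut at every 4KB boundary, so no single burst can cross one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SPLIT_BYTES 256u /* DMA_BURST_SIZE = 1, per the commit message */

/* True if [addr, addr + len) spans more than one `boundary`-sized block. */
bool crosses_boundary(uint64_t addr, uint32_t len, uint32_t boundary)
{
	assert(len > 0 && boundary > 0);
	return addr / boundary != (addr + len - 1) / boundary;
}

/* Cut a transfer at every 256-byte boundary (assumed hardware behavior).
 * Stores up to `max` burst lengths in `out` and returns the burst count. */
size_t split_bursts(uint64_t addr, uint32_t len, uint32_t *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (len > 0) {
		uint32_t room = SPLIT_BYTES - (uint32_t)(addr % SPLIT_BYTES);
		uint32_t chunk = len < room ? len : room;

		if (n < max)
			out[n] = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, an unsplit 64-byte read starting at 0xFF0 crosses the 4KB boundary at 0x1000, while the split version issues a 16-byte burst up to the boundary and a 48-byte burst after it.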
Re: [PATCH,v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"
Please see this bug report, and feel free to ask the reporter for more details if you don't find everything you need there. Seems a basic ping test reproduces packet loss in their case...

https://github.com/greearb/ath10k-ct/issues/153

I don't actually have the platform in question.

Thanks,
Ben

On 9/8/20 7:04 PM, Zhi Chen wrote:
> Hi Ben,
>
> Thanks for your information. The DMA issue is host related. We never hit this issue with X86 platform. And it was only seen in stress cases with 50+ STAs (association and disassociation repeatedly). What's the host platform you are using? And how was the issue reproduced?
>
> Thanks,
> Zhi
>
> On 2020-09-09 01:48, Ben Greear wrote:
>> Hello,
>>
>> Just FYI: I added this patch to my ath10k-ct driver, and a user reported it causes regressions on his particular 9888 system when using ath10k-ct wave-2 firmware:
>>
>> [ 21.204868] ath10k_pci :00:00.0: qca9888 hw2.0 target 0x0100 chip_id 0x sub :
>> [ 21.214437] ath10k_pci :00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
>> [ 21.233298] ath10k_pci :00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36
>> [ 21.596684] ath10k_pci :00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02
>> [ 23.546156] ath10k_pci :00:00.0: unsupported HTC service id: 1536
>>
>> I'll revert this for the 9888 chipset (at least) in my driver, possibly you need to do similar.
>>
>> https://github.com/greearb/ath10k-ct/issues/153
>>
>> Thanks,
>> Ben
>>
>> On 1/13/20 8:35 PM, Zhi Chen wrote:
>>> This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
>>>
>>> PCIe hang issue was observed on multiple platforms. The issue was reproduced when DUT was configured as AP and associated with 50+ STAs.
>>>
>>> For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size of the RD/WR access to the HOST MEM.
>>> 0 - No split, RAW read/write transfer size from MAC is put out on bus as burst length
>>> 1 - Split at 256 byte boundary
>>> 2,3 - Reserved
>>>
>>> With a PCIe protocol analyzer, we can see a DMA read crossing a 4KB boundary when the issue happened. This violates the PCIe spec and caused PCIe to get stuck. So revert the default value from 0 to 1.
>>>
>>> Tested:
>>> IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
>>> QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
>>> Synaptics AS370 + QCA9888 with firmware 10.4-3.9.0.2--00040
>>>
>>> Signed-off-by: Zhi Chen
>>> ---
>>> v2: restored 10.2 register configuration
>>> v3: modified commit message
>>> v4: resolved conflicts
>>> ---
>>>  drivers/net/wireless/ath/ath10k/hw.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
>>> index 21b7a2a..775fd62 100644
>>> --- a/drivers/net/wireless/ath/ath10k/hw.h
>>> +++ b/drivers/net/wireless/ath/ath10k/hw.h
>>> @@ -816,7 +816,7 @@ ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
>>>  #define TARGET_10_4_TX_DBG_LOG_SIZE 1024
>>>  #define TARGET_10_4_NUM_WDS_ENTRIES 32
>>> -#define TARGET_10_4_DMA_BURST_SIZE 0
>>> +#define TARGET_10_4_DMA_BURST_SIZE 1
>>>  #define TARGET_10_4_MAC_AGGR_DELIM 0
>>>  #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
>>>  #define TARGET_10_4_VOW_CONFIG 0

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com
Re: [PATCH,v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"
Hello, Just FYI: I added this patch to my ath10k-ct driver, and a user reported it causes regressions on his particular 9888 system when using ath10k-ct wave-2 firmware: [ 21.204868] ath10k_pci :00:00.0: qca9888 hw2.0 target 0x0100 chip_id 0x sub : [ 21.214437] ath10k_pci :00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0 [ 21.233298] ath10k_pci :00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36 [ 21.596684] ath10k_pci :00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02[ 23.546156] ath10k_pci :00:00.0: unsupported HTC service id: 1536 I'll revert this for the 9888 chipset (at least) in my driver, possibly you need to do similar. https://github.com/greearb/ath10k-ct/issues/153 Thanks, Ben On 1/13/20 8:35 PM, Zhi Chen wrote: This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e. PCIe hung issue was observed on multiple platforms. The issue was reproduced when DUT was configured as AP and associated with 50+ STAs. For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size of the RD/WR access to the HOST MEM. 0 - No split , RAW read/write transfer size from MAC is put out on bus as burst length 1 - Split at 256 byte boundary 2,3 - Reserved With PCIe protocol analyzer, we can see DMA Read crossing 4KB boundary when issue happened. It broke PCIe spec and caused PCIe stuck. So revert the default value from 0 to 1. 
Tested:
IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
Synaptics AS370 + QCA9888 with firmware 10.4-3.9.0.2--00040

Signed-off-by: Zhi Chen
---
v2: restored 10.2 register configuration
v3: modified commit message
v4: resolved conflicts
---
 drivers/net/wireless/ath/ath10k/hw.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 21b7a2a..775fd62 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -816,7 +816,7 @@ ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
 #define TARGET_10_4_TX_DBG_LOG_SIZE 1024
 #define TARGET_10_4_NUM_WDS_ENTRIES 32
-#define TARGET_10_4_DMA_BURST_SIZE 0
+#define TARGET_10_4_DMA_BURST_SIZE 1
 #define TARGET_10_4_MAC_AGGR_DELIM 0
 #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
 #define TARGET_10_4_VOW_CONFIG 0

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
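The boundary arithmetic behind the commit message can be sketched in plain C. This is only an illustration of why splitting at 256 bytes avoids a 4KB crossing (4096 is a multiple of 256); the helper names and constants below are made up for the sketch and are not taken from ath10k or the firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCIE_4K_BOUNDARY 4096u /* a single PCIe request must not cross this */
#define DEMO_SPLIT_BOUNDARY   256u  /* DMA_BURST_SIZE = 1, per the commit message */

/* True if [addr, addr + len) spans more than one 'boundary'-sized block. */
static bool demo_crosses_boundary(uint64_t addr, uint32_t len, uint32_t boundary)
{
	assert(len > 0 && boundary > 0);
	return (addr / boundary) != ((addr + len - 1) / boundary);
}

/* Number of bursts produced when a transfer is cut at every 256-byte boundary. */
static uint32_t demo_bursts_when_split(uint64_t addr, uint32_t len)
{
	uint64_t first, last;

	assert(len > 0);
	first = addr / DEMO_SPLIT_BOUNDARY;
	last = (addr + len - 1) / DEMO_SPLIT_BOUNDARY;
	return (uint32_t)(last - first + 1);
}
```

With an unsplit transfer (value 0), a 32-byte read starting at 0xFF0 crosses the 4KB boundary at 0x1000. With the 256-byte split (value 1), the same read becomes two bursts, and neither burst crosses 0x1000.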
Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
On 9/7/20 9:07 AM, Kalle Valo wrote: Ben Greear writes: Here is my original patch to fix this, it is not complex. https://patchwork.kernel.org/patch/10249363/ Sure, I have shared your patch above :). Sent a bit early, any idea why this wasn't upstreamed earlier? No, one comment from Michal indicated maybe there were more problems lurking in this area, but he seemed to be OK with the patch overall. After that, it was just ignored. Now might be a good time to push for it :) It is generally a waste of time in my experience. Kalle is the maintainer and should be seeing any of this he cares to see. If he likes the patch, he can apply it or something similar. If you have a reproducible test case, see if the patch fixes things, that might help it be accepted. The problem with your (Ben's) patches is that you have your own set of patches for ath10k and your own firmware. So I cannot know at all if your patches work with upstream ath10k and upstream firmware, and would need to test the patches myself. But nowadays I just can't find the time for testing. So if someone else can do the testing and provide a Tested-on tag it would increase my confidence level for the patches. Surely codeaurora could get a few entry level engineers to run basic testing against your target platforms on a regular basis? The several years of time this bug was known (to me at least, and to whoever saw my original patch) and the time wasted by codeaurora to rediscover and re-fix the bug would have much better been spent just testing and reviewing my patch to begin with. And not just my patches either, this pattern is far and wide in ath10k. Also, my driver is often tested against various upstream QCA firmware and chipsets in openwrt, so while bugs are always possible, there is some test coverage. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
On 8/20/20 1:15 PM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 11:23 PM Ben Greear wrote: On 8/20/20 10:42 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 10:38 PM Ben Greear wrote: On 8/20/20 10:00 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 10:02 PM Ben Greear wrote: On 8/20/20 9:08 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 8:07 PM Wen Gong wrote: On 2020-08-20 18:52, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 3:45 PM Wen Gong wrote: On 2020-08-20 17:19, Krishna Chaitanya wrote: ... I'm not really convinced that this is the right fix, but I'm no NAPI expert. Can anyone else help? Calling napi_disable() twice can lead to hangs, but moving NAPI from start/stop to the probe isn't the right approach as the datapath is tied to start/stop. Maybe check the state of NAPI before disable? if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state)) napi_disable(&ar->napi) or maintain napi_state like this https://patchwork.kernel.org/patch/10249365/ it is better to use above link's patch. napi.state is controlled by napi API, it is better ath10k not know it. Sure, but IMHO just canceling the async rx work should solve the issue. Oh no, canceling the async rx work will not solve this issue, rx worker ath10k_rx_indication_async_work call napi_schedule, after napi_complete, the NAPI_STATE_SCHED will clear. The issue of this patch is because 2 thread called to hif_stop and NAPI_STATE_SCHED not clear. That fix is still valid and good to have. ndev_stop being called twice is typical scenarios (stop vs rmmod), so just checking the netdev_flags for IFF_UP and returning from hif_stop should suffice, no? My approach to fix this problem was to add a boolean in ath10k as to whether it had napi enabled or not, and then check that before trying to enable/disable it again. Seems to work fine, and cleaner in my mind than checking internal napi flags.
A much simpler approach is just to check for IFF_UP and skip NAPI (and others) in the hif_stop no? (provided proper RTNL locking is done if hif_stop is being called internally as well). I'm not sure, but I think the driver should be internally consistent and not spend a lot of time trying to guess about interactions with objects higher in the stack. Fair enough, the network interface state is a basic thing controlled by the driver, so, should be okay to use. Anyways, the in-driver approach has more control. Here is my original patch to fix this, it is not complex. https://patchwork.kernel.org/patch/10249363/ Sure, I have shared your patch above :). Sent a bit early, any idea why this wasn't upstreamed earlier? No, one comment from Michal indicated maybe there were more problems lurking in this area, but he seemed to be OK with the patch over all. After that, it was just ignored. Now might be a good time to push for it :) It is generally a waste of time in my experience. Kalle is the maintainer and should be seeing any of this he cares to see. If he likes the patch, he can apply it or something similar. If you have a reproducible test case, see if the patch fixes things, that might help it be accepted. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
On 8/20/20 10:42 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 10:38 PM Ben Greear wrote: On 8/20/20 10:00 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 10:02 PM Ben Greear wrote: On 8/20/20 9:08 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 8:07 PM Wen Gong wrote: On 2020-08-20 18:52, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 3:45 PM Wen Gong wrote: On 2020-08-20 17:19, Krishna Chaitanya wrote: ... I'm not really convinced that this is the right fix, but I'm no NAPI expert. Can anyone else help? Calling napi_disable() twice can lead to hangs, but moving NAPI from start/stop to the probe isn't the right approach as the datapath is tied to start/stop. Maybe check the state of NAPI before disable? if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state)) napi_disable(&ar->napi) or maintain napi_state like this https://patchwork.kernel.org/patch/10249365/ it is better to use above link's patch. napi.state is controlled by napi API, it is better ath10k not know it. Sure, but IMHO just canceling the async rx work should solve the issue. Oh no, canceling the async rx work will not solve this issue, rx worker ath10k_rx_indication_async_work call napi_schedule, after napi_complete, the NAPI_STATE_SCHED will clear. The issue of this patch is because 2 thread called to hif_stop and NAPI_STATE_SCHED not clear. That fix is still valid and good to have. ndev_stop being called twice is typical scenarios (stop vs rmmod), so just checking the netdev_flags for IFF_UP and returning from hif_stop should suffice, no? My approach to fix this problem was to add a boolean in ath10k as to whether it had napi enabled or not, and then check that before trying to enable/disable it again. Seems to work fine, and cleaner in my mind than checking internal napi flags. A much simpler approach is just to check for IFF_UP and skip NAPI (and others) in the hif_stop no?
(provided proper RTNL locking is done if hif_stop is being called internally as well). I'm not sure, but I think the driver should be internally consistent and not spend a lot of time trying to guess about interactions with objects higher in the stack. Fair enough, the network interface state is a basic thing controlled by the driver, so, should be okay to use. Anyways, the in-driver approach has more control. Here is my original patch to fix this, it is not complex. https://patchwork.kernel.org/patch/10249363/ Sure, I have shared your patch above :). Sent a bit early, any idea why this wasn't upstreamed earlier? No, one comment from Michal indicated maybe there were more problems lurking in this area, but he seemed to be OK with the patch over all. After that, it was just ignored. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
On 8/20/20 10:00 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 10:02 PM Ben Greear wrote: On 8/20/20 9:08 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 8:07 PM Wen Gong wrote: On 2020-08-20 18:52, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 3:45 PM Wen Gong wrote: On 2020-08-20 17:19, Krishna Chaitanya wrote: ... I'm not really convinced that this is the right fix, but I'm no NAPI expert. Can anyone else help? Calling napi_disable() twice can lead to hangs, but moving NAPI from start/stop to the probe isn't the right approach as the datapath is tied to start/stop. Maybe check the state of NAPI before disable? if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state)) napi_disable(&ar->napi) or maintain napi_state like this https://patchwork.kernel.org/patch/10249365/ it is better to use above link's patch. napi.state is controlled by napi API, it is better ath10k not know it. Sure, but IMHO just canceling the async rx work should solve the issue. Oh no, canceling the async rx work will not solve this issue, rx worker ath10k_rx_indication_async_work call napi_schedule, after napi_complete, the NAPI_STATE_SCHED will clear. The issue of this patch is because 2 thread called to hif_stop and NAPI_STATE_SCHED not clear. That fix is still valid and good to have. ndev_stop being called twice is typical scenarios (stop vs rmmod), so just checking the netdev_flags for IFF_UP and returning from hif_stop should suffice, no? My approach to fix this problem was to add a boolean in ath10k as to whether it had napi enabled or not, and then check that before trying to enable/disable it again. Seems to work fine, and cleaner in my mind than checking internal napi flags. A much simpler approach is just to check for IFF_UP and skip NAPI (and others) in the hif_stop no? (provided proper RTNL locking is done if hif_stop is being called internally as well).
I'm not sure, but I think the driver should be internally consistent and not spend a lot of time trying to guess about interactions with objects higher in the stack. Here is my original patch to fix this, it is not complex. https://patchwork.kernel.org/patch/10249363/ Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
On 8/20/20 9:08 AM, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 8:07 PM Wen Gong wrote: On 2020-08-20 18:52, Krishna Chaitanya wrote: On Thu, Aug 20, 2020 at 3:45 PM Wen Gong wrote: On 2020-08-20 17:19, Krishna Chaitanya wrote: ... I'm not really convinced that this is the right fix, but I'm no NAPI expert. Can anyone else help? Calling napi_disable() twice can lead to hangs, but moving NAPI from start/stop to the probe isn't the right approach as the datapath is tied to start/stop. Maybe check the state of NAPI before disable? if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state)) napi_disable(&ar->napi) or maintain napi_state like this https://patchwork.kernel.org/patch/10249365/ it is better to use above link's patch. napi.state is controlled by napi API, it is better ath10k not know it. Sure, but IMHO just canceling the async rx work should solve the issue. Oh no, canceling the async rx work will not solve this issue, rx worker ath10k_rx_indication_async_work call napi_schedule, after napi_complete, the NAPI_STATE_SCHED will clear. The issue of this patch is because 2 thread called to hif_stop and NAPI_STATE_SCHED not clear. That fix is still valid and good to have. ndev_stop being called twice is typical scenarios (stop vs rmmod), so just checking the netdev_flags for IFF_UP and returning from hif_stop should suffice, no? My approach to fix this problem was to add a boolean in ath10k as to whether it had napi enabled or not, and then check that before trying to enable/disable it again. Seems to work fine, and cleaner in my mind than checking internal napi flags. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
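The boolean-guard idea described above can be sketched outside the kernel. This is a minimal stand-alone model, not the linked patch: struct fake_ar, struct fake_napi and both helpers are hypothetical, and the counters stand in for the real napi_enable()/napi_disable() calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct napi_struct; it only counts calls. */
struct fake_napi {
	int enable_calls;
	int disable_calls;
};

/* Hypothetical stand-in for the driver context (struct ath10k). */
struct fake_ar {
	struct fake_napi napi;
	bool napi_enabled; /* driver-owned state, not a NAPI-internal flag */
};

/* Enable NAPI only if the driver has not already done so. */
static void fake_core_napi_enable(struct fake_ar *ar)
{
	assert(ar);
	if (ar->napi_enabled)
		return;
	ar->napi.enable_calls++; /* a real driver would call napi_enable() here */
	ar->napi_enabled = true;
}

/* Disable NAPI only if it is currently enabled, so a second stop is a no-op. */
static void fake_core_napi_sync_disable(struct fake_ar *ar)
{
	assert(ar);
	if (!ar->napi_enabled)
		return;
	ar->napi.disable_calls++; /* a real driver would call napi_disable() here */
	ar->napi_enabled = false;
}
```

The sketch is single-threaded. In a driver, the flag would have to be read and written under whatever lock already serializes start/stop, since the thread mentions two threads reaching hif_stop.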
Re: WMM not working for mcast packets on ath10k
On 8/11/20 7:36 AM, Ahmed Zaki wrote: On Tue, 11 Aug 2020 at 16:01, Ben Greear wrote: On 8/11/20 5:22 AM, Ahmed Zaki wrote: On Tue, 11 Aug 2020 at 01:05, Ben Greear wrote: On 8/10/20 3:08 PM, Ahmed Zaki wrote: Hello, I have 2 ath10k devices set in Mesh Point mode. When I use iptables' DSCP target to set the WMM AC for some traffic ports to VO, the rules work fine but only for unicast packets. All broadcast on the target UDP ports goes out as BE, and not VO. With some ath10k debugging enabled, it seems that the htt is indeed sending the tx descriptor with the correct tid (6) set. Is this the intended behavior for some reason? If not, is there any more debugging on the TX path that I can do? I don't think that WMM makes sense for bcast frames since they go out on a special TID that does not do aggregation or per peer QoS settings. I get that, but we want to give some WMM priority to bcast as they tend to have higher probability of loss under high network loads. That special TID that you mentioned, is it set in the ath10k fw? Because I traced the htt and it is sending the intended tid in the txbuf flags, but bcast still come 0 (BE) on air. Or am I missing something in the driver? Thanks again for help. Maybe you can convert bcast to unicast in the driver/mac80211? There are no retries or proper rate-ctrl for bcast either, so I don't think you can get great performance for bcast. The bcast that we want to send with higher WMM are kind of KEEPALIVES, small and infrequent but important for network stability. So, we do not care about aggregation, rate-ctrl or higher MCS, just lower probability of being missed. They are, by design, sent to all neighbors and cannot be sent as unicast. Sorry to ask again, but without access to the fw I cannot know the answer to this: are you aware of anything in the fw that can override the tid sent from the driver if the destination is bcast? There is no way to put bcast frames on the high priority queues. 
If you run a sniffer near the transmitter, do you actually see that your bcast frames are dropped before being put on air? The ath10k driver probably has some tx status that indicates whether it was dropped before being put on air as well...have you checked that? You can send your frames more often if you want better chance of some of them getting through. Thanks, Ben Thanks, -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: WMM not working for mcast packets on ath10k
On 8/11/20 5:22 AM, Ahmed Zaki wrote: On Tue, 11 Aug 2020 at 01:05, Ben Greear wrote: On 8/10/20 3:08 PM, Ahmed Zaki wrote: Hello, I have 2 ath10k devices set in Mesh Point mode. When I use iptables' DSCP target to set the WMM AC for some traffic ports to VO, the rules work fine but only for unicast packets. All broadcast on the target UDP ports goes out as BE, and not VO. With some ath10k debugging enabled, it seems that the htt is indeed sending the tx descriptor with the correct tid (6) set. Is this the intended behavior for some reason? If not, is there any more debugging on the TX path that I can do? I don't think that WMM makes sense for bcast frames since they go out on a special TID that does not do aggregation or per peer QoS settings. I get that, but we want to give some WMM priority to bcast as they tend to have higher probability of loss under high network loads. That special TID that you mentioned, is it set in the ath10k fw? Because I traced the htt and it is sending the intended tid in the txbuf flags, but bcast still come 0 (BE) on air. Or am I missing something in the driver? Thanks again for help. Maybe you can convert bcast to unicast in the driver/mac80211? There are no retries or proper rate-ctrl for bcast either, so I don't think you can get great performance for bcast. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: WMM not working for mcast packets on ath10k
On 8/10/20 3:08 PM, Ahmed Zaki wrote: Hello, I have 2 ath10k devices set in Mesh Point mode. When I use iptables' DSCP target to set the WMM AC for some traffic ports to VO, the rules work fine but only for unicast packets. All broadcast on the target UDP ports goes out as BE, and not VO. With some ath10k debugging enabled, it seems that the htt is indeed sending the tx descriptor with the correct tid (6) set. Is this the intended behavior for some reason? If not, is there any more debugging on the TX path that I can do? I don't think that WMM makes sense for bcast frames since they go out on a special TID that does not do aggregation or per peer QoS settings. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
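For reference, the standard 802.11 mapping from user priority (the TID value for ordinary QoS data frames) to WMM access category can be written as a small function. It shows why tid 6 is expected to map to VO and 0 to BE. It says nothing about what the ath10k firmware does with broadcast frames; the enum and function names are made up for the sketch.

```c
#include <assert.h>

enum demo_wmm_ac { DEMO_AC_BE, DEMO_AC_BK, DEMO_AC_VI, DEMO_AC_VO };

/* Map an 802.1D user priority (0..7) to its WMM access category. */
static enum demo_wmm_ac demo_wmm_ac_from_up(unsigned int up)
{
	assert(up < 8);
	switch (up) {
	case 1:
	case 2:
		return DEMO_AC_BK; /* background */
	case 4:
	case 5:
		return DEMO_AC_VI; /* video */
	case 6:
	case 7:
		return DEMO_AC_VO; /* voice */
	default:
		return DEMO_AC_BE; /* 0 and 3: best effort */
	}
}
```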
Re: [PATCH v2 1/3] ath10k: Add history for tracking certain events
On 7/31/20 11:27 AM, Rakesh Pillai wrote:

Add history for tracking the below events
- register read
- register write
- IRQ trigger
- NAPI poll
- CE service
- WMI cmd
- WMI event
- WMI tx completion

This will help in debugging any crash or any improper behaviour.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai
---
 drivers/net/wireless/ath/ath10k/ce.c | 1 +
 drivers/net/wireless/ath/ath10k/core.h | 74 +
 drivers/net/wireless/ath/ath10k/debug.c | 133 ++
 drivers/net/wireless/ath/ath10k/debug.h | 74 +
 drivers/net/wireless/ath/ath10k/snoc.c | 15 +++-
 drivers/net/wireless/ath/ath10k/wmi-tlv.c | 1 +
 drivers/net/wireless/ath/ath10k/wmi.c | 10 +++
 7 files changed, 307 insertions(+), 1 deletion(-)

+void ath10k_record_wmi_event(struct ath10k *ar, enum ath10k_wmi_type type,
+			     u32 id, unsigned char *data)
+{
+	struct ath10k_wmi_event_entry *entry;
+	u32 idx;
+
+	if (type == ATH10K_WMI_EVENT) {
+		if (!ar->wmi_event_history.record)
+			return;

This check above is duplicated below, add it once at top of the method instead.

+
+		spin_lock_bh(&ar->wmi_event_history.hist_lock);
+		idx = ath10k_core_get_next_idx(&ar->reg_access_history.index,
+					       ar->wmi_event_history.max_entries);
+		spin_unlock_bh(&ar->wmi_event_history.hist_lock);
+		entry = &ar->wmi_event_history.record[idx];
+	} else {
+		if (!ar->wmi_cmd_history.record)
+			return;
+
+		spin_lock_bh(&ar->wmi_cmd_history.hist_lock);
+		idx = ath10k_core_get_next_idx(&ar->reg_access_history.index,
+					       ar->wmi_cmd_history.max_entries);
+		spin_unlock_bh(&ar->wmi_cmd_history.hist_lock);
+		entry = &ar->wmi_cmd_history.record[idx];
+	}
+
+	entry->timestamp = ath10k_core_get_timestamp();
+	entry->cpu_id = smp_processor_id();
+	entry->type = type;
+	entry->id = id;
+	memcpy(&entry->data, data + 4, ATH10K_WMI_DATA_LEN);
+}
+EXPORT_SYMBOL(ath10k_record_wmi_event);

@@ -1660,6 +1668,11 @@ static int ath10k_snoc_probe(struct platform_device *pdev)
 	ar->ce_priv = &ar_snoc->ce;
 	msa_size = drv_data->msa_size;
+	ath10k_core_reg_access_history_init(ar, ATH10K_REG_ACCESS_HISTORY_MAX);
+	ath10k_core_wmi_event_history_init(ar, ATH10K_WMI_EVENT_HISTORY_MAX);
+	ath10k_core_wmi_cmd_history_init(ar, ATH10K_WMI_CMD_HISTORY_MAX);
+	ath10k_core_ce_event_history_init(ar, ATH10K_CE_EVENT_HISTORY_MAX);

Maybe only enable this once user turns it on? It sucks up a bit of memory?

+
 	ath10k_snoc_quirks_init(ar);
 	ret = ath10k_snoc_resource_init(ar);

diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 932266d..9df5748 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -627,6 +627,7 @@ static void ath10k_wmi_tlv_op_rx(struct ath10k *ar, struct sk_buff *skb)
 	if (skb_pull(skb, sizeof(struct wmi_cmd_hdr)) == NULL)
 		goto out;
+	ath10k_record_wmi_event(ar, ATH10K_WMI_EVENT, id, skb->data);
 	trace_ath10k_wmi_event(ar, id, skb->data, skb->len);
 	consumed = ath10k_tm_event_wmi(ar, id, skb);

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index a81a1ab..8ebd05c 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1802,6 +1802,15 @@ struct sk_buff *ath10k_wmi_alloc_skb(struct ath10k *ar, u32 len)
 static void ath10k_wmi_htc_tx_complete(struct ath10k *ar, struct sk_buff *skb)
 {
+	struct wmi_cmd_hdr *cmd_hdr;
+	enum wmi_tlv_event_id id;
+
+	cmd_hdr = (struct wmi_cmd_hdr *)skb->data;
+	id = MS(__le32_to_cpu(cmd_hdr->cmd_id), WMI_CMD_HDR_CMD_ID);
+
+	ath10k_record_wmi_event(ar, ATH10K_WMI_TX_COMPL, id,
+				skb->data + sizeof(struct wmi_cmd_hdr));
+
 	dev_kfree_skb(skb);
 }

I think guard the above new code with

if (unlikely(ar->ce_event_history.record)) { ... }

All in all, I think I'd want to compile this out (while leaving other debug compiled in) since it seems this stuff would be rarely used and it adds method calls to hot paths. That is a decision for Kalle though, so see what he says...

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
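The two review suggestions (check once whether recording is on, and mark that check as unlikely so the disabled case stays cheap) can be sketched in user-space C. Everything here is hypothetical: the demo_ names, the fixed data length, and the use of a NULL record pointer as the "disabled" state are assumptions for illustration, and the locking a real driver would need around the index update is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_DATA_LEN 16

/* User-space stand-in for the kernel's unlikely() hint. */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define demo_unlikely(x) (x)
#endif

struct demo_entry {
	uint32_t id;
	uint8_t data[DEMO_DATA_LEN];
};

struct demo_history {
	struct demo_entry *record; /* NULL until the user turns history on */
	uint32_t max_entries;
	uint32_t index;            /* next slot to overwrite */
};

/* Hot-path hook: returns true if the event was stored. */
static bool demo_record_event(struct demo_history *h, uint32_t id,
			      const uint8_t *data)
{
	struct demo_entry *e;

	assert(h && data);

	/* Single check up front; the expected case is "history disabled". */
	if (demo_unlikely(h->record != NULL)) {
		e = &h->record[h->index];
		h->index = (h->index + 1) % h->max_entries; /* wrap like a ring */
		e->id = id;
		memcpy(e->data, data, DEMO_DATA_LEN);
		return true;
	}
	return false;
}
```

Leaving the record pointer NULL until a debugfs-style switch allocates it would also address the memory concern raised in the review, since nothing is allocated at probe time in this model.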
Re: duplicate authentications / excessive missing ACKs / deauth due to inactivity timer
On 7/22/20 7:18 AM, Leon M. George wrote: Hi :-) I've encountered connection issues that appear to be strongly related to the problem in this thread [1] (found via the arch linux bug tracker [2]). If this is ath10k-ct only problem, plz don't bother this list and send to just me and/or add to ath10k-ct bugtracker on github. If it does affect upstream ath10k too, then that would be valid for this mailing list. Thanks, Ben Symptoms: ap: "disconnected due to excessive missing ACKs" station: "No beacon heard and the time event is over already..." Using a monitoring interface on the AP i was able to confirm that the beacon is indeed not being sent at any time. I've found a configuration that reliably produces this state ("error state" from here on): If any number of mesh (or adhoc with ct) points are configured alongside any number of ordinary APs, this issue starts appearing. The mesh connections appear to be working correctly in the error state. I tested various combinations of openwrt-19.xx and openwrt-trunk with ath10k/ath10k-ct - all were affected. Aligning with my vague memory, a user on the openwrt forum reports the issue isn't present in openwrt-18.xx [3]. About the ruling out client issues: My employer is operating installations with multiple hundreds of ath10k access points. We couldn't identify the source of the issue at first when we encountered it in our live setup and received unsolicited reports from basically every installation. As far as we can tell, no client is able to connect in the error state. We've had our users confirm the bug for - Apple phones/tablets/macbooks - Samsung phones, laptops - computers with Intel/Realtek/AzureWave-hardware. I hope this info is helpful. 
kind regards, Leon George [1] https://www.mail-archive.com/ath10k@lists.infradead.org/msg11599.html [2] https://bugs.archlinux.org/task/58457 [3] https://forum.openwrt.org/t/wifi-connectivity-issues-with-ath10k/67779 ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: Add history for tracking certain events
On 06/27/2020 10:12 PM, Rakesh Pillai wrote: -Original Message- From: Ben Greear Sent: Saturday, June 27, 2020 8:58 PM To: Rakesh Pillai ; ath10k@lists.infradead.org Cc: linux-wirel...@vger.kernel.org; linux-ker...@vger.kernel.org Subject: Re: [PATCH] ath10k: Add history for tracking certain events On 06/26/2020 11:22 PM, Rakesh Pillai wrote: For debugging many issues, a history of the below mentioned events can help get an idea of what exactly was going on just before any issue occurred in the system. These event history will be collected only when the host driver is run in debug mode (i.e. with the config ATH10K_DEBUG enabled). This should be disabled by default unless user specifically pokes some debugfs value to turn it on so that it does not impact performance. Hi Ben, This history is enabled only if the user compiles the kernel with ATH10K_DEBUG. Making it runtime, adds a lot of "if" conditions for this history record. Do you suggest to add support to enable/disable it runtime even in ATH10K_DEBUG ? Yes, because you are adding lots of locks/unlocks. That is way more expensive than an if statement. You can add an 'unlikely' to the if check as well, so compiler will optimize for this feature not being enabled. Thanks, Ben Thanks, Ben Add history for tracking the below events - register read - register write - IRQ trigger - IRQ Enable - IRQ Disable - NAPI poll - CE service - WMI cmd - WMI event - WMI tx completion This will help in debugging any crash or any improper behaviour. -- Ben Greear Candela Technologies Inc http://www.candelatech.com -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: Add history for tracking certain events
On 06/26/2020 11:22 PM, Rakesh Pillai wrote: For debugging many issues, a history of the below mentioned events can help get an idea of what exactly was going on just before any issue occurred in the system. These event history will be collected only when the host driver is run in debug mode (i.e. with the config ATH10K_DEBUG enabled). This should be disabled by default unless user specifically pokes some debugfs value to turn it on so that it does not impact performance. Thanks, Ben Add history for tracking the below events - register read - register write - IRQ trigger - IRQ Enable - IRQ Disable - NAPI poll - CE service - WMI cmd - WMI event - WMI tx completion This will help in debugging any crash or any improper behaviour. -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH 2/2] ath10k: Skip wait for delete response if firmware is down
On 06/26/2020 11:11 AM, Rakesh Pillai wrote:

Currently the driver waits for response from the firmware for all the delete cmds, eg: vdev_delete, peer delete. If the firmware is down, these waits will always time out and return an error.

Also during subsystem recovery, any attempt to send a WMI cmd to the FW will return the -ESHUTDOWN status, which when returned to mac80211, can cause unnecessary warnings to be printed on to the console, as shown below

[ 2559.529565] Call trace:
[ 2559.532214] __sta_info_destroy_part2+0x160/0x168 [mac80211]
[ 2559.538157] __sta_info_flush+0x124/0x180 [mac80211]
[ 2559.543402] ieee80211_set_disassoc+0x130/0x2c0 [mac80211]
[ 2559.549172] ieee80211_mgd_deauth+0x238/0x25c [mac80211]
[ 2559.554764] ieee80211_deauth+0x24/0x30 [mac80211]
[ 2559.559860] cfg80211_mlme_deauth+0x258/0x2b0 [cfg80211]
[ 2559.565446] nl80211_deauthenticate+0xe4/0x110 [cfg80211]
[ 2559.571064] genl_rcv_msg+0x3a0/0x440
[ 2559.574888] netlink_rcv_skb+0xb4/0x11c
[ 2559.578877] genl_rcv+0x34/0x48
[ 2559.582162] netlink_unicast+0x14c/0x1e4
[ 2559.586235] netlink_sendmsg+0x2f0/0x360
[ 2559.590317] sock_sendmsg+0x44/0x5c
[ 2559.593951] sys_sendmsg+0x1c8/0x290
[ 2559.598029] ___sys_sendmsg+0xa8/0xfc
[ 2559.601840] __sys_sendmsg+0x8c/0xd0
[ 2559.605572] __arm64_compat_sys_sendmsg+0x2c/0x38
[ 2559.610468] el0_svc_common+0xa8/0x160
[ 2559.614372] el0_svc_compat_handler+0x2c/0x38
[ 2559.618905] el0_svc_compat+0x8/0x10

Skip the wait for delete response from the firmware if the firmware is down. Also return success to the mac80211 calls when the peer delete cmd fails with return status -ESHUTDOWN.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai
---
 drivers/net/wireless/ath/ath10k/mac.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index dc7befc..7ac6549 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -701,7 +701,8 @@ static void ath10k_wait_for_peer_delete_done(struct ath10k *ar, u32 vdev_id,
 	unsigned long time_left;
 	int ret;
-	if (test_bit(WMI_SERVICE_SYNC_DELETE_CMDS, ar->wmi.svc_map)) {
+	if (test_bit(WMI_SERVICE_SYNC_DELETE_CMDS, ar->wmi.svc_map) &&
+	    test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) {

Don't you mean !test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) ???

Or maybe I'm just mis-reading your patch?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
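The condition being questioned can be reduced to a two-input truth table. The sketch below assumes Ben's reading is the intended one (wait only when the firmware supports synchronous deletes and is not in crash-flush state) and also models the "return success on -ESHUTDOWN" part of the commit message. Both function names are hypothetical, not ath10k symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether to wait for the firmware's delete response.
 * sync_delete_supported models WMI_SERVICE_SYNC_DELETE_CMDS,
 * crash_flush models ATH10K_FLAG_CRASH_FLUSH.
 */
static bool demo_should_wait_for_delete(bool sync_delete_supported,
					bool crash_flush)
{
	/* The negation is the point of the review comment. */
	return sync_delete_supported && !crash_flush;
}

/* Report success upward when the command failed only because the FW is down. */
static int demo_mask_shutdown(int ret)
{
	return ret == -ESHUTDOWN ? 0 : ret;
}
```

Without the negation, the wait would happen only while the firmware is being flushed after a crash, which is the case the commit message says should be skipped.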
Potential issue with htt flags
While porting forward my patches to 5.7, I noticed this:

#define HTT_TX_CMPL_FLAG_DATA_RSSI BIT(0)
#define HTT_TX_CMPL_FLAG_PPID_PRESENT BIT(1)
#define HTT_TX_CMPL_FLAG_PA_PRESENT BIT(2)
#define HTT_TX_CMPL_FLAG_PPDU_DURATION_PRESENT BIT(3)

#define HTT_TX_DATA_RSSI_ENABLE_WCN3990 BIT(3)
#define HTT_TX_DATA_APPEND_RETRIES BIT(0)
#define HTT_TX_DATA_APPEND_TIMESTAMP BIT(1)

Both groups are used against 'flags2', but as you can see, some bits are defined to mean different things. In particular, the usage in

static int ath10k_get_htt_tx_data_rssi_pad(struct htt_resp *resp)

looks suspicious to me. Maybe ath10k_get_htt_tx_data_rssi_pad should be labeled as specific to one particular chipset?

I didn't look further, but maybe whoever added this could take a look?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
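To make the overlap concrete, here is a small user-space sketch. The macro values are copied from the message above; BIT() is redefined locally, and htt_flags2_overlap() is a hypothetical helper that exists only for this illustration.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Values copied from the message above. */
#define HTT_TX_CMPL_FLAG_DATA_RSSI             BIT(0)
#define HTT_TX_CMPL_FLAG_PPID_PRESENT          BIT(1)
#define HTT_TX_CMPL_FLAG_PA_PRESENT            BIT(2)
#define HTT_TX_CMPL_FLAG_PPDU_DURATION_PRESENT BIT(3)

#define HTT_TX_DATA_RSSI_ENABLE_WCN3990        BIT(3)
#define HTT_TX_DATA_APPEND_RETRIES             BIT(0)
#define HTT_TX_DATA_APPEND_TIMESTAMP           BIT(1)

/* Hypothetical helper: mask of the bits that both groups define. */
static uint32_t htt_flags2_overlap(void)
{
	uint32_t cmpl = HTT_TX_CMPL_FLAG_DATA_RSSI |
			HTT_TX_CMPL_FLAG_PPID_PRESENT |
			HTT_TX_CMPL_FLAG_PA_PRESENT |
			HTT_TX_CMPL_FLAG_PPDU_DURATION_PRESENT;
	uint32_t data = HTT_TX_DATA_RSSI_ENABLE_WCN3990 |
			HTT_TX_DATA_APPEND_RETRIES |
			HTT_TX_DATA_APPEND_TIMESTAMP;

	return cmpl & data;
}
```

Bits 0, 1 and 3 appear in both groups, so a flags2 value with bit 3 set reads as "PPDU duration present" under one set of names and as "RSSI enabled (WCN3990)" under the other. Which reading is right presumably depends on the chipset.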
Un-recoverable ath10k 4019 NIC lockup.
[ 2545.370652] ath10k_ahb a80.wifi: failed to halt axi bus: 0
[ 2548.661207] ath10k_ahb a80.wifi: failed to receive initialized event from target: 8000
[ 2548.671340] ath10k_ahb a80.wifi: failed to halt axi bus: 0
Thu May 14 19:29:17 2020 kern.err kernel: [ 2548.661207] ath10k_ahb a80.wifi: failed to receive initialized event from target: 8000
Thu May 14 19:29:17 2020 kern.err kernel: [ 2548.671340] ath10k_ahb a80.wifi: failed to halt axi bus: 0
[ 2548.840677] ath10k_ahb a80.wifi: failed to reset chip: -110
[ 2548.840716] ath10k_ahb a80.wifi: Could not init hif: -110
[ 2548.845695] [ cut here ]
[ 2548.851832] WARNING: CPU: 3 PID: 98 at backports-4.19.98-1/net/mac80211/util.c:2040 ieee80211_reconfig+0x98/0xb64 [mac80211]
[ 2548.856020] Hardware became unavailable during restart.

And endless -108 errors and other funk after this.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: [RFC 1/2] devlink: add simple fw crash helpers
On 05/25/2020 02:07 AM, Andy Shevchenko wrote: On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote: On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain wrote: I had to go RTFM re: kernel taints because it has been a very long time since I looked at them. It had always seemed to me that most were caused by "kernel-unfriendly" user actions. The most famous of course is loading proprietary modules, out-of-tree modules, forced module loads, etc... Honestly, I had forgotten the large variety of uses of the taint flags. For anyone who hasn't looked at taints recently, I recommend: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html In light of this I don't object to setting a taint on this anymore. I'm a little uneasy, but I've softened on it now, and now I feel it depends on implementation. Specifically, I don't think we should set a taint flag when a driver easily handles a routine firmware crash and is confident that things have come up just fine again. In other words, triggering the taint in every driver module where it spits out a log comment that it had a firmware crash and had to recover seems too much. Sure, firmware shouldn't crash, sure it should be open source so we can fix it, whatever... While it may sound idealistic the firmware for the end-user, and even for mere kernel developer like me, is a complete blackbox which has more access than root user in the kernel. We have tons of firmwares and each of them potentially dangerous beast. As a user I really care about my data and privacy (hacker can oops a firmware in order to set a specific vector attack). So, tainting kernel is _a least_ we can do there, the strict rules would be to reboot immediately. those sort of wishful comments simply ignore reality and our ability to affect effective change. We can encourage users not to buy cheap crap for the starter. There is no stable wifi firmware for any price. 
There is also no obvious feedback from even name-brand NICs like ath10k or AX200 when you report a crash. That said, at least in my experience with ath10k-ct, the OS normally recovers fine from firmware crashes. ath10k already reports full crash reports on udev, so easy for user-space to notice and report bug reports upstream if it cares to. Probably other NICs do the same, and if not, they certainly could. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()
On 05/18/2020 10:09 AM, Luis Chamberlain wrote: On Mon, May 18, 2020 at 09:58:53AM -0700, Ben Greear wrote: On 05/18/2020 09:51 AM, Luis Chamberlain wrote: On Sat, May 16, 2020 at 03:24:01PM +0200, Johannes Berg wrote: On Fri, 2020-05-15 at 21:28 +, Luis Chamberlain wrote:> module_firmware_crashed You didn't CC me or the wireless list on the rest of the patches, so I'm replying to a random one, but ... What is the point here? This should in no way affect the integrity of the system/kernel, for most devices anyway. Keyword you used here is "most device". And in the worst case, *who* knows what other odd things may happen afterwards. So what if ath10k's firmware crashes? If there's a driver bug it will not handle it right (and probably crash, WARN_ON, or something else), but if the driver is working right then that will not affect the kernel at all. Sometimes the device can go into a state which requires driver removal and addition to get things back up. It would be lovely to be able to detect this case in the driver/system somehow! I haven't seen any such cases recently, I assure you that I have run into it. Once it does again I'll report the crash, but the problem with some of this is that unless you scrape the log you won't know. Eventually, a uevent would indeed tell inform me. but in case there is some common case you see, maybe we can think of a way to detect it? ath10k is just one case, this patch series addresses a simple way to annotate this tree-wide. So maybe I can understand that maybe you want an easy way to discover - per device - that the firmware crashed, but that still doesn't warrant a complete kernel taint. That is one reason, another is that a taint helps support cases *fast* easily detect if the issue was a firmware crash, instead of scraping logs for driver specific ways to say the firmware has crashed. You can listen for udev events (I think that is the right term), and find crashes that way. You get the actual crash info as well. 
My follow up to this was to add uevent to add_taint() as well, this way these could generically be processed by userspace. I'm not opposed to the taint, though I have not thought much on it. But, if you can already get the crash info from uevent, and it automatically comes without polling or scraping logs, then what benefit beyond that does the taint give you? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()
On 05/18/2020 09:51 AM, Luis Chamberlain wrote: On Sat, May 16, 2020 at 03:24:01PM +0200, Johannes Berg wrote: On Fri, 2020-05-15 at 21:28 +, Luis Chamberlain wrote:> module_firmware_crashed You didn't CC me or the wireless list on the rest of the patches, so I'm replying to a random one, but ... What is the point here? This should in no way affect the integrity of the system/kernel, for most devices anyway. Keyword you used here is "most device". And in the worst case, *who* knows what other odd things may happen afterwards. So what if ath10k's firmware crashes? If there's a driver bug it will not handle it right (and probably crash, WARN_ON, or something else), but if the driver is working right then that will not affect the kernel at all. Sometimes the device can go into a state which requires driver removal and addition to get things back up. It would be lovely to be able to detect this case in the driver/system somehow! I haven't seen any such cases recently, but in case there is some common case you see, maybe we can think of a way to detect it? So maybe I can understand that maybe you want an easy way to discover - per device - that the firmware crashed, but that still doesn't warrant a complete kernel taint. That is one reason, another is that a taint helps support cases *fast* easily detect if the issue was a firmware crash, instead of scraping logs for driver specific ways to say the firmware has crashed. You can listen for udev events (I think that is the right term), and find crashes that way. You get the actual crash info as well. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH 1/2] ath10k: use cumulative survey statistics
On 05/04/2020 04:52 PM, Rajkumar Manoharan wrote: On 2020-05-04 16:49, Ben Greear wrote: On 05/04/2020 04:46 PM, Rajkumar Manoharan wrote: On 2020-05-04 08:41, Markus Theil wrote: ath10k currently reports survey results for the last interval between each invocation of NL80211_CMD_GET_SURVEY. For concurrent invocations, this can lead to unexpectedly small results, e.g. when hostapd uses survey data and iw survey dump is invoked in parallel. Fix this by returning cumulative results, that don't depend on the last invocation. Other drivers, e.g. ath9k or mt76 also use this behavior. Signed-off-by: Markus Theil IIRC this was fixed a while ago by below patch. Somehow it never landed in ath.git. Simple one line change is enough. https://patchwork.kernel.org/patch/10550707/ -Rajkumar Have you tested this with wave-1? Lots of older, at least, firmware has brokenness in this area. Yes. It was tested in wave-1 as well. Venkat replied to your comment on original change. Ahh, sorry I missed that. Hopefully no one is using the broken firmware anymore then! --Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
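The difference between interval and cumulative reporting, as described in the quoted commit message, can be sketched with a toy counter. This is a simplified model under stated assumptions: struct survey_model and both read functions are hypothetical and are not the ath10k or nl80211 data structures.

```c
#include <stdint.h>

/* Hypothetical model of a channel-busy counter. */
struct survey_model {
	uint64_t busy_total; /* busy time accumulated since start */
	uint64_t last_read;  /* busy_total as of the previous read */
};

/*
 * Interval-style read: returns the busy time since the previous read
 * by any caller, and moves the shared read marker forward.
 */
static uint64_t survey_read_interval(struct survey_model *s)
{
	uint64_t delta = s->busy_total - s->last_read;

	s->last_read = s->busy_total;
	return delta;
}

/* Cumulative read: returns the running total, no shared state changes. */
static uint64_t survey_read_cumulative(const struct survey_model *s)
{
	return s->busy_total;
}
```

With the interval read, a second reader that polls shortly after the first sees only the small slice accumulated in between, which is the "unexpectedly small results" case from the commit message. With the cumulative read, each reader can keep its own previous value and compute its own delta.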
Re: [PATCH 1/2] ath10k: use cumulative survey statistics
On 05/04/2020 04:46 PM, Rajkumar Manoharan wrote: On 2020-05-04 08:41, Markus Theil wrote: ath10k currently reports survey results for the last interval between each invocation of NL80211_CMD_GET_SURVEY. For concurrent invocations, this can lead to unexpectedly small results, e.g. when hostapd uses survey data and iw survey dump is invoked in parallel. Fix this by returning cumulative results, that don't depend on the last invocation. Other drivers, e.g. ath9k or mt76 also use this behavior. Signed-off-by: Markus Theil IIRC this was fixed a while ago by below patch. Somehow it never landed in ath.git. Simple one line change is enough. https://patchwork.kernel.org/patch/10550707/ -Rajkumar Have you tested this with wave-1? Lots of older, at least, firmware has brokenness in this area. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: increase rx buffer size to 2048
On 04/28/2020 05:01 AM, Kalle Valo wrote: Sven Eckelmann writes: On Wednesday, 1 April 2020 09:00:49 CEST Sven Eckelmann wrote: On Wednesday, 5 February 2020 20:10:43 CEST Linus Lüssing wrote: From: Linus Lüssing Before, only frames with a maximum size of 1528 bytes could be transmitted between two 802.11s nodes. For batman-adv for instance, which adds its own header to each frame, we typically need an MTU of at least 1532 bytes to be able to transmit without fragmentation. This patch now increases the maxmimum frame size from 1528 to 1656 bytes. [...] @Kalle, I saw that this patch was marked as deferred [1] but I couldn't find any mail why it was done so. It seems like this currently creates real world problems - so would be nice if you could explain shortly what is currently blocking its acceptance. Ping? Sorry for the delay, my plan was to first write some documentation about different hardware families but haven't managed to do that yet. My problem with this patch is that I don't know what hardware and firmware versions were tested, so it needs analysis before I feel safe to apply it. The ath10k hardware families are very different that even if a patch works perfectly on one ath10k hardware it could still break badly on another one. What makes me faster to apply ath10k patches is to have comprehensive analysis in the commit log. This shows me the patch author has considered about all hardware families, not just the one he is testing on, and that I don't need to do the analysis myself. It has been in ath10k-ct for a while, and that has some fairly wide coverage in OpenWrt, so likely if there were problems we would have seen it already. I did not make any specific changes to firmware to support this, so upstream firmware should behave similarly. Seems like upstream ath10k could really benefit from having some test beds so you can actually test code on different chips and have confidence in your changes! 
Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: Management rate-control on IPQ4019
On 02/18/2020 03:06 PM, David Bauer wrote: Hello Ben, On 2/18/20 8:58 PM, Ben Greear wrote: On 02/18/2020 11:12 AM, David Bauer wrote: Hello, while playing around with the 2.4GHz WiFi part of the IPQ4019, i was expecting being able to set the rate at which IPQ4019 transmits it's beacon frames. Using OpenWrt, setting "legacy_rates=0" on the radio leads to only advertising 802.11g speeds, however the beacons are still sent out at 1 Mbit/s. Using a QCA9984, the beacons are correctly sent out at the lowest 802.11g rate (6 Mbit/s). So i assume this is either a bug in the ath10k firmware or a hardware-shortcoming. Has anyone else experienced this bug and is it likely we'll see it fixed in a later firmware release? Hardware: IPQ4029 (Aruba AP-303) Firmware Version: 10.4-3.6-00140 / 10.4-3.5.3-00078 There are separate API for setting management frame rates. I forget exactly how upstream supports this, but maybe check debugfs? I'm using the mac80211 interface here [0], which works well for the QCA9984, but not for the IPQ4019. I'm not aware of a debugfs interface with ath10k for setting the management rate. I can try the one ath10k-ct implements, but the fact it works on the QCA9984 makes me believe the culprit is the firmware. The patch adding support for mgmt-rate setting does not list the IPQ4019 as a tested platform also. [0] https://patchwork.kernel.org/patch/10593573/ Ok, maybe so. I compile all of the wave-2 targets from the same firmware source, but maybe upstream 4019 firmware lags others for one reason or another. If you want to try -ct firmware/driver, please search for "Set multicast, broadcast, beacon tx rates." in the link below: http://www.candelatech.com/ath10k-ug.php Possibly these driver changes will work with upstream firmware, I have not tried it. # cat /debug/ieee80211/wiphy1/ath10k/set_rates This is to set fixed bcast, mcast, and beacon rates. Normal rate-ctrl is handled through normal API using 'iw', etc. 
To set a value, you specify the dev-name, type, band and rate-code:
types: bcast, mcast, beacon
bands: 2, 5, 60
rate-codes: 0x43 1M, 0x42 2M, 0x41 5.5M, 0x40 11M, 0x3 6M, 0x7 9M, 0x2 12M, 0x6 18M, 0x1 24M, 0x5 36M, 0x0 48M, 0x4 54M, 0xFF default
For example, to set beacon to 18Mbps on wlan0:
echo "wlan0 beacon 2 0x6" > /debug//set_rates

I'm not sure if 'beacon' also controls other mgt frames or not w/out reviewing the code.

Thanks,
Ben

Best wishes
David

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
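For reference, the rate-code list from the set_rates help text can be written as a lookup table. This is just an illustration: the codes and rates are copied from the text above, while struct rate_code and rate_code_to_kbps() are hypothetical names that do not come from ath10k-ct.

```c
#include <stddef.h>

/* Hypothetical table entry: rate-code and the legacy rate in kbit/s. */
struct rate_code {
	unsigned int code;
	unsigned int kbps;
};

/* Codes and rates as listed in the set_rates help text. */
static const struct rate_code rate_codes[] = {
	{ 0x43,  1000 }, { 0x42,  2000 }, { 0x41,  5500 }, { 0x40, 11000 },
	{ 0x03,  6000 }, { 0x07,  9000 }, { 0x02, 12000 }, { 0x06, 18000 },
	{ 0x01, 24000 }, { 0x05, 36000 }, { 0x00, 48000 }, { 0x04, 54000 },
};

/*
 * Returns the rate in kbit/s for a known code, or 0 for anything else.
 * 0xFF means "default" in the help text and has no fixed rate, so it
 * also returns 0 here.
 */
static unsigned int rate_code_to_kbps(unsigned int code)
{
	size_t i;

	for (i = 0; i < sizeof(rate_codes) / sizeof(rate_codes[0]); i++) {
		if (rate_codes[i].code == code)
			return rate_codes[i].kbps;
	}
	return 0;
}
```

The 0x6 used in the echo example maps to 18000 kbit/s, matching the "18Mbps" in the text.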
Re: Management rate-control on IPQ4019
On 02/18/2020 11:12 AM, David Bauer wrote: Hello, while playing around with the 2.4GHz WiFi part of the IPQ4019, i was expecting being able to set the rate at which IPQ4019 transmits it's beacon frames. Using OpenWrt, setting "legacy_rates=0" on the radio leads to only advertising 802.11g speeds, however the beacons are still sent out at 1 Mbit/s. Using a QCA9984, the beacons are correctly sent out at the lowest 802.11g rate (6 Mbit/s). So i assume this is either a bug in the ath10k firmware or a hardware-shortcoming. Has anyone else experienced this bug and is it likely we'll see it fixed in a later firmware release? Hardware: IPQ4029 (Aruba AP-303) Firmware Version: 10.4-3.6-00140 / 10.4-3.5.3-00078 There are separate API for setting management frame rates. I forget exactly how upstream supports this, but maybe check debugfs? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels
On 12/18/2019 12:05 AM, Justin Capella wrote: Don't mean to steal your thread here, but since it's being discussed-- is there something that can be done to provide more accurate/precise data? Use of the default is widespread so not a reason to hold back the patch imo, but with a proposed pcap-ng capture information block they would become more accessible and maybe there will be increased interest in real values. It would take some large effort up and down the stack, but we could potentially report the raw data for the secondary frequencies. Probably that is of so little use for the general user that it is not worth the effort. You could just uncomment the printk in my patch if you are curious, or perhaps add some debugfs API if you wanted to get at lots of data with run-time config change. Anyway to fill out IEEE80211_RADIOTAP_DBM_ANT{SIGNAL,NOISE}? Per-antenna rssi is already in wireshark capture for ath10k-ct. I'm pretty sure it is working in upstream ath10k too. I recall from another thread that there isn't currently periodic calibration but the floor could change with environment too. I don't think it is correct to say periodic calibration does not happen with ath10k. Maybe very old wave-1 firmware has some issues, but recent stuff appears to work. I do see reported noise floor changing on 9984. Thanks, Ben On Tue, Dec 17, 2019 at 8:05 PM Sebastian Gottschall wrote: Am 18.12.2019 um 03:37 schrieb Ben Greear: On 12/17/2019 06:12 PM, Sebastian Gottschall wrote: i dont know what you want to compare here. 1. you compare 2 different wifi chipsets. both have different sensititivy and overall output power spec 2. both have different amount of antenna chains. which does make a difference in input sensitivity 3. the patch ben made has no effect on qca9880 chipsets. it only takes effect on 10.4 based chipsets like 9984 The part of my patch that sums secondary frequencies should apply to wave-1 as well, but I have not verified that yet. yeah. right. 
sorry i was just looking at total signal sum which uses rssi_comb_ht about noise floors in general. noise floors of -108 are bogus. there is a physical limit a noise level can be. since drivers like ath9k are doing a cyclic calibration, the noise value might indeed change. but this calibration is not running in realtime. its cyclic. i'm not aware if chipsets like qca988x are going the same way, but since qca988x has sime similaries with ath9k chipsets unlike the newer 9984 variants, it could be. the 30 seconds mentioned in the bug report fits to my expectations of the early noisefloor calibration which has a short delay and after success turning to use a long delay. anyway. in this early calibration phase signals might change and will stabilize after. this isnt a issue since your connection will work anyway even if it might take a little bit longer if you have poor signal levels @ben. am i wrong or what do think? I don't know enough about how the noise floor calculations are done or how the apply to settings to know the answer. I will be happy in general if ath10k wave-1, wave-2, and ath9k report similar RSSI for similar setups. that will not work. you compare different chipsets and depending on the implementation by the card vendor rf sensitivity can be very diffent. the same goes for output power. some vendors are using additional rf amps for enhancing output power (ubiquiti is best example here). this these amps also may have influence to sensitivity. on these cards you set 10 db output power, but in fact it outputs 18 db. so there is a bias offset on these cards or devices. (the offset is depending on the device model) what you measure is what the chip receives, but not what was lost on the pcb layout. (or was even generated in case of noise) and when it comes to calibration data. correct would be if each individual card is calibrated before shipment. 
in reality manufactures are doing calibration on a single reference card and clone it on all following cards to save time. the result depends on day or week of production and current position of the moon and sun. errors of +- 2 db are common here. (this is not a fact for all card or device vendors) If you look at the tx-rate-power table in ath10k, for instance, you can see different MCS are transmitted at different signal levels. So, some change from initial conditions might be because higher MCS is being transmitted after rate-ctrl scales up? yes. this is modulation related. as higher the rate goes as lower the power will be. thats princible of QAM. and the rate control itself isnt signal but error rate based. so high packet loss triggers the rate control to lower the rate which results in increased output power and vice versa. but as mentioned. at card startup a noise floor calibration starts which may succeed or fail. if it succeeds it will turn into a long delay phase. so cyclic calibration. the calibration time is exactly 30 seconds (minimum
Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels
On 12/17/2019 06:12 PM, Sebastian Gottschall wrote: i dont know what you want to compare here. 1. you compare 2 different wifi chipsets. both have different sensititivy and overall output power spec 2. both have different amount of antenna chains. which does make a difference in input sensitivity 3. the patch ben made has no effect on qca9880 chipsets. it only takes effect on 10.4 based chipsets like 9984 The part of my patch that sums secondary frequencies should apply to wave-1 as well, but I have not verified that yet. about noise floors in general. noise floors of -108 are bogus. there is a physical limit a noise level can be. since drivers like ath9k are doing a cyclic calibration, the noise value might indeed change. but this calibration is not running in realtime. its cyclic. i'm not aware if chipsets like qca988x are going the same way, but since qca988x has sime similaries with ath9k chipsets unlike the newer 9984 variants, it could be. the 30 seconds mentioned in the bug report fits to my expectations of the early noisefloor calibration which has a short delay and after success turning to use a long delay. anyway. in this early calibration phase signals might change and will stabilize after. this isnt a issue since your connection will work anyway even if it might take a little bit longer if you have poor signal levels @ben. am i wrong or what do think? I don't know enough about how the noise floor calculations are done or how the apply to settings to know the answer. I will be happy in general if ath10k wave-1, wave-2, and ath9k report similar RSSI for similar setups. If you look at the tx-rate-power table in ath10k, for instance, you can see different MCS are transmitted at different signal levels. So, some change from initial conditions might be because higher MCS is being transmitted after rate-ctrl scales up? Lots of moving parts... 
Thanks, Ben Sebastian Am 18.12.2019 um 00:37 schrieb Tom Psyborg: also noticed now that the noise floor changes with signal strength as described in this bug report: https://www.mail-archive.com/ath10k@lists.infradead.org/msg11553.html after wifi restart iwinfo: signal: -59dBm noise: -108dBm then goes to signal: -52dBm noise: -103dBm and finally drops to signal: -59dBm noise: -103dBm -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels
On 12/17/19 3:37 PM, Tom Psyborg wrote: also noticed now that the noise floor changes with signal strength as described in this bug report: https://www.mail-archive.com/ath10k@lists.infradead.org/msg11553.html after wifi restart iwinfo: signal: -59dBm noise: -108dBm then goes to signal: -52dBm noise: -103dBm and finally drops to signal: -59dBm noise: -103dBm The problem with debugging this sort of stuff is that you need an RF scope to determine whether signal power of transmitter is changing or receiver is reporting stuff weirdly. If you are comparing against ath9k, probably you need to force your ath10k station to do /n only (or change your AP to do /n only) so that you can be comparing similar MCS rates. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels
On 12/17/19 10:29 AM, Tom Psyborg wrote: On 17/12/2019, Ben Greear wrote: On 12/17/19 8:23 AM, Justin Capella wrote: I believe someone recently submitted a patch that defined noise floors per band (2/5). I looked at using the real noise floor. Our radio was reporting a noise floor of around -102, where the hard-coded default is -95. This of course would make the reported RSSI lower by 7db in that case. I am not sure that is correct. Hi I am getting similar NF values with all my ath10k devices, I thought default was changed since ath9k from -95 to -115 just like in the vendor driver? There were some discussions about it on mailing list. On some channels (5Ghz) the value goes down to about -107, even saw -110 once. If you use ath9k and ath10k on same channel/environment, do you see similar RSSI reported (especially with the ath10k patch I just posted)? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels
On 12/17/19 8:23 AM, Justin Capella wrote: I believe someone recently submitted a patch that defined noise floors per band (2/5). I looked at using the real noise floor. Our radio was reporting a noise floor of around -102, where the hard-coded default is -95. This of course would make the reported RSSI lower by 7db in that case. I am not sure that is correct. If this were to be implemented that way, then the firmware would have to be queried for the noise floor in a better way than it is currently done. So, I am not planning to work on that soon. Someone could post-process RSSI based on the reported noise floor if they want to adjust the values in user-space, for isntance. Can't say I'm a fan of the hacky code, in particular the if/else for min/max // maybe abs(a-b)? I like open coded stuff. I'm more concerned that maybe the math could be improved, but it seems to work pretty well in our testing. Either way, please comment inline so that it is more obvious exactly what code you are talking about. if (e40 != 0x80) { // whats this case about? 0x80 means 'value is not valid'. I can add a comment about that. Are there reasons to not use log? I don't want to use log in the rx path, it would very likely decrease rx performance, especially on lower powered systems. Thanks, Ben On Tue, Dec 17, 2019 at 7:59 AM Sebastian Gottschall wrote: currently debugging in your code, but i already have seen that the values are wrong now for this chipset Thanks for testing. I'll add a check for 0 and ignore that value too. That seem OK? i tested already the 0 check and it works Were the per-chain values OK? on 9984 i see no disadvantage so far. seem to work and the values look sane. i will do a side by side comparisation later this day on 9984 Thanks, Ben Am 16.12.2019 um 23:07 schrieb gree...@candelatech.com: From: Ben Greear This makes per-chain RSSI be more consistent between HT20, HT40, HT80. 
Instead of doing precise log math for adding dbm, I did a rough estimate, it seems to work good enough. Tested on ath10k-ct 9984 firmware. Signed-off-by: Ben Greear --- drivers/net/wireless/ath/ath10k/htt_rx.c | 64 --- drivers/net/wireless/ath/ath10k/rx_desc.h | 3 +- 2 files changed, 60 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c index 13f652b622df..034d4ace228d 100644 --- a/drivers/net/wireless/ath/ath10k/htt_rx.c +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c @@ -1167,6 +1167,44 @@ static bool ath10k_htt_rx_h_channel(struct ath10k *ar, return true; } +static int ath10k_sum_sigs_2(int a, int b) { +int diff; + +if (b == 0x80) +return a; + +if (a >= b) { +diff = a - b; +if (diff == 0) +return a + 3; +else if (diff == 1) +return a + 2; +else if (diff == 2) +return a + 1; +return a; +} +else { +diff = b - a; +if (diff == 0) +return b + 3; +else if (diff == 1) +return b + 2; +else if (diff == 2) +return b + 1; +return b; +} +} + +static int ath10k_sum_sigs(int p20, int e20, int e40, int e80) { +/* Hacky attempt at summing dbm without resorting to log(10) business */ +if (e40 != 0x80) { +return ath10k_sum_sigs_2(ath10k_sum_sigs_2(p20, e20), ath10k_sum_sigs_2(e40, e80)); +} +else { +return ath10k_sum_sigs_2(p20, e20); +} +} + static void ath10k_htt_rx_h_signal(struct ath10k *ar, struct ieee80211_rx_status *status, struct htt_rx_desc *rxd) @@ -1177,18 +1215,32 @@ static void ath10k_htt_rx_h_signal(struct ath10k *ar, status->chains &= ~BIT(i); if (rxd->ppdu_start.rssi_chains[i].pri20_mhz != 0x80) { -status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR + - rxd->ppdu_start.rssi_chains[i].pri20_mhz; +status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR ++ ath10k_sum_sigs(rxd->ppdu_start.rssi_chains[i].pri20_mhz, + rxd->ppdu_start.rssi_chains[i].ext20_mhz, + rxd->ppdu_start.rssi_chains[i].ext40_mhz, + rxd->ppdu_start.rssi_chains[i].ext80_mhz); +//ath10k_warn(ar, "rx-h-sig, chain[%i] pri20: %d 
ext20: %d ext40: %d ext80: %d\n", +//i, rxd->ppdu_start.rssi_chains[i].pri20_mhz, rxd->ppdu_start.rssi_chains[i].ext20_mhz, +// rxd->ppdu_start.rssi_chains[i].ext40_mhz, rxd->ppdu_start.rssi_chains[i].ext80_mhz); status->chains |= BIT(i); } } /* FIXME: Get real NF */ -status->signal = ATH10K_DEFAULT_NOISE_FLOOR + - rxd->ppdu_start.rssi_comb; -/* ath10k_warn(ar, "rx-h-sig, signal: %d chains: 0x%x chain[0]: %d chain[1]: %d chan[2]: %d\n", -
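The rough dB summing from the patch can be tried stand-alone. The sketch below is a restructured user-space copy of the two helpers with the same results as the posted code. The names sum_sigs_2(), sum_sigs() and RSSI_INVALID are local to this illustration. For comparison, exact power addition of two signals gives about +3.01 dB over the stronger one when they are equal, about +2.54 dB when they differ by 1 dB, and about +2.12 dB when they differ by 2 dB, so the integer steps of +3, +2, +1 round down a little.

```c
/* 0x80 means 'value is not valid', as explained in the thread. */
#define RSSI_INVALID 0x80

/* Approximate sum of two dB values without any log math. */
static int sum_sigs_2(int a, int b)
{
	int hi, diff;

	if (b == RSSI_INVALID)
		return a;

	hi = a >= b ? a : b;
	diff = a >= b ? a - b : b - a;

	/* Equal powers add 3 dB; weaker contributions add less. */
	if (diff == 0)
		return hi + 3;
	if (diff == 1)
		return hi + 2;
	if (diff == 2)
		return hi + 1;
	return hi;
}

/* Combine primary 20 MHz with the ext20/ext40/ext80 readings. */
static int sum_sigs(int p20, int e20, int e40, int e80)
{
	if (e40 != RSSI_INVALID)
		return sum_sigs_2(sum_sigs_2(p20, e20),
				  sum_sigs_2(e40, e80));
	return sum_sigs_2(p20, e20);
}
```

Four equal readings come out 6 dB above a single one, which agrees with exact math to within rounding. Like the posted patch, this sketch does not treat 0 as invalid; the thread suggests adding that check.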
Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels
On 12/17/2019 04:32 AM, Sebastian Gottschall wrote: result of my tests on qca988x rxd->ppdu_start.rssi_comb_ht is always zero. so you need to add a additional check Am 17.12.2019 um 13:02 schrieb Sebastian Gottschall: i see a issue in your patch for qca988x chipsets +if (rxd->ppdu_start.rssi_comb_ht != 0x80) { +status->signal = ATH10K_DEFAULT_NOISE_FLOOR + +rxd->ppdu_start.rssi_comb_ht; +} this is always true for qca988x, but the field is not provided on these older chipsets. so signal reporting will be broken i'm currently debugging in your code, but i already have seen that the values are wrong now for this chipset Thanks for testing. I'll add a check for 0 and ignore that value too. That seem OK? Were the per-chain values OK? Thanks, Ben Am 16.12.2019 um 23:07 schrieb gree...@candelatech.com: From: Ben Greear This makes per-chain RSSI be more consistent between HT20, HT40, HT80. Instead of doing precise log math for adding dbm, I did a rough estimate, it seems to work good enough. Tested on ath10k-ct 9984 firmware. 
Signed-off-by: Ben Greear --- drivers/net/wireless/ath/ath10k/htt_rx.c | 64 --- drivers/net/wireless/ath/ath10k/rx_desc.h | 3 +- 2 files changed, 60 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c index 13f652b622df..034d4ace228d 100644 --- a/drivers/net/wireless/ath/ath10k/htt_rx.c +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c @@ -1167,6 +1167,44 @@ static bool ath10k_htt_rx_h_channel(struct ath10k *ar, return true; } +static int ath10k_sum_sigs_2(int a, int b) { +int diff; + +if (b == 0x80) +return a; + +if (a >= b) { +diff = a - b; +if (diff == 0) +return a + 3; +else if (diff == 1) +return a + 2; +else if (diff == 2) +return a + 1; +return a; +} +else { +diff = b - a; +if (diff == 0) +return b + 3; +else if (diff == 1) +return b + 2; +else if (diff == 2) +return b + 1; +return b; +} +} + +static int ath10k_sum_sigs(int p20, int e20, int e40, int e80) { +/* Hacky attempt at summing dbm without resorting to log(10) business */ +if (e40 != 0x80) { +return ath10k_sum_sigs_2(ath10k_sum_sigs_2(p20, e20), ath10k_sum_sigs_2(e40, e80)); +} +else { +return ath10k_sum_sigs_2(p20, e20); +} +} + static void ath10k_htt_rx_h_signal(struct ath10k *ar, struct ieee80211_rx_status *status, struct htt_rx_desc *rxd) @@ -1177,18 +1215,32 @@ static void ath10k_htt_rx_h_signal(struct ath10k *ar, status->chains &= ~BIT(i); if (rxd->ppdu_start.rssi_chains[i].pri20_mhz != 0x80) { -status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR + -rxd->ppdu_start.rssi_chains[i].pri20_mhz; +status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR ++ ath10k_sum_sigs(rxd->ppdu_start.rssi_chains[i].pri20_mhz, + rxd->ppdu_start.rssi_chains[i].ext20_mhz, + rxd->ppdu_start.rssi_chains[i].ext40_mhz, + rxd->ppdu_start.rssi_chains[i].ext80_mhz); +//ath10k_warn(ar, "rx-h-sig, chain[%i] pri20: %d ext20: %d ext40: %d ext80: %d\n", +//i, rxd->ppdu_start.rssi_chains[i].pri20_mhz, rxd->ppdu_start.rssi_chains[i].ext20_mhz, +// 
rxd->ppdu_start.rssi_chains[i].ext40_mhz, rxd->ppdu_start.rssi_chains[i].ext80_mhz); status->chains |= BIT(i); } } /* FIXME: Get real NF */ -status->signal = ATH10K_DEFAULT_NOISE_FLOOR + - rxd->ppdu_start.rssi_comb; -/* ath10k_warn(ar, "rx-h-sig, signal: %d chains: 0x%x chain[0]: %d chain[1]: %d chan[2]: %d\n", - status->signal, status->chains, status->chain_signal[0], status->chain_signal[1], status->chain_signal[2]); */ +if (rxd->ppdu_start.rssi_comb_ht != 0x80) { +status->signal = ATH10K_DEFAULT_NOISE_FLOOR + +rxd->ppdu_start.rssi_comb_ht; +} +else { +status->signal = ATH10K_DEFAULT_NOISE_FLOOR + +rxd->ppdu_start.rssi_comb; +} + +//ath10k_warn(ar, "rx-h-sig, signal: %d chains: 0x%x chain[0]: %d chain[1]: %d chain[2]: %d chain[3]: %d\n", +//status->signal, status->chains, status->chain_signal[0], +//status->chain_signal[1], status->chain_signal[2], status->chain_signal[3]); status->flag &= ~RX_FLAG_NO_SIGNAL_VAL; } diff --git a/drivers/net/wireless/ath/ath10k/rx_desc.h b/drivers/net/wireless/ath/ath10k/rx_desc.h index dec1582005b9..6b44677474dd 100644 --- a/drivers/net/wireless/ath/ath10k/rx_desc.h +++ b/drivers/net/wirele
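For comparison with the rough estimate in the patch above, here is a small sketch (not part of the patch) of a table-driven sum that stays integer-only, so it avoids log10() but rounds to the nearest dB. When two levels differ by d dB, the stronger one grows by 10*log10(1 + 10^(-d/10)), which is about 3.0, 2.5, 2.1 and 1.8 dB for d = 0..3 and falls below 0.5 dB from d = 10 on. The name demo_sum_db and the table are illustrative assumptions, and the 0x80 "invalid" sentinel handling from the patch is left out.

```c
/*
 * Sketch only: sum two power levels given in whole dB.
 * corr[d] is 10*log10(1 + 10^(-d/10)) rounded to the nearest dB,
 * where d is the absolute difference between the two inputs.
 */
static int demo_sum_db(int a, int b)
{
	static const int corr[10] = { 3, 3, 2, 2, 1, 1, 1, 1, 1, 1 };
	int hi = a >= b ? a : b;
	int diff = a >= b ? a - b : b - a;

	/* From a 10 dB difference on, the weaker signal adds < 0.5 dB. */
	if (diff >= 10)
		return hi;

	return hi + corr[diff];
}
```

With this table, demo_sum_db(-50, -53) gives -48, whereas the three-step estimate in the patch returns -50 for the same inputs.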
Re: [PATCH] ath10k: Fix setting txpower to zero.
On 12/12/19 9:14 AM, gree...@candelatech.com wrote:
> From: Ben Greear
>
> Do not ignore 0 txpower setting unless the vif is of type p2p.

I think my patch has problems: secondary stations also have uninitialized txpower when they are first built and start scanning. So I'm going to try setting txpower to -1 in mac80211 and use that to mean 'unset'.

Thanks,
Ben

> This should fix regression in:
>
> commit 88407beb1b1462f706a1950a355fd086e1c450b6
> Author: Ryan Hsu
> Date: Tue Dec 13 14:55:19 2016 -0800
>
>     ath10k: fix incorrect txpower set by P2P_DEVICE interface
>
> Tested (without p2p in use) on 9984 with ath10k-ct firmware, but I don't think this is firmware specific.
>
> Signed-off-by: Ben Greear
> ---
>  drivers/net/wireless/ath/ath10k/mac.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 289d03da14b2..1c5e1b5570f8 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -5902,11 +5902,18 @@ static int ath10k_mac_txpower_recalc(struct ath10k *ar)
>  {
>  	struct ath10k_vif *arvif;
>  	int ret, txpower = -1;
> +	int p2p_st;
> +
> +	p2p_st = ath10k_wmi_get_vdev_subtype(ar, WMI_VDEV_SUBTYPE_P2P_DEVICE);
>  
>  	lockdep_assert_held(&ar->conf_mutex);
>  
>  	list_for_each_entry(arvif, &ar->arvifs, list) {
> -		if (arvif->txpower <= 0)
> +		/* p2p may not initialize txpower, and we should ignore it
> +		 * in that case.
> +		 */
> +		if ((arvif->txpower < 0) ||
> +		    ((arvif->txpower == 0) && (arvif->vdev_subtype == p2p_st)))
>  			continue;
>  
>  		if (txpower == -1)

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
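The "-1 means unset" idea from the reply can be sketched in plain C, outside the driver. The struct and function names below are hypothetical stand-ins, and the sketch assumes every interface that never had its txpower configured carries the sentinel, so a real 0 dBm setting is no longer skipped.

```c
#define DEMO_TXPOWER_UNSET (-1)

/* Hypothetical stand-in for a per-interface record; not struct ath10k_vif. */
struct demo_vif {
	int txpower; /* dBm, or DEMO_TXPOWER_UNSET if never configured */
};

/*
 * Return the lowest configured txpower across all interfaces,
 * or DEMO_TXPOWER_UNSET if none of them has one.
 */
static int demo_txpower_recalc(const struct demo_vif *vifs, int n)
{
	int txpower = DEMO_TXPOWER_UNSET;
	int i;

	for (i = 0; i < n; i++) {
		if (vifs[i].txpower == DEMO_TXPOWER_UNSET)
			continue; /* ignore interfaces without a setting */

		if (txpower == DEMO_TXPOWER_UNSET || vifs[i].txpower < txpower)
			txpower = vifs[i].txpower;
	}

	return txpower;
}
```

The point of the sentinel is that the p2p special case in the patch is no longer needed: an uninitialized interface is skipped whatever its type.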
Re: [PATCH] ath10k: set WMI_PEER_AUTHORIZE after a firmware crash
On 12/1/19 8:45 PM, Justin Capella wrote:
> Are there security concerns here? Was the peer known to be authorized beforehand? Would it be better to just trash the peer in the event of a fw crash?

I think you should completely re-associate the peer(s) when firmware crashes. The driver does not cache all possible changes, so it cannot exactly rebuild the config to the previous state.

Thanks,
Ben

> On Thu, Nov 28, 2019 at 11:46 PM Kalle Valo wrote:
>> Wen Gong wrote:
>>> After the firmware crashes, ath10k recovers via ieee80211_reconfig(), which eventually leads to firmware configuration, including the encryption keys. However, there is no new auth/assoc and 4-way handshake, and the firmware sets the authorize flag only after a 4-way handshake, so the authorize flag is not set in firmware after recovery. This leads to a failure of data transmission after recovery is done when using encrypted connections like WPA-PSK. Setting the authorize flag after installing keys to firmware fixes the issue.
>>>
>>> This was noticed by testing firmware crashing using the simulate_fw_crash debugfs file.
>>>
>>> Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-7-QCARMSWP-1.
>>>
>>> Signed-off-by: Wen Gong
>>> Signed-off-by: Kalle Valo
>>
>> Patch applied to ath-next branch of ath.git, thanks.
>>
>> 382e51c139ef ath10k: set WMI_PEER_AUTHORIZE after a firmware crash
>>
>> --
>> https://patchwork.kernel.org/patch/11263357/
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: [PATCH net-next] ath10k: fix RX of frames with broken FCS in monitor mode
On 11/07/2019 06:03 AM, Linus Lüssing wrote:
> On Tue, Nov 05, 2019 at 09:19:20AM -0800, Ben Greear wrote:
>> Thanks for adding the counter. Since it is u32, I doubt you need the spin lock below?
>
> Ok, I can remove the spin-lock.
>
> Just for clarification though, if I recall correctly then an increment operator is not guaranteed to work atomically. But you think it's unlikely to race with a concurrent ++ and therefore it's fine for just a debug counter? (and if it were racing, it'd just be a missed +1)

I think it is fine to be off-by-one, and u32 is atomic so you would never read a really weird number, like you can if u64 is non-atomically being incremented.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
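The distinction being discussed is between a lost update (a plain ++ is a read-modify-write, so two racing increments can collapse into one) and a torn read (seeing half of an old value and half of a new one, which an aligned 32-bit access generally does not suffer from). If even the occasional missed increment were a concern, a lock-free counter is one option. The following is a user-space sketch with C11 atomics and hypothetical names, not the driver's code; inside the kernel the analogous tool would be atomic_t with atomic_inc().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a stats struct; not the ath10k layout. */
struct demo_rx_stats {
	_Atomic uint32_t rx_crc_err_drop;
};

/* Increment without a lock; concurrent callers never lose an update. */
static void demo_count_drop(struct demo_rx_stats *s)
{
	atomic_fetch_add_explicit(&s->rx_crc_err_drop, 1,
				  memory_order_relaxed);
}

/* Read the current value; relaxed ordering is enough for a debug counter. */
static uint32_t demo_read_drops(struct demo_rx_stats *s)
{
	return atomic_load_explicit(&s->rx_crc_err_drop,
				    memory_order_relaxed);
}
```

Relaxed ordering is chosen here because the counter does not guard any other data; it only needs each increment to be indivisible.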
Re: [PATCH net-next] ath10k: fix RX of frames with broken FCS in monitor mode
On 11/5/19 8:49 AM, Linus Lüssing wrote:
> From: Linus Lüssing
>
> So far, frames were forwarded regardless of the FCS correctness, leading to userspace applications listening on the monitor mode interface receiving potentially broken frames, even with the "fcsfail" flag unset.
>
> By default, with the "fcsfail" flag of a monitor mode interface unset, frames with FCS errors should be dropped. With this patch, the fcsfail flag is taken into account correctly.
>
> Cc: Simon Wunderlich
> Signed-off-by: Linus Lüssing
> ---
> This was tested on an Open Mesh A41 device, featuring a QCA4019, with this firmware:
>
> https://www.candelatech.com/downloads/ath10k-4019-10-4b/firmware-5-ct-full-community-12.bin-lede.011
>
> But from looking at the code it seems that the vanilla ath10k has the same issue, therefore submitting it here.
>
> Changelog RFC->v1:
>
> * removed "ar->monitor" check
> * added a debug counter

Thanks for adding the counter. Since it is u32, I doubt you need the spin lock below?

--Ben

> +	if (!(ar->filter_flags & FIF_FCSFAIL) &&
> +	    status->flag & RX_FLAG_FAILED_FCS_CRC) {
> +		spin_lock_bh(&ar->data_lock);
> +		ar->stats.rx_crc_err_drop++;
> +		spin_unlock_bh(&ar->data_lock);
> +
> +		dev_kfree_skb_any(skb);
> +		return;
> +	}
> +
>  	ath10k_dbg(ar, ATH10K_DBG_DATA,
>  		   "rx skb %pK len %u peer %pM %s %s sn %u %s%s%s%s%s%s %srate_idx %u vht_nss %u freq %u band %u flag 0x%x fcs-err %i mic-err %i amsdu-more %i\n",
>  		   skb,

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: [RFC] ath10k: interface combination with monitor
On 11/1/19 10:03 AM, Tom Psyborg wrote:
> Hi
>
> Is there a way to run a monitor mode interface independently of the presence or state of STA/AP interfaces? I am using airodump-ng/airmon-ng and I've noticed that while the mon interface is brought up, airodump-ng is unable to find any beacons unless the sta interface is brought down. That is with QCA9880 devices, while with QCA9377 airodump-ng only finds beacons if the sta interface is associated to an AP.
>
> Does this need a firmware change to work, or are driver changes sufficient?

I would expect it to work. Have you tried -ct firmware on 9880 in this manner?

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: [RFC PATCH] ath10k: fix RX of frames with broken FCS in monitor mode
On 11/1/19 4:11 AM, Linus Lüssing wrote: From: Linus Lüssing So far, frames were forwarded regardless of the FCS correctness leading to userspace applications listening on the monitor mode interface to receive potentially broken frames, even with the "fcsfail" flag unset. By default, with the "fcsfail" flag of a monitor mode interface unset, frames with FCS errors should be dropped. With this patch, the fcsfail flag is taken into account correctly. Signed-off-by: Linus Lüssing --- This was tested on an Open Mesh A41 device, featuring a QCA4019. And with this firmware: https://www.candelatech.com/downloads/ath10k-4019-10-4b/firmware-5-ct-full-community-12.bin-lede.011 But from looking at the code it seems that the vanilla ath10k has the same issue, therefore submitting it here. I'm also not that familiar with the ath10k code yet. So not 100% sure if it's the right place for this check. Therefore sending as RFC. --- drivers/net/wireless/ath/ath10k/htt_rx.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c index 53f1095de8ff..ce0a16ebb8bb 100644 --- a/drivers/net/wireless/ath/ath10k/htt_rx.c +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c @@ -1285,6 +1285,12 @@ static void ath10k_process_rx(struct ath10k *ar, struct sk_buff *skb) status = IEEE80211_SKB_RXCB(skb); + if (ar->monitor && !(ar->filter_flags & FIF_FCSFAIL) && + status->flag & RX_FLAG_FAILED_FCS_CRC) { + dev_kfree_skb_any(skb); + return; + } Maybe worth adding a counter like 'rx_drop_crc' to the ath10k_debug struct and increment it here and also show in debugfs and/or ethtool stats? And, maybe no check for ar->monitor, in case somehow the frame is still received with bad CRC even without monitor mode? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
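The drop decision in the patch (with the ar->monitor check removed, as suggested above) reduces to a two-flag test. The sketch below spells it out with hypothetical bit values; the real FIF_FCSFAIL and RX_FLAG_FAILED_FCS_CRC constants come from mac80211 and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define DEMO_FIF_FCSFAIL            (1u << 0)
#define DEMO_RX_FLAG_FAILED_FCS_CRC (1u << 1)

/*
 * Drop a frame only when its FCS check failed and the interface did not
 * ask to receive such frames.
 */
static bool demo_should_drop(uint32_t filter_flags, uint32_t rx_flag)
{
	bool want_bad_fcs = (filter_flags & DEMO_FIF_FCSFAIL) != 0;
	bool fcs_failed = (rx_flag & DEMO_RX_FLAG_FAILED_FCS_CRC) != 0;

	return fcs_failed && !want_bad_fcs;
}
```

The unparenthesized `status->flag & RX_FLAG_FAILED_FCS_CRC` in the patch evaluates the same way, since in C the bitwise & binds tighter than &&.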
Re: WARNING at net/mac80211/sta_info.c:1057 (__sta_info_destroy_part2())
On 10/20/2019 08:12 AM, Tomislav Požega wrote:
>> -11 is -EAGAIN which would mean that the HTC credits have run out for some reason for the WMI command:
>>
>>	if (ep->tx_credits < credits) {
>>		ath10k_dbg(ar, ATH10K_DBG_HTC,
>>			   "htc insufficient credits ep %d required %d available %d\n",
>>			   eid, credits, ep->tx_credits);
>>		spin_unlock_bh(&htc->tx_lock);
>>		ret = -EAGAIN;
>>		goto err_pull;
>>	}
>>
>> Credits can run out, for example, if there's a lot of WMI command/event activity and they are not returned during the 3s wait, the firmware crashed, or there are problems with the PCI bus.
>
> Hi
>
> Can this occur if the target memory is not properly allocated?

I have only seen this on wave-1 cards, and as best I can understand it is usually paired with situations where the wave-1 stops doing WMI-related interrupts properly. If I force the firmware to poll instead of waiting for irqs, then WMI communication will work for a while... I have not implemented that on the driver side though, so I still see these WMI timeout issues.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
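The credit check quoted above is a form of credit-based flow control: the host may only send while it holds credits, and the target hands credits back as it consumes messages. A stripped-down sketch follows, with hypothetical names and no locking; it is not the ath10k HTC code.

```c
#include <errno.h>

/* Hypothetical endpoint state; the real one is protected by a lock. */
struct demo_ep {
	int tx_credits;
};

/* Try to send a message that costs 'credits'; fail if too few are held. */
static int demo_ep_send(struct demo_ep *ep, int credits)
{
	if (ep->tx_credits < credits)
		return -EAGAIN; /* caller has to wait for credits to return */

	ep->tx_credits -= credits;
	return 0;
}

/* Called when the target reports that it has freed 'credits'. */
static void demo_ep_credit_return(struct demo_ep *ep, int credits)
{
	ep->tx_credits += credits;
}
```

If the target stops returning credits, for instance because the interrupts that carry the reports never arrive, every later send fails with -EAGAIN, which matches the symptom described in the thread.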
Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est
On 10/18/2019 05:35 AM, Johannes Berg wrote: On Fri, 2019-10-18 at 12:15 +0200, Toke Høiland-Jørgensen wrote: Kan Yan writes: The "tx_time_est" field, shared by control and status, is not able to survive until the skb returns to the mac80211 layer in some architectures. The same space is defined as driver_data and some wireless drivers use it for other purposes, as the cb in the sk_buff is free to be used by any layer. In the case of ath10k, the tx_time_est get clobbered by struct ath10k_skb_cb { dma_addr_t paddr; u8 flags; u8 eid; u16 msdu_id; u16 airtime_est; struct ieee80211_vif *vif; struct ieee80211_txq *txq; } __packed; Ah, bugger, of course the driver that actually needs this is using the full driver_data space :P Looks like you could shrink *this* fairly easily though. E.g. most likely vif == txq->vif unless txq==NULL, so it's down to 22 bytes plus a bit/flag for knowing whether the pointer is a vif directly (if no TXQ) or a TXQ. And of course you get two bits in every pointer (0x3) and likely the dma addr too. Plenty of space! Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
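The remark about getting "two bits in every pointer (0x3)" refers to pointer tagging: if the pointed-to objects are at least 4-byte aligned, the two lowest address bits are always zero and can carry a flag, such as whether the stored pointer is a vif or a txq. Below is a minimal user-space sketch with hypothetical names; it assumes that alignment and is not how ath10k_skb_cb is actually laid out.

```c
#include <stdint.h>

#define DEMO_PTR_TAG_MASK ((uintptr_t)0x3)
#define DEMO_PTR_IS_TXQ   ((uintptr_t)0x1) /* hypothetical flag meaning */

/* Store a small tag in the low bits of a 4-byte-aligned pointer. */
static void *demo_ptr_pack(void *p, uintptr_t tag)
{
	return (void *)((uintptr_t)p | (tag & DEMO_PTR_TAG_MASK));
}

/* Recover the original pointer by clearing the tag bits. */
static void *demo_ptr_unpack(void *tagged)
{
	return (void *)((uintptr_t)tagged & ~DEMO_PTR_TAG_MASK);
}

/* Extract the tag without touching the pointer. */
static uintptr_t demo_ptr_tag(void *tagged)
{
	return (uintptr_t)tagged & DEMO_PTR_TAG_MASK;
}
```

A tagged pointer must always go through demo_ptr_unpack() before being dereferenced, which is the main cost of the trick.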
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On 09/18/2019 01:46 AM, Sven Eckelmann wrote: On Tuesday, 17 September 2019 19:27:50 CEST Sven Eckelmann wrote: [...] So whatever the firmware does when it gets a WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR - it is not a CLEAR after read. And they also don't simply wrap around but there all values have to get some kind of "fix" like the active time one shown in ath10k_hw_fill_survey_time. Just that the actual "fixes" for them are unknown. To me it looks like firmware ATH10K_HW_CC_WRAP_SHIFTED_ALL have busy and rx interlinked with the overflow of total. But the tx and rx_bss are actually cleared. Other than that, the counters are wrapping every ~14-30 seconds. So we also need also some worker for ath10k which every couple of seconds requests new values for all the channel from the firmware. Which already sounds problematic because I get "ath10k_pci :00:00.0: bss channelsurvey timed out" all the time when requesting surveys manually. I've just tested it on 10.4 (wave-2) cards and it seems like it is cleared as expected on them. So the change I posted earlier (with a minor fix for ath10k_hw_fill_survey_time) returns now useful (accumulated) values. This can be seen in https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1=ac86749f4d60=5=1568782046974=1568807068706 (after the reboot at 10:15 UTC+2) So as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for the wave-1 cards) is still broken and we need a QCA firmware engineer to fix it. Or to work around it by polling every couple of seconds and manually do the cleanup of the values from the firmware. Have you tried probing very fast, like every 100ms, to see if returned values look sane? I seem to recall that there was some firmware issue with this, like it only updates internal counters every second or so. Polling slow would have the same off-by-a-second's-worth-of-data, but you would not easily notice it at slower polling intervals. 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
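The polling workaround mentioned in this thread amounts to sampling a wrapping counter often enough that it wraps at most once between samples, and accumulating the deltas in a wider variable. The sketch below uses hypothetical names and assumes a plain modulo-2^32 wrap; per the thread, the 10.2 firmware counters do not wrap that cleanly, so this shows the general technique only.

```c
#include <stdint.h>

/* Hypothetical accumulator for one firmware counter. */
struct demo_survey_acc {
	uint32_t last;  /* raw value seen at the previous poll */
	uint64_t total; /* accumulated value, does not wrap in practice */
};

/*
 * Fold a new raw sample into the running total. Unsigned subtraction
 * yields the right delta across a single wrap of the 32-bit counter.
 */
static void demo_survey_acc_update(struct demo_survey_acc *acc, uint32_t now)
{
	acc->total += (uint32_t)(now - acc->last);
	acc->last = now;
}
```

The polling interval has to stay below the shortest wrap period, roughly 14 seconds according to the figures quoted above, or a whole wrap goes unnoticed.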
Re: [PATCH] cfg80211: Add cumulative channel survey dump support.
On 9/17/19 10:27 AM, Sven Eckelmann wrote: On Thursday, 31 May 2018 11:06:59 CEST vnara...@codeaurora.org wrote: I will sent next version of patch with updated commit log. Can you please point me to the second version? Btw. I've just checked the minimal changes in ath10k to get this working. It seems we need SURVEY_INFO_NON_ACC_DATA in ath10k's ath10k_get_survey + memset of ar->survey[idx]. But right now the total time looks (especially) wrong to me. At least it is rather unlikely that I can have around 30 second active time delta in roughly 1 real world second. Maybe a bug with the READ_CLEAR handling in firmware 10.2.4-1.0-00043 or maybe all firmware version? More logs about that at the end. @Ben: Was this also what you've experience in the past with the 10.2 firmware bss_chan_info counter bugs or am I just misusing the functionality of the firmware? Last I recall, the upstream code had several bugs. Maybe some QCA firmware person can let you know if they fixed the upstream firmware. If you want to test against ath10k-ct driver/firmware, and if you still see bogus values, then I can debug and fix it. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] ath10k: restore QCA9880-AR1A (v1) detection
On 09/10/2019 06:51 AM, Sebastian Gottschall wrote: the tplink archer c7 v1 indeed has this hw 1 version. but thats the only device i know comming with this chipset version but the v1 has also a minipcie slot and is not soldered like all other revisions. so i just replaced the card on my test device. in addition we may ask ben grear if he is able to provide a v1 firmware from his ct tree since the qca sourcecodes do contain support for the v1 revision. but dont expect too much. there was a reason why v1 was never really on the market Hello, I don't think I can even build a v1 firmware if I wanted to, and I'd much rather work on newer chips. That v1 was an unstable wreck from the beginning, at least with open-source driver. Thanks, Ben Am 10.09.2019 um 14:59 schrieb Tom Psyborg: On 10/09/2019, Kalle Valo wrote: (dropping stable list) Tom Psyborg writes: According to this very old post http://lists.infradead.org/pipermail/ath10k/2013-July/21.html seems like you've been misinformed on amount of these cards that were put out in the market. At least digipart only have >4 units in stocks https://www.digipart.com/part/QCA9880-AR1A and other retailers probably few thousands more. With that large amount of cards I think it is justified to request firmware support for the chip. And probably a lot easier to make few firmware modifications than go hacking a bunch of API calls so it works with v2 firmware. I'm very surprised that QCA9880 hw1.0 boards are still available, after six years. Did you confirm that it really is hw1.0 and not just some mixup with hardware ids or something like that? Print on the chip clearly says QCA9880-AR1A. ID same as for v2 - 003C. old hw1.0 firmware to see if it works. I don't know which fw blob version that is. I could not find it online. All files are v2 related. But if it's really is hw1.0 I doubt there will be any support for that. I recommend to avoid hw1.0 altogether. 
-- Kalle Valo That would be too bad, even worse when you find out that the qca-wifi-10.2.4.58.1 driver fails to load firmware too. The only one that works is the qca-wifi that comes with the tp-link firmware, some very early version 10.0.108 or something like that, which has no available sources. -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: Regression with commit "ath10k: fill the channel survey results for WCN3990 correctly"
Looks like it should work. Why is this rotting in patchwork? Thanks Ben On 08/21/2019 02:12 PM, Rakesh Pillai wrote: Hi Ben, Can you please check https://patchwork.kernel.org/patch/10844513/ ? This change fixes the below mentioned regression. A different structure is made for tlv specific event handling. Thanks, Rakesh Pillai. On 2019-08-21 14:06, Ben Greear wrote: On 08/21/2019 01:56 PM, Ben Greear wrote: Hello, I just noticed in 5.2.7+ kernel than this commit below appears to break WMI message for my 10.1 firmware, and based on code inspection, 10.2 will be broken as well. 10.1 struct ends with cycle_count, and 10.2 ends with one 32-bit number after that, but which is not chan_tx_pwr_range. I guess you need to create your own wmi msg for the WCN3990. The change to 10.4 chan_info event is also wrong for my relatively new version of 10.4 code, so likely breaks firmware in use. last member in that struct in my 10.4 fw src is 'A_UINT32 rx_11b_mode_data_duration;' Sorry, I mis-read this 10.4 part of the patch, it was not changing the wmi event itself, so probably that part is fine. Thanks, Ben commit 13104929d2ec32aec0552007d55b9e15bc07176b Author: Rakesh Pillai Date: Wed Oct 17 16:50:03 2018 +0530 ath10k: fill the channel survey results for WCN3990 correctly diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h index 4971d61..58e33ab 100644 --- a/drivers/net/wireless/ath/ath10k/wmi.h +++ b/drivers/net/wireless/ath/ath10k/wmi.h @@ -6442,6 +6442,14 @@ struct wmi_chan_info_event { __le32 noise_floor; __le32 rx_clear_count; __le32 cycle_count; + __le32 chan_tx_pwr_range; + __le32 chan_tx_pwr_tp; + __le32 rx_frame_count; + __le32 my_bss_rx_cycle_count; + __le32 rx_11b_mode_data_duration; + __le32 tx_frame_cnt; + __le32 mac_clk_mhz; + } __packed; Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: Regression with commit "ath10k: fill the channel survey results for WCN3990 correctly"
On 08/21/2019 01:56 PM, Ben Greear wrote: Hello, I just noticed in 5.2.7+ kernel than this commit below appears to break WMI message for my 10.1 firmware, and based on code inspection, 10.2 will be broken as well. 10.1 struct ends with cycle_count, and 10.2 ends with one 32-bit number after that, but which is not chan_tx_pwr_range. I guess you need to create your own wmi msg for the WCN3990. The change to 10.4 chan_info event is also wrong for my relatively new version of 10.4 code, so likely breaks firmware in use. last member in that struct in my 10.4 fw src is 'A_UINT32 rx_11b_mode_data_duration;' Sorry, I mis-read this 10.4 part of the patch, it was not changing the wmi event itself, so probably that part is fine. Thanks, Ben commit 13104929d2ec32aec0552007d55b9e15bc07176b Author: Rakesh Pillai Date: Wed Oct 17 16:50:03 2018 +0530 ath10k: fill the channel survey results for WCN3990 correctly diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h index 4971d61..58e33ab 100644 --- a/drivers/net/wireless/ath/ath10k/wmi.h +++ b/drivers/net/wireless/ath/ath10k/wmi.h @@ -6442,6 +6442,14 @@ struct wmi_chan_info_event { __le32 noise_floor; __le32 rx_clear_count; __le32 cycle_count; + __le32 chan_tx_pwr_range; + __le32 chan_tx_pwr_tp; + __le32 rx_frame_count; + __le32 my_bss_rx_cycle_count; + __le32 rx_11b_mode_data_duration; + __le32 tx_frame_cnt; + __le32 mac_clk_mhz; + } __packed; Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Regression with commit "ath10k: fill the channel survey results for WCN3990 correctly"
Hello, I just noticed in 5.2.7+ kernel than this commit below appears to break WMI message for my 10.1 firmware, and based on code inspection, 10.2 will be broken as well. 10.1 struct ends with cycle_count, and 10.2 ends with one 32-bit number after that, but which is not chan_tx_pwr_range. I guess you need to create your own wmi msg for the WCN3990. The change to 10.4 chan_info event is also wrong for my relatively new version of 10.4 code, so likely breaks firmware in use. last member in that struct in my 10.4 fw src is 'A_UINT32 rx_11b_mode_data_duration;' commit 13104929d2ec32aec0552007d55b9e15bc07176b Author: Rakesh Pillai Date: Wed Oct 17 16:50:03 2018 +0530 ath10k: fill the channel survey results for WCN3990 correctly diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h index 4971d61..58e33ab 100644 --- a/drivers/net/wireless/ath/ath10k/wmi.h +++ b/drivers/net/wireless/ath/ath10k/wmi.h @@ -6442,6 +6442,14 @@ struct wmi_chan_info_event { __le32 noise_floor; __le32 rx_clear_count; __le32 cycle_count; + __le32 chan_tx_pwr_range; + __le32 chan_tx_pwr_tp; + __le32 rx_frame_count; + __le32 my_bss_rx_cycle_count; + __le32 rx_11b_mode_data_duration; + __le32 tx_frame_cnt; + __le32 mac_clk_mhz; + } __packed; Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
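One general way to keep a grown event struct from breaking older firmware is to read the trailing fields only when the received buffer is long enough to contain them. The sketch below uses hypothetical, trimmed-down layouts (field names borrowed from the quoted diff) and ignores byte order. Note that a length check alone cannot tell firmware branches apart when they append different fields at the same offset, which is why a separate message definition for WCN3990, as suggested above, is the more robust route.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts; not the driver's struct wmi_chan_info_event. */
struct demo_chan_info_base {
	uint32_t noise_floor;
	uint32_t rx_clear_count;
	uint32_t cycle_count;
};

struct demo_chan_info_ext {
	struct demo_chan_info_base base;
	uint32_t chan_tx_pwr_range;
	uint32_t chan_tx_pwr_tp;
};

/*
 * Copy an event into 'out'. Returns -1 if the buffer is too short for the
 * base fields, 0 if only the base fields are valid, and 1 if the extended
 * fields are present as well. Missing fields are left as zero.
 */
static int demo_parse_chan_info(const void *buf, size_t len,
				struct demo_chan_info_ext *out)
{
	memset(out, 0, sizeof(*out));

	if (len < sizeof(out->base))
		return -1;

	if (len < sizeof(*out)) {
		memcpy(&out->base, buf, sizeof(out->base));
		return 0;
	}

	memcpy(out, buf, sizeof(*out));
	return 1;
}
```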
Re: [PATCH] ath10k: add mic bytes for pmf management packet
On 6/17/19 12:37 AM, Wen Gong wrote: For PMF case, the action,deauth,disassoc management need to encrypt by hardware, it need to reserve 8 bytes for encryption, otherwise the packet will be sent out with error format, then PMF case will fail. After add the 8 bytes, it will pass the PMF case. Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-5-QCARMSWP-1. Signed-off-by: Wen Gong --- drivers/net/wireless/ath/ath10k/htt_tx.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c index d8e9cc0..7bef9d9 100644 --- a/drivers/net/wireless/ath/ath10k/htt_tx.c +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c @@ -1236,6 +1236,7 @@ static int ath10k_htt_tx_hl(struct ath10k_htt *htt, enum ath10k_hw_txrx_mode txm struct ath10k *ar = htt->ar; int res, data_len; struct htt_cmd_hdr *cmd_hdr; + struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)msdu->data; struct htt_data_tx_desc *tx_desc; struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(msdu); struct sk_buff *tmp_skb; @@ -1245,6 +1246,13 @@ static int ath10k_htt_tx_hl(struct ath10k_htt *htt, enum ath10k_hw_txrx_mode txm u8 flags0 = 0; u16 flags1 = 0; + if ((ieee80211_is_action(hdr->frame_control) || +ieee80211_is_deauth(hdr->frame_control) || +ieee80211_is_disassoc(hdr->frame_control)) && +ieee80211_has_protected(hdr->frame_control)) { + skb_put(msdu, IEEE80211_CCMP_MIC_LEN); + } I was looking at mac80211 code recently, and it seems some action frames are NOT supposed to be protected. I added my own helper method to my local ath10k. Maybe you want to use this? /* Copied from ieee80211_is_robust_mgmt_frame, but disable the check for has_protected * since we do tx hw crypt, and it won't actually be encrypted even when this flag is * set. 
*/ bool ieee80211_is_robust_mgmt_frame_tx(struct ieee80211_hdr *hdr) { if (ieee80211_is_disassoc(hdr->frame_control) || ieee80211_is_deauth(hdr->frame_control)) return true; if (ieee80211_is_action(hdr->frame_control)) { u8 *category; /* * Action frames, excluding Public Action frames, are Robust * Management Frames. However, if we are looking at a Protected * frame, skip the check since the data may be encrypted and * the frame has already been found to be a Robust Management * Frame (by the other end). */ /* if (ieee80211_has_protected(hdr->frame_control)) return true; */ category = ((u8 *) hdr) + 24; return *category != WLAN_CATEGORY_PUBLIC && *category != WLAN_CATEGORY_HT && *category != WLAN_CATEGORY_WNM_UNPROTECTED && *category != WLAN_CATEGORY_SELF_PROTECTED && *category != WLAN_CATEGORY_UNPROT_DMG && *category != WLAN_CATEGORY_VHT && *category != WLAN_CATEGORY_VENDOR_SPECIFIC; } return false; } Thanks, Ben + data_len = msdu->len; switch (txmode) { -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] ath10k: fix max antenna gain unit
On 6/11/19 5:19 AM, Sven Eckelmann wrote: From: Sven Eckelmann Most of the txpower for the ath10k firmware is stored as twicepower (0.5 dB steps). This isn't the case for max_antenna_gain - which is still expected by the firmware as dB. The firmware is converting it from dB to the internal (twicepower) representation when it calculates the limits of a channel. This can be seen in tpc_stats when configuring "12" as max_antenna_gain. Instead of the expected 12 (6 dB), the tpc_stats shows 24 (12 dB). Tested on QCA9888 and IPQ4019 with firmware 10.4-3.5.3-00057. I did a visual inspection of wave-1 firmware source and it appears this change would be correct for it as well. I would also suggest updating the comments in the wmi.h structure to document the units. Thanks, Ben Fixes: 02256930d9b8 ("ath10k: use proper tx power unit") Signed-off-by: Sven Eckelmann --- Cc: Michal Kazior drivers/net/wireless/ath/ath10k/mac.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 9c703d287333..35d026a2772a 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -1008,7 +1008,7 @@ static int ath10k_monitor_vdev_start(struct ath10k *ar, int vdev_id) arg.channel.min_power = 0; arg.channel.max_power = channel->max_power * 2; arg.channel.max_reg_power = channel->max_reg_power * 2; - arg.channel.max_antenna_gain = channel->max_antenna_gain * 2; + arg.channel.max_antenna_gain = channel->max_antenna_gain; reinit_completion(>vdev_setup_done); @@ -1450,7 +1450,7 @@ static int ath10k_vdev_start_restart(struct ath10k_vif *arvif, arg.channel.min_power = 0; arg.channel.max_power = chandef->chan->max_power * 2; arg.channel.max_reg_power = chandef->chan->max_reg_power * 2; - arg.channel.max_antenna_gain = chandef->chan->max_antenna_gain * 2; + arg.channel.max_antenna_gain = chandef->chan->max_antenna_gain; if (arvif->vdev_type == WMI_VDEV_TYPE_AP) { 
arg.ssid = arvif->u.ap.ssid; @@ -3105,7 +3105,7 @@ static int ath10k_update_channel_list(struct ath10k *ar) ch->min_power = 0; ch->max_power = channel->max_power * 2; ch->max_reg_power = channel->max_reg_power * 2; - ch->max_antenna_gain = channel->max_antenna_gain * 2; + ch->max_antenna_gain = channel->max_antenna_gain; ch->reg_class_id = 0; /* FIXME */ /* FIXME: why use only legacy modes, why not any -- Ben Greear Candela Technologies Inc http://www.candelatech.com
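The unit mismatch fixed by the patch is easier to see with the units written next to each field. The following sketch uses a hypothetical argument struct, not the driver's, and assumes the convention stated in the commit message: power limits in 0.5 dB steps ("twicepower") and antenna gain in whole dB.

```c
/* Hypothetical channel argument block, with units spelled out. */
struct demo_chan_arg {
	int max_power;        /* 0.5 dB steps (twicepower) */
	int max_reg_power;    /* 0.5 dB steps (twicepower) */
	int max_antenna_gain; /* whole dB; the firmware doubles it itself */
};

/* Convert a value in whole dB(m) to 0.5 dB steps. */
static int demo_db_to_twicepower(int db)
{
	return db * 2;
}

/* Fill the block from values that are all given in whole dB(m). */
static void demo_fill_chan_arg(struct demo_chan_arg *arg, int max_power_dbm,
			       int max_reg_power_dbm, int max_antenna_gain_db)
{
	arg->max_power = demo_db_to_twicepower(max_power_dbm);
	arg->max_reg_power = demo_db_to_twicepower(max_reg_power_dbm);
	/* No conversion here: doubling would be applied twice overall. */
	arg->max_antenna_gain = max_antenna_gain_db;
}
```

Doubling the gain on the host side as well is what produced the 24 (12 dB) reading in tpc_stats for a configured value of 12 in the commit message.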
Re: Problem with 9984 in routed mode with 512b frames.
On 5/20/19 12:25 PM, Adrian Chadd wrote:
> On Mon, 20 May 2019 at 09:59, Sebastian Gottschall wrote:
>> the curious thing is still that the fallback code applies only for 2.4 ghz so it would never have affected 802.11ac
>
> Hm, does RC fall back to 11na or 11a rates when doing 11ac? (in 5G mode.)
>
> It's good to know fixing that would fix it in 2.4GHz operation but yeah, I wonder about RC in 5G.

It appears the rate-ctrl tries to fall to CCK 2Mbps or 1Mbps and skips a/g rates. /n rates are a subset of VHT, so those are used as part of normal VHT rate-ctrl.

I have no explanation for why I saw the tx-hang in stock-ish firmware, which indeed should not have tried to use any a/g rates in 5Ghz. The high-level failure looked exactly like what I eventually debugged as falling back to /a rates in my firmware, for what that is worth.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc
http://www.candelatech.com
Re: Problem with 9984 in routed mode with 512b frames.
On 5/17/19 8:47 AM, Adrian Chadd wrote: On Fri, 17 May 2019 at 08:06, Sebastian Gottschall wrote: personally i think going back to basic rates like 2 mbit makes no sense anyway. its that dead slow that a connection must break and has to be broken if this doesnt work. still a shame that beacons are still transmitted in this way and multicast/broadcast packets as well which causes a hell of problems. but thats for backward compatibility of cause It depends on what kind of channel you are. Not everyone is deploying super dense enterprise APs. :-) The 11ac and 11ax chips that do constant frequency readjusting work better in things like moving drones, where you have constant doppler shift. I know some people doing drone work that just don't bother with MCS and aggregation because they need a super reliable channel and the conditions constantly shift. That said, they're very sad that they can't hack on the 11ac/11ax firmware to fix some err, less optimal decisions in their use case space like they can with ath5k/ath9k. ISTR back at QCA days there were some people on the systems team that could demonstrate CCK was more stable in some use cases and so didn't like the Linux rate control behaviour of not falling back to CCK in 2G 11n mode. There was .. pushback against the linux upstream rate control in this respect right until the linux folk totally deprecated the QCA rate control in ath9k. :) (And then bugs like what ben is seeing :) Ben - did disabling CCK/OFDM fallback rates help? Did you fix the bit that tries to send AMPDU frames in non-11n rates? Yes, disabling the fallback appears to have fixed my issue. I did not try to fix the fallback code because I think it will be quite complicated to do it properly (I suspect a different tid must be used for this to work). I'm not even entirely sure of exactly why the transmit logic fails in this case, and by the time rate-ctrl logic is queried, I think it is too late to easily change tids. 
And FYI, in my firmware/driver, you can now specify the exact preamble-type, mcs, bandwidth, txpower, retransmit count, etc on a per packet basis. I'm not sure of all the bugs and limitations in this code, but it at least mostly works as hoped for the ways we are using it (rx sensitivity test rigs, etc). Might be of interest if someone wants to do a somewhat limited user-space rate-ctrl for ath10k wave-2. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: Problem with 9984 in routed mode with 512b frames.
On 05/16/2019 09:21 PM, Sebastian Gottschall wrote: Am 16.05.2019 um 21:40 schrieb Ben Greear: On 5/15/19 6:00 AM, Ben Greear wrote: On 5/15/19 5:26 AM, Sebastian Gottschall wrote: Am 15.05.2019 um 14:20 schrieb Ben Greear: On 05/14/2019 09:26 PM, Sebastian Gottschall wrote: can you send me a detailed instruction for testing this on my devices? so which commands have been used for generating the traffic etc. (iperf3?) I am using our own traffic generator, but I imagine iperf3 should work fine too. I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet load on your platform in routed mode (ie, external iperf generator through your AP) and see if you see issues? thats the plan. can you do a test with iperf3 to see if its reproduceable. i mean i will test it on ipq based boards and x64. but to make sure that the scenario is identical which raised up your issue, it would be find if we have identical software for testing including the same options I think I found the issue. The rate-ctrl logic in the firmware allows a transition from HT/VHT 20 MCS0 down to OFDM rates. It seems the hardware does not like to see an AMPDU with an OFDM rate for 20Mhz and a VHT rate for 80Mhz (or maybe just the single OFDM rate is the fault). If you can edit firmware, then setting this to 0 probably fixes the issue. g_rc_cck_rate_allowed according to the code this variable has only effect on 2.4 ghz. the fallback to cck rates will only be done if phymode is 2.4 ghz Ok, maybe the symptom I saw with stock-ish firmware was due to some other cause. In my firmware, I had "fixed" that cck-fallback to use OFDM rates in case CCK was not available, so mine was definitely trying to use an OFDM rate. That said, very likely the same bug exists in upstream QCA firmware for 2.4Ghz radios where CCK is available, so still might be worth fixing or at least adding API to let the user disable the fallback in case strange problems are seen. 
I am guessing that if it really wants to send OFDM/CCK rates, then it will have to use a different TID that is not set up for AMPDUs, and the current code does not deal with that as far as I can tell. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: Problem with 9984 in routed mode with 512b frames.
On 5/16/19 1:16 PM, Adrian Chadd wrote: You can't do AMPDU with OFDM/CCK. If they're setting the AMPDU bit then that's wrong. it needs to be individual MPDU/PPDUs. There's a benefit for CCK. OFDM 6M is I think roughly the same as OFDM MCS0. But CCK is a lot more reliable. 5Ghz can (should) not do CCK anyway. Do you have any reference for why you think CCK will be better? The one I found shows otherwise: https://d2cpnw0u24fjm4.cloudfront.net/wp-content/uploads/LaminatedCard_RevolutionWiFiMCStoSNRSinglePage.png Thanks, Ben -adrian On Thu, 16 May 2019 at 13:10, Ben Greear wrote: On 5/16/19 12:55 PM, Adrian Chadd wrote: You can totally go down to OFDM yeah but you then need to send it at 20MHz and non-AMPDU. Is it maybe the retry code + rate control code is retagging an AMPDU at a lower rate and it's transitioning down to CCK/OFDM without breaking the AMPDU apart? It was sending a one-frame AMPDU, and one frame AMSDU for that matter. Maybe there is some bit in the tx descriptor that needed to be twiddled as well to make OFDM able to work, but I don't know what that would be. Is there any advantage of (any) OFDM over MCS0 HT 20Mhz as far as range or SNR goes? The chart I found made it look like there was not, and if not, then why bother at all with OFDM if peer advertises HT/VHT rates? Thanks, Ben -a On Thu, 16 May 2019 at 12:40, Ben Greear wrote: On 5/15/19 6:00 AM, Ben Greear wrote: On 5/15/19 5:26 AM, Sebastian Gottschall wrote: Am 15.05.2019 um 14:20 schrieb Ben Greear: On 05/14/2019 09:26 PM, Sebastian Gottschall wrote: can you send me a detailed instruction for testing this on my devices? so which commands have been used for generating the traffic etc. (iperf3?) I am using our own traffic generator, but I imagine iperf3 should work fine too. I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet load on your platform in routed mode (ie, external iperf generator through your AP) and see if you see issues? thats the plan. 
can you do a test with iperf3 to see if its reproduceable. i mean i will test it on ipq based boards and x64. but to make sure that the scenario is identical which raised up your issue, it would be find if we have identical software for testing including the same options I think I found the issue. The rate-ctrl logic in the firmware allows a transition from HT/VHT 20 MCS0 down to OFDM rates. It seems the hardware does not like to see an AMPDU with an OFDM rate for 20Mhz and a VHT rate for 80Mhz (or maybe just the single OFDM rate is the fault). If you can edit firmware, then setting this to 0 probably fixes the issue. g_rc_cck_rate_allowed I think to reproduce you'd need to send high speed traffic in a situation where the RF environment is going to make rate-ctrl fail quite a bit. (Slow speed should work too, but it would likely take a lot longer). And, it is always possible that whatever I saw when testing mostly-stock FW is different from what I eventually debugged to in my firmware. Still, from looking at MCS vs SNR charts, there seems to be no advantage to trying OFDM vs MCS0 for 20Mhz. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
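For reference, the two constraints discussed in this message (no AMPDU with OFDM/CCK rates, and no CCK on 5Ghz) can be written down as a small user-space check. This is only a sketch: the enum, the function name and its parameters are made up for illustration and do not correspond to anything in the QCA or ath10k-ct firmware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical preamble classes; not the firmware's real enum. */
enum preamble_sketch {
	PREAMBLE_CCK_SKETCH,	/* 802.11b rates, 2.4Ghz only */
	PREAMBLE_OFDM_SKETCH,	/* legacy 802.11a/g rates */
	PREAMBLE_HT_SKETCH,
	PREAMBLE_VHT_SKETCH
};

/*
 * Returns true if rate control may pick 'candidate' for this frame.
 * Rules taken from the discussion above:
 *  - HT/VHT rates are always acceptable here,
 *  - legacy (CCK/OFDM) rates must not be used for a frame tagged as AMPDU,
 *  - CCK is not valid outside the 2.4Ghz band.
 */
bool fallback_rate_ok_sketch(enum preamble_sketch candidate,
			     bool frame_is_ampdu, bool band_is_2ghz)
{
	bool legacy;

	assert(candidate >= PREAMBLE_CCK_SKETCH &&
	       candidate <= PREAMBLE_VHT_SKETCH);

	legacy = candidate == PREAMBLE_CCK_SKETCH ||
		 candidate == PREAMBLE_OFDM_SKETCH;
	if (!legacy)
		return true;
	if (frame_is_ampdu)
		return false;
	if (candidate == PREAMBLE_CCK_SKETCH && !band_is_2ghz)
		return false;
	return true;
}
```

With a check like this in front of the fallback step, a frame on an AMPDU-enabled TID would stay at MCS0 instead of dropping to an OFDM rate, which is roughly the effect of disabling the fallback.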
Re: Problem with 9984 in routed mode with 512b frames.
On 5/16/19 12:55 PM, Adrian Chadd wrote: You can totally go down to OFDM yeah but you then need to send it at 20MHz and non-AMPDU. Is it maybe the retry code + rate control code is retagging an AMPDU at a lower rate and it's transitioning down to CCK/OFDM without breaking the AMPDU apart? It was sending a one-frame AMPDU, and one frame AMSDU for that matter. Maybe there is some bit in the tx descriptor that needed to be twiddled as well to make OFDM able to work, but I don't know what that would be. Is there any advantage of (any) OFDM over MCS0 HT 20Mhz as far as range or SNR goes? The chart I found made it look like there was not, and if not, then why bother at all with OFDM if peer advertises HT/VHT rates? Thanks, Ben -a On Thu, 16 May 2019 at 12:40, Ben Greear wrote: On 5/15/19 6:00 AM, Ben Greear wrote: On 5/15/19 5:26 AM, Sebastian Gottschall wrote: Am 15.05.2019 um 14:20 schrieb Ben Greear: On 05/14/2019 09:26 PM, Sebastian Gottschall wrote: can you send me a detailed instruction for testing this on my devices? so which commands have been used for generating the traffic etc. (iperf3?) I am using our own traffic generator, but I imagine iperf3 should work fine too. I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet load on your platform in routed mode (ie, external iperf generator through your AP) and see if you see issues? thats the plan. can you do a test with iperf3 to see if its reproduceable. i mean i will test it on ipq based boards and x64. but to make sure that the scenario is identical which raised up your issue, it would be find if we have identical software for testing including the same options I think I found the issue. The rate-ctrl logic in the firmware allows a transition from HT/VHT 20 MCS0 down to OFDM rates. It seems the hardware does not like to see an AMPDU with an OFDM rate for 20Mhz and a VHT rate for 80Mhz (or maybe just the single OFDM rate is the fault). 
If you can edit firmware, then setting this to 0 probably fixes the issue. g_rc_cck_rate_allowed I think to reproduce you'd need to send high speed traffic in a situation where the RF environment is going to make rate-ctrl fail quite a bit. (Slow speed should work too, but it would likely take a lot longer). And, it is always possible that whatever I saw when testing mostly-stock FW is different from what I eventually debugged to in my firmware. Still, from looking at MCS vs SNR charts, there seems to be no advantage to trying OFDM vs MCS0 for 20Mhz. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: Problem with 9984 in routed mode with 512b frames.
On 5/15/19 6:00 AM, Ben Greear wrote: On 5/15/19 5:26 AM, Sebastian Gottschall wrote: Am 15.05.2019 um 14:20 schrieb Ben Greear: On 05/14/2019 09:26 PM, Sebastian Gottschall wrote: can you send me a detailed instruction for testing this on my devices? so which commands have been used for generating the traffic etc. (iperf3?) I am using our own traffic generator, but I imagine iperf3 should work fine too. I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet load on your platform in routed mode (ie, external iperf generator through your AP) and see if you see issues? thats the plan. can you do a test with iperf3 to see if its reproduceable. i mean i will test it on ipq based boards and x64. but to make sure that the scenario is identical which raised up your issue, it would be find if we have identical software for testing including the same options I think I found the issue. The rate-ctrl logic in the firmware allows a transition from HT/VHT 20 MCS0 down to OFDM rates. It seems the hardware does not like to see an AMPDU with an OFDM rate for 20Mhz and a VHT rate for 80Mhz (or maybe just the single OFDM rate is the fault). If you can edit firmware, then setting this to 0 probably fixes the issue. g_rc_cck_rate_allowed I think to reproduce you'd need to send high speed traffic in a situation where the RF environment is going to make rate-ctrl fail quite a bit. (Slow speed should work too, but it would likely take a lot longer). And, it is always possible that whatever I saw when testing mostly-stock FW is different from what I eventually debugged to in my firmware. Still, from looking at MCS vs SNR charts, there seems to be no advantage to trying OFDM vs MCS0 for 20Mhz. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: Problem with 9984 in routed mode with 512b frames.
On 5/15/19 5:26 AM, Sebastian Gottschall wrote: Am 15.05.2019 um 14:20 schrieb Ben Greear: On 05/14/2019 09:26 PM, Sebastian Gottschall wrote: can you send me a detailed instruction for testing this on my devices? so which commands have been used for generating the traffic etc. (iperf3?) I am using our own traffic generator, but I imagine iperf3 should work fine too. I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet load on your platform in routed mode (ie, external iperf generator through your AP) and see if you see issues? thats the plan. can you do a test with iperf3 to see if its reproduceable. i mean i will test it on ipq based boards and x64. but to make sure that the scenario is identical which raised up your issue, it would be find if we have identical software for testing including the same options One of our other engineers will try to reproduce it on a different system today. And in case you are sniffing during your own testing, I'd be curious if you see any AMSDU frames? I only see AMPDUS full of single-packet frames. I would expect several of the 512b frames to be put into AMSDU sub-frames. I plan to look into that after I figure out the tx stall issue. Thanks, Ben From debugging yesterday, I see a lot of tx-hang notifications in the firmware, and I also believe I saw it trying to transmit with a 0 rate-indx, which is 11Mbps CCK, which is invalid for 5Ghz. I'll be debugging that more today. I don't know if stock firmware fails for the same reasons, but the symptom looked the same. could be a buffer overflow/locking due a udp flooding. so a missing check to drop packets which are out of limit or a too restrictive buffer management. like static frame buffers of max mtu size, but its just used partially by frame due the small size of the udp packets. you may know it better since you have much better knowledge about the firmware internals. 
Thanks, Ben Sebastian Am 15.05.2019 um 03:52 schrieb Ben Greear: Hello, I found a strange issue and curious if someone has seen similar. I made an AP where the AP interface acts as a routed interface. I generate traffic through another interface in the router. When sending 300Mbps of 512 byte UDP payloads, in the downstream direction, and with the station being a 1x1 /AC device, then the AP NIC appears to mostly lock up within about 1 minute. I still see beacons, but no data frames. In some cases, I reproduced with very slow speed traffic as well. I tried using a mostly un-modified firmware (ie, similar to upstream QCA), as well as my hacked upon firmware, and all act similarly. I'm using the 4.20 kernel, but at least for now, it does not appear to be a kernel issue. If I use larger MTU sized frames, or have a 2x2 station instead of 1x1 then it is much harder to reproduce (and maybe cannot be reproduced). Also, when generating traffic directly on the AP device instead of using the routed interface as a traffic source, it is harder to reproduce. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: Problem with 9984 in routed mode with 512b frames.
On 05/14/2019 09:26 PM, Sebastian Gottschall wrote: can you send me a detailed instruction for testing this on my devices? so which commands have been used for generating the traffic etc. (iperf3?) I am using our own traffic generator, but I imagine iperf3 should work fine too. I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet load on your platform in routed mode (ie, external iperf generator through your AP) and see if you see issues? From debugging yesterday, I see a lot of tx-hang notifications in the firmware, and I also believe I saw it trying to transmit with a 0 rate-indx, which is 11Mbps CCK, which is invalid for 5Ghz. I'll be debugging that more today. I don't know if stock firmware fails for the same reasons, but the symptom looked the same. Thanks, Ben Sebastian Am 15.05.2019 um 03:52 schrieb Ben Greear: Hello, I found a strange issue and curious if someone has seen similar. I made an AP where the AP interface acts as a routed interface. I generate traffic through another interface in the router. When sending 300Mbps of 512 byte UDP payloads, in the downstream direction, and with the station being a 1x1 /AC device, then the AP NIC appears to mostly lock up within about 1 minute. I still see beacons, but no data frames. In some cases, I reproduced with very slow speed traffic as well. I tried using a mostly un-modified firmware (ie, similar to upstream QCA), as well as my hacked upon firmware, and all act similarly. I'm using the 4.20 kernel, but at least for now, it does not appear to be a kernel issue. If I use larger MTU sized frames, or have a 2x2 station instead of 1x1 then it is much harder to reproduce (and maybe cannot be reproduced). Also, when generating traffic directly on the AP device instead of using the routed interface as a traffic source, it is harder to reproduce. 
Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Problem with 9984 in routed mode with 512b frames.
Hello, I found a strange issue and curious if someone has seen similar. I made an AP where the AP interface acts as a routed interface. I generate traffic through another interface in the router. When sending 300Mbps of 512 byte UDP payloads, in the downstream direction, and with the station being a 1x1 /AC device, then the AP NIC appears to mostly lock up within about 1 minute. I still see beacons, but no data frames. In some cases, I reproduced with very slow speed traffic as well. I tried using a mostly un-modified firmware (ie, similar to upstream QCA), as well as my hacked upon firmware, and all act similarly. I'm using the 4.20 kernel, but at least for now, it does not appear to be a kernel issue. If I use larger MTU sized frames, or have a 2x2 station instead of 1x1 then it is much harder to reproduce (and maybe cannot be reproduced). Also, when generating traffic directly on the AP device instead of using the routed interface as a traffic source, it is harder to reproduce. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
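To put the load in perspective, the packet rate can be worked out from the numbers in the report. The sketch below assumes the 300Mbps figure refers to UDP payload only (the report does not say) and ignores all header overhead; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Packets per second needed to carry 'rate_bps' bits per second of UDP
 * payload when every packet holds 'payload_bytes' bytes of payload.
 * IP/UDP/802.11 headers are ignored; the result is rounded down.
 */
uint64_t udp_payload_pps_sketch(uint64_t rate_bps, uint32_t payload_bytes)
{
	assert(payload_bytes > 0);
	return rate_bps / ((uint64_t)payload_bytes * 8u);
}
```

Under that assumption, 512 byte payloads at 300Mbps come to about 73,000 packets per second, while 1472 byte payloads (the largest UDP payload in a 1500 byte IPv4 MTU) need only about 25,000. That is consistent with the observation that larger frames make the problem harder to reproduce.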
Re: [PATCH v2] mac80211: remove warning message
On 5/14/19 11:40 AM, Johannes Berg wrote: We know the WARN hits, we have the backtrace, and it is easy enough (in my setup at least) to reproduce this. So, the WARN logic has done its job. Having more of these spam the kernel doesn't add much benefit I think. Well, this doesn't necessarily just catch a *single* issue, so it should remain for the future, I'd think. Anyone have any suggestions on how to fix the underlying issue? I don't even have the backtrace and scenario that causes it. johannes Here is the info I have in my commit that changed this to WARN_ON_ONCE. I never posted it because I had to hack ath10k to get to this state, so maybe this is not a valid case to debug. Maybe Yibo Zhao has a better example. mac80211: don't spam kernel logs when chantx is null. I set up ath10k to be chandef based again in order to test WDS. My WDS stations are not very functional yet, and when ethtool stats are queried, there is a WARN_ON splat generated. Change this to WARN_ON_ONCE so that there is less kernel spam. 
[ 2401.445631] WARNING: CPU: 1 PID: 14070 at /home/greearb/git/linux-4.13.dev.y/net/mac80211/ieee80211_i.h:1452 sta_set_rate_info_tx+0x18c/0x1a0 [mac80211] [ 2401.445727] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink wanlink(O) ath10k_pci ath10k_core mac80211_hwsim ath5k ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nf_defrag_ipv4 libcrc32c 8021q garp mrp stp llc fuse macvlan pktgen nfsv3 nfs fscache amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass sp5100_tco fam15h_power k10temp i2c_piix4 ccp shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc sdhci_pci sdhci mmc_core igb hwmon ptp pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt [last unloaded: nfnetlink] [ 2401.445911] CPU: 1 PID: 14070 Comm: btserver Tainted: GW O 4.13.11+ #18 [ 2401.445914] Hardware name: PC Engines apu2/apu2, BIOS 88a4f96 03/07/2016 [ 2401.445918] task: 880118c73b80 task.stack: c9000140 [ 2401.445973] RIP: 0010:sta_set_rate_info_tx+0x18c/0x1a0 [mac80211] [ 2401.446003] RSP: 0018:c90001403820 EFLAGS: 00010246 [ 2401.446007] RAX: RBX: 8800ca38e4a0 RCX: 000c [ 2401.446010] RDX: RSI: 8800ca38e4a0 RDI: 8800ca38e000 [ 2401.446013] RBP: c90001403850 R08: a04437a0 R09: 2000 [ 2401.446016] R10: 1183be82 R11: R12: c90001403970 [ 2401.446018] R13: R14: 8800c01d8900 R15: 8800ca180780 [ 2401.446023] FS: 7f8123ed7740() GS:88011ec8() knlGS: [ 2401.446026] CS: 0010 DS: ES: CR0: 80050033 [ 2401.446029] CR2: 036e8018 CR3: c0b29000 CR4: 000406e0 [ 2401.446034] Call Trace: [ 2401.446140] sta_set_sinfo+0x629/0x8b0 [mac80211] [ 2401.446192] ieee80211_get_stats+0x3f2/0x8c0 [mac80211] [ 2401.446207] ? __nla_put+0x20/0x30 [ 2401.446221] ? __kmalloc_reserve.isra.35+0x2c/0x80 [ 2401.446229] ? netlink_deliver_tap+0x2d/0x1e0 [ 2401.446235] ? sock_def_readable+0x6d/0x70 [ 2401.446239] ? __netlink_sendskb+0x36/0x40 [ 2401.446245] ? netlink_unicast+0x1b0/0x1f0 [ 2401.446252] ? rtnl_getlink+0x135/0x1c0 [ 2401.446261] ? get_page_from_freelist+0x913/0xac0 [ 2401.446270] ? 
vmap_page_range_noflush+0x27d/0x370 [ 2401.446277] ? map_vm_area+0x31/0x40 [ 2401.446284] ? __vmalloc_node_range+0x21f/0x270 [ 2401.446319] dev_ethtool+0x11d1/0x1ce0 [ 2401.446325] ? __rtnl_unlock+0x25/0x50 [ 2401.446330] ? netdev_run_todo+0x4d/0x2e0 [ 2401.446338] ? dev_get_by_name_rcu+0x6f/0xa0 [ 2401.446344] dev_ioctl+0x330/0x550 [ 2401.446349] ? reuse_swap_page+0x30/0x100 [ 2401.446355] sock_do_ioctl+0x3d/0x50 [ 2401.446359] ? sock_do_ioctl+0x3d/0x50 [ 2401.446363] sock_ioctl+0x1e5/0x2a0 [ 2401.446370] do_vfs_ioctl+0x8b/0x5b0 [ 2401.446376] ? getnstimeofday64+0x9/0x20 [ 2401.446383] ? __audit_syscall_entry+0xba/0x110 [ 2401.446391] ? syscall_trace_enter+0x1b0/0x2b0 [ 2401.446395] SyS_ioctl+0x74/0x80 [ 2401.446400] ? __audit_syscall_exit+0x215/0x2b0 [ 2401.446405] do_syscall_64+0x5c/0x190 [ 2401.446412] entry_SYSCALL64_slow_path+0x25/0x25 Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH v2] mac80211: remove warning message
On 5/14/19 8:44 AM, Joe Perches wrote: On Tue, 2019-05-14 at 11:12 +0200, Johannes Berg wrote: On Tue, 2019-05-14 at 17:10 +0800, Yibo Zhao wrote: On 2019-05-14 17:05, Johannes Berg wrote: On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote: In multiple SSID cases, it takes time to prepare every AP interface to be ready in initializing phase. If a sta already knows everything it needs to join one of the APs and sends authentication to the AP which is not fully prepared at this point of time, AP's channel context could be NULL. As a result, warning message occurs. [] I was planning to use WARN_ON_ONCE() in the first place to replace WARN_ON() then after some discussion, we think removing it could be better. So the first patch was based on my first version which is sent incorrectly. Please check again. [] I guess changing it to WARN_ON_ONCE() makes sense, WARN_ON_RATELIMIT exists. We know the WARN hits, we have the backtrace, and it is easy enough (in my setup at least) to reproduce this. So, the WARN logic has done its job. Having more of these spam the kernel doesn't add much benefit I think. Anyone have any suggestions on how to fix the underlying issue? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] mac80211: remove warning message
On 05/10/2019 12:01 AM, Yibo Zhao wrote:
In multiple SSID cases, it takes time to prepare every AP interface to be ready in initializing phase. If a sta already knows everything it needs to join one of the APs and sends authentication to the AP which is not fully prepared at this point of time, AP's channel context could be NULL. As a result, warning message occurs. Even worse, if the AP is under attack via tools such as MDK3 and massive authentication requests are received in a very short time, console will be hung due to kernel warning messages.
Since it is a WARN_ON_ONCE, how does the console hang due to warnings? You should get no more than one per boot? I have no problem with removing it though. It seems a harmless splat, and I removed it from my tree some time back as well. Thanks, Ben
If this case can be hit during normal functionality, there should be no WARN_ON(). Those should be reserved to cases that are not supposed to be hit at all or some other more specific cases like indicating obsolete interface.
Signed-off-by: Zhi Chen
Signed-off-by: Yibo Zhao
---
net/mac80211/ieee80211_i.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2ae0364..f39c289 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1435,7 +1435,7 @@ struct ieee80211_local {
 	rcu_read_lock();
 	chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
-	if (WARN_ON_ONCE(!chanctx_conf)) {
+	if (!chanctx_conf) {
 		rcu_read_unlock();
 		return NULL;
 	}
-- Ben Greear Candela Technologies Inc http://www.candelatech.com
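The "no more than one per boot" point can be illustrated with a user-space stand-in for the lookup in the diff. This is a sketch only: get_chanctx_sketch() and warnings_printed_sketch are invented names, and the static flag merely imitates what WARN_ON_ONCE() does for a single call site; it is not the kernel macro.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts emitted warnings so the behaviour can be checked. */
int warnings_printed_sketch;

/*
 * Returns the channel context, or NULL if the interface is not ready
 * yet. The NULL case warns at most once, no matter how many
 * authentication frames arrive before the AP is fully set up.
 */
const void *get_chanctx_sketch(const void *chanctx_conf)
{
	static bool warned;	/* persists across calls */

	if (!chanctx_conf) {
		if (!warned) {
			warned = true;
			warnings_printed_sketch++;
			fprintf(stderr, "WARNING: chanctx_conf is NULL\n");
		}
		return NULL;
	}
	return chanctx_conf;
}
```

A flood of requests against a not-yet-ready AP then produces a single message, so a console hang would point at a plain WARN_ON() (as in the v2 discussion) rather than at the once-only variant.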
Re: ath10k: wmi service ready event not received
On 05/10/2019 05:28 AM, Linus Torvalds wrote:
Hmm. I have a nice new laptop, and it works fine. Except today it lost wireless, and I have no idea why. It's not happened before (but it's fairly new and I'm actually on my first trip with it), so I don't know how common this is, but the kernel messages seem to say that the cause of it was
ath10k_pci :02:00.0: wmi service ready event not received
ath10k_pci :02:00.0: could not init core (-110)
ath10k_pci :02:00.0: could not probe fw (-110)
and then nothing works. -110 is ETIMEDOUT, fwiw. Rebooting got wireless back. It's possible I could have done something less drastic, but I was thinking that it would be the new kernel and rebooted into an older version. But then rebooting into the new one afterwards (double-checking before starting a bisect) and it all worked. Is there anything I can do to debug this if it happens again?
Please provide 'lspci' or other info on the NIC chipset, for reference. Sometimes a work-around is:
rmmod ath10k_pci ath10k_core; modprobe ath10k_pci
Sometimes you will get a firmware register dump in this crash case, and then someone from QCA might be able to get a backtrace if you post that with the chipset info and such (or if it is one of the NICs my ath10k-ct firmware supports and you can reproduce an issue with that firmware, then I can debug it). Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
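As a small aside on the "(-110)" in those messages: kernel functions return negated errno values, so the number can be decoded in user space with the standard C library. The helper names below are made up for this sketch, and the value 110 for ETIMEDOUT holds on the common Linux ABIs such as x86 and ARM but not on every architecture.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a kernel-style return value means "timed out". */
bool is_timeout_ret_sketch(int ret)
{
	return ret == -ETIMEDOUT;
}

/* Formats a kernel-style return value as "<value> (<description>)". */
void describe_ret_sketch(int ret, char *buf, size_t len)
{
	if (ret < 0)
		snprintf(buf, len, "%d (%s)", ret, strerror(-ret));
	else
		snprintf(buf, len, "%d (ok)", ret);
}
```

Here the timeout means the driver gave up waiting for the firmware's "service ready" event, which is why reloading the modules (and so rebooting the firmware) can help.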
Re: [PATCHv2] ath10k: Add wrapper function to ath10k debug
On 4/26/19 6:38 AM, Venkateswara Naralasetty wrote:
 #ifdef CONFIG_ATH10K_DEBUG
-void ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
-		const char *fmt, ...)
+void __ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
+		  const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;
Do you still need the check later in this method: if (ath10k_debug_mask & mask) since you already checked in the ath10k_dbg() macro?
Yes, we need this check. Otherwise all debug messages will be printed even without any debug mask set in case of tracing enabled.
Ahh, I see. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
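The reason both checks are needed can be shown with a user-space model of the wrapper. This is a sketch under stated assumptions: every name ending in _sketch is invented, tracing is reduced to a boolean and a counter, and printing goes to stdout instead of the kernel log.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

enum dbg_mask_sketch { DBG_RX_SKETCH = 1 << 0, DBG_TX_SKETCH = 1 << 1 };

unsigned int debug_mask_sketch;		/* stands in for ath10k_debug_mask */
bool tracing_enabled_sketch;		/* stands in for the tracepoint state */
int slow_path_calls_sketch, lines_printed_sketch, trace_events_sketch;

/* Slow path: only reached when the macro below lets the call through. */
void dbg_slow_path_sketch(unsigned int mask, const char *fmt, ...)
{
	va_list args;

	assert(fmt != NULL);
	slow_path_calls_sketch++;

	/* Inner check: with tracing on but no mask set, do not print. */
	if (debug_mask_sketch & mask) {
		va_start(args, fmt);
		vprintf(fmt, args);
		va_end(args);
		lines_printed_sketch++;
	}
	if (tracing_enabled_sketch)
		trace_events_sketch++;
}

/* Outer check: skip the call, and argument evaluation, when idle. */
#define dbg_sketch(mask, ...)						\
do {									\
	if ((debug_mask_sketch & (mask)) || tracing_enabled_sketch)	\
		dbg_slow_path_sketch((mask), __VA_ARGS__);		\
} while (0)
```

With tracing enabled and the mask clear, the macro still calls the slow path so the trace event is recorded; without the inner mask check that same call would also print every message.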
Re: [PATCHv2] ath10k: Add wrapper function to ath10k debug
On 4/26/19 6:44 AM, Michał Kazior wrote:
On Fri, 26 Apr 2019 at 14:58, Venkateswara Naralasetty wrote:
ath10k_dbg() is called in ath10k_process_rx() with huge set of arguments which is causing CPU overhead even when debug_mask is not set. Good improvement was observed in the receive side performance when call to ath10k_dbg() is avoided in the RX path.
[...]
+/* Avoid calling __ath10k_dbg() if debug_mask is not set and tracing
+ * disabled.
+ */
+#define ath10k_dbg(ar, dbg_mask, fmt, ...)			\
+do {								\
+	if ((ath10k_debug_mask & dbg_mask) ||			\
+	    trace_ath10k_log_dbg_enabled())			\
+		__ath10k_dbg(ar, dbg_mask, fmt, ##__VA_ARGS__);	\
+} while (0)
Did you consider using jump labels (see include/linux/jump_label.h)? It's what tracing uses under the hood. I wonder if you could squeeze out a bit more performance with that? I guess you'd need to add `struct static_key ath10k_dbg_mask_keys[ATH10K_DBG_MAX]` and re-do ath10k_debug_mask enum a bit.
Maybe first test with debugging just compiled out to see if there is still any significant overhead with this new patch applied? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCHv2] ath10k: Add wrapper function to ath10k debug
On 4/26/19 5:58 AM, Venkateswara Naralasetty wrote:
ath10k_dbg() is called in ath10k_process_rx() with huge set of arguments which is causing CPU overhead even when debug_mask is not set. Good improvement was observed in the receive side performance when call to ath10k_dbg() is avoided in the RX path. Since currently all debug messages are sent via tracing infrastructure, we cannot entirely avoid calling ath10k_dbg. Therefore, call to ath10k_dbg() is made conditional based on tracing config in the driver. Transmit performance remains unchanged with this patch; below are some experimental results with this patch and tracing disabled.

mesh mode:
            w/o this patch          with this patch
Traffic     TP          CPU usage   TP          CPU usage
TCP         840Mbps     76.53%      960Mbps     78.14%
UDP         1030Mbps    74.58%      1132Mbps    74.31%

Infra mode:
            w/o this patch          with this patch
Traffic     TP          CPU usage   TP          CPU usage
TCP Rx      1241Mbps    80.89%      1270Mbps    73.50%
UDP Rx      1433Mbps    81.77%      1472Mbps    72.80%

Tested platform : IPQ8064
hardware used : QCA9984
firmware ver: ver 10.4-3.5.3-00057

Signed-off-by: Kan Yan
Signed-off-by: Venkateswara Naralasetty
---
v2:
 * changed trace enabled check from IS_ENABLED(CONFIG_ATH10K_TRACING)
 * to trace_ath10k_log_dbg_enabled().

drivers/net/wireless/ath/ath10k/core.c | 2 ++
drivers/net/wireless/ath/ath10k/debug.c | 8
drivers/net/wireless/ath/ath10k/debug.h | 22 --
drivers/net/wireless/ath/ath10k/trace.c | 1 +
drivers/net/wireless/ath/ath10k/trace.h | 6 +-
5 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index cfd7bb2..ab709bf 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -26,6 +26,8 @@
 #include "coredump.h"
 unsigned int ath10k_debug_mask;
+EXPORT_SYMBOL(ath10k_debug_mask);
+
 static unsigned int ath10k_cryptmode_param;
 static bool uart_print;
 static bool skip_otp;
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index 32d967a..1b63929 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -2620,8 +2620,8 @@ void ath10k_debug_unregister(struct ath10k *ar)
 #endif /* CONFIG_ATH10K_DEBUGFS */
 #ifdef CONFIG_ATH10K_DEBUG
-void ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
-		const char *fmt, ...)
+void __ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
+		  const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;

Do you still need the check later in this method: if (ath10k_debug_mask & mask) since you already checked in the ath10k_dbg() macro? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [BUG] Can't change default country code from US to DE with Compex WLE900VX (Debian 10 Buster amd64)
On 04/06/2019 11:19 PM, Rene 'Renne' Bartsch, B.Sc. Informatics wrote: Hi, I posted this to the linux-wireless mailing-list 4 days ago and did not receive an answer. The ath10k module sets the country code to "US" at initialization. After that the country code can't be changed anymore (e.g. to "DE"). You might have to patch the driver and maybe the ath/ logic too, but the country code is handled in driver/kernel, so at least you can fix it. Thanks, Ben Compex support suggests setting "reg->country_code = CTRY_UNITED_STATES;" in "/drivers/net/wireless/ath/regd.c" to the local country. Patching and re-compiling every kernel-update isn't an option on an UEFI-only production system. Kernel version: 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux on Debian 10 Buster The WLE900VX is based on the QCA XB140 reference design. Until Thursday we have the option to return the cards to the dealer. Thanx for any hint, Renne renne@cloud:/lib/firmware/ath10k/QCA988X/hw2.0$ uname -a Linux cloud 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux renne@cloud:~$ ls /lib/firmware/ath10k/QCA988X/hw2.0/ board.bin firmware-4.bin firmware-5.bin What doesn't work: /etc/modprobe.d/cfg80211.conf: options cfg80211 ieee80211_regdom=DE /etc/hostapd/hostapd.conf: ... ieee80211d=1 country_code=DE ... 
root@cloud:/# export COUNTRY=DE; /sbin/crda Failed to set regulatory domain: -7 root@cloud:/# iw reg set DE && iw reg get global country 98: DFS-UNSET (2402 - 2472 @ 40), (N/A, 20), (N/A) (5170 - 5250 @ 80), (N/A, 20), (N/A), NO-OUTDOOR, AUTO-BW (5250 - 5330 @ 80), (N/A, 20), (0 ms), NO-OUTDOOR, DFS, AUTO-BW (5490 - 5725 @ 160), (N/A, 23), (0 ms), DFS (5725 - 5730 @ 5), (N/A, 13), (0 ms), DFS (5735 - 5835 @ 80), (N/A, 13), (N/A) (57240 - 63720 @ 2160), (N/A, 40), (N/A) phy#1 country US: DFS-FCC (2402 - 2472 @ 40), (N/A, 30), (N/A) (5170 - 5250 @ 80), (N/A, 23), (N/A), AUTO-BW (5250 - 5330 @ 80), (N/A, 23), (0 ms), DFS, AUTO-BW (5490 - 5730 @ 160), (N/A, 23), (0 ms), DFS (5735 - 5835 @ 80), (N/A, 30), (N/A) (57240 - 63720 @ 2160), (N/A, 40), (N/A) phy#0 country US: DFS-FCC (2402 - 2472 @ 40), (N/A, 30), (N/A) (5170 - 5250 @ 80), (N/A, 23), (N/A), AUTO-BW (5250 - 5330 @ 80), (N/A, 23), (0 ms), DFS, AUTO-BW (5490 - 5730 @ 160), (N/A, 23), (0 ms), DFS (5735 - 5835 @ 80), (N/A, 30), (N/A) (57240 - 63720 @ 2160), (N/A, 40), (N/A) LOGs: renne@cloud:~$ sudo dmesg | grep ath [4.630113] ath10k_pci :04:00.0: enabling device ( -> 0002) [4.630548] ath10k_pci :04:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0 [4.803687] ath10k_pci :04:00.0: firmware: failed to load ath10k/pre-cal-pci-:04:00.0.bin (-2) [4.803700] ath10k_pci :04:00.0: firmware: failed to load ath10k/cal-pci-:04:00.0.bin (-2) [4.803872] ath10k_pci :04:00.0: firmware: failed to load ath10k/QCA988X/hw2.0/firmware-6.bin (-2) [4.804994] ath10k_pci :04:00.0: firmware: direct-loading firmware ath10k/QCA988X/hw2.0/firmware-5.bin [4.804999] ath10k_pci :04:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub : [4.805000] ath10k_pci :04:00.0: kconfig debug 0 debugfs 0 tracing 0 dfs 0 testmode 0 [4.805147] ath10k_pci :04:00.0: firmware ver 10.2.4-1.0-00041 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 f43fa422 [4.840158] ath10k_pci :04:00.0: firmware: failed to load 
ath10k/QCA988X/hw2.0/board-2.bin (-2) [4.840338] ath10k_pci :04:00.0: firmware: direct-loading firmware ath10k/QCA988X/hw2.0/board.bin [4.840343] ath10k_pci :04:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08 [5.988575] ath10k_pci :04:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1 [6.067999] ath: EEPROM regdomain: 0x0 [6.068000] ath: EEPROM indicates default country code should be used [6.068000] ath: doing EEPROM country->regdmn map search [6.068001] ath: country maps to regdmn code: 0x3a [6.068002] ath: Country alpha2 being used: US [6.068002] ath: Regpair used: 0x3a [6.078078] ath10k_pci :04:00.0 wlp4s0: renamed from wlan0 [ 5099.420780] ath10k_pci :04:00.0: pdev param 0 not supported by firmware ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
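The "ath:" lines in the log above suggest the following order of decisions: the EEPROM regdomain reads 0x0, so a default country is chosen, and that country is then mapped to a regdomain pair (0x3a for US). A rough sketch of that fallback, assuming nothing beyond what the log shows. The demo_* names and the table are hypothetical and only the US entry is filled in, since that is the only mapping the log confirms.

```c
#include <stddef.h>

#define DEMO_CTRY_UNITED_STATES 840	/* ISO 3166-1 numeric code for US */

struct demo_country {
	unsigned short code;	/* numeric country code */
	unsigned short regdmn;	/* regdomain pair the country maps to */
	const char *alpha2;
};

/* Hypothetical table; only the US -> 0x3a mapping is taken from the log. */
static const struct demo_country demo_countries[] = {
	{ DEMO_CTRY_UNITED_STATES, 0x3a, "US" },
};

static const struct demo_country *demo_find_country(unsigned short code)
{
	size_t i;

	for (i = 0; i < sizeof(demo_countries) / sizeof(demo_countries[0]); i++)
		if (demo_countries[i].code == code)
			return &demo_countries[i];
	return NULL;
}

/* eeprom_rd == 0 means "nothing programmed": fall back to fallback_code.
 * A programmed EEPROM is outside the scope of this sketch. */
static const struct demo_country *
demo_default_country(unsigned short eeprom_rd, unsigned short fallback_code)
{
	if (eeprom_rd != 0)
		return NULL;
	return demo_find_country(fallback_code);
}
```

In this picture the Compex suggestion amounts to changing the fallback value, which is why it needs a rebuilt module rather than a userspace setting.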
Re: [PATCH] ath10k: Don't log a traceback on invalid event IDs.
diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 582fb11f648..ca990c8d306 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -614,7 +614,7 @@ static void ath10k_wmi_tlv_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_mgmt_tx_bundle_compl(ar, skb);
 		break;
 	default:
-		ath10k_dbg(ar, ATH10K_DBG_WMI, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}
 
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 98a90e49d66..f4fa406d9fe 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -5850,7 +5850,7 @@ static void ath10k_wmi_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_service_available(ar, skb);
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}
 
@@ -5981,7 +5981,7 @@ static void ath10k_wmi_10_1_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		/* ignore utf events */
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}
 
@@ -6130,7 +6130,7 @@ static void ath10k_wmi_10_2_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_peer_sta_ps_state_chg(ar, skb);
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}
 
@@ -6250,7 +6250,7 @@ static void ath10k_wmi_10_4_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_peer_sta_ps_state_chg(ar, skb);
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}
 
-- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: Kernel crash in skb_put caused by ath10k_htt_t2h_msg_handler
On 04/05/2019 02:24 AM, Petr Štetiar wrote: Hi, I've just hit following crash on my TP Link Archer C7 v5 with QCA9880, running latest OpenWrt with 4.14.109 (4.19.23-1 backports) and qca988x-ct firmware: Hello, Can you use gdb to print out the lines of code around that crash site in t2h_msg_handler? If I can figure out which message caused it I can add debugging and/or protective code. Thanks, Ben skbuff: skb_over_panic: text:87622780 len:360 put:360 head: (null) data: (null) tail:0x168 end:0x0 dev: Kernel bug detected[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.109 #0 task: 804e5490 task.stack: 804e $ 0 : 0001 006f $ 4 : 804ea7d8 804ea7d8 804f1a50 85c8 $ 8 : 0007 $12 : 01c1 efec8fad 01c0 $16 : 87551bf8 87263054 0416 $20 : 875514e0 87c07dc0 87551bf8 $24 : 0002 80279994 $28 : 804e 87c07d40 8765d994 802f4bd0 Hi: Lo: ec4e4000 epc : 802f4bd0 skb_panic+0x58/0x5c ra: 802f4bd0 skb_panic+0x58/0x5c Status: 1100dc03 KERNEL EXL IE Cause : 00800024 (ExcCode 09) PrId : 00019750 (MIPS 74Kc) Modules linked in: ath9k ath9k_common ath9k_hw ath10k_pci ath10k_core ath mac80211 iptable_nat iptable_mangle iptable_filter ipt_REJECT ipt_MASQUERADE ip_tables cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT x_tables nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_log_common nf_flow_table_hw nf_flow_table nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack compat ledtrig_usbport tun ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common Process swapper (pid: 0, threadinfo=804e, task=804e5490, tls=) Stack : 80501cc0 8041f040 87622780 0168 0168 0168 804b209c 1770 802f4c60 8749 87649118 804fcbf4 0900 01080020 87622780 0900 875514e0 875578c4 804e fffb 87c07dc4 87551cdc 804e 875514e0 804e 0004 87263054 87263054 0001 0034 87465180 8755777c 87465b40 8766 ... 
Call Trace:
[<802f4bd0>] skb_panic+0x58/0x5c
[<802f4c60>] skb_put+0x48/0x54
[<87622780>] ath10k_htt_t2h_msg_handler+0x27e0/0x31dc [ath10k_core]
[<8764901c>] ath10k_ce_rx_update_write_idx+0x9c/0xc4 [ath10k_core]
Code: 00602825 0c02ce18 248433d4 <000c000d> 8c8200ac 8c88005c 8c8700a8 00451023 01054021
---[ end trace 6b934e1b587e6bcd ]---
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 3 seconds..

I've looked at the git log up to 5.1-rc3 for htt_rx.c and ce.c, but couldn't find anything possibly related to this issue, so I'm wondering if this is a known and already fixed bug. Thanks for any pointers!

-- ynezz

-- Ben Greear Candela Technologies Inc http://www.candelatech.com
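The skb_over_panic line in the report (len:360 put:360, head and data null, end:0x0) reads like an attempt to append 360 bytes to a buffer that has no storage behind it. A small userspace sketch of that bounds check follows, under two stated simplifications: the demo_* names are hypothetical, and this version returns NULL where the kernel's skb_put() would panic. It does not say which HTT message triggered the call.

```c
#include <stddef.h>

/* Simplified stand-in for the head/tail/end bookkeeping of an sk_buff. */
struct demo_buf {
	unsigned char *head;	/* start of backing storage, NULL if none */
	size_t tail;		/* offset of the end of valid data */
	size_t end;		/* offset of the end of the storage */
	size_t len;		/* bytes of valid data */
};

/* Append n bytes and return a pointer to the new area, or NULL when the
 * request does not fit (the case where the kernel calls skb_over_panic()). */
static unsigned char *demo_buf_put(struct demo_buf *b, size_t n)
{
	size_t old_tail = b->tail;

	if (b->head == NULL || n > b->end - b->tail)
		return NULL;
	b->tail += n;
	b->len += n;
	return b->head + old_tail;
}
```

With a zeroed buffer, as in the report, any non-zero put fails immediately, so protective code would have to sit before the put, at the point where the buffer is obtained.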
Re: QCA9984 all devices being de-authenticated and unable to re-connect
Try down this page a bit, you just need to copy the firmware-5.bin and reboot usually. https://www.candelatech.com/ath10k-bugs.php Thanks, Ben On 04/01/2019 04:04 PM, Carlito Nueno wrote: Hi Ben, Can you tell me how I can compile ath10k-ct firmware version present in 18.06.2 into the latest snapshot? I want to try and see if it's firmware related. Awesome to know you had 64+ stations connected to your firmware. Thanks for the help. On Sun, Mar 31, 2019 at 8:06 PM Ben Greear wrote: Hello, You could try using the ath10k-ct firmware that does NOT work for you (18.06.2) in the latest snapshot. If problem still happens, then it is firmware related, and I can then build you a series of images so you can do a bisect to find what commit fixes the issue if you really want to know what is the fix. Otherwise, maybe something else fixed the problem. For what it is worth, we have regularly done 64+ stations connected to our ath10k firmware/driver for years. Thanks, Ben On 03/29/2019 07:02 PM, Carlito Nueno wrote: Hi all, I am using: ath10k-firmware-qca9984 - 2018-12-16-211de167-1 kmod-ath10k - 4.14.109+4.19.23-1-5 ## Problem I am able to associate and authenticate many clients. Max I tested was 15 clients. But when more than 4 clients start to play video stream (youtube, twitch, netflix): 1. all the clients loose internet connectivity 2. all of them are _de-authenticated_ 3. when trying to reconnect, they connect but are _disassociated_ immediately. c2:44:2f:f3:3c:22 -64 dBm / -109 dBm (SNR 45) 40 ms ago RX: 200.0 MBit/s, VHT-MCS 9, 40MHz, VHT-NSS 148 Pkts. TX: 12.0 MBit/s9 Pkts. expected throughput: unknown c2:44:2f:f3:3c:22 -57 dBm / -109 dBm (SNR 52) 10 ms ago RX: 12.0 MBit/s 17 Pkts. TX: 12.0 MBit/s5 Pkts. expected throughput: unknown 4. internet on the AP works. 
(I am able to ping google.com) ## Firmware and OS this problem occurs - ath10k + 18.06.2 = yes, there is this problem - ath10k + snapshot = yes, there is this problem - ath10k-ct + 18.06.2 = yes, similar problem occurs (https://github.com/greearb/ath10k-ct/issues/82) - ath10k-ct + snapshot = no, works fine ## OpenWRT info ### Release DISTRIB_ID='OpenWrt' DISTRIB_RELEASE='SNAPSHOT' DISTRIB_REVISION='r9753-6df5ab89cf' DISTRIB_TARGET='ar71xx/generic' DISTRIB_ARCH='mips_24kc' DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r9753-6df5ab89cf' DISTRIB_TAINTS='no-all' ### Logs Include: - Full list of OPKGs installed: 2_opkg_installed-txt - Logs up to the point of crash: 3_ath10k_crash-txt - Logs after the crash and trying to reconnect: 4_after_crash_reconnect-txt https://gist.github.com/ironpillow/96ce9173721163a8c8c93113b2a677d7 ### More logs I noticed that the some devices stay connected :man_shrugging: and when these connected devices make a dns request, the request is reaching the DNS server but the AP is not receiving response. I ran ping on *one device* and captured packets on AP (two interfaces): - tcpdump -i wlan0-ap - tcpdump -i br-lan ping google.com: https://gist.github.com/ironpillow/50cb0e2010ac5bc9acc7abc7e20ab910 ping 8.8.8.8: https://gist.github.com/ironpillow/97cb3dd6eb8e9d028a8231f142fae01f Packets are not reaching wifi wlan0-ap interface. I am happy to run more tests. Any advice? ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: QCA9984 all devices being de-authenticated and unable to re-connect
Hello, You could try using the ath10k-ct firmware that does NOT work for you (18.06.2) in the latest snapshot. If problem still happens, then it is firmware related, and I can then build you a series of images so you can do a bisect to find what commit fixes the issue if you really want to know what is the fix. Otherwise, maybe something else fixed the problem. For what it is worth, we have regularly done 64+ stations connected to our ath10k firmware/driver for years. Thanks, Ben On 03/29/2019 07:02 PM, Carlito Nueno wrote: Hi all, I am using: ath10k-firmware-qca9984 - 2018-12-16-211de167-1 kmod-ath10k - 4.14.109+4.19.23-1-5 ## Problem I am able to associate and authenticate many clients. Max I tested was 15 clients. But when more than 4 clients start to play video stream (youtube, twitch, netflix): 1. all the clients loose internet connectivity 2. all of them are _de-authenticated_ 3. when trying to reconnect, they connect but are _disassociated_ immediately. c2:44:2f:f3:3c:22 -64 dBm / -109 dBm (SNR 45) 40 ms ago RX: 200.0 MBit/s, VHT-MCS 9, 40MHz, VHT-NSS 148 Pkts. TX: 12.0 MBit/s9 Pkts. expected throughput: unknown c2:44:2f:f3:3c:22 -57 dBm / -109 dBm (SNR 52) 10 ms ago RX: 12.0 MBit/s 17 Pkts. TX: 12.0 MBit/s5 Pkts. expected throughput: unknown 4. internet on the AP works. 
(I am able to ping google.com) ## Firmware and OS this problem occurs - ath10k + 18.06.2 = yes, there is this problem - ath10k + snapshot = yes, there is this problem - ath10k-ct + 18.06.2 = yes, similar problem occurs (https://github.com/greearb/ath10k-ct/issues/82) - ath10k-ct + snapshot = no, works fine ## OpenWRT info ### Release DISTRIB_ID='OpenWrt' DISTRIB_RELEASE='SNAPSHOT' DISTRIB_REVISION='r9753-6df5ab89cf' DISTRIB_TARGET='ar71xx/generic' DISTRIB_ARCH='mips_24kc' DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r9753-6df5ab89cf' DISTRIB_TAINTS='no-all' ### Logs Include: - Full list of OPKGs installed: 2_opkg_installed-txt - Logs up to the point of crash: 3_ath10k_crash-txt - Logs after the crash and trying to reconnect: 4_after_crash_reconnect-txt https://gist.github.com/ironpillow/96ce9173721163a8c8c93113b2a677d7 ### More logs I noticed that the some devices stay connected :man_shrugging: and when these connected devices make a dns request, the request is reaching the DNS server but the AP is not receiving response. I ran ping on *one device* and captured packets on AP (two interfaces): - tcpdump -i wlan0-ap - tcpdump -i br-lan ping google.com: https://gist.github.com/ironpillow/50cb0e2010ac5bc9acc7abc7e20ab910 ping 8.8.8.8: https://gist.github.com/ironpillow/97cb3dd6eb8e9d028a8231f142fae01f Packets are not reaching wifi wlan0-ap interface. I am happy to run more tests. Any advice? ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips
On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote: Ben Greear writes: On 2/21/19 8:10 AM, Kalle Valo wrote: Toke Høiland-Jørgensen writes: Grant Grundler writes: On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen wrote: Grant Grundler writes: And, well, Grant's data is from a single test in a noisy environment where the time series graph shows that throughput is all over the place for the duration of the test; so it's hard to draw solid conclusions from (for instance, for the 5-stream test, the average throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7 it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware used in this test, so I can't go verify it myself; so the only thing I can do is grumble about it here... :) It's a fair complaint and I agree with it. My counter argument is the opposite is true too: most ideal benchmarks don't measure what most users see. While the data wgong provided are way more noisy than I like, my overall "confidence" in the "conclusion" I offered is still positive. Right. I guess I would just prefer a slightly more comprehensive evaluation to base a 4x increase in buffer size on... Kalle, is this why you didn't accept this patch? Other reasons? Toke, what else would you like to see evaluated? I generally want to see three things measured when "benchmarking" technologies: throughput, latency, cpu utilization We've covered those three I think "reasonably". Hmm, going back and looking at this (I'd completely forgotten about this patch), I think I had two main concerns: 1. What happens in a degraded signal situation, where the throughput is limited by the signal conditions, or by contention with other devices. Both of these happen regularly, and I worry that latency will be badly affected under those conditions. 2. What happens with old hardware that has worse buffer management in the driver->firmware path (especially drivers without push/pull mode support)? 
For these, the lower-level queueing structure is less effective at controlling queueing latency. Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377 PCI devices, which IIRC do not even support push/pull mode. All the rest, including QCA988X and QCA9984 are unaffected. Just as a note, at least kernels such as 4.14.whatever perform poorly when running ath10k on 9984 when acting as TCP endpoints. This makes them not really usable for stuff like serving video to lots of clients. Tweaking TCP (I do it a bit differently, but either way) can significantly improve performance. Differently how? Did you have to do more than fiddle with the pacing_shift? This one, or a slightly tweaked version that applies to different kernels: https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5 Recently I helped a user that could get barely 70 stations streaming at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz), and we got 110 working with a tweaked TCP stack. These were /n stations too. I think it is lame that it _still_ requires out of tree patches to make TCP work well on ath10k...even if you want to default to current behaviour, you should allow users to tweak it to work with their use case. Well if TCP is broken to the point of being unusable I do think we should fix it; but I think "just provide a configuration knob" should be the last resort... So, it has been broken for years, and waiting for a perfect solution has not gotten the problem fixed. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
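For readers wondering where the "4x increase in buffer size" figure comes from, here is the arithmetic as a small sketch. It assumes the per-socket budget is the pacing rate shifted right by sk_pacing_shift, and that the shift was 8 before the patch, which is what the "4x" remark implies. The real limit has additional floors and caps that are left out here. The demo_* helpers are hypothetical.

```c
#include <stdint.h>

/* Bytes a socket may keep queued below TCP for a given pacing rate,
 * ignoring the floors and caps the kernel applies on top. */
static uint64_t demo_tsq_budget(uint64_t pacing_rate_bytes_per_sec,
				unsigned int pacing_shift)
{
	return pacing_rate_bytes_per_sec >> pacing_shift;
}

/* The same budget expressed as time: 2^-shift seconds of data at the
 * pacing rate, in microseconds (rounded down). */
static uint64_t demo_tsq_budget_usec(unsigned int pacing_shift)
{
	return 1000000ULL >> pacing_shift;
}
```

Going from shift 8 to shift 6 moves the budget from roughly 4 ms to roughly 16 ms of data at the pacing rate. That is the latency concern raised for degraded links, where the pacing rate can overestimate what the air actually delivers.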
Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips
On 2/21/19 8:10 AM, Kalle Valo wrote: Toke Høiland-Jørgensen writes: Grant Grundler writes: On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen wrote: Grant Grundler writes: And, well, Grant's data is from a single test in a noisy environment where the time series graph shows that throughput is all over the place for the duration of the test; so it's hard to draw solid conclusions from (for instance, for the 5-stream test, the average throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7 it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware used in this test, so I can't go verify it myself; so the only thing I can do is grumble about it here... :) It's a fair complaint and I agree with it. My counter argument is the opposite is true too: most ideal benchmarks don't measure what most users see. While the data wgong provided are way more noisy than I like, my overall "confidence" in the "conclusion" I offered is still positive. Right. I guess I would just prefer a slightly more comprehensive evaluation to base a 4x increase in buffer size on... Kalle, is this why you didn't accept this patch? Other reasons? Toke, what else would you like to see evaluated? I generally want to see three things measured when "benchmarking" technologies: throughput, latency, cpu utilization We've covered those three I think "reasonably". Hmm, going back and looking at this (I'd completely forgotten about this patch), I think I had two main concerns: 1. What happens in a degraded signal situation, where the throughput is limited by the signal conditions, or by contention with other devices. Both of these happen regularly, and I worry that latency will be badly affected under those conditions. 2. What happens with old hardware that has worse buffer management in the driver->firmware path (especially drivers without push/pull mode support)? For these, the lower-level queueing structure is less effective at controlling queueing latency. 
Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377 PCI devices, which IIRC do not even support push/pull mode. All the rest, including QCA988X and QCA9984 are unaffected. Just as a note, at least kernels such as 4.14.whatever perform poorly when running ath10k on 9984 when acting as TCP endpoints. This makes them not really usable for stuff like serving video to lots of clients. Tweaking TCP (I do it a bit differently, but either way) can significantly improve performance. Recently I helped a user that could get barely 70 stations streaming at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz), and we got 110 working with a tweaked TCP stack. These were /n stations too. I think it is lame that it _still_ requires out of tree patches to make TCP work well on ath10k...even if you want to default to current behaviour, you should allow users to tweak it to work with their use case. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: QCA9888: Driver/Firmware Crash After Initialization
e2-CT crc32 4a66be6f OpenWrt package identifiers ath10k-firmware-qca4019-ct - 2018-10-10-d366b80d-1 ath10k-firmware-qca9888-ct - 2018-10-10-d366b80d-1 kmod-ath10k-ct - 4.14.99+2018-12-20-118e16da-2 -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] ath10k: implement set_base_macaddr to fix rx-bssid mask in multiple APs conf
On 02/07/2019 06:19 AM, Kalle Valo wrote: Christian Lamparter writes: On Monday, February 4, 2019 4:45:12 PM CET Kalle Valo wrote: Christian Lamparter writes: @@ -8885,6 +8904,7 @@ static const struct wmi_ops wmi_ops = { .gen_pdev_suspend = ath10k_wmi_op_gen_pdev_suspend, .gen_pdev_resume = ath10k_wmi_op_gen_pdev_resume, + .gen_pdev_set_base_macaddr = ath10k_wmi_op_gen_pdev_set_base_macaddr, .gen_pdev_set_rd = ath10k_wmi_op_gen_pdev_set_rd, .gen_pdev_set_param = ath10k_wmi_op_gen_pdev_set_param, .gen_init = ath10k_wmi_op_gen_init, @@ -8960,6 +8980,7 @@ static const struct wmi_ops wmi_10_1_ops = { .gen_pdev_suspend = ath10k_wmi_op_gen_pdev_suspend, .gen_pdev_resume = ath10k_wmi_op_gen_pdev_resume, + .gen_pdev_set_base_macaddr = ath10k_wmi_op_gen_pdev_set_base_macaddr, .gen_pdev_set_param = ath10k_wmi_op_gen_pdev_set_param, .gen_stop_scan = ath10k_wmi_op_gen_stop_scan, .gen_vdev_create = ath10k_wmi_op_gen_vdev_create, @@ -9032,6 +9053,7 @@ static const struct wmi_ops wmi_10_2_ops = { .gen_pdev_suspend = ath10k_wmi_op_gen_pdev_suspend, .gen_pdev_resume = ath10k_wmi_op_gen_pdev_resume, + .gen_pdev_set_base_macaddr = ath10k_wmi_op_gen_pdev_set_base_macaddr, .gen_pdev_set_param = ath10k_wmi_op_gen_pdev_set_param, .gen_stop_scan = ath10k_wmi_op_gen_stop_scan, .gen_vdev_create = ath10k_wmi_op_gen_vdev_create, These are in practise obsolete WMI interfaces so not sure if it makes it worth to support this parameter in them. But on the other hand it won't hurt either, so dunno. Ok. I looked what firmware interfaces (wmi_cmd_map) supported the pdev_set_base_macaddr_cmdid and all did (including the old and tlv) so I added the line everywhere I could. As far as the support for the old firmwares goes: I don't think anybody with a current ath10k is willingly still stuck on the 10.1, 10.2 firmware. So, I might as well just remove those for 10_2, 10_1 and MAIN. Yeah, that's the best. BTW I'm planning (or better hoping) to remove 10.1, 10.2 and main WMI interfaces altogether. 
They are just making things unnecessarily complex.
My wave-1 firmware uses the 10.1 interface and it is used by a fair number of people, so please leave that one in place. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
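A sketch of why leaving the new op out of an older ops table can be harmless, assuming callers go through a wrapper that tests the pointer before calling it. All names here are hypothetical and simplified; this is not the driver's wmi_ops definition.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_dev;

/* Per-firmware-interface operations; an op left unset stays NULL. */
struct demo_wmi_ops {
	int (*set_param)(struct demo_dev *dev, int value);
	int (*set_base_macaddr)(struct demo_dev *dev,
				const unsigned char mac[6]);
};

struct demo_dev {
	const struct demo_wmi_ops *ops;
	unsigned char base_mac[6];
	int param;
};

static int demo_op_set_param(struct demo_dev *dev, int value)
{
	dev->param = value;
	return 0;
}

static int demo_op_set_base_macaddr(struct demo_dev *dev,
				    const unsigned char mac[6])
{
	memcpy(dev->base_mac, mac, 6);
	return 0;
}

/* Newer interface: implements the op. */
static const struct demo_wmi_ops demo_new_ops = {
	.set_param = demo_op_set_param,
	.set_base_macaddr = demo_op_set_base_macaddr,
};

/* Older interface: the op is simply not listed. */
static const struct demo_wmi_ops demo_old_ops = {
	.set_param = demo_op_set_param,
};

/* Wrapper used by callers: reports "not supported" instead of
 * dereferencing a NULL function pointer. */
static int demo_wmi_set_base_macaddr(struct demo_dev *dev,
				     const unsigned char mac[6])
{
	if (!dev->ops->set_base_macaddr)
		return -EOPNOTSUPP;
	return dev->ops->set_base_macaddr(dev, mac);
}
```

With this shape, dropping the line from the 10.1 and 10.2 tables changes behaviour only for callers that treat "not supported" as fatal.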
Re: [Ath10k] QCA9377-based USB dongle for aeronautical research project
On 12/13/2018 01:01 AM, Vincent Guenat wrote: Dear Mr. Greear, I come to you to ask for directions. TRIAGNOSYS Gmbh, a division of Zodiac Inflight Innovations part of the SAFRAN GROUP, participate in a research project for the ARINC CSS committee aiming at investigating the potential use of 5 GHz DFS channels in aircrafts. In order to do that, the partners in this project aim to gather information about where radar interference occurs and on which channels by monitoring DFS channels during commercial flights. Partners including AIRBUS and DELTA AIRLINES have already provided their support. In order to monitor the DFS channels, a device based on WiFi dongles to be installed on planes is currently explored. The Linksys WUSB6100M device based on QCA9377 seems a promising choice, especially given the features proposed by the ath10k driver. The idea is to put this device in monitor mode and use the DFS pattern detector to report DFS events to userspace through cfg80211_radar_event/nl80211_radar_notify. This would require some modification in kernelspace so that the device does not switch channel upon radar detection. An application listening on a netlink socket would then retrieve the data. Therefore my question is: do you think this is possible? In the meantime, trying to use the Linksys device in STA mode does not work as the device does not manage to use DMA to allocate the tx buffer with dma_alloc_coherent in htt_tx.c. Given that it works for PCI devices, I assume that it is firmware-related but have not yet found a workaround. Do you have any ideas what might have caused this? Thank you for any time that you spend considering these questions. Hello, I do not have any access currently to the firmware for those devices, so it would be hard to add any features or stability fixes or even understand current bugs. 
With PCI-based ath10k devices in AP mode, there is a feature in the driver that can disable channel switching on radar detection, and I recently added support to my ath10k-ct driver for querying more details about these radar events through the debugfs API. I am not sure radar detection works in monitor mode, but possibly it does. I recently did a test with some Realtek USB NICs and was pleasantly surprised by their performance and stability in station mode. Possibly that would work for your test case. The driver is out-of-kernel; this is the one I was using, and I got it compiling against OpenWrt with a few tweaks: https://github.com/greearb/rtl8812AU_8821AU_linux Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs
On 11/19/2018 04:13 PM, Dave Taht wrote: On Mon, Nov 19, 2018 at 3:56 PM Ben Greear wrote: On 11/19/2018 03:47 PM, Dave Taht wrote: On Mon, Nov 19, 2018 at 3:30 PM Simon Barber wrote: On Nov 19, 2018, at 2:44 PM, Toke Høiland-Jørgensen wrote: Dave Taht writes: Toke Høiland-Jørgensen writes: Felix Fietkau writes: On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote: This part doesn't really make much sense to me, but maybe I'm misunderstanding how the code works. Let's assume we have a driver like ath9k or mt76, which tries to keep a …. Well, there's going to be a BQL-like queue limit (but for airtime) on top, which drivers can opt-in to if the hardware has too much queueing. Very happy to read this - I first talked to Dave Taht about the need for Time Queue Limits more than 5 years ago! Michal faked up a dql estimator 3 (?) years ago. it worked. http://blog.cerowrt.org/post/dql_on_wifi_2/ As a side note, in *any* real world working mu-mimo situation at any scale, on any equipment, does anyone have any stats on how often the feature is actually used and useful? My personal guess, from looking at the standard, was in home scenarios, usage would be about... 0, and in a controlled environment in a football stadium, quite a lot. In a office or apartment complex, I figured interference and so forth would make it a negative benefit due to retransmits. I felt when that part of the standard rolled around... that mu-mimo was an idea that should never have escaped the lab. I can be convinced by data, that we can aim for a higher goal here. But it would be comforting to have a measured non-lab, real-world, at real world rates, result for it, on some platform, of it actually being useful. We're working on building a lab with 20 or 30 mixed 'real' devices using various different /AC NICs (QCA wave2 on OpenWRT, Fedora, realtek USB 8812au on OpenWRT, Fedora, and some Intel NICs in NUCs on Windows, and maybe more). 
I'm not actually sure if that realtek or the NUCs can do MU-MIMO or not, but the QCA NICs will be able to. It should be at least somewhat similar to a classroom environment or coffee shop. In the last 3 coffee shops I went to, I could hear over 30 APs on competing SSIDs, running G, N, and AC, occupying every available channel. I especially like when someone uses channel 3 because, I guess, they think it is un-used :) I'm not sure if this was a fluke or not, but at Starbucks recently I sat outside, right next to their window, and could not scan their AP at all. Previously, I sat inside, 3 feet away through the glass, and got great signal. I wonder what that was all about! Maybe special tinting that blocks RF? Or just dumb luck of some sort. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com ___ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k
Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs
On 11/19/2018 03:47 PM, Dave Taht wrote: On Mon, Nov 19, 2018 at 3:30 PM Simon Barber wrote: On Nov 19, 2018, at 2:44 PM, Toke Høiland-Jørgensen wrote: Dave Taht writes: Toke Høiland-Jørgensen writes: Felix Fietkau writes: On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote: This part doesn't really make much sense to me, but maybe I'm misunderstanding how the code works. Let's assume we have a driver like ath9k or mt76, which tries to keep a …. Well, there's going to be a BQL-like queue limit (but for airtime) on top, which drivers can opt-in to if the hardware has too much queueing. Very happy to read this - I first talked to Dave Taht about the need for Time Queue Limits more than 5 years ago! Michal faked up a dql estimator 3 (?) years ago. it worked. http://blog.cerowrt.org/post/dql_on_wifi_2/ As a side note, in *any* real world working mu-mimo situation at any scale, on any equipment, does anyone have any stats on how often the feature is actually used and useful? My personal guess, from looking at the standard, was in home scenarios, usage would be about... 0, and in a controlled environment in a football stadium, quite a lot. In a office or apartment complex, I figured interference and so forth would make it a negative benefit due to retransmits. I felt when that part of the standard rolled around... that mu-mimo was an idea that should never have escaped the lab. I can be convinced by data, that we can aim for a higher goal here. But it would be comforting to have a measured non-lab, real-world, at real world rates, result for it, on some platform, of it actually being useful. We're working on building a lab with 20 or 30 mixed 'real' devices using various different /AC NICs (QCA wave2 on OpenWRT, Fedora, realtek USB 8812au on OpenWRT, Fedora, and some Intel NICs in NUCs on Windows, and maybe more). I'm not actually sure if that realtek or the NUCs can do MU-MIMO or not, but the QCA NICs will be able to. 
It should be at least somewhat similar to a classroom environment or coffee shop. I'll let you know what we find as far as how well MU-MIMO improves things or not. At least in simple test cases (one 1x1 station, one 2x2 station, with 4x4 MU-MIMO AP), it works very well for increased download throughput. In home setups, I'd guess that the DSL or Cable Modem or other uplink is the bottleneck way more often than the wifi is, even if you are just running /n. But, maybe that is just my experience living out at the end of a long skinny phone line all these years. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 0/4] cfg80211/mac80211: Add support for TID specific configuration
On 11/07/2018 04:55 PM, Igor Mitsyanko wrote: On 10/22/2018 10:55 AM, Tamizh chelvam wrote: Add infrastructure for per TID aggregation/retry count configurations such as retry count and AMPDU aggregation control (disable/enable). In some scenarios, reducing the retry count for a specific data traffic can reduce the latency by proceeding with the next packet instead of retrying the same packet more times. This will be useful where the next packet can resume the operation without an issue. Here added NL80211_CMD_SET_TID_CONFIG to support this operation by accepting retry count and AMPDU aggregation control. This command can accept a STA MAC address to make the configuration station specific rather than applying to all the connected stations to the netdev. It's not immediately clear how to make use of these settings, here are several comments: 1. What should the max retry count limit actually be applied to? Retry decisions are in a rate adaptation domain, it should know how many retries should be done on each rate, a single "max retry" value will not suffice. For example, it can retry twice on MCS9, twice on MCS7, three times on MCS5 or something like that. I'm not familiar with what ATH10k is doing, 4th patch defines ATH10K_MAX_RETRY_COUNT=30, what does it actually mean? It's unlikely "do 30 retries on the same rate". Does the retry limit setting interact with rate adaptation somehow in ath10k? Maybe it makes sense to extend max retry value specification to make it possible to define per-rate? I'm not sure how much flexibility we want here, something like retry value per MCS:BW:SGI? For ath10k, my understanding is that each time it (re)sends a packet, it will query FW rate-ctrl and choose the optimal rate. It doesn't pay much attention to whether a specific frame is retried or not, other than to maybe enable RTS/CTS, but lots of retries will bump the rate-ctrl down to a lower rate.
There is no per-rate retry counter logic, but I think there is per-tid control, though currently it might not be wired up to the driver. 2. AMPDU/AMSDU - the way it is, it is also relevant to rate in Tx direction only, correct? We keep advertised capabilities intact and peer has all rights to send AMPDUs/AMSDUs of whatever size that was advertised. Additionally, posted patches do not do anything with established blockack agreement. 3. With the above being said, perhaps it would make sense for this new interface to indicate explicitly that it's related to Tx rate? That can be controlled per-TID, per-node or globally, depending on device capabilities. Some other settings that may be useful are fixed MCS, MCS limit, SGI on/off, bandwidth, maybe even provide rate retry rules. I think there should be a way to configure the advertised capabilities, and also a way to configure the settings actually used for transmit. This is what we use for test-related use cases, but maybe there is not a great deal of general use for this type of thing. For general use, the 'transmit' settings are probably more useful. I do know that several ath10k users are forcing it back to /n mode which works around some bugs in their mesh setup. You can already set a fixed transmit rate or set the MCS rates allowed to be used (my supplicant, ath10k-ct driver/firmware is needed to take full advantage of this for ath10k). In upstream kernels, this will not much affect the advertised capabilities. I also have patches that allow setting the advertised rates and capabilities, so you can force a station to advertise only a/n rates even though it and peer have /AC capability. Those patches are not upstream, though if opinions are changed, I'd be happy to repost and try to get them upstream. Thanks, Ben I don't see how it can be used in a real product, unless there is an external rate adaptation logic of some kind. But definitely it will be useful for testing, and can be used for WFA certification.
-- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH] ath10k: htt_rx: Fix signedness bug in ath10k_update_per_peer_tx_stats
On 10/05/2018 11:42 AM, Gustavo A. R. Silva wrote:
> Currently, the error handling for the call to function
> ath10k_get_legacy_rate_idx() doesn't work because
> *rate_idx* is of type u8 (8 bits, unsigned), which
> makes it impossible for it to hold a value less
> than 0.
>
> Fix this by changing the type of variable *rate_idx*
> to s8 (8 bits, signed).

There are more than 127 rates, are you sure this is doing what you want?

Thanks,
Ben

> Addresses-Coverity-ID: 1473914 ("Unsigned compared against 0")
> Fixes: 0189dbd71cbd ("ath10k: get the legacy rate index to update the txrate table")
> Signed-off-by: Gustavo A. R. Silva
> ---
>  drivers/net/wireless/ath/ath10k/htt_rx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
> index f240525..edd0e74 100644
> --- a/drivers/net/wireless/ath/ath10k/htt_rx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
> @@ -2753,7 +2753,8 @@ ath10k_update_per_peer_tx_stats(struct ath10k *ar,
>  				struct ath10k_per_peer_tx_stats *peer_stats)
>  {
>  	struct ath10k_sta *arsta = (struct ath10k_sta *)sta->drv_priv;
> -	u8 rate = 0, rate_idx = 0, sgi;
> +	u8 rate = 0, sgi;
> +	s8 rate_idx = 0;
>  	struct rate_info txrate;
>
>  	lockdep_assert_held(&ar->data_lock);

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
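Both points in this exchange (a u8 cannot carry a negative error code, and an s8 cannot carry an index above 127) can be tried outside the kernel. The following is only a userspace sketch: uint8_t and int8_t stand in for the kernel's u8 and s8, and stored_as_u8() / stored_as_s8() are hypothetical helpers, not ath10k functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: what a caller reads back after storing an
 * int return value (index or negative errno) in a u8. */
static int stored_as_u8(int ret)
{
	uint8_t idx = (uint8_t)ret;	/* -EINVAL wraps to a large positive value */
	return idx;
}

/* Hypothetical helper: the same with an s8. Negative errnos survive,
 * but any index above 127 no longer fits (the out-of-range conversion
 * is implementation-defined in C). */
static int stored_as_s8(int ret)
{
	int8_t idx = (int8_t)ret;
	return idx;
}

/* Keeping the full int and range-checking it avoids both problems. */
static int index_or_error(int ret, int table_len)
{
	assert(table_len > 0);
	if (ret < 0)
		return ret;		/* propagate the error unchanged */
	if (ret >= table_len)
		return -ERANGE;
	return ret;
}
```

With a u8 the "< 0" check can never fire, with an s8 it fires correctly only while every valid index stays at or below 127, and with a plain int both cases are distinguishable.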
Re: [RFC 2/2] ath10k: reporting estimated tx airtime for fairness
On 09/28/2018 03:47 PM, Rajkumar Manoharan wrote: On 2018-09-28 12:57, Ben Greear wrote: On 09/28/2018 12:47 PM, Rajkumar Manoharan wrote: On 2018-09-28 08:25, Toke Høiland-Jørgensen wrote: So this just uses the calculated airtime based on rate and size? Wasn't there supposed to be an airtime usage value reported by the firmware? :) Firmware interface changes are in progress. Airtime for sta/tid will be reported via htt tx-compl and rx ind messages. Meantime I thought it would be useful to use Kan's changes for ATF validation in ath10k using existing firmware. :) Maybe you can get the firmware guys to report the tx rate in the tx-completion (like I have been doing for years in my ath10k-ct firmware)? Then let the host do the air-time calculating? I'll give them firmware patches if they want :) Ben, As you know, it needs cleanup in firmware to free up space for new interface changes. Most of the time we try to leverage rsvd/unused slots. I am aware that you did a lot of clean up in CT firmware which is quite hard in official firmware as it also has to support prop. releases. Kalle can answer much better. There are hard ways to get more space in the firmware, but there are also some easier ones (un-used members in structs, better natural packing, and such). If there was a QCA firmware engineer that could promptly discuss these things with me and apply patches, I can feed them patches. And, the 10.4 firmware already has some extra space in its tx descriptor that can be used to report tx-status without much additional code or RAM. The wave-1 stuff needs some more serious hacking and does consume more memory. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
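For readers wondering what "let the host do the air-time calculating" amounts to once the tx rate is known, here is a deliberately simplified sketch. It is not the calculation from Kan's changes: est_airtime_us() is a hypothetical name, and the estimate counts payload bits only, ignoring preamble, aggregation, retries and ACK time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side estimate: microseconds needed to send
 * len_bytes at rate_kbps, rounded up. Payload bits only. */
static uint32_t est_airtime_us(uint32_t len_bytes, uint32_t rate_kbps)
{
	assert(rate_kbps > 0);
	uint64_t bits = (uint64_t)len_bytes * 8;

	/* bits / (kbit/s) gives milliseconds, so scale by 1000 for us */
	return (uint32_t)((bits * 1000 + rate_kbps - 1) / rate_kbps);
}
```

A 1500-byte frame comes out at 2000 us at 6 Mbps and about 223 us at 54 Mbps, which is why the reported rate matters much more than the frame size for fairness accounting.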
Re: [PATCH] ath10k: Advertize beacon_int_min_gcd as 100 while bring up multi vaps
On 09/17/2018 11:33 PM, Maharaja Kennadyrajan wrote:
> With the latest firmware design, the beacon interval
> should be greater than 100 to bring the multiple vaps.
> Set beacon_int_min_gcd to 100, when the wmi service
> WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT
> is enabled in the firmware. If not, beacon_int_min_gcd
> will be set to the default value 1.
>
> Tested in QCA4019 with firmware ver 10.4-3.2.1.1-00015
> Tested in QCA9888 with firmware ver 10.4-3.5.1-0005
>
> Signed-off-by: Maharaja Kennadyrajan
> ---
>  drivers/net/wireless/ath/ath10k/mac.c | 25 +
>  drivers/net/wireless/ath/ath10k/wmi.h | 9 +
>  2 files changed, 34 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 97548f9..532fc5d 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -8163,6 +8163,24 @@ void ath10k_mac_destroy(struct ath10k *ar)
>  	},
>  };
>
> +static const struct
> +ieee80211_iface_combination ath10k_10_4_bcn_int_if_comb[] = {
> +	{
> +		.limits = ath10k_10_4_if_limits,
> +		.n_limits = ARRAY_SIZE(ath10k_10_4_if_limits),
> +		.max_interfaces = 16,
> +		.num_different_channels = 1,
> +		.beacon_int_infra_match = true,
> +		.beacon_int_min_gcd = 100,
> +#ifdef CONFIG_ATH10K_DFS_CERTIFIED
> +		.radar_detect_widths = BIT(NL80211_CHAN_WIDTH_20_NOHT) |
> +					BIT(NL80211_CHAN_WIDTH_20) |
> +					BIT(NL80211_CHAN_WIDTH_40) |
> +					BIT(NL80211_CHAN_WIDTH_80),
> +#endif
> +	},
> +};
> +
>  static void ath10k_get_arvif_iter(void *data, u8 *mac,
>  				  struct ieee80211_vif *vif)
>  {
> @@ -8526,6 +8544,13 @@ int ath10k_mac_register(struct ath10k *ar)
>  		ar->hw->wiphy->iface_combinations = ath10k_10_4_if_comb;
>  		ar->hw->wiphy->n_iface_combinations =
>  			ARRAY_SIZE(ath10k_10_4_if_comb);
> +		if (test_bit(WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
> +			     ar->wmi.svc_map)) {
> +			ar->hw->wiphy->iface_combinations =
> +				ath10k_10_4_bcn_int_if_comb;
> +			ar->hw->wiphy->n_iface_combinations =
> +				ARRAY_SIZE(ath10k_10_4_bcn_int_if_comb);
> +		}
>  		break;
>  	case ATH10K_FW_WMI_OP_VERSION_UNSET:
>  	case ATH10K_FW_WMI_OP_VERSION_MAX:
> diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
> index 1562294..126eb17 100644
> --- a/drivers/net/wireless/ath/ath10k/wmi.h
> +++ b/drivers/net/wireless/ath/ath10k/wmi.h
> @@ -204,6 +204,7 @@ enum wmi_service {
>  	WMI_SERVICE_RESET_CHIP,
>  	WMI_SERVICE_SPOOF_MAC_SUPPORT,
>  	WMI_SERVICE_TX_DATA_ACK_RSSI,
> +	WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
>
>  	/* keep last */
>  	WMI_SERVICE_MAX,
> @@ -353,6 +354,11 @@ enum wmi_10_4_service {
>  	WMI_10_4_SERVICE_TPC_STATS_FINAL,
>  	WMI_10_4_SERVICE_CFR_CAPTURE_SUPPORT,
>  	WMI_10_4_SERVICE_TX_DATA_ACK_RSSI,
> +	WMI_10_4_SERVICE_CFR_CAPTURE_IND_MSG_TYPE_LAGACY,

That should end with "LEGACY" instead of "LAGACY" maybe?

> +	WMI_10_4_SERVICE_PER_PACKET_SW_ENCRYPT,
> +	WMI_10_4_SERVICE_PEER_TID_CONFIGS_SUPPORT,
> +	WMI_10_4_SERVICE_VDEV_BCN_RATE_CONTROL,
> +	WMI_10_4_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
>  };
>
>  static inline char *wmi_service_name(int service_id)
> @@ -467,6 +473,7 @@ static inline char *wmi_service_name(int service_id)
>  	SVCSTR(WMI_SERVICE_TPC_STATS_FINAL);
>  	SVCSTR(WMI_SERVICE_RESET_CHIP);
>  	SVCSTR(WMI_SERVICE_TX_DATA_ACK_RSSI);
> +	SVCSTR(WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT);
>  	default:
>  		return NULL;
>  	}
> @@ -777,6 +784,8 @@ static inline void wmi_10_4_svc_map(const __le32 *in, unsigned long *out,
>  	       WMI_SERVICE_TPC_STATS_FINAL, len);
>  	SVCMAP(WMI_10_4_SERVICE_TX_DATA_ACK_RSSI,
>  	       WMI_SERVICE_TX_DATA_ACK_RSSI, len);
> +	SVCMAP(WMI_10_4_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
> +	       WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT, len);
>  }
>
>  #undef SVCMAP

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
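To make the effect of beacon_int_min_gcd = 100 concrete, the check can be sketched in userspace. This follows my reading of the cfg80211 documentation for the field (0 means all beacon intervals must match, a positive value means the GCD of all beacon intervals must be at least that value); beacon_ints_allowed() and gcd_u32() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
	while (b) {
		uint32_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Hypothetical check: returns 1 if the beacon intervals of all
 * beaconing interfaces are acceptable for the given min_gcd. */
static int beacon_ints_allowed(const uint32_t *bi, size_t n, uint32_t min_gcd)
{
	uint32_t g = 0;

	for (size_t i = 0; i < n; i++) {
		/* min_gcd == 0: every interval must equal the first one */
		if (min_gcd == 0 && bi[i] != bi[0])
			return 0;
		g = gcd_u32(g, bi[i]);	/* gcd(0, x) == x seeds the fold */
	}
	return min_gcd == 0 || g >= min_gcd;
}
```

Under this reading, intervals of 100, 200 and 300 pass with a minimum of 100, while 100 and 150 (GCD 50) pass only with the default of 1.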
Re: Multiple simultaneous channels on QCA9882
On 09/10/2018 07:37 AM, Fraser Cadger wrote:
> Hi,
>
> I am using a device based on the Atheros QCA9882 chipset. I would like to make use of the simultaneous dual-band functionality that the device provides. The goal is to have two interfaces, with one acting as an Access Point (wlan0) and the other acting as a client of another AP (wlan1). Ideally, both of these interfaces should be operating on different channels.

These radios can support both bands, but at any given time, they can be on only a single channel. You need multiple radios to run on multiple channels.

Thanks,
Ben

> As I have understood it, Atheros/Qualcomm market the QCA9882 as being capable of both dual-band and concurrent operation: https://www.qualcomm.com/news/releases/2012/02/23/qualcomm-atheros-launches-80211ac-product-ecosystem-provide-end-end-gigabit
>
> I am using v4.1 of the Linux kernel. Having experienced some difficulties using hostapd and wpa_supplicant, I checked the device capabilities using iw. If I am interpreting the output below correctly, it is only possible to use 1 channel on both interfaces:
>
> valid interface combinations:
>   * #{ managed, P2P-client } <= 36, #{ P2P-GO } <= 3, #{ AP } <= 7, #{ IBSS } <= 1,
>     total <= 36, #channels <= 1, STA/AP BI must match
>
> As I understand it, #channels <= 1 indicates that only 1 channel may be used at a particular time by all interfaces. I am wondering if this is a limitation of the driver, or if there has been a misunderstanding in the capabilities of the hardware.
>
> Does anyone have experience with this?
>
> Regards,
>
> Fraser

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH] ath10k: Add support for configuring management packet rate
On 09/09/2018 10:39 PM, Sriram R wrote:
> By default the firmware uses 1Mbps and 6Mbps rate for management packets
> in 2G and 5G bands respectively. But when the user selects different
> basic rates from the userspace, we need to send the management
> packets at the lowest basic rate selected by the user.
>
> This change makes use of WMI_VDEV_PARAM_MGMT_RATE param for configuring the
> management packets rate to the firmware.

At least some users like to be able to set the mgt rate to higher rates, and have been using a debugfs api in my driver patches to do this for some time. Maybe you would like to add support for something like this as well?

Thanks,
Ben

> Chipsets Tested : QCA988X, QCA9887, QCA9984
> FW Tested : 10.2.4-1.0-41, 10.4-3.6.104
>
> Signed-off-by: Sriram R
> ---
>  drivers/net/wireless/ath/ath10k/mac.c | 45 +--
>  1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
> index 496772d..0b2ca9e 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -157,6 +157,22 @@ u8 ath10k_mac_bitrate_to_idx(const struct ieee80211_supported_band *sband,
>  	return 0;
>  }
>
> +static int ath10k_mac_get_rate_hw_value(int bitrate)
> +{
> +	int i;
> +	u8 hw_value_prefix = 0;
> +
> +	if (ath10k_mac_bitrate_is_cck(bitrate))
> +		hw_value_prefix = WMI_RATE_PREAMBLE_CCK << 6;
> +
> +	for (i = 0; i < sizeof(ath10k_rates); i++) {
> +		if (ath10k_rates[i].bitrate == bitrate)
> +			return hw_value_prefix | ath10k_rates[i].hw_value;
> +	}
> +
> +	return -EINVAL;
> +}
> +
>  static int ath10k_mac_get_max_vht_mcs_map(u16 mcs_map, int nss)
>  {
>  	switch ((mcs_map >> (2 * nss)) & 0x3) {
> @@ -5452,9 +5468,10 @@ static void ath10k_bss_info_changed(struct ieee80211_hw *hw,
>  	struct cfg80211_chan_def def;
>  	u32 vdev_param, pdev_param, slottime, preamble;
>  	u16 bitrate, hw_value;
> -	u8 rate;
> -	int rateidx, ret = 0;
> +	u8 rate, basic_rate_idx;
> +	int rateidx, ret = 0, hw_rate_code;
>  	enum nl80211_band band;
> +	const struct ieee80211_supported_band *sband;
>
>  	mutex_lock(&ar->conf_mutex);
>
> @@ -5660,6 +5677,30 @@ static void ath10k_bss_info_changed(struct ieee80211_hw *hw,
>  				    arvif->vdev_id, ret);
>  	}
>
> +	if (changed & BSS_CHANGED_BASIC_RATES) {
> +		if (WARN_ON(ath10k_mac_vif_chan(vif, &def))) {
> +			mutex_unlock(&ar->conf_mutex);
> +			return;
> +		}
> +
> +		sband = ar->hw->wiphy->bands[def.chan->band];
> +		basic_rate_idx = ffs(vif->bss_conf.basic_rates) - 1;
> +		bitrate = sband->bitrates[basic_rate_idx].bitrate;
> +
> +		hw_rate_code = ath10k_mac_get_rate_hw_value(bitrate);
> +		if (hw_rate_code < 0) {
> +			ath10k_warn(ar, "bitrate not supported %d\n", bitrate);
> +			mutex_unlock(&ar->conf_mutex);
> +			return;
> +		}
> +
> +		vdev_param = ar->wmi.vdev_param->mgmt_rate;
> +		ret = ath10k_wmi_vdev_set_param(ar, arvif->vdev_id, vdev_param,
> +						hw_rate_code);
> +		if (ret)
> +			ath10k_warn(ar, "failed to set mgmt tx rate %d\n", ret);
> +	}
> +
>  	mutex_unlock(&ar->conf_mutex);
>  }
>

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
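The "lowest basic rate" selection in the quoted patch is a find-first-set on the basic_rates bitmap followed by a table lookup. A userspace sketch of the same idea follows; the table contents and the helper names lowest_set_bit(), rate_hw_value() and lowest_basic_rate_hw_value() are made up for illustration and are not the ath10k values. One detail of the quoted loop that looks worth checking: it bounds the index with sizeof(ath10k_rates), which for an array of structs is a byte count rather than an element count, so the sketch uses an ARRAY_SIZE-style bound. It also rejects an empty bitmap, which ffs() reports as 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Made-up rate table: bitrate in units of 100 kbps, arbitrary hw codes. */
struct rate_entry {
	uint16_t bitrate;
	uint16_t hw_value;
};

static const struct rate_entry rates[] = {
	{ 10, 0 }, { 20, 1 }, { 55, 2 }, { 110, 3 }, { 60, 4 }, { 120, 5 },
};

/* Same result as ffs(mask) - 1: index of the lowest set bit, -1 if none. */
static int lowest_set_bit(uint32_t mask)
{
	int idx = 0;

	if (!mask)
		return -1;
	while (!(mask & 1u)) {
		mask >>= 1;
		idx++;
	}
	return idx;
}

/* Look up the hw code for a bitrate, bounded by the element count. */
static int rate_hw_value(int bitrate)
{
	for (size_t i = 0; i < ARRAY_SIZE(rates); i++) {
		if (rates[i].bitrate == bitrate)
			return rates[i].hw_value;
	}
	return -EINVAL;
}

/* Bit i of basic_rates refers to rates[i]; pick the lowest one set. */
static int lowest_basic_rate_hw_value(uint32_t basic_rates)
{
	int idx = lowest_set_bit(basic_rates);

	if (idx < 0 || (size_t)idx >= ARRAY_SIZE(rates))
		return -EINVAL;
	return rate_hw_value(rates[idx].bitrate);
}
```

The result is kept in an int throughout so that a negative error code stays distinguishable from a valid hw code.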
Re: [PATCH v2 0/2] Change sk_pacing_shift in ieee80211_hw for best tx throughput
On 08/20/2018 05:46 AM, Toke Høiland-Jørgensen wrote: Arend van Spriel writes: + Eric On 8/10/2018 9:52 PM, Ben Greear wrote: On 08/10/2018 12:28 PM, Arend van Spriel wrote: On 8/10/2018 3:20 PM, Toke Høiland-Jørgensen wrote: Arend van Spriel writes: On 8/8/2018 9:00 PM, Peter Oh wrote: On 08/08/2018 03:40 AM, Wen Gong wrote: Add a field for ath10k to adjust the sk_pacing_shift, mac80211 set the default value to 8, and ath10k will change it to 6. Then mac80211 will use the changed value 6 as sk_pacing_shift since 6 is the best value for tx throughput by test result. I don't think you can convince people with the numbers unless you provide latency along with the numbers and also measurement result on different chipsets as Michal addressed (QCA4019, QCA9984, etc.) From the user's point of view, I also agree with Toke that we cannot sacrifice latency for the small throughput improvement. Yeah. The wireless industry (admittedly that is me too :-p ) has been focused on just throughput long enough. Tell me about it ;) All the preaching about bufferbloat from Dave and others is (just) starting to sink in here and there. Yeah, I've noticed; this is good! Now as for the value of the sk_pacing_shift I think we agree it depends on the specific device so in that sense the api makes sense, but I think there are a lot of variables so I was wondering if we could introduce a sysctl parameter for it. Does that make sense? I'm not sure a sysctl parameter would make sense; for one thing, it would be global for the host, while different network interfaces will probably need different values. And for another, I don't think it's something a user can reasonably be expected to set correctly, and I think it *is* actually possible to pick a value that works well at the driver level. I'm not sure either.
Do you think a user could come up with something like this (found here [1]): sysctl -w net.core.rmem_max=8388608 sysctl -w net.core.wmem_max=8388608 sysctl -w net.core.rmem_default=65536 sysctl -w net.core.wmem_default=65536 sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608' sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608' sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608' sysctl -w net.ipv4.route.flush=1 Now the page listing this config claims this is for use "on Linux 2.4+ for high-bandwidth applications". Beats me if it still is correct in 4.17. Anyway, sysctl is nice for parameterizing code that is built-in the kernel so you don't need to rebuild it. mac80211 tends to be a module in most distros so maybe sysctl is not a good fit. So let's agree on that. Picking a value at driver level may be possible, but a driver tends to support a number of different devices. So how do you see the picking working? Some static table with entries for the different devices? Some users are not going to care about latency, and for others, latency may be absolutely important and they don't care about bandwidth. So, it should be tunable. sysctl can support per network-device settings, right? Or, probably could use ethtool API to set a per-netdev value as well. That might be nice for other network devices as well, not just wifi. I was under the impression that the parameters are all global, but your statement made me look. I came across some references here [2] so I checked the kernel sources under net/ and found net/ipv4/devinet.c [3]. So that confirms it supports per-netdev settings. Yeah, I think that *if* this is to be made configurable, a per-netdev sysctl would be the way to go, with the driver being able to set the default. However, the reason I think it may not be worth it to expose this as a setting is that it is very much a case of diminishing returns.
Once the buffer size is large enough that full aggregates can be built, increasing it further just adds latency with very little effect on throughput. Which means that fiddling with the parameter is not going to have a lot of effect, so it is not very useful to expose, which makes it not worth the added configuration complexity... If it were easy, it would already be correct. I think adding a tuning knob and some documentation will allow users to more easily try different things and use what is best for them (and let the community at large know what works so maybe the defaults can be tweaked over time). Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
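Some arithmetic behind the shift values discussed in this thread, as a rough sketch. As I understand TCP small queues, the amount of data a socket may have queued below the TCP layer is limited to roughly sk_pacing_rate >> sk_pacing_shift, so the shift is in effect a time budget of 1/2^shift seconds at the current pacing rate. The function names below are hypothetical, and the kernel's additional floor of a couple of packets is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate byte budget for a socket pacing at rate_bytes_per_sec. */
static uint64_t tsq_budget_bytes(uint64_t rate_bytes_per_sec, unsigned int shift)
{
	assert(shift < 64);
	return rate_bytes_per_sec >> shift;
}

/* The same budget expressed as queueing time, in microseconds. */
static uint32_t tsq_budget_us(unsigned int shift)
{
	assert(shift < 32);
	return UINT32_C(1000000) >> shift;
}
```

Going from shift 8 to shift 6 therefore raises the budget from about 3.9 ms to about 15.6 ms of queued data, which is the latency cost being weighed against the throughput gain. At 100 Mbit/s (12,500,000 bytes/s) that is roughly 48 KB versus 195 KB.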
Re: [PATCH v2 0/2] Change sk_pacing_shift in ieee80211_hw for best tx throughput
On 08/10/2018 12:28 PM, Arend van Spriel wrote: On 8/10/2018 3:20 PM, Toke Høiland-Jørgensen wrote: Arend van Spriel writes: On 8/8/2018 9:00 PM, Peter Oh wrote: On 08/08/2018 03:40 AM, Wen Gong wrote: Add a field for ath10k to adjust the sk_pacing_shift, mac80211 set the default value to 8, and ath10k will change it to 6. Then mac80211 will use the changed value 6 as sk_pacing_shift since 6 is the best value for tx throughput by test result. I don't think you can convince people with the numbers unless you provide latency along with the numbers and also measurement result on different chipsets as Michal addressed (QCA4019, QCA9984, etc.) From the user's point of view, I also agree with Toke that we cannot sacrifice latency for the small throughput improvement. Yeah. The wireless industry (admittedly that is me too :-p ) has been focused on just throughput long enough. Tell me about it ;) All the preaching about bufferbloat from Dave and others is (just) starting to sink in here and there. Yeah, I've noticed; this is good! Now as for the value of the sk_pacing_shift I think we agree it depends on the specific device so in that sense the api makes sense, but I think there are a lot of variables so I was wondering if we could introduce a sysctl parameter for it. Does that make sense? I'm not sure a sysctl parameter would make sense; for one thing, it would be global for the host, while different network interfaces will probably need different values. And for another, I don't think it's something a user can reasonably be expected to set correctly, and I think it *is* actually possible to pick a value that works well at the driver level. I'm not sure either.
Do you think a user could come up with something like this (found here [1]): sysctl -w net.core.rmem_max=8388608 sysctl -w net.core.wmem_max=8388608 sysctl -w net.core.rmem_default=65536 sysctl -w net.core.wmem_default=65536 sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608' sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608' sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608' sysctl -w net.ipv4.route.flush=1 Now the page listing this config claims this is for use "on Linux 2.4+ for high-bandwidth applications". Beats me if it still is correct in 4.17. Anyway, sysctl is nice for parameterizing code that is built-in the kernel so you don't need to rebuild it. mac80211 tends to be a module in most distros so maybe sysctl is not a good fit. So let's agree on that. Picking a value at driver level may be possible, but a driver tends to support a number of different devices. So how do you see the picking working? Some static table with entries for the different devices? Some users are not going to care about latency, and for others, latency may be absolutely important and they don't care about bandwidth. So, it should be tunable. sysctl can support per network-device settings, right? Or, probably could use ethtool API to set a per-netdev value as well. That might be nice for other network devices as well, not just wifi. If the driver is configuring the defaults, it can know the hardware type, firmware revision, and lots of other info to make the best decision it can when registering the radio with the upper stacks. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com