from:"Ben Greear"

Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2021-03-22 Thread Ben Greear


On 3/22/21 6:20 PM, Brian Norris wrote:

On Mon, Mar 22, 2021 at 4:58 PM Ben Greear  wrote:

On 7/22/20 6:00 AM, Felix Fietkau wrote:

On 2020-07-22 14:55, Johannes Berg wrote:

On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:


I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
instead of handing them to the network stack directly.


Would this be *after* all the mac80211 processing, i.e. in place of the
rx-up-to-stack?

Yes, it would run all the rx handlers normally and then put the
resulting skbs into a list instead of calling netif_receive_skb or
napi_gro_frags.
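Here is a minimal userspace sketch of the batching idea Felix describes. It assumes a toy packet type rather than a real sk_buff. Each frame runs through the (placeholder) rx handlers as usual, but it is appended to a caller-owned list instead of being delivered immediately. The caller later hands the whole batch up in one pass. The names pkt, rx_to_list() and deliver_batch() are hypothetical and are not mac80211 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an skb: only what this sketch needs. */
struct pkt {
	int id;
	struct pkt *next;
};

/* Simple FIFO of processed packets (not the kernel's sk_buff_head). */
struct pkt_list {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

static void pkt_list_init(struct pkt_list *l)
{
	l->head = NULL;
	l->tail = NULL;
	l->len = 0;
}

static void pkt_list_append(struct pkt_list *l, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	if (l->tail)
		l->tail->next = p;
	else
		l->head = p;
	l->tail = p;
	l->len++;
}

/* Placeholder for "all the rx handlers": accepts every frame here. */
static int run_rx_handlers(struct pkt *p)
{
	(void)p;
	return 1;
}

/* Process one frame, then queue it on @out instead of delivering it. */
static void rx_to_list(struct pkt *p, struct pkt_list *out)
{
	if (run_rx_handlers(p))
		pkt_list_append(out, p);
}

/* Hand the whole batch to @deliver in order; returns the count. */
static size_t deliver_batch(struct pkt_list *l, void (*deliver)(struct pkt *))
{
	size_t n = 0;
	struct pkt *p = l->head;

	while (p != NULL) {
		struct pkt *next = p->next; /* deliver() may free or reuse p */
		deliver(p);
		p = next;
		n++;
	}
	pkt_list_init(l);
	return n;
}

static void print_delivery(struct pkt *p)
{
	printf("deliver pkt %d\n", p->id);
}
```

Per-frame processing is unchanged. Only the destination of the processed frames moves, from an immediate hand-off to a list the caller drains later.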


Whatever came of this?  I realized I'm running Felix's patch since his mt76
driver needs it.  Any chance it will go upstream?


If you're asking about $subject (moving NAPI/RX to a thread), this
landed upstream recently:
http://git.kernel.org/linus/adbb4fb028452b1b0488a1a7b66ab856cdf20715

It needs a bit of coaxing to work on a WiFi driver (including: WiFi
drivers tend to have a different netdev for NAPI than they expose to
/sys/class/net/), but it's there.
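For the upstream threaded-NAPI support, the generic knob is a per-netdev sysfs attribute, /sys/class/net/<iface>/threaded (this assumes a kernel that includes that series). As Brian notes, WiFi drivers often register NAPI on an internal netdev that is not listed under /sys/class/net/. Writing this file for wlan0 may therefore have no effect on the driver's NAPI context. The in-kernel route is for the driver to enable it on its own NAPI netdev (dev_set_threaded() in the implementation from that era). The sketch below only covers the userspace side, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<ifname>/threaded" into buf.
 * Returns 0 on success, -1 on a bad name or a too-small buffer. */
static int napi_threaded_path(char *buf, size_t len, const char *ifname)
{
	assert(buf != NULL && ifname != NULL);
	if (ifname[0] == '\0' || strchr(ifname, '/') != NULL)
		return -1;
	int n = snprintf(buf, len, "/sys/class/net/%s/threaded", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (enable) or "0" (disable) to the given attribute file.
 * Returns 0 on success, -1 on any I/O error. */
static int write_threaded(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;
	int ok = fputs(enable ? "1\n" : "0\n", f) != EOF;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

On a real system this needs root: build the path for the interface with napi_threaded_path(), then call write_threaded(path, 1).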

I'm not sure if people had something else in mind in the stuff you're
quoting though.


No, I got it confused with something Felix did:

https://github.com/greearb/mt76/blob/master/patches/0001-net-add-support-for-threaded-NAPI-polling.patch

Maybe the NAPI/RX to a thread thing superseded Felix's patch?

Thanks,
Ben



Brian




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

2021-03-22 Thread Ben Greear


On 7/22/20 6:00 AM, Felix Fietkau wrote:

On 2020-07-22 14:55, Johannes Berg wrote:

On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:


I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
instead of handing them to the network stack directly.


Would this be *after* all the mac80211 processing, i.e. in place of the
rx-up-to-stack?

Yes, it would run all the rx handlers normally and then put the
resulting skbs into a list instead of calling netif_receive_skb or
napi_gro_frags.


Whatever came of this?  I realized I'm running Felix's patch since his mt76
driver needs it.  Any chance it will go upstream?

Thanks,
Ben



- Felix





Re: EAP AP/VLAN: multicast not send to client

2021-02-08 Thread Ben Greear


On 2/8/21 12:32 PM, Sven Eckelmann wrote:

On Sunday, 7 February 2021 18:42:42 CET Ben Greear wrote:

Somewhere along the way I fixed up raw transmit in my firmware, so possibly
only then will vlans really have a chance of working.


The first step was to disable the check which enables AP_VLAN conditional and
just enable it all the time.


I appreciate the work you put into this.  Looks like it is at least not a regression
in code that I added, but I guess I'll need to fix whatever bug/feature upstream
added to get it working.

I think I'll have a way to set up a testbed for this sometime soon, as part
of work on another project, so I'll try to debug it then.

Thanks,
Ben



I've started testing with firmware-5-full-community-commit-0317-cf4991294.bin
but it doesn't provide the raw support + per-packet swcrypto. So I've tried to
switch to firmware-5-full-community-commit-1187-774502ee5.bin, but it behaves
exactly the same in raw mode - but at least it advertises
WMI_SERVICE_PER_PACKET_SW_ENCRYPT.

So my first target was to figure out what was the first firmware with
WMI_SERVICE_PER_PACKET_SW_ENCRYPT. So you would guess that bisect would be
suitable for this - but no, the first step directly found a crashing version.
I should not complain so much -- just have to skip more and have no extra test
results regarding the mcast support for them. Here is the log until I found the
first one which is supposed to support WMI_SERVICE_PER_PACKET_SW_ENCRYPT:

# has_sw_encrypt:  firmware-5-full-community-commit-1187-774502ee5.bin
# no_sw_encrypt:   firmware-5-full-community-commit-0317-cf4991294.bin
# skip:            firmware-5-full-community-commit-0775-bb7462f22.bin
# skip:            firmware-5-full-community-commit-0782-c66b3495b.bin
# no_sw_encrypt:   firmware-5-full-community-commit-0533-4597878a6.bin
# no_sw_encrypt:   firmware-5-full-community-commit-0885-2d9cfe00b.bin
# no_sw_encrypt:   firmware-5-full-community-commit-1045-817be3ee8.bin
# has_sw_encrypt:  firmware-5-full-community-commit-1112-68b46f73e.bin
# no_sw_encrypt:   firmware-5-full-community-commit-1077-44c74a25a.bin
# has_sw_encrypt:  firmware-5-full-community-commit-1093-3c7065550.bin
# no_sw_encrypt:   firmware-5-full-community-commit-1085-c1d37213a.bin
# no_sw_encrypt:   firmware-5-full-community-commit-1089-1fbfebf26.bin
# has_sw_encrypt:  firmware-5-full-community-commit-1091-3aa26dbdd.bin
# no_sw_encrypt:   firmware-5-full-community-commit-1090-7cfbf3e6a.bin
# first has_sw_encrypt commit: firmware-5-full-community-commit-1091-3aa26dbdd.bin

None of the firmware versions seem to have working multicast tx.
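The search above is a bisection in which some builds cannot be tested. Below is a minimal sketch of that procedure. It assumes the candidate builds are sorted by commit number, that the oldest is known not to advertise the service, and that the newest is known to advertise it. When the midpoint is untestable, the sketch probes the nearest testable build inside the current window. If every build in the window is skipped, it reports ambiguity, which is roughly how git bisect treats skipped commits. logged_probe() simply replays the outcomes recorded in the log above. In practice each probe means booting that firmware and checking the advertised WMI services.

```c
#include <assert.h>

enum probe { NO_FEATURE, HAS_FEATURE, SKIP };

typedef enum probe (*probe_fn)(int commit);

/*
 * commits[] is sorted ascending; commits[0] is known NO_FEATURE and
 * commits[n - 1] is known HAS_FEATURE. Returns the index of the first
 * HAS_FEATURE commit, or -1 if skipped builds make the answer ambiguous.
 */
static int first_with_feature(const int *commits, int n, probe_fn probe)
{
	int lo = 0, hi = n - 1;

	assert(n >= 2);
	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;
		int found = -1;
		enum probe r = SKIP;

		/* Walk outward from mid to a testable build inside (lo, hi). */
		for (int d = 0; found < 0; d++) {
			int up = mid + d, down = mid - d, any = 0;

			if (up < hi) {
				any = 1;
				r = probe(commits[up]);
				if (r != SKIP) {
					found = up;
					break;
				}
			}
			if (d > 0 && down > lo) {
				any = 1;
				r = probe(commits[down]);
				if (r != SKIP) {
					found = down;
					break;
				}
			}
			if (!any)
				return -1; /* every build between lo and hi was skipped */
		}
		if (r == HAS_FEATURE)
			hi = found;
		else
			lo = found;
	}
	return hi;
}

/* Replays the log: 0775 and 0782 crashed; 1091 is the first build with the service. */
static enum probe logged_probe(int commit)
{
	if (commit == 775 || commit == 782)
		return SKIP;
	return commit >= 1091 ? HAS_FEATURE : NO_FEATURE;
}
```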


And here are some (not so random) picked ones (just so nobody can say that I
didn't check in the other direction):

# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0425-a422b044f.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0371-157623ac0.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0344-8b9e4442a.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0331-5259fada9.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0324-e6723f0f6.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0321-814d9dc06.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0319-ef95e743e.bin
# no_per_packet_sw_encrypt_no_mcast: firmware-5-full-community-commit-0318-51cd44bdd.bin

I didn't do a complete sweep of the builds but at the moment it looks a
little bit like there might not be a single one which supports multicast over
this setup. If you think there is a specific firmware version which I should
test then just say which version.

So I've decided to try the ath10k firmware blobs from Kalle's repository to
provide at least something useful for someone who also has this problem
and searches for a compatible version:

firmware blob                         | works | PER_PACKET_SW_ENCRYPT
--------------------------------------+-------+----------------------
3.2/firmware-5.bin_10.4-3.2-00080     | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-4     | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-5     | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-7     | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00015 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00018 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00023 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00024 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00026 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00028 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00029 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00031 | N     | N
3.2.1/firmware-5.bin_10.4-3.2.1-00033 | N

Re: EAP AP/VLAN: multicast not send to client

2021-02-07 Thread Ben Greear


On 2/7/21 9:13 AM, Sven Eckelmann wrote:

On Sunday, 7 February 2021 17:50:11 CET Ben Greear wrote:

Here are the images:

http://www.candelatech.com/downloads/ath10k-4019-10-4b/bisect/


Thanks, will try to have a look at them tomorrow evening. Can you confirm which
QCA ath10k version was used as the base for this one? I've read somewhere on
your page 10.4.3.3-25 - which doesn't seem to be in Kalle's repository. And my
original plan was to test the relevant QCA firmware first and check if the
problem might already be in the base version which you've used for your
builds. But maybe I will just start with the oldest one in your tree, check
if the problem is also there, and based on the result decide how to continue.


I don't know exactly how QCA versioning works, but my notes are that the initial
upstream wave-2 code was 3.5.3-00050, from back in 2018.

The first commit in the series should be very similar to stock FW, though perhaps
missing some feature flags.  Maybe try forcing the driver to allow
VLANs...if only the feature flag is missing, that might work around it.

Somewhere along the way I fixed up raw transmit in my firmware, so possibly
only then will vlans really have a chance of working.

Thanks,
Ben


Re: EAP AP/VLAN: multicast not send to client

2021-02-07 Thread Ben Greear


On 2/2/21 5:57 AM, Sven Eckelmann wrote:

On Tuesday, 2 February 2021 14:27:01 CET Ben Greear wrote:

Sven, I can build you a series of firmware if you have interest in bisecting to see if
this is a regression?


If it is ok for you then I can go through various firmware builds. But it
could be that I can only start with the bisect at the end of the week. At
least today, I will have no time after work.

Kind regards,
Sven



Here are the images:

http://www.candelatech.com/downloads/ath10k-4019-10-4b/bisect/

Thanks,
Ben


Re: bss channel survey timed out and then hardware became unavailable on IPQ40xx.

2021-02-02 Thread Ben Greear

kernel: [ 245.289214] [] (cfg80211_netdev_notifier_call [cfg80211]) from [] (notifier_call_chain+0x2c/0x6c)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.300349] [] (notifier_call_chain) from [] (raw_notifier_call_chain+0x18/0x20)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.310938] [] (raw_notifier_call_chain) from [] (__dev_close_many+0x44/0xc8)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.320140] [] (__dev_close_many) from [] (dev_close_many+0x60/0xdc)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.328902] [] (dev_close_many) from [] (dev_close+0x34/0x48)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.337136] [] (dev_close) from [] (cfg80211_shutdown_all_interfaces+0x58/0xa8 [cfg80211])
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.344631] [] (cfg80211_shutdown_all_interfaces [cfg80211]) from [] (ieee80211_reconfig+0x790/0xb64 [mac80211])
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.354494] [] (ieee80211_reconfig [mac80211]) from [] (ieee80211_restart_work+0xa8/0xb4 [mac80211])
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.366483] [] (ieee80211_restart_work [mac80211]) from [] (process_one_work+0x280/0x410)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.377258] [] (process_one_work) from [] (worker_thread+0x330/0x560)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.387062] [] (worker_thread) from [] (kthread+0x134/0x13c)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.395221] [] (kthread) from [] (ret_from_fork+0x14/0x2c)
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.402846] ---[ end trace fb0fc35d89e4ff3e ]---
Tue Jan 26 18:09:34 2021 kern.info kernel: [ 245.409750] group-ap2: HW problem - can not stop rx aggregation for 06:ef:c0:01:33:6b tid 0
Tue Jan 26 18:09:34 2021 kern.warn kernel: [ 245.414504] [ cut here ]


Re: EAP AP/VLAN: multicast not send to client

2021-02-02 Thread Ben Greear


On 2/2/21 2:12 AM, Sven Eckelmann wrote:

On Tuesday, 2 February 2021 10:12:45 CET Sebastian Gottschall wrote:

mmh. I have an idea

try the following (this is a patch in my tree) and also check the WMI
services for this service flag, which might be a difference between these
firmwares

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -9003,10 +9003,10 @@ int ath10k_mac_register(struct ath10k *ar)

[...]

Thanks for the idea. But this has no effect on the problem. I have also
attached the services and feature information (from ath10k-ct's perspective, to
hopefully get a more complete look at the differences). And it seems both
have WMI_SERVICE_PER_PACKET_SW_ENCRYPT, and Ben's firmware also has
ATH10K_FW_FEATURE_CONSUME_BLOCK_ACK_CT (which would also have "enabled" this
code section).

The biggest difference (which would also affect the non-ct ath10k) would be in
wmi_services. Ben Greear's firmware doesn't support:

* WMI_SERVICE_PEER_CACHING
* WMI_SERVICE_HTT_MGMT_TX_COMP_VALID_FLAGS
* WMI_SERVICE_HOST_DFS_CHECK_SUPPORT
* WMI_SERVICE_TPC_STATS_FINAL
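One way to get a list like the one above is to diff the two firmwares' service bitmaps bit by bit. Here is a minimal sketch that assumes each firmware's services have already been decoded into a plain bitmap. The 128-bit size and the bit numbers used in the checks are placeholders, not the real WMI service IDs, which differ between firmware branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SVC_BITS 128u /* placeholder bitmap size */

struct svc_map {
	uint32_t w[SVC_BITS / 32];
};

static void svc_set(struct svc_map *m, unsigned bit)
{
	assert(bit < SVC_BITS);
	m->w[bit / 32] |= UINT32_C(1) << (bit % 32);
}

static int svc_test(const struct svc_map *m, unsigned bit)
{
	assert(bit < SVC_BITS);
	return (int)((m->w[bit / 32] >> (bit % 32)) & 1u);
}

/* Collect up to @max bit indices that are set in @a but clear in @b.
 * Returns how many were stored in @out. */
static size_t svc_missing(const struct svc_map *a, const struct svc_map *b,
			  unsigned *out, size_t max)
{
	size_t n = 0;

	for (unsigned bit = 0; bit < SVC_BITS && n < max; bit++)
		if (svc_test(a, bit) && !svc_test(b, bit))
			out[n++] = bit;
	return n;
}
```

To get the full picture, run it in both directions: services only in firmware A, then services only in firmware B.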

Sven, I can build you a series of firmware if you have interest in bisecting to see if
this is a regression?

I'll get started on the builds, looks like last time I did a full build of 4019
commits was a while back...

Thanks,
Ben



Kind regards,
Sven





Re: RX rate is wrong in 5.10? (bisected to: mac80211: receive and process S1G beacons)

2021-01-04 Thread Ben Greear

On 1/4/21 4:25 PM, Thomas Pedersen wrote:

Hi Ben,

On 2021-01-04 16:18, Ben Greear wrote:

On 1/4/21 8:18 AM, Ben Greear wrote:

Hello,

I noticed that RX rate is always 6Mbps when I use -ct firmware and -ct driver in
5.10, and on stock 5.10.0 driver and stock firmware, rx-rate does not show up at
all in 'iw dev wlan1 station dump'.

I'm using 9984 NIC...

Anyone else see this?

After a bisect, the first bad commit shows this:

commit 09a740ce352e1a1d16b9984115514ba9a4f4704b (refs/bisect/bad)
Author: Thomas Pedersen
Date: Mon Sep 21 19:28:14 2020 -0700

mac80211: receive and process S1G beacons

S1G beacons are 802.11 Extension Frames, so the fixed
header part differs from regular beacons.

Add a handler to process S1G beacons and abstract out the
fetching of BSSID and element start locations in the
beacon body handler.

Signed-off-by: Thomas Pedersen
Link: https://lore.kernel.org/r/20200922022818.15855-14-tho...@adapt-ip.com
[don't rename, small coding style cleanups]
Signed-off-by: Johannes Berg

From a glance through the diff, I'm at a loss as to why it causes the
symptom. I manually double-checked the bisect, and it appears correct.

What I see is that in the commit before this, I see a useful rx rate
(1.3Gbps for instance) in 'iw dev wlan1 station dump', but in this bad
commit, both show a 6Mbps rate. (Tx rate in ath10k is probably broken for
other reasons, so I only bisected the rx side issue.)

I'm using ath10k 9984 radio with firmware 10.4-3.9.0.2-00070 in station mode.

AP is an ath11k Hawkeye...

I'm using a 1Mbps UDP packet 'download' stream to make sure I'm seeing
rates for data frames and not just management frames.

Sorry about that.

Any idea what might be the issue?

It may be fixed by
https://patchwork.kernel.org/project/linux-wireless/patch/1607483189-3891-1-git-send-email-wg...@codeaurora.org/

Yes, that fixes it. Looks like it is already in 5.10.4 stable, so I'll upgrade
to that.

Thanks for the quick hint.

Thanks,
Ben


Re: RX rate is wrong in 5.10? (bisected to: mac80211: receive and process S1G beacons)

2021-01-04 Thread Ben Greear


On 1/4/21 8:18 AM, Ben Greear wrote:

Hello,

I noticed that RX rate is always 6Mbps when I use -ct firmware and -ct driver in
5.10, and on stock 5.10.0 driver and stock firmware, rx-rate does not show up at
all in 'iw dev wlan1 station dump'.

I'm using 9984 NIC...

Anyone else see this?


After a bisect, the first bad commit shows this:

commit 09a740ce352e1a1d16b9984115514ba9a4f4704b (refs/bisect/bad)
Author: Thomas Pedersen 
Date:   Mon Sep 21 19:28:14 2020 -0700

mac80211: receive and process S1G beacons

S1G beacons are 802.11 Extension Frames, so the fixed
header part differs from regular beacons.

Add a handler to process S1G beacons and abstract out the
fetching of BSSID and element start locations in the
beacon body handler.

Signed-off-by: Thomas Pedersen 
Link: https://lore.kernel.org/r/20200922022818.15855-14-tho...@adapt-ip.com
[don't rename, small coding style cleanups]
Signed-off-by: Johannes Berg 

From a glance through the diff, I'm at a loss as to why it causes the
symptom. I manually double-checked the bisect, and it appears correct.

What I see is that in the commit before this, I see a useful rx rate
(1.3Gbps for instance) in 'iw dev wlan1 station dump', but in this bad
commit, both show a 6Mbps rate. (Tx rate in ath10k is probably broken for
other reasons, so I only bisected the rx side issue.)

I'm using ath10k 9984 radio with firmware 10.4-3.9.0.2-00070 in station mode.

AP is an ath11k Hawkeye...

I'm using a 1Mbps UDP packet 'download' stream to make sure I'm seeing
rates for data frames and not just management frames.

Any idea what might be the issue?

Thanks,
Ben




RX rate is wrong in 5.10?

2021-01-04 Thread Ben Greear


Hello,

I noticed that RX rate is always 6Mbps when I use -ct firmware and -ct driver in
5.10, and on stock 5.10.0 driver and stock firmware, rx-rate does not show up at
all in 'iw dev wlan1 station dump'.

I'm using 9984 NIC...

Anyone else see this?

Thanks,
Ben


Re: skb_cb corruption in ath10k

2020-12-24 Thread Ben Greear


On 12/21/20 3:55 PM, Ben Greear wrote:

Hello,

I'm trying to figure out what changed in the last few kernels that is making:

struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
if (info->control.flags & IEEE80211_TX_CTRL_RATE_INJECT) {
	/* why is code here all of a sudden? */
}

fire for data frames in ath10k, when, to the best of my knowledge, nothing
in the stack should be setting that flag.

My guess is that something is stepping on the cb field somewhere in ath10k,
but I am not sure where that might be at this point.

And it also appears mac80211 or maybe supplicant is setting the rate-inject flag on some mgt frames,
but I think that is a separate concern at this point.

If anyone has any ideas of likely points, please let me know.


This issue was me being confused about how the ath10k skb_cb sits in
the same memory as the ieee80211 skb_cb.  I just needed to reorder the
ath10k-skb-cb struct a bit to not clobber the control.flags area.

I also see no reason not to naturally pack that struct so that the
pointers are 8-byte aligned.  Any idea why it is force-packed
currently instead of using proper padding?
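Here is a small userspace model of the overlap Ben describes. It assumes a 48-byte control buffer (the size of skb->cb) whose first bytes belong to the stack. A hypothetical flags/control_flags pair stands in for the real struct ieee80211_tx_info layout, and a hypothetical driver struct is copied into the same buffer. Writing the driver struct at offset 0 changes control_flags, while placing it after the stack's fields does not. A static_assert catches the case where the driver data outgrows the buffer.

The last two structs illustrate the packing question. With __attribute__((packed)) (a GCC/Clang extension, which is what the kernel's __packed expands to), a pointer after a one-byte field sits at offset 1. The natural layout pads it to the pointer's alignment instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the stack's part of the shared control buffer. */
struct stack_part {
	uint32_t flags;
	uint32_t control_flags; /* must survive driver writes */
};

/* 48 bytes, like skb->cb; both users view the same bytes. */
union cb_area {
	unsigned char raw[48];
	struct stack_part stack;
};

/* Hypothetical driver-private data. */
struct drv_cb {
	uint32_t eid;
	uint16_t msdu_id;
	uint16_t flags;
};

/* Build fails if the driver data cannot fit after the stack's fields. */
static_assert(sizeof(struct stack_part) + sizeof(struct drv_cb) <= sizeof(union cb_area),
	      "driver cb does not fit after the stack's fields");

/* Copy @drv into the shared buffer at byte @offset, then report what the
 * stack's control_flags field holds (it starts out as 0). */
static uint32_t control_flags_after_write(size_t offset, const struct drv_cb *drv)
{
	union cb_area cb;

	memset(&cb, 0, sizeof cb);
	assert(offset + sizeof *drv <= sizeof cb.raw);
	memcpy(cb.raw + offset, drv, sizeof *drv);
	return cb.stack.control_flags;
}

/* Packing trade-off for a pointer that follows a one-byte field. */
struct cb_natural {
	uint8_t tid;
	void *vif;
};

struct __attribute__((packed)) cb_packed {
	uint8_t tid;
	void *vif;
};
```

Packing saves the padding bytes in a tight 48-byte budget at the cost of misaligned pointer fields. Ordering fields from largest to smallest often recovers most of that space without packing.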

Thanks,
Ben




skb_cb corruption in ath10k

2020-12-21 Thread Ben Greear


Hello,

I'm trying to figure out what changed in the last few kernels that is making:

struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
if (info->control.flags & IEEE80211_TX_CTRL_RATE_INJECT) {
	/* why is code here all of a sudden? */
}

fire for data frames in ath10k, when, to the best of my knowledge, nothing
in the stack should be setting that flag.

My guess is that something is stepping on the cb field somewhere in ath10k,
but I am not sure where that might be at this point.

And it also appears mac80211 or maybe supplicant is setting the rate-inject flag on some mgt frames,
but I think that is a separate concern at this point.

If anyone has any ideas of likely points, please let me know.

Thanks,
Ben


Re: [PATCH v2] ath10k: Per-chain rssi should sum the secondary channels

2020-12-21 Thread Ben Greear


On 12/21/20 10:30 AM, Kalle Valo wrote:

gree...@candelatech.com wrote:


From: Ben Greear 

This makes per-chain RSSI more consistent between HT20, HT40, and HT80.
Instead of doing precise log math for adding dBm values, I did a rough estimate;
it seems to work well enough.

Tested on ath10k-ct 9984 firmware.

Signed-off-by: Ben Greear 


Commented out code etc so I assume this is an RFC. Has anyone tested
this with upstream firmware?


I probably tweaked this patch after sending it.  My wave-1 didn't work with this approach,
and in the end, to get a valid RSSI, I ended up reading the
per-chain noise floor periodically and storing it so I could use the proper noise floor
instead of just -95.  I am not sure upstream firmware can support that, so it is probably
not worth adding just the sum logic unless someone can figure out how to get the noise
floor out of the firmware...
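For illustration, here is a minimal integer-only sketch of the two pieces discussed here. The first converts a per-chain reading to dBm using either a measured noise floor or the -95 dBm default. The second combines the primary and secondary 20 MHz readings of one chain. The exact sum of two powers in dBm is 10*log10(10^(a/10) + 10^(b/10)). The sketch approximates it as the larger value plus a correction looked up from the difference: 3 dB for equal values, 0 once they are 10 dB or more apart. This is one common rough estimate, not necessarily the one in Ben's patch, and it assumes the per-chain value is reported in dB relative to the noise floor.

```c
#include <assert.h>
#include <stddef.h>

/* Fallback noise floor in dBm when none has been measured. */
#define DEFAULT_NOISE_FLOOR_DBM (-95)

/* Per-chain value in dB above the noise floor -> dBm.
 * @nf_dbm may be NULL when no measured noise floor is available. */
static int chain_rssi_dbm(int rssi_db, const int *nf_dbm)
{
	return rssi_db + (nf_dbm ? *nf_dbm : DEFAULT_NOISE_FLOOR_DBM);
}

/* round(10 * log10(1 + 10^(-d / 10))) for d = 0..9 dB; 0 for d >= 10. */
static const int dbm_add_corr[10] = { 3, 3, 2, 2, 1, 1, 1, 1, 1, 1 };

/* Approximate power sum of two dBm values using integer math only. */
static int dbm_add_approx(int a, int b)
{
	int hi = a > b ? a : b;
	int d = a > b ? a - b : b - a;

	return hi + (d < 10 ? dbm_add_corr[d] : 0);
}

/* Fold the 20 MHz sub-channel readings of one chain (primary first). */
static int chain_sum_dbm(const int *per20_dbm, size_t n)
{
	assert(n > 0);
	int acc = per20_dbm[0];

	for (size_t i = 1; i < n; i++)
		acc = dbm_add_approx(acc, per20_dbm[i]);
	return acc;
}
```

For four equal -60 dBm readings (HT80), the sketch returns -54 dBm, against an exact value of about -53.98 dBm. With unequal inputs, rounding at each fold step can accumulate.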

Thanks,
Ben




Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery

2020-12-17 Thread Ben Greear


On 12/17/20 2:24 PM, Brian Norris wrote:

On Tue, Dec 15, 2020 at 10:23:33AM -0800, Ben Greear wrote:

On 12/15/20 9:21 AM, Youghandhar Chintala wrote:

From: Rakesh Pillai 

Currently in case of target hardware restart, we just reconfig and
re-enable the security keys and enable the network queues to start
data traffic back from where it was interrupted.


Are there any known mac80211 radios/drivers that *can* support seamless restarts?

If not, then we could just always enable this feature in mac80211?


I'm quite sure that iwlwifi intentionally supports a seamless restart.
From my experience with dealing with user reports, I don't recall any
issues where restart didn't function as expected, unless there was some
deeper underlying failure (e.g., hardware/power failure; driver bugs /
lockups).

I don't have very good stats for ath10k/QCA6174, but it survives
our testing OK and I again don't recall any user-reported complaints in
this area. I'd say this is a weaker example though, as I don't have as
clear of data. (By contrast, ath10k/WCN399x, which Rakesh, et al, are
patching here, does not pass our tests at all, and clearly fails to
recover from "seamless" restarts, as noted in patch 3.)

I'd also note that we don't operate in AP mode -- only STA -- and IIRC
Ben, you've complained about AP mode in the past.


I complain about all sorts of things, but I'm usually running
station mode :)

Do you actually see iwlwifi stations stay associated through
firmware crashes?

Anyway, happy to hear some have seamless recovery, and in that case,
I have no objections to the patch.

Thanks,
Ben


Re: [PATCH 0/3] mac80211: Trigger disconnect for STA during recovery

2020-12-15 Thread Ben Greear


On 12/15/20 9:21 AM, Youghandhar Chintala wrote:

From: Rakesh Pillai 

Currently in case of target hardware restart, we just reconfig and
re-enable the security keys and enable the network queues to start
data traffic back from where it was interrupted.


Are there any known mac80211 radios/drivers that *can* support seamless restarts?

If not, then we could just always enable this feature in mac80211?

Thanks,
Ben


Re: [PATCH v3] ath10k: add flag to protect napi operation to avoid dead loop hang

2020-12-09 Thread Ben Greear


On 12/9/20 1:24 AM, Kalle Valo wrote:

Wen Gong  writes:


On 2020-09-08 00:22, Kalle Valo wrote:


Just like with the recent firmware restart patch, isn't ar->napi_enabled
racy? Wouldn't test_and_set_bit() and test_and_clear_bit() be safer?

Or are we holding a lock? But then that should be documented with
lockdep_assert_held().


yes, ath10k_hif_start is only called from ath10k_core_start, which has
"lockdep_assert_held(&ar->conf_mutex)", and ath10k_hif_stop is only
called from ath10k_core_stop, which also has
"lockdep_assert_held(&ar->conf_mutex)". So two threads cannot both
enter ath10k_hif_start/ath10k_hif_stop at the same time.


Ok, but every function depending on a lock being held should still call
lockdep_assert_held(), that way we can catch the bug if locking changes
later. So it's not enough that ath10k_core_stop() has
lockdep_assert_held(), also these napi functions should have it.

I actually decided to switch using ATH10K_FLAG_NAPI_ENABLED with
set_bit() & co, simpler locking that way and no lockdep_assert_held()
needed anymore. Please check my changes in the pending branch, I have
only compile tested them:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=pending=e0a466d296bd862080f7796b41349f9f586272c9



Why do you not need locking?  You can't just check a bit is set and then do work and set
it later without locking, two concurrent CPU threads can pass the first check and both get into
the logic below it?
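Here is a userspace sketch of the difference being debated, using C11 atomics as a stand-in for the kernel's test_and_set_bit()/test_and_clear_bit(). A separate check and set leaves the window Ben describes. An atomic fetch-or returns the old bit and sets it in one step, so exactly one caller sees it clear. FLAG_NAPI_ENABLED and the napi_*_once() helpers are hypothetical.

Note that this only decides which caller does the work. A second caller that finds the bit already set returns immediately, possibly before the first caller has finished enabling. Whether that is acceptable still depends on how the callers are serialized (here, by conf_mutex, per the discussion above).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit index standing in for ATH10K_FLAG_NAPI_ENABLED. */
#define FLAG_NAPI_ENABLED 0u

static atomic_ulong dev_flags = 0;

/*
 * Racy pattern for contrast (not used):
 *     if (!(flags & bit)) { do_enable(); flags |= bit; }
 * Two threads can both see the bit clear and both run do_enable().
 */

/* Atomically set @bit and return whether it was already set. */
static bool test_and_set_flag(atomic_ulong *flags, unsigned bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Atomically clear @bit and return whether it was set. */
static bool test_and_clear_flag(atomic_ulong *flags, unsigned bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Returns true only for the caller that actually enabled NAPI. */
static bool napi_enable_once(void)
{
	if (test_and_set_flag(&dev_flags, FLAG_NAPI_ENABLED))
		return false; /* already enabled (or being enabled) elsewhere */
	/* ... enable NAPI here ... */
	return true;
}

/* Returns true only for the caller that actually disabled NAPI. */
static bool napi_disable_once(void)
{
	if (!test_and_clear_flag(&dev_flags, FLAG_NAPI_ENABLED))
		return false; /* already disabled */
	/* ... disable NAPI here ... */
	return true;
}
```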

Thanks,
Ben


Re: [PATCH,v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"

2020-09-15 Thread Ben Greear


Hello Zhi,

Do you know of any way to detect in the driver which platforms need your patch and which ones
break with it?  Otherwise, we're stuck with external config (which is what I added
so far as a work-around).

Thanks,
Ben

On 9/8/20 9:02 PM, Ben Greear wrote:

Please see this bug report, and feel free to ask the reporter for more details if you
don't find everything you need there.  Seems a basic ping test reproduces packet loss
in their case...

  https://github.com/greearb/ath10k-ct/issues/153

I don't actually have the platform in question.

Thanks,
Ben

On 9/8/20 7:04 PM, Zhi Chen wrote:

Hi Ben,
Thanks for your information. The DMA issue is host-related. We never hit this issue on x86 platforms. And it was only seen in stress cases with 50+ STAs (associating and disassociating repeatedly). What's the host platform you are using? And how was the issue reproduced?


Thanks,
Zhi

On 2020-09-09 01:48, Ben Greear wrote:

Hello,

Just FYI:  I added this patch to my ath10k-ct driver, and a user reported it causes
regressions on his particular 9888 system when using ath10k-ct wave-2 firmware:

[   21.204868] ath10k_pci :00:00.0: qca9888 hw2.0 target 0x0100 chip_id 0x sub :
[   21.214437] ath10k_pci :00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   21.233298] ath10k_pci :00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36
[   21.596684] ath10k_pci :00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02
[   23.546156] ath10k_pci :00:00.0: unsupported HTC service id: 1536

I'll revert this for the 9888 chipset (at least) in my driver,
possibly you need to do similar.

https://github.com/greearb/ath10k-ct/issues/153

Thanks,
Ben

On 1/13/20 8:35 PM, Zhi Chen wrote:

This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
PCIe hung issue was observed on multiple platforms. The issue was reproduced
when DUT was configured as AP and associated with 50+ STAs.

For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size
of the RD/WR access to the HOST MEM.
0 - No split, RAW read/write transfer size from MAC is put out on bus
 as burst length
1 - Split at 256 byte boundary
2,3 - Reserved

With a PCIe protocol analyzer, we can see a DMA read crossing a 4KB boundary when
the issue happened. That violates the PCIe spec and caused PCIe to get stuck. So revert
the default value from 0 to 1.
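Mode 1 avoids the problem through simple arithmetic. PCIe does not allow a single memory read request to cross a 4 KB boundary, and 4096 is a multiple of 256. A burst that never crosses a 256-byte boundary therefore can never cross a 4 KB boundary either. Below is a minimal sketch of that splitting rule in host-side C. It only models the address math, not the QCA9984/QCA9888 DMA engine, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIE_4K 4096u

/* True if a single burst [addr, addr + len) crosses a 4 KB boundary. */
static int crosses_4k(uint64_t addr, uint32_t len)
{
	assert(len > 0);
	return (addr / PCIE_4K) != ((addr + len - 1) / PCIE_4K);
}

/* Split [addr, addr + len) at every @boundary-aligned address (boundary
 * must be a power of two). Stores chunk lengths in @out; returns the count. */
static size_t split_at_boundary(uint64_t addr, uint32_t len, uint32_t boundary,
				uint32_t *out, size_t max)
{
	size_t n = 0;

	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	while (len > 0 && n < max) {
		/* Bytes left before the next boundary-aligned address. */
		uint32_t room = boundary - (uint32_t)(addr & (boundary - 1));
		uint32_t chunk = len < room ? len : room;

		out[n++] = chunk;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, a 512-byte read starting at 0xF80 is split into chunks of 0x80, 0x100 and 0x80 bytes, and none of them crosses 0x1000.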

Tested:  IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
  QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
  Synaptics AS370 + QCA9888  with firmware 10.4-3.9.0.2--00040

Signed-off-by: Zhi Chen 
---
v2: restored 10.2 register configuration
v3: modified commit message
v4: resolved conflicts
---
  drivers/net/wireless/ath/ath10k/hw.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 21b7a2a..775fd62 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -816,7 +816,7 @@ ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
 
 #define TARGET_10_4_TX_DBG_LOG_SIZE    1024
 #define TARGET_10_4_NUM_WDS_ENTRIES    32
-#define TARGET_10_4_DMA_BURST_SIZE    0
+#define TARGET_10_4_DMA_BURST_SIZE    1
 #define TARGET_10_4_MAC_AGGR_DELIM    0
 #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
 #define TARGET_10_4_VOW_CONFIG    0










Re: [PATCH,v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"

2020-09-08 Thread Ben Greear


Please see this bug report, and feel free to ask the reporter for more details if you
don't find everything you need there.  Seems a basic ping test reproduces packet loss
in their case...

 https://github.com/greearb/ath10k-ct/issues/153

I don't actually have the platform in question.

Thanks,
Ben

On 9/8/20 7:04 PM, Zhi Chen wrote:

Hi Ben,
Thanks for your information. The DMA issue is host-related. We never hit this issue on x86 platforms. And it was only seen in stress cases with 50+ STAs (associating and disassociating repeatedly). What's the host platform you are using? And how was the issue reproduced?


Thanks,
Zhi

On 2020-09-09 01:48, Ben Greear wrote:

Hello,

Just FYI:  I added this patch to my ath10k-ct driver, and a user reported it causes
regressions on his particular 9888 system when using ath10k-ct wave-2 firmware:

[   21.204868] ath10k_pci :00:00.0: qca9888 hw2.0 target 0x0100 chip_id 0x sub :
[   21.214437] ath10k_pci :00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   21.233298] ath10k_pci :00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36
[   21.596684] ath10k_pci :00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02
[   23.546156] ath10k_pci :00:00.0: unsupported HTC service id: 1536

I'll revert this for the 9888 chipset (at least) in my driver,
possibly you need to do similar.

https://github.com/greearb/ath10k-ct/issues/153

Thanks,
Ben

On 1/13/20 8:35 PM, Zhi Chen wrote:

This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
PCIe hung issue was observed on multiple platforms. The issue was reproduced
when DUT was configured as AP and associated with 50+ STAs.

For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size
of the RD/WR access to the HOST MEM.
0 - No split, RAW read/write transfer size from MAC is put out on bus
 as burst length
1 - Split at 256 byte boundary
2,3 - Reserved

Using a PCIe protocol analyzer, we could see DMA reads crossing a 4 KB boundary when
the issue happened. This violates the PCIe spec and caused the PCIe link to hang. So revert
the default value from 0 to 1.
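As a rough illustration of why splitting at 256-byte boundaries avoids the 4 KB crossings described above: 4096 is a multiple of 256, so every 4 KB boundary is also a 256-byte boundary, and a burst that never runs past the next 256-byte boundary can never straddle a 4 KB one. The C sketch below models only that arithmetic. The function names are hypothetical, and `split == 0` is a crude stand-in for "no split" that treats the whole transfer as a single burst; it is not a model of the actual AXI/PCIe hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPLIT_BOUNDARY   256u	/* DMA_BURST_SIZE = 1: split at 256 bytes */
#define PCIE_4K_BOUNDARY 4096u	/* requests must not cross a 4 KB boundary */

/* Length of the next burst starting at addr: stop at the end of the
 * transfer or at the next 256-byte boundary, whichever comes first. */
static size_t next_burst_len(uint64_t addr, size_t remaining)
{
	size_t to_boundary = SPLIT_BOUNDARY - (size_t)(addr % SPLIT_BOUNDARY);

	return remaining < to_boundary ? remaining : to_boundary;
}

/* Nonzero if [addr, addr + len) straddles a 4 KB boundary. */
static int crosses_4k(uint64_t addr, size_t len)
{
	assert(len > 0);
	return addr / PCIE_4K_BOUNDARY != (addr + len - 1) / PCIE_4K_BOUNDARY;
}

/* Walk a transfer burst by burst and count bursts that cross 4 KB.
 * split == 0 crudely models "no split": one burst for the whole transfer. */
static unsigned int count_4k_crossings(uint64_t addr, size_t len, int split)
{
	unsigned int crossings = 0;

	while (len > 0) {
		size_t burst = split ? next_burst_len(addr, len) : len;

		crossings += (unsigned int)crosses_4k(addr, burst);
		addr += burst;
		len -= burst;
	}
	return crossings;
}
```

For example, a 300-byte transfer starting at address 4000 straddles the 4096 boundary as one burst, but is issued as a 96-byte and a 204-byte burst when split, and neither piece crosses.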

Tested:  IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
  QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
  Synaptics AS370 + QCA9888  with firmware 10.4-3.9.0.2--00040

Signed-off-by: Zhi Chen 
---
v2: restored 10.2 register configuration
v3: modified commit message
v4: resolved conflicts
---
  drivers/net/wireless/ath/ath10k/hw.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 21b7a2a..775fd62 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -816,7 +816,7 @@ ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
 
 #define TARGET_10_4_TX_DBG_LOG_SIZE		1024
 #define TARGET_10_4_NUM_WDS_ENTRIES		32
-#define TARGET_10_4_DMA_BURST_SIZE		0
+#define TARGET_10_4_DMA_BURST_SIZE		1
 #define TARGET_10_4_MAC_AGGR_DELIM		0
 #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
 #define TARGET_10_4_VOW_CONFIG			0






--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

Re: [PATCH,v4] Revert "ath10k: fix DMA related firmware crashes on multiple devices"

2020-09-08 Thread Ben Greear


Hello,

Just FYI: I added this patch to my ath10k-ct driver, and a user reported it causes
regressions on his particular 9888 system when using ath10k-ct wave-2 firmware:

[   21.204868] ath10k_pci :00:00.0: qca9888 hw2.0 target 0x0100 chip_id 0x sub :
[   21.214437] ath10k_pci :00:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   21.233298] ath10k_pci :00:00.0: firmware ver 10.4b-ct-9888-tH-13-8c5b2baa2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 a00b5f36
[   21.596684] ath10k_pci :00:00.0: board_file api 2 bmi_id 0:20 crc32 5bb32c02
[   23.546156] ath10k_pci :00:00.0: unsupported HTC service id: 1536

I'll revert this for the 9888 chipset (at least) in my driver; possibly you need to do similar.

https://github.com/greearb/ath10k-ct/issues/153

Thanks,
Ben

On 1/13/20 8:35 PM, Zhi Chen wrote:

This reverts commit 76d164f582150fd0259ec0fcbc485470bcd8033e.
A PCIe hang was observed on multiple platforms. The issue was reproduced
when the DUT was configured as an AP with 50+ associated STAs.

For QCA9984/QCA9888, the DMA_BURST_SIZE register controls the AXI burst size
of RD/WR accesses to host memory:
0 - No split; the raw read/write transfer size from the MAC is put out on the bus
    as the burst length
1 - Split at 256-byte boundaries
2, 3 - Reserved

Using a PCIe protocol analyzer, we could see DMA reads crossing a 4 KB boundary when
the issue happened. This violates the PCIe spec and caused the PCIe link to hang. So revert
the default value from 0 to 1.

Tested:  IPQ8064 + QCA9984 with firmware 10.4-3.10-00047
  QCS404 + QCA9984 with firmware 10.4-3.9.0.2--00044
  Synaptics AS370 + QCA9888  with firmware 10.4-3.9.0.2--00040

Signed-off-by: Zhi Chen 
---
v2: restored 10.2 register configuration
v3: modified commit message
v4: resolved conflicts
---
  drivers/net/wireless/ath/ath10k/hw.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 21b7a2a..775fd62 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -816,7 +816,7 @@ ath10k_is_rssi_enable(struct ath10k_hw_params *hw,
 
 #define TARGET_10_4_TX_DBG_LOG_SIZE		1024
 #define TARGET_10_4_NUM_WDS_ENTRIES		32
-#define TARGET_10_4_DMA_BURST_SIZE		0
+#define TARGET_10_4_DMA_BURST_SIZE		1
 #define TARGET_10_4_MAC_AGGR_DELIM		0
 #define TARGET_10_4_RX_SKIP_DEFRAG_TIMEOUT_DUP_DETECTION_CHECK 1
 #define TARGET_10_4_VOW_CONFIG			0




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio

2020-09-07 Thread Ben Greear


On 9/7/20 9:07 AM, Kalle Valo wrote:

Ben Greear  writes:


Here is my original patch to fix this, it is not complex.

https://patchwork.kernel.org/patch/10249363/

Sure, I have shared your patch above :).

Sent a bit early, any idea why this wasn't upstreamed earlier?


No. One comment from Michal indicated maybe there were more problems lurking
in this area, but he seemed to be OK with the patch overall. After that,
it was just ignored.


Now might be a good time to push for it :)



It is generally a waste of time in my experience. Kalle is the maintainer and should
be seeing any of this he cares to see. If he likes the patch, he can apply it or
something similar. If you have a reproducible test case, see if the patch fixes
things; that might help it be accepted.


The problem with your (Ben's) patches is that you have your own set of
patches for ath10k and your own firmware. So I cannot know at all if
your patches work with upstream ath10k and upstream firmware, and would
need to test the patches myself. But nowadays I just can't find the time
for testing. So if someone else can do the testing and provide a
Tested-on tag, it would increase my confidence level for the patches.


Surely codeaurora could get a few entry-level engineers to run basic testing against
your target platforms on a regular basis? The several years this bug was
known (to me at least, and to whoever saw my original patch) and the time wasted
by codeaurora to rediscover and re-fix the bug would have been much better spent just
testing and reviewing my patch to begin with. And not just my patches either; this
pattern is far and wide in ath10k.

Also, my driver is often tested against various upstream QCA firmware and chipsets in openwrt,
so while bugs are always possible, there is some test coverage.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio

2020-08-20 Thread Ben Greear


On 8/20/20 1:15 PM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 11:23 PM Ben Greear  wrote:


On 8/20/20 10:42 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya wrote:


On Thu, Aug 20, 2020 at 10:38 PM Ben Greear  wrote:


On 8/20/20 10:00 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 10:02 PM Ben Greear  wrote:


On 8/20/20 9:08 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 8:07 PM Wen Gong  wrote:


On 2020-08-20 18:52, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 3:45 PM Wen Gong  wrote:


On 2020-08-20 17:19, Krishna Chaitanya wrote:

...

I'm not really convinced that this is the right fix, but I'm no NAPI
expert. Can anyone else help?

Calling napi_disable() twice can lead to hangs, but moving NAPI from
start/stop to the probe isn't the right approach, as the datapath is tied to
start/stop.

Maybe check the state of NAPI before disable?

 if (test_bit(NAPI_STATE_SCHED, >napi.napi.state))
  napi_disable(>napi)
or maintain napi_state like this
https://patchwork.kernel.org/patch/10249365/

It is better to use the patch at the link above.
napi.state is controlled by the NAPI API; it is better if ath10k does not need to know about it.

Sure, but IMHO just canceling the async rx work should solve the issue.

Oh no, canceling the async rx work will not solve this issue. The rx worker
ath10k_rx_indication_async_work calls napi_schedule, and after napi_complete,
NAPI_STATE_SCHED is cleared.
The issue behind this patch is that two threads called hif_stop while
NAPI_STATE_SCHED was not cleared.

That fix is still valid and good to have.

ndev_stop being called twice is a typical scenario (stop vs rmmod), so
just checking the netdev_flags for IFF_UP and returning from hif_stop
should suffice, no?


My approach to fix this problem was to add a boolean in ath10k as to whether
it had napi enabled or not, and then check that before trying to enable/disable
it again.  Seems to work fine, and cleaner in my mind than checking internal
napi flags.
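A minimal sketch of the private-flag approach described above, assuming hypothetical names (`struct drv`, `drv_hif_start()`, `drv_hif_stop()`) and stub functions in place of the kernel's `napi_enable()`/`napi_disable()` so it runs in user space. It is not the linked patch; a real driver would also need `hif_start`/`hif_stop` to be serialized (for example under an existing driver mutex) so two callers cannot race on the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Stub standing in for struct napi_struct; it only counts calls. */
struct fake_napi {
	int enable_calls;
	int disable_calls;
};

/* Hypothetical stand-ins for napi_enable()/napi_disable(). In the kernel,
 * a second napi_disable() without an intervening napi_enable() can hang. */
static void fake_napi_enable(struct fake_napi *n)
{
	n->enable_calls++;
}

static void fake_napi_disable(struct fake_napi *n)
{
	n->disable_calls++;
}

/* Hypothetical driver state with a private "NAPI is enabled" flag,
 * instead of peeking at NAPI_STATE_SCHED. */
struct drv {
	struct fake_napi napi;
	bool napi_enabled;
};

static void drv_hif_start(struct drv *d)
{
	if (d->napi_enabled)
		return;			/* already enabled, nothing to do */
	fake_napi_enable(&d->napi);
	d->napi_enabled = true;
}

static void drv_hif_stop(struct drv *d)
{
	if (!d->napi_enabled)
		return;			/* already stopped (e.g. ifdown, then rmmod) */
	fake_napi_disable(&d->napi);
	d->napi_enabled = false;
	/* Invariant: never more disables than enables. */
	assert(d->napi.disable_calls <= d->napi.enable_calls);
}
```

With the flag in place, a stop-then-rmmod sequence that calls `hif_stop` twice results in exactly one disable, and a repeated start results in exactly one enable.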

A much simpler approach is just to check for IFF_UP and skip NAPI (and others)
in hif_stop, no? (Provided proper RTNL locking is done if hif_stop
is being called internally as well.)



I'm not sure, but I think the driver should be internally consistent and not
spend a lot of time trying to guess about interactions with objects higher
in the stack.

Fair enough, the network interface state is a basic thing controlled
by the driver, so it should be okay to use. Anyway, the in-driver approach has more control.


Here is my original patch to fix this, it is not complex.

https://patchwork.kernel.org/patch/10249363/

Sure, I have shared your patch above :).

Sent a bit early, any idea why this wasn't upstreamed earlier?


No. One comment from Michal indicated maybe there were more problems lurking
in this area, but he seemed to be OK with the patch overall. After that,
it was just ignored.


Now might be a good time to push for it :)



It is generally a waste of time in my experience. Kalle is the maintainer and should
be seeing any of this he cares to see. If he likes the patch, he can apply it or
something similar. If you have a reproducible test case, see if the patch fixes
things; that might help it be accepted.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio

2020-08-20 Thread Ben Greear


On 8/20/20 10:42 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya wrote:


On Thu, Aug 20, 2020 at 10:38 PM Ben Greear  wrote:


On 8/20/20 10:00 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 10:02 PM Ben Greear  wrote:


On 8/20/20 9:08 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 8:07 PM Wen Gong  wrote:


On 2020-08-20 18:52, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 3:45 PM Wen Gong  wrote:


On 2020-08-20 17:19, Krishna Chaitanya wrote:

...

I'm not really convinced that this is the right fix, but I'm no NAPI
expert. Can anyone else help?

Calling napi_disable() twice can lead to hangs, but moving NAPI from
start/stop to the probe isn't the right approach, as the datapath is tied to
start/stop.

Maybe check the state of NAPI before disable?

if (test_bit(NAPI_STATE_SCHED, >napi.napi.state))
 napi_disable(>napi)
or maintain napi_state like this
https://patchwork.kernel.org/patch/10249365/

It is better to use the patch at the link above.
napi.state is controlled by the NAPI API; it is better if ath10k does not need to know about it.

Sure, but IMHO just canceling the async rx work should solve the issue.

Oh no, canceling the async rx work will not solve this issue. The rx worker
ath10k_rx_indication_async_work calls napi_schedule, and after napi_complete,
NAPI_STATE_SCHED is cleared.
The issue behind this patch is that two threads called hif_stop while
NAPI_STATE_SCHED was not cleared.

That fix is still valid and good to have.

ndev_stop being called twice is a typical scenario (stop vs rmmod), so
just checking the netdev_flags for IFF_UP and returning from hif_stop
should suffice, no?


My approach to fix this problem was to add a boolean in ath10k as to whether
it had napi enabled or not, and then check that before trying to enable/disable
it again.  Seems to work fine, and cleaner in my mind than checking internal
napi flags.

A much simpler approach is just to check for IFF_UP and skip NAPI (and others)
in hif_stop, no? (Provided proper RTNL locking is done if hif_stop
is being called internally as well.)



I'm not sure, but I think the driver should be internally consistent and not
spend a lot of time trying to guess about interactions with objects higher
in the stack.

Fair enough, the network interface state is a basic thing controlled
by the driver, so it should be okay to use. Anyway, the in-driver approach has more control.


Here is my original patch to fix this, it is not complex.

https://patchwork.kernel.org/patch/10249363/

Sure, I have shared your patch above :).

Sent a bit early, any idea why this wasn't upstreamed earlier?


No. One comment from Michal indicated maybe there were more problems lurking
in this area, but he seemed to be OK with the patch overall. After that,
it was just ignored.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio

2020-08-20 Thread Ben Greear


On 8/20/20 10:00 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 10:02 PM Ben Greear  wrote:


On 8/20/20 9:08 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 8:07 PM Wen Gong  wrote:


On 2020-08-20 18:52, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 3:45 PM Wen Gong  wrote:


On 2020-08-20 17:19, Krishna Chaitanya wrote:

...

I'm not really convinced that this is the right fix, but I'm no NAPI
expert. Can anyone else help?

Calling napi_disable() twice can lead to hangs, but moving NAPI from
start/stop to the probe isn't the right approach, as the datapath is tied to
start/stop.

Maybe check the state of NAPI before disable?

   if (test_bit(NAPI_STATE_SCHED, >napi.napi.state))
napi_disable(>napi)
or maintain napi_state like this
https://patchwork.kernel.org/patch/10249365/

It is better to use the patch at the link above.
napi.state is controlled by the NAPI API; it is better if ath10k does not need to know about it.

Sure, but IMHO just canceling the async rx work should solve the issue.

Oh no, canceling the async rx work will not solve this issue. The rx worker
ath10k_rx_indication_async_work calls napi_schedule, and after napi_complete,
NAPI_STATE_SCHED is cleared.
The issue behind this patch is that two threads called hif_stop while
NAPI_STATE_SCHED was not cleared.

That fix is still valid and good to have.

ndev_stop being called twice is a typical scenario (stop vs rmmod), so
just checking the netdev_flags for IFF_UP and returning from hif_stop
should suffice, no?


My approach to fix this problem was to add a boolean in ath10k as to whether
it had napi enabled or not, and then check that before trying to enable/disable
it again.  Seems to work fine, and cleaner in my mind than checking internal
napi flags.

A much simpler approach is just to check for IFF_UP and skip NAPI (and others)
in hif_stop, no? (Provided proper RTNL locking is done if hif_stop
is being called internally as well.)



I'm not sure, but I think the driver should be internally consistent and not
spend a lot of time trying to guess about interactions with objects higher
in the stack.

Here is my original patch to fix this, it is not complex.

https://patchwork.kernel.org/patch/10249363/

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio

2020-08-20 Thread Ben Greear


On 8/20/20 9:08 AM, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 8:07 PM Wen Gong  wrote:


On 2020-08-20 18:52, Krishna Chaitanya wrote:

On Thu, Aug 20, 2020 at 3:45 PM Wen Gong  wrote:


On 2020-08-20 17:19, Krishna Chaitanya wrote:

...

I'm not really convinced that this is the right fix, but I'm no NAPI
expert. Can anyone else help?

Calling napi_disable() twice can lead to hangs, but moving NAPI from
start/stop to the probe isn't the right approach, as the datapath is tied to
start/stop.

Maybe check the state of NAPI before disable?

  if (test_bit(NAPI_STATE_SCHED, >napi.napi.state))
   napi_disable(>napi)
or maintain napi_state like this
https://patchwork.kernel.org/patch/10249365/

It is better to use the patch at the link above.
napi.state is controlled by the NAPI API; it is better if ath10k does not need to know about it.

Sure, but IMHO just canceling the async rx work should solve the issue.

Oh no, canceling the async rx work will not solve this issue. The rx worker
ath10k_rx_indication_async_work calls napi_schedule, and after napi_complete,
NAPI_STATE_SCHED is cleared.
The issue behind this patch is that two threads called hif_stop while
NAPI_STATE_SCHED was not cleared.

That fix is still valid and good to have.

ndev_stop being called twice is a typical scenario (stop vs rmmod), so
just checking the netdev_flags for IFF_UP and returning from hif_stop
should suffice, no?


My approach to fix this problem was to add a boolean in ath10k as to whether
it had napi enabled or not, and then check that before trying to enable/disable
it again.  Seems to work fine, and cleaner in my mind than checking internal
napi flags.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: WMM not working for mcast packets on ath10k

2020-08-11 Thread Ben Greear


On 8/11/20 7:36 AM, Ahmed Zaki wrote:

On Tue, 11 Aug 2020 at 16:01, Ben Greear  wrote:


On 8/11/20 5:22 AM, Ahmed Zaki wrote:

On Tue, 11 Aug 2020 at 01:05, Ben Greear  wrote:


On 8/10/20 3:08 PM, Ahmed Zaki wrote:

Hello,

I have 2 ath10k devices set in Mesh Point mode. When I use iptables'
DSCP target to set the WMM AC for some traffic ports to VO, the rules
work fine but only for unicast packets. All broadcast on the target
UDP ports goes out as BE, and not VO.

With some ath10k debugging enabled, it seems that the htt is indeed
sending the tx descriptor with the correct tid (6) set.
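For reference, the sketch below shows how a host stack commonly derives the TID from the IPv4 DS field and then the WMM access category, which is why TID 6 corresponds to VO. It assumes a simplified mapping that uses only the top three bits of the DS field; the real mac80211/cfg80211 classifier may apply additional rules. It covers only the host side: as the reply below points out, broadcast frames go out on a special TID in the firmware, so a correct TID in the tx descriptor does not by itself decide the on-air AC for bcast.

```c
#include <assert.h>
#include <stdint.h>

enum wmm_ac { AC_BK, AC_BE, AC_VI, AC_VO };

/* Simplified: use the top 3 bits of the IPv4 DS field (the DSCP
 * class-selector/precedence bits) as the 802.1D user priority (TID).
 * Assumption: real classifiers may special-case some DSCP values. */
static unsigned int ds_field_to_tid(uint8_t ds_field)
{
	return (ds_field >> 5) & 0x7;
}

/* User priority -> WMM access category:
 * UP 1,2 -> BK; UP 0,3 -> BE; UP 4,5 -> VI; UP 6,7 -> VO. */
static enum wmm_ac tid_to_ac(unsigned int tid)
{
	static const enum wmm_ac map[8] = {
		AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO,
	};

	assert(tid < 8);
	return map[tid];
}
```

For example, DSCP CS6 (48) gives a DS field of 0xC0, which maps to TID 6 and therefore AC_VO.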

Is this the intended behavior for some reason? If not, is there any
more debugging on the TX path that I can do?


I don't think that WMM makes sense for bcast frames since they go out on
a special TID that does not do aggregation or per peer QoS settings.


I get that, but we want to give some WMM priority to bcast, as these frames tend to have
a higher probability of loss under high network loads.

That special TID that you mentioned, is it set in the ath10k fw? I traced the
htt, and it is sending the intended tid in the txbuf flags, but bcast
still goes out as 0 (BE) on air.
Or am I missing something in the driver?

Thanks again for help.


Maybe you can convert bcast to unicast in the driver/mac80211? There are no retries
or proper rate-ctrl for bcast either, so I don't think you can get great performance
for bcast.



The bcast frames that we want to send with higher WMM priority are a kind of KEEPALIVE: small and
infrequent, but important for network stability. So we do not care about aggregation,
rate-ctrl or higher MCS, just a lower probability of being missed. They are, by design,
sent to all neighbors and cannot be sent as unicast.

Sorry to ask again, but without access to the fw I cannot know the answer to this: are
you aware of anything in the fw that can override the tid sent from the driver if the
destination is bcast?


There is no way to put bcast frames on the high priority queues.
If you run a sniffer near the transmitter, do you actually see that
your bcast frames are dropped before being put on air?

The ath10k driver probably has some tx status that indicates whether it
was dropped before being put on air as well...have you checked that?

You can send your frames more often if you want better chance of some
of them getting through.
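To put rough numbers on the "send them more often" suggestion: if each broadcast copy were lost independently with probability p, all n copies would be lost with probability p^n. The sketch below computes that, and the number of copies needed to reach a target loss rate. The independence assumption is optimistic; under heavy load, losses tend to be correlated (collisions, congestion), so real-world gains will be smaller.

```c
#include <assert.h>

/* Probability that all n copies are lost, if each copy is lost
 * independently with probability p (an optimistic assumption). */
static double all_copies_lost(double p, unsigned int n)
{
	double loss = 1.0;

	assert(p >= 0.0 && p <= 1.0);
	for (unsigned int i = 0; i < n; i++)
		loss *= p;
	return loss;
}

/* Smallest number of copies whose combined loss probability is at or
 * below target, under the same independence assumption. */
static unsigned int copies_needed(double p, double target)
{
	unsigned int n = 1;
	double loss = p;

	assert(p >= 0.0 && p < 1.0 && target > 0.0);
	while (loss > target) {
		loss *= p;
		n++;
	}
	return n;
}
```

With p = 0.5 and a 1% target, `copies_needed()` returns 7, since 0.5^6 is about 1.6% and 0.5^7 is about 0.8%.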

Thanks,
Ben



Thanks,




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: WMM not working for mcast packets on ath10k

2020-08-11 Thread Ben Greear


On 8/11/20 5:22 AM, Ahmed Zaki wrote:

On Tue, 11 Aug 2020 at 01:05, Ben Greear  wrote:


On 8/10/20 3:08 PM, Ahmed Zaki wrote:

Hello,

I have 2 ath10k devices set in Mesh Point mode. When I use iptables'
DSCP target to set the WMM AC for some traffic ports to VO, the rules
work fine but only for unicast packets. All broadcast on the target
UDP ports goes out as BE, and not VO.

With some ath10k debugging enabled, it seems that the htt is indeed
sending the tx descriptor with the correct tid (6) set.

Is this the intended behavior for some reason? If not, is there any
more debugging on the TX path that I can do?


I don't think that WMM makes sense for bcast frames since they go out on
a special TID that does not do aggregation or per peer QoS settings.


I get that, but we want to give some WMM priority to bcast, as these frames tend to have
a higher probability of loss under high network loads.

That special TID that you mentioned, is it set in the ath10k fw? I traced the
htt, and it is sending the intended tid in the txbuf flags, but bcast
still goes out as 0 (BE) on air.
Or am I missing something in the driver?

Thanks again for help.


Maybe you can convert bcast to unicast in the driver/mac80211? There are no retries
or proper rate-ctrl for bcast either, so I don't think you can get great performance
for bcast.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: WMM not working for mcast packets on ath10k

2020-08-10 Thread Ben Greear


On 8/10/20 3:08 PM, Ahmed Zaki wrote:

Hello,

I have 2 ath10k devices set in Mesh Point mode. When I use iptables'
DSCP target to set the WMM AC for some traffic ports to VO, the rules
work fine but only for unicast packets. All broadcast on the target
UDP ports goes out as BE, and not VO.

With some ath10k debugging enabled, it seems that the htt is indeed
sending the tx descriptor with the correct tid (6) set.

Is this the intended behavior for some reason? If not, is there any
more debugging on the TX path that I can do?


I don't think that WMM makes sense for bcast frames since they go out on
a special TID that does not do aggregation or per peer QoS settings.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH v2 1/3] ath10k: Add history for tracking certain events

2020-07-31 Thread Ben Greear


On 7/31/20 11:27 AM, Rakesh Pillai wrote:

Add history for tracking the below events
- register read
- register write
- IRQ trigger
- NAPI poll
- CE service
- WMI cmd
- WMI event
- WMI tx completion

This will help in debugging any crash or any
improper behaviour.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai 
---
  drivers/net/wireless/ath/ath10k/ce.c  |   1 +
  drivers/net/wireless/ath/ath10k/core.h|  74 +
  drivers/net/wireless/ath/ath10k/debug.c   | 133 ++
  drivers/net/wireless/ath/ath10k/debug.h   |  74 +
  drivers/net/wireless/ath/ath10k/snoc.c|  15 +++-
  drivers/net/wireless/ath/ath10k/wmi-tlv.c |   1 +
  drivers/net/wireless/ath/ath10k/wmi.c |  10 +++
  7 files changed, 307 insertions(+), 1 deletion(-)




+void ath10k_record_wmi_event(struct ath10k *ar, enum ath10k_wmi_type type,
+			     u32 id, unsigned char *data)
+{
+	struct ath10k_wmi_event_entry *entry;
+	u32 idx;
+
+	if (type == ATH10K_WMI_EVENT) {
+		if (!ar->wmi_event_history.record)
+			return;


This check above is duplicated below; add it once at the top of the method
instead.


+
+		spin_lock_bh(&ar->wmi_event_history.hist_lock);
+		idx = ath10k_core_get_next_idx(&ar->reg_access_history.index,
+					       ar->wmi_event_history.max_entries);
+		spin_unlock_bh(&ar->wmi_event_history.hist_lock);
+		entry = &ar->wmi_event_history.record[idx];
+	} else {
+		if (!ar->wmi_cmd_history.record)
+			return;
+
+		spin_lock_bh(&ar->wmi_cmd_history.hist_lock);
+		idx = ath10k_core_get_next_idx(&ar->reg_access_history.index,
+					       ar->wmi_cmd_history.max_entries);
+		spin_unlock_bh(&ar->wmi_cmd_history.hist_lock);
+		entry = &ar->wmi_cmd_history.record[idx];
+	}
+
+	entry->timestamp = ath10k_core_get_timestamp();
+	entry->cpu_id = smp_processor_id();
+	entry->type = type;
+	entry->id = id;
+	memcpy(&entry->data, data + 4, ATH10K_WMI_DATA_LEN);
+}
+EXPORT_SYMBOL(ath10k_record_wmi_event);



@@ -1660,6 +1668,11 @@ static int ath10k_snoc_probe(struct platform_device *pdev)
 	ar->ce_priv = &ar_snoc->ce;
 	msa_size = drv_data->msa_size;
 
+	ath10k_core_reg_access_history_init(ar, ATH10K_REG_ACCESS_HISTORY_MAX);
+	ath10k_core_wmi_event_history_init(ar, ATH10K_WMI_EVENT_HISTORY_MAX);
+	ath10k_core_wmi_cmd_history_init(ar, ATH10K_WMI_CMD_HISTORY_MAX);
+	ath10k_core_ce_event_history_init(ar, ATH10K_CE_EVENT_HISTORY_MAX);


Maybe only enable this once the user turns it on? It sucks up a bit of memory.


+
 	ath10k_snoc_quirks_init(ar);
 
 	ret = ath10k_snoc_resource_init(ar);

diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 932266d..9df5748 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -627,6 +627,7 @@ static void ath10k_wmi_tlv_op_rx(struct ath10k *ar, struct sk_buff *skb)
 	if (skb_pull(skb, sizeof(struct wmi_cmd_hdr)) == NULL)
 		goto out;
 
+	ath10k_record_wmi_event(ar, ATH10K_WMI_EVENT, id, skb->data);
 	trace_ath10k_wmi_event(ar, id, skb->data, skb->len);
 
 	consumed = ath10k_tm_event_wmi(ar, id, skb);

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index a81a1ab..8ebd05c 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1802,6 +1802,15 @@ struct sk_buff *ath10k_wmi_alloc_skb(struct ath10k *ar, u32 len)
 
 static void ath10k_wmi_htc_tx_complete(struct ath10k *ar, struct sk_buff *skb)
 {
+	struct wmi_cmd_hdr *cmd_hdr;
+	enum wmi_tlv_event_id id;
+
+	cmd_hdr = (struct wmi_cmd_hdr *)skb->data;
+	id = MS(__le32_to_cpu(cmd_hdr->cmd_id), WMI_CMD_HDR_CMD_ID);
+
+	ath10k_record_wmi_event(ar, ATH10K_WMI_TX_COMPL, id,
+				skb->data + sizeof(struct wmi_cmd_hdr));
+
 	dev_kfree_skb(skb);
 }


I think you should guard the above new code with
if (unlikely(ar->ce_event_history.record)) { ... }

All in all, I think I'd want to compile this out (while leaving other debug compiled
in), since it seems this stuff would be rarely used and it adds method calls to hot
paths.

That is a decision for Kalle though, so see what he says...
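A user-space sketch of the review suggestions above: check once at the top, guard the recording path with `unlikely()`, and allocate the history only when the user turns it on. All names here are hypothetical and are not the patch's API. `unlikely()` is defined with the GCC/Clang `__builtin_expect` builtin, as the kernel's macro is when branch profiling is off, and the spinlock around the index update is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, simplified history entry and ring buffer. */
struct hist_entry {
	uint32_t id;
	uint8_t data[16];
};

struct history {
	struct hist_entry *record;	/* NULL until recording is enabled */
	uint32_t max_entries;
	uint32_t index;			/* next slot to write; wraps around */
};

/* Allocate the ring only when the user turns recording on (in a driver,
 * e.g. from a debugfs write), so idle systems pay no memory cost. */
static bool history_enable(struct history *h, uint32_t max_entries)
{
	assert(max_entries > 0);
	if (h->record)
		return true;		/* already enabled */
	h->record = calloc(max_entries, sizeof(*h->record));
	if (!h->record)
		return false;
	h->max_entries = max_entries;
	h->index = 0;
	return true;
}

static void history_disable(struct history *h)
{
	free(h->record);
	h->record = NULL;
}

static void history_record(struct history *h, uint32_t id,
			   const uint8_t *data, size_t len)
{
	/* One check at the top serves every event type. Recording is
	 * expected to be off, so the enabled path is marked unlikely. */
	if (unlikely(h->record != NULL)) {
		struct hist_entry *e = &h->record[h->index];

		h->index = (h->index + 1) % h->max_entries;
		e->id = id;
		memset(e->data, 0, sizeof(e->data));
		memcpy(e->data, data,
		       len < sizeof(e->data) ? len : sizeof(e->data));
	}
}
```

Until `history_enable()` is called, `history_record()` costs one predictable branch and no memory; once enabled, the oldest entries are overwritten as the ring wraps.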

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: duplicate authentications / excessive missing ACKs / deauth due to inactivity timer

2020-07-22 Thread Ben Greear


On 7/22/20 7:18 AM, Leon M. George wrote:

Hi :-)

I've encountered connection issues that appear to be strongly related to the
problem in this thread [1] (found via the arch linux bug tracker [2]).


If this is an ath10k-ct-only problem, please don't bother this list; send it to just me
and/or add it to the ath10k-ct bug tracker on GitHub.

If it does affect upstream ath10k too, then it would be valid for this mailing
list.

Thanks,
Ben



Symptoms:
ap: "disconnected due to excessive missing ACKs"
station: "No beacon heard and the time event is over already..."

Using a monitoring interface on the AP, I was able to confirm that the beacon is
indeed not being sent at any time.

I've found a configuration that reliably produces this state
("error state" from here on):
If any number of mesh (or adhoc with ct) points are configured alongside any
number of ordinary APs, this issue starts appearing.
The mesh connections appear to be working correctly in the error state.

I tested various combinations of openwrt-19.xx and openwrt-trunk
with ath10k/ath10k-ct - all were affected.
Consistent with my vague memory, a user on the OpenWrt forum reports that the issue isn't
present in openwrt-18.xx [3].

About ruling out client issues:
My employer operates installations with several hundred ath10k access
points.
We couldn't identify the source of the issue at first when we encountered it in
our live setup, and we received unsolicited reports from basically every
installation.
As far as we can tell, no client is able to connect in the error state.
We've had our users confirm the bug for
   - Apple phones/tablets/macbooks
   - Samsung phones, laptops
   - computers with Intel/Realtek/AzureWave-hardware.

I hope this info is helpful.

kind regards,
Leon George

[1] https://www.mail-archive.com/ath10k@lists.infradead.org/msg11599.html
[2] https://bugs.archlinux.org/task/58457
[3] https://forum.openwrt.org/t/wifi-connectivity-issues-with-ath10k/67779





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH] ath10k: Add history for tracking certain events

2020-06-28 Thread Ben Greear





On 06/27/2020 10:12 PM, Rakesh Pillai wrote:




-----Original Message-----
From: Ben Greear 
Sent: Saturday, June 27, 2020 8:58 PM
To: Rakesh Pillai ; ath10k@lists.infradead.org
Cc: linux-wirel...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: Re: [PATCH] ath10k: Add history for tracking certain events



On 06/26/2020 11:22 PM, Rakesh Pillai wrote:

For debugging many issues, a history of the
below-mentioned events can help give an idea
of what exactly was going on just before an
issue occurred in the system. This event
history will be collected only when the host
driver is run in debug mode (i.e. with the
config ATH10K_DEBUG enabled).


This should be disabled by default unless the user specifically pokes some debugfs
value to turn it on, so that it does not impact performance.


Hi Ben,
This history is enabled only if the user compiles the kernel with
ATH10K_DEBUG.
Making it runtime-configurable adds a lot of "if" conditions for this history record.
Do you suggest adding support to enable/disable it at runtime even with
ATH10K_DEBUG?


Yes, because you are adding lots of locks/unlocks. That is way more expensive
than an if statement. You can add an 'unlikely' to the if check as well, so the
compiler will optimize for this feature not being enabled.
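The point about an `if` being cheaper than a lock can be sketched as follows: a runtime flag (which a driver might expose via debugfs; that wiring is assumed and not shown) is read with a relaxed C11 atomic load before any lock is taken, so the disabled path costs one predictable branch. `fake_lock()`/`fake_unlock()` are hypothetical stand-ins that merely count acquisitions in place of `spin_lock_bh()`/`spin_unlock_bh()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same definition the kernel uses when branch profiling is off. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Runtime switch; static storage means it starts out false (off). */
static atomic_bool history_enabled;

/* Hypothetical lock stand-ins that only count acquisitions. */
static unsigned long lock_acquisitions;
static unsigned long events_recorded;

static void fake_lock(void)
{
	lock_acquisitions++;
}

static void fake_unlock(void)
{
}

static void set_history_enabled(bool on)
{
	atomic_store_explicit(&history_enabled, on, memory_order_relaxed);
}

static void record_event(void)
{
	/* Cheap, predictable branch first: when disabled (the expected
	 * case), no lock is taken at all. */
	if (unlikely(atomic_load_explicit(&history_enabled,
					  memory_order_relaxed))) {
		fake_lock();
		events_recorded++;
		fake_unlock();
		assert(events_recorded <= lock_acquisitions);
	}
}
```

With the flag off, any number of `record_event()` calls take no lock; after `set_history_enabled(true)`, each call takes the lock exactly once.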

Thanks,
Ben





Thanks,
Ben



Add history for tracking the below events
- register read
- register write
- IRQ trigger
- IRQ Enable
- IRQ Disable
- NAPI poll
- CE service
- WMI cmd
- WMI event
- WMI tx completion

This will help in debugging any crash or any
improper behaviour.



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH] ath10k: Add history for tracking certain events

2020-06-27 Thread Ben Greear





On 06/26/2020 11:22 PM, Rakesh Pillai wrote:

For debugging many issues, a history of the
below-mentioned events can help give an idea
of what exactly was going on just before an
issue occurred in the system. This event
history will be collected only when the host
driver is run in debug mode (i.e. with the
config ATH10K_DEBUG enabled).


This should be disabled by default unless the user specifically pokes some debugfs
value to turn it on, so that it does not impact performance.

Thanks,
Ben



Add history for tracking the below events
- register read
- register write
- IRQ trigger
- IRQ Enable
- IRQ Disable
- NAPI poll
- CE service
- WMI cmd
- WMI event
- WMI tx completion

This will help in debugging any crash or any
improper behaviour.



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 2/2] ath10k: Skip wait for delete response if firmware is down

2020-06-26 Thread Ben Greear





On 06/26/2020 11:11 AM, Rakesh Pillai wrote:

Currently the driver waits for a response from
the firmware for all the delete cmds, e.g. vdev_delete,
peer delete. If the firmware is down, these waits
will always time out and return an error.

Also during subsystem recovery, any attempt to
send a WMI cmd to the FW will return the -ESHUTDOWN
status, which, when returned to mac80211, can cause
unnecessary warnings to be printed to the console,
as shown below:

[ 2559.529565] Call trace:
[ 2559.532214]  __sta_info_destroy_part2+0x160/0x168 [mac80211]
[ 2559.538157]  __sta_info_flush+0x124/0x180 [mac80211]
[ 2559.543402]  ieee80211_set_disassoc+0x130/0x2c0 [mac80211]
[ 2559.549172]  ieee80211_mgd_deauth+0x238/0x25c [mac80211]
[ 2559.554764]  ieee80211_deauth+0x24/0x30 [mac80211]
[ 2559.559860]  cfg80211_mlme_deauth+0x258/0x2b0 [cfg80211]
[ 2559.565446]  nl80211_deauthenticate+0xe4/0x110 [cfg80211]
[ 2559.571064]  genl_rcv_msg+0x3a0/0x440
[ 2559.574888]  netlink_rcv_skb+0xb4/0x11c
[ 2559.578877]  genl_rcv+0x34/0x48
[ 2559.582162]  netlink_unicast+0x14c/0x1e4
[ 2559.586235]  netlink_sendmsg+0x2f0/0x360
[ 2559.590317]  sock_sendmsg+0x44/0x5c
[ 2559.593951]  sys_sendmsg+0x1c8/0x290
[ 2559.598029]  ___sys_sendmsg+0xa8/0xfc
[ 2559.601840]  __sys_sendmsg+0x8c/0xd0
[ 2559.605572]  __arm64_compat_sys_sendmsg+0x2c/0x38
[ 2559.610468]  el0_svc_common+0xa8/0x160
[ 2559.614372]  el0_svc_compat_handler+0x2c/0x38
[ 2559.618905]  el0_svc_compat+0x8/0x10

Skip the wait for delete response from the
firmware if the firmware is down. Also return
success to the mac80211 calls when the peer delete
cmd fails with return status -ESHUTDOWN.
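The second half of that description (reporting success to mac80211 when the peer delete fails with -ESHUTDOWN) boils down to a small return-code mapping. A sketch in plain C; the helper name is hypothetical and not taken from the patch.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the result of a peer-delete WMI command
 * into what is returned to mac80211. -ESHUTDOWN means the firmware is down
 * (e.g. during subsystem recovery), so report success instead of an error
 * that would only trigger mac80211 warnings.
 */
static int peer_delete_result_for_mac80211(int ret)
{
    if (ret == -ESHUTDOWN)
        return 0;
    return ret;
}
```

Other errors are still passed through unchanged, so genuine failures while the firmware is up are not hidden.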

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai 
---
 drivers/net/wireless/ath/ath10k/mac.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index dc7befc..7ac6549 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -701,7 +701,8 @@ static void ath10k_wait_for_peer_delete_done(struct ath10k *ar, u32 vdev_id,
unsigned long time_left;
int ret;

-   if (test_bit(WMI_SERVICE_SYNC_DELETE_CMDS, ar->wmi.svc_map)) {
+   if (test_bit(WMI_SERVICE_SYNC_DELETE_CMDS, ar->wmi.svc_map) &&
+   test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags)) {


Don't you mean !test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags))  ???

Or maybe I'm just mis-reading your patch?
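For reference, a standalone sketch of the condition being suggested: wait for the delete response only when the firmware advertises synchronous delete commands and the driver is not flushing after a crash. test_bit() is replaced by a hypothetical single-word helper, and the bit numbers are placeholders, not the real WMI_SERVICE_SYNC_DELETE_CMDS / ATH10K_FLAG_CRASH_FLUSH values.

```c
#include <stdbool.h>

/* Placeholder bit numbers; the real values live in the ath10k headers. */
enum { SVC_SYNC_DELETE_CMDS = 3 };
enum { FLAG_CRASH_FLUSH = 1 };

/* Hypothetical single-word stand-in for the kernel's test_bit(). */
static bool test_bit_ul(unsigned int nr, const unsigned long *addr)
{
    return (*addr >> nr) & 1UL;
}

/* Wait only if sync delete is supported AND the firmware is not down. */
static bool should_wait_for_peer_delete(const unsigned long *svc_map,
                                        const unsigned long *dev_flags)
{
    return test_bit_ul(SVC_SYNC_DELETE_CMDS, svc_map) &&
           !test_bit_ul(FLAG_CRASH_FLUSH, dev_flags);
}
```

Without the negation, the wait would happen only while the crash flush is in progress, which is exactly when the firmware cannot answer.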

Thanks,
Ben


Potential issue with htt flags

2020-05-29 Thread Ben Greear


While porting my patches forward to 5.7, I noticed this:

#define HTT_TX_CMPL_FLAG_DATA_RSSI  BIT(0)
#define HTT_TX_CMPL_FLAG_PPID_PRESENT   BIT(1)
#define HTT_TX_CMPL_FLAG_PA_PRESENT BIT(2)
#define HTT_TX_CMPL_FLAG_PPDU_DURATION_PRESENT  BIT(3)

#define HTT_TX_DATA_RSSI_ENABLE_WCN3990 BIT(3)
#define HTT_TX_DATA_APPEND_RETRIES BIT(0)
#define HTT_TX_DATA_APPEND_TIMESTAMP BIT(1)


Both of these are used against 'flags2', but as you can see, some bits are
defined to mean different things.  In particular, the usage in:

static int ath10k_get_htt_tx_data_rssi_pad(struct htt_resp *resp)

looks suspicious to me.

Maybe ath10k_get_htt_tx_data_rssi_pad should be labeled as specific to one
particular chipset?

I didn't look further, but maybe whoever added this could take a look?
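A small standalone illustration of why the overlap is worth a look: two of the macros quoted above resolve to the same bit, so a flags2 value that only means "PPDU duration present" also satisfies a check for the WCN3990 RSSI-enable bit. The macro values are copied from the quoted definitions (with BIT() redefined locally); the helper function is hypothetical, and this does not show how ath10k_get_htt_tx_data_rssi_pad() actually uses the bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Values copied from the definitions quoted above. */
#define HTT_TX_CMPL_FLAG_DATA_RSSI              BIT(0)
#define HTT_TX_CMPL_FLAG_PPDU_DURATION_PRESENT  BIT(3)
#define HTT_TX_DATA_RSSI_ENABLE_WCN3990         BIT(3)
#define HTT_TX_DATA_APPEND_RETRIES              BIT(0)

/* Hypothetical check of the WCN3990 RSSI-enable bit in flags2. */
static bool wcn3990_rssi_enabled(uint32_t flags2)
{
    return flags2 & HTT_TX_DATA_RSSI_ENABLE_WCN3990;
}
```

Whether this is an actual bug depends on which firmware sets which meaning of the bit, which is why scoping the helper to one chipset was suggested.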

Thanks,
Ben


Un-recoverable ath10k 4019 NIC lockup.

2020-05-27 Thread Ben Greear

: [ 2545.370652] ath10k_ahb a80.wifi: failed to halt axi bus: 0
[ 2548.661207] ath10k_ahb a80.wifi: failed to receive initialized event from target: 8000
[ 2548.671340] ath10k_ahb a80.wifi: failed to halt axi bus: 0
Thu May 14 19:29:17 2020 kern.err kernel: [ 2548.661207] ath10k_ahb a80.wifi: failed to receive initialized event from target: 8000
Thu May 14 19:29:17 2020 kern.err kernel: [ 2548.671340] ath10k_ahb a80.wifi: failed to halt axi bus: 0
[ 2548.840677] ath10k_ahb a80.wifi: failed to reset chip: -110
[ 2548.840716] ath10k_ahb a80.wifi: Could not init hif: -110
[ 2548.845695] [ cut here ]
[ 2548.851832] WARNING: CPU: 3 PID: 98 at backports-4.19.98-1/net/mac80211/util.c:2040 ieee80211_reconfig+0x98/0xb64 [mac80211]
[ 2548.856020] Hardware became unavailable during restart.




And endless -108 errors and other funk after this.

Thanks,
Ben



Re: [RFC 1/2] devlink: add simple fw crash helpers

2020-05-25 Thread Ben Greear





On 05/25/2020 02:07 AM, Andy Shevchenko wrote:

On Fri, May 22, 2020 at 04:23:55PM -0700, Steve deRosier wrote:

On Fri, May 22, 2020 at 2:51 PM Luis Chamberlain  wrote:



I had to go RTFM re: kernel taints because it has been a very long
time since I looked at them. It had always seemed to me that most were
caused by "kernel-unfriendly" user actions.  The most famous of course
is loading proprietary modules, out-of-tree modules, forced module
loads, etc...  Honestly, I had forgotten the large variety of uses of
the taint flags. For anyone who hasn't looked at taints recently, I
recommend: 
https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

In light of this I don't object to setting a taint on this anymore.
I'm a little uneasy, but I've softened on it now, and now I feel it
depends on implementation.

Specifically, I don't think we should set a taint flag when a driver
easily handles a routine firmware crash and is confident that things
have come up just fine again. In other words, triggering the taint in
every driver module where it spits out a log comment that it had a
firmware crash and had to recover seems too much. Sure, firmware
shouldn't crash, sure it should be open source so we can fix it,
whatever...


While it may sound idealistic, the firmware, for the end user and even for a
mere kernel developer like me, is a complete black box which has more access
than the root user in the kernel. We have tons of firmware, and each of them
is a potentially dangerous beast. As a user I really care about my data and
privacy (a hacker can oops a firmware in order to set up a specific attack
vector). So tainting the kernel is _the least_ we can do there; the strict
rule would be to reboot immediately.


those sorts of wishful comments simply ignore reality and
our ability to affect effective change.


We can encourage users not to buy cheap crap, for starters.


There is no stable wifi firmware for any price.

There is also no obvious feedback from even name-brand NICs like ath10k or AX200
when you report a crash.

That said, at least in my experience with ath10k-ct, the OS normally recovers
fine from firmware crashes.  ath10k already reports full crash reports via
udev, so it is easy for user-space to notice and report bugs upstream if it
cares to.  Probably other NICs do the same, and if not, they certainly could.

Thanks,
Ben



Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Ben Greear





On 05/18/2020 10:09 AM, Luis Chamberlain wrote:

On Mon, May 18, 2020 at 09:58:53AM -0700, Ben Greear wrote:



On 05/18/2020 09:51 AM, Luis Chamberlain wrote:

On Sat, May 16, 2020 at 03:24:01PM +0200, Johannes Berg wrote:

On Fri, 2020-05-15 at 21:28 +, Luis Chamberlain wrote:
module_firmware_crashed

You didn't CC me or the wireless list on the rest of the patches, so I'm
replying to a random one, but ...

What is the point here?

This should in no way affect the integrity of the system/kernel, for
most devices anyway.


The keyword you used here is "most devices". And in the worst case, *who*
knows what other odd things may happen afterwards.


So what if ath10k's firmware crashes? If there's a driver bug it will
not handle it right (and probably crash, WARN_ON, or something else),
but if the driver is working right then that will not affect the kernel
at all.


Sometimes the device can go into a state which requires driver removal
and addition to get things back up.


It would be lovely to be able to detect this case in the driver/system
somehow!  I haven't seen any such cases recently,


I assure you that I have run into it. Once it happens again I'll report
the crash, but the problem with some of this is that unless you scrape
the log you won't know. Eventually, a uevent would indeed inform
me.


but in case there is
some common case you see, maybe we can think of a way to detect it?


ath10k is just one case, this patch series addresses a simple way to
annotate this tree-wide.


So maybe I can understand that maybe you want an easy way to discover -
per device - that the firmware crashed, but that still doesn't warrant a
complete kernel taint.


That is one reason; another is that a taint helps support cases *fast* and
easily detect whether the issue was a firmware crash, instead of scraping
logs for driver-specific ways of saying the firmware has crashed.


You can listen for udev events (I think that is the right term),
and find crashes that way.  You get the actual crash info as well.
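The uevent approach needs very little code on the user-space side. Below is a minimal sketch of the parsing part only: it looks up one KEY=VALUE pair in a kernel uevent datagram, whose format is an "action@devpath" header followed by NUL-separated KEY=VALUE strings. The helper names are hypothetical, and which keys ath10k(-ct) actually sets for a firmware crash is not stated in this thread, so the example sticks to generic keys.

```c
#include <stddef.h>
#include <string.h>

/* Length of s, scanning at most max bytes (a NUL is not required). */
static size_t bounded_len(const char *s, size_t max)
{
    const char *nul = memchr(s, '\0', max);

    return nul ? (size_t)(nul - s) : max;
}

/*
 * Look up KEY in one kernel uevent datagram and return its value, or NULL.
 * Assumes the last field is NUL-terminated, as kernel messages are.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = bounded_len(buf, len) + 1;   /* skip "action@devpath" */

    while (off < len) {
        const char *field = buf + off;
        size_t flen = bounded_len(field, len - off);

        if (flen > klen && memcmp(field, key, klen) == 0 && field[klen] == '=')
            return field + klen + 1;
        off += flen + 1;
    }
    return NULL;
}
```

In a real listener, the buffer would come from recv() on a socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT) bound to multicast group 1; libudev's udev_monitor API wraps the same mechanism.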


My follow-up to this was to add a uevent to add_taint() as well, so that
these could be processed generically by userspace.


I'm not opposed to the taint, though I have not thought much on it.

But, if you can already get the crash info from uevent, and it automatically
comes without polling or scraping logs, then what benefit beyond that does
the taint give you?

Thanks,
Ben


Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-18 Thread Ben Greear





On 05/18/2020 09:51 AM, Luis Chamberlain wrote:

On Sat, May 16, 2020 at 03:24:01PM +0200, Johannes Berg wrote:

On Fri, 2020-05-15 at 21:28 +, Luis Chamberlain wrote:
module_firmware_crashed

You didn't CC me or the wireless list on the rest of the patches, so I'm
replying to a random one, but ...

What is the point here?

This should in no way affect the integrity of the system/kernel, for
most devices anyway.


The keyword you used here is "most devices". And in the worst case, *who*
knows what other odd things may happen afterwards.


So what if ath10k's firmware crashes? If there's a driver bug it will
not handle it right (and probably crash, WARN_ON, or something else),
but if the driver is working right then that will not affect the kernel
at all.


Sometimes the device can go into a state which requires driver removal
and addition to get things back up.


It would be lovely to be able to detect this case in the driver/system
somehow!  I haven't seen any such cases recently, but in case there is
some common case you see, maybe we can think of a way to detect it?




So maybe I can understand that maybe you want an easy way to discover -
per device - that the firmware crashed, but that still doesn't warrant a
complete kernel taint.


That is one reason; another is that a taint helps support cases *fast* and
easily detect whether the issue was a firmware crash, instead of scraping
logs for driver-specific ways of saying the firmware has crashed.


You can listen for udev events (I think that is the right term),
and find crashes that way.  You get the actual crash info as well.

Thanks,
Ben



Re: [PATCH 1/2] ath10k: use cumulative survey statistics

2020-05-04 Thread Ben Greear





On 05/04/2020 04:52 PM, Rajkumar Manoharan wrote:

On 2020-05-04 16:49, Ben Greear wrote:

On 05/04/2020 04:46 PM, Rajkumar Manoharan wrote:

On 2020-05-04 08:41, Markus Theil wrote:

ath10k currently reports survey results for the last interval between each
invocation of NL80211_CMD_GET_SURVEY. For concurrent invocations, this
can lead to unexpectedly small results, e.g. when hostapd uses survey
data and iw survey dump is invoked in parallel. Fix this by returning
cumulative results that don't depend on the last invocation. Other
drivers, e.g. ath9k or mt76, also use this behavior.

Signed-off-by: Markus Theil 
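A minimal sketch of the cumulative approach described in the commit message: the driver folds each new reading into running per-channel totals, and every consumer computes its own interval from its own previous snapshot, so hostapd and iw no longer consume each other's intervals. The struct and function names are hypothetical, and treating each firmware reading as an increment is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical per-channel survey counters. */
struct survey_counters {
    uint64_t time;       /* time spent on the channel */
    uint64_t time_busy;  /* part of that time the channel was busy */
};

/* Fold a new reading into running totals instead of replacing the
 * previously reported interval. */
static void survey_accumulate(struct survey_counters *total,
                              const struct survey_counters *delta)
{
    total->time += delta->time;
    total->time_busy += delta->time_busy;
}

/* Each consumer keeps its own previous snapshot and derives its own
 * interval, independent of how often other readers query. */
static struct survey_counters survey_interval(const struct survey_counters *now,
                                              const struct survey_counters *prev)
{
    struct survey_counters d = {
        .time = now->time - prev->time,
        .time_busy = now->time_busy - prev->time_busy,
    };
    return d;
}
```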



IIRC this was fixed a while ago by the patch below. Somehow it never landed in
ath.git. A simple one-line change is enough.

https://patchwork.kernel.org/patch/10550707/

-Rajkumar


Have you tested this with wave-1?  Lots of firmware, older versions at least,
is broken in this area.


Yes. It was tested on wave-1 as well. Venkat replied to your comment on the
original change.


Ahh, sorry I missed that.

Hopefully no one is using the broken firmware anymore then!

--Ben



Re: [PATCH 1/2] ath10k: use cumulative survey statistics

2020-05-04 Thread Ben Greear





On 05/04/2020 04:46 PM, Rajkumar Manoharan wrote:

On 2020-05-04 08:41, Markus Theil wrote:

ath10k currently reports survey results for the last interval between each
invocation of NL80211_CMD_GET_SURVEY. For concurrent invocations, this
can lead to unexpectedly small results, e.g. when hostapd uses survey
data and iw survey dump is invoked in parallel. Fix this by returning
cumulative results that don't depend on the last invocation. Other
drivers, e.g. ath9k or mt76, also use this behavior.

Signed-off-by: Markus Theil 



IIRC this was fixed a while ago by the patch below. Somehow it never landed in
ath.git. A simple one-line change is enough.

https://patchwork.kernel.org/patch/10550707/

-Rajkumar


Have you tested this with wave-1?  Lots of firmware, older versions at least,
is broken in this area.

Thanks,
Ben


Re: [PATCH] ath10k: increase rx buffer size to 2048

2020-04-28 Thread Ben Greear




On 04/28/2020 05:01 AM, Kalle Valo wrote:

Sven Eckelmann  writes:


On Wednesday, 1 April 2020 09:00:49 CEST Sven Eckelmann wrote:

On Wednesday, 5 February 2020 20:10:43 CEST Linus Lüssing wrote:

From: Linus Lüssing 

Before, only frames with a maximum size of 1528 bytes could be
transmitted between two 802.11s nodes.

For batman-adv for instance, which adds its own header to each frame,
we typically need an MTU of at least 1532 bytes to be able to transmit
without fragmentation.

This patch now increases the maximum frame size from 1528 to 1656
bytes.

[...]

@Kalle, I saw that this patch was marked as deferred [1] but I couldn't find
any mail explaining why. It seems like this currently creates real-world
problems, so it would be nice if you could briefly explain what is currently
blocking its acceptance.


Ping?


Sorry for the delay, my plan was to first write some documentation about
different hardware families but I haven't managed to do that yet.

My problem with this patch is that I don't know what hardware and
firmware versions were tested, so it needs analysis before I feel safe
applying it. The ath10k hardware families are so different that even
if a patch works perfectly on one ath10k hardware it could still break
badly on another one.

What lets me apply ath10k patches faster is a comprehensive
analysis in the commit log. This shows me the patch author has
considered all hardware families, not just the one he is testing
on, and that I don't need to do the analysis myself.


It has been in ath10k-ct for a while, and that has some fairly wide coverage
in OpenWrt, so likely if there were problems we would have seen it already.

I did not make any specific changes to firmware to support this, so upstream
firmware should behave similarly.

Seems like upstream ath10k could really benefit from having some test beds
so you can actually test code on different chips and have confidence
in your changes!

Thanks,
Ben


Re: Management rate-control on IPQ4019

2020-02-19 Thread Ben Greear


On 02/18/2020 03:06 PM, David Bauer wrote:

Hello Ben,

On 2/18/20 8:58 PM, Ben Greear wrote:

On 02/18/2020 11:12 AM, David Bauer wrote:

Hello,

while playing around with the 2.4GHz WiFi part of the IPQ4019, I was
expecting to be able to set the rate at which the IPQ4019 transmits its beacon
frames.

Using OpenWrt, setting "legacy_rates=0" on the radio leads to only advertising
802.11g speeds; however, the beacons are still sent out at 1 Mbit/s. Using a
QCA9984, the beacons are correctly sent out at the lowest 802.11g rate
(6 Mbit/s). So I assume this is either a bug in the ath10k firmware or a
hardware shortcoming. Has anyone else experienced this bug, and is it likely
we'll see it fixed in a later firmware release?

Hardware: IPQ4029 (Aruba AP-303)
Firmware  Version: 10.4-3.6-00140 / 10.4-3.5.3-00078


There are separate APIs for setting management frame rates.  I forget exactly
how upstream supports this, but maybe check debugfs?


I'm using the mac80211 interface here [0], which works well for the QCA9984,
but not for the IPQ4019. I'm not aware of a debugfs interface in ath10k for
setting the management rate.

I can try the one ath10k-ct implements, but the fact that it works on the
QCA9984 makes me believe the culprit is the firmware. The patch adding support
for mgmt-rate setting also does not list the IPQ4019 as a tested platform.

[0] https://patchwork.kernel.org/patch/10593573/


Ok, maybe so.  I compile all of the wave-2 targets from the same firmware
source, but maybe upstream 4019 firmware lags others for one reason or another.

If you want to try the -ct firmware/driver, please search for "Set multicast,
broadcast, beacon tx rates." in the link below:

http://www.candelatech.com/ath10k-ug.php

Possibly these driver changes will work with upstream firmware; I have not
tried it.

# cat /debug/ieee80211/wiphy1/ath10k/set_rates
This is to set fixed bcast, mcast, and beacon rates.  Normal rate-ctrl
is handled through normal API using 'iw', etc.
To set a value, you specify the dev-name, type, band and rate-code:
types: bcast, mcast, beacon
bands: 2, 5, 60
rate-codes: 0x43 1M, 0x42 2M, 0x41 5.5M, 0x40 11M, 0x3 6M, 0x7 9M, 0x2 12M, 0x6 18M, 0x1 24M, 0x5 36M, 0x0 48M, 0x4 54M, 0xFF default
 For example, to set beacon to 18Mbps on wlan0:  echo "wlan0 beacon 2 0x6" > /debug//set_rates

I'm not sure if 'beacon' also controls other mgt frames or not without
reviewing the code.
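The rate-code table in that help text can be turned into a lookup when scripting this. Below is a hypothetical C helper that builds the line to write to set_rates. The codes are copied from the help text above; the function names and the choice to reject unknown rates with -1 are assumptions, and writing the string to the (elided) debugfs path is left out.

```c
#include <stddef.h>
#include <stdio.h>

struct rate_code {
    unsigned int kbps;
    unsigned int code;
};

/* Copied from the set_rates help text above (0xFF means "default"). */
static const struct rate_code rate_codes[] = {
    { 1000, 0x43 }, { 2000, 0x42 }, { 5500, 0x41 }, { 11000, 0x40 },
    { 6000, 0x3 },  { 9000, 0x7 },  { 12000, 0x2 }, { 18000, 0x6 },
    { 24000, 0x1 }, { 36000, 0x5 }, { 48000, 0x0 }, { 54000, 0x4 },
};

/* Return the rate-code for a legacy rate, or -1 if it is not in the table. */
static int rate_code_for(unsigned int kbps)
{
    for (size_t i = 0; i < sizeof(rate_codes) / sizeof(rate_codes[0]); i++)
        if (rate_codes[i].kbps == kbps)
            return (int)rate_codes[i].code;
    return -1;
}

/* Build "<dev> <type> <band> <code>", e.g. "wlan0 beacon 2 0x6".
 * Returns the snprintf() result, or -1 for an unknown rate. */
static int format_set_rates(char *buf, size_t len, const char *dev,
                            const char *type, int band, unsigned int kbps)
{
    int code = rate_code_for(kbps);

    if (code < 0)
        return -1;
    return snprintf(buf, len, "%s %s %d 0x%X", dev, type, band,
                    (unsigned int)code);
}
```

For the help text's own example, format_set_rates(buf, sizeof(buf), "wlan0", "beacon", 2, 18000) produces "wlan0 beacon 2 0x6".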

Thanks,
Ben



Best wishes
David



Thanks,
Ben








Re: Management rate-control on IPQ4019

2020-02-18 Thread Ben Greear


On 02/18/2020 11:12 AM, David Bauer wrote:

Hello,

while playing around with the 2.4GHz WiFi part of the IPQ4019, I was
expecting to be able to set the rate at which the IPQ4019 transmits its beacon
frames.

Using OpenWrt, setting "legacy_rates=0" on the radio leads to only advertising
802.11g speeds; however, the beacons are still sent out at 1 Mbit/s. Using a
QCA9984, the beacons are correctly sent out at the lowest 802.11g rate
(6 Mbit/s). So I assume this is either a bug in the ath10k firmware or a
hardware shortcoming. Has anyone else experienced this bug, and is it likely
we'll see it fixed in a later firmware release?

Hardware: IPQ4029 (Aruba AP-303)
Firmware  Version: 10.4-3.6-00140 / 10.4-3.5.3-00078


There are separate APIs for setting management frame rates.  I forget exactly
how upstream supports this, but maybe check debugfs?

Thanks,
Ben



Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels

2019-12-18 Thread Ben Greear





On 12/18/2019 12:05 AM, Justin Capella wrote:

Don't mean to steal your thread here, but since it's being discussed--
is there something that can be done to provide more accurate/precise
data? Use of the default is widespread so not a reason to hold back
the patch imo, but with a proposed pcap-ng capture information block
they would become more accessible and maybe there will be increased
interest in real values.


It would take some large effort up and down the stack, but we could potentially
report the raw data for the secondary frequencies.  Probably that is of so
little use for the general user that it is not worth the effort.

You could just uncomment the printk in my patch if you are curious, or perhaps
add some debugfs API if you wanted to get at lots of data with run-time config
change.



Any way to fill out IEEE80211_RADIOTAP_DBM_ANT{SIGNAL,NOISE}?


Per-antenna rssi is already in wireshark capture for ath10k-ct.  I'm pretty
sure it is working in upstream ath10k too.


I recall from another thread that there isn't currently periodic
calibration but the floor could change with environment too.


I don't think it is correct to say periodic calibration does not happen with
ath10k.  Maybe very old wave-1 firmware has some issues, but recent stuff
appears to work.  I do see the reported noise floor changing on 9984.

Thanks,
Ben



On Tue, Dec 17, 2019 at 8:05 PM Sebastian Gottschall
 wrote:



On 18.12.2019 at 03:37, Ben Greear wrote:



On 12/17/2019 06:12 PM, Sebastian Gottschall wrote:

I don't know what you want to compare here.

1. You are comparing two different WiFi chipsets. Both have different
sensitivity and overall output power specs.

2. Both have a different number of antenna chains, which does make a
difference in input sensitivity.

3. The patch Ben made has no effect on QCA9880 chipsets. It only
takes effect on 10.4-based chipsets like the 9984.


The part of my patch that sums secondary frequencies should apply to
wave-1 as well, but I have not verified that yet.

Yeah, right. Sorry, I was just looking at the total signal sum, which uses
rssi_comb_ht.




About noise floors in general: noise floors of -108 are bogus. There is a
physical limit to how low a noise level can be. Since drivers like ath9k do
a cyclic calibration, the noise value might indeed change, but this
calibration does not run in real time; it is cyclic. I'm not aware whether
chipsets like the QCA988x work the same way, but since the QCA988x has some
similarities with ath9k chipsets, unlike the newer 9984 variants, it could
be. The 30 seconds mentioned in the bug report fits my expectations of the
early noise floor calibration, which uses a short delay and, after success,
switches to a long delay. Anyway, in this early calibration phase signals
might change and will stabilize afterwards. This isn't an issue, since your
connection will work anyway, even if it might take a little longer when you
have poor signal levels.

@ben. Am I wrong, or what do you think?


I don't know enough about how the noise floor calculations are done or
how they apply to settings to know the answer.

I will be happy in general if ath10k wave-1, wave-2, and ath9k report
similar RSSI for similar setups.

That will not work. You are comparing different chipsets, and depending on
the implementation by the card vendor, RF sensitivity can be very different.
The same goes for output power. Some vendors use additional RF amps to boost
output power (Ubiquiti is the best example here). These amps may also
influence sensitivity. On these cards you set 10 dB output power, but in fact
they output 18 dB, so there is a bias offset on these cards or devices (the
offset depends on the device model).

What you measure is what the chip receives, but not what was lost in the
PCB layout (or was even generated, in the case of noise). And when it comes
to calibration data, the correct approach would be to calibrate each
individual card before shipment. In reality, manufacturers do the
calibration on a single reference card and clone it onto all following cards
to save time. The result depends on the day or week of production and the
current position of the moon and sun. Errors of +/- 2 dB are common here.
(This is not true for all card or device vendors.)



If you look at the tx-rate-power table in ath10k, for instance, you can see
different MCS are transmitted at different signal levels.  So, some change
from initial conditions might be because higher MCS is being transmitted
after rate-ctrl scales up?

Yes, this is modulation related: the higher the rate goes, the lower the
power will be. That's the principle of QAM. And the rate control itself isn't
signal based but error-rate based, so high packet loss triggers the rate
control to lower the rate, which results in increased output power, and vice
versa. But as mentioned, at card startup a noise floor calibration starts,
which may succeed or fail. If it succeeds, it turns into a long-delay phase,
i.e. cyclic calibration. The calibration time is exactly 30 seconds (minimum

Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels

2019-12-17 Thread Ben Greear





On 12/17/2019 06:12 PM, Sebastian Gottschall wrote:

I don't know what you want to compare here.

1. You are comparing two different WiFi chipsets. Both have different
sensitivity and overall output power specs.

2. Both have a different number of antenna chains, which does make a
difference in input sensitivity.

3. The patch Ben made has no effect on QCA9880 chipsets. It only
takes effect on 10.4-based chipsets like the 9984.


The part of my patch that sums secondary frequencies should apply to
wave-1 as well, but I have not verified that yet.



About noise floors in general: noise floors of -108 are bogus. There is a
physical limit to how low a noise level can be. Since drivers like ath9k do
a cyclic calibration, the noise value might indeed change, but this
calibration does not run in real time; it is cyclic. I'm not aware whether
chipsets like the QCA988x work the same way, but since the QCA988x has some
similarities with ath9k chipsets, unlike the newer 9984 variants, it could
be. The 30 seconds mentioned in the bug report fits my expectations of the
early noise floor calibration, which uses a short delay and, after success,
switches to a long delay. Anyway, in this early calibration phase signals
might change and will stabilize afterwards. This isn't an issue, since your
connection will work anyway, even if it might take a little longer when you
have poor signal levels.

@ben. Am I wrong, or what do you think?


I don't know enough about how the noise floor calculations are done or
how they apply to settings to know the answer.

I will be happy in general if ath10k wave-1, wave-2, and ath9k report
similar RSSI for similar setups.

If you look at the tx-rate-power table in ath10k, for instance, you can see
different MCS are transmitted at different signal levels.  So, some change
from initial conditions might be because higher MCS is being transmitted
after rate-ctrl scales up?

Lots of moving parts...

Thanks,
Ben



Sebastian

On 18.12.2019 at 00:37, Tom Psyborg wrote:

also noticed now that the noise floor changes with signal strength as
described in this bug report:
https://www.mail-archive.com/ath10k@lists.infradead.org/msg11553.html

after wifi restart

iwinfo:

signal: -59dBm noise: -108dBm

then goes to

signal: -52dBm noise: -103dBm

and finally drops to

signal: -59dBm noise: -103dBm






Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels

2019-12-17 Thread Ben Greear


On 12/17/19 3:37 PM, Tom Psyborg wrote:

also noticed now that the noise floor changes with signal strength as
described in this bug report:
https://www.mail-archive.com/ath10k@lists.infradead.org/msg11553.html

after wifi restart

iwinfo:

signal: -59dBm noise: -108dBm

then goes to

signal: -52dBm noise: -103dBm

and finally drops to

signal: -59dBm noise: -103dBm



The problem with debugging this sort of stuff is that you need an RF scope
to determine whether signal power of transmitter is changing or receiver
is reporting stuff weirdly.

If you are comparing against ath9k, probably you need to force your ath10k
station to do /n only (or change your AP to do /n only) so that you are
comparing similar MCS rates.
rates.

Thanks,
Ben


Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels

2019-12-17 Thread Ben Greear


On 12/17/19 10:29 AM, Tom Psyborg wrote:

On 17/12/2019, Ben Greear  wrote:

On 12/17/19 8:23 AM, Justin Capella wrote:

I believe someone recently submitted a patch that defined noise floors
per band (2/5).


I looked at using the real noise floor.  Our radio was reporting a noise
floor of around -102, where the hard-coded default is -95.  This of course
would make the reported RSSI lower by 7 dB in that case.  I am not sure that
is correct.



Hi

I am getting similar NF values with all my ath10k devices. I thought that
since ath9k the default was changed from -95 to -115, just like in the
vendor driver? There were some discussions about it on the mailing list.
On some channels (5GHz) the value goes down to about -107; I even saw
-110 once.



If you use ath9k and ath10k on same channel/environment, do you see similar
RSSI reported (especially with the ath10k patch I just posted)?

Thanks,
Ben


Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels

2019-12-17 Thread Ben Greear


On 12/17/19 8:23 AM, Justin Capella wrote:

I believe someone recently submitted a patch that defined noise floors
per band (2/5).


I looked at using the real noise floor.  Our radio was reporting a noise
floor of around -102, where the hard-coded default is -95.  This of course
would make the reported RSSI lower by 7 dB in that case.  I am not sure that
is correct.

If this were to be implemented that way, then the firmware would have to be
queried for the noise floor in a better way than it is currently done.  So, I
am not planning to work on that soon.

Someone could post-process RSSI based on the reported noise floor if they want
to adjust the values in user-space, for instance.
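A sketch of that user-space post-processing, assuming (as in upstream ath10k, where the hard-coded default is ATH10K_DEFAULT_NOISE_FLOOR, -95) that the reported signal is the firmware's RSSI added to the default noise floor. Re-basing onto a measured noise floor is then a simple offset. The function name is hypothetical, and where the measured value comes from (for example survey data) is left open.

```c
/* Default noise floor (dBm) assumed to be added to the firmware's RSSI. */
#define DEFAULT_NOISE_FLOOR_DBM (-95)

/* Hypothetical helper: swap the default noise floor in a reported signal
 * for a measured one. */
static int rebase_signal_dbm(int reported_dbm, int measured_nf_dbm)
{
    return reported_dbm - DEFAULT_NOISE_FLOOR_DBM + measured_nf_dbm;
}
```

With the -102 noise floor mentioned above, a reported -50 dBm becomes -57 dBm, which is the 7 dB drop described.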


Can't say I'm a fan of the hacky code, in particular the if/else for
min/max // maybe abs(a-b)?


I like open coded stuff.  I'm more concerned that maybe the math could
be improved, but it seems to work pretty well in our testing.

Either way, please comment inline so that it is more obvious exactly
what code you are talking about.
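For comparison, the abs()-based form hinted at above can be written so that it behaves the same as the open-coded ath10k_sum_sigs_2() in the patch: +3 dB for equal values, +2 for a 1 dB gap, +1 for a 2 dB gap, nothing beyond that, and b == 0x80 ("not valid") returns a unchanged. The function and macro names are hypothetical; this is a sketch for discussion, not a proposed replacement.

```c
#include <stdlib.h>

/* 0x80 marks a per-chain RSSI value as "not valid". */
#define RSSI_INVALID 0x80

/* Same results as the patch's ath10k_sum_sigs_2(); like the patch,
 * only b is checked for the invalid marker. */
static int sum_sigs_2(int a, int b)
{
    int hi, diff;

    if (b == RSSI_INVALID)
        return a;
    hi = a > b ? a : b;
    diff = abs(a - b);
    return hi + (diff < 3 ? 3 - diff : 0);
}
```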



if (e40 != 0x80) { // what's this case about?


0x80 means 'value is not valid'.  I can add a comment about that.



Are there reasons to not use log?


I don't want to use log in the rx path, it would very likely decrease
rx performance, especially on lower powered systems.
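For reference, this is the exact sum that the patch approximates, for two signals a and b in dBm. Factoring out the larger term shows that only the gap |a - b| matters:

```latex
\begin{aligned}
P_{\text{sum}} &= 10\log_{10}\left(10^{a/10} + 10^{b/10}\right) \\
               &= \max(a,b) + 10\log_{10}\left(1 + 10^{-|a-b|/10}\right)
\end{aligned}
```

The correction term is about 3.01, 2.54, 2.12 and 1.76 dB for gaps of 0, 1, 2 and 3 dB. The patch's table (+3, +2, +1, then 0) rounds it down to whole dB and drops it entirely from a 3 dB gap on, where the exact value is still about 1.76 dB. In exchange, it avoids calling log10() for every frame in the rx path.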


Thanks,
Ben






On Tue, Dec 17, 2019 at 7:59 AM Sebastian Gottschall wrote:




Currently debugging in your code, but I have already seen that the
values are now wrong for this chipset.


Thanks for testing.  I'll add a check for 0 and ignore that value
too.  Does that seem OK?

I already tested the 0 check and it works.


Were the per-chain values OK?

On 9984 I see no disadvantage so far. It seems to work and the values look
sane. I will do a side-by-side comparison later today on 9984.


Thanks,
Ben



On 16.12.2019 at 23:07, gree...@candelatech.com wrote:

From: Ben Greear 

This makes per-chain RSSI more consistent between HT20, HT40 and HT80.
Instead of doing precise log math to add dBm values, I used a rough
estimate; it seems to work well enough.

Tested on ath10k-ct 9984 firmware.

Signed-off-by: Ben Greear 
---
   drivers/net/wireless/ath/ath10k/htt_rx.c  | 64 ---
   drivers/net/wireless/ath/ath10k/rx_desc.h |  3 +-
   2 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 13f652b622df..034d4ace228d 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1167,6 +1167,44 @@ static bool ath10k_htt_rx_h_channel(struct ath10k *ar,
 	return true;
 }
 
+static int ath10k_sum_sigs_2(int a, int b) {
+	int diff;
+
+	if (b == 0x80)
+		return a;
+
+	if (a >= b) {
+		diff = a - b;
+		if (diff == 0)
+			return a + 3;
+		else if (diff == 1)
+			return a + 2;
+		else if (diff == 2)
+			return a + 1;
+		return a;
+	}
+	else {
+		diff = b - a;
+		if (diff == 0)
+			return b + 3;
+		else if (diff == 1)
+			return b + 2;
+		else if (diff == 2)
+			return b + 1;
+		return b;
+	}
+}
+
+static int ath10k_sum_sigs(int p20, int e20, int e40, int e80) {
+	/* Hacky attempt at summing dbm without resorting to log(10) business */
+	if (e40 != 0x80) {
+		return ath10k_sum_sigs_2(ath10k_sum_sigs_2(p20, e20), ath10k_sum_sigs_2(e40, e80));
+	}
+	else {
+		return ath10k_sum_sigs_2(p20, e20);
+	}
+}
+
 static void ath10k_htt_rx_h_signal(struct ath10k *ar,
 				   struct ieee80211_rx_status *status,
 				   struct htt_rx_desc *rxd)
@@ -1177,18 +1215,32 @@ static void ath10k_htt_rx_h_signal(struct ath10k *ar,
 		status->chains &= ~BIT(i);
 		if (rxd->ppdu_start.rssi_chains[i].pri20_mhz != 0x80) {
-			status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR +
-				rxd->ppdu_start.rssi_chains[i].pri20_mhz;
+			status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR
+				+ ath10k_sum_sigs(rxd->ppdu_start.rssi_chains[i].pri20_mhz,
+						  rxd->ppdu_start.rssi_chains[i].ext20_mhz,
+						  rxd->ppdu_start.rssi_chains[i].ext40_mhz,
+						  rxd->ppdu_start.rssi_chains[i].ext80_mhz);
+			//ath10k_warn(ar, "rx-h-sig, chain[%i] pri20: %d ext20: %d  ext40: %d  ext80: %d\n",
+			//i, rxd->ppdu_start.rssi_chains[i].pri20_mhz, rxd->ppdu_start.rssi_chains[i].ext20_mhz,
+			// rxd->ppdu_start.rssi_chains[i].ext40_mhz, rxd->ppdu_start.rssi_chains[i].ext80_mhz);
 			status->chains |= BIT(i);
 		}
 	}
 	/* FIXME: Get real NF */
-	status->signal = ATH10K_DEFAULT_NOISE_FLOOR +
-			 rxd->ppdu_start.rssi_comb;
-	/* ath10k_warn(ar, "rx-h-sig, signal: %d  chains: 0x%x chain[0]: %d  chain[1]: %d  chan[2]: %d\n",
-

Re: [PATCH] ath10k: Per-chain rssi should sum the secondary channels

2019-12-17 Thread Ben Greear





On 12/17/2019 04:32 AM, Sebastian Gottschall wrote:

result of my tests

On QCA988x, rxd->ppdu_start.rssi_comb_ht is always zero, so you need to add an
additional check.

On 17.12.2019 at 13:02, Sebastian Gottschall wrote:

I see an issue in your patch for QCA988x chipsets:

+	if (rxd->ppdu_start.rssi_comb_ht != 0x80) {
+		status->signal = ATH10K_DEFAULT_NOISE_FLOOR +
+			rxd->ppdu_start.rssi_comb_ht;
+	}


This is always true for QCA988x, but the field is not provided on these older
chipsets, so signal reporting will be broken.
I'm currently debugging in your code, but I have already seen that the values
are now wrong for this chipset.


Thanks for testing.  I'll add a check for 0 and ignore that value too.  Does that
seem OK?

Were the per-chain values OK?
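A sketch of the fallback being discussed, written as a standalone function: use `rssi_comb_ht` only when it is neither the 0x80 "not valid" marker nor 0 (which, per the report above, is what QCA988x always returns), and otherwise use `rssi_comb`. The function name and the `uint8_t` parameter types are assumptions for illustration; in the driver the values come from `rxd->ppdu_start` and the default noise floor is `ATH10K_DEFAULT_NOISE_FLOOR`.

```c
#include <stdint.h>

#define DEFAULT_NOISE_FLOOR_DBM (-95) /* same value as the driver default */
#define RSSI_INVALID 0x80             /* "value is not valid" marker */

/*
 * Prefer the HT combined RSSI, but fall back to the legacy combined RSSI
 * when the HT field is marked invalid or reads as 0 (older chipsets that
 * do not fill it in).
 */
static int pick_signal_dbm(uint8_t rssi_comb, uint8_t rssi_comb_ht)
{
	if (rssi_comb_ht != RSSI_INVALID && rssi_comb_ht != 0)
		return DEFAULT_NOISE_FLOOR_DBM + rssi_comb_ht;

	return DEFAULT_NOISE_FLOOR_DBM + rssi_comb;
}
```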

Thanks,
Ben



On 16.12.2019 at 23:07, gree...@candelatech.com wrote:

From: Ben Greear 

This makes per-chain RSSI more consistent between HT20, HT40 and HT80.
Instead of doing precise log math to add dBm values, I used a rough estimate;
it seems to work well enough.

Tested on ath10k-ct 9984 firmware.

Signed-off-by: Ben Greear 
---
  drivers/net/wireless/ath/ath10k/htt_rx.c  | 64 ---
  drivers/net/wireless/ath/ath10k/rx_desc.h |  3 +-
  2 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 13f652b622df..034d4ace228d 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1167,6 +1167,44 @@ static bool ath10k_htt_rx_h_channel(struct ath10k *ar,
 	return true;
 }
 
+static int ath10k_sum_sigs_2(int a, int b) {
+	int diff;
+
+	if (b == 0x80)
+		return a;
+
+	if (a >= b) {
+		diff = a - b;
+		if (diff == 0)
+			return a + 3;
+		else if (diff == 1)
+			return a + 2;
+		else if (diff == 2)
+			return a + 1;
+		return a;
+	}
+	else {
+		diff = b - a;
+		if (diff == 0)
+			return b + 3;
+		else if (diff == 1)
+			return b + 2;
+		else if (diff == 2)
+			return b + 1;
+		return b;
+	}
+}
+
+static int ath10k_sum_sigs(int p20, int e20, int e40, int e80) {
+	/* Hacky attempt at summing dbm without resorting to log(10) business */
+	if (e40 != 0x80) {
+		return ath10k_sum_sigs_2(ath10k_sum_sigs_2(p20, e20), ath10k_sum_sigs_2(e40, e80));
+	}
+	else {
+		return ath10k_sum_sigs_2(p20, e20);
+	}
+}
+
 static void ath10k_htt_rx_h_signal(struct ath10k *ar,
 				   struct ieee80211_rx_status *status,
 				   struct htt_rx_desc *rxd)
@@ -1177,18 +1215,32 @@ static void ath10k_htt_rx_h_signal(struct ath10k *ar,
 		status->chains &= ~BIT(i);
 		if (rxd->ppdu_start.rssi_chains[i].pri20_mhz != 0x80) {
-			status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR +
-				rxd->ppdu_start.rssi_chains[i].pri20_mhz;
+			status->chain_signal[i] = ATH10K_DEFAULT_NOISE_FLOOR
+				+ ath10k_sum_sigs(rxd->ppdu_start.rssi_chains[i].pri20_mhz,
+						  rxd->ppdu_start.rssi_chains[i].ext20_mhz,
+						  rxd->ppdu_start.rssi_chains[i].ext40_mhz,
+						  rxd->ppdu_start.rssi_chains[i].ext80_mhz);
+			//ath10k_warn(ar, "rx-h-sig, chain[%i] pri20: %d ext20: %d  ext40: %d  ext80: %d\n",
+			//i, rxd->ppdu_start.rssi_chains[i].pri20_mhz, rxd->ppdu_start.rssi_chains[i].ext20_mhz,
+			// rxd->ppdu_start.rssi_chains[i].ext40_mhz, rxd->ppdu_start.rssi_chains[i].ext80_mhz);
 			status->chains |= BIT(i);
 		}
 	}
 	/* FIXME: Get real NF */
-	status->signal = ATH10K_DEFAULT_NOISE_FLOOR +
-			 rxd->ppdu_start.rssi_comb;
-	/* ath10k_warn(ar, "rx-h-sig, signal: %d  chains: 0x%x chain[0]: %d  chain[1]: %d  chan[2]: %d\n",
-		   status->signal, status->chains, status->chain_signal[0], status->chain_signal[1], status->chain_signal[2]); */
+	if (rxd->ppdu_start.rssi_comb_ht != 0x80) {
+		status->signal = ATH10K_DEFAULT_NOISE_FLOOR +
+			rxd->ppdu_start.rssi_comb_ht;
+	}
+	else {
+		status->signal = ATH10K_DEFAULT_NOISE_FLOOR +
+			rxd->ppdu_start.rssi_comb;
+	}
+
+	//ath10k_warn(ar, "rx-h-sig, signal: %d  chains: 0x%x chain[0]: %d  chain[1]: %d  chain[2]: %d chain[3]: %d\n",
+	//status->signal, status->chains, status->chain_signal[0],
+	//status->chain_signal[1], status->chain_signal[2], status->chain_signal[3]);
 	status->flag &= ~RX_FLAG_NO_SIGNAL_VAL;
 }

diff --git a/drivers/net/wireless/ath/ath10k/rx_desc.h b/drivers/net/wireless/ath/ath10k/rx_desc.h
index dec1582005b9..6b44677474dd 100644
--- a/drivers/net/wireless/ath/ath10k/rx_desc.h
+++ b/drivers/net/wirele

Re: [PATCH] ath10k: Fix setting txpower to zero.

2019-12-13 Thread Ben Greear


On 12/12/19 9:14 AM, gree...@candelatech.com wrote:

From: Ben Greear 

Do not ignore 0 txpower setting unless the vif is of type p2p.


I think my patch has problems: secondary stations also have uninitialized
txpower when they are first created and start scanning.

So, I'm going to try setting txpower to -1 in mac80211 and use that
to mean 'unset'.
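A sketch of the "-1 means unset" idea applied to the recalculation loop: only the unset marker is skipped, so a deliberately configured 0 dBm now takes part in the minimum. `TXPOWER_UNSET` and `txpower_recalc()` are hypothetical names, the array stands in for walking `ar->arvifs`, and the mac80211 side of the change is not shown.

```c
#include <stddef.h>

/* Hypothetical marker meaning "txpower was never configured". */
#define TXPOWER_UNSET (-1)

/*
 * Return the minimum configured txpower across all vifs, skipping vifs
 * whose txpower is still unset. A configured value of 0 is honoured.
 * Returns TXPOWER_UNSET if no vif has a configured txpower.
 */
static int txpower_recalc(const int *vif_txpower, size_t n_vifs)
{
	int txpower = TXPOWER_UNSET;
	size_t i;

	for (i = 0; i < n_vifs; i++) {
		if (vif_txpower[i] == TXPOWER_UNSET)
			continue;
		if (txpower == TXPOWER_UNSET || vif_txpower[i] < txpower)
			txpower = vif_txpower[i];
	}

	return txpower;
}
```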

Thanks,
Ben



This should fix a regression introduced by:

commit 88407beb1b1462f706a1950a355fd086e1c450b6
Author: Ryan Hsu 
Date:   Tue Dec 13 14:55:19 2016 -0800

 ath10k: fix incorrect txpower set by P2P_DEVICE interface

Tested (without p2p in use) on 9984 with ath10k-ct firmware, but I don't think
this is firmware specific.

Signed-off-by: Ben Greear 
---
  drivers/net/wireless/ath/ath10k/mac.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 289d03da14b2..1c5e1b5570f8 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -5902,11 +5902,18 @@ static int ath10k_mac_txpower_recalc(struct ath10k *ar)
 {
 	struct ath10k_vif *arvif;
 	int ret, txpower = -1;
+	int p2p_st;
+
+	p2p_st = ath10k_wmi_get_vdev_subtype(ar, WMI_VDEV_SUBTYPE_P2P_DEVICE);
 
 	lockdep_assert_held(&ar->conf_mutex);
 
 	list_for_each_entry(arvif, &ar->arvifs, list) {
-		if (arvif->txpower <= 0)
+		/* p2p may not initialize txpower, and we should ignore it
+		 * in that case.
+		 */
+		if ((arvif->txpower < 0) ||
+		    ((arvif->txpower == 0) && (arvif->vdev_subtype == p2p_st)))
 			continue;
 
 		if (txpower == -1)





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] ath10k: set WMI_PEER_AUTHORIZE after a firmware crash

2019-12-02 Thread Ben Greear


On 12/1/19 8:45 PM, Justin Capella wrote:

Are there security concerns here? Was the peer known to be authorized
beforehand? Would it be better to just trash the peer in the event of
a fw crash?


I think you should completely re-associate the peer(s) when firmware
crashes.  The driver does not cache all possible changes, so it cannot
exactly rebuild the config to the previous state.

Thanks,
Ben



On Thu, Nov 28, 2019 at 11:46 PM Kalle Valo  wrote:


Wen Gong  wrote:


After a firmware crash, ath10k recovers via ieee80211_reconfig(),
which eventually reconfigures the firmware, including the
encryption keys. However, there is no new auth/assoc and
4-way handshake, and the firmware only sets the authorize flag after the
4-way handshake, so the authorize flag is never set in firmware during
recovery. This leads to data transmission failures after recovery
when using encrypted connections like WPA-PSK. Setting the authorize
flag after installing the keys to firmware fixes the issue.

This was noticed by testing firmware crashes using the simulate_fw_crash
debugfs file.

Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-7-QCARMSWP-1.

Signed-off-by: Wen Gong 
Signed-off-by: Kalle Valo 


Patch applied to ath-next branch of ath.git, thanks.

382e51c139ef ath10k: set WMI_PEER_AUTHORIZE after a firmware crash

--
https://patchwork.kernel.org/patch/11263357/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches






--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH net-next] ath10k: fix RX of frames with broken FCS in monitor mode

2019-11-07 Thread Ben Greear




On 11/07/2019 06:03 AM, Linus Lüssing wrote:

On Tue, Nov 05, 2019 at 09:19:20AM -0800, Ben Greear wrote:

Thanks for adding the counter.  Since it is u32, I doubt you need the spin lock
below?


Ok, I can remove the spin-lock.

Just for clarification, though: if I recall correctly, an increment operator
is not guaranteed to be atomic. But you think it's unlikely
to race with a concurrent ++, and therefore it's fine for just a debug counter?
(And if it did race, it would just be a missed +1.)


I think it is fine to be off by one, and u32 reads and writes are atomic, so you
would never read a really weird number, as you can when a u64 is incremented
non-atomically.
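To see why a u64 is different, here is a deterministic, single-threaded model of a 64-bit counter that a 32-bit CPU updates with two separate 32-bit stores, with a "reader" sampling it between the two stores. This only models the interleaving; it is not real concurrent code, and the struct and function names are hypothetical. A u32 read, by contrast, sees either the old or the new value.

```c
#include <stdint.h>

/* Model of a 64-bit counter stored as two 32-bit words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t read_split(const struct split_u64 *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

/*
 * Increment with two separate stores (low word first, then the carry into
 * the high word) and return what a reader would see between them.
 */
static uint64_t torn_read_during_increment(struct split_u64 *c)
{
	uint32_t new_lo = c->lo + 1;
	uint32_t carry = (new_lo == 0);
	uint64_t seen;

	c->lo = new_lo;        /* first store */
	seen = read_split(c);  /* reader runs here */
	c->hi += carry;        /* second store */

	return seen;
}
```

Starting from 0xFFFFFFFF, the reader sees 0 instead of 0xFFFFFFFF or 0x100000000, which is off by about 4 billion rather than by one missed increment.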

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH net-next] ath10k: fix RX of frames with broken FCS in monitor mode

2019-11-05 Thread Ben Greear


On 11/5/19 8:49 AM, Linus Lüssing wrote:

From: Linus Lüssing 

So far, frames were forwarded regardless of FCS correctness, leading
userspace applications listening on the monitor mode interface to
receive potentially broken frames, even with the "fcsfail" flag unset.

By default, with the "fcsfail" flag of a monitor mode interface
unset, frames with FCS errors should be dropped. With this patch, the
fcsfail flag is taken into account correctly.
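The drop condition in this patch reduces to a small predicate. This sketch uses stand-in flag values (the real `FIF_FCSFAIL` and `RX_FLAG_FAILED_FCS_CRC` come from mac80211) and a hypothetical stats struct, and, like v1, it does not look at `ar->monitor`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits for this sketch; the real values come from mac80211. */
#define SKETCH_FIF_FCSFAIL            (1u << 0)
#define SKETCH_RX_FLAG_FAILED_FCS_CRC (1u << 0)

struct rx_drop_stats {
	uint32_t rx_crc_err_drop; /* debug counter, like the one added in v1 */
};

/*
 * Return true if the frame should be dropped: its FCS check failed and
 * nobody asked (via the "fcsfail" monitor flag) to see such frames.
 */
static bool should_drop_fcs_fail(uint32_t filter_flags, uint32_t rx_flag,
				 struct rx_drop_stats *stats)
{
	if (!(filter_flags & SKETCH_FIF_FCSFAIL) &&
	    (rx_flag & SKETCH_RX_FLAG_FAILED_FCS_CRC)) {
		stats->rx_crc_err_drop++;
		return true;
	}
	return false;
}
```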

Cc: Simon Wunderlich 
Signed-off-by: Linus Lüssing 
---
This was tested on an Open Mesh A41 device, featuring a QCA4019. And
with this firmware:

https://www.candelatech.com/downloads/ath10k-4019-10-4b/firmware-5-ct-full-community-12.bin-lede.011

But from looking at the code, it seems that vanilla ath10k has the
same issue, so I am submitting it here.

Changelog RFC->v1:

* removed "ar->monitor" check
* added a debug counter


Thanks for adding the counter.  Since it is u32, I doubt you need the spin lock
below?

--Ben


+	if (!(ar->filter_flags & FIF_FCSFAIL) &&
+	    status->flag & RX_FLAG_FAILED_FCS_CRC) {
+		spin_lock_bh(&ar->data_lock);
+		ar->stats.rx_crc_err_drop++;
+		spin_unlock_bh(&ar->data_lock);
+
+		dev_kfree_skb_any(skb);
+		return;
+	}
+
 	ath10k_dbg(ar, ATH10K_DBG_DATA,
 		   "rx skb %pK len %u peer %pM %s %s sn %u %s%s%s%s%s%s %srate_idx %u vht_nss %u freq %u band %u flag 0x%x fcs-err %i mic-err %i amsdu-more %i\n",
 		   skb,




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [RFC] ath10k: interface combination with monitor

2019-11-01 Thread Ben Greear


On 11/1/19 10:03 AM, Tom Psyborg wrote:

Hi

Is there a way to run a monitor mode interface independently of the STA/AP
interface's presence or state?
I am using airodump-ng/airmon-ng and I've noticed that while the mon
interface is up, airodump-ng is unable to find any beacons
unless the STA interface is brought down. That is with QCA9880 devices,
while with QCA9377 airodump-ng only finds beacons if the STA interface
is associated to an AP.
Does this need a firmware change to work, or are driver changes sufficient?



I would expect it to work.  Have you tried the -ct firmware on 9880 in this
way?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [RFC PATCH] ath10k: fix RX of frames with broken FCS in monitor mode

2019-11-01 Thread Ben Greear


On 11/1/19 4:11 AM, Linus Lüssing wrote:

From: Linus Lüssing 

So far, frames were forwarded regardless of FCS correctness, leading
userspace applications listening on the monitor mode interface to
receive potentially broken frames, even with the "fcsfail" flag unset.

By default, with the "fcsfail" flag of a monitor mode interface
unset, frames with FCS errors should be dropped. With this patch, the
fcsfail flag is taken into account correctly.

Signed-off-by: Linus Lüssing 
---
This was tested on an Open Mesh A41 device, featuring a QCA4019. And
with this firmware:

https://www.candelatech.com/downloads/ath10k-4019-10-4b/firmware-5-ct-full-community-12.bin-lede.011

But from looking at the code, it seems that vanilla ath10k has the
same issue, so I am submitting it here.

I'm also not that familiar with the ath10k code yet, so I'm not 100% sure if
this is the right place for this check. Therefore I'm sending it as an RFC.
---
  drivers/net/wireless/ath/ath10k/htt_rx.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 53f1095de8ff..ce0a16ebb8bb 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1285,6 +1285,12 @@ static void ath10k_process_rx(struct ath10k *ar, struct sk_buff *skb)
 
 	status = IEEE80211_SKB_RXCB(skb);
 
+	if (ar->monitor && !(ar->filter_flags & FIF_FCSFAIL) &&
+	    status->flag & RX_FLAG_FAILED_FCS_CRC) {
+		dev_kfree_skb_any(skb);
+		return;
+	}


Maybe it's worth adding a counter like 'rx_drop_crc' to the ath10k_debug struct,
incrementing it here, and also showing it in debugfs and/or ethtool stats?

And maybe drop the check for ar->monitor, in case a frame with a bad CRC is
somehow still received even without monitor mode?

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: WARNING at net/mac80211/sta_info.c:1057 (__sta_info_destroy_part2())

2019-10-21 Thread Ben Greear





On 10/20/2019 08:12 AM, Tomislav Požega wrote:

-11 is -EAGAIN, which would mean that the HTC credits have run out for
some reason for the WMI command:

if (ep->tx_credits < credits) {
	ath10k_dbg(ar, ATH10K_DBG_HTC,
		   "htc insufficient credits ep %d required %d available %d\n",
		   eid, credits, ep->tx_credits);
	spin_unlock_bh(&htc->tx_lock);
	ret = -EAGAIN;
	goto err_pull;
}

Credits can run out, for example, if there's a lot of WMI command/event
activity and the credits are not returned during the 3 s wait, if the
firmware crashed, or if there are problems with the PCI bus.
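A single-threaded model of the credit check quoted above makes the -EAGAIN path concrete. The struct and function names are hypothetical; locking, the 3 s wait and the real `struct ath10k_htc_ep` are left out. If credits are never returned by the target, every later send fails this way.

```c
#include <errno.h>

/* Minimal model of an HTC endpoint's flow-control state (hypothetical). */
struct htc_ep_model {
	int tx_credits;
};

/*
 * Try to consume credits for one message. Like the quoted check, fail
 * with -EAGAIN and leave the credit count untouched if too few remain.
 */
static int htc_take_credits(struct htc_ep_model *ep, int credits)
{
	if (ep->tx_credits < credits)
		return -EAGAIN;

	ep->tx_credits -= credits;
	return 0;
}

/* Called when the target reports that it has freed buffers. */
static void htc_return_credits(struct htc_ep_model *ep, int credits)
{
	ep->tx_credits += credits;
}
```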


Hi

Can this occur if the target memory is not properly allocated?


I have only seen this on wave-1 cards, and it is usually paired with situations
where, as best I can understand, the wave-1 card stops handling WMI-related
interrupts properly.  If I force the firmware to poll instead of waiting for
IRQs, then WMI communication works for a while. I have not implemented that on
the driver side, though, so I still see these WMI timeout issues.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH v2 1/4] mac80211: Rearrange ieee80211_tx_info to make room for tx_time_est

2019-10-18 Thread Ben Greear





On 10/18/2019 05:35 AM, Johannes Berg wrote:

On Fri, 2019-10-18 at 12:15 +0200, Toke Høiland-Jørgensen wrote:

Kan Yan  writes:


The "tx_time_est" field, shared by control and status, is not able to
survive until the skb returns to the mac80211 layer in some
architectures. The same space is defined as driver_data and some
wireless drivers use it for other purposes, as the cb in the sk_buff
is free to be used by any layer.

In the case of ath10k, the tx_time_est gets clobbered by
struct ath10k_skb_cb {
	dma_addr_t paddr;
	u8 flags;
	u8 eid;
	u16 msdu_id;
	u16 airtime_est;
	struct ieee80211_vif *vif;
	struct ieee80211_txq *txq;
} __packed;


Ah, bugger, of course the driver that actually needs this is using the
full driver_data space :P


Looks like you could shrink *this* fairly easily though.

E.g. most likely vif == txq->vif unless txq==NULL, so it's down to 22
bytes plus a bit/flag for knowing whether the pointer is a vif directly
(if no TXQ) or a TXQ.


And of course you get two spare bits (0x3) in every pointer, and likely in the
DMA address too.  Plenty of space!
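The two suggestions (derive the vif from `txq->vif`, and keep a flag in the spare low bits of a pointer) can be combined into a single tagged word. This is a user-space sketch with stub types; it assumes both objects are at least 4-byte aligned, and none of these names exist in ath10k or mac80211.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTR_TAG_TXQ  ((uintptr_t)0x1) /* set: word holds a txq pointer */
#define PTR_TAG_MASK ((uintptr_t)0x3) /* low bits free due to alignment */

struct vif_stub { int id; };
struct txq_stub { struct vif_stub *vif; };

static uintptr_t pack_txq(struct txq_stub *txq)
{
	assert(((uintptr_t)txq & PTR_TAG_MASK) == 0);
	return (uintptr_t)txq | PTR_TAG_TXQ;
}

static uintptr_t pack_vif(struct vif_stub *vif)
{
	assert(((uintptr_t)vif & PTR_TAG_MASK) == 0);
	return (uintptr_t)vif;
}

/* Return the txq, or NULL if the word holds a bare vif. */
static struct txq_stub *unpack_txq(uintptr_t word)
{
	if (!(word & PTR_TAG_TXQ))
		return NULL;
	return (struct txq_stub *)(word & ~PTR_TAG_MASK);
}

/* The vif is either stored directly or reached through txq->vif. */
static struct vif_stub *unpack_vif(uintptr_t word)
{
	struct txq_stub *txq = unpack_txq(word);

	if (txq)
		return txq->vif;
	return (struct vif_stub *)(word & ~PTR_TAG_MASK);
}
```

Storing one tagged word instead of separate `vif` and `txq` pointers saves one pointer in the control buffer.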

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-18 Thread Ben Greear

On 09/18/2019 01:46 AM, Sven Eckelmann wrote:

On Tuesday, 17 September 2019 19:27:50 CEST Sven Eckelmann wrote:
[...]

So whatever the firmware does when it gets a
WMI_BSS_SURVEY_REQ_TYPE_READ_CLEAR, it is not a clear after read. The values
also don't simply wrap around; all of them need some kind of
"fix" like the active time one shown in ath10k_hw_fill_survey_time,
except that the actual "fixes" for them are unknown. To me it looks like
ATH10K_HW_CC_WRAP_SHIFTED_ALL firmware has busy and rx interlinked with
the overflow of total, but tx and rx_bss are actually cleared.

Other than that, the counters wrap every ~14-30 seconds. So we
also need some worker in ath10k that requests new values for all
channels from the firmware every couple of seconds. That already
sounds problematic, because I get
"ath10k_pci :00:00.0: bss channelsurvey timed out" all the time
when requesting surveys manually.
when requesting surveys manually.

I've just tested it on 10.4 (wave-2) cards and it seems like it is cleared as
expected on them. So the change I posted earlier (with a minor fix for
ath10k_hw_fill_survey_time) returns now useful (accumulated) values. This can
be seen in
https://stats.freifunk-vogtland.net/d/ffv_node/nodeinfo?orgId=1=ac86749f4d60=5=1568782046974=1568807068706
(after the reboot at 10:15 UTC+2)

So, as Ben Greear said, the 10.4 firmware version is fixed and 10.2.* (for
the wave-1 cards) is still broken, and we need a QCA firmware engineer to
fix it, or we work around it by polling every couple of seconds and
manually cleaning up the values from the firmware.
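The polling workaround can be sketched as below, assuming a counter that is free-running and wraps cleanly at 32 bits. It does not handle the 10.2 quirks described above, where busy and rx appear linked to the overflow of total. Names are hypothetical, and the first sample only primes the state.

```c
#include <stdint.h>

/* Hypothetical accumulator for one free-running 32-bit firmware counter. */
struct survey_acc {
	uint32_t last_raw;  /* previous raw sample */
	uint64_t total;     /* accumulated count */
	int primed;         /* 0 until the first sample has been seen */
};

/*
 * Feed one polled sample. Unsigned 32-bit subtraction gives the correct
 * delta across a single wrap, so the poll interval must be shorter than
 * the time it takes the counter to wrap.
 */
static void survey_acc_update(struct survey_acc *acc, uint32_t raw)
{
	if (acc->primed)
		acc->total += (uint32_t)(raw - acc->last_raw);
	acc->last_raw = raw;
	acc->primed = 1;
}
```

With counters wrapping every ~14-30 seconds, polling every couple of seconds leaves a comfortable margin.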

Have you tried probing very fast, like every 100 ms, to see if the returned values
look sane? I seem to recall that there was some firmware issue with this, like
it only updating internal counters every second or so.

Slow polling would have the same off-by-a-second's-worth-of-data problem, but you
would not easily notice it at slower polling intervals.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com

Re: [PATCH] cfg80211: Add cumulative channel survey dump support.

2019-09-17 Thread Ben Greear


On 9/17/19 10:27 AM, Sven Eckelmann wrote:

On Thursday, 31 May 2018 11:06:59 CEST vnara...@codeaurora.org wrote:

I will send the next version of the patch with an updated commit log.


Can you please point me to the second version?

Btw. I've just checked the minimal changes in ath10k to get this working. It
seems we need SURVEY_INFO_NON_ACC_DATA in ath10k's ath10k_get_survey + memset
of ar->survey[idx].

But right now the total time looks (especially) wrong to me. At least it is
rather unlikely that I can have around 30 seconds of active time delta in
roughly 1 real-world second.  Maybe a bug with the READ_CLEAR handling in
firmware 10.2.4-1.0-00043, or maybe in all firmware versions? More logs about
that at the end.

@Ben: Is this also what you experienced in the past with the 10.2 firmware
bss_chan_info counter bugs, or am I just misusing the functionality of the
firmware?


Last I recall, the upstream code had several bugs.  Maybe some QCA
firmware person can let you know if they fixed the upstream firmware.

If you want to test against the ath10k-ct driver/firmware and still see
bogus values, I can debug and fix it.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH] ath10k: restore QCA9880-AR1A (v1) detection

2019-09-10 Thread Ben Greear





On 09/10/2019 06:51 AM, Sebastian Gottschall wrote:

The TP-Link Archer C7 v1 indeed has this hw1.0 version, but that's the only device
I know of that comes with this chipset version.
However, the v1 also has a mini-PCIe slot, and the card is not soldered as in all
other revisions, so I just replaced the card on my test device.
In addition, we could ask Ben Greear whether he is able to provide a v1 firmware
from his ct tree, since the QCA source code does contain support
for the v1 revision. But don't expect too much; there was a reason why v1 was
never really on the market.


Hello,

I don't think I could build a v1 firmware even if I wanted to, and I'd much
rather work on newer chips.  That v1 was an unstable wreck from the beginning,
at least with the open-source driver.

Thanks,
Ben



On 10.09.2019 at 14:59, Tom Psyborg wrote:

On 10/09/2019, Kalle Valo  wrote:

(dropping stable list)

Tom Psyborg  writes:


According to this very old post
http://lists.infradead.org/pipermail/ath10k/2013-July/21.html
it seems you've been misinformed about the number of these cards that were
put on the market.

Digipart alone has >4 units in stock
https://www.digipart.com/part/QCA9880-AR1A, and other retailers
probably have a few thousand more.

With that many cards, I think it is justified to request
firmware support for the chip. It is probably also a lot easier to make a few
firmware modifications than to go hacking a bunch of API calls so that it
works with v2 firmware.

I'm very surprised that QCA9880 hw1.0 boards are still available, after
six years. Did you confirm that it really is hw1.0 and not just some
mixup with hardware ids or something like that?

The print on the chip clearly says QCA9880-AR1A. The ID is the same as for v2: 003C.


old hw1.0 firmware to see if it works.

I don't know which fw blob version that is. I could not find it
online. All files are v2 related.


But if it's really is hw1.0 I doubt there will be any support for that.
I recommend to avoid hw1.0 altogether.

--
Kalle Valo


That would be too bad, and even worse when you find out that the
qca-wifi-10.2.4.58.1 driver fails to load the firmware too. The only one
that works is the qca-wifi that comes with the TP-Link firmware, some very
early version, 10.0.108 or something like that, for which no sources
are available.






--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: Regression with commit "ath10k: fill the channel survey results for WCN3990 correctly"

2019-08-21 Thread Ben Greear


Looks like it should work.

Why is this rotting in patchwork?

Thanks
Ben

On 08/21/2019 02:12 PM, Rakesh Pillai wrote:

Hi Ben,
Can you please check https://patchwork.kernel.org/patch/10844513/ ?
This change fixes the regression mentioned below. It adds a separate
structure for TLV-specific event handling.

Thanks,
Rakesh Pillai.


On 2019-08-21 14:06, Ben Greear wrote:

On 08/21/2019 01:56 PM, Ben Greear wrote:

Hello,

I just noticed in the 5.2.7+ kernel that the commit below appears to break the WMI
message for my 10.1 firmware, and based on code inspection, 10.2 will be broken
as well.

The 10.1 struct ends with cycle_count, and the 10.2 struct has one more 32-bit
field after that, but it is not chan_tx_pwr_range.

I guess you need to create your own WMI message for the WCN3990.

The change to the 10.4 chan_info event is also wrong for my relatively
new version of the 10.4 code, so it likely breaks firmware in use.  The last
member of that struct in my 10.4 firmware source is 'A_UINT32 rx_11b_mode_data_duration;'


Sorry, I misread the 10.4 part of the patch; it was not changing the WMI event
itself, so that part is probably fine.

Thanks,
Ben




commit 13104929d2ec32aec0552007d55b9e15bc07176b
Author: Rakesh Pillai 
Date:   Wed Oct 17 16:50:03 2018 +0530

 ath10k: fill the channel survey results for WCN3990 correctly



diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
index 4971d61..58e33ab 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -6442,6 +6442,14 @@ struct wmi_chan_info_event {
 __le32 noise_floor;
 __le32 rx_clear_count;
 __le32 cycle_count;
+   __le32 chan_tx_pwr_range;
+   __le32 chan_tx_pwr_tp;
+   __le32 rx_frame_count;
+   __le32 my_bss_rx_cycle_count;
+   __le32 rx_11b_mode_data_duration;
+   __le32 tx_frame_cnt;
+   __le32 mac_clk_mhz;
+
  } __packed;



Thanks,
Ben






--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Regression with commit "ath10k: fill the channel survey results for WCN3990 correctly"

2019-08-21 Thread Ben Greear


On 08/21/2019 01:56 PM, Ben Greear wrote:

Hello,

I just noticed in the 5.2.7+ kernel that the commit below appears to break the WMI
message for my 10.1 firmware, and based on code inspection, 10.2 will be broken
as well.

The 10.1 struct ends with cycle_count, and the 10.2 struct has one more 32-bit
field after that, but it is not chan_tx_pwr_range.

I guess you need to create your own WMI message for the WCN3990.

The change to the 10.4 chan_info event is also wrong for my relatively
new version of the 10.4 code, so it likely breaks firmware in use.  The last
member of that struct in my 10.4 firmware source is 'A_UINT32 rx_11b_mode_data_duration;'


Sorry, I misread the 10.4 part of the patch; it was not changing the WMI event
itself, so that part is probably fine.

Thanks,
Ben




commit 13104929d2ec32aec0552007d55b9e15bc07176b
Author: Rakesh Pillai 
Date:   Wed Oct 17 16:50:03 2018 +0530

 ath10k: fill the channel survey results for WCN3990 correctly



diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
index 4971d61..58e33ab 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -6442,6 +6442,14 @@ struct wmi_chan_info_event {
 __le32 noise_floor;
 __le32 rx_clear_count;
 __le32 cycle_count;
+   __le32 chan_tx_pwr_range;
+   __le32 chan_tx_pwr_tp;
+   __le32 rx_frame_count;
+   __le32 my_bss_rx_cycle_count;
+   __le32 rx_11b_mode_data_duration;
+   __le32 tx_frame_cnt;
+   __le32 mac_clk_mhz;
+
  } __packed;



Thanks,
Ben




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Regression with commit "ath10k: fill the channel survey results for WCN3990 correctly"

2019-08-21 Thread Ben Greear


Hello,

I just noticed in the 5.2.7+ kernel that the commit below appears to break the WMI
message for my 10.1 firmware, and based on code inspection, 10.2 will be broken
as well.

The 10.1 struct ends with cycle_count, and the 10.2 struct has one more 32-bit
field after that, but it is not chan_tx_pwr_range.

I guess you need to create your own WMI message for the WCN3990.

The change to the 10.4 chan_info event is also wrong for my relatively
new version of the 10.4 code, so it likely breaks firmware in use.  The last
member of that struct in my 10.4 firmware source is 'A_UINT32 rx_11b_mode_data_duration;'
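One way the 10.1 breakage can show up is sketched below, assuming the event parser rejects payloads shorter than the struct it expects (a common pattern in WMI pull functions, but treat it as an assumption here). Field types are simplified to host-order `uint32_t` and the struct and function names are hypothetical; this is not the real `wmi.h` layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: event layout ending with cycle_count, as in 10.1. */
struct chan_info_ev_short {
	uint32_t err_code;
	uint32_t freq;
	uint32_t cmd_flags;
	uint32_t noise_floor;
	uint32_t rx_clear_count;
	uint32_t cycle_count;
};

/* Same prefix plus the fields the commit appends for WCN3990. */
struct chan_info_ev_extended {
	struct chan_info_ev_short base;
	uint32_t chan_tx_pwr_range;
	uint32_t chan_tx_pwr_tp;
	uint32_t rx_frame_count;
	uint32_t my_bss_rx_cycle_count;
	uint32_t rx_11b_mode_data_duration;
	uint32_t tx_frame_cnt;
	uint32_t mac_clk_mhz;
};

/*
 * Length check of the kind a pull function might do before parsing.
 * Growing the expected struct makes shorter events from older firmware
 * fail here (returns -1), even though their contents did not change.
 */
static int pull_chan_info(size_t payload_len, size_t expected_len)
{
	return payload_len < expected_len ? -1 : 0;
}
```

Giving the WCN3990 its own event struct, as suggested, keeps the expected length unchanged for the older firmware branches.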


commit 13104929d2ec32aec0552007d55b9e15bc07176b
Author: Rakesh Pillai 
Date:   Wed Oct 17 16:50:03 2018 +0530

ath10k: fill the channel survey results for WCN3990 correctly



diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
index 4971d61..58e33ab 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -6442,6 +6442,14 @@ struct wmi_chan_info_event {
__le32 noise_floor;
__le32 rx_clear_count;
__le32 cycle_count;
+   __le32 chan_tx_pwr_range;
+   __le32 chan_tx_pwr_tp;
+   __le32 rx_frame_count;
+   __le32 my_bss_rx_cycle_count;
+   __le32 rx_11b_mode_data_duration;
+   __le32 tx_frame_cnt;
+   __le32 mac_clk_mhz;
+
 } __packed;



Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH] ath10k: add mic bytes for pmf management packet

2019-06-17 Thread Ben Greear


On 6/17/19 12:37 AM, Wen Gong wrote:

For the PMF case, action, deauth and disassoc management frames need to be
encrypted by hardware, which requires reserving 8 bytes for encryption;
otherwise the frame is sent out in an invalid format and the PMF case will
fail.

With the 8 bytes added, the PMF case passes.

Tested with QCA6174 SDIO with firmware
WLAN.RMH.4.4.1-5-QCARMSWP-1.

Signed-off-by: Wen Gong 
---
  drivers/net/wireless/ath/ath10k/htt_tx.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index d8e9cc0..7bef9d9 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -1236,6 +1236,7 @@ static int ath10k_htt_tx_hl(struct ath10k_htt *htt, enum ath10k_hw_txrx_mode txm
 	struct ath10k *ar = htt->ar;
 	int res, data_len;
 	struct htt_cmd_hdr *cmd_hdr;
+	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)msdu->data;
 	struct htt_data_tx_desc *tx_desc;
 	struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(msdu);
 	struct sk_buff *tmp_skb;
@@ -1245,6 +1246,13 @@ static int ath10k_htt_tx_hl(struct ath10k_htt *htt, enum ath10k_hw_txrx_mode txm
 	u8 flags0 = 0;
 	u16 flags1 = 0;
 
+	if ((ieee80211_is_action(hdr->frame_control) ||
+	     ieee80211_is_deauth(hdr->frame_control) ||
+	     ieee80211_is_disassoc(hdr->frame_control)) &&
+	     ieee80211_has_protected(hdr->frame_control)) {
+		skb_put(msdu, IEEE80211_CCMP_MIC_LEN);
+	}

I was looking at mac80211 code recently, and it seems some action
frames are NOT supposed to be protected.  I added my own helper
method to my local ath10k.  Maybe you want to use this?


/* Copied from ieee80211_is_robust_mgmt_frame, but disable the check for has_protected
 * since we do tx hw crypt, and it won't actually be encrypted even when this flag is
 * set.
 */
bool ieee80211_is_robust_mgmt_frame_tx(struct ieee80211_hdr *hdr)
{
	if (ieee80211_is_disassoc(hdr->frame_control) ||
	    ieee80211_is_deauth(hdr->frame_control))
		return true;

	if (ieee80211_is_action(hdr->frame_control)) {
		u8 *category;

		/*
		 * Action frames, excluding Public Action frames, are Robust
		 * Management Frames. However, if we are looking at a Protected
		 * frame, skip the check since the data may be encrypted and
		 * the frame has already been found to be a Robust Management
		 * Frame (by the other end).
		 */
		/*
		if (ieee80211_has_protected(hdr->frame_control))
			return true;
		*/
		category = ((u8 *) hdr) + 24;
		return *category != WLAN_CATEGORY_PUBLIC &&
		       *category != WLAN_CATEGORY_HT &&
		       *category != WLAN_CATEGORY_WNM_UNPROTECTED &&
		       *category != WLAN_CATEGORY_SELF_PROTECTED &&
		       *category != WLAN_CATEGORY_UNPROT_DMG &&
		       *category != WLAN_CATEGORY_VHT &&
		       *category != WLAN_CATEGORY_VENDOR_SPECIFIC;
	}

	return false;
}

Thanks,
Ben


+
 	data_len = msdu->len;
 
 	switch (txmode) {





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH] ath10k: fix max antenna gain unit

2019-06-11 Thread Ben Greear


On 6/11/19 5:19 AM, Sven Eckelmann wrote:

From: Sven Eckelmann 

Most of the txpower for the ath10k firmware is stored as twicepower (0.5 dB
steps). This isn't the case for max_antenna_gain - which is still expected
by the firmware as dB.

The firmware is converting it from dB to the internal (twicepower)
representation when it calculates the limits of a channel. This can be seen
in tpc_stats when configuring "12" as max_antenna_gain. Instead of the
expected 12 (6 dB), the tpc_stats shows 24 (12 dB).

Tested on QCA9888 and IPQ4019 with firmware 10.4-3.5.3-00057.


I did a visual inspection of wave-1 firmware source and it appears this change would be correct
for it as well.

I would also suggest updating the comments in the wmi.h structure to document the units.

Thanks,
Ben



Fixes: 02256930d9b8 ("ath10k: use proper tx power unit")
Signed-off-by: Sven Eckelmann 
---
Cc: Michal Kazior 

  drivers/net/wireless/ath/ath10k/mac.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 9c703d287333..35d026a2772a 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -1008,7 +1008,7 @@ static int ath10k_monitor_vdev_start(struct ath10k *ar, int vdev_id)
 	arg.channel.min_power = 0;
 	arg.channel.max_power = channel->max_power * 2;
 	arg.channel.max_reg_power = channel->max_reg_power * 2;
-	arg.channel.max_antenna_gain = channel->max_antenna_gain * 2;
+	arg.channel.max_antenna_gain = channel->max_antenna_gain;
 
 	reinit_completion(>vdev_setup_done);
 
@@ -1450,7 +1450,7 @@ static int ath10k_vdev_start_restart(struct ath10k_vif *arvif,
 	arg.channel.min_power = 0;
 	arg.channel.max_power = chandef->chan->max_power * 2;
 	arg.channel.max_reg_power = chandef->chan->max_reg_power * 2;
-	arg.channel.max_antenna_gain = chandef->chan->max_antenna_gain * 2;
+	arg.channel.max_antenna_gain = chandef->chan->max_antenna_gain;
 
 	if (arvif->vdev_type == WMI_VDEV_TYPE_AP) {
 		arg.ssid = arvif->u.ap.ssid;
@@ -3105,7 +3105,7 @@ static int ath10k_update_channel_list(struct ath10k *ar)
 			ch->min_power = 0;
 			ch->max_power = channel->max_power * 2;
 			ch->max_reg_power = channel->max_reg_power * 2;
-			ch->max_antenna_gain = channel->max_antenna_gain * 2;
+			ch->max_antenna_gain = channel->max_antenna_gain;
 			ch->reg_class_id = 0; /* FIXME */
 
 			/* FIXME: why use only legacy modes, why not any





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: Problem with 9984 in routed mode with 512b frames.

2019-05-20 Thread Ben Greear


On 5/20/19 12:25 PM, Adrian Chadd wrote:

On Mon, 20 May 2019 at 09:59, Sebastian Gottschall
 wrote:



the curious thing is still that the fallback code applies only for 2.4
ghz so it would never have affected 802.11ac


Hm, does RC fall back to 11na or 11a rates when doing 11ac? (in 5G
mode.) It's good to know fixing that would fix it in 2.4GHz operation
but yeah, I wonder about RC in 5G.


It appears the rate-ctrl tries to fall back to CCK 2Mbps or 1Mbps and skips a/g rates.
/n rates are a subset of VHT, so those are used as part of normal VHT rate-ctrl.

I have no explanation for why I saw the tx-hang in stock-ish firmware, which indeed
should not have tried to use any a/g rates in 5GHz.  The high-level failure looked exactly like
what I eventually debugged as falling back to /a rates in my firmware, for what that
is worth.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

Re: Problem with 9984 in routed mode with 512b frames.

2019-05-17 Thread Ben Greear


On 5/17/19 8:47 AM, Adrian Chadd wrote:

On Fri, 17 May 2019 at 08:06, Sebastian Gottschall
 wrote:



Personally, I think going back to basic rates like 2 Mbit makes no sense
anyway. It's so dead slow that a connection must break, and has to be
broken if this doesn't work.
Still a shame that beacons are still transmitted this way, and
multicast/broadcast packets as well, which causes a hell of a lot of problems. But
that's for backward compatibility, of course.


It depends on what kind of channel you are. Not everyone is deploying
super dense enterprise APs. :-)

The 11ac and 11ax chips that do constant frequency readjusting work
better in things like moving drones, where you have constant doppler
shift. I know some people doing drone work that just don't bother with
MCS and aggregation because they need a super reliable channel and the
conditions constantly shift.

That said, they're very sad that they can't hack on the 11ac/11ax
firmware to fix some err, less optimal decisions in their use case
space like they can with ath5k/ath9k.

ISTR back at QCA days there were some people on the systems team that
could demonstrate CCK was more stable in some use cases and so didn't
like the Linux rate control behaviour of not falling back to CCK in 2G
11n mode. There was .. pushback against the linux upstream rate
control in this respect right until the linux folk totally deprecated
the QCA rate control in ath9k. :)

(And then bugs like what ben is seeing :)

Ben - did disabling CCK/OFDM fallback rates help? Did you fix the bit
that tries to send AMPDU frames in non-11n rates?


Yes, disabling the fallback appears to have fixed my issue.

I did not try to fix the fallback code because I think it will be quite
complicated to do it properly (I suspect a different tid must be used for this
to work).  I'm not even entirely sure of exactly why the transmit logic fails
in this case, and by the time rate-ctrl logic is queried, I think it is too
late to easily change tids.

And FYI, in my firmware/driver, you can now specify the exact preamble-type, mcs, bandwidth, txpower,
retransmit count, etc on a per packet basis.  I'm not sure of all the bugs and limitations
in this code, but it at least mostly works as hoped for the ways we are using it
(rx sensitivity test rigs, etc).

Might be of interest if someone wants to do a somewhat limited user-space rate-ctrl for
ath10k wave-2.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Problem with 9984 in routed mode with 512b frames.

2019-05-17 Thread Ben Greear





On 05/16/2019 09:21 PM, Sebastian Gottschall wrote:


On 16.05.2019 at 21:40, Ben Greear wrote:

On 5/15/19 6:00 AM, Ben Greear wrote:

On 5/15/19 5:26 AM, Sebastian Gottschall wrote:


On 15.05.2019 at 14:20, Ben Greear wrote:

On 05/14/2019 09:26 PM, Sebastian Gottschall wrote:

can you send me a detailed instruction for testing this on my devices? so which 
commands have been used for generating the traffic etc. (iperf3?)


I am using our own traffic generator, but I imagine iperf3 should work fine too.

I am testing on x86-64 and so forth.  Maybe you can test with UDP small-packet
load on your platform in routed mode (ie, external iperf generator through your AP)
and see if you see issues?

That's the plan. Can you do a test with iperf3 to see if it's reproducible? I
mean, I will test it on IPQ-based boards and x64, but to make sure that the scenario
which raised your issue is identical, it would be fine if we had identical
software for testing, including the same options.


I think I found the issue.  The rate-ctrl logic in the firmware allows a
transition from HT/VHT 20 MCS0 down to OFDM rates.
It seems the hardware does not like to see an AMPDU with an OFDM rate for 20MHz
and a VHT rate for 80MHz (or maybe just the single OFDM rate is the fault).

If you can edit firmware, then setting this to 0 probably fixes the issue.

g_rc_cck_rate_allowed


According to the code, this variable only has an effect on 2.4 GHz. The fallback to
CCK rates will only be done if the phymode is 2.4 GHz.


Ok, maybe the symptom I saw with stock-ish firmware was due to some other cause.  In my firmware,
I had "fixed" that cck-fallback to use OFDM rates in case CCK was not available, so mine was
definitely trying to use an OFDM rate.

That said, very likely the same bug exists in upstream QCA firmware for 2.4GHz radios where
CCK is available, so it still might be worth fixing, or at least adding an API to let the user
disable the fallback in case strange problems are seen.

I am guessing that if it really wants to send OFDM/CCK rates, then it will have to use a different
TID that is not set up for AMPDUs, and the current code does not deal with that as far as I can tell.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: Problem with 9984 in routed mode with 512b frames.

2019-05-16 Thread Ben Greear

On 5/16/19 1:16 PM, Adrian Chadd wrote:

You can't do AMPDU with OFDM/CCK. If they're setting the AMPDU bit
then that's wrong. it needs to be individual MPDU/PPDUs.

There's a benefit for CCK. OFDM 6M is I think roughly the same as OFDM
MCS0. But CCK is a lot more reliable.

5Ghz can (should) not do CCK anyway. Do you have any reference for why
you think CCK will be better? The one I found shows otherwise:

https://d2cpnw0u24fjm4.cloudfront.net/wp-content/uploads/LaminatedCard_RevolutionWiFiMCStoSNRSinglePage.png

Thanks,
Ben

-adrian

On Thu, 16 May 2019 at 13:10, Ben Greear wrote:

On 5/16/19 12:55 PM, Adrian Chadd wrote:

You can totally go down to OFDM yeah but you then need to send it at
20MHz and non-AMPDU.

Is it maybe the retry code + rate control code is retagging an AMPDU
at a lower rate and it's transitioning down to CCK/OFDM without
breaking the AMPDU apart?

It was sending a one-frame AMPDU, and one frame AMSDU for that matter. Maybe
there is some bit in the tx descriptor that needed to be twiddled as well
to make OFDM able to work, but I don't know what that would be.

Is there any advantage of (any) OFDM over MCS0 HT 20Mhz as far as range or
SNR goes? The chart I found made it look like there was not, and if
not, then why bother at all with OFDM if peer advertises HT/VHT rates?

Thanks,
Ben

-a

On Thu, 16 May 2019 at 12:40, Ben Greear wrote:

On 5/15/19 6:00 AM, Ben Greear wrote:

On 5/15/19 5:26 AM, Sebastian Gottschall wrote:

On 15.05.2019 at 14:20, Ben Greear wrote:

On 05/14/2019 09:26 PM, Sebastian Gottschall wrote:

can you send me a detailed instruction for testing this on my devices? so which
commands have been used for generating the traffic etc. (iperf3?)

I am using our own traffic generator, but I imagine iperf3 should work fine too.

I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet
load on your platform in routed mode (ie, external iperf generator through your AP)
and see if you see issues?

That's the plan. Can you do a test with iperf3 to see if it's reproducible? I
mean, I will test it on IPQ-based boards and x64, but to make sure that the scenario
which raised your issue is identical, it would be fine if we had identical
software for testing, including the same options.

I think I found the issue. The rate-ctrl logic in the firmware allows a
transition from HT/VHT 20 MCS0 down to OFDM rates.
It seems the hardware does not like to see an AMPDU with an OFDM rate for 20MHz
and a VHT rate for 80MHz (or maybe just the single OFDM rate is the fault).

If you can edit firmware, then setting this to 0 probably fixes the issue.

g_rc_cck_rate_allowed

I think to reproduce you'd need to send high speed traffic in a situation where the
RF environment is going to make rate-ctrl fail quite a bit. (Slow speed should
work too, but it would likely take a lot longer).

And, it is always possible that whatever I saw when testing mostly-stock FW is different
from what I eventually debugged to in my firmware. Still, from looking at MCS vs SNR
charts, there seems to be no advantage to trying OFDM vs MCS0 for 20MHz.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com


--
Ben Greear
Candela Technologies Inc http://www.candelatech.com


--
Ben Greear
Candela Technologies Inc http://www.candelatech.com


Re: Problem with 9984 in routed mode with 512b frames.

2019-05-16 Thread Ben Greear


On 5/16/19 12:55 PM, Adrian Chadd wrote:

You can totally go down to OFDM yeah but you then need to send it at
20MHz and non-AMPDU.

Is it maybe the retry code + rate control code is retagging an AMPDU
at a lower rate and it's transitioning down to CCK/OFDM without
breaking the AMPDU apart?


It was sending a one-frame AMPDU, and one frame AMSDU for that matter.  Maybe
there is some bit in the tx descriptor that needed to be twiddled as well
to make OFDM able to work, but I don't know what that would be.

Is there any advantage of (any) OFDM over MCS0 HT 20Mhz as far as range or
SNR goes?  The chart I found made it look like there was not, and if
not, then why bother at all with OFDM if peer advertises HT/VHT rates?

Thanks,
Ben




-a

On Thu, 16 May 2019 at 12:40, Ben Greear  wrote:


On 5/15/19 6:00 AM, Ben Greear wrote:

On 5/15/19 5:26 AM, Sebastian Gottschall wrote:


On 15.05.2019 at 14:20, Ben Greear wrote:

On 05/14/2019 09:26 PM, Sebastian Gottschall wrote:

can you send me a detailed instruction for testing this on my devices? so which 
commands have been used for generating the traffic etc. (iperf3?)


I am using our own traffic generator, but I imagine iperf3 should work fine too.

I am testing on x86-64 and so forth.  Maybe you can test with UDP small-packet
load on your platform in routed mode (ie, external iperf generator through your AP)
and see if you see issues?

That's the plan. Can you do a test with iperf3 to see if it's reproducible? I
mean, I will test it on IPQ-based boards and x64, but to make sure that the scenario
which raised your issue is identical, it would be fine if we had identical
software for testing, including the same options.


I think I found the issue.  The rate-ctrl logic in the firmware allows a
transition from HT/VHT 20 MCS0 down to OFDM rates.
It seems the hardware does not like to see an AMPDU with an OFDM rate for 20MHz
and a VHT rate for 80MHz (or maybe just the single OFDM rate is the fault).

If you can edit firmware, then setting this to 0 probably fixes the issue.

g_rc_cck_rate_allowed

I think to reproduce you'd need to send high speed traffic in a situation where the
RF environment is going to make rate-ctrl fail quite a bit.  (Slow speed should
work too, but it would likely take a lot longer).

And, it is always possible that whatever I saw when testing mostly-stock FW is different
from what I eventually debugged to in my firmware.  Still, from looking at MCS vs SNR
charts, there seems to be no advantage to trying OFDM vs MCS0 for 20MHz.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com







--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Problem with 9984 in routed mode with 512b frames.

2019-05-16 Thread Ben Greear


On 5/15/19 6:00 AM, Ben Greear wrote:

On 5/15/19 5:26 AM, Sebastian Gottschall wrote:


On 15.05.2019 at 14:20, Ben Greear wrote:

On 05/14/2019 09:26 PM, Sebastian Gottschall wrote:

can you send me a detailed instruction for testing this on my devices? so which 
commands have been used for generating the traffic etc. (iperf3?)


I am using our own traffic generator, but I imagine iperf3 should work fine too.

I am testing on x86-64 and so forth.  Maybe you can test with UDP small-packet
load on your platform in routed mode (ie, external iperf generator through your AP)
and see if you see issues?

That's the plan. Can you do a test with iperf3 to see if it's reproducible? I
mean, I will test it on IPQ-based boards and x64, but to make sure that the scenario
which raised your issue is identical, it would be fine if we had identical
software for testing, including the same options.


I think I found the issue.  The rate-ctrl logic in the firmware allows a
transition from HT/VHT 20 MCS0 down to OFDM rates.
It seems the hardware does not like to see an AMPDU with an OFDM rate for 20MHz
and a VHT rate for 80MHz (or maybe just the single OFDM rate is the fault).

If you can edit firmware, then setting this to 0 probably fixes the issue.

g_rc_cck_rate_allowed

I think to reproduce you'd need to send high speed traffic in a situation where the
RF environment is going to make rate-ctrl fail quite a bit.  (Slow speed should
work too, but it would likely take a lot longer).

And, it is always possible that whatever I saw when testing mostly-stock FW is different
from what I eventually debugged to in my firmware.  Still, from looking at MCS vs SNR
charts, there seems to be no advantage to trying OFDM vs MCS0 for 20MHz.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Problem with 9984 in routed mode with 512b frames.

2019-05-15 Thread Ben Greear

On 5/15/19 5:26 AM, Sebastian Gottschall wrote:

On 15.05.2019 at 14:20, Ben Greear wrote:

On 05/14/2019 09:26 PM, Sebastian Gottschall wrote:

can you send me a detailed instruction for testing this on my devices? so which
commands have been used for generating the traffic etc. (iperf3?)

I am using our own traffic generator, but I imagine iperf3 should work fine too.

I am testing on x86-64 and so forth. Maybe you can test with UDP small-packet
load on your platform in routed mode (ie, external iperf generator through your AP)
and see if you see issues?

One of our other engineers will try to reproduce it on a different system today.

And in case you are sniffing during your own testing, I'd be curious whether you see
any AMSDU frames. I only see AMPDUs full of single-packet frames. I would expect
several of the 512b frames to be put into AMSDU sub-frames. I plan to look into that
after I figure out the tx stall issue.

Thanks,
Ben

From debugging yesterday, I see a lot of tx-hang notifications in the firmware, and
I also believe I saw it trying to transmit with a 0 rate-indx, which is 11Mbps CCK,
which is invalid for 5GHz. I'll be debugging that more today. I don't know if stock
firmware fails for the same reasons, but the symptom looked the same.

Could be a buffer overflow/locking due to UDP flooding: a missing check to
drop packets which are out of limit, or too restrictive buffer management,
like static frame buffers of max MTU size that are only partially used because
of the small size of the UDP packets. You may know better, since you have
much better knowledge about the firmware internals.

Thanks,
Ben

Sebastian

On 15.05.2019 at 03:52, Ben Greear wrote:

Hello,

I found a strange issue and am curious if someone has seen something similar.

I made an AP where the AP interface acts as a routed interface. I generate
traffic through another interface in the router. When sending 300Mbps of 512 byte
UDP payloads, in the downstream direction, and with the station being a 1x1 /AC device,
then the AP NIC appears to mostly lock up within about 1 minute. I still see beacons,
but no data frames. In some cases, I reproduced with very slow speed traffic as well.

I tried using a mostly un-modified firmware (ie, similar to upstream QCA), as well as my
hacked upon firmware, and all act similarly. I'm using the 4.20 kernel, but at least
for now, it does not appear to be a kernel issue.

If I use larger MTU sized frames, or have a 2x2 station instead of 1x1, then it is much
harder to reproduce (and maybe cannot be reproduced). Also, when generating traffic
directly on the AP device instead of using the routed interface as a traffic source,
it is harder to reproduce.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com


Re: Problem with 9984 in routed mode with 512b frames.

2019-05-15 Thread Ben Greear


On 05/14/2019 09:26 PM, Sebastian Gottschall wrote:

can you send me a detailed instruction for testing this on my devices? so which 
commands have been used for generating the traffic etc. (iperf3?)


I am using our own traffic generator, but I imagine iperf3 should work fine too.

I am testing on x86-64 and so forth.  Maybe you can test with UDP small-packet
load on your platform in routed mode (ie, external iperf generator through your AP)
and see if you see issues?

From debugging yesterday, I see a lot of tx-hang notifications in the firmware, and
I also believe I saw it trying to transmit with a 0 rate-indx, which is 11Mbps CCK,
which is invalid for 5GHz.  I'll be debugging that more today.  I don't know if stock
firmware fails for the same reasons, but the symptom looked the same.

Thanks,
Ben



Sebastian

On 15.05.2019 at 03:52, Ben Greear wrote:

Hello,

I found a strange issue and am curious if someone has seen something similar.

I made an AP where the AP interface acts as a routed interface.  I generate
traffic through another interface in the router.  When sending 300Mbps of 512 byte
UDP payloads, in the downstream direction, and with the station being a 1x1 /AC device,
then the AP NIC appears to mostly lock up within about 1 minute. I still see beacons,
but no data frames.  In some cases, I reproduced with very slow speed traffic as well.

I tried using a mostly un-modified firmware (ie, similar to upstream QCA), as well as my
hacked upon firmware, and all act similarly.  I'm using the 4.20 kernel, but at least
for now, it does not appear to be a kernel issue.

If I use larger MTU sized frames, or have a 2x2 station instead of 1x1, then it is much
harder to reproduce (and maybe cannot be reproduced).  Also, when generating traffic
directly on the AP device instead of using the routed interface as a traffic source,
it is harder to reproduce.

Thanks,
Ben





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Problem with 9984 in routed mode with 512b frames.

2019-05-14 Thread Ben Greear


Hello,

I found a strange issue and am curious if someone has seen something similar.

I made an AP where the AP interface acts as a routed interface.  I generate
traffic through another interface in the router.  When sending 300Mbps of 512 byte
UDP payloads, in the downstream direction, and with the station being a 1x1 /AC device,
then the AP NIC appears to mostly lock up within about 1 minute.  I still see beacons,
but no data frames.  In some cases, I reproduced with very slow speed traffic as well.

I tried using a mostly un-modified firmware (ie, similar to upstream QCA), as well as my
hacked upon firmware, and all act similarly.  I'm using the 4.20 kernel, but at least
for now, it does not appear to be a kernel issue.

If I use larger MTU sized frames, or have a 2x2 station instead of 1x1 then it is much
harder to reproduce (and maybe cannot be reproduced).  Also, when generating traffic
directly on the AP device instead of using the routed interface as a traffic source,
it is harder to reproduce.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH v2] mac80211: remove warning message

2019-05-14 Thread Ben Greear


On 5/14/19 11:40 AM, Johannes Berg wrote:



We know the WARN hits, we have the backtrace, and it is easy enough (in my setup
at least) to reproduce this.  So, the WARN logic has done its job.

Having more of these spam the kernel doesn't add much benefit I think.


Well, this doesn't necessarily just catch a *single* issue, so it should
remain for the future, I'd think.


Anyone have any suggestions on how to fix the underlying issue?


I don't even have the backtrace and scenario that causes it.

johannes



Here is the info I have in my commit that changed this to WARN_ON_ONCE.
I never posted it because I had to hack ath10k to get to this state, so maybe
this is not a valid case to debug.


Maybe Yibo Zhao has a better example.

mac80211: don't spam kernel logs when chantx is null.

I set up ath10k to be chandef based again in order to test
WDS.  My WDS stations are not very functional yet, and
when ethtool stats are queried, there is a WARN_ON splat
generated.  Change this to WARN_ON_ONCE so that there is
less kernel spam.

[ 2401.445631] WARNING: CPU: 1 PID: 14070 at /home/greearb/git/linux-4.13.dev.y/net/mac80211/ieee80211_i.h:1452 sta_set_rate_info_tx+0x18c/0x1a0 [mac80211]
[ 2401.445727] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink wanlink(O) ath10k_pci ath10k_core mac80211_hwsim ath5k ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nf_defrag_ipv4 libcrc32c 8021q garp mrp stp llc fuse macvlan pktgen nfsv3 nfs fscache amd64_edac_mod edac_mce_amd kvm_amd kvm irqbypass sp5100_tco fam15h_power k10temp i2c_piix4 ccp shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sch_fq_codel sunrpc sdhci_pci sdhci mmc_core igb hwmon ptp pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt [last unloaded: nfnetlink]
[ 2401.445911] CPU: 1 PID: 14070 Comm: btserver Tainted: GW  O 4.13.11+ #18
[ 2401.445914] Hardware name: PC Engines apu2/apu2, BIOS 88a4f96 03/07/2016
[ 2401.445918] task: 880118c73b80 task.stack: c9000140
[ 2401.445973] RIP: 0010:sta_set_rate_info_tx+0x18c/0x1a0 [mac80211]
[ 2401.446003] RSP: 0018:c90001403820 EFLAGS: 00010246
[ 2401.446007] RAX:  RBX: 8800ca38e4a0 RCX: 000c
[ 2401.446010] RDX:  RSI: 8800ca38e4a0 RDI: 8800ca38e000
[ 2401.446013] RBP: c90001403850 R08: a04437a0 R09: 2000
[ 2401.446016] R10: 1183be82 R11:  R12: c90001403970
[ 2401.446018] R13:  R14: 8800c01d8900 R15: 8800ca180780
[ 2401.446023] FS:  7f8123ed7740() GS:88011ec8() knlGS:
[ 2401.446026] CS:  0010 DS:  ES:  CR0: 80050033
[ 2401.446029] CR2: 036e8018 CR3: c0b29000 CR4: 000406e0
[ 2401.446034] Call Trace:
[ 2401.446140]  sta_set_sinfo+0x629/0x8b0 [mac80211]
[ 2401.446192]  ieee80211_get_stats+0x3f2/0x8c0 [mac80211]
[ 2401.446207]  ? __nla_put+0x20/0x30
[ 2401.446221]  ? __kmalloc_reserve.isra.35+0x2c/0x80
[ 2401.446229]  ? netlink_deliver_tap+0x2d/0x1e0
[ 2401.446235]  ? sock_def_readable+0x6d/0x70
[ 2401.446239]  ? __netlink_sendskb+0x36/0x40
[ 2401.446245]  ? netlink_unicast+0x1b0/0x1f0
[ 2401.446252]  ? rtnl_getlink+0x135/0x1c0
[ 2401.446261]  ? get_page_from_freelist+0x913/0xac0
[ 2401.446270]  ? vmap_page_range_noflush+0x27d/0x370
[ 2401.446277]  ? map_vm_area+0x31/0x40
[ 2401.446284]  ? __vmalloc_node_range+0x21f/0x270
[ 2401.446319]  dev_ethtool+0x11d1/0x1ce0
[ 2401.446325]  ? __rtnl_unlock+0x25/0x50
[ 2401.446330]  ? netdev_run_todo+0x4d/0x2e0
[ 2401.446338]  ? dev_get_by_name_rcu+0x6f/0xa0
[ 2401.446344]  dev_ioctl+0x330/0x550
[ 2401.446349]  ? reuse_swap_page+0x30/0x100
[ 2401.446355]  sock_do_ioctl+0x3d/0x50
[ 2401.446359]  ? sock_do_ioctl+0x3d/0x50
[ 2401.446363]  sock_ioctl+0x1e5/0x2a0
[ 2401.446370]  do_vfs_ioctl+0x8b/0x5b0
[ 2401.446376]  ? getnstimeofday64+0x9/0x20
[ 2401.446383]  ? __audit_syscall_entry+0xba/0x110
[ 2401.446391]  ? syscall_trace_enter+0x1b0/0x2b0
[ 2401.446395]  SyS_ioctl+0x74/0x80
[ 2401.446400]  ? __audit_syscall_exit+0x215/0x2b0
[ 2401.446405]  do_syscall_64+0x5c/0x190
[ 2401.446412]  entry_SYSCALL64_slow_path+0x25/0x25


Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH v2] mac80211: remove warning message

2019-05-14 Thread Ben Greear


On 5/14/19 8:44 AM, Joe Perches wrote:

On Tue, 2019-05-14 at 11:12 +0200, Johannes Berg wrote:

On Tue, 2019-05-14 at 17:10 +0800, Yibo Zhao wrote:

On 2019-05-14 17:05, Johannes Berg wrote:

On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:

In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

[]

I was planning to use WARN_ON_ONCE() in the first place to replace
WARN_ON() then after some discussion, we think removing it could be
better. So the first patch was based on my first version which is sent
incorrectly. Please check again.

[]

I guess changing it to WARN_ON_ONCE() makes sense,


WARN_ON_RATELIMIT exists.


We know the WARN hits, we have the backtrace, and it is easy enough (in my setup
at least) to reproduce this.  So, the WARN logic has done its job.

Having more of these spam the kernel doesn't add much benefit I think.

Anyone have any suggestions on how to fix the underlying issue?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH] mac80211: remove warning message

2019-05-10 Thread Ben Greear





On 05/10/2019 12:01 AM, Yibo Zhao wrote:

In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and massive
authentication requests are received in a very short time, console will
be hung due to kernel warning messages.


Since it is a WARN_ON_ONCE, how does the console hang due to warnings?  You should
get no more than one per boot?

I have no problem with removing it though.  Seems a harmless splat and I removed
it from my tree some time back as well.

Thanks,
Ben



If this case can be hit during normal operation, there should be no
WARN_ON(). WARN_ON() should be reserved for cases that are not supposed
to be hit at all, or for other specific cases such as flagging an
obsolete interface.

Signed-off-by: Zhi Chen 
Signed-off-by: Yibo Zhao 
---
 net/mac80211/ieee80211_i.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2ae0364..f39c289 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1435,7 +1435,7 @@ struct ieee80211_local {
 	rcu_read_lock();
 	chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
 
-	if (WARN_ON_ONCE(!chanctx_conf)) {
+	if (!chanctx_conf) {
 		rcu_read_unlock();
 		return NULL;
 	}



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

Re: ath10k: wmi service ready event not received

2019-05-10 Thread Ben Greear





On 05/10/2019 05:28 AM, Linus Torvalds wrote:

Hmm.

I have a nice new laptop, and it works fine. Except today it lost
wireless, and I have no idea why.

It's not happened before (but it's fairly new and I'm actually on my
first trip with it), so I don't know how common this is, but the
kernel messages seem to say that the cause of it was

  ath10k_pci :02:00.0: wmi service ready event not received
  ath10k_pci :02:00.0: could not init core (-110)
  ath10k_pci :02:00.0: could not probe fw (-110)

and then nothing works. -110 is ETIMEDOUT, fwiw.
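Kernel functions report failures as negative errno values, so "(-110)" in these messages is -ETIMEDOUT on x86 and ARM Linux. The numeric values are architecture-specific (MIPS, for example, uses different numbers), so matching on the symbolic constant is safer than hard-coding 110. A trivial userspace sketch with hypothetical helper names:

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* True if a kernel-style return value (negative errno) is a timeout.
 * Uses the symbolic constant because errno numbers differ per arch. */
bool is_timeout(int ret)
{
	return ret == -ETIMEDOUT;
}

/* Human-readable text for a kernel-style return value. */
const char *kernel_ret_str(int ret)
{
	return ret < 0 ? strerror(-ret) : "no error";
}
```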

Rebooting got wireless back. It's possible I could have done something
less drastic, but I was thinking that it would be the new kernel and
rebooted into an older version. But then rebooting into the new one
afterwards (double-checking before starting a bisect) and it all
worked.

Is there anything I can do to debug this if it happens again?


Please provide 'lspci' or other info on the NIC chipset, for reference.

Sometimes a work-around is:

rmmod ath10k_pci ath10k_core; modprobe ath10k_pci

Sometimes you will get a firmware register dump in this crash case, and
then someone from QCA might be able to get a backtrace if you post it along
with the chipset info and such (or, if it is one of the NICs my ath10k-ct
firmware supports and you can reproduce an issue with that firmware, then
I can debug it).

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCHv2] ath10k: Add wrapper function to ath10k debug

2019-04-26 Thread Ben Greear


On 4/26/19 6:38 AM, Venkateswara Naralasetty wrote:


 #ifdef CONFIG_ATH10K_DEBUG
-void ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
-		const char *fmt, ...)
+void __ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
+		  const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;


Do you still need the check later in this method:

if (ath10k_debug_mask & mask)

since you already checked in the ath10k_dbg() macro?

Yes, we need this check. Otherwise, when tracing is enabled, all debug
messages would be printed even without any debug mask set.
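Why the inner check is still needed can be shown with a small model: the outer macro lets the call through when either the mask matches or tracing is on, so inside the function the mask test decides whether to print, while the trace hook runs whenever tracing is enabled. All names (`dbg_mask`, `tracing_on`, `model_dbg`, `model_dbg_impl`, the counters) are hypothetical stand-ins, not the ath10k code.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for ath10k_debug_mask and the tracepoint state. */
unsigned int dbg_mask;
bool tracing_on;
int printed, traced;

/* Inner function: reached whenever the mask matches OR tracing is on. */
void model_dbg_impl(unsigned int mask, const char *fmt, ...)
{
	char buf[128];
	va_list args;

	va_start(args, fmt);
	vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);

	/* Without this check, enabling tracing would also flood the log. */
	if (dbg_mask & mask) {
		fputs(buf, stdout);	/* stands in for the kernel log print */
		printed++;
	}
	if (tracing_on)
		traced++;		/* stands in for the tracepoint */
}

/* Outer gate, shaped like the macro in the patch. */
#define model_dbg(mask, fmt, ...)					\
do {									\
	if ((dbg_mask & (mask)) || tracing_on)				\
		model_dbg_impl((mask), fmt, ##__VA_ARGS__);		\
} while (0)
```

With tracing on and no mask set, the message reaches the trace hook but is not printed; dropping the inner `if` would print it.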


Ahh, I see.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCHv2] ath10k: Add wrapper function to ath10k debug

2019-04-26 Thread Ben Greear


On 4/26/19 6:44 AM, Michał Kazior wrote:

On Fri, 26 Apr 2019 at 14:58, Venkateswara Naralasetty
 wrote:


ath10k_dbg() is called in ath10k_process_rx() with a huge set of arguments,
which causes CPU overhead even when debug_mask is not set. A good
improvement in receive-side performance was observed when the call to
ath10k_dbg() is avoided in the RX path.

[...]


+/* Avoid calling __ath10k_dbg() if debug_mask is not set and tracing
+ * disabled.
+ */
+#define ath10k_dbg(ar, dbg_mask, fmt, ...)			\
+do {								\
+	if ((ath10k_debug_mask & dbg_mask) ||			\
+	    trace_ath10k_log_dbg_enabled())			\
+		__ath10k_dbg(ar, dbg_mask, fmt, ##__VA_ARGS__);	\
+} while (0)


Did you consider using jump labels (see include/linux/jump_label.h)?
It's what tracing uses under the hood. I wonder if you could squeeze
out a bit more performance with that? I guess you'd need to add
`struct static_key ath10k_dbg_mask_keys[ATH10K_DBG_MAX]` and re-do
ath10k_debug_mask enum a bit.


Maybe first test with debugging just compiled out to see if there is still
any significant overhead with this new patch applied?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCHv2] ath10k: Add wrapper function to ath10k debug

2019-04-26 Thread Ben Greear


On 4/26/19 5:58 AM, Venkateswara Naralasetty wrote:

ath10k_dbg() is called in ath10k_process_rx() with a huge set of arguments,
which causes CPU overhead even when debug_mask is not set. A good
improvement in receive-side performance was observed when the call to
ath10k_dbg() is avoided in the RX path.

Since all debug messages are currently sent via the tracing infrastructure,
we cannot entirely avoid calling ath10k_dbg. Therefore, the call to
ath10k_dbg() is made conditional on the debug mask and on whether tracing
is enabled in the driver.
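The saving comes from the wrapper being a macro: when its condition is false, the call and every argument expression are skipped, whereas a plain function call evaluates all arguments first. A userspace sketch of that difference; `debug_mask`, `trace_enabled`, `expensive_arg()`, `log_impl()` and both macros are hypothetical stand-ins for the RX-path arguments and the ath10k helpers.

```c
#include <stdbool.h>

unsigned int debug_mask;	/* hypothetical stand-in for ath10k_debug_mask */
bool trace_enabled;		/* hypothetical stand-in for the tracepoint state */
int expensive_calls;		/* how often the argument was computed */

/* Stands in for computing one of the many arguments in the RX path. */
int expensive_arg(void)
{
	expensive_calls++;
	return 42;
}

/* Logging backend; formatting is omitted in this sketch. */
void log_impl(unsigned int mask, const char *fmt, ...)
{
	(void)mask;
	(void)fmt;
}

/* Plain call: arguments are always evaluated, even if nothing is logged. */
#define LOG_UNGUARDED(mask, fmt, ...) log_impl((mask), fmt, ##__VA_ARGS__)

/* Guarded macro, same shape as the patch: when the condition is false the
 * call, and therefore every argument expression, is skipped. */
#define LOG_GUARDED(mask, fmt, ...)					\
do {									\
	if ((debug_mask & (mask)) || trace_enabled)			\
		log_impl((mask), fmt, ##__VA_ARGS__);			\
} while (0)
```

With logging and tracing both off, 100 unguarded calls compute the argument 100 times; 100 guarded calls compute it zero times.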

Transmit performance remains unchanged with this patch; below are some
experimental results with this patch and tracing disabled.

mesh mode:

               w/o this patch        with this patch
Traffic        TP        CPU usage   TP        CPU usage
--------------------------------------------------------
TCP            840Mbps   76.53%      960Mbps   78.14%
UDP            1030Mbps  74.58%      1132Mbps  74.31%

Infra mode:

               w/o this patch        with this patch
Traffic        TP        CPU usage   TP        CPU usage
--------------------------------------------------------
TCP Rx         1241Mbps  80.89%      1270Mbps  73.50%
UDP Rx         1433Mbps  81.77%      1472Mbps  72.80%
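To put the rows on a common scale, the relative throughput change and the CPU delta can be computed per row: mesh TCP goes from 840 to 960 Mbps (about +14.3%) and mesh UDP about +9.9%, while in infra mode TCP Rx gains about 2.3% and UDP Rx about 2.7%, with CPU usage dropping by roughly 7.4 and 9.0 percentage points. A small helper doing that arithmetic on the numbers from the tables (struct and function names are illustrative only):

```c
#include <stddef.h>

/* One row of the results tables: throughput in Mbps, CPU usage in %. */
struct tp_row {
	const char *name;
	double tp_before, cpu_before;
	double tp_after, cpu_after;
};

/* Rows copied from the tables above. */
const struct tp_row results[] = {
	{ "mesh TCP",      840.0, 76.53,  960.0, 78.14 },
	{ "mesh UDP",     1030.0, 74.58, 1132.0, 74.31 },
	{ "infra TCP Rx", 1241.0, 80.89, 1270.0, 73.50 },
	{ "infra UDP Rx", 1433.0, 81.77, 1472.0, 72.80 },
};
const size_t n_results = sizeof(results) / sizeof(results[0]);

/* Relative throughput change, in percent. */
double tp_gain_pct(const struct tp_row *r)
{
	return (r->tp_after - r->tp_before) / r->tp_before * 100.0;
}

/* Change in CPU usage, in percentage points (negative = less CPU). */
double cpu_delta_pts(const struct tp_row *r)
{
	return r->cpu_after - r->cpu_before;
}
```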

Tested platform : IPQ8064
hardware used   : QCA9984
firmware ver: ver 10.4-3.5.3-00057

Signed-off-by: Kan Yan 
Signed-off-by: Venkateswara Naralasetty 
---
v2:
  * changed trace enabled check from IS_ENABLED(CONFIG_ATH10K_TRACING)
    to trace_ath10k_log_dbg_enabled().

  drivers/net/wireless/ath/ath10k/core.c  |  2 ++
  drivers/net/wireless/ath/ath10k/debug.c |  8 
  drivers/net/wireless/ath/ath10k/debug.h | 22 --
  drivers/net/wireless/ath/ath10k/trace.c |  1 +
  drivers/net/wireless/ath/ath10k/trace.h |  6 +-
  5 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index cfd7bb2..ab709bf 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -26,6 +26,8 @@
 #include "coredump.h"
 
 unsigned int ath10k_debug_mask;
+EXPORT_SYMBOL(ath10k_debug_mask);
+
 static unsigned int ath10k_cryptmode_param;
 static bool uart_print;
 static bool skip_otp;
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index 32d967a..1b63929 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -2620,8 +2620,8 @@ void ath10k_debug_unregister(struct ath10k *ar)
 #endif /* CONFIG_ATH10K_DEBUGFS */
 
 #ifdef CONFIG_ATH10K_DEBUG
-void ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
-		const char *fmt, ...)
+void __ath10k_dbg(struct ath10k *ar, enum ath10k_debug_mask mask,
+		  const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;


Do you still need the check later in this method:

if (ath10k_debug_mask & mask)

since you already checked in the ath10k_dbg() macro?

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [BUG] Can't change default country code from US to DE with Compex WLE900VX (Debian 10 Buster amd64)

2019-04-07 Thread Ben Greear


On 04/06/2019 11:19 PM, Rene 'Renne' Bartsch, B.Sc. Informatics wrote:

Hi,

I posted this to the linux-wireless mailing-list 4 days ago and did not receive an answer.

The ath10k module sets the country code to "US" at initialization. After that the country code can't be changed anymore (e.g. to "DE").


You might have to patch the driver and maybe the ath/ logic too, but the
country code is handled in driver/kernel, so at least you can fix it.

Thanks,
Ben



Compex support suggests setting "reg->country_code = CTRY_UNITED_STATES;" in "/drivers/net/wireless/ath/regd.c" to the local country.
Patching and re-compiling after every kernel update isn't an option on a UEFI-only production system.

Kernel version: 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux on Debian 10 Buster

The WLE900VX is based on the QCA XB140 reference design.

Until Thursday we have the option to return the cards to the dealer.

Thanx for any hint,

Renne


renne@cloud:/lib/firmware/ath10k/QCA988X/hw2.0$ uname -a
Linux cloud 4.19.0-4-amd64 #1 SMP Debian 4.19.28-2 (2019-03-15) x86_64 GNU/Linux

renne@cloud:~$ ls /lib/firmware/ath10k/QCA988X/hw2.0/
board.bin  firmware-4.bin  firmware-5.bin

What doesn't work:

/etc/modprobe.d/cfg80211.conf:
options cfg80211 ieee80211_regdom=DE

/etc/hostapd/hostapd.conf:
...
ieee80211d=1
country_code=DE
...

root@cloud:/# export COUNTRY=DE; /sbin/crda
Failed to set regulatory domain: -7

root@cloud:/# iw reg set DE && iw reg get
global
country 98: DFS-UNSET
   (2402 - 2472 @ 40), (N/A, 20), (N/A)
   (5170 - 5250 @ 80), (N/A, 20), (N/A), NO-OUTDOOR, AUTO-BW
   (5250 - 5330 @ 80), (N/A, 20), (0 ms), NO-OUTDOOR, DFS, AUTO-BW
   (5490 - 5725 @ 160), (N/A, 23), (0 ms), DFS
   (5725 - 5730 @ 5), (N/A, 13), (0 ms), DFS
   (5735 - 5835 @ 80), (N/A, 13), (N/A)
   (57240 - 63720 @ 2160), (N/A, 40), (N/A)

phy#1
country US: DFS-FCC
   (2402 - 2472 @ 40), (N/A, 30), (N/A)
   (5170 - 5250 @ 80), (N/A, 23), (N/A), AUTO-BW
   (5250 - 5330 @ 80), (N/A, 23), (0 ms), DFS, AUTO-BW
   (5490 - 5730 @ 160), (N/A, 23), (0 ms), DFS
   (5735 - 5835 @ 80), (N/A, 30), (N/A)
   (57240 - 63720 @ 2160), (N/A, 40), (N/A)

phy#0
country US: DFS-FCC
   (2402 - 2472 @ 40), (N/A, 30), (N/A)
   (5170 - 5250 @ 80), (N/A, 23), (N/A), AUTO-BW
   (5250 - 5330 @ 80), (N/A, 23), (0 ms), DFS, AUTO-BW
   (5490 - 5730 @ 160), (N/A, 23), (0 ms), DFS
   (5735 - 5835 @ 80), (N/A, 30), (N/A)
   (57240 - 63720 @ 2160), (N/A, 40), (N/A)


LOGs:

renne@cloud:~$ sudo dmesg | grep ath
[4.630113] ath10k_pci :04:00.0: enabling device ( -> 0002)
[4.630548] ath10k_pci :04:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[4.803687] ath10k_pci :04:00.0: firmware: failed to load ath10k/pre-cal-pci-:04:00.0.bin (-2)
[4.803700] ath10k_pci :04:00.0: firmware: failed to load ath10k/cal-pci-:04:00.0.bin (-2)
[4.803872] ath10k_pci :04:00.0: firmware: failed to load ath10k/QCA988X/hw2.0/firmware-6.bin (-2)
[4.804994] ath10k_pci :04:00.0: firmware: direct-loading firmware ath10k/QCA988X/hw2.0/firmware-5.bin
[4.804999] ath10k_pci :04:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub :
[4.805000] ath10k_pci :04:00.0: kconfig debug 0 debugfs 0 tracing 0 dfs 0 testmode 0
[4.805147] ath10k_pci :04:00.0: firmware ver 10.2.4-1.0-00041 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 f43fa422
[4.840158] ath10k_pci :04:00.0: firmware: failed to load ath10k/QCA988X/hw2.0/board-2.bin (-2)
[4.840338] ath10k_pci :04:00.0: firmware: direct-loading firmware ath10k/QCA988X/hw2.0/board.bin
[4.840343] ath10k_pci :04:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[5.988575] ath10k_pci :04:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
[6.067999] ath: EEPROM regdomain: 0x0
[6.068000] ath: EEPROM indicates default country code should be used
[6.068000] ath: doing EEPROM country->regdmn map search
[6.068001] ath: country maps to regdmn code: 0x3a
[6.068002] ath: Country alpha2 being used: US
[6.068002] ath: Regpair used: 0x3a
[6.078078] ath10k_pci :04:00.0 wlp4s0: renamed from wlan0
[ 5099.420780] ath10k_pci :04:00.0: pdev param 0 not supported by firmware




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH] ath10k: Don't log a traceback on invalid event IDs.

2019-04-05 Thread Ben Greear

diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
index 582fb11f648..ca990c8d306 100644
--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
@@ -614,7 +614,7 @@ static void ath10k_wmi_tlv_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_mgmt_tx_bundle_compl(ar, skb);
 		break;
 	default:
-		ath10k_dbg(ar, ATH10K_DBG_WMI, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}

diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 98a90e49d66..f4fa406d9fe 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -5850,7 +5850,7 @@ static void ath10k_wmi_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_service_available(ar, skb);
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}

@@ -5981,7 +5981,7 @@ static void ath10k_wmi_10_1_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		/* ignore utf events */
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}

@@ -6130,7 +6130,7 @@ static void ath10k_wmi_10_2_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_peer_sta_ps_state_chg(ar, skb);
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}

@@ -6250,7 +6250,7 @@ static void ath10k_wmi_10_4_op_rx(struct ath10k *ar, struct sk_buff *skb)
 		ath10k_wmi_event_peer_sta_ps_state_chg(ar, skb);
 		break;
 	default:
-		ath10k_warn(ar, "Unknown eventid: %d\n", id);
+		ath10k_info(ar, "Unknown eventid: %d\n", id);
 		break;
 	}
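In ath10k, `ath10k_info()` and `ath10k_warn()` take the device followed by a printf-style format string and its arguments, so switching between them only changes the log level; a format annotation (the kernel's `__printf()`) lets the compiler check each call's arguments against the format. A userspace sketch of such a helper and of a dispatcher's default branch, writing into a buffer instead of the kernel log; `struct model_dev`, `model_info()` and `model_handle_event()` are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical device with a small in-memory log standing in for the
 * kernel log. */
struct model_dev {
	char log[256];
};

/* printf-style logger. The format attribute makes GCC/Clang warn when a
 * call passes arguments that do not match the format string, for
 * example a non-string value in the format position. */
__attribute__((format(printf, 2, 3)))
void model_info(struct model_dev *dev, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vsnprintf(dev->log, sizeof(dev->log), fmt, args);
	va_end(args);
}

/* Default branch of a WMI-style event dispatcher: unknown IDs are
 * reported at info level rather than as a warning. */
void model_handle_event(struct model_dev *dev, int id)
{
	switch (id) {
	case 1:
		/* known event: handled elsewhere in a real driver */
		break;
	default:
		model_info(dev, "Unknown eventid: %d\n", id);
		break;
	}
}
```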




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: Kernel crash in skb_put caused by ath10k_htt_t2h_msg_handler

2019-04-05 Thread Ben Greear





On 04/05/2019 02:24 AM, Petr Štetiar wrote:

Hi,

I've just hit the following crash on my TP-Link Archer C7 v5 with QCA9880,
running the latest OpenWrt with 4.14.109 (4.19.23-1 backports) and qca988x-ct
firmware:


Hello,

Can you use gdb to print out the lines of code around that crash site in
t2h_msg_handler? If I can figure out which message caused it, I can add
debugging and/or protective code.
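The panic itself is `skb_put()`'s bounds check: it advances the tail by `len` and calls `skb_over_panic()` when the new tail would pass the end of the buffer. Here `head`, `data` and `end` are all null/zero, so the skb had no data buffer at all. Protective code in a handler would typically check the available tailroom (as `skb_tailroom()` reports) before appending, and drop the message otherwise. A userspace model of that check; `struct model_buf`, `buf_tailroom()`, `buf_put_checked()` and the choice of `-EMSGSIZE` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an sk_buff: `tail` and `end` are
 * byte offsets from `head`. */
struct model_buf {
	unsigned char *head;
	size_t tail;
	size_t end;
	size_t len;
};

/* Space left after the current tail (the idea behind skb_tailroom()). */
size_t buf_tailroom(const struct model_buf *b)
{
	return b->end - b->tail;
}

/* Append `len` bytes from `src`, refusing instead of panicking when the
 * buffer is missing or too small. Returns 0 or a negative errno. */
int buf_put_checked(struct model_buf *b, const void *src, size_t len)
{
	if (!b->head || buf_tailroom(b) < len)
		return -EMSGSIZE;	/* caller drops the message */
	memcpy(b->head + b->tail, src, len);
	b->tail += len;
	b->len += len;
	return 0;
}
```

A 360-byte put (the `len:360` from the report) succeeds on a 512-byte buffer, fails on a second attempt, and fails immediately on a buffer with no storage.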

Thanks,
Ben



 skbuff: skb_over_panic: text:87622780 len:360 put:360 head:  (null) data:  (null) tail:0x168 end:0x0 dev:
 Kernel bug detected[#1]:
 CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.109 #0
 task: 804e5490 task.stack: 804e
 $ 0   :  0001 006f 
 $ 4   : 804ea7d8 804ea7d8 804f1a50 85c8
 $ 8   :   0007 
 $12   : 01c1 efec8fad 01c0 
 $16   :  87551bf8 87263054 0416
 $20   : 875514e0 87c07dc0 87551bf8 
 $24   : 0002 80279994
 $28   : 804e 87c07d40 8765d994 802f4bd0
 Hi: 
 Lo: ec4e4000
 epc   : 802f4bd0 skb_panic+0x58/0x5c
 ra: 802f4bd0 skb_panic+0x58/0x5c
 Status: 1100dc03 KERNEL EXL IE
 Cause : 00800024 (ExcCode 09)
 PrId  : 00019750 (MIPS 74Kc)
 Modules linked in: ath9k ath9k_common ath9k_hw ath10k_pci ath10k_core ath 
mac80211 iptable_nat iptable_mangle iptable_filter ipt_REJECT ipt_MASQUERADE 
ip_tables cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark 
xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG 
xt_FLOWOFFLOAD xt_CT x_tables nf_reject_ipv4 nf_nat_redirect 
nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 
nf_log_common nf_flow_table_hw nf_flow_table nf_defrag_ipv4 
nf_conntrack_rtcache nf_conntrack compat ledtrig_usbport tun ehci_platform 
ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
 Process swapper (pid: 0, threadinfo=804e, task=804e5490, tls=)
 Stack : 80501cc0 8041f040 87622780 0168 0168   0168
  804b209c 1770 802f4c60 8749 87649118 804fcbf4 0900
 01080020 87622780 0900 875514e0 875578c4 804e  
 fffb 87c07dc4 87551cdc 804e 875514e0 804e  0004
 87263054 87263054 0001 0034 87465180 8755777c 87465b40 8766
 ...
 Call Trace:
 [<802f4bd0>] skb_panic+0x58/0x5c
 [<802f4c60>] skb_put+0x48/0x54
 [<87622780>] ath10k_htt_t2h_msg_handler+0x27e0/0x31dc [ath10k_core]
 [<8764901c>] ath10k_ce_rx_update_write_idx+0x9c/0xc4 [ath10k_core]
 Code: 00602825  0c02ce18  248433d4 <000c000d> 8c8200ac  8c88005c  8c8700a8  00451023  01054021

 ---[ end trace 6b934e1b587e6bcd ]---
 Kernel panic - not syncing: Fatal exception in interrupt
 Rebooting in 3 seconds..

I've looked at the git log up to 5.1-rc3 for htt_rx.c and ce.c, but couldn't
find anything that might be related to this issue, so I'm wondering whether
this is a known and already-fixed bug. Thanks for any pointers!

-- ynezz




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: QCA9984 all devices being de-authenticated and unable to re-connect

2019-04-01 Thread Ben Greear


Look a bit further down this page; usually you just need to copy the
firmware-5.bin file and reboot.

https://www.candelatech.com/ath10k-bugs.php

Thanks,
Ben

On 04/01/2019 04:04 PM, Carlito Nueno wrote:

Hi Ben,

Can you tell me how I can compile the ath10k-ct firmware version present
in 18.06.2 into the latest snapshot?

I want to try and see if it's firmware related.

Awesome to know you had 64+ stations connected to your firmware.

Thanks for the help.

On Sun, Mar 31, 2019 at 8:06 PM Ben Greear  wrote:


Hello,

You could try using the ath10k-ct firmware that does NOT work for you (18.06.2) in
the latest snapshot. If the problem still happens, then it is firmware related,
and I can then build you a series of images so you can do a bisect to find which
commit fixes the issue, if you really want to know what the fix is.

Otherwise, maybe something else fixed the problem.  For what it is worth, we
have regularly done 64+ stations connected to our ath10k firmware/driver
for years.

Thanks,
Ben

On 03/29/2019 07:02 PM, Carlito Nueno wrote:

Hi all,

I am using:

ath10k-firmware-qca9984 - 2018-12-16-211de167-1
kmod-ath10k - 4.14.109+4.19.23-1-5

## Problem
I am able to associate and authenticate many clients. Max I tested was
15 clients.
But when more than 4 clients start to play video streams (YouTube,
Twitch, Netflix):

1. all the clients lose internet connectivity
2. all of them are _de-authenticated_

3. when trying to reconnect, they connect but are _disassociated_ immediately.

c2:44:2f:f3:3c:22  -64 dBm / -109 dBm (SNR 45)  40 ms ago
RX: 200.0 MBit/s, VHT-MCS 9, 40MHz, VHT-NSS 1    48 Pkts.
TX: 12.0 MBit/s                                   9 Pkts.
expected throughput: unknown

c2:44:2f:f3:3c:22  -57 dBm / -109 dBm (SNR 52)  10 ms ago
RX: 12.0 MBit/s   17 Pkts.
TX: 12.0 MBit/s    5 Pkts.
expected throughput: unknown

4. internet on the AP works. (I am able to ping google.com)

## Firmware and OS combinations where this problem occurs
- ath10k + 18.06.2 = yes, there is this problem
- ath10k + snapshot = yes, there is this problem
- ath10k-ct + 18.06.2 = yes, similar problem occurs
(https://github.com/greearb/ath10k-ct/issues/82)
- ath10k-ct + snapshot = no, works fine

## OpenWRT info

### Release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='SNAPSHOT'
DISTRIB_REVISION='r9753-6df5ab89cf'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r9753-6df5ab89cf'
DISTRIB_TAINTS='no-all'

### Logs
Include:
- Full list of OPKGs installed: 2_opkg_installed-txt
- Logs up to the point of crash: 3_ath10k_crash-txt
- Logs after the crash and trying to reconnect: 4_after_crash_reconnect-txt

https://gist.github.com/ironpillow/96ce9173721163a8c8c93113b2a677d7

### More logs
I noticed that some devices stay connected :man_shrugging: and
when these connected devices make a DNS request, the request
reaches the DNS server but the AP does not receive the response.

I ran ping on *one device* and captured packets on AP (two interfaces):
- tcpdump -i wlan0-ap
- tcpdump -i br-lan

ping google.com:
https://gist.github.com/ironpillow/50cb0e2010ac5bc9acc7abc7e20ab910
ping 8.8.8.8: 
https://gist.github.com/ironpillow/97cb3dd6eb8e9d028a8231f142fae01f

Packets are not reaching the Wi-Fi interface wlan0-ap.

I am happy to run more tests.

Any advice?




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: QCA9984 all devices being de-authenticated and unable to re-connect

2019-03-31 Thread Ben Greear


Hello,

You could try using the ath10k-ct firmware that does NOT work for you (18.06.2) in
the latest snapshot. If the problem still happens, then it is firmware related,
and I can then build you a series of images so you can do a bisect to find which
commit fixes the issue, if you really want to know what the fix is.
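A bisect of this kind is a binary search over an ordered series of builds for the first one in which the problem is gone. A sketch, assuming the history is monotone (broken up to some commit, fixed from then on) and that the first build is known broken and the last known fixed; `bisect_first_fixed()` is a hypothetical helper and `example_is_fixed()` stands in for building, flashing and testing one image.

```c
#include <stdbool.h>

/* Returns the index of the first fixed build in (lo, hi], given that build
 * `lo` is known broken, build `hi` is known fixed, and every build after
 * the fix is also fixed. `probes` counts how many builds were tested. */
int bisect_first_fixed(int lo, int hi, bool (*is_fixed)(int build),
		       int *probes)
{
	*probes = 0;
	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;

		(*probes)++;
		if (is_fixed(mid))
			hi = mid;	/* fix is at mid or earlier */
		else
			lo = mid;	/* fix is after mid */
	}
	return hi;
}

/* Example predicate for trying the search: pretend the fix landed in
 * build `example_fix_build`. */
int example_fix_build = 700;

bool example_is_fixed(int build)
{
	return build >= example_fix_build;
}
```

With 1000 candidate builds the search needs at most about 10 test runs, since each probe halves the remaining range.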

Otherwise, maybe something else fixed the problem.  For what it is worth, we
have regularly done 64+ stations connected to our ath10k firmware/driver
for years.

Thanks,
Ben

On 03/29/2019 07:02 PM, Carlito Nueno wrote:

Hi all,

I am using:

ath10k-firmware-qca9984 - 2018-12-16-211de167-1
kmod-ath10k - 4.14.109+4.19.23-1-5

## Problem
I am able to associate and authenticate many clients. Max I tested was
15 clients.
But when more than 4 clients start to play video streams (YouTube,
Twitch, Netflix):

1. all the clients lose internet connectivity
2. all of them are _de-authenticated_

3. when trying to reconnect, they connect but are _disassociated_ immediately.

c2:44:2f:f3:3c:22  -64 dBm / -109 dBm (SNR 45)  40 ms ago
RX: 200.0 MBit/s, VHT-MCS 9, 40MHz, VHT-NSS 1    48 Pkts.
TX: 12.0 MBit/s                                   9 Pkts.
expected throughput: unknown

c2:44:2f:f3:3c:22  -57 dBm / -109 dBm (SNR 52)  10 ms ago
RX: 12.0 MBit/s   17 Pkts.
TX: 12.0 MBit/s    5 Pkts.
expected throughput: unknown

4. internet on the AP works. (I am able to ping google.com)

## Firmware and OS combinations where this problem occurs
- ath10k + 18.06.2 = yes, there is this problem
- ath10k + snapshot = yes, there is this problem
- ath10k-ct + 18.06.2 = yes, similar problem occurs
(https://github.com/greearb/ath10k-ct/issues/82)
- ath10k-ct + snapshot = no, works fine

## OpenWRT info

### Release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='SNAPSHOT'
DISTRIB_REVISION='r9753-6df5ab89cf'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r9753-6df5ab89cf'
DISTRIB_TAINTS='no-all'

### Logs
Include:
- Full list of OPKGs installed: 2_opkg_installed-txt
- Logs up to the point of crash: 3_ath10k_crash-txt
- Logs after the crash and trying to reconnect: 4_after_crash_reconnect-txt

https://gist.github.com/ironpillow/96ce9173721163a8c8c93113b2a677d7

### More logs
I noticed that some devices stay connected :man_shrugging: and
when these connected devices make a DNS request, the request
reaches the DNS server but the AP does not receive the response.

I ran ping on *one device* and captured packets on AP (two interfaces):
- tcpdump -i wlan0-ap
- tcpdump -i br-lan

ping google.com:
https://gist.github.com/ironpillow/50cb0e2010ac5bc9acc7abc7e20ab910
ping 8.8.8.8: 
https://gist.github.com/ironpillow/97cb3dd6eb8e9d028a8231f142fae01f

Packets are not reaching the Wi-Fi interface wlan0-ap.

I am happy to run more tests.

Any advice?




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Ben Greear


On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:

Ben Greear  writes:


On 2/21/19 8:10 AM, Kalle Valo wrote:

Toke Høiland-Jørgensen  writes:


Grant Grundler  writes:


On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:


Grant Grundler  writes:


And, well, Grant's data is from a single test in a noisy
environment where the time series graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps). Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)


It's a fair complaint and I agree with it. My counter argument is the
opposite is true too: most ideal benchmarks don't measure what most
users see. While the data wgong provided are way more noisy than I
like, my overall "confidence" in the "conclusion" I offered is still
positive.


Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...
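For context on the "4x" figure: TCP small queues limit how much data a socket may have queued below the TCP layer to roughly its pacing rate shifted right by `sk_pacing_shift`, i.e. about 2^-shift seconds of data, with a floor of two packets. Assuming a baseline shift of 8 (which is what the 4x figure implies), going to 6 raises the budget from about 3.9 ms to about 15.6 ms of data. A simplified model of that limit; it ignores the `tcp_limit_output_bytes` cap and other details of the kernel's `tcp_small_queue_check()`, and `tsq_budget_bytes()` is a hypothetical name.

```c
#include <stdint.h>

/* Simplified model of the TCP small queues budget: the larger of two
 * packets' worth of memory and (pacing rate >> pacing shift).
 * pacing_rate is in bytes per second, so the second term is roughly
 * 2^-shift seconds of data at the current rate. */
uint64_t tsq_budget_bytes(uint64_t pacing_rate, unsigned int pacing_shift,
			  uint64_t skb_truesize)
{
	uint64_t by_rate = pacing_rate >> pacing_shift;
	uint64_t min_budget = 2 * skb_truesize;

	return by_rate > min_budget ? by_rate : min_budget;
}
```

At 128 MB/s, shift 8 gives a 500000-byte budget and shift 6 gives 2000000 bytes; at very low rates the two-packet floor dominates either way.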


Kalle, is this why you didn't accept this patch? Other reasons?

Toke, what else would you like to see evaluated?

I generally want to see three things measured when "benchmarking"
technologies: throughput, latency, cpu utilization
We've covered those three I think "reasonably".


Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
 limited by the signal conditions, or by contention with other devices.
 Both of these happen regularly, and I worry that latency will be
 badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
 the driver->firmware path (especially drivers without push/pull mode
 support)? For these, the lower-level queueing structure is less
 effective at controlling queueing latency.


Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
PCI devices, which IIRC do not even support push/pull mode. All the
rest, including QCA988X and QCA9984 are unaffected.


Just as a note, at least some kernels, such as 4.14.whatever, perform poorly when
running ath10k on QCA9984 and acting as a TCP endpoint. This makes them not
really usable for things like serving video to lots of clients.

Tweaking TCP (I do it a bit differently, but either way) can significantly
improve performance.


Differently how? Did you have to do more than fiddle with the pacing_shift?


This one, or a slightly tweaked version that applies to different kernels:

https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5


Recently I helped a user who could get barely 70 stations streaming
at 1Mbps on a stock kernel (using one wave-1 radio on 2.4GHz and one wave-2
radio on 5GHz), and we got 110 working with a tweaked TCP stack. These were
802.11n stations too.

I think it is lame that it _still_ requires out of tree patches to
make TCP work well on ath10k...even if you want to default to current
behaviour, you should allow users to tweak it to work with their use
case.


Well if TCP is broken to the point of being unusable I do think we
should fix it; but I think "just provide a configuration knob" should be
the last resort...


So, it has been broken for years, and waiting for a perfect solution has not
gotten the problem fixed.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Ben Greear


On 2/21/19 8:10 AM, Kalle Valo wrote:

Toke Høiland-Jørgensen  writes:


Grant Grundler  writes:


On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:


Grant Grundler  writes:


And, well, Grant's data is from a single test in a noisy
environment where the time series graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps). Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)


It's a fair complaint and I agree with it. My counter argument is the
opposite is true too: most ideal benchmarks don't measure what most
users see. While the data wgong provided are way more noisy than I
like, my overall "confidence" in the "conclusion" I offered is still
positive.


Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...


Kalle, is this why you didn't accept this patch? Other reasons?

Toke, what else would you like to see evaluated?

I generally want to see three things measured when "benchmarking"
technologies: throughput, latency, cpu utilization
We've covered those three I think "reasonably".


Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
limited by the signal conditions, or by contention with other devices.
Both of these happen regularly, and I worry that latency will be
badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
the driver->firmware path (especially drivers without push/pull mode
support)? For these, the lower-level queueing structure is less
effective at controlling queueing latency.


Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
PCI devices, which IIRC do not even support push/pull mode. All the
rest, including QCA988X and QCA9984 are unaffected.


Just as a note, at least some kernels, such as 4.14.whatever, perform poorly when
running ath10k on QCA9984 and acting as a TCP endpoint. This makes them not
really usable for things like serving video to lots of clients.

Tweaking TCP (I do it a bit differently, but either way) can significantly
improve performance.

Recently I helped a user who could get barely 70 stations streaming at 1Mbps
on a stock kernel (using one wave-1 radio on 2.4GHz and one wave-2 radio on 5GHz),
and we got 110 working with a tweaked TCP stack. These were 802.11n stations too.

I think it is lame that it _still_ requires out of tree patches to make TCP work
well on ath10k...even if you want to default to current behaviour, you should
allow users to tweak it to work with their use case.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: QCA9888: Driver/Firmware Crash After Initialization

2019-02-16 Thread Ben Greear

e2-CT
  crc32 4a66be6f

OpenWrt package identifiers

ath10k-firmware-qca4019-ct - 2018-10-10-d366b80d-1
ath10k-firmware-qca9888-ct - 2018-10-10-d366b80d-1
kmod-ath10k-ct - 4.14.99+2018-12-20-118e16da-2









--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH] ath10k: implement set_base_macaddr to fix rx-bssid mask in multiple APs conf

2019-02-07 Thread Ben Greear





On 02/07/2019 06:19 AM, Kalle Valo wrote:

Christian Lamparter  writes:


On Monday, February 4, 2019 4:45:12 PM CET Kalle Valo wrote:

Christian Lamparter  writes:


@@ -8885,6 +8904,7 @@ static const struct wmi_ops wmi_ops = {

.gen_pdev_suspend = ath10k_wmi_op_gen_pdev_suspend,
.gen_pdev_resume = ath10k_wmi_op_gen_pdev_resume,
+   .gen_pdev_set_base_macaddr = ath10k_wmi_op_gen_pdev_set_base_macaddr,
.gen_pdev_set_rd = ath10k_wmi_op_gen_pdev_set_rd,
.gen_pdev_set_param = ath10k_wmi_op_gen_pdev_set_param,
.gen_init = ath10k_wmi_op_gen_init,
@@ -8960,6 +8980,7 @@ static const struct wmi_ops wmi_10_1_ops = {

.gen_pdev_suspend = ath10k_wmi_op_gen_pdev_suspend,
.gen_pdev_resume = ath10k_wmi_op_gen_pdev_resume,
+   .gen_pdev_set_base_macaddr = ath10k_wmi_op_gen_pdev_set_base_macaddr,
.gen_pdev_set_param = ath10k_wmi_op_gen_pdev_set_param,
.gen_stop_scan = ath10k_wmi_op_gen_stop_scan,
.gen_vdev_create = ath10k_wmi_op_gen_vdev_create,
@@ -9032,6 +9053,7 @@ static const struct wmi_ops wmi_10_2_ops = {

.gen_pdev_suspend = ath10k_wmi_op_gen_pdev_suspend,
.gen_pdev_resume = ath10k_wmi_op_gen_pdev_resume,
+   .gen_pdev_set_base_macaddr = ath10k_wmi_op_gen_pdev_set_base_macaddr,
.gen_pdev_set_param = ath10k_wmi_op_gen_pdev_set_param,
.gen_stop_scan = ath10k_wmi_op_gen_stop_scan,
.gen_vdev_create = ath10k_wmi_op_gen_vdev_create,


These are in practice obsolete WMI interfaces, so I'm not sure it is
worth supporting this parameter in them. But on the other hand it won't
hurt either, so dunno.


Ok. I looked at which firmware interfaces (wmi_cmd_map) supported
pdev_set_base_macaddr_cmdid, and all of them did (including the old and tlv ones),
so I added the line everywhere I could.
As far as the support for the old firmwares goes: I don't think
anybody with a current ath10k is willingly still stuck on the 10.1,
10.2 firmware. So, I might as well just remove those for 10_2, 10_1 and MAIN.


Yeah, that's the best. BTW I'm planning (or rather hoping) to remove the
10.1, 10.2 and main WMI interfaces altogether. They just make
things unnecessarily complex.


My wave-1 firmware uses the 10.1 interface and it is used by a fair number of people,
so please leave that one in place.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [Ath10k] QCA9377-based USB dongle for aeronautical research project

2018-12-13 Thread Ben Greear





On 12/13/2018 01:01 AM, Vincent Guenat wrote:

Dear Mr. Greear,

I am writing to you to ask for directions. TRIAGNOSYS GmbH, a division of Zodiac
Inflight Innovations, part of the SAFRAN GROUP, participates in a research
project for the ARINC CSS committee aimed at investigating the potential use
of 5 GHz DFS channels in aircraft. To do that, the partners in this
project aim to gather information about where radar interference occurs, and on
which channels, by monitoring DFS channels during commercial flights. Partners
including AIRBUS and DELTA AIRLINES have already provided their support.

To monitor the DFS channels, we are currently exploring a device based on WiFi
dongles, to be installed on planes. The Linksys WUSB6100M device, based
on QCA9377, seems a promising choice, especially given the features offered by
the ath10k driver. The idea is to put this device in monitor mode and use the
DFS pattern detector to report DFS events to userspace through
cfg80211_radar_event/nl80211_radar_notify. This would require some modification
in kernelspace so that the device does not switch channel upon radar detection.
An application listening on a netlink socket would then retrieve the data.
Therefore my question is: do you think this is possible?

In the meantime, trying to use the Linksys device in STA mode does not work, as
the device does not manage to allocate the tx buffer via DMA with
dma_alloc_coherent in htt_tx.c. Given that it works for PCI devices, I assume
that it is firmware-related, but I have not yet found a workaround. Do you have
any idea what might cause this?

Thank you for any time that you spend considering these questions.


Hello,

I do not have any access currently to the firmware for those devices, so it
would be hard to add any features or stability fixes or even understand
current bugs.

With PCI based ath10k devices in AP mode, there is a feature in the driver
that can disable channel switching on radar detection, and I recently
added support for querying more details for these radar events through
the debugfs API to my ath10k-ct driver.

I am not sure radar detection works in monitor mode, but possibly it does.

I recently did a test with some Realtek USB NICs and was pleasantly surprised
by their performance and stability in station mode.  Possibly it would work for your
test case.  The driver is out-of-kernel... this is the one I was using, and I got it
compiling against OpenWrt with a few tweaks:

https://github.com/greearb/rtl8812AU_8821AU_linux

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-19 Thread Ben Greear


On 11/19/2018 04:13 PM, Dave Taht wrote:

On Mon, Nov 19, 2018 at 3:56 PM Ben Greear  wrote:


On 11/19/2018 03:47 PM, Dave Taht wrote:

On Mon, Nov 19, 2018 at 3:30 PM Simon Barber  wrote:




On Nov 19, 2018, at 2:44 PM, Toke Høiland-Jørgensen  wrote:

Dave Taht  writes:

Toke Høiland-Jørgensen  writes:

Felix Fietkau  writes:

On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:

This part doesn't really make much sense to me, but maybe I'm
misunderstanding how the code works.
Let's assume we have a driver like ath9k or mt76, which tries to keep a

….


Well, there's going to be a BQL-like queue limit (but for airtime) on
top, which drivers can opt-in to if the hardware has too much queueing.


Very happy to read this - I first talked to Dave Taht about the need for Time 
Queue Limits more than 5 years ago!


Michal faked up a dql estimator 3 (?) years ago. It worked.

http://blog.cerowrt.org/post/dql_on_wifi_2/

As a side note, in *any* real world working mu-mimo situation at any
scale, on any equipment, does anyone have any stats on how often the
feature is actually used and useful?

My personal guess, from looking at the standard, was in home
scenarios, usage would be about... 0, and in a controlled environment
in a football stadium, quite a lot.

In an office or apartment complex, I figured interference and so forth
would make it a negative benefit due to retransmits.

I felt when that part of the standard rolled around... that mu-mimo
was an idea that should never have escaped the lab. I can be convinced
by data, that we can aim for a higher goal here. But it would be
comforting to have a measured non-lab, real-world, at real world
rates, result for it, on some platform, of it actually being useful.


We're working on building a lab with 20 or 30 mixed 'real' devices
using various different /AC NICs (QCA wave-2 on OpenWRT and Fedora, Realtek USB
8812au on OpenWRT and Fedora, and some Intel NICs in NUCs on Windows, and maybe more).
I'm not actually sure if the Realtek or the NUCs can do MU-MIMO, but the QCA NICs
will be able to.  It should be at least somewhat similar to a classroom environment
or coffee shop.


In the last 3 coffee shops I went to, I could hear over 30 APs on
competing SSIDs, running G, N, and AC,
occupying every available channel.


I especially like when someone uses channel 3 because, I guess, they
think it is un-used :)

I'm not sure if this was a fluke or not, but at Starbucks recently I sat outside,
right next to their window, and could not scan their AP at all.  Previously, I sat
inside, 3 feet away through the glass, and got a great signal.  I wonder what that was
all about!  Maybe special tinting that blocks RF?  Or just dumb luck of some sort.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-19 Thread Ben Greear


On 11/19/2018 03:47 PM, Dave Taht wrote:

On Mon, Nov 19, 2018 at 3:30 PM Simon Barber  wrote:




On Nov 19, 2018, at 2:44 PM, Toke Høiland-Jørgensen  wrote:

Dave Taht  writes:

Toke Høiland-Jørgensen  writes:

Felix Fietkau  writes:

On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:

This part doesn't really make much sense to me, but maybe I'm
misunderstanding how the code works.
Let's assume we have a driver like ath9k or mt76, which tries to keep a

….


Well, there's going to be a BQL-like queue limit (but for airtime) on
top, which drivers can opt-in to if the hardware has too much queueing.


Very happy to read this - I first talked to Dave Taht about the need for Time 
Queue Limits more than 5 years ago!


Michal faked up a dql estimator 3 (?) years ago. It worked.

http://blog.cerowrt.org/post/dql_on_wifi_2/

As a side note, in *any* real world working mu-mimo situation at any
scale, on any equipment, does anyone have any stats on how often the
feature is actually used and useful?

My personal guess, from looking at the standard, was in home
scenarios, usage would be about... 0, and in a controlled environment
in a football stadium, quite a lot.

In an office or apartment complex, I figured interference and so forth
would make it a negative benefit due to retransmits.

I felt when that part of the standard rolled around... that mu-mimo
was an idea that should never have escaped the lab. I can be convinced
by data, that we can aim for a higher goal here. But it would be
comforting to have a measured non-lab, real-world, at real world
rates, result for it, on some platform, of it actually being useful.


We're working on building a lab with 20 or 30 mixed 'real' devices
using various different /AC NICs (QCA wave-2 on OpenWRT and Fedora, Realtek USB
8812au on OpenWRT and Fedora, and some Intel NICs in NUCs on Windows, and maybe more).
I'm not actually sure if the Realtek or the NUCs can do MU-MIMO, but the QCA NICs
will be able to.  It should be at least somewhat similar to a classroom environment
or coffee shop.  I'll let you know what we find as far as how well MU-MIMO
improves things or not.

At least in simple test cases (one 1x1 station and one 2x2 station, with a 4x4
MU-MIMO AP), it works very well for increased download throughput.

In home setups, I'd guess that the DSL or cable modem or other uplink is the bottleneck
way more often than the wifi is, even if you are just running /n.  But maybe that is just
my experience living out at the end of a long skinny phone line all these years.

Thanks,
Ben



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH 0/4] cfg80211/mac80211: Add support for TID specific configuration

2018-11-08 Thread Ben Greear





On 11/07/2018 04:55 PM, Igor Mitsyanko wrote:

On 10/22/2018 10:55 AM, Tamizh chelvam wrote:


Add infrastructure for per-TID aggregation/retry count configurations,
such as retry count and AMPDU aggregation control (disable/enable).
In some scenarios, reducing the retry count for specific data
traffic can reduce latency by proceeding with the next packet
instead of retrying the same packet more times. This will be useful
where the next packet can resume the operation without an issue.
This adds NL80211_CMD_SET_TID_CONFIG to support this operation by
accepting a retry count and AMPDU aggregation control.
This command can accept a STA MAC address to make the configuration
station-specific rather than applying it to all the stations connected
to the netdev.



It's not immediately clear how to make use of these settings, here are
several comments:

1. What max retry count limit should actually be applied to? Retries
decisions are in a rate adaptation domain, it should know how many
retries should be done on each rate, single "max retry" value will not
suffice. For example, it can retry twice on MCS9, twice on MCS7, three
times on MCS5 or something like that.

I'm not familiar with what ath10k is doing; the 4th patch defines
ATH10K_MAX_RETRY_COUNT=30, but what does it actually mean? It's unlikely to be "do
30 retries on the same rate". Does the retry limit setting interact with
rate adaptation somehow in ath10k?

Maybe it makes sense to extend max retry value specification to make it
possible to define per-rate? I'm not sure how much flexibility we want
here, something like retry value per MCS:BW:SGI?


For ath10k, my understanding is that each time it (re)sends a packet, it will
query FW rate-ctrl and choose the optimal rate.  It doesn't pay much attention to
whether a specific frame is retried or not, other than to maybe enable RTS/CTS,
but lots of retries will bump the rate-ctrl down to a lower rate.

There is no per-rate retry counter logic, but I think there is per-tid
control, though currently it might not be wired up to the driver.



2. AMPDU/AMSDU - the way it is, it is also relevant to rate in Tx
direction only, correct? We keep advertised capabilities intact and peer
has all rights to send AMPDUs/AMSDUs of whatever size that was advertised.
Additionally, posted patches do not do anything with established
blockack agreement.

3. With the above being said, perhaps it would make sense for this new
interface to indicate explicitly that it's related to Tx rate? That can
be controlled per-TID, per-node or globally, depending on device
capabilities.
Some other settings that may be useful are fixed MCS, MCS limit, SGI
on/off, bandwidth, maybe even provide rate retry rules.


I think there should be a way to configure the advertised capabilities, and also a way to
configure the settings actually used for transmit.  This is what we use
for test-related use cases, but maybe there is not a great deal of general
use for this type of thing.  For general use, the 'transmit' settings are probably
more useful.  I do know that several ath10k users are forcing it back to /n
mode, which works around some bugs in their mesh setup.

You can already set a fixed transmit rate or set the MCS rates allowed to be used
(my supplicant and ath10k-ct driver/firmware are needed to take full advantage of this
for ath10k).  In upstream kernels, this will not much affect the advertised capabilities.

I also have patches that allow setting the advertised rates and capabilities, so you can force
a station to advertise only a/n rates even though it and its peer have /AC capability.
Those patches are not upstream, though if opinions have changed, I'd be happy
to repost them and try to get them upstream.

Thanks,
Ben


I don't see how it can be used in real product, unless there is an
external rate adaptation logic of some kind. But definitely it will be
useful for testing, and can be used for WFA certification.



--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH] ath10k: htt_rx: Fix signedness bug in ath10k_update_per_peer_tx_stats

2018-10-05 Thread Ben Greear


On 10/05/2018 11:42 AM, Gustavo A. R. Silva wrote:

Currently, the error handling for the call to function
ath10k_get_legacy_rate_idx() doesn't work because
*rate_idx* is of type u8 (8 bits, unsigned), which
makes it impossible for it to hold a value less
than 0.

Fix this by changing the type of variable *rate_idx*
to s8 (8 bits, signed).


There are more than 127 rates, are you sure this is doing
what you want?

Thanks,
Ben



Addresses-Coverity-ID: 1473914 ("Unsigned compared against 0")
Fixes: 0189dbd71cbd ("ath10k: get the legacy rate index to update the txrate table")
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index f240525..edd0e74 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2753,7 +2753,8 @@ ath10k_update_per_peer_tx_stats(struct ath10k *ar,
struct ath10k_per_peer_tx_stats *peer_stats)
 {
struct ath10k_sta *arsta = (struct ath10k_sta *)sta->drv_priv;
-   u8 rate = 0, rate_idx = 0, sgi;
+   u8 rate = 0, sgi;
+   s8 rate_idx = 0;
struct rate_info txrate;

lockdep_assert_held(&ar->data_lock);




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [RFC 2/2] ath10k: reporting estimated tx airtime for fairness

2018-09-28 Thread Ben Greear


On 09/28/2018 03:47 PM, Rajkumar Manoharan wrote:

On 2018-09-28 12:57, Ben Greear wrote:

On 09/28/2018 12:47 PM, Rajkumar Manoharan wrote:

On 2018-09-28 08:25, Toke Høiland-Jørgensen wrote:


So this just uses the calculated airtime based on rate and size? Wasn't
there supposed to be an airtime usage value reported by the firmware? :)


Firmware interface changes are in progress. Airtime for sta/tid will be reported via
htt tx-compl and rx ind messages. Meanwhile, I thought it would be useful to use Kan's changes
for ATF validation in ath10k using existing firmware. :)


Maybe you can get the firmware guys to report the tx rate in the tx-completion
(like I have been doing for years in my ath10k-ct firmware)?  Then let the host
do the air-time calculating?

I'll give them firmware patches if they want :)


Ben,

As you know, it needs cleanup in firmware to free up space for new interface
changes. Most of the time we try to leverage rsvd/unused slots. I am aware that
you did a lot of cleanup in the CT firmware, which is quite hard in the official
firmware as it also has to support prop. releases. Kalle can answer much better.


There are hard ways to get more space in the firmware, but there are also some
easier ones (un-used members in structs, better natural packing, and such).

If there was a QCA firmware engineer that could promptly discuss these things
with me and apply patches, I can feed them patches.

And, the 10.4 firmware already has some extra space in its tx descriptor that
can be used to report tx-status without much additional code or RAM.  The wave-1 stuff
needs some more serious hacking and does consume more memory.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] ath10k: Advertize beacon_int_min_gcd as 100 while bring up multi vaps

2018-09-18 Thread Ben Greear





On 09/17/2018 11:33 PM, Maharaja Kennadyrajan wrote:

With the latest firmware design, the beacon interval should be
greater than 100 to bring up multiple vaps.

Set beacon_int_min_gcd to 100, when the wmi service
WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT is enabled
in the firmware. If not, beacon_int_min_gcd will be set
to the default value 1.

Tested in QCA4019 with firmware ver 10.4-3.2.1.1-00015
Tested in QCA9888 with firmware ver 10.4-3.5.1-0005

Signed-off-by: Maharaja Kennadyrajan 
---
 drivers/net/wireless/ath/ath10k/mac.c | 25 +
 drivers/net/wireless/ath/ath10k/wmi.h |  9 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 97548f9..532fc5d 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -8163,6 +8163,24 @@ void ath10k_mac_destroy(struct ath10k *ar)
},
 };

+static const struct
+ieee80211_iface_combination ath10k_10_4_bcn_int_if_comb[] = {
+   {
+   .limits = ath10k_10_4_if_limits,
+   .n_limits = ARRAY_SIZE(ath10k_10_4_if_limits),
+   .max_interfaces = 16,
+   .num_different_channels = 1,
+   .beacon_int_infra_match = true,
+   .beacon_int_min_gcd = 100,
+#ifdef CONFIG_ATH10K_DFS_CERTIFIED
+   .radar_detect_widths =  BIT(NL80211_CHAN_WIDTH_20_NOHT) |
+   BIT(NL80211_CHAN_WIDTH_20) |
+   BIT(NL80211_CHAN_WIDTH_40) |
+   BIT(NL80211_CHAN_WIDTH_80),
+#endif
+   },
+};
+
 static void ath10k_get_arvif_iter(void *data, u8 *mac,
  struct ieee80211_vif *vif)
 {
@@ -8526,6 +8544,13 @@ int ath10k_mac_register(struct ath10k *ar)
ar->hw->wiphy->iface_combinations = ath10k_10_4_if_comb;
ar->hw->wiphy->n_iface_combinations =
ARRAY_SIZE(ath10k_10_4_if_comb);
+   if (test_bit(WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
+ar->wmi.svc_map)) {
+   ar->hw->wiphy->iface_combinations =
+   ath10k_10_4_bcn_int_if_comb;
+   ar->hw->wiphy->n_iface_combinations =
+   ARRAY_SIZE(ath10k_10_4_bcn_int_if_comb);
+   }
break;
case ATH10K_FW_WMI_OP_VERSION_UNSET:
case ATH10K_FW_WMI_OP_VERSION_MAX:
diff --git a/drivers/net/wireless/ath/ath10k/wmi.h b/drivers/net/wireless/ath/ath10k/wmi.h
index 1562294..126eb17 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.h
+++ b/drivers/net/wireless/ath/ath10k/wmi.h
@@ -204,6 +204,7 @@ enum wmi_service {
WMI_SERVICE_RESET_CHIP,
WMI_SERVICE_SPOOF_MAC_SUPPORT,
WMI_SERVICE_TX_DATA_ACK_RSSI,
+   WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,

/* keep last */
WMI_SERVICE_MAX,
@@ -353,6 +354,11 @@ enum wmi_10_4_service {
WMI_10_4_SERVICE_TPC_STATS_FINAL,
WMI_10_4_SERVICE_CFR_CAPTURE_SUPPORT,
WMI_10_4_SERVICE_TX_DATA_ACK_RSSI,
+   WMI_10_4_SERVICE_CFR_CAPTURE_IND_MSG_TYPE_LAGACY,


That should end with "LEGACY" instead of "LAGACY" maybe?


+   WMI_10_4_SERVICE_PER_PACKET_SW_ENCRYPT,
+   WMI_10_4_SERVICE_PEER_TID_CONFIGS_SUPPORT,
+   WMI_10_4_SERVICE_VDEV_BCN_RATE_CONTROL,
+   WMI_10_4_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
 };

 static inline char *wmi_service_name(int service_id)
@@ -467,6 +473,7 @@ static inline char *wmi_service_name(int service_id)
SVCSTR(WMI_SERVICE_TPC_STATS_FINAL);
SVCSTR(WMI_SERVICE_RESET_CHIP);
SVCSTR(WMI_SERVICE_TX_DATA_ACK_RSSI);
+   SVCSTR(WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT);
default:
return NULL;
}
@@ -777,6 +784,8 @@ static inline void wmi_10_4_svc_map(const __le32 *in, unsigned long *out,
   WMI_SERVICE_TPC_STATS_FINAL, len);
SVCMAP(WMI_10_4_SERVICE_TX_DATA_ACK_RSSI,
   WMI_SERVICE_TX_DATA_ACK_RSSI, len);
+   SVCMAP(WMI_10_4_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT,
+  WMI_SERVICE_VDEV_DIFFERENT_BEACON_INTERVAL_SUPPORT, len);
 }

 #undef SVCMAP



Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: Multiple simultaneous channels on QCA9882

2018-09-10 Thread Ben Greear


On 09/10/2018 07:37 AM, Fraser Cadger wrote:

Hi,

I am using a device based on the Atheros QCA9882 chipset.

I would like to make use of the simultaneous dual-band functionality
that the device provides. The goal is to have two interfaces, with one
acting as an Access Point (wlan0) and the other acting as a client of another AP
(wlan1). Ideally, these two interfaces should operate on different channels.


These radios can support both bands, but at any given time, they can be on
only a single channel.  You need multiple radios to run on multiple channels.

Thanks,
Ben



As I have understood it, Atheros/Qualcomm market the QCA9882 as being
capable of both dual-band and concurrent operation:
https://www.qualcomm.com/news/releases/2012/02/23/qualcomm-atheros-launches-80211ac-product-ecosystem-provide-end-end-gigabit

I am using v4.1 of the Linux kernel.

Having experienced some difficulties using hostapd and wpa_supplicant,
I checked the device capabilities using iw. If I am interpreting the
output below correctly, it is only possible to use 1 channel on both
interfaces:

valid interface combinations:
 * #{ managed, P2P-client } <= 36, #{ P2P-GO } <= 3,
#{ AP } <= 7, #{ IBSS } <= 1,
   total <= 36, #channels <= 1, STA/AP BI must match


As I understand it, #channels <=1 indicates that only 1 channel may be
used at a particular time by all interfaces.

I am wondering if this is a limitation of the driver, or if there has
been a misunderstanding in the capabilities of the hardware.

Does anyone have experience with this?

Regards,

Fraser





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] ath10k: Add support for configuring management packet rate

2018-09-10 Thread Ben Greear


On 09/09/2018 10:39 PM, Sriram R wrote:

By default the firmware uses 1Mbps and 6Mbps rates for management packets
in the 2G and 5G bands respectively. But when the user selects different
basic rates from userspace, we need to send the management
packets at the lowest basic rate selected by the user.

This change makes use of WMI_VDEV_PARAM_MGMT_RATE param for configuring the
management packets rate to the firmware.


At least some users like to be able to set the mgt rate to higher rates,
and have been using a debugfs api in my driver patches to do this for some
time.

Maybe you would like to add support for something like this as well?

Thanks,
Ben



Chipsets Tested : QCA988X, QCA9887, QCA9984
FW Tested   : 10.2.4-1.0-41, 10.4-3.6.104

Signed-off-by: Sriram R 
---
 drivers/net/wireless/ath/ath10k/mac.c | 45 +--
 1 file changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 496772d..0b2ca9e 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -157,6 +157,22 @@ u8 ath10k_mac_bitrate_to_idx(const struct ieee80211_supported_band *sband,
return 0;
 }

+static int ath10k_mac_get_rate_hw_value(int bitrate)
+{
+   int i;
+   u8 hw_value_prefix = 0;
+
+   if (ath10k_mac_bitrate_is_cck(bitrate))
+   hw_value_prefix = WMI_RATE_PREAMBLE_CCK << 6;
+
+   for (i = 0; i < ARRAY_SIZE(ath10k_rates); i++) {
+   if (ath10k_rates[i].bitrate == bitrate)
+   return hw_value_prefix | ath10k_rates[i].hw_value;
+   }
+
+   return -EINVAL;
+}
+
 static int ath10k_mac_get_max_vht_mcs_map(u16 mcs_map, int nss)
 {
switch ((mcs_map >> (2 * nss)) & 0x3) {
@@ -5452,9 +5468,10 @@ static void ath10k_bss_info_changed(struct ieee80211_hw *hw,
struct cfg80211_chan_def def;
u32 vdev_param, pdev_param, slottime, preamble;
u16 bitrate, hw_value;
-   u8 rate;
-   int rateidx, ret = 0;
+   u8 rate, basic_rate_idx;
+   int rateidx, ret = 0, hw_rate_code;
enum nl80211_band band;
+   const struct ieee80211_supported_band *sband;

mutex_lock(&ar->conf_mutex);

@@ -5660,6 +5677,30 @@ static void ath10k_bss_info_changed(struct ieee80211_hw *hw,
arvif->vdev_id,  ret);
}

+   if (changed & BSS_CHANGED_BASIC_RATES) {
+   if (WARN_ON(ath10k_mac_vif_chan(vif, &def))) {
+   mutex_unlock(&ar->conf_mutex);
+   return;
+   }
+
+   sband = ar->hw->wiphy->bands[def.chan->band];
+   basic_rate_idx = ffs(vif->bss_conf.basic_rates) - 1;
+   bitrate = sband->bitrates[basic_rate_idx].bitrate;
+
+   hw_rate_code = ath10k_mac_get_rate_hw_value(bitrate);
+   if (hw_rate_code < 0) {
+   ath10k_warn(ar, "bitrate not supported %d\n", bitrate);
+   mutex_unlock(&ar->conf_mutex);
+   return;
+   }
+
+   vdev_param = ar->wmi.vdev_param->mgmt_rate;
+   ret = ath10k_wmi_vdev_set_param(ar, arvif->vdev_id, vdev_param,
+   hw_rate_code);
+   if (ret)
+   ath10k_warn(ar, "failed to set mgmt tx rate %d\n", ret);
+   }
+
mutex_unlock(&ar->conf_mutex);
 }





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH v2 0/2] Change sk_pacing_shift in ieee80211_hw for best tx throughput

2018-08-20 Thread Ben Greear




On 08/20/2018 05:46 AM, Toke Høiland-Jørgensen wrote:

Arend van Spriel  writes:


+ Eric

On 8/10/2018 9:52 PM, Ben Greear wrote:

On 08/10/2018 12:28 PM, Arend van Spriel wrote:

On 8/10/2018 3:20 PM, Toke Høiland-Jørgensen wrote:

Arend van Spriel  writes:


On 8/8/2018 9:00 PM, Peter Oh wrote:



On 08/08/2018 03:40 AM, Wen Gong wrote:

Add a field for ath10k to adjust the sk_pacing_shift. mac80211 sets
the default value to 8, and ath10k will change it to 6. Then mac80211
will use the changed value 6 as sk_pacing_shift, since 6 is the best
value for tx throughput according to test results.

I don't think you can convince people with the numbers unless you
provide latency along with the numbers, and also measurement results on
different chipsets as Michal mentioned (QCA4019, QCA9984, etc.). From the
users' point of view, I also agree with Toke that we cannot sacrifice latency
for a small throughput improvement.


Yeah. The wireless industry (admittedly that is me too :-p ) has been
focused on just throughput long enough.


Tell me about it ;)


All the preaching about bufferbloat from Dave and others is (just)
starting to sink in here and there.


Yeah, I've noticed; this is good!


Now as for the value of the sk_pacing_shift I think we agree it
depends on the specific device so in that sense the api makes sense,
but I think there are a lot of variables so I was wondering if we
could introduce a sysctl parameter for it. Does that make sense?


I'm not sure a sysctl parameter would make sense; for one thing, it
would be global for the host, while different network interfaces will
probably need different values. And for another, I don't think it's
something a user can reasonably be expected to set correctly, and I
think it *is* actually possible to pick a value that works well at the
driver level.


I'm not sure either. Do you think a user could come up with something
like this (found here [1]):

sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_default=65536
sysctl -w net.core.wmem_default=65536
sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
sysctl -w net.ipv4.route.flush=1

Now the page listing this config claims this is for use "on Linux 2.4+
for high-bandwidth applications". Beats me if it still is correct in
4.17.

Anyway, sysctl is nice for parameterizing code that is built into the
kernel, so you don't need to rebuild it. mac80211 tends to be a module
in most distros, so maybe sysctl is not a good fit. So let's agree on that.

Picking a value at the driver level may be possible, but a driver tends to
support a number of different devices. So how do you see the picking
work? Some static table with entries for the different devices?


Some users are not going to care about latency, and for others, latency may
be absolutely important and they don't care about bandwidth.

So, it should be tunable.  sysctl can support per-network-device settings,
right?  Or, we could probably use the ethtool API to set a per-netdev value.
That might be nice for other network devices as well, not just wifi.


I was under the impression that the parameters are all global, but your
statement made me look. I came across some references here [2] so I
checked the kernel sources under net/ and found net/ipv4/devinet.c [3].
So that confirms it supports per-netdev settings.


Yeah, I think that *if* this is to be made configurable, a per-netdev
sysctl would be the way to go, with the driver being able to set the
default.

However, the reason I think it may not be worth it to expose this as a
setting is that it is very much a case of diminishing returns. Once the
buffer size is large enough that full aggregates can be built,
increasing it further just adds latency with very little effect on
throughput. Which means that fiddling with the parameter is not going to
have a lot of effect, so it is not very useful to expose, which makes it
not worth the added configuration complexity...
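The diminishing-returns argument can be put into rough numbers with a toy model. This is an assumption, not how mac80211 or any driver actually builds aggregates: the bytes a socket may keep queued below TCP cap the aggregate size, an aggregate cannot exceed the maximum A-MPDU length, and everything that is queued adds delay.

```sh
# Toy model, not kernel code.
#   $1 = pacing rate (bytes/s), $2 = pacing shift, $3 = max aggregate (bytes)
# Prints: queued_bytes aggregate_bytes queue_delay_us
agg_model() {
    budget=$(( $1 >> $2 ))                      # bytes allowed below TCP
    agg=$(( budget < $3 ? budget : $3 ))        # aggregate capped by A-MPDU max
    delay_us=$(( budget * 1000000 / $1 ))       # time to drain the queue
    echo "$budget $agg $delay_us"
}

# 100 Mbit/s = 12500000 bytes/s; 65535 bytes is the 802.11n A-MPDU limit.
for s in 10 8 6; do
    agg_model 12500000 "$s" 65535
done
```

The loop prints 12207 12207 976, 48828 48828 3906 and 195312 65535 15624. In this model, going from shift 8 to 6 grows the aggregate by about a third while quadrupling the queueing delay.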


If it were easy, it would already be correct. I think adding a tuning knob
and some documentation will allow users to more easily try different things
and use what is best for them (and let the community at large know what works,
so maybe the defaults can be tweaked over time).

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k

Re: [PATCH v2 0/2] Change sk_pacing_shift in ieee80211_hw for best tx throughput

2018-08-10 Thread Ben Greear


On 08/10/2018 12:28 PM, Arend van Spriel wrote:

On 8/10/2018 3:20 PM, Toke Høiland-Jørgensen wrote:

Arend van Spriel  writes:


On 8/8/2018 9:00 PM, Peter Oh wrote:



On 08/08/2018 03:40 AM, Wen Gong wrote:

Add a field so that ath10k can adjust sk_pacing_shift. mac80211 sets
the default value to 8, and ath10k will change it to 6. mac80211 will
then use 6 as sk_pacing_shift, since 6 gave the best tx throughput in
testing.
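What the shift values mean in time, as a sketch: TCP Small Queues caps each socket at roughly sk_pacing_rate >> sk_pacing_shift bytes queued below TCP (with a floor of two skbs), and because the pacing rate is in bytes per second, a shift of n allows about 1/2^n seconds of data regardless of the rate. The kernel's own default shift is 10; the function name below is made up.

```sh
# Hypothetical helper: the per-socket TSQ window, in microseconds,
# implied by a given sk_pacing_shift (1 s / 2^shift).
pacing_window_us() {
    echo $(( 1000000 >> $1 ))
}

for s in 10 8 6; do   # 10 = kernel default, 8 = mac80211, 6 = this patch
    printf 'shift %2d -> about %d us of data per socket\n' "$s" "$(pacing_window_us "$s")"
done
```

This prints about 976, 3906 and 15625 us, so going from 8 to 6 roughly quadruples the per-socket queue, from about 4 ms to about 16 ms worth of data.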

I don't think you can convince people with these numbers unless you
provide latency measurements along with them, and also results on
different chipsets as Michal pointed out (QCA4019, QCA9984, etc.). From a
user's point of view, I also agree with Toke that we cannot sacrifice latency
for a small throughput improvement.


Yeah. The wireless industry (admittedly that is me too :-p ) has been
focused on just throughput long enough.


Tell me about it ;)


All the preaching about bufferbloat from Dave and others is (just)
starting to sink in here and there.


Yeah, I've noticed; this is good!


Now as for the value of the sk_pacing_shift I think we agree it
depends on the specific device so in that sense the api makes sense,
but I think there are a lot of variables so I was wondering if we
could introduce a sysctl parameter for it. Does that make sense?


I'm not sure a sysctl parameter would make sense; for one thing, it
would be global for the host, while different network interfaces will
probably need different values. And for another, I don't think it's
something a user can reasonably be expected to set correctly, and I
think it *is* actually possible to pick a value that works well at the
driver level.


I'm not sure either. Do you think a user could come up with something like this
(found here [1]):

sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_default=65536
sysctl -w net.core.wmem_default=65536
sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
sysctl -w net.ipv4.route.flush=1

Now the page listing this config claims this is for use "on Linux 2.4+ for 
high-bandwidth applications". Beats me if it still is correct in 4.17.

Anyway, sysctl is nice for parameterizing code that is built into the kernel so
you don't need to rebuild it. mac80211 tends to be a module in most distros, so
maybe sysctl is not a good fit. So let's agree on that.

Picking a value at driver level may be possible, but a driver tends to support
a number of different devices. So how do you see the picking working? Some static
table with entries for the different devices?


Some users are not going to care about latency, and for others, latency may
be absolutely important and they don't care about bandwidth.

So, it should be tunable. sysctl can support per-network-device settings,
right? Or we could probably use the ethtool API to set a per-netdev value as well.
That might be nice for other network devices too, not just wifi.

If the driver is configuring the defaults, it can know the hardware type,
firmware revision, and lots of other info to make the best decision it can
when registering the radio with the upper stacks.
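One way such a static table could look, written in shell to stand in for the C table a driver would actually carry. The chip names come from this thread; the firmware key and all shift values are hypothetical placeholders, not measured recommendations. The most specific entry wins: chip plus firmware revision, then chip alone, then mac80211's default of 8.

```sh
# Hypothetical defaults table (placeholder values).
#   $1 = chip name, $2 = firmware revision
pick_shift() {
    chip=$1 fw=$2
    # Most specific: an entry for this chip with this firmware.
    case "$chip/$fw" in
        QCA9984/fw-example) echo 6; return ;;
    esac
    # Next: an entry for the chip alone; otherwise the stack default.
    case "$chip" in
        QCA9984|QCA4019) echo 7 ;;
        *) echo 8 ;;
    esac
}

pick_shift QCA9984 fw-example   # 6
pick_shift QCA4019 any          # 7
pick_shift QCA6174 any          # 8
```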

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


