Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2020-04-23 Thread Kalle Valo
Wen Gong  wrote:

> Upstream kernel has an interface to help adjust sk_pacing_shift to help
> improve TCP UL throughput.
> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
> VHT80 TCP UL throughput testing result shows 6 is the optimal.
> Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.
> 
> Tested with QCA6174 PCI with firmware
> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
> It's not a regression with new firmware releases.
> 
> There have 2 test result of different settings:
> 
> ARM CPU based device with QCA6174A PCI with different
> sk_pacing_shift:
> 
>  sk_pacing_shift  throughput(Mbps) CPU utilization
>  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>  8   288~90% idle, Focus on CPU1: ~35%idle
>  9  ~200~92% idle, Focus on CPU1: ~50%idle
> 
> 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
> set to 6:
> 
>   tcp_limit_output_bytesthroughput(Mbps)
>  default(262144)+1 Stream 336
>  default(262144)+2 Streams558
>  default(262144)+3 Streams584
>  default(262144)+4 Streams602
>  default(262144)+5 Streams598
>  changed(2621440)+1 Stream598
>  changed(2621440)+2 Streams   601
> 
> Signed-off-by: Wen Gong 

The final result of this patch is unclear so I'm dropping this. Please
resend if the issue still exists.

Patch set to Changes Requested.

-- 
https://patchwork.kernel.org/patch/10559733/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 2/21/19 9:15 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear  writes:
>> 
>>> On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
 Ben Greear  writes:

> On 2/21/19 8:10 AM, Kalle Valo wrote:
>> Toke Høiland-Jørgensen  writes:
>>
>>> Grant Grundler  writes:
>>>
 On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  
 wrote:
>
> Grant Grundler  writes:
>
>>> And, well, Grant's data is from a single test in a noisy
>>> environment where the time series graph shows that throughput is 
>>> all over
>>> the place for the duration of the test; so it's hard to draw solid
>>> conclusions from (for instance, for the 5-stream test, the average
>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and 
>>> for 7
>>> it's 326 and 371 Mbps) . Unfortunately I don't have the same 
>>> hardware
>>> used in this test, so I can't go verify it myself; so the only 
>>> thing I
>>> can do is grumble about it here... :)
>>
>> It's a fair complaint and I agree with it. My counter argument is the
>> opposite is true too: most ideal benchmarks don't measure what most
>> users see. While the data wgong provided are way more noisy than I
>> like, my overall "confidence" in the "conclusion" I offered is still
>> positive.
>
> Right. I guess I would just prefer a slightly more comprehensive
> evaluation to base a 4x increase in buffer size on...

 Kalle, is this why you didn't accept this patch? Other reasons?

 Toke, what else would you like to see evaluated?

 I generally want to see three things measured when "benchmarking"
 technologies: throughput, latency, cpu utilization
 We've covered those three I think "reasonably".
>>>
>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>> patch), I think I had two main concerns:
>>>
>>> 1. What happens in a degraded signal situation, where the throughput is
>>>   limited by the signal conditions, or by contention with other 
>>> devices.
>>>   Both of these happen regularly, and I worry that latency will be
>>>   badly affected under those conditions.
>>>
>>> 2. What happens with old hardware that has worse buffer management in
>>>   the driver->firmware path (especially drivers without push/pull 
>>> mode
>>>   support)? For these, the lower-level queueing structure is less
>>>   effective at controlling queueing latency.
>>
>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>> PCI devices, which IIRC do not even support push/pull mode. All the
>> rest, including QCA988X and QCA9984 are unaffected.
>
> Just as a note, at least kernels such as 4.14.whatever perform poorly when
> running ath10k on 9984 when acting as TCP endpoints.  This makes them not
> really usable for stuff like serving video to lots of clients.
>
> Tweaking TCP (I do it a bit differently, but either way) can significantly
> improve performance.

 Differently how? Did you have to do more than fiddle with the pacing_shift?
>>>
>>> This one, or a slightly tweaked version that applies to different kernels:
>>>
>>> https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5
>> 
>> Right; but the current mac80211 default (pacing shift 8) corresponds to
>> setting your sysctl to 4...
>> 
> Recently I helped a user that could get barely 70 stations streaming
> at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
> and we got 110 working with a tweaked TCP stack. These were /n
> stations too.
>
> I think it is lame that it _still_ requires out of tree patches to
> make TCP work well on ath10k...even if you want to default to current
> behaviour, you should allow users to tweak it to work with their use
> case.

 Well if TCP is broken to the point of being unusable I do think we
 should fix it; but I think "just provide a configuration knob" should be
 the last resort...
>>>
>>> So, it has been broken for years, and waiting for a perfect solution
>>> has not gotten the problem fixed.
>> 
>> Well, the current default should at least be closer to something that
>> works well.
>> 
>> I do think I may have erred on the wrong side of the optimum when I
>> submitted the original patch to set the default to 8; that should
>> probably have been 7 (i.e., 8 ms; the optimum in the evaluation we did
>> was around 6 ms, which is sadly not a power of two). Maybe changing that
>> default is actually better than having to redo the testing for all the
>> different devices as we're discussing in the context of this patch.
>> Maybe I 

Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:
>> Ben Greear  writes:
>> 
>>> On 2/21/19 8:10 AM, Kalle Valo wrote:
 Toke Høiland-Jørgensen  writes:

> Grant Grundler  writes:
>
>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  
>> wrote:
>>>
>>> Grant Grundler  writes:
>>>
> And, well, Grant's data is from a single test in a noisy
> environment where the time series graph shows that throughput is all 
> over
> the place for the duration of the test; so it's hard to draw solid
> conclusions from (for instance, for the 5-stream test, the average
> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 
> 7
> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
> used in this test, so I can't go verify it myself; so the only thing I
> can do is grumble about it here... :)

 It's a fair complaint and I agree with it. My counter argument is the
 opposite is true too: most ideal benchmarks don't measure what most
 users see. While the data wgong provided are way more noisy than I
 like, my overall "confidence" in the "conclusion" I offered is still
 positive.
>>>
>>> Right. I guess I would just prefer a slightly more comprehensive
>>> evaluation to base a 4x increase in buffer size on...
>>
>> Kalle, is this why you didn't accept this patch? Other reasons?
>>
>> Toke, what else would you like to see evaluated?
>>
>> I generally want to see three things measured when "benchmarking"
>> technologies: throughput, latency, cpu utilization
>> We've covered those three I think "reasonably".
>
> Hmm, going back and looking at this (I'd completely forgotten about this
> patch), I think I had two main concerns:
>
> 1. What happens in a degraded signal situation, where the throughput is
>  limited by the signal conditions, or by contention with other 
> devices.
>  Both of these happen regularly, and I worry that latency will be
>  badly affected under those conditions.
>
> 2. What happens with old hardware that has worse buffer management in
>  the driver->firmware path (especially drivers without push/pull mode
>  support)? For these, the lower-level queueing structure is less
>  effective at controlling queueing latency.

 Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
 PCI devices, which IIRC do not even support push/pull mode. All the
 rest, including QCA988X and QCA9984 are unaffected.
>>>
>>> Just as a note, at least kernels such as 4.14.whatever perform poorly when
>>> running ath10k on 9984 when acting as TCP endpoints.  This makes them not
>>> really usable for stuff like serving video to lots of clients.
>>>
>>> Tweaking TCP (I do it a bit differently, but either way) can significantly
>>> improve performance.
>> 
>> Differently how? Did you have to do more than fiddle with the pacing_shift?
>
> This one, or a slightly tweaked version that applies to different kernels:
>
> https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5

Right; but the current mac80211 default (pacing shift 8) corresponds to
setting your sysctl to 4...

>>> Recently I helped a user that could get barely 70 stations streaming
>>> at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
>>> and we got 110 working with a tweaked TCP stack. These were /n
>>> stations too.
>>>
>>> I think it is lame that it _still_ requires out of tree patches to
>>> make TCP work well on ath10k...even if you want to default to current
>>> behaviour, you should allow users to tweak it to work with their use
>>> case.
>> 
>> Well if TCP is broken to the point of being unusable I do think we
>> should fix it; but I think "just provide a configuration knob" should be
>> the last resort...
>
> So, it has been broken for years, and waiting for a perfect solution
> has not gotten the problem fixed.

Well, the current default should at least be closer to something that
works well.

I do think I may have erred on the wrong side of the optimum when I
submitted the original patch to set the default to 8; that should
probably have been 7 (i.e., 8 ms; the optimum in the evaluation we did
was around 6 ms, which is sadly not a power of two). Maybe changing that
default is actually better than having to redo the testing for all the
different devices as we're discussing in the context of this patch.
Maybe I should just send a patch to do that...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Ben Greear

On 2/21/19 8:37 AM, Toke Høiland-Jørgensen wrote:

Ben Greear  writes:


On 2/21/19 8:10 AM, Kalle Valo wrote:

Toke Høiland-Jørgensen  writes:


Grant Grundler  writes:


On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:


Grant Grundler  writes:


And, well, Grant's data is from a single test in a noisy
environment where the time series graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)


It's a fair complaint and I agree with it. My counter argument is the
opposite is true too: most ideal benchmarks don't measure what most
users see. While the data wgong provided are way more noisy than I
like, my overall "confidence" in the "conclusion" I offered is still
positive.


Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...


Kalle, is this why you didn't accept this patch? Other reasons?

Toke, what else would you like to see evaluated?

I generally want to see three things measured when "benchmarking"
technologies: throughput, latency, cpu utilization
We've covered those three I think "reasonably".


Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
 limited by the signal conditions, or by contention with other devices.
 Both of these happen regularly, and I worry that latency will be
 badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
 the driver->firmware path (especially drivers without push/pull mode
 support)? For these, the lower-level queueing structure is less
 effective at controlling queueing latency.


Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
PCI devices, which IIRC do not even support push/pull mode. All the
rest, including QCA988X and QCA9984 are unaffected.


Just as a note, at least kernels such as 4.14.whatever perform poorly when
running ath10k on 9984 when acting as TCP endpoints.  This makes them not
really usable for stuff like serving video to lots of clients.

Tweaking TCP (I do it a bit differently, but either way) can significantly
improve performance.


Differently how? Did you have to do more than fiddle with the pacing_shift?


This one, or a slightly tweaked version that applies to different kernels:

https://github.com/greearb/linux-ct-4.16/commit/3e14e8491a5b31ce994fb2752347145e6ab7eaf5


Recently I helped a user that could get barely 70 stations streaming
at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
and we got 110 working with a tweaked TCP stack. These were /n
stations too.

I think it is lame that it _still_ requires out of tree patches to
make TCP work well on ath10k...even if you want to default to current
behaviour, you should allow users to tweak it to work with their use
case.


Well if TCP is broken to the point of being unusable I do think we
should fix it; but I think "just provide a configuration knob" should be
the last resort...


So, it has been broken for years, and waiting for a perfect solution has not
gotten the problem fixed.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 2/21/19 8:10 AM, Kalle Valo wrote:
>> Toke Høiland-Jørgensen  writes:
>> 
>>> Grant Grundler  writes:
>>>
 On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>
> Grant Grundler  writes:
>
>>> And, well, Grant's data is from a single test in a noisy
>>> environment where the time series graph shows that throughput is all 
>>> over
>>> the place for the duration of the test; so it's hard to draw solid
>>> conclusions from (for instance, for the 5-stream test, the average
>>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>>> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>>> used in this test, so I can't go verify it myself; so the only thing I
>>> can do is grumble about it here... :)
>>
>> It's a fair complaint and I agree with it. My counter argument is the
>> opposite is true too: most ideal benchmarks don't measure what most
>> users see. While the data wgong provided are way more noisy than I
>> like, my overall "confidence" in the "conclusion" I offered is still
>> positive.
>
> Right. I guess I would just prefer a slightly more comprehensive
> evaluation to base a 4x increase in buffer size on...

 Kalle, is this why you didn't accept this patch? Other reasons?

 Toke, what else would you like to see evaluated?

 I generally want to see three things measured when "benchmarking"
 technologies: throughput, latency, cpu utilization
 We've covered those three I think "reasonably".
>>>
>>> Hmm, going back and looking at this (I'd completely forgotten about this
>>> patch), I think I had two main concerns:
>>>
>>> 1. What happens in a degraded signal situation, where the throughput is
>>> limited by the signal conditions, or by contention with other devices.
>>> Both of these happen regularly, and I worry that latency will be
>>> badly affected under those conditions.
>>>
>>> 2. What happens with old hardware that has worse buffer management in
>>> the driver->firmware path (especially drivers without push/pull mode
>>> support)? For these, the lower-level queueing structure is less
>>> effective at controlling queueing latency.
>> 
>> Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
>> PCI devices, which IIRC do not even support push/pull mode. All the
>> rest, including QCA988X and QCA9984 are unaffected.
>
> Just as a note, at least kernels such as 4.14.whatever perform poorly when
> running ath10k on 9984 when acting as TCP endpoints.  This makes them not
> really usable for stuff like serving video to lots of clients.
>
> Tweaking TCP (I do it a bit differently, but either way) can significantly
> improve performance.

Differently how? Did you have to do more than fiddle with the pacing_shift?

> Recently I helped a user that could get barely 70 stations streaming
> at 1Mbps on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
> and we got 110 working with a tweaked TCP stack. These were /n
> stations too.
>
> I think it is lame that it _still_ requires out of tree patches to
> make TCP work well on ath10k...even if you want to default to current
> behaviour, you should allow users to tweak it to work with their use
> case.

Well if TCP is broken to the point of being unusable I do think we
should fix it; but I think "just provide a configuration knob" should be
the last resort...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Ben Greear

On 2/21/19 8:10 AM, Kalle Valo wrote:

Toke Høiland-Jørgensen  writes:


Grant Grundler  writes:


On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:


Grant Grundler  writes:


And, well, Grant's data is from a single test in a noisy
environment where the time series graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)


It's a fair complaint and I agree with it. My counter argument is the
opposite is true too: most ideal benchmarks don't measure what most
users see. While the data wgong provided are way more noisy than I
like, my overall "confidence" in the "conclusion" I offered is still
positive.


Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...


Kalle, is this why you didn't accept this patch? Other reasons?

Toke, what else would you like to see evaluated?

I generally want to see three things measured when "benchmarking"
technologies: throughput, latency, cpu utilization
We've covered those three I think "reasonably".


Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
limited by the signal conditions, or by contention with other devices.
Both of these happen regularly, and I worry that latency will be
badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
the driver->firmware path (especially drivers without push/pull mode
support)? For these, the lower-level queueing structure is less
effective at controlling queueing latency.


Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
PCI devices, which IIRC do not even support push/pull mode. All the
rest, including QCA988X and QCA9984 are unaffected.


Just as a note, at least kernels such as 4.14.whatever perform poorly when
running ath10k on 9984 when acting as TCP endpoints.  This makes them not
really usable for stuff like serving video to lots of clients.

Tweaking TCP (I do it a bit differently, but either way) can significantly
improve performance.

Recently I helped a user that could get barely 70 stations streaming at 1Mbps
on stock kernel (using one wave1 on 2.4, one wave-2 on 5Ghz),
and we got 110 working with a tweaked TCP stack.  These were /n stations too.

I think it is lame that it _still_ requires out of tree patches to make TCP work
well on ath10k...even if you want to default to current behaviour, you should
allow users to tweak it to work with their use case.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Kalle Valo
Toke Høiland-Jørgensen  writes:

> Grant Grundler  writes:
>
>> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>>>
>>> Grant Grundler  writes:
>>>
>>> >> And, well, Grant's data is from a single test in a noisy
>>> >> environment where the time series graph shows that throughput is all over
>>> >> the place for the duration of the test; so it's hard to draw solid
>>> >> conclusions from (for instance, for the 5-stream test, the average
>>> >> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>>> >> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>>> >> used in this test, so I can't go verify it myself; so the only thing I
>>> >> can do is grumble about it here... :)
>>> >
>>> > It's a fair complaint and I agree with it. My counter argument is the
>>> > opposite is true too: most ideal benchmarks don't measure what most
>>> > users see. While the data wgong provided are way more noisy than I
>>> > like, my overall "confidence" in the "conclusion" I offered is still
>>> > positive.
>>>
>>> Right. I guess I would just prefer a slightly more comprehensive
>>> evaluation to base a 4x increase in buffer size on...
>>
>> Kalle, is this why you didn't accept this patch? Other reasons?
>>
>> Toke, what else would you like to see evaluated?
>>
>> I generally want to see three things measured when "benchmarking"
>> technologies: throughput, latency, cpu utilization
>> We've covered those three I think "reasonably".
>
> Hmm, going back and looking at this (I'd completely forgotten about this
> patch), I think I had two main concerns:
>
> 1. What happens in a degraded signal situation, where the throughput is
>limited by the signal conditions, or by contention with other devices.
>Both of these happen regularly, and I worry that latency will be
>badly affected under those conditions.
>
> 2. What happens with old hardware that has worse buffer management in
>the driver->firmware path (especially drivers without push/pull mode
>support)? For these, the lower-level queueing structure is less
>effective at controlling queueing latency.

Do note that this patch changes behaviour _only_ for QCA6174 and QCA9377
PCI devices, which IIRC do not even support push/pull mode. All the
rest, including QCA988X and QCA9984 are unaffected.

-- 
Kalle Valo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-21 Thread Toke Høiland-Jørgensen
Grant Grundler  writes:

> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>>
>> Grant Grundler  writes:
>>
>> >> And, well, Grant's data is from a single test in a noisy
>> >> environment where the time series graph shows that throughput is all over
>> >> the place for the duration of the test; so it's hard to draw solid
>> >> conclusions from (for instance, for the 5-stream test, the average
>> >> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>> >> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>> >> used in this test, so I can't go verify it myself; so the only thing I
>> >> can do is grumble about it here... :)
>> >
>> > It's a fair complaint and I agree with it. My counter argument is the
>> > opposite is true too: most ideal benchmarks don't measure what most
>> > users see. While the data wgong provided are way more noisy than I
>> > like, my overall "confidence" in the "conclusion" I offered is still
>> > positive.
>>
>> Right. I guess I would just prefer a slightly more comprehensive
>> evaluation to base a 4x increase in buffer size on...
>
> Kalle, is this why you didn't accept this patch? Other reasons?
>
> Toke, what else would you like to see evaluated?
>
> I generally want to see three things measured when "benchmarking"
> technologies: throughput, latency, cpu utilization
> We've covered those three I think "reasonably".

Hmm, going back and looking at this (I'd completely forgotten about this
patch), I think I had two main concerns:

1. What happens in a degraded signal situation, where the throughput is
   limited by the signal conditions, or by contention with other devices.
   Both of these happen regularly, and I worry that latency will be
   badly affected under those conditions.

2. What happens with old hardware that has worse buffer management in
   the driver->firmware path (especially drivers without push/pull mode
   support)? For these, the lower-level queueing structure is less
   effective at controlling queueing latency.

Getting the queue size limit patches from ChromeOS ported would
alleviate point 2. I do believe Kan said he'd look into that once the
airtime patches were merged. So Kan, any progress on that front? :)

> What does a "4x increase in memory" mean here? Wen, how much more
> memory does this cause ath10k to use?

I didn't say "memory", I said "buffer size"... :)
I.e., it's the latency impact of the increased buffering I'm worried
about (see above), not the system memory usage.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-20 Thread Kalle Valo
Grant Grundler  writes:

> On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>>
>> Grant Grundler  writes:
>>
>> >> And, well, Grant's data is from a single test in a noisy
>> >> environment where the time series graph shows that throughput is all over
>> >> the place for the duration of the test; so it's hard to draw solid
>> >> conclusions from (for instance, for the 5-stream test, the average
>> >> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>> >> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>> >> used in this test, so I can't go verify it myself; so the only thing I
>> >> can do is grumble about it here... :)
>> >
>> > It's a fair complaint and I agree with it. My counter argument is the
>> > opposite is true too: most ideal benchmarks don't measure what most
>> > users see. While the data wgong provided are way more noisy than I
>> > like, my overall "confidence" in the "conclusion" I offered is still
>> > positive.
>>
>> Right. I guess I would just prefer a slightly more comprehensive
>> evaluation to base a 4x increase in buffer size on...
>
> Kalle, is this why you didn't accept this patch? Other reasons?

Just for reference, here are patchwork links:

https://patchwork.kernel.org/cover/10559729/
https://patchwork.kernel.org/patch/10559731/
https://patchwork.kernel.org/patch/10559733/

The mac80211 patch is accepted and the ath10k patch is deferred. IIRC I
put it deferred as I didn't see a real consensus about the patch and was
supposed to look at it later, but haven't done yet. I don't have any
issues with the patch, except maybe removing the duplicate define for
9377 (which I can easily fix in the pending branch).

-- 
Kalle Valo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2019-02-20 Thread Grant Grundler
On Thu, Sep 6, 2018 at 3:18 AM Toke Høiland-Jørgensen  wrote:
>
> Grant Grundler  writes:
>
> >> And, well, Grant's data is from a single test in a noisy
> >> environment where the time series graph shows that throughput is all over
> >> the place for the duration of the test; so it's hard to draw solid
> >> conclusions from (for instance, for the 5-stream test, the average
> >> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
> >> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
> >> used in this test, so I can't go verify it myself; so the only thing I
> >> can do is grumble about it here... :)
> >
> > It's a fair complaint and I agree with it. My counter argument is the
> > opposite is true too: most ideal benchmarks don't measure what most
> > users see. While the data wgong provided are way more noisy than I
> > like, my overall "confidence" in the "conclusion" I offered is still
> > positive.
>
> Right. I guess I would just prefer a slightly more comprehensive
> evaluation to base a 4x increase in buffer size on...

Kalle, is this why you didn't accept this patch? Other reasons?

Toke, what else would you like to see evaluated?

I generally want to see three things measured when "benchmarking"
technologies: throughput, latency, cpu utilization
We've covered those three I think "reasonably".

What does a "4x increase in memory" mean here?  Wen, how much more
memory does this cause ath10k to use?

If a "4x increase in memory" means I'm using 1MB instead of 256KB, I'm
not going worry about that on a system with 2GB-16GB of RAM if it
doubles the throughput of the WIFI for a given workload.  I expect
routers with 128-256MB RAM would make that tradeoff as well assuming
they don't have other RAM-demanding workload.

cheers,
grant

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-06 Thread Toke Høiland-Jørgensen
Grant Grundler  writes:

>> And, well, Grant's data is from a single test in a noisy
>> environment where the time series graph shows that throughput is all over
>> the place for the duration of the test; so it's hard to draw solid
>> conclusions from (for instance, for the 5-stream test, the average
>> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
>> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
>> used in this test, so I can't go verify it myself; so the only thing I
>> can do is grumble about it here... :)
>
> It's a fair complaint and I agree with it. My counter argument is the
> opposite is true too: most ideal benchmarks don't measure what most
> users see. While the data wgong provided are way more noisy than I
> like, my overall "confidence" in the "conclusion" I offered is still
> positive.

Right. I guess I would just prefer a slightly more comprehensive
evaluation to base a 4x increase in buffer size on...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-05 Thread Wen Gong

On 2018-09-05 07:43, Grant Grundler wrote:
On Mon, Sep 3, 2018 at 6:35 AM Toke Høiland-Jørgensen  
wrote:


Johannes Berg  writes:



> Grant's data shows a significant difference between 6 and 7 for both
> latency and throughput:


Minor nit: this is wgong's data more thoughtfully processed.




wgong: can you confirm (a) I've entered the data correctly in the
spreadsheet and (b) you've labeled the data sets correctly when you
generated the data?
If either of us made a mistake, it would be good to know. :)


yes, the data is the test result from flent tool.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-04 Thread Grant Grundler
On Mon, Sep 3, 2018 at 6:35 AM Toke Høiland-Jørgensen  wrote:
>
> Johannes Berg  writes:

> > Grant's data shows a significant difference between 6 and 7 for both
> > latency and throughput:

Minor nit: this is wgong's data more thoughtfully processed.

> >
> >  * median tpt
> >- ~241 vs ~201 (both 1 and 5 streams)
> >  * median latency
> >- 7.5 vs 6 (1 stream)
> >- 17.3 vs. 16.6 (5 streams)
> >
> > A 20% throughput improvement at <= 1.5ms latency cost seems like a
> > pretty reasonable trade-off?
>
> Yeah, on it's face. What I'm bothered about is that it is the exact
> opposite results that I got from my ath10k tests (there, throughput
> *dropped* and latency doubled when going to from 4 to 16 ms of
> buffering).

Hrm, yeah...that would bother me too. I think even if we don't
understand why/how that happened, at some level we need to allow
subsystems or drivers to adjust sk_pacing_shift value. Changing
sk_pacing_shift clearly has an effect that can be optimized.

If smaller values of sk_pacing_shift is increasing the interval (and
allows more buffering), I'm wondering why CPU utilization gets higher.
More buffering is usually more efficient. :/

wgong: can you confirm (a) I've entered the data correctly in the
spreadsheet and (b) you've labeled the data sets correctly when you
generated the data?
If either of us made a mistake, it would be good to know. :)

I'm going to speculate that "other processing" (e.g. device level
interrupt mitigation or possibly CPU scheduling behaviors which
handles TX/RX completion) could cause a "bathtub" effect similar to
the performance graphs that originally got NAPI accepted into the
kernel ~15 years ago.  So the "optimal" value could be different for
different architectures and different IO devices (which have different
max link rates and different host bus interconnects). But honestly, I
don't understand all the details of sk_pacing_shift value nearly as
well as just about anyone else reading this thread.

> And, well, Grant's data is from a single test in a noisy
> environment where the time series graph shows that throughput is all over
> the place for the duration of the test; so it's hard to draw solid
> conclusions from (for instance, for the 5-stream test, the average
> throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
> it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
> used in this test, so I can't go verify it myself; so the only thing I
> can do is grumble about it here... :)

It's a fair complaint and I agree with it.  My counter argument is the
opposite is true too: most ideal benchmarks don't measure what most
users see. While the data wgong provided are way more noisy than I
like, my overall "confidence" in the "conclusion" I offered is still
positive.

cheers,
grant

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Dave Taht
While I'm being overly vague and general...

* Doing some form of ack compression (whether in the tcp stack or in
mac80211) seems useful. It's really
hard to fill 4Mbytes one way and the needed number of acks the other.
I'd also pointed out (on some thread on the make-wifi-fast list that I
can't find right now) that drivers had a tendency to eat an entire
txop for that first single ack and ping pong between empty and full on
a one way tcp benchmark.

* Similarly grabbing air during the contention window. If the sender
is grabbing 3x as much air as the reciever

(A1 A2 A3 B1 A4 B2)

* I know of one org that upped the codel target value for 802.11ac
with decent results but I can't talk about it...
(sort of why I asked for a cap). My problem with decreasing the pacing
shift is that it makes for less interleaving (I think! have to look),
where increasing the target compensates for more jittery networks.
('course my overall theme of my prior post was to reduce jittery
networks with less buffering and smaller txops overall)

* Is this a two device test? or a test through an AP?

Looking over this thread, I see a suggestion is to expose the pacing
rate via a sysctl. I don't think there's
any one answer, either, but I'm not sure this is the right variable to
expose. Perhaps a tunable for
latency that might (one day) look at multiple variables and decide?

Getting this more right is really a job for an algorithm of
minstrel-like complexity.

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Dave Taht
I have not been on this thread (I have had to shut down my wifi lab
and am not planning on doing any more wifi work in the future). But
for what it's worth:

* tcp_pacing_shift affects the performance of the tcp stack only. I
think 16ms is an outrageous amount, and I'm thinking we actually have
a problem on the other side of the link in getting acks out as I
write. That said,
I don't care all that much compared to the bad old days (
http://blog.cerowrt.org/tags/wifi ) and 16ms
is far better than the seconds it used to be.

That said, I'd rather appreciate a test on HT20 with the same chipset
at a lower rate. The hope was that
we'd achieve flat latency at every rate (see
http://blog.cerowrt.org/post/real_results/ for some lovely pics)

Flent can sample minstrel stats but I think we're SOL on the ath10k.

* The irtt test can measure one way delays pretty accurately so you
can see if the other side is actually the source of this issue.

* Not clear to me if the AP is also running the fq_codel for wifi
code? That, um, er, helps an AP enormously

* Having an aircap and regular capture of what's going on during these
tests would be useful. Flent is a great test driver, but a tcpdump
taken during a flent test tells the truth via tcptrace and xplot.org.
We can inspect cwnd, rwnd, and look at drops. (actually I usually turn
ECN on so I can see when the codel algorithm is kicking in). Being
able to zoom in on the early ramp would be helpful. Etc.

I'll take a look at them if you'll supply them.

* At some point the ath10k driver also gained NAPI support. (?) I
don't think NAPI is a good fit for wifi. Delays in TCP RX processing
lead to less throughput.

* The firmware is perpetually trying to do more stuff with less
interrupts. Me, I dreamed of a register you could poll for "when will
the air be clear", and a "tx has one ms left to go, give me some more
data" interrupt. The fact we still have a minimum of ~12ms of
uncontrolled buffering bugs me. If we could just hold off submittal
until just before it was needed, we could cut it in half (and further,
by 10s of ms, when the media is contended).

* My general recomendation for contended APs was that they advertise a
reduced TXOP as the number of active stations go up. (this also
applies to the VI/VO queues). Never got around to algorizing it... I
gave up on my links and, disable VO/VI/BK entirely and just tell
hostapd to advertise 2ms. Yep, kills conventional measures of
throughput. Cuts inter-station service time by 2/3s though, and *that*
you can notice and measure with PLT benchmarks and overall feel.

This implies increasing the pacing shift to suit the advertised size
of the txop, not decreasing it - and I have no idea how to get that
from one place to another. Same goes for stuff destined for the VI/VO
queues.

* (also it would be good to see in the capture what the beacon says)

* Powersave has always been a problem...

* Cubic is a terrible tcp for wifi. BBR, however, is possibly worse...
or better. It would be educational to try running BBR on this link at
either shift setting. (seriously!) TCP is not TCP is not TCP

Lastly... tcp_pacing_shift does not affect the number of other packets
that can get into a given txop given the other buffering - it's one in
the hardware, one ready to go, one being formed, try setting
pacing_shift to to 4 to see what happens

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Mon, 2018-09-03 at 13:11 +0200, Toke Høiland-Jørgensen wrote:
>
>> > 6 vs. 8, I think? But I didn't follow the full discussion.
>
> Err, I just realized that I was completely wrong - the default, of
> course, is 10. So smaller values mean more buffering.
>
> Most of my argumentation was thus utter garbage :-)

Well, I got the gist, even if the sign bit was wrong ;)

>> > I also think that we shouldn't necessarily restrict this to "for the
>> > ath10k". Is there any reason to think that this would be different for
>> > different chips?
>> 
>> No, I also think it should be possible to select a value that will work
>> for all drivers. There's a strong diminishing returns effect here after
>> a certain point, and I believe we should strongly push back against just
>> adding more buffering to chase (single-flow) throughput numbers; and I'm
>> not convinced that having different values for different drivers is
>> worth the complexity.
>
> I think I can see some point in it if the driver requires some
> buffering for some specific reason? But you'd have to be able to state
> that reason, I guess, I could imagine having a firmware limitation to
> need to build two aggregates, or something around MU-MIMO, etc.

Right, I'm not ruling out that there can be legitimate reasons to add
extra buffering; but a lot of times it's just used to paper over other
issues, so a good explanation is definitely needed...

>> As far as the actual value, I *think* it may be that the default shift
>> should be 7 (8 ms) rather than 8 (4 ms) as it currently is. Going back
>> and looking at my data from when I submitted the original patch, it
>> looks like the point of diminishing returns is somewhere between those
>> two with ath9k (where I did most of my testing), and it seems reasonable
>> that it could be slightly higher (i.e., more buffering) for ath10k.
>
> Grant's data shows a significant difference between 6 and 7 for both
> latency and throughput:
>
>  * median tpt
>- ~241 vs ~201 (both 1 and 5 streams)
>  * median latency
>- 7.5 vs 6 (1 stream)
>- 17.3 vs. 16.6 (5 streams)
>
> A 20% throughput improvement at <= 1.5ms latency cost seems like a
> pretty reasonable trade-off?

Yeah, on it's face. What I'm bothered about is that it is the exact
opposite results that I got from my ath10k tests (there, throughput
*dropped* and latency doubled when going to from 4 to 16 ms of
buffering). And, well, Grant's data is from a single test in a noisy
environment where the timeseries graph shows that throughput is all over
the place for the duration of the test; so it's hard to draw solid
conclusions from (for instance, for the 5-stream test, the average
throughput for 6 is 331 and 379 Mbps for the two repetitions, and for 7
it's 326 and 371 Mbps) . Unfortunately I don't have the same hardware
used in this test, so I can't go verify it myself; so the only thing I
can do is grumble about it here... :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Johannes Berg
On Mon, 2018-09-03 at 13:11 +0200, Toke Høiland-Jørgensen wrote:

> > 6 vs. 8, I think? But I didn't follow the full discussion.

Err, I just realized that I was completely wrong - the default, of
course, is 10. So smaller values mean more buffering.

Most of my argumentation was thus utter garbage :-)

> > I also think that we shouldn't necessarily restrict this to "for the
> > ath10k". Is there any reason to think that this would be different for
> > different chips?
> 
> No, I also think it should be possible to select a value that will work
> for all drivers. There's a strong diminishing returns effect here after
> a certain point, and I believe we should strongly push back against just
> adding more buffering to chase (single-flow) throughput numbers; and I'm
> not convinced that having different values for different drivers is
> worth the complexity.

I think I can see some point in it if the driver requires some buffering
for some specific reason? But you'd have to be able to state that
reason, I guess, I could imagine having a firmware limitation to need to
build two aggregates, or something around MU-MIMO, etc.

> As far as I'm concerned, just showing an increase in maximum throughput
> under ideal conditions (as this initial patch submission did) is not
> sufficient argument for adding more buffering. At a minimum, the
> evaluation should also include:
> 
> - Impact on latency and throughput for competing flows in the same test.
> - Impact on latency for the TCP flow itself.
> - A full evaluation at low rates and in scenarios with more than one
>   client.

That seems reasonable.

> As far as the actual value, I *think* it may be that the default shift
> should be 7 (8 ms) rather than 8 (4 ms) as it currently is. Going back
> and looking at my data from when I submitted the original patch, it
> looks like the point of diminishing returns is somewhere between those
> two with ath9k (where I did most of my testing), and it seems reasonable
> that it could be slightly higher (i.e., more buffering) for ath10k.

Grant's data shows a significant difference between 6 and 7 for both
latency and throughput:

 * median tpt
   - ~241 vs ~201 (both 1 and 5 streams)
 * median latency
   - 7.5 vs 6 (1 stream)
   - 17.3 vs. 16.6 (5 streams)

A 20% throughput improvement at <= 1.5ms latency cost seems like a
pretty reasonable trade-off?

johannes

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Thu, 2018-08-30 at 16:32 -0700, Grant Grundler wrote:
>> On Sun, Aug 12, 2018 at 10:37 PM Wen Gong  wrote:
>> ...
>> > I have tested with your method for shift 6/8/10/7 all twice time in open 
>> > air environment.
>> 
>> I've attached the graphed data which Wen provided me and I made one
>> "cleanup": the median and average "ICMP" are computed only using
>> values where flent has an "antagonist" load running.  The first 28 and
>> last 26 seconds of the test are "idle" - it makes no sense to include
>> the latency of those. This drops about 1/7th of the data points.
>> 
>> My observations:
>> o the delta between average and median is evidence this is not
>> particularly reliable data (but expected result for "open air" wifi
>> testing in a corp environment).
>> o sk_pacing_shift=8 and/or =7 appears to have suffered from
>> significant open air interference - I've attached a graph of the raw
>> data.
>> o I'm comfortable asserting throughput is higher and latency is lower
>> for sk_pacing_shift=6 (vs 7) at the cost of additional CPU
>> utilization.
>> 
>> My questions:
>> 1) Is the current conversation just quibbling about 6 vs 7 for the
>> ath10k default?
>
> 6 vs. 8, I think? But I didn't follow the full discussion.
>
> I also think that we shouldn't necessarily restrict this to "for the
> ath10k". Is there any reason to think that this would be different for
> different chips?

No, I also think it should be possible to select a value that will work
for all drivers. There's a strong diminishing returns effect here after
a certain point, and I believe we should strongly push back against just
adding more buffering to chase (single-flow) throughput numbers; and I'm
not convinced that having different values for different drivers is
worth the complexity.

As far as I'm concerned, just showing an increase in maximum throughput
under ideal conditions (as this initial patch submission did) is not
sufficient argument for adding more buffering. At a minimum, the
evaluation should also include:

- Impact on latency and throughput for competing flows in the same test.
- Impact on latency for the TCP flow itself.
- A full evaluation at low rates and in scenarios with more than one
  client.

As far as the actual value, I *think* it may be that the default shift
should be 7 (8 ms) rather than 8 (4 ms) as it currently is. Going back
and looking at my data from when I submitted the original patch, it
looks like the point of diminishing returns is somewhere between those
two with ath9k (where I did most of my testing), and it seems reasonable
that it could be slightly higher (i.e., more buffering) for ath10k.

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-09-03 Thread Johannes Berg
On Thu, 2018-08-30 at 16:32 -0700, Grant Grundler wrote:
> On Sun, Aug 12, 2018 at 10:37 PM Wen Gong  wrote:
> ...
> > I have tested with your method for shift 6/8/10/7 all twice time in open 
> > air environment.
> 
> I've attached the graphed data which Wen provided me and I made one
> "cleanup": the median and average "ICMP" are computed only using
> values where flent has an "antagonist" load running.  The first 28 and
> last 26 seconds of the test are "idle" - it makes no sense to include
> the latency of those. This drops about 1/7th of the data points.
> 
> My observations:
> o the delta between average and median is evidence this is not
> particularly reliable data (but expected result for "open air" wifi
> testing in a corp environment).
> o sk_pacing_shift=8 and/or =7 appears to have suffered from
> significant open air interference - I've attached a graph of the raw
> data.
> o I'm comfortable asserting throughput is higher and latency is lower
> for sk_pacing_shift=6 (vs 7) at the cost of additional CPU
> utilization.
> 
> My questions:
> 1) Is the current conversation just quibbling about 6 vs 7 for the
> ath10k default?

6 vs. 8, I think? But I didn't follow the full discussion.

I also think that we shouldn't necessarily restrict this to "for the
ath10k". Is there any reason to think that this would be different for
different chips?

I could see a reason for a driver to set the default *higher* than the
mac80211 default (needing more buffers for whatever internal reason),
but I can't really see a reason it should be *lower*. Also, the
networking-stack wide default is 0, after all, so it seems that each new
layer might have some knowledge about why to *increase* it (like
mac80211 knows that A-MPDUs/A-MSDUs need to be built), but it seems not
so reasonable that it should be reduced again. Unless you have some
specific knowledge that ath10k would have a smaller A-MSDU/A-MPDU size
limit or something?

> 2) If yes, can both patches be accepted and then later add code to
> select a default based on CPU available?

I've applied the first patch now (with some comment cleanups, you may
want to pick those up for your internal usage) because I can see an
argument for increasing it.

However, I think that the "quibbling" over the default should most
likely result in a changed default value for mac80211, rather than
ath10k using the driver setting as a way out of that discussion.

johannes

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-31 Thread Toke Høiland-Jørgensen
Peter Oh  writes:

> Hi Toke,
>
>
> On 08/17/2018 04:32 AM, Toke Høiland-Jørgensen wrote:
>>
> Shift 6:
> wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
> "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1

> Does flent have options to set TOS or QoS queue for TCP or UDP test?
> If I add "--test-paramete burst_tos=0xe0", will it use corresponding 
> mac80211 QoS queue?

Yes, for some tests:

- The RRUL test will run four TCP flows in each direction, one marked for
  each 802.11 QoS queue.

- For the rtt_fair* tests you can set them with a test parameter:
--test-parameter markings=BE,CS1 to set flows 1 and 2 to BE and CS1
respectively. You can use the rtt_fair_var* tests to get a variable
number of flows (the same host can be specified multiple times by having
different hostnames resolve to the same IP).

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-30 Thread Peter Oh

Hi Toke,


On 08/17/2018 04:32 AM, Toke Høiland-Jørgensen wrote:



Shift 6:
wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
"sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1



Does flent have options to set TOS or QoS queue for TCP or UDP test?
If I add "--test-paramete burst_tos=0xe0", will it use corresponding 
mac80211 QoS queue?
flent -H  -t "" tcp_nup --test-parameter upload_streams=1 
--test-paramete burst_tos=0xe0


Thanks,
Peter

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-17 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: Toke Høiland-Jørgensen 
>> Sent: Monday, August 13, 2018 7:18 PM
>> To: Wen Gong ; Wen Gong
>> ; ath10k@lists.infradead.org;
>> johan...@sipsolutions.net
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
>> chips
>> 
>> Wen Gong  writes:
>> 
>> > Hi Toke,
>> >
>> > I have tested with your method for shift 6/8/10/7 all twice time in
>> > open air environment.
>> 
>> Great, thanks!
>> 
>> > Shift 6:
>> > wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
>> > "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1
>> 
>> Just to be sure: You are running this on your laptop? And that laptop has the
>> ath10k and the modified kernel you are testing? Is 192.168.1.7 the AP or
>> another device?
> Hi Toke,
> Yes, I run on my laptop(Dell LATITUDE E5440 + QCA6174 PCI wireless card) in 
> open air environment,
> Kernel is 4.18.0-rc2-wt-ath+ #2 SMP which is built myself.
> The mac80211.ko/ath10k is built with the 2 patches.
> 192.168.1.7 is another device(PC installed with 4.4.0-116-generic 
> #140~14.04.1-Ubuntu)
>> 
>> I'd like to see what happens when the link is fully saturated my multiple
>> streams as well. Could you repeat the tests with upload_streams=5, please?
>> And if you could share the .flent.gz files (upload them somewhere, or email
>> me off-list), that would be useful as well :)
>
> Test result with upload_streams=5:

Sorry for the delay in looking at this, I've been terribly busy this
week; will get back to it in more detail next week.

One question for now: In the results you quote below the total
throughput is way lower than the up to 600 mbps you quoted in the patch
description. How did you run your original tests to achieve that high a
throughput?

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-13 Thread Wen Gong
> -Original Message-
> From: Toke Høiland-Jørgensen 
> Sent: Monday, August 13, 2018 7:18 PM
> To: Wen Gong ; Wen Gong
> ; ath10k@lists.infradead.org;
> johan...@sipsolutions.net
> Cc: linux-wirel...@vger.kernel.org
> Subject: RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
> chips
> 
> Wen Gong  writes:
> 
> > Hi Toke,
> >
> > I have tested with your method for shift 6/8/10/7 all twice time in
> > open air environment.
> 
> Great, thanks!
> 
> > Shift 6:
> > wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
> > "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1
> 
> Just to be sure: You are running this on your laptop? And that laptop has the
> ath10k and the modified kernel you are testing? Is 192.168.1.7 the AP or
> another device?
Hi Toke,
Yes, I run on my laptop(Dell LATITUDE E5440 + QCA6174 PCI wireless card) in 
open air environment,
Kernel is 4.18.0-rc2-wt-ath+ #2 SMP which is built myself.
The mac80211.ko/ath10k is built with the 2 patches.
192.168.1.7 is another device(PC installed with 4.4.0-116-generic 
#140~14.04.1-Ubuntu)
> 
> I'd like to see what happens when the link is fully saturated my multiple
> streams as well. Could you repeat the tests with upload_streams=5, please?
> And if you could share the .flent.gz files (upload them somewhere, or email
> me off-list), that would be useful as well :)

Test result with upload_streams=5:

sk_pacing_shift6:
wgong@wgong-Latitude-E5440-1:~/flent/5stream$ flent -H 192.168.1.7 -t 
"sk_pacing_shift6" tcp_nup --test-parameter upload_streams=5
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-14T105332.356811.sk_pacing_shift6.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift6' (at 2018-08-14 02:53:32.356811):

   avg   median  # data pts
 Ping (ms) ICMP :20.4613.85 ms  350
 TCP upload avg :66.3068.71 Mbits/s 301
 TCP upload sum :   331.49   343.55 Mbits/s 301
 TCP upload::1  :60.8064.65 Mbits/s 202
 TCP upload::2  :77.7282.89 Mbits/s 211
 TCP upload::3  :60.5256.09 Mbits/s 202
 TCP upload::4  :67.3973.56 Mbits/s 204
 TCP upload::5  :65.0671.97 Mbits/s 201
wgong@wgong-Latitude-E5440-1:~/flent/5stream$ flent -H 192.168.1.7 -t 
"sk_pacing_shift6" tcp_nup --test-parameter upload_streams=5
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-14T105554.583603.sk_pacing_shift6.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift6' (at 2018-08-14 02:55:54.583603):

   avg   median  # data pts
 Ping (ms) ICMP :20.8613.80 ms  350
 TCP upload avg :75.8883.17 Mbits/s 301
 TCP upload sum :   379.42   415.84 Mbits/s 301
 TCP upload::1  :82.0790.73 Mbits/s 225
 TCP upload::2  :74.0878.29 Mbits/s 204
 TCP upload::3  :70.4475.65 Mbits/s 217
 TCP upload::4  :82.7092.86 Mbits/s 223
 TCP upload::5  :70.1376.87 Mbits/s 210

sk_pacing_shift7:
wgong@wgong-Latitude-E5440-1:~/flent/5stream$ flent -H 192.168.1.7 -t 
"sk_pacing_shift7" tcp_nup --test-parameter upload_streams=5
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-14T105759.169367.sk_pacing_shift7.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift7' (at 2018-08-14 02:57:59.169367):

   avg   median  # data pts
 Ping (ms) ICMP :24.6615.55 ms  350
 TCP upload avg :65.3372.83 Mbits/s 301
 TCP upload sum :   326.67   363.10 Mbits/s 301
 TCP upload::1  :77.6092.93 Mbits/s 214
 TCP upload::2  :67.2275.95 Mbits/s 213
 TCP upload::3  :65.8174.54 Mbits/s 205
 TCP upload::4  :63.0670.37 Mbits/s 207
 TCP upload::5  :52.9855.78 Mbits/s 198
wgong@wgong-Latitude-E5440-1:~/flent/5stream$ flent -H 192.168.1.7 -t 
"sk_pacing_shift7" tcp_nup --test-parameter upload_streams=5
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-14T105923.620572.sk_pacing_shift7.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift7' (at 2018-08-14 02:59:23.620572):

   avg   median  # data pts
 Ping (ms) ICMP

RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-13 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Hi Toke,
>
> I have tested with your method for shift 6/8/10/7 all twice time in
> open air environment.

Great, thanks!

> Shift 6:
> wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t
> "sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1

Just to be sure: You are running this on your laptop? And that laptop
has the ath10k and the modified kernel you are testing? Is 192.168.1.7
the AP or another device?

I'd like to see what happens when the link is fully saturated my
multiple streams as well. Could you repeat the tests with
upload_streams=5, please? And if you could share the .flent.gz files
(upload them somewhere, or email me off-list), that would be useful as
well :)

Thanks,

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-12 Thread Wen Gong
> -Original Message-
> From: Toke Høiland-Jørgensen 
> Sent: Friday, August 10, 2018 9:18 PM
> To: Wen Gong ; Wen Gong
> ; ath10k@lists.infradead.org;
> johan...@sipsolutions.net
> Cc: linux-wirel...@vger.kernel.org
> Subject: RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
> chips
> 
> Wen Gong  writes:
> 
> >>
> > Hi Toke,
> > Could you give the command line for the latency test?
> > https://flent.org/intro.html#quick-start
> > I used the command but test failed:
> > flent tcp_download -p 1 -l 60 -H 192.168.1.5 -t
> > text-to-be-included-in-plot -o file1.png error loading plotter: unable to 
> > find
> plot configuration "1"
> 
> Try something like:
> 
> 
> flent -H 192.168.1.5 -t "sk_pacing_shift 7" tcp_nup --test-parameter
> upload_streams=1
> 
> 
> This will note the value of sk_pacing_shift you are testing in the data file 
> (so
> change that as appropriate), and you can vary the number of TCP streams by
> changing the upload_streams parameter.
> 
> Note that in this case I'm assuming you are running Flent on the device with
> the kernel you are trying to test, so you want a TCP transfer going
> *from* the device. If not, change "tcp_nup" to "tcp_ndown" and
> "upload_streams" to "download_streams". Upload is netperf TCP_STREAM
> test, and download is TCP_MAERTS.
> 
> When running the above command you'll get a summary output on the
> terminal that you can paste on the list; and also a data file to plot things 
> form.
> For instance, you can do something like 'flent -p ping_cdf *.flent.gz' to get 
> a
> CDF plot of all your test results afterwards. You are also very welcome to
> send me the .flent.gz data files and I'll take a look to make sure everything
> looks reasonable :)
> 
> -Toke

Hi Toke,

I have tested with your method for shift 6/8/10/7 all twice time in open air 
environment.
Shift 6:
wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t 
"sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-13T110414.699512.sk_pacing_shift6.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift6' (at 2018-08-13 03:04:14.699512):

   avg   median  # data pts
 Ping (ms) ICMP : 9.91 4.99 ms  350
 TCP upload avg :   242.48   262.43 Mbits/s 301
 TCP upload sum :   242.48   262.43 Mbits/s 301
 TCP upload::1  :   242.48   263.34 Mbits/s 271
wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t 
"sk_pacing_shift6" tcp_nup --test-parameter upload_streams=1
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-13T113317.074077.sk_pacing_shift6.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift6' (at 2018-08-13 03:33:17.074077):

   avg   median  # data pts
 Ping (ms) ICMP : 7.75 5.30 ms  350
 TCP upload avg :   239.17   250.84 Mbits/s 301
 TCP upload sum :   239.17   250.84 Mbits/s 301
 TCP upload::1  :   239.17   255.03 Mbits/s 266

Shift 8:
wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t 
"sk_pacing_shift8" tcp_nup --test-parameter upload_streams=1
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-13T121433.187781.sk_pacing_shift8.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift8' (at 2018-08-13 04:14:33.187781):

   avg   median  # data pts
 Ping (ms) ICMP :17.12 7.07 ms  350
 TCP upload avg :   180.05   185.82 Mbits/s 301
 TCP upload sum :   180.05   185.82 Mbits/s 301
 TCP upload::1  :   180.05   189.41 Mbits/s 253
wgong@wgong-Latitude-E5440-1:~/flent$ flent -H 192.168.1.7 -t 
"sk_pacing_shift8" tcp_nup --test-parameter upload_streams=1
Started Flent 1.2.2 using Python 2.7.12.
Starting tcp_nup test. Expected run time: 70 seconds.
Data file written to 
./tcp_nup-2018-08-13T121602.382575.sk_pacing_shift8.flent.gz.
Summary of tcp_nup test run 'sk_pacing_shift8' (at 2018-08-13 04:16:02.382575):

   avg   median  # data pts
 Ping (ms) ICMP :13.90 5.89 ms  350
 TCP upload avg :   207.56   228.16 Mbits/s 301
 TCP upload sum :   207.56   228.16 Mbits/s 301
 TCP upload::1  :   207.56   228.11 Mbits/s 259

Shift 10:
wgong@wgong-Latit

RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-10 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

>> -Original Message-
>> From: ath10k  On Behalf Of Toke
>> Høiland-Jørgensen
>> Sent: Wednesday, August 8, 2018 6:44 PM
>> To: Wen Gong ; ath10k@lists.infradead.org;
>> johan...@sipsolutions.net
>> Cc: linux-wirel...@vger.kernel.org
>> Subject: Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
>> chips
>> 
>> Wen Gong  writes:
>> 
>> > Upstream kernel has an interface to help adjust sk_pacing_shift to
>> > help improve TCP UL throughput.
>> > The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
>> > WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
>> > VHT80 TCP UL throughput testing result shows 6 is the optimal.
>> > Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.
>> >
>> > Tested with QCA6174 PCI with firmware
>> > WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377
>> PCI.
>> > It's not a regression with new firmware releases.
>> >
>> > There have 2 test result of different settings:
>> >
>> > ARM CPU based device with QCA6174A PCI with different
>> > sk_pacing_shift:
>> >
>> >  sk_pacing_shift  throughput(Mbps) CPU utilization
>> >  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>> >  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>> >  8   288~90% idle, Focus on CPU1: ~35%idle
>> >  9  ~200~92% idle, Focus on CPU1: ~50%idle
>> >
>> > 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with
>> > sk_packing_shift set to 6:
>> >
>> >   tcp_limit_output_bytesthroughput(Mbps)
>> >  default(262144)+1 Stream 336
>> >  default(262144)+2 Streams558
>> >  default(262144)+3 Streams584
>> >  default(262144)+4 Streams602
>> >  default(262144)+5 Streams598
>> >  changed(2621440)+1 Stream598
>> >  changed(2621440)+2 Streams   601
>> 
>> You still haven't provided any latency numbers for these tests, which makes
>> it impossible to verify that setting sk_pacing_shift to 6 is the right 
>> tradeoff.
>> 
>> As I said before, from your numbers I suspect the right setting is actually 
>> 7,
>> which would be 10-20ms less latency under load; way more important than
>> ~50 Mbps...
>> 
> Hi Toke,
> Could you give the command line for the latency test?
> https://flent.org/intro.html#quick-start
> I used the command but test failed:
> flent tcp_download -p 1 -l 60 -H 192.168.1.5 -t text-to-be-included-in-plot 
> -o file1.png
> error loading plotter: unable to find plot configuration "1"

Try something like:


flent -H 192.168.1.5 -t "sk_pacing_shift 7" tcp_nup --test-parameter 
upload_streams=1


This will note the value of sk_pacing_shift you are testing in the data
file (so change that as appropriate), and you can vary the number of TCP
streams by changing the upload_streams parameter.

Note that in this case I'm assuming you are running Flent on the device
with the kernel you are trying to test, so you want a TCP transfer going
*from* the device. If not, change "tcp_nup" to "tcp_ndown" and
"upload_streams" to "download_streams". Upload is netperf TCP_STREAM
test, and download is TCP_MAERTS.

When running the above command you'll get a summary output on the
terminal that you can paste on the list; and also a data file to plot
things form. For instance, you can do something like 'flent -p ping_cdf
*.flent.gz' to get a CDF plot of all your test results afterwards. You
are also very welcome to send me the .flent.gz data files and I'll take
a look to make sure everything looks reasonable :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


RE: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-10 Thread Wen Gong
> -Original Message-
> From: ath10k  On Behalf Of Toke
> Høiland-Jørgensen
> Sent: Wednesday, August 8, 2018 6:44 PM
> To: Wen Gong ; ath10k@lists.infradead.org;
> johan...@sipsolutions.net
> Cc: linux-wirel...@vger.kernel.org
> Subject: Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi
> chips
> 
> Wen Gong  writes:
> 
> > Upstream kernel has an interface to help adjust sk_pacing_shift to
> > help improve TCP UL throughput.
> > The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
> > WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
> > VHT80 TCP UL throughput testing result shows 6 is the optimal.
> > Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.
> >
> > Tested with QCA6174 PCI with firmware
> > WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377
> PCI.
> > It's not a regression with new firmware releases.
> >
> > There have 2 test result of different settings:
> >
> > ARM CPU based device with QCA6174A PCI with different
> > sk_pacing_shift:
> >
> >  sk_pacing_shift  throughput(Mbps) CPU utilization
> >  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
> >  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
> >  8   288~90% idle, Focus on CPU1: ~35%idle
> >  9  ~200~92% idle, Focus on CPU1: ~50%idle
> >
> > 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with
> > sk_packing_shift set to 6:
> >
> >   tcp_limit_output_bytesthroughput(Mbps)
> >  default(262144)+1 Stream 336
> >  default(262144)+2 Streams558
> >  default(262144)+3 Streams584
> >  default(262144)+4 Streams602
> >  default(262144)+5 Streams598
> >  changed(2621440)+1 Stream598
> >  changed(2621440)+2 Streams   601
> 
> You still haven't provided any latency numbers for these tests, which makes
> it impossible to verify that setting sk_pacing_shift to 6 is the right 
> tradeoff.
> 
> As I said before, from your numbers I suspect the right setting is actually 7,
> which would be 10-20ms less latency under load; way more important than
> ~50 Mbps...
> 
Hi Toke,
Could you give the command line for the latency test?
https://flent.org/intro.html#quick-start
I used the command but test failed:
flent tcp_download -p 1 -l 60 -H 192.168.1.5 -t text-to-be-included-in-plot -o 
file1.png
error loading plotter: unable to find plot configuration "1"

> -Toke
> 
> ___
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-08 Thread Toke Høiland-Jørgensen
Wen Gong  writes:

> Upstream kernel has an interface to help adjust sk_pacing_shift to help
> improve TCP UL throughput.
> The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
> WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
> VHT80 TCP UL throughput testing result shows 6 is the optimal.
> Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.
>
> Tested with QCA6174 PCI with firmware
> WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
> It's not a regression with new firmware releases.
>
> There have 2 test result of different settings:
>
> ARM CPU based device with QCA6174A PCI with different
> sk_pacing_shift:
>
>  sk_pacing_shift  throughput(Mbps) CPU utilization
>  6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
>  7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
>  8   288~90% idle, Focus on CPU1: ~35%idle
>  9  ~200~92% idle, Focus on CPU1: ~50%idle
>
> 5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
> set to 6:
>
>   tcp_limit_output_bytesthroughput(Mbps)
>  default(262144)+1 Stream 336
>  default(262144)+2 Streams558
>  default(262144)+3 Streams584
>  default(262144)+4 Streams602
>  default(262144)+5 Streams598
>  changed(2621440)+1 Stream598
>  changed(2621440)+2 Streams   601

You still haven't provided any latency numbers for these tests, which
makes it impossible to verify that setting sk_pacing_shift to 6 is the
right tradeoff.

As I said before, from your numbers I suspect the right setting is
actually 7, which would be 10-20ms less latency under load; way more
important than ~50 Mbps...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v2 2/2] ath10k: Set sk_pacing_shift to 6 for 11AC WiFi chips

2018-08-08 Thread Wen Gong
Upstream kernel has an interface to help adjust sk_pacing_shift to help
improve TCP UL throughput.
The sk_pacing_shift is 8 in mac80211, this is based on test with 11N
WiFi chips with ath9k. For QCA6174/QCA9377 PCI 11AC chips, the 11AC
VHT80 TCP UL throughput testing result shows 6 is the optimal.
Overwrite the sk_pacing_shift to 6 in ath10k driver for QCA6174/9377 PCI.

Tested with QCA6174 PCI with firmware
WLAN.RM.4.4.1-00109-QCARMSWPZ-1, but this will also affect QCA9377 PCI.
It's not a regression with new firmware releases.

There have 2 test result of different settings:

ARM CPU based device with QCA6174A PCI with different
sk_pacing_shift:

 sk_pacing_shift  throughput(Mbps) CPU utilization
 6500(-P5)  ~75% idle, Focus on CPU1: ~14%idle
 7454(-P5)  ~80% idle, Focus on CPU1: ~4%idle
 8   288~90% idle, Focus on CPU1: ~35%idle
 9  ~200~92% idle, Focus on CPU1: ~50%idle

5G TCP UL VTH80 on X86 platform with QCA6174A PCI with sk_packing_shift
set to 6:

  tcp_limit_output_bytesthroughput(Mbps)
 default(262144)+1 Stream 336
 default(262144)+2 Streams558
 default(262144)+3 Streams584
 default(262144)+4 Streams602
 default(262144)+5 Streams598
 changed(2621440)+1 Stream598
 changed(2621440)+2 Streams   601

Signed-off-by: Wen Gong 
---
V2:
-add the optimal for configurable for each hardware type
 drivers/net/wireless/ath/ath10k/core.c | 6 ++
 drivers/net/wireless/ath/ath10k/hw.h   | 6 ++
 drivers/net/wireless/ath/ath10k/mac.c  | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/core.c 
b/drivers/net/wireless/ath/ath10k/core.c
index 85c58eb..fbd13ec 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -189,6 +189,7 @@
.per_ce_irq = false,
.shadow_reg_support = false,
.rri_on_ddr = false,
+   .tx_sk_pacing_shift = SK_PACING_SHIFT_6174,
},
{
.id = QCA6174_HW_2_1_VERSION,
@@ -221,6 +222,7 @@
.per_ce_irq = false,
.shadow_reg_support = false,
.rri_on_ddr = false,
+   .tx_sk_pacing_shift = SK_PACING_SHIFT_6174,
},
{
.id = QCA6174_HW_3_0_VERSION,
@@ -253,6 +255,7 @@
.per_ce_irq = false,
.shadow_reg_support = false,
.rri_on_ddr = false,
+   .tx_sk_pacing_shift = SK_PACING_SHIFT_6174,
},
{
.id = QCA6174_HW_3_2_VERSION,
@@ -288,6 +291,7 @@
.per_ce_irq = false,
.shadow_reg_support = false,
.rri_on_ddr = false,
+   .tx_sk_pacing_shift = SK_PACING_SHIFT_6174,
},
{
.id = QCA99X0_HW_2_0_DEV_VERSION,
@@ -443,6 +447,7 @@
.per_ce_irq = false,
.shadow_reg_support = false,
.rri_on_ddr = false,
+   .tx_sk_pacing_shift = SK_PACING_SHIFT_9377,
},
{
.id = QCA9377_HW_1_1_DEV_VERSION,
@@ -477,6 +482,7 @@
.per_ce_irq = false,
.shadow_reg_support = false,
.rri_on_ddr = false,
+   .tx_sk_pacing_shift = SK_PACING_SHIFT_9377,
},
{
.id = QCA4019_HW_1_0_DEV_VERSION,
diff --git a/drivers/net/wireless/ath/ath10k/hw.h 
b/drivers/net/wireless/ath/ath10k/hw.h
index a274bd8..1f956d6 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -161,6 +161,9 @@ enum qca9377_chip_id_rev {
 
 #define REG_DUMP_COUNT_QCA988X 60
 
+#define SK_PACING_SHIFT_6174 6
+#define SK_PACING_SHIFT_9377 6
+
 struct ath10k_fw_ie {
__le32 id;
__le32 len;
@@ -589,6 +592,9 @@ struct ath10k_hw_params {
 
/* Number of bytes to be the offset for each FFT sample */
int spectral_bin_offset;
+
+   /* Number of shift to override the default value of ieee80211_hw*/
+   u8 tx_sk_pacing_shift;
 };
 
 struct htt_rx_desc;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 95243b4..4f2c07f 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -8361,6 +8361,9 @@ int ath10k_mac_register(struct ath10k *ar)
ar->hw->wiphy->max_scan_ssids = WLAN_SCAN_PARAMS_MAX_SSID;
ar->hw->wiphy->max_scan_ie_len = WLAN_SCAN_PARAMS_MAX_IE_LEN;
 
+   if (ar->hw_params.tx_sk_pacing_shift != 0)
+   ar->hw->tx_sk_pacing_shift = ar->hw_params.tx_sk_pacing_shift;
+
ar->hw->vif_data_size = sizeof(struct ath10k_vif);
ar->hw->sta_data_size = sizeof(struct ath10k_sta);
ar->hw->txq_data_size = sizeof(struct ath10k_txq);
-- 
1.9.1