Re: [PATCH] ath10k: support PCIe enter L1 state

2018-11-15 Thread Michał Kazior
On Fri, 16 Nov 2018 at 08:00, Kalle Valo  wrote:
>
> Brian Norris  writes:
> > On Thu, Nov 15, 2018 at 06:38:25AM +, Wen Gong wrote:
> >> >
> >> > Is there some reason L1 was disabled in the first place? Was it known to 
> >> > be
> >> > unreliable?
> >>
> >> Hi Brian,
> >> It is a BUG for power, and it is not considered this BUG before.
> >> So this change will fix the bug.
> >
> > I understand that the existing behavior is suboptimal for power, but on
> > the other hand, code that goes out of its way to *clear* the L1 flag
> > doesn't just pop up out of nowhere. Somebody clearly wrote that! If it
> > just meant "we didn't verify L1 at first", then maybe it's fine to
> > enable it unconditionally and see what happens, but if it meant "we
> > tried L1 on some old chip  and it caused problems", then it would be
> > nice to know what those problems were.
> >
> > Or maybe that is hard to figure out, given there's no public git history
> > tracking the original code, and we just need to try it out.
>
> Yeah, it seems L1 was disabled already on the first ath10k commit:
>
> 5e3dd157d7e70 (Kalle Valo2013-06-12 20:52:10 +0300 2404)  
>   pcie_config_flags &= ~PCIE_CONFIG_FLAG_ENABLE_L1;
>
> I don't remember anymore why but my guess is that the proprietary driver
> at the time didn't support it with QCA998X. Or maybe QCA988X doesn't
> even support L1? Michal, do you remember?

Proprietary driver has it ifdef-ed to enable/disable.

Older driver enabled it only for some station-only target/product so
by default QCA988X would build with L1 flag masked out. It made sense
to be conservative and change as little as possible to avoid random
bugs and breakage - so the logic was inherited minus the build-time
ifdef.

However 10.4 driver seems to enable it unconditionally. I'm not sure
if this depends on target firmware in any way or if some other host
driver or bus settings need to be pre-set before L1 can be expected to
work reliably.

I guess there's no way other than testing it out.


Michał

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: support PCIe enter L1 state

2018-11-15 Thread Kalle Valo
Brian Norris  writes:
> On Thu, Nov 15, 2018 at 06:38:25AM +, Wen Gong wrote:
>> > 
>> > Is there some reason L1 was disabled in the first place? Was it known to be
>> > unreliable?
>>
>> Hi Brian,
>> It is a BUG for power, and it is not considered this BUG before.
>> So this change will fix the bug.
>
> I understand that the existing behavior is suboptimal for power, but on
> the other hand, code that goes out of its way to *clear* the L1 flag
> doesn't just pop up out of nowhere. Somebody clearly wrote that! If it
> just meant "we didn't verify L1 at first", then maybe it's fine to
> enable it unconditionally and see what happens, but if it meant "we
> tried L1 on some old chip  and it caused problems", then it would be
> nice to know what those problems were.
>
> Or maybe that is hard to figure out, given there's no public git history
> tracking the original code, and we just need to try it out.

Yeah, it seems L1 was disabled already on the first ath10k commit:

5e3dd157d7e70 (Kalle Valo2013-06-12 20:52:10 +0300 2404)
pcie_config_flags &= ~PCIE_CONFIG_FLAG_ENABLE_L1;

I don't remember anymore why but my guess is that the proprietary driver
at the time didn't support it with QCA998X. Or maybe QCA988X doesn't
even support L1? Michal, do you remember?

Related to QCA988X supporting L1 state also the commit log is misleading
as it only talks QCA6174/QCA9377, and has been only tested on QCA6174,
but the actual change enables L1 on _all_ PCI and AHB devices. So this
patch needs a lot more testing so that we have confidence that no
existing setups break.

-- 
Kalle Valo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: support PCIe enter L1 state

2018-11-15 Thread Brian Norris
Hi,

On Thu, Nov 15, 2018 at 06:38:25AM +, Wen Gong wrote:
> > -Original Message-
> > From: ath10k  On Behalf Of Brian Norris
> > 
> > Is there some reason L1 was disabled in the first place? Was it known to be
> > unreliable?
>
> Hi Brian,
> It is a BUG for power, and it is not considered this BUG before.
> So this change will fix the bug.

I understand that the existing behavior is suboptimal for power, but on
the other hand, code that goes out of its way to *clear* the L1 flag
doesn't just pop up out of nowhere. Somebody clearly wrote that! If it
just meant "we didn't verify L1 at first", then maybe it's fine to
enable it unconditionally and see what happens, but if it meant "we
tried L1 on some old chip  and it caused problems", then it would be
nice to know what those problems were.

Or maybe that is hard to figure out, given there's no public git history
tracking the original code, and we just need to try it out.

Anyway, I'm giving it a try here, but I just wanted to ask :)

Thanks,
Brian

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Toke Høiland-Jørgensen
Felix Fietkau  writes:

> On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:
>>> This part doesn't really make much sense to me, but maybe I'm
>>> misunderstanding how the code works.
>>> Let's assume we have a driver like ath9k or mt76, which tries to keep a
>>> number of aggregates in the hardware queue, and the hardware queue is
>>> currently empty.
>>> If the current txq entry is kept at the head of the schedule list,
>>> wouldn't the code just pull from that one over and over again, until
>>> enough packets are transmitted by the hardware and their tx status
>>> processed?
>>> It seems to me that while fairness is still preserved in the long run,
>>> this could lead to rather bursty scheduling, which may not be
>>> particularly latency friendly.
>> 
>> Yes, it'll be a bit more bursty when the hardware queue is completely
>> empty. However, when a TX completion comes back, that will adjust the
>> deficit of that sta and cause it to be rotated on the next dequeue. This
>> obviously relies on the fact that the lower-level hardware queue is
>> sufficiently shallow to not add a lot of latency. But we want that to be
>> the case anyway. In practice, it works quite well for ath9k, but not so
>> well for ath10k because it has a large buffer in firmware.
>> 
>> If we requeue the TXQ at the end of the list, a station that is taking
>> up too much airtime will fail to be throttled properly, so the
>> queue-at-head is kinda needed to ensure fairness...
> Thanks for the explanation, that makes sense to me. I have an idea on
> how to mitigate the burstiness within the driver. I'll write it down in
> pseudocode, please let me know if you think that'll work.

I don't think it will, unfortunately. For example, consider the case
where there are two stations queued; one with a large negative deficit
(say, -10ms), and one with a positive deficit.

In this case, we really need to throttle the station with a negative
deficit. But if the driver loops and caches txqs, we'll get something
like the following:

- First driver loop iteration: returns TXQ with positive deficit.
- Second driver loop iteration: Only the negative-deficit TXQ is in the
  mac80211 list, so it will loop until that TXQ's deficit turns positive
  and return it.

Because of this, the negative-deficit station won't be throttled, and we
won't get fairness.

How many frames will mt76 queue up below the driver point? I.e., how
much burstiness are you expecting this will introduce on that driver?

Taking a step back, it's clear that it would be good to be able to
dequeue packets to multiple STAs at once (we need that for MU-MIMO on
ath10k as well). However, I don't think we can do that with the
round-robin fairness scheduler; so we are going to need a different
algorithm. I *think* it may be possible to do this with a virtual-time
scheduler, but I haven't sat down and worked out the details yet...

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Toke Høiland-Jørgensen
Louie Lu  writes:

> Hi Rajkumar, Toke,
>
> I found the series (v3,4/6) remove the debugfs remove reset station's
> airtime method, and didn't added at here.
>
> Not sure how to help this kind of situation, do I need a separate
> patch to fix this, or posting the patch here is fine?

This is fine; we can fold it into the next version. Thanks :)

-Toke

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH] ath10k: report tx airtime provided by fw

2018-11-15 Thread Manikanta Pubbisetty
If supported, update transmit airtime in mac80211 with the airtime
values reported by the firmware. TX airtime of the PPDU is reported
via HTT data TX completion indication message.

A new service flag 'WMI_SERVICE_REPORT_AIRTIME' is added to advertise
the firmware support. For firmwares which do not support this feature,
TX airtime is calculated in the driver using TX bitrate.

Hardwares tested : QCA9984
Firmwares tested : 10.4-3.6.1-00841

Signed-off-by: Manikanta Pubbisetty 
---
* Changes are made on top of Rajkumar's changes for ATF

 drivers/net/wireless/ath/ath10k/core.c   |  3 ++
 drivers/net/wireless/ath/ath10k/htt.h| 21 -
 drivers/net/wireless/ath/ath10k/htt_rx.c | 52 ++--
 drivers/net/wireless/ath/ath10k/mac.c|  6 +++-
 drivers/net/wireless/ath/ath10k/txrx.c   |  2 +-
 drivers/net/wireless/ath/ath10k/wmi.h| 42 ++
 6 files changed, 121 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c 
b/drivers/net/wireless/ath/ath10k/core.c
index adf2b13..ce2338b 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -2599,6 +2599,9 @@ int ath10k_core_start(struct ath10k *ar, enum 
ath10k_firmware_mode mode,
 ar->wmi.svc_map))
val |= WMI_10_4_TX_DATA_ACK_RSSI;
 
+   if (test_bit(WMI_SERVICE_REPORT_AIRTIME, ar->wmi.svc_map))
+   val |= WMI_10_4_REPORT_AIRTIME;
+
status = ath10k_mac_ext_resource_config(ar, val);
if (status) {
ath10k_err(ar,
diff --git a/drivers/net/wireless/ath/ath10k/htt.h 
b/drivers/net/wireless/ath/ath10k/htt.h
index a76f7c9..e2d40ab 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -564,6 +564,7 @@ struct htt_mgmt_tx_completion {
 #define HTT_RX_INDICATION_INFO0_EXT_TID_LSB   (0)
 #define HTT_RX_INDICATION_INFO0_FLUSH_VALID   (1 << 5)
 #define HTT_RX_INDICATION_INFO0_RELEASE_VALID (1 << 6)
+#define HTT_RX_INDICATION_INFO0_PPDU_DURATION BIT(7)
 
 #define HTT_RX_INDICATION_INFO1_FLUSH_START_SEQNO_MASK   0x003F
 #define HTT_RX_INDICATION_INFO1_FLUSH_START_SEQNO_LSB0
@@ -576,7 +577,10 @@ struct htt_mgmt_tx_completion {
 #define HTT_RX_INDICATION_INFO1_NUM_MPDU_RANGES_MASK 0xFF00
 #define HTT_RX_INDICATION_INFO1_NUM_MPDU_RANGES_LSB  24
 
-#define HTT_TX_CMPL_FLAG_DATA_RSSI BIT(0)
+#define HTT_TX_CMPL_FLAG_DATA_RSSI BIT(0)
+#define HTT_TX_CMPL_FLAG_PPID_PRESENT  BIT(1)
+#define HTT_TX_CMPL_FLAG_PA_PRESENTBIT(2)
+#define HTT_TX_CMPL_FLAG_PPDU_DURATION_PRESENT BIT(3)
 
 struct htt_rx_indication_hdr {
u8 info0; /* %HTT_RX_INDICATION_INFO0_ */
@@ -866,6 +870,21 @@ struct htt_data_tx_completion {
__le16 msdus[0]; /* variable length based on %num_msdus */
 } __packed;
 
+#define HTT_TX_PPDU_DUR_INFO0_PEER_ID_MASK GENMASK(15, 0)
+#define HTT_TX_PPDU_DUR_INFO0_TID_MASK GENMASK(20, 16)
+
+struct htt_data_tx_ppdu_dur {
+   __le32 info0; /* HTT_TX_PPDU_DUR_INFO0_ */
+   __le32 tx_duration; /* in usecs */
+} __packed;
+
+#define HTT_TX_COMPL_PPDU_DUR_INFO0_NUM_ENTRIES_MASK   GENMASK(7, 0)
+
+struct htt_data_tx_compl_ppdu_dur {
+   __le32 info0; /* HTT_TX_COMPL_PPDU_DUR_INFO0_ */
+   struct htt_data_tx_ppdu_dur ppdu_dur[0];
+} __packed;
+
 struct htt_tx_compl_ind_base {
u32 hdr;
u16 payload[1/*or more*/];
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 0f775be..3b55f8a 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2003,8 +2003,12 @@ static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
int status = MS(resp->data_tx_completion.flags, HTT_DATA_TX_STATUS);
__le16 msdu_id, *msdus;
bool rssi_enabled = false;
-   u8 msdu_count = 0;
+   u8 msdu_count = 0, num_airtime_records, tid;
int i;
+   struct htt_data_tx_compl_ppdu_dur *ppdu_info;
+   struct ath10k_peer *peer;
+   u16 ppdu_info_offset = 0, peer_id;
+   u32 tx_duration;
 
switch (status) {
case HTT_DATA_TX_STATUS_NO_ACK:
@@ -2028,12 +2032,12 @@ static void ath10k_htt_rx_tx_compl_ind(struct ath10k 
*ar,
   resp->data_tx_completion.num_msdus);
 
msdu_count = resp->data_tx_completion.num_msdus;
+   msdus = resp->data_tx_completion.msdus;
 
if (resp->data_tx_completion.flags2 & HTT_TX_CMPL_FLAG_DATA_RSSI)
rssi_enabled = true;
 
for (i = 0; i < msdu_count; i++) {
-   msdus = resp->data_tx_completion.msdus;
msdu_id = msdus[i];
tx_done.msdu_id = __le16_to_cpu(msdu_id);
 
@@ -2065,6 +2069,50 @@ static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
ath10k_txrx_tx_unref(htt, &tx_done);
}

Re: [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Felix Fietkau
On 2018-11-14 18:40, Toke Høiland-Jørgensen wrote:
>> This part doesn't really make much sense to me, but maybe I'm
>> misunderstanding how the code works.
>> Let's assume we have a driver like ath9k or mt76, which tries to keep a
>> number of aggregates in the hardware queue, and the hardware queue is
>> currently empty.
>> If the current txq entry is kept at the head of the schedule list,
>> wouldn't the code just pull from that one over and over again, until
>> enough packets are transmitted by the hardware and their tx status
>> processed?
>> It seems to me that while fairness is still preserved in the long run,
>> this could lead to rather bursty scheduling, which may not be
>> particularly latency friendly.
> 
> Yes, it'll be a bit more bursty when the hardware queue is completely
> empty. However, when a TX completion comes back, that will adjust the
> deficit of that sta and cause it to be rotated on the next dequeue. This
> obviously relies on the fact that the lower-level hardware queue is
> sufficiently shallow to not add a lot of latency. But we want that to be
> the case anyway. In practice, it works quite well for ath9k, but not so
> well for ath10k because it has a large buffer in firmware.
> 
> If we requeue the TXQ at the end of the list, a station that is taking
> up too much airtime will fail to be throttled properly, so the
> queue-at-head is kinda needed to ensure fairness...
Thanks for the explanation, that makes sense to me. I have an idea on
how to mitigate the burstiness within the driver. I'll write it down in
pseudocode, please let me know if you think that'll work.

do {
struct ieee80211_txq *pending_txq[4];
int n_pending_txq = 0;
int i;

if (hwq->pending < 4)
break;

nframes = 0;

ieee80211_txq_schedule_start(hw, ac)
do {
bool requeue = false;

struct ieee80211_txq *txq;

txq = ieee80211_next_txq(hw, ac);
if (!txq)
break;

nframes += schedule_txq(txq, &requeue);
if (requeue)
pending_txq[n_pending_txq++] = txq;

} while (n_pending_txq < ARRAY_SIZE(pending_txq));

for (i = n_pending_txq; i > 0; i--)
ieee80211_return_txq(hw, pending_txq[i - 1]);

ieee80211_txq_schedule_end(hw, ac)
} while (nframes);

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [Make-wifi-fast] [PATCH v3 3/6] mac80211: Add airtime accounting and scheduling to TXQs

2018-11-15 Thread Louie Lu
Hi Rajkumar, Toke,

I found the series (v3,4/6) remove the debugfs remove reset station's
airtime method, and didn't added at here.

Not sure how to help this kind of situation, do I need a separate
patch to fix this, or posting the patch here is fine?



From 3a4a856c397345311c9d7f3679828cadc40e6a80 Mon Sep 17 00:00:00 2001
From: Louie Lu 
Date: Thu, 15 Nov 2018 16:13:57 +0800
Subject: [PATCH] mac80211: Add reset for station's airtime

Let user can reset station airtime status by debugfs, it will
reset all airtime deficit to `sta->airtime_weight` and reset rx/tx
airtime accumulate to 0.

diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 446908ab3f5d..d84d2369a76e 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -233,7 +233,28 @@ static ssize_t sta_airtime_read(struct file
*file, char __user *userbuf,
 kfree(buf);
 return rv;
 }
-STA_OPS(airtime);
+
+/*
+ * FIXME: This *only* reset station airtime, didn't accept input
+ */
+static ssize_t sta_airtime_write(struct file *file, const char __user *userbuf,
+ size_t count, loff_t *ppos)
+{
+struct sta_info *sta = file->private_data;
+struct ieee80211_local *local = sta->sdata->local;
+int ac;
+
+for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+spin_lock_bh(&local->active_txq_lock[ac]);
+sta->airtime[ac].rx_airtime = 0;
+sta->airtime[ac].tx_airtime = 0;
+sta->airtime[ac].deficit = sta->airtime_weight;
+spin_unlock_bh(&local->active_txq_lock[ac]);
+}
+
+return count;
+}
+STA_OPS_RW(airtime);

 static ssize_t sta_agg_status_read(struct file *file, char __user *userbuf,
 size_t count, loff_t *ppos)
-- 
2.18.0


Rajkumar Manoharan  於 2018年11月13日 週二 上午6:52寫道:
>
> From: Toke Høiland-Jørgensen 
>
> This adds airtime accounting and scheduling to the mac80211 TXQ
> scheduler. A new callback, ieee80211_sta_register_airtime(), is added
> that drivers can call to report airtime usage for stations.
>
> When airtime information is present, mac80211 will schedule TXQs
> (through ieee80211_next_txq()) in a way that enforces airtime fairness
> between active stations. This scheduling works the same way as the ath9k
> in-driver airtime fairness scheduling. If no airtime usage is reported
> by the driver, the scheduler will default to round-robin scheduling.
>
> For drivers that don't control TXQ scheduling in software, a new API
> function, ieee80211_txq_may_transmit(), is added which the driver can use
> to check if the TXQ is eligible for transmission, or should be throttled to
> enforce fairness. Calls to this function must also be enclosed in
> ieee80211_txq_schedule_{start,end}() calls to ensure proper locking.
>
> The API ieee80211_txq_may_transmit() also ensures that TXQ list will be
> aligned aginst driver's own round-robin scheduler list. i.e it rotates
> the TXQ list till it makes the requested node becomes the first entry
> in TXQ list. Thus both the TXQ list and driver's list are in sync.
>
> Co-Developed-by: Rajkumar Manoharan 
> Signed-off-by: Toke Høiland-Jørgensen 
> Signed-off-by: Rajkumar Manoharan 
> ---
>  include/net/mac80211.h | 59 ++
>  net/mac80211/cfg.c |  3 ++
>  net/mac80211/debugfs.c |  3 ++
>  net/mac80211/debugfs_sta.c | 50 --
>  net/mac80211/ieee80211_i.h |  2 ++
>  net/mac80211/main.c|  4 +++
>  net/mac80211/sta_info.c| 44 +--
>  net/mac80211/sta_info.h| 13 +++
>  net/mac80211/status.c  |  6 
>  net/mac80211/tx.c  | 90 
> +++---
>  10 files changed, 264 insertions(+), 10 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index 18b11c119b7e..c43d615ee9b1 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -2357,6 +2357,9 @@ enum ieee80211_hw_flags {
>   * @tx_sk_pacing_shift: Pacing shift to set on TCP sockets when frames from
>   * them are encountered. The default should typically not be changed,
>   * unless the driver has good reasons for needing more buffers.
> + *
> + * @weight_multipler: Driver specific airtime weight multiplier used while
> + * refilling deficit of each TXQ.
>   */
>  struct ieee80211_hw {
> struct ieee80211_conf conf;
> @@ -2393,6 +2396,7 @@ struct ieee80211_hw {
> const struct ieee80211_cipher_scheme *cipher_schemes;
> u8 max_nan_de_entries;
> u8 tx_sk_pacing_shift;
> +   u8 weight_multiplier;
>  };
>
>  static inline bool _ieee80211_hw_check(struct ieee80211_hw *hw,
> @@ -5393,6 +5397,34 @@ void ieee80211_sta_block_awake(struct ieee80211_hw *hw,
>  void ieee80211_send_eosp_nullfunc(struct ieee80211_sta *pubsta, int tid);
>
>  /**
> + * ieee80211_sta_register_airtime - register airtime usage for a sta/tid
> + *
> + * Register airtime usage for a given sta on a given tid. The driver can call