Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-12-05 Thread Ben Greear

On 12/05/2016 12:50 AM, Michal Kazior wrote:

On 2 December 2016 at 01:24, Ben Greear  wrote:

On 12/01/2016 02:52 PM, Ben Greear wrote:


On 08/19/2016 06:34 AM, Ben Greear wrote:




On 08/18/2016 11:59 PM, Michal Kazior wrote:


On 19 August 2016 at 03:26,   wrote:


From: Ben Greear 

I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.

This patch fixes the crashes.  I am not certain if there
is a better way or not.



Michal, thanks for the help on IRC.  I added this logic:

static void ieee80211_drv_tx(struct ieee80211_local *local,
 struct ieee80211_vif *vif,
 struct ieee80211_sta *pubsta,
 struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
struct ieee80211_tx_control control = {
.sta = pubsta,
};
struct ieee80211_txq *txq = NULL;
struct txq_info *txqi;
u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
goto tx_normal;

if (!ieee80211_is_data(hdr->frame_control))
goto tx_normal;

if (unlikely(!ieee80211_sdata_running(sdata))) {
WARN_ON_ONCE(1);
goto delete_and_return;
}

...

if (atomic_read(>txqs_len[ac]) >=
(local->hw.txq_ac_max_pending * 2)) {
/* Must be that something is not paying attention to
 * max-pending, like pktgen, so just drop this frame.
 */
delete_and_return:
ieee80211_free_txskb(>hw, skb);
return;
}


But, I still see the txq entries on the ar->txqs list in the
ath10k_mac_txq_init
after firmware crash in my test case.  Is this the test you were suggesting?


Yes.

Now that I think about it mac80211 doesn't call anything in driver
during hw_restart that would unref txqs. This means you'll have them
still linked when add_interface/sta_state is called, no?

This means that either:
 (a) txq (re-)init should be smarter in ath10k
 (b) txqs should be purged during hw_restart in ath10k


I posted a patch that does (a) last Friday:

"ath10k:  work-around for stale txq in ar->txqs"

I realized it will not apply upstream because it is also patching the previous
work-around I had in the patch that originated this email thread.

With these patches and the iterate hack to mac80211, then I no longer
see crashes in my test case that previously crashed very readily.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-12-05 Thread Michal Kazior
On 2 December 2016 at 01:24, Ben Greear  wrote:
> On 12/01/2016 02:52 PM, Ben Greear wrote:
>>
>> On 08/19/2016 06:34 AM, Ben Greear wrote:
>>>
>>>
>>>
>>> On 08/18/2016 11:59 PM, Michal Kazior wrote:

 On 19 August 2016 at 03:26,   wrote:
>
> From: Ben Greear 
>
> I was seeing kernel crashes due to accessing freed memory
> while debugging a 9984 firmware that was crashing often.
>
> This patch fixes the crashes.  I am not certain if there
> is a better way or not.
>
>
> Michal, thanks for the help on IRC.  I added this logic:
>
> static void ieee80211_drv_tx(struct ieee80211_local *local,
>  struct ieee80211_vif *vif,
>  struct ieee80211_sta *pubsta,
>  struct sk_buff *skb)
> {
> struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
> struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
> struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
> struct ieee80211_tx_control control = {
> .sta = pubsta,
> };
> struct ieee80211_txq *txq = NULL;
> struct txq_info *txqi;
> u8 ac;
>
> if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
> (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
> goto tx_normal;
>
> if (!ieee80211_is_data(hdr->frame_control))
> goto tx_normal;
>
> if (unlikely(!ieee80211_sdata_running(sdata))) {
> WARN_ON_ONCE(1);
> goto delete_and_return;
> }
>
> ...
>
> if (atomic_read(>txqs_len[ac]) >=
> (local->hw.txq_ac_max_pending * 2)) {
> /* Must be that something is not paying attention to
>  * max-pending, like pktgen, so just drop this frame.
>  */
> delete_and_return:
> ieee80211_free_txskb(>hw, skb);
> return;
> }
>
>
> But, I still see the txq entries on the ar->txqs list in the
> ath10k_mac_txq_init
> after firmware crash in my test case.  Is this the test you were suggesting?

Yes.

Now that I think about it mac80211 doesn't call anything in driver
during hw_restart that would unref txqs. This means you'll have them
still linked when add_interface/sta_state is called, no?

This means that either:
 (a) txq (re-)init should be smarter in ath10k
 (b) txqs should be purged during hw_restart in ath10k


Michał

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-12-01 Thread Ben Greear

On 12/01/2016 02:52 PM, Ben Greear wrote:

On 08/19/2016 06:34 AM, Ben Greear wrote:



On 08/18/2016 11:59 PM, Michal Kazior wrote:

On 19 August 2016 at 03:26,   wrote:

From: Ben Greear 

I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.

This patch fixes the crashes.  I am not certain if there
is a better way or not.


Michal, thanks for the help on IRC.  I added this logic:

static void ieee80211_drv_tx(struct ieee80211_local *local,
 struct ieee80211_vif *vif,
 struct ieee80211_sta *pubsta,
 struct sk_buff *skb)
{
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif);
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
struct ieee80211_tx_control control = {
.sta = pubsta,
};
struct ieee80211_txq *txq = NULL;
struct txq_info *txqi;
u8 ac;

if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
(info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
goto tx_normal;

if (!ieee80211_is_data(hdr->frame_control))
goto tx_normal;

if (unlikely(!ieee80211_sdata_running(sdata))) {
WARN_ON_ONCE(1);
goto delete_and_return;
}

...

if (atomic_read(>txqs_len[ac]) >=
(local->hw.txq_ac_max_pending * 2)) {
/* Must be that something is not paying attention to
 * max-pending, like pktgen, so just drop this frame.
 */
delete_and_return:
ieee80211_free_txskb(>hw, skb);
return;
}


But, I still see the txq entries on the ar->txqs list in the ath10k_mac_txq_init
after firmware crash in my test case.  Is this the test you were suggesting?


Thanks,
Ben




I did some more hacking on this today.  I think I found some better clue on 
this.

I added this code:

static void ath10k_mac_txq_init(struct ath10k *ar, struct ieee80211_txq *txq)
{
struct ath10k_txq *artxq = (void *)txq->drv_priv;
struct ath10k_txq *tmp, *walker;
struct ieee80211_txq *txq_tmp;
int i = 0;

if (!txq)
return;

spin_lock_bh(>txqs_lock);
ar->txqs_lock.rlock.dbg1 = 104;

/* Remove from ar->txqs in case it still exists there. */
list_for_each_entry_safe(walker, tmp, >txqs, list) {
txq_tmp = container_of((void *)walker, struct ieee80211_txq,
   drv_priv);
if ((++i % 1) == 0) {
ath10k_err(ar, "txq-init: Checking txq_tmp: %p i: %d\n", txq_tmp, 
i);
ath10k_err(ar, "txq-init: txqs: %p walker->list: %p w->next: %p  w->prev: 
%p ar->txqs: %p\n",
   >txqs, &(walker->list), walker->list.next, walker->list.prev, 
>txqs);
}

if (txq_tmp == txq) {
WARN_ON_ONCE(1);
ath10k_err(ar, "txq-init: Found txq when it should be deleted, txq_tmp: 
%p  txq: %p\n",
   txq_tmp, txq);
list_del(>list);
}
}
spin_unlock_bh(>txqs_lock);

INIT_LIST_HEAD(>list);
}


[firmware has just crashed]

Dec 01 14:43:06 wave2 kernel: [ cut here ]
Dec 01 14:43:06 wave2 kernel: WARNING: CPU: 0 PID: 193 at 
/home/greearb/git/linux-4.7.dev.y/drivers/net/wireless/ath/ath10k/mac.c:4217
ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core]
Dec 01 14:43:06 wave2 kernel: Modules linked in: nf_conntrack_netlink 
nf_conntrack nfnetlink nf_defrag_ipv4 bridge 8021q garp mrp stp llc bnep 
bluetooth fuse
macvlan pktgen rpcsec_gss_krb5 nfsv4 nfs fscache coretemp hwmon intel_rapl 
x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic kvm iTCO_wdt irqbypass iTCO_vendor_support ath10k_pci 
ath10k_core joydev ath snd_hda_intel mac80211 snd_hda_codec snd_hda_core 
snd_hwdep
snd_seq snd_seq_device pcspkr cfg80211 snd_pcm snd_timer snd i2c_i801 lpc_ich 
shpchp soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
serio_raw
i915 i2c_algo_bit ata_generic drm_kms_helper pata_acpi e1000e ptp drm pps_core 
i2c_core fjes video ipv6 [last unloaded: nf_conntrack]
Dec 01 14:43:06 wave2 kernel: CPU: 0 PID: 193 Comm: kworker/0:1 Not tainted 
4.7.10+ #14
Dec 01 14:43:06 wave2 kernel: Hardware name: To be filled by O.E.M. To be 
filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
Dec 01 14:43:06 wave2 kernel: Workqueue: events_freezable 
ieee80211_restart_work [mac80211]
Dec 01 14:43:06 wave2 kernel:  a0e29507 8801d14f7920 
8169ed08 
Dec 01 14:43:06 wave2 kernel:   8801d14f7968 
811569bc 8801d14e4f00
Dec 01 14:43:06 wave2 kernel:  10790001 8800c43ec9a0 
0027 8800c43ec988
Dec 01 14:43:06 wave2 kernel: 

Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-12-01 Thread Ben Greear

On 08/19/2016 06:34 AM, Ben Greear wrote:



On 08/18/2016 11:59 PM, Michal Kazior wrote:

On 19 August 2016 at 03:26,   wrote:

From: Ben Greear 

I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.

This patch fixes the crashes.  I am not certain if there
is a better way or not.



I did some more hacking on this today.  I think I found some better clue on 
this.

I added this code:

static void ath10k_mac_txq_init(struct ath10k *ar, struct ieee80211_txq *txq)
{
struct ath10k_txq *artxq = (void *)txq->drv_priv;
struct ath10k_txq *tmp, *walker;
struct ieee80211_txq *txq_tmp;
int i = 0;

if (!txq)
return;

spin_lock_bh(>txqs_lock);
ar->txqs_lock.rlock.dbg1 = 104;

/* Remove from ar->txqs in case it still exists there. */
list_for_each_entry_safe(walker, tmp, >txqs, list) {
txq_tmp = container_of((void *)walker, struct ieee80211_txq,
   drv_priv);
if ((++i % 1) == 0) {
ath10k_err(ar, "txq-init: Checking txq_tmp: %p i: 
%d\n", txq_tmp, i);
ath10k_err(ar, "txq-init: txqs: %p walker->list: %p w->next: %p  
w->prev: %p ar->txqs: %p\n",
   >txqs, &(walker->list), walker->list.next, 
walker->list.prev, >txqs);
}

if (txq_tmp == txq) {
WARN_ON_ONCE(1);
ath10k_err(ar, "txq-init: Found txq when it should be 
deleted, txq_tmp: %p  txq: %p\n",
   txq_tmp, txq);
list_del(>list);
}
}
spin_unlock_bh(>txqs_lock);

INIT_LIST_HEAD(>list);
}


[firmware has just crashed]

Dec 01 14:43:06 wave2 kernel: [ cut here ]
Dec 01 14:43:06 wave2 kernel: WARNING: CPU: 0 PID: 193 at /home/greearb/git/linux-4.7.dev.y/drivers/net/wireless/ath/ath10k/mac.c:4217 
ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core]
Dec 01 14:43:06 wave2 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 bridge 8021q garp mrp stp llc bnep bluetooth fuse 
macvlan pktgen rpcsec_gss_krb5 nfsv4 nfs fscache coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_realtek 
snd_hda_codec_generic kvm iTCO_wdt irqbypass iTCO_vendor_support ath10k_pci ath10k_core joydev ath snd_hda_intel mac80211 snd_hda_codec snd_hda_core snd_hwdep 
snd_seq snd_seq_device pcspkr cfg80211 snd_pcm snd_timer snd i2c_i801 lpc_ich shpchp soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc serio_raw 
i915 i2c_algo_bit ata_generic drm_kms_helper pata_acpi e1000e ptp drm pps_core i2c_core fjes video ipv6 [last unloaded: nf_conntrack]

Dec 01 14:43:06 wave2 kernel: CPU: 0 PID: 193 Comm: kworker/0:1 Not tainted 
4.7.10+ #14
Dec 01 14:43:06 wave2 kernel: Hardware name: To be filled by O.E.M. To be 
filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
Dec 01 14:43:06 wave2 kernel: Workqueue: events_freezable 
ieee80211_restart_work [mac80211]
Dec 01 14:43:06 wave2 kernel:  a0e29507 8801d14f7920 
8169ed08 
Dec 01 14:43:06 wave2 kernel:   8801d14f7968 
811569bc 8801d14e4f00
Dec 01 14:43:06 wave2 kernel:  10790001 8800c43ec9a0 
0027 8800c43ec988
Dec 01 14:43:06 wave2 kernel: Call Trace:
Dec 01 14:43:06 wave2 kernel:  [] ? 
ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core]
Dec 01 14:43:06 wave2 kernel:  [] dump_stack+0x85/0xcd
Dec 01 14:43:06 wave2 kernel:  [] __warn+0x10c/0x130
Dec 01 14:43:06 wave2 kernel:  [] warn_slowpath_null+0x18/0x20
Dec 01 14:43:06 wave2 kernel:  [] 
ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core]
Dec 01 14:43:06 wave2 kernel:  [] 
ath10k_sta_state+0x4ef/0x1350 [ath10k_core]
Dec 01 14:43:06 wave2 kernel:  [] ? mark_lock+0x6d/0x8a0
Dec 01 14:43:06 wave2 kernel:  [] ? 
ath10k_station_assoc+0x920/0x920 [ath10k_core]
Dec 01 14:43:06 wave2 kernel:  [] ? __lock_is_held+0x84/0xc0
Dec 01 14:43:06 wave2 kernel:  [] drv_sta_state+0xef/0xc50 
[mac80211]
Dec 01 14:43:06 wave2 kernel:  [] 
ieee80211_reconfig+0x10a0/0x2890 [mac80211]
Dec 01 14:43:06 wave2 kernel:  [] 
ieee80211_restart_work+0xb1/0xf0 [mac80211]
Dec 01 14:43:06 wave2 kernel:  [] process_one_work+0x42d/0xac0
Dec 01 14:43:06 wave2 kernel:  [] ? 
process_one_work+0x374/0xac0
Dec 01 14:43:06 wave2 kernel:  [] ? 
pwq_dec_nr_in_flight+0x110/0x110
Dec 01 14:43:06 wave2 kernel:  [] worker_thread+0x86/0x730
Dec 01 14:43:06 wave2 kernel:  [] ? 
_raw_spin_unlock_irqrestore+0x5a/0x70
Dec 01 14:43:06 wave2 kernel:  [] ? 
process_one_work+0xac0/0xac0
Dec 01 14:43:06 wave2 kernel:  [] kthread+0x191/0x1b0
Dec 01 14:43:06 wave2 kernel:  [] ? 
kthread_create_on_node+0x320/0x320
Dec 01 14:43:06 wave2 kernel:  [] ? 

Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-09-09 Thread Ben Greear

On 09/09/2016 10:25 AM, Felix Fietkau wrote:

On 2016-08-19 03:26, gree...@candelatech.com wrote:

From: Ben Greear 

I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.

This patch fixes the crashes.  I am not certain if there
is a better way or not.

Signed-off-by: Ben Greear 
---
 drivers/net/wireless/ath/ath10k/mac.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 5659ef1..916119c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
*txq)
 static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
 {
struct ath10k_txq *artxq = (void *)txq->drv_priv;
+   struct ath10k_txq *tmp, *walker;
struct ath10k_skb_cb *cb;
struct sk_buff *msdu;
+   struct ieee80211_txq *txq_tmp;
int msdu_id;

if (!txq)
@@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, 
struct ieee80211_txq *txq)
spin_lock_bh(>txqs_lock);
if (!list_empty(>list))
list_del_init(>list);
+
+   /* Remove from ar->txqs in case it still exists there. */
+   list_for_each_entry_safe(walker, tmp, >txqs, list) {
+   txq_tmp = container_of((void *)walker, struct ieee80211_txq,
+  drv_priv);
+   if (txq_tmp == txq)
+   list_del(>list);
+   }

This makes no sense at all. From txq_tmp == txq we can deduce that
walker == artxq. In the context above, it already does a
list_del_init(>list).


This fixed my problem, so something about this matters.

Possibly it works around some other race, just possibly it
is because of some other regression/bug in my driver/kernel.

I thought maybe the issue was that flushing doesn't really work
for ath10k, so when the upper stack tried to flush and delete
a station there were still skbs in the driver that were referencing
the txqs up in mac80211.

Also, this bug was triggered by firmware that crashed very often on
transmit of an skb, so in general there were skbs that were not properly
transmitted and maybe that also triggers some other bug/race in the
driver.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-09-09 Thread Felix Fietkau
On 2016-08-19 03:26, gree...@candelatech.com wrote:
> From: Ben Greear 
> 
> I was seeing kernel crashes due to accessing freed memory
> while debugging a 9984 firmware that was crashing often.
> 
> This patch fixes the crashes.  I am not certain if there
> is a better way or not.
> 
> Signed-off-by: Ben Greear 
> ---
>  drivers/net/wireless/ath/ath10k/mac.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index 5659ef1..916119c 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
> *txq)
>  static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq 
> *txq)
>  {
>   struct ath10k_txq *artxq = (void *)txq->drv_priv;
> + struct ath10k_txq *tmp, *walker;
>   struct ath10k_skb_cb *cb;
>   struct sk_buff *msdu;
> + struct ieee80211_txq *txq_tmp;
>   int msdu_id;
>  
>   if (!txq)
> @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, 
> struct ieee80211_txq *txq)
>   spin_lock_bh(>txqs_lock);
>   if (!list_empty(>list))
>   list_del_init(>list);
> +
> + /* Remove from ar->txqs in case it still exists there. */
> + list_for_each_entry_safe(walker, tmp, >txqs, list) {
> + txq_tmp = container_of((void *)walker, struct ieee80211_txq,
> +drv_priv);
> + if (txq_tmp == txq)
> + list_del(>list);
> + }
This makes no sense at all. From txq_tmp == txq we can deduce that
walker == artxq. In the context above, it already does a
list_del_init(>list).

- Felix

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-08-19 Thread Ben Greear



On 08/18/2016 11:59 PM, Michal Kazior wrote:

On 19 August 2016 at 03:26,   wrote:

From: Ben Greear 

I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.

This patch fixes the crashes.  I am not certain if there
is a better way or not.

Signed-off-by: Ben Greear 
---
  drivers/net/wireless/ath/ath10k/mac.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 5659ef1..916119c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
*txq)
  static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
  {
 struct ath10k_txq *artxq = (void *)txq->drv_priv;
+   struct ath10k_txq *tmp, *walker;
 struct ath10k_skb_cb *cb;
 struct sk_buff *msdu;
+   struct ieee80211_txq *txq_tmp;
 int msdu_id;

 if (!txq)
@@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, 
struct ieee80211_txq *txq)
 spin_lock_bh(>txqs_lock);
 if (!list_empty(>list))
 list_del_init(>list);
+
+   /* Remove from ar->txqs in case it still exists there. */
+   list_for_each_entry_safe(walker, tmp, >txqs, list) {
+   txq_tmp = container_of((void *)walker, struct ieee80211_txq,
+  drv_priv);
+   if (txq_tmp == txq)
+   list_del(>list);
+   }


How could this even happen? All artxq->list accesses (add/del) are
protected by txqs_lock so this shouldn't happen, no?

Do you perhaps have the logic around txqs reworked in your tree?


I don't have any significant changes as far as I can tell.

I can build you a buggy 9984 firmware to reproduce the problem if you want...

Maybe the upstream patch could WARN_ON in this case to see if anyone else
ever hits it?

I did see a comment in the mac80211 about some assumptions on the driver with
regard to station teardown...I am not 100% sure ath10k meets that assumption,
so maybe that is why I could see this problem.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-08-19 Thread Michal Kazior
On 19 August 2016 at 03:26,   wrote:
> From: Ben Greear 
>
> I was seeing kernel crashes due to accessing freed memory
> while debugging a 9984 firmware that was crashing often.
>
> This patch fixes the crashes.  I am not certain if there
> is a better way or not.
>
> Signed-off-by: Ben Greear 
> ---
>  drivers/net/wireless/ath/ath10k/mac.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
> b/drivers/net/wireless/ath/ath10k/mac.c
> index 5659ef1..916119c 100644
> --- a/drivers/net/wireless/ath/ath10k/mac.c
> +++ b/drivers/net/wireless/ath/ath10k/mac.c
> @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
> *txq)
>  static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq 
> *txq)
>  {
> struct ath10k_txq *artxq = (void *)txq->drv_priv;
> +   struct ath10k_txq *tmp, *walker;
> struct ath10k_skb_cb *cb;
> struct sk_buff *msdu;
> +   struct ieee80211_txq *txq_tmp;
> int msdu_id;
>
> if (!txq)
> @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, 
> struct ieee80211_txq *txq)
> spin_lock_bh(>txqs_lock);
> if (!list_empty(>list))
> list_del_init(>list);
> +
> +   /* Remove from ar->txqs in case it still exists there. */
> +   list_for_each_entry_safe(walker, tmp, >txqs, list) {
> +   txq_tmp = container_of((void *)walker, struct ieee80211_txq,
> +  drv_priv);
> +   if (txq_tmp == txq)
> +   list_del(>list);
> +   }

How could this even happen? All artxq->list accesses (add/del) are
protected by txqs_lock so this shouldn't happen, no?

Do you perhaps have the logic around txqs reworked in your tree?


Michał

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.

2016-08-18 Thread greearb
From: Ben Greear 

I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.

This patch fixes the crashes.  I am not certain if there
is a better way or not.

Signed-off-by: Ben Greear 
---
 drivers/net/wireless/ath/ath10k/mac.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c 
b/drivers/net/wireless/ath/ath10k/mac.c
index 5659ef1..916119c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq 
*txq)
 static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
 {
struct ath10k_txq *artxq = (void *)txq->drv_priv;
+   struct ath10k_txq *tmp, *walker;
struct ath10k_skb_cb *cb;
struct sk_buff *msdu;
+   struct ieee80211_txq *txq_tmp;
int msdu_id;
 
if (!txq)
@@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, 
struct ieee80211_txq *txq)
spin_lock_bh(>txqs_lock);
if (!list_empty(>list))
list_del_init(>list);
+
+   /* Remove from ar->txqs in case it still exists there. */
+   list_for_each_entry_safe(walker, tmp, >txqs, list) {
+   txq_tmp = container_of((void *)walker, struct ieee80211_txq,
+  drv_priv);
+   if (txq_tmp == txq)
+   list_del(>list);
+   }
spin_unlock_bh(>txqs_lock);
 
spin_lock_bh(>htt.tx_lock);
-- 
2.4.11


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k