Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Luis R. Rodriguez
On Sat, Dec 04, 2010 at 09:18:50PM -0800, Ben Greear wrote:
 On 12/04/2010 06:41 PM, Felix Fietkau wrote:
  On 2010-12-03 9:14 AM, Ben Greear wrote:
  On 12/01/2010 03:22 PM, Ben Greear wrote:
  On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote:
  On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:
 
  BUG: unable to handle kernel NULL pointer dereference at 0040
  IP: [f933470a] ath_tx_start+0x461/0x5ef [ath9k]
  *pde = 
  Oops:  [#1] SMP DEBUG_PAGEALLOC
  last sysfs file: /sys/devices/pci:00/:00:1e.0/:08:01.0/irq
  Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw mi]

  Pid: 38, comm: kworker/u:1 Tainted: GW   2.6.37-rc3-wl+ #53 PDSBM/PDSBM
  EIP: 0060:[f933470a] EFLAGS: 00010246 CPU: 1
  EIP is at ath_tx_start+0x461/0x5ef [ath9k]
 
  Please use
 
  gdb drivers/net/wireless/ath/ath9k/
  l *(ath_tx_start+0x461)
 
   Luis
 
  I managed to hit that ath_tx_start crash again, and this time there were
  no obvious DMA or IRQ errors immediately preceding it.  So, it might be a
  real bug after all.  I'll add some extra checks to see if tid->ac is NULL.
 
  I've made some small progress on this general issue.
 
  First, I added all sorts of debugging to try to figure out the ath_tx_start
  crash.  As best as I can tell, 'tid' is not NULL, but it is also not a valid
  pointer; it is probably something close to 0x0.  I've added yet more
  debugging, but haven't hit the problem again.
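Ben's observation that 'tid' is non-NULL yet invalid, probably close to 0x0, suggests a cheap diagnostic: refuse to dereference pointers that fall inside the first page. Below is a minimal userspace sketch of such a check, assuming a 4096-byte threshold. The `fake_tid` and `fake_ac` structures, `tid_looks_valid()`, and the 0x40 member offset are hypothetical illustrations, not ath9k code. A check like this only turns a crash into a logged, dropped frame; it does not find whatever corrupts the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: anything inside the first 4 KiB page is treated as bogus,
 * because a NULL (or nearly NULL) base pointer plus a small member offset
 * lands there. */
#define SUSPICIOUS_PTR_LIMIT 4096u

static bool ptr_is_suspicious(const void *p)
{
    return (uintptr_t)p < SUSPICIOUS_PTR_LIMIT;
}

/* Hypothetical stand-ins, not the real ath9k structures; the padding just
 * places 'ac' at offset 0x40 for illustration. */
struct fake_ac  { int qnum; };
struct fake_tid { char pad[0x40]; struct fake_ac *ac; };

/* Debug check to run before touching tid->ac.
 * Returns false (and logs) if the frame should be dropped instead. */
static bool tid_looks_valid(const struct fake_tid *tid)
{
    if (ptr_is_suspicious(tid)) {
        fprintf(stderr, "bogus tid pointer %p\n", (void *)tid);
        return false;
    }
    if (ptr_is_suspicious(tid->ac)) {
        fprintf(stderr, "bogus tid->ac pointer %p\n", (void *)tid->ac);
        return false;
    }
    return true;
}
```

Note that the first check returns before `tid->ac` is read, so a near-NULL `tid` is never dereferenced.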
 
  I also tried re-issuing the DMA stop in a loop, up to 5 times, whenever
  it failed.  This did not appear to help at all.
 
  I also managed to make both the ath_tx_start crash and the DMA errors very
  hard to reproduce (I dare not say fixed, yet).

  It appears that this small patch (and possibly the fact that I set
  debugging to 0x600 instead of 0x400) makes the problems go away.  This makes
  me wonder if the root cause has something to do with resetting the hardware
  too rapidly: setting channels in quick succession would tend to do that, and
  supplicant appears to set channels on association.
  Please try this patch while leaving the unnecessary resets in place.
  I found that when ath_drain_all_txq finds tx dma not stopped, it will
  issue a reset at a point in time where it is both useless (since it's
  right before a reset anyway) and dangerous (since the rx dma engine
  isn't even disabled yet), so IMHO the right thing to do is to drop
  this extra reset.
 
  --- a/drivers/net/wireless/ath/ath9k/xmit.c
  +++ b/drivers/net/wireless/ath/ath9k/xmit.c
  @@ -1194,18 +1194,8 @@ void ath_drain_all_txq(struct ath_softc
                   }
           }
   
  -        if (npend) {
  -                int r;
  -
  -                ath_print(common, ATH_DBG_FATAL,
  -                          "Failed to stop TX DMA. Resetting hardware!\n");
  -
  -                r = ath9k_hw_reset(ah, sc->sc_ah->curchan, ah->caldata, false);
  -                if (r)
  -                        ath_print(common, ATH_DBG_FATAL,
  -                                  "Unable to reset hardware; reset status %d\n",
  -                                  r);
  -        }
  +        if (npend)
  +                ath_print(common, ATH_DBG_FATAL, "Failed to stop TX DMA!\n");
   
           for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++) {
                   if (ATH_TXQ_SETUP(sc, i))
 
 
 I applied this on top of all my patches, and on top of the 4 that Luis
 recently posted.

 I'm trying this on a different system than normal; it happens to be
 configured with 115 stations.  It was getting this fail-to-stop-RX warning
 even with my channel-change mitigation patch, so I left that patch in.  I can
 still test with it removed if you want.
 
 None of my interfaces are using WPA (or supplicant), just unencrypted
 association to an AP 3 feet away.
 
 The recent success I had on Friday was on a different system entirely,
 with only 84 STAs, all using wpa_supplicant: 30 or so stations used WPA,
 and the other 55 were unencrypted, on a different AP.

 So, my previous reports can't be compared directly with this one.
 
 I'm going to re-configure this one to have smaller numbers of
 stations and use wpa_supplicant..will see how that goes.
 
 Even with all these warnings in the logs..system is basically stable and
 a few interfaces are able to associate, at least for a short time.

 
 WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:538 ath_stoprecv+0xcd/0xd7 [ath9k]()
 Hardware name: 945GM
 Could not stop RX, we could be confusing the DMA engine when we start RX up
 Modules linked in: 8021q garp stp llc michael_mic macvlan pktgen iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfs lockd fscache nfs_acl auth_rpcgss sunrpc p4_clockmod ipv6 uinput arc4 ecb ath9k mac80211 snd_intel8x0 snd_ac97_codec ath9k_common ac97_bus snd_seq snd_seq_device ath9k_hw ath snd_pcm pcspkr i2c_i801 serio_raw

Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Ben Greear
On 12/06/2010 11:36 AM, Luis R. Rodriguez wrote:

 Can you clarify the status of this issue. It remains unclear to me from
 your above description how things are going. As I read it some things
 look OK now but you still get a warning.

Ok, since you asked :)

I worked on this over the weekend and this morning.  I had all sorts of
issues until I realized that I had one STA with an unconfigured SSID.
It sometimes connected to an 802.11a AP while the other STAs attempted to
connect to an 802.11n AP on an entirely different band.  I basically got
zero stations associated for any length of time due to constant channel
switching.  No crashes, but lots of warnings about DMA failing to stop.

Now I've fixed this configuration issue (and am adding steps to help prevent
this misconfiguration from happening again).

With 16 properly configured, unencrypted stations running wpa_supplicant
with the netlink driver and sharing scan results, the interfaces associate
quickly.

However, I do continue to see DMA warnings such as these.  (I had picked up
my portable phone, which knocked all the interfaces offline; here they are
coming back up after I hung up the phone.)

Please note that I ported Felix's 2.6.37 patch he posted this morning
to wireless-testing and have applied it.

I'm highly tempted to just make that a WARN_ON_ONCE so at least my logs
aren't spammed so heavily with the recv.c:531 DMA warning.

Dec  6 11:32:15 atom kernel: sta2: direct probe to 00:18:e7:cb:ad:6e timed out
Dec  6 11:32:15 atom kernel: sta14: direct probe to 00:18:e7:cb:ad:6e timed out
Dec  6 11:32:15 atom kernel: ieee80211 wiphy0: device now idle
Dec  6 11:32:15 atom kernel: ieee80211 wiphy0: device no longer idle - scanning
Dec  6 11:32:15 atom kernel: start_sw_scan: running-other-vifs: 0  running-station-vifs: 16, associated-stations: 0 scanning all channels.
Dec  6 11:32:17 atom kernel: ieee80211 wiphy0: device now idle
Dec  6 11:32:22 atom kernel: ieee80211 wiphy0: device no longer idle - scanning
Dec  6 11:32:22 atom kernel: start_sw_scan: running-other-vifs: 0  running-station-vifs: 16, associated-stations: 0 scanning all channels.
Dec  6 11:32:24 atom kernel: ieee80211 wiphy0: device now idle
Dec  6 11:32:29 atom kernel: ieee80211 wiphy0: device no longer idle - scanning
Dec  6 11:32:29 atom kernel: start_sw_scan: running-other-vifs: 0  running-station-vifs: 16, associated-stations: 0 scanning all channels.
Dec  6 11:32:29 atom kernel: ath: DMA failed to stop in 10 ms AR_CR=0x0024 AR_DIAG_SW=0x4220
Dec  6 11:32:29 atom kernel: [ cut here ]
Dec  6 11:32:29 atom kernel: WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:531 ath_stoprecv+0x90/0x9a [ath9)
Dec  6 11:32:29 atom kernel: Hardware name: 945GM
Dec  6 11:32:29 atom kernel: Could not stop RX, we could be confusing the DMA engine when we start RX up
Dec  6 11:32:29 atom kernel: Modules linked in: michael_mic ath9k mac80211 ath9k_common ath9k_hw ath cfg80211 arc4 8021q garp stp llc macvlan pktgen isc]
Dec  6 11:32:29 atom kernel: Pid: 2732, comm: kworker/u:2 Tainted: GW   2.6.37-rc4-wl+ #17
Dec  6 11:32:29 atom kernel: Call Trace:
Dec  6 11:32:29 atom kernel: [78436fbd] warn_slowpath_common+0x77/0x8c
Dec  6 11:32:29 atom kernel: [fb7a125e] ? ath_stoprecv+0x90/0x9a [ath9k]
Dec  6 11:32:29 atom kernel: [fb7a125e] ? ath_stoprecv+0x90/0x9a [ath9k]
Dec  6 11:32:29 atom kernel: [7843704e] warn_slowpath_fmt+0x2e/0x30
Dec  6 11:32:29 atom kernel: [fb7a125e] ath_stoprecv+0x90/0x9a [ath9k]
Dec  6 11:32:29 atom kernel: [fb7a0182] ath_set_channel+0x94/0x1f2 [ath9k]
Dec  6 11:32:29 atom kernel: [7845a405] ? mark_held_locks+0x47/0x5f
Dec  6 11:32:29 atom kernel: [7878e7cb] ? _raw_spin_unlock_irqrestore+0x3c/0x48
Dec  6 11:32:29 atom kernel: [fb7a067a] ath9k_config+0x39a/0x479 [ath9k]
Dec  6 11:32:29 atom kernel: [fb6c] ieee80211_hw_config+0x11b/0x125 [mac80211]
Dec  6 11:32:29 atom kernel: [fb6cef1b] ieee80211_scan_work+0x29e/0x3f7 [mac80211]
Dec  6 11:32:29 atom kernel: [78446f63] ? process_one_work+0x13e/0x2bf
Dec  6 11:32:29 atom kernel: [78446fd4] process_one_work+0x1af/0x2bf
Dec  6 11:32:29 atom kernel: [78446f63] ? process_one_work+0x13e/0x2bf
Dec  6 11:32:29 atom kernel: [fb6cec7d] ? ieee80211_scan_work+0x0/0x3f7 [mac80211]
Dec  6 11:32:29 atom kernel: [78448722] worker_thread+0xf9/0x1bf
Dec  6 11:32:29 atom kernel: [78448629] ? worker_thread+0x0/0x1bf
Dec  6 11:32:29 atom kernel: [7844b252] kthread+0x62/0x67
Dec  6 11:32:29 atom kernel: [7844b1f0] ? kthread+0x0/0x67
Dec  6 11:32:29 atom kernel: [784036c6] kernel_thread_helper+0x6/0x1a
Dec  6 11:32:29 atom kernel: ---[ end trace 617a0f44fc30537b ]---
Dec  6 11:32:29 atom kernel: ath: DMA failed to stop in 10 ms AR_CR=0x0024 AR_DIAG_SW=0x4220
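The "DMA failed to stop in 10 ms" line suggests a bounded poll of a hardware status bit, and Ben's earlier "retry up to 5 times" experiment wraps that poll in a loop. A minimal userspace simulation of both follows, assuming a single status bit. `RX_ENABLE_BIT`, the 10-microsecond step, `struct sim_hw` and the helper names are hypothetical; this is not how ath9k actually reads AR_CR or AR_DIAG_SW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_ENABLE_BIT   0x4u     /* hypothetical "RX DMA still active" bit */
#define POLL_STEP_US    10u      /* hypothetical polling interval */
#define STOP_TIMEOUT_US 10000u   /* 10 ms budget, as in the log message */

/* Simulated hardware, not a real register interface. */
struct sim_hw {
    uint32_t cr;          /* stand-in for a control/status register */
    unsigned busy_polls;  /* reads that still see the bit set */
    bool stuck;           /* a wedged engine never clears the bit */
};

static uint32_t sim_read_cr(struct sim_hw *hw)
{
    if (!hw->stuck) {
        if (hw->busy_polls == 0)
            hw->cr &= ~RX_ENABLE_BIT;   /* DMA has finally stopped */
        else
            hw->busy_polls--;
    }
    return hw->cr;
}

/* Poll until the bit clears or the timeout budget is used up. */
static bool wait_rx_dma_stopped(struct sim_hw *hw)
{
    for (uint32_t waited = 0; waited < STOP_TIMEOUT_US; waited += POLL_STEP_US) {
        if (!(sim_read_cr(hw) & RX_ENABLE_BIT))
            return true;
        /* a driver would delay POLL_STEP_US microseconds here */
    }
    return false;   /* caller would log "DMA failed to stop in 10 ms" */
}

/* Retry the whole wait up to max_tries times, like Ben's experiment.
 * Returns the attempt that succeeded, or -1 if DMA never stopped. */
static int stop_rx_dma_with_retries(struct sim_hw *hw, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++)
        if (wait_rx_dma_stopped(hw))
            return attempt;
    return -1;
}
```

In this model, retries only help an engine that is slow but healthy; a wedged engine fails every attempt. That is one possible reading of why the retry loop made no difference in Ben's tests, not a diagnosis.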


On module unload, I sometimes see lots more scary-looking DMA warnings,
but again, the system seems stable aside from the noise in the logs.  I will
capture these and post them the next time I get a clean set of them.

Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Luis R. Rodriguez
On Mon, Dec 06, 2010 at 11:47:47AM -0800, Ben Greear wrote:
 I'm highly tempted to just make that a WARN_ON_ONCE so at least my logs
 aren't spammed so heavily with the recv.c:531 DMA warning.

You can send this change upstream as well.

  Luis
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Luis R. Rodriguez
On Mon, Dec 06, 2010 at 11:53:13AM -0800, Luis Rodriguez wrote:
 On Mon, Dec 06, 2010 at 11:47:47AM -0800, Ben Greear wrote:
  I'm highly tempted to just make that a WARN_ON_ONCE so at least my logs
  aren't spammed so heavily with the recv.c:531 DMA warning.
 
 You can send this change upstream as well.

Also, feel free to limit the number of STAs you can have up
physically by setting this to a number you bless yourself.

  Luis


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Björn Smedman
On Mon, Dec 6, 2010 at 8:47 PM, Ben Greear gree...@candelatech.com wrote:
 With 16 properly configured non-encrypted stations, running with
 wpa-supplicant
 with netlink driver  sharing scan results,  the interfaces quickly
 associate.

 However, I do continue to see DMA warnings such as these (I had picked up my
 portable phone, and it knocked all the interfaces offline ..here
 they are coming back up after I hung up the phone).

Is there some theory as to why using multiple interfaces causes so many
problems with DMA?

/Björn


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Ben Greear
On 12/06/2010 12:11 PM, Björn Smedman wrote:
 Is there some theory as to why using multiple interfaces causes so many
 problems with DMA?

Seems pretty directly related to channel changes and/or resets, and exacerbated
by other interfaces sending data while another is scanning, for instance.

Other issues we've found in the past have been various races that you wouldn't
normally see with a single VIF.

Thanks,
Ben


 /Björn


-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Ben Greear
On 12/06/2010 11:53 AM, Luis R. Rodriguez wrote:
 On Mon, Dec 06, 2010 at 11:53:13AM -0800, Luis Rodriguez wrote:
 On Mon, Dec 06, 2010 at 11:47:47AM -0800, Ben Greear wrote:
 I'm highly tempted to just make that a WARN_ON_ONCE so at least my logs
 aren't spammed so heavily with the recv.c:531 DMA warning.

 You can send this change upstream as well.

 Also, feel free to limit the number of STAs you can have up
 physically by setting this to a number you bless yourself.

I have a feeling there is no hard limit, but if I do find one,
I'll cook up a patch.  Probably not many of us are ever going to push
anywhere near what I'm trying, and folks like me can impose a limit in
user space if wanted.

I'll do up the warn-on-once patch shortly.
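For reference, the behaviour Ben wants from WARN_ON_ONCE is "print the first time, then stay quiet". A minimal userspace imitation follows, assuming one static flag per call site. `MY_WARN_ON_ONCE` and `stop_rx()` are hypothetical; the real macro is defined in the kernel's `include/asm-generic/bug.h` and differs in detail (for instance, it evaluates to the condition).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;   /* counts printed warnings (for the checks) */

/* Userspace imitation of WARN_ON_ONCE(): every expansion gets its own
 * static flag, so a condition that keeps firing prints only once.
 * Unlike the kernel macro, this one does not evaluate to the condition. */
#define MY_WARN_ON_ONCE(cond, msg)                              \
    do {                                                        \
        static bool warned_;                                    \
        if ((cond) && !warned_) {                               \
            warned_ = true;                                     \
            warnings_printed++;                                 \
            fprintf(stderr, "WARNING: at %s:%d: %s\n",          \
                    __FILE__, __LINE__, (msg));                 \
        }                                                       \
    } while (0)

/* Hypothetical stand-in for the RX stop path that keeps failing. */
static void stop_rx(bool dma_stopped)
{
    MY_WARN_ON_ONCE(!dma_stopped,
                    "Could not stop RX, we could be confusing the DMA engine");
}
```

The trade-off is that every occurrence after the first becomes invisible in the log, so the warning no longer tells you how often RX failed to stop.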

By the way, would you consider this channel-change suppression
patch, or something similar?


 drivers/net/wireless/ath/ath9k/main.c 
index f026a03..6c1c43b 100644
@@ -1605,6 +1605,16 @@ static int ath9k_config(struct ieee80211_hw *hw, u32 changed)
                 else
                         sc->sc_flags &= ~SC_OP_OFFCHANNEL;
 
+                /* If channels & HT are the same, then don't actually do anything.
+                 */
+                if ((sc->sc_ah->curchan == &sc->sc_ah->channels[pos]) &&
+                    (aphy->chan_is_ht == conf_is_ht(conf))) {
+                        ath_print(common, ATH_DBG_CONFIG,
+                                  "Skip Set channel: %d MHz, already there.\n",
+                                  curchan->center_freq);
+                        goto skip_chan_change;
+                }
+
                 if (aphy->state == ATH_WIPHY_SCAN ||
                     aphy->state == ATH_WIPHY_ACTIVE)
                         ath9k_wiphy_pause_all_forced(sc, aphy);

Thanks,
Ben




Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Felix Fietkau
On 2010-12-06 9:28 PM, Ben Greear wrote:
 By the way, would you consider this channel-change suppression
 patch, or something similar?
 
 
  drivers/net/wireless/ath/ath9k/main.c 
 
 index f026a03..6c1c43b 100644
 @@ -1605,6 +1605,16 @@ static int ath9k_config(struct ieee80211_hw *hw, u32 changed)
                  else
                          sc->sc_flags &= ~SC_OP_OFFCHANNEL;
  
 +                /* If channels & HT are the same, then don't actually do anything.
 +                 */
 +                if ((sc->sc_ah->curchan == &sc->sc_ah->channels[pos]) &&
 +                    (aphy->chan_is_ht == conf_is_ht(conf))) {
 +                        ath_print(common, ATH_DBG_CONFIG,
 +                                  "Skip Set channel: %d MHz, already there.\n",
 +                                  curchan->center_freq);
 +                        goto skip_chan_change;
 +                }
 +
I think this needs to check the offchannel flag as well, at least in one
direction. Skipping on-channel -> off-channel is fine, but the other way
around might break calibration.

- Felix
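One reading of Felix's suggestion, expressed as a pure decision function: skip only when frequency and HT mode are unchanged, and never skip the off-channel -> on-channel direction. `struct chan_state` and `can_skip_chan_change()` are hypothetical; in the driver the equivalent information would come from `curchan`, `conf_is_ht()` and the `SC_OP_OFFCHANNEL` flag, and this sketch does not claim to be the eventual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the state the skip decision depends on. */
struct chan_state {
    int  freq_mhz;    /* center frequency */
    bool is_ht;       /* HT (802.11n) configuration */
    bool offchannel;  /* e.g. scanning away from the operating channel */
};

/* Returns true if the channel change may be skipped.
 * Same frequency and HT mode are required, and an off-channel ->
 * on-channel transition is never skipped, following the concern that
 * skipping it might break calibration. */
static bool can_skip_chan_change(const struct chan_state *cur,
                                 const struct chan_state *next)
{
    if (cur->freq_mhz != next->freq_mhz || cur->is_ht != next->is_ht)
        return false;   /* a real change is needed */
    if (cur->offchannel && !next->offchannel)
        return false;   /* returning on-channel: do the full change */
    return true;
}
```

Keeping the decision in a side-effect-free helper makes each direction easy to check in isolation before wiring it into the config path.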


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Luis R. Rodriguez
On Mon, Dec 06, 2010 at 12:22:26PM -0800, Ben Greear wrote:
 On 12/06/2010 12:11 PM, Björn Smedman wrote:
  Is there some theory as to why using multiple interfaces causes so many
  problems with DMA?
 
 Seems pretty directly related to channel changes and/or resets, and
 exacerbated by other interfaces sending data while another is scanning, for
 instance.

 Other issues we've found in the past have been various races that you
 wouldn't normally see with a single VIF.

Right, there might be some other hot path we need to lock around. I'm not
sure what it could be, though, since we should already be locking around
stopping RX during resets. These should all be atomic (starting TX too,
IIRC), hence the renaming of the lock to make it specific to the PCU.
There may be other PCU changes we need to contend with.

  Luis
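Luis's point is that every PCU state change (stopping RX, resets, starting TX) should be serialized by a single lock. A minimal single-threaded C11 sketch of that discipline follows, assuming an `atomic_flag` spinlock. `struct dev`, `pcu_lock` and the helper names are hypothetical, not the actual ath9k lock or its API. The asserts document the invariants: helpers only run under the lock, and a reset never happens while RX DMA is live.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model; 'pcu_lock' is an illustrative name for one
 * lock that covers every PCU state change (RX stop/start, reset). */
struct dev {
    atomic_flag pcu_lock;
    bool in_pcu_section;   /* debug aid: true while the lock is held */
    bool rx_running;
    int  resets;
};

static void pcu_acquire(struct dev *d)
{
    while (atomic_flag_test_and_set_explicit(&d->pcu_lock, memory_order_acquire))
        ;   /* spin; a kernel driver would use a real spinlock instead */
    d->in_pcu_section = true;
}

static void pcu_release(struct dev *d)
{
    d->in_pcu_section = false;
    atomic_flag_clear_explicit(&d->pcu_lock, memory_order_release);
}

/* Each PCU-touching helper asserts that the caller holds the lock. */
static void stop_rx(struct dev *d)
{
    assert(d->in_pcu_section);
    d->rx_running = false;
}

static void start_rx(struct dev *d)
{
    assert(d->in_pcu_section);
    d->rx_running = true;
}

static void hw_reset(struct dev *d)
{
    assert(d->in_pcu_section);
    assert(!d->rx_running);   /* never reset while RX DMA is live */
    d->resets++;
}

/* A channel change runs stop RX -> reset -> start RX as one locked unit,
 * so any other path that also takes pcu_lock cannot run in the middle. */
static void change_channel(struct dev *d)
{
    pcu_acquire(d);
    stop_rx(d);
    hw_reset(d);
    start_rx(d);
    pcu_release(d);
}
```

The lock only helps if every path that touches the PCU takes it; a path that starts TX or RX without it would be exactly the kind of missed hot path Luis describes.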


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Ben Greear
On 12/06/2010 12:42 PM, Luis R. Rodriguez wrote:
 Right, there might be some other hot path we need to lock around. I'm not
 sure what it could be, though, since we should already be locking around
 stopping RX during resets. These should all be atomic (starting TX too,
 IIRC), hence the renaming of the lock to make it specific to the PCU.
 There may be other PCU changes we need to contend with.

Maybe the hardware/firmware guys could give us some clues as to what
types of things can cause stopping RX DMA to fail?  Maybe that could
point us to what might be racing with the attempts to stop RX DMA?

Thanks,
Ben



Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-06 Thread Luis R. Rodriguez
On Mon, Dec 06, 2010 at 01:00:05PM -0800, Ben Greear wrote:
 Maybe the hardware/firmware guys could give us some clues as to what
 types of things can cause stopping RX DMA to fail?  Maybe that could
 point us to what might be racing with the attempts to stop RX DMA?

We have no firmware, but yeah understanding how the hardware
blocks would be key here. Good point.

 Luis


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-04 Thread Felix Fietkau
On 2010-12-03 9:14 AM, Ben Greear wrote:
 On 12/01/2010 03:22 PM, Ben Greear wrote:
 I managed to hit that ath_tx_start crash again, and this time there were no
 obvious DMA or IRQ errors immediately preceding it.  So, it might be a real
 bug after all.  I'll add some extra checks to see if tid->ac is NULL.
 
 I've made some small progress on this general issue.
 
 First, I added all sorts of debugging to try to figure out the ath_tx_start
 crash.  As best as I can tell, 'tid' is not NULL, but it is also not a valid
 pointer; it is probably something close to 0x0.  I've added yet more
 debugging, but haven't hit the problem again.
 
 I also tried re-issuing the DMA stop in a loop, up to 5 times, whenever
 it failed.  This did not appear to help at all.
 
 I also managed to make both the ath_tx_start crash and the DMA errors very
 hard to reproduce (I dare not say fixed, yet).

 It appears that this small patch (and possibly the fact that I set
 debugging to 0x600 instead of 0x400) makes the problems go away.  This makes
 me wonder if the root cause has something to do with resetting the hardware
 too rapidly: setting channels in quick succession would tend to do that, and
 supplicant appears to set channels on association.
Please try this patch while leaving the unnecessary resets in place.
I found that when ath_drain_all_txq finds tx dma not stopped, it will
issue a reset at a point in time where it is both useless (since it's
right before a reset anyway) and dangerous (since the rx dma engine
isn't even disabled yet), so IMHO the right thing to do is to drop
this extra reset.

--- a/drivers/net/wireless/ath/ath9k/xmit.c
+++ b/drivers/net/wireless/ath/ath9k/xmit.c
@@ -1194,18 +1194,8 @@ void ath_drain_all_txq(struct ath_softc
                 }
         }
 
-        if (npend) {
-                int r;
-
-                ath_print(common, ATH_DBG_FATAL,
-                          "Failed to stop TX DMA. Resetting hardware!\n");
-
-                r = ath9k_hw_reset(ah, sc->sc_ah->curchan, ah->caldata, false);
-                if (r)
-                        ath_print(common, ATH_DBG_FATAL,
-                                  "Unable to reset hardware; reset status %d\n",
-                                  r);
-        }
+        if (npend)
+                ath_print(common, ATH_DBG_FATAL, "Failed to stop TX DMA!\n");
 
         for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++) {
                 if (ATH_TXQ_SETUP(sc, i))
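Felix's argument is about ordering: a reset issued from inside the TX drain happens while RX DMA is still running, and a second reset follows anyway. The toy model below contrasts the two behaviours, assuming a single running flag per DMA engine. `struct fake_hw`, `drain_all_txq()` and `set_channel()` here are hypothetical simplifications, not the driver's functions, and they say nothing about why TX DMA fails to stop in the first place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of the teardown order; none of these
 * names are the real ath9k API. */
struct fake_hw {
    bool tx_dma_running;
    bool rx_dma_running;
    int  resets;               /* hardware resets issued */
    bool reset_with_rx_live;   /* a reset happened while RX DMA still ran */
};

static void fake_hw_reset(struct fake_hw *hw)
{
    if (hw->rx_dma_running)
        hw->reset_with_rx_live = true;   /* the "dangerous" case */
    hw->resets++;
    hw->tx_dma_running = false;
    hw->rx_dma_running = false;
}

/* reset_on_failure = true mimics the old code; false mimics the patch,
 * which only reports the failure. */
static int drain_all_txq(struct fake_hw *hw, bool tx_stopped,
                         bool reset_on_failure)
{
    int npend = tx_stopped ? 0 : 1;

    if (!npend)
        hw->tx_dma_running = false;
    else if (reset_on_failure)
        fake_hw_reset(hw);                          /* RX DMA still live */
    else
        fprintf(stderr, "Failed to stop TX DMA!\n");

    return npend;
}

/* Channel change: drain TX, stop RX, then the reset that happens anyway. */
static void set_channel(struct fake_hw *hw, bool tx_stopped,
                        bool reset_on_failure)
{
    drain_all_txq(hw, tx_stopped, reset_on_failure);
    hw->rx_dma_running = false;   /* stand-in for stopping RX DMA */
    fake_hw_reset(hw);
    hw->tx_dma_running = true;    /* restart on the new channel */
    hw->rx_dma_running = true;
}
```

With a TX drain failure, the "old" path issues two resets, the first with RX DMA still live; the "patched" path issues a single reset after RX has been stopped.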


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-04 Thread Ben Greear
On 12/04/2010 06:41 PM, Felix Fietkau wrote:
 Please try this patch while leaving the unnecessary resets in place.
 I found that when ath_drain_all_txq finds tx dma not stopped, it will
 issue a reset at a point in time where it is both useless (since it's
 right before a reset anyway) and dangerous (since the rx dma engine
 isn't even disabled yet), so IMHO the right thing to do is to drop
 this extra reset.
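Felix's argument can be checked with a toy model. The sketch below is plain userspace C, not ath9k code. `struct hw_model`, `drain_tx()`, `stop_rx()` and the two `channel_change_*()` functions are hypothetical stand-ins that only count resets. The model assumes, as Felix describes, that after draining TX the caller stops RX DMA and then resets the chip anyway.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model: only the state that matters for the reset-ordering argument.
 * Every name here is hypothetical; none of it is ath9k code. */
struct hw_model {
	bool tx_dma_stuck;	/* TX DMA refuses to stop (the failure case) */
	bool rx_dma_active;	/* RX DMA engine still running */
	int resets;		/* chip resets issued so far */
	int unsafe_resets;	/* resets issued while RX DMA was still active */
};

static void hw_reset(struct hw_model *hw)
{
	hw->resets++;
	if (hw->rx_dma_active)
		hw->unsafe_resets++;
}

/* Returns the number of TX frames left pending after the drain attempt. */
static int drain_tx(const struct hw_model *hw)
{
	return hw->tx_dma_stuck ? 1 : 0;
}

static void stop_rx(struct hw_model *hw)
{
	hw->rx_dma_active = false;
}

/* Before the patch: the drain path resets on its own, then the caller
 * stops RX and resets again. */
static void channel_change_old(struct hw_model *hw)
{
	assert(hw != NULL);
	if (drain_tx(hw))
		hw_reset(hw);	/* extra reset while RX DMA still runs */
	stop_rx(hw);
	hw_reset(hw);		/* the reset the caller performs anyway */
}

/* After the patch: the drain path only logs; the caller's reset remains. */
static void channel_change_new(struct hw_model *hw)
{
	assert(hw != NULL);
	if (drain_tx(hw))
		fprintf(stderr, "Failed to stop TX DMA!\n");
	stop_rx(hw);
	hw_reset(hw);
}
```

With TX DMA stuck, the old order issues two resets, and one of them happens while RX DMA is still active. The patched order issues a single reset, and only after RX has been stopped.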

I'll give this a try, though I'm not sure I'll get to it before Monday.

Thanks,
Ben

-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-04 Thread Ben Greear
On 12/04/2010 06:41 PM, Felix Fietkau wrote:
 On 2010-12-03 9:14 AM, Ben Greear wrote:
 On 12/01/2010 03:22 PM, Ben Greear wrote:
 On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote:
 On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:

 BUG: unable to handle kernel NULL pointer dereference at 0040
 IP: [f933470a] ath_tx_start+0x461/0x5ef [ath9k]
 *pde = 
 Oops:  [#1] SMP DEBUG_PAGEALLOC
 last sysfs file: /sys/devices/pci:00/:00:1e.0/:08:01.0/irq
 Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl 
 auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common 
 ath9k_hw mi]

 Pid: 38, comm: kworker/u:1 Tainted: GW   2.6.37-rc3-wl+ #53 
 PDSBM/PDSBM
 EIP: 0060:[f933470a] EFLAGS: 00010246 CPU: 1
 EIP is at ath_tx_start+0x461/0x5ef [ath9k]

 Please use

 gdb drivers/net/wireless/ath/ath9k/
 l *(ath_tx_start+0x461)

  Luis

 I managed to hit that ath_tx_start crash again, and this time there were no
 obvious DMA or irq errors immediately preceding it.  So, it might be a real
 bug after all.  I'll add some extra checks to see if tid->ac is NULL.

 I've made some small progress on this general issue.

 First, I added all sorts of debugging to try to figure out the ath_tx_start
 crash.  As best as I can tell, 'tid' is not NULL, but also is not a valid
 pointer, and probably something close to 0x0.  I've added yet more
 debugging, but haven't hit the problem again.

 I also tried stopping DMA in a loop up to 5 times if it failed to stop
 previously in the loop.  This did not appear to help at all.

 I also managed to make both the ath_tx_start crash and the DMA errors very
 hard to reproduce (I dare not say fixed, yet).

 It appears that this small patch (and possibly the fact that I set debugging
 to 0x600 instead of 0x400) makes the problems go away.  This makes me wonder
 if the root cause has something to do with resetting the hardware repeatedly
 and too quickly: setting channels rapidly would tend to do that, and it
 appears that supplicant sets the channel on association.
 Please try this patch while leaving the unnecessary resets in place.
 I found that when ath_drain_all_txq finds tx dma not stopped, it will
 issue a reset at a point in time where it is both useless (since it's
 right before a reset anyway) and dangerous (since the rx dma engine
 isn't even disabled yet), so IMHO the right thing to do is to drop
 this extra reset.

 --- a/drivers/net/wireless/ath/ath9k/xmit.c
 +++ b/drivers/net/wireless/ath/ath9k/xmit.c
 @@ -1194,18 +1194,8 @@ void ath_drain_all_txq(struct ath_softc
  		}
  	}
 
 -	if (npend) {
 -		int r;
 -
 -		ath_print(common, ATH_DBG_FATAL,
 -			  "Failed to stop TX DMA. Resetting hardware!\n");
 -
 -		r = ath9k_hw_reset(ah, sc->sc_ah->curchan, ah->caldata, false);
 -		if (r)
 -			ath_print(common, ATH_DBG_FATAL,
 -				  "Unable to reset hardware; reset status %d\n",
 -				  r);
 -	}
 +	if (npend)
 +		ath_print(common, ATH_DBG_FATAL, "Failed to stop TX DMA!\n");
 
  	for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++) {
  		if (ATH_TXQ_SETUP(sc, i))


I applied this on top of all my patches, and on top of the 4 that Luis recently
posted.

I'm trying this on a different system than normal; it happens to be configured
with 115 stations.  It was getting this fail-to-stop-RX warning even with my
channel-change mitigation patch, so I left that patch in.  I can still test
with it removed if you want.

None of my interfaces are using WPA (or supplicant), just unencrypted
association to an AP 3 feet away.

The recent success I had on Friday was on a different system entirely,
with only 84 STAs: 30 or so stations using WPA and the other 55 on a
different AP, unencrypted (still using wpa_supplicant for all of these).

So I can't compare my previous reports directly with this one.

I'm going to reconfigure this one to have smaller numbers of
stations and use wpa_supplicant; I'll see how that goes.

Even with all these warnings in the logs, the system is basically stable and
a few interfaces are able to associate, at least for a short time.

WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:538 ath_stoprecv+0xcd/0xd7 [ath9k]()
Hardware name: 945GM
Could not stop RX, we could be confusing the DMA engine when we start RX up
Modules linked in: 8021q garp stp llc michael_mic macvlan pktgen iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi nfs lockd fscache nfs_acl
auth_rpcgss sunrpc p4_clockmod ipv6 uinput arc4 ecb ath9k mac80211
snd_intel8x0 snd_ac97_codec ath9k_common ac97_bus snd_seq snd_seq_device
ath9k_hw ath snd_pcm pcspkr i2c_i801 serio_raw cfg80211 iTCO_wdt
iTCO_vendor_support microcode snd_timer snd soundcore e1000e snd_page_alloc
yenta_socket floppy i915 drm_kms_helper drm i2c_algo_bit

Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-03 Thread Ben Greear
On 12/01/2010 03:22 PM, Ben Greear wrote:
 On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote:
 On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:

 BUG: unable to handle kernel NULL pointer dereference at 0040
 IP: [f933470a] ath_tx_start+0x461/0x5ef [ath9k]
 *pde = 
 Oops:  [#1] SMP DEBUG_PAGEALLOC
 last sysfs file: /sys/devices/pci:00/:00:1e.0/:08:01.0/irq
 Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl 
 auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common 
 ath9k_hw mi]

 Pid: 38, comm: kworker/u:1 Tainted: GW   2.6.37-rc3-wl+ #53 
 PDSBM/PDSBM
 EIP: 0060:[f933470a] EFLAGS: 00010246 CPU: 1
 EIP is at ath_tx_start+0x461/0x5ef [ath9k]

 Please use

 gdb drivers/net/wireless/ath/ath9k/
 l *(ath_tx_start+0x461)

 Luis

 I managed to hit that ath_tx_start crash again, and this time there were no
 obvious DMA or irq errors immediately preceding it.  So, it might be a real
 bug after all.  I'll add some extra checks to see if tid->ac is NULL.

I've made some small progress on this general issue.

First, I added all sorts of debugging to try to figure out the ath_tx_start crash.
As best as I can tell, 'tid' is not NULL, but also is not a valid pointer,
and probably something close to 0x0.  I've added yet more debugging, but haven't
hit the problem again.
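In an oops like this one, the faulting address is the bogus base pointer plus the offset of the member being read. This explains why a 'tid' that is close to 0x0 still shows up as a "NULL pointer dereference": on x86 the oops text uses that wording for faults inside the first page. A debug check of the kind Ben describes might look like the sketch below. It is a userspace illustration only. `demo_tid`, `demo_ac` and `demo_txq` are hypothetical stand-ins for the structures reached through tid->ac->txq, and the 4096-byte page size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; the x86 kernel reports a fault below this address
 * as a "NULL pointer dereference". */
#define ASSUMED_PAGE_SIZE 4096u

/* True if p is NULL or lies in the first page, i.e. it is most likely a
 * NULL base pointer plus a small member offset. */
static bool ptr_is_near_null(const void *p)
{
	return (uintptr_t)p < ASSUMED_PAGE_SIZE;
}

/* Hypothetical stand-ins for the structures behind tid->ac->txq. */
struct demo_txq { int qnum; };
struct demo_ac  { struct demo_txq *txq; };
struct demo_tid { int tidno; struct demo_ac *ac; };

/* Validate each hop before following tid->ac->txq, so the caller can
 * log and drop the frame instead of oopsing. */
static bool tid_chain_ok(const struct demo_tid *tid)
{
	if (ptr_is_near_null(tid))
		return false;
	if (ptr_is_near_null(tid->ac))
		return false;
	return !ptr_is_near_null(tid->ac->txq);
}

/* Address a load of tid->ac would touch, computed without dereferencing. */
static uintptr_t ac_load_address(uintptr_t tid_addr)
{
	return tid_addr + offsetof(struct demo_tid, ac);
}
```

A check like this only catches pointers that land in the first page. A stale pointer to freed memory would still pass it.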

I also tried stopping DMA in a loop up to 5 times if it failed to stop
previously in the loop.  This did not appear to help at all.

I also managed to make both the ath_tx_start crash and the DMA errors very hard
to reproduce (I dare not say fixed, yet).

It appears that this small patch (and possibly the fact that I set debugging
to 0x600 instead of 0x400) makes the problems go away.  This makes me wonder if
the root cause has something to do with resetting the hardware repeatedly and
too quickly: setting channels rapidly would tend to do that, and it appears
that supplicant sets the channel on association.
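For reference, 0x600 is 0x400 with one extra bit set. This assumes the ath debug bits of that era, with ATH_DBG_CONFIG = 0x200 and ATH_DBG_FATAL = 0x400; please check the ATH_DBG_* enum in your own tree. On that assumption, the change enables ATH_DBG_CONFIG messages in addition to the fatal ones. The "Skip Set channel" message printed by the patch below is one of those ATH_DBG_CONFIG messages. Here is a small decoder:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values of two ATH_DBG_* bits; verify against your tree. */
static const struct {
	unsigned int bit;
	const char *name;
} dbg_bits[] = {
	{ 0x00000200u, "ATH_DBG_CONFIG" },
	{ 0x00000400u, "ATH_DBG_FATAL" },
};

/* Write the names of the known bits set in mask into buf, joined by '|'.
 * Unknown bits are ignored; output is truncated to fit len. */
static void decode_debug_mask(unsigned int mask, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(dbg_bits) / sizeof(dbg_bits[0]); i++) {
		if (!(mask & dbg_bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, dbg_bits[i].name, len - strlen(buf) - 1);
	}
}
```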

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index f026a03..46b1791 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -1605,6 +1605,16 @@ static int ath9k_config(struct ieee80211_hw *hw, u32 changed)
 		else
 			sc->sc_flags &= ~SC_OP_OFFCHANNEL;
 
+		/* If channels & HT are the same, then don't actually do anything.
+		 */
+		if ((sc->sc_ah->curchan == &sc->sc_ah->channels[pos]) &&
+		    (aphy->chan_is_ht == conf_is_ht(conf))) {
+			ath_print(common, ATH_DBG_CONFIG,
+				  "Skip Set channel: %d MHz, already there.\n",
+				  curchan->center_freq);
+			goto skip_chan_change;
+		}
+
 		if (aphy->state == ATH_WIPHY_SCAN ||
 		    aphy->state == ATH_WIPHY_ACTIVE)
 			ath9k_wiphy_pause_all_forced(sc, aphy);

Thanks,
Ben

-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com


Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-12-01 Thread Ben Greear
On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote:
 On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:

 BUG: unable to handle kernel NULL pointer dereference at 0040
 IP: [f933470a] ath_tx_start+0x461/0x5ef [ath9k]
 *pde = 
 Oops:  [#1] SMP DEBUG_PAGEALLOC
 last sysfs file: /sys/devices/pci:00/:00:1e.0/:08:01.0/irq
 Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl 
 auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw 
 mi]

 Pid: 38, comm: kworker/u:1 Tainted: GW   2.6.37-rc3-wl+ #53 
 PDSBM/PDSBM
 EIP: 0060:[f933470a] EFLAGS: 00010246 CPU: 1
 EIP is at ath_tx_start+0x461/0x5ef [ath9k]

 Please use

 gdb drivers/net/wireless/ath/ath9k/
 l *(ath_tx_start+0x461)

Luis

I managed to hit that ath_tx_start crash again, and this time there were no
obvious DMA or irq errors immediately preceding it.  So, it might be a real bug
after all.  I'll add some extra checks to see if tid->ac is NULL.

Thanks,
Ben

-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com



[ath9k-devel] Script to crash ath9k with DMA errors.

2010-11-29 Thread Ben Greear
Here is a script that reliably crashes my ath9k box.
A second box with completely different hardware (except
for ath9k) experiences similar problems.

I am using today's wireless-testing kernel with a few
patches of my own.

You will also need the very latest hostap tree as it has the
optimizations for allowing STAs to share scans.  Without
this optimization, I did not see this problem.

A few notes about the script:

* I cannot remove any interfaces; there seems to be a ref-count leak
   somewhere.  I haven't debugged this issue.

* Without the background ping, it is very hard to reproduce this problem,
   but with it, it happens almost every time.

* You'll need to set up your paths at the top of the script.


#!/usr/bin/perl

use strict;

my $iw = "./local/sbin/iw";
my $ip = "./local/sbin/ip";
my $wpa_s = "./local/bin/wpa_supplicant";
my $ssid = "candela-n";
my $key = "wpadmz123";

my $phy = "wiphy0";
my $max = 32;
my $i;
my $bmac = "00:01:02:03:04:";
my $cmd;

# Cleanup previous stuff
runCmd("killall wpa_supplicant");
runCmd("killall ping");

for ($i = 0; $i<$max; $i++) {
   # Work around ref-counting bugs in kernel
   runCmd("$ip link set sta$i down");
   runCmd("$ip addr flush dev sta$i");
   runCmd("$ip route flush dev sta$i");
   runCmd("$ip -6 addr flush dev sta$i");
   runCmd("$ip -6 route flush dev sta$i");

   # Bugger, cannot get the ref-count problem to go away.
   # runCmd("$iw dev sta$i del");
}

#exit(0);

open(FD, ">pingbg") || die("Couldn't open pingbg.");
print FD "#!/bin/bash\n\n";
print FD "ping \$* > /dev/null 2>&1 &\n";
print FD "echo continuing\n";
close(FD);
runCmd("chmod a+x pingbg");

# Create stations
for ($i = 0; $i<$max; $i++) {
   runCmd("$iw phy $phy interface add sta$i type station");
   my $mc5 = $i + 1;
   if (length($mc5) == 1) {
      $mc5 = "0$mc5"; # pad mac octet
   }
   my $mac = "$bmac$mc5";
   runCmd("$ip link set sta$i address $mac");

   runCmd("$iw dev sta$i set power_save off");
   runCmd("$ip addr add 9.99.1.$mc5/24 dev sta$i");
   runCmd("./pingbg -I sta$i 9.99.1.1");
}

# Bring them up with WPA
for ($i = 0; $i<$max; $i++) {
   open(FD, ">sta$i" . "_wpa.conf") || die("Couldn't open file: $!\n");
   print FD "
ctrl_interface=/var/run/wpa_supplicant
fast_reauth=1
#can_scan_one=1
network={
 ssid=\"$ssid\"
 proto=WPA
 key_mgmt=WPA-PSK
 psk=\"$key\"
 pairwise=TKIP CCMP
 group=TKIP CCMP
}
";
   #runCmd("$wpa_s -B -i sta$i -c sta$i" . "_wpa.conf -P sta$i" . "_wpa.pid -t -f sta$i" . "_wpa.log");
}

# Build command to start one wpa_supplicant for all interfaces.
my $cmd = "$wpa_s -B -g /var/run/wpa_supplicant_if -P /tmp/wpa_supplicant-all.pid -t -f /tmp/wpa_supplicant_log_all.txt -i sta0 -c sta0_wpa.conf";
for ($i = 1; $i<$max; $i++) {
   $cmd = "$cmd -N -i sta$i -c sta$i" . "_wpa.conf";
}
runCmd($cmd);

sub runCmd {
   my $cmd = shift;
   print "$cmd\n";
   `$cmd`;
}


Example kernel crash output:

ADDRCONF(NETDEV_CHANGE): sta6: link becomes ready
ADDRCONF(NETDEV_CHANGE): sta5: link becomes ready
ADDRCONF(NETDEV_CHANGE): sta4: link becomes ready
ADDRCONF(NETDEV_CHANGE): sta3: link becomes ready
ADDRCONF(NETDEV_CHANGE): sta1: link becomes ready
ADDRCONF(NETDEV_CHANGE): sta0: link becomes ready
padlock: VIA PadLock not detected.

[r...@ath9k-dev1 ~]# ADDRCONF(NETDEV_CHANGE): sta30: link becomes ready
ADDRCONF(NETDEV_CHANGE): sta29: link becomes ready
[ cut here ]
WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:532 ath_stoprecv+0x90/0x9a [ath9k]()
Hardware name: PDSBM
Could not stop RX, we could be confusing the DMA engine when we start RX up
Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl 
auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw mi]
Pid: 3505, comm: wpa_supplicant Not tainted 2.6.37-rc3-wl+ #53
Call Trace:
  [78436fe9] warn_slowpath_common+0x77/0x8c
  [f933019e] ? ath_stoprecv+0x90/0x9a [ath9k]
  [f933019e] ? ath_stoprecv+0x90/0x9a [ath9k]
  [7843707a] warn_slowpath_fmt+0x2e/0x30
  [f933019e] ath_stoprecv+0x90/0x9a [ath9k]
  [f932f13c] ath_set_channel+0x94/0x1e8 [ath9k]
  [7845a425] ? mark_held_locks+0x47/0x5f
  [7878e5bb] ? _raw_spin_unlock_irqrestore+0x3c/0x48
  [f932f5d4] ath9k_config+0x344/0x423 [ath9k]
  [f919] ieee80211_hw_config+0x11b/0x125 [mac80211]
  [f91aa25a] ieee80211_set_channel+0x74/0x9e [mac80211]
  [f8d37d36] cfg80211_set_freq+0xf3/0x12d [cfg80211]
  [f91aa1e6] ? ieee80211_set_channel+0x0/0x9e [mac80211]
  [f8d3a24c] cfg80211_mgd_wext_siwfreq+0x108/0x148 [cfg80211]
  [f8d395c9] cfg80211_wext_siwfreq+0x42/0xbf [cfg80211]
  [7876e14f] ioctl_standard_call+0x52/0x28e
  [786f2db3] ? dev_name_hash+0x16/0x48
  [786f67cc] ? __dev_get_by_name+0x32/0x3d
  [7876e418] wext_handle_ioctl+0x8d/0x18d
  [f8d39587] ? cfg80211_wext_siwfreq+0x0/0xbf [cfg80211]
  [786f78f9] dev_ioctl+0x520/0x53f
  [786e5f7f] ? sock_ioctl+0x0/0x202
  [786e6175] sock_ioctl+0x1f6/0x202
  [7878e576] ? _raw_spin_unlock_irq+0x22/0x2b
  [786e5f7f] ? sock_ioctl+0x0/0x202
  [784cc151] do_vfs_ioctl+0x4b1/0x4f6
  [7878e576] ? 

Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-11-29 Thread Luis R. Rodriguez
On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:
 Here is a script that reliably crashes my ath9k box.
 A second box with completely different hardware (except
 for ath9k) experiences similar problems.
 
 I am using today's wireless-testing kernel with a few
 patches of my own.
 
 You will also need the very latest hostap tree as it has the
 optimizations for allowing STAs to share scans.  Without
 this optimization, I did not see this problem.
 
 A few notes about the script:
 
 * I cannot remove any interfaces; there seems to be a ref-count leak
   somewhere.  I haven't debugged this issue.
 
 * Without the background ping, it is very hard to reproduce this problem,
but with it, it happens almost every time.
 
 * You'll need to set up your paths at the top of the script.
 
 
 #!/usr/bin/perl
 
 use strict;
 
 my $iw = "./local/sbin/iw";
 my $ip = "./local/sbin/ip";
 my $wpa_s = "./local/bin/wpa_supplicant";
 my $ssid = "candela-n";
 my $key = "wpadmz123";
 
 my $phy = "wiphy0";
 my $max = 32;
 my $i;
 my $bmac = "00:01:02:03:04:";
 my $cmd;
 
 # Cleanup previous stuff
 runCmd("killall wpa_supplicant");
 runCmd("killall ping");
 
 for ($i = 0; $i<$max; $i++) {
    # Work around ref-counting bugs in kernel
    runCmd("$ip link set sta$i down");
    runCmd("$ip addr flush dev sta$i");
    runCmd("$ip route flush dev sta$i");
    runCmd("$ip -6 addr flush dev sta$i");
    runCmd("$ip -6 route flush dev sta$i");
 
    # Bugger, cannot get the ref-count problem to go away.
    # runCmd("$iw dev sta$i del");
 }
 
 #exit(0);
 
 open(FD, ">pingbg") || die("Couldn't open pingbg.");
 print FD "#!/bin/bash\n\n";
 print FD "ping \$* > /dev/null 2>&1 &\n";
 print FD "echo continuing\n";
 close(FD);
 runCmd("chmod a+x pingbg");
 
 # Create stations
 for ($i = 0; $i<$max; $i++) {
    runCmd("$iw phy $phy interface add sta$i type station");
    my $mc5 = $i + 1;
    if (length($mc5) == 1) {
       $mc5 = "0$mc5"; # pad mac octet
    }
    my $mac = "$bmac$mc5";
    runCmd("$ip link set sta$i address $mac");
 
    runCmd("$iw dev sta$i set power_save off");
    runCmd("$ip addr add 9.99.1.$mc5/24 dev sta$i");
    runCmd("./pingbg -I sta$i 9.99.1.1");
 }
 
 # Bring them up with WPA
 for ($i = 0; $i<$max; $i++) {
    open(FD, ">sta$i" . "_wpa.conf") || die("Couldn't open file: $!\n");
    print FD "
 ctrl_interface=/var/run/wpa_supplicant
 fast_reauth=1
 #can_scan_one=1
 network={
  ssid=\"$ssid\"
  proto=WPA
  key_mgmt=WPA-PSK
  psk=\"$key\"
  pairwise=TKIP CCMP
  group=TKIP CCMP
 }
 ";
    #runCmd("$wpa_s -B -i sta$i -c sta$i" . "_wpa.conf -P sta$i" . "_wpa.pid -t -f sta$i" . "_wpa.log");
 }
 
 # Build command to start one wpa_supplicant for all interfaces.
 my $cmd = "$wpa_s -B -g /var/run/wpa_supplicant_if -P /tmp/wpa_supplicant-all.pid -t -f /tmp/wpa_supplicant_log_all.txt -i sta0 -c sta0_wpa.conf";
 for ($i = 1; $i<$max; $i++) {
    $cmd = "$cmd -N -i sta$i -c sta$i" . "_wpa.conf";
 }
 runCmd($cmd);
 
 sub runCmd {
    my $cmd = shift;
    print "$cmd\n";
    `$cmd`;
 }
 
 
 Example kernel crash output:
 
 ADDRCONF(NETDEV_CHANGE): sta6: link becomes ready
 ADDRCONF(NETDEV_CHANGE): sta5: link becomes ready
 ADDRCONF(NETDEV_CHANGE): sta4: link becomes ready
 ADDRCONF(NETDEV_CHANGE): sta3: link becomes ready
 ADDRCONF(NETDEV_CHANGE): sta1: link becomes ready
 ADDRCONF(NETDEV_CHANGE): sta0: link becomes ready
 padlock: VIA PadLock not detected.
 
 [r...@ath9k-dev1 ~]# ADDRCONF(NETDEV_CHANGE): sta30: link becomes ready
 ADDRCONF(NETDEV_CHANGE): sta29: link becomes ready
 [ cut here ]
 WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/recv.c:532 ath_stoprecv+0x90/0x9a [ath9k]()
 Hardware name: PDSBM
 Could not stop RX, we could be confusing the DMA engine when we start RX up
 Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl 
 auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw 
 mi]
 Pid: 3505, comm: wpa_supplicant Not tainted 2.6.37-rc3-wl+ #53
 Call Trace:
   [78436fe9] warn_slowpath_common+0x77/0x8c
   [f933019e] ? ath_stoprecv+0x90/0x9a [ath9k]
   [f933019e] ? ath_stoprecv+0x90/0x9a [ath9k]
   [7843707a] warn_slowpath_fmt+0x2e/0x30
   [f933019e] ath_stoprecv+0x90/0x9a [ath9k]
   [f932f13c] ath_set_channel+0x94/0x1e8 [ath9k]
   [7845a425] ? mark_held_locks+0x47/0x5f
   [7878e5bb] ? _raw_spin_unlock_irqrestore+0x3c/0x48
   [f932f5d4] ath9k_config+0x344/0x423 [ath9k]
   [f919] ieee80211_hw_config+0x11b/0x125 [mac80211]
   [f91aa25a] ieee80211_set_channel+0x74/0x9e [mac80211]
   [f8d37d36] cfg80211_set_freq+0xf3/0x12d [cfg80211]
   [f91aa1e6] ? ieee80211_set_channel+0x0/0x9e [mac80211]
   [f8d3a24c] cfg80211_mgd_wext_siwfreq+0x108/0x148 [cfg80211]
   [f8d395c9] cfg80211_wext_siwfreq+0x42/0xbf [cfg80211]
   [7876e14f] ioctl_standard_call+0x52/0x28e
   [786f2db3] ? dev_name_hash+0x16/0x48
   [786f67cc] ? __dev_get_by_name+0x32/0x3d
   [7876e418] wext_handle_ioctl+0x8d/0x18d
   [f8d39587] ? cfg80211_wext_siwfreq+0x0/0xbf [cfg80211]
   [786f78f9] dev_ioctl+0x520/0x53f
   

Re: [ath9k-devel] Script to crash ath9k with DMA errors.

2010-11-29 Thread Ben Greear
On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote:
 On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote:
 Here is a script that reliably crashes my ath9k box.
 A second box with completely different hardware (except
 for ath9k) experiences similar problems.

 BUG: unable to handle kernel NULL pointer dereference at 0040
 IP: [f933470a] ath_tx_start+0x461/0x5ef [ath9k]
 *pde = 
 Oops:  [#1] SMP DEBUG_PAGEALLOC
 last sysfs file: /sys/devices/pci:00/:00:1e.0/:08:01.0/irq
 Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl 
 auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw 
 mi]

 Pid: 38, comm: kworker/u:1 Tainted: GW   2.6.37-rc3-wl+ #53 
 PDSBM/PDSBM
 EIP: 0060:[f933470a] EFLAGS: 00010246 CPU: 1
 EIP is at ath_tx_start+0x461/0x5ef [ath9k]

 Please use

 gdb drivers/net/wireless/ath/ath9k/
 l *(ath_tx_start+0x461)

Usually the machine locks up pretty hard, with irq errors reported from the
wired NICs and/or the hard drive.  I'm not sure whether the ath_tx_start
issue is real, or just an unlucky side-effect of earlier bugs in this trace.

Reading symbols from /home/greearb/kernel/2.6/wireless-testing-dbg.p4s/drivers/net/wireless/ath/ath9k/ath9k.ko...done.
(gdb) l *(ath_tx_start+0x461)
0x972e is in ath_tx_start (/home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath9k/xmit.c:1691).
1686		if ((tx_info->flags & IEEE80211_TX_CTL_AMPDU) && txctl->an) {
1687			tidno = ieee80211_get_qos_ctl(hdr)[0] &
1688				IEEE80211_QOS_CTL_TID_MASK;
1689			tid = ATH_AN_2_TID(txctl->an, tidno);
1690	
1691			WARN_ON(tid->ac->txq != txctl->txq);
1692			/*
1693			 * Try aggregation if it's a unicast data frame
1694			 * and the destination is HT capable.
1695			 */
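The `l *(ath_tx_start+0x461)` step works because the oops prints the faulting IP as `symbol+offset/size`. The small userspace parser below handles that notation; the `struct sym_off` type and `parse_sym_off()` are made up for this example. It also makes the arithmetic explicit. gdb resolved ath_tx_start+0x461 to 0x972e, so in this ath9k.ko ath_tx_start itself starts at 0x972e - 0x461 = 0x92cd, and the faulting offset lies inside the 0x5ef-byte function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parsed form of a kernel "symbol+0xoffset/0xsize" token. */
struct sym_off {
	char name[64];
	unsigned long offset;	/* bytes from the start of the function */
	unsigned long size;	/* length of the function in bytes */
};

/* Parse e.g. "ath_tx_start+0x461/0x5ef". Returns false if the token is
 * malformed or the offset does not fall inside the function. */
static bool parse_sym_off(const char *s, struct sym_off *out)
{
	assert(s != NULL && out != NULL);
	/* %lx accepts an optional 0x prefix, like strtoul(..., 16). */
	if (sscanf(s, "%63[^+]+%lx/%lx", out->name, &out->offset,
		   &out->size) != 3)
		return false;
	return out->offset < out->size;
}
```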


Thanks,
Ben


Luis


-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com
