Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-08-26 Thread Alexandre Sunny Kovalenko
On Sat, 2008-07-12 at 21:26 -0700, Sam Leffler wrote:
 Alexandre Sunny Kovalenko wrote:
  On Sat, 2008-07-12 at 09:57 -0700, Sam Leffler wrote:

  Alexandre Sunny Kovalenko wrote:
  
  On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote:


  Alexandre Sunny Kovalenko wrote:
  
  
  On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:


  Alexandre Sunny Kovalenko wrote:
  
  
  On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 


  Alexandre Sunny Kovalenko wrote:
  
  
  I seem to be able to lock my machine by going into wpa_cli and 
  asking it
  to 'reassoc'. The reason for question mark after hard is that 
  debug
  information (caused by wlandebug and athdebug) is being printed on 
  the
  console. The only way to get machine's attention is to hold power 
  button
  for 8 seconds.


  So this is just livelock due to console debug msgs.
  
  
  I am not sure, I have parsed this well enough, so I will try to 
  clarify:
  machine becomes unresponsive *without* any debugging turned on, to an
  extent that pressing the power button twice *does not* generate ACPI
  console message (something to the tune of going into S5 already --
  gimme a break). If I turn ath debugging on, I do see those messages,
  and only them, scrolling on the console.


  Guess I misunderstood you.
  
  
  I have finally got enough time and equipment to investigate this
  further. Here are some conclusions:
 
  -- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
  a livelock.
 
  -- all system does, is executing ath_intr (if_ath.c) in the tight loop
  with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
  possibility that I have caused livelock with the debug messages, I have
  put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
  reaches 10,000. Without any kind of the debug messages, it will be
  triggered within 40-60 seconds after starting of wpa_supplicant.
 
  -- I suspect that comment below, might not hold true on my equipment
  if (status  HAL_INT_MIB) {
 sc-sc_stats.ast_mib++;
 /*
  * Disable interrupts until we service the MIB
  * interrupt; otherwise it will continue to fire.
  */
 ath_hal_intrset(ah, 0);
 /*
  * Let the hal handle the event.  We assume it will 
  * clear whatever condition caused the interrupt.   
  */
  ath_hal_mibevent(ah, sc-sc_halstats);
  ath_hal_intrset(ah, sc-sc_imask);
  }
 
  My hardware is:
  ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
  RF5413)
  ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
  pci3
  ath0: [ITHREAD]
  ath0: using obsoleted if_watchdog interface
  ath0: Ethernet address: 00:16:cf:26:2f:3f
  ath0: mac 10.3 phy 6.1 radio 10.2
 
  My wpa_supplicant.conf is:
  ctrl_interface=/var/run/wpa_supplicant
  ctrl_interface_group=wheel
  eapol_version=2
 
  network={
ssid=XXX
scan_ssid=1
priority=1
proto=WPA
pairwise=TKIP
group=TKIP
key_mgmt=WPA-PSK
psk=xx
  }
 
  Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
  different values. Only 11G SSID is configured in wpa_supplicant.conf. In
  the test setup, AP is with 10' (3m) from the laptop.
 
  AP is successfully used by handful of Windows clients (including this
  same laptop) and iBook G4.
 
  Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
  anything but normal scan. 
 
  athdebug 0x loops with ath_intr: status 0x1000 until I power
  down my laptop.
 
  I would appreciate any suggestion on what I can investigate further --
  at this point I have comfortable console setup and should be able to
  field requests for further information much better.
 


  Are you running powerd?
  
  
  I do. And I just tried disabling it, and I could not reproduce the
  problem any more. Is there any way to reconcile if_ath with powerd?
 


  Don't know.  There appear to be two issues.  When the MIB interrupts 
  arrive the kernel may service them w/ the cpu at a reduced clock 
  frequency.  Since powered is currently the only mechanism for increasing 
  the frequency and it runs in user space it can take a while to bump the 
  clock rate leading to livelock (because the logic to reduce the _cause_ 
  of the MIB interrupt takes a long time to run).  John Baldwin suggested 
  raising the clock frequency when handling interrupts in the kernel but 
  nothing has been done to make that happen.
 
  Separately there is a question as to why the MIB interrupts are 
  happening at all.  This is possibly due to misprogramming of the 
  baseband h/w in the ath card.  Unfortunately I've been trying to get 
  

Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-13 Thread Martin

Hi Sam,

do you know if there is anything done about cbb(4)? I have many
wireless adapters with ath(4), but only the one based on PCMCIA is
making problems on FreeBSD.

I cannot boot my notebook with the device inserted into the port, or it
will render the system unusable (100% load on cbb(4)).

And all I can see is the following:
Jul 12 14:58:39 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:40 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:41 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:42 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:43 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:44 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 5 
(2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 8 
(2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:46 link kernel: ath0: ath_chan_set: unable to reset channel 9 
(2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:47 link kernel: ath0: ath_chan_set: unable to reset channel 10 
(2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:52 link kernel: ath0: device timeout
Jul 12 14:58:52 link kernel: ath0: ath_reset: unable to reset hardware; hal 
status 3231908395
Jul 12 14:58:58 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:59 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:00 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 3289233408
Jul 12 14:59:02 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:03 link kernel: ath0: ath_chan_set: unable to reset channel 5 
(2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:04 link kernel: ath0: ath_chan_set: unable to reset channel 8 
(2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:05 link kernel: ath0: ath_chan_set: unable to reset channel 9 
(2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:06 link kernel: ath0: ath_chan_set: unable to reset channel 10 
(2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:10 link kernel: ath0: device timeout
Jul 12 14:59:10 link kernel: ath0: ath_reset: unable to reset hardware; hal 
status 3231908395
Jul 12 14:59:11 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:59:12 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:13 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:59:14 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:15 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:16 link kernel: ath0: unable to reset hardware; hal status 
3868687084
Jul 12 14:59:16 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:17 link kernel: ath0: ath_chan_set: unable to reset channel 11 
(2462 Mhz, flags 0x480 hal flags 0xc0), hal status 3828804624
Jul 12 14:59:22 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 14:59:32 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 14:59:42 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:22 link last message repeated 4 times
Jul 12 15:00:32 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:42 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:52 link kernel: ath0: unable to reset hardware; hal status 
3868539700


wpa_supplicant is trying to access the card and switching channels, I
suppose.

A workaround that I'm using all the time is:
1) boot the notebook without the PCMCIA card
2) 

Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-13 Thread Alexandre Sunny Kovalenko
On Sun, 2008-07-13 at 11:21 +0200, Martin wrote:
 Hi Sam,
 
 do you know if there is anything done about cbb(4)? I have many
 wireless adapters with ath(4), but only the one based on PCMCIA is
 making problems on FreeBSD.
 
 I cannot boot my notebook with the device inserted into the port, or it
 will render the system unusable (100% load on cbb(4)).
 
 And all I can see is the following:
 Jul 12 14:58:39 link kernel: ath0: ath_chan_set: unable to reset channel 1 
 (2412 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:40 link kernel: ath0: ath_chan_set: unable to reset channel 6 
 (2437 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
 Jul 12 14:58:41 link kernel: ath0: ath_chan_set: unable to reset channel 7 
 (2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
 Jul 12 14:58:42 link kernel: ath0: ath_chan_set: unable to reset channel 2 
 (2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:43 link kernel: ath0: ath_chan_set: unable to reset channel 3 
 (2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:44 link kernel: ath0: ath_chan_set: unable to reset channel 4 
 (2427 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
 Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 5 
 (2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 8 
 (2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:46 link kernel: ath0: ath_chan_set: unable to reset channel 9 
 (2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:47 link kernel: ath0: ath_chan_set: unable to reset channel 10 
 (2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:58:52 link kernel: ath0: device timeout
 Jul 12 14:58:52 link kernel: ath0: ath_reset: unable to reset hardware; hal 
 status 3231908395
 Jul 12 14:58:58 link kernel: ath0: ath_chan_set: unable to reset channel 1 
 (2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
 Jul 12 14:58:59 link kernel: ath0: ath_chan_set: unable to reset channel 6 
 (2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:00 link kernel: ath0: ath_chan_set: unable to reset channel 7 
 (2442 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 2 
 (2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 3 
 (2422 Mhz, flags 0x480 hal flags 0xc0), hal status 3289233408
 Jul 12 14:59:02 link kernel: ath0: ath_chan_set: unable to reset channel 4 
 (2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:03 link kernel: ath0: ath_chan_set: unable to reset channel 5 
 (2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:04 link kernel: ath0: ath_chan_set: unable to reset channel 8 
 (2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:05 link kernel: ath0: ath_chan_set: unable to reset channel 9 
 (2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:06 link kernel: ath0: ath_chan_set: unable to reset channel 10 
 (2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:10 link kernel: ath0: device timeout
 Jul 12 14:59:10 link kernel: ath0: ath_reset: unable to reset hardware; hal 
 status 3231908395
 Jul 12 14:59:11 link kernel: ath0: ath_chan_set: unable to reset channel 1 
 (2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
 Jul 12 14:59:12 link kernel: ath0: ath_chan_set: unable to reset channel 6 
 (2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:13 link kernel: ath0: ath_chan_set: unable to reset channel 7 
 (2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
 Jul 12 14:59:14 link kernel: ath0: ath_chan_set: unable to reset channel 2 
 (2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:15 link kernel: ath0: ath_chan_set: unable to reset channel 3 
 (2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:16 link kernel: ath0: unable to reset hardware; hal status 
 3868687084
 Jul 12 14:59:16 link kernel: ath0: ath_chan_set: unable to reset channel 4 
 (2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
 Jul 12 14:59:17 link kernel: ath0: ath_chan_set: unable to reset channel 11 
 (2462 Mhz, flags 0x480 hal flags 0xc0), hal status 3828804624
 Jul 12 14:59:22 link kernel: ath0: unable to reset hardware; hal status 
 3868539700
 Jul 12 14:59:32 link kernel: ath0: unable to reset hardware; hal status 
 3868539700
 Jul 12 14:59:42 link kernel: ath0: unable to reset hardware; hal status 
 3868539700
 Jul 12 15:00:22 link last message repeated 4 times
 Jul 12 15:00:32 link kernel: ath0: unable to reset hardware; hal status 
 3868539700
 Jul 12 15:00:42 link kernel: ath0: unable to reset hardware; hal status 
 3868539700
 Jul 12 15:00:52 link kernel: ath0: unable to reset hardware; hal status 
 3868539700
 
 
 wpa_supplicant is trying to access the 

if_ral lockups [was: Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7]

2008-07-13 Thread Heiko Wundram
Am Sonntag, 13. Juli 2008 14:37:14 schrieb Alexandre Sunny Kovalenko:
 snip

Just to chime into the discussion, I'm seeing something similar for if_ral 
since my latest update of 7-STABLE from about two weeks ago (which I did not 
see with a 7-STABLE from some time in march, so something must have changed 
in between, but I can't narrow it down exactly). I've since updated the 
kernel several times (the latest update being just last friday), but the 
problem hasn't disappeared so far.

I get an oops, followed by a reboot when wpa_supplicant has associated an 
if_ral interface (which is a removable PCMCIA Ralink based WLAN card) when 
powerd is running. As soon as I disable powerd for my laptop (and it runs 
continuously at 1800 Mhz), the oopses disappear.

As I'm getting a kernel oops, if anybody is interested, I'd more than happily 
reenable powerd to produce a kernel dump and make that available to anybody 
who is interested in debugging it, along with the kernel images, of course. 
I'm running a self-compiled GENERIC (with nothing changed except for the 
self-compiled part). I currently don't have the time to track this down any 
further myself, though, and the workaround works for me.

Device info:
...
cbb0: O2Micro OZ711M1 PCI-CardBus Bridge at device 9.0 on pci0
cardbus0: CardBus bus on cbb0
pccard0: 16-bit PCCard bus on cbb0
cbb0: [ITHREAD]
cbb1: O2Micro OZ711M1 PCI-CardBus Bridge at device 9.1 on pci0
cardbus1: CardBus bus on cbb1
pccard1: 16-bit PCCard bus on cbb1
cbb1: [ITHREAD]
...
ral0: Ralink Technology RT2561 mem 0x8800-0x88007fff irq 17 at device 
0.0 on cardbus0
ral0: MAC/BBP RT2561C, RF RT2527
ral0: Ethernet address: 00:80:5a:51:23:53
ral0: [ITHREAD]
...

uname:
FreeBSD phoenix.modelnine.org 7.0-STABLE FreeBSD 7.0-STABLE #0: Fri Jul 11 
12:28:25 CEST 2008 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386

-- 
Heiko Wundram
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-13 Thread Martin


Hi Alexandre,

 You can panic the box by unloading if_ath.ko while wpa_supplicant is
 running on any form factor, so this one is not specific to the removable
 devices.

I am not talking about unloading kernel modules (I wouldn't unload
if_ath by force and complain about it). The panic is happening while
ejecting the PCMCIA card and wpa_supplicant is running.

The cbb(4) problem with the high load is much more annoying for me,
because the only solution is to reboot the system to get it stable
again.

One thing I also noticed is that if I have set a high resolution with
vidcontrol (1600x1200) and the text terminal display becomes really
slow, ath(4) has difficulties to load the device when it is inserted
into the PCMCIA-slot.

-- 
Martin Sugioarto
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-13 Thread Sam Leffler

Martin wrote:

Hi Sam,

do you know if there is anything done about cbb(4)? I have many
wireless adapters with ath(4), but only the one based on PCMCIA is
making problems on FreeBSD.

I cannot boot my notebook with the device inserted into the port, or it
will render the system unusable (100% load on cbb(4)).

And all I can see is the following:
Jul 12 14:58:39 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:40 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:41 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:42 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:43 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:44 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 5 
(2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 8 
(2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:46 link kernel: ath0: ath_chan_set: unable to reset channel 9 
(2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:47 link kernel: ath0: ath_chan_set: unable to reset channel 10 
(2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:52 link kernel: ath0: device timeout
Jul 12 14:58:52 link kernel: ath0: ath_reset: unable to reset hardware; hal 
status 3231908395
Jul 12 14:58:58 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:59 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:00 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 3289233408
Jul 12 14:59:02 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:03 link kernel: ath0: ath_chan_set: unable to reset channel 5 
(2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:04 link kernel: ath0: ath_chan_set: unable to reset channel 8 
(2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:05 link kernel: ath0: ath_chan_set: unable to reset channel 9 
(2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:06 link kernel: ath0: ath_chan_set: unable to reset channel 10 
(2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:10 link kernel: ath0: device timeout
Jul 12 14:59:10 link kernel: ath0: ath_reset: unable to reset hardware; hal 
status 3231908395
Jul 12 14:59:11 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:59:12 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:13 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:59:14 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:15 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:16 link kernel: ath0: unable to reset hardware; hal status 
3868687084
Jul 12 14:59:16 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:17 link kernel: ath0: ath_chan_set: unable to reset channel 11 
(2462 Mhz, flags 0x480 hal flags 0xc0), hal status 3828804624
Jul 12 14:59:22 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 14:59:32 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 14:59:42 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:22 link last message repeated 4 times
Jul 12 15:00:32 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:42 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:52 link kernel: ath0: unable to reset hardware; hal status 
3868539700


wpa_supplicant is trying to access the card and switching channels, I
suppose.

A workaround that I'm using all the time is:
1) boot the notebook without the 

Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-13 Thread Sam Leffler

Alexandre Sunny Kovalenko wrote:

On Sun, 2008-07-13 at 11:21 +0200, Martin wrote:

Hi Sam,

do you know if there is anything done about cbb(4)? I have many
wireless adapters with ath(4), but only the one based on PCMCIA is
making problems on FreeBSD.

I cannot boot my notebook with the device inserted into the port, or it
will render the system unusable (100% load on cbb(4)).

And all I can see is the following:
Jul 12 14:58:39 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:40 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:41 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:42 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:43 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:44 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 5 
(2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:45 link kernel: ath0: ath_chan_set: unable to reset channel 8 
(2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:46 link kernel: ath0: ath_chan_set: unable to reset channel 9 
(2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:47 link kernel: ath0: ath_chan_set: unable to reset channel 10 
(2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:58:52 link kernel: ath0: device timeout
Jul 12 14:58:52 link kernel: ath0: ath_reset: unable to reset hardware; hal 
status 3231908395
Jul 12 14:58:58 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:58:59 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:00 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:01 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 3289233408
Jul 12 14:59:02 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:03 link kernel: ath0: ath_chan_set: unable to reset channel 5 
(2432 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:04 link kernel: ath0: ath_chan_set: unable to reset channel 8 
(2447 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:05 link kernel: ath0: ath_chan_set: unable to reset channel 9 
(2452 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:06 link kernel: ath0: ath_chan_set: unable to reset channel 10 
(2457 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:10 link kernel: ath0: device timeout
Jul 12 14:59:10 link kernel: ath0: ath_reset: unable to reset hardware; hal 
status 3231908395
Jul 12 14:59:11 link kernel: ath0: ath_chan_set: unable to reset channel 1 
(2412 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:59:12 link kernel: ath0: ath_chan_set: unable to reset channel 6 
(2437 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:13 link kernel: ath0: ath_chan_set: unable to reset channel 7 
(2442 Mhz, flags 0x480 hal flags 0xc0), hal status 3233070656
Jul 12 14:59:14 link kernel: ath0: ath_chan_set: unable to reset channel 2 
(2417 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:15 link kernel: ath0: ath_chan_set: unable to reset channel 3 
(2422 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:16 link kernel: ath0: unable to reset hardware; hal status 
3868687084
Jul 12 14:59:16 link kernel: ath0: ath_chan_set: unable to reset channel 4 
(2427 Mhz, flags 0x480 hal flags 0xc0), hal status 0
Jul 12 14:59:17 link kernel: ath0: ath_chan_set: unable to reset channel 11 
(2462 Mhz, flags 0x480 hal flags 0xc0), hal status 3828804624
Jul 12 14:59:22 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 14:59:32 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 14:59:42 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:22 link last message repeated 4 times
Jul 12 15:00:32 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:42 link kernel: ath0: unable to reset hardware; hal status 
3868539700
Jul 12 15:00:52 link kernel: ath0: unable to reset hardware; hal status 
3868539700


wpa_supplicant is trying to access the card and switching channels, I
suppose.

A 

Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-12 Thread Alexandre Sunny Kovalenko
On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote:
 Alexandre Sunny Kovalenko wrote:
  On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:
  Alexandre Sunny Kovalenko wrote:
  On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 
  Alexandre Sunny Kovalenko wrote:
  I seem to be able to lock my machine by going into wpa_cli and asking it
  to 'reassoc'. The reason for question mark after hard is that debug
  information (caused by wlandebug and athdebug) is being printed on the
  console. The only way to get machine's attention is to hold power button
  for 8 seconds.
  So this is just livelock due to console debug msgs.
  I am not sure, I have parsed this well enough, so I will try to clarify:
  machine becomes unresponsive *without* any debugging turned on, to an
  extent that pressing the power button twice *does not* generate ACPI
  console message (something to the tune of going into S5 already --
  gimme a break). If I turn ath debugging on, I do see those messages,
  and only them, scrolling on the console.
  Guess I misunderstood you.
  
  I have finally got enough time and equipment to investigate this
  further. Here are some conclusions:
  
  -- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
  a livelock.
  
  -- all system does, is executing ath_intr (if_ath.c) in the tight loop
  with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
  possibility that I have caused livelock with the debug messages, I have
  put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
  reaches 10,000. Without any kind of the debug messages, it will be
  triggered within 40-60 seconds after starting of wpa_supplicant.
  
  -- I suspect that comment below, might not hold true on my equipment
  if (status  HAL_INT_MIB) {
 sc-sc_stats.ast_mib++;
 /*
  * Disable interrupts until we service the MIB
  * interrupt; otherwise it will continue to fire.
  */
 ath_hal_intrset(ah, 0);
 /*
  * Let the hal handle the event.  We assume it will 
  * clear whatever condition caused the interrupt.   
  */
  ath_hal_mibevent(ah, sc-sc_halstats);
  ath_hal_intrset(ah, sc-sc_imask);
  }
  
  My hardware is:
  ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
  RF5413)
  ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
  pci3
  ath0: [ITHREAD]
  ath0: using obsoleted if_watchdog interface
  ath0: Ethernet address: 00:16:cf:26:2f:3f
  ath0: mac 10.3 phy 6.1 radio 10.2
  
  My wpa_supplicant.conf is:
  ctrl_interface=/var/run/wpa_supplicant
  ctrl_interface_group=wheel
  eapol_version=2
  
  network={
ssid=XXX
scan_ssid=1
priority=1
proto=WPA
pairwise=TKIP
group=TKIP
key_mgmt=WPA-PSK
psk=xx
  }
  
  Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
  different values. Only 11G SSID is configured in wpa_supplicant.conf. In
  the test setup, AP is with 10' (3m) from the laptop.
  
  AP is successfully used by handful of Windows clients (including this
  same laptop) and iBook G4.
  
  Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
  anything but normal scan. 
  
  athdebug 0x loops with ath_intr: status 0x1000 until I power
  down my laptop.
  
  I would appreciate any suggestion on what I can investigate further --
  at this point I have comfortable console setup and should be able to
  field requests for further information much better.
  
 
 Are you running powerd?
I do. And I just tried disabling it, and I could not reproduce the
problem any more. Is there any way to reconcile if_ath with powerd?

Thank you,

-- 
Alexandre Sunny Kovalenko (Олександр Коваленко)

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-12 Thread Sam Leffler

Alexandre Sunny Kovalenko wrote:

On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote:
  

Alexandre Sunny Kovalenko wrote:


On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:
  

Alexandre Sunny Kovalenko wrote:

On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 
  

Alexandre Sunny Kovalenko wrote:


I seem to be able to lock my machine by going into wpa_cli and asking it
to 'reassoc'. The reason for question mark after hard is that debug
information (caused by wlandebug and athdebug) is being printed on the
console. The only way to get machine's attention is to hold power button
for 8 seconds.
  

So this is just livelock due to console debug msgs.


I am not sure, I have parsed this well enough, so I will try to clarify:
machine becomes unresponsive *without* any debugging turned on, to an
extent that pressing the power button twice *does not* generate ACPI
console message (something to the tune of going into S5 already --
gimme a break). If I turn ath debugging on, I do see those messages,
and only them, scrolling on the console.
  

Guess I misunderstood you.


I have finally got enough time and equipment to investigate this
further. Here are some conclusions:

-- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
a livelock.

-- all system does, is executing ath_intr (if_ath.c) in the tight loop
with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
possibility that I have caused livelock with the debug messages, I have
put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
reaches 10,000. Without any kind of the debug messages, it will be
triggered within 40-60 seconds after starting of wpa_supplicant.

-- I suspect that comment below, might not hold true on my equipment
if (status  HAL_INT_MIB) {
   sc-sc_stats.ast_mib++;
   /*
* Disable interrupts until we service the MIB
* interrupt; otherwise it will continue to fire.
*/
   ath_hal_intrset(ah, 0);
   /*
* Let the hal handle the event.  We assume it will 
* clear whatever condition caused the interrupt.   
*/
ath_hal_mibevent(ah, sc-sc_halstats);
ath_hal_intrset(ah, sc-sc_imask);
}

My hardware is:
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
RF5413)
ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
pci3
ath0: [ITHREAD]
ath0: using obsoleted if_watchdog interface
ath0: Ethernet address: 00:16:cf:26:2f:3f
ath0: mac 10.3 phy 6.1 radio 10.2

My wpa_supplicant.conf is:
ctrl_interface=/var/run/wpa_supplicant
ctrl_interface_group=wheel
eapol_version=2

network={
  ssid=XXX
  scan_ssid=1
  priority=1
  proto=WPA
  pairwise=TKIP
  group=TKIP
  key_mgmt=WPA-PSK
  psk=xx
}

Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
different values. Only 11G SSID is configured in wpa_supplicant.conf. In
the test setup, AP is with 10' (3m) from the laptop.

AP is successfully used by handful of Windows clients (including this
same laptop) and iBook G4.

Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
anything but normal scan. 


athdebug 0x loops with ath_intr: status 0x1000 until I power
down my laptop.

I would appreciate any suggestion on what I can investigate further --
at this point I have comfortable console setup and should be able to
field requests for further information much better.

  

Are you running powerd?


I do. And I just tried disabling it, and I could not reproduce the
problem any more. Is there any way to reconcile if_ath with powerd?

  


Don't know.  There appear to be two issues.  When the MIB interrupts 
arrive the kernel may service them w/ the cpu at a reduced clock 
frequency.  Since powered is currently the only mechanism for increasing 
the frequency and it runs in user space it can take a while to bump the 
clock rate leading to livelock (because the logic to reduce the _cause_ 
of the MIB interrupt takes a long time to run).  John Baldwin suggested 
raising the clock frequency when handling interrupts in the kernel but 
nothing has been done to make that happen.


Separately there is a question as to why the MIB interrupts are 
happening at all.  This is possibly due to misprogramming of the 
baseband h/w in the ath card.  Unfortunately I've been trying to get 
Atheros to help understand/resolve this question for a very long time 
(as their code also exhibits this behaviour) but they've been 
unresponsive.  I have some experimental code to address this in new hal 
versions (such as 0.10.5.6 available in http://www.freebsd.org/~sam) but 
apparently it does not entirely fix the problem.


   Sam

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-12 Thread Alexandre Sunny Kovalenko
On Sat, 2008-07-12 at 09:57 -0700, Sam Leffler wrote:
 Alexandre Sunny Kovalenko wrote:
  On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote:

  Alexandre Sunny Kovalenko wrote:
  
  On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:

  Alexandre Sunny Kovalenko wrote:
  
  On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 

  Alexandre Sunny Kovalenko wrote:
  
  I seem to be able to lock my machine by going into wpa_cli and asking 
  it
  to 'reassoc'. The reason for question mark after hard is that debug
  information (caused by wlandebug and athdebug) is being printed on the
  console. The only way to get machine's attention is to hold power 
  button
  for 8 seconds.

  So this is just livelock due to console debug msgs.
  
  I am not sure, I have parsed this well enough, so I will try to clarify:
  machine becomes unresponsive *without* any debugging turned on, to an
  extent that pressing the power button twice *does not* generate ACPI
  console message (something to the tune of going into S5 already --
  gimme a break). If I turn ath debugging on, I do see those messages,
  and only them, scrolling on the console.

  Guess I misunderstood you.
  
  I have finally got enough time and equipment to investigate this
  further. Here are some conclusions:
 
  -- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
  a livelock.
 
  -- all system does, is executing ath_intr (if_ath.c) in the tight loop
  with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
  possibility that I have caused livelock with the debug messages, I have
  put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
  reaches 10,000. Without any kind of the debug messages, it will be
  triggered within 40-60 seconds after starting of wpa_supplicant.
 
  -- I suspect that comment below, might not hold true on my equipment
  if (status  HAL_INT_MIB) {
 sc-sc_stats.ast_mib++;
 /*
  * Disable interrupts until we service the MIB
  * interrupt; otherwise it will continue to fire.
  */
 ath_hal_intrset(ah, 0);
 /*
  * Let the hal handle the event.  We assume it will 
  * clear whatever condition caused the interrupt.   
  */
  ath_hal_mibevent(ah, sc-sc_halstats);
  ath_hal_intrset(ah, sc-sc_imask);
  }
 
  My hardware is:
  ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
  RF5413)
  ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
  pci3
  ath0: [ITHREAD]
  ath0: using obsoleted if_watchdog interface
  ath0: Ethernet address: 00:16:cf:26:2f:3f
  ath0: mac 10.3 phy 6.1 radio 10.2
 
  My wpa_supplicant.conf is:
  ctrl_interface=/var/run/wpa_supplicant
  ctrl_interface_group=wheel
  eapol_version=2
 
  network={
ssid=XXX
scan_ssid=1
priority=1
proto=WPA
pairwise=TKIP
group=TKIP
key_mgmt=WPA-PSK
psk=xx
  }
 
  Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
  different values. Only 11G SSID is configured in wpa_supplicant.conf. In
  the test setup, AP is with 10' (3m) from the laptop.
 
  AP is successfully used by handful of Windows clients (including this
  same laptop) and iBook G4.
 
  Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
  anything but normal scan. 
 
  athdebug 0x loops with ath_intr: status 0x1000 until I power
  down my laptop.
 
  I would appreciate any suggestion on what I can investigate further --
  at this point I have comfortable console setup and should be able to
  field requests for further information much better.
 

  Are you running powerd?
  
  I do. And I just tried disabling it, and I could not reproduce the
  problem any more. Is there any way to reconcile if_ath with powerd?
 

 
 Don't know.  There appear to be two issues.  When the MIB interrupts 
 arrive the kernel may service them w/ the cpu at a reduced clock 
 frequency.  Since powered is currently the only mechanism for increasing 
 the frequency and it runs in user space it can take a while to bump the 
 clock rate leading to livelock (because the logic to reduce the _cause_ 
 of the MIB interrupt takes a long time to run).  John Baldwin suggested 
 raising the clock frequency when handling interrupts in the kernel but 
 nothing has been done to make that happen.
 
 Separately there is a question as to why the MIB interrupts are 
 happening at all.  This is possibly due to misprogramming of the 
 baseband h/w in the ath card.  Unfortunately I've been trying to get 
 Atheros to help understand/resolve this question for a very long time 
 (as their code also exhibits this behaviour) but they've been 
 unresponsive.  I have some experimental code to address this in new hal 
 versions (such as 0.10.5.6 available in http://www.freebsd.org/~sam) but 
 apparently it does not 

Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-12 Thread Sam Leffler

Alexandre Sunny Kovalenko wrote:

On Sat, 2008-07-12 at 09:57 -0700, Sam Leffler wrote:
  

Alexandre Sunny Kovalenko wrote:


On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote:
  
  

Alexandre Sunny Kovalenko wrote:



On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:
  
  

Alexandre Sunny Kovalenko wrote:


On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 
  
  

Alexandre Sunny Kovalenko wrote:



I seem to be able to lock my machine by going into wpa_cli and asking it
to 'reassoc'. The reason for question mark after hard is that debug
information (caused by wlandebug and athdebug) is being printed on the
console. The only way to get machine's attention is to hold power button
for 8 seconds.
  
  

So this is just livelock due to console debug msgs.



I am not sure, I have parsed this well enough, so I will try to clarify:
machine becomes unresponsive *without* any debugging turned on, to an
extent that pressing the power button twice *does not* generate ACPI
console message (something to the tune of going into S5 already --
gimme a break). If I turn ath debugging on, I do see those messages,
and only them, scrolling on the console.
  
  

Guess I misunderstood you.



I have finally got enough time and equipment to investigate this
further. Here are some conclusions:

-- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
a livelock.

-- all system does, is executing ath_intr (if_ath.c) in the tight loop
with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
possibility that I have caused livelock with the debug messages, I have
put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
reaches 10,000. Without any kind of the debug messages, it will be
triggered within 40-60 seconds after starting of wpa_supplicant.

-- I suspect that comment below, might not hold true on my equipment
if (status  HAL_INT_MIB) {
   sc-sc_stats.ast_mib++;
   /*
* Disable interrupts until we service the MIB
* interrupt; otherwise it will continue to fire.
*/
   ath_hal_intrset(ah, 0);
   /*
* Let the hal handle the event.  We assume it will 
* clear whatever condition caused the interrupt.   
*/
ath_hal_mibevent(ah, sc-sc_halstats);
ath_hal_intrset(ah, sc-sc_imask);
}

My hardware is:
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
RF5413)
ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
pci3
ath0: [ITHREAD]
ath0: using obsoleted if_watchdog interface
ath0: Ethernet address: 00:16:cf:26:2f:3f
ath0: mac 10.3 phy 6.1 radio 10.2

My wpa_supplicant.conf is:
ctrl_interface=/var/run/wpa_supplicant
ctrl_interface_group=wheel
eapol_version=2

network={
  ssid=XXX
  scan_ssid=1
  priority=1
  proto=WPA
  pairwise=TKIP
  group=TKIP
  key_mgmt=WPA-PSK
  psk=xx
}

Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
different values. Only 11G SSID is configured in wpa_supplicant.conf. In
the test setup, AP is with 10' (3m) from the laptop.

AP is successfully used by handful of Windows clients (including this
same laptop) and iBook G4.

Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
anything but normal scan. 


athdebug 0x loops with ath_intr: status 0x1000 until I power
down my laptop.

I would appreciate any suggestion on what I can investigate further --
at this point I have comfortable console setup and should be able to
field requests for further information much better.

  
  

Are you running powerd?



I do. And I just tried disabling it, and I could not reproduce the
problem any more. Is there any way to reconcile if_ath with powerd?

  
  
Don't know.  There appear to be two issues.  When the MIB interrupts 
arrive the kernel may service them w/ the cpu at a reduced clock 
frequency.  Since powered is currently the only mechanism for increasing 
the frequency and it runs in user space it can take a while to bump the 
clock rate leading to livelock (because the logic to reduce the _cause_ 
of the MIB interrupt takes a long time to run).  John Baldwin suggested 
raising the clock frequency when handling interrupts in the kernel but 
nothing has been done to make that happen.


Separately there is a question as to why the MIB interrupts are 
happening at all.  This is possibly due to misprogramming of the 
baseband h/w in the ath card.  Unfortunately I've been trying to get 
Atheros to help understand/resolve this question for a very long time 
(as their code also exhibits this behaviour) but they've been 
unresponsive.  I have some experimental code to address this in new hal 
versions (such as 0.10.5.6 available in http://www.freebsd.org/~sam) but 
apparently it does not entirely fix 

Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-11 Thread Alexandre Sunny Kovalenko
On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:
 Alexandre Sunny Kovalenko wrote:
  On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 
  Alexandre Sunny Kovalenko wrote:
  I seem to be able to lock my machine by going into wpa_cli and asking it
  to 'reassoc'. The reason for question mark after hard is that debug
  information (caused by wlandebug and athdebug) is being printed on the
  console. The only way to get machine's attention is to hold power button
  for 8 seconds.
  So this is just livelock due to console debug msgs.
  I am not sure, I have parsed this well enough, so I will try to clarify:
  machine becomes unresponsive *without* any debugging turned on, to an
  extent that pressing the power button twice *does not* generate ACPI
  console message (something to the tune of going into S5 already --
  gimme a break). If I turn ath debugging on, I do see those messages,
  and only them, scrolling on the console.
 
 Guess I misunderstood you.

I have finally got enough time and equipment to investigate this
further. Here are some conclusions:

-- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
a livelock.

-- all system does, is executing ath_intr (if_ath.c) in the tight loop
with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
possibility that I have caused livelock with the debug messages, I have
put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
reaches 10,000. Without any kind of the debug messages, it will be
triggered within 40-60 seconds after starting of wpa_supplicant.

-- I suspect that comment below, might not hold true on my equipment
if (status  HAL_INT_MIB) {
   sc-sc_stats.ast_mib++;
   /*
* Disable interrupts until we service the MIB
* interrupt; otherwise it will continue to fire.
*/
   ath_hal_intrset(ah, 0);
   /*
* Let the hal handle the event.  We assume it will 
* clear whatever condition caused the interrupt.   
*/
ath_hal_mibevent(ah, sc-sc_halstats);
ath_hal_intrset(ah, sc-sc_imask);
}

My hardware is:
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
RF5413)
ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
pci3
ath0: [ITHREAD]
ath0: using obsoleted if_watchdog interface
ath0: Ethernet address: 00:16:cf:26:2f:3f
ath0: mac 10.3 phy 6.1 radio 10.2

My wpa_supplicant.conf is:
ctrl_interface=/var/run/wpa_supplicant
ctrl_interface_group=wheel
eapol_version=2

network={
  ssid=XXX
  scan_ssid=1
  priority=1
  proto=WPA
  pairwise=TKIP
  group=TKIP
  key_mgmt=WPA-PSK
  psk=xx
}

Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
different values. Only 11G SSID is configured in wpa_supplicant.conf. In
the test setup, AP is with 10' (3m) from the laptop.

AP is successfully used by handful of Windows clients (including this
same laptop) and iBook G4.

Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
anything but normal scan. 

athdebug 0x loops with ath_intr: status 0x1000 until I power
down my laptop.

I would appreciate any suggestion on what I can investigate further --
at this point I have comfortable console setup and should be able to
field requests for further information much better.

-- 
Alexandre Sunny Kovalenko (Олександр Коваленко)

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-07-11 Thread Sam Leffler

Alexandre Sunny Kovalenko wrote:

On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote:

Alexandre Sunny Kovalenko wrote:
On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 

Alexandre Sunny Kovalenko wrote:

I seem to be able to lock my machine by going into wpa_cli and asking it
to 'reassoc'. The reason for question mark after hard is that debug
information (caused by wlandebug and athdebug) is being printed on the
console. The only way to get machine's attention is to hold power button
for 8 seconds.

So this is just livelock due to console debug msgs.

I am not sure, I have parsed this well enough, so I will try to clarify:
machine becomes unresponsive *without* any debugging turned on, to an
extent that pressing the power button twice *does not* generate ACPI
console message (something to the tune of going into S5 already --
gimme a break). If I turn ath debugging on, I do see those messages,
and only them, scrolling on the console.

Guess I misunderstood you.


I have finally got enough time and equipment to investigate this
further. Here are some conclusions:

-- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed
a livelock.

-- all system does, is executing ath_intr (if_ath.c) in the tight loop
with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate
possibility that I have caused livelock with the debug messages, I have
put conditional panic() into ath_intr, as soon as sc-sc_stats.ast_mib
reaches 10,000. Without any kind of the debug messages, it will be
triggered within 40-60 seconds after starting of wpa_supplicant.

-- I suspect that comment below, might not hold true on my equipment
if (status  HAL_INT_MIB) {
   sc-sc_stats.ast_mib++;
   /*
* Disable interrupts until we service the MIB
* interrupt; otherwise it will continue to fire.
*/
   ath_hal_intrset(ah, 0);
   /*
* Let the hal handle the event.  We assume it will 
* clear whatever condition caused the interrupt.   
*/
ath_hal_mibevent(ah, sc-sc_halstats);
ath_hal_intrset(ah, sc-sc_imask);
}

My hardware is:
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,
RF5413)
ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
pci3
ath0: [ITHREAD]
ath0: using obsoleted if_watchdog interface
ath0: Ethernet address: 00:16:cf:26:2f:3f
ath0: mac 10.3 phy 6.1 radio 10.2

My wpa_supplicant.conf is:
ctrl_interface=/var/run/wpa_supplicant
ctrl_interface_group=wheel
eapol_version=2

network={
  ssid=XXX
  scan_ssid=1
  priority=1
  proto=WPA
  pairwise=TKIP
  group=TKIP
  key_mgmt=WPA-PSK
  psk=xx
}

Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to
different values. Only 11G SSID is configured in wpa_supplicant.conf. In
the test setup, AP is with 10' (3m) from the laptop.

AP is successfully used by handful of Windows clients (including this
same laptop) and iBook G4.

Neither wpa_supplicant with '-d -d' nor wlandebug 0x show
anything but normal scan. 


athdebug 0x loops with ath_intr: status 0x1000 until I power
down my laptop.

I would appreciate any suggestion on what I can investigate further --
at this point I have comfortable console setup and should be able to
field requests for further information much better.



Are you running powerd?

Sam

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-05-16 Thread Sam Leffler

Alexandre Sunny Kovalenko wrote:
On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 

Alexandre Sunny Kovalenko wrote:

I seem to be able to lock my machine by going into wpa_cli and asking it
to 'reassoc'. The reason for question mark after hard is that debug
information (caused by wlandebug and athdebug) is being printed on the
console. The only way to get machine's attention is to hold power button
for 8 seconds.

So this is just livelock due to console debug msgs.

I am not sure, I have parsed this well enough, so I will try to clarify:
machine becomes unresponsive *without* any debugging turned on, to an
extent that pressing the power button twice *does not* generate ACPI
console message (something to the tune of going into S5 already --
gimme a break). If I turn ath debugging on, I do see those messages,
and only them, scrolling on the console.


Guess I misunderstood you.


Note: manual reassociation is just the handy way to reproduce the
problem -- I have had machine locking up on me the whole day long
completely on its own.

Below are, what I think, relevant pieces of information. If anything is
missing, please, chastise me appropriately and will do my best to
provide. I have rigged firewire console, but am unable to break into the
debugger locally or remotely.

I see no log msgs.

I am sorry -- mailman must have eaten it up -- I have posted them here
now:

http://members.verizon.net/~akovalenko/Misc/reassoc.log.gz


All I see in the log is normal scanning.




While I am on the subject, I would appreciate couple of the
troubleshooting suggestions:
* is there any way to get sysctl dev.ath.0.debug to appear, other then
defining ATH_DEBUG in something like /usr/src/sys/dev/ath/ah_osdep.h?

options ATH_DEBUG

That does not seem to work for if_ath built as the module, sorry for not
being clear in that respect.


If you are build a module then you must edit the Makefile to add the 
option.  Feel free to provide a patch to improve this situation.





* is there minimal, but still usable mask for athdebug and wlandebug? I
have started with 0x and kept trimming likely high-volume
settings until output slowed down to the reasonable pace.
Why do you want debug msgs from ath?  The debug msgs from wlandebug 
depend on what you're trying to debug.

Because neither wpa_supplicant (quoted below), nor wlandebug (in the URL
above) gave me the answer -- it looks like we are going into the scan
with the specific SSID in mind and never come back, so I went for the
next level. However, could you, please, clarify that I understood you
correctly -- you *do not* want to see mix of wlandebug and athdebug
messages in the report, and I should turn wlandebug off before turning
athdebug on, right?


wlandebug msgs are typically low duty and can be left enabled when you 
add driver-level debug msgs.  But I can't see from the log you sent what 
is going on.  Try reducing the noise by eliminating all the ath debug 
msgs; maybe provide one log w/ only wlan msgs and one w/ wlan+ath.




I suggest that when debugging you start from the highest layer and move 
downward.  If you can't find what you need in a wpa_supplicant log then 
turn on msgs in net80211 with wlandebug.  If that doesn't tell you what 
you need then move to the driver.  Blindly turning everything on can 
easily livelock your system.  

That's what I did -- what I have posted is the end result of the walking
down that chain, and I assumed, possibly incorrectly, that you would
want result of all three to put things in context. I do apologize for
the misunderstanding.

For high volume msgs I often do something 
like:


athdebug +intr; sleep 1; athdebug -intr

or

athdebug +intr; read x; athdebug -intr

so a carriage return will disable msgs.

Thank you for the suggestion.




* what facility does wpa_supplicant use, when forced to syslog by -s
switch?

trouble% cd /data/freebsd/head/contrib/wpa_supplicant/
trouble% grep openlog *.c
common.c:   openlog(wpa_supplicant, LOG_PID | LOG_NDELAY, LOG_DAEMON);

Thank you, should have done this myself, sorry.



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-05-13 Thread Alexandre Sunny Kovalenko
On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: 
 Alexandre Sunny Kovalenko wrote:
  I seem to be able to lock my machine by going into wpa_cli and asking it
  to 'reassoc'. The reason for question mark after hard is that debug
  information (caused by wlandebug and athdebug) is being printed on the
  console. The only way to get machine's attention is to hold power button
  for 8 seconds.
 
 So this is just livelock due to console debug msgs.
I am not sure, I have parsed this well enough, so I will try to clarify:
machine becomes unresponsive *without* any debugging turned on, to an
extent that pressing the power button twice *does not* generate ACPI
console message (something to the tune of going into S5 already --
gimme a break). If I turn ath debugging on, I do see those messages,
and only them, scrolling on the console.
 
  
  Note: manual reassociation is just the handy way to reproduce the
  problem -- I have had machine locking up on me the whole day long
  completely on its own.
  
  Below are, what I think, relevant pieces of information. If anything is
  missing, please, chastise me appropriately and will do my best to
  provide. I have rigged firewire console, but am unable to break into the
  debugger locally or remotely.
 
 I see no log msgs.
I am sorry -- mailman must have eaten it up -- I have posted them here
now:

http://members.verizon.net/~akovalenko/Misc/reassoc.log.gz

 
  
  While I am on the subject, I would appreciate couple of the
  troubleshooting suggestions:
  * is there any way to get sysctl dev.ath.0.debug to appear, other then
  defining ATH_DEBUG in something like /usr/src/sys/dev/ath/ah_osdep.h?
 
 options ATH_DEBUG
That does not seem to work for if_ath built as the module, sorry for not
being clear in that respect.

 
  * is there minimal, but still usable mask for athdebug and wlandebug? I
  have started with 0x and kept trimming likely high-volume
  settings until output slowed down to the reasonable pace.
 
 Why do you want debug msgs from ath?  The debug msgs from wlandebug 
 depend on what you're trying to debug.
Because neither wpa_supplicant (quoted below), nor wlandebug (in the URL
above) gave me the answer -- it looks like we are going into the scan
with the specific SSID in mind and never come back, so I went for the
next level. However, could you, please, clarify that I understood you
correctly -- you *do not* want to see mix of wlandebug and athdebug
messages in the report, and I should turn wlandebug off before turning
athdebug on, right?

 
 I suggest that when debugging you start from the highest layer and move 
 downward.  If you can't find what you need in a wpa_supplicant log then 
 turn on msgs in net80211 with wlandebug.  If that doesn't tell you what 
 you need then move to the driver.  Blindly turning everything on can 
 easily livelock your system.  
That's what I did -- what I have posted is the end result of the walking
down that chain, and I assumed, possibly incorrectly, that you would
want result of all three to put things in context. I do apologize for
the misunderstanding.

 For high volume msgs I often do something 
 like:
 
 athdebug +intr; sleep 1; athdebug -intr
 
 or
 
 athdebug +intr; read x; athdebug -intr
 
 so a carriage return will disable msgs.
Thank you for the suggestion.

 
 
  * what facility does wpa_supplicant use, when forced to syslog by -s
  switch?
 
 trouble% cd /data/freebsd/head/contrib/wpa_supplicant/
 trouble% grep openlog *.c
 common.c: openlog(wpa_supplicant, LOG_PID | LOG_NDELAY, LOG_DAEMON);
Thank you, should have done this myself, sorry.

-- 
Alexandre Sunny Kovalenko (Олександр Коваленко)

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7

2008-05-12 Thread Sam Leffler

Alexandre Sunny Kovalenko wrote:

I seem to be able to lock my machine by going into wpa_cli and asking it
to 'reassoc'. The reason for question mark after hard is that debug
information (caused by wlandebug and athdebug) is being printed on the
console. The only way to get machine's attention is to hold power button
for 8 seconds.


So this is just livelock due to console debug msgs.



Note: manual reassociation is just the handy way to reproduce the
problem -- I have had machine locking up on me the whole day long
completely on its own.

Below are, what I think, relevant pieces of information. If anything is
missing, please, chastise me appropriately and will do my best to
provide. I have rigged firewire console, but am unable to break into the
debugger locally or remotely.


I see no log msgs.



While I am on the subject, I would appreciate couple of the
troubleshooting suggestions:
* is there any way to get sysctl dev.ath.0.debug to appear, other then
defining ATH_DEBUG in something like /usr/src/sys/dev/ath/ah_osdep.h?


options ATH_DEBUG


* is there minimal, but still usable mask for athdebug and wlandebug? I
have started with 0x and kept trimming likely high-volume
settings until output slowed down to the reasonable pace.


Why do you want debug msgs from ath?  The debug msgs from wlandebug 
depend on what you're trying to debug.


I suggest that when debugging you start from the highest layer and move 
downward.  If you can't find what you need in a wpa_supplicant log then 
turn on msgs in net80211 with wlandebug.  If that doesn't tell you what 
you need then move to the driver.  Blindly turning everything on can 
easily livelock your system.  For high volume msgs I often do something 
like:


athdebug +intr; sleep 1; athdebug -intr

or

athdebug +intr; read x; athdebug -intr

so a carriage return will disable msgs.



* what facility does wpa_supplicant use, when forced to syslog by -s
switch?


trouble% cd /data/freebsd/head/contrib/wpa_supplicant/
trouble% grep openlog *.c
common.c:   openlog(wpa_supplicant, LOG_PID | LOG_NDELAY, LOG_DAEMON);




===

* OS: 
RELENG_7 cvsup'ed on May 11th @ 15:30EST, if_ath is loaded as the module


* HW: 
[EMAIL PROTECTED]:3:0:0:	class=0x02 card=0x058a1014 chip=0x1014168c rev=0x01

hdr=0x00
vendor = 'Atheros Communications Inc.'
device = 'AR5212 Atheros AR5212 802.11abg wireless'
class  = network
subclass   = ethernet
ath0: Atheros 5212 mem 0xedf0-0xedf0 irq 17 at device 0.0 on
pci3
ath0: [ITHREAD]
ath0: using obsoleted if_watchdog interface
ath0: Ethernet address: xx:xx:xx:xx:xx:xx
ath0: mac 10.3 phy 6.1 radio 10.2

* Debug settings:
net.wlan.0.debug:
0xdfff11n,debug,crypto,input,xrate,elemid,node,assoc,auth,scan,output,state,power,dot1x,dot1xsm,radius,raddump,radkeys,wpa,acl,wme,superg,doth,inact,roam,rate
dev.ath.0.debug:
0xffef8ff0rate,reset,mode,beacon,watchdog,beacon_proc,calibrate,keycache,state,node,ff,dfs,fatal

* Console output (somewhat trimmed) is attached -- last group of
messages keeps repeating until I power down the machine.

* Output of 'wpa_supplicant -Dbsd -iath0 -d -d -K
-c/etc/wpa_supplicant.conf' (hand transcribed) is:
complaining that it did not find any AP broadcasting needed SSID
Starting AP scan (specific SSID)
Scan SSID = hexdump_ascii (len = NN)
xx xx xx xx xx xx

* /etc/wpa_supplicant.conf is attached

* Setup: two stations with the identical SSID: Linksys WRT54Gv8 (running
dd-wrt micro) at about 50' and Zyxel P-330W at about 20'. 






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]