Re: [ath9k-devel] [RFC] ath9k: Detect and work-around tx-queue hang.

2013-03-12 Thread Ben Greear
On 02/22/2013 04:55 AM, Ben Greear wrote:
 On 02/22/2013 04:38 AM, Felix Fietkau wrote:
 On 2013-02-22 1:25 PM, Sujith Manoharan wrote:
 Felix Fietkau wrote:
 Please also check if the station(s) that the frames are queued for are
 in powersave state for some reason. That would prevent the tx path from
 throwing them in the hw queue, yet they'd still take up pending-frame
 slots. I was planning on fixing this eventually by expiring frames that
 stay in the queue for too long, but haven't decided on the exact
 approach yet.

 PS is disabled for multi-VIF.
 What about off-channel PS due to scans, etc.

 Scan is always locked to the same channel in this setup (once
 a single station is associated).

 The stations stay associated while this problem happens (the
 high-priority queue seems to work just fine, which may be
 the reason they stay associated just fine.)

 In some cases, I see packets delivered around 30 seconds late...
 aside from PS and off-channel..any idea what could make a packet
 stick around that long in the tx queues?

One of our customers on a 3.5.7+ kernel hit the problem without
using any RF attenuator...just over-the-air communication to
their AP.  It happened on both 2.4 and 5Ghz bands.  Seems rx
signal is around 40 in their environment. It took them around
24 hours to hit the problem on average.

Last we checked, we could fairly easily reproduce this in
our lab using an attenuator and a certain setup, so if there
is any debugging we could add to help narrow down what
might be causing this, we can give that a try.

For instance, is there any good way to know for certain
if packets in the queue are in power-save or not?  I know
we at least attempt to disable power-save, but possibly
it gets re-enabled somehow?

Thanks,
Ben


-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com

___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] Questions in 802.11 n timestamps traces for aggregated frames ?

2013-03-12 Thread abhinav narain
Hi Felix, Adrian,

 /* This packet was aggregated but doesn't carry status info */
 if ((info-flags  IEEE80211_TX_CTL_AMPDU) 
 !(info-flags  IEEE80211_TX_STAT_AMPDU))
 return;

 Any packet with IEEE80211_TX_CTL_AMPDU and not IEEE80211_TX_STAT_AMPDU
 does not carry valid rate/retry information. Did you filter your debug
 stuff accordingly?



I thought* rate_control_tx_status()[hook to minstrel_rate_ht]* in *
ieee80211_tx_status()* will return back to driver (using the condition you
told above) without reaching the radiotap code, but I don' t know why it
does not.
I continued and printed the flag, and have found the following :

I added another flag(first entry) in radiotap header to indicate 1 if
 IEEE80211_TX_CTL_AMPDU is set and IEEE80211_TX_STAT_AMPDU is not (the
above condition) eg. [1,0,0,0,1] first is above cond, last flag is for
aggregation.
The driver code indicates, this will be true if the frame passes
through ath_tx_rc_status() [which is only twice].
I had noticed that if I do this, I won't probably get the timestamp of rest
of the frames given back to mac80211.

device,tsf , tx_flags, retx_count,succ rate, attempted
bitrates, seq no,(queue+airtime)delay,dumb, frame size
64:a7:69:53:0b:15 159787172 [0, 0, 0, 0, 1] 0 52.0 [] 586 1233 2 1462
64:a7:69:53:0b:15 159787172 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 587 2697 2 1462
64:a7:69:53:0b:15 159848530 [0, 0, 0, 0, 0] 0 52.0 [] 588 1436 2 1462
64:a7:69:53:0b:15 159848892 [0, 0, 0, 0, 0] 0 52.0 [] 589 1722 2 1462
64:a7:69:53:0b:15 159849272 [0, 0, 0, 0, 0] 0 52.0 [] 590 1010 2 1462
64:a7:69:53:0b:15 159849931 [0, 0, 0, 0, 0] 0 58.5 [] 591 1602 2 1462
64:a7:69:53:0b:15 159850511 [0, 0, 0, 0, 0] 0 52.0 [] 592 1366 2 1462
64:a7:69:53:0b:15 159850864 [0, 0, 0, 0, 0] 0 52.0 [] 593 1631 2 1462
64:a7:69:53:0b:15 159851253 *[0, 0, 0, 0, 1] 0 52.0 []* 594 2796 2 1462
64:a7:69:53:0b:15 159851253 *[1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]]* *595* 1016 2 1462
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]] *597* 2578 2 1462
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]] *598* 1623 2 1462
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]] *599* 2312 2 1462 - all invalid !?
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]]* 600* 182 2 1462
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]]* 601* 2102 2 1462
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]]* 602* 6424 2 1462
64:a7:69:53:0b:15 159851253 [*1*, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0,
5], [52.0, 5]]* 603* 513 2 1462
64:a7:69:53:0b:15 159899519 [0, 0, 0, 0, 0] 0 52.0 [] 604 24569 2 1462
64:a7:69:53:0b:15 159903297 [0, 0, 0, 0, 0] 0 52.0 [] 605 28311 2 1462
*64:a7:69:53:0b:15 159937368 [0, 0, 0, 0, 1] 0 52.0 [] 606 38496 2 1462*
*64:a7:69:53:0b:15 159937368 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 607 3645 2 1462*
64:a7:69:53:0b:15 159937368 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 608 393 2 1462
64:a7:69:53:0b:15 159937368 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 609 809 2 1462
64:a7:69:53:0b:15 159954617 [0, 0, 0, 0, 0] 0 65.0 [] 610 51231 2 1462
*64:a7:69:53:0b:15 159956668 [0, 0, 0, 0, 1] 0 52.0 [] 611 55867 2 1462*
*64:a7:69:53:0b:15 159956668 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 612 4501 2 1462*
64:a7:69:53:0b:15 159956668 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 613 79511 2 1462
64:a7:69:53:0b:15 159956668 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 614 6829 2 1462
64:a7:69:53:0b:15 159956668 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 615 830 2 1462
64:a7:69:53:0b:15 159956668 [1, 0, 0, 0, 1] 14 52.0 [[52.0, 4], [39.0, 5],
[52.0, 5]] 616 1129 2 1462

 This suggests that first frame (594,606,611) are valid and aggregated but
the following frames are invalid.

I will be very grateful to you if you can please tell how to interpret the
frames which have
the *aggregation *bit(last flag in tx_status flags ) *set* but *do not
carry status info *and are delivered to monitor interface.
The* sequence number *are* increasing* and never repeating ever again in
trace
If I remove these frames, it looks like there was no aggregation (I guess
aggr of 1462 bytes ?),
but why are those seq. numbers never re-used ?

-
Abhinav
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] Questions in 802.11 n timestamps traces for aggregated frames ?

2013-03-12 Thread Felix Fietkau
On 2013-03-13 12:48 AM, abhinav narain wrote:
 Hi Felix, Adrian,
 
 /* This packet was aggregated but doesn't carry status info */
 if ((info-flags  IEEE80211_TX_CTL_AMPDU) 
 !(info-flags  IEEE80211_TX_STAT_AMPDU))
 return;
 
 Any packet with IEEE80211_TX_CTL_AMPDU and not IEEE80211_TX_STAT_AMPDU
 does not carry valid rate/retry information. Did you filter your debug
 stuff accordingly?
 
 
 
 I thought* rate_control_tx_status()[hook to
 minstrel_rate_ht]* in *ieee80211_tx_status()* will return back to driver
 (using the condition you told above) without reaching the radiotap code,
 but I don' t know why it does not.
 I continued and printed the flag, and have found the following : 
 
 I added another flag(first entry) in radiotap header to indicate 1 if
  IEEE80211_TX_CTL_AMPDU is set and IEEE80211_TX_STAT_AMPDU is not (the
 above condition) eg. [1,0,0,0,1] first is above cond, last flag is for
 aggregation. 
 The driver code indicates, this will be true if the frame passes
 through ath_tx_rc_status() [which is only twice].
 I had noticed that if I do this, I won't probably get the timestamp of
 rest of the frames given back to mac80211.
 
 [...]
  This suggests that first frame (594,606,611) are valid and aggregated
 but the following frames are invalid.
 
 I will be very grateful to you if you can please tell how to interpret
 the frames which have 
 the *aggregation *bit(last flag in tx_status flags ) *set* but *do not
 carry status info *and are delivered to monitor interface.
 The*sequence number *are* increasing* and never repeating ever again in
 trace
 If I remove these frames, it looks like there was no aggregation (I
 guess aggr of 1462 bytes ?), 
 but why are those seq. numbers never re-used ? 
Your traces don't give you the full picture, because looking at this as
a per-frame thing is wrong.
If both IEEE80211_TX_CTL_AMPDU and IEEE80211_TX_STAT_AMPDU are set, it
means that the contained rate retry information and time stamp refer to
the whole A-MPDU, not just that one frame. Information about software
retry of frames is not passed on to mac80211 at the moment.
If you want accurate timestamp, rate and retry information, do it per
A-MPDU, not per MPDU.

- Felix
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] Questions in 802.11 n timestamps traces for aggregated frames ?

2013-03-12 Thread Adrian Chadd
You need to buffer those frames in your trace right up until you get
the final frame bit which tells you that hey, this has some status
associated with it.

It's arguable whether that's the correct thing to do or not. The
better (but slower) way would be to stamp each sub-frame with the
completion status information of the status descriptor, including the
rate control information.




Adrian
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] Questions in 802.11 n timestamps traces for aggregated frames ?

2013-03-12 Thread abhinav narain
On Tue, Mar 12, 2013 at 7:59 PM, Adrian Chadd adr...@freebsd.org wrote:

 You need to buffer those frames in your trace right up until you get
 the final frame bit which tells you that hey, this has some status
 associated with it.

 This all the driver+mac is giving me, I haven't discarded any information
in the trace, actually.

 It's arguable whether that's the correct thing to do or not.

Yes, I am not sure myself.

 The
 better (but slower) way would be to stamp each sub-frame with the
 completion status information of the status descriptor, including the
 rate control information.

The trace exactly does that.
As the subframes of the aggregate are unfolded in* ath_tx_complete_aggr() *in
xmit.c,
and given to *ath_tx_complete_buf() *, I timestamp them with the *
ath_tx_status* descriptor
which is present, and get the above trace; turns out the descriptor is
invalid ?

-
Abhinav






 Adrian

___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel