Re: [ath9k-devel] 4.9GHz and 5.8GHz regulatory entries?

2013-06-05 Thread Adrian Chadd
What are the FCC rules for 5.8 GHz at the moment?



adrian

On 4 June 2013 21:41, Josef Semler josef.sem...@gmail.com wrote:
 Hi Adrian,
 I've done a modification for 5.8 GHz and some outdoor tests using
 ubnt-hardware.
 Joe

 On Wednesday, 5 June 2013, Adrian Chadd wrote:

 Hi all,

 I'm hacking at the support needed for 4.9 and 5.8GHz NICs in FreeBSD.
 Well, there's support for them (and they work!) but I'm doing up some
 regulatory entries for them.

 So, has anyone looked at adding 4.9 and 5.8GHz regulatory domain
 entries to db.txt in Linux, for regulatory domains that allow them to
 be used unlicenced?

 Thanks,



 Adrian
 ___
 ath9k-devel mailing list
 ath9k-devel@lists.ath9k.org
 https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] 4.9GHz and 5.8GHz regulatory entries?

2013-06-05 Thread Josef Semler
I don't know anything about the FCC rules. As far as I know, the band is open
in the US without technical regulations such as DFS or TPC (there are no radar
devices to disturb in this range).
In Europe (for example Austria), channels 149-165 are used for comm. services
according to the SRD specification, but they are NOT ISM and have to be
coordinated with the local frequency office.
Joe

On Wednesday, 5 June 2013, Adrian Chadd wrote:

 What's the FCC rules for 5.8ghz atm?



 adrian

 On 4 June 2013 21:41, Josef Semler josef.sem...@gmail.com wrote:
  Hi Adrian,
  I've done a modification for 5.8 GHz and some outdoor tests using
  ubnt-hardware.
  Joe
 
  On Wednesday, 5 June 2013, Adrian Chadd wrote:
 
  Hi all,
 
  I'm hacking at the support needed for 4.9 and 5.8GHz NICs in FreeBSD.
  Well, there's support for them (and they work!) but I'm doing up some
  regulatory entries for them.
 
  So, has anyone looked at adding 4.9 and 5.8GHz regulatory domain
  entries to db.txt in Linux, for regulatory domains that allow them to
  be used unlicenced?
 
  Thanks,
 
 
 
  Adrian


Re: [ath9k-devel] 4.9GHz and 5.8GHz regulatory entries?

2013-06-05 Thread Josef Semler
Do you know this FCC document?
It describes the current situation in the US.

http://hraunfoss.fcc.gov/edocs_public/attachmatch/FCC-13-22A1.pdf

Joe

On Wednesday, 5 June 2013, Adrian Chadd wrote:

 What's the FCC rules for 5.8ghz atm?



 adrian

 On 4 June 2013 21:41, Josef Semler josef.sem...@gmail.com wrote:
  Hi Adrian,
  I've done a modification for 5.8 GHz and some outdoor tests using
  ubnt-hardware.
  Joe
 
  On Wednesday, 5 June 2013, Adrian Chadd wrote:
 
  Hi all,
 
  I'm hacking at the support needed for 4.9 and 5.8GHz NICs in FreeBSD.
  Well, there's support for them (and they work!) but I'm doing up some
  regulatory entries for them.
 
  So, has anyone looked at adding 4.9 and 5.8GHz regulatory domain
  entries to db.txt in Linux, for regulatory domains that allow them to
  be used unlicenced?
 
  Thanks,
 
 
 
  Adrian


Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error

2013-06-05 Thread Helmut Schaa
Hi,

On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
 This error seems to be really rare, and we do not know the real cause of it.
 But, in any case, we should check size of head before reducing it.

We had a similar issue in rt2x00 quite some time ago.

In general, mac80211 should always reserve as much headroom as the driver
requests in hw->extra_tx_headroom. However, a frame can be sent to the driver
a second time (see ieee80211_handle_filtered_frame). If the driver moved the
frame payload (or header) for padding and did not restore it before calling
ieee80211_tx_status, the second trip through the driver starts with reduced
headroom, which could lead to such an error.

Quickly checking ath9k_htc it seems as if ath9k_htc_tx adds some padding
but ath9k_htc_tx_process does not remove the padding when passing the frame
back to mac80211.
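
A minimal userspace sketch of the effect described above, not the driver code
itself. The struct fake_skb type and both helper names are hypothetical
stand-ins for an skb and for the push/pull done around the 802.11 header; the
padding rule (header length & 3) is taken from the patch discussed in this
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an skb: the frame starts `headroom` bytes
 * into buf and is `len` bytes long. */
struct fake_skb {
	unsigned char buf[128];
	size_t headroom;
	size_t len;
};

/* TX path: open a gap after the header by moving the header towards the
 * start of the buffer. This consumes `padsize` bytes of headroom.
 * Returns -1 when there is not enough headroom left (the situation in
 * which the real kernel would report skb_under_panic). */
static int add_padding(struct fake_skb *skb, size_t hdrlen)
{
	size_t padsize = hdrlen & 3;
	unsigned char *data = skb->buf + skb->headroom;

	assert(skb->headroom + skb->len <= sizeof(skb->buf));
	if (padsize == 0 || skb->len <= hdrlen)
		return 0;
	if (skb->headroom < padsize)
		return -1;
	memmove(data - padsize, data, hdrlen);
	skb->headroom -= padsize;
	skb->len += padsize;
	return 0;
}

/* TX status path: move the header back and give the headroom back. */
static void remove_padding(struct fake_skb *skb, size_t hdrlen)
{
	size_t padsize = hdrlen & 3;
	unsigned char *data = skb->buf + skb->headroom;

	if (padsize && skb->len > hdrlen + padsize) {
		memmove(data + padsize, data, hdrlen);
		skb->headroom += padsize;
		skb->len -= padsize;
	}
}
```

With a 26-byte header and only 2 bytes of headroom, a second add_padding()
call fails unless remove_padding() ran in between, which is the double-trip
scenario in a nutshell.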

Helmut


Re: [ath9k-devel] install hostpad compile error

2013-06-05 Thread 吳淑敏
Hi,
   I installed hostapd-1.0 and ran make.
I got a compile error as follows:

../src/crypto/tls_openssl.c:23:25: fatal error: openssl/ssl.h: No such file or directory
compilation terminated.


Could you help me find a solution? Thanks.

Sincerely
Angela


Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error

2013-06-05 Thread Helmut Schaa
On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
 This error seems to be really rare, and we do not know the real cause of it.
 But, in any case, we should check size of head before reducing it.

Would you mind trying this (completely untested) patch against wireless-testing instead?
Helmut

---
Subject: [PATCH] ath9k_htc: Restore skb headroom when returning skb to mac80211

ath9k_htc adds padding between the 802.11 header and the payload during
TX by moving the header. When handing the frame back to mac80211 for TX
status handling the header is not moved back into its original position.
This can result in a too small skb headroom when entering ath9k_htc
again (due to a soft retransmission for example) causing an
skb_under_panic oops.

Fix this by moving the 802.11 header back into its original position
before returning the frame to mac80211 as other drivers like rt2x00
or ath5k do.

Signed-off-by: Helmut Schaa helmut.sc...@googlemail.com
---
 drivers/net/wireless/ath/ath9k/htc_drv_txrx.c |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
index e602c95..666cfb6 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
@@ -448,6 +448,8 @@ static void ath9k_htc_tx_process(struct ath9k_htc_priv *priv,
 	struct ieee80211_conf *cur_conf = &priv->hw->conf;
 	bool txok;
 	int slot;
+	struct ieee80211_hdr *hdr;
+	int padpos, padsize;

 	slot = strip_drv_header(priv, skb);
 	if (slot < 0) {
@@ -504,6 +506,15 @@ send_mac80211:

 	ath9k_htc_tx_clear_slot(priv, slot);

+	/* Remove padding before handing frame back to mac80211 */
+	hdr = (struct ieee80211_hdr *) skb->data;
+	padpos = ieee80211_hdrlen(hdr->frame_control);
+	padsize = padpos & 3;
+	if (padsize && skb->len > padpos + padsize) {
+		memmove(skb->data + padsize, skb->data, padpos);
+		skb_pull(skb, padsize);
+	}
+
 	/* Send status to mac80211 */
 	ieee80211_tx_status(priv->hw, skb);
 }
--
1.7.10.4


Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error

2013-06-05 Thread Marc Kleine-Budde
On 06/05/2013 04:24 PM, Helmut Schaa wrote:
 On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
 This error seems to be really rare, and we do not know the real cause of it.
 But, in any case, we should check size of head before reducing it.
 
 Mind to try the (completely untested) patch against wireless-testing instead?
 Helmut

I will, but I won't be within range of that USB wireless adapter for
about 1.5 weeks.

Marc






Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error

2013-06-05 Thread Oleksij Rempel
On 05.06.2013 16:26, Marc Kleine-Budde wrote:
 On 06/05/2013 04:24 PM, Helmut Schaa wrote:
 On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
 This error seems to be really rare, and we do not know the real cause of it.
 But, in any case, we should check size of head before reducing it.

 Mind to try the (completely untested) patch against wireless-testing instead?
 Helmut

 I will do, however I'm not in range of that USB wireless adapter for
 about 1,5 weeks.

Helmut, thank you for the patch!

I'll run a regression test, but not a week-long test, so I probably won't
reproduce this issue.


-- 
Regards,
Oleksij


Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error

2013-06-05 Thread Oleksij Rempel
On 05.06.2013 16:46, Oleksij Rempel wrote:
 On 05.06.2013 16:26, Marc Kleine-Budde wrote:
 On 06/05/2013 04:24 PM, Helmut Schaa wrote:
 On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
 This error seems to be really rare, and we do not know the real cause of it.
 But, in any case, we should check size of head before reducing it.

 Mind to try the (completely untested) patch against wireless-testing
 instead?
 Helmut

 I will do, however I'm not in range of that USB wireless adapter for
 about 1,5 weeks.

 Helmut, thank you for patch!

 i'll do regression test, but not week long test. So i probably won't
 reproduce this issue.

I ran a two-stream netperf test for 2 hours without any visible
regressions.



-- 
Regards,
Oleksij


Re: [ath9k-devel] Tplink TL-WN822N drops out connections randomly on Arch Linux with Kernel 3.9.4 (Mark Lee)

2013-06-05 Thread Mark E. Lee
I reset my router to the default Netgear firmware and found that the issue
still occurs.

After seeing this message board post:
https://bbs.archlinux.org/viewtopic.php?id=137643
I set the ath9k_htc module option nohwcrypt=1. I will see how it goes
in terms of stability. In addition, could someone tell me what that
option means?


On Tue, 2013-06-04 at 12:00 +0200, ath9k-devel-requ...@lists.ath9k.org
wrote:
 On 3 June 2013 11:44, Mark E. Lee m...@markelee.com wrote:
  I couldn't find hostapd.conf on my access point.
 
  I did however switch the algorithm from WPA2 to WEP and found that I no
  longer lost connection during skype calls.
 
 That kinda points at the rekey or the crypto handling in general.
 
 Please find and enable hostapd logging on your AP. I've seen and fixed
 bugs in freebsd recently where traffic would fill up buffers and cause
 the EAPOL rekey packets to get discarded by the driver. Thus a group
 rekey would fail, and the unit would be disconnected.
 
 
 
 adrian

-- 
Mark E. Lee m...@markelee.com




Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Ben Greear
On 06/05/2013 02:11 PM, Tejun Heo wrote:
 (cc'ing wireless crowd, tglx and Ingo.  The original thread is at
   http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 )

 Hello, Ben.

 On Wed, Jun 05, 2013 at 01:58:31PM -0700, Ben Greear wrote:
 Hmm, wonder if I found it.  I previously saw times where it appears
 jiffies does not increment.  __do_softirq has a break-out based on
 jiffies timeout.  Maybe that is failing to get us out of __do_softirq
 in my lockup case because for whatever reason the system cannot update
 jiffies in this case?

 I added this (probably whitespace damaged) hack and now I have not been
 able to reproduce the problem.

 Ah, nice catch. :)

 diff --git a/kernel/softirq.c b/kernel/softirq.c
 index 14d7758..621ea3b 100644
 --- a/kernel/softirq.c
 +++ b/kernel/softirq.c
 @@ -212,6 +212,7 @@ asmlinkage void __do_softirq(void)
  unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
  int cpu;
  unsigned long old_flags = current->flags;
 +   unsigned long loops = 0;

  /*
   * Mask out PF_MEMALLOC s current task context is borrowed for the
 @@ -241,6 +242,7 @@ restart:
  unsigned int vec_nr = h - softirq_vec;
  int prev_count = preempt_count();

 +   loops++;
  kstat_incr_softirqs_this_cpu(vec_nr);

  trace_softirq_entry(vec_nr);
 @@ -265,7 +267,7 @@ restart:

  pending = local_softirq_pending();
  if (pending) {
 -   if (time_before(jiffies, end) && !need_resched())
 +   if (time_before(jiffies, end) && !need_resched() && (loops < 500))
  goto restart;

 So, softirq most likely kicked off from ath9k is rescheduling itself
 to the extent where it ends up locking out the CPU completely.  The
 problem is usually okay because the processing would break out in 2ms
 but as jiffies is stopped in this case with all other CPUs trapped in
 stop_machine, the loop never breaks and the machine hangs.  While
 adding the counter limit probably isn't a bad idea, softirq requeueing
 itself indefinitely sounds pretty buggy.

Just to be clear on the ath9k part for the wifi folks:

This is basically un-patched 3.9.4, but I have 200 virtual stations
configured on each of two ath9k radios.  I cannot reproduce the problem
without ath9k, but I do not know for certain ath9k is the real
culprit.

In the case where I can most easily reproduce the lockup, ath9k virtual
stations would be trying to associate, so I'd expect a fair amount
of packet processing to be happening...

 ath9k people, do you guys have any idea what's going on?  Why would
 softirq repeat itself indefinitely?

 Ingo, Thomas, we're seeing a stop_machine hanging because

 * All other CPUs entered IRQ disabled stage.  Jiffies is not being
updated.

 * The last CPU get caught up executing softirq indefinitely.  As
jiffies doesn't get updated, it never breaks out of softirq
handling.  This is a deadlock.  This CPU won't break out of softirq
handling unless jiffies is updated and other CPUs can't do anything
until this CPU enters the same stop_machine stage.

 Ben found out that breaking out of softirq handling after certain
 number of repetitions makes the issue go away, which isn't a proper
 fix but we might want anyway.  What do you guys think?

Thanks,
Ben



-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com



Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Tejun Heo
(cc'ing wireless crowd, tglx and Ingo.  The original thread is at
 http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 )

Hello, Ben.

On Wed, Jun 05, 2013 at 01:58:31PM -0700, Ben Greear wrote:
 Hmm, wonder if I found it.  I previously saw times where it appears
 jiffies does not increment.  __do_softirq has a break-out based on
 jiffies timeout.  Maybe that is failing to get us out of __do_softirq
 in my lockup case because for whatever reason the system cannot update
 jiffies in this case?
 
 I added this (probably whitespace damaged) hack and now I have not been
 able to reproduce the problem.

Ah, nice catch. :)

 diff --git a/kernel/softirq.c b/kernel/softirq.c
 index 14d7758..621ea3b 100644
 --- a/kernel/softirq.c
 +++ b/kernel/softirq.c
 @@ -212,6 +212,7 @@ asmlinkage void __do_softirq(void)
 unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
 int cpu;
 unsigned long old_flags = current->flags;
 +   unsigned long loops = 0;
 
 /*
  * Mask out PF_MEMALLOC s current task context is borrowed for the
 @@ -241,6 +242,7 @@ restart:
 unsigned int vec_nr = h - softirq_vec;
 int prev_count = preempt_count();
 
 +   loops++;
 kstat_incr_softirqs_this_cpu(vec_nr);
 
 trace_softirq_entry(vec_nr);
 @@ -265,7 +267,7 @@ restart:
 
 pending = local_softirq_pending();
 if (pending) {
 -   if (time_before(jiffies, end) && !need_resched())
 +   if (time_before(jiffies, end) && !need_resched() && (loops < 500))
 goto restart;

So, softirq most likely kicked off from ath9k is rescheduling itself
to the extent where it ends up locking out the CPU completely.  The
problem is usually okay because the processing would break out in 2ms
but as jiffies is stopped in this case with all other CPUs trapped in
stop_machine, the loop never breaks and the machine hangs.  While
adding the counter limit probably isn't a bad idea, softirq requeueing
itself indefinitely sounds pretty buggy.
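
A small userspace model of the loop exit logic discussed above, as a sketch
only. The function name, its parameters and MAX_RESTART are hypothetical; the
model assumes work is always pending (the softirq keeps re-raising itself) and
that the clock either ticks once per pass or is frozen, as it would be with
the other CPUs parked in stop_machine.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RESTART 10	/* assumed iteration cap for this model */

/* Returns the number of passes made before the loop gives up.
 * clock_runs:      false models jiffies not being updated
 * use_count_limit: true adds the iteration cap next to the time limit
 * safety_cap:      model-only guard standing in for "spins forever" */
static unsigned long run_softirq_model(bool clock_runs, bool use_count_limit,
				       unsigned long safety_cap)
{
	unsigned long now = 0, end = 2, loops = 0;

	assert(safety_cap > 0);
	for (;;) {
		loops++;		/* one pass over the pending work */
		if (clock_runs)
			now++;
		if (now >= end)
			break;		/* time limit reached */
		if (use_count_limit && loops >= MAX_RESTART)
			break;		/* count limit reached */
		if (loops >= safety_cap)
			break;		/* would never terminate */
	}
	return loops;
}
```

With a running clock the time limit ends the loop after two passes. With a
frozen clock only the count limit ends it; without that limit the model runs
until the safety cap, which corresponds to the hang.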

ath9k people, do you guys have any idea what's going on?  Why would
softirq repeat itself indefinitely?

Ingo, Thomas, we're seeing a stop_machine hanging because

* All other CPUs entered IRQ disabled stage.  Jiffies is not being
  updated.

* The last CPU get caught up executing softirq indefinitely.  As
  jiffies doesn't get updated, it never breaks out of softirq
  handling.  This is a deadlock.  This CPU won't break out of softirq
  handling unless jiffies is updated and other CPUs can't do anything
  until this CPU enters the same stop_machine stage.

Ben found out that breaking out of softirq handling after certain
number of repetitions makes the issue go away, which isn't a proper
fix but we might want anyway.  What do you guys think?

Thanks.

-- 
tejun


[ath9k-devel] Setting Backoff values

2013-06-05 Thread Ruwaifa Anwar
We tried changing the backoff value range through this register write in mac.c:

REG_WRITE(ah, AR_DLCL_IFS(q),
          SM(cwMin, AR_D_LCL_IFS_CWMIN) |
          SM(qi->tqi_cwmax, AR_D_LCL_IFS_CWMAX) |
          SM(qi->tqi_aifs, AR_D_LCL_IFS_AIFS));

Using 1, 1 in place of cwMin and qi->tqi_cwmax respectively gives almost the
same throughput as 1, 1023. Doesn't this mean the backoff window range is not
actually changing? What is the correct way to do this?
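
To make the register write above easier to reason about, here is a hedged
sketch of shift-and-mask field packing. The EX_* names and the bit layout
(10-bit CWmin at bit 0, 10-bit CWmax at bit 10, 8-bit AIFS at bit 20) are
assumptions for illustration, not copied from the driver headers, so check
them against the real definitions before relying on the numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative field layout (assumed). */
#define EX_CWMIN	0x000003FFu
#define EX_CWMIN_S	0
#define EX_CWMAX	0x000FFC00u
#define EX_CWMAX_S	10
#define EX_AIFS		0x0FF00000u
#define EX_AIFS_S	20

/* Shift a value into position, then mask it to the field width. */
#define EX_SM(v, f)	((((uint32_t)(v)) << f##_S) & (f))

/* Build the 32-bit word that would be written to the register. */
static uint32_t pack_ifs(uint32_t cwmin, uint32_t cwmax, uint32_t aifs)
{
	assert(cwmin <= cwmax);
	return EX_SM(cwmin, EX_CWMIN) |
	       EX_SM(cwmax, EX_CWMAX) |
	       EX_SM(aifs, EX_AIFS);
}
```

Under this assumed layout, 1, 1 and 1, 1023 produce clearly different words
(0x00200401 versus 0x002FFC01 with AIFS 2), and a value wider than its field
is silently truncated by the mask. Reading the register back and comparing it
with the expected word is one way to check whether the write took effect.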




Re: [ath9k-devel] Setting Backoff values

2013-06-05 Thread Adrian Chadd
How are you testing?



adrian


On 5 June 2013 18:01, Ruwaifa Anwar ruwaifa.an...@gmail.com wrote:
 We tried changing backoff value range through this function in mac.c

 REG_WRITE(ah, AR_DLCL_IFS(q),
 SM(cwMin, AR_D_LCL_IFS_CWMIN) |
 SM(qi->tqi_cwmax, AR_D_LCL_IFS_CWMAX) |
 SM(qi->tqi_aifs, AR_D_LCL_IFS_AIFS));

 Setting using 1,1 in place of cwMin and qi->tqi_cwmax respectively is giving
 almost the same throughput as with 1, 1023. Doesn't this mean, backoff window
 range is not changing properly? What is the correct way to do this.




Re: [ath9k-devel] Setting Backoff values

2013-06-05 Thread Ruwaifa Anwar
Ruwaifa Anwar ruwaifa.anwar at gmail.com writes:

 
 We tried changing backoff value range through this function in mac.c
 
 REG_WRITE(ah, AR_DLCL_IFS(q),
 SM(cwMin, AR_D_LCL_IFS_CWMIN) |
 SM(qi->tqi_cwmax, AR_D_LCL_IFS_CWMAX) |
 SM(qi->tqi_aifs, AR_D_LCL_IFS_AIFS));
 
 Setting using 1,1 in place of cwMin and qi->tqi_cwmax respectively is giving
 almost the same throughput as with 1, 1023. Doesn't this mean, backoff window
 range is not changing properly? What is the correct way to do this.
 

By calculating the number of MPDUs successfully completed per unit time.

Secondly, I want to clarify one more thing. As you said in previous posts,
this alters the backoff window for all frames in a queue. What happens if
there is a long retry: will the hardware still use the given values, or will
it use exponentially increased values of cwmin and cwmax?






Re: [ath9k-devel] Setting Backoff values

2013-06-05 Thread Adrian Chadd
Well, backoff is only going to increment if it fails, so if you have a
mostly clear air, you're not going to see many failures.

Do the math and see if you're already filling the air with transmissions.

Try setting it max,max instead of min,min and see what happens.

The backoff counter is reset after each transmission attempt. So if
you have 5 frames in the queue, it'll do backoff for one; once that
frame goes out or expires, it'll start the next one with the reset min
counter value.
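
The behaviour described above can be sketched with the usual 802.11 doubling
rule, where the contention window goes from CW to 2*(CW+1)-1 on each failed
attempt and is capped at cwmax. The function name is hypothetical, and whether
the hardware applies exactly this rule to the programmed values is an
assumption that this thread does not confirm.

```c
#include <assert.h>

/* Contention window in use after `retries` failed attempts of one frame.
 * Each new frame starts again from cwmin (retries == 0). */
static unsigned int cw_after_retries(unsigned int cwmin, unsigned int cwmax,
				     unsigned int retries)
{
	unsigned int cw = cwmin;

	assert(cwmin <= cwmax);
	while (retries-- > 0 && cw < cwmax) {
		cw = 2 * (cw + 1) - 1;	/* double the window */
		if (cw > cwmax)
			cw = cwmax;	/* never exceed the cap */
	}
	return cw;
}
```

In this model, 1, 1 and 1, 1023 behave identically for every frame that goes
out on the first attempt, since both start from a window of 1. The two
settings only diverge once retries occur, which would fit seeing almost the
same throughput on a mostly clear channel.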


Adrian


Re: [ath9k-devel] So long, and thanks for all the fish (kinda)

2013-06-05 Thread Joshua Isom
First, NOOO!!

Second, will you still be given datasheets on new wireless chips to 
enable support?  Or will you at least be able to encourage Atheros to 
release the datasheets?  Tell them I bought Atheros because of you!

On 6/3/2013 2:17 PM, Adrian Chadd wrote:
 Hi all,

 This Friday will be my last day at Qualcomm Atheros. I've enjoyed
 working with the extremely bright and driven engineers and designers
 that make the wireless chips and SoCs that people everywhere
 take for granted. I've achieved a bunch of goals both with their
 internal product development and open source. But now it's time to
 move onto different things.

 I'd especially like to thank Luis Rodriguez for introducing me to the
 QCA folk and helping me get access to the Atheros open source project,
 as well as the follow-up discussions that led to me being hired.

 The open source wireless community has been driving innovation in a
 lot of areas for a number of years. I'd like to hope that I've had a
 small, positive effect on that. I wish you all the best of luck in
 pushing forward and continuing to innovate.

 Now, I'm still NDA-enabled and I quite like hacking on this wireless
 stuff so I won't be quitting hacking on things. I will just have other
 things on my mind.

 Good luck to you all!


 Adrian
 ___
 freebsd-wirel...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-wireless
 To unsubscribe, send any mail to freebsd-wireless-unsubscr...@freebsd.org




Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 14:11 -0700, Tejun Heo wrote:
 (cc'ing wireless crowd, tglx and Ingo.  The original thread is at
  http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 )
 
 Hello, Ben.
 
 On Wed, Jun 05, 2013 at 01:58:31PM -0700, Ben Greear wrote:
  Hmm, wonder if I found it.  I previously saw times where it appears
  jiffies does not increment.  __do_softirq has a break-out based on
  jiffies timeout.  Maybe that is failing to get us out of __do_softirq
  in my lockup case because for whatever reason the system cannot update
  jiffies in this case?
  
  I added this (probably whitespace damaged) hack and now I have not been
  able to reproduce the problem.
 
 Ah, nice catch. :)
 
  diff --git a/kernel/softirq.c b/kernel/softirq.c
  index 14d7758..621ea3b 100644
  --- a/kernel/softirq.c
  +++ b/kernel/softirq.c
  @@ -212,6 +212,7 @@ asmlinkage void __do_softirq(void)
  unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
  int cpu;
  unsigned long old_flags = current->flags;
  +   unsigned long loops = 0;
  
  /*
   * Mask out PF_MEMALLOC s current task context is borrowed for the
  @@ -241,6 +242,7 @@ restart:
  unsigned int vec_nr = h - softirq_vec;
  int prev_count = preempt_count();
  
  +   loops++;
  kstat_incr_softirqs_this_cpu(vec_nr);
  
  trace_softirq_entry(vec_nr);
  @@ -265,7 +267,7 @@ restart:
  
  pending = local_softirq_pending();
  if (pending) {
  -   if (time_before(jiffies, end) && !need_resched())
  +   if (time_before(jiffies, end) && !need_resched() && (loops < 500))
  goto restart;
 
 So, softirq most likely kicked off from ath9k is rescheduling itself
 to the extent where it ends up locking out the CPU completely.  The
 problem is usually okay because the processing would break out in 2ms
 but as jiffies is stopped in this case with all other CPUs trapped in
 stop_machine, the loop never breaks and the machine hangs.  While
 adding the counter limit probably isn't a bad idea, softirq requeueing
 itself indefinitely sounds pretty buggy.
 
 ath9k people, do you guys have any idea what's going on?  Why would
 softirq repeat itself indefinitely?
 
 Ingo, Thomas, we're seeing a stop_machine hanging because
 
 * All other CPUs entered IRQ disabled stage.  Jiffies is not being
   updated.
 
 * The last CPU get caught up executing softirq indefinitely.  As
   jiffies doesn't get updated, it never breaks out of softirq
   handling.  This is a deadlock.  This CPU won't break out of softirq
   handling unless jiffies is updated and other CPUs can't do anything
   until this CPU enters the same stop_machine stage.
 
 Ben found out that breaking out of softirq handling after certain
 number of repetitions makes the issue go away, which isn't a proper
 fix but we might want anyway.  What do you guys think?
 

Interesting

Before 3.9 and commit c10d73671ad30f5469
(softirq: reduce latencies) we used to limit the __do_softirq() loop
to 10.





Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:

 
 Ah, so, that's why it's showing up now.  We probably have had the same
 issue all along but it used to be masked by the softirq limiting.  Do
 you care to revive the 10 iterations limit so that it's limited by
 both the count and timing?  We do wanna find out why softirq is
 spinning indefinitely tho.

Yes, no problem, I can do that.



Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Ben Greear
On 06/05/2013 08:26 PM, Eric Dumazet wrote:
 On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:


 Ah, so, that's why it's showing up now.  We probably have had the same
 issue all along but it used to be masked by the softirq limiting.  Do
 you care to revive the 10 iterations limit so that it's limited by
 both the count and timing?  We do wanna find out why softirq is
 spinning indefinitely tho.

 Yes, no problem, I can do that.

Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that
would be fine by me.

I can send a version of my patch easily enough if we can agree on the max
number of loops (and if indeed my version of the patch is acceptable).

Thanks,
Ben


-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com



Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Tejun Heo
Hello, Eric.

On Wed, Jun 05, 2013 at 06:34:52PM -0700, Eric Dumazet wrote:
  Ingo, Thomas, we're seeing a stop_machine hanging because
  
  * All other CPUs entered IRQ disabled stage.  Jiffies is not being
updated.
  
  * The last CPU get caught up executing softirq indefinitely.  As
jiffies doesn't get updated, it never breaks out of softirq
handling.  This is a deadlock.  This CPU won't break out of softirq
handling unless jiffies is updated and other CPUs can't do anything
until this CPU enters the same stop_machine stage.
  
  Ben found out that breaking out of softirq handling after certain
  number of repetitions makes the issue go away, which isn't a proper
  fix but we might want anyway.  What do you guys think?
  
 
 Interesting
 
 Before 3.9 and commit c10d73671ad30f5469
 (softirq: reduce latencies) we used to limit the __do_softirq() loop
 to 10.

Ah, so, that's why it's showing up now.  We probably have had the same
issue all along but it used to be masked by the softirq limiting.  Do
you care to revive the 10 iterations limit so that it's limited by
both the count and timing?  We do wanna find out why softirq is
spinning indefinitely tho.

Thanks.

-- 
tejun


Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Ben Greear
On 06/05/2013 08:46 PM, Eric Dumazet wrote:
 On Wed, 2013-06-05 at 20:41 -0700, Ben Greear wrote:
 On 06/05/2013 08:26 PM, Eric Dumazet wrote:
 On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:


 Ah, so, that's why it's showing up now.  We probably have had the same
 issue all along but it used to be masked by the softirq limiting.  Do
 you care to revive the 10 iterations limit so that it's limited by
 both the count and timing?  We do wanna find out why softirq is
 spinning indefinitely tho.

 Yes, no problem, I can do that.

 Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that
 would be fine by me.

 I can send a version of my patch easily enough if we can agree on the max
 number of loops (and if indeed my version of the patch is acceptable).

 Well, 10 was the prior limit and seems really fine.

 The non update on jiffies seems quite exceptional condition (I hope...)

 We use in Google a patch triggering warning is a thread holds the cpu
 without taking care to need_resched() for more than xx ms

Well, I'm sure that patch works nicely until the clock stops moving
forward :)

I'll post a patch with limit of 10 shortly.

Thanks,
Ben



-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com



Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 20:50 -0700, Ben Greear wrote:
 On 06/05/2013 08:46 PM, Eric Dumazet wrote:
 
  We use in Google a patch triggering warning is a thread holds the cpu
  without taking care to need_resched() for more than xx ms
 
 Well, I'm sure that patch works nicely until the clock stops moving
 forward :)
 

This is not using jiffies, but the clock used in kernel/sched/core.c,
with ns resolution ;)

 I'll post a patch with limit of 10 shortly.

ok




Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.

2013-06-05 Thread Eric Dumazet
On Wed, 2013-06-05 at 20:41 -0700, Ben Greear wrote:
 On 06/05/2013 08:26 PM, Eric Dumazet wrote:
  On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:
 
 
  Ah, so, that's why it's showing up now.  We probably have had the same
  issue all along but it used to be masked by the softirq limiting.  Do
  you care to revive the 10 iterations limit so that it's limited by
  both the count and timing?  We do wanna find out why softirq is
  spinning indefinitely tho.
 
  Yes, no problem, I can do that.
 
 Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that
 would be fine by me.
 
 I can send a version of my patch easily enough if we can agree on the max
 number of loops (and if indeed my version of the patch is acceptable).

Well, 10 was the prior limit and seems really fine.

Jiffies not being updated seems to be quite an exceptional condition (I hope...)

At Google we use a patch that triggers a warning if a thread holds the CPU
for more than xx ms without checking need_resched().


