Re: [ath9k-devel] 4.9GHz and 5.8GHz regulatory entries?
What are the FCC rules for 5.8 GHz at the moment?

adrian

On 4 June 2013 21:41, Josef Semler josef.sem...@gmail.com wrote:
> Hi Adrian,
>
> I've done a modification for 5.8 GHz and some outdoor tests using ubnt hardware.
>
> Joe
>
> On Wednesday, 5 June 2013, Adrian Chadd wrote:
>> Hi all,
>>
>> I'm hacking at the support needed for 4.9 and 5.8GHz NICs in FreeBSD. Well, there's support for them (and they work!) but I'm doing up some regulatory entries for them.
>>
>> So, has anyone looked at adding 4.9 and 5.8GHz regulatory domain entries to db.txt in Linux, for regulatory domains that allow them to be used unlicenced?
>>
>> Thanks,
>>
>> Adrian

___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] 4.9GHz and 5.8GHz regulatory entries?
I don't know anything about the FCC rules. As far as I know, the band is open in the US without technical requirements such as DFS or TPC (no interference with radar devices in this range).

In Europe (e.g. Austria), channels 149-165 are used for comm. services according to the SRD specification, but they are NOT ISM and have to be coordinated with the local frequency office.

Joe

On Wednesday, 5 June 2013, Adrian Chadd wrote:
> What's the FCC rules for 5.8ghz atm?
>
> adrian
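For readers mapping the channel numbers above to frequencies: a minimal sketch, assuming the usual 802.11 5 GHz numbering (centre frequency = 5000 MHz + 5 MHz per channel number). The function name is made up for illustration and is not part of any driver or of db.txt.

```c
#include <assert.h>

/*
 * Map a 5 GHz channel number to its centre frequency in MHz,
 * assuming the standard 802.11 numbering f = 5000 + 5 * channel.
 * chan_to_mhz_5ghz() is a hypothetical helper, not a driver API.
 */
int chan_to_mhz_5ghz(int chan)
{
	assert(chan > 0 && chan <= 200);
	return 5000 + 5 * chan;
}
```

Under that assumption, channels 149-165 span centre frequencies 5745-5825 MHz, which is the range a 5.8 GHz regulatory entry would have to cover.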
Re: [ath9k-devel] 4.9GHz and 5.8GHz regulatory entries?
Do you know this FCC document? It describes the current situation in the US.

http://hraunfoss.fcc.gov/edocs_public/attachmatch/FCC-13-22A1.pdf

Joe

On Wednesday, 5 June 2013, Adrian Chadd wrote:
> What's the FCC rules for 5.8ghz atm?
>
> adrian
Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error
Hi,

On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
> This error seems to be really rare, and we do not know the real cause of it.
> But, in any case, we should check the size of the head before reducing it.

We had a similar issue in rt2x00 quite some time ago.

In general mac80211 should always reserve enough headroom as requested by the driver in hw->extra_tx_headroom. However, there is a chance that a frame is sent to the driver again (see ieee80211_handle_filtered_frame). If the frame payload (or head) was moved due to padding and was not restored before the driver called ieee80211_tx_status, the second trip through the driver has reduced headroom and could lead to such an error.

From a quick check of ath9k_htc, it seems that ath9k_htc_tx adds some padding but ath9k_htc_tx_process does not remove the padding when passing the frame back to mac80211.

Helmut
Re: [ath9k-devel] install hostapd compile error
Hi,

I installed hostapd-1.0 and ran make. I got the following compile error:

../src/crypto/tls_openssl.c:23:25: fatal error: openssl/ssl.h: No such file or directory
compilation terminated.

Could you help me find a solution? Thanks.

Sincerely,
Angela
Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error
On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
> This error seems to be really rare, and we do not know the real cause of it.
> But, in any case, we should check the size of the head before reducing it.

Would you mind trying this (completely untested) patch against wireless-testing instead?

Helmut

---
Subject: [PATCH] ath9k_htc: Restore skb headroom when returning skb to mac80211

ath9k_htc adds padding between the 802.11 header and the payload during TX by moving the header. When handing the frame back to mac80211 for TX status handling the header is not moved back into its original position. This can result in a too small skb headroom when entering ath9k_htc again (due to a soft retransmission for example) causing an skb_under_panic oops.

Fix this by moving the 802.11 header back into its original position before returning the frame to mac80211 as other drivers like rt2x00 or ath5k do.

Signed-off-by: Helmut Schaa helmut.sc...@googlemail.com
---
 drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
index e602c95..666cfb6 100644
--- a/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
+++ b/drivers/net/wireless/ath/ath9k/htc_drv_txrx.c
@@ -448,6 +448,8 @@ static void ath9k_htc_tx_process(struct ath9k_htc_priv *priv,
 	struct ieee80211_conf *cur_conf = &priv->hw->conf;
 	bool txok;
 	int slot;
+	struct ieee80211_hdr *hdr;
+	int padpos, padsize;
 
 	slot = strip_drv_header(priv, skb);
 	if (slot < 0) {
@@ -504,6 +506,15 @@ send_mac80211:
 
 	ath9k_htc_tx_clear_slot(priv, slot);
 
+	/* Remove padding before handing frame back to mac80211 */
+	hdr = (struct ieee80211_hdr *) skb->data;
+	padpos = ieee80211_hdrlen(hdr->frame_control);
+	padsize = padpos & 3;
+	if (padsize && skb->len > padpos + padsize) {
+		memmove(skb->data + padsize, skb->data, padpos);
+		skb_pull(skb, padsize);
+	}
+
 	/* Send status to mac80211 */
 	ieee80211_tx_status(priv->hw, skb);
 }
-- 
1.7.10.4
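The header move in the patch above can be tried outside the kernel. The following is a minimal userspace sketch, assuming a plain byte buffer in place of an skb; `struct ex_buf` and `ex_strip_pad()` are hypothetical names, and advancing the `data` pointer stands in for skb_pull().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two sk_buff fields the patch touches. */
struct ex_buf {
	unsigned char *data;	/* start of the frame */
	size_t len;		/* number of valid bytes from data */
};

/*
 * Undo the TX padding: the 802.11 header (hdrlen bytes) is followed by
 * (hdrlen & 3) pad bytes. Slide the header forward over the padding and
 * advance data, which gives the same number of bytes back as headroom.
 * Returns the number of bytes regained.
 */
size_t ex_strip_pad(struct ex_buf *b, size_t hdrlen)
{
	size_t padsize = hdrlen & 3;

	assert(b != NULL && b->data != NULL);
	if (padsize && b->len > hdrlen + padsize) {
		memmove(b->data + padsize, b->data, hdrlen);
		b->data += padsize;	/* userspace equivalent of skb_pull() */
		b->len -= padsize;
		return padsize;
	}
	return 0;
}
```

With a 26-byte QoS header the pad is 2 bytes, so each TX pass that is not undone costs 2 bytes of headroom; a 24-byte header needs no padding and the function leaves the buffer alone.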
Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error
On 06/05/2013 04:24 PM, Helmut Schaa wrote:
> On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
>> This error seems to be really rare, and we do not know the real cause of it.
>> But, in any case, we should check the size of the head before reducing it.
>
> Mind to try the (completely untested) patch against wireless-testing instead?
>
> Helmut

I will do, however I'm not in range of that USB wireless adapter for about 1.5 weeks.

Marc
Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error
On 05.06.2013 16:26, Marc Kleine-Budde wrote:
> On 06/05/2013 04:24 PM, Helmut Schaa wrote:
>> On Tue, Jun 4, 2013 at 8:37 PM, Oleksij Rempel li...@rempel-privat.de wrote:
>>> This error seems to be really rare, and we do not know the real cause of it.
>>> But, in any case, we should check the size of the head before reducing it.
>>
>> Mind to try the (completely untested) patch against wireless-testing instead?
>>
>> Helmut
>
> I will do, however I'm not in range of that USB wireless adapter for about 1.5 weeks.

Helmut, thank you for the patch! I'll do a regression test, but not a week-long test, so I probably won't reproduce this issue.

-- 
Regards,
Oleksij
Re: [ath9k-devel] [PATCH] ath9k_htc: fix skb_under_panic error
On 05.06.2013 16:46, Oleksij Rempel wrote:
> On 05.06.2013 16:26, Marc Kleine-Budde wrote:
>> On 06/05/2013 04:24 PM, Helmut Schaa wrote:
>>> Mind to try the (completely untested) patch against wireless-testing instead?
>>>
>>> Helmut
>>
>> I will do, however I'm not in range of that USB wireless adapter for about 1.5 weeks.
>
> Helmut, thank you for the patch! I'll do a regression test, but not a week-long test, so I probably won't reproduce this issue.

I ran a two-stream netperf test for 2 hours without visible regressions.

-- 
Regards,
Oleksij
Re: [ath9k-devel] Tplink TL-WN822N drops out connections randomly on Arch Linux with Kernel 3.9.4 (Mark Lee)
I reset my router to the default Netgear firmware and found that the issue still occurs. After seeing this message board post:

https://bbs.archlinux.org/viewtopic.php?id=137643

I set the ath9k_htc module option nohwcrypt=1. I will see how it goes in terms of stability. In addition, could someone tell me what that option does?

On Tue, 2013-06-04 at 12:00 +0200, ath9k-devel-requ...@lists.ath9k.org wrote:
> On 3 June 2013 11:44, Mark E. Lee m...@markelee.com wrote:
>> I couldn't find hostapd.conf on my access point. I did however switch the algorithm from WPA2 to WEP and found that I no longer lost connection during skype calls.
>
> That kinda points at the rekey or the crypto handling in general.
>
> Please find and enable hostapd logging on your AP.
>
> I've seen and fixed bugs in freebsd recently where traffic would fill up buffers and cause the EAPOL rekey packets to get discarded by the driver. Thus a group rekey would fail, and the unit would be disconnected.
>
> adrian

-- 
Mark E. Lee m...@markelee.com
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On 06/05/2013 02:11 PM, Tejun Heo wrote:
> (cc'ing wireless crowd, tglx and Ingo. The original thread is at
> http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 )
>
> Hello, Ben.
>
> On Wed, Jun 05, 2013 at 01:58:31PM -0700, Ben Greear wrote:
>> Hmm, wonder if I found it. I previously saw times where it appears jiffies does not increment. __do_softirq has a break-out based on jiffies timeout. Maybe that is failing to get us out of __do_softirq in my lockup case because for whatever reason the system cannot update jiffies in this case?
>>
>> I added this (probably whitespace damaged) hack and now I have not been able to reproduce the problem.
>
> Ah, nice catch. :)
>
>> diff --git a/kernel/softirq.c b/kernel/softirq.c
>> index 14d7758..621ea3b 100644
>> --- a/kernel/softirq.c
>> +++ b/kernel/softirq.c
>> @@ -212,6 +212,7 @@ asmlinkage void __do_softirq(void)
>>  	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
>>  	int cpu;
>>  	unsigned long old_flags = current->flags;
>> +	unsigned long loops = 0;
>>
>>  	/*
>>  	 * Mask out PF_MEMALLOC s current task context is borrowed for the
>> @@ -241,6 +242,7 @@ restart:
>>  			unsigned int vec_nr = h - softirq_vec;
>>  			int prev_count = preempt_count();
>>
>> +			loops++;
>>  			kstat_incr_softirqs_this_cpu(vec_nr);
>>
>>  			trace_softirq_entry(vec_nr);
>> @@ -265,7 +267,7 @@ restart:
>>
>>  	pending = local_softirq_pending();
>>  	if (pending) {
>> -		if (time_before(jiffies, end) && !need_resched())
>> +		if (time_before(jiffies, end) && !need_resched() && (loops < 500))
>>  			goto restart;
>
> So, softirq most likely kicked off from ath9k is rescheduling itself to the extent where it ends up locking out the CPU completely. The problem is usually okay because the processing would break out in 2ms but as jiffies is stopped in this case with all other CPUs trapped in stop_machine, the loop never breaks and the machine hangs. While adding the counter limit probably isn't a bad idea, softirq requeueing itself indefinitely sounds pretty buggy.

Just to be clear on the ath9k part for the wifi folks:

This is basically un-patched 3.9.4, but I have 200 virtual stations configured on each of two ath9k radios. I cannot reproduce the problem without ath9k, but I do not know for certain ath9k is the real culprit.

In the case where I can most easily reproduce the lockup, ath9k virtual stations would be trying to associate, so I'd expect a fair amount of packet processing to be happening...

> ath9k people, do you guys have any idea what's going on? Why would softirq repeat itself indefinitely?
>
> Ingo, Thomas, we're seeing a stop_machine hanging because
>
> * All other CPUs entered IRQ disabled stage. Jiffies is not being updated.
>
> * The last CPU get caught up executing softirq indefinitely. As jiffies doesn't get updated, it never breaks out of softirq handling. This is a deadlock. This CPU won't break out of softirq handling unless jiffies is updated and other CPUs can't do anything until this CPU enters the same stop_machine stage.
>
> Ben found out that breaking out of softirq handling after certain number of repetitions makes the issue go away, which isn't a proper fix but we might want anyway. What do you guys think?

Thanks,
Ben

-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc http://www.candelatech.com
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
(cc'ing wireless crowd, tglx and Ingo. The original thread is at
http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 )

Hello, Ben.

On Wed, Jun 05, 2013 at 01:58:31PM -0700, Ben Greear wrote:
> Hmm, wonder if I found it. I previously saw times where it appears jiffies does not increment. __do_softirq has a break-out based on jiffies timeout. Maybe that is failing to get us out of __do_softirq in my lockup case because for whatever reason the system cannot update jiffies in this case?
>
> I added this (probably whitespace damaged) hack and now I have not been able to reproduce the problem.

Ah, nice catch. :)

> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 14d7758..621ea3b 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -212,6 +212,7 @@ asmlinkage void __do_softirq(void)
>  	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
>  	int cpu;
>  	unsigned long old_flags = current->flags;
> +	unsigned long loops = 0;
>
>  	/*
>  	 * Mask out PF_MEMALLOC s current task context is borrowed for the
> @@ -241,6 +242,7 @@ restart:
>  			unsigned int vec_nr = h - softirq_vec;
>  			int prev_count = preempt_count();
>
> +			loops++;
>  			kstat_incr_softirqs_this_cpu(vec_nr);
>
>  			trace_softirq_entry(vec_nr);
> @@ -265,7 +267,7 @@ restart:
>
>  	pending = local_softirq_pending();
>  	if (pending) {
> -		if (time_before(jiffies, end) && !need_resched())
> +		if (time_before(jiffies, end) && !need_resched() && (loops < 500))
>  			goto restart;

So, softirq most likely kicked off from ath9k is rescheduling itself to the extent where it ends up locking out the CPU completely. The problem is usually okay because the processing would break out in 2ms but as jiffies is stopped in this case with all other CPUs trapped in stop_machine, the loop never breaks and the machine hangs. While adding the counter limit probably isn't a bad idea, softirq requeueing itself indefinitely sounds pretty buggy.

ath9k people, do you guys have any idea what's going on? Why would softirq repeat itself indefinitely?

Ingo, Thomas, we're seeing a stop_machine hanging because

* All other CPUs entered IRQ disabled stage. Jiffies is not being updated.

* The last CPU gets caught up executing softirq indefinitely. As jiffies doesn't get updated, it never breaks out of softirq handling. This is a deadlock. This CPU won't break out of softirq handling unless jiffies is updated and other CPUs can't do anything until this CPU enters the same stop_machine stage.

Ben found out that breaking out of softirq handling after a certain number of repetitions makes the issue go away, which isn't a proper fix but we might want anyway. What do you guys think?

Thanks.

-- 
tejun
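The deadlock described above can be modelled in userspace. This is a sketch under stated assumptions, not the kernel code: `ex_softirq_passes()` is a hypothetical function, the softirq is assumed to always requeue itself, the 2-tick deadline stands in for MAX_SOFTIRQ_TIME, and `give_up` exists only so the model terminates where the real loop would hang.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MAX_RESTART 10	/* iteration cap, value chosen for illustration */

/* Wrap-safe "a is before b", in the style of the kernel's time_before(). */
static bool ex_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Count the passes a self-requeueing softirq gets before the loop exits.
 *   tick        ticks the clock advances per pass (0 = frozen jiffies)
 *   use_counter also apply the iteration cap
 *   give_up     safety bound; reaching it represents a hang
 */
unsigned long ex_softirq_passes(unsigned long tick, bool use_counter,
				unsigned long give_up)
{
	unsigned long now = 1000, end = now + 2;
	unsigned long passes = 0;
	int max_restart = EX_MAX_RESTART;

	assert(give_up > 0);
	for (;;) {
		passes++;		/* one pass over the pending softirqs */
		now += tick;
		if (passes >= give_up)
			break;		/* model only: would spin forever */
		if (!ex_time_before(now, end))
			break;		/* time budget used up */
		if (use_counter && --max_restart == 0)
			break;		/* iteration budget used up */
	}
	return passes;
}
```

With a ticking clock the time check ends the loop after a couple of passes. With `tick == 0` and no counter, only `give_up` ends it, which corresponds to the hang; with the counter enabled the loop exits after EX_MAX_RESTART passes regardless of the clock.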
[ath9k-devel] Setting Backoff values
We tried changing the backoff value range through this call in mac.c:

REG_WRITE(ah, AR_DLCL_IFS(q),
	  SM(cwMin, AR_D_LCL_IFS_CWMIN) |
	  SM(qi->tqi_cwmax, AR_D_LCL_IFS_CWMAX) |
	  SM(qi->tqi_aifs, AR_D_LCL_IFS_AIFS));

Using 1, 1 in place of cwMin and qi->tqi_cwmax respectively gives almost the same throughput as 1, 1023. Doesn't this mean the backoff window range is not changing properly? What is the correct way to do this?
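To check what value actually ends up in the register for a given CWmin/CWmax pair, the field packing can be reproduced in isolation. This is a sketch only: the masks and shifts below (10-bit CWmin at bit 0, 10-bit CWmax at bit 10, 8-bit AIFS at bit 20) are an assumed layout that should be verified against the driver's register header, and all `EX_`/`ex_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout; verify against the driver's register definitions. */
#define EX_CWMIN_MASK	0x000003FFu
#define EX_CWMIN_S	0
#define EX_CWMAX_MASK	0x000FFC00u
#define EX_CWMAX_S	10
#define EX_AIFS_MASK	0x0FF00000u
#define EX_AIFS_S	20

/* Shift a value into a field and mask it, in the style of SM(). */
#define EX_SM(v, f)	((((uint32_t)(v)) << f##_S) & f##_MASK)
/* Extract a field from a register value (the inverse operation). */
#define EX_MS(r, f)	((((uint32_t)(r)) & f##_MASK) >> f##_S)

/* Build the per-queue IFS register value from its three fields. */
uint32_t ex_pack_ifs(uint32_t cwmin, uint32_t cwmax, uint32_t aifs)
{
	assert(cwmin <= cwmax);
	return EX_SM(cwmin, EX_CWMIN) |
	       EX_SM(cwmax, EX_CWMAX) |
	       EX_SM(aifs, EX_AIFS);
}
```

Under the assumed layout, (1, 1023) and (1, 1) produce different register values, so reading the register back after the write and extracting the CWmax field with something like `EX_MS()` would show whether the setting took effect before looking at throughput.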
Re: [ath9k-devel] Setting Backoff values
How are you testing?

adrian

On 5 June 2013 18:01, Ruwaifa Anwar ruwaifa.an...@gmail.com wrote:
> We tried changing backoff value range through this function in mac.c
>
> REG_WRITE(ah, AR_DLCL_IFS(q),
> 	  SM(cwMin, AR_D_LCL_IFS_CWMIN) |
> 	  SM(qi->tqi_cwmax, AR_D_LCL_IFS_CWMAX) |
> 	  SM(qi->tqi_aifs, AR_D_LCL_IFS_AIFS));
>
> Setting using 1,1 in place of cwMin and qi->tqi_cwmax respectively is giving almost the same throughput as with 1, 1023. Doesn't this mean, backoff window range is not changing properly? What is the correct way to do this.
Re: [ath9k-devel] Setting Backoff values
Ruwaifa Anwar ruwaifa.anwar at gmail.com writes:
> We tried changing backoff value range through this function in mac.c
>
> REG_WRITE(ah, AR_DLCL_IFS(q),
> 	  SM(cwMin, AR_D_LCL_IFS_CWMIN) |
> 	  SM(qi->tqi_cwmax, AR_D_LCL_IFS_CWMAX) |
> 	  SM(qi->tqi_aifs, AR_D_LCL_IFS_AIFS));
>
> Setting using 1,1 in place of cwMin and qi->tqi_cwmax respectively is giving almost the same throughput as with 1, 1023. Doesn't this mean, backoff window range is not changing properly? What is the correct way to do this.

We are testing by calculating the number of MPDUs successfully completed per unit time.

Secondly, I want to clarify one more thing. As you said in previous posts, this alters the backoff window for all frames in a queue. What happens if there is a long retry: will the hardware still use the given values, or will it use exponentially increased values of cwmin and cwmax?
Re: [ath9k-devel] Setting Backoff values
Well, backoff is only going to increment if it fails, so if you have mostly clear air, you're not going to see many failures. Do the math and see if you're already filling the air with transmissions.

Try setting it to max,max instead of min,min and see what happens.

The backoff counter is reset after each transmission attempt. So if you have 5 frames in the queue, it'll do backoff for one; once that frame goes out or expires, it'll start the next one with the reset min counter value.

Adrian
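The point that the window only grows on failures can be made concrete with the textbook 802.11 rule, where each failed attempt roughly doubles the contention window up to CWmax. This is a minimal sketch of that standard rule, not a statement about what this particular hardware does on retries; `ex_cw_after()` is a hypothetical name.

```c
#include <assert.h>

/*
 * Contention window after a number of failed attempts, assuming the
 * standard rule: start at cwmin, each failure sets cw = 2 * (cw + 1) - 1,
 * capped at cwmax. A successful or expired frame resets the window to
 * cwmin, which corresponds to calling this again with retries == 0.
 */
unsigned ex_cw_after(unsigned cwmin, unsigned cwmax, unsigned retries)
{
	unsigned cw = cwmin;

	assert(cwmin <= cwmax);
	while (retries-- && cw < cwmax) {
		cw = 2 * (cw + 1) - 1;
		if (cw > cwmax)
			cw = cwmax;
	}
	return cw;
}
```

On a clean channel most frames go out on the first attempt, so the window stays at cwmin whether cwmax is 1 or 1023, which is consistent with the near-identical throughput reported earlier in the thread. Raising cwmin itself changes the first-attempt backoff and is what the max,max experiment would exercise.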
Re: [ath9k-devel] So long, and thanks for all the fish (kinda)
First, NOOO!!

Second, will you still be given datasheets on new wireless chips to enable support? Or will you at least be able to encourage Atheros to release the datasheets? Tell them I bought Atheros because of you!

On 6/3/2013 2:17 PM, Adrian Chadd wrote:
> Hi all,
>
> This Friday will be my last day at Qualcomm Atheros.
>
> I've enjoyed working with the extremely bright and driven engineers and designers that make the wireless chips and SoCs that people everywhere take for granted. I've achieved a bunch of goals both with their internal product development and open source. But now it's time to move onto different things.
>
> I'd especially like to thank Luis Rodriguez for introducing me to the QCA folk and helping me get access to the Atheros open source project, as well as the follow-up discussions that led to me being hired.
>
> The open source wireless community has been driving innovation in a lot of areas for a number of years. I'd like to hope that I've had a small, positive effect on that. I wish you all the best of luck in pushing forward and continuing to innovate.
>
> Now, I'm still NDA-enabled and I quite like hacking on this wireless stuff so I won't be quitting hacking on things. I will just have other things on my mind.
>
> Good luck to you all!
>
> Adrian
> ___
> freebsd-wirel...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-wireless
> To unsubscribe, send any mail to freebsd-wireless-unsubscr...@freebsd.org
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On Wed, 2013-06-05 at 14:11 -0700, Tejun Heo wrote:
> (cc'ing wireless crowd, tglx and Ingo. The original thread is at
> http://thread.gmane.org/gmane.linux.kernel/1500158/focus=55005 )
>
> So, softirq most likely kicked off from ath9k is rescheduling itself to the extent where it ends up locking out the CPU completely. The problem is usually okay because the processing would break out in 2ms but as jiffies is stopped in this case with all other CPUs trapped in stop_machine, the loop never breaks and the machine hangs. While adding the counter limit probably isn't a bad idea, softirq requeueing itself indefinitely sounds pretty buggy.
>
> ath9k people, do you guys have any idea what's going on? Why would softirq repeat itself indefinitely?
>
> Ingo, Thomas, we're seeing a stop_machine hanging because
>
> * All other CPUs entered IRQ disabled stage. Jiffies is not being updated.
>
> * The last CPU get caught up executing softirq indefinitely. As jiffies doesn't get updated, it never breaks out of softirq handling. This is a deadlock. This CPU won't break out of softirq handling unless jiffies is updated and other CPUs can't do anything until this CPU enters the same stop_machine stage.
>
> Ben found out that breaking out of softirq handling after certain number of repetitions makes the issue go away, which isn't a proper fix but we might want anyway. What do you guys think?

Interesting.

Before 3.9 and commit c10d73671ad30f5469 (softirq: reduce latencies) we used to limit the __do_softirq() loop to 10.
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:
> Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting. Do you care to revive the 10 iterations limit so that it's limited by both the count and timing? We do wanna find out why softirq is spinning indefinitely tho.

Yes, no problem, I can do that.
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On 06/05/2013 08:26 PM, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:
>> Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting. Do you care to revive the 10 iterations limit so that it's limited by both the count and timing? We do wanna find out why softirq is spinning indefinitely tho.
>
> Yes, no problem, I can do that.

Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that would be fine by me.

I can send a version of my patch easily enough if we can agree on the max number of loops (and if indeed my version of the patch is acceptable).

Thanks,
Ben

-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc http://www.candelatech.com
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
Hello, Eric.

On Wed, Jun 05, 2013 at 06:34:52PM -0700, Eric Dumazet wrote:
>> Ingo, Thomas, we're seeing a stop_machine hanging because
>>
>> * All other CPUs entered IRQ disabled stage. Jiffies is not being updated.
>>
>> * The last CPU get caught up executing softirq indefinitely. As jiffies doesn't get updated, it never breaks out of softirq handling. This is a deadlock. This CPU won't break out of softirq handling unless jiffies is updated and other CPUs can't do anything until this CPU enters the same stop_machine stage.
>>
>> Ben found out that breaking out of softirq handling after certain number of repetitions makes the issue go away, which isn't a proper fix but we might want anyway. What do you guys think?
>
> Interesting
>
> Before 3.9 and commit c10d73671ad30f5469 (softirq: reduce latencies) we used to limit the __do_softirq() loop to 10.

Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting. Do you care to revive the 10 iterations limit so that it's limited by both the count and timing? We do wanna find out why softirq is spinning indefinitely tho.

Thanks.

-- 
tejun
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On 06/05/2013 08:46 PM, Eric Dumazet wrote:
> On Wed, 2013-06-05 at 20:41 -0700, Ben Greear wrote:
>> On 06/05/2013 08:26 PM, Eric Dumazet wrote:
>>> On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:
>>>> Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting. Do you care to revive the 10 iterations limit so that it's limited by both the count and timing? We do wanna find out why softirq is spinning indefinitely tho.
>>>
>>> Yes, no problem, I can do that.
>>
>> Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that would be fine by me.
>>
>> I can send a version of my patch easily enough if we can agree on the max number of loops (and if indeed my version of the patch is acceptable).
>
> Well, 10 was the prior limit and seems really fine.
>
> The non-update of jiffies seems quite an exceptional condition (I hope...)
>
> We use at Google a patch triggering a warning if a thread holds the cpu without taking care of need_resched() for more than xx ms

Well, I'm sure that patch works nicely until the clock stops moving forward :)

I'll post a patch with a limit of 10 shortly.

Thanks,
Ben

-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc http://www.candelatech.com
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On Wed, 2013-06-05 at 20:50 -0700, Ben Greear wrote:
> On 06/05/2013 08:46 PM, Eric Dumazet wrote:
>> We use at Google a patch triggering a warning if a thread holds the cpu without taking care of need_resched() for more than xx ms
>
> Well, I'm sure that patch works nicely until the clock stops moving forward :)

This is not using jiffies, but the clock used in kernel/sched/core.c, with ns resolution ;)

> I'll post a patch with limit of 10 shortly.

ok
Re: [ath9k-devel] stop_machine lockup issue in 3.9.y.
On Wed, 2013-06-05 at 20:41 -0700, Ben Greear wrote:
> On 06/05/2013 08:26 PM, Eric Dumazet wrote:
>> On Wed, 2013-06-05 at 20:14 -0700, Tejun Heo wrote:
>>> Ah, so, that's why it's showing up now. We probably have had the same issue all along but it used to be masked by the softirq limiting. Do you care to revive the 10 iterations limit so that it's limited by both the count and timing? We do wanna find out why softirq is spinning indefinitely tho.
>>
>> Yes, no problem, I can do that.
>
> Limiting it to 5000 fixes my problem, so if you wanted it larger than 10, that would be fine by me.
>
> I can send a version of my patch easily enough if we can agree on the max number of loops (and if indeed my version of the patch is acceptable).

Well, 10 was the prior limit and seems really fine.

The non-update of jiffies seems quite an exceptional condition (I hope...)

We use at Google a patch triggering a warning if a thread holds the cpu without taking care of need_resched() for more than xx ms