Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
Rajkumar Manoharan writes: > On 2018-10-02 16:07, Rajkumar Manoharan wrote: >> On 2018-10-02 12:00, Toke Høiland-Jørgensen wrote: >>> Rajkumar Manoharan writes: I noticed a race condition b/w sta cleanup and kick_airtime tasklet. >> How do you plan to exit kick_airtime gracefully during sta_cleanup? > > Ah, right, there's a lot of stuff going on before we get to > purge_txq. > Hmm, I guess we should either make sure we remove the station from > active_txqs earlier in the sta cleanup process, or maybe it'd enough > to > just check the removed flag in the tasklet? > > Does the below patch fix the issue? > No. Attaching backtrace. Any clue? >>> >>> Ah, that's my bad. Just having a 'continue' there can make the >>> function >>> loop forever. Oops. Try something like this instead? >>> >> >> But 'continue' also used in other places. Will give a try but I think >> that >> calling drv_wake_tx_queue within iteration is dangerous as it alters >> the list. no? >> > How about below change? Just schedule first txq and remaining will be > scheduled later by driver upon tx-compl. Your mail client seems to be mangling the patch somewhat, but I think I see what your intention is. And yeah, just waking a single TXQ and letting TX-completion do the rest is a good idea; will fold that into the next version :) -Toke
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-10-02 22:53, Rajkumar Manoharan wrote: Shouldn't have to call next_txq(). It should be as below. Anyway drv_wake_tx_queue is just an indication to driver and driver will always dequeue first node from list. diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 2dbfd1d8eb5f..74ac0b19cd54 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -242,7 +242,7 @@ EXPORT_SYMBOL(ieee80211_ctstoself_duration); static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) { - struct ieee80211_txq *txq; + struct ieee80211_txq *txq = NULL; bool seen_eligible = false; struct txq_info *txqi; struct sta_info *sta; @@ -263,6 +263,7 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) if (sta->airtime[ac].deficit >= 0) { seen_eligible = true; clear_bit(IEEE80211_TXQ_AIRTIME_THROTTLE, &txqi->flags); + txq = &txqi->txq; } } @@ -283,7 +284,6 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) } out: rcu_read_unlock(); - txq = ieee80211_next_txq(&local->hw, ac); spin_unlock_bh(&local->active_txq_lock[ac]); -Rajkumar
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-10-02 16:07, Rajkumar Manoharan wrote: On 2018-10-02 12:00, Toke Høiland-Jørgensen wrote: Rajkumar Manoharan writes: I noticed a race condition b/w sta cleanup and kick_airtime tasklet. How do you plan to exit kick_airtime gracefully during sta_cleanup? Ah, right, there's a lot of stuff going on before we get to purge_txq. Hmm, I guess we should either make sure we remove the station from active_txqs earlier in the sta cleanup process, or maybe it'd enough to just check the removed flag in the tasklet? Does the below patch fix the issue? No. Attaching backtrace. Any clue? Ah, that's my bad. Just having a 'continue' there can make the function loop forever. Oops. Try something like this instead? But 'continue' also used in other places. Will give a try but I think that calling drv_wake_tx_queue within iteration is dangerous as it alters the list. no? How about below change? Just schedule first txq and remaining will be scheduled later by driver upon tx-compl. diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 0bb590928dd0..2dbfd1d8eb5f 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -242,6 +242,7 @@ EXPORT_SYMBOL(ieee80211_ctstoself_duration); static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) { + struct ieee80211_txq *txq; bool seen_eligible = false; struct txq_info *txqi; struct sta_info *sta; @@ -261,14 +262,7 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) if (sta->airtime[ac].deficit >= 0) { seen_eligible = true; - - if (!test_and_clear_bit(IEEE80211_TXQ_AIRTIME_THROTTLE, - &txqi->flags)) - continue; - - spin_unlock_bh(&local->active_txq_lock[ac]); - drv_wake_tx_queue(local, txqi); - spin_lock_bh(&local->active_txq_lock[ac]); + clear_bit(IEEE80211_TXQ_AIRTIME_THROTTLE, &txqi->flags); } } @@ -289,8 +283,10 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) } out: rcu_read_unlock(); + txq = ieee80211_next_txq(&local->hw, ac); spin_unlock_bh(&local->active_txq_lock[ac]); - + if (txq) + drv_wake_tx_queue(local, to_txq_info(txq)); } -Rajkumar
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-10-02 12:00, Toke Høiland-Jørgensen wrote: Rajkumar Manoharan writes: I noticed a race condition b/w sta cleanup and kick_airtime tasklet. How do you plan to exit kick_airtime gracefully during sta_cleanup? Ah, right, there's a lot of stuff going on before we get to purge_txq. Hmm, I guess we should either make sure we remove the station from active_txqs earlier in the sta cleanup process, or maybe it'd enough to just check the removed flag in the tasklet? Does the below patch fix the issue? No. Attaching backtrace. Any clue? Ah, that's my bad. Just having a 'continue' there can make the function loop forever. Oops. Try something like this instead? But 'continue' also used in other places. Will give a try but I think that calling drv_wake_tx_queue within iteration is dangerous as it alters the list. no? -Rajkumar
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
Rajkumar Manoharan writes: > On 2018-10-02 01:22, Toke Høiland-Jørgensen wrote: >> Rajkumar Manoharan writes: >> Great! I'll fold in the rest, test it with ath9k and submit as a proper patch :) >>> Toke, >>> >>> I noticed a race condition b/w sta cleanup and kick_airtime tasklet. >>> How do you plan to exit kick_airtime gracefully during sta_cleanup? >> >> Ah, right, there's a lot of stuff going on before we get to purge_txq. >> Hmm, I guess we should either make sure we remove the station from >> active_txqs earlier in the sta cleanup process, or maybe it'd enough to >> just check the removed flag in the tasklet? >> >> Does the below patch fix the issue? >> > > No. Attaching backtrace. Any clue? Ah, that's my bad. Just having a 'continue' there can make the function loop forever. Oops. Try something like this instead? -Toke diff --git a/net/mac80211/util.c b/net/mac80211/util.c index eb77cf588d69..b30a4fac1d60 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -258,6 +258,9 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) sta = container_of(txqi->txq.sta, struct sta_info, sta); + if (sta->removed) + goto out_reschedule; + if (sta->airtime[ac].deficit >= 0) { seen_eligible = true; @@ -288,7 +291,13 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) } out: rcu_read_unlock(); spin_unlock_bh(&local->active_txq_lock[ac]); + return; + + out_reschedule: + rcu_read_unlock(); + spin_unlock_bh(&local->active_txq_lock[ac]); + tasklet_schedule(&local->airtime_tasklet); } void ieee80211_kick_airtime(unsigned long data)
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-10-02 01:22, Toke Høiland-Jørgensen wrote: Rajkumar Manoharan writes: Great! I'll fold in the rest, test it with ath9k and submit as a proper patch :) Toke, I noticed a race condition b/w sta cleanup and kick_airtime tasklet. How do you plan to exit kick_airtime gracefully during sta_cleanup? Ah, right, there's a lot of stuff going on before we get to purge_txq. Hmm, I guess we should either make sure we remove the station from active_txqs earlier in the sta cleanup process, or maybe it'd enough to just check the removed flag in the tasklet? Does the below patch fix the issue? No. Attaching backtrace. Any clue? -Rajkumar[ 612.084685] INFO: rcu_preempt self-detected stall on CPU { 0} (t=2101 jiffie s g=3643 c=3642 q=107) [ 612.092689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.77 #9 [ 612.094702] INFO: rcu_preempt detected stalls on CPUs/tasks: { 0} (detected b y 1, t=2102 jiffies, g=3643, c=3642, q=107) [ 612.094705] Task dump for CPU 0: [ 612.094716] swapper/0 R running 0 0 0 0x1002 [ 612.094745] [] (__schedule) from [] (arch_cpu_idle+0x38/0 x5c) [ 612.094834] [] (arch_cpu_idle) from [] (0x) [ 612.133149] [] (unwind_backtrace) from [] (show_stack+0x1 0/0x14) [ 612.140874] [] (show_stack) from [] (dump_stack+0x80/0xa0 ) [ 612.148079] [] (dump_stack) from [] (rcu_check_callbacks+ 0x230/0x6a8) [ 612.156239] [] (rcu_check_callbacks) from [] (update_proc ess_times+0x38/0x58) [ 612.165092] [] (update_process_times) from [] (tick_sched _timer+0x44/0x74) [ 612.173683] [] (tick_sched_timer) from [] (__run_hrtimer+ 0x50/0xc8) [ 612.181668] [] (__run_hrtimer) from [] (hrtimer_interrupt +0x130/0x27c) [ 612.189917] [] (hrtimer_interrupt) from [] (msm_timer_int errupt+0x38/0x44) [ 612.198510] [] (msm_timer_interrupt) from [] (handle_perc pu_devid_irq+0x68/0x84) [ 612.207625] [] (handle_percpu_devid_irq) from [] (generic _handle_irq+0x20/0x30) [ 612.216653] [] (generic_handle_irq) from [] (handle_IRQ+0 x68/0x90) [ 612.224548] [] (handle_IRQ) from [] (gic_handle_irq+0x3c/ 0x60) [ 612.232101] [] (gic_handle_irq) from [] (__irq_svc+0x40/0 x70) [ 612.239562] Exception stack(0xc085de68 to 0xc085deb0) [ 612.244597] de60: d65490a0 0010 d4d20 d18 d4d20d00 [ 612.252758] de80: d4d20c20 0002 0030 0018 d6549000 0006 0 000 c085deb0 [ 612.260917] dea0: bf84bc94 c020cb24 6113 [ 612.265959] [] (__irq_svc) from [] (_test_and_clear_bit+0 x3c/0x48) [ 612.273915] [] (_test_and_clear_bit) from [] (ieee80211_k ick_airtime+0x90/0x1a8 [mac80211]) [ 612.284028] [] (ieee80211_kick_airtime [mac80211]) from [ ] (tasklet_action+0x8c/0xec) [ 612.293474] [] (tasklet_action) from [] (__do_softirq+0x1 04/0x294) [ 612.301370] [] (__do_softirq) from [] (irq_exit+0x9c/0x11 c) [ 612.308663] [] (irq_exit) from [] (handle_IRQ+0x6c/0x90) [ 612.315695] [] (handle_IRQ) from [] (gic_handle_irq+0x3c/ 0x60) [ 612.323244] [] (gic_handle_irq) from [] (__irq_svc+0x40/0 x70) [ 612.330706] Exception stack(0xc085df60 to 0xc085dfa8) [ 612.335744] df60: ffed 1d3c2000 c020a040 c085c000 c085c030 f fff c08b3014 [ 612.343903] df80: c08643c0 c08539c8 ddffcc80 c08b3010 c086d288 c085dfa8 c0218 300 c0218304 [ 612.352061] dfa0: 6013 [ 612.355539] [] (__irq_svc) from [] (arch_cpu_idle+0x38/0x 5c) [ 612.362918] [] (arch_cpu_idle) from [] (cpu_startup_entry +0xa4/0x108) [ 612.371084] [] (cpu_startup_entry) from [] (start_kernel+ 0x358/0x3cc) [ 632.204687] BUG: soft lockup - CPU#1 stuck for 22s! [nmbd:1458] [ 632.209569] Modules linked in: ath10k_pci ath10k_core ath mac80211 cfg80211 c ompat ecm iptable_nat nf_nat_pptp nf_nat_ipv4 nf_nat_amanda nf_conntrack_pptp nf _conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack_amanda xt_time xt_tcpudp xt_tcpms s xt_string xt_statistic xt_state xt_recent xt_quota xt_policy xt_pkttype xt_phy sdev xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_hel per xt_esp xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_ comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY usbnet ts_kmp ts_fsm ts_bm r8152 pppoe ppp_mppe ppp_async nf_nat_tftp nf_nat_sn mp_basic nf_nat_sip nf_nat_rtsp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_f tp nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrac k_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h32 3 nf_conntrack_ftp nf_conntrack_broadcast l2tp_ppp iptable_raw iptable_mangle ip table_filter ipt_ah ipt_REJECT ipt_MASQUERADE ipt_ECN ip_tables crc_ccitt arptab le_filter arpt_mangle arp_tables sch_teql sch_tbf sch_sfq sch_red sch_prio sch_p ie sch_htb sch_gred sch_fq sch_dsmark sch_codel em_text em_nbyte em_meta em_cmp cls_basic act_police act_ipt act_connmark act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress qca_nss_ipsec qca_nss _cfi_ocf qca_nss_tunipip6 qca
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
Rajkumar Manoharan writes: >> Great! I'll fold in the rest, test it with ath9k and submit as a proper >> patch :) >> > Toke, > > I noticed a race condition b/w sta cleanup and kick_airtime tasklet. > How do you plan to exit kick_airtime gracefully during sta_cleanup? Ah, right, there's a lot of stuff going on before we get to purge_txq. Hmm, I guess we should either make sure we remove the station from active_txqs earlier in the sta cleanup process, or maybe it'd enough to just check the removed flag in the tasklet? Does the below patch fix the issue? -Toke diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 9c889da48ef0..8fa3c09d041c 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -258,6 +258,9 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) sta = container_of(txqi->txq.sta, struct sta_info, sta); + if (sta->removed) + continue; + if (sta->airtime[ac].deficit >= 0) { seen_eligible = true;
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-10-01 23:58, Rajkumar Manoharan wrote: Great! I'll fold in the rest, test it with ath9k and submit as a proper patch :) Toke, I noticed a race condition b/w sta cleanup and kick_airtime tasklet. How do you plan to exit kick_airtime gracefully during sta_cleanup? If kick_airtime tasklet is only used for adjusting deficit for all throttled txq, then below rcu lock issue is not observed. I am testing with 50 clients and the crash happens only during sta cleanup. Releasing active_txq_lock from tasklet is yielding handle to txq_purge(). I am thinking of get rid of tasklet and handle adjustment directly in API. diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 0bb590928dd0..277dbf8e0a4b 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -261,14 +261,7 @@ static void __ieee80211_kick_airtime(struct ieee80211_local *local, int ac) if (sta->airtime[ac].deficit >= 0) { seen_eligible = true; - - if (!test_and_clear_bit(IEEE80211_TXQ_AIRTIME_THROTTLE, - &txqi->flags)) - continue; - - spin_unlock_bh(&local->active_txq_lock[ac]); - drv_wake_tx_queue(local, txqi); - spin_lock_bh(&local->active_txq_lock[ac]); + clear_bit(IEEE80211_TXQ_AIRTIME_THROTTLE, &txqi->flags); -Rajkumar root@OpenWrt:/# [ 363.094702] INFO: rcu_preempt self-detected stall on CPU { 0} (t=2101 jiffies g=2679 c=2678 q=134) [ 363.102709] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.77 #9 [ 363.104717] INFO: rcu_preempt detected stalls on CPUs/tasks: { 0} (detected b y 1, t=2102 jiffies, g=2679, c=2678, q=134) [ 363.104723] Task dump for CPU 0: [ 363.104734] swapper/0 R running 0 0 0 0x1002 [ 363.104763] [] (__schedule) from [] (arch_cpu_idle+0x38/0 x5c) [ 363.104852] [] (arch_cpu_idle) from [] (0x) [ 363.143167] [] (unwind_backtrace) from [] (show_stack+0x1 0/0x14) [ 363.150892] [] (show_stack) from [] (dump_stack+0x80/0xa0 ) [ 363.158099] [] (dump_stack) from [] (rcu_check_callbacks+ 0x230/0x6a8) [ 363.166258] [] (rcu_check_callbacks) from [] (update_proc ess_times+0x38/0x58) [ 363.175110] [] (update_process_times) from [] (tick_sched _timer+0x44/0x74) [ 363.183702] [] (tick_sched_timer) from [] (__run_hrtimer+ 0x50/0xc8) [ 363.191686] [] (__run_hrtimer) from [] (hrtimer_interrupt +0x130/0x27c) [ 363.199936] [] (hrtimer_interrupt) from [] (msm_timer_int errupt+0x38/0x44) [ 363.208528] [] (msm_timer_interrupt) from [] (handle_perc pu_devid_irq+0x68/0x84) [ 363.217645] [] (handle_percpu_devid_irq) from [] (generic _handle_irq+0x20/0x30) [ 363.226671] [] (generic_handle_irq) from [] (handle_IRQ+0 x68/0x90) [ 363.234567] [] (handle_IRQ) from [] (gic_handle_irq+0x3c/ 0x60) [ 363.242120] [] (gic_handle_irq) from [] (__irq_svc+0x40/0 x70) [ 363.249581] Exception stack(0xc085de68 to 0xc085deb0) [ 363.254616] de60: 0004 db7e50a0 009d d8bb8 d18 d8bb8d00 [ 363.262777] de80: d8bb8c20 0002 0030 0018 db7e5000 0006 0 000 c085deb0 [ 363.270935] dea0: bfb68c98 c020cae8 6113 [ 363.275977] [] (__irq_svc) from [] (_test_and_clear_bit+0 x0/0x48) [ 363.283850] [] (_test_and_clear_bit) from [] (ieee80211_k ick_airtime+0x88/0x188 [mac80211]) [ 363.293963] [] (ieee80211_kick_airtime [mac80211]) from [ ] (tasklet_action+0x8c/0xec) [ 363.303406] [] (tasklet_action) from [] (__do_softirq+0x1 04/0x294)
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
Great! I'll fold in the rest, test it with ath9k and submit as a proper patch :) Toke, I noticed a race condition b/w sta cleanup and kick_airtime tasklet. How do you plan to exit kick_airtime gracefully during sta_cleanup? -Rajkumar
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-09-28 04:02, Toke Høiland-Jørgensen wrote: On 28 September 2018 12:47:51 CEST, Rajkumar Manoharan wrote: On 2018-09-28 03:35, Jonathan Morton wrote: On 28 Sep, 2018, at 1:19 pm, Rajkumar Manoharan wrote: This is going to break fairness; we only want to increase deficits when all stations' deficits are negative. Hence the two loops. Did you see any problems with those specifically? No. I didn't see any issue but the loop won't exit until one txq becomes positive. Till then the lock won't be released. Hence I converged into single iteration. The problem is that the fairness behaviour is changed when there are already positive txqs, but the first one happens to not be positive. That's why there were two loops. Yeah.. Understood. we can ignore the cleanup change. Great! I'll fold in the rest, test it with ath9k and submit as a proper patch :) Thanks. I am seeing higher CPU load. Did you observe similar behavior with ath9k. -Rajkumar
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 28 September 2018 12:47:51 CEST, Rajkumar Manoharan wrote: >On 2018-09-28 03:35, Jonathan Morton wrote: >>> On 28 Sep, 2018, at 1:19 pm, Rajkumar Manoharan >>> wrote: >>> This is going to break fairness; we only want to increase deficits when all stations' deficits are negative. Hence the two loops. Did you >see any problems with those specifically? >>> >>> No. I didn't see any issue but the loop won't exit until one txq >>> becomes positive. >>> Till then the lock won't be released. Hence I converged into single >>> iteration. >> >> The problem is that the fairness behaviour is changed when there are >> already positive txqs, but the first one happens to not be positive. >> That's why there were two loops. >> >Yeah.. Understood. we can ignore the cleanup change. Great! I'll fold in the rest, test it with ath9k and submit as a proper patch :) -Toke
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
On 2018-09-28 03:35, Jonathan Morton wrote: On 28 Sep, 2018, at 1:19 pm, Rajkumar Manoharan wrote: This is going to break fairness; we only want to increase deficits when all stations' deficits are negative. Hence the two loops. Did you see any problems with those specifically? No. I didn't see any issue but the loop won't exit until one txq becomes positive. Till then the lock won't be released. Hence I converged into single iteration. The problem is that the fairness behaviour is changed when there are already positive txqs, but the first one happens to not be positive. That's why there were two loops. Yeah.. Understood. we can ignore the cleanup change. -Rajkumar
Re: [Make-wifi-fast] [PATCH RFC v4 3/4] mac80211: Add airtime accounting and scheduling to TXQs
> On 28 Sep, 2018, at 1:19 pm, Rajkumar Manoharan > wrote: > >> This is going to break fairness; we only want to increase deficits when >> all stations' deficits are negative. Hence the two loops. Did you see >> any problems with those specifically? > > No. I didn't see any issue but the loop won't exit until one txq becomes > positive. > Till then the lock won't be released. Hence I converged into single iteration. The problem is that the fairness behaviour is changed when there are already positive txqs, but the first one happens to not be positive. That's why there were two loops. - Jonathan Morton