Re: init_timer_deferrable conversion
On Dec 17, 2007 9:29 AM, Eric Dumazet <[EMAIL PROTECTED]> wrote: > Parag, could you please try this patch ? > > [NET] ARP : Convert neigh garbage collection from softirq to workqueue > I will - a little later. Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: init_timer_deferrable conversion
On Dec 17, 2007 1:13 PM, Stephen Hemminger <[EMAIL PROTECTED]> wrote: > On Mon, 17 Dec 2007 12:47:59 -0500 > "Parag Warudkar" <[EMAIL PROTECTED]> wrote: > > > > On Dec 17, 2007 12:00 PM, Stephen Hemminger > > <[EMAIL PROTECTED]> wrote: > > > > > > > > > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on > > > > > > Powertop's list of things that cause routine wakeups from idle. > > > > > > After > > > > > > converting to init_timer_deferrable() the wakeups went down and > > > > > > this one > > > > > > no longer shows up in powertop's list. 25% reduction. > > > > > > This surprises me because it is a 1 hz timer and uses round_jiffies() in > > > the current kernel. > > > > I am using the current git and I already have low wakeups per second > > to begin with - 5-7 and out of that 25% are attributed to sky2. Not > > sure if that matches up with the 1 hz + round_jiffies() logic. > > > > But is it conceptually ok to make this deferrable? I suppose yes as > > it's just a watchdog that checks if the link is up and delaying that > > would not make a difference? > > I think you are going to wake up once a second anyway, so all it > ends up changing is the accounting. Please check with the powertop > developers. As I understand it the advantage of deferrable is that sky2 won't have to wakeup the CPU just for itself. It can be coupled with other things that need to wake up the CPU. So hopefully this isn't just a powertop accounting fixup :) > > I'm fine with changing sky2, but it would be good if you could > go through all the network drivers and fix them as well. > Arjan - if there is value in converting netdev watchdogs to deferrable from a PM perpective I will fix up other drivers as well. Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: init_timer_deferrable conversion
On Mon, 17 Dec 2007 12:47:59 -0500 "Parag Warudkar" <[EMAIL PROTECTED]> wrote: > On Dec 17, 2007 12:00 PM, Stephen Hemminger > <[EMAIL PROTECTED]> wrote: > > > > > > > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on > > > > > Powertop's list of things that cause routine wakeups from idle. After > > > > > converting to init_timer_deferrable() the wakeups went down and this > > > > > one > > > > > no longer shows up in powertop's list. 25% reduction. > > > > This surprises me because it is a 1 hz timer and uses round_jiffies() in > > the current kernel. > > I am using the current git and I already have low wakeups per second > to begin with - 5-7 and out of that 25% are attributed to sky2. Not > sure if that matches up with the 1 hz + round_jiffies() logic. > > But is it conceptually ok to make this deferrable? I suppose yes as > it's just a watchdog that checks if the link is up and delaying that > would not make a difference? I think you are going to wake up once a second anyway, so all it ends up changing is the accounting. Please check with the powertop developers. I'm fine with changing sky2, but it would be good if you could go through all the network drivers and fix them as well. -- Stephen Hemminger <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: init_timer_deferrable conversion
On Dec 17, 2007 12:00 PM, Stephen Hemminger <[EMAIL PROTECTED]> wrote: > > > > > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on > > > > Powertop's list of things that cause routine wakeups from idle. After > > > > converting to init_timer_deferrable() the wakeups went down and this one > > > > no longer shows up in powertop's list. 25% reduction. > > This surprises me because it is a 1 hz timer and uses round_jiffies() in > the current kernel. I am using the current git and I already have low wakeups per second to begin with - 5-7 and out of that 25% are attributed to sky2. Not sure if that matches up with the 1 hz + round_jiffies() logic. But is it conceptually ok to make this deferrable? I suppose yes as it's just a watchdog that checks if the link is up and delaying that would not make a difference? Thanks Parag -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: init_timer_deferrable conversion
On Mon, 17 Dec 2007 15:29:43 +0100 Eric Dumazet <[EMAIL PROTECTED]> wrote: > On Mon, 17 Dec 2007 09:55:04 +0100 > Eric Dumazet <[EMAIL PROTECTED]> wrote: > > > On Sun, 16 Dec 2007 22:00:23 -0500 (EST) > > Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > > > In my quest to get the wake-ups from idle per second down to bare > > > minimum, > > > I noticed 3 places in the kernel that could benefit from > > > using init_timer_deferrable() instead of init_timer() - > > > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on > > > Powertop's list of things that cause routine wakeups from idle. After > > > converting to init_timer_deferrable() the wakeups went down and this one > > > no longer shows up in powertop's list. 25% reduction. This surprises me because it is a 1 hz timer and uses round_jiffies() in the current kernel. -- Stephen Hemminger <[EMAIL PROTECTED]> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: init_timer_deferrable conversion
On Mon, 17 Dec 2007 09:55:04 +0100 Eric Dumazet <[EMAIL PROTECTED]> wrote: > On Sun, 16 Dec 2007 22:00:23 -0500 (EST) > Parag Warudkar <[EMAIL PROTECTED]> wrote: > > > In my quest to get the wake-ups from idle per second down to bare minimum, > > I noticed 3 places in the kernel that could benefit from > > using init_timer_deferrable() instead of init_timer() - > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on > > Powertop's list of things that cause routine wakeups from idle. After > > converting to init_timer_deferrable() the wakeups went down and this one > > no longer shows up in powertop's list. 25% reduction. > > > > b) kernel/time/clocksource.c - watchdog_timer - same story as sky2.c > > > > c) net/core/neighbour.c - gc_timer - Most benefit from deferrable timer. > > neigh_periodic_timer() is actually doing almost nothing per round, since it > looks only one slot of hash table. We could probably convert it to a > workqueue and scan whole table at once. > Parag, could you please try this patch ? [NET] ARP : Convert neigh garbage collection from softirq to workqueue Current neigh_periodic_timer() function is fired by timer IRQ, and scans one hash bucket out of a potentially big number. As we are supposed to scan whole hash table in 15 seconds, this means neigh_periodic_timer() can be fired very often. (depending on the number of concurrent hash entries we stored in this table) Converting this to a workqueue permits scaning whole table, minimizing icache pollution, and firing this work every 15 seconds, independantly of hash table size. Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> include/net/neighbour.h |4 - net/core/neighbour.c| 89 ++ 2 files changed, 45 insertions(+), 48 deletions(-) diff --git a/include/net/neighbour.h b/include/net/neighbour.h index a4f2618..fdb9251 100644 --- a/include/net/neighbour.h +++ b/include/net/neighbour.h @@ -24,6 +24,7 @@ #include #include +#include #include #define NUD_IN_TIMER (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE) @@ -155,7 +156,7 @@ struct neigh_table int gc_thresh2; int gc_thresh3; unsigned long last_flush; - struct timer_list gc_timer; + struct delayed_work gc_work; struct timer_list proxy_timer; struct sk_buff_head proxy_queue; atomic_tentries; @@ -166,7 +167,6 @@ struct neigh_table struct neighbour**hash_buckets; unsigned inthash_mask; __u32 hash_rnd; - unsigned inthash_chain_gc; struct pneigh_entry **phash_buckets; #ifdef CONFIG_PROC_FS struct proc_dir_entry *pde; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 4b6dd1e..495ab19 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -637,75 +637,74 @@ static void neigh_connect(struct neighbour *neigh) hh->hh_output = neigh->ops->hh_output; } -static void neigh_periodic_timer(unsigned long arg) +static void neigh_periodic_work(struct work_struct *work) { - struct neigh_table *tbl = (struct neigh_table *)arg; + struct neigh_table *tbl = container_of(work, struct neigh_table, gc_work.work); struct neighbour *n, **np; - unsigned long expire, now = jiffies; + unsigned int i; NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs); - write_lock(&tbl->lock); + write_lock_bh(&tbl->lock); /* * periodically recompute ReachableTime from random function */ - if (time_after(now, tbl->last_rand + 300 * HZ)) { + if (time_after(jiffies, tbl->last_rand + 300 * HZ)) { struct neigh_parms *p; - tbl->last_rand = now; + tbl->last_rand = jiffies; for (p = &tbl->parms; p; p = p->next) p->reachable_time = neigh_rand_reach_time(p->base_reachable_time); } - np = &tbl->hash_buckets[tbl->hash_chain_gc]; - tbl->hash_chain_gc = ((tbl->hash_chain_gc + 1) & tbl->hash_mask); + for (i = 0 ; i <= tbl->hash_mask; i++) { + np = &tbl->hash_buckets[i]; - while ((n = *np) != NULL) { - unsigned int state; + while ((n = *np) != NULL) { + unsigned int state; - write_lock(&n->lock); + write_lock(&n->lock); - state = n->nud_state; - if (state & (NUD_PERMANENT | NUD_IN_TIMER)) { - write_unlock(&n->lock); - goto next_elt; - } + state = n->nud_state; + if (state & (NUD_PERMANENT | NUD_IN_TIMER)) { + write_unlock(&n->lock); +
Re: init_timer_deferrable conversion
On Sun, 16 Dec 2007 22:00:23 -0500 (EST) Parag Warudkar <[EMAIL PROTECTED]> wrote: > In my quest to get the wake-ups from idle per second down to bare minimum, > I noticed 3 places in the kernel that could benefit from > using init_timer_deferrable() instead of init_timer() - > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on > Powertop's list of things that cause routine wakeups from idle. After > converting to init_timer_deferrable() the wakeups went down and this one > no longer shows up in powertop's list. 25% reduction. > > b) kernel/time/clocksource.c - watchdog_timer - same story as sky2.c > > c) net/core/neighbour.c - gc_timer - Most benefit from deferrable timer. neigh_periodic_timer() is actually doing almost nothing per round, since it looks only one slot of hash table. We could probably convert it to a workqueue and scan whole table at once. > > I am running a kernel with above changes and haven't noticed any immediate > problems and the wakeups-from-idle have gone down from 5-7 to mere 1-2 per > second. > > Is there any reason not to make the above timers deferrable - corner > cases, other side effects? If not then I will submit a patch. > > Thanks > > Parag > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html