Re: init_timer_deferrable conversion

2007-12-17 Thread Parag Warudkar
On Dec 17, 2007 9:29 AM, Eric Dumazet <[EMAIL PROTECTED]> wrote:
> Parag, could you please try this patch ?
>
> [NET] ARP : Convert neigh garbage collection from softirq to workqueue
>

I will - a little later.

Thanks

Parag
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: init_timer_deferrable conversion

2007-12-17 Thread Parag Warudkar
On Dec 17, 2007 1:13 PM, Stephen Hemminger
<[EMAIL PROTECTED]> wrote:
> On Mon, 17 Dec 2007 12:47:59 -0500
> "Parag Warudkar" <[EMAIL PROTECTED]> wrote:
>
>
> > On Dec 17, 2007 12:00 PM, Stephen Hemminger
> > <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on
> > > > > > Powertop's list of things that cause routine wakeups from idle. 
> > > > > > After
> > > > > > converting to init_timer_deferrable() the wakeups went down and 
> > > > > > this one
> > > > > > no longer shows up in powertop's list. 25% reduction.
> > >
> > > This surprises me because it is a 1 hz timer and uses round_jiffies() in
> > > the current kernel.
> >
> > I am using the current git and I already have low wakeups per second
> > to begin with - 5-7  and out of that 25% are attributed to sky2. Not
> > sure if that matches up with the 1 hz + round_jiffies() logic.
> >
> > But is it conceptually ok to make this deferrable? I suppose yes as
> > it's just a watchdog that checks if the link is up and delaying that
> > would not make a difference?
>
> I think you are going to wake up once a second anyway, so all it
> ends up changing is the accounting. Please check with the powertop
> developers.

As I understand it the advantage of deferrable is that sky2 won't have
to wakeup the CPU just for itself.
It can be coupled with other things that need to wake up the CPU. So
hopefully this isn't just a powertop accounting fixup :)

>
> I'm fine with changing sky2, but it would be good if you could
> go through all the network drivers and fix them as well.
>

Arjan - if there is value in converting netdev watchdogs to deferrable
from a PM perpective I will fix up other drivers as well.

Thanks

Parag
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: init_timer_deferrable conversion

2007-12-17 Thread Stephen Hemminger
On Mon, 17 Dec 2007 12:47:59 -0500
"Parag Warudkar" <[EMAIL PROTECTED]> wrote:

> On Dec 17, 2007 12:00 PM, Stephen Hemminger
> <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on
> > > > > Powertop's list of things that cause routine wakeups from idle. After
> > > > > converting to init_timer_deferrable() the wakeups went down and this 
> > > > > one
> > > > > no longer shows up in powertop's list. 25% reduction.
> >
> > This surprises me because it is a 1 hz timer and uses round_jiffies() in
> > the current kernel.
> 
> I am using the current git and I already have low wakeups per second
> to begin with - 5-7  and out of that 25% are attributed to sky2. Not
> sure if that matches up with the 1 hz + round_jiffies() logic.
> 
> But is it conceptually ok to make this deferrable? I suppose yes as
> it's just a watchdog that checks if the link is up and delaying that
> would not make a difference?

I think you are going to wake up once a second anyway, so all it
ends up changing is the accounting. Please check with the powertop
developers.

I'm fine with changing sky2, but it would be good if you could
go through all the network drivers and fix them as well.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: init_timer_deferrable conversion

2007-12-17 Thread Parag Warudkar
On Dec 17, 2007 12:00 PM, Stephen Hemminger
<[EMAIL PROTECTED]> wrote:
> > > >
> > > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on
> > > > Powertop's list of things that cause routine wakeups from idle. After
> > > > converting to init_timer_deferrable() the wakeups went down and this one
> > > > no longer shows up in powertop's list. 25% reduction.
>
> This surprises me because it is a 1 hz timer and uses round_jiffies() in
> the current kernel.

I am using the current git and I already have low wakeups per second
to begin with - 5-7  and out of that 25% are attributed to sky2. Not
sure if that matches up with the 1 hz + round_jiffies() logic.

But is it conceptually ok to make this deferrable? I suppose yes as
it's just a watchdog that checks if the link is up and delaying that
would not make a difference?

Thanks
Parag
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: init_timer_deferrable conversion

2007-12-17 Thread Stephen Hemminger
On Mon, 17 Dec 2007 15:29:43 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

> On Mon, 17 Dec 2007 09:55:04 +0100
> Eric Dumazet <[EMAIL PROTECTED]> wrote:
> 
> > On Sun, 16 Dec 2007 22:00:23 -0500 (EST)
> > Parag Warudkar <[EMAIL PROTECTED]> wrote:
> > 
> > > In my quest to get the wake-ups from idle per second down to bare 
> > > minimum, 
> > > I noticed 3 places in the kernel that could benefit from 
> > > using init_timer_deferrable() instead of init_timer() -
> > > 
> > > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on 
> > > Powertop's list of things that cause routine wakeups from idle. After 
> > > converting to init_timer_deferrable() the wakeups went down and this one 
> > > no longer shows up in powertop's list. 25% reduction.

This surprises me because it is a 1 hz timer and uses round_jiffies() in
the current kernel.



-- 
Stephen Hemminger <[EMAIL PROTECTED]>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: init_timer_deferrable conversion

2007-12-17 Thread Eric Dumazet
On Mon, 17 Dec 2007 09:55:04 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

> On Sun, 16 Dec 2007 22:00:23 -0500 (EST)
> Parag Warudkar <[EMAIL PROTECTED]> wrote:
> 
> > In my quest to get the wake-ups from idle per second down to bare minimum, 
> > I noticed 3 places in the kernel that could benefit from 
> > using init_timer_deferrable() instead of init_timer() -
> > 
> > a) drivers/net/sky2.c - watchdog_timer. This was showing up high on 
> > Powertop's list of things that cause routine wakeups from idle. After 
> > converting to init_timer_deferrable() the wakeups went down and this one 
> > no longer shows up in powertop's list. 25% reduction.
> > 
> > b) kernel/time/clocksource.c - watchdog_timer - same story as sky2.c
> > 
> > c) net/core/neighbour.c - gc_timer - Most benefit from deferrable timer.
> 
> neigh_periodic_timer() is actually doing almost nothing per round, since it
> looks only one slot of hash table. We could probably convert it to a
> workqueue and scan whole table at once.
> 

Parag, could you please try this patch ?

[NET] ARP : Convert neigh garbage collection from softirq to workqueue

Current neigh_periodic_timer() function is fired by timer IRQ, and
scans one hash bucket out of a potentially big number.

As we are supposed to scan whole hash table in 15 seconds, this means
neigh_periodic_timer() can be fired very often. (depending on the number
of concurrent hash entries we stored in this table)

Converting this to a workqueue permits scaning whole table, minimizing icache
pollution, and firing this work every 15 seconds, independantly of hash table
size.

Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

 include/net/neighbour.h |4 -
 net/core/neighbour.c|   89 ++
 2 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a4f2618..fdb9251 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -24,6 +24,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #define NUD_IN_TIMER   (NUD_INCOMPLETE|NUD_REACHABLE|NUD_DELAY|NUD_PROBE)
@@ -155,7 +156,7 @@ struct neigh_table
int gc_thresh2;
int gc_thresh3;
unsigned long   last_flush;
-   struct timer_list   gc_timer;
+   struct delayed_work gc_work;
struct timer_list   proxy_timer;
struct sk_buff_head proxy_queue;
atomic_tentries;
@@ -166,7 +167,6 @@ struct neigh_table
struct neighbour**hash_buckets;
unsigned inthash_mask;
__u32   hash_rnd;
-   unsigned inthash_chain_gc;
struct pneigh_entry **phash_buckets;
 #ifdef CONFIG_PROC_FS
struct proc_dir_entry   *pde;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 4b6dd1e..495ab19 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -637,75 +637,74 @@ static void neigh_connect(struct neighbour *neigh)
hh->hh_output = neigh->ops->hh_output;
 }
 
-static void neigh_periodic_timer(unsigned long arg)
+static void neigh_periodic_work(struct work_struct *work)
 {
-   struct neigh_table *tbl = (struct neigh_table *)arg;
+   struct neigh_table *tbl = container_of(work, struct neigh_table, 
gc_work.work);
struct neighbour *n, **np;
-   unsigned long expire, now = jiffies;
+   unsigned int i;
 
NEIGH_CACHE_STAT_INC(tbl, periodic_gc_runs);
 
-   write_lock(&tbl->lock);
+   write_lock_bh(&tbl->lock);
 
/*
 *  periodically recompute ReachableTime from random function
 */
 
-   if (time_after(now, tbl->last_rand + 300 * HZ)) {
+   if (time_after(jiffies, tbl->last_rand + 300 * HZ)) {
struct neigh_parms *p;
-   tbl->last_rand = now;
+   tbl->last_rand = jiffies;
for (p = &tbl->parms; p; p = p->next)
p->reachable_time =
neigh_rand_reach_time(p->base_reachable_time);
}
 
-   np = &tbl->hash_buckets[tbl->hash_chain_gc];
-   tbl->hash_chain_gc = ((tbl->hash_chain_gc + 1) & tbl->hash_mask);
+   for (i = 0 ; i <= tbl->hash_mask; i++) {
+   np = &tbl->hash_buckets[i];
 
-   while ((n = *np) != NULL) {
-   unsigned int state;
+   while ((n = *np) != NULL) {
+   unsigned int state;
 
-   write_lock(&n->lock);
+   write_lock(&n->lock);
 
-   state = n->nud_state;
-   if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
-   write_unlock(&n->lock);
-   goto next_elt;
-   }
+   state = n->nud_state;
+   if (state & (NUD_PERMANENT | NUD_IN_TIMER)) {
+   write_unlock(&n->lock);
+ 

Re: init_timer_deferrable conversion

2007-12-17 Thread Eric Dumazet
On Sun, 16 Dec 2007 22:00:23 -0500 (EST)
Parag Warudkar <[EMAIL PROTECTED]> wrote:

> In my quest to get the wake-ups from idle per second down to bare minimum, 
> I noticed 3 places in the kernel that could benefit from 
> using init_timer_deferrable() instead of init_timer() -
> 
> a) drivers/net/sky2.c - watchdog_timer. This was showing up high on 
> Powertop's list of things that cause routine wakeups from idle. After 
> converting to init_timer_deferrable() the wakeups went down and this one 
> no longer shows up in powertop's list. 25% reduction.
> 
> b) kernel/time/clocksource.c - watchdog_timer - same story as sky2.c
> 
> c) net/core/neighbour.c - gc_timer - Most benefit from deferrable timer.

neigh_periodic_timer() is actually doing almost nothing per round, since it
looks only one slot of hash table. We could probably convert it to a
workqueue and scan whole table at once.

> 
> I am running a kernel with above changes and haven't noticed any immediate 
> problems and the wakeups-from-idle have gone down from 5-7 to mere 1-2 per 
> second.
> 
> Is there any reason not to make the above timers deferrable - corner 
> cases, other side effects? If not then I will submit a patch.
> 
> Thanks
> 
> Parag
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html