Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-17 Thread Benjamin Herrenschmidt

On Wed, 2007-10-17 at 08:31 -0700, Stephen Hemminger wrote:
> Please don't use double underscore, for this function name. There is no
> reason to not make it a normal API call.
> 
> The sky2 fix I am working on will use napi_synchronize as well.

Allright. A compiler barrier in the !SMP case makes sense, but I would
still want an smp_mb() before the test_bit. I think it's a bug in
synchronize_irq not to have it.

Cheers,
Ben.
 
> --- a/include/linux/netdevice.h   2007-10-16 16:48:20.0 -0700
> +++ b/include/linux/netdevice.h   2007-10-17 08:29:55.0 -0700
> @@ -407,6 +407,24 @@ static inline void napi_enable(struct na
>   clear_bit(NAPI_STATE_SCHED, &n->state);
>  }
>  
> +#ifdef CONFIG_SMP
> +/**
> + *   napi_synchronize - wait until NAPI is not running
> + *   @n: napi context
> + *
> + * Wait until NAPI is done being scheduled on this context.
> + * Any outstanding processing completes but
> + * does not disable future activations.
> + */
> +static inline void napi_synchronize(const struct napi_struct *n)
> +{
> + while (test_bit(NAPI_STATE_SCHED, &n->state))
> + msleep(1);
> +}
> +#else
> +# define napi_synchronize(n) barrier()
> +#endif
> +
>  /*
>   *   The DEVICE structure.
>   *   Actually, this whole structure is a big mistake.  It mixes I/O

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-17 Thread Stephen Hemminger
Please don't use double underscore, for this function name. There is no
reason to not make it a normal API call.

The sky2 fix I am working on will use napi_synchronize as well.

--- a/include/linux/netdevice.h 2007-10-16 16:48:20.0 -0700
+++ b/include/linux/netdevice.h 2007-10-17 08:29:55.0 -0700
@@ -407,6 +407,24 @@ static inline void napi_enable(struct na
clear_bit(NAPI_STATE_SCHED, &n->state);
 }
 
+#ifdef CONFIG_SMP
+/**
+ * napi_synchronize - wait until NAPI is not running
+ * @n: napi context
+ *
+ * Wait until NAPI is done being scheduled on this context.
+ * Any outstanding processing completes but
+ * does not disable future activations.
+ */
+static inline void napi_synchronize(const struct napi_struct *n)
+{
+   while (test_bit(NAPI_STATE_SCHED, &n->state))
+   msleep(1);
+}
+#else
+# define napi_synchronize(n)   barrier()
+#endif
+
 /*
  * The DEVICE structure.
  * Actually, this whole structure is a big mistake.  It mixes I/O
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt
\
> > Note: unfortunately, Jeff already picked up the EMAC patch without
> > waiting for this to be sorted out (oops...). So if you agree with
> > this patch, it would be nice to have it go in quickly or maybe via
> > Jeff's tree to avoid breakage ? Not terribly important tho.
> 
> 
> Sorry, I thought that was the way everybody was headed.  With the driver 
> broken /anyway/, I just sorta threw it into the pile of fixes.
> 
> It's upstream now, so let me know if you want to revert or move forward 
> from here...

Let's just move forward. I can always re-fix the driver :-)

Ben.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Jeff Garzik

Benjamin Herrenschmidt wrote:

net: Add __napi_synchronize() to sync with napi poll

The EMAC driver which needs to handle multiple devices with one
NAPI instance implements its own per-channel disable bit. However,
when setting such a bit, it needs to synchronize with the poller
(that is make sure that any pending poller instance has completed,
or is started late enough to see that disable bit).

This implements a low level __napi_synchronize() function to acheive
that. The underscores are to emphasis the low level aspect of it and
to discourage driver writers who don't know what they are doing to
use it (to please DaveM :-)

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

Switched to do a cpu_relax() spin instead and only on SMP. Not that
we have an smp_mb() in there which synchronize_irq() lacks. I believe
this is a bug in synchronize_irq() which I'll handle separately.

Note: unfortunately, Jeff already picked up the EMAC patch without
waiting for this to be sorted out (oops...). So if you agree with
this patch, it would be nice to have it go in quickly or maybe via
Jeff's tree to avoid breakage ? Not terribly important tho.



Sorry, I thought that was the way everybody was headed.  With the driver 
broken /anyway/, I just sorta threw it into the pile of fixes.


It's upstream now, so let me know if you want to revert or move forward 
from here...


Jeff


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt
net: Add __napi_synchronize() to sync with napi poll

The EMAC driver which needs to handle multiple devices with one
NAPI instance implements its own per-channel disable bit. However,
when setting such a bit, it needs to synchronize with the poller
(that is make sure that any pending poller instance has completed,
or is started late enough to see that disable bit).

This implements a low level __napi_synchronize() function to acheive
that. The underscores are to emphasis the low level aspect of it and
to discourage driver writers who don't know what they are doing to
use it (to please DaveM :-)

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

Switched to do a cpu_relax() spin instead and only on SMP. Not that
we have an smp_mb() in there which synchronize_irq() lacks. I believe
this is a bug in synchronize_irq() which I'll handle separately.

Note: unfortunately, Jeff already picked up the EMAC patch without
waiting for this to be sorted out (oops...). So if you agree with
this patch, it would be nice to have it go in quickly or maybe via
Jeff's tree to avoid breakage ? Not terribly important tho.

Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-17 12:39:40.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-17 12:40:59.0 
+1000
@@ -394,6 +394,23 @@ static inline void napi_disable(struct n
 }
 
 /**
+ * __napi_synchronize - synchronize with a concurrent poll
+ * @n: napi context
+ *
+ * Synchronizes with a concurrent poll. Not to be used in normal
+ * drivers, mostly useful if you end up with multiple interfaces
+ * on one NAPI instance.
+ */
+static inline void __napi_synchronize(struct napi_struct *n)
+{
+#ifdef CONFIG_SMP
+   smp_mb();
+   while (test_bit(NAPI_STATE_SCHED, &n->state))
+   cpu_relax();
+#endif /* CONFIG_SMP */
+}
+
+/**
  * napi_enable - enable NAPI scheduling
  * @n: napi context
  *


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt
napi: use non-interruptible sleep in napi_disable

The current napi_disable() uses msleep_interruptible() but doesn't
(and can't) exit in case there's a signal, thus ending up doing a
hot spin without a cpu_relax. Use uninterruptible sleep instead.

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-17 12:39:16.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-17 12:45:00.0 
+1000
@@ -390,7 +390,7 @@ static inline void napi_complete(struct 
 static inline void napi_disable(struct napi_struct *n)
 {
while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
-   msleep_interruptible(1);
+   msleep(1);
 }
 
 /**


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt

> So this is really just like synchronize_irq()?  Using msleep is bogus
> because you want to spin, you are only waiting for a softirq on the other
> cpu to finish. If you wait for a whole millisecond and sleep that
> is far longer than the napi routine should take.
> 
> You could even optimize it like synchronize_irq() for the non-SMP case.

It's just like synchronize_irq() indeed. I used the mlseep() just like
napi_disable() mostly because I use it in a very similar context, for
disabling my sub-channels on things like link change etc... where I need
to reconfigure parts of the chip.

I prefer sleeping in my case but I agree that if somebody else was going
to use for something else more performance critical, it might be an
issue. On the other hand, spinning will not be nice for my usage
scenario :-)

I agree about the SMP optimisation though again, in my usage pattern,
it's very unimportant (similar code path as napi_disable)

I'll respin later today though.

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Stephen Hemminger
On Tue, 16 Oct 2007 15:49:52 +1000
Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:

> net: Add __napi_sycnhronize() to sync with napi poll
> 
> The EMAC driver which needs to handle multiple devices with one
> NAPI instance implements its own per-channel disable bit. However,
> when setting such a bit, it needs to synchronize with the poller
> (that is make sure that any pending poller instance has completed,
> or is started late enough to see that disable bit).
> 
> This implements a low level __napi_synchronize() function to acheive
> that. The underscores are to emphasis the low level aspect of it and
> to discourage driver writers who don't know what they are doing to
> use it (to please DaveM :-)
> 
> Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
> ---
> 
> (Use correct address for Stephen this time)
> 
> If the approach is accepted, I would like to have this merged now
> so the EMAC patch to make it work again can follow :-)
> 
> Note: I use msleep_interruptible(1); just like napi_disable(). However
> I'm not too happy that the "hot" loop that results of a pending signal
> here will spin without even a cpu_relax ... what do you guys think would
> be the best way to handle this ?


So this is really just like synchronize_irq()?  Using msleep is bogus
because you want to spin, you are only waiting for a softirq on the other
cpu to finish. If you wait for a whole millisecond and sleep that
is far longer than the napi routine should take.

You could even optimize it like synchronize_irq() for the non-SMP case.



-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread David Miller
From: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
Date: Tue, 16 Oct 2007 17:37:03 +1000

> 
> On Tue, 2007-10-16 at 14:06 +0800, Herbert Xu wrote:
> > Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> > >
> > > Note: I use msleep_interruptible(1); just like napi_disable(). However
> > > I'm not too happy that the "hot" loop that results of a pending signal
> > > here will spin without even a cpu_relax ... what do you guys think would
> > > be the best way to handle this ?
> > 
> > Well since the loop does not check signals at all, it should
> > just use msleep.
> > 
> > Granted the process will end up in the D state and contribute
> > to the load average.  But if this loop executes long enough
> > for that to be noticed then we've got bigger problems to worry
> > about.
> 
> If Dave & Stephen agree, I'll send a patch changing napi_disable() too
> then.

I agree with the msleep() change.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt

On Tue, 2007-10-16 at 14:06 +0800, Herbert Xu wrote:
> Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
> >
> > Note: I use msleep_interruptible(1); just like napi_disable(). However
> > I'm not too happy that the "hot" loop that results of a pending signal
> > here will spin without even a cpu_relax ... what do you guys think would
> > be the best way to handle this ?
> 
> Well since the loop does not check signals at all, it should
> just use msleep.
> 
> Granted the process will end up in the D state and contribute
> to the load average.  But if this loop executes long enough
> for that to be noticed then we've got bigger problems to worry
> about.

If Dave & Stephen agree, I'll send a patch changing napi_disable() too
then.

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-15 Thread Herbert Xu
Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:
>
> Note: I use msleep_interruptible(1); just like napi_disable(). However
> I'm not too happy that the "hot" loop that results of a pending signal
> here will spin without even a cpu_relax ... what do you guys think would
> be the best way to handle this ?

Well since the loop does not check signals at all, it should
just use msleep.

Granted the process will end up in the D state and contribute
to the load average.  But if this loop executes long enough
for that to be noticed then we've got bigger problems to worry
about.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-15 Thread Benjamin Herrenschmidt
net: Add __napi_sycnhronize() to sync with napi poll

The EMAC driver which needs to handle multiple devices with one
NAPI instance implements its own per-channel disable bit. However,
when setting such a bit, it needs to synchronize with the poller
(that is make sure that any pending poller instance has completed,
or is started late enough to see that disable bit).

This implements a low level __napi_synchronize() function to acheive
that. The underscores are to emphasis the low level aspect of it and
to discourage driver writers who don't know what they are doing to
use it (to please DaveM :-)

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

(Use correct address for Stephen this time)

If the approach is accepted, I would like to have this merged now
so the EMAC patch to make it work again can follow :-)

Note: I use msleep_interruptible(1); just like napi_disable(). However
I'm not too happy that the "hot" loop that results of a pending signal
here will spin without even a cpu_relax ... what do you guys think would
be the best way to handle this ?



Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-16 15:27:27.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-16 15:27:38.0 
+1000
@@ -394,6 +394,21 @@ static inline void napi_disable(struct n
 }
 
 /**
+ * __napi_synchronize - synchronize with a concurrent poll
+ * @n: napi context
+ *
+ * Synchronizes with a concurrent poll. Not to be used in normal
+ * drivers, mostly useful if you end up with multiple interfaces
+ * on one NAPI instance.
+ */
+static inline void __napi_synchronize(struct napi_struct *n)
+{
+   smp_mb();
+   while (test_bit(NAPI_STATE_SCHED, &n->state))
+   msleep_interruptible(1);
+}
+
+/**
  * napi_enable - enable NAPI scheduling
  * @n: napi context
  *


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-15 Thread Benjamin Herrenschmidt
net: Add __napi_sycnhronize() to sync with napi poll

The EMAC driver which needs to handle multiple devices with one
NAPI instance implements its own per-channel disable bit. However,
when setting such a bit, it needs to synchronize with the poller
(that is make sure that any pending poller instance has completed,
or is started late enough to see that disable bit).

This implements a low level __napi_synchronize() function to acheive
that. The underscores are to emphasis the low level aspect of it and
to discourage driver writers who don't know what they are doing to
use it (to please DaveM :-)

Signed-off-by: Benjamin Herrenschmidt <[EMAIL PROTECTED]>
---

If the approach is accepted, I would like to have this merged now
so the EMAC patch to make it work again can follow :-)

Note: I use msleep_interruptible(1); just like napi_disable(). However
I'm not too happy that the "hot" loop that results of a pending signal
here will spin without even a cpu_relax ... what do you guys think would
be the best way to handle this ?



Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-16 15:27:27.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-16 15:27:38.0 
+1000
@@ -394,6 +394,21 @@ static inline void napi_disable(struct n
 }
 
 /**
+ * __napi_synchronize - synchronize with a concurrent poll
+ * @n: napi context
+ *
+ * Synchronizes with a concurrent poll. Not to be used in normal
+ * drivers, mostly useful if you end up with multiple interfaces
+ * on one NAPI instance.
+ */
+static inline void __napi_synchronize(struct napi_struct *n)
+{
+   smp_mb();
+   while (test_bit(NAPI_STATE_SCHED, &n->state))
+   msleep_interruptible(1);
+}
+
+/**
  * napi_enable - enable NAPI scheduling
  * @n: napi context
  *


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html