Re: new NAPI interface broken

2007-10-16 Thread Benjamin Herrenschmidt
Jumping on an old train ...

On Wed, 2007-09-12 at 05:50 -0700, David Miller wrote:
 
  This would mean we have a problem on all SMP machines right now.
 
 This is not a correct statement.
 
 Only on your platform do network device interrupts get moved
 around, no other platform does this.
 
 Sparc64 doesn't, all interrupts stay in one location after
 the cpu is initially choosen.
 
 x86 and x86_64 specifically do not move around network
 device interrupts, even though other device types do
 get dynamic IRQ cpu distribution.
 
 That's why you are the only person seeing this problem.
 
 I agree that it should be fixed, but we should also fix the IRQ
 distribution scheme used on powerpc platforms which is totally
 broken in these cases.

So the powerpc platform just honors the affinity mask, and depending on
the PIC does things that range from nothing to spreading interrupts to
CPUs in the affinity mask.

All interrupts by defaults are spread to all CPUs (full balancing).

At this stage, it's afaik userland business to enforce different
policies by changing the affinities via /proc/irq/*.

Do you have any pointer to how that is done on x86 or sparc64 ? On my
x86 laptop using ubuntu gutsy, I definitely see the IRQ on which the
network card is connected (e1000) happily spread between the 2 cores
just like powerpc would do.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt

On Tue, 2007-10-16 at 14:06 +0800, Herbert Xu wrote:
 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
 
  Note: I use msleep_interruptible(1); just like napi_disable(). However
  I'm not too happy that the hot loop that results of a pending signal
  here will spin without even a cpu_relax ... what do you guys think would
  be the best way to handle this ?
 
 Well since the loop does not check signals at all, it should
 just use msleep.
 
 Granted the process will end up in the D state and contribute
 to the load average.  But if this loop executes long enough
 for that to be noticed then we've got bigger problems to worry
 about.

If Dave  Stephen agree, I'll send a patch changing napi_disable() too
then.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: new NAPI interface broken

2007-10-16 Thread Benjamin Herrenschmidt
 So the powerpc platform just honors the affinity mask, and depending on
 the PIC does things that range from nothing to spreading interrupts to
 CPUs in the affinity mask.
 
 All interrupts by defaults are spread to all CPUs (full balancing).
 
 At this stage, it's afaik userland business to enforce different
 policies by changing the affinities via /proc/irq/*.
 
 Do you have any pointer to how that is done on x86 or sparc64 ? On my
 x86 laptop using ubuntu gutsy, I definitely see the IRQ on which the
 network card is connected (e1000) happily spread between the 2 cores
 just like powerpc would do.

More specifically, IRQF_NOBALANCING doesn't seem to be set anywhere
except a few arch specific timer interrupts etc... nowhere I can see in
network drivers or the network stack (the stack wouldn't know what IRQ
anyway since not all drivers set netdev-irq).

We currently don't have a balance kthread like x86 has, though I wonder
if we should move this one out of x86 and make it generic (hell, it's
even hidden in the IO_APIC code :-) But at this stage, HW balancing by
the PIC is the norm and seems to be happening.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: new NAPI interface broken

2007-10-16 Thread Benjamin Herrenschmidt

On Tue, 2007-10-16 at 00:44 -0700, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Tue, 16 Oct 2007 17:29:47 +1000
 
  Do you have any pointer to how that is done on x86 or sparc64 ?
 
 Sparc64 does it statically in the kernel.
 
 For x86, see http://irqbalance.org/

Allright, so that's an out of tree userland thingy... (which may well
work on ppc too I suppose). Definitely not installed by default by my
distro so IRQs from the network cards on all x86's using ubuntu gutsy at
least are spread to all CPUs :-)

There's also a balance kernel thread in x86 with has a notion of irq
that isn't moveable but that flag isn't set by any driver.

Cheers,
Ben.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: new NAPI interface broken

2007-10-16 Thread Benjamin Herrenschmidt

On Tue, 2007-10-16 at 01:31 -0700, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Tue, 16 Oct 2007 18:28:56 +1000
 
  Allright, so that's an out of tree userland thingy... (which may well
  work on ppc too I suppose). Definitely not installed by default by my
  distro so IRQs from the network cards on all x86's using ubuntu gutsy at
  least are spread to all CPUs :-)
 
 But the thing does treat network interfaces differently from
 other devices.
 
 Arjan ran over to my table at kernel summit when I presented this
 topic to the audience, to remind me of all of this and he should be
 included in on any discussions about this topic. :-)
 
 I've CC:'d him.

As far as I know, the x86 in-kernel thingy doesn't but yeah, the
userland one seems much more evolved and does things based on the
class of the device.

Christoph, have any of you tried it on powerpc ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt

 So this is really just like synchronize_irq()?  Using msleep is bogus
 because you want to spin, you are only waiting for a softirq on the other
 cpu to finish. If you wait for a whole millisecond and sleep that
 is far longer than the napi routine should take.
 
 You could even optimize it like synchronize_irq() for the non-SMP case.

It's just like synchronize_irq() indeed. I used the mlseep() just like
napi_disable() mostly because I use it in a very similar context, for
disabling my sub-channels on things like link change etc... where I need
to reconfigure parts of the chip.

I prefer sleeping in my case but I agree that if somebody else was going
to use for something else more performance critical, it might be an
issue. On the other hand, spinning will not be nice for my usage
scenario :-)

I agree about the SMP optimisation though again, in my usage pattern,
it's very unimportant (similar code path as napi_disable)

I'll respin later today though.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_napi_add vs. multiple netdev's

2007-10-16 Thread Benjamin Herrenschmidt

On Tue, 2007-10-16 at 15:16 -0700, David Miller wrote:
 And it's unfortunately buggy.
 
 For example, the sky2_suspend() code iterates over the ports
 and calls sky2_down() on each one that is up.
 
 That will call napi_disable() on the same NAPI instance if
 multiple ports are up, and thus hang the machine.
 
 This is why, Stephen, for the second time, I'm telling you
 that perhaps you should put the NAPI instance into the
 sky2_port struct.

Can sky2 send/mask interrupts separately per port ? The reason why I
can't do that with EMAC is why I'm doing multiple ports per NAPI
instance...

If sky2 sticks to doing like EMAC, then it probably also needs to do
like EMAC which is to have it's own port_disable/enable that use local
per-port disable bits. (It may do a real napi_disable when all ports are
disabled, I suppose, though I don't currently do it in EMAC).

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Fix new EMAC driver for NAPI changes

2007-10-16 Thread Benjamin Herrenschmidt

On Tue, 2007-10-16 at 21:16 -0400, Jeff Garzik wrote:
 Benjamin Herrenschmidt wrote:
  net: Fix new EMAC driver for NAPI changes
  
  This fixes the new EMAC driver for the NAPI updates. The previous patch
  by Roland Dreier (already applied) to do that doesn't actually work. This
  applies on top of it makes it work on my test Ebony machine.
  
  This patch depends on net: Add __napi_sycnhronize() to sync with napi poll
  posted previously.
  
  Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
  ---
  
  The old EMAC driver does things a bit differently (doesn't do useful
  locking :-) and seems to work with Roland patch. So I'm not going to
  touch it unless somebody reports me that it has problems
 
 applied

Beware that the depend patch for __napi_synchronize() is still being
discussed :-)

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt
napi: use non-interruptible sleep in napi_disable

The current napi_disable() uses msleep_interruptible() but doesn't
(and can't) exit in case there's a signal, thus ending up doing a
hot spin without a cpu_relax. Use uninterruptible sleep instead.

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-17 12:39:16.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-17 12:45:00.0 
+1000
@@ -390,7 +390,7 @@ static inline void napi_complete(struct 
 static inline void napi_disable(struct napi_struct *n)
 {
while (test_and_set_bit(NAPI_STATE_SCHED, n-state))
-   msleep_interruptible(1);
+   msleep(1);
 }
 
 /**


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Benjamin Herrenschmidt
\
  Note: unfortunately, Jeff already picked up the EMAC patch without
  waiting for this to be sorted out (oops...). So if you agree with
  this patch, it would be nice to have it go in quickly or maybe via
  Jeff's tree to avoid breakage ? Not terribly important tho.
 
 
 Sorry, I thought that was the way everybody was headed.  With the driver 
 broken /anyway/, I just sorta threw it into the pile of fixes.
 
 It's upstream now, so let me know if you want to revert or move forward 
 from here...

Let's just move forward. I can always re-fix the driver :-)

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NEW EMAC Fix RGMII build error: use of_device_is_compatible

2007-10-15 Thread Benjamin Herrenschmidt

On Fri, 2007-10-12 at 17:04 +0400, Valentine Barshak wrote:
 Fix build RGMII error: use of_device_is_compatible()
 insteadof now deprecated device_is_compatible() function.
 
 Signed-off-by: Valentine Barshak [EMAIL PROTECTED]

Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]

 ---
  drivers/net/ibm_newemac/rgmii.c |2 +-
  1 files changed, 1 insertion(+), 1 deletion(-)
 
 diff -pruN linux-2.6.orig/drivers/net/ibm_newemac/rgmii.c 
 linux-2.6/drivers/net/ibm_newemac/rgmii.c
 --- linux-2.6.orig/drivers/net/ibm_newemac/rgmii.c2007-10-12 
 16:02:41.0 +0400
 +++ linux-2.6/drivers/net/ibm_newemac/rgmii.c 2007-10-12 16:49:07.0 
 +0400
 @@ -251,7 +251,7 @@ static int __devinit rgmii_probe(struct 
   }
  
   /* Check for RGMII type */
 - if (device_is_compatible(ofdev-node, ibm,rgmii-axon))
 + if (of_device_is_compatible(ofdev-node, ibm,rgmii-axon))
   dev-type = RGMII_AXON;
   else
   dev-type = RGMII_STANDARD;
 ___
 Linuxppc-dev mailing list
 [EMAIL PROTECTED]
 https://ozlabs.org/mailman/listinfo/linuxppc-dev

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PowerPC: Add BCM5248 and Marvell 88E1111 PHY support to NEW EMAC.

2007-10-15 Thread Benjamin Herrenschmidt

On Mon, 2007-10-15 at 14:53 -0400, Jeff Garzik wrote:
 Roland Dreier (2):
ibm_new_emac: Nuke SET_MODULE_OWNER() use
ibm_emac: Convert to use napi_struct independent of struct
 net_device
 

Wow, I'd have loved to be CCed at least on the last one since I was
about to do just that ... Heh.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PowerPC: Add BCM5248 and Marvell 88E1111 PHY support to NEW EMAC.

2007-10-15 Thread Benjamin Herrenschmidt

On Mon, 2007-10-15 at 15:04 -0400, Jeff Garzik wrote:
 I would ideally like a single active patch generator (even if they
 are 
 merely reviewed others work sometimes).
 
 Outside of that, I'm hoping you and the other people listed making 
 changes will self-organize without my help :)

Josh, do you want to be the central point / maintainer for it or do you
want me to do it ? There's a lot of code from me in there and I did this
fork in the first place so I have a pretty good idea of what's going on
in this driver and what still needs to be done :-)

Cheers,
Ben.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-15 Thread Benjamin Herrenschmidt
net: Add __napi_sycnhronize() to sync with napi poll

The EMAC driver which needs to handle multiple devices with one
NAPI instance implements its own per-channel disable bit. However,
when setting such a bit, it needs to synchronize with the poller
(that is make sure that any pending poller instance has completed,
or is started late enough to see that disable bit).

This implements a low level __napi_synchronize() function to acheive
that. The underscores are to emphasis the low level aspect of it and
to discourage driver writers who don't know what they are doing to
use it (to please DaveM :-)

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

If the approach is accepted, I would like to have this merged now
so the EMAC patch to make it work again can follow :-)

Note: I use msleep_interruptible(1); just like napi_disable(). However
I'm not too happy that the hot loop that results of a pending signal
here will spin without even a cpu_relax ... what do you guys think would
be the best way to handle this ?



Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-16 15:27:27.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-16 15:27:38.0 
+1000
@@ -394,6 +394,21 @@ static inline void napi_disable(struct n
 }
 
 /**
+ * __napi_synchronize - synchronize with a concurrent poll
+ * @n: napi context
+ *
+ * Synchronizes with a concurrent poll. Not to be used in normal
+ * drivers, mostly useful if you end up with multiple interfaces
+ * on one NAPI instance.
+ */
+static inline void __napi_synchronize(struct napi_struct *n)
+{
+   smp_mb();
+   while (test_bit(NAPI_STATE_SCHED, n-state))
+   msleep_interruptible(1);
+}
+
+/**
  * napi_enable - enable NAPI scheduling
  * @n: napi context
  *


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: Fix new EMAC driver for NAPI changes

2007-10-15 Thread Benjamin Herrenschmidt
net: Fix new EMAC driver for NAPI changes

This fixes the new EMAC driver for the NAPI updates. The previous patch
by Roland Dreier (already applied) to do that doesn't actually work. This
applies on top of it makes it work on my test Ebony machine.

This patch depends on net: Add __napi_sycnhronize() to sync with napi poll
posted previously.

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

The old EMAC driver does things a bit differently (doesn't do useful
locking :-) and seems to work with Roland patch. So I'm not going to
touch it unless somebody reports me that it has problems


Index: linux-work/drivers/net/ibm_newemac/mal.c
===
--- linux-work.orig/drivers/net/ibm_newemac/mal.c   2007-10-16 
14:51:11.0 +1000
+++ linux-work/drivers/net/ibm_newemac/mal.c2007-10-16 14:59:23.0 
+1000
@@ -45,6 +45,8 @@ int __devinit mal_register_commac(struct
return -EBUSY;
}
 
+   if (list_empty(mal-list))
+   napi_enable(mal-napi);
mal-tx_chan_mask |= commac-tx_chan_mask;
mal-rx_chan_mask |= commac-rx_chan_mask;
list_add(commac-list, mal-list);
@@ -67,6 +69,8 @@ void __devexit mal_unregister_commac(str
mal-tx_chan_mask = ~commac-tx_chan_mask;
mal-rx_chan_mask = ~commac-rx_chan_mask;
list_del_init(commac-list);
+   if (list_empty(mal-list))
+   napi_disable(mal-napi);
 
spin_unlock_irqrestore(mal-lock, flags);
 }
@@ -182,7 +186,7 @@ static inline void mal_enable_eob_irq(st
set_mal_dcrn(mal, MAL_CFG, get_mal_dcrn(mal, MAL_CFG) | MAL_CFG_EOPIE);
 }
 
-/* synchronized by __LINK_STATE_RX_SCHED bit in ndev-state */
+/* synchronized by NAPI state */
 static inline void mal_disable_eob_irq(struct mal_instance *mal)
 {
// XXX might want to cache MAL_CFG as the DCR read can be slow
@@ -317,8 +321,8 @@ void mal_poll_disable(struct mal_instanc
while (test_and_set_bit(MAL_COMMAC_POLL_DISABLED, commac-flags))
msleep(1);
 
-   /* Synchronize with the MAL NAPI poller. */
-   napi_disable(mal-napi);
+   /* Synchronize with the MAL NAPI poller */
+   __napi_synchronize(mal-napi);
 }
 
 void mal_poll_enable(struct mal_instance *mal, struct mal_commac *commac)
@@ -326,7 +330,12 @@ void mal_poll_enable(struct mal_instance
smp_wmb();
clear_bit(MAL_COMMAC_POLL_DISABLED, commac-flags);
 
-   // XXX might want to kick a poll now...
+   /* Feels better to trigger a poll here to catch up with events that
+* may have happened on this channel while disabled. It will most
+* probably be delayed until the next interrupt but that's mostly a
+* non-issue in the context where this is called.
+*/
+   napi_schedule(mal-napi);
 }
 
 static int mal_poll(struct napi_struct *napi, int budget)
@@ -336,8 +345,7 @@ static int mal_poll(struct napi_struct *
int received = 0;
unsigned long flags;
 
-   MAL_DBG2(mal, poll(%d) %d - NL, *budget,
-rx_work_limit);
+   MAL_DBG2(mal, poll(%d) NL, budget);
  again:
/* Process TX skbs */
list_for_each(l, mal-poll_list) {
@@ -528,11 +536,12 @@ static int __devinit mal_probe(struct of
}
 
INIT_LIST_HEAD(mal-poll_list);
-   mal-napi.weight = CONFIG_IBM_NEW_EMAC_POLL_WEIGHT;
-   mal-napi.poll = mal_poll;
INIT_LIST_HEAD(mal-list);
spin_lock_init(mal-lock);
 
+   netif_napi_add(NULL, mal-napi, mal_poll,
+  CONFIG_IBM_NEW_EMAC_POLL_WEIGHT);
+
/* Load power-on reset defaults */
mal_reset(mal);
 


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


netif_napi_add vs. multiple netdev's

2007-10-15 Thread Benjamin Herrenschmidt
Hi Stehphen !

The new netif_napi_add() function takes a netdev argument. In the EMAC
case, there is one NAPI instance working on behalf of multiple netdev's,
so that isn't very useful. For my EMAC patch (just posted to you  the
list), I'm not passing NULL, but I'm wondering what would be a good way
to handle netpoll here...

The way it's currently implemented, there's a list of NAPI's attached to
the netdev, so obviously, that won't work for my usage scenario.

I'm not sure what's the best data structure that would be suitable for
both N ndev's for 1 NAPI and 1 ndev for N NAPI's though... I could
allocate stub list heads and queue those up, but that's a bit gross...

Any better idea ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-15 Thread Benjamin Herrenschmidt
net: Add __napi_sycnhronize() to sync with napi poll

The EMAC driver which needs to handle multiple devices with one
NAPI instance implements its own per-channel disable bit. However,
when setting such a bit, it needs to synchronize with the poller
(that is make sure that any pending poller instance has completed,
or is started late enough to see that disable bit).

This implements a low level __napi_synchronize() function to acheive
that. The underscores are to emphasis the low level aspect of it and
to discourage driver writers who don't know what they are doing to
use it (to please DaveM :-)

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

(Use correct address for Stephen this time)

If the approach is accepted, I would like to have this merged now
so the EMAC patch to make it work again can follow :-)

Note: I use msleep_interruptible(1); just like napi_disable(). However
I'm not too happy that the hot loop that results of a pending signal
here will spin without even a cpu_relax ... what do you guys think would
be the best way to handle this ?



Index: linux-work/include/linux/netdevice.h
===
--- linux-work.orig/include/linux/netdevice.h   2007-10-16 15:27:27.0 
+1000
+++ linux-work/include/linux/netdevice.h2007-10-16 15:27:38.0 
+1000
@@ -394,6 +394,21 @@ static inline void napi_disable(struct n
 }
 
 /**
+ * __napi_synchronize - synchronize with a concurrent poll
+ * @n: napi context
+ *
+ * Synchronizes with a concurrent poll. Not to be used in normal
+ * drivers, mostly useful if you end up with multiple interfaces
+ * on one NAPI instance.
+ */
+static inline void __napi_synchronize(struct napi_struct *n)
+{
+   smp_mb();
+   while (test_bit(NAPI_STATE_SCHED, n-state))
+   msleep_interruptible(1);
+}
+
+/**
  * napi_enable - enable NAPI scheduling
  * @n: napi context
  *


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MSI interrupts and disable_irq

2007-10-14 Thread Benjamin Herrenschmidt

On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
 Yinghai Lu wrote:
  On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:

  Someone around with a MSI capable board? The forcedeth driver does
  dev-irq = pci_dev-irq
  in nv_probe(), especially before pci_enable_msi().
  Does pci_enable_msi() change pci_dev-irq? Then we would disable the
  wrong interrupt
  
 
  the request_irq==setup_irq will make dev-irq = pci_dev-irq.
 

 Where is that?
 Otherwise I would propose the attached patch. My board is not 
 MSI-capable, thus I can't test it myself.

Why not just copy pcidev-irq to dev-irq once ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MSI interrupts and disable_irq

2007-10-14 Thread Benjamin Herrenschmidt

On Sun, 2007-10-14 at 16:15 -0700, Yinghai Lu wrote:
 On 10/14/07, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
 
  On Sun, 2007-10-14 at 09:15 +0200, Manfred Spraul wrote:
   Yinghai Lu wrote:
On 10/13/07, Manfred Spraul [EMAIL PROTECTED] wrote:
   
Someone around with a MSI capable board? The forcedeth driver does
dev-irq = pci_dev-irq
in nv_probe(), especially before pci_enable_msi().
Does pci_enable_msi() change pci_dev-irq? Then we would disable the
wrong interrupt
   
   
the request_irq==setup_irq will make dev-irq = pci_dev-irq.
   
   
   Where is that?
   Otherwise I would propose the attached patch. My board is not
   MSI-capable, thus I can't test it myself.
 
  Why not just copy pcidev-irq to dev-irq once ?
 
 it seems e1000 is using np-pci_dev-irq directly too.

Heh, allright, doesn't matter, I was just proposing to avoid one more
indirection :-)

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Benjamin Herrenschmidt

On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:
 A key problem I was hoping would be solved with your work here was
 the 
 elimination of that post dma_map_sg() split.
 
 If I understood James and Ben correctly, one of the key problems was 
 always in communicating libata's segment boundary needs to the IOMMU
 layers?

Yup. If we can put some constraint in struct device that the dma mapping
code can then look at ... we also need to ensure that what's passed in
for DMA'ing already matches those constraints as well since no-iommu
platforms will basically just keep the dma table as-is.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2][BNX2]: Add iSCSI support to BNX2 devices.

2007-09-27 Thread Benjamin Herrenschmidt

On Thu, 2007-09-27 at 03:49 -0400, Jeff Garzik wrote:
 Benjamin Herrenschmidt wrote:
  On Thu, 2007-09-27 at 03:31 -0400, Jeff Garzik wrote:
  A key problem I was hoping would be solved with your work here was
  the 
  elimination of that post dma_map_sg() split.
 
  If I understood James and Ben correctly, one of the key problems was 
  always in communicating libata's segment boundary needs to the IOMMU
  layers?
  
  Yup. If we can put some constraint in struct device that the dma mapping
  code can then look at ... we also need to ensure that what's passed in
  for DMA'ing already matches those constraints as well since no-iommu
  platforms will basically just keep the dma table as-is.
 
 That's a good point...  no-iommu platforms would need to be updated to 
 do the split for me.  I suppose we can steal that code from swiotlb or 
 somewhere.

Doing the split means being able to grow the sglist... which the dma_*
calls can't do at least not in their current form.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: modpost warning question

2007-07-25 Thread Benjamin Herrenschmidt
On Wed, 2007-07-25 at 02:14 -0500, Kumar Gala wrote:
 I'm seeing the following warning:
 
 WARNING: vmlinux.o(.init.text+0x1acdc): Section mismatch: reference to
 .exit.text:gfar_mdio_exit (between 'gfar_init' and 'gfar_mdio_init')
 
 I don't understand why its not ok to access .exit.text from .init.text
 
 The following addresses the issue, however I don't particularly like it:

Because exit.text is removed when compiling built-in

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Cbe-oss-dev] [PATCH 4/10] spidernet: zero out a pointer.

2007-05-22 Thread Benjamin Herrenschmidt

 Here's a delusional reply: I didn't see any point to it. 
 1) a wmb would add overhead
 2) the hardware is supposed to be looking at the status flag,
anyway, and not misbehaving.
 3) there is a wmb when the descr is actually refilled in such
a way as to actually mean something to the hardware.
 
 All that I really acomplished here is a minor trick to 
 aid in debug printing when looking for something bad.

And the whole thing is moot because 0 is actually a perfectly valid DMA
address :-) I suspect spider will end up trying to hit some internal
register or it's PCIe or whatever it has mapped at 0 internally and will
blow up ... At least on spider, it's not RAM there

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resending: RT patches expose netdev race [was Re: [RFC] [patch 2/2] powerpc 2.6.21-rt1: fix kernel hang and/or panic

2007-05-16 Thread Benjamin Herrenschmidt

 I do not know why sk_buff-head would be null, or
 would be set in a racy kind of way, or why the rt patches
 would cause this. But the evidence implicates that.

Would it be possible that a locking bug in spidernet would cause it
under some circumstances to get a stale skb pointer ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] spidernet: Fix problem sending IP fragments

2007-03-03 Thread Benjamin Herrenschmidt
Geoff, I suspect gelic_net might have the same problem...

Cheers,
Ben.

On Fri, 2007-03-02 at 18:39 +0100, Norbert Eicker wrote:
 On Fri 2.3.2007 00:34, Linas Vepstas wrote:
  On Thu, Mar 01, 2007 at 04:52:54PM -0600, Chris Engel wrote:
   I tried to apply this patch to 2.6.21-rc2 and CHECKSUM_HW appears
   to be changed to CHECKSUM_COMPLETE
 
 Oops. I did not test this on the actual 2.6.21-rc2 before sending it.
 It worked fine for me on 2.6.18.
 
 In the meantime it tested the patch below on 2.6.21.
 
  The use of CHECKSUM_HW was replaced by CHECKSUM_PARTIAL and
  CHECKSUM_COMPLETE on a cae-by-case basis, in the patch series leading
  up to 2.6.19.  In this case, I'm not sure which should have been
  used.
 
 In fact CHECKSUM_COMPLETE seems to be used on the receiving side while
 CHECKSUM_PARTIAL is the one to be used while sending frames. Thus the
 latter is the one to chose.
 
  Norbert, can you resubmit a patch that applies to a more recent
  kernel? p.s. your emailer replaced tabs by spaces ...
 
 so here's the new one:
 
 Fix problem sending IP fragments on spidernet.
 
 Signed-off-by: Norbert Eicker [EMAIL PROTECTED]
 ---
 diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
 index 3b91af8..e3019d5 100644
 --- a/drivers/net/spider_net.c
 +++ b/drivers/net/spider_net.c
 @@ -719,7 +719,7 @@ spider_net_prepare_tx_descr(struct spide
   SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
   spin_unlock_irqrestore(chain-lock, flags);
 
 - if (skb-protocol == htons(ETH_P_IP))
 + if (skb-protocol == htons(ETH_P_IP)  skb-ip_summed == 
 CHECKSUM_PARTIAL)
   switch (skb-nh.iph-protocol) {
   case IPPROTO_TCP:
   hwdescr-dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
 
 ___
 Linuxppc-dev mailing list
 [EMAIL PROTECTED]
 https://ozlabs.org/mailman/listinfo/linuxppc-dev

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ehea: Optional TX/RX path optimized for SMP

2007-03-03 Thread Benjamin Herrenschmidt
On Sat, 2007-03-03 at 04:06 +0100, Andi Kleen wrote:
 Jan-Bernd Themann [EMAIL PROTECTED] writes:
  
  Are there any concerns about this approach?
 
 Yes. You should fix the NAPI code instead of trying to work
 around it.

NAPI is being fixed but the fix will take time to get in. In the
meantime, the solution to get something working is the workaround.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [v3] spidernet : add improved phy support

2007-02-12 Thread Benjamin Herrenschmidt

 +static int bcm5461_enable_fiber(struct mii_phy* phy, int autoneg)
 +{
 + /* select fiber mode, enable 1000 base-X registers */
 + phy_write(phy, MII_NCONFIG, 0xfc0b);
 +
 + if (autoneg) {
 + /* enable fiber with no autonegotiation */
 + phy_write(phy, MII_ADVERTISE, 0x01e0);
 + phy_write(phy, MII_BMCR, 0x1140);
 + } else {
 + /* enable fiber with autonegotiation */
 + phy_write(phy, MII_BMCR, 0x0140);
 + }
 +
 + phy-autoneg = autoneg;
 +
 + return 0;
 +}

Something is backward... either the code or the comments...

Also, I dislike the autoneg bits without using any constants. Why not
use the normal setup_aneg() ? I think what is needed is a set_link_mode
or something like that that takes (fiber/copper) as an argument (or
better, use the proper names as documented by the chip).

Doesn't matter to much right now, if the code works. Have you had a
chance to check you don't break G5s with 5421 sungem ? If yes, then the
patch is good to go for now but we should revisit and/or try to merge
that with the generic PHY layer at one point.

Ben.
 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] spidernet: support for Celleb

2007-02-12 Thread Benjamin Herrenschmidt
On Mon, 2007-02-12 at 17:49 -0600, Linas Vepstas wrote:
 On Wed, Feb 07, 2007 at 05:49:50PM +0900, Ishizaki Kou wrote:
  This is a revised spidernet patch set based on
  netdev-2.6.git#upstream.
 
 Tested this series of patches together with Jen's version 3 patch,
 it worked for me. Code looks reasonable.  Thus
 
 Acked-by: Linas Vepstas [EMAIL PROTECTED]
 
 I'm somewhat unclear as to whethr I should be doing 
 
 Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

If the patch wasn't actually handled and passed-on by you or modified by
you, then no. Acked-by is the way to go.

Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED] too while at
it.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] spidernet: support for Celleb

2007-02-07 Thread Benjamin Herrenschmidt
On Wed, 2007-02-07 at 17:15 -0500, Jeff Garzik wrote:
 Ishizaki Kou wrote:
  This is a revised spidernet patch set based on
  netdev-2.6.git#upstream.
  
  This patch set is merged Jens-san's spidernet patch and works on
  Toshiba Cell reference set (aka Celleb). 
  It requires Jens-san's phy patch
  (http://ozlabs.org/pipermail/linuxppc-dev/2007-February/030987.html).
 
 I'm unsure of the status of the phy patch, so will put this patchset on 
 hold until that is resolved

PHY patch is mostly Ok except I don't like that medium variable Jens
added which isn't used anywhere in sungem_phy. Jens, can you resend a
version without that ? If you need that variable in spidernet itself,
then put it there :-)

Also, the GMII_* constants, are they standard or 54xx specific ? If the
later, then change the name to reflect that.

Once I have the new patch, I'll give it a test on G5 and if it's ok, it
will be good to go for 2.6.21

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Cbe-oss-dev] spidernet: dynamic phy setup code

2007-02-04 Thread Benjamin Herrenschmidt
On Thu, 2007-02-01 at 12:04 +0100, Jens Osterkamp wrote:
 Ishizaki-san,
 
  This patch partially works on celleb but remains 
  following several problems.
  1. It doesn't recover once an ethernet cable which is
 connected to a spider_net card is unpluged. 
 
 My understanding is that you are using the LINK interrupt to detect this.
 For the blade this is not connected but reenabling it wont hurt, I hope.

I would suggest just polling from a delayed work or a timer like sungem
does.

 I still dont see why you need different settings for different speed switches.
 This is getting to a point where access to some hardware would be handy.
 What exact phy are using anyway ?

Yeah, same question...

  Furthermore, we have a problem that poll_link() may succeed even when
  the auto-neg initial setting is for different network switch type,
  and the network card does not work on this case. We retry auto-neg
  with the another initial setting on this case.
 
 See above, could you give some more details why this is the case. Or maybe Ben
 knows more about this ?

No, I'm surprised too.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: spidernet: dynamic phy setup code

2007-02-04 Thread Benjamin Herrenschmidt

 We use bcm5461. There is a possibility that we don't know the appropriate
 setting which is applicable for both type of switches.

Have you tested the existing 54xx code in sungem_phy.c ? We use that
with 5462 at least in K2 and all sorts of 54xx chips and it works
fine... Just setup the right advertisement bits and call setup_aneg in
the PHY ops. You don't even need to implement the forced fallback code
that is in sungem. It's not necessary with most broadcom PHYs as they do
that themselves, just setup aneg and poll the link from a timer, that's
it. Once you get a link, then setup your GMACMODE based on the link
speed.

I don't have the datasheet of the 5461 at hand but I doubt it's any
different... Like other Broadcom 54xx PHYs, it might need some special
initialization code to work around firmware bugs though...

 We didn't investigate for the detail, but we met the following phenomena.
 1. When auto-neg starts with Gbps setting and ethernet card is connected to
a 100Mbps switch, LINK is not detected.
 2. When auto-neg starts with 100/10Mbps setting and ethernet card is 
connected to Gbps switch, LINK is detected (poll_link() succeeds), but
the network is not available.

That is very strange... I would need to review your code in more details
or eventually have HW access to run my own experiments, but none of this
should happen if things are setup properly. Also avoid relying on the
link interrupt, it's a known cause of trouble. Just poll.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: No Subject

2007-02-03 Thread Benjamin Herrenschmidt

 We need to use different auto-neg initial settings between
 for 10/100Mbps ethernet switches and for Gbps ethernet switches.

That is strange ! What PHY chip are you using ? Are you sure it's not
just you not properly configuring the PHY ? Is the datasheet for the PHY
available somewhere ?

 Driver don't know which type of network switch is connected to
 network card, so we try both settings alternately in auto negtiation
 sequences by using a variable is1000.

sungem has an autoneg sequence that falls back to forced speeds but it's
not useful on any modern setup. Your PHY should be perfectly capable to
autoneg on both 1000bT and 10/100bT...

 Furthermore, we have a problem that poll_link() may succeed even when
 the auto-neg initial setting is for different network switch type,
 and the network card does not work on this case. We retry auto-neg
 with the another initial setting on this case.

Ugh ? What is that initial setting bit exactly ? If the link is up, it
should work.

 - spider_net_write_reg(card, SPIDER_NET_GMACST,
 -spider_net_read_reg(card, SPIDER_NET_GMACST));
 - spider_net_write_reg(card, SPIDER_NET_GMACINTEN, 0x4);
 
 These codes are enabling LINK status interrupt which is disabled
 at the beginning of auto-neg.
 Without this operation, auto negotiation works only when a connection
 detected for the first time, and auto negotiation will not work 
 when an ethernet cable is unpluged or pluged.

Most drivers poll the link rather than use an interrupt as those are
often unreliable.

 (3)
 - mii_phy_probe(phy, phy-mii_id);
 It seems that PHY reset is necessary before auto negotiation,
 after a link once went down.

It shouldn't be... again, what PHY are you using ?

 We can't call directly reset routine from driver, so we call
 mii_phy_probe().

If you really need to reset it, then change sungem_phy.c to export the
reset function. But I'm surprised you need that. Another option is to
reset the PHY in your PHY's setup_aneg() function.

 We are still developping the patch as we noted, and we are considering
 to call mii_phy_probe() from spider_net_setup_aneg(), or to call
 reset_one_mii_phy() from bcm54xx_setup_aneg().
 
 We think these (1)-(3) are necessary, but we are afraid that you removed
 them
 by a reason that they causes some trouble in Cell Blade. If so please
 tell us.

You might want to borrow the link state machine from sungem.c instead...
it tends to just work :-)

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: spidernet: dynamic phy setup code

2007-01-26 Thread Benjamin Herrenschmidt
On Fri, 2007-01-26 at 14:09 +0100, Jens Osterkamp wrote:
 This patch modifies the patch submitted by Kou Ishizaki to make it work on the
 blade (http://marc.theaimsgroup.com/?l=linux-netdevm=116593424505539w=2).
 Unfortunately I dont have access to a Celleb so I cannot test it there.
 
 The basic logic behind this is simple : when the interface first comes up
 it tries to detect the phy. When the phy is detected, it is initially set up
 to use copper. A timer is set to check the link status in spidernet_link_phy
 using poll_link. If link check fails more than SPIDER_NET_ANEG_TIMEOUT
 times, it switches to fiber link with autonegotiation. If that fails more
 than SPIDER_NET_ANEG_TIMEOUT times, it switches to fiber link with no
 autonegotiation. If that also fails, it goes back to copper, and so forth.
 If the link comes up, it prints the link abilites and we are done.
 
 The name of the phy found attached to the spider chip is reported to the
 user. Hardcoded values for the timeout are moved to #defines. I put in a
 few comments to make the code more readable.

Can't we have a device-tree property indicating wether to use fiber or
copper ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: spidernet: add improved phy support in sungem_phy.c

2007-01-26 Thread Benjamin Herrenschmidt
On Fri, 2007-01-26 at 14:07 +0100, Jens Osterkamp wrote:
 This patch adds improved version of enable_fiber for both the 5421 and
 the 5461 phy. It is now possible to specify with these wether you want
 autonegotiation or not. This is needed for bladecenter switches where
 some expect autonegotiation and some dont seem to like this at all.
 Depending on this flag it sets phy-autoneg accordingly for the fiber mode.
 
 More importantly it implements proper read_link and poll_link functions
 for both phys which can handle both copper and fiber mode by determining
 the medium first and then branching to the required functions. For fiber
 they all work fine, for copper they are not tested but return the result
 of the genmii_* function anyway which is supposed to work.
 
 A medium variable in the phy struct is introduced to save the medium
 with which the phy is currently used.
 
 The patch moves the genmii_* functions around to avoid foreward declarations.
 
 Signed-off-by: Jens Osterkamp [EMAIL PROTECTED]
 Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Patch is whitespace damaged...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] spidernet: add support for Celleb

2007-01-17 Thread Benjamin Herrenschmidt
On Wed, 2007-01-17 at 19:00 +0900, Ishizaki Kou wrote:
 Dear everyone,
 
 This is a revised version of the patch set for spider_net driver
 that works on Toshiba Cell Refererence Set (aka Celleb).
 
 This patch set is based on netdev-2.6.git#upstream.

Jens, can you give that a go on our blades see if it doesn't break
anything ? We are all at LCA here so it's a bit hard to test !

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/59] sysctl: ipmi remove unnecessary insert_at_head flag

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote:
 From: Eric W. Biederman [EMAIL PROTECTED] - unquoted
 
 With unique sysctl binary numbers setting insert_at_head is pointless.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]

 ---
  drivers/char/ipmi/ipmi_poweroff.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/char/ipmi/ipmi_poweroff.c 
 b/drivers/char/ipmi/ipmi_poweroff.c
 index 9d23136..b3ae65e 100644
 --- a/drivers/char/ipmi/ipmi_poweroff.c
 +++ b/drivers/char/ipmi/ipmi_poweroff.c
 @@ -686,7 +686,7 @@ static int ipmi_poweroff_init (void)
   printk(KERN_INFO PFX Power cycle is enabled.\n);
  
  #ifdef CONFIG_PROC_FS
 - ipmi_table_header = register_sysctl_table(ipmi_root_table, 1);
 + ipmi_table_header = register_sysctl_table(ipmi_root_table, 0);
   if (!ipmi_table_header) {
   printk(KERN_ERR PFX Unable to register powercycle sysctl\n);
   rv = -ENOMEM;

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 36/59] sysctl: C99 convert ctl_tables entries in arch/ppc/kernel/ppc_htab.c

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote:
 From: Eric W. Biederman [EMAIL PROTECTED] - unquoted
 
 And make the mode of the kernel directory 0555 no one is allowed
 to write to sysctl directories.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]

 ---
  arch/ppc/kernel/ppc_htab.c |   11 ---
  1 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/arch/ppc/kernel/ppc_htab.c b/arch/ppc/kernel/ppc_htab.c
 index bd129d3..77b20ff 100644
 --- a/arch/ppc/kernel/ppc_htab.c
 +++ b/arch/ppc/kernel/ppc_htab.c
 @@ -442,11 +442,16 @@ static ctl_table htab_ctl_table[]={
   .mode   = 0644,
   .proc_handler   = proc_dol2crvec,
   },
 - { 0, },
 + {}
  };
  static ctl_table htab_sysctl_root[] = {
 - { 1, kernel, NULL, 0, 0755, htab_ctl_table, },
 - { 0,},
 + {
 + .ctl_name   = CTL_KERN,
 + .procname   = kernel,
 + .mode   = 0555,
 + .child  = htab_ctl_table,
 + },
 + {}
  };
  
  static int __init

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 35/59] sysctl: C99 convert ctl_tables in arch/powerpc/kernel/idle.c

2007-01-16 Thread Benjamin Herrenschmidt
On Tue, 2007-01-16 at 09:39 -0700, Eric W. Biederman wrote:
 From: Eric W. Biederman [EMAIL PROTECTED] - unquoted
 
 This was partially done already and there was no ABI breakage what
 a relief.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]

 ---
  arch/powerpc/kernel/idle.c |   11 ---
  1 files changed, 8 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
 index 8994af3..8b27bb1 100644
 --- a/arch/powerpc/kernel/idle.c
 +++ b/arch/powerpc/kernel/idle.c
 @@ -110,11 +110,16 @@ static ctl_table powersave_nap_ctl_table[]={
   .mode   = 0644,
   .proc_handler   = proc_dointvec,
   },
 - { 0, },
 + {}
  };
  static ctl_table powersave_nap_sysctl_root[] = {
 - { 1, kernel, NULL, 0, 0755, powersave_nap_ctl_table, },
 - { 0,},
 + {
 + .ctl_name   = CTL_KERN,
 + .procname   = kernel,
 + .mode   = 0755,
 + .child  = powersave_nap_ctl_table,
 + },
 + {}
  };
  
  static int __init

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sungem: PHY updates pause fixes

2007-01-04 Thread Benjamin Herrenschmidt
On Thu, 2007-01-04 at 21:57 +0100, Eric Lemoine wrote:
 On 1/4/07, David Miller [EMAIL PROTECTED] wrote:
  From: Eric Lemoine [EMAIL PROTECTED]
  Date: Thu, 4 Jan 2007 21:06:48 +0100
 
   On 1/4/07, David Miller [EMAIL PROTECTED] wrote:
I've applied that patch, thanks.
  
   David, I suppose you've applied the locking patch as well...
 
  No, not yet.
 
  Your locking patches are definitely 2.6.21 material, but
  Ben's changes fix PAUSE handling bugs so I want to
  put his patch into 2.6.20
 
  I'll put your locking bits into 2.6.21, don't worry :-)
 
 I'm relieved now :-)
 
 I was asking because Ben said his patch applied on top of mine.

Well, I did it with yours applied already but it seems like they don't
really conflict so they can be applied pretty much in any order.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] sungem: PHY updates pause fixes

2007-01-02 Thread Benjamin Herrenschmidt
This patch adds support for a few more PHYs used by Apple and fixes
advertising and detecting of Pause (we were missing setting the bit in
MII_ADVERTISE and weren't testing in LPA for all PHYs). I only do it for
gigabit capable PHYs for now.

Note that I currently only advertise pause, not asymetric pause. I don't
know for sure the details there, I suppose I should read a bit more
802.3 references, and I don't now what sungem is capable of, but I
noticed the PCS code (originated from you) does the same.

Unfortunately, whatever switches we have here also seem to only support
asym. pause, so while I did a quick test to verify that pause is
properly enabled on a cross-over cable to another machine, I still get
occasional RX fifo overflows due to pause support lacking on our
internal network.

So let me know if asym. pause is something we can support with sungem.
In which case, it shouldn't be very hard to add in a subsequent patch.

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

(Applies on top of Eric's locking patch)

Index: linux-work/drivers/net/sungem_phy.c
===
--- linux-work.orig/drivers/net/sungem_phy.c2007-01-03 12:04:12.0 
+1100
+++ linux-work/drivers/net/sungem_phy.c 2007-01-03 15:51:39.0 +1100
@@ -3,10 +3,9 @@
  *
  * This file could be shared with other drivers.
  *
- * (c) 2002, Benjamin Herrenscmidt ([EMAIL PROTECTED])
+ * (c) 2002-2007, Benjamin Herrenscmidt ([EMAIL PROTECTED])
  *
  * TODO:
- *  - Implement WOL
  *  - Add support for PHYs that provide an IRQ line
  *  - Eventually moved the entire polling state machine in
  *there (out of the eth driver), so that it can easily be
@@ -15,7 +14,7 @@
  *to read the link status. Figure out why and if it makes
  *sense to do the same (magic aneg ?)
  *  - Apple has some additional power management code for some
- *Broadcom PHYs that they hide from the OpenSource version
+  *Broadcom PHYs that they hide from the OpenSource version
  *of darwin, still need to reverse engineer that
  */
 
@@ -152,6 +151,44 @@ static int bcm5221_suspend(struct mii_ph
return 0;
 }
 
+static int bcm5241_init(struct mii_phy* phy)
+{
+   u16 data;
+
+   data = phy_read(phy, MII_BCM5221_TEST);
+   phy_write(phy, MII_BCM5221_TEST,
+   data | MII_BCM5221_TEST_ENABLE_SHADOWS);
+
+   data = phy_read(phy, MII_BCM5221_SHDOW_AUX_STAT2);
+   phy_write(phy, MII_BCM5221_SHDOW_AUX_STAT2,
+   data | MII_BCM5221_SHDOW_AUX_STAT2_APD);
+
+   data = phy_read(phy, MII_BCM5221_SHDOW_AUX_MODE4);
+   phy_write(phy, MII_BCM5221_SHDOW_AUX_MODE4,
+   data  ~MII_BCM5241_SHDOW_AUX_MODE4_STANDBYPWR);
+
+   data = phy_read(phy, MII_BCM5221_TEST);
+   phy_write(phy, MII_BCM5221_TEST,
+   data  ~MII_BCM5221_TEST_ENABLE_SHADOWS);
+
+   return 0;
+}
+
+static int bcm5241_suspend(struct mii_phy* phy)
+{
+   u16 data;
+
+   data = phy_read(phy, MII_BCM5221_TEST);
+   phy_write(phy, MII_BCM5221_TEST,
+   data | MII_BCM5221_TEST_ENABLE_SHADOWS);
+
+   data = phy_read(phy, MII_BCM5221_SHDOW_AUX_MODE4);
+   phy_write(phy, MII_BCM5221_SHDOW_AUX_MODE4,
+ data | MII_BCM5241_SHDOW_AUX_MODE4_STANDBYPWR);
+
+   return 0;
+}
+
 static int bcm5400_init(struct mii_phy* phy)
 {
u16 data;
@@ -373,6 +410,10 @@ static int bcm54xx_setup_aneg(struct mii
adv |= ADVERTISE_100HALF;
if (advertise  ADVERTISED_100baseT_Full)
adv |= ADVERTISE_100FULL;
+   if (advertise  ADVERTISED_Pause)
+   adv |= ADVERTISE_PAUSE_CAP;
+   if (advertise  ADVERTISED_Asym_Pause)
+   adv |= ADVERTISE_PAUSE_ASYM;
phy_write(phy, MII_ADVERTISE, adv);
 
/* Setup 1000BT advertise */
@@ -441,7 +482,7 @@ static int bcm54xx_read_link(struct mii_
SPEED_1000 :
(phy_BCM5400_link_table[link_mode][1] ? 
SPEED_100 : SPEED_10);
val = phy_read(phy, MII_LPA);
-   phy-pause = ((val  LPA_PAUSE) != 0);
+   phy-pause = (phy-duplex == DUPLEX_FULL)  ((val  LPA_PAUSE) 
!= 0);
}
/* On non-aneg, we assume what we put in BMCR is the speed,
 * though magic-aneg shouldn't prevent this case from occurring
@@ -450,6 +491,28 @@ static int bcm54xx_read_link(struct mii_
return 0;
 }
 
+static int marvell88e_init(struct mii_phy* phy)
+{
+   u16 rev;
+
+   /* magic init sequence for rev 0 */
+   rev = phy_read(phy, MII_PHYSID2)  0x000f;
+   if (rev == 0) {
+   phy_write(phy, 0x1d, 0x000a);
+   phy_write(phy, 0x1e, 0x0821);
+
+   phy_write(phy, 0x1d, 0x0006);
+   phy_write(phy, 0x1e, 0x8600);
+
+   phy_write(phy, 0x1d, 0x000b);
+   phy_write(phy, 0x1e, 0x0100);
+
+   phy_write(phy, 0x1d

Re: [PATCH] sungem: PHY updates pause fixes

2007-01-02 Thread Benjamin Herrenschmidt

 Thanks for finding these bugs, although that's really strange pause
 behavior you are seeing on your switches.
 
 By default, we advertise PAUSE but not ASYM PAUSE in the tg3 driver,
 and I get flow control on every switch I have here.

Yeah, that's strange. I still have the debug values at hand:

 - I advertise 0x5e1
 - I read in LPA 0xc9e1 from the switch
(and my bcm PHY tells me Rx and Tx pause disabled in a separate
register)

Now, I cross-over with a TG3 and I get:

 - I advertise 0x5e1 (hopefully same value)
 - I read in LPA 0xc5e1 from the TG3
(and that other register tells me Rx and Tx pause can be enabled).

 You should try to use flow control, even slower than 1000Mbit links.

Sorry, not sure I parse ;-) You mean allowing pause on 10 and 100 as
well ? I sure can, it's easy to fix.

 That's the only problem I can see, would you mind fixing that and
 I'll put your change into my net-2.6 tree and perhaps play around
 with PAUSE on my switches here?

Sure will do.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] sungem: PHY updates pause fixes (#2)

2007-01-02 Thread Benjamin Herrenschmidt
This patch adds support for a few more PHYs used by Apple and fixes
advertising and detecting of Pause (we were missing setting the bit in
MII_ADVERTISE and weren't testing in LPA for all PHYs).

Note that I currently only advertise pause, not asymetric pause. I don't
know for sure the details there, I suppose I should read a bit more
802.3 references, and I don't now what sungem is capable of, but I
noticed the PCS code (originated from you) does the same.

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

(Applies on top of Eric's locking patch)

This version advertise pause support for all speeds. Note that I think
pause support was never broken for PCS, only for MII.

Index: linux-work/drivers/net/sungem_phy.c
===
--- linux-work.orig/drivers/net/sungem_phy.c2007-01-03 12:04:12.0 
+1100
+++ linux-work/drivers/net/sungem_phy.c 2007-01-03 16:28:55.0 +1100
@@ -3,10 +3,9 @@
  *
  * This file could be shared with other drivers.
  *
- * (c) 2002, Benjamin Herrenscmidt ([EMAIL PROTECTED])
+ * (c) 2002-2007, Benjamin Herrenscmidt ([EMAIL PROTECTED])
  *
  * TODO:
- *  - Implement WOL
  *  - Add support for PHYs that provide an IRQ line
  *  - Eventually moved the entire polling state machine in
  *there (out of the eth driver), so that it can easily be
@@ -152,6 +151,44 @@ static int bcm5221_suspend(struct mii_ph
return 0;
 }
 
+static int bcm5241_init(struct mii_phy* phy)
+{
+   u16 data;
+
+   data = phy_read(phy, MII_BCM5221_TEST);
+   phy_write(phy, MII_BCM5221_TEST,
+   data | MII_BCM5221_TEST_ENABLE_SHADOWS);
+
+   data = phy_read(phy, MII_BCM5221_SHDOW_AUX_STAT2);
+   phy_write(phy, MII_BCM5221_SHDOW_AUX_STAT2,
+   data | MII_BCM5221_SHDOW_AUX_STAT2_APD);
+
+   data = phy_read(phy, MII_BCM5221_SHDOW_AUX_MODE4);
+   phy_write(phy, MII_BCM5221_SHDOW_AUX_MODE4,
+   data  ~MII_BCM5241_SHDOW_AUX_MODE4_STANDBYPWR);
+
+   data = phy_read(phy, MII_BCM5221_TEST);
+   phy_write(phy, MII_BCM5221_TEST,
+   data  ~MII_BCM5221_TEST_ENABLE_SHADOWS);
+
+   return 0;
+}
+
+static int bcm5241_suspend(struct mii_phy* phy)
+{
+   u16 data;
+
+   data = phy_read(phy, MII_BCM5221_TEST);
+   phy_write(phy, MII_BCM5221_TEST,
+   data | MII_BCM5221_TEST_ENABLE_SHADOWS);
+
+   data = phy_read(phy, MII_BCM5221_SHDOW_AUX_MODE4);
+   phy_write(phy, MII_BCM5221_SHDOW_AUX_MODE4,
+ data | MII_BCM5241_SHDOW_AUX_MODE4_STANDBYPWR);
+
+   return 0;
+}
+
 static int bcm5400_init(struct mii_phy* phy)
 {
u16 data;
@@ -373,6 +410,10 @@ static int bcm54xx_setup_aneg(struct mii
adv |= ADVERTISE_100HALF;
if (advertise  ADVERTISED_100baseT_Full)
adv |= ADVERTISE_100FULL;
+   if (advertise  ADVERTISED_Pause)
+   adv |= ADVERTISE_PAUSE_CAP;
+   if (advertise  ADVERTISED_Asym_Pause)
+   adv |= ADVERTISE_PAUSE_ASYM;
phy_write(phy, MII_ADVERTISE, adv);
 
/* Setup 1000BT advertise */
@@ -436,12 +477,15 @@ static int bcm54xx_read_link(struct mii_
val = phy_read(phy, MII_BCM5400_AUXSTATUS);
link_mode = ((val  MII_BCM5400_AUXSTATUS_LINKMODE_MASK) 
 MII_BCM5400_AUXSTATUS_LINKMODE_SHIFT);
-   phy-duplex = phy_BCM5400_link_table[link_mode][0] ? 
DUPLEX_FULL : DUPLEX_HALF;
+   phy-duplex = phy_BCM5400_link_table[link_mode][0] ?
+   DUPLEX_FULL : DUPLEX_HALF;
phy-speed = phy_BCM5400_link_table[link_mode][2] ?
SPEED_1000 :
-   (phy_BCM5400_link_table[link_mode][1] ? 
SPEED_100 : SPEED_10);
+   (phy_BCM5400_link_table[link_mode][1] ?
+SPEED_100 : SPEED_10);
val = phy_read(phy, MII_LPA);
-   phy-pause = ((val  LPA_PAUSE) != 0);
+   phy-pause = (phy-duplex == DUPLEX_FULL) 
+   ((val  LPA_PAUSE) != 0);
}
/* On non-aneg, we assume what we put in BMCR is the speed,
 * though magic-aneg shouldn't prevent this case from occurring
@@ -450,6 +494,28 @@ static int bcm54xx_read_link(struct mii_
return 0;
 }
 
+static int marvell88e_init(struct mii_phy* phy)
+{
+   u16 rev;
+
+   /* magic init sequence for rev 0 */
+   rev = phy_read(phy, MII_PHYSID2)  0x000f;
+   if (rev == 0) {
+   phy_write(phy, 0x1d, 0x000a);
+   phy_write(phy, 0x1e, 0x0821);
+
+   phy_write(phy, 0x1d, 0x0006);
+   phy_write(phy, 0x1e, 0x8600);
+
+   phy_write(phy, 0x1d, 0x000b);
+   phy_write(phy, 0x1e, 0x0100);
+
+   phy_write(phy, 0x1d, 0x0004);
+   phy_write(phy, 0x1e, 0x4850);
+   }
+   return 0;
+}
+
 static int

Re: [PATCH] sungem: PHY updates pause fixes

2007-01-02 Thread Benjamin Herrenschmidt
 The one with only asym. support is a big Cisco Catalyst 3350 (well.. big
 but not that many ports :-)

Ok, I got in the config of the switch with somebody who knows how to
speak ciscong, and it seems that it defaults to flow control desired
for send and off for receive on all ports, which means it will indeed
only advertise support for asymetrical flow control.

We've changed the setting for the port on which my sungem g5 is
connected to desired on both send and receive, and it now advertises
flow control on both, and sungem does pick it up properly.

So it's indeed the default setting of those cisco switches that seem to
not quite match what drivers like sungem or tg3 like.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-29 Thread Benjamin Herrenschmidt
On Thu, 2006-12-28 at 21:05 -0800, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Wed, 13 Dec 2006 15:07:24 +1100
 
  tg3 says
  
  tg3: eth0: Link is up at 1000 Mbps, full duplex.
  tg3: eth0: Flow control is on for TX and on for RX.
  
  but sungem says
  
  eth0: Link is up at 1000 Mbps, full-duplex.
  eth0: Pause is disabled
  
  Hrm... I suppose I need to dig more. No time to do that today though.
 
 I was about to try and debug this, and noticed immediately that I
 didn't recognize any of the code.
 
 Could you look into this, you rewrote all of this stuff and this
 looks like a regression added, because I know this pause stuff
 used to work perfectly when I wrote the original GEM driver. :-)

Heh, it's very possible it's a regression I added indeed. I'll try to
have a look next week. Do you know of anybody who can verify on non-mii
hardware or is pause irrelevant there ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Generic PHY lib vs. locking

2006-12-28 Thread Benjamin Herrenschmidt

 Great!  At last glance, only gianfar, fs_enet, and au1000_eth.  There  
 are one or two others that haven't gone in, yet.  My hope is that  
 your changes will not require any changes to the drivers, but I'll  
 leave that to your discretion.

Unfortunately, it will probably have an impact on them. I'll have a
look.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: netif_poll_enable() barrier

2006-12-28 Thread Benjamin Herrenschmidt
On Thu, 2006-12-28 at 21:09 -0800, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Wed, 20 Dec 2006 14:44:12 +1100
 
  I stumbled accross what might be a bug on out of order architecture:
  
  netif_poll_enable() only does a clear_bit(). However,
  netif_poll_disable/enable pairs are often used as simili-spinlocks.
  
  (netif_poll_enable() has pretty much spin_lock semantics except that it
  schedules instead of looping).
  
  Thus, shouldn't netif_poll_disable() do an smp_wmb(); before clearing
  the bit to make sure that any stores done within the poll-disabled
  section are properly visible to the rest of the system before clearing
  the bit ?
 
 Although I couldn't find a problematic case with any current
 in-tree drivers, it's better to be safe than sorry :-)
 
 So I'll add a smp_mb__before_clear_bit() to netif_poll_enable() :)

Heh, thanks ! :-)

I haven't seen any problematic case neither, though if there was one, it
would result in weird problems very hard to track down, so as you said,
better safe than sorry (unless you see a flaw in my reasoning).

Cheers,

Oh, and happy new year too ! :-)

Ben.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Generic PHY lib vs. locking

2006-12-22 Thread Benjamin Herrenschmidt
On Fri, 2006-12-22 at 10:24 -0500, David Hollis wrote:
 On Fri, 2006-12-22 at 15:07 +1100, Benjamin Herrenschmidt wrote:
  Hi Andy !
  
  I've been looking at porting various drivers (EMAC, sungem,
  spider_net, ...) to the generic PHY stuff. However, I have one
  significant problem here.
  
 
  One solution would be to change it to use a mutex instead of a lock as
  well, though that would change the requirements of where phy_start/stop
  can be called, and use a delayed work queue instead of a timer.
 
 Wouldn't this change also allow it to be used with USB Ethernet devices?
 I was looking at porting the asix.c drive to use the PHY layer but the
 locking killed that effort since USB devices can't do spinlocks without
 hosing things up.

Yes, possibly. At the end of the day, I get the locking in the driver
down to something like:

 - All rx/tx operatations are done between the NAPI poll and the
hard_start_xmit() routine. Locking here is done with the netif_tx_lock
if needed (depends on the driver). Those can't sleep.

 - Everything else is task level and locks against the fast path above
with stopping NAPI poll and stopping tx. I have my
driver_netif_stop/start routines doing that and a bit more (see below).

 - There is at least one issue with set_multicast which is called with
the lock held. What I do here is that my driver_netif_stop above also
take the lock, set a flag tellign set-mcast to say away, release the
lock. start() on the other hand take the lock, clears that flag, checks
if another flag was set by set-mcast telling it was here, and does the
mcast setting if that was the case.

I much prefer that than having most of the driver operations locked with
the tx lock or some other or run at softirq. Some of the reset/recovery
operations can be slow, PHY operations too, and thus it's all better run
at task level. In addition, MDIO access might be able to sleep waiting
on an interrupt signaling the completion of the access.

Ben.
 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Generic PHY lib vs. locking

2006-12-21 Thread Benjamin Herrenschmidt
Hi Andy !

I've been looking at porting various drivers (EMAC, sungem,
spider_net, ...) to the generic PHY stuff. However, I have one
significant problem here.

One of the things I've been trying to do lately with EMAC and that I
plan to do with others, is to have the PHY polling entirely operate at
task level (along with other slow parts of the network driver like
timeout handling etc...).

This makes a lot of locking easier, allowing to use mutexes instead of
locks (less latencies), allowing to sleep waiting for MDIO operations to
complete, etc... it's generall all benefit.

It's especially useful in a case like EMAC where several EMACs can share
MDIO lines, so we need exclusive access, and that might involve even a
second layer of exclusion for access to the RGMII or ZMII layer. mutexes
are really the best approach for that sort of non-speed critical
activities.

However, the generic PHY layer defeats that by it's heavy usage of
spin_lock_bh and timer.

One solution would be to change it to use a mutex instead of a lock as
well, though that would change the requirements of where phy_start/stop
can be called, and use a delayed work queue instead of a timer.

I could do all of these changes provided everybody agrees, though I
suppose all existing network drivers using that PHY layer might need to
be adapted. How many do we have nowadays ?

Also, I see your comments about not calling flush_scheduled_work() in
phy_stop() because of rtnl_lock()... What is the problem here ?
dev_close() ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


netif_poll_enable() barrier

2006-12-19 Thread Benjamin Herrenschmidt
Hi !

I stumbled accross what might be a bug on out of order architecture:

netif_poll_enable() only does a clear_bit(). However,
netif_poll_disable/enable pairs are often used as simili-spinlocks.

(netif_poll_enable() has pretty much spin_lock semantics except that it
schedules instead of looping).

Thus, shouldn't netif_poll_disable() do an smp_wmb(); before clearing
the bit to make sure that any stores done within the poll-disabled
section are properly visible to the rest of the system before clearing
the bit ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-17 Thread Benjamin Herrenschmidt

 Thanks for testing this stuff.
 
 I'll take a look at the pause-enabling issue in the sungem
 drive then work on integrating Eric's patch.

Ok, thanks.

 Probably we'll need to put this in post-2.6.20 as the merge
 window is closed.

Yup, no hurry there.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-14 Thread Benjamin Herrenschmidt
On Tue, 2006-12-12 at 06:49 +0100, Eric Lemoine wrote:
 On 12/12/06, Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
  On Tue, 2006-12-12 at 06:33 +0100, Eric Lemoine wrote:
   On 12/12/06, David Miller [EMAIL PROTECTED] wrote:
[...]
Anyways, Eric your changes look fine as far as I can tell, can you
give them a really good testing on some SMP boxes?
  
   Unfortunately I can't, I don't have the hardware (only an old ibook here).
 
  I do however, I'll give it a beating on a dual G5 as soon as I get a
  chance. I'm pretty swamped at the moment and the box is used by somebody
  else today.
 
 Ok, thanks a lot Benjamin.

Patched driver's been running fine for a couple of days  nights with
constant beating... just those RX MAC fifo overflows every now and then
(though they cause no data corruption and no big hit on the driver perfs
neither). I suppose still worth investigating when I have a bit of time,
I must have done something stupid with the pause settings.

In the meantime, Eric's patch is all good.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] split NAPI from network device.

2006-12-14 Thread Benjamin Herrenschmidt
On Wed, 2006-12-13 at 15:46 -0800, Stephen Hemminger wrote:
 Split off NAPI part from network device, this patch is build tested
 only! It breaks kernel API for network devices, and only three examples
 are fixed (skge, sky2, and tg3).
 
 1. Decomposition allows different NAPI - network device
Some hardware has N devices for one IRQ, others like MSI-X
want multiple receive's for one device.
 
 2. Cleanup locking with netpoll
 
 3. Change poll callback arguements and semantics
 
 4. Make softnet_data static (only in dev.c)

Thanks !

I'll give a go at adapting emac and maybe a few more when I get 5mn to
spare...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] add init_dummy_netdev() for napi only dummy intefaces

2006-12-13 Thread Benjamin Herrenschmidt
On Wed, 2006-12-13 at 11:35 -0800, Stephen Hemminger wrote:

 Could we split off the NAPI context part of network_device instead?
 I'll work up something for 2.6.21.

That would do the trick too... though I prefer not putting my hands in
network_device too much myself :-)

Ben

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-12 Thread Benjamin Herrenschmidt
On Wed, 2006-12-13 at 14:12 +1100, Benjamin Herrenschmidt wrote:
 Been hitting a raw throughput in both directions plus a few other things
 on a dual G5 and the driver didn't crash :-)
 
 I'm seeing a problem though but I'm not sure it's related to your patch,
 I'll have to test without it.
 
 Basically, if I use a slightly modified versio of tridge's socklib (raw
 xput test, basically, a tcp connection with one side pushing as fast as
 it can a known pattern and the other one just receiving and verifying
 the data integrity), it works fine when running only one side (either rx
 or tx, doesn't matter).
 
 But if I start it both ways (that is both a receiver and a sender on the
 GMAC box) and the other end is a tg3 (quad g5), then I'm getting a lot
 of eth0: RX MAC fifo overflow smac[02045822 but there are other numbers
 here every now and then] errors on the sungem side.
 
 David, could that be the pause stuff not working properly ?
 
 The link is gigabit and the tg3 side doesn't complain about anything.

I've verified that the problem isn't related to Eric's patch and is
there without it... I suppose I just never noticed it before :-)

So far, the patch looks solid. I'll let it run overnight and will holler
if things went wrong tomorrow.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] drivers/net: spidernet driver on Celleb

2006-12-12 Thread Benjamin Herrenschmidt
On Tue, 2006-12-12 at 19:14 -0600, Linas Vepstas wrote:
 On Tue, Dec 12, 2006 at 02:25:50PM +0900, Ishizaki Kou wrote:
  
  Following are the changes.
  -This patch enables auto-negotiation.
  -Loading firmware is done when spidernet_open() is called.
  -And this patch adds other several small changes for Celleb. 
  -This patch is not tested on CellBlade.
 
 I just tested this, and it does not work. Jim Lewis is gone
 until the new year. However, as he was leaving, he grumbled something
 about how autonegotiation simply won't work on the spider. 
 (I didn't think to ask about the details). Perhaps he'll 
 look at his email soon?

Duh ? Autoneg is completely local to the PHY. It will not work on the
Cell blade because it's using a fiber link though, thus we probably need
to disable this code when running on a fiber link.
 
 I've been trying to figure out how to modify the patch to make it 
 work anyway, but so far, no success. 
 
 Basically, in genmii_poll_link(), 
   status = phy_read(phy, MII_BMSR);
 status  BMSR_LSTATUS will always be zero.
 
 So I tried ignoring this value, and calling setup_forced()
 However, this still doesn't get the thing working.
 I am somewhat at a loss to see why right now, since
 I don't see what may be causing this.

I can have a look, most of the code was borrowed from sungem and I wrote
the MII code for it :-)

Ben.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-12 Thread Benjamin Herrenschmidt
Been hitting a raw throughput in both directions plus a few other things
on a dual G5 and the driver didn't crash :-)

I'm seeing a problem though but I'm not sure it's related to your patch,
I'll have to test without it.

Basically, if I use a slightly modified versio of tridge's socklib (raw
xput test, basically, a tcp connection with one side pushing as fast as
it can a known pattern and the other one just receiving and verifying
the data integrity), it works fine when running only one side (either rx
or tx, doesn't matter).

But if I start it both ways (that is both a receiver and a sender on the
GMAC box) and the other end is a tg3 (quad g5), then I'm getting a lot
of eth0: RX MAC fifo overflow smac[02045822 but there are other numbers
here every now and then] errors on the sungem side.

David, could that be the pause stuff not working properly ?

The link is gigabit and the tg3 side doesn't complain about anything.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-12 Thread Benjamin Herrenschmidt
On Tue, 2006-12-12 at 20:03 -0800, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Wed, 13 Dec 2006 14:12:13 +1100
 
  David, could that be the pause stuff not working properly ?
 
 It could be, but if the tg3 says flow control is up in the kernel logs
 things ought to be OK.

tg3 says

tg3: eth0: Link is up at 1000 Mbps, full duplex.
tg3: eth0: Flow control is on for TX and on for RX.

but sungem says

eth0: Link is up at 1000 Mbps, full-duplex.
eth0: Pause is disabled

Hrm... I suppose I need to dig more. No time to do that today though.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] add init_dummy_netdev() for napi only dummy intefaces

2006-12-11 Thread Benjamin Herrenschmidt
This adds an init_dummy_netdev() function that gets a network device
structure (allocation and lifetime entirely under caller's control) and
initialize the minimum amount of fields so it can be used to schedule
NAPI polls without registering a full blown interface. This is to be
used by drivers that need to tie several hardware interfaces to a single
NAPI poll scheduler due to HW limitations.

It also updates the ibm_emac driver to use that instead of doing it's
own initializations by hand.

Symbol is exported GPL only a I don't think we want binary drivers doing
that sort of acrobatics (if we want them at all).

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

I'm shit at inventing function names so feel free to come up with
something nicer !

Index: linux-cell/drivers/net/ibm_emac/ibm_emac_mal.c
===
--- linux-cell.orig/drivers/net/ibm_emac/ibm_emac_mal.c 2006-12-12 
16:13:27.0 +1100
+++ linux-cell/drivers/net/ibm_emac/ibm_emac_mal.c  2006-12-12 
16:23:21.0 +1100
@@ -426,11 +426,10 @@ static int __init mal_probe(struct ocp_d
mal-def = ocpdev-def;
 
INIT_LIST_HEAD(mal-poll_list);
-   set_bit(__LINK_STATE_START, mal-poll_dev.state);
+   init_dummy_netdev(mal-poll_dev);
mal-poll_dev.weight = CONFIG_IBM_EMAC_POLL_WEIGHT;
mal-poll_dev.poll = mal_poll;
mal-poll_dev.priv = mal;
-   atomic_set(mal-poll_dev.refcnt, 1);
 
INIT_LIST_HEAD(mal-list);
 
Index: linux-cell/include/linux/netdevice.h
===
--- linux-cell.orig/include/linux/netdevice.h   2006-12-12 16:06:24.0 
+1100
+++ linux-cell/include/linux/netdevice.h2006-12-12 16:12:17.0 
+1100
@@ -454,6 +454,7 @@ struct net_device
   NETREG_UNREGISTERING,/* called unregister_netdevice */
   NETREG_UNREGISTERED, /* completed unregister todo */
   NETREG_RELEASED, /* called free_netdev */
+  NETREG_DUMMY,/* dummy device for NAPI poll */
} reg_state;
 
/* Called after device is detached from network. */
@@ -581,6 +582,7 @@ extern int  dev_close(struct net_device 
 extern int dev_queue_xmit(struct sk_buff *skb);
 extern int register_netdevice(struct net_device *dev);
 extern int unregister_netdevice(struct net_device *dev);
+extern int init_dummy_netdev(struct net_device *dev);
 extern voidfree_netdev(struct net_device *dev);
 extern voidsynchronize_net(void);
 extern int register_netdevice_notifier(struct notifier_block *nb);
Index: linux-cell/net/core/dev.c
===
--- linux-cell.orig/net/core/dev.c  2006-12-12 16:01:59.0 +1100
+++ linux-cell/net/core/dev.c   2006-12-12 16:23:23.0 +1100
@@ -3007,6 +3007,42 @@ out_err:
 }
 
 /**
+ * init_dummy_netdev   - init a dummy network device for NAPI
+ * @dev: device to init
+ *
+ * This takes a network device structure and initialize the minimum
+ * amount of fields so it can be used to schedule NAPI polls without
+ * registering a full blown interface. This is to be used by drivers
+ * that need to tie several hardware interfaces to a single NAPI
+ * poll scheduler due to HW limitations.
+ */
+int init_dummy_netdev(struct net_device *dev)
+{
+   /* Clear everything. Note we don't initialize spinlocks
+* are they aren't supposed to be taken by any of the
+* NAPI code and this dummy netdev is supposed to be
+* only ever used for NAPI polls
+*/
+   memset(dev, 0, sizeof(struct net_device));
+
+   /* make sure we BUG if trying to hit standard
+* register/unregister code path
+*/
+   dev-reg_state = NETREG_DUMMY;
+
+   /* initialize the ref count */
+   atomic_set(dev-refcnt, 1);
+
+   /* a dummy interface is started by default */
+   set_bit(__LINK_STATE_PRESENT, dev-state);
+   set_bit(__LINK_STATE_START, dev-state);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(init_dummy_netdev);
+
+
+/**
  * register_netdev - register a network device
  * @dev: device to register
  *


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-12-11 Thread Benjamin Herrenschmidt
On Tue, 2006-12-12 at 06:33 +0100, Eric Lemoine wrote:
 On 12/12/06, David Miller [EMAIL PROTECTED] wrote:
  [...]
  Anyways, Eric your changes look fine as far as I can tell, can you
  give them a really good testing on some SMP boxes?
 
 Unfortunately I can't, I don't have the hardware (only an old ibook here).

I do however, I'll give it a beating on a dual G5 as soon as I get a
chance. I'm pretty swamped at the moment and the box is used by somebody
else today.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/16] Spidernet RX Locking

2006-12-08 Thread Benjamin Herrenschmidt
A spinlock is expensive in the fast path, which is why Jeff says it's
invasive.

 spider_net_decode_one_descr() is called from
 spider_net_poll() (which is the netdev-poll callback)
 and also from spider_net_handle_rxram_full(). 
 
 The rxramfull routine is called from a tasklet that
 is fired off after a RX ram full interrupt is receved.
 This interrupt is generated when the hardware runs out
 of space to store incoming packets. We are seeing this
 interrupt fire when the CPU is heavily loaded, and a
 lot of traffic is being fired at the device.

How often does that interrupt happen in that case ?

A better approach is to keep the fast path (ie. poll()) lockless, and in
handle_rxram_full(), the slow path, protect against poll using
netif_disable_poll(). Though that means using a work queue, not a
tasklet, since it needs to schedule.

  and what other 
  non-sledgehammer approaches were discarded before arriving at this one?
 
 Well, I'm not that good at kernel programming, so I guess
 I did not perceive this as a sledgehammer.  And alternative
 approach is to simply ignore the rxramfull interrupt entirely,
 and depend on poll() do all the work.   I'll try this shortly.

or you can schedule rx work from the rxramfull interrupt after setting a
something bad happened flag. Then, poll can check this flag and do the
right thing.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-07 Thread Benjamin Herrenschmidt
On Thu, 2006-12-07 at 02:22 -0800, Eugene Surovegin wrote:
 On Thu, Dec 07, 2006 at 02:20:10AM -0800, David Miller wrote:
  It also just occured to me that even if you use the dummy device
  approach, it's the dummy device's quota that will be used by the
  generic -poll() downcall into the driver.
 
 Yes, that's true. That's why I made this parameter 
 Konfig-configurable. Yes, this is not as flexible as run-time 
 configuration available for a real netdev, but better than nothing and 
 nobody has complained yet :)

Can we maybe change the dummy device weight as emacs get added to a
MAL ?

Dave, the main point of my initial email was: should we provide a
routine from the net core to initialize such dummy devices properly ?
I'm a little worried that some day, the NAPI code will change and the
initialisations done by the driver will not be enough anymore...

Cheers,
Ben.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NAPI and shared interrupt control

2006-12-06 Thread Benjamin Herrenschmidt
Hi Dave !

I'd like your advice on something we need to deal with in the EMAC
ethernet driver (an IBM part). The driver is maintainedby Eugene (CC'd),
I'm mostly adding support for some new hardware at this point, which
involves making it SMP safe among other things ;-)

So the problem this driver has is with NAPI polling and its shared DMA
engine. Basically, the DMA engine can be shared between several EMAC
ethernet cells. It has separate channels (and thus separate rings) so
in that area, all is fine... except the interrupt control. The is only
one global interrupt enable/disable bit for packet interrupts, shared by
all the EMACs using that DMA engine cell.

That means complications for NAPI polling as we can't just
disable/enable Rx interrupts as NAPI would normally expect on a
per-device basis.

What Eugene does currently, which seems to me like it's actually the
only proper solution, is to create a separate net_device structure for
the DMA engine and thus have a single NAPI poll  weighting for all the
EMACs sharing a given MAL (MAL is the name of that DMA engine). This
means that Rx from any of the channels schedules the poll, and
interrupts can be properly masked/unmasked globally based on the
presence/absence of work on all the channels.

I'd still like to run it through you, make sure you are ok or see if you
have a better idea as you are way more familiar with the network stack
than I am :-)

My main concern with the approach is purely due to how the additional
net_device is initialized. It's currently allocated as part of some
private data structure in the driver which then manually initializes
some fields that are used by NAPI polling. While it certainly works, I
find the approach a bit fragile (what if the NAPI code internals change
and some initialisations have to be done differently ?)

Thus I was wondering, if you think the approach is sane, wether it would
make sense to provide a low level netif_init_device() thingy that makes
sure a net_device data structure is fully initialized and suitable for
use ? (make sure spinlocks are initialized etc...). Something similar to
the code used to initialize the backlog fake device ?

The driver currently does:

set_bit(__LINK_STATE_START, mal-poll_dev.state);
mal-poll_dev.weight = CONFIG_IBM_EMAC_POLL_WEIGHT;
mal-poll_dev.poll = mal_poll;
mal-poll_dev.priv = mal;
atomic_set(mal-poll_dev.refcnt, 1);

(None of the spinlocks are initialized, but then, the driver was only
ever used on UP only embedded CPUs so far and it's possible that none of
the NAPI code path for which this net_device structure is used are
hitting any spinlock). 

Or do you think that net_device should be fully registered with
register_netdev ? (that would involve a lot more than what is currently
needed/used though).

Cheers,
Ben.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: NAPI and shared interrupt control

2006-12-06 Thread Benjamin Herrenschmidt

 What Eugene does currently, which seems to me like it's actually the
 only proper solution, is to create a separate net_device structure for
 the DMA engine and thus have a single NAPI poll  weighting for all the
 EMACs sharing a given MAL (MAL is the name of that DMA engine). This
 means that Rx from any of the channels schedules the poll, and
 interrupts can be properly masked/unmasked globally based on the
 presence/absence of work on all the channels.

Actually, another solution would be to have one of the instances do the
NAPI poll for all of them instead of creating a separate net_device for
the DMA engine...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add Broadcom PHY support

2006-12-04 Thread Benjamin Herrenschmidt
On Fri, 2006-09-15 at 16:15 -0400, Amy Fong wrote:
 [PATCH] Add Broadcom PHY support
 
 This patch adds a driver to support the bcm5421s and bcm5461s PHY
 
 Kernel version:  linux-2.6.18-rc6
 
 Signed-off-by: Amy Fong 

Some 5421's need special initialisation (see drivers/net/sungem_phy.c),
might be worth having them there too. I was also wondering... for
spidernet, we need to enable the fiber mode on the PHY. Does phylib has
an API for that ?

I'd like to look into moving sungem and spidernet over to phylib.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add Broadcom PHY support

2006-12-04 Thread Benjamin Herrenschmidt

 I believe that this fiber enabling can be done by defining config_init in the 
 phy_driver struct.
 
 struct phy_driver {
 snip
 /* Called to initialize the PHY,
* including after a reset */
   int (*config_init)(struct phy_device *phydev);
 snip
 };

Well... I don't know for sure... thing is, enabling the fiber mode is a
rather platform specific thing. So it's the MAC driver that knows wether
it wants it on a PHY and should call into the driver.

It's difficult to abstract all possible PHY config options tho... some
MACs might want to enable low power, some don't because they have issues
with it, that sort of thing, though.

Not sure what the best solution is at this point... Maybe an ascii
string to pass the PHY driver is the most flexible, though a bit yucky,
or we try to have a list of all the possible configuration options in
phy.h and people just add new ones that they need as they add support
for them...

Sounds grossly like an in-kernel ioctl tho...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-11-28 Thread Benjamin Herrenschmidt
On Tue, 2006-11-28 at 15:43 -0800, David Miller wrote:

 At least in theory the atomic + any necessary memory barriers
 would be cheaper than the extra PIO read we need otherwise.

Yes, IO reads are generally the worst case scenarios even on machines
with fairly slow locks.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: bcm43xx-d80211: fix hwcrypto issues (mcast)

2006-11-17 Thread Benjamin Herrenschmidt
On Fri, 2006-11-17 at 08:53 +0100, Johannes Berg wrote:
 On Fri, 2006-11-17 at 18:46 +1100, Benjamin Herrenschmidt wrote:
 
  So what is the solution for Apple machines owner who only get a v3
  firmware from Apple ? I remember you telling me the answer on irc but I
  wanted to make it public :-) Some web site we can d/l the windows
  updates and extract the FW ?
 
 Yes, fwcutter comes with a huge list of URLs for both firmware versions
 (I suppose they'll remove the v3 ones now...)

Well, the latest released version (fwcutter-005) contains a huge list
of ... v3 URLs :-) Only 2 v4 in there. I'll check SVN.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: bcm43xx-d80211: fix hwcrypto issues (mcast)

2006-11-16 Thread Benjamin Herrenschmidt
On Thu, 2006-11-16 at 15:07 +0100, Michael Buesch wrote:
 This fixes various bcm43xx-d80211 hwcrypto issues,
 which mainly prevented mcast frames from being decrypted properly.
 
 This is mostly a rewrite of the key managing code.
 Note that after this patch v3 firmware is no longer supported.

So what is the solution for Apple machines owner who only get a v3
firmware from Apple ? I remember you telling me the answer on irc but I
wanted to make it public :-) Some web site we can d/l the windows
updates and extract the FW ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch sungem] improved locking

2006-11-10 Thread Benjamin Herrenschmidt

 Yes, the bit 4 isn't used, but I assumed clearing it shouldn't be prolem.

Always leave reserved bits alone ... I agree it's very unlikely in this
case that it could cause a problem but I've seen stranger things

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] s2io ppc64 fix for readq/writeq

2006-11-06 Thread Benjamin Herrenschmidt

 This seems a bit ugly.  Could you add
 
   #define readq readq
 
 to your platform instead?

That's ugly too imho but I suppose I can do it :-)

 I generally think it's a bug in the kernel-wide API, if use of said API 
 requires arch-specific ifdefs.

Yes. I agree. In that specific case, I suppose what you propose is the
least ugly of the solutions. HAVE_ARCH_* is pretty much out of fascion
(and I tend to agree with Linus that it's not pretty anyway).

Actually, I tend to think in that specific case that the driver defining
something called readq and writeq based on a pair of readl's and
writel's is fairly bogus though.

 Or maybe the problem could be solved another way, by guaranteeing that a 
 good enough for drivers readq() and writeq() exist on all platforms, 
 even 32-bit platforms where the operation isn't inherently atomic.

I'd rather not provide readq/writeq if they aren't atomic.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] s2io ppc64 fix for readq/writeq

2006-11-06 Thread Benjamin Herrenschmidt
 This is why I said good enough for drivers.  This is _key_.
 
 I have run into several [PCI] devices with 64-bit registers, and 
 __none__ of them had requirements such that the Linux platform code 
 -must- provide an atomic readq/writeq.  Probably because everybody wants 
 to support 32-bit platforms with their devices.
 
 What you call fairly bogus is precisely what drivers need.  These 
 devices with 64-bit registers just don't need the atomicity that arch 
 developers harp about :)

Is there any consistency in that case in which half need to be
read/written first ? Or none of these ever had side effects ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] s2io ppc64 fix for readq/writeq

2006-11-06 Thread Benjamin Herrenschmidt
On Mon, 2006-11-06 at 01:37 -0800, Linus Torvalds wrote:
 
 On Mon, 6 Nov 2006, Jeff Garzik wrote:
 
  This seems a bit ugly.  Could you add
  
  #define readq readq
  
  to your platform instead?
 
 Heartily agreed. MUCH better than adding unrelated #if defined() stuff, 
 whether arch-related or otherwise.

I agree it's less ugly, though I still don't like it much :-)

Anyway, what do you think of Jeff proposal to just implement them as two
32 bits operations ? My arch guy side screams at the idea, but if,
indeed, drivers generally cope fine with it, I suppose that's ok.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] s2io ppc64 fix for readq/writeq

2006-11-06 Thread Benjamin Herrenschmidt
On Mon, 2006-11-06 at 01:50 -0800, Linus Torvalds wrote:
 
 On Mon, 6 Nov 2006, Benjamin Herrenschmidt wrote:
 
  Anyway, what do you think of Jeff proposal to just implement them as two
  32 bits operations ? My arch guy side screams at the idea, but if,
  indeed, drivers generally cope fine with it, I suppose that's ok.
 
 Last I saw, that's how normal PCI will split the IO anyway, so I guess it 
 makes sense.

Hrm.. true indeed. I'll implement them that way for ppc32 then.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] s2io ppc64 fix for readq/writeq

2006-11-06 Thread Benjamin Herrenschmidt
On Mon, 2006-11-06 at 04:55 -0500, Jeff Garzik wrote:
 Benjamin Herrenschmidt wrote:
  On Mon, 2006-11-06 at 01:50 -0800, Linus Torvalds wrote:
  On Mon, 6 Nov 2006, Benjamin Herrenschmidt wrote:
  Anyway, what do you think of Jeff proposal to just implement them as two
  32 bits operations ? My arch guy side screams at the idea, but if,
  indeed, drivers generally cope fine with it, I suppose that's ok.
  Last I saw, that's how normal PCI will split the IO anyway, so I guess it 
  makes sense.
  
  Hrm.. true indeed. I'll implement them that way for ppc32 then.
 
 Bonus points if you want to find-and-kill where individual drivers did
 
   #ifndef readq
   implement readq and writeq by hand...
   #endif

Yes, well, we would have to make sure all archs have them defined
first though, but I suppose I can have a look later this week, maybe
tomorrow. Shouldn't be too hard :)

Ben.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] s2io ppc64 fix for readq/writeq

2006-11-06 Thread Benjamin Herrenschmidt

 Generally the kernel code should write the two 32-bit chunks to the 
 memory-mapped region in order (low dword first), and let things take 
 care of themselves from there.
 
 That's pretty much the implementation that -every- driver copies, when 
 they need readq/writeq to work on a 32-bit platform.

What do you mean by low dword first ? For example, the implementation
in the s2io driver does:

static inline u64 readq(void __iomem *addr)
{
u64 ret = 0;
ret = readl(addr + 4);
ret = 32;
ret |= readl(addr);

return ret;
}

static inline void writeq(u64 val, void __iomem *addr)
{
writel((u32) (val), addr);
writel((u32) (val  32), (addr + 4));
}

As you can see, it reads the -second- dword first (high order dword in
little endian), but writes the first dword first (low order dword in
little endian).

If there is any logic here, it's card specific.

Or is this really what PCI does when doing 64 bits accesses on a 32 bits
PCI bus ? I would have expected the later (what write does) but this
driver does it reverse on reads.

I'm tempted to go to the simple

#define readq readq for now until we clear that up.
 
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [sungem] proposal for a new locking strategy

2006-11-05 Thread Benjamin Herrenschmidt
On Sun, 2006-11-05 at 14:00 +0100, Eric Lemoine wrote:
 Hi!
 
 Some (long) time ago benh wrote a blaming comment in sungem.c about
 that driver's locking strategy. That comment basically says that we
 probably don't need two spinlocks.

Yeah :) Note that I mostly blamed myself there ... Just never found the
time to sit down a figure out a proper locking.

 I agree!
 
 Proposal:
 
 Today's sungem effectively uses two spinlock's: lock and tx_lock.
 
 tx_lock is held by the xmit function when sending out a packet. Lots
 of functions grab tx_lock not to mess up with xmit (gem_stop_phy(),
 gem_change_mtu(), etc.).
 
 All of these funcs also take lock!
 
 What we could do is remove lx_lock, have the above functions take
 only lock, and rely on dev-_xmit_lock to protect the xmit func from
 reentrance. In that case, obviously, the driver wouldn't feature LLTX
 anymore.

We could probably do even better but yeah, a single lock is a good
start. Overall, I'm unhappy with the infrastructure provided by the
network stack though. (Might be better nowadays, but last I looked, for
example, I couldn't properly do things like stopping  MAPI poll from
set_multicast etc... due to locks held by the upper level).

 When (re-)configuring we'd now quiesce the device, with the new
 functions gem_netif_stop() and gem_full_lock(), in the same way as tg3
 does.
 
 gem_interrupt(), gem_poll(), and gem_start_xmit() could become lockless. Fast!
 
 Basically this proposal makes the data path faster, the control path
 slower, and simplifies the code by using one single spinlock within
 the driver.
 
 If the idea seems reasonable to you guys I can go ahead and cook up 
 something...

I certainly does. Bring on the patch ! :-)

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] s2io ppc64 fix for readq/writeq

2006-11-05 Thread Benjamin Herrenschmidt
The s2io driver is redefining it's own readq/writeq based on
readl/writel when the platform doesn't provide native ones. However, it
currently does so by testing #ifndef readq. While that works for now, we
are about to change ppc64 to use inline functions rather that macros for
all those IO accessors which will break that test. This fixes it. I
don't have anything less ugly at hand unfortunately.

Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
---

The patch changing ppc64 own definition is scheduled to go in 2.6.20 when
the merge window opens, so it would be nice if this patch could go in a
similar timeframe, provided that you agree with it of course. It can go
earlier as it won't break current ppc64.

Index: linux-cell/drivers/net/s2io.h
===
--- linux-cell.orig/drivers/net/s2io.h  2006-10-13 17:23:49.0 +1000
+++ linux-cell/drivers/net/s2io.h   2006-11-06 13:19:32.0 +1100
@@ -862,8 +862,10 @@ struct s2io_nic {
 #define RESET_ERROR 1;
 #define CMD_ERROR   2;
 
-/*  OS related system calls */
-#ifndef readq
+/* OS related system calls. Note that ppc64 has readq defined as
+ * an inline, not a macro
+ */
+#if !defined(CONFIG_PPC64)  !defined(readq)
 static inline u64 readq(void __iomem *addr)
 {
u64 ret = 0;
@@ -875,7 +877,7 @@ static inline u64 readq(void __iomem *ad
 }
 #endif
 
-#ifndef writeq
+#if !defined(CONFIG_PPC64)  !defined(writeq)
 static inline void writeq(u64 val, void __iomem *addr)
 {
writel((u32) (val), addr);



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] bcm43xx: Drain TX status before starting IRQs

2006-10-19 Thread Benjamin Herrenschmidt
On Thu, 2006-10-19 at 17:29 +0200, Michael Buesch wrote:
 Drain the Microcode TX-status-FIFO before we enable IRQs.
 This is required, because the FIFO may still have entries left
 from a previous run. Those would immediately fire after enabling
 IRQs and would lead to an oops in the DMA TXstatus handling code.
 
 Signed-off-by: Michael Buesch [EMAIL PROTECTED]

Great, thanks. Note that I haven't yet hit the bug since I updated the
firmware, so it could be a mix of firmware and that problem, though it's
not been long enough to be a clear results.

In any case, that patch should go in.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] remove TxStartThresh and RxEarlyThresh

2006-10-15 Thread Benjamin Herrenschmidt
On Sat, 2006-10-14 at 15:55 -0400, Jesse Huang wrote:
 From: Jesse Huang [EMAIL PROTECTED]
 
 Change Logs:
 For patent issue need to remove TxStartThresh and RxEarlyThresh. This patent 
 is cut-through patent. If use this function, Tx will start to transmit after 
 few data be move in to Tx FIFO. We are not allow to use those function in 
 DFE530/DFE550/DFE580/DL10050/IP100/IP100A. It will decrease a little 
 performance.

Somebody patented FIFO thresholds ? Gack ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Benjamin Herrenschmidt
On Tue, 2006-10-10 at 18:20 -0500, jschopp wrote:
 Linas Vepstas wrote:
  The current driver code performs 512 DMA mappns of a bunch of 
  32-byte structures. This is silly, as they are all in contiguous 
  memory. Ths patch changes the code to DMA map the entie area
  with just one call.
  
  Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
  Cc: James K Lewis [EMAIL PROTECTED]
  Cc: Arnd Bergmann [EMAIL PROTECTED]
 
 The others look good, but this one complicates the code and doesn't have any 
 benefit.  20 
 for 21 isn't bad.

Hi Joel !

I'm not sure what you mean here (especially your 20 to 21 comment).

The patch looks perfectly fine to me, and in fact removes more lines of
code than it adds :)

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 21/21]: powerpc/cell spidernet DMA coalescing

2006-10-11 Thread Benjamin Herrenschmidt

 I started writingthe patch thinking it will have some huge effect on
 performance, based on a false assumption on how i/o was done on this
 machine
 
 *If* this were another pSeries system, then each call to 
 pci_map_single() chews up an actual hardware translation 
 control entry (TCE) that maps pci bus addresses into 
 system RAM addresses. These are somewhat limited resources,
 and so one shouldn't squander them.  Furthermore, I thouhght
 TCE's have TLB's associated with them (similar to how virtual
 memory page tables are backed by hardware page TLB's), of which 
 there are even less of. I was thinking that TLB thrashing would 
 have a big hit on performance. 
 
 Turns out that there was no difference to performance at all, 
 and a quick look at cell_map_single() in arch/powerpc/platforms/cell
 made it clear why: there's no fancy i/o address mapping.
 
 Thus, the patch has only mrginal benefit; I submit it only in the 
 name of its the right thing to do anyway.

Well, there is no fancy iommu mapping ... yet.

It's been implemented and is coming after we put together some
workarounds for various other spider hardware issues that trigger when
using it (bogus prefetches and bogus pci ordering).

I think the hypervisor based platforms will be happy with that patch
too.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] softmac: Fix WX and association related races

2006-09-27 Thread Benjamin Herrenschmidt
On Wed, 2006-09-27 at 17:26 +0200, Michael Buesch wrote:
 This fixes some race conditions in the WirelessExtension
 handling and association handling code.

Unlike the previous patch, this one doesn't apply on top of 2.6.18
(which I'm using as a basis for testing, along with Larry big bcm43xx
patch).

Ben.

 Signed-off-by: Michael Buesch [EMAIL PROTECTED]
 
 ---
 
 Hi John,
 
 Please apply this fo wireless-2.6 and push it with your
 next push of bugfixes.
 There are other theoretical raceconditions possible in softmac
 related to scanning and assoc code. But I don't really want to fix
 these, as I think they:
 1) are not triggerable in real world conditions.
 2) are not exploitable to do some bad attacks, etc...
 3) require _huge_ changes to softmac, and I don't really
want to spend more effort into softmac any longer.
 This patch fixes some real-world exploitable bugs, so better
 fix them to have working wireless until d80211 is ready for
 prime-time.
 
 btw: d80211 might have similiar race conditions like this.
 
 Larry, once this patch settled a little bit and enough people
 tested it, you might want to try pushing it into -stable (although
 it's big...)
 
 
 Index: wireless-2.6/include/net/ieee80211softmac.h
 ===
 --- wireless-2.6.orig/include/net/ieee80211softmac.h  2006-09-25 
 17:02:08.0 +0200
 +++ wireless-2.6/include/net/ieee80211softmac.h   2006-09-27 
 16:39:17.0 +0200
 @@ -63,13 +63,11 @@ struct ieee80211softmac_wpa {
  
  /*
   * Information about association
 - *
 - * Do we need a lock for this?
 - * We only ever use this structure inlined
 - * into our global struct. I've used its lock,
 - * but maybe we need a local one here?
   */
  struct ieee80211softmac_assoc_info {
 +
 + struct mutex mutex;
 +
   /*
* This is the requested ESSID. It is written
* only by the WX handlers.
 @@ -99,12 +97,13 @@ struct ieee80211softmac_assoc_info {
*
* bssfixed is used for SIOCSIWAP.
*/
 - u8 static_essid:1,
 -short_preamble_available:1,
 -associating:1,
 -assoc_wait:1,
 -bssvalid:1,
 -bssfixed:1;
 + u8 static_essid;
 + u8 short_preamble_available;
 + u8 associating;
 + u8 associated;
 + u8 assoc_wait;
 + u8 bssvalid;
 + u8 bssfixed;
  
   /* Scan retries remaining */
   int scan_retry;
 @@ -229,12 +228,10 @@ struct ieee80211softmac_device {
   /* private stuff follows */
   /* this lock protects this structure */
   spinlock_t lock;
 - 
 - /* couple of flags */
 - u8 scanning:1, /* protects scanning from being done multiple times at 
 once */
 -associated:1,
 -running:1;
 - 
 +
 + u8 running; /* SoftMAC started? */
 + u8 scanning;
 +
   struct ieee80211softmac_scaninfo *scaninfo;
   struct ieee80211softmac_assoc_info associnfo;
   struct ieee80211softmac_bss_info bssinfo;
 @@ -250,7 +247,7 @@ struct ieee80211softmac_device {
  
   /* we need to keep a list of network structs we copied */
   struct list_head network_list;
 - 
 +
   /* This must be the last item so that it points to the data
* allocated beyond this structure by alloc_ieee80211 */
   u8 priv[0];
 @@ -295,7 +292,7 @@ static inline u8 ieee80211softmac_sugges
  {
   struct ieee80211softmac_txrates *txrates = mac-txrates;
  
 - if (!mac-associated)
 + if (!mac-associnfo.associated)
   return txrates-mgt_mcast_rate;
  
   /* We are associated, sending unicast frame */
 Index: wireless-2.6/net/ieee80211/softmac/ieee80211softmac_assoc.c
 ===
 --- wireless-2.6.orig/net/ieee80211/softmac/ieee80211softmac_assoc.c  
 2006-09-17 15:21:53.0 +0200
 +++ wireless-2.6/net/ieee80211/softmac/ieee80211softmac_assoc.c   
 2006-09-27 17:10:08.0 +0200
 @@ -48,7 +48,7 @@ ieee80211softmac_assoc(struct ieee80211s
   dprintk(KERN_INFO PFX sent association request!\n);
  
   spin_lock_irqsave(mac-lock, flags);
 - mac-associated = 0; /* just to make sure */
 + mac-associnfo.associated = 0; /* just to make sure */
  
   /* Set a timer for timeout */
   /* FIXME: make timeout configurable */
 @@ -62,24 +62,22 @@ ieee80211softmac_assoc_timeout(void *d)
  {
   struct ieee80211softmac_device *mac = (struct ieee80211softmac_device 
 *)d;
   struct ieee80211softmac_network *n;
 - unsigned long flags;
  
 - spin_lock_irqsave(mac-lock, flags);
 + mutex_lock(mac-associnfo.mutex);
   /* we might race against ieee80211softmac_handle_assoc_response,
* so make sure only one of us does something */
 - if (!mac-associnfo.associating) {
 - spin_unlock_irqrestore(mac-lock, flags);
 - return;
 - }
 + if (!mac-associnfo.associating)
 + goto out;
   

Re: [PATCH] softmac: Fix WX and association related races

2006-09-27 Thread Benjamin Herrenschmidt
On Wed, 2006-09-27 at 19:50 +0200, Michael Buesch wrote:
 On Wednesday 27 September 2006 18:18, Larry Finger wrote:
  Michael Buesch wrote:
   This fixes some race conditions in the WirelessExtension
   handling and association handling code.
   
   Signed-off-by: Michael Buesch [EMAIL PROTECTED]
   
   ---
  
  This patch doesn't apply.
 
 Oh, linville merged stuff on the 25th. That's the day I updated
 my tree to do this patch. But seems like I did it just before
 the merge.
 Who could suspect that linville merges something. :D
 *me runs away*
 
 Anyway. Here's an updated patch.

Which still doesn't apply to 2.6.18.

I won't try some random other git tree to test things, it's simply not
feasible for me and we need something we can give to distros to backport
so they have something remotely stable (I'm thinking for example about
the upcoming ubuntu edgy which is coming out soon an a 2.6.17 base).

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] softmac: Fix WX and association related races

2006-09-27 Thread Benjamin Herrenschmidt
On Wed, 2006-09-27 at 19:43 -0500, Larry Finger wrote:
 Benjamin Herrenschmidt wrote:
  
  I won't try some random other git tree to test things, it's simply not
  feasible for me and we need something we can give to distros to backport
  so they have something remotely stable (I'm thinking for example about
  the upcoming ubuntu edgy which is coming out soon an a 2.6.17 base).
 
 I don't know what your problem is! Despite your negative attitude, I have 
 tried to accommodate you, 
 but obviously you do not appreciate my efforts. My big patch for 2.6.18 
 concentrated on bcm43xx. Yes 
 there were changes in softmac, but they had no major effect until Michael's 
 fix of the race condition.
 
 The bottom line is that I have had it with you! You had best go back to some 
 mainline kernel as you 
 will get no further help from me. There are lots of folks that need help 
 _AND_ are appreciative of 
 that help. My time will be spent on them.

Maybe because I'm a hot blooded asshole :)

Note that I wasn't complaining about your patch (which is a bit huge tho
for a stable release, but it did make a difference here along with
Michael debug one). I just noticed that none of Michael last final
patches would apply on 2.6.18 + your patch, and went into my usual rant
in case they were supposed to apply to the wireless tree.

There is a number of reasons why people can't just use a wireless-git or
whatever else tree, and having something against mainline is the only
way I see to get something to distros now, which is the way to fix the
most people problems rather than only the ones finding this mailing
list.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bcm43xx driver unstable behaviour (and linux wireless is junk btw)

2006-09-25 Thread Benjamin Herrenschmidt

 yes. Many locking issues in d80211 to still sort out. Basically, there
 are next to no useful locks in it and most data structures are not
 protected at all.

Doh ! Scary... locking is hard ... if the stuff was written without
locking in mind in the first place, it might end up being a nightmare...

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.19-rc1] ehea firmware interface based on Anton Blanchard's new hvcall interface

2006-09-25 Thread Benjamin Herrenschmidt
On Mon, 2006-09-25 at 20:07 -0400, Jeff Garzik wrote:
 patch does not apply.
 
 also, it would seem like varargs would be appropriate here.

Not really... these are hypervisor calls, their calling convention is
not varargs, thus we would need some conversion layer if using them,
with possible performance loss (it's also hard to do right as we are
passing args by registers, not stack).

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bcm43xx driver unstable behaviour (and linux wireless is junk btw)

2006-09-24 Thread Benjamin Herrenschmidt

 prism54 fullmac, right?

Yes.

 Try using -Dwext; the prism54 wpa_supplicant driver is a dead-end and I
 added WE-19 commands to it a bit ago anyway.  Oddly enough, I couldn't
 seem to get the driver to work reliably for me using straight WEP
 either, let alone WPA.  It's pretty unmaintained at the moment.

Ok, well, I tried with wext but iirc, I got rejected ioctls' from
wpa_supplicant. I'll try again later. Thanks.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bcm43xx driver unstable behaviour (and linux wireless is junk btw)

2006-09-24 Thread Benjamin Herrenschmidt
On Sat, 2006-09-23 at 23:45 -0400, Daniel Drake wrote:
 Benjamin Herrenschmidt wrote:
  Oh and I don't care about it works in dscape stack sort of crap I 
  regulary get. I want something that
  works with upstream kernels. That isn't that much to ask... or is it ?
 
 wpa_supplicant triggers races in softmac relatively easily, which are 
 hard to fix properly. At least for me, motivation to work on this stuff 
 is low given the potentially impending merge of devicescape, and every 
 time I do spend some time investigating I just get even more frustrated 
 at how difficult WE is to implement *properly* for non-hardmac 
 drivers. We really have a need for a configuration system designed 
 around 802.11.
 
 I agree, the stuff in mainline should be fixed, but at least personally 
 I am finding it harder and harder to justify working on softmac.

So what are the chances of getting this dscape stack merged, let's
say... for 2.6.19 ? Or we'll get yet another full release with barely
working wireless ?

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bcm43xx driver unstable behaviour (and linux wireless is junk btw)

2006-09-24 Thread Benjamin Herrenschmidt
  So what are the chances of getting this dscape stack merged, let's
  say... for 2.6.19 ? Or we'll get yet another full release with barely
  working wireless ?
 
 I don't think it's possible to happen for 2.6.19.
 Ealiest 2.6.20 but likely one or two releases later (me thinks).

Which means we'll still have nothing near decent wireless for what ... 6
monthes min ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bcm43xx driver unstable behaviour (and linux wireless is junk btw)

2006-09-24 Thread Benjamin Herrenschmidt
On Sun, 2006-09-24 at 10:43 +0200, Michael Buesch wrote:
 On Sunday 24 September 2006 10:12, Benjamin Herrenschmidt wrote:
So what are the chances of getting this dscape stack merged, let's
say... for 2.6.19 ? Or we'll get yet another full release with barely
working wireless ?
   
   I don't think it's possible to happen for 2.6.19.
   Ealiest 2.6.20 but likely one or two releases later (me thinks).
  
  Which means we'll still have nothing near decent wireless for what ... 6
  monthes min ?
 
 Well. Works For Me (tm).
 If there is some bug for you in current mainline, it needs to
 be fixed. But I can't fix something I am not able to reproduce and
 don't know what happens.

Well, there is definitely an issue with softmac going nuts, possibly
related to the chip stopping to transmit, I'm not certain at this point.
It -seems- to be better with Larry's latest patch (the big one) so we'll
see. Then, there is a problem when that happens an d your rmmod,
something kicks in after the driver is gone and crashes the kernel... If
I ever get into the situation where I suspect the rmmod will trigger
that again, I'll try to get into a text VT first so I can capture an
oops...

Ben.
 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


bcm43xx driver unstable behaviour (and linux wireless is junk btw)

2006-09-23 Thread Benjamin Herrenschmidt
Hi folks !

So this is 2.6.18 + Larry fix (though I've seen this problem before,
it seems using WPA just make it happen more often).

This is also a 4318, so the link is pretty weak due to the Tx Power
problem and I suspects it makes the driver problems more visible...

So basically, I lose the link every few minutes for a minute or so, I
suspect it's related to wpa_supplicant vs. the ack losses due to the
4318 Tx Power problems. That alone would be ok though, if the driver
wasn't totally stuck after a while. (Similar problem to after
sleep/wakeup, looks like nothign goes through).

When it goes bunk, it looks like that in the logs:

Sep 24 12:24:18 localhost kernel: [  285.686826] SoftMAC: Sent Authentication 
Request to 00:0f:66:52:4b:60.
Sep 24 12:24:18 localhost kernel: [  285.686976] SoftMAC: generic IE set to 

Sep 24 12:24:18 localhost kernel: [  285.686999] SoftMAC: Already associating 
or associated to 00:0f:66:52:4b:60
Sep 24 12:24:28 localhost kernel: [  295.687229] SoftMAC: Start scanning with 
channel: 1
Sep 24 12:24:28 localhost kernel: [  295.687240] SoftMAC: Scanning 14 channels
Sep 24 12:24:29 localhost kernel: [  296.027053] SoftMAC: Scanning finished
Sep 24 12:24:29 localhost kernel: [  296.035267] SoftMAC: generic IE set to 

Sep 24 12:24:29 localhost kernel: [  296.035310] SoftMAC: Already associating 
or associated to 00:0f:66:52:4b:60
Sep 24 12:24:31 localhost kernel: [  297.690969] SoftMAC: Sent Authentication 
Request to 00:0f:66:52:4b:60.
Sep 24 12:24:39 localhost kernel: [  306.039210] SoftMAC: Start scanning with 
channel: 1
Sep 24 12:24:39 localhost kernel: [  306.039222] SoftMAC: Scanning 14 channels
Sep 24 12:24:39 localhost kernel: [  306.375046] SoftMAC: Scanning finished
Sep 24 12:24:39 localhost kernel: [  306.383018] SoftMAC: generic IE set to 

Sep 24 12:24:39 localhost kernel: [  306.383075] SoftMAC: Already associating 
or associated to 00:0f:66:52:4b:60
Sep 24 12:24:42 localhost kernel: [  309.695021] SoftMAC: Sent Authentication 
Request to 00:0f:66:52:4b:60.
Sep 24 12:24:49 localhost kernel: [  316.387211] SoftMAC: Start scanning with 
channel: 1

 etc...

Then, if you rmmod, you get back a prompt, and about a second later, the kernel 
blows up. At this point,
I've always been in X and it's too dead to dump anything into the disk logs so 
I don't know what
the precise crash is, but it looks to me like the driver is not properly 
removing some timer
or something there.

Note that it also goes bunk on sleep/wakeup, and sometimes ifdown/ifup... in 
general, it's fragile and
just 'loses it' in which case the only way to get it back is to rmmod/insmod.

Doesn't help me to have my prism54 not working with WPA (apparently, the driver 
looks like it handles hostap
ioctls but it doesn't agree on the ioctl numbers, among others, with whatever 
wpa_supplicant sends when
configured to wpa mode... somebody knows if that driver is maintained ?)

So at this point I have a choice between two wireless devices that don't work 
(and none of them is less than
a couple years old). Looks like the linux wireless situation isn't getting any 
better since last KS.

Oh and I don't care about it works in dscape stack sort of crap I regulary 
get. I want something that
works with upstream kernels. That isn't that much to ask... or is it ?

Ben, back to ethernet cables.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Benjamin Herrenschmidt

 However, I presume someone added the  __powerpc__ define here
 because they picked up a 3c509 at a garage sale, stuck it in 
 a powerpc, found out it didn't work due to a byte-swapping bug,
 and then patched it as above. I'm disturbed that somehow 
 outsl_ns() became identical to outsl() at some point, presumably
 breaking this patch.

The problem is that somebody had the bright idea to implement ppc
outsl as byteswapping a lng time ago. This was totally bogus and got
fixed, but it looks like this driver holds a remain of that period.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Benjamin Herrenschmidt
On Tue, 2006-09-19 at 20:52 +0200, Matt Sealey wrote:
 Some northbridges and PCI bridges have clever byteswapping in 
 hardware, maybe this is just an effect of that. In theory depending on 
 the host bridge, you should pass in big endian data and have it swap or 
 not swap, not pick that way in the driver, UNLESS your driver expects 
 bigendian data, in which case on a bigendian platform you can tell it to 
 write without swapping. Voila, two functions.

It's generally considered pretty bad for a northbridge to try to muck
around with byte order. There are fairly well defined rules to plug a
little endian bus (PCI, ISA, ...) on a big endian machine.

The trick that some people didn't get a while ago is that while
accessors like inw/inl shall return a byteswapped data, string
operations like in insw/insl who are copying from a fifo basically to
memory (and opposite write versions) shall -not- byteswap since the data
isn't interpreted. it's a byte stream. It doesn't have any endian
semantic associated to it until it's actually read back from memory in
which case the appropriate endian swap (if any) has to be used depending
on the endianness and size of a given field read/written.

Since some people didn't get it, in the early days, some BE
architectures like ppc had versions of insw/insl that did byteswap,
which was wrong. The bits in this driver are remains from that era.

Note that to aggravate the problem, it still happens that HW engineers
try to be smart when hooking a 16 bits or 32 bits FIFO to a BE machine
and byteswap it in hardware. This is of course totally bogus but did
happen with IDE controllers typically (I think atari or amiga has one of
these, the Tivo is like that too, and a bunch of embedded  things). The
net result is that you have to pump the data fifo using a byteswapping
accessor and you cannot use DMA unless you DMA controller can re-swap
the other way around But lots of HW people still don't get it :) 

 However the existance of these PCI bridges these days? I haven't seen 
 one in years, and when I have nobody has ever enabled the magic swappy 
 thing as it's unreliable and can't always tell how you present the data.
 
 One wishes that there was a ntoh and hton style macro in standard use 
 for PCI access.. hang on though that jsut wouldn't work would it.

Nah. We have the basic rule that readl/writel are little endian. PowerPC
additionally provides arch specific low level in_{be,le}32 type
accessors with explicit endianness. Or you can also use
cpu_to_le32/le32_to_cpu kind of macros to convert between native and
explicit endianness.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Remove powerpc specific parts of 3c509 driver

2006-09-19 Thread Benjamin Herrenschmidt
On Wed, 2006-09-20 at 02:21 +0200, Segher Boessenkool wrote:
  Nah. We have the basic rule that readl/writel are little endian.  
  PowerPC
  additionally provides arch specific low level in_{be,le}32 type
  accessors with explicit endianness. Or you can also use
  cpu_to_le32/le32_to_cpu kind of macros to convert between native and
  explicit endianness.
 
 Sure, PCI busses are little-endian.  But is readX()/writeX() for PCI
 only?  I sure hope not.

It's defined for PCI and possibly ISA memory. You can use it for other
things if you whish to, but other things are arch specific in any
case.

 It would make a lot more sense if readX()/writeX() used the endianness
 of the bus they are performed on.

No way ! Again, it's evil if such a simple thing start doing different
things depending on random external factors.

Different bus - different accessor.

We defined on PowerPC that readl was fine for anything that comes out of
ioremap and is little endian, but that's also why you have the explicit
{in,out}_{le,be}{16,32}. That's what you should use in fact with non-PCI
busses unless you know you are LE.

 PowerPC byteswaps are cheap -- for 16- and 32-bit accesses.  They're quite 
 bad for 64-bit though; it would
 be a pity to end up doing two of those for a 64-bit big-endian I/O  
 access
 (one on the access itself, one to convert the data back to CPU order).
 
 This would happily solve the problem of the various variations of
 byte-swapping bus bridges, too (natural swap, 32-bit swap, 64-bit  
 swap,
 perhaps others that I thankfully have never seen or cannot remember).
 
 Now you can say, use readl_be() or something similar, but that's a)  
 ugly,
 b) error-prone, c) exponential interface explosion, d) ugly.

I'd rather has an interface explosion than having black endian magic
happening inside of the accessors.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TG3 data corruption (TSO ?)

2006-09-11 Thread Benjamin Herrenschmidt
Ok, it seems like we might have more than just the missing barrier in
TG3. Possibly some IOMMU problems on some machines as well.
Unfortunately, I don't have a tg3 on a PCI-X or PCI-E card to test on a
pSeries or some other machine.

[Olof: I've disabled the new U4 DART invalidate code (reverted to the
old one) and added an unconditional barrier to dart_flush and I yet have
to reproduce the problem. I suspect a problem with the DART invalidate
one thingy, maybe a HW problem with the U4 chip. Now regarding the
barrier in flush, we'll talk about it later, I think we might have a
problem with the way we do the DART accesses (they might leak out of the
lock) though I yet have to see that cause a problem in practice due to
the round-robin nature of our allocation algorithm.]

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TG3 data corruption (TSO ?)

2006-09-10 Thread Benjamin Herrenschmidt
On Sun, 2006-09-10 at 22:33 -0700, Michael Chan wrote:
 Benjamin Herrenschmidt wrote:
 
  I've done:
  
  #define tw32_rx_mbox(reg, val)  do { wmb();
 tp-write32_rx_mbox(tp, reg, val); } while(0)
  #define tw32_tx_mbox(reg, val)  do { wmb();
 tp-write32_tx_mbox(tp, reg, val); } while(0)
  
 
 That should do it.
 
 I think we need those tcpdump after all.  Can you send it to me?

Looks like adding a sync to writel does fix it though... I'm trying to
figure out which specific writel in the driver makes a difference. I'll
then look into slicing those tcpdumps.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] FRV: do_gettimeofday() should no longer use tickadj

2006-09-10 Thread Benjamin Herrenschmidt

 But funky cascading using chained flow handlers doesn't work if the cascade
 must share an IRQ with some other device, right?

Indeed. Best way there is then to have a normal action handler like you
do and have it call generic_handle_irq() on the cascaded interrupts.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TG3 data corruption (TSO ?)

2006-09-10 Thread Benjamin Herrenschmidt

 Oh, we know about this.  The powerpc writel() used to have memory
 barriers in 2.4 kernels but not any more in 2.6 kernels.  Red Hat's
 version of tg3 has extra wmb()'s to fix this problem.  David doesn't
 think that the upstream version of tg3 should have these wmb()'s, and
 the problem should instead be fixed in powerpc's writel().

I've added a wmb() in tw32_rx_mbox() and tw32_tx_mbox() and can still
reproduce the problem. I've also done a 2 days run without TSO enabled
without a failure (my test program normally fails after a couple of
minutes).

Thus, do you see any other code path in the driver where a
synchronisation might be missing ? Is there any case where the chip
might use data in memory before it has been told to do so  with a
mailbox write ? (There are no OWN bits that I can see in the
descriptors, thus I doubt it will use a transmit descriptor that is
half-built before the store to the mailbox allows using it) but who
knows

That leads to the question that there might be an unrelated bug in the
driver. Segher thinks we might be overriding live descriptors, though
I haven't seen how yet. It seems to be TSO specific tho... maybe some
missing smp synchronisation in the driver itself or a problem when the
TX ring is full ?

I don't have the chip docs and I'm not familiar with the driver, so I'll
keep looking, but advice is welcome. I'll also see if I can reproduce
with some other TSO capable card, in case the problem is in the kernel
TSO code and not in the driver.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TG3 data corruption (TSO ?)

2006-09-10 Thread Benjamin Herrenschmidt
On Sun, 2006-09-10 at 22:18 -0700, Michael Chan wrote:
 Benjamin Herrenschmidt wrote:
 
  I've added a wmb() in tw32_rx_mbox() and tw32_tx_mbox() and can still
  reproduce the problem. I've also done a 2 days run without TSO enabled
  without a failure (my test program normally fails after a couple of
  minutes).
  
 
 Hi Ben,
 
 The code is a bit tricky.  It uses function pointers for the various
 register read/write methods.  For the 5780, I believe it will be
 assigned a simple writel() and not tg3_write32_tx_mbox().  Can you
 double check to make sure you have actually added the wmb()?
 
 It's probably easiest to just add the wmb() in tg3_xmit_dma_bug()
 before the tw32_tx_mbox().

I've done:

#define tw32_rx_mbox(reg, val)  do { wmb(); tp-write32_rx_mbox(tp, reg, val); 
} while(0)
#define tw32_tx_mbox(reg, val)  do { wmb(); tp-write32_tx_mbox(tp, reg, val); 
} while(0)

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TG3 data corruption (TSO ?)

2006-09-09 Thread Benjamin Herrenschmidt
On Sat, 2006-09-09 at 02:22 -0700, David Miller wrote:
 From: Benjamin Herrenschmidt [EMAIL PROTECTED]
 Date: Sat, 09 Sep 2006 07:46:02 +1000
 
  I don't think that in general, you have ordering guarantees between
  cacheable and non-cacheable stores unless you use explicit barriers.
 
 In fact, on most systems you absolutely do have ordering between
 MMIO and memory accesses.

Well, at least powerpc and ia64 don't in hw, I don't know about
others... out of order in general is getting fascionable in processor
design ... 

 So you are making an extremely poor engineering decision
 by trying to fixup all the drivers to match PowerPC's
 semantics.  I think a smart engineer would decrease his
 debugging burdon, by matching his platform's MMIO accessors
 such that it matches what other platforms do and therefore
 inheriting the testing coverage provided by all platforms.
 
 Otherwise you will be hunting down these kinds of memory
 barrier issues forever.

Well, some of you (Alan, you, etc...) seem to imply that it's always
been the rule to have a memory store followed by an MMIO write be
strongly ordered.

However, if you look at drivers like e1000, USB OHCI, or even sungem
(:-) they, all have at least wmb()'s between updating descriptor in
memory and the MMIO that triggers reading those by the chip. So it seems
that I'm not the only one to have thought otherwise ;-) More
specificaly, at least ia64 I think, like PowerPC, assumes no ordering
requirement here. So they would need fixing too.

My main problem is the cost... it's actually very expensive to do that
sort of synchronisation. I don't know for ia64 or other potentially out
of order architectures, but we do introduced a measureable performance
hit by adding the one we already have to guard against spin_unlock.

So if we decide to go the way of making writel synchronous vs. previous
MMIOs, I'd really like to have a clearly defined relaxed version as
well.

However, I'm not sure any of our current relaxed accessor have clear
semantics. At least what is implemented currently on PowerPC is the
__raw_* versions which not only have no barriers at all (they don't even
order between MMIOs, for example, readl might cross writel), and do no
endian swap. Quite a mess of semantics if you ask me... Then there has
been talks about those *_relaxed operations but those are more a match
to the relaxed PCI-X and PCI-E cycels, that is they relax ordering vs.
requests in a different direction on the bus, they have nothing to do
with storage domains on the CPU.

Ben.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


<    1   2   3   4   5   >