Re: TCP_DEFER_ACCEPT issues

2007-11-01 Thread Eric Dumazet

Felix von Leitner wrote:

I am trying to use TCP_DEFER_ACCEPT in my web server.

There are some operational problems.  First of all: timeout handling.  I
would like to be able to set a timeout in seconds (or better:
milliseconds) for how long the socket is allowed to sit there without
data coming in.  For high load situations, I have been enforcing
timeouts in the range of 15 seconds, otherwise someone can DoS the
server by opening a lot of connections and tying up data structures.

It is still possible, of course, to tie up kernel memory this way, by
not reacting to the FIN or RST packets and running into a timeout there,
too, but that is partially tunable via sysctl.

According to tcp(7) the int argument to TCP_DEFER_ACCEPT is in seconds.
In the kernel code, it's converted to TCP timeout units.  When I ran my
server, and connected without sending any data, nothing happened.  No
timeout.  Minutes later, the connection was still there.  Even worse:
when I killed (!) the server process (thus closing the server socket),
the client did not get a reset.  Only when I typed something into the
telnet session did I get a reset.  This appears to be very broken.

My suggestion:

  1. make the argument to the setsockopt be in seconds, or milliseconds.
  2. if the server socket is closed, reset all pending connections.

Comments?



I agree TCP_DEFER_ACCEPT is not worth it at the moment if you take into
account the bad guys or very slow networks.


1) Setting a timeout in the millisecond range (< 1000) is not a good idea, because
some clients may need much more time to send your server their data (very long
distances). So one-second granularity is fine.


2) After the timeout has elapsed, the server TCP stack no longer has a socket
associated with your client's connection attempt, so closing the listening
socket won't send an RST. I agree an RST *should* be sent by the server once
the timeout is triggered.


A typical tcpdump of what happens with a tcp_defer_accept timeout of 20
seconds:


[1]08:52:47.480291 IP client.60930 > server.http: S 2498995442:2498995442(0) win 5840
[2]08:52:47.480302 IP server.http > client.60930: S 1173302644:1173302644(0) ack 2498995443 win 5840

[3]08:52:47.481669 IP client.60930 > server.http: . ack 1 win 5840

[4]08:52:50.757543 IP server.http > client.60930: S 1173302644:1173302644(0) ack 2498995443 win 5840

[5]08:52:50.758953 IP client.60930 > server.http: . ack 1 win 5840

[6]08:52:56.760611 IP server.http > client.60930: S 1173302644:1173302644(0) ack 2498995443 win 5840

[7]08:52:56.761886 IP client.60930 > server.http: . ack 1 win 5840

[8]08:53:08.771254 IP server.http > client.60930: S 1173302644:1173302644(0) ack 2498995443 win 5840

[9]08:53:08.772514 IP client.60930 > server.http: . ack 1 win 5840

[10]08:53:32.782488 IP server.http > client.60930: S 1173302644:1173302644(0) ack 2498995443 win 5840

[11]08:53:32.783754 IP client.60930 > server.http: . ack 1 win 5840



[12]08:59:30.509097 IP client.60930 > server.http: P 1:3(2) ack 1 win 5840
[13]08:59:30.509125 IP server.http > client.60930: R 1173302645:1173302645(0) win 0



So TCP_DEFER_ACCEPT might send far more packets than needed. Packets 4, 6, 8
and 10 (and their corresponding ACKs 5, 7, 9 and 11) seem unnecessary, since
packets 1, 2 and 3 already completed a normal TCP three-way handshake.


We should only wait for data from the client before passing the new
socket to the listening application.



-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: Add 405EX support to new EMAC driver

2007-11-01 Thread Stefan Roese
This patch adds support for the 405EX to the new EMAC driver. Same as on
AXON, the 405EX handles the MDIO via the RGMII bridge.

Tested on AMCC Kilauea.

Signed-off-by: Stefan Roese <[EMAIL PROTECTED]>
---
 drivers/net/ibm_newemac/core.c  |3 ++-
 drivers/net/ibm_newemac/rgmii.c |   16 +++-
 drivers/net/ibm_newemac/rgmii.h |2 +-
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 0de3aa2..fd0a585 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -2466,7 +2466,8 @@ static int __devinit emac_init_config(struct emac_instance *dev)
if (of_device_is_compatible(np, "ibm,emac4"))
dev->features |= EMAC_FTR_EMAC4;
if (of_device_is_compatible(np, "ibm,emac-axon")
-   || of_device_is_compatible(np, "ibm,emac-440epx"))
+   || of_device_is_compatible(np, "ibm,emac-440epx")
+   || of_device_is_compatible(np, "ibm,emac-405ex"))
dev->features |= EMAC_FTR_HAS_AXON_STACR
| EMAC_FTR_STACR_OC_INVERT;
if (of_device_is_compatible(np, "ibm,emac-440spe"))
diff --git a/drivers/net/ibm_newemac/rgmii.c b/drivers/net/ibm_newemac/rgmii.c
index de41695..b9a4ce7 100644
--- a/drivers/net/ibm_newemac/rgmii.c
+++ b/drivers/net/ibm_newemac/rgmii.c
@@ -140,7 +140,12 @@ void rgmii_get_mdio(struct of_device *ofdev, int input)
 
RGMII_DBG2(dev, "get_mdio(%d)" NL, input);
 
-   if (dev->type != RGMII_AXON)
+   /*
+    * Some platforms (e.g. 440GX) have RGMII support but don't use it for
+    * MDIO access. Only continue if the platform is using MDIO over the
+    * RGMII interface (e.g. AXON, 405EX).
+    */
+   if (dev->type != RGMII_HAS_MDIO)
return;
 
mutex_lock(&dev->lock);
@@ -161,7 +166,7 @@ void rgmii_put_mdio(struct of_device *ofdev, int input)
 
RGMII_DBG2(dev, "put_mdio(%d)" NL, input);
 
-   if (dev->type != RGMII_AXON)
+   if (dev->type != RGMII_HAS_MDIO)
return;
 
fer = in_be32(&p->fer);
@@ -251,8 +256,9 @@ static int __devinit rgmii_probe(struct of_device *ofdev,
}
 
/* Check for RGMII type */
-   if (of_device_is_compatible(ofdev->node, "ibm,rgmii-axon"))
-   dev->type = RGMII_AXON;
+   if (of_device_is_compatible(ofdev->node, "ibm,rgmii-axon") ||
+   of_device_is_compatible(ofdev->node, "ibm,rgmii-405ex"))
+   dev->type = RGMII_HAS_MDIO;
else
dev->type = RGMII_STANDARD;
 
@@ -264,7 +270,7 @@ static int __devinit rgmii_probe(struct of_device *ofdev,
 
printk(KERN_INFO
   "RGMII %s %s initialized\n",
-  dev->type == RGMII_STANDARD ? "standard" : "axon",
+  dev->type == RGMII_STANDARD ? "standard" : "has-mdio",
   ofdev->node->full_name);
 
wmb();
diff --git a/drivers/net/ibm_newemac/rgmii.h b/drivers/net/ibm_newemac/rgmii.h
index 5780683..f1b0ef5 100644
--- a/drivers/net/ibm_newemac/rgmii.h
+++ b/drivers/net/ibm_newemac/rgmii.h
@@ -23,7 +23,7 @@
 
 /* RGMII bridge type */
 #define RGMII_STANDARD 0
-#define RGMII_AXON 1
+#define RGMII_HAS_MDIO 1
 
 /* RGMII bridge */
 struct rgmii_regs {
-- 
1.5.3.4.498.g9c514



Re: [PATCH] ehea: add kexec support

2007-11-01 Thread Michael Ellerman
On Wed, 2007-10-31 at 20:48 +0100, Christoph Raisch wrote:
> Michael Ellerman <[EMAIL PROTECTED]> wrote on 30.10.2007 23:50:36:
> >
> > On Tue, 2007-10-30 at 09:39 +0100, Christoph Raisch wrote:
> > >
> > > Michael Ellerman <[EMAIL PROTECTED]> wrote on 28.10.2007 23:32:17:
> > > Hope I didn't miss anything here...
> >
> > Perhaps. When we kdump, the kernel does not call the reboot notifiers, so
> > the code Jan-Bernd just added won't get called. So the eHEA resources
> > won't be freed. When the kdump kernel tries to load the eHEA driver what
> > will happen?
> >
> Good point.
> 
> If the device driver tries to allocate resources again (in the kdump
> kernel) which have already been allocated (in the crashed kernel), the
> hcalls will fail because from the hypervisor's view the resources are
> still in use.
> Currently there's no method to find out the resource handles for these
> HEA resources allocated by the crashed kernel within the hypervisor...

So the hypervisor can't allocate more resources, because they're already
allocated, but it can't free the ones that are allocated because it
doesn't know what they are? I don't think I understand.

If that's really the way it works then eHEA is more or less broken for
kdump I'm afraid.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Re: [patch 2/2] ipvs: Synchronise Closing of Connections

2007-11-01 Thread Simon Horman
On Fri, Nov 02, 2007 at 01:36:07AM +0200, Julian Anastasov wrote:
> 
>   Hello,
> 
> On Thu, 1 Nov 2007, Simon Horman wrote:
> 
> > --- net-2.6.orig/net/ipv4/ipvs/ip_vs_sync.c 2007-11-01 18:17:55.0 +0900
> > +++ net-2.6/net/ipv4/ipvs/ip_vs_sync.c  2007-11-01 18:20:30.0 +0900
> > @@ -332,7 +332,7 @@ static void ip_vs_process_message(const 
> > s->daddr, s->dport,
> > flags, dest);
> > if (dest)
> 
>   Is that correct? Sorry, I was flooded with different versions
> of this patch and I'm not sure if it is the final one.
> 
> > -   atomic_dec(&dest->refcnt);
> > +   ip_vs_dest_get(dest);
> > if (!cp) {
> > IP_VS_ERR("ip_vs_conn_new failed\n");
> > return;

The ip_vs_dest_get() call shouldn't be there.
I'll double check the rest of the patch.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



Re: [2.6 patch] let USB_USBNET always select MII

2007-11-01 Thread David Brownell
On Thursday 01 November 2007, Adrian Bunk wrote:
> All this USB_USBNET_MII trickery is simply not worth it considering how
> little code it saves.

Depends on what systems you're talking about.  Forcing unused
code into the kernel is not free, especially if that's made into
a design policy and applied repeatedly to many subsystems.


> As a side effect, this also fixes the following compile error reported 
> by Toralf Förster:

Why not just fix the thing which changed and broke the build?

Or if reverse dependencies can't be made to work sanely, then
have those Ethernet-adapter minidrivers depend on NET_ETHERNET
and then select MII.  (That keeps the relationships simple
enough for the current Kconfig to handle.)

I have a fair number of usbnet devices.  Not one of them needs
MII or NET_ETHERNET.

- Dave



Re: build #337 failed for 2.6.24-rc1-gb1d08ac In function `usbnet_set_settings':

2007-11-01 Thread Adrian Bunk
On Thu, Nov 01, 2007 at 04:32:18PM -0700, David Brownell wrote:
> On Thursday 01 November 2007, Randy Dunlap wrote:
> > The MII functions aren't available unless NET_ETHERNET=y.

The setting of CONFIG_NET_ETHERNET doesn't matter for this bug.

> > However, the MII functions aren't always needed...
> > 
> > David, any ideas on this one?
> 
> It's been several years since I looked at this.  It
> used to behave just fine.
> 
> Something must have changed in the not-too-distant
> past to have broken this mechanism...
>...

It seems to be an old bug.

The following combination of options is simply an unusual one:

CONFIG_MII=m
CONFIG_USB_USBNET=y
CONFIG_USB_USBNET_MII=n

> - Dave

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [patch 2/2] ipvs: Synchronise Closing of Connections

2007-11-01 Thread Julian Anastasov

Hello,

On Thu, 1 Nov 2007, Simon Horman wrote:

> --- net-2.6.orig/net/ipv4/ipvs/ip_vs_sync.c   2007-11-01 18:17:55.0 +0900
> +++ net-2.6/net/ipv4/ipvs/ip_vs_sync.c    2007-11-01 18:20:30.0 +0900
> @@ -332,7 +332,7 @@ static void ip_vs_process_message(const 
>   s->daddr, s->dport,
>   flags, dest);
>   if (dest)

Is that correct? Sorry, I was flooded with different versions
of this patch and I'm not sure if it is the final one.

> - atomic_dec(&dest->refcnt);
> + ip_vs_dest_get(dest);
>   if (!cp) {
>   IP_VS_ERR("ip_vs_conn_new failed\n");
>   return;

Regards

--
Julian Anastasov <[EMAIL PROTECTED]>


Re: build #337 failed for 2.6.24-rc1-gb1d08ac In function `usbnet_set_settings':

2007-11-01 Thread David Brownell
On Thursday 01 November 2007, Randy Dunlap wrote:
> The MII functions aren't available unless NET_ETHERNET=y.
> However, the MII functions aren't always needed...
> 
> David, any ideas on this one?

It's been several years since I looked at this.  It
used to behave just fine.

Something must have changed in the not-too-distant
past to have broken this mechanism...


>  config USB_USBNET
>         tristate "Multi-purpose USB Networking Framework"
> +       depends on NET_ETHERNET if USB_USBNET_MII != n
>         select MII if USB_USBNET_MII != n
> 
> would be handy.  But invalid.
> 
> Hm, wait.  Haven't we seen this before and decided that MII should
> be made more generally available?  I.e., not depend on NET_ETHERNET?

Some of us keep wanting to see "select" work properly,
not omitting dependencies...

As for the interdependencies between MII and NET_ETHERNET,
I'll leave that up to the netdev folks.

- Dave



Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread Rick Jones

David Miller wrote:
> From: Rick Jones <[EMAIL PROTECTED]>
>
> > I'll try to go pester folks in tcpdump-workers then.
>
> The thing to check is "TP_STATUS_CSUMNOTREADY".
>
> When using mmap(), it will be provided in the descriptor.  When using
> recvmsg() it will be provided via a PACKET_AUXDATA control message
> when enabled via the PACKET_AUXDATA socket option.


Figures... the "dailies" and "weeklies" tar files of the tcpdump and libpcap
sources are fubar... again.  I've sent email to tcpdump-workers about that one.
If that isn't resolved quickly I'll learn how to access their CVS (pick an SCM,
any SCM...)


I did an apt-get of debian lenny's tcpdump and sources:

hpcpc103:~# tcpdump -V
tcpdump version 3.9.8
libpcap version 0.9.8

and that seems to show the false checksum failure and not use
TP_STATUS_CSUMNOTREADY; at least, grepping the sources didn't turn it up.
At first I thought it might, but then I realized that my snaplen was too
short to capture the whole TSO'ed frame, so tcpdump wasn't even trying to
verify.  After disabling TSO on the NIC, leaving CKO on, and making my
snaplen > 1500, I could see it was doing undesirable stuff.


I'll see what the top of trunk has at some point, and what the folks there
think of adding in a change.


rick jones


Re: [Bugme-new] [Bug 9270] New: sunhme requires lower MTU to handle 802.1q frames

2007-11-01 Thread Chris Poon
I forgot to add that changing only BMAC_TXMAX & BMAC_RXMAX didn't work
for me until I also changed two skb_put() calls (which is in the patch
that I submitted in bugzilla). I dug up some really old threads on the
net and found out that this had been reported before.

Quoting Andrew Morton <[EMAIL PROTECTED]>:

> On Wed, 31 Oct 2007 16:35:57 -0700 (PDT)
> David Miller <[EMAIL PROTECTED]> wrote:
> 
> > From: Andrew Morton <[EMAIL PROTECTED]>
> > Date: Wed, 31 Oct 2007 15:43:01 -0700
> > 
> > > > sunhme requires lower MTU to handle 802.1q frames - even though the PCI
> > > > driver supported VLAN tagging, you cannot do full MTU @ 1500 because the
> > > > driver doesn't set the card to transfer the extra bytes for an 802.1q
> > > > frame at 1500 MTU.
> > 
> > It supports VLAN tagging by accident, the NETIF_F_VLAN_CHALLENGED
> > flag should be set both in the PCI and non-PCI cases.
> > 
> > Jeff, please apply, thanks:
> > 
> > [SUNHME]: Fix missing NETIF_F_VLAN_CHALLENGED on PCI happy meals.
> > 
> > No HME parts can do VLANs correctly.
> > 
> > Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> > 
> > diff --git a/drivers/net/sunhme.c b/drivers/net/sunhme.c
> > index 120c8af..c20a3bd 100644
> > --- a/drivers/net/sunhme.c
> > +++ b/drivers/net/sunhme.c
> > @@ -3143,8 +3143,8 @@ static int __devinit happy_meal_pci_probe(struct pci_dev *pdev,
> > dev->irq = pdev->irq;
> > dev->dma = 0;
> >  
> > -   /* Happy Meal can do it all... */
> > -   dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
> > +   /* Happy Meal can do it all... except VLAN. */
> > +   dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_VLAN_CHALLENGED;
> >  
> >  #if defined(CONFIG_SBUS) && defined(CONFIG_PCI)
> > /* Hook up PCI register/dma accessors. */
> 
> I forgot to add my standard "please reply via emailed reply-to-all, not via
> the bugzilla web interface", so Chris has gone and attempted to communicate
> with us via the bugzilla UI (sigh).
> 
> He asked
> 
> "Even though it appears to work after I bumped the BMAC_TXMAX / BMAC_RXMAX?"
> 
> 
> 






[2.6 patch] let USB_USBNET always select MII

2007-11-01 Thread Adrian Bunk
All this USB_USBNET_MII trickery is simply not worth it considering how
little code it saves.

As a side effect, this also fixes the following compile error reported 
by Toralf Förster:

<--  snip  -->

...
  LD  .tmp_vmlinux1
drivers/built-in.o: In function `usbnet_set_settings':
(.text+0xf1876): undefined reference to `mii_ethtool_sset'
drivers/built-in.o: In function `usbnet_get_settings':
(.text+0xf1836): undefined reference to `mii_ethtool_gset'
drivers/built-in.o: In function `usbnet_get_link':
(.text+0xf18d6): undefined reference to `mii_link_ok'
drivers/built-in.o: In function `usbnet_nway_reset':
(.text+0xf18f6): undefined reference to `mii_nway_restart'
make: *** [.tmp_vmlinux1] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---

 drivers/net/usb/Kconfig  |9 +
 drivers/net/usb/usbnet.c |7 ---
 2 files changed, 1 insertion(+), 15 deletions(-)

3b7f6290c639b9042fead1698fdbe1c84132c953 
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index 5a96d74..a12c9c4 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -93,13 +93,9 @@ config USB_RTL8150
  To compile this driver as a module, choose M here: the
  module will be called rtl8150.
 
-config USB_USBNET_MII
-   tristate
-   default n
-
 config USB_USBNET
tristate "Multi-purpose USB Networking Framework"
-   select MII if USB_USBNET_MII != n
+   select MII
---help---
  This driver supports several kinds of network links over USB,
  with "minidrivers" built around a common network driver core
@@ -135,7 +131,6 @@ config USB_NET_AX8817X
tristate "ASIX AX88xxx Based USB 2.0 Ethernet Adapters"
depends on USB_USBNET && NET_ETHERNET
select CRC32
-   select USB_USBNET_MII
default y
help
  This option adds support for ASIX AX88xxx based USB 2.0
@@ -190,7 +185,6 @@ config USB_NET_DM9601
tristate "Davicom DM9601 based USB 1.1 10/100 ethernet devices"
depends on USB_USBNET
select CRC32
-   select USB_USBNET_MII
help
  This option adds support for Davicom DM9601 based USB 1.1
  10/100 Ethernet adapters.
@@ -225,7 +219,6 @@ config USB_NET_PLUSB
 config USB_NET_MCS7830
tristate "MosChip MCS7830 based Ethernet adapters"
depends on USB_USBNET
-   select USB_USBNET_MII
help
  Choose this option if you're using a 10/100 Ethernet USB2
  adapter based on the MosChip 7830 controller. This includes
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index acd5f1c..8ed1fc5 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -683,9 +683,6 @@ done_nopm:
  * they'll probably want to use this base set.
  */
 
-#if defined(CONFIG_MII) || defined(CONFIG_MII_MODULE)
-#define HAVE_MII
-
 int usbnet_get_settings (struct net_device *net, struct ethtool_cmd *cmd)
 {
struct usbnet *dev = netdev_priv(net);
@@ -744,8 +741,6 @@ int usbnet_nway_reset(struct net_device *net)
 }
 EXPORT_SYMBOL_GPL(usbnet_nway_reset);
 
-#endif /* HAVE_MII */
-
 void usbnet_get_drvinfo (struct net_device *net, struct ethtool_drvinfo *info)
 {
struct usbnet *dev = netdev_priv(net);
@@ -776,12 +771,10 @@ EXPORT_SYMBOL_GPL(usbnet_set_msglevel);
 
 /* drivers may override default ethtool_ops in their bind() routine */
 static struct ethtool_ops usbnet_ethtool_ops = {
-#ifdef HAVE_MII
.get_settings   = usbnet_get_settings,
.set_settings   = usbnet_set_settings,
.get_link   = usbnet_get_link,
.nway_reset = usbnet_nway_reset,
-#endif
.get_drvinfo= usbnet_get_drvinfo,
.get_msglevel   = usbnet_get_msglevel,
.set_msglevel   = usbnet_set_msglevel,


Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread David Miller
From: Rick Jones <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 15:04:12 -0700

> David Miller wrote:
> > From: Rick Jones <[EMAIL PROTECTED]>
> > Date: Thu, 01 Nov 2007 14:48:45 -0700
> > 
> > 
> >>One could I suppose try to amend the information passed to allow
> >>tcpdump to say "oh, this was a tx packet on the same machine on
> >>which I am tracing so don't worry about checksum mismatch"
> > 
> > 
> > We do this already!
> 
> I'll try to go pester folks in tcpdump-workers then.

The thing to check is "TP_STATUS_CSUMNOTREADY".

When using mmap(), it will be provided in the descriptor.  When using
recvmsg() it will be provided via a PACKET_AUXDATA control message
when enabled via the PACKET_AUXDATA socket option.




Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread Rick Jones

David Miller wrote:
> From: Rick Jones <[EMAIL PROTECTED]>
> Date: Thu, 01 Nov 2007 14:48:45 -0700
>
> > One could I suppose try to amend the information passed to allow
> > tcpdump to say "oh, this was a tx packet on the same machine on
> > which I am tracing so don't worry about checksum mismatch"
>
> We do this already!

I'll try to go pester folks in tcpdump-workers then.

rick


Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread David Miller
From: Rick Jones <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 14:48:45 -0700

> One could I suppose try to amend the information passed to allow
> tcpdump to say "oh, this was a tx packet on the same machine on
> which I am tracing so don't worry about checksum mismatch"

We do this already!



Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread David Miller
From: Dave Johnson <[EMAIL PROTECTED]>
Date: Thu, 1 Nov 2007 17:36:22 -0400

> bad csum on tx packets as reported by tcpdump is also an issue.

We provide a tag to userspace that tcpdump should use to see that the
HW is going to checksum the packet, and therefore it should elide
trying to verify the checksums.

It's not a kernel issue.


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 20:17:35 +0100

> But if it was an issue, the spinlock array used in IP route cache
> would have the same problem and *someone* should already have
> complained...

I'd say that having a few MB less of rwlock cache lines to
touch offsets the issue even if it does exist.


Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread Rick Jones

> > The code in AF_PACKET should fix the skb before passing to user
> > space so that there is no difference between accel and non-accel
> > hardware.  Internal choices shouldn't leak to user space.  Ditto,
> > the receive checksum offload should be fixed up as well.
>
> yep.  bad csum on tx packets as reported by tcpdump is also an issue.


With TX CKO enabled, there isn't any checksum to fixup when a tx packet is 
sniffed, so I'm not sure what can be done in the kernel apart from an 
unpalatable "disable CKO and all which depend upon it when entering promiscuous 
mode."  Having the tap calculate a checksum would be equally bad for 
performance, and would frankly be incorrect anyway because it would give the 
user the false impression that it was the checksum which went out onto the wire.


One could, I suppose, try to amend the information passed to allow tcpdump to say
"oh, this was a tx packet on the same machine on which I am tracing so don't
worry about checksum mismatch" but I have to wonder if it is _really_ worth it.
Already someone has to deal with seeing TCP segments >> the MSS thanks to TSO.
(Actually tcpdump got rather confused about that too, since the IP length of
those was 0, but IIRC we got that patched to use a length of zero as an "ah,
this was TSO so wing it" heuristic.)


rick jones


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 18:54:24 +0100

> Stephen Hemminger wrote:
> > Long term, is there any chance of using RCU for this? Seems like
> > it could be a big win.
> 
> This was discussed in the past, and I even believe some patch was proposed,
> but some guys (including David) complained that RCU is well suited for
> 'mostly read' structures.
> 
> On some web server workloads, the TCP hash table is constantly accessed in
> write mode (socket creation, socket move to timewait state, socket
> deleted...), and RCU added overhead and poor cache reuse (because sockets
> must be placed on an RCU queue before reuse).
> 
> On these typical workloads, a hash table without RCU is still the best.

Right, and none of the submitted RCU attempts were correct, it's very
hard to get the synchronization right.

> Long-term changes would rather be based on Robert Olsson's suggestion
> from last year (trie-based lookups and a unified IP/TCP cache).
>
> Short-term changes would be to make the TCP hash table resizable (small
> at boot, able to grow if necessary). Its current size on modern machines
> is just insane.

Resizing the hash is also, unfortunately, very hard to implement
as well.



Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread Dave Johnson
Ben Greear writes:
> We should also define what a NIC should do with VLANs it doesn't
> explicitly know about.   I think it should pass them up the stack
> with VLAN tag intact, but again, perhaps there are reasons not to do
> that?

Unless the device also supports NETIF_F_HW_VLAN_FILTER, it has no idea
which vlans the kernel cares about, it's up to __vlan_hwaccel_rx().


Stephen Hemminger writes:
> The code in AF_PACKET should fix the skb before passing to user
> space so that there is no difference between accel and non-accel
> hardware.  Internal choices shouldn't leak to user space.  Ditto,
> the receive checksum offload should be fixed up as well.

yep.  bad csum on tx packets as reported by tcpdump is also an issue.


Ben Greear writes:
> Currently, VLAN devices offer the ability to 'reorder' the header
> and explicitly remove the VLAN header.  I assume we keep this
> feature and have the AF_PACKET logic check the device flags to see
> if it should insert the VLAN header for hw-accel vlans? 
> 
> Either way, if we sniff the underlying device, we should always get
> the VLAN header.

Yes, but it's more than just a packet socket issue.

A quick look through the hwaccel capable drivers (in 2.6.23) and most
are doing something like:

if (foo->vlgrp && packet_is_tagged)
  vlan_hwaccel_receive_skb(skb, foo->vlgrp, vlan_tag);
else
  netif_receive_skb(skb);

The important thing here is if the vlan group is NULL, the MAC must
be configured to NOT strip the tag.

users of NETIF_F_HW_VLAN_RX:
---
./drivers/net/8139cp.c: looks ok
./drivers/net/acenic.c: *1
./drivers/net/amd8111e.c:   unsure, probably *1
./drivers/net/atl1/atl1_main.c: looks ok
./drivers/net/bnx2.c:   *2
./drivers/net/bonding/bond_main.c:  unsure, probably ok
./drivers/net/chelsio/cxgb2.c:  looks ok
./drivers/net/cxgb3/cxgb3_main.c:   looks ok
./drivers/net/e1000/e1000_main.c:   looks ok
./drivers/net/ehea/ehea_main.c: unsure, probably ok
./drivers/net/forcedeth.c:  looks ok
./drivers/net/gianfar.c:looks ok
./drivers/net/ixgb/ixgb_main.c: looks ok
./drivers/net/ns83820.c:unsure, probably ok
./drivers/net/r8169.c:  looks ok
./drivers/net/s2io.c:   *1
./drivers/net/sky2.c:   looks ok
./drivers/net/starfire.c:   unsure, probably ok
./drivers/net/tg3.c:*2
./drivers/net/typhoon.c:unsure, probably ok
./drivers/s390/net/qeth_main.c: unsure, probably ok

*1:  Driver configures the MAC to strip TAGs even if vlan group is
 NULL. MAC strips the tag, but driver calls netif_rx() or
 netif_receive_skb() with the packet as untagged.  Kernel
 processes tagged packet as if it was received untagged.  Possible
 security issue.

*2:  If chip supports 'ASF', tag is always stripped (see *1 above).
 Looks ok if ASF is not supported. 


Ben Greear writes:
> Do the NICs not save the QoS bits in the VLAN header anywhere that
> we could use to reconstitute the header?

Most likely, __vlan_hwaccel_rx() gets the whole 16-bit tag and sets
skb->priority based on it.


Besides the accidental tag removal by the drivers listed above when
there is no vlan group registered, we're still back to the original
issue.

Having __vlan_hwaccel_rx() send to the base device would likely
require a copy of the skb (at least the head). That completely defeats
the point of hwaccel.

At a minimum, __vlan_hwaccel_rx() should probably add the vlan header
back on if it doesn't find a vlan device.  This way no copy is needed;
just shove the header back on (we already have the full 16 bits).  Once
re-added, send to the base device instead of dropping.

That would fix the unknown vlan issue, but known vlans would only go
to the vlan device not the base device.  Not sure of an easy fix for
this as af_packet can specifically bind to a specified base device.  I
don't think this would be much of an issue and probably doesn't need
fixing.

-- 
Dave Johnson
Starent Networks

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] - e1000_ethtool.c - convert macros to functions

2007-11-01 Thread Kok, Auke
Joe Perches wrote:
> Minimal macro to function conversion in e1000_ethtool.c
> 
> Adds functions reg_pattern_test and reg_set_and_check
> Changes REG_PATTERN_TEST and REG_SET_AND_CHECK macros
> to call these functions.
> 
> Saves ~2.5KB
> 
> Compiled x86, untested (no hardware)
> 
> old:
> 
> $ size drivers/net/e1000/e1000_ethtool.o
>    text    data     bss     dec     hex filename
>   16778       0       0   16778    418a drivers/net/e1000/e1000_ethtool.o
> 
> new:
> 
> $ size drivers/net/e1000/e1000_ethtool.o
>    text    data     bss     dec     hex filename
>   14128       0       0   14128    3730 drivers/net/e1000/e1000_ethtool.o
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>



ok, looks good and it'll get tested with the e1000e version.

Thanks

Auke


Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread David Miller
From: Ben Greear <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 08:04:31 -0700

> David Miller wrote:
> > The hardware has stripped the VLAN header completely and has not
> > provided it to us at all.
> >   
> Do the NICs not save the QoS bits in the VLAN header anywhere that we could
> use to reconstitute the header?

You get the 16-bit VLAN tag.


Re: [PATCH] net: Add 405EX support to new EMAC driver

2007-11-01 Thread Stefan Roese
On Thursday 01 November 2007, Josh Boyer wrote:
> > > - if (dev->type != RGMII_AXON)
> > > - return;
> > > -
> > >   mutex_lock(&dev->lock);
> >
> > That will break 440GX boards that need to use the RGMII for data and the
> > ZMII for MDIO.
> >
> > You may want to change the name RGMII_AXON to something like RGMII_HAS_MDIO
> > instead, and set that for 405EX as well.
>
> And perhaps adding a comment about that since the meaning of that code
> isn't very obvious. That way people that aren't the original author
> of the driver don't get confused again.

Will do. I'll send a fixed version tomorrow.

Best regards,
Stefan


[PATCH] - e1000_ethtool.c - convert macros to functions

2007-11-01 Thread Joe Perches
Minimal macro to function conversion in e1000_ethtool.c

Adds functions reg_pattern_test and reg_set_and_check
Changes REG_PATTERN_TEST and REG_SET_AND_CHECK macros
to call these functions.

Saves ~2.5KB

Compiled x86, untested (no hardware)

old:

$ size drivers/net/e1000/e1000_ethtool.o
   text    data     bss     dec     hex filename
  16778       0       0   16778    418a drivers/net/e1000/e1000_ethtool.o

new:

$ size drivers/net/e1000/e1000_ethtool.o
   text    data     bss     dec     hex filename
  14128       0       0   14128    3730 drivers/net/e1000/e1000_ethtool.o

Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

---

 drivers/net/e1000/e1000_ethtool.c |   80 +++-
 1 files changed, 51 insertions(+), 29 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 667f18b..830f0fb 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -728,39 +728,61 @@ err_setup:
return err;
 }
 
-#define REG_PATTERN_TEST(R, M, W) \
-{ \
-   uint32_t pat, val; \
-   const uint32_t test[] = \
-       {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF}; \
-   for (pat = 0; pat < ARRAY_SIZE(test); pat++) { \
-       E1000_WRITE_REG(&adapter->hw, R, (test[pat] & W)); \
-       val = E1000_READ_REG(&adapter->hw, R); \
-       if (val != (test[pat] & W & M)) { \
-           DPRINTK(DRV, ERR, "pattern test reg %04X failed: got " \
-                   "0x%08X expected 0x%08X\n", \
-                   E1000_##R, val, (test[pat] & W & M)); \
-           *data = (adapter->hw.mac_type < e1000_82543) ? \
-                   E1000_82542_##R : E1000_##R; \
-           return 1; \
-       } \
-   } \
+static bool reg_pattern_test(struct e1000_adapter *adapter, uint64_t *data,
+                             int reg, uint32_t mask, uint32_t write)
+{
+   static const uint32_t test[] =
+       {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};
+   uint8_t __iomem *address = adapter->hw.hw_addr + reg;
+   uint32_t read;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(test); i++) {
+   writel(write & test[i], address);
+   read = readl(address);
+   if (read != (write & test[i] & mask)) {
+   DPRINTK(DRV, ERR, "pattern test reg %04X failed: "
+   "got 0x%08X expected 0x%08X\n",
+   reg, read, (write & test[i] & mask));
+   *data = reg;
+   return true;
+   }
+   }
+   return false;
 }
 
-#define REG_SET_AND_CHECK(R, M, W) \
-{ \
-   uint32_t val; \
-   E1000_WRITE_REG(&adapter->hw, R, W & M); \
-   val = E1000_READ_REG(&adapter->hw, R); \
-   if ((W & M) != (val & M)) { \
-       DPRINTK(DRV, ERR, "set/check reg %04X test failed: got 0x%08X " \
-               "expected 0x%08X\n", E1000_##R, (val & M), (W & M)); \
-       *data = (adapter->hw.mac_type < e1000_82543) ? \
-               E1000_82542_##R : E1000_##R; \
-       return 1; \
-   } \
+static bool reg_set_and_check(struct e1000_adapter *adapter, uint64_t *data,
+ int reg, uint32_t mask, uint32_t write)
+{
+   uint8_t __iomem *address = adapter->hw.hw_addr + reg;
+   uint32_t read;
+
+   writel(write & mask, address);
+   read = readl(address);
+   if ((read & mask) != (write & mask)) {
+   DPRINTK(DRV, ERR, "set/check reg %04X test failed: "
+   "got 0x%08X expected 0x%08X\n",
+   reg, (read & mask), (write & mask));
+   *data = reg;
+   return true;
+   }
+   return false;
 }
 
+#define REG_PATTERN_TEST(reg, mask, write)  \
+   if (reg_pattern_t

Re: [BUG] in inet6_create

2007-11-01 Thread Roel Kluin
Roel Kluin wrote:
> I got this bug recently, I am not sure whether this is related to any
> previously reported ones. It was a recently pulled git kernel. Also I have
> been hacking my kernel a bit lately, but I think that I haven't got any
> changes in the currently running kernel.
> 
> FYI: my network card was not running (module not loaded, and I just started
> thunderbird)
> 
> Roel
> 
> More information needed?
> --

probably mailing to linux-net was more appropriate

> 
> NET: Registered protocol family 10
>  BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 
>  printing eip: f881034f *pde =  
>  Oops:  [#1] 
>  Modules linked in: ipv6
>  
>  Pid: 17080, comm: modprobe Not tainted (2.6.24-rc1 #1)
>  EIP: 0060:[] EFLAGS: 00010293 CPU: 0
>  EIP is at inet6_create+0x5f/0x340 [ipv6]
>  EAX:  EBX:  ECX: f7621fd5 EDX: f8842e78
>  ESI:  EDI: 003a EBP: ff9f ESP: d780de74
>   DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
>  Process modprobe (pid: 17080, ti=d780c000 task=c3a86000 task.ti=d780c000)
>  Stack:  0246 0246 0003 c60e22a0 0246
> f88410fc ffea 0003 c063f680 c028d597 0002 0001 c028d52c
> c60e22a0 0003 f8842d00 0032  c028d6a7 003a f88438c0
>  Call Trace:
>   [] __sock_create+0xf7/0x1e0
>   [] __sock_create+0x8c/0x1e0
>   [] sock_create_kern+0x27/0x30
>   [] icmpv6_init+0x1f/0xa0 [ipv6]
>   [] inet6_init+0x13f/0x2f0 [ipv6]
>   [] sys_init_module+0x173/0x16c0
>   [] autoremove_wake_function+0x0/0x50
>   [] sys_read+0x41/0x70
>   [] syscall_call+0x7/0xb
>   ===
>  Code: c0 85 c9 0f 84 12 02 00 00 c7 44 24 18 00 00 00 00 0f bf c6 c1 e0 03 
> 8b 98 80 2e 84 f8 8d 90 80 2e 84 f8 89 5c 24 1c 8b 44 24 1c <8b> 00 0f 18 00 
> 90 39 d3 bd a2 ff ff ff 75 36 e9 f3 01 00 00 85 
>  EIP: [] inet6_create+0x5f/0x340 [ipv6] SS:ESP 0068:d780de74
>  BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 
>  printing eip: f881034f *pde =  
>  Oops:  [#2] 
>  Modules linked in: ipv6
>  
>  Pid: 17078, comm: thunderbird-bin Tainted: G  D (2.6.24-rc1 #1)
>  EIP: 0060:[] EFLAGS: 00210293 CPU: 0
>  EIP is at inet6_create+0x5f/0x340 [ipv6]
>  EAX:  EBX:  ECX: f7621fd5 EDX: f8842e78
>  ESI:  EDI:  EBP: ff9f ESP: c2801f00
>   DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
>  Process thunderbird-bin (pid: 17078, ti=c280 task=c20bf000 task.ti=c280)
>  Stack: c0185024 00200246 00200246 0001 c60e2000 00200246
> f88410fc ffea 0001 c063f680 c028d597 0002 0001 c028d52c
> c60e2000 0001 000a 08b095bc c280 c028d6e9  c2801f74
>  Call Trace:
>   [] new_inode+0x24/0x90
>   [] __sock_create+0xf7/0x1e0
>   [] __sock_create+0x8c/0x1e0
>   [] sock_create+0x39/0x50
>   [] sys_socket+0x1c/0x50
>   [] sys_socketcall+0x68/0x280
>   [] trace_hardirqs_on+0xbb/0x160
>   [] do_sched_setscheduler+0xad/0xc0
>   [] restore_nocheck+0x12/0x15
>   [] syscall_call+0x7/0xb
>   ===
>  Code: c0 85 c9 0f 84 12 02 00 00 c7 44 24 18 00 00 00 00 0f bf c6 c1 e0 03 
> 8b 98 80 2e 84 f8 8d 90 80 2e 84 f8 89 5c 24 1c 8b 44 24 1c <8b> 00 0f 18 00 
> 90 39 d3 bd a2 ff ff ff 75 36 e9 f3 01 00 00 85 
>  EIP: [] inet6_create+0x5f/0x340 [ipv6] SS:ESP 0068:c2801f00


Re: build #337 failed for 2.6.24-rc1-gb1d08ac In function `usbnet_set_settings':

2007-11-01 Thread Randy Dunlap
On Thu, 1 Nov 2007 20:24:54 +0100 Toralf Förster wrote:

[adding netdev]

> Hello,
> 
> the build with the attached .config failed, make ends with:
> ...
>   CC  arch/x86/lib/usercopy_32.o
>   AR  arch/x86/lib/lib.a
>   GEN .version
>   CHK include/linux/compile.h
>   UPD include/linux/compile.h
>   CC  init/version.o
>   LD  init/built-in.o
>   LD  .tmp_vmlinux1
> drivers/built-in.o: In function `usbnet_set_settings':
> (.text+0xf1876): undefined reference to `mii_ethtool_sset'
> drivers/built-in.o: In function `usbnet_get_settings':
> (.text+0xf1836): undefined reference to `mii_ethtool_gset'
> drivers/built-in.o: In function `usbnet_get_link':
> (.text+0xf18d6): undefined reference to `mii_link_ok'
> drivers/built-in.o: In function `usbnet_nway_reset':
> (.text+0xf18f6): undefined reference to `mii_nway_restart'
> make: *** [.tmp_vmlinux1] Error 1

The MII functions aren't available unless NET_ETHERNET=y.
However, the MII functions aren't always needed...

David, any ideas on this one?

 config USB_USBNET
tristate "Multi-purpose USB Networking Framework"
+   depends on NET_ETHERNET if USB_USBNET_MII != n
select MII if USB_USBNET_MII != n

would be handy.  But invalid.

Hm, wait.  Haven't we seen this before and decided that MII should
be made more generally available?  I.e., not depend on NET_ETHERNET?


> The build was made with :
> $> make mrproper && make rndconfig &&  && make oldconfig 
> && make
> 
> Here's the config:
> 
> #
> # Automatically generated make config: don't edit
> # Linux kernel version: 2.6.24-rc1
> # Thu Nov  1 19:33:29 2007
> #
> CONFIG_X86_32=y
> CONFIG_GENERIC_TIME=y
> CONFIG_GENERIC_CMOS_UPDATE=y
> CONFIG_CLOCKSOURCE_WATCHDOG=y
> CONFIG_GENERIC_CLOCKEVENTS=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_SEMAPHORE_SLEEPERS=y
> CONFIG_X86=y
> CONFIG_MMU=y
> CONFIG_ZONE_DMA=y
> CONFIG_QUICKLIST=y
> CONFIG_GENERIC_ISA_DMA=y
> CONFIG_GENERIC_IOMAP=y
> CONFIG_GENERIC_BUG=y
> CONFIG_GENERIC_HWEIGHT=y
> CONFIG_ARCH_MAY_HAVE_PC_FDC=y
> CONFIG_DMI=y
> CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
> 
> #
> # General setup
> #
> # CONFIG_EXPERIMENTAL is not set
> CONFIG_BROKEN_ON_SMP=y
> CONFIG_LOCK_KERNEL=y
> CONFIG_INIT_ENV_ARG_LIMIT=32
> CONFIG_LOCALVERSION=""
> # CONFIG_LOCALVERSION_AUTO is not set
> CONFIG_SWAP=y
> CONFIG_SYSVIPC=y
> CONFIG_SYSVIPC_SYSCTL=y
> CONFIG_BSD_PROCESS_ACCT=y
> # CONFIG_BSD_PROCESS_ACCT_V3 is not set
> # CONFIG_TASKSTATS is not set
> CONFIG_AUDIT=y
> CONFIG_AUDITSYSCALL=y
> # CONFIG_IKCONFIG is not set
> CONFIG_LOG_BUF_SHIFT=14
> CONFIG_CGROUPS=y
> # CONFIG_CGROUP_DEBUG is not set
> CONFIG_CGROUP_NS=y
> # CONFIG_CGROUP_CPUACCT is not set
> # CONFIG_FAIR_GROUP_SCHED is not set
> CONFIG_SYSFS_DEPRECATED=y
> CONFIG_RELAY=y
> # CONFIG_BLK_DEV_INITRD is not set
> CONFIG_SYSCTL=y
> # CONFIG_EMBEDDED is not set
> CONFIG_UID16=y
> CONFIG_SYSCTL_SYSCALL=y
> CONFIG_KALLSYMS=y
> # CONFIG_KALLSYMS_EXTRA_PASS is not set
> CONFIG_HOTPLUG=y
> CONFIG_PRINTK=y
> CONFIG_BUG=y
> CONFIG_ELF_CORE=y
> CONFIG_BASE_FULL=y
> CONFIG_FUTEX=y
> CONFIG_ANON_INODES=y
> CONFIG_EPOLL=y
> CONFIG_SIGNALFD=y
> CONFIG_EVENTFD=y
> CONFIG_SHMEM=y
> CONFIG_VM_EVENT_COUNTERS=y
> CONFIG_SLAB=y
> # CONFIG_SLUB is not set
> # CONFIG_SLOB is not set
> CONFIG_RT_MUTEXES=y
> # CONFIG_TINY_SHMEM is not set
> CONFIG_BASE_SMALL=0
> CONFIG_MODULES=y
> # CONFIG_MODULE_UNLOAD is not set
> CONFIG_MODVERSIONS=y
> # CONFIG_MODULE_SRCVERSION_ALL is not set
> # CONFIG_KMOD is not set
> CONFIG_BLOCK=y
> # CONFIG_LBD is not set
> CONFIG_BLK_DEV_IO_TRACE=y
> # CONFIG_LSF is not set
> 
> #
> # IO Schedulers
> #
> CONFIG_IOSCHED_NOOP=y
> # CONFIG_IOSCHED_AS is not set
> # CONFIG_IOSCHED_DEADLINE is not set
> CONFIG_IOSCHED_CFQ=y
> # CONFIG_DEFAULT_AS is not set
> # CONFIG_DEFAULT_DEADLINE is not set
> # CONFIG_DEFAULT_CFQ is not set
> CONFIG_DEFAULT_NOOP=y
> CONFIG_DEFAULT_IOSCHED="noop"
> 
> #
> # Processor type and features
> #
> CONFIG_TICK_ONESHOT=y
> CONFIG_NO_HZ=y
> # CONFIG_HIGH_RES_TIMERS is not set
> CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
> # CONFIG_SMP is not set
> CONFIG_X86_PC=y
> # CONFIG_X86_ELAN is not set
> # CONFIG_X86_VOYAGER is not set
> # CONFIG_X86_NUMAQ is not set
> # CONFIG_X86_SUMMIT is not set
> # CONFIG_X86_BIGSMP is not set
> # CONFIG_X86_VISWS is not set
> # CONFIG_X86_GENERICARCH is not set
> # CONFIG_X86_ES7000 is not set
> # CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER is not set
> CONFIG_PARAVIRT_GUEST=y
> # CONFIG_XEN is not set
> # CONFIG_VMI is not set
> # CONFIG_M386 is not set
> # CONFIG_M486 is not set
> # CONFIG_M586 is not set
> # CONFIG_M586TSC is not set
> # CONFIG_M586MMX is not set
> # CONFIG_M686 is not set
> # CONFIG_MPENTIUMII is not set
> # CONFIG_MPENTIUMIII is not set
> CONFIG_MPENTIUMM=y
> # CONFIG_MCORE2 is not set
> # CONFIG_MPENTIUM4 is not set
> # CONFIG_MK6 is not set
> # CONFIG_MK7 is not set
> # CONFIG_MK8 is not set
> # CONFIG_MCRUSOE is not set
> # CONFIG_MEFFICEON is not set
> # CONFIG_M

Re: [PATCH] net: Add 405EX support to new EMAC driver

2007-11-01 Thread Josh Boyer
On Fri, 02 Nov 2007 07:37:01 +1100
Benjamin Herrenschmidt <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2007-11-01 at 15:54 +0100, Stefan Roese wrote:
> > This patch adds support for the 405EX to the new EMAC driver.
> > 
> > Tested on AMCC Kilauea.
> 
>  .../...
> 
> > diff --git a/drivers/net/ibm_newemac/rgmii.c b/drivers/net/ibm_newemac/rgmii.c
> > index de41695..e393f68 100644
> > --- a/drivers/net/ibm_newemac/rgmii.c
> > +++ b/drivers/net/ibm_newemac/rgmii.c
> > @@ -140,9 +140,6 @@ void rgmii_get_mdio(struct of_device *ofdev, int input)
> >  
> > RGMII_DBG2(dev, "get_mdio(%d)" NL, input);
> >  
> > -   if (dev->type != RGMII_AXON)
> > -   return;
> > -
> > mutex_lock(&dev->lock);
> 
> That will break 440GX boards that need to use the RGMII for data and the
> ZMII for MDIO.
> 
> You may want to change the name RGMII_AXON to something like RGMII_HAS_MDIO
> instead, and set that for 405EX as well.

And perhaps adding a comment about that since the meaning of that code
isn't very obvious. That way people that aren't the original author
of the driver don't get confused again.

josh


Re: [PATCH] net: Add 405EX support to new EMAC driver

2007-11-01 Thread Benjamin Herrenschmidt

On Thu, 2007-11-01 at 15:54 +0100, Stefan Roese wrote:
> This patch adds support for the 405EX to the new EMAC driver.
> 
> Tested on AMCC Kilauea.

 .../...

> diff --git a/drivers/net/ibm_newemac/rgmii.c b/drivers/net/ibm_newemac/rgmii.c
> index de41695..e393f68 100644
> --- a/drivers/net/ibm_newemac/rgmii.c
> +++ b/drivers/net/ibm_newemac/rgmii.c
> @@ -140,9 +140,6 @@ void rgmii_get_mdio(struct of_device *ofdev, int input)
>  
>   RGMII_DBG2(dev, "get_mdio(%d)" NL, input);
>  
> - if (dev->type != RGMII_AXON)
> - return;
> -
>   mutex_lock(&dev->lock);

That will break 440GX boards that need to use the RGMII for data and the
ZMII for MDIO.

You may want to change the name RGMII_AXON to something like RGMII_HAS_MDIO
instead, and set that for 405EX as well.

Ben.




[BUG] in inet6_create

2007-11-01 Thread Roel Kluin
I got this bug recently, I am not sure whether this is related to any
previously reported ones. It was a recently pulled git kernel. Also I have
been hacking my kernel a bit lately, but I think that I haven't got any
changes in the currently running kernel.
running kernel.

FYI: my network card was not running (module not loaded, and I just started 
thunderbird)

Roel

More information needed?
--

NET: Registered protocol family 10
 BUG: unable to handle kernel NULL pointer dereference at virtual address 

 printing eip: f881034f *pde =  
 Oops:  [#1] 
 Modules linked in: ipv6
 
 Pid: 17080, comm: modprobe Not tainted (2.6.24-rc1 #1)
 EIP: 0060:[] EFLAGS: 00010293 CPU: 0
 EIP is at inet6_create+0x5f/0x340 [ipv6]
 EAX:  EBX:  ECX: f7621fd5 EDX: f8842e78
 ESI:  EDI: 003a EBP: ff9f ESP: d780de74
  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
 Process modprobe (pid: 17080, ti=d780c000 task=c3a86000 task.ti=d780c000)
 Stack:  0246 0246 0003 c60e22a0 0246   
f88410fc ffea 0003 c063f680 c028d597 0002 0001 c028d52c 
c60e22a0 0003 f8842d00 0032  c028d6a7 003a f88438c0 
 Call Trace:
  [] __sock_create+0xf7/0x1e0
  [] __sock_create+0x8c/0x1e0
  [] sock_create_kern+0x27/0x30
  [] icmpv6_init+0x1f/0xa0 [ipv6]
  [] inet6_init+0x13f/0x2f0 [ipv6]
  [] sys_init_module+0x173/0x16c0
  [] autoremove_wake_function+0x0/0x50
  [] sys_read+0x41/0x70
  [] syscall_call+0x7/0xb
  ===
 Code: c0 85 c9 0f 84 12 02 00 00 c7 44 24 18 00 00 00 00 0f bf c6 c1 e0 03 8b 
98 80 2e 84 f8 8d 90 80 2e 84 f8 89 5c 24 1c 8b 44 24 1c <8b> 00 0f 18 00 90 39 
d3 bd a2 ff ff ff 75 36 e9 f3 01 00 00 85 
 EIP: [] inet6_create+0x5f/0x340 [ipv6] SS:ESP 0068:d780de74
 BUG: unable to handle kernel NULL pointer dereference at virtual address 

 printing eip: f881034f *pde =  
 Oops:  [#2] 
 Modules linked in: ipv6
 
 Pid: 17078, comm: thunderbird-bin Tainted: G  D (2.6.24-rc1 #1)
 EIP: 0060:[] EFLAGS: 00210293 CPU: 0
 EIP is at inet6_create+0x5f/0x340 [ipv6]
 EAX:  EBX:  ECX: f7621fd5 EDX: f8842e78
 ESI:  EDI:  EBP: ff9f ESP: c2801f00
  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
 Process thunderbird-bin (pid: 17078, ti=c280 task=c20bf000 task.ti=c280)
 Stack: c0185024 00200246 00200246 0001 c60e2000 00200246   
f88410fc ffea 0001 c063f680 c028d597 0002 0001 c028d52c 
c60e2000 0001 000a 08b095bc c280 c028d6e9  c2801f74 
 Call Trace:
  [] new_inode+0x24/0x90
  [] __sock_create+0xf7/0x1e0
  [] __sock_create+0x8c/0x1e0
  [] sock_create+0x39/0x50
  [] sys_socket+0x1c/0x50
  [] sys_socketcall+0x68/0x280
  [] trace_hardirqs_on+0xbb/0x160
  [] do_sched_setscheduler+0xad/0xc0
  [] restore_nocheck+0x12/0x15
  [] syscall_call+0x7/0xb
  ===
 Code: c0 85 c9 0f 84 12 02 00 00 c7 44 24 18 00 00 00 00 0f bf c6 c1 e0 03 8b 
98 80 2e 84 f8 8d 90 80 2e 84 f8 89 5c 24 1c 8b 44 24 1c <8b> 00 0f 18 00 90 39 
d3 bd a2 ff ff ff 75 36 e9 f3 01 00 00 85 
 EIP: [] inet6_create+0x5f/0x340 [ipv6] SS:ESP 0068:c2801f00


[git patches] net driver fixes

2007-11-01 Thread Jeff Garzik

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
upstream-linus

to receive the following updates:

 drivers/net/Kconfig |4 +---
 drivers/net/fec_mpc52xx.c   |4 ++--
 drivers/net/myri10ge/myri10ge.c |6 +++---
 3 files changed, 6 insertions(+), 8 deletions(-)

Andrew Gallatin (1):
  Fix myri10ge NAPI oops & warnings

Grant Likely (2):
  mpc5200: Fix Kconfig dependancies on MPC5200 FEC device driver
  Fix region size check in mpc5200 FEC driver

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 867cb73..5f800a6 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1883,9 +1883,7 @@ config FEC2
 
 config FEC_MPC52xx
tristate "MPC52xx FEC driver"
-   depends on PPC_MPC52xx
-   select PPC_BESTCOMM
-   select PPC_BESTCOMM_FEC
+   depends on PPC_MERGE && PPC_MPC52xx && PPC_BESTCOMM_FEC
select CRC32
select PHYLIB
---help---
diff --git a/drivers/net/fec_mpc52xx.c b/drivers/net/fec_mpc52xx.c
index fc1cf0b..a8a0ee2 100644
--- a/drivers/net/fec_mpc52xx.c
+++ b/drivers/net/fec_mpc52xx.c
@@ -879,9 +879,9 @@ mpc52xx_fec_probe(struct of_device *op, const struct of_device_id *match)
"Error while parsing device node resource\n" );
return rv;
}
-   if ((mem.end - mem.start + 1) != sizeof(struct mpc52xx_fec)) {
+   if ((mem.end - mem.start + 1) < sizeof(struct mpc52xx_fec)) {
printk(KERN_ERR DRIVER_NAME
-   " - invalid resource size (%lx != %x), check mpc52xx_devices.c\n",
+   " - invalid resource size (%lx < %x), check mpc52xx_devices.c\n",
(unsigned long)(mem.end - mem.start + 1), sizeof(struct mpc52xx_fec));
return -EINVAL;
}
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 366e62a..0f306dd 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1151,7 +1151,7 @@ static inline int myri10ge_clean_rx_done(struct myri10ge_priv *mgp, int budget)
u16 length;
__wsum checksum;
 
-   while (rx_done->entry[idx].length != 0 && work_done++ < budget) {
+   while (rx_done->entry[idx].length != 0 && work_done < budget) {
length = ntohs(rx_done->entry[idx].length);
rx_done->entry[idx].length = 0;
checksum = csum_unfold(rx_done->entry[idx].checksum);
@@ -1167,6 +1167,7 @@ static inline int myri10ge_clean_rx_done(struct myri10ge_priv *mgp, int budget)
rx_bytes += rx_ok * (unsigned long)length;
cnt++;
idx = cnt & (myri10ge_max_intr_slots - 1);
+   work_done++;
}
rx_done->idx = idx;
rx_done->cnt = cnt;
@@ -1233,13 +1234,12 @@ static int myri10ge_poll(struct napi_struct *napi, int budget)
struct myri10ge_priv *mgp =
container_of(napi, struct myri10ge_priv, napi);
struct net_device *netdev = mgp->dev;
-   struct myri10ge_rx_done *rx_done = &mgp->rx_done;
int work_done;
 
/* process as many rx events as NAPI will allow */
work_done = myri10ge_clean_rx_done(mgp, budget);
 
-   if (rx_done->entry[rx_done->idx].length == 0 || !netif_running(netdev)) {
+   if (work_done < budget || !netif_running(netdev)) {
netif_rx_complete(netdev, napi);
put_be32(htonl(3), mgp->irq_claim);
}


Re: [PATCH] e1000, e1000e valid-addr fixes

2007-11-01 Thread Jeff Garzik

Stephen Hemminger wrote:

How about:

static int eth_validate_addr(const struct net_device *dev)
{
return is_valid_ether_addr(dev->dev_addr) ? 0 : -EINVAL;
}


hmmm -- it's a slow path, so I don't see the value of marking the 
argument 'const' -- right now this implementation merely reads the 
dev->dev_addr[], but that need not always be the case.


And I don't see the value of squashing everything onto one line, IMO the 
current version is more readable.


Jeff





Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Eric Dumazet



Rick Jones wrote:
Something is telling me finding a 64 core system with a suitable 
workload to try this could be a good thing.  Wish I had one at my 
disposal.


Maybe on big NUMA machines, we might prefer to spread the rwlock array across
multiple nodes (i.e. using vmalloc() instead of kmalloc())


After my patch, it is true all rwlocks are on one memory node. Not exactly 
optimal :(


But if it was an issue, the spinlock array used in IP route cache would have 
the same problem and *someone* should already have complained...





Re: [Bugme-new] [Bug 9270] New: sunhme requires lower MTU to handle 802.1q frames

2007-11-01 Thread Andrew Morton
On Wed, 31 Oct 2007 16:35:57 -0700 (PDT)
David Miller <[EMAIL PROTECTED]> wrote:

> From: Andrew Morton <[EMAIL PROTECTED]>
> Date: Wed, 31 Oct 2007 15:43:01 -0700
> 
> > > sunhme requires lower MTU to handle 802.1q frames - even though the PCI
> > > driver supported VLAN tagging, you cannot do full MTU @ 1500 because the
> > > driver doesn't set the card to transfer the extra bytes of an 802.1q
> > > frame at 1500 MTU.
> 
> It supports VLAN tagging by accident, the NETIF_F_VLAN_CHALLENGED
> flag should be set both in the PCI and non-PCI cases.
> 
> Jeff, please apply, thanks:
> 
> [SUNHME]: Fix missing NETIF_F_VLAN_CHALLENGED on PCI happy meals.
> 
> No HME parts can do VLANs correctly.
> 
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
> 
> diff --git a/drivers/net/sunhme.c b/drivers/net/sunhme.c
> index 120c8af..c20a3bd 100644
> --- a/drivers/net/sunhme.c
> +++ b/drivers/net/sunhme.c
> @@ -3143,8 +3143,8 @@ static int __devinit happy_meal_pci_probe(struct pci_dev *pdev,
>   dev->irq = pdev->irq;
>   dev->dma = 0;
>  
> - /* Happy Meal can do it all... */
> - dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
> + /* Happy Meal can do it all... except VLAN. */
> + dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_VLAN_CHALLENGED;
>  
>  #if defined(CONFIG_SBUS) && defined(CONFIG_PCI)
>   /* Hook up PCI register/dma accessors. */

I forgot to add my standard "please reply via emailed reply-to-all, not via
the bugzilla web interface", so Chris has gone and attempted to communicate
with us via the bugzilla UI (sigh).

He asked

"Even though it appears to work after I bumped the BMAC_TXMAX / BMAC_RXMAX?"




Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Eric Dumazet

Rick Jones wrote:

Eric Dumazet wrote:

Stephen Hemminger wrote:


On Thu, 01 Nov 2007 11:16:20 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

As done two years ago on IP route cache table (commit 
22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one 
lock per hash bucket for the huge TCP/DCCP hash tables.


The TCP hashes are looked at with higher frequency than the route cache 
yes?


It depends on the workload, but in general I would say the reverse.



On a typical x86_64 platform, this saves about 2MB or 4MB of ram, 
for little performance difference. (We hit a different cache line 
for the rwlock, but then the bucket cache line has a better sharing 
factor among cpus, since we dirty it less often.)


Using a 'small' table of hashed rwlocks should be more than enough 
to provide correct SMP concurrency between different buckets, 
without using too much memory. Sizing of this table depends on 
NR_CPUS and various CONFIG settings.


Something is telling me finding a 64 core system with a suitable 
workload to try this could be a good thing.  Wish I had one at my disposal.


If you find one, please give it to me when you finished playing^Wworking with 
it :)





Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Rick Jones

Eric Dumazet wrote:

Stephen Hemminger wrote:


On Thu, 01 Nov 2007 11:16:20 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

As done two years ago on IP route cache table (commit 
22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one 
lock per hash bucket for the huge TCP/DCCP hash tables.


The TCP hashes are looked at with higher frequency than the route cache yes?

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for 
little performance difference. (We hit a different cache line for the 
rwlock, but then the bucket cache line has a better sharing factor 
among cpus, since we dirty it less often.)


Using a 'small' table of hashed rwlocks should be more than enough to 
provide correct SMP concurrency between different buckets, without 
using too much memory. Sizing of this table depends on NR_CPUS and 
various CONFIG settings.


Something is telling me finding a 64 core system with a suitable workload to try 
this could be a good thing.  Wish I had one at my disposal.


rick jones


Re: [PATCH] e1000, e1000e valid-addr fixes

2007-11-01 Thread Jeff Garzik

Kok, Auke wrote:

David Miller wrote:

From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Tue, 23 Oct 2007 22:20:30 -0400


David Miller wrote:

From: Jeff Garzik <[EMAIL PROTECTED]>
Date: Tue, 23 Oct 2007 21:03:36 -0400


I'm wondering if there is a way to avoid adding

if (!is_valid_ether_addr(dev->dev_addr))
return -EINVAL;

to every ethernet driver's ->open() hook.

The first idea I get is:

1) Create netdev->validate_dev_addr().

2) If it exists, invoke it before ->open(), abort
   and return if any errors signaled.

etherdev init hooks up a function that does the above
check, which allows us to avoid editing every ethernet
driver

What do you think?

Seems sane to me.  Something like this (attached)?

Looks great:

Acked-by: David S. Miller <[EMAIL PROTECTED]>


I like it.

Should I start sending patches to remove the checks from e1000/e1000e/ixgb/ixgbe
already (to David, I assume?)?


Send the patches to me like normal...

Jeff





Re: [PATCH 2/2] [POWERPC] Fix region size check in mpc5200 FEC driver

2007-11-01 Thread Grant Likely
On 11/1/07, Ingo Oeser <[EMAIL PROTECTED]> wrote:
> Hi Grant,
>
> Grant Likely wrote:
> > From: Grant Likely <[EMAIL PROTECTED]>
> >
> > Driver shouldn't complain if the register range is larger than what
> > it expects.  This works around failures with some device trees.
> >
>
> But maybe the firmware guys like to know about it?
> May I suggest putting this in front of the other check?
>
> if ((mem.end - mem.start + 1) > sizeof(struct mpc52xx_fec)) {
>     printk(KERN_DEBUG DRIVER_NAME
>            " - gratuitous resource size (%lx > %x), check mpc52xx_devices.c\n",
>            (unsigned long)(mem.end - mem.start + 1), sizeof(struct mpc52xx_fec));

Personally, I'm not concerned about it.  Even if the device tree says
the range is larger than what the driver knows about, it is not
technically an error. If a new version of the chip appears that is
compatible, but defines a larger register range with extra feature
registers, then this message would be erroneously printed.  Finally,
depending on how you read the mpc5200 user guide, it can be 100% valid
to specify the reg size as 0x800 instead of 0x400.
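The relaxed check can be sketched in userspace as follows. `struct resource_sketch`, `check_region()` and the 0x400 expected size are illustrative assumptions, not the driver's code. Only a region smaller than the register block is rejected; a larger one (such as a 0x800 reg size) passes silently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct resource: bounds are inclusive. */
struct resource_sketch {
	unsigned long start;
	unsigned long end;
};

/* Illustrative expected size of the register block (not taken from the driver). */
#define EXPECTED_REGS_SIZE 0x400UL

/* Returns 0 if the region is usable, -1 if it is too small. */
static int check_region(const struct resource_sketch *mem, unsigned long expected)
{
	unsigned long len;

	assert(mem->end >= mem->start);
	len = mem->end - mem->start + 1;	/* +1 because end is inclusive */

	if (len < expected) {
		fprintf(stderr, "invalid resource size (%lx < %lx)\n", len, expected);
		return -1;
	}
	/* Larger is accepted: a compatible chip may expose extra registers. */
	return 0;
}
```

For example, a region of 0x1000..0x13ff (exactly 0x400 bytes) and one of 0x1000..0x17ff (0x800 bytes) are both accepted, while 0x1000..0x11ff (0x200 bytes) is rejected.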

Cheers,
g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.
[EMAIL PROTECTED]
(403) 399-0195


Re: [PATCH 2/2] [POWERPC] Fix region size check in mpc5200 FEC driver

2007-11-01 Thread Ingo Oeser
Hi Grant,

Grant Likely schrieb:
> From: Grant Likely <[EMAIL PROTECTED]>
> 
> Driver shouldn't complain if the register range is larger than what
> it expects.  This works around failures with some device trees.
> 

But maybe the firmware guys like to know about it?
May I suggest putting this in front of the other check?

if ((mem.end - mem.start + 1) > sizeof(struct mpc52xx_fec)) {
	printk(KERN_DEBUG DRIVER_NAME
	       " - gratuitous resource size (%lx > %x), check mpc52xx_devices.c\n",
	       (unsigned long)(mem.end - mem.start + 1), sizeof(struct mpc52xx_fec));
}

> - if ((mem.end - mem.start + 1) != sizeof(struct mpc52xx_fec)) {
> + if ((mem.end - mem.start + 1) < sizeof(struct mpc52xx_fec)) {
>   printk(KERN_ERR DRIVER_NAME
> - " - invalid resource size (%lx != %x), check mpc52xx_devices.c\n",
> + " - invalid resource size (%lx < %x), check mpc52xx_devices.c\n",
>   (unsigned long)(mem.end - mem.start + 1), sizeof(struct mpc52xx_fec));
>   return -EINVAL;
>   }


Best Regards

Ingo Oeser


Re: [PATCH] e1000, e1000e valid-addr fixes

2007-11-01 Thread Stephen Hemminger
How about:

static int eth_validate_addr(const struct net_device *dev)
{
	return is_valid_ether_addr(dev->dev_addr) ? 0 : -EINVAL;
}
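A userspace sketch of the hook Jeff and David describe, under stated assumptions: `struct net_device` here is a hypothetical three-field stand-in, `dev_open_sketch()` is a hypothetical version of the core open path, and `is_valid_ether_addr()` re-implements the kernel rule (not multicast, not all zeros) rather than calling the real helper. The point is that the check runs once, before `->open()`, so individual drivers no longer carry it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, trimmed-down stand-in for struct net_device. */
struct net_device {
	uint8_t dev_addr[ETH_ALEN];
	int (*validate_addr)(const struct net_device *dev);	/* optional */
	int (*open)(struct net_device *dev);
};

/* Same rule as the kernel helper: multicast bit clear and not all zeros. */
static bool is_valid_ether_addr(const uint8_t *addr)
{
	static const uint8_t zero[ETH_ALEN];

	return !(addr[0] & 0x01) && memcmp(addr, zero, ETH_ALEN) != 0;
}

/* Stephen's one-liner, which the ethernet setup code would install. */
static int eth_validate_addr(const struct net_device *dev)
{
	return is_valid_ether_addr(dev->dev_addr) ? 0 : -EINVAL;
}

/* Hypothetical core open path: run the optional check before ->open(). */
static int dev_open_sketch(struct net_device *dev)
{
	assert(dev != NULL);
	if (dev->validate_addr) {
		int err = dev->validate_addr(dev);

		if (err)
			return err;	/* abort: ->open() is never called */
	}
	return dev->open ? dev->open(dev) : 0;
}

/* A driver ->open() that no longer carries its own address check. */
static int dummy_open(struct net_device *dev)
{
	(void)dev;
	return 0;
}
```

A device that never sets `validate_addr` keeps its current behaviour, since the hook is optional.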

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Endianness problem with u32 classifier hash masks

2007-11-01 Thread Radu Rendec
Hi,

While trying to implement u32 hashes in my shaping machine I ran into a
possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).

The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).

I'm not 100% sure this is a problem with u32 itself, but at least I'm
sure u32 with the same configuration would behave differently on little
endian and big endian machines. Detailed description of the problem and
proposed patch follow.

Let's say that I would like to use 0x3fc0 as the hash mask. This means 8
contiguous "1" bits starting at b6. With such a mask, the expected (and
logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.

This is exactly what would happen on a big endian machine, but on little
endian machines, what would actually happen with the current implementation
is 0x3fc0 being reversed (into 0xc03f) by htonl() in the userspace
tool and then applied to 192.168.x.x in the u32 classifier. When
shifting right by 16 bits (rank of first "1" bit in the reversed mask)
and applying the divisor mask (0xff for divisor 256), what would
actually remain is 0x3f applied on the "168" octet of the address.

One could say this can easily be worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc). But the actual problem is that the network address (inside the
packet) is not converted to host order, but used as a host-order
value when computing the bucket.

Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes
n7 ... n0 n15 ... n8 etc. in the machine's registers. Thus bits n7 and n8
would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26
would no longer be consecutive.

My approach to this issue was keeping the hash mask in host order and
converting the octets in the packet to host order before applying the
mask. This proved to work just fine on my little endian machine, but I'm
interested in finding out (from you) if this really is an issue with u32
itself.
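To make the difference concrete, here is a userspace model of the bucket computation under both schemes. `bucket_old()` and `bucket_new()` are hypothetical names; they loosely follow `u32_hash_fold()` followed by the divisor mask, and assume the 32-bit address word has already been located in the packet. On a big-endian host both give the same answer; on a little-endian host only `bucket_new()` puts 192.168.0.64/26 in bucket 1 and 192.168.0.128/26 in bucket 2.

```c
#include <arpa/inet.h>	/* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Position of the lowest set bit: how far the masked key is shifted right. */
static unsigned int first_bit(uint32_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1u)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Current behaviour: tc sends the mask through htonl(), and the packet
 * word is used without conversion. */
static unsigned int bucket_old(const uint8_t pkt[4], uint32_t host_mask,
			       uint32_t divisor_mask)
{
	uint32_t key, mask = htonl(host_mask);

	memcpy(&key, pkt, sizeof(key));	/* raw network-order bytes */
	return ((key & mask) >> first_bit(mask)) & divisor_mask;
}

/* Proposed behaviour: mask kept in host order, key converted with ntohl(). */
static unsigned int bucket_new(const uint8_t pkt[4], uint32_t host_mask,
			       uint32_t divisor_mask)
{
	uint32_t key;

	memcpy(&key, pkt, sizeof(key));
	key = ntohl(key);
	return ((key & host_mask) >> first_bit(host_mask)) & divisor_mask;
}
```

With mask 0x3fc0 and divisor mask 0xff, `bucket_new()` maps 192.168.0.0, .0.64, .0.128 and .1.0 to buckets 0, 1, 2 and 4 on any host.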

My changes to the u32 classifier are attached below as a patch. It was
made against 2.6.22.9, but applies cleanly on Dave Miller's net-2.6
tree.

The idea behind my changes is to keep the user space tool intact and
work everything out in kernel space (because converting the packet
octets to host order must be done in kernel anyway).

Therefore, hash masks are converted back to host order when a selector
is configured - in u32_change() - and converted to network order
(because userspace tools expect to get them in network order from the
kernel) when a selector is dumped - in u32_dump().

I would like at least to know your opinion about this issue.

Thanks,

Radu Rendec

--- linux-2.6.22.9/net/sched/cls_u32.c.orig 2007-10-30 17:08:03.0 +0200
+++ linux-2.6.22.9/net/sched/cls_u32.c  2007-10-30 17:04:49.0 +0200
@@ -198,7 +198,7 @@
ht = n->ht_down;
sel = 0;
if (ht->divisor)
-   sel = ht->divisor&u32_hash_fold(*(u32*)(ptr+n->sel.hoff), &n->sel,n->fshift);
+   sel = ht->divisor&u32_hash_fold(ntohl(*(u32*)(ptr+n->sel.hoff)), &n->sel,n->fshift);
 
if (!(n->sel.flags&(TC_U32_VAROFFSET|TC_U32_OFFSET|TC_U32_EAT)))
goto next_ht;
@@ -626,6 +626,10 @@
}
 #endif
 
+   /* userspace tc tool sends us the hmask in network order, but we
+* need host order, so change it here */
+   s->hmask = ntohl(s->hmask);
+
memcpy(&n->sel, s, sizeof(*s) + s->nkeys*sizeof(struct tc_u32_key));
n->ht_up = ht;
n->handle = handle;
@@ -735,9 +739,14 @@
u32 divisor = ht->divisor+1;
RTA_PUT(skb, TCA_U32_DIVISOR, 4, &divisor);
} else {
+   /* get the address where the selector will be put, then
+* change the hmask after it is put there */
+   struct tc_u32_sel *s =
+   (struct tc_u32_sel *)RTA_DATA(skb_tail_pointer(skb));
RTA_PUT(skb, TCA_U32_SEL,
sizeof(n->sel) + n->sel.nkeys*sizeof(struct tc_u32_key),
&n->sel);
+   s->hmask = htonl(s->hmask);
if (n->ht_up) {
u32 htid = n->handle & 0xF000;
RTA_PUT(skb, TCA_U32_HASH, 4, &htid);




Re: [PATCH] e1000, e1000e valid-addr fixes

2007-11-01 Thread Kok, Auke
David Miller wrote:
> From: Jeff Garzik <[EMAIL PROTECTED]>
> Date: Tue, 23 Oct 2007 22:20:30 -0400
> 
>> David Miller wrote:
>>> From: Jeff Garzik <[EMAIL PROTECTED]>
>>> Date: Tue, 23 Oct 2007 21:03:36 -0400
>>>
 I'm wondering if there is a way to avoid adding

if (!is_valid_ether_addr(dev->dev_addr))
return -EINVAL;

 to every ethernet driver's ->open() hook.
>>> The first idea I get is:
>>>
>>> 1) Create netdev->validate_dev_addr().
>>>
>>> 2) If it exists, invoke it before ->open(), abort
>>>and return if any errors signaled.
>>>
>>> etherdev init hooks up a function that does the above
>>> check, which allows us to avoid editing every ethernet
>>> driver
>>>
>>> What do you think?
>> Seems sane to me.  Something like this (attached)?
> 
> Looks great:
> 
> Acked-by: David S. Miller <[EMAIL PROTECTED]>

I like it.

Should I start sending patches to remove the checks from e1000/e1000e/ixgb/ixgbe
already (to David, I assume?)?

Auke


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Eric Dumazet

Jarek Poplawski a écrit :

Hi,

A few doubts below: 

 
+#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)


Probably "|| defined(CONFIG_DEBUG_SPINLOCK)" is needed here.


Not sure, because DEBUG_SPINLOCK only applies to spinlocks.
Here we deal with rwlocks.




+/*
+ * Instead of using one rwlock for each inet_ehash_bucket, we use a table of locks
+ * The size of this table is a power of two and depends on the number of CPUS.
+ */
+# if defined(CONFIG_DEBUG_LOCK_ALLOC)
+#  define EHASH_LOCK_SZ 256
+# elif NR_CPUS >= 32
+#  define EHASH_LOCK_SZ 4096
+# elif NR_CPUS >= 16
+#  define EHASH_LOCK_SZ 2048
+# elif NR_CPUS >= 8
+#  define EHASH_LOCK_SZ 1024
+# elif NR_CPUS >= 4
+#  define EHASH_LOCK_SZ 512
+# else
+#  define EHASH_LOCK_SZ 256
+# endif
+#else
+# define EHASH_LOCK_SZ 0
+#endif
+


Looks hackish: usually DEBUG code checks "real" environment, and here it's
a special case. But omitting locks if no SMP or DEBUG is strange. IMHO,
there should be 1 instead of 0.


It is 0 so that no alloc is done. (see your next questions)





 struct inet_hashinfo {
/* This is for sockets with full identity only.  Sockets here will
 * always be without wildcards and will have the following invariant:
@@ -100,6 +121,7 @@ struct inet_hashinfo {
 * TIME_WAIT sockets use a separate chain (twchain).
 */
struct inet_ehash_bucket*ehash;
+   rwlock_t*ehash_locks;
 
 	/* Ok, let's try this, I give up, we do need a local binding
 * TCP hash as well as the others for fast bind/connect.
@@ -134,6 +156,13 @@ static inline struct inet_ehash_bucket *inet_ehash_bucket(
return &hashinfo->ehash[hash & (hashinfo->ehash_size - 1)];
 }
 
+static inline rwlock_t *inet_ehash_lockp(
+   struct inet_hashinfo *hashinfo,
+   unsigned int hash)
+{
+   return &hashinfo->ehash_locks[hash & (EHASH_LOCK_SZ - 1)];
+}
+


Is it OK for EHASH_LOCK_SZ == 0?


At least, it was compile-tested and booted on UP ;)



...

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index d849739..3b5f97a 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1072,11 +1072,18 @@ static int __init dccp_init(void)
}
 
 	for (i = 0; i < dccp_hashinfo.ehash_size; i++) {
-   rwlock_init(&dccp_hashinfo.ehash[i].lock);
INIT_HLIST_HEAD(&dccp_hashinfo.ehash[i].chain);
INIT_HLIST_HEAD(&dccp_hashinfo.ehash[i].twchain);
}
-
+   if (EHASH_LOCK_SZ) {


Why not #ifdef then? But, IMHO, rwlock_init() should be done at least
once here. (Similarly later for tcp.)


well, #ifdefs are not so nice :)




+   dccp_hashinfo.ehash_locks =
+   kmalloc(EHASH_LOCK_SZ * sizeof(rwlock_t),
+   GFP_KERNEL);
+   if (!dccp_hashinfo.ehash_locks)
+   goto out_free_dccp_ehash;
+   for (i = 0; i < EHASH_LOCK_SZ; i++)
+   rwlock_init(&dccp_hashinfo.ehash_locks[i]);
+   }
bhash_order = ehash_order;
 
 	do {
@@ -1091,7 +1098,7 @@ static int __init dccp_init(void)
 
 	if (!dccp_hashinfo.bhash) {
DCCP_CRIT("Failed to allocate DCCP bind hash table");
-   goto out_free_dccp_ehash;
+   goto out_free_dccp_locks;
}
 
 	for (i = 0; i < dccp_hashinfo.bhash_size; i++) {
@@ -1121,6 +1128,9 @@ out_free_dccp_mib:
 out_free_dccp_bhash:
free_pages((unsigned long)dccp_hashinfo.bhash, bhash_order);
dccp_hashinfo.bhash = NULL;
+out_free_dccp_locks:
+   kfree(dccp_hashinfo.ehash_locks);
+   dccp_hashinfo.ehash_locks = NULL;
 out_free_dccp_ehash:
free_pages((unsigned long)dccp_hashinfo.ehash, ehash_order);
dccp_hashinfo.ehash = NULL;


Isn't such kfree(dccp_hashinfo.ehash_locks) needed in dccp_fini()?



Probably! Thank you for reviewing!

Eric


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Eric Dumazet

Stephen Hemminger a écrit :

On Thu, 01 Nov 2007 11:16:20 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

As done two years ago on IP route cache table (commit 
22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
hash bucket for the huge TCP/DCCP hash tables.


On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for little 
performance difference. (We hit a different cache line for the rwlock, but 
then the bucket cache line has a better sharing factor among CPUs, since we 
dirty it less often.)


Using a 'small' table of hashed rwlocks should be more than enough to provide 
correct SMP concurrency between different buckets, without using too much 
memory. Sizing of this table depends on NR_CPUS and various CONFIG settings.


This patch provides some locking abstraction that may ease future work using
a different model for the TCP/DCCP table.


Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

 include/net/inet_hashtables.h |   40 
 net/dccp/proto.c              |   16 ++--
 net/ipv4/inet_diag.c          |    9 ---
 net/ipv4/inet_hashtables.c    |    7 +++--
 net/ipv4/inet_timewait_sock.c |   13 +-
 net/ipv4/tcp.c                |   11 +++-
 net/ipv4/tcp_ipv4.c           |   11 
 net/ipv6/inet6_hashtables.c   |   19 ---
 8 files changed, 89 insertions(+), 37 deletions(-)



Longterm is there any chance of using rcu for this? Seems like
it could be a big win.



This was discussed in the past, and I even believe some patch was proposed, 
but some guys (including David) pointed out that RCU is well suited for 'mostly 
read' structures.


On some web server workloads, the TCP hash table is constantly accessed in write 
mode (socket creation, socket move to timewait state, socket deletion...), and 
RCU added overhead and poor cache reuse (because sockets must be placed on an 
RCU queue before reuse).


On these typical workloads, a hash table without RCU is still the best.

Longterm changes would rather be based on Robert Olsson's suggestion last year 
(trie-based lookups and a unified IP/TCP cache).


A short-term change would be to make the TCP hash table resizable (small at 
boot, and able to grow if necessary). Its current size on modern machines is 
just insane.




Re: [PATCH] - e1000e/ethtool.c - convert macros to functions

2007-11-01 Thread Kok, Auke
Joe Perches wrote:
> Add functions for reg_pattern_test and reg_set_and_check
> Changed macros to use these functions
> 
> Compiled x86, untested
> 
> Size decreased ~2K
> 
> old:
> 
> $ size drivers/net/e1000e/ethtool.o
>    text    data     bss     dec     hex filename
>   14461       0       0   14461    387d drivers/net/e1000e/ethtool.o
> 
> new:
> 
> $ size drivers/net/e1000e/ethtool.o
>    text    data     bss     dec     hex filename
>   12498       0       0   12498    30d2 drivers/net/e1000e/ethtool.o
> 
> 
> Signed-off-by: Joe Perches <[EMAIL PROTECTED]>

ok, this looks great.

I'm going to have this tested a bit before I pass it on to Jeff Garzik.

Thanks!

Auke
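A sketch of the macro-to-function pattern in the patch, run against a hypothetical fake register file (`struct fake_hw`, `wr()`, `rd()`) instead of real MMIO. The test loop lives in one function, so it is presumably emitted once rather than at every call site, which would account for the size drop; only the early `return 1` stays in a macro. Wrapping that macro in `do { } while (0)` is a suggestion on top of the patch, which uses a bare `if`; it keeps the macro safe inside an unbraced `if`/`else`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical fake register file standing in for the adapter's MMIO space. */
struct fake_hw {
	uint32_t regs[16];
	uint32_t hw_mask[16];	/* bits the "hardware" actually latches */
};

static void wr(struct fake_hw *hw, int reg, uint32_t val)
{
	hw->regs[reg] = val & hw->hw_mask[reg];
}

static uint32_t rd(const struct fake_hw *hw, int reg)
{
	return hw->regs[reg];
}

/* One out-of-line copy of the loop; returns true on failure, as in the patch. */
static bool reg_pattern_test(struct fake_hw *hw, uint64_t *data,
			     int reg, uint32_t mask, uint32_t write)
{
	static const uint32_t test[] = {
		0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF
	};
	size_t i;

	assert(reg >= 0 && reg < 16);
	for (i = 0; i < ARRAY_SIZE(test); i++) {
		uint32_t read;

		wr(hw, reg, test[i] & write);
		read = rd(hw, reg);
		if (read != (test[i] & write & mask)) {
			fprintf(stderr, "pattern test reg %04X failed: got 0x%08X expected 0x%08X\n",
				(unsigned)reg, (unsigned)read,
				(unsigned)(test[i] & write & mask));
			*data = reg;
			return true;
		}
	}
	return false;
}

/* Only the early return stays in a macro; do/while(0) makes it one statement. */
#define REG_PATTERN_TEST(R, M, W)					\
	do {								\
		if (reg_pattern_test(hw, data, (R), (M), (W)))		\
			return 1;					\
	} while (0)

/* Hypothetical caller shaped like the driver's register test. */
static int reg_test(struct fake_hw *hw, uint64_t *data)
{
	REG_PATTERN_TEST(3, 0xFFFFFFFF, 0xFFFFFFFF);
	REG_PATTERN_TEST(4, 0x0000FFFF, 0xFFFFFFFF);
	return 0;
}
```

A register with a stuck bit (for example `hw_mask` of 0x7FFFFFFF on register 3) makes `reg_test()` return 1 and records 3 in `*data`.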


> 
> ---
> 
>  drivers/net/e1000e/ethtool.c |   78 +
>  1 files changed, 47 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/net/e1000e/ethtool.c b/drivers/net/e1000e/ethtool.c
> index 6a39784..225db17 100644
> --- a/drivers/net/e1000e/ethtool.c
> +++ b/drivers/net/e1000e/ethtool.c
> @@ -691,41 +691,57 @@ err_setup:
>   return err;
>  }
>  
> -#define REG_PATTERN_TEST(R, M, W) REG_PATTERN_TEST_ARRAY(R, 0, M, W)
> -#define REG_PATTERN_TEST_ARRAY(reg, offset, mask, writeable)   \
> -{  \
> - u32 _pat; \
> - u32 _value;   \
> - u32 _test[] = {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};   \
> - for (_pat = 0; _pat < ARRAY_SIZE(_test); _pat++) {\
> - E1000_WRITE_REG_ARRAY(hw, reg, offset,\
> -   (_test[_pat] & writeable)); \
> - _value = E1000_READ_REG_ARRAY(hw, reg, offset); \
> - if (_value != (_test[_pat] & writeable & mask)) { \
> - ndev_err(netdev, "pattern test reg %04X " \
> -  "failed: got 0x%08X expected 0x%08X\n",  \
> -  reg + offset,  \
> -  value, (_test[_pat] & writeable & mask));\
> - *data = reg;  \
> - return 1; \
> - } \
> - } \
> +bool reg_pattern_test_array(struct e1000_adapter *adapter, u64 *data, 
> + int reg, int offset, u32 mask, u32 write)
> +{
> + int i;
> + u32 read;
> + static const u32 test[] = 
> + {0x5A5A5A5A, 0xA5A5A5A5, 0x00000000, 0xFFFFFFFF};
> + for (i = 0; i < ARRAY_SIZE(test); i++) {
> + E1000_WRITE_REG_ARRAY(&adapter->hw, reg, offset,
> +   (test[i] & write));
> + read = E1000_READ_REG_ARRAY(&adapter->hw, reg, offset);
> + if (read != (test[i] & write & mask)) {
> + ndev_err(adapter->netdev, "pattern test reg %04X "
> +  "failed: got 0x%08X expected 0x%08X\n",
> +  reg + offset,
> +  read, (test[i] & write & mask));
> + *data = reg;
> + return true;
> + }
> + }
> + return false;
>  }
>  
> -#define REG_SET_AND_CHECK(R, M, W) \
> -{  \
> - u32 _value;   \
> - __ew32(hw, R, W & M);   \
> - _value = __er32(hw, R); \
> - if ((W & M) != (_value & M)) {\
> - ndev_err(netdev, "set/check reg %04X test failed: "   \
> -  "got 0x%08X expected 0x%08X\n", R, (_value & M), \
> -  (W & M));\
> - *data = R;\
> - return 1; \
> - } \
> +#define REG_PATTERN_TEST(R, M, W) \
> + if (reg_pattern_test_array(adapter, data, R, 0, M, W)) \
> + return 1;
> +
> +#define REG_PATTERN_TEST_ARRAY(R, offset, M, W) \
> + if (reg_pattern_test_array(adapter, data, R, offset, M, W)) \
> + return 1;
> +
> +static bool reg_set_and_check(struct e1000_adapter *adapter, u64 *data,
> +   int reg, u32 mask, u32 write)
> +{
> + u32 read;
> + __ew32(&adapter->hw, reg, write & mask);
> + read = __er32(&adapter->hw, reg);
> +

Re: [PATCH] decnet: "addr" module param can't be __initdata

2007-11-01 Thread Steven Whitehouse
Hi,

Looks good. Feel free to add:

Acked-by: Steven Whitehouse <[EMAIL PROTECTED]>

Steve.

On Thu, Nov 01, 2007 at 06:36:29PM +0300, Alexey Dobriyan wrote:
> sysfs keeps references to module parameters via /sys/module/*/parameters,
> so marking them as __initdata can't work.
> 
> Steps to reproduce:
> 
>   modprobe decnet
>   cat /sys/module/decnet/parameters/addr
> 
> BUG: unable to handle kernel paging request at virtual address f88cd410
> printing eip: c043dfd1 *pdpt = 4001 *pde = 04408067 *pte 
> =  
> Oops:  [#1] PREEMPT SMP 
> Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror 
> dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply 
> parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom 
> shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
> Pid: 2099, comm: cat Not tainted 
> (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6)
> EIP: 0060:[] EFLAGS: 00210286 CPU: 1
> EIP is at param_get_int+0x6/0x20
> EAX: c5c87000 EBX:  ECX: 80d0 EDX: f88cd410
> ESI: f8a108f8 EDI: c5c87000 EBP:  ESP: c5c97f00
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000)
> Stack:  f8a108f8 c5c87000 c043db6b f8a108f1 0124 c043de1a 
> c043db2f 
>f88cd410  c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 
> c5dd5078 
>c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 
> 1000 
> Call Trace:
>  [] param_array_get+0x3c/0x62
>  [] param_array_set+0x0/0xdf
>  [] param_array_get+0x0/0x62
>  [] param_attr_show+0x15/0x2d
>  [] param_attr_show+0x0/0x2d
>  [] module_attr_show+0x1a/0x1e
>  [] sysfs_read_file+0x7c/0xd9
>  [] sysfs_read_file+0x0/0xd9
>  [] vfs_read+0x88/0x134
>  [] do_page_fault+0x0/0x7d5
>  [] sys_read+0x41/0x67
>  [] sysenter_past_esp+0x6b/0xc1
>  ===
> Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 
> 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 <8b> 12 c7 44 24 04 
> 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c 
> EIP: [] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00
> 
> Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
> ---
> 
>  net/decnet/dn_dev.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/net/decnet/dn_dev.c
> +++ b/net/decnet/dn_dev.c
> @@ -1439,7 +1439,7 @@ static const struct file_operations dn_dev_seq_fops = {
>  
>  #endif /* CONFIG_PROC_FS */
>  
> -static int __initdata addr[2];
> +static int addr[2];
>  module_param_array(addr, int, NULL, 0444);
>  MODULE_PARM_DESC(addr, "The DECnet address of this machine: area,node");
>  
> 


Re: [PATCH] Clean the ip_sockglue.c from some ugly ifdefs

2007-11-01 Thread Pavel Emelyanov
Arnaldo Carvalho de Melo wrote:
> Em Thu, Nov 01, 2007 at 06:52:34PM +0300, Pavel Emelyanov escreveu:
>> The #ifdef CONFIG_IP_MROUTE is sometimes placed inside the if statements,
>> which looks completely bad. Similar ifdefs inside the functions
>> look a bit better, but they are also not recommended.
>>
>> Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.
>>
>> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>
> 
> Perhaps a better name would be ip_mroute_valid_opt()?

No :) The _valid_ mroute opts are from 0 to 8, according to the MRT_XXX
macros, not from 0 to 10 as checked. I suspect this was a kind of 
reserve for future use, so I did not change it. 

Correct me if I am wrong.
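A sketch of the helper pattern under discussion, outside the kernel. `MRT_BASE` mirrors the value in `<linux/mroute.h>`; `CONFIG_IP_MROUTE` is defined by hand here, and `dispatch()` is a hypothetical call site. The helper keeps the existing MRT_BASE..MRT_BASE+10 window rather than narrowing it to the options defined so far, which is why a name containing "valid" would be misleading.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors MRT_BASE from <linux/mroute.h>. */
#define MRT_BASE 200

/* Stand-in for the Kconfig symbol; drop this line to get the stub. */
#define CONFIG_IP_MROUTE 1

#ifdef CONFIG_IP_MROUTE
/* Same window as the old inline test: wider than the defined options. */
static inline bool ip_mroute_opt(int optname)
{
	return optname >= MRT_BASE && optname <= MRT_BASE + 10;
}
#else
static inline bool ip_mroute_opt(int optname)
{
	(void)optname;
	return false;
}
#endif

/* Hypothetical call site: no #ifdef inside the if() any more. */
static int dispatch(int optname)
{
	assert(optname >= 0);
	if (ip_mroute_opt(optname))
		return 1;	/* would hand off to ip_mroute_setsockopt() */
	return 0;		/* normal IP option handling */
}
```

Without `CONFIG_IP_MROUTE` the stub returns false and the compiler can drop the multicast-routing branch entirely.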

> - Arnaldo

Thanks,
Pavel


Re: [PATCH] Clean the ip_sockglue.c from some ugly ifdefs

2007-11-01 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 01, 2007 at 06:52:34PM +0300, Pavel Emelyanov escreveu:
> The #ifdef CONFIG_IP_MROUTE is sometimes placed inside the if statements,
> which looks completely bad. Similar ifdefs inside the functions
> look a bit better, but they are also not recommended.
> 
> Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.
> 
> Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

Perhaps a better name would be ip_mroute_valid_opt()?

- Arnaldo


Re: af_packet.c flush_dcache_page

2007-11-01 Thread Evgeniy Polyakov
On Thu, Nov 01, 2007 at 05:10:32PM +0100, Patrick McHardy ([EMAIL PROTECTED]) 
wrote:
> David Miller wrote:
> >Instead of answering your questions, I'm going to show you
> >how to avoid having to do any of this cache flushing crap :-)
> >
> >You can avoid having to flush anything as long as the virtual
> >addresses on the kernel side are modulo SHMLBA the virtual addresses
> >on the userland side.
> >
> >We have some (decidedly awkward) mechanisms to try and achieve
> >this in the kernel, but they are cumbersome and not air tight.
> >
> >Instead, I would recommend simply that you access the ring
> >buffer directly in userspace.  This avoids all of the cache
> >aliasing issues.
> >
> >Yes, this means you have to do the ring buffer accesses in
> >the context of the user, but it simplifies so much that I think
> >it'd be worth it.
> 
> 
> I'm probably misunderstanding your suggestion because of my
> limited mm knowledge; are you suggesting something like
> this:
> 
> setsockopt(RX_RING, ...):
> 
> Allocate ring using get_user_pages, return address to user
> 
> tpacket_rcv/netlink_unicast/netlink_broadcast:
> 
> for each receiver:
>   switch_mm(...)
>   copy data to ring
> 
> switch_mm(original mm)
> 
> Would this work in softirq context?

IIRC it requires disabled interrupts.

Probably David is suggesting that you provide a pointer to a buffer
allocated in userspace and use copy_to_user() and friends.

> >Another option is to use the "copy_to_user_page()" and
> >"copy_from_user_page()" interfaces which will do all of
> >the necessary cache flushing for you.
> >
> >Actually it might be nice to convert AF_PACKET's mmap() code
> >over to using those things.
> 
> 
> That would also require doing the copy in the context of
> the user, right?

Most of the time it is possible to call copy_to_user() in atomic
context, but it can fail, in which case some additional mechanism to
make a copy should be invented (workqueue, kthread, whatever you like :)

-- 
Evgeniy Polyakov


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Stephen Hemminger
On Thu, 01 Nov 2007 11:16:20 +0100
Eric Dumazet <[EMAIL PROTECTED]> wrote:

> As done two years ago on IP route cache table (commit 
> 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
> hash bucket for the huge TCP/DCCP hash tables.
> 
> On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for little 
> performance difference. (We hit a different cache line for the rwlock, but 
> then the bucket cache line has a better sharing factor among CPUs, since we 
> dirty it less often.)
> 
> Using a 'small' table of hashed rwlocks should be more than enough to provide 
> correct SMP concurrency between different buckets, without using too much 
> memory. Sizing of this table depends on NR_CPUS and various CONFIG settings.
> 
> This patch provides some locking abstraction that may ease future work using
> a different model for the TCP/DCCP table.
> 
> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
> 
>  include/net/inet_hashtables.h |   40 
>  net/dccp/proto.c              |   16 ++--
>  net/ipv4/inet_diag.c          |    9 ---
>  net/ipv4/inet_hashtables.c    |    7 +++--
>  net/ipv4/inet_timewait_sock.c |   13 +-
>  net/ipv4/tcp.c                |   11 +++-
>  net/ipv4/tcp_ipv4.c           |   11 
>  net/ipv6/inet6_hashtables.c   |   19 ---
>  8 files changed, 89 insertions(+), 37 deletions(-)
> 

Longterm is there any chance of using rcu for this? Seems like
it could be a big win.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: af_packet.c flush_dcache_page

2007-11-01 Thread Patrick McHardy

David Miller wrote:

Instead of answering your questions, I'm going to show you
how to avoid having to do any of this cache flushing crap :-)

You can avoid having to flush anything as long as the virtual
addresses on the kernel side are modulo SHMLBA the virtual addresses
on the userland side.
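The congruence condition above can be written as a one-line check. A minimal sketch: `same_cache_colour()` is a hypothetical helper, and the addresses and the 16 KiB SHMLBA in the usage below are illustrative only, since SHMLBA is architecture specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two virtual mappings of the same physical page do not alias in a
 * virtually indexed cache if they agree modulo SHMLBA. */
static bool same_cache_colour(uintptr_t kaddr, uintptr_t uaddr, uintptr_t shmlba)
{
	assert(shmlba && (shmlba & (shmlba - 1)) == 0);	/* power of two */
	return ((kaddr ^ uaddr) & (shmlba - 1)) == 0;
}
```

For example, with `shmlba = 0x4000`, kernel address 0x10004000 and user address 0x20000000 share a colour, while 0x10002000 and 0x20000000 do not and would need flushing.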

We have some (decidedly awkward) mechanisms to try and achieve
this in the kernel, but they are cumbersome and not air tight.

Instead, I would recommend simply that you access the ring
buffer directly in userspace.  This avoids all of the cache
aliasing issues.

Yes, this means you have to do the ring buffer accesses in
the context of the user, but it simplifies so much that I think
it'd be worth it.



I'm probably misunderstanding your suggestion because of my
limited mm knowledge; are you suggesting something like
this:

setsockopt(RX_RING, ...):

Allocate ring using get_user_pages, return address to user

tpacket_rcv/netlink_unicast/netlink_broadcast:

for each receiver:
switch_mm(...)
copy data to ring

switch_mm(original mm)

Would this work in softirq context?


Another option is to use the "copy_to_user_page()" and
"copy_from_user_page()" interfaces which will do all of
the necessary cache flushing for you.

Actually it might be nice to convert AF_PACKET's mmap() code
over to using those things.



That would also require doing the copy in the context of
the user, right?


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Jarek Poplawski
Hi,

A few doubts below: 

Eric Dumazet wrote:
> As done two years ago on IP route cache table (commit 
> 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
> hash bucket for the huge TCP/DCCP hash tables.
...
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 4427dcd..5cbfbac 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -37,7 +37,6 @@
>   * I'll experiment with dynamic table growth later.
>   */
>  struct inet_ehash_bucket {
> - rwlock_t  lock;
>   struct hlist_head chain;
>   struct hlist_head twchain;
>  };
> @@ -91,6 +90,28 @@ struct inet_bind_hashbucket {
>  /* This is for listening sockets, thus all sockets which possess wildcards. */
>  #define INET_LHTABLE_SIZE32  /* Yes, really, this is all you need. */
>  
> +#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)

Probably "|| defined(CONFIG_DEBUG_SPINLOCK)" is needed here.

> +/*
> + * Instead of using one rwlock for each inet_ehash_bucket, we use a table of locks
> + * The size of this table is a power of two and depends on the number of CPUS.
> + */
> +# if defined(CONFIG_DEBUG_LOCK_ALLOC)
> +#  define EHASH_LOCK_SZ 256
> +# elif NR_CPUS >= 32
> +#  define EHASH_LOCK_SZ  4096
> +# elif NR_CPUS >= 16
> +#  define EHASH_LOCK_SZ  2048
> +# elif NR_CPUS >= 8
> +#  define EHASH_LOCK_SZ  1024
> +# elif NR_CPUS >= 4
> +#  define EHASH_LOCK_SZ  512
> +# else
> +#  define EHASH_LOCK_SZ  256
> +# endif
> +#else
> +# define EHASH_LOCK_SZ 0
> +#endif
> +

Looks hackish: usually DEBUG code checks "real" environment, and here it's
a special case. But omitting locks if no SMP or DEBUG is strange. IMHO,
there should be 1 instead of 0.

>  struct inet_hashinfo {
>   /* This is for sockets with full identity only.  Sockets here will
>* always be without wildcards and will have the following invariant:
> @@ -100,6 +121,7 @@ struct inet_hashinfo {
>* TIME_WAIT sockets use a separate chain (twchain).
>*/
>   struct inet_ehash_bucket*ehash;
> + rwlock_t*ehash_locks;
>  
>   /* Ok, let's try this, I give up, we do need a local binding
>* TCP hash as well as the others for fast bind/connect.
> @@ -134,6 +156,13 @@ static inline struct inet_ehash_bucket *inet_ehash_bucket(
>   return &hashinfo->ehash[hash & (hashinfo->ehash_size - 1)];
>  }
>  
> +static inline rwlock_t *inet_ehash_lockp(
> + struct inet_hashinfo *hashinfo,
> + unsigned int hash)
> +{
> + return &hashinfo->ehash_locks[hash & (EHASH_LOCK_SZ - 1)];
> +}
> +

Is it OK for EHASH_LOCK_SZ == 0?

...
> diff --git a/net/dccp/proto.c b/net/dccp/proto.c
> index d849739..3b5f97a 100644
> --- a/net/dccp/proto.c
> +++ b/net/dccp/proto.c
> @@ -1072,11 +1072,18 @@ static int __init dccp_init(void)
>   }
>  
>   for (i = 0; i < dccp_hashinfo.ehash_size; i++) {
> - rwlock_init(&dccp_hashinfo.ehash[i].lock);
>   INIT_HLIST_HEAD(&dccp_hashinfo.ehash[i].chain);
>   INIT_HLIST_HEAD(&dccp_hashinfo.ehash[i].twchain);
>   }
> -
> + if (EHASH_LOCK_SZ) {

Why not #ifdef then? But, IMHO, rwlock_init() should be done at least
once here. (Similarly later for tcp.)

> + dccp_hashinfo.ehash_locks =
> + kmalloc(EHASH_LOCK_SZ * sizeof(rwlock_t),
> + GFP_KERNEL);
> + if (!dccp_hashinfo.ehash_locks)
> + goto out_free_dccp_ehash;
> + for (i = 0; i < EHASH_LOCK_SZ; i++)
> + rwlock_init(&dccp_hashinfo.ehash_locks[i]);
> + }
>   bhash_order = ehash_order;
>  
>   do {
> @@ -1091,7 +1098,7 @@ static int __init dccp_init(void)
>  
>   if (!dccp_hashinfo.bhash) {
>   DCCP_CRIT("Failed to allocate DCCP bind hash table");
> - goto out_free_dccp_ehash;
> + goto out_free_dccp_locks;
>   }
>  
>   for (i = 0; i < dccp_hashinfo.bhash_size; i++) {
> @@ -1121,6 +1128,9 @@ out_free_dccp_mib:
>  out_free_dccp_bhash:
>   free_pages((unsigned long)dccp_hashinfo.bhash, bhash_order);
>   dccp_hashinfo.bhash = NULL;
> +out_free_dccp_locks:
> + kfree(dccp_hashinfo.ehash_locks);
> + dccp_hashinfo.ehash_locks = NULL;
>  out_free_dccp_ehash:
>   free_pages((unsigned long)dccp_hashinfo.ehash, ehash_order);
>   dccp_hashinfo.ehash = NULL;

Isn't a similar kfree(dccp_hashinfo.ehash_locks) needed in dccp_fini()?

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2][NETFILTER] Use the list_for_each_entry in nf_sockopt.c

2007-11-01 Thread Pavel Emelyanov
The list_head pointer used to iterate over the list is not used for
anything except obtaining the struct nf_sockopt_ops pointer (and even
that is done in a not entirely clean way).

So use list_for_each_entry() instead, removing one unneeded variable
from each place of use.
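To illustrate why the cast is fragile, here is a minimal userspace sketch. It uses simplified stand-ins for the <linux/list.h> helpers (an assumption for illustration; the real macros differ in detail), and demo_ops and sum_pf() are hypothetical names. A plain cast only works while the list_head is the first member of the containing struct; list_for_each_entry() recovers the container through offsetof(), so it stays correct wherever the member sits. Here 'list' is deliberately not the first member.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Walk the list, handing back the containing struct on each step. */
#define list_for_each_entry(pos, head, member)                          \
	for (pos = list_entry((head)->next, __typeof__(*pos), member);  \
	     &pos->member != (head);                                    \
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

/* Hypothetical ops struct: 'list' is NOT first, so casting a
 * struct list_head * to struct demo_ops * would be wrong here. */
struct demo_ops {
	int pf;
	struct list_head list;
};

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* No separate list_head iterator variable is needed. */
static int sum_pf(struct list_head *head)
{
	struct demo_ops *ops;
	int sum = 0;

	list_for_each_entry(ops, head, list)
		sum += ops->pf;
	return sum;
}
```

Adding two entries with pf values 2 and 10 and calling sum_pf() on the list head yields 12.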

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/netfilter/nf_sockopt.c b/net/netfilter/nf_sockopt.c
index a5e5e30..87bc144 100644
--- a/net/netfilter/nf_sockopt.c
+++ b/net/netfilter/nf_sockopt.c
@@ -23,14 +23,13 @@ static inline int overlap(int min1, int max1, int min2, int max2)
 /* Functions to register sockopt ranges (exclusive). */
 int nf_register_sockopt(struct nf_sockopt_ops *reg)
 {
-   struct list_head *i;
+   struct nf_sockopt_ops *ops;
int ret = 0;
 
if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0)
return -EINTR;
 
-   list_for_each(i, &nf_sockopts) {
-   struct nf_sockopt_ops *ops = (struct nf_sockopt_ops *)i;
+   list_for_each_entry(ops, &nf_sockopts, list) {
if (ops->pf == reg->pf
&& (overlap(ops->set_optmin, ops->set_optmax,
reg->set_optmin, reg->set_optmax)
@@ -64,7 +63,6 @@ EXPORT_SYMBOL(nf_unregister_sockopt);
 static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, int pf,
int val, int get)
 {
-   struct list_head *i;
struct nf_sockopt_ops *ops;
 
if (sk->sk_net != &init_net)
@@ -73,8 +71,7 @@ static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, int pf,
if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0)
return ERR_PTR(-EINTR);
 
-   list_for_each(i, &nf_sockopts) {
-   ops = (struct nf_sockopt_ops *)i;
+   list_for_each_entry(ops, &nf_sockopts, list) {
if (ops->pf == pf) {
if (!try_module_get(ops->owner))
goto out_nosup;
-- 
1.5.3.4



[PATCH 1/2][NETFILTER] Consolidate nf_sockopt and compat_nf_sockopt

2007-11-01 Thread Pavel Emelyanov
Both look up the nf_sockopt_ops object whose get/set callbacks they
call, and they do so in exactly the same way.

Introduce a helper for finding the ops.
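The new helper relies on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention from <linux/err.h>: a negative errno is encoded in the top MAX_ERRNO (4095) values of the pointer range, which never hold valid objects. A minimal userspace sketch of that convention, with simplified re-implementations of the helpers and a hypothetical demo_find() shaped like nf_sockopt_find():

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-ins for the <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True if the pointer lies in the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical ops table and lookup. */
struct demo_ops {
	int pf;
};

static struct demo_ops demo_table[] = { { 2 }, { 10 } };

/* Returns a valid ops pointer or an encoded -ENOPROTOOPT. */
static struct demo_ops *demo_find(int pf)
{
	for (unsigned i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
		if (demo_table[i].pf == pf)
			return &demo_table[i];
	return ERR_PTR(-ENOPROTOOPT);
}

/* Caller shaped like the new nf_sockopt(): decode or use. */
static int demo_call(int pf)
{
	struct demo_ops *ops = demo_find(pf);

	if (IS_ERR(ops))
		return (int)PTR_ERR(ops);
	return ops->pf;
}
```

This lets one return value carry either the object or the reason for failure, which is what allows get and set, native and compat, to share the lookup.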

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/net/netfilter/nf_sockopt.c b/net/netfilter/nf_sockopt.c
index aa28315..a5e5e30 100644
--- a/net/netfilter/nf_sockopt.c
+++ b/net/netfilter/nf_sockopt.c
@@ -61,48 +61,59 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg)
 }
 EXPORT_SYMBOL(nf_unregister_sockopt);
 
-/* Call get/setsockopt() */
-static int nf_sockopt(struct sock *sk, int pf, int val,
- char __user *opt, int *len, int get)
+static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, int pf,
+   int val, int get)
 {
struct list_head *i;
struct nf_sockopt_ops *ops;
-   int ret;
 
if (sk->sk_net != &init_net)
-   return -ENOPROTOOPT;
+   return ERR_PTR(-ENOPROTOOPT);
 
if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0)
-   return -EINTR;
+   return ERR_PTR(-EINTR);
 
list_for_each(i, &nf_sockopts) {
ops = (struct nf_sockopt_ops *)i;
if (ops->pf == pf) {
if (!try_module_get(ops->owner))
goto out_nosup;
+
if (get) {
-   if (val >= ops->get_optmin
-   && val < ops->get_optmax) {
-   mutex_unlock(&nf_sockopt_mutex);
-   ret = ops->get(sk, val, opt, len);
+   if (val >= ops->get_optmin &&
+   val < ops->get_optmax)
goto out;
-   }
} else {
-   if (val >= ops->set_optmin
-   && val < ops->set_optmax) {
-   mutex_unlock(&nf_sockopt_mutex);
-   ret = ops->set(sk, val, opt, *len);
+   if (val >= ops->set_optmin &&
+   val < ops->set_optmax)
goto out;
-   }
}
module_put(ops->owner);
}
}
- out_nosup:
+out_nosup:
+   ops = ERR_PTR(-ENOPROTOOPT);
+out:
mutex_unlock(&nf_sockopt_mutex);
-   return -ENOPROTOOPT;
+   return ops;
+}
+
+/* Call get/setsockopt() */
+static int nf_sockopt(struct sock *sk, int pf, int val,
+ char __user *opt, int *len, int get)
+{
+   struct nf_sockopt_ops *ops;
+   int ret;
+
+   ops = nf_sockopt_find(sk, pf, val, get);
+   if (IS_ERR(ops))
+   return PTR_ERR(ops);
+
+   if (get)
+   ret = ops->get(sk, val, opt, len);
+   else
+   ret = ops->set(sk, val, opt, *len);
 
- out:
module_put(ops->owner);
return ret;
 }
@@ -124,56 +135,25 @@ EXPORT_SYMBOL(nf_getsockopt);
 static int compat_nf_sockopt(struct sock *sk, int pf, int val,
 char __user *opt, int *len, int get)
 {
-   struct list_head *i;
struct nf_sockopt_ops *ops;
int ret;
 
-   if (sk->sk_net != &init_net)
-   return -ENOPROTOOPT;
-
-
-   if (mutex_lock_interruptible(&nf_sockopt_mutex) != 0)
-   return -EINTR;
-
-   list_for_each(i, &nf_sockopts) {
-   ops = (struct nf_sockopt_ops *)i;
-   if (ops->pf == pf) {
-   if (!try_module_get(ops->owner))
-   goto out_nosup;
-
-   if (get) {
-   if (val >= ops->get_optmin
-   && val < ops->get_optmax) {
-   mutex_unlock(&nf_sockopt_mutex);
-   if (ops->compat_get)
-   ret = ops->compat_get(sk,
-   val, opt, len);
-   else
-   ret = ops->get(sk,
-   val, opt, len);
-   goto out;
-   }
-   } else {
-   if (val >= ops->set_optmin
-   && val < ops->set_optmax) {
-   mutex_unlock(&nf_sockopt_mutex);
-   if (ops->compat_set)
-   ret = ops->compat_set(sk,
-   val, opt, *len);
- 

[PATCH] Clean the ip_sockglue.c from some ugly ifdefs

2007-11-01 Thread Pavel Emelyanov
The #ifdef CONFIG_IP_MROUTE is sometimes placed inside if conditions,
which looks really bad. Similar ifdefs inside function bodies look
a bit better, but their use is also discouraged.

Provide an ifdef-ed ip_mroute_opt() helper to clean up the code.
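Below is a minimal sketch of the pattern outside the kernel. It assumes MRT_BASE is 200, as in <linux/mroute.h>, and uses a hypothetical route_sockopt() caller. Defining or removing CONFIG_IP_MROUTE switches between the real range check and a stub that always returns 0. The caller needs no #ifdef in either case, and with the stub the compiler can drop the dead branch.

```c
/* Assumed value, as in <linux/mroute.h>. */
#define MRT_BASE 200

/* Remove this line to mimic a kernel built without IP_MROUTE. */
#define CONFIG_IP_MROUTE 1

#ifdef CONFIG_IP_MROUTE
static inline int ip_mroute_opt(int opt)
{
	return (opt >= MRT_BASE) && (opt <= MRT_BASE + 10);
}
#else
static inline int ip_mroute_opt(int opt)
{
	(void)opt;
	return 0;	/* constant: the calling branch becomes dead code */
}
#endif

/* Hypothetical caller: no #ifdef inside the condition. */
static int route_sockopt(int optname)
{
	if (ip_mroute_opt(optname))
		return 1;	/* would call ip_mroute_setsockopt() */
	return 0;		/* would continue with generic handling */
}
```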

Signed-off-by: Pavel Emelyanov <[EMAIL PROTECTED]>

---

diff --git a/include/linux/mroute.h b/include/linux/mroute.h
index 7da2cee..200fbb2 100644
--- a/include/linux/mroute.h
+++ b/include/linux/mroute.h
@@ -128,6 +128,18 @@ struct igmpmsg
 #ifdef __KERNEL__
 #include 
 
+#ifdef CONFIG_IP_MROUTE
+static inline int ip_mroute_opt(int opt)
+{
+   return (opt >= MRT_BASE) && (opt <= MRT_BASE + 10);
+}
+#else
+static inline int ip_mroute_opt(int opt)
+{
+   return 0;
+}
+#endif
+
 extern int ip_mroute_setsockopt(struct sock *, int, char __user *, int);
 extern int ip_mroute_getsockopt(struct sock *, int, char __user *, int __user *);
 extern int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index f51f20e..82817e5 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -437,10 +437,8 @@ static int do_ip_setsockopt(struct sock *sk, int level,
 
/* If optlen==0, it is equivalent to val == 0 */
 
-#ifdef CONFIG_IP_MROUTE
-   if (optname >= MRT_BASE && optname <= (MRT_BASE + 10))
+   if (ip_mroute_opt(optname))
return ip_mroute_setsockopt(sk,optname,optval,optlen);
-#endif
 
err = 0;
lock_sock(sk);
@@ -909,11 +907,9 @@ int ip_setsockopt(struct sock *sk, int level,
 #ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
-   optname != IP_IPSEC_POLICY && optname != IP_XFRM_POLICY
-#ifdef CONFIG_IP_MROUTE
-   && (optname < MRT_BASE || optname > (MRT_BASE + 10))
-#endif
-  ) {
+   optname != IP_IPSEC_POLICY &&
+   optname != IP_XFRM_POLICY &&
+   !ip_mroute_opt(optname)) {
lock_sock(sk);
err = nf_setsockopt(sk, PF_INET, optname, optval, optlen);
release_sock(sk);
@@ -935,11 +931,9 @@ int compat_ip_setsockopt(struct sock *sk, int level, int optname,
 #ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
if (err == -ENOPROTOOPT && optname != IP_HDRINCL &&
-   optname != IP_IPSEC_POLICY && optname != IP_XFRM_POLICY
-#ifdef CONFIG_IP_MROUTE
-   && (optname < MRT_BASE || optname > (MRT_BASE + 10))
-#endif
-  ) {
+   optname != IP_IPSEC_POLICY &&
+   optname != IP_XFRM_POLICY &&
+   !ip_mroute_opt(optname)) {
lock_sock(sk);
err = compat_nf_setsockopt(sk, PF_INET, optname,
   optval, optlen);
@@ -967,11 +961,8 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
if (level != SOL_IP)
return -EOPNOTSUPP;
 
-#ifdef CONFIG_IP_MROUTE
-   if (optname >= MRT_BASE && optname <= MRT_BASE+10) {
+   if (ip_mroute_opt(optname))
return ip_mroute_getsockopt(sk,optname,optval,optlen);
-   }
-#endif
 
if (get_user(len,optlen))
return -EFAULT;
@@ -1171,11 +1162,8 @@ int ip_getsockopt(struct sock *sk, int level,
err = do_ip_getsockopt(sk, level, optname, optval, optlen);
 #ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
-   if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS
-#ifdef CONFIG_IP_MROUTE
-   && (optname < MRT_BASE || optname > MRT_BASE+10)
-#endif
-  ) {
+   if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
+   !ip_mroute_opt(optname)) {
int len;
 
if (get_user(len,optlen))
@@ -1200,11 +1188,8 @@ int compat_ip_getsockopt(struct sock *sk, int level, int optname,
int err = do_ip_getsockopt(sk, level, optname, optval, optlen);
 #ifdef CONFIG_NETFILTER
/* we need to exclude all possible ENOPROTOOPTs except default case */
-   if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS
-#ifdef CONFIG_IP_MROUTE
-   && (optname < MRT_BASE || optname > MRT_BASE+10)
-#endif
-  ) {
+   if (err == -ENOPROTOOPT && optname != IP_PKTOPTIONS &&
+   !ip_mroute_opt(optname)) {
int len;
 
if (get_user(len, optlen))
-- 
1.5.3.4



[PATCH] decnet: "addr" module param can't be __initdata

2007-11-01 Thread Alexey Dobriyan
sysfs keeps references to module parameters via /sys/module/*/parameters,
so marking them as __initdata can't work.

Steps to reproduce:

modprobe decnet
cat /sys/module/decnet/parameters/addr

BUG: unable to handle kernel paging request at virtual address f88cd410
printing eip: c043dfd1 *pdpt = 4001 *pde = 04408067 *pte = 
 
Oops:  [#1] PREEMPT SMP 
Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror 
dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply 
parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom 
shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
Pid: 2099, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #6)
EIP: 0060:[] EFLAGS: 00210286 CPU: 1
EIP is at param_get_int+0x6/0x20
EAX: c5c87000 EBX:  ECX: 80d0 EDX: f88cd410
ESI: f8a108f8 EDI: c5c87000 EBP:  ESP: c5c97f00
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000)
Stack:  f8a108f8 c5c87000 c043db6b f8a108f1 0124 c043de1a c043db2f 
   f88cd410  c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 c5dd5078 
   c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 1000 
Call Trace:
 [] param_array_get+0x3c/0x62
 [] param_array_set+0x0/0xdf
 [] param_array_get+0x0/0x62
 [] param_attr_show+0x15/0x2d
 [] param_attr_show+0x0/0x2d
 [] module_attr_show+0x1a/0x1e
 [] sysfs_read_file+0x7c/0xd9
 [] sysfs_read_file+0x0/0xd9
 [] vfs_read+0x88/0x134
 [] do_page_fault+0x0/0x7d5
 [] sys_read+0x41/0x67
 [] sysenter_past_esp+0x6b/0xc1
 ===
Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 
89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 <8b> 12 c7 44 24 04 58 
8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c 
EIP: [] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 net/decnet/dn_dev.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -1439,7 +1439,7 @@ static const struct file_operations dn_dev_seq_fops = {
 
 #endif /* CONFIG_PROC_FS */
 
-static int __initdata addr[2];
+static int addr[2];
 module_param_array(addr, int, NULL, 0444);
 MODULE_PARM_DESC(addr, "The DECnet address of this machine: area,node");
 



Re: [PATCH] [IPv6] SNMP: Restore Udp6InErrors incrementation

2007-11-01 Thread Herbert Xu
On Thu, Nov 01, 2007 at 10:46:38PM +0900, Mitsuru Chinen wrote:
> As the checksum verification is postponed until the user calls recv or poll,
> the incrementation of the Udp6InErrors counter should also be postponed.
> Currently, it is postponed only in the non-blocking case. However, it
> should be postponed in all cases, like the IPv4 code.
> 
> Signed-off-by: Mitsuru Chinen <[EMAIL PROTECTED]>

Looks good to me.  Thanks for catching this!

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: expected behavior of PF_PACKET on NETIF_F_HW_VLAN_RX device?

2007-11-01 Thread Ben Greear

David Miller wrote:

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 31 Oct 2007 18:23:37 -0700

  

The code in AF_PACKET should fix the skb before passing to user
space so that there is no difference between accel and non-accel
hardware.  Internal choices shouldn't leak to user space.  Ditto,
the receive checksum offload should be fixed up as well.



The hardware has stripped the VLAN header completely and has not
provided it to us at all.
  

Do the NICs not save the QoS bits in the VLAN header anywhere that we could
use to reconstitute the header?

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com
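If a NIC does report the stripped tag, which is what the question above asks, then reconstituting the header for AF_PACKET means rebuilding the 802.1Q TCI and inserting TPID 0x8100 plus the TCI right after the two MAC addresses. The TCI holds a 3-bit PCP (the QoS bits), a 1-bit DEI/CFI and a 12-bit VID. A minimal buffer-level sketch under that assumption; vlan_tci() and vlan_reinsert() are hypothetical helpers, not kernel functions:

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define VLAN_HLEN   4
#define ETH_P_8021Q 0x8100

/* 802.1Q TCI: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
static uint16_t vlan_tci(unsigned pcp, unsigned dei, unsigned vid)
{
	return (uint16_t)(((pcp & 0x7) << 13) | ((dei & 0x1) << 12) |
			  (vid & 0xfff));
}

/*
 * Re-insert a tag into an untagged frame of 'len' bytes (len >= 12).
 * 'buf' must have room for len + VLAN_HLEN bytes.
 * Returns the new frame length.
 */
static size_t vlan_reinsert(uint8_t *buf, size_t len, uint16_t tci)
{
	/* Shift everything after dst+src MAC right by 4 bytes. */
	memmove(buf + 2 * ETH_ALEN + VLAN_HLEN, buf + 2 * ETH_ALEN,
		len - 2 * ETH_ALEN);
	buf[12] = ETH_P_8021Q >> 8;	/* TPID, big-endian */
	buf[13] = ETH_P_8021Q & 0xff;
	buf[14] = tci >> 8;		/* TCI, big-endian */
	buf[15] = tci & 0xff;
	return len + VLAN_HLEN;
}
```

Without the PCP bits from the hardware, only the VID could be restored this way.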





[PATCH] net: Add 405EX support to new EMAC driver

2007-11-01 Thread Stefan Roese
This patch adds support for the 405EX to the new EMAC driver.

Tested on AMCC Kilauea.

Signed-off-by: Stefan Roese <[EMAIL PROTECTED]>
---
 drivers/net/ibm_newemac/core.c  |3 ++-
 drivers/net/ibm_newemac/rgmii.c |6 --
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 0de3aa2..fd0a585 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -2466,7 +2466,8 @@ static int __devinit emac_init_config(struct emac_instance *dev)
if (of_device_is_compatible(np, "ibm,emac4"))
dev->features |= EMAC_FTR_EMAC4;
if (of_device_is_compatible(np, "ibm,emac-axon")
-   || of_device_is_compatible(np, "ibm,emac-440epx"))
+   || of_device_is_compatible(np, "ibm,emac-440epx")
+   || of_device_is_compatible(np, "ibm,emac-405ex"))
dev->features |= EMAC_FTR_HAS_AXON_STACR
| EMAC_FTR_STACR_OC_INVERT;
if (of_device_is_compatible(np, "ibm,emac-440spe"))
diff --git a/drivers/net/ibm_newemac/rgmii.c b/drivers/net/ibm_newemac/rgmii.c
index de41695..e393f68 100644
--- a/drivers/net/ibm_newemac/rgmii.c
+++ b/drivers/net/ibm_newemac/rgmii.c
@@ -140,9 +140,6 @@ void rgmii_get_mdio(struct of_device *ofdev, int input)
 
RGMII_DBG2(dev, "get_mdio(%d)" NL, input);
 
-   if (dev->type != RGMII_AXON)
-   return;
-
mutex_lock(&dev->lock);
 
fer = in_be32(&p->fer);
@@ -161,9 +158,6 @@ void rgmii_put_mdio(struct of_device *ofdev, int input)
 
RGMII_DBG2(dev, "put_mdio(%d)" NL, input);
 
-   if (dev->type != RGMII_AXON)
-   return;
-
fer = in_be32(&p->fer);
fer &= ~(0x0008u >> input);
out_be32(&p->fer, fer);
-- 
1.5.3.4.498.g9c514



[PATCH 2/2] [POWERPC] Fix region size check in mpc5200 FEC driver

2007-11-01 Thread Grant Likely
From: Grant Likely <[EMAIL PROTECTED]>

Driver shouldn't complain if the register range is larger than what
it expects.  This works around failures with some device trees.
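A minimal sketch of the relaxed check follows. It assumes a struct resource-like pair of inclusive start/end addresses; demo_resource and check_region() are hypothetical names. Because the range is inclusive, the size is end - start + 1. Only a region smaller than the register block is rejected.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct resource: inclusive [start, end]. */
struct demo_resource {
	unsigned long start, end;
};

/*
 * Returns 0 if the region can hold a register block of 'regs_size'
 * bytes, -EINVAL otherwise. A larger region is accepted, matching
 * the "<" comparison in the patch (the old code used "!=").
 */
static int check_region(const struct demo_resource *mem, size_t regs_size)
{
	unsigned long size = mem->end - mem->start + 1;

	return size < regs_size ? -EINVAL : 0;
}
```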

Signed-off-by: Grant Likely <[EMAIL PROTECTED]>
---

 drivers/net/fec_mpc52xx.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/fec_mpc52xx.c b/drivers/net/fec_mpc52xx.c
index fc1cf0b..a8a0ee2 100644
--- a/drivers/net/fec_mpc52xx.c
+++ b/drivers/net/fec_mpc52xx.c
@@ -879,9 +879,9 @@ mpc52xx_fec_probe(struct of_device *op, const struct of_device_id *match)
"Error while parsing device node resource\n" );
return rv;
}
-   if ((mem.end - mem.start + 1) != sizeof(struct mpc52xx_fec)) {
+   if ((mem.end - mem.start + 1) < sizeof(struct mpc52xx_fec)) {
printk(KERN_ERR DRIVER_NAME
-   " - invalid resource size (%lx != %x), check mpc52xx_devices.c\n",
+   " - invalid resource size (%lx < %x), check mpc52xx_devices.c\n",
(unsigned long)(mem.end - mem.start + 1), sizeof(struct mpc52xx_fec));
return -EINVAL;
}



[PATCH 1/2] [POWERPC] mpc5200: Fix Kconfig dependencies on MPC5200 FEC device driver

2007-11-01 Thread Grant Likely
From: Grant Likely <[EMAIL PROTECTED]>

When not building an arch/powerpc kernel, the mpc5200 FEC driver depends
on some symbols which are not defined (BESTCOMM & BESTCOMM_FEC).

This patch flips around the dependency logic so that the driver cannot
be selected unless BESTCOMM_FEC is selected first. This way, Kconfig
stops complaining.

Also, the driver only works for arch/powerpc (not arch/ppc) anyway so
it should depend on PPC_MERGE also.

Signed-off-by: Grant Likely <[EMAIL PROTECTED]>
---

 drivers/net/Kconfig |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 867cb73..5f800a6 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1883,9 +1883,7 @@ config FEC2
 
 config FEC_MPC52xx
tristate "MPC52xx FEC driver"
-   depends on PPC_MPC52xx
-   select PPC_BESTCOMM
-   select PPC_BESTCOMM_FEC
+   depends on PPC_MERGE && PPC_MPC52xx && PPC_BESTCOMM_FEC
select CRC32
select PHYLIB
---help---



[PATCH 0/2] Fixes to MPC5200 FEC driver

2007-11-01 Thread Grant Likely
Oops, I sent this series yesterday but forgot to include jgarzik, domen
and netdev in the 'To:' list.

These are fixes which should go in for .24

Cheers,
g.

--
Grant Likely, B.Sc. P.Eng.
Secret Lab Technologies Ltd.


[PATCH] Remove /proc/net/stat/*_arp_cache upon module removal

2007-11-01 Thread Alexey Dobriyan
neigh_table_init_no_netlink() creates them, but they aren't removed anywhere.

Steps to reproduce:

modprobe clip
rmmod clip
cat /proc/net/stat/clip_arp_cache

BUG: unable to handle kernel paging request at virtual address f89d7758
printing eip: c05a99da *pdpt = 4001 *pde = 04408067 *pte = 
 
Oops:  [#1] PREEMPT SMP 
Modules linked in: atm af_packet ipv6 binfmt_misc sbs sbshc fan dock battery 
backlight ac power_supply parport loop rtc_cmos rtc_core rtc_lib serio_raw 
button k8temp hwmon amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd 
uhci_hcd usbcore
Pid: 2082, comm: cat Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56-bloat #4)
EIP: 0060:[] EFLAGS: 00210256 CPU: 0
EIP is at neigh_stat_seq_next+0x26/0x3f
EAX: 0001 EBX: f89d7600 ECX: c587bf40 EDX: 
ESI:  EDI: 0001 EBP: 0400 ESP: c587bf1c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 2082, ti=c587b000 task=c5984e10 task.ti=c587b000)
Stack: c06228cc c5313790 c049e5c0 0804f000 c45a7b00 c53137b0   
   0082 0001    fffb c58d6780 c049e437 
   c45a7b00 c04b1f93 c587bfa0 0400 0804f000 0400 0804f000 c04b1f2f 
Call Trace:
 [] seq_read+0x189/0x281
 [] seq_read+0x0/0x281
 [] proc_reg_read+0x64/0x77
 [] proc_reg_read+0x0/0x77
 [] vfs_read+0x80/0xd1
 [] sys_read+0x41/0x67
 [] sysenter_past_esp+0x6b/0xc1
 ===
Code: e9 ec 8d 05 00 56 8b 11 53 8b 40 70 8b 58 3c eb 29 0f a3 15 80 91 7b c0 
19 c0 85 c0 8d 42 01 74 17 89 c6 c1 fe 1f 89 01 89 71 04 <8b> 83 58 01 00 00 f7 
d0 8b 04 90 eb 09 89 c2 83 fa 01 7e d2 31 
EIP: [] neigh_stat_seq_next+0x26/0x3f SS:ESP 0068:c587bf1c

Signed-off-by: Alexey Dobriyan <[EMAIL PROTECTED]>
---

 net/core/neighbour.c |2 ++
 1 file changed, 2 insertions(+)

--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1435,6 +1435,8 @@ int neigh_table_clear(struct neigh_table *tbl)
kfree(tbl->phash_buckets);
tbl->phash_buckets = NULL;
 
+   remove_proc_entry(tbl->id, init_net.proc_net_stat);
+
free_percpu(tbl->stats);
tbl->stats = NULL;
 



[PATCH] [IPv6] SNMP: Restore Udp6InErrors incrementation

2007-11-01 Thread Mitsuru Chinen
As the checksum verification is postponed until the user calls recv or poll,
the incrementation of the Udp6InErrors counter should also be postponed.
Currently, it is postponed only in the non-blocking case. However, it
should be postponed in all cases, like the IPv4 code.
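Here is a userspace simulation of the fixed error path. It uses hypothetical names in place of the real socket code and UDP6_INC_STATS_USER(): demo_recv(), in_errors, and a queue of good/bad checksum flags. It shows that each bad datagram is now counted exactly once on both paths. A blocking caller loops back to try_again, and a non-blocking caller gets -EAGAIN. Note one deliberate simplification: an empty queue returns -EAGAIN here, where real blocking code would sleep.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the UDP_MIB_INERRORS counter. */
static unsigned long in_errors;

/* Receive queue: 1 = good checksum, 0 = bad checksum. */
static const int *demo_queue;
static int demo_qlen, demo_qpos;

static int demo_recv(int flags)
{
try_again:
	if (demo_qpos == demo_qlen)
		return -EAGAIN;		/* simplification: no sleeping */
	if (!demo_queue[demo_qpos++])
		goto csum_copy_err;
	return 0;			/* good datagram delivered */

csum_copy_err:
	in_errors++;			/* counted before the DONTWAIT test */
	if (flags & MSG_DONTWAIT)
		return -EAGAIN;
	goto try_again;
}
```

With a queue of {bad, bad, good}, a blocking call returns 0 after counting two errors. A MSG_DONTWAIT call on the same queue returns -EAGAIN after counting one.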

Signed-off-by: Mitsuru Chinen <[EMAIL PROTECTED]>
---
 net/ipv6/udp.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index caebad6..8344d8c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -205,12 +205,11 @@ out:
return err;
 
 csum_copy_err:
+   UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite);
skb_kill_datagram(sk, skb, flags);
 
-   if (flags & MSG_DONTWAIT) {
-   UDP6_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite);
+   if (flags & MSG_DONTWAIT)
return -EAGAIN;
-   }
goto try_again;
 }
 
-- 
1.5.3.4



Re: [UDP6]: Restore sk_filter optimisation

2007-11-01 Thread Mitsuru Chinen
On Wed, 31 Oct 2007 22:42:57 +0800
Herbert Xu <[EMAIL PROTECTED]> wrote:

> On Wed, Oct 31, 2007 at 11:05:45PM +0900, Mitsuru Chinen wrote:
> >
> > > >  1. udp6InDatagrams is incremented instead of udpInErrors
> > > >  2. In userland, recvfrom() replies an error with EAGAIN.
> > > > recvfrom() wasn't aware of such a packet before.
> > > > 
> > > > Are these changes intentional?
> >
> > As far as I tested, this doesn't happen with the old code even if
> > a filter is attached. However, this happen with the new code
> > without a filter and I don't see this rather when a filter is
> > attached. So, I'm afraid it's new.
> 
> Sorry, I read the patch the wrong way around :)
> 
> 1) is just an accounting issue.  It shouldn't be too difficult
> to fix it up.  In fact, I think udpInErrors will still be
> incremented once we detect the error.
> 
> 2) shouldn't be an issue because we've already solved the
> problem by making poll/select do the checksum verification
> before indicating that the socket is readable.
> 
> > > And, we're not sure how much the "optimization"'s benefit is.
> > > It is even worse when we are hand
> 
> The checksum verification is costly because we have to bring
> the payload into cache.  Since filters are very rare it's
> worthwhile to postpone the checksum verification for the common
> case.
> 
> Also as a general rule, we want to avoid divergent behaviour
> between IPv4 and IPv6.  So for changes like this we should
> really modify both stacks in future rather than have each
> stack do its own thing.

Got it. I will submit a patch to postpone the udpInErrors counter
incrementation as well.

Thanks for your detailed explanation!

Best Regards,

Mitsuru Chinen <[EMAIL PROTECTED]>


Re: [PATCH v4] FEC - fast ethernet controller for mpc52xx

2007-11-01 Thread tnt
> + while (bcom_buffer_done(priv->tx_dmatsk)) {
> + struct sk_buff *skb;
> + skb = bcom_retrieve_buffer(priv->tx_dmatsk, NULL, NULL);
> + /* Here (and in rx routines) would be a good place for
> +  * dma_unmap_single(), but bcom doesn't return bcom_bd of the
> +  * finished transfer, and _unmap is empty on this platfrom.
> +  * finished transfer, and _unmap is empty on this platform.
> +

Of course bestcomm lets you get back the bcom_bd... What do you think
your second NULL parameter is for?
Give it a pointer to a bcom_bd * and it will set that pointer to
the bd you just got back.


   Sylvain



Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 01, 2007 at 04:03:40AM -0700, David Miller escreveu:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Thu, 01 Nov 2007 11:16:20 +0100
> 
> > As done two years ago on IP route cache table (commit 
> > 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
> > hash bucket for the huge TCP/DCCP hash tables.
> > 
> > On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, with little
> > performance difference. (We hit a different cache line for the rwlock, but
> > the bucket cache line then has a better sharing factor among CPUs, since we
> > dirty it less often.)
> > 
> > Using a 'small' table of hashed rwlocks should be more than enough to provide
> > correct SMP concurrency between different buckets, without using too much 
> > memory. Sizing of this table depends on NR_CPUS and various CONFIG settings.
> > 
> > This patch provides some locking abstraction that may ease future work
> > using a different model for the TCP/DCCP table.
> > 
> > Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>
> 
> Nice work Eric.
> 
> I've tossed this into my local tree and we'll let this cook
> for a few days.  If no problems crop up I will submit it
> for 2.6.24 because the memory savings is non-trivial.

Agreed, thanks!

Acked-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>


Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Ilpo Järvinen
On Thu, 1 Nov 2007, Eric Dumazet wrote:

> @@ -134,6 +156,13 @@ static inline struct inet_ehash_bucket *inet_ehash_bucket(
>   return &hashinfo->ehash[hash & (hashinfo->ehash_size - 1)];
>  }
>  
> +static inline rwlock_t *inet_ehash_lockp(
> + struct inet_hashinfo *hashinfo,

...These two lines would fit together within 80 columns.

> + unsigned int hash)
> +{
> + return &hashinfo->ehash_locks[hash & (EHASH_LOCK_SZ - 1)];
> +}
> +
>  extern struct inet_bind_bucket *
>   inet_bind_bucket_create(struct kmem_cache *cachep,
>   struct inet_bind_hashbucket *head,

-- 
 i.



Re: [PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 11:16:20 +0100

> As done two years ago on IP route cache table (commit 
> 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
> hash bucket for the huge TCP/DCCP hash tables.
> 
> On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, with little
> performance difference. (We hit a different cache line for the rwlock, but
> the bucket cache line then has a better sharing factor among CPUs, since we
> dirty it less often.)
> 
> Using a 'small' table of hashed rwlocks should be more than enough to provide 
> correct SMP concurrency between different buckets, without using too much 
> memory. Sizing of this table depends on NR_CPUS and various CONFIG settings.
> 
> This patch provides some locking abstraction that may ease future work
> using a different model for the TCP/DCCP table.
> 
> Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

Nice work Eric.

I've tossed this into my local tree and we'll let this cook
for a few days.  If no problems crop up I will submit it
for 2.6.24 because the memory savings is non-trivial.


[PATCH] INET : removes per bucket rwlock in tcp/dccp ehash table

2007-11-01 Thread Eric Dumazet
As done two years ago on IP route cache table (commit 
22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per 
hash bucket for the huge TCP/DCCP hash tables.


On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, with little
performance difference. (We hit a different cache line for the rwlock, but
the bucket cache line then has a better sharing factor among CPUs, since we
dirty it less often.)


Using a 'small' table of hashed rwlocks should be more than enough to provide 
correct SMP concurrency between different buckets, without using too much 
memory. Sizing of this table depends on NR_CPUS and various CONFIG settings.


This patch provides some locking abstraction that may ease future work using
a different model for the TCP/DCCP table.


Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

 include/net/inet_hashtables.h |   40 
 net/dccp/proto.c  |   16 ++--
 net/ipv4/inet_diag.c  |9 ---
 net/ipv4/inet_hashtables.c|7 +++--
 net/ipv4/inet_timewait_sock.c |   13 +-
 net/ipv4/tcp.c|   11 +++-
 net/ipv4/tcp_ipv4.c   |   11 
 net/ipv6/inet6_hashtables.c   |   19 ---
 8 files changed, 89 insertions(+), 37 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 4427dcd..5cbfbac 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -37,7 +37,6 @@
  * I'll experiment with dynamic table growth later.
  */
 struct inet_ehash_bucket {
-   rwlock_t  lock;
struct hlist_head chain;
struct hlist_head twchain;
 };
@@ -91,6 +90,28 @@ struct inet_bind_hashbucket {
 /* This is for listening sockets, thus all sockets which possess wildcards. */
 #define INET_LHTABLE_SIZE  32  /* Yes, really, this is all you need. */
 
+#if defined(CONFIG_SMP) || defined(CONFIG_PROVE_LOCKING)
+/*
+ * Instead of using one rwlock for each inet_ehash_bucket, we use a table of locks
+ * The size of this table is a power of two and depends on the number of CPUS.
+ */
+# if defined(CONFIG_DEBUG_LOCK_ALLOC)
+#  define EHASH_LOCK_SZ 256
+# elif NR_CPUS >= 32
+#  define EHASH_LOCK_SZ4096
+# elif NR_CPUS >= 16
+#  define EHASH_LOCK_SZ2048
+# elif NR_CPUS >= 8
+#  define EHASH_LOCK_SZ1024
+# elif NR_CPUS >= 4
+#  define EHASH_LOCK_SZ512
+# else
+#  define EHASH_LOCK_SZ256
+# endif
+#else
+# define EHASH_LOCK_SZ 0
+#endif
+
 struct inet_hashinfo {
/* This is for sockets with full identity only.  Sockets here will
 * always be without wildcards and will have the following invariant:
@@ -100,6 +121,7 @@ struct inet_hashinfo {
 * TIME_WAIT sockets use a separate chain (twchain).
 */
struct inet_ehash_bucket*ehash;
+   rwlock_t*ehash_locks;
 
 	/* Ok, let's try this, I give up, we do need a local binding
 	 * TCP hash as well as the others for fast bind/connect.
@@ -134,6 +156,13 @@ static inline struct inet_ehash_bucket *inet_ehash_bucket(
 	return &hashinfo->ehash[hash & (hashinfo->ehash_size - 1)];
 }
 
+static inline rwlock_t *inet_ehash_lockp(
+   struct inet_hashinfo *hashinfo,
+   unsigned int hash)
+{
+   return &hashinfo->ehash_locks[hash & (EHASH_LOCK_SZ - 1)];
+}
+
 extern struct inet_bind_bucket *
 	inet_bind_bucket_create(struct kmem_cache *cachep,
 				struct inet_bind_hashbucket *head,
@@ -222,7 +251,7 @@ static inline void __inet_hash(struct inet_hashinfo *hashinfo,
 		sk->sk_hash = inet_sk_ehashfn(sk);
 		head = inet_ehash_bucket(hashinfo, sk->sk_hash);
 		list = &head->chain;
-		lock = &head->lock;
+		lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
 		write_lock(lock);
 	}
 	__sk_add_node(sk, list);
@@ -253,7 +282,7 @@ static inline void inet_unhash(struct inet_hashinfo *hashinfo, struct sock *sk)
 		inet_listen_wlock(hashinfo);
 		lock = &hashinfo->lhash_lock;
 	} else {
-		lock = &inet_ehash_bucket(hashinfo, sk->sk_hash)->lock;
+		lock = inet_ehash_lockp(hashinfo, sk->sk_hash);
 		write_lock_bh(lock);
 	}
 
@@ -354,9 +383,10 @@ static inline struct sock *
 	 */
 	unsigned int hash = inet_ehashfn(daddr, hnum, saddr, sport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hashinfo, hash);
+	rwlock_t *lock = inet_ehash_lockp(hashinfo, hash);
 
 	prefetch(head->chain.first);
-	read_lock(&head->lock);
+	read_lock(lock);
 	sk_for_each(sk, node, &head->chain) {
 		if (INET_MATCH(sk, hash, acookie, saddr, daddr, ports, dif))
 			goto hit; /* You sunk my battleship! */
@@ -369,7 +399,7 @@ static inline struct sock *
 	}

Re: [PATCH] ucc_geth: add support for netpoll

2007-11-01 Thread Anton Vorontsov
On Thu, Nov 01, 2007 at 10:33:24AM +0800, Li Yang-r58472 wrote:
> > -Original Message-
> > From: Anton Vorontsov [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, November 01, 2007 5:59 AM
> > To: Li Yang-r58472
> > Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]; 
> > [EMAIL PROTECTED]
> > Subject: Re: [PATCH] ucc_geth: add support for netpoll
> > 
> > On Mon, Oct 29, 2007 at 03:17:44PM +0300, Anton Vorontsov wrote:
> > [...]
> > > > Oops.  The original patch happened to hit the Junk mail box. :(
> > > 
> > > That one as well? http://lkml.org/lkml/2007/10/11/128
> > > 
> > > > I think
> > > > the patch is good to merge after the cosmetic change.  I 
> > can do it 
> > > > in next pull request to Jeff.
> > > 
> > > Ok, great. Thanks.
> > 
> > I'm wondering if you missed that email again. Maybe your mail 
> > client/server doing weird things with emails from @ru.mvista.com?
> 
> No.  I have explicitly add you to the whitelist. :)

Hehe, thanks. ;-)

> Please be patient,
> isn't this patch a new feature which can only be integrated in the merge
> window?

Sure it is. I didn't mean to hurry you up, of course not.

I just wondered whether you had solved the issues with getting my emails.
Such wondering is quite natural when there has been a precedent lately. ;-)


Sorry for troubling you,

-- 
Anton Vorontsov
email: [EMAIL PROTECTED]
backup email: [EMAIL PROTECTED]
irc://irc.freenode.net/bd2
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BNX2X] added register coments - bnx2x_init.h

2007-11-01 Thread Eliezer Tamir
On Thu, 2007-11-01 at 02:56 -0700, David Miller wrote:
> From: "Eliezer Tamir" <[EMAIL PROTECTED]>
> Date: Thu, 01 Nov 2007 11:56:17 +0200
> 
> > posting individual files for comments.
> >
> > ---
> > #ifndef __BNX2X_NEW_INIT_H__
> > #define __BNX2X_NEW_INIT_H__
> 
> Too big for the mailing lists.
> 
Thanks,
I will refrain from sending it to the list in the future.

Eliezer




Re: [PATCH][BNX2X] added register coments - bnx2x_init.h

2007-11-01 Thread David Miller
From: "Eliezer Tamir" <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 11:56:17 +0200

> posting individual files for comments.
> 
> ---
> #ifndef __BNX2X_NEW_INIT_H__
> #define __BNX2X_NEW_INIT_H__

Too big for the mailing lists.


Re: [PATCH][BNX2X] added register coments - bnx2x_init.h

2007-11-01 Thread Eliezer Tamir
posting individual files for comments.

---
/* bnx2x_init.h: Broadcom Everest network driver.
 *
 * Copyright (c) 2007 Broadcom Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 *
 * Written by: Eliezer Tamir <[EMAIL PROTECTED]>
 */

#ifndef EVEREST_INIT_FUNCTIONS_H
#define EVEREST_INIT_FUNCTIONS_H

#define COMMON  0x1
#define PORT0   0x2
#define PORT1   0x4

#define INIT_EMULATION  0x1
#define INIT_FPGA   0x2
#define INIT_ASIC   0x4
#define INIT_HARDWARE   0x7

#define STORM_INTMEM_SIZE   (0x5800 / 4)
#define TSTORM_INTMEM_ADDR  0x1a
#define CSTORM_INTMEM_ADDR  0x22
#define XSTORM_INTMEM_ADDR  0x2a
#define USTORM_INTMEM_ADDR  0x32


/* Init operation types and structures */

#define OP_RD   0x1 /* read single register */
#define OP_WR   0x2 /* write single register */
#define OP_IW   0x3 /* write single register using mailbox */
#define OP_SW   0x4 /* copy a string to the device */
#define OP_SI   0x5 /* copy a string using mailbox */
#define OP_ZR   0x6 /* clear memory */
#define OP_ZP   0x7 /* unzip then copy with DMAE */
#define OP_WB   0x8 /* copy a string using DMAE */

struct raw_op {
	u32 op		:8;
	u32 offset	:24;
	u32 raw_data;
};

struct op_read {
	u32 op		:8;
	u32 offset	:24;
	u32 pad;
};

struct op_write {
	u32 op		:8;
	u32 offset	:24;
	u32 val;
};

struct op_string_write {
	u32 op		:8;
	u32 offset	:24;
#ifdef __LITTLE_ENDIAN
	u16 data_off;
	u16 data_len;
#else /* __BIG_ENDIAN */
	u16 data_len;
	u16 data_off;
#endif
};

struct op_zero {
	u32 op		:8;
	u32 offset	:24;
	u32 len;
};

union init_op {
	struct op_read		read;
	struct op_write		write;
	struct op_string_write	str_wr;
	struct op_zero		zero;
	struct raw_op		raw;
};

#include "bnx2x_init_values.h"

static void bnx2x_reg_wr_ind(struct bnx2x *bp, u32 addr, u32 val);

static void bnx2x_write_dmae(struct bnx2x *bp, dma_addr_t dma_addr,
			     u32 dst_addr, u32 len32);

static int bnx2x_gunzip(struct bnx2x *bp, u8 *zbuf, int len);

static void bnx2x_init_str_wr(struct bnx2x *bp, u32 addr, const u32 *data,
			      u32 len)
{
	int i;

	for (i = 0; i < len; i++) {
		REG_WR(bp, addr + i*4, data[i]);
		if (!(i % 1)) {
			touch_softlockup_watchdog();
			cpu_relax();
		}
	}
}

#define INIT_MEM_WR(reg, data, reg_off, len) \
	bnx2x_init_str_wr(bp, reg + reg_off*4, data, len)

static void bnx2x_init_ind_wr(struct bnx2x *bp, u32 addr, const u32 *data,
			      u16 len)
{
	int i;

	for (i = 0; i < len; i++) {
		REG_WR_IND(bp, addr + i*4, data[i]);
		if (!(i % 1)) {
			touch_softlockup_watchdog();
			cpu_relax();
		}
	}
}

static void bnx2x_init_wr_wb(struct bnx2x *bp, u32 addr, const u32 *data,
			     u32 len, int gunzip)
{
	int offset = 0;

	if (gunzip) {
		int rc;
#ifdef __BIG_ENDIAN
		int i, size;
		u32 *temp;

		temp = kmalloc(len, GFP_KERNEL);
		size = (len / 4) + ((len % 4) ? 1 : 0);
		for (i = 0; i < size; i++)
			temp[i] = swab32(data[i]);
		data = temp;
#endif
		rc = bnx2x_gunzip(bp, (u8 *)data, len);
		if (rc) {
			DP(NETIF_MSG_HW, "gunzip failed ! rc %d\n", rc);
			return;
		}
		len = bp->gunzip_outlen;
#ifdef __BIG_ENDIAN
		kfree(temp);
		for (i = 0; i < len; i++)
			((u32 *)bp->gunzip_buf)[i] =
					swab32(((u32 *)bp->gunzip_buf)[i]);
#endif
	} else {
		if ((len * 4) > FW_BUF_SIZE) {
			BNX2X_ERR("LARGE DMAE OPERATION ! len 0x%x\n", len*4);
			return;
		}
		memcpy(bp->gunzip_buf, data, len * 4);
	}

	while (len > DMAE_LEN32_MAX) {
		bnx2x_write_dmae(bp, bp->gunzip_mapping + offset,
				 addr + offset, DMAE_LEN32_MAX);
		offset += DMAE_L
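The tail of bnx2x_init_wr_wb() splits a transfer of len 32-bit words into pieces of at most DMAE_LEN32_MAX words. The listing is cut off in the archive, so the sketch below assumes that the offset advances in bytes (4 per word); the value of DMAE_LEN32_MAX and the helper split_dmae() are hypothetical, and the DMA write itself is replaced by recording each chunk.

```c
#include <assert.h>
#include <stdint.h>

#define DMAE_LEN32_MAX 0x400u   /* hypothetical; the driver's value is not shown */

struct chunk {
	uint32_t byte_off;   /* offset added to both source and destination */
	uint32_t len32;      /* chunk length in 32-bit words */
};

/*
 * Split len32 words into DMAE-sized chunks, recording them in out[].
 * Returns the number of chunks; stops early if out[] has only max_chunks slots.
 */
static int split_dmae(uint32_t len32, struct chunk *out, int max_chunks)
{
	uint32_t offset = 0;
	int n = 0;

	while (len32 > DMAE_LEN32_MAX && n < max_chunks) {
		out[n++] = (struct chunk){ offset, DMAE_LEN32_MAX };
		offset += DMAE_LEN32_MAX * 4;   /* words -> bytes (assumed) */
		len32 -= DMAE_LEN32_MAX;
	}
	if (len32 > 0 && n < max_chunks)
		out[n++] = (struct chunk){ offset, len32 };   /* final partial chunk */
	return n;
}
```

A length of exactly DMAE_LEN32_MAX words produces a single chunk, because the loop only runs while more than one full chunk remains.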

Re: [PATCH][BNX2X] added register coments - bnx2x_fw_defs.h

2007-11-01 Thread Eliezer Tamir
posting individual files for comments.

---
/* bnx2x_fw_defs.h: Broadcom Everest network driver.
 *
 * Copyright (c) 2007 Broadcom Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 */


#define CSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index) \
	(0x1922 + (port * 0x40) + (index * 0x4))
#define CSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port) \
	(0x1900 + (port * 0x40))
#define CSTORM_HC_BTR_OFFSET(port) \
	(0x1984 + (port * 0xc0))
#define CSTORM_SB_HC_DISABLE_OFFSET(port, cpu_id, index) \
	(0x141a + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define CSTORM_SB_HC_TIMEOUT_OFFSET(port, cpu_id, index) \
	(0x1418 + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define CSTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id) \
	(0x1400 + (port * 0x280) + (cpu_id * 0x28))
#define CSTORM_STATS_FLAGS_OFFSET(port) (0x5108 + (port * 0x8))
#define TSTORM_CLIENT_CONFIG_OFFSET(port, client_id) \
	(0x1510 + (port * 0x240) + (client_id * 0x20))
#define TSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index) \
	(0x138a + (port * 0x28) + (index * 0x4))
#define TSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port) \
	(0x1370 + (port * 0x28))
#define TSTORM_ETH_STATS_QUERY_ADDR_OFFSET(port) \
	(0x4a80 + (port * 0x8))
#define TSTORM_HC_BTR_OFFSET(port) \
	(0x13c4 + (port * 0x18))
#define TSTORM_MAC_FILTER_CONFIG_OFFSET(port) \
	(0x1420 + (port * 0x30))
#define TSTORM_PORT_COMMON_CONFIG_OFFSET(port) \
	(0x1418 + (port * 0x30))
#define TSTORM_RCQ_PROD_OFFSET(port, client_id) \
	(0x1508 + (port * 0x240) + (client_id * 0x20))
#define TSTORM_STATS_FLAGS_OFFSET(port) (0x4aa0 + (port * 0x8))
#define USTORM_DEF_SB_HC_DISABLE_OFFSET(port, index) \
	(0x191a + (port * 0x28) + (index * 0x4))
#define USTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port) \
	(0x1900 + (port * 0x28))
#define USTORM_HC_BTR_OFFSET(port) \
	(0x1954 + (port * 0xb8))
#define USTORM_MEM_WORKAROUND_ADDRESS_OFFSET 0x5408
#define USTORM_SB_HC_DISABLE_OFFSET(port, cpu_id, index) \
	(0x141a + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define USTORM_SB_HC_TIMEOUT_OFFSET(port, cpu_id, index) \
	(0x1418 + (port * 0x280) + (cpu_id * 0x28) + (index * 0x4))
#define USTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id) \
	(0x1400 + (port * 0x280) + (cpu_id * 0x28))
#define XSTORM_ASSERT_LIST_INDEX_OFFSET 0x1000
#define XSTORM_ASSERT_LIST_OFFSET(idx)  (0x1020 + (idx * 0x10))
#define XSTORM_DEF_SB_HC_DISABLE_OFFSET(port, index) \
	(0x141a + (port * 0x28) + (index * 0x4))
#define XSTORM_DEF_SB_HOST_SB_ADDR_OFFSET(port) \
	(0x1400 + (port * 0x28))
#define XSTORM_ETH_STATS_QUERY_ADDR_OFFSET(port) \
	(0x5408 + (port * 0x8))
#define XSTORM_HC_BTR_OFFSET(port) \
	(0x1454 + (port * 0x18))
#define XSTORM_SPQ_PAGE_BASE_OFFSET(port) \
	(0x5328 + (port * 0x18))
#define XSTORM_SPQ_PROD_OFFSET(port) \
	(0x5330 + (port * 0x18))
#define XSTORM_STATS_FLAGS_OFFSET(port) (0x53f8 + (port * 0x8))
#define COMMON_ASM_INVALID_ASSERT_OPCODE 0x0
#define COMMON_ASM_INVALID_ASSERT_OPCODE 0x0

/* values of command IDs in the ramrod message */
#define RAMROD_CMD_ID_ETH_PORT_SETUP        80
#define RAMROD_CMD_ID_ETH_CLIENT_SETUP      85
#define RAMROD_CMD_ID_ETH_STAT_QUERY        90
#define RAMROD_CMD_ID_ETH_UPDATE            100
#define RAMROD_CMD_ID_ETH_HALT              105
#define RAMROD_CMD_ID_ETH_SET_MAC           110
#define RAMROD_CMD_ID_ETH_CFC_DEL           115
#define RAMROD_CMD_ID_ETH_PORT_DEL          120
#define RAMROD_CMD_ID_ETH_FORWARD_SETUP     125


/* command values for set mac command */
#define T_ETH_MAC_COMMAND_SET               0
#define T_ETH_MAC_COMMAND_INVALIDATE        1
#define T_ETH_INDIRECTION_TABLE_SIZE        128

/* Maximal L2 clients supported */
#define ETH_MAX_RX_CLIENTS  18

/* Maximal aggregation queues supported */
#define ETH_MAX_AGGREGATION_QUEUES  16


/**
 * This file defines HSI constants common to all microcode flows
 */

/* Connection types */
#define ETH_CONNECTION_TYPE 0

/* microcode fixed page size 4K (chains and ring segments) */
#define MC_PAGE_SIZE                        4096


/* Host coalescing constants */

/* IGU constants */
#define IGU_PORT_BASE   0x0400

#define IGU_ADDR_MSIX   0x
#define IGU_ADDR_INT_ACK    0x0200
#define IGU_ADDR_PROD_UPD   0x0201
#define IGU_ADDR_ATTN_BITS_UPD  0x0202
#define IGU_ADDR_ATTN_BITS_SET  0x0203
#define IGU_ADDR_ATTN_BITS_CLR  0x0204
#define IGU_ADDR_COALESCE_NOW   0x0205
#define IGU_ADDR_SIMD_MASK  0x0206
#define IGU_ADDR_SIMD_NOMASK   
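The per-CPU status-block macros in this header share one layout: a base address, a per-port stride of 0x280 and a per-CPU stride of 0x28. A small sketch can check that arithmetic; the helper sb_slots_per_port() is hypothetical, and the conclusion that one port's region holds 16 per-CPU slots is an inference from the constants, not something the header states.

```c
#include <assert.h>

/* CSTORM_SB_HOST_SB_ADDR_OFFSET as listed, with the arguments parenthesized. */
#define CSTORM_SB_HOST_SB_ADDR_OFFSET(port, cpu_id) \
	(0x1400 + ((port) * 0x280) + ((cpu_id) * 0x28))

#define SB_PORT_STRIDE 0x280   /* per-port step used by the macro */
#define SB_CPU_STRIDE  0x28    /* per-CPU step used by the macro */

/* How many per-CPU blocks fit before the next port's region starts. */
static int sb_slots_per_port(void)
{
	return SB_PORT_STRIDE / SB_CPU_STRIDE;
}
```

With these numbers, CPU slot 16 of port 0 would land exactly on slot 0 of port 1.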

Re: [PATCH][BNX2X] added register coments - bnx2x.h

2007-11-01 Thread Eliezer Tamir
posting individual files for comments.

---
/* bnx2x.h: Broadcom Everest network driver.
 *
 * Copyright (c) 2007 Broadcom Corporation
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 *
 * Written by: Eliezer Tamir <[EMAIL PROTECTED]>
 * Based on code from Michael Chan's bnx2 driver
 */


#ifndef BNX2X_H
#define BNX2X_H

/* error/debug prints */

#define DRV_MODULE_NAME "bnx2x"
#define PFX DRV_MODULE_NAME ": "

/* for messages that are currently off */
#define BNX2X_MSG_OFF   0
#define BNX2X_MSG_MCP   0x1 /* was: NETIF_MSG_HW */
#define BNX2X_MSG_STATS 0x2 /* was: NETIF_MSG_TIMER */
#define NETIF_MSG_NVM   0x4 /* was: NETIF_MSG_HW */
#define NETIF_MSG_DMAE  0x8 /* was: NETIF_MSG_HW */

#define DP_LEVEL        KERN_NOTICE     /* was: KERN_DEBUG */

/* regular debug print */
#define DP(__mask, __fmt, __args...) do { \
	if (bp->msglevel & (__mask)) \
		printk(DP_LEVEL "[%s:%d(%s)]" __fmt, __FUNCTION__, \
		__LINE__, bp->dev?(bp->dev->name):"?", ##__args); \
	} while (0)

/* for errors (never masked) */
#define BNX2X_ERR(__fmt, __args...) do { \
	printk(KERN_ERR "[%s:%d(%s)]" __fmt, __FUNCTION__, \
		__LINE__, bp->dev?(bp->dev->name):"?", ##__args); \
	} while (0)

/* before we have a dev->name use dev_info() */
#define BNX2X_DEV_INFO(__fmt, __args...) do { \
	if (bp->msglevel & NETIF_MSG_PROBE) \
		dev_info(&bp->pdev->dev, __fmt, ##__args); \
	} while (0)


#ifdef BNX2X_STOP_ON_ERROR
#define bnx2x_panic() do { \
		bp->panic = 1; \
		BNX2X_ERR("driver assert\n"); \
		bnx2x_disable_int(bp); \
		bnx2x_panic_dump(bp); \
	} while (0)
#else
#define bnx2x_panic()
#endif


#define U64_LO(x)               (((u64)x) & 0xffffffff)
#define U64_HI(x)               (((u64)x) >> 32)
#define HILO_U64(hi, lo)        (((u64)hi << 32) + lo)


#define REG_ADDR(bp, offset)            (bp->regview + offset)

#define REG_RD(bp, offset)              readl(REG_ADDR(bp, offset))
#define REG_RD8(bp, offset)             readb(REG_ADDR(bp, offset))
#define REG_RD64(bp, offset)            readq(REG_ADDR(bp, offset))

#define REG_WR(bp, offset, val)         writel((u32)val, REG_ADDR(bp, offset))
#define REG_WR8(bp, offset, val)        writeb((u8)val, REG_ADDR(bp, offset))
#define REG_WR16(bp, offset, val)       writew((u16)val, REG_ADDR(bp, offset))
#define REG_WR32(bp, offset, val)       REG_WR(bp, offset, val)

#define REG_RD_IND(bp, offset)          bnx2x_reg_rd_ind(bp, offset)
#define REG_WR_IND(bp, offset, val)     bnx2x_reg_wr_ind(bp, offset, val)

#define REG_WR_DMAE(bp, offset, val, len32) \
	do { \
		memcpy(bnx2x_sp(bp, wb_data[0]), val, len32 * 4); \
		bnx2x_write_dmae(bp, bnx2x_sp_mapping(bp, wb_data), \
				 offset, len32); \
	} while (0)

#define SHMEM_RD(bp, type) \
	REG_RD(bp, bp->shmem_base + offsetof(struct shmem_region, type))
#define SHMEM_WR(bp, type, val) \
	REG_WR(bp, bp->shmem_base + offsetof(struct shmem_region, type), val)

#define NIG_WR(reg, val)        REG_WR(bp, reg, val)
#define EMAC_WR(reg, val)       REG_WR(bp, emac_base + reg, val)
#define BMAC_WR(reg, val)       REG_WR(bp, GRCBASE_NIG + bmac_addr + reg, val)


#define for_each_queue(bp, var) for (var = 0; var < bp->num_queues; var++)
#ifdef BCM_MULTI
#define for_each_nondefault_queue(bp, var) \
				for (var = 1; var < bp->num_queues; var++)
#define is_multi(bp)            (bp->num_queues > 1)
#endif


struct regp {
	u32 lo;
	u32 hi;
};

struct bmac_stats {
	struct regp tx_gtpkt;
	struct regp tx_gtxpf;
	struct regp tx_gtfcs;
	struct regp tx_gtmca;
	struct regp tx_gtgca;
	struct regp tx_gtfrg;
	struct regp tx_gtovr;
	struct regp tx_gt64;
	struct regp tx_gt127;
	struct regp tx_gt255;   /* 10 */
	struct regp tx_gt511;
	struct regp tx_gt1023;
	struct regp tx_gt1518;
	struct regp tx_gt2047;
	struct regp tx_gt4095;
	struct regp tx_gt9216;
	struct regp tx_gt16383;
	struct regp tx_gtmax;
	struct regp tx_gtufl;
	struct regp tx_gterr;   /* 20 */
	struct regp tx_gtbyt;

	struct regp rx_gr64;
	struct regp rx_gr127;
	struct regp rx_gr255;
	struct regp rx_gr511;
	struct regp rx_gr1023;
	struct regp rx_gr1518;
	struct regp rx_gr2047;
	struct regp rx_gr4095;
	struct regp rx_gr9216;  /* 30 */
	struct regp rx_gr16383;
	struct regp rx_grmax;
	struct regp rx_grpkt;
	struct regp rx_grfcs;
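U64_LO() and U64_HI() split a 64-bit value (for example a DMA address) into the two 32-bit halves that hardware registers take, and HILO_U64() joins them again. The following is a user-space sketch with uint64_t in place of u64, assuming the low-half mask is 0xffffffff. Unlike the header, it also parenthesizes the macro arguments, the usual defensive habit for function-like macros.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit value into 32-bit halves and join them again. */
#define U64_LO(x)        ((uint32_t)(((uint64_t)(x)) & 0xffffffffULL))
#define U64_HI(x)        ((uint32_t)(((uint64_t)(x)) >> 32))
#define HILO_U64(hi, lo) ((((uint64_t)(hi)) << 32) + (lo))

/* Splitting and rejoining must give back the original value. */
static uint64_t roundtrip(uint64_t v)
{
	return HILO_U64(U64_HI(v), U64_LO(v));
}
```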

Re: [PATCH][BNX2X] added register coments

2007-11-01 Thread Eliezer Tamir
On Thu, 2007-11-01 at 02:28 -0700, David Miller wrote:
> From: "Eliezer Tamir" <[EMAIL PROTECTED]>
> Date: Thu, 01 Nov 2007 11:28:25 +0200
> 
> > Here is the version with added register comments.
> > Please consider applying.
> 
> The 2.6.24 merge window is closed, so even if I thought
> the driver is perfect it cannot be included right now.
> 
> It will need to be resubmitted as the 2.6.25 merge window
> approaches.

OK, thanks.




[patch 2/2] ipvs: Synchronise Closing of Connections

2007-11-01 Thread Simon Horman
From: Rumen G. Bogdanovski <[EMAIL PROTECTED]>

This patch makes the master daemon sync the connection when it is about
to close.  This makes the connections on the backup close or time out
according to their state.  Previously the sync was performed only if the
connection was in the ESTABLISHED state, which always made the connections
time out after the hard-coded 3 minutes. However, Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increase this to 15 minutes (the ESTABLISHED state timeout).  So
this patch makes use of the proper timeout, since it syncs the connections
on status changes to FIN_WAIT (2 min timeout) and CLOSE (10 s timeout).
However, if the backup misses CLOSE, hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout, as
happens without this patch.  This way the number of hanging connections
on the backup is kept to a minimum, and very few of them will be left to
time out with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.

Regards,
Rumen Bogdanovski

Signed-off-by: Rumen G. Bogdanovski <[EMAIL PROTECTED]>
Signed-off-by: Simon Horman <[EMAIL PROTECTED]>

--- 
Thu, 01 Nov 2007 18:25:10 +0900, Horms
* Rediffed for net-2.6
* Ran through scripts/checkpatch.pl and fixed up everything
  that it complains about except the use of volatile, as
  it's in keeping with other fields in the structure.
  If it's wrong, let's fix them all together.

WARNING: Use of volatile is usually wrong: see
Documentation/volatile-considered-harmful.txt
#49: FILE: include/net/ip_vs.h:523:
+   volatile __u16  old_state;  /* old state, to be used for

Index: net-2.6/include/net/ip_vs.h
===
--- net-2.6.orig/include/net/ip_vs.h	2007-11-01 18:17:55.0 +0900
+++ net-2.6/include/net/ip_vs.h	2007-11-01 18:21:02.0 +0900
@@ -520,6 +520,10 @@ struct ip_vs_conn {
 	spinlock_t              lock;           /* lock for state transition */
 	volatile __u16          flags;          /* status flags */
 	volatile __u16          state;          /* state info */
+	volatile __u16          old_state;      /* old state, to be used for
+						 * state transition triggered
+						 * synchronization
+						 */
 
 	/* Control members */
 	struct ip_vs_conn       *control;       /* Master control connection */
Index: net-2.6/net/ipv4/ipvs/ip_vs_core.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_core.c	2007-11-01 18:17:55.0 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_core.c	2007-11-01 18:18:23.0 +0900
@@ -979,15 +979,23 @@ ip_vs_in(unsigned int hooknum, struct sk
 		ret = NF_ACCEPT;
 	}
 
-	/* increase its packet counter and check if it is needed
-	   to be synchronized */
+	/* Increase its packet counter and check if it is needed
+	 * to be synchronized
+	 *
+	 * Sync connection if it is about to close to
+	 * encourage the standby servers to update the connections timeout
+	 */
 	atomic_inc(&cp->in_pkts);
 	if ((ip_vs_sync_state & IP_VS_STATE_MASTER) &&
-	    (cp->protocol != IPPROTO_TCP ||
-	     cp->state == IP_VS_TCP_S_ESTABLISHED) &&
-	    (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
-	     == sysctl_ip_vs_sync_threshold[0]))
+	    (((cp->protocol != IPPROTO_TCP ||
+	       cp->state == IP_VS_TCP_S_ESTABLISHED) &&
+	      (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
+	       == sysctl_ip_vs_sync_threshold[0])) ||
+	     ((cp->protocol == IPPROTO_TCP) && (cp->old_state != cp->state) &&
+	      ((cp->state == IP_VS_TCP_S_FIN_WAIT) ||
+	       (cp->state == IP_VS_TCP_S_CLOSE)))))
 		ip_vs_sync_conn(cp);
+	cp->old_state = cp->state;
 
 	ip_vs_conn_put(cp);
 	return ret;
Index: net-2.6/net/ipv4/ipvs/ip_vs_sync.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_sync.c	2007-11-01 18:17:55.0 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_sync.c	2007-11-01 18:20:30.0 +0900
@@ -332,7 +332,7 @@ static void ip_vs_process_message(const 
s->daddr, s->dport,
flags, dest);
if (dest)
-   atomic_dec(&dest->refcnt);
+   ip_vs_dest_get(dest);
if (!cp) {
IP_VS_ERR("ip_vs_conn_new failed\n");
return;
@@ -343,10 +343,9 @@ static void ip_vs_process_mes
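The new sync condition in ip_vs_in() can be read as two independent triggers. The first is the old periodic one (for non-TCP or ESTABLISHED TCP connections, whenever the packet count modulo sysctl_ip_vs_sync_threshold[1] equals threshold[0]). The second is new and fires once when a TCP connection first enters FIN_WAIT or CLOSE. Below is a user-space sketch of that logic; the struct, the enum and should_sync() are hypothetical simplifications, and the IP_VS_STATE_MASTER check and real connection locking are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IPVS TCP states used by the patch. */
enum tcp_state { S_ESTABLISHED, S_FIN_WAIT, S_CLOSE, S_OTHER };

struct conn {
	bool is_tcp;
	enum tcp_state state;      /* current state */
	enum tcp_state old_state;  /* state seen on the previous packet */
	unsigned int in_pkts;      /* packets seen so far */
};

/*
 * Returns true when the master should send a sync message for this packet.
 * threshold[0] and threshold[1] mirror sysctl_ip_vs_sync_threshold[0]/[1].
 */
static bool should_sync(struct conn *cp, const unsigned int threshold[2])
{
	bool periodic, closing;

	cp->in_pkts++;
	periodic = (!cp->is_tcp || cp->state == S_ESTABLISHED) &&
		   cp->in_pkts % threshold[1] == threshold[0];
	closing = cp->is_tcp && cp->old_state != cp->state &&
		  (cp->state == S_FIN_WAIT || cp->state == S_CLOSE);
	cp->old_state = cp->state;   /* updated on every packet, as in the patch */
	return periodic || closing;
}
```

Because old_state is refreshed on every packet, the closing trigger fires once per transition, so a backup that misses the FIN_WAIT message still gets a second chance with CLOSE.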

[patch 1/2] ipvs: Bind connections on standby if the destination exists

2007-11-01 Thread Simon Horman
From: Rumen G. Bogdanovski <[EMAIL PROTECTED]>

This patch fixes the problem of node overload on director fail-over.
Consider this scenario: 2 nodes, each accepting 3 connections at a time,
and 2 directors. If director fail-over occurs when the nodes are fully
loaded (6 connections to the cluster), the new director will assign
another 6 connections to the cluster if the same real servers exist
there.

The problem turned out to be that the inherited connections were not
bound to the real servers (destinations) on the backup director. As a
result, "ipvsadm -l" reports 0 connections:
[EMAIL PROTECTED]:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port   Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999   Route   1000   0  0
  -> node484.local:5999   Route   1000   0  0

while "ipvsadm -lnc" is right:
[EMAIL PROTECTED]:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists; otherwise the connection is handled the old way. If the master
and backup directors are synchronized in terms of real services, there
will be no server over-commit, since new connections will not be created
on nonexistent real services on the backup. If the service is created on
the backup later, the binding will be performed when the next connection
update is received. With this patch the inherited connections show as
inactive on the backup:

[EMAIL PROTECTED]:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port   Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999   Route   1000   0  1
  -> node484.local:5999   Route   1000   0  1

[EMAIL PROTECTED]:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F  Route   1000   0  1
  -> C0A80032:176F  Route   1000   0  1


Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov <[EMAIL PROTECTED]>
Signed-off-by: Rumen G. Bogdanovski <[EMAIL PROTECTED]>
Signed-off-by: Simon Horman <[EMAIL PROTECTED]>

--- 
Thu, 01 Nov 2007 18:26:24 +0900, Horms
* Various whitespace and indentation changes
* Rediffed against net-2.6
* Ran against ./scripts/checkpatch.pl and fixed everything that
  it complained about

Index: net-2.6/include/net/ip_vs.h
===
--- net-2.6.orig/include/net/ip_vs.h	2007-11-01 17:57:30.0 +0900
+++ net-2.6/include/net/ip_vs.h	2007-11-01 18:06:56.0 +0900
@@ -901,6 +901,10 @@ extern int ip_vs_use_count_inc(void);
 extern void ip_vs_use_count_dec(void);
 extern int ip_vs_control_init(void);
 extern void ip_vs_control_cleanup(void);
+extern struct ip_vs_dest *
+ip_vs_find_dest(__be32 daddr, __be16 dport,
+		__be32 vaddr, __be16 vport, __u16 protocol);
+extern struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp);
 
 
 /*
Index: net-2.6/net/ipv4/ipvs/ip_vs_conn.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_conn.c	2007-11-01 17:57:30.0 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_conn.c	2007-11-01 18:06:47.0 +0900
@@ -426,6 +426,25 @@ ip_vs_bind_dest(struct ip_vs_conn *cp, s
 
 
 /*
+ * Check if there is a destination for the connection, if so
+ * bind the connection to the destination.
+ */
+struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
+{
+	struct ip_vs_dest *dest;
+
+	if ((cp) && (!cp->dest)) {
+		dest = ip_vs_find_dest(cp->daddr, cp->dport,
+				       cp->vaddr, cp->vport, cp->protocol);
+		ip_vs_bind_dest(cp, dest);
+		return dest;
+	} else
+		return NULL;
+}
+EXPORT_SYMBOL(ip_vs_try_bind_dest);
+
+
+/*
  * Unbind a connection entry with its VS destination
  * Called by the ip_vs_conn_expire function.
  */
Index: net-2.6/net/ipv4/ipvs/ip_vs_ctl.c
===
--- net-2.6.orig/net/ipv4/ipvs/ip_vs_ctl.c	2007-11-01 17:57:30.0 +0900
+++ net-2.6/net/ipv4/ipvs/ip_vs_ctl.c	2007-11-01 18:06:47.0 +0900
@@ -579,6 +579,34 @@ ip_vs_lookup_dest(struct ip_vs_service *
return NULL;
 }
 
+/*
+ * Find destination by {daddr,dport,vaddr,protocol}
+ * Created to be used in ip_vs_process_mes
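ip_vs_try_bind_dest() binds a connection only if it has no destination yet and a matching real server exists on this director; otherwise the connection continues to be handled unbound, as before. Below is a simplified user-space sketch of that rule. The structures, the linear find_dest() lookup and the inactive-connection counter are hypothetical stand-ins for ip_vs_find_dest() and ip_vs_bind_dest().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct ip_vs_dest / ip_vs_conn. */
struct dest {
	uint32_t addr;
	uint16_t port;
	int inactconns;          /* what "ipvsadm -l" reports as InActConn */
};

struct conn {
	uint32_t daddr;
	uint16_t dport;
	struct dest *dest;       /* NULL until bound */
};

/* Linear search stands in for ip_vs_find_dest(); returns NULL if absent. */
static struct dest *find_dest(struct dest *table, size_t n,
			      uint32_t addr, uint16_t port)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].addr == addr && table[i].port == port)
			return &table[i];
	return NULL;
}

/* Bind only an unbound connection, and only if its destination exists. */
static struct dest *try_bind_dest(struct conn *cp, struct dest *table, size_t n)
{
	struct dest *d;

	if (cp == NULL || cp->dest != NULL)
		return NULL;
	d = find_dest(table, n, cp->daddr, cp->dport);
	if (d != NULL) {
		cp->dest = d;
		d->inactconns++;   /* in this sketch, synced connections count as inactive */
	}
	return d;
}
```

Calling the helper again is harmless: an already-bound connection is left alone, which is what allows the backup to retry the binding on every sync update until the real server appears.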

[patch 0/2] ipvs: avoid overcommit on the standby, take II

2007-11-01 Thread Simon Horman
Two related patches from Rumen G. Bogdanovski
to help prevent overcommit on the standby.

After sending the last patchset I discovered scripts/checkpatch.pl and
found that it didn't like quite a few things. I've fixed those problems
and here is the result.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/



Re: [PATCH][BNX2X] added register coments

2007-11-01 Thread David Miller
From: "Eliezer Tamir" <[EMAIL PROTECTED]>
Date: Thu, 01 Nov 2007 11:28:25 +0200

> Here is the version with added register comments.
> Please consider applying.

The 2.6.24 merge window is closed, so even if I thought
the driver is perfect it cannot be included right now.

It will need to be resubmitted as the 2.6.25 merge window
approaches.


[PATCH][BNX2X] added register coments

2007-11-01 Thread Eliezer Tamir
Dave,

Here is the version with added register comments.
Please consider applying.

ftp link:
ftp://[EMAIL PROTECTED]/0001-bnx2x-0.4.12-with-reg-remarks.txt

gzipped:
ftp://[EMAIL PROTECTED]/0001-bnx2x-0.4.12-with-reg-remarks.txt.gz

(I will also post each individual file for review as a reply to this
email)

Thanks,
Eliezer




Re: [PATCH] net: docbook fixes for netif_ functions

2007-11-01 Thread David Miller
From: Randy Dunlap <[EMAIL PROTECTED]>
Date: Wed, 31 Oct 2007 15:36:20 -0700

> > + * return values (usually ignored).
> > + * NET_RX_SUCCESS  (no congestion)
> > + * NET_RX_DROP (packet was dropped)
> 
> For the 3 lines above, how about:
> 
>  *Return values (usually ignored):
>  *NET_RX_SUCCESS: no congestion
>  *NET_RX_DROP: packet was dropped
> 
> 
> only because they come out of kernel-doc badly, munged together like so:
> 
>return  values  (usually  ignored).NET_RX_SUCCESS (no   congestion)
>NET_RX_DROP (packet was dropped)

I've applied Stephen's patch with this minor correction
added.