date:20070228

[PATCH (RESEND)] [USBNET] DM9601: Add Corega FEther USB-TXC support.

2007-02-28 Thread YOSHIFUJI Hideaki / 吉藤英明

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
---
 drivers/usb/net/dm9601.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/net/dm9601.c b/drivers/usb/net/dm9601.c
index 4a932e1..c0bc52b 100644
--- a/drivers/usb/net/dm9601.c
+++ b/drivers/usb/net/dm9601.c
@@ -571,6 +571,10 @@ static const struct driver_info dm9601_info = {
 
 static const struct usb_device_id products[] = {
 	{
+	 USB_DEVICE(0x07aa, 0x9601),	/* Corega FEther USB-TXC */
+	 .driver_info = (unsigned long)&dm9601_info,
+	 },
+	{
 	 USB_DEVICE(0x0a46, 0x9601),	/* Davicom USB-100 */
 	 .driver_info = (unsigned long)&dm9601_info,
 	 },

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
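
As a rough illustration of what the added table entry does, here is a user-space sketch of ID-table matching. The `demo_usb_id` struct and `match_id()` are hypothetical simplifications made up for this note; the kernel's `struct usb_device_id`, `USB_DEVICE()` and the USB core's matching code are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's usb_device_id (illustration only). */
struct demo_usb_id {
	uint16_t vendor;
	uint16_t product;
	const char *name;	/* plays the role of .driver_info */
};

/* Table mirroring the two entries visible in the patch; an empty
 * entry terminates the list, as kernel ID tables do. */
static const struct demo_usb_id products[] = {
	{ 0x07aa, 0x9601, "Corega FEther USB-TXC" },
	{ 0x0a46, 0x9601, "Davicom USB-100" },
	{ 0, 0, NULL },
};

/* Return the matching entry, or NULL if the device is unknown. */
static const struct demo_usb_id *match_id(uint16_t vendor, uint16_t product)
{
	const struct demo_usb_id *id;

	for (id = products; id->name != NULL; id++)
		if (id->vendor == vendor && id->product == product)
			return id;
	return NULL;
}

/* Usage: the Corega adapter is found, an unlisted device is not. */
static void usage_example(void)
{
	assert(match_id(0x07aa, 0x9601) != NULL);
	assert(match_id(0x1234, 0x5678) == NULL);
}
```

Both adapters share product ID 0x9601 and differ only in vendor ID, which is why the patch needs nothing more than one extra table entry pointing at the same `dm9601_info`.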

[PATCH] [USBNET] DM9501: Add Corega FEther USB-TXC support.

2007-02-28 Thread YOSHIFUJI Hideaki / 吉藤英明

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
---
 drivers/usb/net/dm9601.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/net/dm9601.c b/drivers/usb/net/dm9601.c
index 4a932e1..c0bc52b 100644
--- a/drivers/usb/net/dm9601.c
+++ b/drivers/usb/net/dm9601.c
@@ -571,6 +571,10 @@ static const struct driver_info dm9601_info = {
 
 static const struct usb_device_id products[] = {
 	{
+	 USB_DEVICE(0x07aa, 0x9601),	/* Corega FEther USB-TXC */
+	 .driver_info = (unsigned long)&dm9601_info,
+	 },
+	{
 	 USB_DEVICE(0x0a46, 0x9601),	/* Davicom USB-100 */
 	 .driver_info = (unsigned long)&dm9601_info,
 	 },

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jarek Poplawski

On Wed, Feb 28, 2007 at 10:45:41AM -0800, Jean Tourrilhes wrote:
> On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> > On 28-02-2007 02:27, Jean Tourrilhes wrote:
> > >   Hi all,
> > ...
> > >   Patch for 2.6.20 is attached. The patch was tested on a system
> > > running the hotplug scripts, and on another system running udev.
> > > 
> > >   Have fun...
> > > 
> > >   Jean
> > > 
> > > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> > > 
> > > -
> > ...
> > > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> > > --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800
> > > +++ linux/net/core/net-sysfs.c2007-02-27 15:06:49.0 -0800
> > > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
> > >   if ((size <= 0) || (i >= num_envp))
> > >   return -ENOMEM;
> > >  
> > > + /* pass ifindex to uevent.
> > > +  * ifindex is useful as it won't change (interface name may change)
> > > +  * and is what RtNetlink uses natively. */
> > > + envp[i++] = buf;
> > > + n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> > > + buf += n;
> > > + size -= n;
> > > +
> > > + if ((size <= 0) || (i >= num_envp))
> > 
> > Btw.:
> > 1. if size == 10 and snprintf returns 9 (without NUL)
> >    then n == 10 (with NUL), so isn't it enough (here and above):
> >  
> > if ((size < 0) || (i >= num_envp))
> 
>   I just cut'n'pasted the code from a few lines above. If the original
> code is incorrect, it needs fixing. And it will probably need fixing in
> a lot of places.

I think you're kind of responsible for your part, at least.

> 
> > 2. shouldn't there be (here and above):
> >  
> > envp[--i] = NULL;
> > 
> 
>   No, envp is local, so who cares.

But envp[i] isn't (at least here). So, I guess, a caller
of this function could care.

> > > + if ((size <= 0) || (i >= num_envp))
> > > + return -ENOMEM;

And one more thing (not necessarily for you):
ENOBUFS is probably more adequate here.

Cheers,
Jarek P.
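
Point 1 above can be checked in user space. The sketch below is only an illustration under stated assumptions: `put_ifindex()` and `space_left_after()` are hypothetical helpers that copy the arithmetic of the quoted hunk, and the C library's `snprintf` stands in for the kernel's. It shows that a remaining size of exactly 0 does not mean the string was truncated, while a negative value does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper following the pattern in the quoted hunk: write one
 * "IFINDEX=<n>" string and return the bytes consumed, counting the NUL. */
static int put_ifindex(char *buf, int size, int ifindex)
{
	return snprintf(buf, (size_t)size, "IFINDEX=%d", ifindex) + 1;
}

/* Space left after the write: 0 means the buffer was filled exactly,
 * a negative value means the output was truncated. */
static int space_left_after(char *buf, int size, int ifindex)
{
	return size - put_ifindex(buf, size, ifindex);
}

static void usage_example(void)
{
	char exact[10];	/* "IFINDEX=7" is 9 characters plus the NUL */
	char small[9];	/* one byte too short */

	assert(space_left_after(exact, (int)sizeof(exact), 7) == 0);
	assert(exact[8] == '7' && exact[9] == '\0');	/* nothing was lost */
	assert(space_left_after(small, (int)sizeof(small), 7) < 0);
}
```

Whether the kernel check should be relaxed also depends on whether more variables are appended afterwards, which this sketch does not model.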

Re: [PATCH] bonding: replace system timer with work queue

2007-02-28 Thread Andrew Morton

On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <[EMAIL PROTECTED]> wrote:

> Hi,
> 
>   please, review and apply to mm tree for further testing. The patch 
> is also available at 
> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .

Please cc netdev@vger.kernel.org on net-related patches, thanks.

>   Thank you,
>   Jaroslav
> 
> ==
> bonding: replace system timer with work queue
> 
> This patch replaces system timer with work queue in monitor functions.
> The reason for this change is that bonding handlers call various
> sleeping functions from the timer handler which is not allowed.

Which sleeping functions?  I'd have expected the kernel to spew runtime
warnings when this happens, but I don't recall any such reports.


> Because we cannot share the main workqueue threads (rtnl_lock is used
> also in linkwatch_event) - new bond workqueue thread is created.
> 
> Signed-off-by: Jaroslav Kysela <[EMAIL PROTECTED]>
> 
> diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c linux-2.6.20/drivers/net/bonding/bond_3ad.c
> --- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c  2007-02-04 19:44:54.0 +0100
> +++ linux-2.6.20/drivers/net/bonding/bond_3ad.c   2007-02-28 09:19:43.831369202 +0100
> @@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave 
>   * times out, and it selects an aggregator for the ports that are yet not
>   * related to any aggregator, and selects the active aggregator for a bond.
>   */
> -void bond_3ad_state_machine_handler(struct bonding *bond)
> +void bond_3ad_state_machine_handler(struct work_struct *work)
>  {
> + struct ad_bond_info *ad_info = container_of(work, struct ad_bond_info, ad_work.work);
> + struct bonding *bond = (struct bonding *)((char *)ad_info - offsetof(struct bonding, ad_info));

We can use container_of here too?

> -void bond_alb_monitor(struct bonding *bond)
> +void bond_alb_monitor(struct work_struct *work)
>  {
> - struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> + struct alb_bond_info *bond_info = container_of(work, struct alb_bond_info, alb_work.work);
> + struct bonding *bond = (struct bonding *)((char *)bond_info - offsetof(struct bonding, alb_info));

And here.

> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));
>   break;
>   case BOND_MODE_TLB:
>   case BOND_MODE_ALB:
> - del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
> + cancel_rearming_delayed_workqueue(bond_wq, &(BOND_ALB_INFO(bond).alb_work));
>   break;
>   default:
>   break;
> @@ -4289,6 +4272,14 @@ static int bond_init(struct net_device *
>   rwlock_init(&bond->lock);
>   rwlock_init(&bond->curr_slave_lock);
>  
> + /* initialize work */
> + INIT_DELAYED_WORK(&bond->mii_work, (void *)&bond_mii_monitor);
> + if (params->mode == BOND_MODE_ACTIVEBACKUP) {
> + INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_activebackup_arp_mon);
> + } else {
> + INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_loadbalance_arp_mon);
> + }

Can we lose the unneeded braces, the unneeded typecasts and fit the code
into 80 cols?



yup.

>   bond->params = *params; /* copy params struct */
>  
>   /* Initialize pointers */
> @@ -4782,6 +4773,12 @@ static int __init bonding_init(void)
>   goto err;
>   }
>  
> + bond_wq = create_singlethread_workqueue("bond");
> + if (bond_wq == NULL) {
> + res = -ENOMEM;
> + goto err;
> + }
> +
>   res = bond_create_sysfs();
>   if (res)
>   goto err;
> @@ -4807,6 +4804,7 @@ static void __exit bonding_exit(void)
>  
>   rtnl_lock();
>   bond_free_all();
> + destroy_workqueue(bond_wq);
>   bond_destroy_sysfs();
>   rtnl_unlock();

Are you sure that all pending delayed works have been cancelled when we
destroy this workqueue?
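
The `container_of` suggestion made earlier in this reply can be sketched in user space as follows. The struct definitions are minimal stand-ins invented for illustration (only the member names `ad_work`, `work` and `ad_info` come from the quoted patch), and the macro is a simplified version without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel macro, without its type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins for the kernel types named in the patch. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; };
struct ad_bond_info { int other_state; struct delayed_work ad_work; };
struct bonding { int id; struct ad_bond_info ad_info; };

/* Recover the bonding structure from the embedded work item in two
 * container_of steps instead of hand-written offsetof arithmetic. */
static struct bonding *bond_from_ad_work(struct work_struct *work)
{
	struct ad_bond_info *ad_info =
		container_of(work, struct ad_bond_info, ad_work.work);

	return container_of(ad_info, struct bonding, ad_info);
}

static void usage_example(void)
{
	struct bonding bond = { .id = 1 };

	assert(bond_from_ad_work(&bond.ad_info.ad_work.work) == &bond);
}
```

The second step computes the same address as the open-coded cast in the patch; it just names the member instead of spelling out the pointer arithmetic.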



Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 23:26:36 -0800

> Sounds like a socket option would help; the data is already there.
> Then the normal UDP receive path would work.

That would be perfect for new applications.

But we have to support all the old ones, so we're stuck
providing correctly functioning AF_PACKET handling on
all devices, sorry.

And in fact that effectively makes the new socket option
pointless: it doesn't buy us anything, since we have
to support the old stuff fully anyway.

Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread Stephen Hemminger

David Miller wrote:
> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Wed, 28 Feb 2007 23:04:36 -0800
>
> > If a normal application has to use something like raw packet
> > filtering, it seems there is a missing API.
>
> I'm loosely following this discussion, but Ben mentions DHCP
> and I remember learning the other month that DHCP uses AF_PACKET
> and filtering instead of IP RAW sockets because it needs to get
> the MAC address and RAW sockets don't provide that.

Sounds like a socket option would help; the data is already there.
Then the normal UDP receive path would work.

Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 23:04:36 -0800

> If a normal application has to use something like raw packet
> filtering, it seems there is a missing API.

I'm loosely following this discussion, but Ben mentions DHCP
and I remember learning the other month that DHCP uses AF_PACKET
and filtering instead of IP RAW sockets because it needs to get
the MAC address and RAW sockets don't provide that.

Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread Stephen Hemminger


Ben Greear wrote:
> Stephen Hemminger wrote:
> > On Wed, 28 Feb 2007 17:28:09 -0800
> > Ben Greear <[EMAIL PROTECTED]> wrote:
> >
> > > Stephen Hemminger wrote:
> > > > I was measuring bridging/routing performance and noticed this.
> > > >
> > > > The current code runs the "all packet" type handlers before calling
> > > > the bridge hook.  If an application (like some DHCP clients) is
> > > > using AF_PACKET, this means that each received packet gets run
> > > > through the Berkeley Packet Filter code in sk_run_filter (slow).
> > > >
> > > > By moving the bridging hook to run first, the packets flowing
> > > > through the bridge get filtered out there. This results in a 14%
> > > > improvement in performance, but it does mean that some snooping
> > > > applications would miss packets if being used on a bridge.  The
> > > > correct way to see all packets on a bridge is to set the bridge
> > > > pseudo-device to promiscuous mode.
> > >
> > > Seems it would be better to fix these clients to be more selective as
> > > to where they bind.
> >
> > The problem is any use of BPF is a lose, if it has to be done to all
> > traffic.
>
> Right, but couldn't you have the dhcp client bind to eth0, eth7, and
> br0 (ie, skipping the eth1-6 that comprise the bridge group?)
>
> The only difficulty I see is having the client know when new devices
> come and go, but there are probably ways to know that without keeping
> a whole lot of state or probing the /proc/net/dev (like my own bloated
> app does :))
>
> I envision the client args to be something like
> --skip-devices "eth1 eth2 eth3 ..."
>
> I know you can bind raw packet sockets to individual devices, though I
> don't know much about BPF, so it's possible I'm wrong...

The kernel has to deal with busted applications all the time. And each
damn distro and configuration seems to invent its own new way of doing
network configuration. If a normal application has to use something like
raw packet filtering, it seems there is a missing API.

> > > This breaks the case where you want to see packets on a particular
> > > interface, not just the entire bridge, right?
> >
> > It might be possible to use promisc counter to handle this.
>
> Not really, it's perfectly valid to sniff a port in non-promiscuous
> mode...

The non-promiscuous mode packets still make it in through the normal
receive path. The only packets that don't make it up the stack are those
that are being bridged.


Re: [PATCH] e1000 stop raw interrupts disabled nag from RT

2007-02-28 Thread Mark Huth


Kok, Auke wrote:
> Mark Huth wrote:
> > Current e1000_xmit_frame spews raw interrupt disabled nag messages when
> > used with RT kernel patches.  This patch uses spin_trylock_irqsave,
> > which allows RT patches to properly manage the irq semantics.
>
> Looks OK with me on first sight, I'll keep it on my stack and push it
> upstream after Jesse looks it over.
>
> Which -RT patches make this pop up btw? I'd like to repro it.
>
> Thanks,
>
> Auke

Auke,

Well, I'm not an expert on the realtime patches - but most any patch set
from Ingo seems to set this off - we've run through a bunch all the way
since 2.6.10. It's a standard warning to get drivers to not mess with
the processor interrupt function, since RT threads both the hard and
soft irqs, and except for rare instances, the drivers and critical
region protection no longer require the processor interrupt to be off.
The XX_irqsave functions get turned into pre-empt disable, which is
adequate for most things.  But the local_irq_save is recognized and
warned, and then if it is really necessary it can be converted to an RT
variant that won't warn.

> > Signed-off-by: Mark Huth <[EMAIL PROTECTED]>
> > ---
> > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> > index 619c892..48f94ee 100644
> > --- a/drivers/net/e1000/e1000_main.c
> > +++ b/drivers/net/e1000/e1000_main.c
> > @@ -3363,12 +3363,9 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
> >  	(adapter->hw.mac_type == e1000_82573))
> >  		e1000_transfer_dhcp_info(adapter, skb);
> >  
> > -	local_irq_save(flags);
> > -	if (!spin_trylock(&tx_ring->tx_lock)) {
> > +	if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags))
> >  		/* Collision - tell upper layer to requeue */
> > -		local_irq_restore(flags);
> >  		return NETDEV_TX_LOCKED;
> > -	}
> >  
> >  	/* need: count + 2 desc gap to keep tail from touching
> >  	 * head, otherwise try next time */





[PATCH -mm 3/5] Blackfin: on-chip ethernet MAC controller driver

2007-02-28 Thread Wu, Bryan

Hi folks,

Here is the blackfin on-chip ethernet MAC controller driver for Linux.

Its name is blackfin-driver-net-stamp537.patch

[PATCH] Blackfin: on-chip ethernet MAC controller driver

This patch implements the driver necessary to use the Analog Devices
Blackfin processor's on-chip ethernet MAC controller.

Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> 
---

 drivers/net/Kconfig|   44 ++
 drivers/net/Makefile   |1
 drivers/net/bfin_mac.c |  988 +
 drivers/net/bfin_mac.h |  146 +
 4 files changed, 1179 insertions(+)

Index: linux-2.6/drivers/net/Kconfig
===
--- linux-2.6.orig/drivers/net/Kconfig  2007-03-01 11:39:14.0 +0800
+++ linux-2.6/drivers/net/Kconfig   2007-03-01 11:39:19.0 +0800
@@ -836,6 +836,50 @@
  module, say M here and read  as well
  as .
 
+config BFIN_MAC
+   tristate "Blackfin 536/537 on-chip mac support"
+   depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
+   select CRC32
+   select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
+   help
+ This is the driver for blackfin on-chip mac device. Say Y if you want it
+ compiled into the kernel. This driver is also available as a module
+ ( = code which can be inserted in and removed from the running kernel
+ whenever you want). The module will be called bfin_mac.
+
+config BFIN_MAC_USE_L1
+   bool "Use L1 memory for rx/tx packets"
+   depends on BFIN_MAC && BF537
+   default y
+   help
+ To get maximum network performance, you should use L1 memory as rx/tx buffers.
+ Say N here if you want to reserve L1 memory for other uses.
+
+config BFIN_TX_DESC_NUM
+   int "Number of transmit buffer packets"
+   depends on BFIN_MAC
+   range 6 10 if BFIN_MAC_USE_L1
+   range 10 100
+   default "10"
+   help
+ Set the number of buffer packets used in driver.
+
+config BFIN_RX_DESC_NUM
+   int "Number of receive buffer packets"
+   depends on BFIN_MAC
+   range 20 100 if BFIN_MAC_USE_L1
+   range 20 800
+   default "20"
+   help
+ Set the number of buffer packets used in driver.
+
+config BFIN_MAC_RMII
+   bool "RMII PHY Interface (EXPERIMENTAL)"
+   depends on BFIN_MAC && EXPERIMENTAL
+   default n
+   help
+ Use Reduced PHY MII Interface
+
 config SMC9194
tristate "SMC 9194 support"
depends on NET_VENDOR_SMC && (ISA || MAC && BROKEN)
Index: linux-2.6/drivers/net/Makefile
===
--- linux-2.6.orig/drivers/net/Makefile 2007-03-01 11:33:24.0 +0800
+++ linux-2.6/drivers/net/Makefile  2007-03-01 11:39:19.0 +0800
@@ -195,6 +195,7 @@
 obj-$(CONFIG_MYRI10GE) += myri10ge/
 obj-$(CONFIG_SMC91X) += smc91x.o
 obj-$(CONFIG_SMC911X) += smc911x.o
+obj-$(CONFIG_BFIN_MAC) += bfin_mac.o
 obj-$(CONFIG_DM9000) += dm9000.o
 obj-$(CONFIG_FEC_8XX) += fec_8xx/
 obj-$(CONFIG_PASEMI_MAC) += pasemi_mac.o
Index: linux-2.6/drivers/net/bfin_mac.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/drivers/net/bfin_mac.c2007-03-01 11:39:19.0 +0800
@@ -0,0 +1,988 @@
+/*
+ * File: drivers/net/bfin_mac.c
+ * Based on:
+ * Author:   Luke Yang <[EMAIL PROTECTED]>
+ *
+ * Created:
+ * Description:
+ *
+ * Rev:  $Id: bfin_mac.c,v 1.60 2006/12/16 11:23:56 hennerich Exp $
+ *
+ * Modified:
+ *   Copyright 2004-2006 Analog Devices Inc.
+ *
+ * Bugs: Enter bugs at http://blackfin.uclinux.org/
+ *
+ * This program is free software ;  you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation ;  either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY ;  without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program ;  see the file COPYING.
+ * If not, write to the Free Software Foundation,
+ * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "bfin_mac.h"
+
+#define CARDNAME "bfin_mac"
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Luke Yang");
+MODULE_DESCRIPTION("Blackfin MAC Driver");
+
+#if defined(CONFIG_BFIN_MAC_USE

Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread Ben Greear


Stephen Hemminger wrote:
> On Wed, 28 Feb 2007 17:28:09 -0800
> Ben Greear <[EMAIL PROTECTED]> wrote:
>
> > Stephen Hemminger wrote:
> > > I was measuring bridging/routing performance and noticed this.
> > >
> > > The current code runs the "all packet" type handlers before calling
> > > the bridge hook.  If an application (like some DHCP clients) is
> > > using AF_PACKET, this means that each received packet gets run
> > > through the Berkeley Packet Filter code in sk_run_filter (slow).
> > >
> > > By moving the bridging hook to run first, the packets flowing
> > > through the bridge get filtered out there. This results in a 14%
> > > improvement in performance, but it does mean that some snooping
> > > applications would miss packets if being used on a bridge.  The
> > > correct way to see all packets on a bridge is to set the bridge
> > > pseudo-device to promiscuous mode.
> >
> > Seems it would be better to fix these clients to be more selective as
> > to where they bind.
>
> The problem is any use of BPF is a lose, if it has to be done to all
> traffic.

Right, but couldn't you have the dhcp client bind to eth0, eth7, and br0
(ie, skipping the eth1-6 that comprise the bridge group?)

The only difficulty I see is having the client know when new devices
come and go, but there are probably ways to know that without keeping a
whole lot of state or probing the /proc/net/dev (like my own bloated app
does :))

I envision the client args to be something like
--skip-devices "eth1 eth2 eth3 ..."

I know you can bind raw packet sockets to individual devices, though I
don't know much about BPF, so it's possible I'm wrong...

> > This breaks the case where you want to see packets on a particular
> > interface, not just the entire bridge, right?
>
> It might be possible to use promisc counter to handle this.

Not really, it's perfectly valid to sniff a port in non-promiscuous mode...

Ben


  



--
Ben Greear <[EMAIL PROTECTED]> 
Candela Technologies Inc  http://www.candelatech.com
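
For reference, binding a packet socket to a single device, as mentioned in the message above, can be sketched like this. It is a minimal Linux-only illustration: `open_packet_socket_on()` is a hypothetical helper name, the interface names a client would pass are assumptions, and the sketch does not claim that per-device binding resolves the filter-cost question debated in this thread.

```c
#include <assert.h>
#include <arpa/inet.h>		/* htons */
#include <linux/if_packet.h>	/* struct sockaddr_ll */
#include <net/ethernet.h>	/* ETH_P_ALL */
#include <net/if.h>		/* if_nametoindex */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a packet socket bound to one interface.
 * Returns the descriptor, or -1 on error (unknown interface,
 * missing CAP_NET_RAW, ...). */
static int open_packet_socket_on(const char *ifname)
{
	struct sockaddr_ll sll;
	unsigned int ifindex = if_nametoindex(ifname);
	int fd;

	if (ifindex == 0)
		return -1;	/* no such interface */

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex = (int)ifindex;

	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

static void usage_example(void)
{
	/* A name that should not exist fails before any socket is created. */
	assert(open_packet_socket_on("no-such-if0") == -1);
}
```

A client following the `--skip-devices` idea would call the helper once per interface it wants to listen on (for example the bridge device and any non-bridged ports) rather than opening one unbound socket.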




Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread Stephen Hemminger

On Wed, 28 Feb 2007 17:28:09 -0800
Ben Greear <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > I was measuring bridging/routing performance and noticed this.
> > 
> > The current code runs the "all packet" type handlers before calling
> > the bridge hook.  If an application (like some DHCP clients) is
> > using AF_PACKET, this means that each received packet gets run
> > through the Berkeley Packet Filter code in sk_run_filter (slow).
> > 
> > By moving the bridging hook to run first, the packets flowing
> > through the bridge get filtered out there. This results in a 14%
> > improvement in performance, but it does mean that some snooping
> > applications would miss packets if being used on a bridge.  The
> > correct way to see all packets on a bridge is to set the bridge
> > pseudo-device to promiscuous mode.
> 
> Seems it would be better to fix these clients to be more selective as
> to where they bind.

The problem is any use of BPF is a lose, if it has to be done to all
traffic.

> This breaks the case where you want to see packets on a particular
> interface, not just the entire bridge, right?

It might be possible to use promisc counter to handle this.

Re: [PATCH] e1000 stop raw interrupts disabled nag from RT

2007-02-28 Thread Kok, Auke


Mark Huth wrote:
> Current e1000_xmit_frame spews raw interrupt disabled nag messages when
> used with RT kernel patches.  This patch uses spin_trylock_irqsave,
> which allows RT patches to properly manage the irq semantics.

Looks OK with me on first sight, I'll keep it on my stack and push it upstream
after Jesse looks it over.

Which -RT patches make this pop up btw? I'd like to repro it.

Thanks,

Auke

> Signed-off-by: Mark Huth <[EMAIL PROTECTED]>
> ---
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 619c892..48f94ee 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -3363,12 +3363,9 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>  	(adapter->hw.mac_type == e1000_82573))
>  		e1000_transfer_dhcp_info(adapter, skb);
>  
> -	local_irq_save(flags);
> -	if (!spin_trylock(&tx_ring->tx_lock)) {
> +	if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags))
>  		/* Collision - tell upper layer to requeue */
> -		local_irq_restore(flags);
>  		return NETDEV_TX_LOCKED;
> -	}
>  
>  	/* need: count + 2 desc gap to keep tail from touching
>  	 * head, otherwise try next time */


[PATCH] e1000 stop raw interrupts disabled nag from RT

2007-02-28 Thread Mark Huth



Current e1000_xmit_frame spews raw interrupt disabled nag messages when
used with RT kernel patches.  This patch uses spin_trylock_irqsave,
which allows RT patches to properly manage the irq semantics.

Signed-off-by: Mark Huth <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 619c892..48f94ee 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3363,12 +3363,9 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	(adapter->hw.mac_type == e1000_82573))
 		e1000_transfer_dhcp_info(adapter, skb);
 
-	local_irq_save(flags);
-	if (!spin_trylock(&tx_ring->tx_lock)) {
+	if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags))
 		/* Collision - tell upper layer to requeue */
-		local_irq_restore(flags);
 		return NETDEV_TX_LOCKED;
-	}
 
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
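
The behaviour the patch relies on can be modelled in user space. This is a toy sketch, not the kernel implementation: every `demo_` name is invented, "interrupts" are a plain flag, and the combined helper takes `flags` by pointer where the kernel macro takes it by name. It shows why the failure branch no longer needs its own restore call.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag for "interrupts enabled", one for the lock. */
static bool demo_irqs_enabled = true;
struct demo_lock { bool held; };

static void demo_irq_save(unsigned long *flags)
{
	*flags = demo_irqs_enabled;
	demo_irqs_enabled = false;
}

static void demo_irq_restore(unsigned long flags)
{
	demo_irqs_enabled = flags != 0;
}

static bool demo_trylock(struct demo_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Combined helper: on failure the saved state is restored before
 * returning, so the caller needs no cleanup branch. */
static bool demo_trylock_irqsave(struct demo_lock *l, unsigned long *flags)
{
	demo_irq_save(flags);
	if (demo_trylock(l))
		return true;
	demo_irq_restore(*flags);
	return false;
}

static void usage_example(void)
{
	struct demo_lock free_lock = { false }, busy_lock = { true };
	unsigned long flags;

	assert(!demo_trylock_irqsave(&busy_lock, &flags));
	assert(demo_irqs_enabled);	/* failure path restored the state */

	assert(demo_trylock_irqsave(&free_lock, &flags));
	assert(!demo_irqs_enabled);	/* lock held with "interrupts" off */
	demo_irq_restore(flags);
}
```

Folding the save and the trylock into one call is also what gives an RT patch set a single place to substitute its own semantics, which is the motivation stated in the patch description.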

Re: [PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread Ben Greear


Stephen Hemminger wrote:
> I was measuring bridging/routing performance and noticed this.
>
> The current code runs the "all packet" type handlers before calling the
> bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
> this means that each received packet gets run through the Berkeley Packet Filter
> code in sk_run_filter (slow).
>
> By moving the bridging hook to run first, the packets flowing through
> the bridge get filtered out there. This results in a 14%
> improvement in performance, but it does mean that some snooping applications
> would miss packets if being used on a bridge.  The correct way to see all
> packets on a bridge is to set the bridge pseudo-device to promiscuous
> mode.

Seems it would be better to fix these clients to be more selective as to
where they bind.

This breaks the case where you want to see packets on a particular interface,
not just the entire bridge, right?

Thanks,
Ben

> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> ---
>  net/core/dev.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cf71614..dc2cda6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1792,6 +1792,10 @@ int netif_receive_skb(struct sk_buff *skb)
>  
>  	rcu_read_lock();
>  
> +	if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
> +		goto out;
> +
> +
>  #ifdef CONFIG_NET_CLS_ACT
>  	if (skb->tc_verd & TC_NCLS) {
>  		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
> @@ -1826,9 +1830,6 @@ int netif_receive_skb(struct sk_buff *skb)
>  ncls:
>  #endif
>  
> -	if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
> -		goto out;
> -
>  	type = skb->protocol;
>  	list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
>  		if (ptype->type == type &&



--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com


[PATCH] bridge: avoid ptype_all packet handling

2007-02-28 Thread Stephen Hemminger

I was measuring bridging/routing performance and noticed this.

The current code runs the "all packet" type handlers before calling the
bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
this means that each received packet gets run through the Berkeley Packet Filter
code in sk_run_filter (slow).

By moving the bridging hook to run first, the packets flowing through
the bridge get filtered out there. This results in a 14%
improvement in performance, but it does mean that some snooping applications
would miss packets if being used on a bridge.  The correct way to see all
packets on a bridge is to set the bridge pseudo-device to promiscuous
mode.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/core/dev.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index cf71614..dc2cda6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1792,6 +1792,10 @@ int netif_receive_skb(struct sk_buff *skb)
 
 	rcu_read_lock();
 
+	if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
+		goto out;
+
+
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
@@ -1826,9 +1830,6 @@ int netif_receive_skb(struct sk_buff *skb)
 ncls:
 #endif
 
-	if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
-		goto out;
-
 	type = skb->protocol;
 	list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
 		if (ptype->type == type &&
-- 
1.4.4.2
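
The effect of the reordering described in this patch can be pictured with a toy model. Everything below is invented for illustration (`demo_pkt`, `taps_run`); it only counts how many packets reach the "all packet" handlers under each ordering and says nothing about the measured 14% figure.

```c
#include <assert.h>

struct demo_pkt { int bridged; };	/* 1 if the bridge consumes the frame */

/* Count how many packets reach the (expensive) tap filters for a given
 * ordering of the bridge hook and the taps. */
static int taps_run(const struct demo_pkt *pkts, int n, int bridge_first)
{
	int runs = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (bridge_first && pkts[i].bridged)
			continue;	/* bridge hook took it before any tap */
		runs++;			/* tap filter runs on this packet */
	}
	return runs;
}

static void usage_example(void)
{
	const struct demo_pkt pkts[] = { {1}, {1}, {0}, {1}, {0} };

	assert(taps_run(pkts, 5, 0) == 5);	/* current order: every packet */
	assert(taps_run(pkts, 5, 1) == 2);	/* patched order: unbridged only */
}
```

The same model also shows the cost raised in the replies: with the bridge hook first, a tap on a bridge port never sees the three bridged frames.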


Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable

2007-02-28 Thread Andy Gospodarek

On Wed, Feb 28, 2007 at 05:08:59PM -0800, Jay Vosburgh wrote:
> >
> >That sounds like a nice add-on to the existing functionality.  I can see
> >the value in something dynamic like that, but I can also see the value
> >in something static like the functionality we have.  Did you plan to
> >keep the existing functionality intact or just have it done dynamically?
> 
>   Well, I posted the patch just a bit ago, so you can see for
> yourself, but no, it removes the existing "copy IGMP everywhere"

I see them now -- I'll check them out and make any comments there.


PCIe NICs that correctly suspend/resume

2007-02-28 Thread Jay Cliburn

We're trying to make the Attansic L1 NIC correctly suspend, resume,
and wake-on-lan.  Can someone point me to a PCIe-based NIC driver in
the kernel tree that correctly does these things?  I'd like to see how
it's *supposed* to be done.

Thanks,
Jay

Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable

2007-02-28 Thread Jay Vosburgh

Andy Gospodarek <[EMAIL PROTECTED]> wrote:

>On Wed, Feb 28, 2007 at 02:39:42PM -0800, Jay Vosburgh wrote:
[...]
>>  Why would you want to turn this off?
>
>When you connect active-backup bonds to 2 separate switches that are in
>'distant' parts of the network you can end up with a bunch of unwanted
>multicast data flowing everywhere and if you don't care whether or not
>your multicast traffic is highly available then it just seems like
>noise.  I thought the flexibility seemed nice.

Ok, I can buy the "multicast spew" argument.

>>  Also, I've got a replacement patch for this functionality that
>> seems to be better in all regards. It sends bonus IGMP joins when a
>> failover occurs, rather than simply duplicating them on all slaves (the
>> current system can leave switches in the dark if the slaves fail back to
>> the originals).  As chance would have it, I'm planning to post it as
>> part of a set in a little while.
>> 
>
>That sounds like a nice add-on to the existing functionality.  I can see
>the value in something dynamic like that, but I can also see the value
>in something static like the functionality we have.  Did you plan to
>keep the existing functionality intact or just have it done dynamically?

Well, I posted the patch just a bit ago, so you can see for
yourself, but no, it removes the existing "copy IGMP everywhere"
behavior.  I couldn't really think of an advantage to flooding
everywhere all the time if the hose is re-aimed during failover (if
you'll pardon my cheesy metaphor).

>Is this separate from your workqueue/refactoring patch or does it work
>on the existing code?

This is separate, for the current mainline.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

[PATCH 2/3] bonding: only receive ARPs for us

2007-02-28 Thread Jay Vosburgh


The ARP validation code only needs ARPs for the bonding device.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f263ac..7ec6121 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3427,7 +3427,7 @@ void bond_register_arp(struct bonding *b
return;
 
pt->type = htons(ETH_P_ARP);
-   pt->dev = NULL; /*bond->dev;XXX*/
+   pt->dev = bond->dev;
pt->func = bond_arp_rcv;
dev_add_pack(pt);
 }

[PATCH 3/3] bonding: Improve IGMP join processing

2007-02-28 Thread Jay Vosburgh


In active-backup mode, the current bonding code duplicates IGMP
traffic to all slaves, so that switches are up to date in case of a
failover from an active to a backup interface.  If bonding then fails
back to the original active interface, it is likely that the "active
slave" switch's IGMP forwarding for the port will be out of date until
some event occurs to refresh the switch (e.g., a membership query).

This patch alters the behavior of bonding to no longer flood
IGMP to all ports, and to issue IGMP JOINs to the newly active port at
the time of a failover.  This ensures that switches are kept up to date
for all cases.

"GOELLESCH Niels" <[EMAIL PROTECTED]> originally
reported this problem, and included a patch.  His original patch was
modified by Jay Vosburgh to additionally remove the existing IGMP flood
behavior, use RCU, streamline code paths, fix trailing white space, and
adjust for style.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
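
The behavior described above can be sketched with a toy userspace model. This
is not the bonding code: struct toy_bond, toy_join() and toy_failover() are
hypothetical names, and counting joins per slave stands in for actually
transmitting IGMP reports. It only illustrates that joins go out on the active
slave, and that every failover (including a fail-back) re-sends a join for
each group on the newly active slave.

```c
#include <stddef.h>

#define MAX_GROUPS 8
#define NUM_SLAVES 2

/* Hypothetical toy bond: joined groups plus per-slave count of joins sent. */
struct toy_bond {
	unsigned int groups[MAX_GROUPS];  /* joined multicast groups */
	size_t ngroups;
	int active;                       /* index of the active slave */
	int joins_sent[NUM_SLAVES];       /* joins transmitted on each slave */
};

/* Join a group: the report goes out on the active slave only (no flooding). */
static int toy_join(struct toy_bond *b, unsigned int group)
{
	if (b->ngroups >= MAX_GROUPS)
		return -1;
	b->groups[b->ngroups++] = group;
	b->joins_sent[b->active]++;
	return 0;
}

/* Fail over: switch the active slave, then re-send a join for every group. */
static void toy_failover(struct toy_bond *b, int new_active)
{
	size_t i;

	b->active = new_active;
	for (i = 0; i < b->ngroups; i++)
		b->joins_sent[new_active]++;
}
```

After two joins on slave 0 and a failover to slave 1, slave 1 has seen two
joins; failing back to slave 0 refreshes that side as well.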


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7ec6121..338d452 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -60,6 +60,7 @@ #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -861,6 +862,28 @@ static void bond_mc_delete(struct bondin
}
 }
 
+
+/*
+ * Retrieve the list of registered multicast addresses for the bonding
+ * device and retransmit an IGMP JOIN request to the current active
+ * slave.
+ */
+static void bond_resend_igmp_join_requests(struct bonding *bond)
+{
+   struct in_device *in_dev;
+   struct ip_mc_list *im;
+
+   rcu_read_lock();
+   in_dev = __in_dev_get_rcu(bond->dev);
+   if (in_dev) {
+   for (im = in_dev->mc_list; im; im = im->next) {
+   ip_mc_rejoin_group(im);
+   }
+   }
+
+   rcu_read_unlock();
+}
+
 /*
  * Totally destroys the mc_list in bond
  */
@@ -874,6 +897,7 @@ static void bond_mc_list_destroy(struct 
kfree(dmi);
dmi = bond->mc_list;
}
+bond->mc_list = NULL;
 }
 
 /*
@@ -967,6 +991,7 @@ static void bond_mc_swap(struct bonding 
for (dmi = bond->dev->mc_list; dmi; dmi = dmi->next) {
dev_mc_add(new_active->dev, dmi->dmi_addr, 
dmi->dmi_addrlen, 0);
}
+   bond_resend_igmp_join_requests(bond);
}
 }
 
@@ -4017,42 +4042,6 @@ out:
return 0;
 }
 
-static void bond_activebackup_xmit_copy(struct sk_buff *skb,
-struct bonding *bond,
-struct slave *slave)
-{
-   struct sk_buff *skb2 = skb_copy(skb, GFP_ATOMIC);
-   struct ethhdr *eth_data;
-   u8 *hwaddr;
-   int res;
-
-   if (!skb2) {
-   printk(KERN_ERR DRV_NAME ": Error: "
-  "bond_activebackup_xmit_copy(): skb_copy() failed\n");
-   return;
-   }
-
-   skb2->mac.raw = (unsigned char *)skb2->data;
-   eth_data = eth_hdr(skb2);
-
-   /* Pick an appropriate source MAC address
-*  -- use slave's perm MAC addr, unless used by bond
-*  -- otherwise, borrow active slave's perm MAC addr
-* since that will not be used
-*/
-   hwaddr = slave->perm_hwaddr;
-   if (!memcmp(eth_data->h_source, hwaddr, ETH_ALEN))
-   hwaddr = bond->curr_active_slave->perm_hwaddr;
-
-   /* Set source MAC address appropriately */
-   memcpy(eth_data->h_source, hwaddr, ETH_ALEN);
-
-   res = bond_dev_queue_xmit(bond, skb2, slave->dev);
-   if (res)
-   dev_kfree_skb(skb2);
-
-   return;
-}
 
 /*
  * in active-backup mode, we know that bond->curr_active_slave is always valid if
@@ -4073,21 +4062,6 @@ static int bond_xmit_activebackup(struct
if (!bond->curr_active_slave)
goto out;
 
-   /* Xmit IGMP frames on all slaves to ensure rapid fail-over
-  for multicast traffic on snooping switches */
-   if (skb->protocol == __constant_htons(ETH_P_IP) &&
-   skb->nh.iph->protocol == IPPROTO_IGMP) {
-   struct slave *slave, *active_slave;
-   int i;
-
-   active_slave = bond->curr_active_slave;
-   bond_for_each_slave_from_to(bond, slave, i, active_slave->next,
-   active_slave->prev)
-   if (IS_UP(slave->dev) &&
-   (slave->link == BOND_LINK_UP))
-   bond_activebackup_xmit_copy(skb, bond, slave);
-   }
-
res = bond_dev_queue_xmit(bond, skb, bond->curr_active_slave->dev);
 
 out:
diff --git a/include/linux/igmp.h b/include/linux/igmp.h
index 9dbb525..a113fe6 100644
--- a/include/linux/igmp.h
+++ b/include/linux/igmp.h
@@ -218,5 +218,7 @@ extern void ip_mc_up(struct in_device *)
 extern void ip_mc_down(struct in_

[PATCH 1/3] bonding: fix double dev_add_pack

2007-02-28 Thread Jay Vosburgh


Bonding can erroneously register the same packet_type to receive
ARPs (for use by ARP validation): once at device open time, and once via
sysfs.  Since sysfs can change the validate setting (and thus register
or unregister) at any time, a flag is needed to synchronize with device
open in order to avoid double registrations, and the simplest place is
within the packet_type structure itself.  Double unregister is not an
issue.

Bug reported by Ulrich Oelmann <[EMAIL PROTECTED]>.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>


diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a7c8f98..1f263ac 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3423,6 +3423,9 @@ void bond_register_arp(struct bonding *b
 {
struct packet_type *pt = &bond->arp_mon_pt;
 
+   if (pt->type)
+   return;
+
pt->type = htons(ETH_P_ARP);
pt->dev = NULL; /*bond->dev;XXX*/
pt->func = bond_arp_rcv;
@@ -3431,7 +3434,10 @@ void bond_register_arp(struct bonding *b
 
 void bond_unregister_arp(struct bonding *bond)
 {
-   dev_remove_pack(&bond->arp_mon_pt);
+   struct packet_type *pt = &bond->arp_mon_pt;
+
+   dev_remove_pack(pt);
+   pt->type = 0;
 }
 
 /* Hashing Policies -*/
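
The guard added by this patch can be sketched outside the kernel as follows.
This is a minimal userspace model under stated assumptions, not the bonding
code: struct pkt_type, register_arp(), unregister_arp() and the add_calls
counter are hypothetical stand-ins for struct packet_type,
bond_register_arp(), bond_unregister_arp() and the call to dev_add_pack().
The point is only that a non-zero type field doubles as the "already
registered" flag.

```c
#include <stdint.h>

#define ETHERTYPE_ARP 0x0806  /* ARP EtherType */

/* Hypothetical stand-in for the kernel's struct packet_type. */
struct pkt_type {
	uint16_t type;  /* 0 means "not registered"; doubles as the flag */
};

/* Counts how often the simulated dev_add_pack() would have been called. */
static int add_calls;

/* Register once; a second call while registered is a no-op. */
static void register_arp(struct pkt_type *pt)
{
	if (pt->type)
		return;         /* already registered: avoid a double add */
	pt->type = ETHERTYPE_ARP;
	add_calls++;            /* stands in for dev_add_pack(pt) */
}

/* Unregister and clear the flag so a later register works again. */
static void unregister_arp(struct pkt_type *pt)
{
	pt->type = 0;           /* stands in for dev_remove_pack(pt) */
}
```

Calling register_arp() twice in a row increments add_calls only once; after
unregister_arp() the next register_arp() takes effect again.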

Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable

2007-02-28 Thread Andy Gospodarek

On Wed, Feb 28, 2007 at 02:39:42PM -0800, Jay Vosburgh wrote:
> Andy Gospodarek <[EMAIL PROTECTED]> wrote:
> 
> >A while back the following change was made to the bonding code:
> >
> >commit df49898a47061e82219c991dfbe9ac6ddf7a866b
> >Author: John W. Linville <[EMAIL PROTECTED]>
> >Date:   Tue Oct 18 21:30:58 2005 -0400
> >
> >[PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack
> >
> >Expand comment explaining MAC address selection for replicated IGMP
> >frames transmitted in bonding mode 1 (active-backup).  Also, a small
> >whitespace cleanup.
> >
> >Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
> >Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
> >
> >In general this patch is good, but this tweaks that feature by allowing
> >that functionality to be enabled and disabled.  This patch adds a new
> >module option as well as a sysfs entry.  It sets the default to be the
> >current behavior so existing users shouldn't notice any difference.
> 
>   Why would you want to turn this off?
> 

When you connect active-backup bonds to 2 separate switches that are in
'distant' parts of the network you can end up with a bunch of unwanted
multicast data flowing everywhere and if you don't care whether or not
your multicast traffic is highly available then it just seems like
noise.  I thought the flexibility seemed nice.

>   Also, I've got a replacement patch for this functionality that
> seems to be better in all regards. It sends bonus IGMP joins when a
> failover occurs, rather than simply duplicating them on all slaves (the
> current system can leave switches in the dark if the slaves fail back to
> the originals).  As chance would have it, I'm planning to post it as
> part of a set in a little while.
> 

That sounds like a nice add-on to the existing functionality.  I can see
the value in something dynamic like that, but I can also see the value
in something static like the functionality we have.  Did you plan to
keep the existing functionality intact or just have it done dynamically?

Is this separate from your workqueue/refactoring patch or does it work
on the existing code?


>   -J
> 
> ---
>   -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg

On Wed, 2007-02-28 at 16:51 -0800, Jean Tourrilhes wrote:

>   I would prefer to fix the comment when this change actually
> happens. I prefer comments to refer to the current reality, rather
> than past/future situation.

Uh, no. device_rename is perfectly fine, even other people may use it in
the future.

>  When you introduce wireless renaming, you
> will need to verify the whole chain anyway, so you might as well fix
> the comment while merging wireless renaming.

No again, device_rename is a perfectly fine API, I shouldn't have to look
at its internals to see if it's broken in my use case. Even if it's
only a broken comment.

I'm not going to respin your patches though, if this doesn't make it in
I don't care.

johannes


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes

On Thu, Mar 01, 2007 at 01:37:46AM +0100, Johannes Berg wrote:
> On Wed, 2007-02-28 at 16:26 -0800, Jean Tourrilhes wrote:
> 
> > +   /* This function is only used for network interface.
> > +* Some hotplug package track interfaces by their name and
> > +* therefore want to know when the name is changed by the user. */
> 
> Right now, that's true, but wireless is going to start using
> device_rename pretty soon as well. Could you rephrase this comment?
> 
> johannes

I would prefer to fix the comment when this change actually
happens. I prefer comments to refer to the current reality, rather
than past/future situation. When you introduce wireless renaming, you
will need to verify the whole chain anyway, so you might as well fix
the comment while merging wireless renaming.
Note also that my comment is technically correct. I did not
say 'netdev' but the more generic term 'network interface', and I
believe your wireless interface is a 'network interface', even if it's
not a netdev ;-)
But if this really bugs you, please feel free to respin my
patch.

Have fun...

Jean


[PATCH] qla3xxx: bugfix for line omitted in previous patch.

2007-02-28 Thread Ron Mercer


From 01751a39d7327acc28dabf4f68930b7e20b279d1 Mon Sep 17 00:00:00 2001
From: Ron Mercer <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 16:42:17 -0800
Subject: [PATCH] [PATCH] qla3xxx: bugfix for line omitted in previous patch.

This missing line caused transmit errors on the Qlogic 4032 chip.

Signed-off-by: Ron Mercer <[EMAIL PROTECTED]>
---
 drivers/net/qla3xxx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qla3xxx.c b/drivers/net/qla3xxx.c
index 3a14d19..d3f65da 100755
--- a/drivers/net/qla3xxx.c
+++ b/drivers/net/qla3xxx.c
@@ -2210,7 +2210,7 @@ static int ql_send_map(struct ql3_adapter *qdev,
 {
struct oal *oal;
struct oal_entry *oal_entry;
-   int len = skb->len;
+   int len = skb_headlen(skb);
dma_addr_t map;
int err;
int completed_segs, i;
-- 
1.5.0.rc4.16.g9e258
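
For readers unfamiliar with the two lengths involved: skb->len is the total
packet length, while skb_headlen() returns only the bytes in the linear
(head) buffer, i.e. len minus data_len. The sketch below is a simplified
userspace model of that relationship; struct toy_skb and toy_headlen() are
made-up names, and the real struct sk_buff has many more fields.

```c
/* Hypothetical, simplified model of the two length fields of an sk_buff. */
struct toy_skb {
	unsigned int len;       /* total bytes: linear area plus all fragments */
	unsigned int data_len;  /* bytes held in page fragments only */
};

/* Bytes in the linear (head) area, mirroring what skb_headlen() computes. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}
```

For a fully linear packet the two values are equal, so the difference only
shows up once a packet carries fragments, which is presumably why the
omission surfaced as transmit errors rather than as an outright failure.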


Re: Missing VLAN tags in bnx2

2007-02-28 Thread Michael Chan

On Wed, 2007-02-28 at 21:12 +0200, Pekka Pietikainen wrote:
> Just had to spend some time figuring out why a bnx2 card connected to
> a switch monitor port didn't see any vlan tags (when in our scenario the
> tags are pretty vital).

I'll have someone send you a utility to disable the ASF firmware.


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg

On Wed, 2007-02-28 at 16:26 -0800, Jean Tourrilhes wrote:

> + /* This function is only used for network interface.
> +  * Some hotplug package track interfaces by their name and
> +  * therefore want to know when the name is changed by the user. */

Right now, that's true, but wireless is going to start using
device_rename pretty soon as well. Could you rephrase this comment?

johannes



Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes

On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote:
> On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> > --- linux/drivers/base/class.j1.c   2007-02-26 18:38:10.0 -0800
> > +++ linux/drivers/base/class.c  2007-02-27 15:52:37.0 -0800
> > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
> 
> This function is not in the 2.6.21-rc2 kernel, so you might want to
> rework this patch a bit :)

Thanks for all your good comments. I ported my patch to
2.6.21-rc2, and tested it both on a hotplug and a udev system. Patch
is attached, I would be glad if you could push that through the usual
channels.

Also, I realised that I forgot to say in my original e-mail
that migrating udev to use ifindex instead of ifname would fix the
remove/add race condition for network devices. But that's not going to
happen overnight...

Have fun...

Jean

Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>

-

diff -u -p linux/include/linux/kobject.j1.h linux/include/linux/kobject.h
--- linux/include/linux/kobject.j1.h2007-02-28 14:26:29.0 -0800
+++ linux/include/linux/kobject.h   2007-02-28 14:27:54.0 -0800
@@ -48,6 +48,7 @@ enum kobject_action {
 	KOBJ_OFFLINE	= (__force kobject_action_t) 0x06,	/* device offline */
 	KOBJ_ONLINE	= (__force kobject_action_t) 0x07,	/* device online */
 	KOBJ_MOVE	= (__force kobject_action_t) 0x08,	/* device move */
+	KOBJ_RENAME	= (__force kobject_action_t) 0x09,	/* device renamed */
 };
 
 struct kobject {
diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
--- linux/net/core/net-sysfs.j1.c   2007-02-28 14:26:45.0 -0800
+++ linux/net/core/net-sysfs.c  2007-02-28 14:27:54.0 -0800
@@ -424,6 +424,17 @@ static int netdev_uevent(struct device *
if ((size <= 0) || (i >= num_envp))
return -ENOMEM;
 
+   /* pass ifindex to uevent.
+* ifindex is useful as it won't change (interface name may change)
+* and is what RtNetlink uses natively. */
+   envp[i++] = buf;
+   n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
+   buf += n;
+   size -= n;
+
+   if ((size <= 0) || (i >= num_envp))
+   return -ENOMEM;
+
envp[i] = NULL;
return 0;
 }
diff -u -p linux/lib/kobject_uevent.j1.c linux/lib/kobject_uevent.c
--- linux/lib/kobject_uevent.j1.c   2007-02-28 14:26:58.0 -0800
+++ linux/lib/kobject_uevent.c  2007-02-28 14:27:54.0 -0800
@@ -52,6 +52,8 @@ static char *action_to_string(enum kobje
return "online";
case KOBJ_MOVE:
return "move";
+   case KOBJ_RENAME:
+   return "rename";
default:
return NULL;
}
diff -u -p linux/drivers/base/core.j1.c linux/drivers/base/core.c
--- linux/drivers/base/core.j1.c2007-02-28 15:45:45.0 -0800
+++ linux/drivers/base/core.c   2007-02-28 15:47:30.0 -0800
@@ -1007,6 +1007,8 @@ int device_rename(struct device *dev, ch
char *new_class_name = NULL;
char *old_symlink_name = NULL;
int error;
+   char *devname_string = NULL;
+   char *envp[2];
 
dev = get_device(dev);
if (!dev)
@@ -1014,6 +1016,15 @@ int device_rename(struct device *dev, ch
 
pr_debug("DEVICE: renaming '%s' to '%s'\n", dev->bus_id, new_name);
 
+   devname_string = kmalloc(strlen(dev->bus_id) + 15, GFP_KERNEL);
+   if (!devname_string) {
+   put_device(dev);
+   return -ENOMEM;
+   }
+   sprintf(devname_string, "INTERFACE_OLD=%s", dev->bus_id);
+   envp[0] = devname_string;
+   envp[1] = NULL;
+
 #ifdef CONFIG_SYSFS_DEPRECATED
if ((dev->class) && (dev->parent))
old_class_name = make_class_name(dev->class->name, &dev->kobj);
@@ -1049,12 +1060,20 @@ int device_rename(struct device *dev, ch
sysfs_create_link(&dev->class->subsys.kset.kobj, &dev->kobj,
  dev->bus_id);
}
+
+   /* This function is only used for network interface.
+* Some hotplug package track interfaces by their name and
+* therefore want to know when the name is changed by the user. */
+   if(!error)
+   kobject_uevent_env(&dev->kobj, KOBJ_RENAME, envp);
+
put_device(dev);
 
kfree(new_class_name);
kfree(old_symlink_name);
  out_free_old_class:
kfree(old_class_name);
+   kfree(devname_string);
 
return error;
 }
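
The uevent code in this patch follows a common pattern: each "KEY=value"
string is written into one shared buffer, its start is stored in envp[], and
the buffer pointer and remaining size are advanced past the terminating NUL.
A userspace sketch of that pattern follows. struct env_builder and
env_add_int() are hypothetical helpers, not kernel API, and unlike the patch
the sketch checks for space before committing an entry. The same counting
explains the "+ 15" in the kmalloc() call: "INTERFACE_OLD=" is 14 characters,
plus one for the NUL.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bookkeeping for building a NULL-terminated envp[] array. */
struct env_builder {
	char **envp;    /* output array of string pointers */
	int num_envp;   /* capacity of envp[], including the NULL slot */
	int i;          /* next free index in envp[] */
	char *buf;      /* next free byte in the shared buffer */
	size_t size;    /* bytes left in the shared buffer */
};

/*
 * Append "key=value" to the shared buffer and record it in envp[].
 * Returns 0 on success, -1 if envp[] or the buffer is exhausted.
 */
static int env_add_int(struct env_builder *eb, const char *key, int value)
{
	int n;

	if (eb->i >= eb->num_envp - 1)
		return -1;              /* keep one slot for the NULL */
	n = snprintf(eb->buf, eb->size, "%s=%d", key, value);
	if (n < 0 || (size_t)n >= eb->size)
		return -1;              /* output would have been truncated */
	eb->envp[eb->i++] = eb->buf;
	eb->buf += n + 1;               /* step past the terminating NUL */
	eb->size -= (size_t)n + 1;
	eb->envp[eb->i] = NULL;
	return 0;
}
```

With a three-slot envp[], two entries fit and a third call returns -1, which
corresponds to the -ENOMEM path in netdev_uevent().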

Re: [PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data

2007-02-28 Thread Dale Farnsworth

On Wed, Feb 28, 2007 at 03:11:03PM -0800, Stephen Hemminger wrote:
> On Wed, 28 Feb 2007 15:40:31 -0700
> "Dale Farnsworth" <[EMAIL PROTECTED]> wrote:
> 
> > The information contained within platform_data should be self-contained.
> > Replace the pointer to a MAC address with the actual MAC address in
> > struct mv643xx_eth_platform_data.
> > 
> > Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>
> > 
> > Index: b/drivers/net/mv643xx_eth.c
> > ===
> > --- a/drivers/net/mv643xx_eth.c
> > +++ b/drivers/net/mv643xx_eth.c
> > @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
> >  
> > pd = pdev->dev.platform_data;
> > if (pd) {
> > -   if (pd->mac_addr)
> > +   static u8 zero_mac_addr[6] = { 0 };
> > +
> > +   if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
> > memcpy(dev->dev_addr, pd->mac_addr, 6);
> 
> 
> is_zero_ether_addr() is faster/cleaner for this

Thanks.  I'll follow up with a modified patch in a day or two.

-Dale

PATCH: Second try at vlan mailing list patch.

2007-02-28 Thread Ben Greear


Hopefully, by attaching it as a file it will not screw up the tabs & spaces.

Signed-off-by:  Ben Greear <[EMAIL PROTECTED]>

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 18fcb9f..c4209c8 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -3,7 +3,8 @@
  *		Ethernet-type device handling.
  *
  * Authors:	Ben Greear <[EMAIL PROTECTED]>
- *  Please send support related email to: [EMAIL PROTECTED]
+ *  Please send support related email to: [EMAIL PROTECTED]
+ *after subscribing using the link below.
  *  VLAN Home Page: http://www.candelatech.com/~greear/vlan.html
  * 
  * Fixes:
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index e49e252..203cd54 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -3,7 +3,8 @@
  *		Ethernet-type device handling.
  *
  * Authors:	Ben Greear <[EMAIL PROTECTED]>
- *  Please send support related email to: [EMAIL PROTECTED]
+ *  Please send support related email to: [EMAIL PROTECTED]
+ *after subscribing using the web page below.
  *  VLAN Home Page: http://www.candelatech.com/~greear/vlan.html
  * 
  * Fixes:   Mar 22 2001: Martin Bokaemper <[EMAIL PROTECTED]>

Re: [PATCH 4/5] r8169: more alignment for the 0x8168

2007-02-28 Thread Francois Romieu

Francois Romieu <[EMAIL PROTECTED]> :
[...]

The experimental r8169 patch of the day against 2.6.21-rc2 is available at:
http://www.fr.zoreil.com/linux/www.fr.zoreil.com/people/francois/misc/20070228-2.6.21-rc2-r8169-test.patch
(single patch)

or:
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.21-rc2
(series)

Log below:

commit 6686d80d6972cd5ff3ca81b72c46f4ffcc40eb4c
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Wed Feb 28 23:16:57 2007 +0100

r8169: align the IP header when there is no DMA constraint

Align the IP header when the chipset can DMA at any location (plain 0x8169).
Otherwise (0x8136/0x8168) obey the constraint imposed by the hardware.

This patch complements the previous alignment rework done for copybreak.

Original idea from Philip Craig <[EMAIL PROTECTED]>

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Philip Craig <[EMAIL PROTECTED]>
Cc: Mike Isely <[EMAIL PROTECTED]>

commit d20a6ba195172f7fb9fd30832a054effb9773bc3
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Fri Feb 23 23:50:28 2007 +0100

r8169.c: add bit description for the TxPoll register

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit 37dc1270eba2874a00564abe0d857429af5370f2
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Fri Feb 23 23:24:55 2007 +0100

r8169: MSI support

It is currently limited to 0x8136 and 0x8168. 8169sb/8110sb ought to
handle it as well where they support MSI.

Includes unregister_netdev() fix from Bernhard Walle <[EMAIL PROTECTED]>
against BUG_ON(irq_has_action(dev->first_msi_irq)) (2007/02/24).

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Fixed-by: Bernhard Walle <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit b388fb659dc5803cdb2293649e25807f88ba94ec
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Wed Feb 21 22:40:46 2007 +0100

r8169: cleanup

No functional change:
- trim the old history log
- whitespace/indent/case police
- unsigned int where signedness does not matter
- removal of obsolete assert
- needless cast from void * (dev_instance)

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit 090121a1d8b9452fd454fa44ba67d9761a6e8f1e
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Wed Feb 21 00:10:20 2007 +0100

r8169: remove the media option

It has been documented as deprecated:
- in MODULE_PARM_DESC since May 2005;
- at the top of the source file and in printk since June 2004.

Good bye.

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit 5231dd72b4d9551c6cd8baa9b7026a1f21b12052
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Tue Feb 20 22:58:51 2007 +0100

r8169: small 8101 comment

Extracted from version 1.001.00 of Realtek's r8101.

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit 20be52f668774727ba1d1a3606cc2888f66d40bf
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Tue Feb 20 22:20:51 2007 +0100

r8169: confusion between hardware and IP header alignment

The rx copybreak part is straightforward.

The align field in struct rtl_cfg_info is related to the alignment
requirements of the DMA operation. Its value is set at 2 to limit the
scale of possible regression, but my old v1.21 8169 datasheet claims an
8-byte requirement (which the driver never followed, of course)
and the 8101/8168 go with a plain 8-byte alignment. Yuck...

/me waits for the attack of ballistic vegetables...

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit b0ee36861173a3ac57017c8a3850ad21a4c1acf6
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Tue Feb 20 00:00:26 2007 +0100

r8169: merge with version 8.001.00 of Realtek's r8168 driver

This one includes:

- more tweaks to rtl_hw_start_8168

- a workaround for an Rx FIFO overflow issue on the 8168Bb
  + rtl8169_{intr_mask/napi_event} are replaced with per-device fields
  + rtl_cfg_info is converted to C99 for readability but the values are
not changed for the 8169/8110 and the 8101

Includes ChipCmd fix from Bernhard Walle <[EMAIL PROTECTED]> (2007/02/24).

Signed-off-by: Francois Romieu <[EMAIL PROTECTED]>
Cc: Edward Hsu <[EMAIL PROTECTED]>

commit 9d4139624a1c2ae138ea043083263c84d14bbd3a
Author: Francois Romieu <[EMAIL PROTECTED]>
Date:   Tue Feb 13 23:38:05 2007 +0100

r8169: merge with version 6.001.00 of Realtek's r8169 driver

- new identif
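
Of the commits listed above, the IP header alignment one rests on simple
arithmetic: an Ethernet header is 14 bytes, so a frame that starts on a
4-byte boundary leaves the IP header 2 bytes off. Reserving 2 bytes in front
of the frame (the kernel's NET_IP_ALIGN convention) fixes that, provided the
hardware can DMA to such an address. A minimal sketch of the arithmetic, with
a hypothetical helper name ip_align_pad() that is not part of the driver:

```c
#include <stddef.h>

#define ETH_HEADER_LEN 14  /* dst MAC (6) + src MAC (6) + EtherType (2) */

/*
 * Padding to reserve before the frame so that the IP header, which follows
 * the Ethernet header, starts on a 4-byte boundary. buf_offset is where the
 * frame would start without padding, relative to an aligned base.
 */
static size_t ip_align_pad(size_t buf_offset)
{
	size_t ip_offset = buf_offset + ETH_HEADER_LEN;

	return (4 - ip_offset % 4) % 4;
}
```

For a frame starting at an aligned address, ip_align_pad(0) gives 2, which
puts the IP header at offset 16.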

Re: [PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data

2007-02-28 Thread Stephen Hemminger

On Wed, 28 Feb 2007 15:40:31 -0700
"Dale Farnsworth" <[EMAIL PROTECTED]> wrote:

> The information contained within platform_data should be self-contained.
> Replace the pointer to a MAC address with the actual MAC address in
> struct mv643xx_eth_platform_data.
> 
> Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>
> 
> Index: b/drivers/net/mv643xx_eth.c
> ===
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
>  
>   pd = pdev->dev.platform_data;
>   if (pd) {
> - if (pd->mac_addr)
> + static u8 zero_mac_addr[6] = { 0 };
> +
> + if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
>   memcpy(dev->dev_addr, pd->mac_addr, 6);


is_zero_ether_addr() is faster/cleaner for this
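
For reference, the check suggested here amounts to testing whether all six
bytes of the address are zero, without needing a static all-zero array to
compare against. The following is a userspace sketch of that idea, not the
kernel's implementation of is_zero_ether_addr(); the names mac_is_zero() and
copy_mac_if_set() are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Return 1 if every byte of the 6-byte address is zero, else 0. */
static int mac_is_zero(const uint8_t *addr)
{
	return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}

/* Copy src into dst only when src holds a non-zero address; 1 if copied. */
static int copy_mac_if_set(uint8_t *dst, const uint8_t *src)
{
	if (mac_is_zero(src))
		return 0;
	memcpy(dst, src, MAC_LEN);
	return 1;
}
```

In the probe function under discussion, the equivalent of copy_mac_if_set()
would leave dev->dev_addr untouched when the platform data carries no MAC
address.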

-- 
Stephen Hemminger <[EMAIL PROTECTED]>

Re: need some help on a backport of r8169

2007-02-28 Thread Francois Romieu

[EMAIL PROTECTED] <[EMAIL PROTECTED]> :
[...]
> The result is as follows :
> I boot my new kernel : the r8169 driver is automatically loaded,
> finds the network card and gives me an eth0.
> I do an ifconfig, eth0 is up, with an IP and RX and TX are not 0.

Interesting.

> The problem comes here, I do a ping and it seems to have just the time
> to make the DNS resolution but not further. When I do a new ifconfig,
> the TX dropped is not 0 anymore. Then I can turn up and down my
> interface, I won't be able to ping anything.

Ok, almost perfect for a first try. :o)

If you can issue 'ifconfig' and do an ethtool dump of the registers at
the interesting points in time, it could surely help.

[...]
> Ah... poor me who thought that the RTL8168 was just like the RTL8169 with
> a pci express interface... It seems that a PCI-Express RTL8169 also
> exists, right?

Remind me to check it later.

[...]
> Do you think my problem is the one you mentioned above, without the
> experimental patches?

It is possible. I should review the diffs too.

Once you have logged the ifconfig/ethtool dump, you can try the series
or the patch at:

http://www.fr.zoreil.com/people/francois/backport/r8169/20070228-00

Btw:

[...dmesg dump...]
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> ACPI: setting ELCR to 0200 (from 0c08)
> NET: Registered protocol family 16
> PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
> PCI: Using MMCONFIG

Please disable MMCONFIG.

If you have any PCI latency option in your bios, set it to 64.

-- 
Ueimor

Re: [PATCH 2/2] mv643xx_eth: Place explicit port number in mv643xx_eth_platform_data

2007-02-28 Thread Dale Farnsworth

We had been using the platform_device.id field to identify which ethernet
port is used for mv643xx_eth device.  This is not correct in general.
It will be incorrect, for example, if a hardware platform uses a single
port but not the first port.  Here, we add an explicit port_number field
to struct mv643xx_eth_platform_data.

This makes the mv643xx_eth_platform_data structure required, but that
isn't an issue since all users currently provide it already.

Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>

diff --git a/arch/mips/momentum/jaguar_atx/platform.c b/arch/mips/momentum/jaguar_atx/platform.c
Index: b/arch/mips/momentum/jaguar_atx/platform.c
===
--- a/arch/mips/momentum/jaguar_atx/platform.c
+++ b/arch/mips/momentum/jaguar_atx/platform.c
@@ -48,6 +48,8 @@ static struct resource mv64x60_eth0_reso
 };
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -77,6 +79,8 @@ static struct resource mv64x60_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -105,7 +109,9 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static struct mv643xx_eth_platform_data eth2_pd;
+static struct mv643xx_eth_platform_data eth2_pd = {
+   .port_number= 2,
+};
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
Index: b/arch/mips/momentum/ocelot_3/platform.c
===
--- a/arch/mips/momentum/ocelot_3/platform.c
+++ b/arch/mips/momentum/ocelot_3/platform.c
@@ -48,6 +48,8 @@ static struct resource mv64x60_eth0_reso
 };
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -77,6 +79,8 @@ static struct resource mv64x60_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -105,7 +109,9 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static struct mv643xx_eth_platform_data eth2_pd;
+static struct mv643xx_eth_platform_data eth2_pd = {
+   .port_number= 2,
+};
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
Index: b/arch/mips/momentum/ocelot_c/platform.c
===
--- a/arch/mips/momentum/ocelot_c/platform.c
+++ b/arch/mips/momentum/ocelot_c/platform.c
@@ -47,6 +47,8 @@ static struct resource mv64x60_eth0_reso
 };
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -76,6 +78,8 @@ static struct resource mv64x60_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
+
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
Index: b/arch/powerpc/platforms/chrp/pegasos_eth.c
===
--- a/arch/powerpc/platforms/chrp/pegasos_eth.c
+++ b/arch/powerpc/platforms/chrp/pegasos_eth.c
@@ -58,6 +58,7 @@ static struct resource mv643xx_eth0_reso
 
 
 static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
.tx_sram_addr = PEGASOS2_SRAM_BASE_ETH0,
.tx_sram_size = PEGASOS2_SRAM_TXRING_SIZE,
.tx_queue_size = PEGASOS2_SRAM_TXRING_SIZE/16,
@@ -87,6 +88,7 @@ static struct resource mv643xx_eth1_reso
 };
 
 static struct mv643xx_eth_platform_data eth1_pd = {
+   .port_number= 1,
.tx_sram_addr = PEGASOS2_SRAM_BASE_ETH1,
.tx_sram_size = PEGASOS2_SRAM_TXRING_SIZE,
.tx_queue_size = PEGASOS2_SRAM_TXRING_SIZE/16,
Index: b/arch/ppc/syslib/mv64x60.c
===
--- a/arch/ppc/syslib/mv64x60.c
+++ b/arch/ppc/syslib/mv64x60.c
@@ -339,7 +339,9 @@ static struct resource mv64x60_eth0_reso
},
 };
 
-static struct mv643xx_eth_platform_data eth0_pd;
+static struct mv643xx_eth_platform_data eth0_pd = {
+   .port_number= 0,
+};
 
 static struct platform_device eth0_device = {
.name   = MV643XX_ETH_NAME,
@@ -362,7 +364,9 @@ static struct resource mv64x60_eth1_reso
},
 };
 
-static struct mv643xx_et

[PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data

2007-02-28 Thread Dale Farnsworth

The information contained within platform_data should be self-contained.
Replace the pointer to a MAC address with the actual MAC address in
struct mv643xx_eth_platform_data.
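
A minimal user-space sketch of the convention this change relies on, assuming an all-zero address means "no MAC supplied". The names `eth_pdata` and `pick_mac()` are hypothetical and only mirror the idea; they are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for platform data that carries the MAC inline. */
struct eth_pdata {
	uint8_t mac_addr[MAC_LEN];	/* all zeros means "not set" */
};

/*
 * Copy the platform-supplied MAC into dev_addr if it is non-zero.
 * Returns 1 if dev_addr was overwritten, 0 if it was left alone.
 */
static inline int pick_mac(uint8_t dev_addr[MAC_LEN], const struct eth_pdata *pd)
{
	static const uint8_t zero_mac[MAC_LEN] = { 0 };

	assert(pd != NULL);
	if (memcmp(pd->mac_addr, zero_mac, MAC_LEN) == 0)
		return 0;
	memcpy(dev_addr, pd->mac_addr, MAC_LEN);
	return 1;
}
```

Because the array lives inside the structure, a board file can fill it in directly and no separate static buffer is needed.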

Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>

Index: b/drivers/net/mv643xx_eth.c
===
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
 
pd = pdev->dev.platform_data;
if (pd) {
-   if (pd->mac_addr)
+   static u8 zero_mac_addr[6] = { 0 };
+
+   if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
memcpy(dev->dev_addr, pd->mac_addr, 6);
 
if (pd->phy_addr || pd->force_phy_addr)
Index: b/include/linux/mv643xx.h
===
--- a/include/linux/mv643xx.h
+++ b/include/linux/mv643xx.h
@@ -1289,7 +1289,6 @@ struct mv64xxx_i2c_pdata {
 #define MV643XX_ETH_NAME   "mv643xx_eth"
 
 struct mv643xx_eth_platform_data {
-   char*mac_addr;  /* pointer to mac address */
u16 force_phy_addr; /* force override if phy_addr == 0 */
u16 phy_addr;
 
@@ -1304,6 +1303,7 @@ struct mv643xx_eth_platform_data {
u32 tx_sram_size;
u32 rx_sram_addr;
u32 rx_sram_size;
+   u8  mac_addr[6];/* mac address if non-zero*/
 };
 
 #endif /* __ASM_MV643XX_H */
Index: b/arch/mips/momentum/jaguar_atx/platform.c
===
--- a/arch/mips/momentum/jaguar_atx/platform.c
+++ b/arch/mips/momentum/jaguar_atx/platform.c
@@ -47,11 +47,7 @@ static struct resource mv64x60_eth0_reso
},
 };
 
-static char eth0_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth0_pd = {
-   .mac_addr   = eth0_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -80,11 +76,7 @@ static struct resource mv64x60_eth1_reso
},
 };
 
-static char eth1_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth1_pd = {
-   .mac_addr   = eth1_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -113,11 +105,7 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static char eth2_mac_addr[ETH_ALEN];
-
-static struct mv643xx_eth_platform_data eth2_pd = {
-   .mac_addr   = eth2_mac_addr,
-};
+static struct mv643xx_eth_platform_data eth2_pd;
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
@@ -200,9 +188,9 @@ static int __init mv643xx_eth_add_pds(vo
int ret;
 
get_mac(mac);
-   eth_mac_add(eth0_mac_addr, mac, 0);
-   eth_mac_add(eth1_mac_addr, mac, 1);
-   eth_mac_add(eth2_mac_addr, mac, 2);
+   eth_mac_add(eth0_pd.mac_addr, mac, 0);
+   eth_mac_add(eth1_pd.mac_addr, mac, 1);
+   eth_mac_add(eth2_pd.mac_addr, mac, 2);
ret = platform_add_devices(mv643xx_eth_pd_devs,
ARRAY_SIZE(mv643xx_eth_pd_devs));
 
Index: b/arch/mips/momentum/ocelot_3/platform.c
===
--- a/arch/mips/momentum/ocelot_3/platform.c
+++ b/arch/mips/momentum/ocelot_3/platform.c
@@ -47,11 +47,7 @@ static struct resource mv64x60_eth0_reso
},
 };
 
-static char eth0_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth0_pd = {
-   .mac_addr   = eth0_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH0,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -80,11 +76,7 @@ static struct resource mv64x60_eth1_reso
},
 };
 
-static char eth1_mac_addr[ETH_ALEN];
-
 static struct mv643xx_eth_platform_data eth1_pd = {
-   .mac_addr   = eth1_mac_addr,
-
.tx_sram_addr   = MV_SRAM_BASE_ETH1,
.tx_sram_size   = MV_SRAM_TXRING_SIZE,
.tx_queue_size  = MV_SRAM_TXRING_SIZE / 16,
@@ -113,11 +105,7 @@ static struct resource mv64x60_eth2_reso
},
 };
 
-static char eth2_mac_addr[ETH_ALEN];
-
-static struct mv643xx_eth_platform_data eth2_pd = {
-   .mac_addr   = eth2_mac_addr,
-};
+static struct mv643xx_eth_platform_data eth2_pd;
 
 static struct platform_device eth2_device = {
.name   = MV643XX_ETH_NAME,
@@ -200,9 +188,9 @@ static int __init mv643xx_eth_add_pds(vo
int ret;
 
get_mac(mac);
-   eth_mac_add(eth0_mac_addr, mac, 0);
-   eth_mac_add(eth1_mac_addr, mac, 1);
-   eth_mac_add(eth2_mac_addr, mac, 2);
+   eth_mac_add(eth0_pd.mac_addr, mac, 0);
+   eth_mac_add(eth1_pd.mac_addr, mac, 1);
+   eth_mac_add(eth2_pd.mac_addr, mac, 2);
ret = platform_

Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable

2007-02-28 Thread Jay Vosburgh

Andy Gospodarek <[EMAIL PROTECTED]> wrote:

>A while back the following change was made to the bonding code:
>
>commit df49898a47061e82219c991dfbe9ac6ddf7a866b
>Author: John W. Linville <[EMAIL PROTECTED]>
>Date:   Tue Oct 18 21:30:58 2005 -0400
>
>[PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack
>
>Expand comment explaining MAC address selection for replicated IGMP
>frames transmitted in bonding mode 1 (active-backup).  Also, a small
>whitespace cleanup.
>
>Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
>Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
>
>In general that change is good, but this patch tweaks the feature by
>allowing it to be enabled or disabled.  This patch adds a new
>module option as well as a sysfs entry.  It sets the default to be the
>current behavior so existing users shouldn't notice any difference.

Why would you want to turn this off?

Also, I've got a replacement patch for this functionality that
seems to be better in all regards. It sends bonus IGMP joins when a
failover occurs, rather than simply duplicating them on all slaves (the
current system can leave switches in the dark if the slaves fail back to
the originals).  As chance would have it, I'm planning to post it as
part of a set in a little while.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

[PATCH] bonding: make IGMP flooding on active-backup bonds configurable

2007-02-28 Thread Andy Gospodarek


A while back the following change was made to the bonding code:

commit df49898a47061e82219c991dfbe9ac6ddf7a866b
Author: John W. Linville <[EMAIL PROTECTED]>
Date:   Tue Oct 18 21:30:58 2005 -0400

[PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack

Expand comment explaining MAC address selection for replicated IGMP
frames transmitted in bonding mode 1 (active-backup).  Also, a small
whitespace cleanup.

Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

In general that change is good, but this patch tweaks the feature by
allowing it to be enabled or disabled.  This patch adds a new
module option as well as a sysfs entry.  It sets the default to be the
current behavior so existing users shouldn't notice any difference.
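
As a rough illustration of how a string option such as `igmp_flood=no` could be resolved against a name/value table, here is a user-space sketch. The `parm_tbl` type, the `parse_parm()` helper and the enum values are assumptions made for this example, not the bonding driver's actual parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { IGMP_ACTIVEONLY = 0, IGMP_ALLMEMBERS = 1 };	/* hypothetical values */

struct parm_tbl {
	const char *name;
	int mode;
};

static const struct parm_tbl igmp_flood_tbl[] = {
	{ "no",  IGMP_ACTIVEONLY },
	{ "yes", IGMP_ALLMEMBERS },
	{ NULL,  -1 },
};

/*
 * Return the mode for the given option string, the supplied default
 * when the option is absent (NULL), or -1 for an unknown string.
 */
static inline int parse_parm(const char *opt, const struct parm_tbl *tbl, int dflt)
{
	assert(tbl != NULL);
	if (!opt)
		return dflt;
	for (; tbl->name; tbl++)
		if (strcmp(opt, tbl->name) == 0)
			return tbl->mode;
	return -1;
}
```

Passing the "all members" value as the default keeps today's behavior when the option is not given at all.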



Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]>
---

 drivers/net/bonding/bond_main.c  |   65 +++
 drivers/net/bonding/bond_sysfs.c |   46 +++
 drivers/net/bonding/bonding.h|1
 include/linux/if_bonding.h   |3 +
 4 files changed, 102 insertions(+), 13 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a7c8f98..b531d4a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -96,6 +96,7 @@ static char *xmit_hash_policy = NULL;
 static int arp_interval = BOND_LINK_ARP_INTERV;
 static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, };
 static char *arp_validate = NULL;
+static char *igmp_flood = NULL;
 struct bond_params bonding_defaults;
 
 module_param(max_bonds, int, 0);
@@ -129,6 +130,8 @@ module_param_array(arp_ip_target, charp, NULL, 0);
 MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form");
 module_param(arp_validate, charp, 0);
 MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes: none (default), active, backup or all");
+module_param(igmp_flood, charp, 0);
+MODULE_PARM_DESC(igmp_flood, "flood IGMP control traffic on active-backup bonding: yes (default) or no");
 
 /*- Global variables */
 
@@ -180,6 +183,12 @@ struct bond_parm_tbl arp_validate_tbl[] = {
 {  NULL,   -1},
 };
 
+struct bond_parm_tbl igmp_flood_tbl[] = {
+{  "no",   BOND_IGMP_ACTIVEONLY},
+{  "yes",  BOND_IGMP_ALLMEMBERS},
+{  NULL,   -1},
+};
+
 /*-- Forward declarations ---*/
 
 static void bond_send_gratuitous_arp(struct bonding *bond);
@@ -3070,6 +3079,9 @@ static void bond_info_show_slave(struct seq_file *seq, const struct slave *slave
   slave->perm_hwaddr[2], slave->perm_hwaddr[3],
   slave->perm_hwaddr[4], slave->perm_hwaddr[5]);
 
+   seq_printf(seq, "IGMP Flood: %s\n", (bond->params.igmp_flood) ? 
+   "yes" : "no");
+   
if (bond->params.mode == BOND_MODE_8023AD) {
const struct aggregator *agg
= SLAVE_AD_INFO(slave).port.aggregator;
@@ -4067,19 +4079,24 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
if (!bond->curr_active_slave)
goto out;
 
-   /* Xmit IGMP frames on all slaves to ensure rapid fail-over
-  for multicast traffic on snooping switches */
-   if (skb->protocol == __constant_htons(ETH_P_IP) &&
-   skb->nh.iph->protocol == IPPROTO_IGMP) {
-   struct slave *slave, *active_slave;
-   int i;
-
-   active_slave = bond->curr_active_slave;
-   bond_for_each_slave_from_to(bond, slave, i, active_slave->next,
-   active_slave->prev)
-   if (IS_UP(slave->dev) &&
-   (slave->link == BOND_LINK_UP))
-   bond_activebackup_xmit_copy(skb, bond, slave);
+   /* Let's make this behavior optional since it causes problems
+  when the links are connected to different switches. */
+   if (bond->params.igmp_flood) {
+
+   /* Xmit IGMP frames on all slaves to ensure rapid fail-over
+  for multicast traffic on snooping switches */
+   if (skb->protocol == __constant_htons(ETH_P_IP) &&
+   skb->nh.iph->protocol == IPPROTO_IGMP) {
+   struct slave *slave, *active_slave;
+   int i;
+
+   active_slave = bond->curr_active_slave;
+   bond_for_each_slave_from_to(bond, slave, i, 
active_slave->next,
+   active_slave->prev)
+   if (IS_UP(slave->dev) &&
+   (slave->link == BOND_LINK_UP))
+

Re: [PATCH 4/5] r8169: more alignment for the 0x8168

2007-02-28 Thread Francois Romieu

Sorry for the delay, I took some time to check the history of the
r8169 alignment issues.

Philip Craig <[EMAIL PROTECTED]> :
[...]
> This only partially helps.  Many of the packets are greater than 200
> bytes so copybreak doesn't apply to them.

Yes.

> Can we assume anything about the alignment of skb->data?  I think it
> should be 4 byte aligned, otherwise the whole NET_IP_ALIGN thing
> won't work.  All the drivers I looked at just reserve NET_IP_ALIGN
> without checking the alignment first.
> 
> So can you do something like set align to 0 for RTL_CFG_0 and change
> rtl8169_alloc_rx_skb() to:
>   skb_reserve(skb, align ? (align - 1) & (u32)skb->data : NET_IP_ALIGN);

The "So" part assumes that the 0x8169 can DMA at any address.

/me ponders...

It's easy to debug if it misbehaves now or in 6 months on some obscure
system. It's consistent with the preprevious code. Ok, good idea, I like it.

[...]
> BTW, should the alignment expression be:
>   (((u32)skb->data + (align - 1)) & ~(align - 1)) - (u32)skb->data

I'll see if something can be hacked with a zero or power of two alignment.
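
To make the difference between the two quoted expressions concrete, here is a small user-space sketch. It assumes `align` is a non-zero power of two; the helper names are made up for illustration and `uintptr_t` stands in for the `(u32)` cast.

```c
#include <assert.h>
#include <stdint.h>

/* First expression: the offset of addr within its align-sized block. */
static inline uintptr_t offset_in_block(uintptr_t addr, uintptr_t align)
{
	assert(align && (align & (align - 1)) == 0);	/* power of two */
	return addr & (align - 1);
}

/* Second expression: bytes to skip so that addr lands on the next boundary. */
static inline uintptr_t pad_to_boundary(uintptr_t addr, uintptr_t align)
{
	assert(align && (align & (align - 1)) == 0);	/* power of two */
	return ((addr + (align - 1)) & ~(align - 1)) - addr;
}
```

For an address of 0x1002 and an alignment of 8, the first yields 2 and the second yields 6: reserving 2 bytes would leave the buffer at 0x1004, while reserving 6 puts it at 0x1008. Both return 0 for an address that is already aligned.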

-- 
Ueimor

Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-02-28 Thread David Miller

From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 13:08:08 -0800

> I would be happy to see this go.  Have you tried this code with
> a SACK DoS stream? I.e. before you could consume a huge amount
> of cpu time by giving a certain bad sequence of SACKs. This
> code should have a better worst case run time.

No I didn't do that.

In fact this new code wedged my workstation overnight somehow
and I need to debug that at some point as well. :-)


Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-02-28 Thread Stephen Hemminger

On Wed, 28 Feb 2007 11:49:49 -0800 (PST)
David Miller <[EMAIL PROTECTED]> wrote:

> 
> commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be
> Author: David S. Miller <[EMAIL PROTECTED]>
> Date:   Tue Feb 27 19:28:07 2007 -0800
> 
> [TCP]: Kill fastpath_{skb,cnt}_hint.
> 
> Now that we have per-skb fack_counts and an interval
> search mechanism for the retransmit queue, we don't
> need these things any more.
> 
> Instead, as we traverse the SACK blocks to tag the
> queue, we use the RB tree to lookup the first SKB
> covering the SACK block by sequence number.
> 
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

I would be happy to see this go.  Have you tried this code with
a SACK DoS stream? I.e. before you could consume a huge amount
of cpu time by giving a certain bad sequence of SACKs. This
code should have a better worst case run time.

Re: [PATCH 2/4]: Store retransmit queue packets in RB tree.

2007-02-28 Thread David Miller

From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 21:33:50 +0100

> I am not sure this rb_node placement is optimal. rb lookups want to access 
> rb_node and end_seq. They should be placed in the same cache line :)

Definitely an area for improvement for sure, but that's not
the area of focus of these changes at this point :-)

Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-02-28 Thread James Morris

On Wed, 28 Feb 2007, Paul Moore wrote:

> The current CIPSO engine has a problem where it does not verify that the given
> sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is
> used.  The end result is that bad packets are sent on the wire which should
> have never been sent in the first place.  This patch corrects this problem by
> verifying the sensitivity level mapping similar to what is done with the
> category mapping.  This patch also changes the returned error code in this case
> to -EPERM to better match what the category mapping verification code returns.
> 
> Signed-off-by: Paul Moore <[EMAIL PROTECTED]>

[removed redhat-lspp, which is subscriber only]

Acked-by: James Morris <[EMAIL PROTECTED]>


> ---
>  net/ipv4/cipso_ipv4.c |7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Index: net-2.6_bugfix/net/ipv4/cipso_ipv4.c
> ===
> --- net-2.6_bugfix.orig/net/ipv4/cipso_ipv4.c
> +++ net-2.6_bugfix/net/ipv4/cipso_ipv4.c
> @@ -732,11 +732,12 @@ static int cipso_v4_map_lvl_hton(const s
>   *net_lvl = host_lvl;
>   return 0;
>   case CIPSO_V4_MAP_STD:
> - if (host_lvl < doi_def->map.std->lvl.local_size) {
> + if (host_lvl < doi_def->map.std->lvl.local_size &&
> + doi_def->map.std->lvl.local[host_lvl] < CIPSO_V4_INV_LVL) {
>   *net_lvl = doi_def->map.std->lvl.local[host_lvl];
>   return 0;
>   }
> - break;
> + return -EPERM;
>   }
>  
>   return -EINVAL;
> @@ -771,7 +772,7 @@ static int cipso_v4_map_lvl_ntoh(const s
>   *host_lvl = doi_def->map.std->lvl.cipso[net_lvl];
>   return 0;
>   }
> - break;
> + return -EPERM;
>   }
>  
>   return -EINVAL;
> 
> --
> paul moore
> linux security @ hp
> 
> 

-- 
James Morris
<[EMAIL PROTECTED]>

Re: [PATCH 2/4]: Store retransmit queue packets in RB tree.

2007-02-28 Thread Eric Dumazet


David Miller wrote:

commit c387760826bd71103220e06ca7b0bf90a785567e
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Tue Feb 27 16:44:42 2007 -0800

[TCP]: Store retransmit queue packets in RB tree.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..b70fd21 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -232,6 +233,8 @@ struct sk_buff {
struct sk_buff  *next;
struct sk_buff  *prev;
 
+	struct rb_node		rb;

+
struct sock *sk;
struct skb_timeval  tstamp;
struct net_device   *dev;



I am not sure this rb_node placement is optimal. rb lookups want to access 
rb_node and end_seq. They should be placed in the same cache line :)


next/prev were at the beginning of sk_buff; there is no such constraint for rb.
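
One way to check this kind of concern is to compare the cache-line indices of the two fields with `offsetof`. The sketch uses toy structs rather than the real `sk_buff`, and assumes 64-byte cache lines and a struct that starts on a line boundary; all field names and sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed line size */

/* Toy layout: lookup key placed far from the tree node. */
struct far_layout {
	unsigned char node[24];		/* stands in for the rb node */
	unsigned char other[64];
	uint32_t end_seq;		/* lookup key */
};

/* Toy layout: lookup key placed right after the tree node. */
struct near_layout {
	unsigned char node[24];
	uint32_t end_seq;
	unsigned char other[64];
};

/* True if byte offsets a and b fall in the same cache line. */
static inline int same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}
```

With these toy layouts, `same_line(offsetof(struct far_layout, node), offsetof(struct far_layout, end_seq))` is false, while the same test on `near_layout` is true, so each tree step in the second layout touches a single line.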


[PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping

2007-02-28 Thread Paul Moore

The current CIPSO engine has a problem where it does not verify that the given
sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is
used.  The end result is that bad packets are sent on the wire which should
have never been sent in the first place.  This patch corrects this problem by
verifying the sensitivity level mapping similar to what is done with the
category mapping.  This patch also changes the returned error code in this case
to -EPERM to better match what the category mapping verification code returns.
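
The check described above can be sketched in user-space C as follows. `INV_LVL`, `struct lvl_map` and `map_lvl_hton()` are hypothetical stand-ins chosen for this example; only the shape of the test (a range check plus a sentinel check, with `-EPERM` on failure) follows the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define INV_LVL 0x80000000u	/* assumed "no mapping" sentinel */

struct lvl_map {
	const uint32_t *local;	/* host level -> wire level */
	uint32_t local_size;
};

/* Translate host_lvl to a wire level; refuse unmapped levels. */
static inline int map_lvl_hton(const struct lvl_map *map, uint32_t host_lvl,
			       uint32_t *net_lvl)
{
	assert(map && net_lvl);
	if (host_lvl < map->local_size && map->local[host_lvl] < INV_LVL) {
		*net_lvl = map->local[host_lvl];
		return 0;
	}
	return -EPERM;
}
```

Without the sentinel test, a host level that is in range but has no mapping would be copied to the wire as-is, which is the bad-packet case the description mentions.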

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 net/ipv4/cipso_ipv4.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: net-2.6_bugfix/net/ipv4/cipso_ipv4.c
===
--- net-2.6_bugfix.orig/net/ipv4/cipso_ipv4.c
+++ net-2.6_bugfix/net/ipv4/cipso_ipv4.c
@@ -732,11 +732,12 @@ static int cipso_v4_map_lvl_hton(const s
*net_lvl = host_lvl;
return 0;
case CIPSO_V4_MAP_STD:
-   if (host_lvl < doi_def->map.std->lvl.local_size) {
+   if (host_lvl < doi_def->map.std->lvl.local_size &&
+   doi_def->map.std->lvl.local[host_lvl] < CIPSO_V4_INV_LVL) {
*net_lvl = doi_def->map.std->lvl.local[host_lvl];
return 0;
}
-   break;
+   return -EPERM;
}
 
return -EINVAL;
@@ -771,7 +772,7 @@ static int cipso_v4_map_lvl_ntoh(const s
*host_lvl = doi_def->map.std->lvl.cipso[net_lvl];
return 0;
}
-   break;
+   return -EPERM;
}
 
return -EINVAL;

--
paul moore
linux security @ hp


[PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.

2007-02-28 Thread David Miller


commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Tue Feb 27 19:28:07 2007 -0800

[TCP]: Kill fastpath_{skb,cnt}_hint.

Now that we have per-skb fack_counts and an interval
search mechanism for the retransmit queue, we don't
need these things any more.

Instead, as we traverse the SACK blocks to tag the
queue, we use the RB tree to lookup the first SKB
covering the SACK block by sequence number.
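
The lookup described here, finding the first queued segment that covers a given sequence number, can be illustrated with a binary search over a sorted array. This is a stand-in for the RB tree walk, not the kernel code: `struct seg` and `find_covering()` are hypothetical, and sequence-number wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg {
	uint32_t seq;		/* first byte */
	uint32_t end_seq;	/* one past the last byte */
};

/*
 * segs[] is sorted by seq and non-overlapping. Return the first segment
 * whose end_seq is greater than seq (the one covering seq, or the next
 * one after it), or NULL if every segment ends at or before seq.
 */
static inline const struct seg *find_covering(const struct seg *segs, size_t n,
					      uint32_t seq)
{
	size_t lo = 0, hi = n;

	assert(segs || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (segs[mid].end_seq > seq)
			hi = mid;	/* candidate; look further left */
		else
			lo = mid + 1;	/* ends too early; go right */
	}
	return lo < n ? &segs[lo] : NULL;
}
```

Each lookup costs O(log n) comparisons regardless of where the previous SACK block ended, which is why a cached starting point is no longer needed.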

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index b73687a..c3f08a5 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -326,9 +326,7 @@ struct tcp_sock {
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
-   struct sk_buff *fastpath_skb_hint;
 
-   int fastpath_cnt_hint;
int lost_cnt_hint;
int retransmit_cnt_hint;
int forward_cnt_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 80a572b..408f210 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1047,12 +1047,12 @@ static inline void tcp_mib_init(void)
 }
 
 /*from STCP */
-static inline void clear_all_retrans_hints(struct tcp_sock *tp){
+static inline void clear_all_retrans_hints(struct tcp_sock *tp)
+{
tp->lost_skb_hint = NULL;
tp->scoreboard_skb_hint = NULL;
tp->retransmit_skb_hint = NULL;
tp->forward_skb_hint = NULL;
-   tp->fastpath_skb_hint = NULL;
 }
 
 /* MD5 Signature */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b919cd7..df69726 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -942,16 +942,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
struct tcp_sock *tp = tcp_sk(sk);
unsigned char *ptr = ack_skb->h.raw + TCP_SKB_CB(ack_skb)->sacked;
struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
-   struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3;
int reord = tp->packets_out;
int prior_fackets;
u32 lost_retrans = 0;
int flag = 0;
int dup_sack = 0;
-   int cached_fack_count;
+   int fack_count_base;
int i;
-   int first_sack_index;
 
if (!tp->sacked_out)
tp->fackets_out = 0;
@@ -1010,12 +1008,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tp->recv_sack_cache[i].end_seq = 0;
}
 
-   first_sack_index = 0;
if (flag)
num_sacks = 1;
else {
int j;
-   tp->fastpath_skb_hint = NULL;
 
/* order SACK blocks to allow in order walk of the retrans queue */
for (i = num_sacks-1; i > 0; i--) {
@@ -1027,10 +1023,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
tmp = sp[j];
sp[j] = sp[j+1];
sp[j+1] = tmp;
-
-   /* Track where the first SACK block goes to */
-   if (j == first_sack_index)
-   first_sack_index = j+1;
}
 
}
@@ -1040,22 +1032,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
/* clear flag as used for different purpose in following code */
flag = 0;
 
-   /* Use SACK fastpath hint if valid */
-   cached_skb = tp->fastpath_skb_hint;
-   cached_fack_count = tp->fastpath_cnt_hint;
-   if (!cached_skb) {
-   cached_skb = tcp_write_queue_head(sk);
-   cached_fack_count = 0;
-   }
-
+   fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count;
for (i=0; i<num_sacks; i++, sp++) {
struct sk_buff *skb;
__u32 start_seq = ntohl(sp->start_seq);
__u32 end_seq = ntohl(sp->end_seq);
int fack_count;
 
-   skb = cached_skb;
-   fack_count = cached_fack_count;
+   skb = tcp_write_queue_find(sk, start_seq);
+   if (!skb)
+   continue;
+   fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base;
 
/* Event "B" in the comment above. */
if (after(end_seq, tp->high_seq))
@@ -1068,13 +1055,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
if (skb == tcp_send_head(sk))
break;
 
-   cached_skb = skb;
-   cached_fack_count = fack_count;
-   if (i == first_sack_index) {
-   tp->fastpath_skb_hint = skb;
-   tp->fastpath_cnt_hint = fa

[PATCH 3/4]: Maintain cached fack counts in retransmit queue.

2007-02-28 Thread David Miller


commit 5fc24957defcc34df8fab6bf62bc1918e54607f8
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Tue Feb 27 17:23:52 2007 -0800

[TCP]: Maintain cached fack counts in retransmit queue.

The fack count of any skb in the retransmit queue at any
given point in time is:

(skb->fack_count - head_skb->fack_count)

And we'll use this in the SACK processing loops.
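
The bookkeeping behind that formula is a running prefix sum. A user-space sketch with an array in place of the skb list follows; `struct pkt`, `reset_fack_counts()` and `rel_fack()` are illustrative names, and `pcount` plays the role of `tcp_skb_pcount()`.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	unsigned int pcount;		/* packets represented by this entry */
	unsigned int fack_count;	/* packets queued ahead of this entry */
};

/* Recompute fack_count for q[first..n-1] from the entry before first. */
static inline void reset_fack_counts(struct pkt *q, size_t n, size_t first)
{
	unsigned int fc = 0;
	size_t i;

	assert(first <= n);
	if (first > 0)
		fc = q[first - 1].fack_count + q[first - 1].pcount;
	for (i = first; i < n; i++) {
		q[i].fack_count = fc;
		fc += q[i].pcount;
	}
}

/* Count relative to the current head, whose stored value need not be zero. */
static inline unsigned int rel_fack(const struct pkt *q, size_t head, size_t i)
{
	return q[i].fack_count - q[head].fack_count;
}
```

Subtracting the head's stored value means entries removed from the front never force a renumbering; only an insertion in the middle requires the fix-up pass.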

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cce6b0e..80a572b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -532,6 +532,7 @@ struct tcp_skb_cb {
__u32   seq;/* Starting sequence number */
__u32   end_seq;/* SEQ + FIN + SYN + datalen*/
__u32   when;   /* used to compute rtt's*/
+   unsigned intfack_count; /* speed up SACK processing */
__u8flags;  /* TCP header flags.*/
 
/* NOTE: These must match up to the flags byte in a
@@ -1272,6 +1273,12 @@ static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
 
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {
+   struct sk_buff *tail = tcp_write_queue_tail(sk);
+   unsigned int fc = 0;
+
+   if (tail)
+   fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(skb);
+   TCP_SKB_CB(skb)->fack_count = fc;
__skb_queue_tail(&sk->sk_write_queue, skb);
tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
@@ -1285,18 +1292,44 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb
sk->sk_send_head = skb;
 }
 
+/* This is only used for tcp_send_synack(), so the write queue should
+ * be empty.  If that stops being true, the fack_count assignment
+ * will need to be more elaborate.
+ */
 static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
 {
+   BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
__skb_queue_head(&sk->sk_write_queue, skb);
+   TCP_SKB_CB(skb)->fack_count = 0;
tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
 
+/* An insert into the middle of the write queue causes the fack
+ * counts in subsequent packets to become invalid, fix them up.
+ */
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+{
+   struct sk_buff *prev = first->prev;
+   unsigned int fc = 0;
+
+   if (prev != (struct sk_buff *) &sk->sk_write_queue)
+   fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+   while (first != (struct sk_buff *)&sk->sk_write_queue) {
+   TCP_SKB_CB(first)->fack_count = fc;
+
+   fc += tcp_skb_pcount(first);
+   first = first->next;
+   }
+}
+
 /* Insert buff after skb on the write queue of sk.  */
 static inline void tcp_insert_write_queue_after(struct sk_buff *skb,
struct sk_buff *buff,
struct sock *sk)
 {
__skb_append(skb, buff, &sk->sk_write_queue);
+   tcp_reset_fack_counts(sk, buff);
tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
 
@@ -1306,6 +1339,7 @@ static inline void tcp_insert_write_queue_before(struct sk_buff *new,
  struct sock *sk)
 {
__skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
+   tcp_reset_fack_counts(sk, new);
tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
 

[PATCH 2/4]: Store retransmit queue packets in RB tree.

2007-02-28 Thread David Miller


commit c387760826bd71103220e06ca7b0bf90a785567e
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Tue Feb 27 16:44:42 2007 -0800

[TCP]: Store retransmit queue packets in RB tree.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..b70fd21 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -232,6 +233,8 @@ struct sk_buff {
struct sk_buff  *next;
struct sk_buff  *prev;
 
+   struct rb_node  rb;
+
struct sock *sk;
struct skb_timeval  tstamp;
struct net_device   *dev;
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 18a468d..b73687a 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -174,6 +174,7 @@ struct tcp_md5sig {
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -306,6 +307,7 @@ struct tcp_sock {
u32 snd_cwnd_used;
u32 snd_cwnd_stamp;
 
+   struct rb_root  write_queue_rb;
struct sk_buff_head out_of_order_queue; /* Out of order segments go here */
 
u32 rcv_wnd;/* Current receiver window  */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 571faa1..cce6b0e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1169,6 +1169,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 
while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
sk_stream_free_skb(sk, skb);
+   tcp_sk(sk)->write_queue_rb = RB_ROOT;
sk_stream_mem_reclaim(sk);
 }
 
@@ -1193,16 +1194,14 @@ static inline struct sk_buff *tcp_write_queue_next(struct sock *sk, struct sk_bu
return skb->next;
 }
 
-#define tcp_for_write_queue(skb, sk)   \
-   for (skb = (sk)->sk_write_queue.next;   \
-(skb != (sk)->sk_send_head) && \
-(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
-skb = skb->next)
+#define tcp_for_write_queue(skb, sk)   \
+   for (skb = (sk)->sk_write_queue.next;   \
+(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
+skb = skb->next)
 
-#define tcp_for_write_queue_from(skb, sk)  \
-   for (; (skb != (sk)->sk_send_head) &&   \
-(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
-skb = skb->next)
+#define tcp_for_write_queue_from(skb, sk)  \
+   for (; (skb != (struct sk_buff *)&(sk)->sk_write_queue);\
+skb = skb->next)
 
 static inline struct sk_buff *tcp_send_head(struct sock *sk)
 {
@@ -1211,7 +1210,7 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)
 
 static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
 {
-   sk->sk_send_head = skb->next;
+   sk->sk_send_head = tcp_write_queue_next(sk, skb);
if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
sk->sk_send_head = NULL;
 }
@@ -1227,9 +1226,54 @@ static inline void tcp_init_send_head(struct sock *sk)
sk->sk_send_head = NULL;
 }
 
+static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq)
+{
+   struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node;
+   struct sk_buff *skb = NULL;
+
+   while (rb_node) {
+   struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb);
+   if (TCP_SKB_CB(tmp)->end_seq > seq) {
+   skb = tmp;
+   if (TCP_SKB_CB(tmp)->seq <= seq)
+   break;
+   rb_node = rb_node->rb_left;
+   } else
+   rb_node = rb_node->rb_right;
+
+   }
+   return skb;
+}
+
+static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root)
+{
+   struct rb_node **rb_link, *rb_parent;
+   __u32 seq = TCP_SKB_CB(skb)->seq;
+
+   rb_link = &root->rb_node;
+   rb_parent = NULL;
+   while ((rb_parent = *rb_link) != NULL) {
+   struct sk_buff *tmp = rb_entry(rb_parent,struct sk_buff,rb);
+   if (TCP_SKB_CB(tmp)->end_seq > seq) {
+   BUG_ON(TCP_SKB_CB(tmp)->seq <= seq);
+   rb_link = &rb_parent->rb_left;
+   } else {
+   rb_link = &rb_parent->rb_right;
+   }
+   }
+   rb_link_node(&skb->rb, rb_parent, rb_link);
+   rb_insert_color(&skb->rb, root);
+}
+
+static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
+{
+   rb_erase(&skb->rb, root);
+}
+
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {

[PATCH 1/4]: Abstract out TCP write queue operations.

2007-02-28 Thread David Miller


commit 677417ba04ad2ce616a8199e337c8c9bb28f0692
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Tue Feb 27 14:24:25 2007 -0800

[TCP]: Abstract out all write queue operations.

This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/net/sock.h b/include/net/sock.h
index 03684e7..c4023b7 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -710,15 +710,6 @@ static inline void sk_stream_mem_reclaim(struct sock *sk)
__sk_stream_mem_reclaim(sk);
 }
 
-static inline void sk_stream_writequeue_purge(struct sock *sk)
-{
-   struct sk_buff *skb;
-
-   while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
-   sk_stream_free_skb(sk, skb);
-   sk_stream_mem_reclaim(sk);
-}
-
 static inline int sk_stream_rmem_schedule(struct sock *sk, struct sk_buff *skb)
 {
return (int)skb->truesize <= sk->sk_forward_alloc ||
@@ -1256,18 +1247,6 @@ static inline struct page *sk_stream_alloc_page(struct sock *sk)
return page;
 }
 
-#define sk_stream_for_retrans_queue(skb, sk)   \
-   for (skb = (sk)->sk_write_queue.next;   \
-(skb != (sk)->sk_send_head) && \
-(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
-skb = skb->next)
-
-/*from STCP for fast SACK Process*/
-#define sk_stream_for_retrans_queue_from(skb, sk)  \
-   for (; (skb != (sk)->sk_send_head) &&   \
-(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
-skb = skb->next)
-
 /*
  * Default write policy as shown to user space via poll/select/SIGIO
  */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f0c9e34..571faa1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1162,6 +1162,122 @@ static inline void tcp_put_md5sig_pool(void)
put_cpu();
 }
 
+/* write queue abstraction */
+static inline void tcp_write_queue_purge(struct sock *sk)
+{
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
+   sk_stream_free_skb(sk, skb);
+   sk_stream_mem_reclaim(sk);
+}
+
+static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
+{
+   struct sk_buff *skb = sk->sk_write_queue.next;
+   if (skb == (struct sk_buff *) &sk->sk_write_queue)
+   return NULL;
+   return skb;
+}
+
+static inline struct sk_buff *tcp_write_queue_tail(struct sock *sk)
+{
+   struct sk_buff *skb = sk->sk_write_queue.prev;
+   if (skb == (struct sk_buff *) &sk->sk_write_queue)
+   return NULL;
+   return skb;
+}
+
+static inline struct sk_buff *tcp_write_queue_next(struct sock *sk, struct sk_buff *skb)
+{
+   return skb->next;
+}
+
+#define tcp_for_write_queue(skb, sk)   \
+   for (skb = (sk)->sk_write_queue.next;   \
+(skb != (sk)->sk_send_head) && \
+(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
+skb = skb->next)
+
+#define tcp_for_write_queue_from(skb, sk)  \
+   for (; (skb != (sk)->sk_send_head) &&   \
+(skb != (struct sk_buff *)&(sk)->sk_write_queue);  \
+skb = skb->next)
+
+static inline struct sk_buff *tcp_send_head(struct sock *sk)
+{
+   return sk->sk_send_head;
+}
+
+static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
+{
+   sk->sk_send_head = skb->next;
+   if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
+   sk->sk_send_head = NULL;
+}
+
+static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unlinked)
+{
+   if (sk->sk_send_head == skb_unlinked)
+   sk->sk_send_head = NULL;
+}
+
+static inline void tcp_init_send_head(struct sock *sk)
+{
+   sk->sk_send_head = NULL;
+}
+
+static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
+{
+   __skb_queue_tail(&sk->sk_write_queue, skb);
+}
+
+static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
+{
+   __tcp_add_write_queue_tail(sk, skb);
+
+   /* Queue it, remembering where we must start sending. */
+   if (sk->sk_send_head == NULL)
+   sk->sk_send_head = skb;
+}
+
+static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
+{
+   __skb_queue_head(&sk->sk_write_queue, skb);
+}
+
+/* Insert buff after skb on the write queue of sk.  */
+static inline void tcp_insert_write_queue_after(struct sk_buff *skb,
+   struct sk_buff *buff,
+

[PATCH 0/4]: Store TCP retransmit queue in RB-tree

2007-02-28 Thread David Miller


I'd had this idea in the back of my head for a while and
finally I tried it out yesterday to see how it would look.

Basically, we store the write queue packets in an RB tree
keyed by start sequence number.  The objective is to use
this information to parse the SACK blocks more efficiently
and get rid of the "hints" we use to optimize that code
currently.

The big win is that we can now find the start of any sequence of
packets in the retransmit queue in O(log n) time.  The downsides
are numerous, such as:

1) Increased state stored in sk_buff, we need to store an rb_node
   there which is 3 pointers :-(((

   It is possible that perhaps we could alias the rb_node with the
   existing next/prev pointers in sk_buff, but like the VMA code
   in the Linux MM, I decided to keep the linked list around since
   that's the fastest for all the other operations.  rb_next()
   and rb_prev() would need to be used otherwise, and those aren't
   the cheapest.

2) We eat a O(log n) insert and delete for every packet now, even
   when no SACK processing occurs.

   I think this part can be sped up.  We insert to the tail on
   new packets, so we can take the existing tail packet and do
   something similar to "rb_next()" to find the insertion point.
   But we'd still have the cost of the potential tree rotations.

Even if none of the RB stuff is useful, the first patch which
abstracts all of the write queue accesses should probably go in
because it allows experimentation in this area to be done quite
effortlessly.

One thing I'm not sure about wrt. the RB tree stuff is that although
we key on start sequence of the SKB, we do change this when trimming
the head of TSO packets.  I think this is "OK" because such changes
would never change the position of the SKB within the RB tree, but I'd
like someone else to think that is true too :-) Worst case we could
key on end sequence which never ever changes.  Actually, the whole
case of tcp SKB chopping and mid-queue insert/delete, and its effect
on the RB tree entries needs to be audited if we are to take this
idea seriously.

I wonder if there is an algorithm better suited to this application.
It's an interval search, which needs fast insert to the tail and
fast delete from the head.
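
For illustration only, here is a minimal userspace sketch of the interval
search itself. It is not the kernel code: it models the queue as a sorted
array and uses a binary search to answer the same query that
tcp_write_queue_find() answers with the RB tree, namely "which is the first
segment whose end sequence lies above seq?". struct seg and seg_find() are
hypothetical names, and sequence number wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: covers sequence space [seq, end_seq). */
struct seg {
	uint32_t seq;
	uint32_t end_seq;
};

/*
 * Return the first segment whose end_seq is above 'seq', or NULL.
 * 'segs' must be sorted by seq and non-overlapping.  Plain comparisons
 * are used, so sequence number wraparound is NOT handled.
 */
static const struct seg *seg_find(const struct seg *segs, size_t n,
				  uint32_t seq)
{
	size_t lo = 0, hi = n;

	assert(segs || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (segs[mid].end_seq > seq)
			hi = mid;	/* candidate, keep looking to the left */
		else
			lo = mid + 1;	/* ends at or before seq, go right */
	}
	return lo < n ? &segs[lo] : NULL;
}
```

With segments [100,200), [200,300) and [300,450), a lookup for 199 lands on
the first segment, 200 on the second, and 450 finds nothing. An array makes
the search trivial but does not by itself give the cheap tail insert and head
delete asked for above.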

Another aspect of this patch is the per-SKB packet counts (I
named them "fack_count" but I'd rename that to "packet_count"
if I ever committed it for real).  The idea is that this can
eliminate most of the packet count hints in the tcp_sock.
The algorithm is simple:

1) On skb insert:
   if write queue empty, assign packet_count of zero
   else, assign packet_count of
tail_skb->packet_count + tcp_skb_pcount(tail_skb)
2) To get normalized packet_count of arbitrary SKB:
(skb->packet_count - head_skb->packet_count)

You can see in patch 4 how fastpath_cnt_hint is replaced by
that logic.
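
The two steps above can be sketched in plain C as follows. This is a
userspace model under stated assumptions, not the patch itself: struct pkt,
struct pkt_queue and the three helpers are hypothetical names, and the
pcount field stands in for what tcp_skb_pcount() returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb on the write queue. */
struct pkt {
	struct pkt *next;
	unsigned int pcount;		/* packets in this buffer */
	unsigned int packet_count;	/* running count, set at insert time */
};

struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

/* Step 1: assign packet_count when appending to the tail. */
static void pkt_queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (!q->tail) {
		p->packet_count = 0;
		q->head = p;
	} else {
		p->packet_count = q->tail->packet_count + q->tail->pcount;
		q->tail->next = p;
	}
	q->tail = p;
}

/* Remove the head, e.g. once it has been fully acked. */
static struct pkt *pkt_dequeue_head(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (p) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
	}
	return p;
}

/* Step 2: number of packets queued ahead of 'p' right now. */
static unsigned int pkt_normalized_count(const struct pkt_queue *q,
					 const struct pkt *p)
{
	assert(q->head);
	return p->packet_count - q->head->packet_count;
}
```

Because the count is normalized against the current head, removing acked
buffers from the front needs no renumbering. Only inserts and splits in the
middle of the queue force counts to be recomputed.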

I added the packet count to TCP_SKB_CB() and that takes it up to the
limit of 48 bytes of skb->cb[] on 64-bit architectures.  I wanted
to steal TCP_SKB_CB()->ack_seq for this, but that's used for some
SACK logic already.

There are some pains necessary to keep the counts correct, and in
some cases I just recalculate the whole queue's packet counts
after the insert point.  I'm sure this can be improved.

Back to the RB tree, there is another way to go after at least the
SACK processing overhead, and that is to maintain a not-sacked
table which is the inverse of the SACK blocks.  There are a few
references out there which discuss that kind of approach.
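
As a rough picture of what "the inverse of the SACK blocks" means, the sketch
below computes the un-sacked holes in a send window from a sorted list of
sacked ranges. It is an illustration only and is not taken from the references
mentioned above; struct seq_range and seq_holes() are hypothetical names, and
sequence number wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range of sequence space, [start, end). */
struct seq_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Store the gaps between 'sacked' blocks inside [una, nxt) in 'holes'.
 * 'sacked' must be sorted, non-overlapping and inside the window.
 * Returns the number of holes found; only the first 'max' are stored.
 * Sequence number wraparound is NOT handled.
 */
static size_t seq_holes(uint32_t una, uint32_t nxt,
			const struct seq_range *sacked, size_t n,
			struct seq_range *holes, size_t max)
{
	uint32_t cursor = una;
	size_t i, found = 0;

	assert(holes || max == 0);

	for (i = 0; i <= n; i++) {
		/* The window end acts as a final, empty sacked block. */
		uint32_t limit = (i < n) ? sacked[i].start : nxt;

		if (limit > cursor) {
			if (found < max) {
				holes[found].start = cursor;
				holes[found].end = limit;
			}
			found++;
		}
		if (i < n)
			cursor = sacked[i].end;
	}
	return found;
}
```

For a window of [1000, 5000) with [2000, 3000) and [4000, 5000) sacked, the
holes are [1000, 2000) and [3000, 4000).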

Some of the other hints are hard to eliminate.  In particular the
cached retransmit state is non-trivial to replace with logic that
does not require state.  The RB tree can't help, and we can't even
use the per-SKB packet_count for the count hints because one of them
wants to remember how many packets in the queue were marked lost,
for example.

If we could get rid of all the hints, that would be an easy
44 bytes killed in tcp_sock on 64-bit.

Anyways, here come the patches, enjoy.

Missing VLAN tags in bnx2

2007-02-28 Thread Pekka Pietikainen

Just had to spend some time figuring out why a bnx2 card connected to
a switch monitor port didn't see any vlan tags (when in our scenario the
tags are pretty vital).  Found the following explanation:

[BNX2]: Fix VLAN on ASF

Always set up the device to strip incoming VLAN tags when ASF is
enabled. ASF firmware will not parse packets correctly if VLAN tags
are not stripped

My fix:

#ifdef I_DONT_KNOW_WHAT_ASF_IS_AND_DONT_REALLY_CARE_EITHER
if (REG_RD_IND(bp, bp->shmem_base + BNX2_PORT_FEATURE) &
BNX2_PORT_FEATURE_ASF_ENABLED)
bp->flags |= ASF_ENABLE_FLAG;
#endif

Any hope of getting this as an ethtool tunable or something similar?
There didn't seem to be a BIOS option for this (Dell PE2900), at least.

Re: TCP minisock tcp_create_openreq_child() typo?

2007-02-28 Thread David Miller

From: "Arnaldo Carvalho de Melo" <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 09:10:10 -0300

> On 2/28/07, KOVACS Krisztian <[EMAIL PROTECTED]> wrote:
> >
> >   Hi,
> >
> >   While reading TCP minisock code I've found this suspiciously looking
> > code fragment:
> >
> > - 8< -
> > struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock 
> > *req, struct sk_buff *skb)
> > {
> > struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
> >
> > if (newsk != NULL) {
> > const struct inet_request_sock *ireq = inet_rsk(req);
> > struct tcp_request_sock *treq = tcp_rsk(req);
> > struct inet_connection_sock *newicsk = inet_csk(sk);
> > struct tcp_sock *newtp;
> > - 8< -
> >
> >   The above code initializes newicsk to inet_csk(sk), isn't that supposed
> > to be inet_csk(newsk)?  As far as I can tell this might leave
> > icsk_ack.last_seg_size zero even if we do have received data.
> 
> Good catch!
> 
> David, please apply the attached patch.
> 
> Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Applied, thanks everyone.

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg

On Wed, 2007-02-28 at 10:51 -0800, Jean Tourrilhes wrote:

>   That's why I always specify the kernel version. I'll look into
> that, I'm sure it's not the end of the world ;-)

Sure, just wanted to point it out.

>   In which sense? Wireless interfaces are regular netdevices.

Yeah but in mac80211 we have the wiphy concept since multiple virtual
interfaces can be associated to one hardware, and that is where QoS is
done, not the netdevs. Of course, those interested can just listen to
nl80211 events to figure out if someone renamed a 802.11 phy, but things
like hal would probably not want to and still know about the name
change.

>   I'm just trying to follow the established pattern. Both
> class_device_add() and class_device_del() are generating the
> event. Also, I'm not sure if other subsystem would benefit from it, I
> don't want to generate too many useless events.

I don't think many other subsystems (can) rename things ;)

johannes


Re: [PATCH] Fix broken RBTX4927 support in ne.c

2007-02-28 Thread Ralf Baechle

On Thu, Mar 01, 2007 at 01:22:23AM +0900, Atsushi Nemoto wrote:

> There are some ifdefs for RBTX4927, but need some more bits.

Acked-by: Ralf Baechle <[EMAIL PROTECTED]>

Longer term I think NE2000 will need to support platform_devices.  It's
been used too widely in too creative ways and we don't want all the
clutter to deal with that in ne.c.

  Ralf

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes

On Wed, Feb 28, 2007 at 10:16:05AM +0100, Johannes Berg wrote:
> Hi,
> 
> > Patch for 2.6.20 is attached.
> 
> ... and in the meantime netdevices aren't class_device any more :) IOW,
> your patch isn't going to work any more.

That's why I always specify the kernel version. I'll look into
that, I'm sure it's not the end of the world ;-)

> Also, I think wireless could benefit from this as well.

In which sense? Wireless interfaces are regular netdevices.

> > The kobject framework is well designed, so adding these
> > features is trivial change and won't run the risk of breaking anything
> > (famous last words). Obviously, hotplug apps are free to ignore those
> > additional features.
> 
> Why not just add this to base kobject_rename instead? That way,
> userspace is notified for all renames in sysfs.
> The patch then collapses down to the change in net's sysfs code to add
> the ifindex to the environment, and another change in kobject to invoke
> a new event when a name changes and show the old name.

I'm just trying to follow the established pattern. Both
class_device_add() and class_device_del() are generating the
event. Also, I'm not sure if other subsystem would benefit from it, I
don't want to generate too many useless events.

> johannes

Thanks !

Jean


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes

On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> On 28-02-2007 02:27, Jean Tourrilhes wrote:
> > Hi all,
> ...
> > Patch for 2.6.20 is attached. The patch was tested on a system
> > running the hotplug scripts, and on another system running udev.
> > 
> > Have fun...
> > 
> > Jean
> > 
> > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> > 
> > -
> ...
> > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> > --- linux/net/core/net-sysfs.j1.c   2007-02-27 15:01:08.0 -0800
> > +++ linux/net/core/net-sysfs.c  2007-02-27 15:06:49.0 -0800
> > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
> > if ((size <= 0) || (i >= num_envp))
> > return -ENOMEM;
> >  
> > +   /* pass ifindex to uevent.
> > +* ifindex is useful as it won't change (interface name may change)
> > +* and is what RtNetlink uses natively. */
> > +   envp[i++] = buf;
> > +   n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> > +   buf += n;
> > +   size -= n;
> > +
> > +   if ((size <= 0) || (i >= num_envp))
> 
> Btw.:
> 1. if size == 10 and snprintf returns 9 (without NULL)
>then n == 10 (with NULL), so isn't it enough (here and above):
>  
>   if ((size < 0) || (i >= num_envp))

I just cut'n'pasted the code a few lines above. If the original
code is incorrect, it needs fixing. And it will probably need fixing in
a lot of places.
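
To make the arithmetic under discussion concrete, here is a small userspace
sketch of the same buffer accounting. It is not the net-sysfs code:
struct env_buf and env_add_ifindex() are hypothetical names, and the helper
checks for truncation before advancing the cursor, so an exact fit that leaves
zero bytes free counts as success.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical cursor over a fixed environment buffer. */
struct env_buf {
	char *pos;	/* next free byte */
	int size;	/* bytes still free */
};

/*
 * Append "IFINDEX=<n>" plus its terminating NUL.
 * Returns 0 on success, -1 if the string would not fit.
 */
static int env_add_ifindex(struct env_buf *e, int ifindex)
{
	int n;

	assert(e && e->pos);

	if (e->size <= 0)
		return -1;

	/* snprintf() returns the length it wanted to write, without the NUL. */
	n = snprintf(e->pos, (size_t)e->size, "IFINDEX=%d", ifindex) + 1;
	if (n <= 0 || n > e->size)
		return -1;	/* error, or output was truncated */

	e->pos += n;
	e->size -= n;		/* may legitimately reach exactly 0 */
	return 0;
}
```

With a 10-byte buffer, "IFINDEX=7" (9 characters plus the NUL) fits exactly
and leaves size at 0; with a 9-byte buffer the same call reports failure.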

> 2. shouldn't there be (here and above):
>  
>   envp[--i] = NULL;
> 

No, envp is local, so who cares.
Thanks.

Jean

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jean Tourrilhes

On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote:
> On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> > --- linux/drivers/base/class.j1.c   2007-02-26 18:38:10.0 -0800
> > +++ linux/drivers/base/class.c  2007-02-27 15:52:37.0 -0800
> > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
> 
> This function is not in the 2.6.21-rc2 kernel, so you might want to
> rework this patch a bit :)

It was a trial balloon to gather feedback. I will do.

> Also, it's userspace that causes the rename to happen, so it knows it
> did it, why should the kernel have to emit a message to tell userspace
> again what just happened?

Userspace is not one big program, but a collection of programs,
and one program does not know what another program does.
In particular, udev does not know when people are using
iproute2 to rename interfaces, and it loses its marbles. We don't really
want to ban iproute2 or udev ;-)

> thanks,
> 
> greg k-h

Have fun...

Jean

Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread Stephen Hemminger

On Wed, 28 Feb 2007 14:37:49 +0100
John <[EMAIL PROTECTED]> wrote:

> John wrote:
> 
> > I know it's possible to have Linux timestamp incoming datagrams as soon
> > as they are received, then for one to retrieve this timestamp later with
> > an ioctl command or a recvmsg call.
> 
> Has it ever been proposed to modify struct skb_timeval to hold 
> nanosecond stamps instead of just microsecond stamps? Then make the 
> improved precision somehow available to user space.
> 

I am playing with a couple of possible future changes.
1. Change skb timestamp to be a timespec instead of timeval. For ABI
   compatibility the existing SO_TIMESTAMP has to stay microseconds,
   but add a new SO_TIMESPEC to get the nanosecond version.

   The change gets nontrivial because of other uses of timestamp (like vegas)
   so I gave up for now.

2. Use hardware receive timestamp in Yukon2 to put actual receive time
   into skb timestamp.  Works, but still figuring out how to manage
   clock skew/resync.

[PATCH 2/2] ehea: NAPI multi queue TX/RX path for SMP

2007-02-28 Thread Jan-Bernd Themann

This patch provides a functionality that allows parallel 
RX processing on multiple RX queues by using dummy netdevices.


Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
---


diff -Nurp -X dontdiff linux-2.6.21-rc1/drivers/net/ehea/ehea.h 
patched_kernel/drivers/net/ehea/ehea.h
--- linux-2.6.21-rc1/drivers/net/ehea/ehea.h2007-02-28 18:20:06.0 
+0100
+++ patched_kernel/drivers/net/ehea/ehea.h  2007-02-28 18:21:23.0 
+0100
@@ -39,7 +39,7 @@
 #include 
 
 #define DRV_NAME   "ehea"
-#define DRV_VERSION"EHEA_0048"
+#define DRV_VERSION"EHEA_0052"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
@@ -78,8 +78,6 @@
 #define EHEA_RQ2_PKT_SIZE   1522
 #define EHEA_L_PKT_SIZE 256/* low latency */
 
-#define EHEA_POLL_MAX_RWQE  1000
-
 /* Send completion signaling */
 #define EHEA_SIG_IV_LONG   1
 
@@ -357,8 +355,8 @@ struct ehea_port_res {
struct ehea_qp *qp;
struct ehea_cq *send_cq;
struct ehea_cq *recv_cq;
-   struct ehea_eq *send_eq;
-   struct ehea_eq *recv_eq;
+   struct ehea_eq *eq;
+   struct net_device *d_netdev;
spinlock_t send_lock;
struct ehea_q_skb_arr rq1_skba;
struct ehea_q_skb_arr rq2_skba;
@@ -372,7 +370,6 @@ struct ehea_port_res {
int swqe_count;
u32 swqe_id_counter;
u64 tx_packets;
-   struct tasklet_struct send_comp_task;
spinlock_t recv_lock;
struct port_state p_state;
u64 rx_packets;
@@ -416,7 +413,9 @@ struct ehea_port {
char int_aff_name[EHEA_IRQ_NAME_SIZE];
int allmulti;/* Indicates IFF_ALLMULTI state */
int promisc; /* Indicates IFF_PROMISC state */
+   int num_tx_qps;
int num_add_tx_qps;
+   int num_mcs;
int resets;
u64 mac_addr;
u32 logical_port_id;
diff -Nurp -X dontdiff linux-2.6.21-rc1/drivers/net/ehea/ehea_main.c 
patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.21-rc1/drivers/net/ehea/ehea_main.c   2007-02-28 
18:20:06.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c 2007-02-28 18:21:29.0 
+0100
@@ -51,13 +51,18 @@ static int rq1_entries = EHEA_DEF_ENTRIE
 static int rq2_entries = EHEA_DEF_ENTRIES_RQ2;
 static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
+static int use_mcs = 0;
+static int num_tx_qps = EHEA_NUM_TX_QP;
 
 module_param(msg_level, int, 0);
 module_param(rq1_entries, int, 0);
 module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
+module_param(use_mcs, int, 0);
+module_param(num_tx_qps, int, 0);
 
+MODULE_PARM_DESC(num_tx_qps, "Number of TX-QPS");
 MODULE_PARM_DESC(msg_level, "msg_level");
 MODULE_PARM_DESC(rq3_entries, "Number of entries for Receive Queue 3 "
 "[2^x - 1], x = [6..14]. Default = "
@@ -71,6 +76,7 @@ MODULE_PARM_DESC(rq1_entries, "Number of
 MODULE_PARM_DESC(sq_entries, " Number of entries for the Send Queue  "
 "[2^x - 1], x = [6..14]. Default = "
 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")");
+MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 1 ");
 
 void ehea_dump(void *adr, int len, char *msg) {
int x;
@@ -197,7 +203,7 @@ static int ehea_refill_rq_def(struct ehe
struct sk_buff *skb = netdev_alloc_skb(dev, packet_size);
if (!skb) {
ehea_error("%s: no mem for skb/%d wqes filled",
-  dev->name, i);
+  pr->port->netdev->name, i);
q_skba->os_skbs = fill_wqes - i;
ret = -ENOMEM;
break;
@@ -345,10 +351,11 @@ static int ehea_treat_poll_error(struct 
return 0;
 }
 
-static int ehea_poll(struct net_device *dev, int *budget)
+static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev,
+   struct ehea_port_res *pr,
+   int *budget)
 {
-   struct ehea_port *port = netdev_priv(dev);
-   struct ehea_port_res *pr = &port->port_res[0];
+   struct ehea_port *port = pr->port;
struct ehea_qp *qp = pr->qp;
struct ehea_cqe *cqe;
struct sk_buff *skb;
@@ -359,14 +366,12 @@ static int ehea_poll(struct net_device *
int skb_arr_rq2_len = pr->rq2_skba.len;
int skb_arr_rq3_len = pr->rq3_skba.len;
int processed, processed_rq1, processed_rq2, processed_rq3;
-   int wqe_index, last_wqe_index, rq, intreq, my_quota, port_reset;
+   int wqe_index, last_wqe_index, rq, my_quota, port_reset;
 
processed = processed_rq1 = processed_rq2 = processed_rq3 = 0;
last_wqe_index = 0;
my_quota = min(*budget, dev->quota);
-   my_quota = min(my_quota, EHEA_POLL_MAX_RWQE);
 
-   /* rq0 i

[PATCH 1/2] ehea: dynamic add / remove port

2007-02-28 Thread Jan-Bernd Themann

This patch introduces functionality to dynamically add / remove
ehea ports via a userspace DLPAR tool. It creates a subnode for
each logical port in sysfs.


Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
---


diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 42295d6..e595d6b 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@ #include 
 #include 
 
 #define DRV_NAME   "ehea"
-#define DRV_VERSION"EHEA_0046"
+#define DRV_VERSION"EHEA_0048"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
@@ -380,10 +380,11 @@ struct ehea_port_res {
 };
 
 
+#define EHEA_MAX_PORTS 16
 struct ehea_adapter {
u64 handle;
-   u8 num_ports;
-   struct ehea_port *port[16];
+   struct ibmebus_dev *ebus_dev;
+   struct ehea_port *port[EHEA_MAX_PORTS];
struct ehea_eq *neq;   /* notification event queue */
struct workqueue_struct *ehea_wq;
struct tasklet_struct neq_tasklet;
@@ -406,7 +407,7 @@ struct ehea_port {
struct net_device *netdev;
struct net_device_stats stats;
struct ehea_port_res port_res[EHEA_MAX_PORT_RES];
-   struct device_node *of_dev_node; /* Open Firmware Device Node */
+   struct of_device  ofdev; /* Open Firmware Device */
struct ehea_mc_list *mc_list;/* Multicast MAC addresses */
struct vlan_group *vgrp;
struct ehea_eq *qp_eq;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 1ef3846..42edd8d 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -580,7 +580,7 @@ static struct ehea_port *ehea_get_port(s
 {
int i;
 
-   for (i = 0; i < adapter->num_ports; i++)
+   for (i = 0; i < EHEA_MAX_PORTS; i++)
if (adapter->port[i])
if (adapter->port[i]->logical_port_id == logical_port)
return adapter->port[i];
@@ -2274,8 +2274,6 @@ static void ehea_tx_watchdog(struct net_
 int ehea_sense_adapter_attr(struct ehea_adapter *adapter)
 {
struct hcp_query_ehea *cb;
-   struct device_node *lhea_dn = NULL;
-   struct device_node *eth_dn = NULL;
u64 hret;
int ret;
 
@@ -2292,18 +2290,6 @@ int ehea_sense_adapter_attr(struct ehea_
goto out_herr;
}
 
-   /* Determine the number of available logical ports
-* by counting the child nodes of the lhea OFDT entry
-*/
-   adapter->num_ports = 0;
-   lhea_dn = of_find_node_by_name(lhea_dn, "lhea");
-   do {
-   eth_dn = of_get_next_child(lhea_dn, eth_dn);
-   if (eth_dn)
-   adapter->num_ports++;
-   } while ( eth_dn );
-   of_node_put(lhea_dn);
-
adapter->max_mc_mac = cb->max_mc_mac - 1;
ret = 0;
 
@@ -2313,79 +2299,150 @@ out:
return ret;
 }
 
-static int ehea_setup_single_port(struct ehea_port *port,
- struct device_node *dn)
+int ehea_get_jumboframe_status(struct ehea_port *port, int *jumbo)
 {
-   int ret;
-   u64 hret;
-   struct net_device *dev = port->netdev;
-   struct ehea_adapter *adapter = port->adapter;
struct hcp_ehea_port_cb4 *cb4;
-   u32 *dn_log_port_id;
-   int jumbo = 0;
-
-   sema_init(&port->port_lock, 1);
-   port->state = EHEA_PORT_DOWN;
-   port->sig_comp_iv = sq_entries / 10;
-
-   if (!dn) {
-   ehea_error("bad device node: dn=%p", dn);
-   ret = -EINVAL;
-   goto out;
-   }
-
-   port->of_dev_node = dn;
-
-   /* Determine logical port id */
-   dn_log_port_id = (u32*)get_property(dn, "ibm,hea-port-no", NULL);
-
-   if (!dn_log_port_id) {
-   ehea_error("bad device node: dn_log_port_id=%p",
-  dn_log_port_id);
-   ret = -EINVAL;
-   goto out;
-   }
-   port->logical_port_id = *dn_log_port_id;
-
-   port->mc_list = kzalloc(sizeof(struct ehea_mc_list), GFP_KERNEL);
-   if (!port->mc_list) {
-   ret = -ENOMEM;
-   goto out;
-   }
-
-   INIT_LIST_HEAD(&port->mc_list->list);
+   u64 hret;
+   int ret = 0;
 
-   ret = ehea_sense_port_attr(port);
-   if (ret)
-   goto out;
+   *jumbo = 0;
 
-   /* Enable Jumbo frames */
+   /* (Try to) enable *jumbo frames */
cb4 = kzalloc(PAGE_SIZE, GFP_KERNEL);
if (!cb4) {
ehea_error("no mem for cb4");
+   ret = -ENOMEM;
+   goto out;
} else {
-   hret = ehea_h_query_ehea_port(adapter->handle,
+   hret = ehea_h_query_ehea_port(port->adapter->handle,
  port->logical_port_id,
  H_PORT_CB4,
  H_PORT_CB4_J

[PATCH 0/2] ehea: dynamic port & SMP support

2007-02-28 Thread Jan-Bernd Themann

Hi, 

this version has the issues fixed which were mentioned by
Patrick McHardy.

The patch set includes two patches against linux-2.6.21-rc1:

- dynamic add / remove port:
  Interface has been discussed and approved by John Rose
  (see: http://www.spinics.net/lists/netdev/msg25327.html)

- NAPI multi queue TX/RX path for SMP:
  Integrated comments from mailing list (R. Dreier)
  
  As soon as discussions about "splitting NAPI from netdevice"
  have settled and this functionality is in kernel, we'll provide
  a patch for the new interface.
  (see: http://www.spinics.net/lists/netdev/msg25647.html)

please apply.

Jan-Bernd


Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>


[PATCH 4/4] pktgen: fix device name handling

2007-02-28 Thread Robert Olsson



Yes, it seems to handle dev name changes. So configuration scripts should
use ifindex now :)

Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

Cheers.
--ro



Stephen Hemminger writes:
 > Since devices can change name and other weirdness, don't hold onto
 > a copy of device name, instead use pointer to output device.
 > 
 > Fix a couple of leaks in error handling path as well.
 > 
 > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
 > 
 > ---
 >  net/core/pktgen.c |  137 
 > +++---
 >  1 file changed, 70 insertions(+), 67 deletions(-)
 > 
 > --- pktgen.orig/net/core/pktgen.c2007-02-27 12:08:58.0 -0800
 > +++ pktgen/net/core/pktgen.c 2007-02-27 12:11:32.0 -0800
 > @@ -210,15 +210,11 @@
 >  };
 >  
 >  struct pktgen_dev {
 > -
 >  /*
 >   * Try to keep frequent/infrequent used vars. separated.
 >   */
 > -
 > -char ifname[IFNAMSIZ];
 > -char result[512];
 > -
 > -struct pktgen_thread *pg_thread;/* the owner */
 > +struct proc_dir_entry *entry;   /* proc file */
 > +struct pktgen_thread *pg_thread;/* the owner */
 >  struct list_head list;  /* Used for chaining in the thread's 
 > run-queue */
 >  
 >  int running;/* if this changes to false, the test will stop 
 > */
 > @@ -345,6 +341,8 @@
 >  unsigned cflows;/* Concurrent flows (config) */
 >  unsigned lflow; /* Flow length  (config) */
 >  unsigned nflows;/* accumulated flows (stats) */
 > +
 > +char result[512];
 >  };
 >  
 >  struct pktgen_hdr {
 > @@ -497,7 +495,7 @@
 >  static int pktgen_stop_device(struct pktgen_dev *pkt_dev);
 >  static void pktgen_stop(struct pktgen_thread *t);
 >  static void pktgen_clear_counters(struct pktgen_dev *pkt_dev);
 > -static int pktgen_mark_device(const char *ifname);
 > +
 >  static unsigned int scan_ip6(const char *s, char ip[16]);
 >  static unsigned int fmt_ip6(char *s, const char ip[16]);
 >  
 > @@ -591,7 +589,7 @@
 > " frags: %d  delay: %u  clone_skb: %d  ifname: %s\n",
 > pkt_dev->nfrags,
 > 1000 * pkt_dev->delay_us + pkt_dev->delay_ns,
 > -   pkt_dev->clone_skb, pkt_dev->ifname);
 > +   pkt_dev->clone_skb, pkt_dev->odev->name);
 >  
 >  seq_printf(seq, " flows: %u flowlen: %u\n", pkt_dev->cflows,
 > pkt_dev->lflow);
 > @@ -1682,13 +1680,13 @@
 >  if_lock(t);
 >  list_for_each_entry(pkt_dev, &t->if_list, list)
 >  if (pkt_dev->running)
 > -seq_printf(seq, "%s ", pkt_dev->ifname);
 > +seq_printf(seq, "%s ", pkt_dev->odev->name);
 >  
 >  seq_printf(seq, "\nStopped: ");
 >  
 >  list_for_each_entry(pkt_dev, &t->if_list, list)
 >  if (!pkt_dev->running)
 > -seq_printf(seq, "%s ", pkt_dev->ifname);
 > +seq_printf(seq, "%s ", pkt_dev->odev->name);
 >  
 >  if (t->result[0])
 >  seq_printf(seq, "\nResult: %s\n", t->result);
 > @@ -1834,12 +1832,11 @@
 >  /*
 >   * mark a device for removal
 >   */
 > -static int pktgen_mark_device(const char *ifname)
 > +static void pktgen_mark_device(const char *ifname)
 >  {
 >  struct pktgen_dev *pkt_dev = NULL;
 >  const int max_tries = 10, msec_per_try = 125;
 >  int i = 0;
 > -int ret = 0;
 >  
 >  mutex_lock(&pktgen_thread_lock);
 >  pr_debug("pktgen: pktgen_mark_device marking %s for removal\n", ifname);
 > @@ -1860,32 +1857,49 @@
 >  printk("pktgen_mark_device: timed out after waiting "
 > "%d msec for device %s to be removed\n",
 > msec_per_try * i, ifname);
 > -ret = 1;
 >  break;
 >  }
 >  
 >  }
 >  
 >  mutex_unlock(&pktgen_thread_lock);
 > +}
 >  
 > -return ret;
 > +static void pktgen_change_name(struct net_device *dev)
 > +{
 > +struct pktgen_thread *t;
 > +
 > +list_for_each_entry(t, &pktgen_threads, th_list) {
 > +struct pktgen_dev *pkt_dev;
 > +
 > +list_for_each_entry(pkt_dev, &t->if_list, list) {
 > +if (pkt_dev->odev != dev)
 > +continue;
 > +
 > +remove_proc_entry(pkt_dev->entry->name, pg_proc_dir);
 > +
 > +pkt_dev->entry = create_proc_entry(dev->name, 0600,
 > +   pg_proc_dir);
 > +if (!pkt_dev->entry)
 > +printk(KERN_ERR "pktgen: can't move proc "
 > +   " entry for '%s'\n", dev->name);
 > +break;
 > +}
 > +}
 >  }
 >  
 >  static int pktgen_device_event(struct notifier_block *unused,
 > unsigned long event, void *ptr)
 >  {
 > -struc

[PATCH 3/4] pktgen: don't use __constant_htonl()

2007-02-28 Thread Robert Olsson


OK!

Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

Cheers.
--ro

Stephen Hemminger writes:
 > The existing htonl() macro is smart enough to do the same code as
 > using __constant_htonl() and it looks cleaner.
 > 
 > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
 > 
 > ---
 >  net/core/pktgen.c |   24 
 >  1 file changed, 12 insertions(+), 12 deletions(-)
 > 
 > --- pktgen.orig/net/core/pktgen.c2007-02-26 14:40:31.0 -0800
 > +++ pktgen/net/core/pktgen.c 2007-02-26 15:36:38.0 -0800
 > @@ -167,7 +167,7 @@
 >  #define LAT_BUCKETS_MAX 32
 >  #define IP_NAME_SZ 32
 >  #define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
 > -#define MPLS_STACK_BOTTOM __constant_htonl(0x0100)
 > +#define MPLS_STACK_BOTTOM htonl(0x0100)
 >  
 >  /* Device flag bits */
 >  #define F_IPSRC_RND   (1<<0)/* IP-Src Random  */
 > @@ -2297,7 +2297,7 @@
 >  int datalen, iplen;
 >  struct iphdr *iph;
 >  struct pktgen_hdr *pgh = NULL;
 > -__be16 protocol = __constant_htons(ETH_P_IP);
 > +__be16 protocol = htons(ETH_P_IP);
 >  __be32 *mpls;
 >  __be16 *vlan_tci = NULL; /* Encapsulates priority and 
 > VLAN ID */
 >  __be16 *vlan_encapsulated_proto = NULL;  /* packet type ID field (or 
 > len) for VLAN tag */
 > @@ -2306,10 +2306,10 @@
 >  
 >  
 >  if (pkt_dev->nr_labels)
 > -protocol = __constant_htons(ETH_P_MPLS_UC);
 > +protocol = htons(ETH_P_MPLS_UC);
 >  
 >  if (pkt_dev->vlan_id != 0x)
 > -protocol = __constant_htons(ETH_P_8021Q);
 > +protocol = htons(ETH_P_8021Q);
 >  
 >  /* Update any of the values, used when we're incrementing various
 >   * fields.
 > @@ -2341,14 +2341,14 @@
 > pkt_dev->svlan_cfi,
 > pkt_dev->svlan_p);
 >  svlan_encapsulated_proto = (__be16 *)skb_put(skb, 
 > sizeof(__be16));
 > -*svlan_encapsulated_proto = 
 > __constant_htons(ETH_P_8021Q);
 > +*svlan_encapsulated_proto = htons(ETH_P_8021Q);
 >  }
 >  vlan_tci = (__be16 *)skb_put(skb, sizeof(__be16));
 >  *vlan_tci = build_tci(pkt_dev->vlan_id,
 >pkt_dev->vlan_cfi,
 >pkt_dev->vlan_p);
 >  vlan_encapsulated_proto = (__be16 *)skb_put(skb, 
 > sizeof(__be16));
 > -*vlan_encapsulated_proto = __constant_htons(ETH_P_IP);
 > +*vlan_encapsulated_proto = htons(ETH_P_IP);
 >  }
 >  
 >  iph = (struct iphdr *)skb_put(skb, sizeof(struct iphdr));
 > @@ -2635,7 +2635,7 @@
 >  int datalen;
 >  struct ipv6hdr *iph;
 >  struct pktgen_hdr *pgh = NULL;
 > -__be16 protocol = __constant_htons(ETH_P_IPV6);
 > +__be16 protocol = htons(ETH_P_IPV6);
 >  __be32 *mpls;
 >  __be16 *vlan_tci = NULL; /* Encapsulates priority and 
 > VLAN ID */
 >  __be16 *vlan_encapsulated_proto = NULL;  /* packet type ID field (or 
 > len) for VLAN tag */
 > @@ -2643,10 +2643,10 @@
 >  __be16 *svlan_encapsulated_proto = NULL; /* packet type ID field (or 
 > len) for SVLAN tag */
 >  
 >  if (pkt_dev->nr_labels)
 > -protocol = __constant_htons(ETH_P_MPLS_UC);
 > +protocol = htons(ETH_P_MPLS_UC);
 >  
 >  if (pkt_dev->vlan_id != 0x)
 > -protocol = __constant_htons(ETH_P_8021Q);
 > +protocol = htons(ETH_P_8021Q);
 >  
 >  /* Update any of the values, used when we're incrementing various
 >   * fields.
 > @@ -2677,14 +2677,14 @@
 > pkt_dev->svlan_cfi,
 > pkt_dev->svlan_p);
 >  svlan_encapsulated_proto = (__be16 *)skb_put(skb, 
 > sizeof(__be16));
 > -*svlan_encapsulated_proto = 
 > __constant_htons(ETH_P_8021Q);
 > +*svlan_encapsulated_proto = htons(ETH_P_8021Q);
 >  }
 >  vlan_tci = (__be16 *)skb_put(skb, sizeof(__be16));
 >  *vlan_tci = build_tci(pkt_dev->vlan_id,
 >pkt_dev->vlan_cfi,
 >pkt_dev->vlan_p);
 >  vlan_encapsulated_proto = (__be16 *)skb_put(skb, 
 > sizeof(__be16));
 > -*vlan_encapsulated_proto = __constant_htons(ETH_P_IPV6);
 > +*vlan_encapsulated_proto = htons(ETH_P_IPV6);
 >  }
 >  
 >  iph = (struct ipv6hdr *)skb_put(skb, sizeof(struct ipv6hdr));
 > @@ -2710,7 +2710,7 @@
 >  udph->len = htons(datalen + sizeof(struct udphdr));
 >  udph->check = 0;/* No checksum */
 >  
 > -*(__be32 *) iph = __constant_htonl(0x6000); /* Version + flow */
 > +*(__be32 *) iph = htonl(0x6000);/* Version + flow */
 >  
 >  if (pkt_dev-
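
The point of this patch can be illustrated outside the kernel. The sketch below is userspace C and is not the kernel's implementation: SWAB16, CONST_HTONS and MY_HTONS are hypothetical names, and it assumes GCC or Clang for __builtin_constant_p() and __BYTE_ORDER__. It shows how one macro can take a foldable path for constants and call htons() for runtime values, which is why a separate constant-only spelling is not needed.

```c
#include <arpa/inet.h>   /* htons() for the runtime path */
#include <assert.h>      /* static_assert */
#include <stdint.h>

/* Byte swap written as a pure expression so the compiler can fold it. */
#define SWAB16(x) ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | \
                              (((uint16_t)(x) & 0xff00u) >> 8)))

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define CONST_HTONS(x) SWAB16(x)
#else
#define CONST_HTONS(x) ((uint16_t)(x))
#endif

/* Hypothetical wrapper: constants take the foldable path, variables call htons(). */
#define MY_HTONS(x) \
    (__builtin_constant_p(x) ? CONST_HTONS(x) : htons((uint16_t)(x)))

/* EtherType for IPv4 (the value of ETH_P_IP). */
#define MY_ETH_P_IP 0x0800

/* The swap of a constant is itself a constant expression. */
static_assert(SWAB16(0x0800) == 0x0008, "constant swap folds at compile time");

static uint16_t ipv4_ethertype_be(void)
{
    return MY_HTONS(MY_ETH_P_IP);   /* constant argument */
}

static uint16_t to_be16(uint16_t host_value)
{
    return MY_HTONS(host_value);    /* runtime argument */
}
```

Both helpers return the same bytes that htons() would; only the path taken by the compiler differs.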

[PATCH 2/4] pktgen: use random32

2007-02-28 Thread Robert Olsson


Thanks!

It seems the network code has a preference for net_random(), but the two
are the same now.

Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

Cheers.

--ro

Stephen Hemminger writes:
 > Can use random32() now.
 > 
 > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
 > 
 > ---
 >  net/core/pktgen.c |   52 
 > +++-
 >  1 file changed, 19 insertions(+), 33 deletions(-)
 > 
 > --- pktgen.orig/net/core/pktgen.c2007-02-26 14:34:36.0 -0800
 > +++ pktgen/net/core/pktgen.c 2007-02-26 14:39:53.0 -0800
 > @@ -464,17 +464,6 @@
 >  return tmp;
 >  }
 >  
 > -static inline u32 pktgen_random(void)
 > -{
 > -#if 0
 > -__u32 n;
 > -get_random_bytes(&n, 4);
 > -return n;
 > -#else
 > -return net_random();
 > -#endif
 > -}
 > -
 >  static inline __u64 getCurMs(void)
 >  {
 >  struct timeval tv;
 > @@ -2091,7 +2080,7 @@
 >  int flow = 0;
 >  
 >  if (pkt_dev->cflows) {
 > -flow = pktgen_random() % pkt_dev->cflows;
 > +flow = random32() % pkt_dev->cflows;
 >  
 >  if (pkt_dev->flows[flow].count > pkt_dev->lflow)
 >  pkt_dev->flows[flow].count = 0;
 > @@ -2103,7 +2092,7 @@
 >  __u32 tmp;
 >  
 >  if (pkt_dev->flags & F_MACSRC_RND)
 > -mc = pktgen_random() % (pkt_dev->src_mac_count);
 > +mc = random32() % pkt_dev->src_mac_count;
 >  else {
 >  mc = pkt_dev->cur_src_mac_offset++;
 >  if (pkt_dev->cur_src_mac_offset >
 > @@ -2129,7 +2118,7 @@
 >  __u32 tmp;
 >  
 >  if (pkt_dev->flags & F_MACDST_RND)
 > -mc = pktgen_random() % (pkt_dev->dst_mac_count);
 > +mc = random32() % pkt_dev->dst_mac_count;
 >  
 >  else {
 >  mc = pkt_dev->cur_dst_mac_offset++;
 > @@ -2156,24 +2145,23 @@
 >  for(i = 0; i < pkt_dev->nr_labels; i++)
 >  if (pkt_dev->labels[i] & MPLS_STACK_BOTTOM)
 >  pkt_dev->labels[i] = MPLS_STACK_BOTTOM |
 > - ((__force __be32)pktgen_random() &
 > + ((__force __be32)random32() &
 >htonl(0x000f));
 >  }
 >  
 >  if ((pkt_dev->flags & F_VID_RND) && (pkt_dev->vlan_id != 0x)) {
 > -pkt_dev->vlan_id = pktgen_random() % 4096;
 > +pkt_dev->vlan_id = random32() & (4096-1);
 >  }
 >  
 >  if ((pkt_dev->flags & F_SVID_RND) && (pkt_dev->svlan_id != 0x)) {
 > -pkt_dev->svlan_id = pktgen_random() % 4096;
 > +pkt_dev->svlan_id = random32() & (4096 - 1);
 >  }
 >  
 >  if (pkt_dev->udp_src_min < pkt_dev->udp_src_max) {
 >  if (pkt_dev->flags & F_UDPSRC_RND)
 > -pkt_dev->cur_udp_src =
 > -((pktgen_random() %
 > -  (pkt_dev->udp_src_max - pkt_dev->udp_src_min)) +
 > - pkt_dev->udp_src_min);
 > +pkt_dev->cur_udp_src = random32() %
 > +(pkt_dev->udp_src_max - pkt_dev->udp_src_min)
 > ++ pkt_dev->udp_src_min;
 >  
 >  else {
 >  pkt_dev->cur_udp_src++;
 > @@ -2184,10 +2172,9 @@
 >  
 >  if (pkt_dev->udp_dst_min < pkt_dev->udp_dst_max) {
 >  if (pkt_dev->flags & F_UDPDST_RND) {
 > -pkt_dev->cur_udp_dst =
 > -((pktgen_random() %
 > -  (pkt_dev->udp_dst_max - pkt_dev->udp_dst_min)) +
 > - pkt_dev->udp_dst_min);
 > +pkt_dev->cur_udp_dst = random32() %
 > +(pkt_dev->udp_dst_max - pkt_dev->udp_dst_min)
 > ++ pkt_dev->udp_dst_min;
 >  } else {
 >  pkt_dev->cur_udp_dst++;
 >  if (pkt_dev->cur_udp_dst >= pkt_dev->udp_dst_max)
 > @@ -2202,7 +2189,7 @@
 > saddr_max))) {
 >  __u32 t;
 >  if (pkt_dev->flags & F_IPSRC_RND)
 > -t = ((pktgen_random() % (imx - imn)) + imn);
 > +t = random32() % (imx - imn) + imn;
 >  else {
 >  t = ntohl(pkt_dev->cur_saddr);
 >  t++;
 > @@ -2223,14 +2210,13 @@
 >  __be32 s;
 >  if (pkt_dev->flags & F_IPDST_RND) {
 >  
 > -t = pktgen_random() % (imx - imn) + imn;
 > +t = random32() % (imx - imn) + imn;
 >  s = htonl(t);
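
The two idioms used in the patch above can be tried in userspace. This is a sketch only: demo_random32() is a hypothetical xorshift generator standing in for the kernel's random32(), not the kernel's algorithm. It shows that masking with (4096 - 1) gives the same result as "% 4096", and how "% (max - min) + min" maps a value into [min, max).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for random32(): xorshift32 with a fixed non-zero seed. */
static uint32_t demo_prng_state = 2463534242u;

static uint32_t demo_random32(void)
{
    uint32_t x = demo_prng_state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    demo_prng_state = x;
    return x;
}

/* Value in [0, 4096): 4096 is a power of two, so the mask equals "% 4096". */
static uint16_t random_vlan_id(void)
{
    return (uint16_t)(demo_random32() & (4096 - 1));
}

/* Value in [min, max); the quoted code only does this when min < max. */
static uint32_t random_in_range(uint32_t min, uint32_t max)
{
    assert(min < max);
    return demo_random32() % (max - min) + min;
}
```

Note that the modulo form is slightly biased when (max - min) does not divide 2^32; for a traffic generator that is usually acceptable.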

[PATCH 1/4] pktgen: use pr_debug

2007-02-28 Thread Robert Olsson


Thanks!

Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

--ro



Stephen Hemminger writes:
 > Remove private debug macro and replace with standard version
 > 
 > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
 > 
 > 
 > ---
 >  net/core/pktgen.c |   34 +++---
 >  1 file changed, 15 insertions(+), 19 deletions(-)
 > 
 > --- pktgen.orig/net/core/pktgen.c2007-02-26 13:21:54.0 -0800
 > +++ pktgen/net/core/pktgen.c 2007-02-26 13:22:04.0 -0800
 > @@ -163,9 +163,6 @@
 >  
 >  #define VERSION  "pktgen v2.68: Packet Generator for packet performance 
 > testing.\n"
 >  
 > -/* #define PG_DEBUG(a) a */
 > -#define PG_DEBUG(a)
 > -
 >  /* The buckets are exponential in 'width' */
 >  #define LAT_BUCKETS_MAX 32
 >  #define IP_NAME_SZ 32
 > @@ -1856,8 +1853,7 @@
 >  int ret = 0;
 >  
 >  mutex_lock(&pktgen_thread_lock);
 > -PG_DEBUG(printk("pktgen: pktgen_mark_device marking %s for removal\n",
 > -ifname));
 > +pr_debug("pktgen: pktgen_mark_device marking %s for removal\n", ifname);
 >  
 >  while (1) {
 >  
 > @@ -1866,8 +1862,8 @@
 >  break;  /* success */
 >  
 >  mutex_unlock(&pktgen_thread_lock);
 > -PG_DEBUG(printk("pktgen: pktgen_mark_device waiting for %s "
 > -"to disappear\n", ifname));
 > +pr_debug("pktgen: pktgen_mark_device waiting for %s "
 > +"to disappear\n", ifname);
 >  schedule_timeout_interruptible(msecs_to_jiffies(msec_per_try));
 >  mutex_lock(&pktgen_thread_lock);
 >  
 > @@ -2847,7 +2843,7 @@
 >  struct pktgen_dev *pkt_dev;
 >  int started = 0;
 >  
 > -PG_DEBUG(printk("pktgen: entering pktgen_run. %p\n", t));
 > +pr_debug("pktgen: entering pktgen_run. %p\n", t);
 >  
 >  if_lock(t);
 >  list_for_each_entry(pkt_dev, &t->if_list, list) {
 > @@ -2879,7 +2875,7 @@
 >  {
 >  struct pktgen_thread *t;
 >  
 > -PG_DEBUG(printk("pktgen: entering pktgen_stop_all_threads_ifs.\n"));
 > +pr_debug("pktgen: entering pktgen_stop_all_threads_ifs.\n");
 >  
 >  mutex_lock(&pktgen_thread_lock);
 >  
 > @@ -2947,7 +2943,7 @@
 >  {
 >  struct pktgen_thread *t;
 >  
 > -PG_DEBUG(printk("pktgen: entering pktgen_run_all_threads.\n"));
 > +pr_debug("pktgen: entering pktgen_run_all_threads.\n");
 >  
 >  mutex_lock(&pktgen_thread_lock);
 >  
 > @@ -3039,7 +3035,7 @@
 >  {
 >  struct pktgen_dev *pkt_dev;
 >  
 > -PG_DEBUG(printk("pktgen: entering pktgen_stop\n"));
 > +pr_debug("pktgen: entering pktgen_stop\n");
 >  
 >  if_lock(t);
 >  
 > @@ -3063,7 +3059,7 @@
 >  struct list_head *q, *n;
 >  struct pktgen_dev *cur;
 >  
 > -PG_DEBUG(printk("pktgen: entering pktgen_rem_one_if\n"));
 > +pr_debug("pktgen: entering pktgen_rem_one_if\n");
 >  
 >  if_lock(t);
 >  
 > @@ -3092,7 +3088,7 @@
 >  
 >  /* Remove all devices, free mem */
 >  
 > -PG_DEBUG(printk("pktgen: entering pktgen_rem_all_ifs\n"));
 > +pr_debug("pktgen: entering pktgen_rem_all_ifs\n");
 >  if_lock(t);
 >  
 >  list_for_each_safe(q, n, &t->if_list) {
 > @@ -3275,7 +3271,7 @@
 >  
 >  t->pid = current->pid;
 >  
 > -PG_DEBUG(printk("pktgen: starting pktgen/%d:  pid=%d\n", cpu, 
 > current->pid));
 > +pr_debug("pktgen: starting pktgen/%d:  pid=%d\n", cpu, current->pid);
 >  
 >  max_before_softirq = t->max_before_softirq;
 >  
 > @@ -3336,13 +3332,13 @@
 >  set_current_state(TASK_INTERRUPTIBLE);
 >  }
 >  
 > -PG_DEBUG(printk("pktgen: %s stopping all device\n", t->tsk->comm));
 > +pr_debug("pktgen: %s stopping all device\n", t->tsk->comm);
 >  pktgen_stop(t);
 >  
 > -PG_DEBUG(printk("pktgen: %s removing all device\n", t->tsk->comm));
 > +pr_debug("pktgen: %s removing all device\n", t->tsk->comm);
 >  pktgen_rem_all_ifs(t);
 >  
 > -PG_DEBUG(printk("pktgen: %s removing thread.\n", t->tsk->comm));
 > +pr_debug("pktgen: %s removing thread.\n", t->tsk->comm);
 >  pktgen_rem_thread(t);
 >  
 >  return 0;
 > @@ -3361,7 +3357,7 @@
 >  }
 >  
 >  if_unlock(t);
 > -PG_DEBUG(printk("pktgen: find_dev(%s) returning %p\n", ifname, 
 > pkt_dev));
 > +pr_debug("pktgen: find_dev(%s) returning %p\n", ifname, pkt_dev);
 >  return pkt_dev;
 >  }
 >  
 > @@ -3530,7 +3526,7 @@
 >  struct pktgen_dev *pkt_dev)
 >  {
 >  
 > -PG_DEBUG(printk("pktgen: remove_device pkt_dev=%p\n", pkt_dev));
 > +pr_debug("pktgen: remove_device pkt_dev=%p\n", pkt_dev);
 >  
 >  if (pkt_dev->running) {
 >  printk("pktgen:WARNING: trying to remove a running interface, 
 > stopping it now.\n");

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Greg KH

On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> --- linux/drivers/base/class.j1.c 2007-02-26 18:38:10.0 -0800
> +++ linux/drivers/base/class.c2007-02-27 15:52:37.0 -0800
> @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev

This function is not in the 2.6.21-rc2 kernel, so you might want to
rework this patch a bit :)

Also, it's userspace that causes the rename to happen, so it knows it
did it, why should the kernel have to emit a message to tell userspace
again what just happened?

thanks,

greg k-h

[PATCH] Fix broken RBTX4927 support in ne.c

2007-02-28 Thread Atsushi Nemoto

There are some ifdefs for RBTX4927, but a few more bits are needed.

Signed-off-by: Atsushi Nemoto <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/ne.c b/drivers/net/ne.c
index a5c4199..02cc78b 100644
--- a/drivers/net/ne.c
+++ b/drivers/net/ne.c
@@ -55,8 +55,10 @@ static const char version2[] =
 #include 
 #include 
 
-#if defined(CONFIG_TOSHIBA_RBTX4927) || defined(CONFIG_TOSHIBA_RBTX4938)
+#if defined(CONFIG_TOSHIBA_RBTX4938)
 #include 
+#elif defined(CONFIG_TOSHIBA_RBTX4927)
+#include 
 #endif
 
 #include "8390.h"
@@ -229,6 +231,9 @@ struct net_device * __init ne_probe(int unit)
 #ifdef CONFIG_TOSHIBA_RBTX4938
dev->base_addr = RBTX4938_RTL_8019_BASE;
dev->irq = RBTX4938_RTL_8019_IRQ;
+#elif defined(CONFIG_TOSHIBA_RBTX4927)
+   dev->base_addr = RBTX4927_RTL_8019_BASE;
+   dev->irq = RBTX4927_RTL_8019_IRQ;
 #endif
err = do_ne_probe(dev);
if (err)

Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread John


Eric Dumazet wrote:

> On Wednesday 28 February 2007 15:23, John wrote:
> > Eric Dumazet wrote:
> > >> John wrote:
> > >>> I know it's possible to have Linux timestamp incoming datagrams as soon
> > >>> as they are received, then for one to retrieve this timestamp later
> > >>> with an ioctl command or a recvmsg call.
> > >>
> > >> Has it ever been proposed to modify struct skb_timeval to hold
> > >> nanosecond stamps instead of just microsecond stamps? Then make the
> > >> improved precision somehow available to user space.
> > >
> > > Most modern NICS are able to delay packet delivery, in order to reduce
> > > number of interrupts and benefit from better cache hits.
> >
> > You are referring to NAPI interrupt mitigation, right?
>
> Nope; I am referring to hardware features. NAPI is software.
>
> See ethtool -c eth0
>
> # ethtool -c eth0
> Coalesce parameters for eth0:
> Adaptive RX: off  TX: off
> stats-block-usecs: 100
> sample-interval: 0
> pkt-rate-low: 0
> pkt-rate-high: 0
>
> rx-usecs: 300
> rx-frames: 60
> rx-usecs-irq: 300
> rx-frames-irq: 60
>
> tx-usecs: 200
> tx-frames: 53
> tx-usecs-irq: 200
> tx-frames-irq: 53
>
> You can see on this setup, rx interrupts can be delayed up to 300 us (up to 60
> packets might be delayed)


One can disable interrupt mitigation. Your argument that it introduces 
latency therefore becomes irrelevant.



> > POSIX is moving to nanoseconds interfaces.
> > http://www.opengroup.org/onlinepubs/009695399/functions/clock_settime.html


You snipped too much. I also wrote:

struct timeval and struct timespec take as much space (64 bits).

If the hardware can indeed manage sub-microsecond accuracy, a struct
timeval forces the kernel to discard valuable information.

> The fact that you are able to give nanosecond timestamps inside kernel is not
> sufficient. It is necessary of course, but not sufficient. This precision is
> OK to time locally generated events. The moment you ask a 'nanosecond'
> timestamp, it's usually long before/after the real event.
>
> If you rely on nanosecond precision on network packets, then something is
> wrong with your algo. Even rt patches won't make sure your cpu caches are
> pre-filled, or that the routers/links between your machines are not busy.
> A cache miss costs 40 ns for example. A typical interrupt handler or rx
> processing can trigger 100 cache misses, or none at all if cache is hot.


Consider an idle Linux 2.6.20-rt8 system, equipped with a single PCI-E 
gigabit Ethernet NIC, running on a modern CPU (e.g. Core 2 Duo E6700). 
All this system does is time stamp 1000 packets per second.


Are you claiming that this platform *cannot* handle most packets within 
less than 1 microsecond of their arrival?


If there are platforms that can achieve sub-microsecond precision, and 
if it is not more expensive to support nanosecond resolution (I said 
resolution not precision), then it makes sense to support nanosecond 
resolution in Linux. Right?



> You said that rt gives highest priority to interrupt handlers:
> If you have several nics, what will happen if you receive packets on both
> nics, or if the NIC interrupt happens at the same time as the timer interrupt?
> One timestamp will be wrong for sure.


Again, this is irrelevant. We are discussing whether it would make sense 
to support sub-microsecond resolution. If there is one platform that can 
achieve sub-microsecond precision, there is a need for sub-microsecond 
resolution. As long as we are changing the resolution, we might as well 
use something standard like struct timespec.


> For sure we could timestamp packets with nanosecond resolution, and eventually
> with MONOTONIC value too, but it will give you (and others) false confidence
> on the real precision. us timestamps are already wrong...


IMHO, this is not true for all platforms.

Regards.
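
The resolution argument in this message can be made concrete with a small userspace sketch. The function names below are hypothetical and this is not kernel code; it only shows which part of a nanosecond stamp is lost once it is stored in a struct timeval.

```c
#include <assert.h>
#include <sys/time.h>   /* struct timeval */
#include <time.h>       /* struct timespec */

/* Convert a nanosecond-resolution stamp to microsecond resolution. */
static struct timeval timespec_to_timeval_demo(struct timespec ts)
{
    struct timeval tv;

    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = ts.tv_nsec / 1000;   /* the sub-microsecond part is dropped */
    return tv;
}

/* Nanoseconds that cannot be represented once the stamp is a timeval. */
static long discarded_nanoseconds(struct timespec ts)
{
    return ts.tv_nsec % 1000;
}
```

For a stamp of 1 s + 123456789 ns, the timeval keeps 123456 us and 789 ns are discarded. Whether those 789 ns were meaningful in the first place is exactly what the thread is arguing about.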

Re: need some help on a backport of r8169

2007-02-28 Thread pgs

Francois Romieu wrote, on Tue 27 Feb 2007 at 10:24:00PM:
> Please don't do > 80 columns line again. I am not a tty.
Sorry. I have turned on auto-fill-mode.

> > > There are 59 r8169 related patches between 2.6.12 and current. Only a few
> > > of those break the API. I'll give it a try tomorrow evening.
> > Hmm... you wrote this mail at 00 h 48 this morning, what did you mean by
> > tomorrow evening?
> 
> 1. I mean now. See:
>http://www.fr.zoreil.com/people/francois/backport/r8169/20070227-00
>(big patch or serie of 54 pieces).
> 
> 2. Compiled, untested. You know what you have to do.
> 
> 3. Due to the changes in the driver, one could hope that the link will be
>autocorrectly set. Please give it a try before using ethtool/mii-tool.
> 
> 4. The patchkit does not include the latest changes/bugfixes. They are
>still experimental but some users have a poor 8168 experience without
>them. YMMV. Please send a complete dmesg, lspci -vvx and the brand of
>the motherboard if the driver stops working randomly.

Thank you, François.
OK, I tried it. I had to modify it a little because it was against the
2.6.11 kernel and not 2.6.11.11, but that did not take much work.

I also rebuilt my kernel completely instead of just doing "make
modules", and I fixed some bugs that I had previously introduced myself
for the motherboard chipset.

The result is as follows:
I boot my new kernel: the r8169 driver is loaded automatically, finds
the network card and gives me an eth0.
I run ifconfig: eth0 is up, with an IP address, and the RX and TX
counters are not 0.
The problem starts here. When I ping, it seems to have just enough time
to do the DNS resolution but gets no further. When I run ifconfig again,
the TX dropped counter is no longer 0. After that, no matter how often I
bring the interface down and up, I cannot ping anything.

Ah... poor me, who thought that the RTL8168 was just like the RTL8169 with
a PCI Express interface... It seems that a PCI Express RTL8169 also
exists, right?

OK, one more detail: I had not enabled PCI Express in my kernel, I
just noticed it. I am recompiling my kernel with it to see if it makes
any difference. With the 2.6.20 kernel, however, everything works well:
I can ping what I want, and PCI Express is not enabled there either.

My kernel is now compiled with PCI Express enabled: no change.


Do you think my problem is the one you mentioned above, caused by the
missing experimental patches?

OK, here is my hardware information:

My motherboard uses an ICH7 chipset; I believe it's an I945G or
something like that.

To make it work completely, I use the following patch (I can no longer
put it at a URL, sorry):

--- ./drivers/ide/pci/piix.c.orig   2005-05-27 07:06:46.0 +0200
+++ ./drivers/ide/pci/piix.c2007-02-28 16:18:37.241527210 +0100
@@ -133,7 +133,9 @@
case PCI_DEVICE_ID_INTEL_82801EB_11:
case PCI_DEVICE_ID_INTEL_ESB_2:
case PCI_DEVICE_ID_INTEL_ICH6_19:
+   case PCI_DEVICE_ID_INTEL_ICH6_3:
case PCI_DEVICE_ID_INTEL_ICH7_21:
+   case PCI_DEVICE_ID_INTEL_ICH7_2:
mode = 3;
break;
/* UDMA 66 capable */
@@ -446,7 +448,9 @@
case PCI_DEVICE_ID_INTEL_82801E_11:
case PCI_DEVICE_ID_INTEL_ESB_2:
case PCI_DEVICE_ID_INTEL_ICH6_19:
+   case PCI_DEVICE_ID_INTEL_ICH6_3:
case PCI_DEVICE_ID_INTEL_ICH7_21:
+   case PCI_DEVICE_ID_INTEL_ICH7_2:
{
unsigned int extra = 0;
pci_read_config_dword(dev, 0x54, &extra);
@@ -572,6 +576,8 @@
/* 20 */ DECLARE_PIIX_DEV("ICH6"),
/* 21 */ DECLARE_PIIX_DEV("ICH7"),
/* 22 */ DECLARE_PIIX_DEV("ICH4"),
+   /* 23 */ DECLARE_PIIX_DEV("ICH6"),
+   /* 24 */ DECLARE_PIIX_DEV("ICH7"),
 };
 
 /**
@@ -647,6 +653,8 @@
{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_19, PCI_ANY_ID, 
PCI_ANY_ID, 0, 0, 20},
{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_21, PCI_ANY_ID, 
PCI_ANY_ID, 0, 0, 21},
{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801DB_1, PCI_ANY_ID, 
PCI_ANY_ID, 0, 0, 22},
+   { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_3, PCI_ANY_ID, 
PCI_ANY_ID, 0, 0, 23},
+   { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_2, PCI_ANY_ID, 
PCI_ANY_ID, 0, 0, 24},
{ 0, },
 };
 MODULE_DEVICE_TABLE(pci, piix_pci_tbl);


Below you can find my lspci, dmesg, and even dmidecode output.

Thank you again for your help.

lspci:

00:00.0 Host bridge: Intel Corp.: Unknown device 2770 (rev 02)
Subsystem: Unknown device 1631:e015
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR-  [disabled]
Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0 
Enable-
Address:   Data:

Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces

2007-02-28 Thread Eric W. Biederman

Daniel Lezcano <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted
>>
>> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
>> a network device is local to a single network namespace and
>> should never be moved.  Useful for pseudo devices that we
>> need an instance in each network namespace (like the loopback
>> device) and for any device we find that cannot handle multiple
>> network namespaces so we may trap them in the initial network
>> namespace.
>>
>> This patch introduces the function dev_change_net_namespace
>> a function used to move a network device from one network
>> namespace to another.  To the network device nothing
>> special appears to happen, to the components of the network
>> stack it appears as if the network device was unregistered
>> in the network namespace it is in, and a new device
>> was registered in the network namespace the device
>> was moved to.
>>
>> This patch sets up a namespace device destructor that
>> upon the exit of a network namespace moves all of the
>> movable network devices  to the initial network namespace
>> so they are not lost.
>>
> If you:
> * create etun0/etun1
> * create a namespace
> * move etun1 to this namespace
> *  rename the etun1 to eth0
> *  kill the namespace
>
> the former network device etun1 will be lost if you have in your parent
> namespace an interface eth0 because it will conflict.
> Perhaps, the first name should be restored before moving the device back to 
> the
> initial network namespace ?

Restoration of a previous name is no guarantee of anything.  Someone may have
renamed some other interface to etun1 in the original network namespace.

However, if you look closely at the code, you will discover that if it can't
keep the same name it will rename the device as it switches namespaces.
In particular it will become devN where N is replaced by some unused number.

That is what the pat parameter to dev_change_net_namespace is about.

I'm not exactly thrilled about the generic name but the code should work,
and I don't know if there is a name that makes better sense.
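
The fallback described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the code from the patch: pick_name(), name_in_use() and DEMO_IFNAMSIZ are hypothetical, the set of names in the target namespace is a plain array, and a prefix plus number stands in for the pat parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16   /* same size as the kernel's IFNAMSIZ */

/* Return 1 if `name` is already taken in the target namespace. */
static int name_in_use(const char *name, const char *const used[], size_t n_used)
{
    for (size_t i = 0; i < n_used; i++)
        if (strcmp(used[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Keep `wanted` if it is free; otherwise take the first free
 * "<prefix>N" name. Returns 0 on success, -1 if no name was found.
 */
static int pick_name(char out[DEMO_IFNAMSIZ], const char *wanted,
                     const char *prefix,
                     const char *const used[], size_t n_used)
{
    assert(strlen(wanted) < DEMO_IFNAMSIZ);

    if (!name_in_use(wanted, used, n_used)) {
        snprintf(out, DEMO_IFNAMSIZ, "%s", wanted);
        return 0;
    }
    for (int i = 0; i < 1024; i++) {
        snprintf(out, DEMO_IFNAMSIZ, "%s%d", prefix, i);
        if (!name_in_use(out, used, n_used))
            return 0;
    }
    return -1;
}
```

With "lo", "eth0" and "dev0" already present, a device arriving as eth0 would end up as dev1, while a device arriving as etun1 would keep its name.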


>  -- Daniel
>
> ps : nice patchset

Thanks.

Eric


Re: [PATCH RFC 22/31] net: Add network namespace clone support.

2007-02-28 Thread Eric W. Biederman

Daniel Lezcano <[EMAIL PROTECTED]> writes:


>> +
>> +mutex_lock(&net_mutex);
>> +err = setup_net(new_net);
>> +if (err)
>> +goto out_unlock;
>>
> Should we "net_free" in case of error ?

Oops.  Yes we should.
Thanks.

>> +net_lock();
>> +net_list_append(new_net);
>> +net_unlock();
>> +
>> +tsk->nsproxy->net_ns = new_net;
>> +
>> +out_unlock:
>> +mutex_unlock(&net_mutex);
net_free(new_net);
>> +out:
>> +put_net(old_net);
>> +return err;
>> +}
>> +
>>

Eric
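
The error path under discussion follows the usual goto-unwind idiom. A minimal userspace sketch, assuming hypothetical demo_* helpers (none of these are kernel functions, and locking is left out), frees the new object only when setup fails:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_net { int initialized; };

static int demo_live_nets;        /* objects allocated and not yet freed */
static int demo_force_setup_fail; /* set to 1 to simulate a setup error */

static struct demo_net *demo_net_alloc(void)
{
    struct demo_net *net = calloc(1, sizeof(*net));

    if (net)
        demo_live_nets++;
    return net;
}

static void demo_net_free(struct demo_net *net)
{
    free(net);
    demo_live_nets--;
}

static int demo_setup_net(struct demo_net *net)
{
    if (demo_force_setup_fail)
        return -EINVAL;
    net->initialized = 1;
    return 0;
}

/* Allocate and set up a new object; on failure nothing is leaked. */
static int demo_copy_net(struct demo_net **slot)
{
    struct demo_net *new_net;
    int err;

    assert(slot != NULL);

    new_net = demo_net_alloc();
    if (!new_net)
        return -ENOMEM;

    err = demo_setup_net(new_net);
    if (err)
        goto out_free;

    *slot = new_net;          /* success: ownership passes to the caller */
    return 0;

out_free:
    demo_net_free(new_net);   /* failure: nothing else references new_net */
    return err;
}
```

The success path returns before the cleanup label, so the free runs only on failure.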

Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread Eric Dumazet

On Wednesday 28 February 2007 15:23, John wrote:
> Eric Dumazet wrote:
> >> John wrote:
> >>> I know it's possible to have Linux timestamp incoming datagrams as soon
> >>> as they are received, then for one to retrieve this timestamp later
> >>> with an ioctl command or a recvmsg call.
> >>
> >> Has it ever been proposed to modify struct skb_timeval to hold
> >> nanosecond stamps instead of just microsecond stamps? Then make the
> >> improved precision somehow available to user space.
> >
> > Most modern NICS are able to delay packet delivery, in order to reduce
> > number of interrupts and benefit from better cache hits.
>
> You are referring to NAPI interrupt mitigation, right?

Nope; I am referring to hardware features. NAPI is software.

See ethtool -c eth0

# ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 100
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 300
rx-frames: 60
rx-usecs-irq: 300
rx-frames-irq: 60

tx-usecs: 200
tx-frames: 53
tx-usecs-irq: 200
tx-frames-irq: 53

You can see on this setup, rx interrupts can be delayed up to 300 us (up to 60 
packets might be delayed)

>
> POSIX is moving to nanoseconds interfaces.
> http://www.opengroup.org/onlinepubs/009695399/functions/clock_settime.html

The fact that you are able to give nanosecond timestamps inside kernel is not 
sufficient. It is necessary of course, but not sufficient. This precision is 
OK to time locally generated events. The moment you ask a 'nanosecond' 
timestamp, it's usually long before/after the real event.

If you rely on nanosecond precision on network packets, then something is
wrong with your algo. Even rt patches won't make sure your cpu caches are
pre-filled, or that the routers/links between your machines are not busy.
A cache miss costs 40 ns for example. A typical interrupt handler or rx
processing can trigger 100 cache misses, or none at all if cache is hot.

You said that rt gives highest priority to interrupt handlers:
If you have several nics, what will happen if you receive packets on both
nics, or if the NIC interrupt happens at the same time as the timer interrupt?
One timestamp will be wrong for sure.

For sure we could timestamp packets with nanosecond resolution, and eventually 
with MONOTONIC value too, but it will give you (and others) false confidence 
on the real precision. us timestamps are already wrong...
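
The orders of magnitude quoted in this message can be worked through in a few lines. The sketch below only does the arithmetic on the figures given above (rx-usecs of 300, 100 cache misses at 40 ns each); the function names are hypothetical and nothing here measures a real system.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

#define NS_PER_US 1000u

/* Worst-case delay, in ns, added by interrupt coalescing (rx-usecs). */
static uint64_t coalesce_delay_ns(uint32_t rx_usecs)
{
    return (uint64_t)rx_usecs * NS_PER_US;
}

/* Extra delay, in ns, caused by cache misses in the receive path. */
static uint64_t cache_miss_delay_ns(uint32_t misses, uint32_t ns_per_miss)
{
    return (uint64_t)misses * ns_per_miss;
}

/* The figures from the mail, checked at compile time. */
static_assert(300u * NS_PER_US == 300000u, "rx-usecs 300 is 300000 ns");
static_assert(100u * 40u == 4000u, "100 misses at 40 ns each is 4000 ns");
```

So coalescing alone can move a timestamp by up to 300000 ns, and a cold cache by about 4000 ns, both far above one nanosecond.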

[PATCH] spidernet: Fix problem sending IP fragments

2007-02-28 Thread Norbert Eicker

Hi,

I found out that the spidernet-driver is unable to send fragmented IP 
frames.

Let me just recall the basic structure of "normal" UDP/IP/Ethernet 
frames (that actually work):
 - It starts with the Ethernet header (dest MAC, src MAC, etc.)
 - The next part is occupied by the IP header (version info, length of 
packet, id=0, fragment offset=0, checksum, from / to address, etc.)
 - Then comes the UDP header (src / dest port, length, checksum)
 - Actual payload
 - Ethernet checksum

Now what's different for IP fragments:
 - The IP header has id set to some value (same for all fragments), 
offset is set appropriately (i.e. 0 for first fragment, following 
according to size of other fragments), size is the length of the frame.
 - UDP header is unchanged. I.e. length is according to full UDP 
datagram, not just the part within the actual frame! But this is only 
true within the first frame: all following frames don't have a valid 
UDP-header at all.
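
The fragment layout just described can be sketched as follows. This is a userspace illustration with hypothetical names (struct frag_desc, plan_fragments), not driver or stack code. It assumes the standard IPv4 rule that the fragment offset is counted in 8-byte units, and it plans the fragments for a given IP payload length (UDP header plus data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frag_desc {
    uint16_t offset_units;   /* fragment offset field, in 8-byte units */
    uint16_t payload_len;    /* bytes of IP payload in this fragment */
    int more_fragments;      /* MF flag: 1 on every fragment but the last */
};

/* Fill `out` with the fragment plan; returns the number of fragments. */
static size_t plan_fragments(size_t ip_payload_len, size_t mtu,
                             size_t ip_hdr_len,
                             struct frag_desc *out, size_t max_out)
{
    size_t per_frag, done = 0, n = 0;

    assert(mtu >= ip_hdr_len + 8);
    per_frag = (mtu - ip_hdr_len) & ~(size_t)7;   /* multiple of 8 bytes */

    while (done < ip_payload_len && n < max_out) {
        size_t left = ip_payload_len - done;
        size_t len = left < per_frag ? left : per_frag;

        out[n].offset_units = (uint16_t)(done / 8);
        out[n].payload_len = (uint16_t)len;
        out[n].more_fragments = (done + len < ip_payload_len);
        done += len;
        n++;
    }
    return n;
}
```

For a 3000-byte UDP payload (3008 bytes with the UDP header), an MTU of 1500 and a 20-byte IP header, this yields three fragments of 1480, 1480 and 48 bytes at offsets 0, 185 and 370. Only the first one contains the UDP header, and that header's length field still says 3008.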

The spidernet silicon seems to be quite intelligent: It's able to 
compute (IP / UDP / Ethernet) checksums on the fly and tests if frames 
are conforming to RFC -- at least conforming to RFC on complete frames.

But IP fragments are different as explained above:
I.e. for IP fragments containing part of a UDP datagram it sees 
incompatible length in the headers for IP and UDP in the first frame 
and, thus, skips this frame. But the content *is* correct for IP 
fragments. For all following frames it finds (most probably) no valid 
UDP header at all. But this *is* also correct for IP fragments.

The Linux IP-stack seems to be clever on this point. It expects the
spidernet to calculate the checksum (since the module claims to be able
to do so) and marks the skb's for "normal" frames accordingly
(ip_summed set to CHECKSUM_HW).
But for the IP fragments it does not expect the driver to be capable of
handling the frames appropriately. Thus all checksums are already
computed. This is also flagged within the skb (ip_summed set to
CHECKSUM_NONE).

Unfortunately the spidernet driver ignores these hints. It tries to send
the IP fragments of UDP datagrams as normal UDP/IP frames. Since they
have a different structure, the silicon detects them to be not
"well-formed" and skips them.

The following one-liner against 2.6.21-rc2 changes this behavior. If the 
IP-stack claims to have done the checksumming, the driver should not 
try to checksum (and analyze) the frame but send it as is.

Signed-off-by: Norbert Eicker <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 3b91af8..31507ac 100644
--- a/drivers/net/spider_net.c
+++ b/drivers/net/spider_net.c
@@ -719,7 +719,7 @@ spider_net_prepare_tx_descr(struct spide
SPIDER_NET_DESCR_CARDOWNED | 
SPIDER_NET_DMAC_NOCS;
spin_unlock_irqrestore(&chain->lock, flags);

-   if (skb->protocol == htons(ETH_P_IP))
+   if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_HW)
switch (skb->nh.iph->protocol) {
case IPPROTO_TCP:
hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
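
As a rough user-space illustration of the rule this patch enforces: the 
hardware is only asked to checksum a frame when the stack has left the 
checksum to it. The types and names below (tx_frame, pick_offload, the 
CSUM_* values) are invented for this sketch and are not the driver's; 
only the decision itself follows the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ip_summed states discussed above. */
enum csum_state { CSUM_NONE = 0, CSUM_HW = 1 };

/* Hypothetical, minimal view of the fields the driver looks at. */
struct tx_frame {
    bool is_ipv4;            /* models skb->protocol == htons(ETH_P_IP) */
    enum csum_state summed;  /* models skb->ip_summed */
    int l4_proto;            /* 6 = TCP, 17 = UDP */
};

enum offload { OFFLOAD_NONE, OFFLOAD_TCP, OFFLOAD_UDP };

/* Decide which checksum offload (if any) to request for a frame. */
enum offload pick_offload(const struct tx_frame *f)
{
    assert(f != NULL);

    /* Fragments arrive with the checksum already done: send them as is. */
    if (!f->is_ipv4 || f->summed != CSUM_HW)
        return OFFLOAD_NONE;

    switch (f->l4_proto) {
    case 6:
        return OFFLOAD_TCP;
    case 17:
        return OFFLOAD_UDP;
    default:
        return OFFLOAD_NONE;
    }
}
```

With this rule, a UDP fragment flagged CSUM_NONE is passed through 
untouched, while a complete UDP frame flagged CSUM_HW still gets the 
hardware checksum.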


Re: [PATCH RFC 22/31] net: Add network namespace clone support.

2007-02-28 Thread Daniel Lezcano


Eric W. Biederman wrote:

From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted

This patch allows you to create a new network namespace
using sys_clone(...).

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 include/linux/sched.h|1 +
 kernel/nsproxy.c |   11 +++
 net/core/net_namespace.c |   38 ++
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4463735..9e0f91a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -26,6 +26,7 @@
 #define CLONE_STOPPED  0x0200  /* Start in stopped state */
 #define CLONE_NEWUTS   0x0400  /* New utsname group? */
 #define CLONE_NEWIPC   0x0800  /* New ipcs */
+#define CLONE_NEWNET   0x2000  /* New network namespace */

 /*
  * Scheduling policies
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 4f3c95a..7861c4c 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 

 struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);
 EXPORT_SYMBOL_GPL(init_nsproxy);
@@ -70,6 +71,7 @@ struct nsproxy *dup_namespaces(struct nsproxy *orig)
get_ipc_ns(ns->ipc_ns);
if (ns->pid_ns)
get_pid_ns(ns->pid_ns);
+   get_net(ns->net_ns);
}

return ns;
@@ -117,10 +119,18 @@ int copy_namespaces(int flags, struct task_struct *tsk)
if (err)
goto out_pid;

+   err = copy_net(flags, tsk);
+   if (err)
+   goto out_net;
+
 out:
put_nsproxy(old_ns);
return err;

+out_net:
+   if (new_ns->pid_ns)
+   put_pid_ns(new_ns->pid_ns);
+
 out_pid:
if (new_ns->ipc_ns)
put_ipc_ns(new_ns->ipc_ns);
@@ -146,5 +156,6 @@ void free_nsproxy(struct nsproxy *ns)
put_ipc_ns(ns->ipc_ns);
if (ns->pid_ns)
put_pid_ns(ns->pid_ns);
+   put_net(ns->net_ns);
kfree(ns);
 }
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 93e3879..cc56105 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -175,6 +175,44 @@ out_undo:
goto out;
 }

+int copy_net(int flags, struct task_struct *tsk)
+{
+   net_t old_net = tsk->nsproxy->net_ns;
+   net_t new_net;
+   int err;
+
+   get_net(old_net);
+
+   if (!(flags & CLONE_NEWNET))
+   return 0;
+
+   err = -EPERM;
+   if (!capable(CAP_SYS_ADMIN))
+   goto out;
+
+   err = -ENOMEM;
+   new_net = net_alloc();
+   if (null_net(new_net))
+   goto out;
+
+   mutex_lock(&net_mutex);
+   err = setup_net(new_net);
+   if (err)
+   goto out_unlock;
  

Should we call net_free() on new_net in case of error?
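
One possible shape for that error path, as a user-space sketch only. 
The fake_* names are stand-ins invented here; they are not the real 
net_alloc()/setup_net()/net_free() from the patch set, whose exact 
signatures are not shown in this mail. The point is just the extra 
unwind label that releases the allocation when setup fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network namespace object. */
struct fake_net { int configured; };

/* Counts live objects so a leak on the error path is visible. */
static int live_nets;

static struct fake_net *fake_net_alloc(void)
{
    struct fake_net *n = calloc(1, sizeof(*n));

    if (n)
        live_nets++;
    return n;
}

static void fake_net_free(struct fake_net *n)
{
    free(n);
    live_nets--;
}

/* Setup step that can be forced to fail for the demonstration. */
static int fake_setup_net(struct fake_net *n, int fail)
{
    if (fail)
        return -ENOMEM;
    n->configured = 1;
    return 0;
}

/* Allocate and set up a namespace; free it again if setup fails. */
int fake_copy_net(struct fake_net **out, int fail_setup)
{
    struct fake_net *new_net;
    int err;

    assert(out != NULL);

    new_net = fake_net_alloc();
    if (!new_net)
        return -ENOMEM;

    err = fake_setup_net(new_net, fail_setup);
    if (err)
        goto out_free;

    *out = new_net;
    return 0;

out_free:
    fake_net_free(new_net);  /* the step the question above asks about */
    return err;
}
```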

+
+   net_lock();
+   net_list_append(new_net);
+   net_unlock();
+
+   tsk->nsproxy->net_ns = new_net;
+
+out_unlock:
+   mutex_unlock(&net_mutex);
+out:
+   put_net(old_net);
+   return err;
+}
+
 void pernet_modcopy(void *pnetdst, const void *src, unsigned long size)
 {
net_t net;
  



Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces

2007-02-28 Thread Daniel Lezcano


Eric W. Biederman wrote:

From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted

This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
a network device is local to a single network namespace and
should never be moved.  Useful for pseudo devices that we
need an instance in each network namespace (like the loopback
device) and for any device we find that cannot handle multiple
network namespaces so we may trap them in the initial network
namespace.

This patch introduces the function dev_change_net_namespace
a function used to move a network device from one network
namespace to another.  To the network device nothing
special appears to happen, to the components of the network
stack it appears as if the network device was unregistered
in the network namespace it is in, and a new device
was registered in the network namespace the device
was moved to.

This patch sets up a namespace device destructor that
upon the exit of a network namespace moves all of the
movable network devices  to the initial network namespace
so they are not lost.
  

If you:
* create etun0/etun1
* create a namespace
* move etun1 to this namespace
* rename etun1 to eth0
* kill the namespace

the former network device etun1 will be lost if the parent namespace 
already has an interface named eth0, because the names will conflict.
Perhaps the original name should be restored before moving the device 
back to the initial network namespace?
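
A small user-space model of that suggestion, assuming each device 
remembers the name it had before it was renamed. All names here 
(fake_dev, fake_ns, move_to_ns, orig_name) are hypothetical and only 
model the scenario; this is not code from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16
#define MAX_DEVS 8

/* Hypothetical device: current name plus the name it was created with. */
struct fake_dev {
    char name[NAME_LEN];
    char orig_name[NAME_LEN];
};

/* Hypothetical namespace: just a list of devices. */
struct fake_ns {
    const struct fake_dev *devs[MAX_DEVS];
    int count;
};

static bool ns_has_name(const struct fake_ns *ns, const char *name)
{
    int i;

    for (i = 0; i < ns->count; i++)
        if (strcmp(ns->devs[i]->name, name) == 0)
            return true;
    return false;
}

/*
 * Move dev into dst. If its current name is taken there, fall back to
 * the original name; return false only if that is taken as well.
 */
bool move_to_ns(struct fake_ns *dst, struct fake_dev *dev)
{
    assert(dst != NULL && dev != NULL);

    if (dst->count >= MAX_DEVS)
        return false;

    if (ns_has_name(dst, dev->name)) {
        if (ns_has_name(dst, dev->orig_name))
            return false;
        snprintf(dev->name, NAME_LEN, "%s", dev->orig_name);
    }

    dst->devs[dst->count++] = dev;
    return true;
}
```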


 -- Daniel

ps : nice patchset

Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread John


Eric Dumazet wrote:

> John wrote:
>
>> I know it's possible to have Linux timestamp incoming datagrams as soon
>> as they are received, then for one to retrieve this timestamp later with
>> an ioctl command or a recvmsg call.
>>
>> Has it ever been proposed to modify struct skb_timeval to hold
>> nanosecond stamps instead of just microsecond stamps? Then make the
>> improved precision somehow available to user space.
>
> Most modern NICS are able to delay packet delivery, in order to reduce number 
> of interrupts and benefit from better cache hits.

You are referring to NAPI interrupt mitigation, right?

AFAIU, it is possible to disable this feature.

I'm dealing with 200-4000 packets per second. I don't think I'd save 
much with interrupt mitigation. Please correct any misconception.

> Then kernel is not realtime and some delays can occur between the hardware 
> interrupt and the very moment we timestamp the packet. If CPU caches are 
> cold, even the instruction fetches could easily add some us.

I've applied the real-time patch.
http://rt.wiki.kernel.org/index.php/Main_Page
This doesn't make Linux hard real-time, but the interrupt handlers can 
run with the highest priority (even kernel threads are preempted).

> Enabling nanosecond stamps would be a lie to users, because real accuracy is 
> not nanosecond, but in the order of 10 us (at least)

POSIX is moving to nanoseconds interfaces.
http://www.opengroup.org/onlinepubs/009695399/functions/clock_settime.html

struct timeval and struct timespec take as much space (64 bits).

If the hardware can indeed manage sub-microsecond accuracy, a struct 
timeval forces the kernel to discard valuable information.

> If you depend on a < 50 us precision, then linux might be the wrong OS for 
> your application. Or maybe you need a NIC that is able to provide a timestamp 
> in the packet itself (well... along with the packet...) , so that kernel 
> latencies are not a problem.

Does Linux support NICs that can do that?

Regards.

Re: Run-time kfree check for correct cache [plus x86_64 APIC troubles]

2007-02-28 Thread Evgeniy Polyakov

On Wed, Feb 28, 2007 at 11:10:54AM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> On Wednesday 28 February 2007 10:02, Evgeniy Polyakov wrote:
> > Attached patch detects in run-time things like:
> > skb = alloc_skb();
> > kfree(skb);
> >
> > where provided to kfree pointer does not belong to kmalloc caches.
> > It is turned on when slab debug config option is enabled.
> >
> > When problem is detected, following warning is printed with hint to
> > what cache/function should be used instead:
> 
> It would be less expensive to add a flag 
> #define SLAB_KFREE_NOWARNING 0x0020UL
> 
> And OR this flags into cs->flags of all standard caches created by 
> kmem_cache_init() from malloc_sizes[]/cache_names[]
> 
> kfree() would then just test this flag.

That does not work - my x86_64 test machine fails badly with the following
patch applied:

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 1ef822e..acc3cfb 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -32,6 +32,7 @@ typedef struct kmem_cache kmem_cache_t __deprecated;
 #define SLAB_PANIC 0x0004UL /* Panic if kmem_cache_create() fails */
 #define SLAB_DESTROY_BY_RCU 0x0008UL /* Defer freeing slabs to RCU */
 #define SLAB_MEM_SPREAD 0x0010UL /* Spread some memory over cpuset */
+#define SLAB_KFREE_NOWARNING   0x0020UL /* Do not warn if object belongs to this cache and is freed via kfree */
 
 /* Flags passed to a constructor functions */
 #define SLAB_CTOR_CONSTRUCTOR  0x001UL /* If not set, then deconstructor */
diff --git a/mm/slab.c b/mm/slab.c
index 8fdaffa..313014e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -177,7 +177,8 @@
 SLAB_CACHE_DMA | \
 SLAB_MUST_HWCACHE_ALIGN | SLAB_STORE_USER | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
-SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
+SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD | \
+SLAB_KFREE_NOWARNING )
 #else
 # define CREATE_MASK   (SLAB_HWCACHE_ALIGN | \
 SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN | \
@@ -814,7 +815,7 @@ static size_t slab_mgmt_size(size_t nr_objs, size_t align)
  * Calculate the number of objects and left-over bytes for a given buffer size.
  */
 static void cache_estimate(unsigned long gfporder, size_t buffer_size,
-  size_t align, int flags, size_t *left_over,
+  size_t align, unsigned long flags, size_t *left_over,
   unsigned int *num)
 {
int nr_objs;
@@ -1466,7 +1467,8 @@ void __init kmem_cache_init(void)
sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,
sizes[INDEX_AC].cs_size,
ARCH_KMALLOC_MINALIGN,
-   ARCH_KMALLOC_FLAGS|SLAB_PANIC,
+   ARCH_KMALLOC_FLAGS|SLAB_PANIC|
+   SLAB_KFREE_NOWARNING,
NULL, NULL);
 
if (INDEX_AC != INDEX_L3) {
@@ -1474,7 +1476,8 @@ void __init kmem_cache_init(void)
kmem_cache_create(names[INDEX_L3].name,
sizes[INDEX_L3].cs_size,
ARCH_KMALLOC_MINALIGN,
-   ARCH_KMALLOC_FLAGS|SLAB_PANIC,
+   ARCH_KMALLOC_FLAGS|SLAB_PANIC|
+   SLAB_KFREE_NOWARNING,
NULL, NULL);
}
 
@@ -1492,7 +1495,8 @@ void __init kmem_cache_init(void)
sizes->cs_cachep = kmem_cache_create(names->name,
sizes->cs_size,
ARCH_KMALLOC_MINALIGN,
-   ARCH_KMALLOC_FLAGS|SLAB_PANIC,
+   ARCH_KMALLOC_FLAGS|SLAB_PANIC|
+   SLAB_KFREE_NOWARNING,
NULL, NULL);
}
 #ifdef CONFIG_ZONE_DMA
@@ -1501,7 +1505,7 @@ void __init kmem_cache_init(void)
sizes->cs_size,
ARCH_KMALLOC_MINALIGN,
ARCH_KMALLOC_FLAGS|SLAB_CACHE_DMA|
-   SLAB_PANIC,
+   SLAB_PANIC|SLAB_KFREE_NOWARNING,
NULL, NULL);
 #endif
sizes++;
@@ -2827,6 +2831,16 @@ static void kfree_debugcheck(const void *objp)
}
 }
 
+static void kfree_debug_cache_pointer(struct kmem_cache *cachep, const void *objp)
+{
+   if (!(cachep->flags & SLAB_KFREE_NOWARNING)) {
+   printk(KERN_ERR "kfree debug: obj: %p, li

Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread Eric Dumazet

On Wednesday 28 February 2007 14:37, John wrote:
> John wrote:
> > I know it's possible to have Linux timestamp incoming datagrams as soon
> > as they are received, then for one to retrieve this timestamp later with
> > an ioctl command or a recvmsg call.
>
> Has it ever been proposed to modify struct skb_timeval to hold
> nanosecond stamps instead of just microsecond stamps? Then make the
> improved precision somehow available to user space.

John, 

Most modern NICS are able to delay packet delivery, in order to reduce number 
of interrupts and benefit from better cache hits.

tg3 for example are able to delay up to 1024 us.

Then kernel is not realtime and some delays can occur between the hardware 
interrupt and the very moment we timestamp the packet. If CPU caches are 
cold, even the instruction fetches could easily add some us.

Enabling nanosecond stamps would be a lie to users, because real accuracy is 
not nanosecond, but in the order of 10 us (at least)

If you depend on a < 50 us precision, then linux might be the wrong OS for 
your application. Or maybe you need a NIC that is able to provide a timestamp 
in the packet itself (well... along with the packet...) , so that kernel 
latencies are not a problem.

Eric

Re: CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread John


John wrote:

> I know it's possible to have Linux timestamp incoming datagrams as soon
> as they are received, then for one to retrieve this timestamp later with
> an ioctl command or a recvmsg call.

Has it ever been proposed to modify struct skb_timeval to hold 
nanosecond stamps instead of just microsecond stamps? Then make the 
improved precision somehow available to user space.


On a related note, the comment for skb_set_timestamp() states:

/**
 * skb_set_timestamp - set timestamp of a skb
 * @skb: skb to set stamp of
 * @stamp: pointer to struct timeval to get stamp from
 *
 * Timestamps are stored in the skb as offsets to a base timestamp.
 * This function converts a struct timeval to an offset and stores
 * it in the skb.
 */

But there is no mention of an offset in the code:

static inline void skb_set_timestamp(
  struct sk_buff *skb, const struct timeval *stamp)
{
  skb->tstamp.off_sec  = stamp->tv_sec;
  skb->tstamp.off_usec = stamp->tv_usec;
}

Likewise for skb_get_timestamp:

/**
 * skb_get_timestamp - get timestamp from a skb
 * @skb: skb to get stamp from
 * @stamp: pointer to struct timeval to store stamp in
 *
 * Timestamps are stored in the skb as offsets to a base timestamp.
 * This function converts the offset back to a struct timeval and stores
 * it in stamp.
 */

static inline void skb_get_timestamp(
  const struct sk_buff *skb, struct timeval *stamp)
{
  stamp->tv_sec  = skb->tstamp.off_sec;
  stamp->tv_usec = skb->tstamp.off_usec;
}

Are the comments related to code that has since been modified?

Regards.

[PATCH 0/3]: NetXen 1G/10G Ethernet driver updates

2007-02-28 Thread Linsys Contractor Mithlesh Thukral

Hi All,

I will be sending updates to the NetXen 1G/10G Ethernet driver in subsequent mails.
The patches are against netdev#upstream.

Regards,
Mithlesh Thukral

[PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface

2007-02-28 Thread Florian Zumbiehl

Hi,

> Well, your opinions are welcome. Plus any hints as to how to fix this.
> I'd tend to simply(?) add some more fields to the
> {hash,get,set,delete}_item() functions in drivers/net/pppoe.c.
> But maybe there is some better way?

As no one seems to have an opinion on this: here is a patch that works
for me and that should solve the problem as far as that is easily
possible. It is based on the assumption that an interface's ifindex is
basically an alias for a local MAC address, so incoming packets are now
matched to sockets based on remote MAC, session ID, and the ifindex of the
interface the packet came in on/the socket was bound to by connect().

For relayed packets, the socket that's used for relaying is selected
based on destination MAC, session ID and the interface index of the
interface whose name currently matches the name requested by userspace
as the relaying source interface. The relaying part of the patch is
untested.
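
The matching rule can be pictured with a small stand-alone sketch. The 
struct and function names below (fake_session, find_session) are made up 
for illustration and are not the ones in drivers/net/pppoe.c; the sketch 
only shows a chain lookup keyed on the (session ID, remote MAC, ifindex) 
triple described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical session entry on a hash chain. */
struct fake_session {
    unsigned int sid;               /* PPPoE session ID */
    unsigned char remote[MAC_LEN];  /* remote MAC address */
    int ifindex;                    /* local interface the session is bound to */
    struct fake_session *next;
};

/* Walk one chain and return the entry matching all three key parts. */
const struct fake_session *find_session(const struct fake_session *chain,
                                        unsigned int sid,
                                        const unsigned char *remote,
                                        int ifindex)
{
    assert(remote != NULL);

    for (; chain != NULL; chain = chain->next) {
        if (chain->sid == sid &&
            chain->ifindex == ifindex &&
            memcmp(chain->remote, remote, MAC_LEN) == 0)
            return chain;
    }
    return NULL;
}
```

Two sessions with the same remote MAC and session ID can then coexist as 
long as they live on different interfaces, and a packet arriving on a 
third interface matches neither of them.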

Please note that I'd consider this a security fix for reasons outlined
in previous mails.

Florian

--- linux-2.6.20/drivers/net/pppoe.c.orig   2007-02-25 19:23:51.0 +0100
+++ linux-2.6.20/drivers/net/pppoe.c        2007-02-28 12:56:05.0 +0100
@@ -7,6 +7,12 @@
  *
  * Version:0.7.0
  *
+ * 070228 :Fix to allow multiple sessions with same remote MAC and same
+ * session id by including the local device ifindex in the
+ * tuple identifying a session. This also ensures packets can't
+ * be injected into a session from interfaces other than the one
+ * specified by userspace. Florian Zumbiehl <[EMAIL PROTECTED]>
+ * (Oh, BTW, this one is YYMMDD, in case you were wondering ...)
  * 220102 :Fix module use count on failure in pppoe_create, pppox_sk -acme
  * 030700 :Fixed connect logic to allow for disconnect.
  * 270700 :Fixed potential SMP problems; we must protect against
@@ -127,14 +133,14 @@
  *  Set/get/delete/rehash items  (internal versions)
  *
  **/
-static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr)
+static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr, int ifindex)
 {
int hash = hash_item(sid, addr);
struct pppox_sock *ret;
 
ret = item_hash_table[hash];
 
-   while (ret && !cmp_addr(&ret->pppoe_pa, sid, addr))
+   while (ret && !(cmp_addr(&ret->pppoe_pa, sid, addr) && ret->pppoe_dev->ifindex == ifindex))
ret = ret->next;
 
return ret;
@@ -147,21 +153,19 @@
 
ret = item_hash_table[hash];
while (ret) {
-   if (cmp_2_addr(&ret->pppoe_pa, &po->pppoe_pa))
+   if (cmp_2_addr(&ret->pppoe_pa, &po->pppoe_pa) && ret->pppoe_dev->ifindex == po->pppoe_dev->ifindex)
return -EALREADY;
 
ret = ret->next;
}
 
-   if (!ret) {
-   po->next = item_hash_table[hash];
-   item_hash_table[hash] = po;
-   }
+   po->next = item_hash_table[hash];
+   item_hash_table[hash] = po;
 
return 0;
 }
 
-static struct pppox_sock *__delete_item(unsigned long sid, char *addr)
+static struct pppox_sock *__delete_item(unsigned long sid, char *addr, int ifindex)
 {
int hash = hash_item(sid, addr);
struct pppox_sock *ret, **src;
@@ -170,7 +174,7 @@
src = &item_hash_table[hash];
 
while (ret) {
-   if (cmp_addr(&ret->pppoe_pa, sid, addr)) {
+   if (cmp_addr(&ret->pppoe_pa, sid, addr) && ret->pppoe_dev->ifindex == ifindex) {
*src = ret->next;
break;
}
@@ -188,12 +192,12 @@
  *
  **/
 static inline struct pppox_sock *get_item(unsigned long sid,
-unsigned char *addr)
+unsigned char *addr, int ifindex)
 {
struct pppox_sock *po;
 
read_lock_bh(&pppoe_hash_lock);
-   po = __get_item(sid, addr);
+   po = __get_item(sid, addr, ifindex);
if (po)
sock_hold(sk_pppox(po));
read_unlock_bh(&pppoe_hash_lock);
@@ -203,7 +207,15 @@
 
 static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp)
 {
-   return get_item(sp->sa_addr.pppoe.sid, sp->sa_addr.pppoe.remote);
+   struct net_device *dev = NULL;
+   int ifindex;
+
+   dev = dev_get_by_name(sp->sa_addr.pppoe.dev);
+   if(!dev)
+   return NULL;
+   ifindex = dev->ifindex;
+   dev_put(dev);
+   return get_item(sp->sa_addr.pppoe.sid, sp->sa_addr.pppoe.remote, ifindex);
 }
 
 static inline int set_item(struct pppox_sock *po)
@@ -220,12 +232,12 @@
return i;
 }
 
-static inline struct pppox_sock *delete_item(unsigned long sid, char *addr)
+static inline struct pppox_sock *delete_item(unsigned lo

[PATCH 2/3]: Fix second rmmod failure observed on PowerPC machines.

2007-02-28 Thread Linsys Contractor Mithlesh Thukral

NetXen: Fix second rmmod failure observed on PowerPC machines.

Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 netxen_nic_hw.c   |5 +++--
 netxen_nic_init.c |   23 +--
 netxen_nic_main.c |9 -
 3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c
index deec796..a2877f3 100644
--- a/drivers/net/netxen/netxen_nic_hw.c
+++ b/drivers/net/netxen/netxen_nic_hw.c
@@ -508,8 +508,8 @@ void netxen_nic_pci_change_crbwindow(str
 void netxen_load_firmware(struct netxen_adapter *adapter)
 {
int i;
-   long data, size = 0;
-   long flashaddr = NETXEN_FLASH_BASE, memaddr = NETXEN_PHANTOM_MEM_BASE;
+   u32 data, size = 0;
+   u32 flashaddr = NETXEN_FLASH_BASE, memaddr = NETXEN_PHANTOM_MEM_BASE;
u64 off;
void __iomem *addr;
 
@@ -951,6 +951,7 @@ void netxen_nic_flash_print(struct netxe
   netxen_nic_driver_name);
return;
}
+   *ptr32 = le32_to_cpu(*ptr32);
ptr32++;
addr += sizeof(u32);
}
diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c
index 2f96570..586d32b 100644
--- a/drivers/net/netxen/netxen_nic_init.c
+++ b/drivers/net/netxen/netxen_nic_init.c
@@ -38,13 +38,13 @@ #include "netxen_nic_hw.h"
 #include "netxen_nic_phan_reg.h"
 
 struct crb_addr_pair {
-   long addr;
-   long data;
+   u32 addr;
+   u32 data;
 };
 
 #define NETXEN_MAX_CRB_XFORM 60
 static unsigned int crb_addr_xform[NETXEN_MAX_CRB_XFORM];
-#define NETXEN_ADDR_ERROR ((unsigned long ) 0x )
+#define NETXEN_ADDR_ERROR (0x)
 
 #define crb_addr_transform(name) \
crb_addr_xform[NETXEN_HW_PX_MAP_CRB_##name] = \
@@ -252,10 +252,10 @@ void netxen_initialize_adapter_ops(struc
  * netxen_decode_crb_addr(0 - utility to translate from internal Phantom CRB
  * address to external PCI CRB address.
  */
-unsigned long netxen_decode_crb_addr(unsigned long addr)
+u32 netxen_decode_crb_addr(u32 addr)
 {
int i;
-   unsigned long base_addr, offset, pci_base;
+   u32 base_addr, offset, pci_base;
 
crb_addr_transform_setup();
 
@@ -756,7 +756,7 @@ int netxen_pinit_from_rom(struct netxen_
int n, i;
int init_delay = 0;
struct crb_addr_pair *buf;
-   unsigned long off;
+   u32 off;
 
/* resetall */
status = netxen_nic_get_board_info(adapter);
@@ -813,14 +813,13 @@ int netxen_pinit_from_rom(struct netxen_
if (verbose)
printk("%s: PCI: 0x%08x == 0x%08x\n",
   netxen_nic_driver_name, (unsigned int)
-  netxen_decode_crb_addr((unsigned long)
- addr), val);
+  netxen_decode_crb_addr(addr), val);
}
for (i = 0; i < n; i++) {
 
-   off = netxen_decode_crb_addr((unsigned long)buf[i].addr);
+   off = netxen_decode_crb_addr(buf[i].addr);
if (off == NETXEN_ADDR_ERROR) {
-   printk(KERN_ERR"CRB init value out of range %lx\n",
+   printk(KERN_ERR"CRB init value out of range %x\n",
buf[i].addr);
continue;
}
@@ -927,6 +926,10 @@ int netxen_initialize_adapter_offload(st
 void netxen_free_adapter_offload(struct netxen_adapter *adapter)
 {
if (adapter->dummy_dma.addr) {
+   writel(0, NETXEN_CRB_NORMALIZE(adapter,
+   CRB_HOST_DUMMY_BUF_ADDR_HI));
+   writel(0, NETXEN_CRB_NORMALIZE(adapter,
+   CRB_HOST_DUMMY_BUF_ADDR_LO));
pci_free_consistent(adapter->ahw.pdev,
NETXEN_HOST_DUMMY_DMA_SIZE,
adapter->dummy_dma.addr,
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 2227504..7d2525e 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -434,13 +434,11 @@ #endif
adapter->port_count++;
adapter->port[i] = port;
}
-#ifndef CONFIG_PPC64
writel(0, NETXEN_CRB_NORMALIZE(adapter, CRB_CMDPEG_STATE));
netxen_pinit_from_rom(adapter, 0);
udelay(500);
netxen_load_firmware(adapter);
netxen_phantom_init(adapter, NETXEN_NIC_PEG_TUNE);
-#endif
/*
 * delay a while to ensure that the Pegs are up & running.
 * Otherwise, we might see some flaky behaviour.
@@ -529,12 +527,13 @@ static void __devexit netxen_nic_remove(

[PATCH 1/3]: Updates, removal of unsupported features and minor bug fixes.

2007-02-28 Thread Linsys Contractor Mithlesh Thukral

NetXen: Updates, removal of unsupported features and minor bug fixes.

Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---
 netxen_nic.h  |4 +
 netxen_nic_ethtool.c  |  144 +-
 netxen_nic_main.c |4 -
 netxen_nic_phan_reg.h |3 +
 4 files changed, 34 insertions(+), 121 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 2807ef4..81742e4 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -72,6 +72,8 @@ #define NUM_FLASH_SECTORS (64)
 #define FLASH_SECTOR_SIZE (64 * 1024)
 #define FLASH_TOTAL_SIZE  (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE)
 
+#define PHAN_VENDOR_ID 0x4040
+
 #define RCV_DESC_RINGSIZE  \
(sizeof(struct rcv_desc) * adapter->max_rx_desc_count)
 #define STATUS_DESC_RINGSIZE   \
@@ -82,7 +84,7 @@ #define TX_RINGSIZE   \
(sizeof(struct netxen_cmd_buffer) * adapter->max_tx_desc_count)
 #define RCV_BUFFSIZE   \
(sizeof(struct netxen_rx_buffer) * rcv_desc->max_rx_desc_count)
-#define find_diff_among(a,b,range) ((a)<(b)?((b)-(a)):((b)+(range)-(a)))
+#define find_diff_among(a,b,range) ((a)<=(b)?((b)-(a)):((b)+(range)-(a)))
 
 #define NETXEN_NETDEV_STATUS   0x1
 #define NETXEN_RCV_PRODUCER_OFFSET 0
diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c
index 6252e9a..986ef98 100644
--- a/drivers/net/netxen/netxen_nic_ethtool.c
+++ b/drivers/net/netxen/netxen_nic_ethtool.c
@@ -82,8 +82,7 @@ static const struct netxen_nic_stats net
 #define NETXEN_NIC_STATS_LEN   ARRAY_SIZE(netxen_nic_gstrings_stats)
 
 static const char netxen_nic_gstrings_test[][ETH_GSTRING_LEN] = {
-   "Register_Test_offline", "EEPROM_Test_offline",
-   "Interrupt_Test_offline", "Loopback_Test_offline",
+   "Register_Test_on_offline",
"Link_Test_on_offline"
 };
 
@@ -394,19 +393,12 @@ netxen_nic_get_regs(struct net_device *d
}
 }
 
-static void
-netxen_nic_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
-{
-   wol->supported = WAKE_UCAST | WAKE_MCAST | WAKE_BCAST | WAKE_MAGIC;
-   /* options can be added depending upon the mode */
-   wol->wolopts = 0;
-}
-
 static u32 netxen_nic_test_link(struct net_device *dev)
 {
struct netxen_port *port = netdev_priv(dev);
struct netxen_adapter *adapter = port->adapter;
__u32 status;
+   int val;
 
/* read which mode */
if (adapter->ahw.board_type == NETXEN_NIC_GBE) {
@@ -415,11 +407,13 @@ static u32 netxen_nic_test_link(struct n
 NETXEN_NIU_GB_MII_MGMT_ADDR_PHY_STATUS,
 &status) != 0)
return -EIO;
-   else
-   return (netxen_get_phy_link(status));
+   else {
+   val = netxen_get_phy_link(status);
+   return !val;
+   }
} else if (adapter->ahw.board_type == NETXEN_NIC_XGBE) {
-   int val = readl(NETXEN_CRB_NORMALIZE(adapter, CRB_XG_STATE));
-   return val == XG_LINK_UP;
+   val = readl(NETXEN_CRB_NORMALIZE(adapter, CRB_XG_STATE));
+   return (val == XG_LINK_UP) ? 0 : 1;
}
return -EIO;
 }
@@ -606,100 +600,21 @@ netxen_nic_set_pauseparam(struct net_dev
 
 static int netxen_nic_reg_test(struct net_device *dev)
 {
-   struct netxen_port *port = netdev_priv(dev);
-   struct netxen_adapter *adapter = port->adapter;
-   u32 data_read, data_written, save;
-   __u32 mode;
-
-   /* 
-* first test the "Read Only" registers by writing which mode
-*/
-   netxen_nic_read_w0(adapter, NETXEN_NIU_MODE, &mode);
-   if (netxen_get_niu_enable_ge(mode)) {   /* GB Mode */
-   netxen_nic_read_w0(adapter,
-  NETXEN_NIU_GB_MII_MGMT_STATUS(port->portnum),
-  &data_read);
-
-   save = data_read;
-   if (data_read)
-   data_written = data_read & NETXEN_NIC_INVALID_DATA;
-   else
-   data_written = NETXEN_NIC_INVALID_DATA;
-   netxen_nic_write_w0(adapter,
-   NETXEN_NIU_GB_MII_MGMT_STATUS(port->
- portnum),
-   data_written);
-   netxen_nic_read_w0(adapter,
-  NETXEN_NIU_GB_MII_MGMT_STATUS(port->portnum),
-  &data_read);
-
-   if (data_written == data_read) {
-   netxen_nic_write_w0(adapter,
-   NETXEN_NIU_GB_MII_MGMT_STATUS(port->
- 
portnum),
-

Re: TCP minisock tcp_create_openreq_child() typo?

2007-02-28 Thread Arnaldo Carvalho de Melo


On 2/28/07, KOVACS Krisztian <[EMAIL PROTECTED]> wrote:
>
>   Hi,
>
>   While reading TCP minisock code I've found this suspiciously looking
> code fragment:
>
> - 8< -
> struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
> {
>     struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
>
>     if (newsk != NULL) {
>         const struct inet_request_sock *ireq = inet_rsk(req);
>         struct tcp_request_sock *treq = tcp_rsk(req);
>         struct inet_connection_sock *newicsk = inet_csk(sk);
>         struct tcp_sock *newtp;
> - 8< -
>
>   The above code initializes newicsk to inet_csk(sk), isn't that supposed
> to be inet_csk(newsk)?  As far as I can tell this might leave
> icsk_ack.last_seg_size zero even if we do have received data.


Good catch!

David, please apply the attached patch.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Thanks Krisztian!

- Arnaldo
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 30b1e52..6b5c64f 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -381,7 +381,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 	if (newsk != NULL) {
 		const struct inet_request_sock *ireq = inet_rsk(req);
 		struct tcp_request_sock *treq = tcp_rsk(req);
-		struct inet_connection_sock *newicsk = inet_csk(sk);
+		struct inet_connection_sock *newicsk = inet_csk(newsk);
 		struct tcp_sock *newtp;
 
 		/* Now setup tcp_sock */

Re: [PATCH 1/2] [TCP]: Add two new spurious RTO responses to FRTO

2007-02-28 Thread Jarek Poplawski

On 27-02-2007 16:50, Ilpo Järvinen wrote:
> New sysctl tcp_frto_response is added to select amongst these
...
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> @@ -762,15 +763,17 @@ __u32 tcp_init_cwnd(struct tcp_sock *tp,
>  }
>  
>  /* Set slow start threshold and cwnd not falling to slow start */
> -void tcp_enter_cwr(struct sock *sk)
> +void tcp_enter_cwr(struct sock *sk, const int set_ssthresh)
>  {
>   struct tcp_sock *tp = tcp_sk(sk);
> + const struct inet_connection_sock *icsk = inet_csk(sk);
>  
>   tp->prior_ssthresh = 0;
>   tp->bytes_acked = 0;
>   if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {

-   if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
+   if (icsk->icsk_ca_state < TCP_CA_CWR) {

Probably something for the next "BTW".

Regards,
Jarek P.

>   tp->undo_marker = 0;
> - tp->snd_ssthresh = inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> + if (set_ssthresh)
> + tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
...

TCP minisock tcp_create_openreq_child() typo?

2007-02-28 Thread KOVACS Krisztian


  Hi,

  While reading the TCP minisock code I've found this suspicious-looking
code fragment:

- 8< -
struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
{
    struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);

    if (newsk != NULL) {
        const struct inet_request_sock *ireq = inet_rsk(req);
        struct tcp_request_sock *treq = tcp_rsk(req);
        struct inet_connection_sock *newicsk = inet_csk(sk);
        struct tcp_sock *newtp;
- 8< -

  The above code initializes newicsk to inet_csk(sk), isn't that supposed
to be inet_csk(newsk)?  As far as I can tell this might leave
icsk_ack.last_seg_size zero even if we do have received data.

-- 
 Regards,
  Krisztian Kovacs

CLOCK_MONOTONIC datagram timestamps by the kernel

2007-02-28 Thread John


Hello,

I know it's possible to have Linux timestamp incoming datagrams as soon 
as they are received, then for one to retrieve this timestamp later with 
an ioctl command or a recvmsg call.


As far as I understand, one can either do

  const int on = 1;
  setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof on);

then use recvmsg()

or not set the SO_TIMESTAMP socket option and just call

  ioctl(sock, SIOCGSTAMP, &tv);

after each datagram has been received.
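
For reference, the first variant (SO_TIMESTAMP plus recvmsg()) might look 
roughly like this in user space. This is only a sketch: the helper names 
enable_rx_timestamp() and recv_stamped() are mine, and error handling is 
kept to a minimum. If no timestamp control message arrives, the stamp is 
left at zero.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask the kernel to attach a receive timestamp to every datagram. */
int enable_rx_timestamp(int sock)
{
    const int on = 1;

    return setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof on);
}

/*
 * Receive one datagram. Returns its length (or -1 on error) and copies
 * the SCM_TIMESTAMP control message, if present, into *stamp.
 */
ssize_t recv_stamped(int sock, void *buf, size_t len, struct timeval *stamp)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char raw[CMSG_SPACE(sizeof(struct timeval))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n;

    assert(stamp != NULL);

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.raw;
    msg.msg_controllen = sizeof ctrl.raw;

    n = recvmsg(sock, &msg, 0);
    if (n < 0)
        return n;

    stamp->tv_sec = 0;
    stamp->tv_usec = 0;
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMP)
            memcpy(stamp, CMSG_DATA(c), sizeof *stamp);
    }
    return n;
}
```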

SIOCGSTAMP
Return a struct timeval with the receive timestamp of the last
packet passed to the user. This is useful for accurate round trip time
measurements. See setitimer(2) for a description of struct timeval.


As far as I understand, this timestamp is given by the CLOCK_REALTIME 
clock. However, I would like to obtain a timestamp given by the 
CLOCK_MONOTONIC clock.


Relevant parts of the code (I think):

net/core/dev.c

void net_enable_timestamp(void)
{
  atomic_inc(&netstamp_needed);
}

void __net_timestamp(struct sk_buff *skb)
{
  struct timeval tv;

  do_gettimeofday(&tv);
  skb_set_timestamp(skb, &tv);
}

static inline void net_timestamp(struct sk_buff *skb)
{
  if (atomic_read(&netstamp_needed))
    __net_timestamp(skb);
  else {
    skb->tstamp.off_sec = 0;
    skb->tstamp.off_usec = 0;
  }
}

do_gettimeofday() just calls __get_realtime_clock_ts()

Would it be possible to replace do_gettimeofday() by ktime_get_ts() with 
the appropriate division by 1000 to convert the struct timespec back 
into a struct timeval?


void __net_timestamp(struct sk_buff *skb)
{
  struct timespec now;
  struct timeval tv;

  ktime_get_ts(&now);
  tv.tv_sec = now.tv_sec;
  tv.tv_usec = now.tv_nsec / 1000;
  skb_set_timestamp(skb, &tv);
}

How many apps / drivers would this break?

Is there perhaps a different way to achieve this?

Regards.


Re: Run-time kfree check for correct cache [was Re: [NET]: Fix kfree(skb)]

2007-02-28 Thread Eric Dumazet

On Wednesday 28 February 2007 10:02, Evgeniy Polyakov wrote:
> Attached patch detects in run-time things like:
> skb = alloc_skb();
> kfree(skb);
>
> where provided to kfree pointer does not belong to kmalloc caches.
> It is turned on when slab debug config option is enabled.
>
> When problem is detected, following warning is printed with hint to
> what cache/function should be used instead:

It would be less expensive to add a flag 
#define SLAB_KFREE_NOWARNING 0x0020UL

And OR this flag into cs->flags of all standard caches created by
kmem_cache_init() from malloc_sizes[]/cache_names[]

kfree() would then just test this flag.
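
A rough userspace sketch of the idea, under stated assumptions:
struct mock_cache is a made-up, heavily reduced stand-in for
struct kmem_cache, and the bit value chosen for the flag here is only
illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name from the mail; the bit value is illustrative only. */
#define SLAB_KFREE_NOWARNING 0x1UL

/* Hypothetical, heavily reduced stand-in for struct kmem_cache. */
struct mock_cache {
    const char *name;
    unsigned long flags;
};

/*
 * Returns 1 if an object from this cache should not be released with
 * kfree(), i.e. the cache was not marked as a standard kmalloc cache.
 */
static int kfree_wrong_cache(const struct mock_cache *cachep)
{
    assert(cachep != NULL);
    return (cachep->flags & SLAB_KFREE_NOWARNING) == 0;
}
```

The check is a single bit test, with no walk over the size table.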


Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jarek Poplawski

On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> On 28-02-2007 02:27, Jean Tourrilhes wrote:
...
> > +   /* This function is only used for network interface.
> > +* Some hotplug package track interfaces by their name and
> > +* therefore want to know when the name is changed by the user. */
> > +   if(!error)
> > +   kobject_uevent_env(&class_dev->kobj, KOBJ_RENAME, envp);
> > +
> > class_device_put(class_dev);
> >  
> > +   kfree(devname_string);
> 
> Maybe I'm missing something, but it seems kobject_uevent_env copies
> pointers from envp instead of buffers' contents.

And it's enough - sorry.

Jarek P.

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Jarek Poplawski

On 28-02-2007 02:27, Jean Tourrilhes wrote:
>   Hi all,
...
>   Patch for 2.6.20 is attached. The patch was tested on a system
> running the hotplug scripts, and on another system running udev.
> 
>   Have fun...
> 
>   Jean
> 
> Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> 
> -
...
> diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800
> +++ linux/net/core/net-sysfs.c2007-02-27 15:06:49.0 -0800
> @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
>   if ((size <= 0) || (i >= num_envp))
>   return -ENOMEM;
>  
> + /* pass ifindex to uevent.
> +  * ifindex is useful as it won't change (interface name may change)
> +  * and is what RtNetlink uses natively. */
> + envp[i++] = buf;
> + n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> + buf += n;
> + size -= n;
> +
> + if ((size <= 0) || (i >= num_envp))

Btw.:
1. if size == 10 and snprintf returns 9 (not counting the terminating NUL)
   then n == 10 (counting the NUL), so isn't it enough (here and above):
 
if ((size < 0) || (i >= num_envp))

2. shouldn't there be (here and above):
 
envp[--i] = NULL;
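
The arithmetic in point 1 can be checked in userspace. A small sketch
(the helper name remaining_after_ifindex is made up): it performs the
same snprintf()-plus-one bookkeeping as the patch and returns the space
left, which is 0 for an exact fit and negative only on truncation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes "IFINDEX=<ifindex>" into buf (capacity: size bytes) and returns
 * the remaining space after accounting for the terminating NUL, using
 * the same bookkeeping as the patch. A negative result means truncation.
 */
static int remaining_after_ifindex(char *buf, int size, int ifindex)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, (size_t)size, "IFINDEX=%d", ifindex) + 1;
    return size - n;
}
```

With a 10-byte buffer, ifindex 1 fits exactly and leaves 0 bytes, while
ifindex 12 needs 11 bytes and yields -1.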

> + return -ENOMEM;
> +
>   envp[i] = NULL;
>   return 0;
>  }
...
> diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> --- linux/drivers/base/class.j1.c 2007-02-26 18:38:10.0 -0800
> +++ linux/drivers/base/class.c2007-02-27 15:52:37.0 -0800
> @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
>  {
>   int error = 0;
>   char *old_class_name = NULL, *new_class_name = NULL;
> + char *devname_string = NULL;
> + char *envp[2];
>  
>   class_dev = class_device_get(class_dev);
>   if (!class_dev)
> @@ -849,6 +851,15 @@ int class_device_rename(struct class_dev
>   pr_debug("CLASS: renaming '%s' to '%s'\n", class_dev->class_id,
>new_name);
>  
> + devname_string = kmalloc(strlen(class_dev->class_id) + 15, GFP_KERNEL);
> + if (!devname_string) {
> + class_device_put(class_dev);
> + return -ENOMEM;
> + }
> + sprintf(devname_string, "INTERFACE_OLD=%s", class_dev->class_id);
> + envp[0] = devname_string;
> + envp[1] = NULL;
> +
>  #ifdef CONFIG_SYSFS_DEPRECATED
>   if (class_dev->dev)
>   old_class_name = make_class_name(class_dev->class->name,
> @@ -868,8 +879,16 @@ int class_device_rename(struct class_dev
>   sysfs_remove_link(&class_dev->dev->kobj, old_class_name);
>   }
>  #endif
> +
> + /* This function is only used for network interface.
> +  * Some hotplug package track interfaces by their name and
> +  * therefore want to know when the name is changed by the user. */
> + if(!error)
> + kobject_uevent_env(&class_dev->kobj, KOBJ_RENAME, envp);
> +
>   class_device_put(class_dev);
>  
> + kfree(devname_string);

Maybe I'm missing something, but it seems kobject_uevent_env copies
pointers from envp instead of buffers' contents.

Regards,
Jarek P.

Re: [PATCH 2/2] ehea: NAPI multi queue TX/RX path for SMP

2007-02-28 Thread Jan-Bernd Themann

Hi,

> >  
> > +static inline int ehea_hash_skb(struct sk_buff *skb, int num_qps)
> > +{
> > +   u32 tmp;
> > +   if ((skb->nh.iph->protocol == IPPROTO_TCP)
> > +   && skb->protocol == ETH_P_IP) {
> 
> skb->protocol has network byte order. The ETH_P_IP test should also
> logically come before checking the IP protocol.
> 

fixed.

> > +   tmp = (skb->h.th->source + (skb->h.th->dest << 16)) % 31;
> 
> Only locally generated packets have a valid h.th pointer.
> 
good point. I'll fix that.

I'll send a new patch set later today
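
For the byte-order and ordering remarks, here is a small userspace
sketch. It is not the driver code: is_ipv4_tcp and the MOCK_ constants
are made up for illustration, and the constants simply mirror the
values of ETH_P_IP and IPPROTO_TCP.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>

#define MOCK_ETH_P_IP    0x0800 /* same value as ETH_P_IP */
#define MOCK_IPPROTO_TCP 6      /* same value as IPPROTO_TCP */

/*
 * eth_proto is in network byte order, as skb->protocol is; ip_proto is
 * the one-byte IP protocol field. The Ethernet type is tested first so
 * that the IP header is only interpreted for IPv4 frames.
 */
static int is_ipv4_tcp(uint16_t eth_proto, uint8_t ip_proto)
{
    if (eth_proto != htons(MOCK_ETH_P_IP))
        return 0;
    return ip_proto == MOCK_IPPROTO_TCP;
}
```

Comparing against htons() of the constant keeps the test correct on
both little-endian and big-endian machines.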

Thanks,
Jan-Bernd

Re: [PATCH 2.6.20] kobject net ifindex + rename

2007-02-28 Thread Johannes Berg

Hi,

>   Patch for 2.6.20 is attached.

... and in the meantime netdevices aren't class_device any more :) IOW,
your patch isn't going to work any more. Also, I think wireless could
benefit from this as well.

> The kobject framework is well designed, so adding these
> features is trivial change and won't run the risk of breaking anything
> (famous last words). Obviously, hotplug apps are free to ignore those
> additional features.

Why not just add this to base kobject_rename instead? That way,
userspace is notified for all renames in sysfs.
The patch then collapses down to the change in net's sysfs code to add
the ifindex to the environment, and another change in kobject to invoke
a new event when a name changes and show the old name.

johannes



Run-time kfree check for correct cache [was Re: [NET]: Fix kfree(skb)]

2007-02-28 Thread Evgeniy Polyakov

The attached patch detects, at run time, things like:
skb = alloc_skb();
kfree(skb);

where the pointer passed to kfree() does not belong to the kmalloc caches.
It is turned on when the slab debug config option is enabled.

When a problem is detected, the following warning is printed, with a hint
about which cache/function should be used instead:

[  168.085641] bhtest_init: skb: 81003e791478.
[  168.085698] kfree debug: i: 4, size: 15, caches: malloc: 81000119d8c0, dma: 81000119e100, free: 81003f19c940.
[  168.085776] kfree debug: likely you want to use something with 'skbuff_head_cache' in name instead of kfree().
[  168.085853] BUG: at mm/slab.c:2847 kfree_debug_cahce_pointer()
[  168.085907]
[  168.085907] Call Trace:
[  168.086008]  [] kfree+0xfd/0x274
[  168.086064]  [] :bhtest:bhtest_init+0x38/0x3f
[  168.086122]  [] sys_init_module+0x163d/0x179d
[  168.086183]  [] filp_close+0x5d/0x65
[  168.086240]  [] system_call+0x7e/0x83
[  168.086295]

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/mm/slab.c b/mm/slab.c
index c610062..acd3871 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2829,6 +2829,27 @@ static void kfree_debugcheck(const void *objp)
}
 }
 
+static void kfree_debug_cahce_pointer(struct kmem_cache *cachep, void *objp)
+{
+   int size = obj_size(cachep), i;
+   struct cache_sizes *cs;
+
+   for (i=0; ics_size)
+   break;
+   }
+   if ((i == ARRAY_SIZE(malloc_sizes)) ||
+   (cs->cs_cachep != cachep && cs->cs_dmacachep != cachep)) {
+   printk("kfree debug: i: %d, size: %u, caches: malloc: %p, dma: %p, free: %p.\n",
+   i, ARRAY_SIZE(malloc_sizes), cs->cs_cachep, cs->cs_dmacachep,
+   cachep);
+   printk("kfree debug: likely you want to use something with '%s' in name instead of kfree().\n",
+   cachep->name);
+   WARN_ON(1);
+   }
+}
+
 static inline void verify_redzone_free(struct kmem_cache *cache, void *obj)
 {
unsigned long redzone1, redzone2;
@@ -2940,6 +2961,7 @@ bad:
 }
 #else
 #define kfree_debugcheck(x) do { } while(0)
+#define kfree_debug_cahce_pointer(x, y) do { } while(0)
 #define cache_free_debugcheck(x,objp,z) (objp)
 #define check_slabp(x,y) do { } while(0)
 #endif
@@ -3757,6 +3779,7 @@ void kfree(const void *objp)
local_irq_save(flags);
kfree_debugcheck(objp);
c = virt_to_cache(objp);
+   kfree_debug_cahce_pointer(c, objp);
debug_check_no_locks_freed(objp, obj_size(c));
__cache_free(c, (void *)objp);
local_irq_restore(flags);

-- 
Evgeniy Polyakov
