[PATCH (RESEND)] [USBNET] DM9601: Add Corega FEther USB-TXC support.
Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
---
 drivers/usb/net/dm9601.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/net/dm9601.c b/drivers/usb/net/dm9601.c
index 4a932e1..c0bc52b 100644
--- a/drivers/usb/net/dm9601.c
+++ b/drivers/usb/net/dm9601.c
@@ -571,6 +571,10 @@ static const struct driver_info dm9601_info = {
 static const struct usb_device_id products[] = {
 	{
+	 USB_DEVICE(0x07aa, 0x9601),	/* Corega FEther USB-TXC */
+	 .driver_info = (unsigned long)&dm9601_info,
+	 },
+	{
 	 USB_DEVICE(0x0a46, 0x9601),	/* Davicom USB-100 */
 	 .driver_info = (unsigned long)&dm9601_info,
 	 },

-- 
YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 10:45:41AM -0800, Jean Tourrilhes wrote:
> On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> > On 28-02-2007 02:27, Jean Tourrilhes wrote:
> > > 	Hi all,
> > ...
> > > 	Patch for 2.6.20 is attached. The patch was tested on a system
> > > running the hotplug scripts, and on another system running udev.
> > >
> > > 	Have fun...
> > >
> > > 	Jean
> > >
> > > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> > >
> > > -
> > ...
> > > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> > > --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800
> > > +++ linux/net/core/net-sysfs.c 2007-02-27 15:06:49.0 -0800
> > > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
> > >  	if ((size <= 0) || (i >= num_envp))
> > >  		return -ENOMEM;
> > >
> > > +	/* pass ifindex to uevent.
> > > +	 * ifindex is useful as it won't change (interface name may change)
> > > +	 * and is what RtNetlink uses natively. */
> > > +	envp[i++] = buf;
> > > +	n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> > > +	buf += n;
> > > +	size -= n;
> > > +
> > > +	if ((size <= 0) || (i >= num_envp))
> >
> > Btw.:
> > 1. if size == 10 and snprintf returns 9 (without NULL)
> >    then n == 10 (with NULL), so isn't it enough (here and above):
> >
> > 	if ((size < 0) || (i >= num_envp))
>
> 	I just cut'n'pasted the code a few line above. If the original
> code is incorrect, it need fixing. And it will need fixing in probably
> a lot of places.

I think you're kind of responsible for your part, at least.

> > 2. shouldn't there be (here and above):
> >
> > 	envp[--i] = NULL;
> >
>
> 	No, envp is local, so who cares.

But envp[i] isn't (at least here). So, I guess, a caller of this
function could care.

> > > +	if ((size <= 0) || (i >= num_envp))
> > > +		return -ENOMEM;

And one more thing (not necessarily for you): ENOBUFS is probably
more adequate here.

Cheers,
Jarek P.
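The size bookkeeping discussed in the message above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel code: `struct env_state` and `env_add_int()` are hypothetical names invented for the sketch. It treats an exact fit as success (the `size < 0` reading suggested in point 1) and keeps the array NULL-terminated after every append (the concern in point 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state for building a uevent-style "KEY=value" array. */
struct env_state {
	char **envp;	/* array with num_envp slots */
	int num_envp;
	int i;		/* next free slot in envp */
	char *buf;	/* next free byte in the string buffer */
	int size;	/* bytes left at buf */
};

/*
 * Append "key=value" to the environment.
 * Returns 0 on success, -1 if the string or the terminating NULL slot
 * would not fit. On failure the state is left unchanged.
 */
static int env_add_int(struct env_state *st, const char *key, int value)
{
	int len;

	assert(st != NULL && key != NULL);

	/* One slot for the new string plus one for the terminating NULL. */
	if (st->size <= 0 || st->i + 1 >= st->num_envp)
		return -1;

	/* snprintf() returns the untruncated length, excluding the NUL. */
	len = snprintf(st->buf, (size_t)st->size, "%s=%d", key, value);
	if (len < 0 || len >= st->size)
		return -1;	/* error or truncated output */

	st->envp[st->i++] = st->buf;
	st->envp[st->i] = NULL;	/* keep the array terminated */
	st->buf += len + 1;
	st->size -= len + 1;	/* may legitimately reach exactly 0 */
	return 0;
}
```

With a 10-byte buffer, "IFINDEX=7" (9 characters plus the NUL) fits exactly and leaves `size == 0`; the original `size <= 0` check would have reported that case as an error.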
Re: [PATCH] bonding: replace system timer with work queue
On Wed, 28 Feb 2007 10:12:01 +0100 (CET) Jaroslav Kysela <[EMAIL PROTECTED]> wrote:

> Hi,
>
> please, review and apply to mm tree for further testing. The patch
> is also available at
> ftp://ftp.alsa-project.org/pub/kernel-patches/bonding-workqueue.patch .

Please cc netdev@vger.kernel.org on net-related patches, thanks.

> Thank you,
> Jaroslav
>
> ==
> bonding: replace system timer with work queue
>
> This patch replaces system timer with work queue in monitor functions.
> The reason for this change is that bonding handlers calls various
> sleeping functions from the timer handler which is not allowed.

Which sleeping functions? I'd have expected the kernel to spew runtime
warnings when this happens, but I don't recall any such reports.

> Because we cannot share the main workqueue threads (rtnl_lock is used
> also in linkwatch_event) - new bond workqueue thread is created.
>
> Signed-off-by: Jaroslav Kysela <[EMAIL PROTECTED]>
>
> diff -rupN linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c linux-2.6.20/drivers/net/bonding/bond_3ad.c
> --- linux-2.6.20.orig/drivers/net/bonding/bond_3ad.c 2007-02-04 19:44:54.0 +0100
> +++ linux-2.6.20/drivers/net/bonding/bond_3ad.c 2007-02-28 09:19:43.831369202 +0100
> @@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave
>   * times out, and it selects an aggregator for the ports that are yet not
>   * related to any aggregator, and selects the active aggregator for a bond.
>   */
> -void bond_3ad_state_machine_handler(struct bonding *bond)
> +void bond_3ad_state_machine_handler(struct work_struct *work)
>  {
> +	struct ad_bond_info *ad_info = container_of(work, struct ad_bond_info, ad_work.work);
> +	struct bonding *bond = (struct bonding *)((char *)ad_info - offsetof(struct bonding, ad_info));

We can use container_of here too?

> -void bond_alb_monitor(struct bonding *bond)
> +void bond_alb_monitor(struct work_struct *work)
>  {
> -	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
> +	struct alb_bond_info *bond_info = container_of(work, struct alb_bond_info, alb_work.work);
> +	struct bonding *bond = (struct bonding *)((char *)bond_info - offsetof(struct bonding, alb_info));

And here.

> +		cancel_rearming_delayed_workqueue(bond_wq, &(BOND_AD_INFO(bond).ad_work));
>  		break;
>  	case BOND_MODE_TLB:
>  	case BOND_MODE_ALB:
> -		del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
> +		cancel_rearming_delayed_workqueue(bond_wq, &(BOND_ALB_INFO(bond).alb_work));
>  		break;
>  	default:
>  		break;
> @@ -4289,6 +4272,14 @@ static int bond_init(struct net_device *
>  	rwlock_init(&bond->lock);
>  	rwlock_init(&bond->curr_slave_lock);
>
> +	/* initialize work */
> +	INIT_DELAYED_WORK(&bond->mii_work, (void *)&bond_mii_monitor);
> +	if (params->mode == BOND_MODE_ACTIVEBACKUP) {
> +		INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_activebackup_arp_mon);
> +	} else {
> +		INIT_DELAYED_WORK(&bond->arp_work, (void *)&bond_loadbalance_arp_mon);
> +	}

Can we lose the unneeded braces, the unneeded typecasts and fit the code
into 80 cols? yup.

>  	bond->params = *params; /* copy params struct */
>
>  	/* Initialize pointers */
> @@ -4782,6 +4773,12 @@ static int __init bonding_init(void)
>  		goto err;
>  	}
>
> +	bond_wq = create_singlethread_workqueue("bond");
> +	if (bond_wq == NULL) {
> +		res = -ENOMEM;
> +		goto err;
> +	}
> +
>  	res = bond_create_sysfs();
>  	if (res)
>  		goto err;
> @@ -4807,6 +4804,7 @@ static void __exit bonding_exit(void)
>
>  	rtnl_lock();
>  	bond_free_all();
> +	destroy_workqueue(bond_wq);
>  	bond_destroy_sysfs();
>  	rtnl_unlock();

Are you sure that all pending delayed works have been cancelled when we
destroy this workqueue?
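The review comment about replacing the open-coded offsetof() arithmetic can be illustrated with a small userspace sketch. The assumptions are loud ones: `container_of` is redefined locally as a stand-in for the kernel macro, the structures are heavily simplified stand-ins whose only shared feature with the real ones is the nesting (work inside delayed work inside ad_bond_info inside bonding), and `bond_from_ad_work()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro of the same name. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real kernel structures have more fields. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; long delay; };
struct ad_bond_info { int agg_select_timer; struct delayed_work ad_work; };
struct bonding { int id; struct ad_bond_info ad_info; };

/* Recover the enclosing bonding structure from the embedded work item. */
static struct bonding *bond_from_ad_work(struct work_struct *work)
{
	struct ad_bond_info *ad_info;

	assert(work != NULL);

	/* First step, as in the patch: work item -> ad_bond_info. */
	ad_info = container_of(work, struct ad_bond_info, ad_work.work);

	/* Second step with container_of instead of manual pointer math. */
	return container_of(ad_info, struct bonding, ad_info);
}
```

Both steps could also be collapsed into a single `container_of(work, struct bonding, ad_info.ad_work.work)`, since the nested member path has a fixed offset; whether that reads better is a style call.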
Re: [PATCH] bridge: avoid ptype_all packet handling
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 23:26:36 -0800

> sounds like a socket option would help, the data is already there. Then
> the normal UDP receive path would work.

That would be perfect for new applications.

But we have to support all the old ones, so we're stuck providing
correctly functioning AF_PACKET handling on all devices, sorry.

And in fact that effectively makes the new socket option pointless,
since it doesn't buy us anything since we have to support the old
stuff fully anyways.
Re: [PATCH] bridge: avoid ptype_all packet handling
David Miller wrote:
> From: Stephen Hemminger <[EMAIL PROTECTED]>
> Date: Wed, 28 Feb 2007 23:04:36 -0800
>
>> If an normal application has to use something like raw packet
>> filtering, it seems there is a missing API.
>
> I'm loosely following this discussion, but Ben mentions DHCP and I
> remember learning the other month that DHCP uses AF_PACKET and
> filtering instead of IP RAW sockets because it needs to get the MAC
> address and RAW sockets don't provide that.

sounds like a socket option would help, the data is already there. Then
the normal UDP receive path would work.
Re: [PATCH] bridge: avoid ptype_all packet handling
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 23:04:36 -0800

> If an normal application has to use something like raw packet
> filtering, it seems there is a missing API.

I'm loosely following this discussion, but Ben mentions DHCP and I
remember learning the other month that DHCP uses AF_PACKET and
filtering instead of IP RAW sockets because it needs to get the MAC
address and RAW sockets don't provide that.
Re: [PATCH] bridge: avoid ptype_all packet handling
Ben Greear wrote: Stephen Hemminger wrote: On Wed, 28 Feb 2007 17:28:09 -0800 Ben Greear <[EMAIL PROTECTED]> wrote: Stephen Hemminger wrote: I was measuring bridging/routing performance and noticed this. The current code runs the "all packet" type handlers before calling the bridge hook. If an application (like some DHCP clients) is using AF_PACKET, this means that each received packet gets run through the Berkeley Packet Filter code in sk_run_filter (slow). By moving the bridging hook to run first, the packets flowing through the bridge get filtered out there. This results in a 14% improvement in performance, but it does mean that some snooping applications would miss packets if being used on a bridge. The correct way to see all packets on a bridge is to set the bridge pseudo-device to promiscuous mode. Seems it would be better to fix these clients to be more selective as to where they bind. The problem is any use of BPF is a lose, if it has to be done to all traffic. Right, but couldn't you have the dhcp client bind to eth0, eth7, and br0 (ie, skipping the eth1-6 that comprise the bridge group?) The only difficulty I see is having the client know when new devices come and go, but there are probably ways to know that without keeping a whole lot of state or probing the /proc/net/dev (like my own bloated app does :)) I envision the client args to be something like --skip-devices "eth1 eth2 eth3 ..." I know you can bind raw packet sockets to individual devices, though I don't know much about BPF, so it's possible I'm wrong... The kernel has to deal with busted applications all the time. And each damn distro and configuration seems to invent it's own new way of doing network configuration. If an normal application has to use something like raw packet filtering, it seems there is a missing API. This breaks the case where you want to see packets on a particular interface, not just the entire bridge, right? It might be possible to use promisc counter to handle this. 
Not really, it's perfectly valid to sniff a port in non-promiscuous mode... The non-promiscuous mode packets still make it in through the normal receive path. The only packets that don't make it up the stack are those that are being bridged.
Re: [PATCH] e1000 stop raw interrupts disabled nag from RT
Kok, Auke wrote: Mark Huth wrote: Current e1000_xmit_frame spews raw interrupt disabled nag messages when used with RT kernel patches. This patch uses spin_trylock_irqsave, which allows RT patches to properly manage the irq semantics. Looks OK with me on first sight, I'll keep it on my stack and push it upstream after Jesse looks it over. Which -RT paches make this pop up btw? I'd like to repro it. Thanks, Auke Auke, Well, I'm not an expert on the realtime patches - but most any patch set from Ingo seems to set this off - we've run through a bunch all the way since 2.6.10. It's a standard warning to get drivers to not mess with the processor interrupt function, since RT threads both the hard and soft irqs, and except for rare instances, the drivers and critical region protection no longer require the processor interrupt to be off. The XX_irqsave functions get turned into pre-empt disable, which is adequate for most things. But the local_irq_save is recognized and warned, and then if it is really necessary it can be converted to an RT varient that won't warn. Signed-off-by: Mark Huth <[EMAIL PROTECTED]> --- diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index 619c892..48f94ee 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -3363,12 +3363,9 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) (adapter->hw.mac_type == e1000_82573)) e1000_transfer_dhcp_info(adapter, skb); -local_irq_save(flags); -if (!spin_trylock(&tx_ring->tx_lock)) { +if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags)) /* Collision - tell upper layer to requeue */ -local_irq_restore(flags); return NETDEV_TX_LOCKED; -} /* need: count + 2 desc gap to keep tail from touching * head, otherwise try next time */ - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -mm 3/5] Blackfin: on-chip ethernet MAC controller driver
Hi folks, Here is the blackfin on-chip ethernet MAC controller driver for Linux. It's name is blackfin-driver-net-stamp537.patch [PATCH] Blackfin: on-chip ethernet MAC controller driver This patch implements the driver necessary use the Analog Devices Blackfin processor's on-chip ethernet MAC controller. Signed-off-by: Bryan Wu <[EMAIL PROTECTED]> --- drivers/net/Kconfig| 44 ++ drivers/net/Makefile |1 drivers/net/bfin_mac.c | 988 + drivers/net/bfin_mac.h | 146 + 4 files changed, 1179 insertions(+) Index: linux-2.6/drivers/net/Kconfig === --- linux-2.6.orig/drivers/net/Kconfig 2007-03-01 11:39:14.0 +0800 +++ linux-2.6/drivers/net/Kconfig 2007-03-01 11:39:19.0 +0800 @@ -836,6 +836,50 @@ module, say M here and read as well as . +config BFIN_MAC + tristate "Blackfin 536/537 on-chip mac support" + depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H) + select CRC32 + select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE + help + This is the driver for blackfin on-chip mac device. Say Y if you want it + compiled into the kernel. This driver is also available as a module + ( = code which can be inserted in and removed from the running kernel + whenever you want). The module will be called bfin_mac. + +config BFIN_MAC_USE_L1 + bool "Use L1 memory for rx/tx packets" + depends on BFIN_MAC && BF537 + default y + help + To get maximum network performace, you should use L1 memory as rx/tx buffers. + Say N here if you want to reserve L1 memory for other uses. + +config BFIN_TX_DESC_NUM + int "Number of transmit buffer packets" + depends on BFIN_MAC + range 6 10 if BFIN_MAC_USE_L1 + range 10 100 + default "10" + help + Set the number of buffer packets used in driver. + +config BFIN_RX_DESC_NUM + int "Number of receive buffer packets" + depends on BFIN_MAC + range 20 100 if BFIN_MAC_USE_L1 + range 20 800 + default "20" + help + Set the number of buffer packets used in driver. 
+ +config BFIN_MAC_RMII + bool "RMII PHY Interface (EXPERIMENTAL)" + depends on BFIN_MAC && EXPERIMENTAL + default n + help + Use Reduced PHY MII Interface + config SMC9194 tristate "SMC 9194 support" depends on NET_VENDOR_SMC && (ISA || MAC && BROKEN) Index: linux-2.6/drivers/net/Makefile === --- linux-2.6.orig/drivers/net/Makefile 2007-03-01 11:33:24.0 +0800 +++ linux-2.6/drivers/net/Makefile 2007-03-01 11:39:19.0 +0800 @@ -195,6 +195,7 @@ obj-$(CONFIG_MYRI10GE) += myri10ge/ obj-$(CONFIG_SMC91X) += smc91x.o obj-$(CONFIG_SMC911X) += smc911x.o +obj-$(CONFIG_BFIN_MAC) += bfin_mac.o obj-$(CONFIG_DM9000) += dm9000.o obj-$(CONFIG_FEC_8XX) += fec_8xx/ obj-$(CONFIG_PASEMI_MAC) += pasemi_mac.o Index: linux-2.6/drivers/net/bfin_mac.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6/drivers/net/bfin_mac.c2007-03-01 11:39:19.0 +0800 @@ -0,0 +1,988 @@ +/* + * File: drivers/net/bfin_mac.c + * Based on: + * Author: Luke Yang <[EMAIL PROTECTED]> + * + * Created: + * Description: + * + * Rev: $Id: bfin_mac.c,v 1.60 2006/12/16 11:23:56 hennerich Exp $ + * + * Modified: + * Copyright 2004-2006 Analog Devices Inc. + * + * Bugs: Enter bugs at http://blackfin.uclinux.org/ + * + * This program is free software ; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation ; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY ; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program ; see the file COPYING. + * If not, write to the Free Software Foundation, + * 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include +#include +#include + +#include +#include +#include +#include + +#include +#include +#include + +#include "bfin_mac.h" + +#define CARDNAME "bfin_mac" + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Luke Yang"); +MODULE_DESCRIPTION("Blackfin MAC Driver"); + +#if defined(CONFIG_BFIN_MAC_USE
Re: [PATCH] bridge: avoid ptype_all packet handling
Stephen Hemminger wrote: On Wed, 28 Feb 2007 17:28:09 -0800 Ben Greear <[EMAIL PROTECTED]> wrote: Stephen Hemminger wrote: I was measuring bridging/routing performance and noticed this. The current code runs the "all packet" type handlers before calling the bridge hook. If an application (like some DHCP clients) is using AF_PACKET, this means that each received packet gets run through the Berkeley Packet Filter code in sk_run_filter (slow). By moving the bridging hook to run first, the packets flowing through the bridge get filtered out there. This results in a 14% improvement in performance, but it does mean that some snooping applications would miss packets if being used on a bridge. The correct way to see all packets on a bridge is to set the bridge pseudo-device to promiscuous mode. Seems it would be better to fix these clients to be more selective as to where they bind. The problem is any use of BPF is a lose, if it has to be done to all traffic. Right, but couldn't you have the dhcp client bind to eth0, eth7, and br0 (ie, skipping the eth1-6 that comprise the bridge group?) The only difficulty I see is having the client know when new devices come and go, but there are probably ways to know that without keeping a whole lot of state or probing the /proc/net/dev (like my own bloated app does :)) I envision the client args to be something like --skip-devices "eth1 eth2 eth3 ..." I know you can bind raw packet sockets to individual devices, though I don't know much about BPF, so it's possible I'm wrong... This breaks the case where you want to see packets on a particular interface, not just the entire bridge, right? It might be possible to use promisc counter to handle this. Not really, it's perfectly valid to sniff a port in non-promiscuous mode... 
Ben

-- 
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
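The suggestion above, that a client can bind its raw packet socket to individual devices instead of listening on all of them, might look roughly like this on Linux. It is a sketch under assumptions: `packet_bind_addr()` and `open_packet_socket_on()` are hypothetical helper names, the interface name is whatever the caller passes in, and opening the socket needs CAP_NET_RAW. The socket(), bind(), if_nametoindex() calls and `struct sockaddr_ll` are the standard packet(7) interface.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>		/* htons */
#include <net/if.h>		/* if_nametoindex */
#include <sys/socket.h>
#include <linux/if_ether.h>	/* ETH_P_IP and friends */
#include <linux/if_packet.h>	/* struct sockaddr_ll */

/* Build the link-layer address used to bind to a single interface. */
static struct sockaddr_ll packet_bind_addr(unsigned int ifindex,
					   unsigned short proto)
{
	struct sockaddr_ll sll;

	assert(ifindex <= INT_MAX);

	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(proto);	/* network byte order */
	sll.sll_ifindex = (int)ifindex;
	return sll;
}

/*
 * Open an AF_PACKET socket that only sees frames of one protocol on
 * one named interface. Returns the fd, or -1 on any failure.
 */
static int open_packet_socket_on(const char *ifname, unsigned short proto)
{
	unsigned int ifindex = if_nametoindex(ifname);
	struct sockaddr_ll sll;
	int fd;

	if (ifindex == 0)
		return -1;	/* no such interface */

	fd = socket(AF_PACKET, SOCK_RAW, htons(proto));
	if (fd < 0)
		return -1;	/* typically EPERM without CAP_NET_RAW */

	sll = packet_bind_addr(ifindex, proto);
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A client following the proposal would call `open_packet_socket_on()` once per device it cares about (for example eth0, eth7 and br0 in the scenario above) and skip the bridge member ports. Tracking devices that appear and disappear later is left out of the sketch.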
Re: [PATCH] bridge: avoid ptype_all packet handling
On Wed, 28 Feb 2007 17:28:09 -0800 Ben Greear <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > I was measuring bridging/routing performance and noticed this.
> >
> > The current code runs the "all packet" type handlers before calling
> > the bridge hook. If an application (like some DHCP clients) is
> > using AF_PACKET, this means that each received packet gets run
> > through the Berkeley Packet Filter code in sk_run_filter (slow).
> >
> > By moving the bridging hook to run first, the packets flowing
> > through the bridge get filtered out there. This results in a 14%
> > improvement in performance, but it does mean that some snooping
> > applications would miss packets if being used on a bridge. The
> > correct way to see all packets on a bridge is to set the bridge
> > pseudo-device to promiscuous mode.
>
> Seems it would be better to fix these clients to be more selective as
> to where they bind.

The problem is any use of BPF is a lose, if it has to be done to all
traffic.

> This breaks the case where you want to see packets on a particular
> interface, not just the entire bridge, right?

It might be possible to use promisc counter to handle this.
Re: [PATCH] e1000 stop raw interrupts disabled nag from RT
Mark Huth wrote: Current e1000_xmit_frame spews raw interrupt disabled nag messages when used with RT kernel patches. This patch uses spin_trylock_irqsave, which allows RT patches to properly manage the irq semantics. Looks OK with me on first sight, I'll keep it on my stack and push it upstream after Jesse looks it over. Which -RT paches make this pop up btw? I'd like to repro it. Thanks, Auke Signed-off-by: Mark Huth <[EMAIL PROTECTED]> --- diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c index 619c892..48f94ee 100644 --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -3363,12 +3363,9 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev) (adapter->hw.mac_type == e1000_82573)) e1000_transfer_dhcp_info(adapter, skb); - local_irq_save(flags); - if (!spin_trylock(&tx_ring->tx_lock)) { + if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags)) /* Collision - tell upper layer to requeue */ - local_irq_restore(flags); return NETDEV_TX_LOCKED; - } /* need: count + 2 desc gap to keep tail from touching * head, otherwise try next time */ - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] e1000 stop raw interrupts disabled nag from RT
Current e1000_xmit_frame spews raw interrupt disabled nag messages when
used with RT kernel patches. This patch uses spin_trylock_irqsave, which
allows RT patches to properly manage the irq semantics.

Signed-off-by: Mark Huth <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 619c892..48f94ee 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3363,12 +3363,9 @@ e1000_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
 	    (adapter->hw.mac_type == e1000_82573))
 		e1000_transfer_dhcp_info(adapter, skb);
 
-	local_irq_save(flags);
-	if (!spin_trylock(&tx_ring->tx_lock)) {
+	if (!spin_trylock_irqsave(&tx_ring->tx_lock, flags))
 		/* Collision - tell upper layer to requeue */
-		local_irq_restore(flags);
 		return NETDEV_TX_LOCKED;
-	}
 
 	/* need: count + 2 desc gap to keep tail from touching
 	 * head, otherwise try next time */
Re: [PATCH] bridge: avoid ptype_all packet handling
Stephen Hemminger wrote: I was measuring bridging/routing performance and noticed this. The current code runs the "all packet" type handlers before calling the bridge hook. If an application (like some DHCP clients) is using AF_PACKET, this means that each received packet gets run through the Berkeley Packet Filter code in sk_run_filter (slow). By moving the bridging hook to run first, the packets flowing through the bridge get filtered out there. This results in a 14% improvement in performance, but it does mean that some snooping applications would miss packets if being used on a bridge. The correct way to see all packets on a bridge is to set the bridge pseudo-device to promiscuous mode. Seems it would be better to fix these clients to be more selective as to where they bind. This breaks the case where you want to see packets on a particular interface, not just the entire bridge, right? Thanks, Ben Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/core/dev.c |7 --- 1 files changed, 4 insertions(+), 3 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index cf71614..dc2cda6 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1792,6 +1792,10 @@ int netif_receive_skb(struct sk_buff *skb) rcu_read_lock(); + if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) + goto out; + + #ifdef CONFIG_NET_CLS_ACT if (skb->tc_verd & TC_NCLS) { skb->tc_verd = CLR_TC_NCLS(skb->tc_verd); @@ -1826,9 +1830,6 @@ int netif_receive_skb(struct sk_buff *skb) ncls: #endif - if (handle_bridge(&skb, &pt_prev, &ret, orig_dev)) - goto out; - type = skb->protocol; list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) { if (ptype->type == type && -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] bridge: avoid ptype_all packet handling
I was measuring bridging/routing performance and noticed this.

The current code runs the "all packet" type handlers before calling
the bridge hook. If an application (like some DHCP clients) is
using AF_PACKET, this means that each received packet gets run
through the Berkeley Packet Filter code in sk_run_filter (slow).

By moving the bridging hook to run first, the packets flowing
through the bridge get filtered out there. This results in a 14%
improvement in performance, but it does mean that some snooping
applications would miss packets if being used on a bridge. The
correct way to see all packets on a bridge is to set the bridge
pseudo-device to promiscuous mode.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/core/dev.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index cf71614..dc2cda6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1792,6 +1792,10 @@ int netif_receive_skb(struct sk_buff *skb)
 
 	rcu_read_lock();
 
+	if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
+		goto out;
+
+
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
@@ -1826,9 +1830,6 @@ int netif_receive_skb(struct sk_buff *skb)
 ncls:
 #endif
 
-	if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
-		goto out;
-
 	type = skb->protocol;
 	list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
 		if (ptype->type == type &&
-- 
1.4.4.2
Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable
On Wed, Feb 28, 2007 at 05:08:59PM -0800, Jay Vosburgh wrote:
> >
> >That sounds like a nice add-on to the existing functionality. I can see
> >the value in something dynamic like that, but I can also see the value
> >in something static like the functionality we have. Did you plan to
> >keep the existing functionality intact or just have it done dynamically?
>
> 	Well, I posted the patch just a bit ago, so you can see for
> yourself, but no, it removes the existing "copy IGMP everywhere"

I see them now -- I'll check them out and make any comments there.
PCIe NICs that correctly suspend/resume
We're trying to make the Attansic L1 NIC correctly suspend, resume, and
wake-on-lan. Can someone point me to a PCIe-based NIC driver in the
kernel tree that correctly does these things? I'd like to see how it's
*supposed* to be done.

Thanks,
Jay
Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable
Andy Gospodarek <[EMAIL PROTECTED]> wrote:

>On Wed, Feb 28, 2007 at 02:39:42PM -0800, Jay Vosburgh wrote:
[...]
>> 	Why would you want to turn this off?
>
>When you connect active-backup bonds to 2 separate switches that are in
>'distant' parts of the network you can end up with a bunch of unwanted
>multicast data flowing everywhere and if you don't care whether or not
>your multicast traffic is highly available then it just seems like
>noise. I thought the flexibility seemed nice.

	Ok, I can buy the "multicast spew" argument.

>> 	Also, I've got a replacement patch for this functionality that
>> seems to be better in all regards. It sends bonus IGMP joins when a
>> failover occurs, rather than simply duplicating them on all slaves (the
>> current system can leave switches in the dark if the slaves fail back to
>> the originals). As chance would have it, I'm planning to post it as
>> part of a set in a a little while.
>>
>
>That sounds like a nice add-on to the existing functionality. I can see
>the value in something dynamic like that, but I can also see the value
>in something static like the functionality we have. Did you plan to
>keep the existing functionality intact or just have it done dynamically?

	Well, I posted the patch just a bit ago, so you can see for
yourself, but no, it removes the existing "copy IGMP everywhere"
behavior. I couldn't really think of an advantage to flooding
everywhere all the time if the hose is re-aimed during failover (if
you'll pardon my cheesy metaphor).

>Is this separate from your workqueue/refactoring patch or does it work
>on the existing code?

	This is separate, for the current mainline.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
[PATCH 2/3] bonding: only receive ARPs for us
The ARP validation code only needs ARPs for the bonding device.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f263ac..7ec6121 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3427,7 +3427,7 @@ void bond_register_arp(struct bonding *b
 		return;

 	pt->type = htons(ETH_P_ARP);
-	pt->dev = NULL; /*bond->dev;XXX*/
+	pt->dev = bond->dev;
 	pt->func = bond_arp_rcv;
 	dev_add_pack(pt);
 }
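Editorial sketch (not part of the posted patch): the one-line change above relies on how a packet handler's dev field narrows delivery. The userspace model below assumes the receive path treats a NULL dev as a wildcard and otherwise requires the receiving device to match; the struct and function names are hypothetical stand-ins, not kernel identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct dev_model {
	int ifindex;
};

/* Hypothetical stand-in for struct packet_type, reduced to two fields. */
struct ptype_model {
	unsigned short type;            /* protocol, e.g. 0x0806 for ARP */
	const struct dev_model *dev;    /* NULL acts as "any device" */
};

/*
 * Returns nonzero if a frame of protocol 'type' received on 'rx_dev'
 * would be handed to this handler under the assumed matching rule.
 */
static int ptype_matches(const struct ptype_model *pt, unsigned short type,
			 const struct dev_model *rx_dev)
{
	return pt->type == type && (pt->dev == NULL || pt->dev == rx_dev);
}
```

Under this model, a handler registered with dev set to the bond sees only ARPs received on the bond, whereas the previous NULL setting would match ARPs arriving on every interface in the system.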
[PATCH 3/3] bonding: Improve IGMP join processing
In active-backup mode, the current bonding code duplicates IGMP traffic to all slaves, so that switches are up to date in case of a failover from an active to a backup interface. If bonding then fails back to the original active interface, it is likely that the "active slave" switch's IGMP forwarding for the port will be out of date until some event occurs to refresh the switch (e.g., a membership query). This patch alters the behavior of bonding to no longer flood IGMP to all ports, and to issue IGMP JOINs to the newly active port at the time of a failover. This insures that switches are kept up to date for all cases. "GOELLESCH Niels" <[EMAIL PROTECTED]> originally reported this problem, and included a patch. His original patch was modified by Jay Vosburgh to additionally remove the existing IGMP flood behavior, use RCU, streamline code paths, fix trailing white space, and adjust for style. Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 7ec6121..338d452 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -60,6 +60,7 @@ #include #include #include #include +#include #include #include #include @@ -861,6 +862,28 @@ static void bond_mc_delete(struct bondin } } + +/* + * Retrieve the list of registered multicast addresses for the bonding + * device and retransmit an IGMP JOIN request to the current active + * slave. 
+ */ +static void bond_resend_igmp_join_requests(struct bonding *bond) +{ + struct in_device *in_dev; + struct ip_mc_list *im; + + rcu_read_lock(); + in_dev = __in_dev_get_rcu(bond->dev); + if (in_dev) { + for (im = in_dev->mc_list; im; im = im->next) { + ip_mc_rejoin_group(im); + } + } + + rcu_read_unlock(); +} + /* * Totally destroys the mc_list in bond */ @@ -874,6 +897,7 @@ static void bond_mc_list_destroy(struct kfree(dmi); dmi = bond->mc_list; } +bond->mc_list = NULL; } /* @@ -967,6 +991,7 @@ static void bond_mc_swap(struct bonding for (dmi = bond->dev->mc_list; dmi; dmi = dmi->next) { dev_mc_add(new_active->dev, dmi->dmi_addr, dmi->dmi_addrlen, 0); } + bond_resend_igmp_join_requests(bond); } } @@ -4017,42 +4042,6 @@ out: return 0; } -static void bond_activebackup_xmit_copy(struct sk_buff *skb, -struct bonding *bond, -struct slave *slave) -{ - struct sk_buff *skb2 = skb_copy(skb, GFP_ATOMIC); - struct ethhdr *eth_data; - u8 *hwaddr; - int res; - - if (!skb2) { - printk(KERN_ERR DRV_NAME ": Error: " - "bond_activebackup_xmit_copy(): skb_copy() failed\n"); - return; - } - - skb2->mac.raw = (unsigned char *)skb2->data; - eth_data = eth_hdr(skb2); - - /* Pick an appropriate source MAC address -* -- use slave's perm MAC addr, unless used by bond -* -- otherwise, borrow active slave's perm MAC addr -* since that will not be used -*/ - hwaddr = slave->perm_hwaddr; - if (!memcmp(eth_data->h_source, hwaddr, ETH_ALEN)) - hwaddr = bond->curr_active_slave->perm_hwaddr; - - /* Set source MAC address appropriately */ - memcpy(eth_data->h_source, hwaddr, ETH_ALEN); - - res = bond_dev_queue_xmit(bond, skb2, slave->dev); - if (res) - dev_kfree_skb(skb2); - - return; -} /* * in active-backup mode, we know that bond->curr_active_slave is always valid if @@ -4073,21 +4062,6 @@ static int bond_xmit_activebackup(struct if (!bond->curr_active_slave) goto out; - /* Xmit IGMP frames on all slaves to ensure rapid fail-over - for multicast traffic on snooping switches */ - if 
(skb->protocol == __constant_htons(ETH_P_IP) && - skb->nh.iph->protocol == IPPROTO_IGMP) { - struct slave *slave, *active_slave; - int i; - - active_slave = bond->curr_active_slave; - bond_for_each_slave_from_to(bond, slave, i, active_slave->next, - active_slave->prev) - if (IS_UP(slave->dev) && - (slave->link == BOND_LINK_UP)) - bond_activebackup_xmit_copy(skb, bond, slave); - } - res = bond_dev_queue_xmit(bond, skb, bond->curr_active_slave->dev); out: diff --git a/include/linux/igmp.h b/include/linux/igmp.h index 9dbb525..a113fe6 100644 --- a/include/linux/igmp.h +++ b/include/linux/igmp.h @@ -218,5 +218,7 @@ extern void ip_mc_up(struct in_device *) extern void ip_mc_down(struct in_
[PATCH 1/3] bonding: fix double dev_add_pack
Bonding can erroneously register the same packet_type to receive ARPs
(for use by ARP validation): once at device open time, and once via
sysfs.  Since sysfs can change the validate setting (and thus register
or unregister) at any time, a flag is needed to synchronize with device
open in order to avoid double registrations, and the simplest place is
within the packet_type structure itself.  Double unregister is not an
issue.

Bug reported by Ulrich Oelmann <[EMAIL PROTECTED]>.

Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a7c8f98..1f263ac 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3423,6 +3423,9 @@ void bond_register_arp(struct bonding *b
 {
 	struct packet_type *pt = &bond->arp_mon_pt;

+	if (pt->type)
+		return;
+
 	pt->type = htons(ETH_P_ARP);
 	pt->dev = NULL; /*bond->dev;XXX*/
 	pt->func = bond_arp_rcv;
@@ -3431,7 +3434,10 @@ void bond_register_arp(struct bonding *b

 void bond_unregister_arp(struct bonding *bond)
 {
-	dev_remove_pack(&bond->arp_mon_pt);
+	struct packet_type *pt = &bond->arp_mon_pt;
+
+	dev_remove_pack(pt);
+	pt->type = 0;
 }

 /* Hashing Policies -*/
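Editorial sketch (not part of the posted patch): the guard described above can be modelled in userspace as an idempotent register/unregister pair that uses the type field as its "already registered" flag. Everything here is a hypothetical stand-in: model_add_pack() and model_remove_pack() only imitate the roles of dev_add_pack() and dev_remove_pack(), and the removal side is assumed to tolerate an entry that is not listed.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct packet_type. */
struct packet_type_model {
	unsigned short type;   /* 0 means "not registered" in this sketch */
	int on_list;           /* models membership in the handler list */
};

static int registrations;      /* number of handlers currently listed */

/* Stand-in for dev_add_pack(): adding twice would corrupt a real list. */
static void model_add_pack(struct packet_type_model *pt)
{
	pt->on_list = 1;
	registrations++;
}

/* Stand-in for dev_remove_pack(): assumed harmless if not listed. */
static void model_remove_pack(struct packet_type_model *pt)
{
	if (!pt->on_list)
		return;
	pt->on_list = 0;
	registrations--;
}

static void model_register_arp(struct packet_type_model *pt)
{
	if (pt->type)          /* already registered: nothing to do */
		return;
	pt->type = 0x0806;     /* ARP; byte order ignored in this model */
	model_add_pack(pt);
}

static void model_unregister_arp(struct packet_type_model *pt)
{
	model_remove_pack(pt);
	pt->type = 0;          /* clear the flag so a later register works */
}
```

Calling model_register_arp() twice in a row leaves exactly one registration, which is the property the patch is after when device open and the sysfs handler both try to register.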
Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable
On Wed, Feb 28, 2007 at 02:39:42PM -0800, Jay Vosburgh wrote:
> Andy Gospodarek <[EMAIL PROTECTED]> wrote:
>
> >A while back the following change was made to the bonding code:
> >
> >commit df49898a47061e82219c991dfbe9ac6ddf7a866b
> >Author: John W. Linville <[EMAIL PROTECTED]>
> >Date:   Tue Oct 18 21:30:58 2005 -0400
> >
> >[PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack
> >
> >Expand comment explaining MAC address selection for replicated IGMP
> >frames transmitted in bonding mode 1 (active-backup).  Also, a small
> >whitespace cleanup.
> >
> >Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
> >Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
> >
> >In general this patch is good, but this tweaks that feature by allowing
> >that functionality to be enabled and disabled.  This patch adds a new
> >module option as well as a sysfs entry.  It sets the default to be the
> >current behavior so existing users shouldn't notice any difference.
>
> 	Why would you want to turn this off?
>

When you connect active-backup bonds to 2 separate switches that are in
'distant' parts of the network you can end up with a bunch of unwanted
multicast data flowing everywhere and if you don't care whether or not
your multicast traffic is highly available then it just seems like
noise.  I thought the flexibility seemed nice.

> 	Also, I've got a replacement patch for this functionality that
> seems to be better in all regards.  It sends bonus IGMP joins when a
> failover occurs, rather than simply duplicating them on all slaves (the
> current system can leave switches in the dark if the slaves fail back to
> the originals).  As chance would have it, I'm planning to post it as
> part of a set in a a little while.
>

That sounds like a nice add-on to the existing functionality.  I can see
the value in something dynamic like that, but I can also see the value
in something static like the functionality we have.  Did you plan to
keep the existing functionality intact or just have it done dynamically?

Is this separate from your workqueue/refactoring patch or does it work
on the existing code?

> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, 2007-02-28 at 16:51 -0800, Jean Tourrilhes wrote:

> 	I would prefer to fix the comment when this change actually
> happens. I prefer comments to refer to the current reality, rather
> than past/future situation.

Uh, no. device_rename is perfectly fine, even other people may use it in
the future.

> 	When you introduce wireless renaming, you
> will need to verify the whole chain anyway, so you might as well fix
> the comment while merging wireless renaming.

No again, device_rename is a perfectly fine API, I shouldn't have to look
at its internals to see if it's broken in my use case. Even if it's
only a broken comment.

I'm not going to respin your patches though, if this doesn't make it in
I don't care.

johannes
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Thu, Mar 01, 2007 at 01:37:46AM +0100, Johannes Berg wrote:
> On Wed, 2007-02-28 at 16:26 -0800, Jean Tourrilhes wrote:
>
> > +	/* This function is only used for network interface.
> > +	 * Some hotplug package track interfaces by their name and
> > +	 * therefore want to know when the name is changed by the user. */
>
> Right now, that's true, but wireless is going to start using
> device_rename pretty soon as well. Could you rephrase this comment?
>
> johannes

	I would prefer to fix the comment when this change actually
happens. I prefer comments to refer to the current reality, rather
than past/future situation. When you introduce wireless renaming, you
will need to verify the whole chain anyway, so you might as well fix
the comment while merging wireless renaming.
	Note also that my comment is technically correct. I did not
say 'netdev' but the more generic term 'network interface', and I
believe your wireless interface is a 'network interface', even if it's
not a netdev ;-)
	But if this really bugs you, please feel free to respin my
patch.

	Have fun...

	Jean
[PATCH] qla3xxx: bugfix for line omitted in previous patch.
From 01751a39d7327acc28dabf4f68930b7e20b279d1 Mon Sep 17 00:00:00 2001
From: Ron Mercer <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 16:42:17 -0800
Subject: [PATCH] [PATCH] qla3xxx: bugfix for line omitted in previous patch.

This missing line caused transmit errors on the Qlogic 4032 chip.

Signed-off-by: Ron Mercer <[EMAIL PROTECTED]>
---
 drivers/net/qla3xxx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qla3xxx.c b/drivers/net/qla3xxx.c
index 3a14d19..d3f65da 100755
--- a/drivers/net/qla3xxx.c
+++ b/drivers/net/qla3xxx.c
@@ -2210,7 +2210,7 @@ static int ql_send_map(struct ql3_adapter *qdev,
 {
 	struct oal *oal;
 	struct oal_entry *oal_entry;
-	int len = skb->len;
+	int len = skb_headlen(skb);
 	dma_addr_t map;
 	int err;
 	int completed_segs, i;
--
1.5.0.rc4.16.g9e258
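Editorial sketch (not part of the posted patch): the fix swaps the total packet length for the length of the linear head. The userspace model below shows the relationship between the two, mirroring the kernel's skb_headlen() definition of len minus data_len. The struct is a hypothetical two-field reduction of struct sk_buff, and the link between this length and the reported transmit errors is an inference rather than something the patch description states.

```c
#include <stddef.h>

/* Hypothetical reduction of struct sk_buff to the two length fields. */
struct skb_model {
	unsigned int len;       /* total bytes: linear head plus page fragments */
	unsigned int data_len;  /* bytes held in page fragments only */
};

/* Same arithmetic as the kernel's skb_headlen(): bytes in the linear area. */
static unsigned int model_skb_headlen(const struct skb_model *skb)
{
	return skb->len - skb->data_len;
}
```

For a purely linear buffer the two values agree, so the old code would only misbehave once fragments are present: a 1500-byte packet carrying 1000 bytes in fragments has a 500-byte head, and using 1500 as the head length would cover memory that does not belong to the linear area.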
Re: Missing VLAN tags in bnx2
On Wed, 2007-02-28 at 21:12 +0200, Pekka Pietikainen wrote:
> Just had to spend some time figuring out why a bnx2 card connected to
> a switch monitor port didn't see any vlan tags (when in our scenario the
> tags are pretty vital).

I'll have someone send you a utility to disable the ASF firmware.
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, 2007-02-28 at 16:26 -0800, Jean Tourrilhes wrote:

> +	/* This function is only used for network interface.
> +	 * Some hotplug package track interfaces by their name and
> +	 * therefore want to know when the name is changed by the user. */

Right now, that's true, but wireless is going to start using
device_rename pretty soon as well. Could you rephrase this comment?

johannes
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote:
> On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> > --- linux/drivers/base/class.j1.c	2007-02-26 18:38:10.0 -0800
> > +++ linux/drivers/base/class.c	2007-02-27 15:52:37.0 -0800
> > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
>
> This function is not in the 2.6.21-rc2 kernel, so you might want to
> rework this patch a bit :)

	Thanks for all your good comments. I ported my patch to
2.6.21-rc2, and tested it both on a hotplug and a udev system. Patch
is attached, I would be glad if you could push that through the usual
channels.
	Also, I realised that I forgot to say in my original e-mail
that migrating udev to use ifindex instead of ifname would fix the
remove/add race condition for network devices. But that's not going to
happen overnight...

	Have fun...

	Jean

Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>

---

diff -u -p linux/include/linux/kobject.j1.h linux/include/linux/kobject.h
--- linux/include/linux/kobject.j1.h	2007-02-28 14:26:29.0 -0800
+++ linux/include/linux/kobject.h	2007-02-28 14:27:54.0 -0800
@@ -48,6 +48,7 @@ enum kobject_action {
 	KOBJ_OFFLINE	= (__force kobject_action_t) 0x06,	/* device offline */
 	KOBJ_ONLINE	= (__force kobject_action_t) 0x07,	/* device online */
 	KOBJ_MOVE	= (__force kobject_action_t) 0x08,	/* device move */
+	KOBJ_RENAME	= (__force kobject_action_t) 0x09,	/* device renamed */
 };

 struct kobject {
diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
--- linux/net/core/net-sysfs.j1.c	2007-02-28 14:26:45.0 -0800
+++ linux/net/core/net-sysfs.c	2007-02-28 14:27:54.0 -0800
@@ -424,6 +424,17 @@ static int netdev_uevent(struct device *
 	if ((size <= 0) || (i >= num_envp))
 		return -ENOMEM;

+	/* pass ifindex to uevent.
+	 * ifindex is useful as it won't change (interface name may change)
+	 * and is what RtNetlink uses natively. */
+	envp[i++] = buf;
+	n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
+	buf += n;
+	size -= n;
+
+	if ((size <= 0) || (i >= num_envp))
+		return -ENOMEM;
+
 	envp[i] = NULL;
 	return 0;
 }
diff -u -p linux/lib/kobject_uevent.j1.c linux/lib/kobject_uevent.c
--- linux/lib/kobject_uevent.j1.c	2007-02-28 14:26:58.0 -0800
+++ linux/lib/kobject_uevent.c	2007-02-28 14:27:54.0 -0800
@@ -52,6 +52,8 @@ static char *action_to_string(enum kobje
 		return "online";
 	case KOBJ_MOVE:
 		return "move";
+	case KOBJ_RENAME:
+		return "rename";
 	default:
 		return NULL;
 	}
diff -u -p linux/drivers/base/core.j1.c linux/drivers/base/core.c
--- linux/drivers/base/core.j1.c	2007-02-28 15:45:45.0 -0800
+++ linux/drivers/base/core.c	2007-02-28 15:47:30.0 -0800
@@ -1007,6 +1007,8 @@ int device_rename(struct device *dev, ch
 	char *new_class_name = NULL;
 	char *old_symlink_name = NULL;
 	int error;
+	char *devname_string = NULL;
+	char *envp[2];

 	dev = get_device(dev);
 	if (!dev)
@@ -1014,6 +1016,15 @@ int device_rename(struct device *dev, ch

 	pr_debug("DEVICE: renaming '%s' to '%s'\n", dev->bus_id, new_name);

+	devname_string = kmalloc(strlen(dev->bus_id) + 15, GFP_KERNEL);
+	if (!devname_string) {
+		put_device(dev);
+		return -ENOMEM;
+	}
+	sprintf(devname_string, "INTERFACE_OLD=%s", dev->bus_id);
+	envp[0] = devname_string;
+	envp[1] = NULL;
+
 #ifdef CONFIG_SYSFS_DEPRECATED
 	if ((dev->class) && (dev->parent))
 		old_class_name = make_class_name(dev->class->name, &dev->kobj);
@@ -1049,12 +1060,20 @@ int device_rename(struct device *dev, ch
 		sysfs_create_link(&dev->class->subsys.kset.kobj, &dev->kobj,
 				  dev->bus_id);
 	}
+
+	/* This function is only used for network interface.
+	 * Some hotplug package track interfaces by their name and
+	 * therefore want to know when the name is changed by the user. */
+	if(!error)
+		kobject_uevent_env(&dev->kobj, KOBJ_RENAME, envp);
+
 	put_device(dev);

 	kfree(new_class_name);
 	kfree(old_symlink_name);
  out_free_old_class:
 	kfree(old_class_name);
+	kfree(devname_string);

 	return error;
 }
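Editorial sketch (not part of the posted patch): the device_rename() hunk sizes its buffer as strlen(dev->bus_id) + 15, where 15 is the 14 characters of "INTERFACE_OLD=" plus the terminating NUL. The userspace version below derives the same size from sizeof on the prefix instead of a literal constant. The function name make_old_name_env() and the OLD_NAME_PREFIX macro are hypothetical, and malloc()/snprintf() stand in for kmalloc()/sprintf().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define OLD_NAME_PREFIX "INTERFACE_OLD="

/*
 * Build the "INTERFACE_OLD=<name>" environment string for a rename event.
 * Returns a heap buffer the caller must free(), or NULL if allocation fails.
 */
static char *make_old_name_env(const char *old_name)
{
	/* sizeof() counts the prefix's NUL, so this is strlen(old_name) + 15. */
	size_t size = strlen(old_name) + sizeof(OLD_NAME_PREFIX);
	char *buf = malloc(size);

	if (!buf)
		return NULL;
	/* snprintf() is bounded by the size computed above. */
	snprintf(buf, size, "%s%s", OLD_NAME_PREFIX, old_name);
	return buf;
}
```

Deriving the size from the prefix keeps the allocation and the format string in step if the variable name is ever changed.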
Re: [PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data
On Wed, Feb 28, 2007 at 03:11:03PM -0800, Stephen Hemminger wrote:
> On Wed, 28 Feb 2007 15:40:31 -0700
> "Dale Farnsworth" <[EMAIL PROTECTED]> wrote:
>
> > The information contained within platform_data should be self-contained.
> > Replace the pointer to a MAC address with the actual MAC address in
> > struct mv643xx_eth_platform_data.
> >
> > Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>
> >
> > Index: b/drivers/net/mv643xx_eth.c
> > ===
> > --- a/drivers/net/mv643xx_eth.c
> > +++ b/drivers/net/mv643xx_eth.c
> > @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
> >
> >  	pd = pdev->dev.platform_data;
> >  	if (pd) {
> > -		if (pd->mac_addr)
> > +		static u8 zero_mac_addr[6] = { 0 };
> > +
> > +		if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
> >  			memcpy(dev->dev_addr, pd->mac_addr, 6);
> >
>
> is_zero_ether_addr() is faster/cleaner for this

Thanks.  I'll follow up with a modified patch in a day or two.

-Dale
PATCH: Second try at vlan mailing list patch.
Hopefully, by attaching it as a file it will not screw up the tabs & spaces.

Signed-off-by: Ben Greear <[EMAIL PROTECTED]>

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 18fcb9f..c4209c8 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -3,7 +3,8 @@
  *		Ethernet-type device handling.
  *
  * Authors:	Ben Greear <[EMAIL PROTECTED]>
- *              Please send support related email to: [EMAIL PROTECTED]
+ *              Please send support related email to: [EMAIL PROTECTED]
+ *              after subscribing using the link below.
  *              VLAN Home Page: http://www.candelatech.com/~greear/vlan.html
  *
  * Fixes:
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index e49e252..203cd54 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -3,7 +3,8 @@
  *		Ethernet-type device handling.
  *
  * Authors:	Ben Greear <[EMAIL PROTECTED]>
- *              Please send support related email to: [EMAIL PROTECTED]
+ *              Please send support related email to: [EMAIL PROTECTED]
+ *              after subscribing using the web page below.
  *              VLAN Home Page: http://www.candelatech.com/~greear/vlan.html
  *
  * Fixes:       Mar 22 2001: Martin Bokaemper <[EMAIL PROTECTED]>
Re: [PATCH 4/5] r8169: more alignment for the 0x8168
Francois Romieu <[EMAIL PROTECTED]> : [...] The experimental r8169 patch of the day against 2.6.21-rc2 is available at: http://www.fr.zoreil.com/linux/www.fr.zoreil.com/people/francois/misc/20070228-2.6.21-rc2-r8169-test.patch (single patch) or: http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.21-rc2 (series) Log below: commit 6686d80d6972cd5ff3ca81b72c46f4ffcc40eb4c Author: Francois Romieu <[EMAIL PROTECTED]> Date: Wed Feb 28 23:16:57 2007 +0100 r8169: align the IP header when there is no DMA constraint Align the IP header when the chipset can DMA at any location (plain 0x8169). Otherwise (0x8136/0x8168) obey the constraint imposed by the hardware. This patch complements the previous alignment rework done for copybreak. Original idea from Philip Craig <[EMAIL PROTECTED]> Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Philip Craig <[EMAIL PROTECTED]> Cc: Mike Isely <[EMAIL PROTECTED]> commit d20a6ba195172f7fb9fd30832a054effb9773bc3 Author: Francois Romieu <[EMAIL PROTECTED]> Date: Fri Feb 23 23:50:28 2007 +0100 r8169.c: add bit description for the TxPoll register Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit 37dc1270eba2874a00564abe0d857429af5370f2 Author: Francois Romieu <[EMAIL PROTECTED]> Date: Fri Feb 23 23:24:55 2007 +0100 r8169: MSI support It is currently limited to 0x8136 and 0x8168. 8169sb/8110sb ought to handle it as well where they support MSI. Includes unregister_netdev() fix from Bernhard Walle <[EMAIL PROTECTED]> against BUG_ON(irq_has_action(dev->first_msi_irq)) (2007/02/24). 
Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Fixed-by: Bernhard Walle <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit b388fb659dc5803cdb2293649e25807f88ba94ec Author: Francois Romieu <[EMAIL PROTECTED]> Date: Wed Feb 21 22:40:46 2007 +0100 r8169: cleanup No functionnal change: - trim the old history log - whitespace/indent/case police - unsigned int where signedness does not matte - removal of obsolete assert - needless cast from void * (dev_instance) Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit 090121a1d8b9452fd454fa44ba67d9761a6e8f1e Author: Francois Romieu <[EMAIL PROTECTED]> Date: Wed Feb 21 00:10:20 2007 +0100 r8169: remove the media option It has been documented as deprecated: - in MODULE_PARM_DESC since may 2005 ; - at the top of the source file and in printk since june 2004. Good bye. Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit 5231dd72b4d9551c6cd8baa9b7026a1f21b12052 Author: Francois Romieu <[EMAIL PROTECTED]> Date: Tue Feb 20 22:58:51 2007 +0100 r8169: small 8101 comment Extracted from version 1.001.00 of Realtek's r8101. Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit 20be52f668774727ba1d1a3606cc2888f66d40bf Author: Francois Romieu <[EMAIL PROTECTED]> Date: Tue Feb 20 22:20:51 2007 +0100 r8169: confusion between hardware and IP header alignment The rx copybreak part is straightforward. The align field in struct rtl_cfg_info is related to the alignment requirements of the DMA operation. Its value is set at 2 to limit the scale of possible regression but my old v1.21 8169 datasheet claims a 8 bytes requirements (that was never followed by the driver of course) and the 8101/8168 go with a plain 8 bytes alignments. Yuck... /me waits for the attack of ballistic vegetables... 
Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit b0ee36861173a3ac57017c8a3850ad21a4c1acf6 Author: Francois Romieu <[EMAIL PROTECTED]> Date: Tue Feb 20 00:00:26 2007 +0100 r8169: merge with version 8.001.00 of Realtek's r8168 driver This one includes: - more tweaks to rtl_hw_start_8168 - a work around for a Rx FiFO overflow issue on the 8168Bb + rtl8169_{intr_mask/napi_event} are replaced with per-device fields + rtl_cfg_info is converted to C99 for readability but the values are not changed for the 8169/8110 and the 8101 Includes ChipCmd fix from Bernhard Walle <[EMAIL PROTECTED]> (2007/02/24). Signed-off-by: Francois Romieu <[EMAIL PROTECTED]> Cc: Edward Hsu <[EMAIL PROTECTED]> commit 9d4139624a1c2ae138ea043083263c84d14bbd3a Author: Francois Romieu <[EMAIL PROTECTED]> Date: Tue Feb 13 23:38:05 2007 +0100 r8169: merge with version 6.001.00 of Realtek's r8169 driver - new identif
Re: [PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data
On Wed, 28 Feb 2007 15:40:31 -0700
"Dale Farnsworth" <[EMAIL PROTECTED]> wrote:

> The information contained within platform_data should be self-contained.
> Replace the pointer to a MAC address with the actual MAC address in
> struct mv643xx_eth_platform_data.
>
> Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]>
>
> Index: b/drivers/net/mv643xx_eth.c
> ===
> --- a/drivers/net/mv643xx_eth.c
> +++ b/drivers/net/mv643xx_eth.c
> @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat
>
>  	pd = pdev->dev.platform_data;
>  	if (pd) {
> -		if (pd->mac_addr)
> +		static u8 zero_mac_addr[6] = { 0 };
> +
> +		if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0)
>  			memcpy(dev->dev_addr, pd->mac_addr, 6);

is_zero_ether_addr() is faster/cleaner for this

--
Stephen Hemminger <[EMAIL PROTECTED]>
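Editorial sketch (not part of the thread): a userspace illustration of the suggestion above, replacing the memcmp() against a static all-zero array with an is_zero_ether_addr() test. The helper defined here is a local stand-in for the kernel function of the same name from <linux/etherdevice.h>, and apply_platform_mac() is a hypothetical reduction of the probe logic, not code from the driver.

```c
#include <string.h>

#define ETH_ALEN 6   /* octets in an Ethernet address */

/*
 * Local stand-in for the kernel helper: nonzero if all six octets are zero.
 * OR-ing the octets avoids needing a separate zero-filled comparison array.
 */
static int is_zero_ether_addr(const unsigned char *addr)
{
	return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}

/*
 * Hypothetical reduction of the probe step: copy the platform-supplied
 * MAC into the device address only when the platform actually set one.
 */
static void apply_platform_mac(unsigned char *dev_addr,
			       const unsigned char *pd_mac)
{
	if (!is_zero_ether_addr(pd_mac))
		memcpy(dev_addr, pd_mac, ETH_ALEN);
}
```

With an all-zero platform address the device keeps whatever address it already had; any nonzero octet causes the platform address to be copied over it.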
Re: need some help on a backport of r8169
[EMAIL PROTECTED] <[EMAIL PROTECTED]> :
[...]
> The result is as follows :
> I boot my new kernel : the r8169 driver is automatically loaded and
> find the network card and gives me an eth0.
> I do a ifconfig, eth0 is up, with an IP and RX and TX are not 0.

Interesting.

> The problem comes here, I do a ping and it seems to have just the time
> to make the DNS resolution but not further. When I do a new ifconfig,
> the TX dropped is not 0 anymore. Then I can turn up and down my
> interface, I won't be able to ping anything.

Ok, almost perfect for a first try. :o)

If you can issue 'ifconfig' and do an ethtool dump of the registers
at the interesting points in time, it could surely help.

[...]
> Ah... poor me who thought that the RTL8168 was just like the RTL8169 with
> a pci express interface... It seems that a PCI-Express RTL8169 also
> exist right?

Remind me to check it later.

[...]
> Do you think my problem is the one you mentionned above, without the
> experimental patches?

It is possible. I should review the diffs too.

Once you have logged the ifconfig/ethtool dump, you can try the series or
the patch at:
http://www.fr.zoreil.com/people/francois/backport/r8169/20070228-00

Btw:
[...dmesg dump...]
> Enabling fast FPU save and restore... done.
> Enabling unmasked SIMD FPU exception support... done.
> Checking 'hlt' instruction... OK.
> ACPI: setting ELCR to 0200 (from 0c08)
> NET: Registered protocol family 16
> PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=2
> PCI: Using MMCONFIG

Please disable MMCONFIG.

If you have any PCI latency option in your bios, set it to 64.

--
Ueimor
Re: [PATCH 2/2] mv643xx_eth: Place explicit port number in mv643xx_eth_platform_data
We had been using the platform_device.id field to identify which ethernet port is used for mv643xx_eth device. This is not correct in general. It will be incorrect, for example, if a hardware platform uses a single port but not the first port. Here, we add an explicit port_number field to struct mv643xx_eth_platform_data. This makes the mv643xx_eth_platform_data structure required, but that isn't an issue since all users currently provide it already. Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]> diff --git a/arch/mips/momentum/jaguar_atx/platform.c b/arch/mips/momentum/jaguar_atx/platform.c Index: b/arch/mips/momentum/jaguar_atx/platform.c === --- a/arch/mips/momentum/jaguar_atx/platform.c +++ b/arch/mips/momentum/jaguar_atx/platform.c @@ -48,6 +48,8 @@ static struct resource mv64x60_eth0_reso }; static struct mv643xx_eth_platform_data eth0_pd = { + .port_number= 0, + .tx_sram_addr = MV_SRAM_BASE_ETH0, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -77,6 +79,8 @@ static struct resource mv64x60_eth1_reso }; static struct mv643xx_eth_platform_data eth1_pd = { + .port_number= 1, + .tx_sram_addr = MV_SRAM_BASE_ETH1, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -105,7 +109,9 @@ static struct resource mv64x60_eth2_reso }, }; -static struct mv643xx_eth_platform_data eth2_pd; +static struct mv643xx_eth_platform_data eth2_pd = { + .port_number= 2, +}; static struct platform_device eth2_device = { .name = MV643XX_ETH_NAME, Index: b/arch/mips/momentum/ocelot_3/platform.c === --- a/arch/mips/momentum/ocelot_3/platform.c +++ b/arch/mips/momentum/ocelot_3/platform.c @@ -48,6 +48,8 @@ static struct resource mv64x60_eth0_reso }; static struct mv643xx_eth_platform_data eth0_pd = { + .port_number= 0, + .tx_sram_addr = MV_SRAM_BASE_ETH0, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -77,6 +79,8 @@ static struct resource mv64x60_eth1_reso }; static struct 
mv643xx_eth_platform_data eth1_pd = { + .port_number= 1, + .tx_sram_addr = MV_SRAM_BASE_ETH1, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -105,7 +109,9 @@ static struct resource mv64x60_eth2_reso }, }; -static struct mv643xx_eth_platform_data eth2_pd; +static struct mv643xx_eth_platform_data eth2_pd = { + .port_number= 2, +}; static struct platform_device eth2_device = { .name = MV643XX_ETH_NAME, Index: b/arch/mips/momentum/ocelot_c/platform.c === --- a/arch/mips/momentum/ocelot_c/platform.c +++ b/arch/mips/momentum/ocelot_c/platform.c @@ -47,6 +47,8 @@ static struct resource mv64x60_eth0_reso }; static struct mv643xx_eth_platform_data eth0_pd = { + .port_number= 0, + .tx_sram_addr = MV_SRAM_BASE_ETH0, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -76,6 +78,8 @@ static struct resource mv64x60_eth1_reso }; static struct mv643xx_eth_platform_data eth1_pd = { + .port_number= 1, + .tx_sram_addr = MV_SRAM_BASE_ETH1, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, Index: b/arch/powerpc/platforms/chrp/pegasos_eth.c === --- a/arch/powerpc/platforms/chrp/pegasos_eth.c +++ b/arch/powerpc/platforms/chrp/pegasos_eth.c @@ -58,6 +58,7 @@ static struct resource mv643xx_eth0_reso static struct mv643xx_eth_platform_data eth0_pd = { + .port_number= 0, .tx_sram_addr = PEGASOS2_SRAM_BASE_ETH0, .tx_sram_size = PEGASOS2_SRAM_TXRING_SIZE, .tx_queue_size = PEGASOS2_SRAM_TXRING_SIZE/16, @@ -87,6 +88,7 @@ static struct resource mv643xx_eth1_reso }; static struct mv643xx_eth_platform_data eth1_pd = { + .port_number= 1, .tx_sram_addr = PEGASOS2_SRAM_BASE_ETH1, .tx_sram_size = PEGASOS2_SRAM_TXRING_SIZE, .tx_queue_size = PEGASOS2_SRAM_TXRING_SIZE/16, Index: b/arch/ppc/syslib/mv64x60.c === --- a/arch/ppc/syslib/mv64x60.c +++ b/arch/ppc/syslib/mv64x60.c @@ -339,7 +339,9 @@ static struct resource mv64x60_eth0_reso }, }; -static struct mv643xx_eth_platform_data eth0_pd; +static struct 
mv643xx_eth_platform_data eth0_pd = { + .port_number= 0, +}; static struct platform_device eth0_device = { .name = MV643XX_ETH_NAME, @@ -362,7 +364,9 @@ static struct resource mv64x60_eth1_reso }, }; -static struct mv643xx_et
[PATCH 1/2] mv643xx_eth: move mac_addr inside of mv643xx_eth_platform_data
The information contained within platform_data should be self-contained. Replace the pointer to a MAC address with the actual MAC address in struct mv643xx_eth_platform_data. Signed-off-by: Dale Farnsworth <[EMAIL PROTECTED]> Index: b/drivers/net/mv643xx_eth.c === --- a/drivers/net/mv643xx_eth.c +++ b/drivers/net/mv643xx_eth.c @@ -1380,7 +1380,9 @@ static int mv643xx_eth_probe(struct plat pd = pdev->dev.platform_data; if (pd) { - if (pd->mac_addr) + static u8 zero_mac_addr[6] = { 0 }; + + if (memcmp(pd->mac_addr, zero_mac_addr, 6) != 0) memcpy(dev->dev_addr, pd->mac_addr, 6); if (pd->phy_addr || pd->force_phy_addr) Index: b/include/linux/mv643xx.h === --- a/include/linux/mv643xx.h +++ b/include/linux/mv643xx.h @@ -1289,7 +1289,6 @@ struct mv64xxx_i2c_pdata { #define MV643XX_ETH_NAME "mv643xx_eth" struct mv643xx_eth_platform_data { - char*mac_addr; /* pointer to mac address */ u16 force_phy_addr; /* force override if phy_addr == 0 */ u16 phy_addr; @@ -1304,6 +1303,7 @@ struct mv643xx_eth_platform_data { u32 tx_sram_size; u32 rx_sram_addr; u32 rx_sram_size; + u8 mac_addr[6];/* mac address if non-zero*/ }; #endif /* __ASM_MV643XX_H */ Index: b/arch/mips/momentum/jaguar_atx/platform.c === --- a/arch/mips/momentum/jaguar_atx/platform.c +++ b/arch/mips/momentum/jaguar_atx/platform.c @@ -47,11 +47,7 @@ static struct resource mv64x60_eth0_reso }, }; -static char eth0_mac_addr[ETH_ALEN]; - static struct mv643xx_eth_platform_data eth0_pd = { - .mac_addr = eth0_mac_addr, - .tx_sram_addr = MV_SRAM_BASE_ETH0, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -80,11 +76,7 @@ static struct resource mv64x60_eth1_reso }, }; -static char eth1_mac_addr[ETH_ALEN]; - static struct mv643xx_eth_platform_data eth1_pd = { - .mac_addr = eth1_mac_addr, - .tx_sram_addr = MV_SRAM_BASE_ETH1, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -113,11 +105,7 @@ static struct resource mv64x60_eth2_reso }, }; -static char 
eth2_mac_addr[ETH_ALEN]; - -static struct mv643xx_eth_platform_data eth2_pd = { - .mac_addr = eth2_mac_addr, -}; +static struct mv643xx_eth_platform_data eth2_pd; static struct platform_device eth2_device = { .name = MV643XX_ETH_NAME, @@ -200,9 +188,9 @@ static int __init mv643xx_eth_add_pds(vo int ret; get_mac(mac); - eth_mac_add(eth0_mac_addr, mac, 0); - eth_mac_add(eth1_mac_addr, mac, 1); - eth_mac_add(eth2_mac_addr, mac, 2); + eth_mac_add(eth0_pd.mac_addr, mac, 0); + eth_mac_add(eth1_pd.mac_addr, mac, 1); + eth_mac_add(eth2_pd.mac_addr, mac, 2); ret = platform_add_devices(mv643xx_eth_pd_devs, ARRAY_SIZE(mv643xx_eth_pd_devs)); Index: b/arch/mips/momentum/ocelot_3/platform.c === --- a/arch/mips/momentum/ocelot_3/platform.c +++ b/arch/mips/momentum/ocelot_3/platform.c @@ -47,11 +47,7 @@ static struct resource mv64x60_eth0_reso }, }; -static char eth0_mac_addr[ETH_ALEN]; - static struct mv643xx_eth_platform_data eth0_pd = { - .mac_addr = eth0_mac_addr, - .tx_sram_addr = MV_SRAM_BASE_ETH0, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -80,11 +76,7 @@ static struct resource mv64x60_eth1_reso }, }; -static char eth1_mac_addr[ETH_ALEN]; - static struct mv643xx_eth_platform_data eth1_pd = { - .mac_addr = eth1_mac_addr, - .tx_sram_addr = MV_SRAM_BASE_ETH1, .tx_sram_size = MV_SRAM_TXRING_SIZE, .tx_queue_size = MV_SRAM_TXRING_SIZE / 16, @@ -113,11 +105,7 @@ static struct resource mv64x60_eth2_reso }, }; -static char eth2_mac_addr[ETH_ALEN]; - -static struct mv643xx_eth_platform_data eth2_pd = { - .mac_addr = eth2_mac_addr, -}; +static struct mv643xx_eth_platform_data eth2_pd; static struct platform_device eth2_device = { .name = MV643XX_ETH_NAME, @@ -200,9 +188,9 @@ static int __init mv643xx_eth_add_pds(vo int ret; get_mac(mac); - eth_mac_add(eth0_mac_addr, mac, 0); - eth_mac_add(eth1_mac_addr, mac, 1); - eth_mac_add(eth2_mac_addr, mac, 2); + eth_mac_add(eth0_pd.mac_addr, mac, 0); + eth_mac_add(eth1_pd.mac_addr, mac, 1); + 
eth_mac_add(eth2_pd.mac_addr, mac, 2); ret = platform_
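The zero-MAC convention this patch introduces (an all-zero mac_addr in the platform data means "no address supplied") can be sketched in userspace C. This is a minimal sketch, not the driver code: struct pdata_sketch, pdata_has_mac() and apply_pdata_mac() are hypothetical names used only for illustration. In the kernel itself, is_zero_ether_addr() from linux/etherdevice.h performs the same all-zero test.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the one field of
 * mv643xx_eth_platform_data that matters here. */
struct pdata_sketch {
	uint8_t mac_addr[ETH_ALEN];	/* all zeros means "not provided" */
};

/* Returns true when the platform data carries a usable MAC address. */
static bool pdata_has_mac(const struct pdata_sketch *pd)
{
	static const uint8_t zero_mac[ETH_ALEN];	/* zero-initialized */

	return memcmp(pd->mac_addr, zero_mac, ETH_ALEN) != 0;
}

/* Copy the platform MAC into dev_addr only when it was actually set,
 * mirroring the memcmp()/memcpy() pair in the probe routine. */
static void apply_pdata_mac(uint8_t dev_addr[ETH_ALEN],
			    const struct pdata_sketch *pd)
{
	if (pdata_has_mac(pd))
		memcpy(dev_addr, pd->mac_addr, ETH_ALEN);
}
```

Embedding the array means a statically allocated platform data struct starts out with "no MAC" for free, which is why eth2_pd needs no initializer after the change.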
Re: [PATCH] bonding: make IGMP flooding on active-backup bonds configurable
Andy Gospodarek <[EMAIL PROTECTED]> wrote:

>A while back the following change was made to the bonding code:
>
>commit df49898a47061e82219c991dfbe9ac6ddf7a866b
>Author: John W. Linville <[EMAIL PROTECTED]>
>Date: Tue Oct 18 21:30:58 2005 -0400
>
>[PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack
>
>Expand comment explaining MAC address selection for replicated IGMP
>frames transmitted in bonding mode 1 (active-backup). Also, a small
>whitespace cleanup.
>
>Signed-off-by: John W. Linville <[EMAIL PROTECTED]>
>Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
>
>In general this patch is good, but this tweaks that feature by allowing
>that functionality to be enabled and disabled. This patch adds a new
>module option as well as a sysfs entry. It sets the default to be the
>current behavior so existing users shouldn't notice any difference.

Why would you want to turn this off?

Also, I've got a replacement patch for this functionality that seems to be better in all regards. It sends bonus IGMP joins when a failover occurs, rather than simply duplicating them on all slaves (the current system can leave switches in the dark if the slaves fail back to the originals). As chance would have it, I'm planning to post it as part of a set in a little while.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
[PATCH] bonding: make IGMP flooding on active-backup bonds configurable
A while back the following change was made to the bonding code: commit df49898a47061e82219c991dfbe9ac6ddf7a866b Author: John W. Linville <[EMAIL PROTECTED]> Date: Tue Oct 18 21:30:58 2005 -0400 [PATCH] bonding: cleanup comment for mode 1 IGMP xmit hack Expand comment explaining MAC address selection for replicated IGMP frames transmitted in bonding mode 1 (active-backup). Also, a small whitespace cleanup. Signed-off-by: John W. Linville <[EMAIL PROTECTED]> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]> In general this patch is good, but this tweaks that feature by allowing that functionality to be enabled and disabled. This patch adds a new module option as well as a sysfs entry. It sets the default to be the current behavior so existing users shouldn't notice any difference. Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]> --- drivers/net/bonding/bond_main.c | 65 +++ drivers/net/bonding/bond_sysfs.c | 46 +++ drivers/net/bonding/bonding.h|1 include/linux/if_bonding.h |3 + 4 files changed, 102 insertions(+), 13 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index a7c8f98..b531d4a 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -96,6 +96,7 @@ static char *xmit_hash_policy = NULL; static int arp_interval = BOND_LINK_ARP_INTERV; static char *arp_ip_target[BOND_MAX_ARP_TARGETS] = { NULL, }; static char *arp_validate = NULL; +static char *igmp_flood = NULL; struct bond_params bonding_defaults; module_param(max_bonds, int, 0); @@ -129,6 +130,8 @@ module_param_array(arp_ip_target, charp, NULL, 0); MODULE_PARM_DESC(arp_ip_target, "arp targets in n.n.n.n form"); module_param(arp_validate, charp, 0); MODULE_PARM_DESC(arp_validate, "validate src/dst of ARP probes: none (default), active, backup or all"); +module_param(igmp_flood, charp, 0); +MODULE_PARM_DESC(igmp_flood, "flood IGMP control traffic on active-backup bonding: yes (default) or no"); /*- Global variables */ @@ -180,6 +183,12 @@ 
struct bond_parm_tbl arp_validate_tbl[] = { { NULL, -1}, }; +struct bond_parm_tbl igmp_flood_tbl[] = { +{ "no", BOND_IGMP_ACTIVEONLY}, +{ "yes", BOND_IGMP_ALLMEMBERS}, +{ NULL, -1}, +}; + /*-- Forward declarations ---*/ static void bond_send_gratuitous_arp(struct bonding *bond); @@ -3070,6 +3079,9 @@ static void bond_info_show_slave(struct seq_file *seq, const struct slave *slave slave->perm_hwaddr[2], slave->perm_hwaddr[3], slave->perm_hwaddr[4], slave->perm_hwaddr[5]); + seq_printf(seq, "IGMP Flood: %s\n", (bond->params.igmp_flood) ? + "yes" : "no"); + if (bond->params.mode == BOND_MODE_8023AD) { const struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator; @@ -4067,19 +4079,24 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d if (!bond->curr_active_slave) goto out; - /* Xmit IGMP frames on all slaves to ensure rapid fail-over - for multicast traffic on snooping switches */ - if (skb->protocol == __constant_htons(ETH_P_IP) && - skb->nh.iph->protocol == IPPROTO_IGMP) { - struct slave *slave, *active_slave; - int i; - - active_slave = bond->curr_active_slave; - bond_for_each_slave_from_to(bond, slave, i, active_slave->next, - active_slave->prev) - if (IS_UP(slave->dev) && - (slave->link == BOND_LINK_UP)) - bond_activebackup_xmit_copy(skb, bond, slave); + /* Let's make this behavior optional since it causes problems + when the links are connected to different switches. */ + if (bond->params.igmp_flood) { + + /* Xmit IGMP frames on all slaves to ensure rapid fail-over + for multicast traffic on snooping switches */ + if (skb->protocol == __constant_htons(ETH_P_IP) && + skb->nh.iph->protocol == IPPROTO_IGMP) { + struct slave *slave, *active_slave; + int i; + + active_slave = bond->curr_active_slave; + bond_for_each_slave_from_to(bond, slave, i, active_slave->next, + active_slave->prev) + if (IS_UP(slave->dev) && + (slave->link == BOND_LINK_UP)) +
Re: [PATCH 4/5] r8169: more alignment for the 0x8168
Sorry for the delay, I took some time to check the history of the r8169 alignment issues.

Philip Craig <[EMAIL PROTECTED]> :
[...]
> This only partially helps. Many of the packets are greater than 200
> bytes so copybreak doesn't apply to them.

Yes.

> Can we assume anything about the alignment of skb->data? I think it
> should be 4 byte aligned, otherwise the whole NET_IP_ALIGN thing
> won't work. All the drivers I looked at just reserve NET_IP_ALIGN
> without checking the alignment first.
>
> So can you do something like set align to 0 for RTL_CFG_0 and change
> rtl8169_alloc_rx_skb() to:
> skb_reserve(skb, align ? (align - 1) & (u32)skb->data : NET_IP_ALIGN);

The "So" part assumes that the 0x8169 can DMA at any address.

/me ponders...

It's easy to debug if it misbehaves now or in 6 months on some obscure system. It's consistent with the preprevious code. Ok, good idea, I like it.

[...]
> BTW, should the alignment expression be:
> (((u32)skb->data + (align - 1)) & ~(align - 1)) - (u32)skb->data

I'll see if something can be hacked with a zero or power of two alignment.

--
Ueimor
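The two alignment expressions quoted in this message compute different things, which a small userspace sketch makes visible. This is only an illustration under stated assumptions: pad_to_align() and offset_in_align() are hypothetical helper names, align is assumed to be a non-zero power of two, and uintptr_t is used instead of the (u32) cast from the mail so the arithmetic also holds for 64-bit addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to skip so that addr + pad is a multiple of align.
 * This is the longer expression from the mail. */
static uintptr_t pad_to_align(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((addr + (align - 1)) & ~(align - 1)) - addr;
}

/* The shorter expression from the mail: how far addr lies past the
 * previous align boundary. Adding this to addr only yields an aligned
 * address when the offset is 0 or exactly align / 2. */
static uintptr_t offset_in_align(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & (align - 1);
}
```

For example, with addr = 0x1002 and align = 8 the first helper returns 6 (0x1002 + 6 = 0x1008), while the second returns 2, which would leave the data at 0x1004.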
Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 13:08:08 -0800

> I would be happy to see this go. Have you tried this code with
> a SACK DoS stream? I.e. before, you could consume a huge amount
> of cpu time by giving a certain bad sequence of SACKs. This
> code should have a better worst case run time.

No, I didn't do that. In fact this new code wedged my workstation overnight somehow and I need to debug that at some point as well. :-)
Re: [PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.
On Wed, 28 Feb 2007 11:49:49 -0800 (PST) David Miller <[EMAIL PROTECTED]> wrote:

> commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be
> Author: David S. Miller <[EMAIL PROTECTED]>
> Date: Tue Feb 27 19:28:07 2007 -0800
>
> [TCP]: Kill fastpath_{skb,cnt}_hint.
>
> Now that we have per-skb fack_counts and an interval
> search mechanism for the retransmit queue, we don't
> need these things any more.
>
> Instead, as we traverse the SACK blocks to tag the
> queue, we use the RB tree to lookup the first SKB
> covering the SACK block by sequence number.
>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

I would be happy to see this go. Have you tried this code with a SACK DoS stream? I.e. before, you could consume a huge amount of cpu time by giving a certain bad sequence of SACKs. This code should have a better worst case run time.
Re: [PATCH 2/4]: Store retransmit queue packets in RB tree.
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 21:33:50 +0100

> I am not sure this rb_node placement is optimal. rb lookups want to access
> rb_node and end_seq. They should be placed in the same cache line :)

Definitely an area for improvement, but that's not the area of focus of these changes at this point :-)
Re: [PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping
On Wed, 28 Feb 2007, Paul Moore wrote:

> The current CIPSO engine has a problem where it does not verify that the given
> sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is
> used. The end result is that bad packets are sent on the wire which should
> have never been sent in the first place. This patch corrects this problem by
> verifying the sensitivity level mapping similar to what is done with the
> category mapping. This patch also changes the returned error code in this case
> to -EPERM to better match what the category mapping verification code returns.
>
> Signed-off-by: Paul Moore <[EMAIL PROTECTED]>

[removed redhat-lspp, which is subscriber only]

Acked-by: James Morris <[EMAIL PROTECTED]>

> ---
>  net/ipv4/cipso_ipv4.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> Index: net-2.6_bugfix/net/ipv4/cipso_ipv4.c
> ===
> --- net-2.6_bugfix.orig/net/ipv4/cipso_ipv4.c
> +++ net-2.6_bugfix/net/ipv4/cipso_ipv4.c
> @@ -732,11 +732,12 @@ static int cipso_v4_map_lvl_hton(const s
>  		*net_lvl = host_lvl;
>  		return 0;
>  	case CIPSO_V4_MAP_STD:
> -		if (host_lvl < doi_def->map.std->lvl.local_size) {
> +		if (host_lvl < doi_def->map.std->lvl.local_size &&
> +		    doi_def->map.std->lvl.local[host_lvl] < CIPSO_V4_INV_LVL) {
>  			*net_lvl = doi_def->map.std->lvl.local[host_lvl];
>  			return 0;
>  		}
> -		break;
> +		return -EPERM;
>  	}
>
>  	return -EINVAL;
> @@ -771,7 +772,7 @@ static int cipso_v4_map_lvl_ntoh(const s
>  			*host_lvl = doi_def->map.std->lvl.cipso[net_lvl];
>  			return 0;
>  		}
> -		break;
> +		return -EPERM;
>  	}
>
>  	return -EINVAL;
>
> --
> paul moore
> linux security @ hp

--
James Morris <[EMAIL PROTECTED]>
Re: [PATCH 2/4]: Store retransmit queue packets in RB tree.
David Miller wrote:

> commit c387760826bd71103220e06ca7b0bf90a785567e
> Author: David S. Miller <[EMAIL PROTECTED]>
> Date: Tue Feb 27 16:44:42 2007 -0800
>
> [TCP]: Store retransmit queue packets in RB tree.
>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4ff3940..b70fd21 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -18,6 +18,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
> @@ -232,6 +233,8 @@ struct sk_buff {
>  	struct sk_buff		*next;
>  	struct sk_buff		*prev;
>
> +	struct rb_node		rb;
> +
>  	struct sock		*sk;
>  	struct skb_timeval	tstamp;
>  	struct net_device	*dev;

I am not sure this rb_node placement is optimal. rb lookups want to access rb_node and end_seq. They should be placed in the same cache line :)

next/prev were at the beginning of sk_buff; there is no such constraint for rb.
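The cache-line concern raised here can be checked mechanically with offsetof(). The following is a rough userspace sketch, not the real struct sk_buff: struct skb_sketch, struct rb_node_sketch, the 96-byte filler and the 64-byte CACHE_LINE value are all assumptions made for illustration (in the kernel, end_seq lives in the TCP control block inside skb->cb[], some distance after the list pointers).

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: 64-byte lines; the real size is per-CPU */

/* Hypothetical cut-down rb_node: one word plus two child pointers. */
struct rb_node_sketch {
	unsigned long parent_color;
	struct rb_node_sketch *right, *left;
};

/* Hypothetical cut-down layout, NOT the real struct sk_buff. */
struct skb_sketch {
	struct skb_sketch *next, *prev;
	struct rb_node_sketch rb;
	char other[96];		/* stands in for the fields between rb and cb[] */
	unsigned int seq, end_seq;
};

/* True if byte ranges [off_a, off_a + len_a) and [off_b, off_b + len_b)
 * fall within a single cache line, assuming the struct itself starts
 * on a cache-line boundary. */
static bool same_cache_line(size_t off_a, size_t len_a,
			    size_t off_b, size_t len_b)
{
	size_t first = off_a < off_b ? off_a : off_b;
	size_t end_a = off_a + len_a, end_b = off_b + len_b;
	size_t last = (end_a > end_b ? end_a : end_b) - 1;

	return first / CACHE_LINE == last / CACHE_LINE;
}
```

Passing offsetof(struct skb_sketch, rb) and offsetof(struct skb_sketch, end_seq) to same_cache_line() reports that, in this made-up layout, a lookup touches more than one line, which is the effect the reply is worried about.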
[PATCH] NetLabel: Verify sensitivity level has a valid CIPSO mapping
The current CIPSO engine has a problem where it does not verify that the given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is used. The end result is that bad packets are sent on the wire which should have never been sent in the first place. This patch corrects this problem by verifying the sensitivity level mapping similar to what is done with the category mapping. This patch also changes the returned error code in this case to -EPERM to better match what the category mapping verification code returns.

Signed-off-by: Paul Moore <[EMAIL PROTECTED]>
---
 net/ipv4/cipso_ipv4.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: net-2.6_bugfix/net/ipv4/cipso_ipv4.c
===
--- net-2.6_bugfix.orig/net/ipv4/cipso_ipv4.c
+++ net-2.6_bugfix/net/ipv4/cipso_ipv4.c
@@ -732,11 +732,12 @@ static int cipso_v4_map_lvl_hton(const s
 		*net_lvl = host_lvl;
 		return 0;
 	case CIPSO_V4_MAP_STD:
-		if (host_lvl < doi_def->map.std->lvl.local_size) {
+		if (host_lvl < doi_def->map.std->lvl.local_size &&
+		    doi_def->map.std->lvl.local[host_lvl] < CIPSO_V4_INV_LVL) {
 			*net_lvl = doi_def->map.std->lvl.local[host_lvl];
 			return 0;
 		}
-		break;
+		return -EPERM;
 	}
 
 	return -EINVAL;
@@ -771,7 +772,7 @@ static int cipso_v4_map_lvl_ntoh(const s
 			*host_lvl = doi_def->map.std->lvl.cipso[net_lvl];
 			return 0;
 		}
-		break;
+		return -EPERM;
 	}
 
 	return -EINVAL;

--
paul moore
linux security @ hp
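The check this patch adds (index in range and the table entry not marked invalid) can be sketched outside the kernel as follows. This is a minimal illustration, not the NetLabel code: struct lvl_map_sketch, map_lvl_hton() and the INV_LVL sentinel value are hypothetical stand-ins for the doi_def mapping table and CIPSO_V4_INV_LVL.

```c
#include <errno.h>
#include <stdint.h>

#define INV_LVL 0x80000000u	/* hypothetical "no mapping" sentinel */

/* Hypothetical stand-in for the "std" level mapping table. */
struct lvl_map_sketch {
	const uint32_t *local;		/* host level -> network level */
	uint32_t local_size;		/* number of entries in local[] */
};

/* Translate a host sensitivity level to its on-the-wire value.
 * Returns 0 on success, -EPERM when the level is out of range or
 * has no valid mapping, so no packet is built from a bad level. */
static int map_lvl_hton(const struct lvl_map_sketch *map,
			uint32_t host_lvl, uint32_t *net_lvl)
{
	if (host_lvl < map->local_size &&
	    map->local[host_lvl] < INV_LVL) {
		*net_lvl = map->local[host_lvl];
		return 0;
	}
	return -EPERM;
}
```

The order of the two conditions matters: the bounds test has to come first so that local[host_lvl] is never read out of range.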
[PATCH 4/4]: Kill fastpath_{skb,cnt}_hint.
commit 71b270d966cd42e29eabcd39434c4ad4d33aa2be Author: David S. Miller <[EMAIL PROTECTED]> Date: Tue Feb 27 19:28:07 2007 -0800 [TCP]: Kill fastpath_{skb,cnt}_hint. Now that we have per-skb fack_counts and an interval search mechanism for the retransmit queue, we don't need these things any more. Instead, as we traverse the SACK blocks to tag the queue, we use the RB tree to lookup the first SKB covering the SACK block by sequence number. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/include/linux/tcp.h b/include/linux/tcp.h index b73687a..c3f08a5 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -326,9 +326,7 @@ struct tcp_sock { struct sk_buff *scoreboard_skb_hint; struct sk_buff *retransmit_skb_hint; struct sk_buff *forward_skb_hint; - struct sk_buff *fastpath_skb_hint; - int fastpath_cnt_hint; int lost_cnt_hint; int retransmit_cnt_hint; int forward_cnt_hint; diff --git a/include/net/tcp.h b/include/net/tcp.h index 80a572b..408f210 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1047,12 +1047,12 @@ static inline void tcp_mib_init(void) } /*from STCP */ -static inline void clear_all_retrans_hints(struct tcp_sock *tp){ +static inline void clear_all_retrans_hints(struct tcp_sock *tp) +{ tp->lost_skb_hint = NULL; tp->scoreboard_skb_hint = NULL; tp->retransmit_skb_hint = NULL; tp->forward_skb_hint = NULL; - tp->fastpath_skb_hint = NULL; } /* MD5 Signature */ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b919cd7..df69726 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -942,16 +942,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ struct tcp_sock *tp = tcp_sk(sk); unsigned char *ptr = ack_skb->h.raw + TCP_SKB_CB(ack_skb)->sacked; struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2); - struct sk_buff *cached_skb; int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)>>3; int reord = tp->packets_out; int prior_fackets; u32 lost_retrans = 0; int flag = 
0; int dup_sack = 0; - int cached_fack_count; + int fack_count_base; int i; - int first_sack_index; if (!tp->sacked_out) tp->fackets_out = 0; @@ -1010,12 +1008,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ tp->recv_sack_cache[i].end_seq = 0; } - first_sack_index = 0; if (flag) num_sacks = 1; else { int j; - tp->fastpath_skb_hint = NULL; /* order SACK blocks to allow in order walk of the retrans queue */ for (i = num_sacks-1; i > 0; i--) { @@ -1027,10 +1023,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ tmp = sp[j]; sp[j] = sp[j+1]; sp[j+1] = tmp; - - /* Track where the first SACK block goes to */ - if (j == first_sack_index) - first_sack_index = j+1; } } @@ -1040,22 +1032,17 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ /* clear flag as used for different purpose in following code */ flag = 0; - /* Use SACK fastpath hint if valid */ - cached_skb = tp->fastpath_skb_hint; - cached_fack_count = tp->fastpath_cnt_hint; - if (!cached_skb) { - cached_skb = tcp_write_queue_head(sk); - cached_fack_count = 0; - } - + fack_count_base = TCP_SKB_CB(tcp_write_queue_head(sk))->fack_count; for (i=0; istart_seq); __u32 end_seq = ntohl(sp->end_seq); int fack_count; - skb = cached_skb; - fack_count = cached_fack_count; + skb = tcp_write_queue_find(sk, start_seq); + if (!skb) + continue; + fack_count = TCP_SKB_CB(skb)->fack_count - fack_count_base; /* Event "B" in the comment above. */ if (after(end_seq, tp->high_seq)) @@ -1068,13 +1055,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ if (skb == tcp_send_head(sk)) break; - cached_skb = skb; - cached_fack_count = fack_count; - if (i == first_sack_index) { - tp->fastpath_skb_hint = skb; - tp->fastpath_cnt_hint = fa
[PATCH 3/4]: Maintain cached fack counts in retransmit queue.
commit 5fc24957defcc34df8fab6bf62bc1918e54607f8
Author: David S. Miller <[EMAIL PROTECTED]>
Date: Tue Feb 27 17:23:52 2007 -0800

    [TCP]: Maintain cached fack counts in retransmit queue.

    The fack count of any skb in the retransmit queue at any given
    point in time is:

        (skb->fack_count - head_skb->fack_count)

    And we'll use this in the SACK processing loops.

    Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cce6b0e..80a572b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -532,6 +532,7 @@ struct tcp_skb_cb {
 	__u32		seq;		/* Starting sequence number	*/
 	__u32		end_seq;	/* SEQ + FIN + SYN + datalen	*/
 	__u32		when;		/* used to compute rtt's	*/
+	unsigned int	fack_count;	/* speed up SACK processing	*/
 	__u8		flags;		/* TCP header flags.		*/
 
 	/* NOTE: These must match up to the flags byte in a
@@ -1272,6 +1273,12 @@ static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root)
 
 static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb)
 {
+	struct sk_buff *tail = tcp_write_queue_tail(sk);
+	unsigned int fc = 0;
+
+	if (tail)
+		fc = TCP_SKB_CB(tail)->fack_count + tcp_skb_pcount(skb);
+	TCP_SKB_CB(skb)->fack_count = fc;
 	__skb_queue_tail(&sk->sk_write_queue, skb);
 	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
@@ -1285,18 +1292,44 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb
 		sk->sk_send_head = skb;
 }
 
+/* This is only used for tcp_send_synack(), so the write queue should
+ * be empty. If that stops being true, the fack_count assignment
+ * will need to be more elaborate.
+ */
 static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb)
 {
+	BUG_ON(!skb_queue_empty(&sk->sk_write_queue));
 	__skb_queue_head(&sk->sk_write_queue, skb);
+	TCP_SKB_CB(skb)->fack_count = 0;
 	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
 
+/* An insert into the middle of the write queue causes the fack
+ * counts in subsequent packets to become invalid, fix them up.
+ */
+static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *first)
+{
+	struct sk_buff *prev = first->prev;
+	unsigned int fc = 0;
+
+	if (prev != (struct sk_buff *) &sk->sk_write_queue)
+		fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+	while (first != (struct sk_buff *)&sk->sk_write_queue) {
+		TCP_SKB_CB(first)->fack_count = fc;
+
+		fc += tcp_skb_pcount(first);
+		first = first->next;
+	}
+}
+
 /* Insert buff after skb on the write queue of sk. */
 static inline void tcp_insert_write_queue_after(struct sk_buff *skb,
 						struct sk_buff *buff,
 						struct sock *sk)
 {
 	__skb_append(skb, buff, &sk->sk_write_queue);
+	tcp_reset_fack_counts(sk, buff);
 	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
 
@@ -1306,6 +1339,7 @@ static inline void tcp_insert_write_queue_before(struct sk_buff *new,
 						  struct sock *sk)
 {
 	__skb_insert(new, skb->prev, skb, &sk->sk_write_queue);
+	tcp_reset_fack_counts(sk, new);
 	tcp_rb_insert(skb, &tcp_sk(sk)->write_queue_rb);
 }
 
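The cached-count bookkeeping in this patch can be sketched on a plain array instead of the skb list. This is an illustrative userspace sketch only: struct seg_sketch, reset_fack_counts() and normalized_fack_count() are hypothetical names, and the array stands in for the write queue. The sketch uses the previous segment's packet count when computing each cached value, as tcp_reset_fack_counts() and the cover letter describe; the tail-insert hunk above adds tcp_skb_pcount(skb) of the new segment instead, a difference worth keeping in mind when comparing the two.

```c
#include <stddef.h>

/* Hypothetical stand-in for one queued segment. */
struct seg_sketch {
	unsigned int pcount;		/* packets in this segment */
	unsigned int fack_count;	/* cached count, relative to some base */
};

/* Recompute the cached counts from index 'first' to the end of the
 * queue, as needed after an insert into the middle of the queue. */
static void reset_fack_counts(struct seg_sketch *q, size_t n, size_t first)
{
	unsigned int fc = 0;
	size_t i;

	if (first > 0)
		fc = q[first - 1].fack_count + q[first - 1].pcount;

	for (i = first; i < n; i++) {
		q[i].fack_count = fc;
		fc += q[i].pcount;
	}
}

/* Number of packets queued ahead of segment i: the cached value
 * minus the head's cached value. */
static unsigned int normalized_fack_count(const struct seg_sketch *q, size_t i)
{
	return q[i].fack_count - q[0].fack_count;
}
```

Because only the difference against the head is ever used, removing segments from the head needs no renumbering; the head's cached value simply becomes the new base.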
[PATCH 2/4]: Store retransmit queue packets in RB tree.
commit c387760826bd71103220e06ca7b0bf90a785567e Author: David S. Miller <[EMAIL PROTECTED]> Date: Tue Feb 27 16:44:42 2007 -0800 [TCP]: Store retransmit queue packets in RB tree. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4ff3940..b70fd21 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -232,6 +233,8 @@ struct sk_buff { struct sk_buff *next; struct sk_buff *prev; + struct rb_node rb; + struct sock *sk; struct skb_timeval tstamp; struct net_device *dev; diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 18a468d..b73687a 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -174,6 +174,7 @@ struct tcp_md5sig { #include #include +#include #include #include #include @@ -306,6 +307,7 @@ struct tcp_sock { u32 snd_cwnd_used; u32 snd_cwnd_stamp; + struct rb_root write_queue_rb; struct sk_buff_head out_of_order_queue; /* Out of order segments go here */ u32 rcv_wnd;/* Current receiver window */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 571faa1..cce6b0e 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1169,6 +1169,7 @@ static inline void tcp_write_queue_purge(struct sock *sk) while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) sk_stream_free_skb(sk, skb); + tcp_sk(sk)->write_queue_rb = RB_ROOT; sk_stream_mem_reclaim(sk); } @@ -1193,16 +1194,14 @@ static inline struct sk_buff *tcp_write_queue_next(struct sock *sk, struct sk_bu return skb->next; } -#define tcp_for_write_queue(skb, sk) \ - for (skb = (sk)->sk_write_queue.next; \ -(skb != (sk)->sk_send_head) && \ -(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ -skb = skb->next) +#define tcp_for_write_queue(skb, sk) \ + for (skb = (sk)->sk_write_queue.next; \ +(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ +skb = skb->next) -#define tcp_for_write_queue_from(skb, sk) \ - for (; (skb != (sk)->sk_send_head) && 
\ -(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ -skb = skb->next) +#define tcp_for_write_queue_from(skb, sk) \ + for (; (skb != (struct sk_buff *)&(sk)->sk_write_queue);\ +skb = skb->next) static inline struct sk_buff *tcp_send_head(struct sock *sk) { @@ -1211,7 +1210,7 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk) static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb) { - sk->sk_send_head = skb->next; + sk->sk_send_head = tcp_write_queue_next(sk, skb); if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue) sk->sk_send_head = NULL; } @@ -1227,9 +1226,54 @@ static inline void tcp_init_send_head(struct sock *sk) sk->sk_send_head = NULL; } +static inline struct sk_buff *tcp_write_queue_find(struct sock *sk, __u32 seq) +{ + struct rb_node *rb_node = tcp_sk(sk)->write_queue_rb.rb_node; + struct sk_buff *skb = NULL; + + while (rb_node) { + struct sk_buff *tmp = rb_entry(rb_node,struct sk_buff,rb); + if (TCP_SKB_CB(tmp)->end_seq > seq) { + skb = tmp; + if (TCP_SKB_CB(tmp)->seq <= seq) + break; + rb_node = rb_node->rb_left; + } else + rb_node = rb_node->rb_right; + + } + return skb; +} + +static inline void tcp_rb_insert(struct sk_buff *skb, struct rb_root *root) +{ + struct rb_node **rb_link, *rb_parent; + __u32 seq = TCP_SKB_CB(skb)->seq; + + rb_link = &root->rb_node; + rb_parent = NULL; + while ((rb_parent = *rb_link) != NULL) { + struct sk_buff *tmp = rb_entry(rb_parent,struct sk_buff,rb); + if (TCP_SKB_CB(tmp)->end_seq > seq) { + BUG_ON(TCP_SKB_CB(tmp)->seq <= seq); + rb_link = &rb_parent->rb_left; + } else { + rb_link = &rb_parent->rb_right; + } + } + rb_link_node(&skb->rb, rb_parent, rb_link); + rb_insert_color(&skb->rb, root); +} + +static inline void tcp_rb_unlink(struct sk_buff *skb, struct rb_root *root) +{ + rb_erase(&skb->rb, root); +} + static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) {
[PATCH 1/4]: Abstract out TCP write queue operations.
commit 677417ba04ad2ce616a8199e337c8c9bb28f0692 Author: David S. Miller <[EMAIL PROTECTED]> Date: Tue Feb 27 14:24:25 2007 -0800 [TCP]: Abstract out all write queue operations. This allows the write queue implementation to be changed, for example, to one which allows fast interval searching. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> diff --git a/include/net/sock.h b/include/net/sock.h index 03684e7..c4023b7 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -710,15 +710,6 @@ static inline void sk_stream_mem_reclaim(struct sock *sk) __sk_stream_mem_reclaim(sk); } -static inline void sk_stream_writequeue_purge(struct sock *sk) -{ - struct sk_buff *skb; - - while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) - sk_stream_free_skb(sk, skb); - sk_stream_mem_reclaim(sk); -} - static inline int sk_stream_rmem_schedule(struct sock *sk, struct sk_buff *skb) { return (int)skb->truesize <= sk->sk_forward_alloc || @@ -1256,18 +1247,6 @@ static inline struct page *sk_stream_alloc_page(struct sock *sk) return page; } -#define sk_stream_for_retrans_queue(skb, sk) \ - for (skb = (sk)->sk_write_queue.next; \ -(skb != (sk)->sk_send_head) && \ -(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ -skb = skb->next) - -/*from STCP for fast SACK Process*/ -#define sk_stream_for_retrans_queue_from(skb, sk) \ - for (; (skb != (sk)->sk_send_head) && \ -(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ -skb = skb->next) - /* * Default write policy as shown to user space via poll/select/SIGIO */ diff --git a/include/net/tcp.h b/include/net/tcp.h index f0c9e34..571faa1 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1162,6 +1162,122 @@ static inline void tcp_put_md5sig_pool(void) put_cpu(); } +/* write queue abstraction */ +static inline void tcp_write_queue_purge(struct sock *sk) +{ + struct sk_buff *skb; + + while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL) + sk_stream_free_skb(sk, skb); + sk_stream_mem_reclaim(sk); +} + +static inline struct 
sk_buff *tcp_write_queue_head(struct sock *sk) +{ + struct sk_buff *skb = sk->sk_write_queue.next; + if (skb == (struct sk_buff *) &sk->sk_write_queue) + return NULL; + return skb; +} + +static inline struct sk_buff *tcp_write_queue_tail(struct sock *sk) +{ + struct sk_buff *skb = sk->sk_write_queue.prev; + if (skb == (struct sk_buff *) &sk->sk_write_queue) + return NULL; + return skb; +} + +static inline struct sk_buff *tcp_write_queue_next(struct sock *sk, struct sk_buff *skb) +{ + return skb->next; +} + +#define tcp_for_write_queue(skb, sk) \ + for (skb = (sk)->sk_write_queue.next; \ +(skb != (sk)->sk_send_head) && \ +(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ +skb = skb->next) + +#define tcp_for_write_queue_from(skb, sk) \ + for (; (skb != (sk)->sk_send_head) && \ +(skb != (struct sk_buff *)&(sk)->sk_write_queue); \ +skb = skb->next) + +static inline struct sk_buff *tcp_send_head(struct sock *sk) +{ + return sk->sk_send_head; +} + +static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb) +{ + sk->sk_send_head = skb->next; + if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue) + sk->sk_send_head = NULL; +} + +static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unlinked) +{ + if (sk->sk_send_head == skb_unlinked) + sk->sk_send_head = NULL; +} + +static inline void tcp_init_send_head(struct sock *sk) +{ + sk->sk_send_head = NULL; +} + +static inline void __tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) +{ + __skb_queue_tail(&sk->sk_write_queue, skb); +} + +static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb) +{ + __tcp_add_write_queue_tail(sk, skb); + + /* Queue it, remembering where we must start sending. 
*/ + if (sk->sk_send_head == NULL) + sk->sk_send_head = skb; +} + +static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff *skb) +{ + __skb_queue_head(&sk->sk_write_queue, skb); +} + +/* Insert buff after skb on the write queue of sk. */ +static inline void tcp_insert_write_queue_after(struct sk_buff *skb, + struct sk_buff *buff, +
[PATCH 0/4]: Store TCP retransmit queue in RB-tree
I'd had this idea in the back of my head for a while and finally I tried it out yesterday to see how it would look.

Basically, we store the write queue packets in an RB tree keyed by start sequence number. The objective is to use this information to parse the SACK blocks more efficiently and get rid of the "hints" we use to optimize that code currently.

The big win is that we can now find the start of any sequence of packets in the retransmit queue in O(log n) time.

The downsides are numerous, such as:

1) Increased state stored in sk_buff, we need to store an rb_node there which is 3 pointers :-(((

   It is possible that perhaps we could alias the rb_node with the existing next/prev pointers in sk_buff, but like the VMA code in the Linux MM, I decided to keep the linked list around since that's the fastest for all the other operations. rb_next() and rb_prev() would need to be used otherwise, and those aren't the cheapest.

2) We eat an O(log n) insert and delete for every packet now, even when no SACK processing occurs.

   I think this part can be sped up. We insert to the tail on new packets, so we can take the existing tail packet and do something similar to "rb_next()" to find the insertion point. But we'd still have the cost of the potential tree rotations.

Even if none of the RB stuff is useful, the first patch which abstracts all of the write queue accesses should probably go in because it allows experimentation in this area to be done quite effortlessly.

One thing I'm not sure about wrt. the RB tree stuff is that although we key on start sequence of the SKB, we do change this when trimming the head of TSO packets. I think this is "OK" because such changes would never change the position of the SKB within the RB tree, but I'd like someone else to think that is true too :-) Worst case we could key on end sequence which never ever changes.

Actually, the whole case of TCP SKB chopping and mid-queue insert/delete, and its effect on the RB tree entries, needs to be audited if we are to take this idea seriously.

I wonder if there is an algorithm better suited to this application. It's an interval search, which needs fast insert to the tail and fast delete from the head.

Another aspect of this patch is the per-SKB packet counts (I named them "fack_count" but I'd rename that to "packet_count" if I ever committed it for real). The idea is that this can eliminate most of the packet count hints in the tcp_sock. The algorithm is simple:

1) On skb insert:
     if write queue empty, assign packet_count of zero
     else, assign packet_count of tail_skb->packet_count + tcp_skb_pcount(tail_skb)

2) To get normalized packet_count of arbitrary SKB:
     (skb->packet_count - head_skb->packet_count)

You can see in patch 4 how fastpath_cnt_hint is replaced by that logic.

I added the packet count to TCP_SKB_CB() and that takes it up to the limit of 48 bytes of skb->cb[] on 64-bit architectures. I wanted to steal TCP_SKB_CB()->ack_seq for this, but that's used for some SACK logic already.

There are some pains necessary to keep the counts correct, and in some cases I just recalculate the whole queue's packet counts after the insert point. I'm sure this can be improved.

Back to the RB tree, there is another way to go after at least the SACK processing overhead, and that is to maintain a not-sacked table which is the inverse of the SACK blocks. There are a few references out there which discuss that kind of approach.

Some of the other hints are hard to eliminate. In particular the cached retransmit state is non-trivial to replace with logic that does not require state. The RB tree can't help, and we can't even use the per-SKB packet_count for the count hints because one of them wants to remember how many packets in the queue were marked lost, for example.

If we could get rid of all the hints, that would be an easy 44 bytes killed in tcp_sock on 64-bit.

Anyways, here come the patches, enjoy.

- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
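The per-SKB packet count scheme from the RB-tree posting above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the patch set: `toy_skb`, `toy_queue` and the `toy_*` helpers are hypothetical stand-ins for sk_buff and the write queue, and `pcount` plays the role of tcp_skb_pcount().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff sitting on the write queue. */
struct toy_skb {
    struct toy_skb *next;
    unsigned int pcount;        /* segments in this skb, like tcp_skb_pcount() */
    unsigned int packet_count;  /* running count, assigned once at insert time */
};

/* Hypothetical stand-in for sk->sk_write_queue. */
struct toy_queue {
    struct toy_skb *head;
    struct toy_skb *tail;
};

/* Step 1: append to the tail and assign the running packet count. */
static void toy_queue_tail(struct toy_queue *q, struct toy_skb *skb)
{
    assert(skb != NULL && skb->pcount > 0);
    skb->next = NULL;
    if (q->tail == NULL) {
        /* Empty queue: counting restarts at zero. */
        skb->packet_count = 0;
        q->head = skb;
    } else {
        skb->packet_count = q->tail->packet_count + q->tail->pcount;
        q->tail->next = skb;
    }
    q->tail = skb;
}

/* Unlink the head (e.g. once it is fully ACKed); other counts stay untouched. */
static struct toy_skb *toy_dequeue_head(struct toy_queue *q)
{
    struct toy_skb *skb = q->head;

    if (skb == NULL)
        return NULL;
    q->head = skb->next;
    if (q->head == NULL)
        q->tail = NULL;
    skb->next = NULL;
    return skb;
}

/* Step 2: number of packets queued ahead of skb, relative to the current head. */
static unsigned int toy_normalized_count(const struct toy_queue *q,
                                         const struct toy_skb *skb)
{
    assert(q->head != NULL);
    /* Unsigned subtraction stays correct even if the running count wraps. */
    return skb->packet_count - q->head->packet_count;
}
```

The point of the design is that removing the head needs no fix-up of the remaining entries; only mid-queue inserts and splits force a recount, which is the "pains" mentioned in the posting.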
Missing VLAN tags in bnx2
Just had to spend some time figuring out why a bnx2 card connected to a switch monitor port didn't see any VLAN tags (when in our scenario the tags are pretty vital). Found the following explanation:

  [BNX2]: Fix VLAN on ASF

  Always set up the device to strip incoming VLAN tags when ASF is enabled. ASF firmware will not parse packets correctly if VLAN tags are not stripped.

My fix:

#ifdef I_DONT_KNOW_WHAT_ASF_IS_AND_DONT_REALLY_CARE_EITHER
	if (REG_RD_IND(bp, bp->shmem_base + BNX2_PORT_FEATURE) &
	    BNX2_PORT_FEATURE_ASF_ENABLED)
		bp->flags |= ASF_ENABLE_FLAG;
#endif

Any hope of getting this as an ethtool tunable or something similar? There didn't seem to be a BIOS option for this (Dell PE2900), at least.
Re: TCP minisock tcp_create_openreq_child() typo?
From: "Arnaldo Carvalho de Melo" <[EMAIL PROTECTED]>
Date: Wed, 28 Feb 2007 09:10:10 -0300

> On 2/28/07, KOVACS Krisztian <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > While reading TCP minisock code I've found this suspiciously looking
> > code fragment:
> >
> > - 8< -
> > struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
> > {
> >         struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
> >
> >         if (newsk != NULL) {
> >                 const struct inet_request_sock *ireq = inet_rsk(req);
> >                 struct tcp_request_sock *treq = tcp_rsk(req);
> >                 struct inet_connection_sock *newicsk = inet_csk(sk);
> >                 struct tcp_sock *newtp;
> > - 8< -
> >
> > The above code initializes newicsk to inet_csk(sk), isn't that supposed
> > to be inet_csk(newsk)? As far as I can tell this might leave
> > icsk_ack.last_seg_size zero even if we do have received data.
>
> Good catch!
>
> David, please apply the attached patch.
>
> Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Applied, thanks everyone.
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, 2007-02-28 at 10:51 -0800, Jean Tourrilhes wrote:

> That's why I always specify the kernel version. I'll look into
> that, I'm sure it's not the end of the world ;-)

Sure, just wanted to point it out.

> In what sense? Wireless interfaces are regular netdevices.

Yeah, but in mac80211 we have the wiphy concept since multiple virtual interfaces can be associated with one piece of hardware, and that is where QoS is done, not the netdevs. Of course, those interested can just listen to nl80211 events to figure out if someone renamed an 802.11 phy, but things like hal would probably not want to do that, yet would still want to know about the name change.

> I'm just trying to follow the established pattern. Both
> class_device_add() and class_device_del() are generating the
> event. Also, I'm not sure if other subsystems would benefit from it, I
> don't want to generate too many useless events.

I don't think many other subsystems (can) rename things ;)

johannes
Re: [PATCH] Fix broken RBTX4927 support in ne.c
On Thu, Mar 01, 2007 at 01:22:23AM +0900, Atsushi Nemoto wrote:

> There are some ifdefs for RBTX4927, but need some more bits.

Acked-by: Ralf Baechle <[EMAIL PROTECTED]>

Longer term I think NE2000 will need to support platform_devices. It's been used too widely in too creative ways and we don't want all the clutter to deal with that in ne.c.

Ralf
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 10:16:05AM +0100, Johannes Berg wrote:
> Hi,
>
> > Patch for 2.6.20 is attached.
>
> ... and in the meantime netdevices aren't class_device any more :) IOW,
> your patch isn't going to work any more.

That's why I always specify the kernel version. I'll look into that, I'm sure it's not the end of the world ;-)

> Also, I think wireless could benefit from this as well.

In what sense? Wireless interfaces are regular netdevices.

> > The kobject framework is well designed, so adding these
> > features is a trivial change and won't run the risk of breaking anything
> > (famous last words). Obviously, hotplug apps are free to ignore those
> > additional features.
>
> Why not just add this to base kobject_rename instead? That way,
> userspace is notified for all renames in sysfs.
> The patch then collapses down to the change in net's sysfs code to add
> the ifindex to the environment, and another change in kobject to invoke
> a new event when a name changes and show the old name.

I'm just trying to follow the established pattern. Both class_device_add() and class_device_del() are generating the event. Also, I'm not sure if other subsystems would benefit from it, I don't want to generate too many useless events.

> johannes

Thanks!

Jean
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> On 28-02-2007 02:27, Jean Tourrilhes wrote:
> > Hi all,
> ...
> > Patch for 2.6.20 is attached. The patch was tested on a system
> > running the hotplug scripts, and on another system running udev.
> >
> > Have fun...
> >
> > Jean
> >
> > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]>
> >
> > -
> ...
> > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c
> > --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800
> > +++ linux/net/core/net-sysfs.c 2007-02-27 15:06:49.0 -0800
> > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de
> >  	if ((size <= 0) || (i >= num_envp))
> >  		return -ENOMEM;
> >
> > +	/* pass ifindex to uevent.
> > +	 * ifindex is useful as it won't change (interface name may change)
> > +	 * and is what RtNetlink uses natively. */
> > +	envp[i++] = buf;
> > +	n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1;
> > +	buf += n;
> > +	size -= n;
> > +
> > +	if ((size <= 0) || (i >= num_envp))
>
> Btw.:
> 1. if size == 10 and snprintf returns 9 (without NULL)
>    then n == 10 (with NULL), so isn't it enough (here and above):
>
>	if ((size < 0) || (i >= num_envp))

I just cut'n'pasted the code a few lines above. If the original code is incorrect, it needs fixing. And it will need fixing in probably a lot of places.

> 2. shouldn't there be (here and above):
>
>	envp[--i] = NULL;
>

No, envp is local, so who cares.

Thanks.

Jean
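The buffer accounting questioned in point 1 of the exchange above can be tried out in userspace. The following is a minimal sketch, assuming C99 snprintf() semantics (the return value is the length the string would have had, without the terminating NUL); `append_env_int` is a hypothetical helper, not a kernel function. It treats an exact fit as success and only rejects output that would have been truncated.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "KEY=value" plus its terminating NUL at *buf.
 * On success, advance *buf and shrink *size by the bytes consumed and
 * return 0. Return -1 (leaving *buf and *size alone) if it does not fit.
 */
static int append_env_int(char **buf, int *size, const char *key, int value)
{
    int n;

    assert(buf != NULL && *buf != NULL && size != NULL);
    if (*size <= 0)
        return -1;  /* no room left, not even for a NUL */

    /* +1 accounts for the NUL that snprintf() does not count. */
    n = snprintf(*buf, (size_t)*size, "%s=%d", key, value) + 1;
    if (n <= 0 || n > *size)
        return -1;  /* output error, or the string was truncated */

    *buf += n;
    *size -= n;     /* may legitimately reach 0 on an exact fit */
    return 0;
}
```

With a 10-byte buffer, "IFINDEX=3" needs exactly 10 bytes including the NUL, so the call succeeds and leaves size at 0. A `size <= 0` test made after the subtraction would report that case as a failure, which is the situation Jarek describes.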
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 07:36:17AM -0800, Greg KH wrote:
> On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> > diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> > --- linux/drivers/base/class.j1.c 2007-02-26 18:38:10.0 -0800
> > +++ linux/drivers/base/class.c 2007-02-27 15:52:37.0 -0800
> > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev
>
> This function is not in the 2.6.21-rc2 kernel, so you might want to
> rework this patch a bit :)

It was a trial balloon to gather feedback. Will do.

> Also, it's userspace that causes the rename to happen, so it knows it
> did it, why should the kernel have to emit a message to tell userspace
> again what just happened?

Userspace is not one big program, but a collection of programs, and one program does not know what another program does. In particular, udev does not know when people are using iproute2 to rename an interface, and loses its marbles. We don't really want to ban iproute2 or udev ;-)

> thanks,
>
> greg k-h

Have fun...

Jean
Re: CLOCK_MONOTONIC datagram timestamps by the kernel
On Wed, 28 Feb 2007 14:37:49 +0100 John <[EMAIL PROTECTED]> wrote:

> John wrote:
>
> > I know it's possible to have Linux timestamp incoming datagrams as soon
> > as they are received, then for one to retrieve this timestamp later with
> > an ioctl command or a recvmsg call.
>
> Has it ever been proposed to modify struct skb_timeval to hold
> nanosecond stamps instead of just microsecond stamps? Then make the
> improved precision somehow available to user space.
>

I am playing with a couple of possible future changes.

1. Change skb timestamp to be a timespec instead of timeval. For ABI compatibility the existing SO_TIMESTAMP has to stay microseconds, but add a new SO_TIMESPEC to get the nanosecond version. The change gets nontrivial because of other uses of timestamp (like vegas), so I gave up for now.

2. Use hardware receive timestamp in Yukon2 to put actual receive time into skb timestamp. Works, but still figuring out how to manage clock skew/resync.
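The ABI point in item 1 above comes down to this: if the stamp is stored with nanosecond resolution, readers of the existing microsecond interface still have to receive a struct timeval. A minimal userspace sketch of that down-conversion follows; `stamp_to_timeval` is a hypothetical helper name, and SO_TIMESPEC is only the option name proposed in the message, not something this sketch relies on.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: truncate a nanosecond-resolution stamp to the
 * microsecond form that existing SO_TIMESTAMP readers expect.
 */
static struct timeval stamp_to_timeval(struct timespec ts)
{
    struct timeval tv;

    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = ts.tv_nsec / 1000;  /* drop the sub-microsecond digits */
    return tv;
}
```

Truncation rather than rounding keeps the microsecond value from ever running ahead of the nanosecond value it was derived from.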
[PATCH 2/2] ehea: NAPI multi queue TX/RX path for SMP
This patch provides a functionality that allows parallel RX processing on multiple RX queues by using dummy netdevices. Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> --- diff -Nurp -X dontdiff linux-2.6.21-rc1/drivers/net/ehea/ehea.h patched_kernel/drivers/net/ehea/ehea.h --- linux-2.6.21-rc1/drivers/net/ehea/ehea.h2007-02-28 18:20:06.0 +0100 +++ patched_kernel/drivers/net/ehea/ehea.h 2007-02-28 18:21:23.0 +0100 @@ -39,7 +39,7 @@ #include #define DRV_NAME "ehea" -#define DRV_VERSION"EHEA_0048" +#define DRV_VERSION"EHEA_0052" #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \ | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) @@ -78,8 +78,6 @@ #define EHEA_RQ2_PKT_SIZE 1522 #define EHEA_L_PKT_SIZE 256/* low latency */ -#define EHEA_POLL_MAX_RWQE 1000 - /* Send completion signaling */ #define EHEA_SIG_IV_LONG 1 @@ -357,8 +355,8 @@ struct ehea_port_res { struct ehea_qp *qp; struct ehea_cq *send_cq; struct ehea_cq *recv_cq; - struct ehea_eq *send_eq; - struct ehea_eq *recv_eq; + struct ehea_eq *eq; + struct net_device *d_netdev; spinlock_t send_lock; struct ehea_q_skb_arr rq1_skba; struct ehea_q_skb_arr rq2_skba; @@ -372,7 +370,6 @@ struct ehea_port_res { int swqe_count; u32 swqe_id_counter; u64 tx_packets; - struct tasklet_struct send_comp_task; spinlock_t recv_lock; struct port_state p_state; u64 rx_packets; @@ -416,7 +413,9 @@ struct ehea_port { char int_aff_name[EHEA_IRQ_NAME_SIZE]; int allmulti;/* Indicates IFF_ALLMULTI state */ int promisc; /* Indicates IFF_PROMISC state */ + int num_tx_qps; int num_add_tx_qps; + int num_mcs; int resets; u64 mac_addr; u32 logical_port_id; diff -Nurp -X dontdiff linux-2.6.21-rc1/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c --- linux-2.6.21-rc1/drivers/net/ehea/ehea_main.c 2007-02-28 18:20:06.0 +0100 +++ patched_kernel/drivers/net/ehea/ehea_main.c 2007-02-28 18:21:29.0 +0100 @@ -51,13 +51,18 @@ static int rq1_entries = EHEA_DEF_ENTRIE static int rq2_entries = EHEA_DEF_ENTRIES_RQ2; static int 
rq3_entries = EHEA_DEF_ENTRIES_RQ3; static int sq_entries = EHEA_DEF_ENTRIES_SQ; +static int use_mcs = 0; +static int num_tx_qps = EHEA_NUM_TX_QP; module_param(msg_level, int, 0); module_param(rq1_entries, int, 0); module_param(rq2_entries, int, 0); module_param(rq3_entries, int, 0); module_param(sq_entries, int, 0); +module_param(use_mcs, int, 0); +module_param(num_tx_qps, int, 0); +MODULE_PARM_DESC(num_tx_qps, "Number of TX-QPS"); MODULE_PARM_DESC(msg_level, "msg_level"); MODULE_PARM_DESC(rq3_entries, "Number of entries for Receive Queue 3 " "[2^x - 1], x = [6..14]. Default = " @@ -71,6 +76,7 @@ MODULE_PARM_DESC(rq1_entries, "Number of MODULE_PARM_DESC(sq_entries, " Number of entries for the Send Queue " "[2^x - 1], x = [6..14]. Default = " __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")"); +MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 1 "); void ehea_dump(void *adr, int len, char *msg) { int x; @@ -197,7 +203,7 @@ static int ehea_refill_rq_def(struct ehe struct sk_buff *skb = netdev_alloc_skb(dev, packet_size); if (!skb) { ehea_error("%s: no mem for skb/%d wqes filled", - dev->name, i); + pr->port->netdev->name, i); q_skba->os_skbs = fill_wqes - i; ret = -ENOMEM; break; @@ -345,10 +351,11 @@ static int ehea_treat_poll_error(struct return 0; } -static int ehea_poll(struct net_device *dev, int *budget) +static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev, + struct ehea_port_res *pr, + int *budget) { - struct ehea_port *port = netdev_priv(dev); - struct ehea_port_res *pr = &port->port_res[0]; + struct ehea_port *port = pr->port; struct ehea_qp *qp = pr->qp; struct ehea_cqe *cqe; struct sk_buff *skb; @@ -359,14 +366,12 @@ static int ehea_poll(struct net_device * int skb_arr_rq2_len = pr->rq2_skba.len; int skb_arr_rq3_len = pr->rq3_skba.len; int processed, processed_rq1, processed_rq2, processed_rq3; - int wqe_index, last_wqe_index, rq, intreq, my_quota, port_reset; + int wqe_index, last_wqe_index, rq, my_quota, port_reset; processed = 
processed_rq1 = processed_rq2 = processed_rq3 = 0; last_wqe_index = 0; my_quota = min(*budget, dev->quota); - my_quota = min(my_quota, EHEA_POLL_MAX_RWQE); - /* rq0 i
[PATCH 1/2] ehea: dynamic add / remove port
This patch introduces functionality to dynamically add / remove ehea ports via an userspace DLPAR tool. It creates a subnode for each logical port in the sysfs. Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]> --- diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h index 42295d6..e595d6b 100644 --- a/drivers/net/ehea/ehea.h +++ b/drivers/net/ehea/ehea.h @@ -39,7 +39,7 @@ #include #include #define DRV_NAME "ehea" -#define DRV_VERSION"EHEA_0046" +#define DRV_VERSION"EHEA_0048" #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \ | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) @@ -380,10 +380,11 @@ struct ehea_port_res { }; +#define EHEA_MAX_PORTS 16 struct ehea_adapter { u64 handle; - u8 num_ports; - struct ehea_port *port[16]; + struct ibmebus_dev *ebus_dev; + struct ehea_port *port[EHEA_MAX_PORTS]; struct ehea_eq *neq; /* notification event queue */ struct workqueue_struct *ehea_wq; struct tasklet_struct neq_tasklet; @@ -406,7 +407,7 @@ struct ehea_port { struct net_device *netdev; struct net_device_stats stats; struct ehea_port_res port_res[EHEA_MAX_PORT_RES]; - struct device_node *of_dev_node; /* Open Firmware Device Node */ + struct of_device ofdev; /* Open Firmware Device */ struct ehea_mc_list *mc_list;/* Multicast MAC addresses */ struct vlan_group *vgrp; struct ehea_eq *qp_eq; diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index 1ef3846..42edd8d 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -580,7 +580,7 @@ static struct ehea_port *ehea_get_port(s { int i; - for (i = 0; i < adapter->num_ports; i++) + for (i = 0; i < EHEA_MAX_PORTS; i++) if (adapter->port[i]) if (adapter->port[i]->logical_port_id == logical_port) return adapter->port[i]; @@ -2274,8 +2274,6 @@ static void ehea_tx_watchdog(struct net_ int ehea_sense_adapter_attr(struct ehea_adapter *adapter) { struct hcp_query_ehea *cb; - struct device_node *lhea_dn = NULL; - struct device_node *eth_dn = NULL; u64 hret; int ret; @@ 
-2292,18 +2290,6 @@ int ehea_sense_adapter_attr(struct ehea_ goto out_herr; } - /* Determine the number of available logical ports -* by counting the child nodes of the lhea OFDT entry -*/ - adapter->num_ports = 0; - lhea_dn = of_find_node_by_name(lhea_dn, "lhea"); - do { - eth_dn = of_get_next_child(lhea_dn, eth_dn); - if (eth_dn) - adapter->num_ports++; - } while ( eth_dn ); - of_node_put(lhea_dn); - adapter->max_mc_mac = cb->max_mc_mac - 1; ret = 0; @@ -2313,79 +2299,150 @@ out: return ret; } -static int ehea_setup_single_port(struct ehea_port *port, - struct device_node *dn) +int ehea_get_jumboframe_status(struct ehea_port *port, int *jumbo) { - int ret; - u64 hret; - struct net_device *dev = port->netdev; - struct ehea_adapter *adapter = port->adapter; struct hcp_ehea_port_cb4 *cb4; - u32 *dn_log_port_id; - int jumbo = 0; - - sema_init(&port->port_lock, 1); - port->state = EHEA_PORT_DOWN; - port->sig_comp_iv = sq_entries / 10; - - if (!dn) { - ehea_error("bad device node: dn=%p", dn); - ret = -EINVAL; - goto out; - } - - port->of_dev_node = dn; - - /* Determine logical port id */ - dn_log_port_id = (u32*)get_property(dn, "ibm,hea-port-no", NULL); - - if (!dn_log_port_id) { - ehea_error("bad device node: dn_log_port_id=%p", - dn_log_port_id); - ret = -EINVAL; - goto out; - } - port->logical_port_id = *dn_log_port_id; - - port->mc_list = kzalloc(sizeof(struct ehea_mc_list), GFP_KERNEL); - if (!port->mc_list) { - ret = -ENOMEM; - goto out; - } - - INIT_LIST_HEAD(&port->mc_list->list); + u64 hret; + int ret = 0; - ret = ehea_sense_port_attr(port); - if (ret) - goto out; + *jumbo = 0; - /* Enable Jumbo frames */ + /* (Try to) enable *jumbo frames */ cb4 = kzalloc(PAGE_SIZE, GFP_KERNEL); if (!cb4) { ehea_error("no mem for cb4"); + ret = -ENOMEM; + goto out; } else { - hret = ehea_h_query_ehea_port(adapter->handle, + hret = ehea_h_query_ehea_port(port->adapter->handle, port->logical_port_id, H_PORT_CB4, H_PORT_CB4_J
[PATCH 0/2] ehea: dynamic port & SMP support
Hi,

this version has the issues fixed which were mentioned by Patrick McHardy.

The patch set includes two patches against linux-2.6.21-rc1:

- dynamic add / remove port: Interface has been discussed and approved by John Rose (see: http://www.spinics.net/lists/netdev/msg25327.html)
- NAPI multi queue TX/RX path for SMP: Integrated comments from mailing list (R. Dreier)

As soon as discussions about "splitting NAPI from netdevice" have settled and this functionality is in kernel, we'll provide a patch for the new interface. (see: http://www.spinics.net/lists/netdev/msg25647.html)

Please apply.

Jan-Bernd

Signed-off-by: Jan-Bernd Themann <[EMAIL PROTECTED]>
[PATCH 4/4] pktgen: fix device name handling
Yes, it seems to handle dev name changes. So configuration scripts should use ifindex now :)

Signed-off-by: Robert Olsson <[EMAIL PROTECTED]>

Cheers.
--ro

Stephen Hemminger writes:
> Since devices can change name and other weirdness, don't hold onto
> a copy of device name, instead use pointer to output device.
>
> Fix a couple of leaks in error handling path as well.
>
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
>
> ---
> net/core/pktgen.c | 137 +++---
> 1 file changed, 70 insertions(+), 67 deletions(-)
>
> --- pktgen.orig/net/core/pktgen.c 2007-02-27 12:08:58.0 -0800
> +++ pktgen/net/core/pktgen.c 2007-02-27 12:11:32.0 -0800
> @@ -210,15 +210,11 @@
>  };
>
>  struct pktgen_dev {
> -
>      /*
>       * Try to keep frequent/infrequent used vars. separated.
>       */
> -
> -    char ifname[IFNAMSIZ];
> -    char result[512];
> -
> -    struct pktgen_thread *pg_thread;    /* the owner */
> +    struct proc_dir_entry *entry;       /* proc file */
> +    struct pktgen_thread *pg_thread;    /* the owner */
>      struct list_head list;      /* Used for chaining in the thread's run-queue */
>
>      int running;        /* if this changes to false, the test will stop */
> @@ -345,6 +341,8 @@
>      unsigned cflows;    /* Concurrent flows (config) */
>      unsigned lflow;     /* Flow length (config) */
>      unsigned nflows;    /* accumulated flows (stats) */
> +
> +    char result[512];
>  };
>
>  struct pktgen_hdr {
> @@ -497,7 +495,7 @@
>  static int pktgen_stop_device(struct pktgen_dev *pkt_dev);
>  static void pktgen_stop(struct pktgen_thread *t);
>  static void pktgen_clear_counters(struct pktgen_dev *pkt_dev);
> -static int pktgen_mark_device(const char *ifname);
> +
>  static unsigned int scan_ip6(const char *s, char ip[16]);
>  static unsigned int fmt_ip6(char *s, const char ip[16]);
>
> @@ -591,7 +589,7 @@
>             " frags: %d delay: %u clone_skb: %d ifname: %s\n",
>             pkt_dev->nfrags,
>             1000 * pkt_dev->delay_us + pkt_dev->delay_ns,
> -           pkt_dev->clone_skb, pkt_dev->ifname);
> +           pkt_dev->clone_skb, pkt_dev->odev->name);
>
>      seq_printf(seq, " flows: %u flowlen: %u\n",
pkt_dev->cflows, > pkt_dev->lflow); > @@ -1682,13 +1680,13 @@ > if_lock(t); > list_for_each_entry(pkt_dev, &t->if_list, list) > if (pkt_dev->running) > -seq_printf(seq, "%s ", pkt_dev->ifname); > +seq_printf(seq, "%s ", pkt_dev->odev->name); > > seq_printf(seq, "\nStopped: "); > > list_for_each_entry(pkt_dev, &t->if_list, list) > if (!pkt_dev->running) > -seq_printf(seq, "%s ", pkt_dev->ifname); > +seq_printf(seq, "%s ", pkt_dev->odev->name); > > if (t->result[0]) > seq_printf(seq, "\nResult: %s\n", t->result); > @@ -1834,12 +1832,11 @@ > /* > * mark a device for removal > */ > -static int pktgen_mark_device(const char *ifname) > +static void pktgen_mark_device(const char *ifname) > { > struct pktgen_dev *pkt_dev = NULL; > const int max_tries = 10, msec_per_try = 125; > int i = 0; > -int ret = 0; > > mutex_lock(&pktgen_thread_lock); > pr_debug("pktgen: pktgen_mark_device marking %s for removal\n", ifname); > @@ -1860,32 +1857,49 @@ > printk("pktgen_mark_device: timed out after waiting " > "%d msec for device %s to be removed\n", > msec_per_try * i, ifname); > -ret = 1; > break; > } > > } > > mutex_unlock(&pktgen_thread_lock); > +} > > -return ret; > +static void pktgen_change_name(struct net_device *dev) > +{ > +struct pktgen_thread *t; > + > +list_for_each_entry(t, &pktgen_threads, th_list) { > +struct pktgen_dev *pkt_dev; > + > +list_for_each_entry(pkt_dev, &t->if_list, list) { > +if (pkt_dev->odev != dev) > +continue; > + > +remove_proc_entry(pkt_dev->entry->name, pg_proc_dir); > + > +pkt_dev->entry = create_proc_entry(dev->name, 0600, > + pg_proc_dir); > +if (!pkt_dev->entry) > +printk(KERN_ERR "pktgen: can't move proc " > + " entry for '%s'\n", dev->name); > +break; > +} > +} > } > > static int pktgen_device_event(struct notifier_block *unused, > unsigned long event, void *ptr) > { > -struc
[PATCH 3/4] pktgen: don't use __constant_htonl()
OK! Signed-off-by: Robert Olsson <[EMAIL PROTECTED]> Cheers. --ro Stephen Hemminger writes: > The existing htonl() macro is smart enough to do the same code as > using __constant_htonl() and it looks cleaner. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > --- > net/core/pktgen.c | 24 > 1 file changed, 12 insertions(+), 12 deletions(-) > > --- pktgen.orig/net/core/pktgen.c2007-02-26 14:40:31.0 -0800 > +++ pktgen/net/core/pktgen.c 2007-02-26 15:36:38.0 -0800 > @@ -167,7 +167,7 @@ > #define LAT_BUCKETS_MAX 32 > #define IP_NAME_SZ 32 > #define MAX_MPLS_LABELS 16 /* This is the max label stack depth */ > -#define MPLS_STACK_BOTTOM __constant_htonl(0x0100) > +#define MPLS_STACK_BOTTOM htonl(0x0100) > > /* Device flag bits */ > #define F_IPSRC_RND (1<<0)/* IP-Src Random */ > @@ -2297,7 +2297,7 @@ > int datalen, iplen; > struct iphdr *iph; > struct pktgen_hdr *pgh = NULL; > -__be16 protocol = __constant_htons(ETH_P_IP); > +__be16 protocol = htons(ETH_P_IP); > __be32 *mpls; > __be16 *vlan_tci = NULL; /* Encapsulates priority and > VLAN ID */ > __be16 *vlan_encapsulated_proto = NULL; /* packet type ID field (or > len) for VLAN tag */ > @@ -2306,10 +2306,10 @@ > > > if (pkt_dev->nr_labels) > -protocol = __constant_htons(ETH_P_MPLS_UC); > +protocol = htons(ETH_P_MPLS_UC); > > if (pkt_dev->vlan_id != 0x) > -protocol = __constant_htons(ETH_P_8021Q); > +protocol = htons(ETH_P_8021Q); > > /* Update any of the values, used when we're incrementing various > * fields. 
> @@ -2341,14 +2341,14 @@ > pkt_dev->svlan_cfi, > pkt_dev->svlan_p); > svlan_encapsulated_proto = (__be16 *)skb_put(skb, > sizeof(__be16)); > -*svlan_encapsulated_proto = > __constant_htons(ETH_P_8021Q); > +*svlan_encapsulated_proto = htons(ETH_P_8021Q); > } > vlan_tci = (__be16 *)skb_put(skb, sizeof(__be16)); > *vlan_tci = build_tci(pkt_dev->vlan_id, >pkt_dev->vlan_cfi, >pkt_dev->vlan_p); > vlan_encapsulated_proto = (__be16 *)skb_put(skb, > sizeof(__be16)); > -*vlan_encapsulated_proto = __constant_htons(ETH_P_IP); > +*vlan_encapsulated_proto = htons(ETH_P_IP); > } > > iph = (struct iphdr *)skb_put(skb, sizeof(struct iphdr)); > @@ -2635,7 +2635,7 @@ > int datalen; > struct ipv6hdr *iph; > struct pktgen_hdr *pgh = NULL; > -__be16 protocol = __constant_htons(ETH_P_IPV6); > +__be16 protocol = htons(ETH_P_IPV6); > __be32 *mpls; > __be16 *vlan_tci = NULL; /* Encapsulates priority and > VLAN ID */ > __be16 *vlan_encapsulated_proto = NULL; /* packet type ID field (or > len) for VLAN tag */ > @@ -2643,10 +2643,10 @@ > __be16 *svlan_encapsulated_proto = NULL; /* packet type ID field (or > len) for SVLAN tag */ > > if (pkt_dev->nr_labels) > -protocol = __constant_htons(ETH_P_MPLS_UC); > +protocol = htons(ETH_P_MPLS_UC); > > if (pkt_dev->vlan_id != 0x) > -protocol = __constant_htons(ETH_P_8021Q); > +protocol = htons(ETH_P_8021Q); > > /* Update any of the values, used when we're incrementing various > * fields. 
> @@ -2677,14 +2677,14 @@ > pkt_dev->svlan_cfi, > pkt_dev->svlan_p); > svlan_encapsulated_proto = (__be16 *)skb_put(skb, > sizeof(__be16)); > -*svlan_encapsulated_proto = > __constant_htons(ETH_P_8021Q); > +*svlan_encapsulated_proto = htons(ETH_P_8021Q); > } > vlan_tci = (__be16 *)skb_put(skb, sizeof(__be16)); > *vlan_tci = build_tci(pkt_dev->vlan_id, >pkt_dev->vlan_cfi, >pkt_dev->vlan_p); > vlan_encapsulated_proto = (__be16 *)skb_put(skb, > sizeof(__be16)); > -*vlan_encapsulated_proto = __constant_htons(ETH_P_IPV6); > +*vlan_encapsulated_proto = htons(ETH_P_IPV6); > } > > iph = (struct ipv6hdr *)skb_put(skb, sizeof(struct ipv6hdr)); > @@ -2710,7 +2710,7 @@ > udph->len = htons(datalen + sizeof(struct udphdr)); > udph->check = 0;/* No checksum */ > > -*(__be32 *) iph = __constant_htonl(0x6000); /* Version + flow */ > +*(__be32 *) iph = htonl(0x6000);/* Version + flow */ > > if (pkt_dev-
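As a side note to the __constant_htons()/htons() cleanup quoted above, the result of the conversion is easy to check from userspace. The sketch below uses the standard htons() from <arpa/inet.h>, not the kernel macro; `TOY_ETH_P_IP` and `put_ethertype` are hypothetical names, and 0x0800 is the usual EtherType value for IPv4. Whatever the host byte order, the converted value is laid out in memory in network order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed here for illustration: EtherType of IPv4, as ETH_P_IP in the kernel. */
#define TOY_ETH_P_IP 0x0800

/* Hypothetical helper: store a 16-bit protocol ID the way it goes on the wire. */
static void put_ethertype(unsigned char out[2], uint16_t host_value)
{
    uint16_t wire = htons(host_value);  /* host order -> network (big-endian) order */

    assert(out != NULL);
    memcpy(out, &wire, sizeof(wire));
}
```

Passing a compile-time constant such as TOY_ETH_P_IP or a runtime variable goes through the same call, which is the readability argument made in the patch description.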
[PATCH 2/4] pktgen: use random32
Thanks! It seems like network code has preference for net_random() but they are the same now. Signed-off-by: Robert Olsson <[EMAIL PROTECTED]> Cheers. --ro Stephen Hemminger writes: > Can use random32() now. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > --- > net/core/pktgen.c | 52 > +++- > 1 file changed, 19 insertions(+), 33 deletions(-) > > --- pktgen.orig/net/core/pktgen.c2007-02-26 14:34:36.0 -0800 > +++ pktgen/net/core/pktgen.c 2007-02-26 14:39:53.0 -0800 > @@ -464,17 +464,6 @@ > return tmp; > } > > -static inline u32 pktgen_random(void) > -{ > -#if 0 > -__u32 n; > -get_random_bytes(&n, 4); > -return n; > -#else > -return net_random(); > -#endif > -} > - > static inline __u64 getCurMs(void) > { > struct timeval tv; > @@ -2091,7 +2080,7 @@ > int flow = 0; > > if (pkt_dev->cflows) { > -flow = pktgen_random() % pkt_dev->cflows; > +flow = random32() % pkt_dev->cflows; > > if (pkt_dev->flows[flow].count > pkt_dev->lflow) > pkt_dev->flows[flow].count = 0; > @@ -2103,7 +2092,7 @@ > __u32 tmp; > > if (pkt_dev->flags & F_MACSRC_RND) > -mc = pktgen_random() % (pkt_dev->src_mac_count); > +mc = random32() % pkt_dev->src_mac_count; > else { > mc = pkt_dev->cur_src_mac_offset++; > if (pkt_dev->cur_src_mac_offset > > @@ -2129,7 +2118,7 @@ > __u32 tmp; > > if (pkt_dev->flags & F_MACDST_RND) > -mc = pktgen_random() % (pkt_dev->dst_mac_count); > +mc = random32() % pkt_dev->dst_mac_count; > > else { > mc = pkt_dev->cur_dst_mac_offset++; > @@ -2156,24 +2145,23 @@ > for(i = 0; i < pkt_dev->nr_labels; i++) > if (pkt_dev->labels[i] & MPLS_STACK_BOTTOM) > pkt_dev->labels[i] = MPLS_STACK_BOTTOM | > - ((__force __be32)pktgen_random() & > + ((__force __be32)random32() & >htonl(0x000f)); > } > > if ((pkt_dev->flags & F_VID_RND) && (pkt_dev->vlan_id != 0x)) { > -pkt_dev->vlan_id = pktgen_random() % 4096; > +pkt_dev->vlan_id = random32() & (4096-1); > } > > if ((pkt_dev->flags & F_SVID_RND) && (pkt_dev->svlan_id != 0x)) { > -pkt_dev->svlan_id = pktgen_random() % 4096; > 
+pkt_dev->svlan_id = random32() & (4096 - 1); > } > > if (pkt_dev->udp_src_min < pkt_dev->udp_src_max) { > if (pkt_dev->flags & F_UDPSRC_RND) > -pkt_dev->cur_udp_src = > -((pktgen_random() % > - (pkt_dev->udp_src_max - pkt_dev->udp_src_min)) + > - pkt_dev->udp_src_min); > +pkt_dev->cur_udp_src = random32() % > +(pkt_dev->udp_src_max - pkt_dev->udp_src_min) > ++ pkt_dev->udp_src_min; > > else { > pkt_dev->cur_udp_src++; > @@ -2184,10 +2172,9 @@ > > if (pkt_dev->udp_dst_min < pkt_dev->udp_dst_max) { > if (pkt_dev->flags & F_UDPDST_RND) { > -pkt_dev->cur_udp_dst = > -((pktgen_random() % > - (pkt_dev->udp_dst_max - pkt_dev->udp_dst_min)) + > - pkt_dev->udp_dst_min); > +pkt_dev->cur_udp_dst = random32() % > +(pkt_dev->udp_dst_max - pkt_dev->udp_dst_min) > ++ pkt_dev->udp_dst_min; > } else { > pkt_dev->cur_udp_dst++; > if (pkt_dev->cur_udp_dst >= pkt_dev->udp_dst_max) > @@ -2202,7 +2189,7 @@ > saddr_max))) { > __u32 t; > if (pkt_dev->flags & F_IPSRC_RND) > -t = ((pktgen_random() % (imx - imn)) + imn); > +t = random32() % (imx - imn) + imn; > else { > t = ntohl(pkt_dev->cur_saddr); > t++; > @@ -2223,14 +2210,13 @@ > __be32 s; > if (pkt_dev->flags & F_IPDST_RND) { > > -t = pktgen_random() % (imx - imn) + imn; > +t = random32() % (imx - imn) + imn; > s = htonl(t);
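Two idioms recur in the random32() conversion quoted above: mapping a random word into a half-open range with `%`, and masking with `4096 - 1` in place of `% 4096`. A small userspace sketch of both follows; the function names are hypothetical, and the random word is passed in as a parameter so the behaviour is deterministic and does not depend on any particular generator.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a random 32-bit word into [min, max). Because % binds tighter than +,
 * this is the same as (r % (max - min)) + min, so the extra parentheses
 * dropped by the patch were redundant.
 */
static uint32_t pick_in_range(uint32_t r, uint32_t min, uint32_t max)
{
    assert(min < max);
    return r % (max - min) + min;
}

/*
 * VLAN IDs are 12 bits wide. For an unsigned value and a power-of-two
 * modulus, r % 4096 and r & (4096 - 1) give the same result.
 */
static uint32_t pick_vlan_id(uint32_t r)
{
    return r & (4096 - 1);
}
```

Note that the upper bound is exclusive, so with this mapping the maximum itself (for example udp_src_max in the quoted code) is never produced.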
[PATCH 1/4] pktgen: use pr_debug
Thanks! Signed-off-by: Robert Olsson <[EMAIL PROTECTED]> --ro Stephen Hemminger writes: > Remove private debug macro and replace with standard version > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > > --- > net/core/pktgen.c | 34 +++--- > 1 file changed, 15 insertions(+), 19 deletions(-) > > --- pktgen.orig/net/core/pktgen.c2007-02-26 13:21:54.0 -0800 > +++ pktgen/net/core/pktgen.c 2007-02-26 13:22:04.0 -0800 > @@ -163,9 +163,6 @@ > > #define VERSION "pktgen v2.68: Packet Generator for packet performance > testing.\n" > > -/* #define PG_DEBUG(a) a */ > -#define PG_DEBUG(a) > - > /* The buckets are exponential in 'width' */ > #define LAT_BUCKETS_MAX 32 > #define IP_NAME_SZ 32 > @@ -1856,8 +1853,7 @@ > int ret = 0; > > mutex_lock(&pktgen_thread_lock); > -PG_DEBUG(printk("pktgen: pktgen_mark_device marking %s for removal\n", > -ifname)); > +pr_debug("pktgen: pktgen_mark_device marking %s for removal\n", ifname); > > while (1) { > > @@ -1866,8 +1862,8 @@ > break; /* success */ > > mutex_unlock(&pktgen_thread_lock); > -PG_DEBUG(printk("pktgen: pktgen_mark_device waiting for %s " > -"to disappear\n", ifname)); > +pr_debug("pktgen: pktgen_mark_device waiting for %s " > +"to disappear\n", ifname); > schedule_timeout_interruptible(msecs_to_jiffies(msec_per_try)); > mutex_lock(&pktgen_thread_lock); > > @@ -2847,7 +2843,7 @@ > struct pktgen_dev *pkt_dev; > int started = 0; > > -PG_DEBUG(printk("pktgen: entering pktgen_run. %p\n", t)); > +pr_debug("pktgen: entering pktgen_run. 
%p\n", t); > > if_lock(t); > list_for_each_entry(pkt_dev, &t->if_list, list) { > @@ -2879,7 +2875,7 @@ > { > struct pktgen_thread *t; > > -PG_DEBUG(printk("pktgen: entering pktgen_stop_all_threads_ifs.\n")); > +pr_debug("pktgen: entering pktgen_stop_all_threads_ifs.\n"); > > mutex_lock(&pktgen_thread_lock); > > @@ -2947,7 +2943,7 @@ > { > struct pktgen_thread *t; > > -PG_DEBUG(printk("pktgen: entering pktgen_run_all_threads.\n")); > +pr_debug("pktgen: entering pktgen_run_all_threads.\n"); > > mutex_lock(&pktgen_thread_lock); > > @@ -3039,7 +3035,7 @@ > { > struct pktgen_dev *pkt_dev; > > -PG_DEBUG(printk("pktgen: entering pktgen_stop\n")); > +pr_debug("pktgen: entering pktgen_stop\n"); > > if_lock(t); > > @@ -3063,7 +3059,7 @@ > struct list_head *q, *n; > struct pktgen_dev *cur; > > -PG_DEBUG(printk("pktgen: entering pktgen_rem_one_if\n")); > +pr_debug("pktgen: entering pktgen_rem_one_if\n"); > > if_lock(t); > > @@ -3092,7 +3088,7 @@ > > /* Remove all devices, free mem */ > > -PG_DEBUG(printk("pktgen: entering pktgen_rem_all_ifs\n")); > +pr_debug("pktgen: entering pktgen_rem_all_ifs\n"); > if_lock(t); > > list_for_each_safe(q, n, &t->if_list) { > @@ -3275,7 +3271,7 @@ > > t->pid = current->pid; > > -PG_DEBUG(printk("pktgen: starting pktgen/%d: pid=%d\n", cpu, > current->pid)); > +pr_debug("pktgen: starting pktgen/%d: pid=%d\n", cpu, current->pid); > > max_before_softirq = t->max_before_softirq; > > @@ -3336,13 +3332,13 @@ > set_current_state(TASK_INTERRUPTIBLE); > } > > -PG_DEBUG(printk("pktgen: %s stopping all device\n", t->tsk->comm)); > +pr_debug("pktgen: %s stopping all device\n", t->tsk->comm); > pktgen_stop(t); > > -PG_DEBUG(printk("pktgen: %s removing all device\n", t->tsk->comm)); > +pr_debug("pktgen: %s removing all device\n", t->tsk->comm); > pktgen_rem_all_ifs(t); > > -PG_DEBUG(printk("pktgen: %s removing thread.\n", t->tsk->comm)); > +pr_debug("pktgen: %s removing thread.\n", t->tsk->comm); > pktgen_rem_thread(t); > > return 0; > @@ -3361,7 +3357,7 @@ > 
} > > if_unlock(t); > -PG_DEBUG(printk("pktgen: find_dev(%s) returning %p\n", ifname, > pkt_dev)); > +pr_debug("pktgen: find_dev(%s) returning %p\n", ifname, pkt_dev); > return pkt_dev; > } > > @@ -3530,7 +3526,7 @@ > struct pktgen_dev *pkt_dev) > { > > -PG_DEBUG(printk("pktgen: remove_device pkt_dev=%p\n", pkt_dev)); > +pr_debug("pktgen: remove_device pkt_dev=%p\n", pkt_dev); > > if (pkt_dev->running) { > printk("pktgen:WARNING: trying to remove a running interface, > stopping it now.\n"); > - > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordo
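For readers unfamiliar with the idiom the patch above adopts: a pr_debug-style macro prints only when debugging is enabled at compile time and otherwise costs nothing. The following is a userspace analogue only, assuming a made-up SKETCH_DEBUG switch and made-up names (sketch_pr_debug, trace_enter); it is not the kernel's definition of pr_debug.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace analogue of a compile-time debug print.  With SKETCH_DEBUG
 * defined the message goes to stderr; without it the macro expands to
 * the constant 0 and its arguments are never evaluated.
 */
#ifdef SKETCH_DEBUG
#define sketch_pr_debug(...) fprintf(stderr, __VA_ARGS__)
#else
#define sketch_pr_debug(...) 0
#endif

/* Returns the number of characters printed (0 when compiled out). */
static int trace_enter(const char *func)
{
	assert(func != NULL);
	return sketch_pr_debug("pktgen: entering %s\n", func);
}
```

Building with -DSKETCH_DEBUG turns the messages on, which is the same trade-off the private PG_DEBUG macro offered, but with one shared spelling instead of a per-file one.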
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Tue, Feb 27, 2007 at 05:27:41PM -0800, Jean Tourrilhes wrote:
> diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c
> --- linux/drivers/base/class.j1.c	2007-02-26 18:38:10.0 -0800
> +++ linux/drivers/base/class.c	2007-02-27 15:52:37.0 -0800
> @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev

This function is not in the 2.6.21-rc2 kernel, so you might want to
rework this patch a bit :)

Also, it's userspace that causes the rename to happen, so it knows it
did it, why should the kernel have to emit a message to tell userspace
again what just happened?

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix broken RBTX4927 support in ne.c
There are already some ifdefs for RBTX4927 in ne.c, but a few more bits
are needed to make it work.

Signed-off-by: Atsushi Nemoto <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/ne.c b/drivers/net/ne.c
index a5c4199..02cc78b 100644
--- a/drivers/net/ne.c
+++ b/drivers/net/ne.c
@@ -55,8 +55,10 @@ static const char version2[] =
 #include
 #include
 
-#if defined(CONFIG_TOSHIBA_RBTX4927) || defined(CONFIG_TOSHIBA_RBTX4938)
+#if defined(CONFIG_TOSHIBA_RBTX4938)
 #include
+#elif defined(CONFIG_TOSHIBA_RBTX4927)
+#include
 #endif
 
 #include "8390.h"
@@ -229,6 +231,9 @@ struct net_device * __init ne_probe(int unit)
 #ifdef CONFIG_TOSHIBA_RBTX4938
 	dev->base_addr = RBTX4938_RTL_8019_BASE;
 	dev->irq = RBTX4938_RTL_8019_IRQ;
+#elif defined(CONFIG_TOSHIBA_RBTX4927)
+	dev->base_addr = RBTX4927_RTL_8019_BASE;
+	dev->irq = RBTX4927_RTL_8019_IRQ;
 #endif
 	err = do_ne_probe(dev);
 	if (err)
Re: CLOCK_MONOTONIC datagram timestamps by the kernel
Eric Dumazet wrote: On Wednesday 28 February 2007 15:23, John wrote: Eric Dumazet wrote: John wrote: I know it's possible to have Linux timestamp incoming datagrams as soon as they are received, then for one to retrieve this timestamp later with an ioctl command or a recvmsg call. Has it ever been proposed to modify struct skb_timeval to hold nanosecond stamps instead of just microsecond stamps? Then make the improved precision somehow available to user space. Most modern NICS are able to delay packet delivery, in order to reduce number of interrupts and benefit from better cache hits. You are referring to NAPI interrupt mitigation, right? Nope; I am referring to hardware features. NAPI is software. See ethtool -c eth0 # ethtool -c eth0 Coalesce parameters for eth0: Adaptive RX: off TX: off stats-block-usecs: 100 sample-interval: 0 pkt-rate-low: 0 pkt-rate-high: 0 rx-usecs: 300 rx-frames: 60 rx-usecs-irq: 300 rx-frames-irq: 60 tx-usecs: 200 tx-frames: 53 tx-usecs-irq: 200 tx-frames-irq: 53 You can see on this setup, rx interrupts can be delayed up to 300 us (up to 60 packets might be delayed) One can disable interrupt mitigation. Your argument that it introduces latency therefore becomes irrelevant. POSIX is moving to nanoseconds interfaces. http://www.opengroup.org/onlinepubs/009695399/functions/clock_settime.html You snipped too much. I also wrote: struct timeval and struct timespec take as much space (64 bits). If the hardware can indeed manage sub-microsecond accuracy, a struct timeval forces the kernel to discard valuable information. The fact that you are able to give nanosecond timestamps inside kernel is not sufficient. It is necessary of course, but not sufficient. This precision is OK to time locally generated events. The moment you ask a 'nanosecond' timestamp, it's usually long before/after the real event. If you rely on nanosecond precision on network packets, then something is wrong with your algo. 
Even rt patches wont make sure your cpu caches are pre-filled, or that the routers/links between your machines are not busy. A cache miss cost 40 ns for example. A typical interrupt handler or rx processing can trigger 100 cache misses, or not at all if cache is hot. Consider an idle Linux 2.6.20-rt8 system, equipped with a single PCI-E gigabit Ethernet NIC, running on a modern CPU (e.g. Core 2 Duo E6700). All this system does is time stamp 1000 packets per second. Are you claiming that this platform *cannot* handle most packets within less than 1 microsecond of their arrival? If there are platforms that can achieve sub-microsecond precision, and if it is not more expensive to support nanosecond resolution (I said resolution not precision), then it makes sense to support nanosecond resolution in Linux. Right? You said that rt gives highest priority to interrupt handlers : If you have several nics, what will happen if you receive packets on both nics, or if the NIC interrupt happens in the same time than timer interrupt ? One timestamp will be wrong for sure. Again, this is irrelevant. We are discussing whether it would make sense to support sub-microsecond resolution. If there is one platform that can achieve sub-microsecond precision, there is a need for sub-microsecond resolution. As long as we are changing the resolution, we might as well use something standard like struct timespec. For sure we could timestamp packets with nanosecond resolution, and eventually with MONOTONIC value too, but it will give you (and others) false confidence on the real precision. us timestamps are already wrong... IMHO, this is not true for all platforms. Regards. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
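To make the resolution argument above concrete: storing a nanosecond stamp in a struct timeval throws away the last three decimal digits. The sketch below shows only that arithmetic; timespec_to_timeval_lossy() is a hypothetical userspace helper, not a kernel or libc function, and it says nothing about how accurate the original stamp was.

```c
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: store a nanosecond-resolution stamp in a
 * microsecond-resolution struct timeval.  Returns the number of
 * nanoseconds (0..999) that no longer fit and are discarded.
 */
static long timespec_to_timeval_lossy(const struct timespec *ts,
				      struct timeval *tv)
{
	assert(ts != NULL && tv != NULL);
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);

	tv->tv_sec = ts->tv_sec;
	tv->tv_usec = ts->tv_nsec / 1000;	/* truncate to microseconds */
	return ts->tv_nsec % 1000;		/* the part that is lost */
}
```

For a stamp of 1.123456789 s the timeval holds 1 s and 123456 us, and the helper reports 789 ns discarded. Whether those 789 ns carried any real information is exactly the point the two sides of this thread disagree on.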
Re: need some help on a backport of r8169
Francois Romieu wrote, on Tue 27 Feb 2007 at 10:24:00PM:
> Please don't do > 80 columns line again. I am not a tty.

Sorry. I have turned on auto-fill-mode.

> > > There are 59 r8169 related patches between 2.6.12 and current. Only a few
> > > of those break the API. I'll give it a try tomorrow evening.
> > Hmm... you wrote this mail at 00 h 48 this morning, what did you mean by
> > tomorrow evening?
>
> 1. I mean now. See:
>    http://www.fr.zoreil.com/people/francois/backport/r8169/20070227-00
>    (big patch or serie of 54 pieces).
>
> 2. Compiled, untested. You know what you have to do.
>
> 3. Due to the changes in the driver, one could hope that the link will be
>    autocorrectly set. Please give it a try before using ethtool/mii-tool.
>
> 4. The patchkit does not include the latest changes/bugfixes. They are
>    still experimental but some users have a poor 8168 experience without
>    them. YMMV. Please send a complete dmesg, lspci -vvx and the brand of
>    the motherboard if the driver stops working randomly.

Thank you François.

OK, I tried it. I had to modify it a little because it was against the
2.6.11 kernel and not 2.6.11.11, but that was not much work. I also rebuilt
my kernel completely instead of just doing "make modules", and I fixed some
bugs that I had previously introduced myself for the motherboard chipset.

The result is as follows: I boot my new kernel; the r8169 driver is loaded
automatically, finds the network card and gives me an eth0. I run ifconfig:
eth0 is up, with an IP address, and the RX and TX counters are not 0.

Here is where the problem starts: I run a ping, and it seems to have just
enough time to do the DNS resolution but nothing further. When I run
ifconfig again, the TX dropped counter is no longer 0. After that, I can
bring the interface down and up again, but I still cannot ping anything.

Ah... poor me, who thought that the RTL8168 was just like the RTL8169 with a
PCI Express interface... It seems that a PCI Express RTL8169 also exists,
right?
OK, one more detail: I had not enabled PCI Express in my kernel, I just
noticed it. I'm recompiling my kernel with it to see if it makes any
difference, but with the 2.6.20 kernel everything works well: I can ping
whatever I want, and PCI Express is not enabled there either.

My kernel is now compiled with PCI Express enabled: no change.

Do you think my problem is the one you mentioned above, caused by the
missing experimental patches?

OK, here is my hardware information: my motherboard uses an ICH7 chipset,
I believe it's an I945G or something like that. To make it work completely,
I use the following patch (I cannot put it on a URL anymore, I'm sorry):

--- ./drivers/ide/pci/piix.c.orig	2005-05-27 07:06:46.0 +0200
+++ ./drivers/ide/pci/piix.c	2007-02-28 16:18:37.241527210 +0100
@@ -133,7 +133,9 @@
 		case PCI_DEVICE_ID_INTEL_82801EB_11:
 		case PCI_DEVICE_ID_INTEL_ESB_2:
 		case PCI_DEVICE_ID_INTEL_ICH6_19:
+		case PCI_DEVICE_ID_INTEL_ICH6_3:
 		case PCI_DEVICE_ID_INTEL_ICH7_21:
+		case PCI_DEVICE_ID_INTEL_ICH7_2:
 			mode = 3;
 			break;
 		/* UDMA 66 capable */
@@ -446,7 +448,9 @@
 		case PCI_DEVICE_ID_INTEL_82801E_11:
 		case PCI_DEVICE_ID_INTEL_ESB_2:
 		case PCI_DEVICE_ID_INTEL_ICH6_19:
+		case PCI_DEVICE_ID_INTEL_ICH6_3:
 		case PCI_DEVICE_ID_INTEL_ICH7_21:
+		case PCI_DEVICE_ID_INTEL_ICH7_2:
 		{
 			unsigned int extra = 0;
 			pci_read_config_dword(dev, 0x54, &extra);
@@ -572,6 +576,8 @@
 	/* 20 */ DECLARE_PIIX_DEV("ICH6"),
 	/* 21 */ DECLARE_PIIX_DEV("ICH7"),
 	/* 22 */ DECLARE_PIIX_DEV("ICH4"),
+	/* 23 */ DECLARE_PIIX_DEV("ICH6"),
+	/* 24 */ DECLARE_PIIX_DEV("ICH7"),
 };
 
 /**
@@ -647,6 +653,8 @@
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_19, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 20},
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_21, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 21},
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82801DB_1, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 22},
+	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_3, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 23},
+	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_2, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 24},
 	{ 0, },
}; MODULE_DEVICE_TABLE(pci, piix_pci_tbl); And next you can find my lspci, dmesg, and even dmidecode outputs. Thank you again for your help. lspci: 00:00.0 Host bridge: Intel Corp.: Unknown device 2770 (rev 02) Subsystem: Unknown device 1631:e015 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- [disabled] Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0 Enable- Address: Data:
Re: [PATCH RFC 18/31] net: Implement network device movement between namespaces
Daniel Lezcano <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted
>>
>> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
>> a network device is local to a single network namespace and
>> should never be moved. Useful for pseudo devices that we
>> need an instance in each network namespace (like the loopback
>> device) and for any device we find that cannot handle multiple
>> network namespaces so we may trap them in the initial network
>> namespace.
>>
>> This patch introduces the function dev_change_net_namespace
>> a function used to move a network device from one network
>> namespace to another. To the network device nothing
>> special appears to happen, to the components of the network
>> stack it appears as if the network device was unregistered
>> in the network namespace it is in, and a new device
>> was registered in the network namespace the device
>> was moved to.
>>
>> This patch sets up a namespace device destructor that
>> upon the exit of a network namespace moves all of the
>> movable network devices to the initial network namespace
>> so they are not lost.
>>
> If you:
>  * create etun0/etun1
>  * create a namespace
>  * move etun1 to this namespace
>  * rename the etun1 to eth0
>  * kill the namespace
>
> the former network device etun1 will be lost if you have in your parent
> namespace an interface eth0 because it will conflict.
> Perhaps, the first name should be restored before moving the device back to
> the initial network namespace ?

Restoration of a previous name is no guarantee of anything. Someone
may have renamed some other interface to etun1 in the original network
namespace.

However, if you look closely at the code, you will discover that if the
device can't keep the same name, it is renamed as it switches namespaces.
In particular, it will become devN, where N is replaced by some unused
number. That is what the pat parameter to dev_change_net_namespace is
about.

I'm not exactly thrilled about the generic name, but the code should
work, and I don't know if there is a name that makes better sense.

> -- Daniel
>
> ps : nice patchset

Thanks.

Eric
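The "devN with some unused N" fallback described in the reply above can be sketched in userspace as follows. This is an illustration only: pick_fallback_name(), name_in_use() and the upper bound of 1000 candidates are assumptions made for the sketch, and the real logic lives behind the pat parameter of dev_change_net_namespace, which is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name already appears in the taken[] list, else 0. */
static int name_in_use(const char *const *taken, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(taken[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Hypothetical helper: write the first unused "devN" name into out.
 * Returns 0 on success, -1 if out is too small or no candidate below
 * the (arbitrary) limit of 1000 is free.
 */
static int pick_fallback_name(const char *const *taken, size_t n,
			      char *out, size_t outlen)
{
	assert(out != NULL && outlen > 0);

	for (int i = 0; i < 1000; i++) {
		int len = snprintf(out, outlen, "dev%d", i);

		if (len < 0 || (size_t)len >= outlen)
			return -1;	/* name would be truncated */
		if (!name_in_use(taken, n, out))
			return 0;	/* first free index wins */
	}
	return -1;
}
```

With eth0, dev0 and dev1 already present in the target namespace, the sketch settles on dev2, so the device survives the move at the cost of a generic name.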
Re: [PATCH RFC 22/31] net: Add network namespace clone support.
Daniel Lezcano <[EMAIL PROTECTED]> writes:

>> +
>> +	mutex_lock(&net_mutex);
>> +	err = setup_net(new_net);
>> +	if (err)
>> +		goto out_unlock;
>>
> Should we "net_free" in case of error ?

Oops. Yes we should. Thanks.

>> +	net_lock();
>> +	net_list_append(new_net);
>> +	net_unlock();
>> +
>> +	tsk->nsproxy->net_ns = new_net;
>> +
>> +out_unlock:
>> +	mutex_unlock(&net_mutex);
	net_free(new_net);
>> +out:
>> +	put_net(old_net);
>> +	return err;
>> +}
>> +
>>

Eric
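The leak discussed above is the classic goto-unwind mistake: a failure after the allocation jumps to a label that does not free it. A minimal userspace sketch of one possible shape for the fix follows, with the free placed on the error path only. Every sketch_* name and the live_nets counter are hypothetical stand-ins for net_alloc, setup_net and net_free; locking and reference counting are left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_net { int configured; };

static int live_nets;	/* number of allocations not yet freed */

static struct sketch_net *sketch_net_alloc(void)
{
	struct sketch_net *net = calloc(1, sizeof(*net));

	if (net)
		live_nets++;
	return net;
}

static void sketch_net_free(struct sketch_net *net)
{
	if (net) {
		free(net);
		live_nets--;
	}
}

/* Stand-in for setup_net(); fails on request so the error path runs. */
static int sketch_setup_net(struct sketch_net *net, int fail)
{
	if (fail)
		return -EINVAL;
	net->configured = 1;
	return 0;
}

static int sketch_copy_net(int fail_setup, struct sketch_net **out)
{
	struct sketch_net *new_net;
	int err;

	assert(out != NULL);
	*out = NULL;

	err = -ENOMEM;
	new_net = sketch_net_alloc();
	if (!new_net)
		goto out;

	err = sketch_setup_net(new_net, fail_setup);
	if (err)
		goto out_free;	/* without this label the object leaks */

	*out = new_net;
	return 0;

out_free:
	sketch_net_free(new_net);
out:
	return err;
}
```

Keeping a separate label for the failure case means the success path never reaches the free, which is the property to check when the real function is reworked.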
Re: CLOCK_MONOTONIC datagram timestamps by the kernel
On Wednesday 28 February 2007 15:23, John wrote: > Eric Dumazet wrote: > >> John wrote: > >>> I know it's possible to have Linux timestamp incoming datagrams as soon > >>> as they are received, then for one to retrieve this timestamp later > >>> with an ioctl command or a recvmsg call. > >> > >> Has it ever been proposed to modify struct skb_timeval to hold > >> nanosecond stamps instead of just microsecond stamps? Then make the > >> improved precision somehow available to user space. > > > > Most modern NICS are able to delay packet delivery, in order to reduce > > number of interrupts and benefit from better cache hits. > > You are referring to NAPI interrupt mitigation, right? Nope; I am referring to hardware features. NAPI is software. See ethtool -c eth0 # ethtool -c eth0 Coalesce parameters for eth0: Adaptive RX: off TX: off stats-block-usecs: 100 sample-interval: 0 pkt-rate-low: 0 pkt-rate-high: 0 rx-usecs: 300 rx-frames: 60 rx-usecs-irq: 300 rx-frames-irq: 60 tx-usecs: 200 tx-frames: 53 tx-usecs-irq: 200 tx-frames-irq: 53 You can see on this setup, rx interrupts can be delayed up to 300 us (up to 60 packets might be delayed) > > POSIX is moving to nanoseconds interfaces. > http://www.opengroup.org/onlinepubs/009695399/functions/clock_settime.html The fact that you are able to give nanosecond timestamps inside kernel is not sufficient. It is necessary of course, but not sufficient. This precision is OK to time locally generated events. The moment you ask a 'nanosecond' timestamp, it's usually long before/after the real event. If you rely on nanosecond precision on network packets, then something is wrong with your algo. Even rt patches wont make sure your cpu caches are pre-filled, or that the routers/links between your machines are not busy. A cache miss cost 40 ns for example. A typical interrupt handler or rx processing can trigger 100 cache misses, or not at all if cache is hot. 
You said that rt gives highest priority to interrupt handlers : If you have several nics, what will happen if you receive packets on both nics, or if the NIC interrupt happens in the same time than timer interrupt ? One timestamp will be wrong for sure. For sure we could timestamp packets with nanosecond resolution, and eventually with MONOTONIC value too, but it will give you (and others) false confidence on the real precision. us timestamps are already wrong... - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] spidernet: Fix problem sending IP fragments
Hi,

I found out that the spidernet driver is unable to send fragmented IP
frames.

Let me just recall the basic structure of "normal" UDP/IP/Ethernet
frames (that actually work):
 - It starts with the Ethernet header (dest MAC, src MAC, etc.)
 - The next part is occupied by the IP header (version info, length of
   packet, id=0, fragment offset=0, checksum, from / to address, etc.)
 - Then comes the UDP header (src / dest port, length, checksum)
 - Actual payload
 - Ethernet checksum

Now what's different for an IP fragment:
 - The IP header has id set to some value (same for all fragments),
   offset is set appropriately (i.e. 0 for the first fragment, following
   according to the size of the other fragments), size is the length of
   the frame.
 - The UDP header is unchanged, i.e. its length is that of the full UDP
   datagram, not just the part within the actual frame! But this is only
   true within the first frame: all following frames don't have a valid
   UDP header at all.

The spidernet silicon seems to be quite intelligent: it is able to
compute (IP / UDP / Ethernet) checksums on the fly, and it tests whether
frames conform to the RFC, at least as far as complete frames are
concerned.

But IP fragments are different, as explained above: for IP fragments
containing part of a UDP datagram, it sees incompatible lengths in the IP
and UDP headers of the first frame and thus skips this frame. But the
content *is* correct for IP fragments. For all following frames it finds
(most probably) no valid UDP header at all. But this *is* also correct
for IP fragments.

The Linux IP stack seems to be clever on this point. It expects the
spidernet to calculate the checksum (since the module claims to be able
to do so) and marks the skbs for "normal" frames accordingly (ip_summed
set to CHECKSUM_HW).

But for the IP fragments it does not expect the driver to be capable of
handling the frames appropriately. Thus all checksums are already
computed. This is also flagged within the skb (ip_summed set to
CHECKSUM_NONE).

Unfortunately the spidernet driver ignores these hints. It tries to send
the IP fragments of UDP datagrams as normal UDP/IP frames. Since they
have a different structure, the silicon detects them as not "well-formed"
and skips them.

The following one-liner against 2.6.21-rc2 changes this behavior. If the
IP stack claims to have done the checksumming, the driver should not try
to checksum (and analyze) the frame but send it as is.

Signed-off-by: Norbert Eicker <[EMAIL PROTECTED]>
---
diff --git a/drivers/net/spider_net.c b/drivers/net/spider_net.c
index 3b91af8..31507ac 100644
--- a/drivers/net/spider_net.c
+++ b/drivers/net/spider_net.c
@@ -719,7 +719,7 @@ spider_net_prepare_tx_descr(struct spide
 		SPIDER_NET_DESCR_CARDOWNED | SPIDER_NET_DMAC_NOCS;
 	spin_unlock_irqrestore(&chain->lock, flags);
 
-	if (skb->protocol == htons(ETH_P_IP))
+	if (skb->protocol == htons(ETH_P_IP) && skb->ip_summed == CHECKSUM_HW)
 		switch (skb->nh.iph->protocol) {
 		case IPPROTO_TCP:
 			hwdescr->dmac_cmd_status |= SPIDER_NET_DMAC_TCP;
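The decision the one-liner above encodes can be written as a small pure function, which makes the fragment case easy to check. This is a sketch under assumptions: the sketch_* enums are invented stand-ins for ip_summed and the descriptor command bits, and the UDP branch is assumed by analogy, since the quoted hunk only shows the TCP case.

```c
#include <assert.h>
#include <netinet/in.h>	/* IPPROTO_TCP, IPPROTO_UDP */

/* Stand-ins for skb->ip_summed values (names are hypothetical). */
enum sketch_ip_summed { SKETCH_CHECKSUM_NONE, SKETCH_CHECKSUM_HW };

/* Stand-ins for the hardware checksum commands (names are hypothetical). */
enum sketch_tx_csum { SKETCH_TX_NOCS, SKETCH_TX_TCPCS, SKETCH_TX_UDPCS };

/*
 * Ask the hardware to checksum only when the stack requested it.
 * A frame the stack has already checksummed (for example an IP
 * fragment, marked SKETCH_CHECKSUM_NONE) is sent untouched.
 */
static enum sketch_tx_csum sketch_tx_csum_mode(int is_ipv4,
					       enum sketch_ip_summed ip_summed,
					       int l4_proto)
{
	assert(l4_proto >= 0 && l4_proto <= 255);

	if (!is_ipv4 || ip_summed != SKETCH_CHECKSUM_HW)
		return SKETCH_TX_NOCS;

	switch (l4_proto) {
	case IPPROTO_TCP:
		return SKETCH_TX_TCPCS;
	case IPPROTO_UDP:	/* assumed; not visible in the quoted hunk */
		return SKETCH_TX_UDPCS;
	default:
		return SKETCH_TX_NOCS;
	}
}
```

A UDP fragment arrives with the stack's checksum already in place, so the function returns the no-checksum mode for it, which corresponds to leaving the descriptor's default SPIDER_NET_DMAC_NOCS setting alone.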
Re: [PATCH RFC 22/31] net: Add network namespace clone support.
Eric W. Biederman wrote: From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted This patch allows you to create a new network namespace using sys_clone(...). Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> --- include/linux/sched.h|1 + kernel/nsproxy.c | 11 +++ net/core/net_namespace.c | 38 ++ 3 files changed, 50 insertions(+), 0 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 4463735..9e0f91a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -26,6 +26,7 @@ #define CLONE_STOPPED 0x0200 /* Start in stopped state */ #define CLONE_NEWUTS 0x0400 /* New utsname group? */ #define CLONE_NEWIPC 0x0800 /* New ipcs */ +#define CLONE_NEWNET 0x2000 /* New network namespace */ /* * Scheduling policies diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index 4f3c95a..7861c4c 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -20,6 +20,7 @@ #include #include #include +#include struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy); EXPORT_SYMBOL_GPL(init_nsproxy); @@ -70,6 +71,7 @@ struct nsproxy *dup_namespaces(struct nsproxy *orig) get_ipc_ns(ns->ipc_ns); if (ns->pid_ns) get_pid_ns(ns->pid_ns); + get_net(ns->net_ns); } return ns; @@ -117,10 +119,18 @@ int copy_namespaces(int flags, struct task_struct *tsk) if (err) goto out_pid; + err = copy_net(flags, tsk); + if (err) + goto out_net; + out: put_nsproxy(old_ns); return err; +out_net: + if (new_ns->pid_ns) + put_pid_ns(new_ns->pid_ns); + out_pid: if (new_ns->ipc_ns) put_ipc_ns(new_ns->ipc_ns); @@ -146,5 +156,6 @@ void free_nsproxy(struct nsproxy *ns) put_ipc_ns(ns->ipc_ns); if (ns->pid_ns) put_pid_ns(ns->pid_ns); + put_net(ns->net_ns); kfree(ns); } diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 93e3879..cc56105 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -175,6 +175,44 @@ out_undo: goto out; } +int copy_net(int flags, struct task_struct *tsk) +{ + net_t old_net = tsk->nsproxy->net_ns; + net_t new_net; + int err; + + 
get_net(old_net); + + if (!(flags & CLONE_NEWNET)) + return 0; + + err = -EPERM; + if (!capable(CAP_SYS_ADMIN)) + goto out; + + err = -ENOMEM; + new_net = net_alloc(); + if (null_net(new_net)) + goto out; + + mutex_lock(&net_mutex); + err = setup_net(new_net); + if (err) + goto out_unlock; Should we "net_free" in case of error ? + + net_lock(); + net_list_append(new_net); + net_unlock(); + + tsk->nsproxy->net_ns = new_net; + +out_unlock: + mutex_unlock(&net_mutex); +out: + put_net(old_net); + return err; +} + void pernet_modcopy(void *pnetdst, const void *src, unsigned long size) { net_t net; - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 18/31] net: Implement network device movement between namespaces
Eric W. Biederman wrote: From: Eric W. Biederman <[EMAIL PROTECTED]> - unquoted This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate a network device is local to a single network namespace and should never be moved. Useful for pseudo devices that we need an instance in each network namespace (like the loopback device) and for any device we find that cannot handle multiple network namespaces so we may trap them in the initial network namespace. This patch introduces the function dev_change_net_namespace a function used to move a network device from one network namespace to another. To the network device nothing special appears to happen, to the components of the network stack it appears as if the network device was unregistered in the network namespace it is in, and a new device was registered in the network namespace the device was moved to. This patch sets up a namespace device destructor that upon the exit of a network namespace moves all of the movable network devices to the initial network namespace so they are not lost. If you: * create etun0/etun1 * create a namespace * move etun1 to this namespace * rename the etun1 to eth0 * kill the namespace the former network device etun1 will be lost if you have in your parent namespace an interface eth0 because it will conflict. Perhaps, the first name should be restored before moving the device back to the initial network namespace ? -- Daniel ps : nice patchset - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: CLOCK_MONOTONIC datagram timestamps by the kernel
Eric Dumazet wrote: John wrote: I know it's possible to have Linux timestamp incoming datagrams as soon as they are received, then for one to retrieve this timestamp later with an ioctl command or a recvmsg call. Has it ever been proposed to modify struct skb_timeval to hold nanosecond stamps instead of just microsecond stamps? Then make the improved precision somehow available to user space. Most modern NICS are able to delay packet delivery, in order to reduce number of interrupts and benefit from better cache hits. You are referring to NAPI interrupt mitigation, right? AFAIU, it is possible to disable this feature. I'm dealing with 200-4000 packets per second. I don't think I'd save much with interrupt mitigation. Please correct any misconception. Then kernel is not realtime and some delays can occur between the hardware interrupt and the very moment we timestamp the packet. If CPU caches are cold, even the instruction fetches could easily add some us. I've applied the real-time patch. http://rt.wiki.kernel.org/index.php/Main_Page This doesn't make Linux hard real-time, but the interrupt handlers can run with the highest priority (even kernel threads are preempted). Enabling nanosecond stamps would be a lie to users, because real accuracy is not nanosecond, but in the order of 10 us (at least) POSIX is moving to nanoseconds interfaces. http://www.opengroup.org/onlinepubs/009695399/functions/clock_settime.html struct timeval and struct timespec take as much space (64 bits). If the hardware can indeed manage sub-microsecond accuracy, a struct timeval forces the kernel to discard valuable information. If you depend on a < 50 us precision, then linux might be the wrong OS for your application. Or maybe you need a NIC that is able to provide a timestamp in the packet itself (well... along with the packet...) , so that kernel latencies are not a problem. Does Linux support NICs that can do that? Regards. 
Re: Run-time kfree check for correct cache [plus x86_64 APIC troubles]
On Wed, Feb 28, 2007 at 11:10:54AM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote: > On Wednesday 28 February 2007 10:02, Evgeniy Polyakov wrote: > > Attached patch detects in run-time things like: > > skb = alloc_skb(); > > kfree(skb); > > > > where provided to kfree pointer does not belong to kmalloc caches. > > It is turned on when slab debug config option is enabled. > > > > When problem is detected, following warning is printed with hint to > > what cache/function should be used instead: > > It would be less expensive to add a flag > #define SLAB_KFREE_NOWARNING 0x0020UL > > And OR this flags into cs->flags of all standard caches created by > kmem_cache_init() from malloc_sizes[]/cache_names[] > > kfree() would then just test this flag. That does not work - my x86_64 test machine fails badly with following patch applied: diff --git a/include/linux/slab.h b/include/linux/slab.h index 1ef822e..acc3cfb 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -32,6 +32,7 @@ typedef struct kmem_cache kmem_cache_t __deprecated; #define SLAB_PANIC 0x0004UL/* Panic if kmem_cache_create() fails */ #define SLAB_DESTROY_BY_RCU0x0008UL/* Defer freeing slabs to RCU */ #define SLAB_MEM_SPREAD0x0010UL/* Spread some memory over cpuset */ +#define SLAB_KFREE_NOWARNING 0x0020UL/* Do not warn if object belongs to this cache and is freed via kfree */ /* Flags passed to a constructor functions */ #define SLAB_CTOR_CONSTRUCTOR 0x001UL /* If not set, then deconstructor */ diff --git a/mm/slab.c b/mm/slab.c index 8fdaffa..313014e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -177,7 +177,8 @@ SLAB_CACHE_DMA | \ SLAB_MUST_HWCACHE_ALIGN | SLAB_STORE_USER | \ SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \ -SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD) +SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD | \ +SLAB_KFREE_NOWARNING ) #else # define CREATE_MASK (SLAB_HWCACHE_ALIGN | \ SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN | \ @@ -814,7 +815,7 @@ static size_t slab_mgmt_size(size_t nr_objs, size_t align) * Calculate 
the number of objects and left-over bytes for a given buffer size. */ static void cache_estimate(unsigned long gfporder, size_t buffer_size, - size_t align, int flags, size_t *left_over, + size_t align, unsigned long flags, size_t *left_over, unsigned int *num) { int nr_objs; @@ -1466,7 +1467,8 @@ void __init kmem_cache_init(void) sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name, sizes[INDEX_AC].cs_size, ARCH_KMALLOC_MINALIGN, - ARCH_KMALLOC_FLAGS|SLAB_PANIC, + ARCH_KMALLOC_FLAGS|SLAB_PANIC| + SLAB_KFREE_NOWARNING, NULL, NULL); if (INDEX_AC != INDEX_L3) { @@ -1474,7 +1476,8 @@ void __init kmem_cache_init(void) kmem_cache_create(names[INDEX_L3].name, sizes[INDEX_L3].cs_size, ARCH_KMALLOC_MINALIGN, - ARCH_KMALLOC_FLAGS|SLAB_PANIC, + ARCH_KMALLOC_FLAGS|SLAB_PANIC| + SLAB_KFREE_NOWARNING, NULL, NULL); } @@ -1492,7 +1495,8 @@ void __init kmem_cache_init(void) sizes->cs_cachep = kmem_cache_create(names->name, sizes->cs_size, ARCH_KMALLOC_MINALIGN, - ARCH_KMALLOC_FLAGS|SLAB_PANIC, + ARCH_KMALLOC_FLAGS|SLAB_PANIC| + SLAB_KFREE_NOWARNING, NULL, NULL); } #ifdef CONFIG_ZONE_DMA @@ -1501,7 +1505,7 @@ void __init kmem_cache_init(void) sizes->cs_size, ARCH_KMALLOC_MINALIGN, ARCH_KMALLOC_FLAGS|SLAB_CACHE_DMA| - SLAB_PANIC, + SLAB_PANIC|SLAB_KFREE_NOWARNING, NULL, NULL); #endif sizes++; @@ -2827,6 +2831,16 @@ static void kfree_debugcheck(const void *objp) } } +static void kfree_debug_cache_pointer(struct kmem_cache *cachep, const void *objp) +{ + if (!(cachep->flags & SLAB_KFREE_NOWARNING)) { + printk(KERN_ERR "kfree debug: obj: %p, li
Re: CLOCK_MONOTONIC datagram timestamps by the kernel
On Wednesday 28 February 2007 14:37, John wrote:
> John wrote:
> > I know it's possible to have Linux timestamp incoming datagrams as soon
> > as they are received, then for one to retrieve this timestamp later with
> > an ioctl command or a recvmsg call.
>
> Has it ever been proposed to modify struct skb_timeval to hold
> nanosecond stamps instead of just microsecond stamps? Then make the
> improved precision somehow available to user space.

John,

Most modern NICs are able to delay packet delivery in order to reduce the number of interrupts and benefit from better cache hits. tg3, for example, is able to delay delivery by up to 1024 us.

Also, the kernel is not realtime, and some delay can occur between the hardware interrupt and the very moment we timestamp the packet. If the CPU caches are cold, even the instruction fetches could easily add a few us.

Enabling nanosecond stamps would be a lie to users, because the real accuracy is not a nanosecond but on the order of 10 us (at least).

If you depend on < 50 us precision, then Linux might be the wrong OS for your application. Or maybe you need a NIC that is able to provide a timestamp in the packet itself (well... along with the packet), so that kernel latencies are not a problem.

Eric
Re: CLOCK_MONOTONIC datagram timestamps by the kernel
John wrote:
> I know it's possible to have Linux timestamp incoming datagrams as soon
> as they are received, then for one to retrieve this timestamp later with
> an ioctl command or a recvmsg call.

Has it ever been proposed to modify struct skb_timeval to hold nanosecond stamps instead of just microsecond stamps? Then make the improved precision somehow available to user space.

On a related note, the comment for skb_set_timestamp() states:

	/**
	 * skb_set_timestamp - set timestamp of a skb
	 * @skb: skb to set stamp of
	 * @stamp: pointer to struct timeval to get stamp from
	 *
	 * Timestamps are stored in the skb as offsets to a base timestamp.
	 * This function converts a struct timeval to an offset and stores
	 * it in the skb.
	 */

But there is no mention of an offset in the code:

	static inline void skb_set_timestamp(
		struct sk_buff *skb, const struct timeval *stamp)
	{
		skb->tstamp.off_sec = stamp->tv_sec;
		skb->tstamp.off_usec = stamp->tv_usec;
	}

Likewise for skb_get_timestamp:

	/**
	 * skb_get_timestamp - get timestamp from a skb
	 * @skb: skb to get stamp from
	 * @stamp: pointer to struct timeval to store stamp in
	 *
	 * Timestamps are stored in the skb as offsets to a base timestamp.
	 * This function converts the offset back to a struct timeval and stores
	 * it in stamp.
	 */
	static inline void skb_get_timestamp(
		const struct sk_buff *skb, struct timeval *stamp)
	{
		stamp->tv_sec = skb->tstamp.off_sec;
		stamp->tv_usec = skb->tstamp.off_usec;
	}

Are the comments related to code that has since been modified?

Regards.
[PATCH 0/3]: NetXen 1G/10G Ethernet driver updates
Hi All, I will be sending updates to NetXen: 1G/10G Ethernet driver in subsequent mails. The patches will be with respect to netdev#upstream. Regards, Mithlesh Thukral
[PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface
Hi, > Well, your opinions are welcome. Plus any hints as to how to fix this. > I'd tend to simply(?) add some more fields to the > {hash,get,set,delete}_item() functions in drivers/net/pppoe.c. > But maybe there is some better way? As noone seems to have an opinion on this: Here is a patch that does work for me and that should solve the problem as far as that is easily possible. It is based on the assumption that an interface's ifindex is basically an alias for a local MAC address, so incoming packets now are matched to sockets based on remote MAC, session id, and ifindex of the interface the packet came in on/the socket was bound to by connect(). For relayed packets, the socket that's used for relaying is selected based on destination MAC, session ID and the interface index of the interface whose name currently matches the name requested by userspace as the relaying source interface. The relaying part of the patch is untested. Please note that I'd consider this a security fix for reasons outlined in previous mails. Florian --- linux-2.6.20/drivers/net/pppoe.c.orig 2007-02-25 19:23:51.0 +0100 +++ linux-2.6.20/drivers/net/pppoe.c2007-02-28 12:56:05.0 +0100 @@ -7,6 +7,12 @@ * * Version:0.7.0 * + * 070228 :Fix to allow multiple sessions with same remote MAC and same + * session id by including the local device ifindex in the + * tuple identifying a session. This also ensures packets can't + * be injected into a session from interfaces other than the one + * specified by userspace. Florian Zumbiehl <[EMAIL PROTECTED]> + * (Oh, BTW, this one is YYMMDD, in case you were wondering ...) * 220102 :Fix module use count on failure in pppoe_create, pppox_sk -acme * 030700 :Fixed connect logic to allow for disconnect. 
* 270700 :Fixed potential SMP problems; we must protect against @@ -127,14 +133,14 @@ * Set/get/delete/rehash items (internal versions) * **/ -static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr) +static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr, int ifindex) { int hash = hash_item(sid, addr); struct pppox_sock *ret; ret = item_hash_table[hash]; - while (ret && !cmp_addr(&ret->pppoe_pa, sid, addr)) + while (ret && !(cmp_addr(&ret->pppoe_pa, sid, addr) && ret->pppoe_dev->ifindex == ifindex)) ret = ret->next; return ret; @@ -147,21 +153,19 @@ ret = item_hash_table[hash]; while (ret) { - if (cmp_2_addr(&ret->pppoe_pa, &po->pppoe_pa)) + if (cmp_2_addr(&ret->pppoe_pa, &po->pppoe_pa) && ret->pppoe_dev->ifindex == po->pppoe_dev->ifindex) return -EALREADY; ret = ret->next; } - if (!ret) { - po->next = item_hash_table[hash]; - item_hash_table[hash] = po; - } + po->next = item_hash_table[hash]; + item_hash_table[hash] = po; return 0; } -static struct pppox_sock *__delete_item(unsigned long sid, char *addr) +static struct pppox_sock *__delete_item(unsigned long sid, char *addr, int ifindex) { int hash = hash_item(sid, addr); struct pppox_sock *ret, **src; @@ -170,7 +174,7 @@ src = &item_hash_table[hash]; while (ret) { - if (cmp_addr(&ret->pppoe_pa, sid, addr)) { + if (cmp_addr(&ret->pppoe_pa, sid, addr) && ret->pppoe_dev->ifindex == ifindex) { *src = ret->next; break; } @@ -188,12 +192,12 @@ * **/ static inline struct pppox_sock *get_item(unsigned long sid, -unsigned char *addr) +unsigned char *addr, int ifindex) { struct pppox_sock *po; read_lock_bh(&pppoe_hash_lock); - po = __get_item(sid, addr); + po = __get_item(sid, addr, ifindex); if (po) sock_hold(sk_pppox(po)); read_unlock_bh(&pppoe_hash_lock); @@ -203,7 +207,15 @@ static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp) { - return get_item(sp->sa_addr.pppoe.sid, sp->sa_addr.pppoe.remote); + struct net_device *dev = NULL; + int ifindex; + + dev = 
dev_get_by_name(sp->sa_addr.pppoe.dev); + if(!dev) + return NULL; + ifindex = dev->ifindex; + dev_put(dev); + return get_item(sp->sa_addr.pppoe.sid, sp->sa_addr.pppoe.remote, ifindex); } static inline int set_item(struct pppox_sock *po) @@ -220,12 +232,12 @@ return i; } -static inline struct pppox_sock *delete_item(unsigned long sid, char *addr) +static inline struct pppox_sock *delete_item(unsigned lo
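For readers following the description above, here is a minimal userspace sketch of the lookup idea: a session is identified by the tuple (session id, remote MAC, ifindex) rather than by (session id, remote MAC) alone. This is not the driver code. Every toy_* name and TOY_ETH_ALEN is hypothetical, and a single list stands in for one bucket of the hash table.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ETH_ALEN 6	/* hypothetical stand-in for the MAC address length */

/* Hypothetical session entry; one list stands in for a single hash bucket. */
struct toy_session {
	unsigned long sid;
	unsigned char remote[TOY_ETH_ALEN];
	int ifindex;
	struct toy_session *next;
};

/* A session matches only if session id, remote MAC and ifindex all agree. */
static int toy_match(const struct toy_session *s, unsigned long sid,
		     const unsigned char *addr, int ifindex)
{
	return s->sid == sid &&
	       memcmp(s->remote, addr, TOY_ETH_ALEN) == 0 &&
	       s->ifindex == ifindex;
}

/* Walk the chain until the full tuple matches; NULL if nothing matches. */
static struct toy_session *toy_get_item(struct toy_session *head,
					unsigned long sid,
					const unsigned char *addr, int ifindex)
{
	struct toy_session *s = head;

	while (s && !toy_match(s, sid, addr, ifindex))
		s = s->next;
	return s;
}

/* Insert at the head of the chain; refuse an entry whose tuple already exists. */
static int toy_set_item(struct toy_session **head, struct toy_session *po)
{
	if (toy_get_item(*head, po->sid, po->remote, po->ifindex))
		return -1;
	po->next = *head;
	*head = po;
	return 0;
}
```

With this keying, two sessions that share a remote MAC and session id can coexist as long as they sit on different interfaces, and a lookup with the wrong ifindex finds nothing.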
[PATCH 2/3]: Fix second rmmod failure observed on PowerPC machines.
NetXen: Fix second rmmod failure observed on PowerPC machines. Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> --- netxen_nic_hw.c |5 +++-- netxen_nic_init.c | 23 +-- netxen_nic_main.c |9 - 3 files changed, 20 insertions(+), 17 deletions(-) diff --git a/drivers/net/netxen/netxen_nic_hw.c b/drivers/net/netxen/netxen_nic_hw.c index deec796..a2877f3 100644 --- a/drivers/net/netxen/netxen_nic_hw.c +++ b/drivers/net/netxen/netxen_nic_hw.c @@ -508,8 +508,8 @@ void netxen_nic_pci_change_crbwindow(str void netxen_load_firmware(struct netxen_adapter *adapter) { int i; - long data, size = 0; - long flashaddr = NETXEN_FLASH_BASE, memaddr = NETXEN_PHANTOM_MEM_BASE; + u32 data, size = 0; + u32 flashaddr = NETXEN_FLASH_BASE, memaddr = NETXEN_PHANTOM_MEM_BASE; u64 off; void __iomem *addr; @@ -951,6 +951,7 @@ void netxen_nic_flash_print(struct netxe netxen_nic_driver_name); return; } + *ptr32 = le32_to_cpu(*ptr32); ptr32++; addr += sizeof(u32); } diff --git a/drivers/net/netxen/netxen_nic_init.c b/drivers/net/netxen/netxen_nic_init.c index 2f96570..586d32b 100644 --- a/drivers/net/netxen/netxen_nic_init.c +++ b/drivers/net/netxen/netxen_nic_init.c @@ -38,13 +38,13 @@ #include "netxen_nic_hw.h" #include "netxen_nic_phan_reg.h" struct crb_addr_pair { - long addr; - long data; + u32 addr; + u32 data; }; #define NETXEN_MAX_CRB_XFORM 60 static unsigned int crb_addr_xform[NETXEN_MAX_CRB_XFORM]; -#define NETXEN_ADDR_ERROR ((unsigned long ) 0x ) +#define NETXEN_ADDR_ERROR (0x) #define crb_addr_transform(name) \ crb_addr_xform[NETXEN_HW_PX_MAP_CRB_##name] = \ @@ -252,10 +252,10 @@ void netxen_initialize_adapter_ops(struc * netxen_decode_crb_addr(0 - utility to translate from internal Phantom CRB * address to external PCI CRB address. 
*/ -unsigned long netxen_decode_crb_addr(unsigned long addr) +u32 netxen_decode_crb_addr(u32 addr) { int i; - unsigned long base_addr, offset, pci_base; + u32 base_addr, offset, pci_base; crb_addr_transform_setup(); @@ -756,7 +756,7 @@ int netxen_pinit_from_rom(struct netxen_ int n, i; int init_delay = 0; struct crb_addr_pair *buf; - unsigned long off; + u32 off; /* resetall */ status = netxen_nic_get_board_info(adapter); @@ -813,14 +813,13 @@ int netxen_pinit_from_rom(struct netxen_ if (verbose) printk("%s: PCI: 0x%08x == 0x%08x\n", netxen_nic_driver_name, (unsigned int) - netxen_decode_crb_addr((unsigned long) - addr), val); + netxen_decode_crb_addr(addr), val); } for (i = 0; i < n; i++) { - off = netxen_decode_crb_addr((unsigned long)buf[i].addr); + off = netxen_decode_crb_addr(buf[i].addr); if (off == NETXEN_ADDR_ERROR) { - printk(KERN_ERR"CRB init value out of range %lx\n", + printk(KERN_ERR"CRB init value out of range %x\n", buf[i].addr); continue; } @@ -927,6 +926,10 @@ int netxen_initialize_adapter_offload(st void netxen_free_adapter_offload(struct netxen_adapter *adapter) { if (adapter->dummy_dma.addr) { + writel(0, NETXEN_CRB_NORMALIZE(adapter, + CRB_HOST_DUMMY_BUF_ADDR_HI)); + writel(0, NETXEN_CRB_NORMALIZE(adapter, + CRB_HOST_DUMMY_BUF_ADDR_LO)); pci_free_consistent(adapter->ahw.pdev, NETXEN_HOST_DUMMY_DMA_SIZE, adapter->dummy_dma.addr, diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c index 2227504..7d2525e 100644 --- a/drivers/net/netxen/netxen_nic_main.c +++ b/drivers/net/netxen/netxen_nic_main.c @@ -434,13 +434,11 @@ #endif adapter->port_count++; adapter->port[i] = port; } -#ifndef CONFIG_PPC64 writel(0, NETXEN_CRB_NORMALIZE(adapter, CRB_CMDPEG_STATE)); netxen_pinit_from_rom(adapter, 0); udelay(500); netxen_load_firmware(adapter); netxen_phantom_init(adapter, NETXEN_NIC_PEG_TUNE); -#endif /* * delay a while to ensure that the Pegs are up & running. * Otherwise, we might see some flaky behaviour. 
@@ -529,12 +527,13 @@ static void __devexit netxen_nic_remove(
[PATCH 1/3]: Updates, removal of unsupported features and minor bug fixes.
NetXen: Updates, removal of unsupported features and minor bug fixes. Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]> --- netxen_nic.h |4 + netxen_nic_ethtool.c | 144 +- netxen_nic_main.c |4 - netxen_nic_phan_reg.h |3 + 4 files changed, 34 insertions(+), 121 deletions(-) diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h index 2807ef4..81742e4 100644 --- a/drivers/net/netxen/netxen_nic.h +++ b/drivers/net/netxen/netxen_nic.h @@ -72,6 +72,8 @@ #define NUM_FLASH_SECTORS (64) #define FLASH_SECTOR_SIZE (64 * 1024) #define FLASH_TOTAL_SIZE (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE) +#define PHAN_VENDOR_ID 0x4040 + #define RCV_DESC_RINGSIZE \ (sizeof(struct rcv_desc) * adapter->max_rx_desc_count) #define STATUS_DESC_RINGSIZE \ @@ -82,7 +84,7 @@ #define TX_RINGSIZE \ (sizeof(struct netxen_cmd_buffer) * adapter->max_tx_desc_count) #define RCV_BUFFSIZE \ (sizeof(struct netxen_rx_buffer) * rcv_desc->max_rx_desc_count) -#define find_diff_among(a,b,range) ((a)<(b)?((b)-(a)):((b)+(range)-(a))) +#define find_diff_among(a,b,range) ((a)<=(b)?((b)-(a)):((b)+(range)-(a))) #define NETXEN_NETDEV_STATUS 0x1 #define NETXEN_RCV_PRODUCER_OFFSET 0 diff --git a/drivers/net/netxen/netxen_nic_ethtool.c b/drivers/net/netxen/netxen_nic_ethtool.c index 6252e9a..986ef98 100644 --- a/drivers/net/netxen/netxen_nic_ethtool.c +++ b/drivers/net/netxen/netxen_nic_ethtool.c @@ -82,8 +82,7 @@ static const struct netxen_nic_stats net #define NETXEN_NIC_STATS_LEN ARRAY_SIZE(netxen_nic_gstrings_stats) static const char netxen_nic_gstrings_test[][ETH_GSTRING_LEN] = { - "Register_Test_offline", "EEPROM_Test_offline", - "Interrupt_Test_offline", "Loopback_Test_offline", + "Register_Test_on_offline", "Link_Test_on_offline" }; @@ -394,19 +393,12 @@ netxen_nic_get_regs(struct net_device *d } } -static void -netxen_nic_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) -{ - wol->supported = WAKE_UCAST | WAKE_MCAST | WAKE_BCAST | WAKE_MAGIC; - /* options can be added depending 
upon the mode */ - wol->wolopts = 0; -} - static u32 netxen_nic_test_link(struct net_device *dev) { struct netxen_port *port = netdev_priv(dev); struct netxen_adapter *adapter = port->adapter; __u32 status; + int val; /* read which mode */ if (adapter->ahw.board_type == NETXEN_NIC_GBE) { @@ -415,11 +407,13 @@ static u32 netxen_nic_test_link(struct n NETXEN_NIU_GB_MII_MGMT_ADDR_PHY_STATUS, &status) != 0) return -EIO; - else - return (netxen_get_phy_link(status)); + else { + val = netxen_get_phy_link(status); + return !val; + } } else if (adapter->ahw.board_type == NETXEN_NIC_XGBE) { - int val = readl(NETXEN_CRB_NORMALIZE(adapter, CRB_XG_STATE)); - return val == XG_LINK_UP; + val = readl(NETXEN_CRB_NORMALIZE(adapter, CRB_XG_STATE)); + return (val == XG_LINK_UP) ? 0 : 1; } return -EIO; } @@ -606,100 +600,21 @@ netxen_nic_set_pauseparam(struct net_dev static int netxen_nic_reg_test(struct net_device *dev) { - struct netxen_port *port = netdev_priv(dev); - struct netxen_adapter *adapter = port->adapter; - u32 data_read, data_written, save; - __u32 mode; - - /* -* first test the "Read Only" registers by writing which mode -*/ - netxen_nic_read_w0(adapter, NETXEN_NIU_MODE, &mode); - if (netxen_get_niu_enable_ge(mode)) { /* GB Mode */ - netxen_nic_read_w0(adapter, - NETXEN_NIU_GB_MII_MGMT_STATUS(port->portnum), - &data_read); - - save = data_read; - if (data_read) - data_written = data_read & NETXEN_NIC_INVALID_DATA; - else - data_written = NETXEN_NIC_INVALID_DATA; - netxen_nic_write_w0(adapter, - NETXEN_NIU_GB_MII_MGMT_STATUS(port-> - portnum), - data_written); - netxen_nic_read_w0(adapter, - NETXEN_NIU_GB_MII_MGMT_STATUS(port->portnum), - &data_read); - - if (data_written == data_read) { - netxen_nic_write_w0(adapter, - NETXEN_NIU_GB_MII_MGMT_STATUS(port-> - portnum), -
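As an aside on the find_diff_among() change in this patch ('<' becomes '<='), the two forms differ only when a == b. The sketch below copies both macro bodies from the diff under hypothetical _old/_new names so they can be compared side by side. It makes no claim about why the driver needs one behaviour or the other.

```c
/* Macro body as it appears on the '-' line of the patch (hypothetical name). */
#define find_diff_among_old(a, b, range) \
	((a) < (b) ? ((b) - (a)) : ((b) + (range) - (a)))

/* Macro body as it appears on the '+' line of the patch (hypothetical name). */
#define find_diff_among_new(a, b, range) \
	((a) <= (b) ? ((b) - (a)) : ((b) + (range) - (a)))

/* Small wrappers so the two variants can be called like functions. */
static int toy_diff_old(int a, int b, int range)
{
	return find_diff_among_old(a, b, range);
}

static int toy_diff_new(int a, int b, int range)
{
	return find_diff_among_new(a, b, range);
}
```

For a == b the old form yields the full range while the new form yields 0. For a != b both give the same distance, including the wrapped case where a > b.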
Re: TCP minisock tcp_create_openreq_child() typo?
On 2/28/07, KOVACS Krisztian <[EMAIL PROTECTED]> wrote:
> Hi,
>
> While reading TCP minisock code I've found this suspiciously looking
> code fragment:
>
> - 8< -
> struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
> {
> 	struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
>
> 	if (newsk != NULL) {
> 		const struct inet_request_sock *ireq = inet_rsk(req);
> 		struct tcp_request_sock *treq = tcp_rsk(req);
> 		struct inet_connection_sock *newicsk = inet_csk(sk);
> 		struct tcp_sock *newtp;
> - 8< -
>
> The above code initializes newicsk to inet_csk(sk), isn't that supposed
> to be inet_csk(newsk)? As far as I can tell this might leave
> icsk_ack.last_seg_size zero even if we do have received data.

Good catch! David, please apply the attached patch.

Signed-off-by: Arnaldo Carvalho de Melo <[EMAIL PROTECTED]>

Thanks Krisztian!

- Arnaldo

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 30b1e52..6b5c64f 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -381,7 +381,7 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
 	if (newsk != NULL) {
 		const struct inet_request_sock *ireq = inet_rsk(req);
 		struct tcp_request_sock *treq = tcp_rsk(req);
-		struct inet_connection_sock *newicsk = inet_csk(sk);
+		struct inet_connection_sock *newicsk = inet_csk(newsk);
 		struct tcp_sock *newtp;
 
 		/* Now setup tcp_sock */
Re: [PATCH 1/2] [TCP]: Add two new spurious RTO responses to FRTO
On 27-02-2007 16:50, Ilpo Järvinen wrote:
> New sysctl tcp_frto_response is added to select amongst these
...
> Signed-off-by: Ilpo Järvinen <[EMAIL PROTECTED]>
> @@ -762,15 +763,17 @@ __u32 tcp_init_cwnd(struct tcp_sock *tp,
>  }
>
>  /* Set slow start threshold and cwnd not falling to slow start */
> -void tcp_enter_cwr(struct sock *sk)
> +void tcp_enter_cwr(struct sock *sk, const int set_ssthresh)
>  {
>  	struct tcp_sock *tp = tcp_sk(sk);
> +	const struct inet_connection_sock *icsk = inet_csk(sk);
>
>  	tp->prior_ssthresh = 0;
>  	tp->bytes_acked = 0;
>  	if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {

-	if (inet_csk(sk)->icsk_ca_state < TCP_CA_CWR) {
+	if (icsk->icsk_ca_state < TCP_CA_CWR) {

Probably something for the next "BTW".

Regards,
Jarek P.

>  		tp->undo_marker = 0;
> -		tp->snd_ssthresh = inet_csk(sk)->icsk_ca_ops->ssthresh(sk);
> +		if (set_ssthresh)
> +			tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
...
TCP minisock tcp_create_openreq_child() typo?
Hi,

While reading the TCP minisock code I found this suspicious-looking code fragment:

- 8< -
struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
{
	struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);

	if (newsk != NULL) {
		const struct inet_request_sock *ireq = inet_rsk(req);
		struct tcp_request_sock *treq = tcp_rsk(req);
		struct inet_connection_sock *newicsk = inet_csk(sk);
		struct tcp_sock *newtp;
- 8< -

The above code initializes newicsk to inet_csk(sk); isn't that supposed to be inet_csk(newsk)? As far as I can tell, this might leave icsk_ack.last_seg_size zero even if we have received data.

--
Regards,
Krisztian Kovacs
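The suspected problem can be reproduced in a few lines of userspace C. The sketch below is only a toy model of the pattern, not the TCP code: struct toy_sock, toy_clone() and both toy_create_child_*() functions are hypothetical stand-ins. The idea is that a pointer meant to address the freshly cloned child is taken from the parent, so a field meant for the child gets written to the parent.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-connection state. */
struct toy_sock {
	int last_seg_size;
};

/* Stand-in for the clone step: returns a heap copy of the parent, or NULL. */
static struct toy_sock *toy_clone(const struct toy_sock *parent)
{
	struct toy_sock *child = malloc(sizeof(*child));

	if (child)
		*child = *parent;
	return child;
}

/* Suspected-typo variant: the "new" pointer is derived from the parent. */
static struct toy_sock *toy_create_child_buggy(struct toy_sock *sk, int seg)
{
	struct toy_sock *newsk = toy_clone(sk);

	if (newsk != NULL) {
		struct toy_sock *newicsk = sk;	/* parent, not child */

		newicsk->last_seg_size = seg;	/* lands in the parent */
	}
	return newsk;
}

/* Intended variant: the "new" pointer is derived from the clone. */
static struct toy_sock *toy_create_child_fixed(struct toy_sock *sk, int seg)
{
	struct toy_sock *newsk = toy_clone(sk);

	if (newsk != NULL) {
		struct toy_sock *newicsk = newsk;

		newicsk->last_seg_size = seg;	/* lands in the child */
	}
	return newsk;
}
```

In the buggy variant the child keeps whatever value it inherited from the parent (zero here), which matches the symptom described above.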
CLOCK_MONOTONIC datagram timestamps by the kernel
Hello,

I know it's possible to have Linux timestamp incoming datagrams as soon as they are received, then for one to retrieve this timestamp later with an ioctl command or a recvmsg call.

As far as I understand, one can either do

	const int on = 1;
	setsockopt(sock, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof on);

then use recvmsg(), or not set the SO_TIMESTAMP socket option and just call

	ioctl(sock, SIOCGSTAMP, &tv);

after each datagram has been received.

	SIOCGSTAMP
		Return a struct timeval with the receive timestamp of the last
		packet passed to the user. This is useful for accurate round
		trip time measurements. See setitimer(2) for a description of
		struct timeval.

As far as I understand, this timestamp is given by the CLOCK_REALTIME clock. However, I would like to obtain a timestamp given by the CLOCK_MONOTONIC clock.

Relevant parts of the code (I think):

net/core/dev.c

	void net_enable_timestamp(void)
	{
		atomic_inc(&netstamp_needed);
	}

	void __net_timestamp(struct sk_buff *skb)
	{
		struct timeval tv;

		do_gettimeofday(&tv);
		skb_set_timestamp(skb, &tv);
	}

	static inline void net_timestamp(struct sk_buff *skb)
	{
		if (atomic_read(&netstamp_needed))
			__net_timestamp(skb);
		else {
			skb->tstamp.off_sec = 0;
			skb->tstamp.off_usec = 0;
		}
	}

do_gettimeofday() just calls __get_realtime_clock_ts().

Would it be possible to replace do_gettimeofday() by ktime_get_ts(), with the appropriate division by 1000 to convert the struct timespec back into a struct timeval?

	void __net_timestamp(struct sk_buff *skb)
	{
		struct timespec now;
		struct timeval tv;

		ktime_get_ts(&now);
		tv.tv_sec = now.tv_sec;
		tv.tv_usec = now.tv_nsec / 1000;
		skb_set_timestamp(skb, &tv);
	}

How many apps / drivers would this break? Is there perhaps a different way to achieve this?

Regards.
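The nanosecond-to-microsecond conversion asked about above can be tried out in userspace. This is a minimal sketch under stated assumptions: it uses POSIX clock_gettime() rather than the in-kernel ktime_get_ts(), and the two *_sketch helper names are hypothetical. It shows only the arithmetic, and says nothing about whether changing the kernel's clock source would be safe for existing users.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: truncate a timespec (ns) to a timeval (us). */
static struct timeval timespec_to_timeval_sketch(struct timespec ts)
{
	struct timeval tv;

	tv.tv_sec = ts.tv_sec;
	tv.tv_usec = ts.tv_nsec / 1000;	/* ns -> us, truncating */
	return tv;
}

/*
 * Read CLOCK_MONOTONIC and hand it back as a struct timeval.
 * Returns 0 on success, -1 if the clock could not be read.
 */
static int monotonic_timeval_sketch(struct timeval *tv)
{
	struct timespec now;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	*tv = timespec_to_timeval_sketch(now);
	return 0;
}
```

Note that a monotonic value has an arbitrary starting point, so it cannot be compared with a gettimeofday() result, only with other monotonic readings.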
Re: Run-time kfree check for correct cache [was Re: [NET]: Fix kfree(skb)]
On Wednesday 28 February 2007 10:02, Evgeniy Polyakov wrote:
> Attached patch detects in run-time things like:
> skb = alloc_skb();
> kfree(skb);
>
> where provided to kfree pointer does not belong to kmalloc caches.
> It is turned on when slab debug config option is enabled.
>
> When problem is detected, following warning is printed with hint to
> what cache/function should be used instead:

It would be less expensive to add a flag

	#define SLAB_KFREE_NOWARNING 0x0020UL

and OR this flag into cs->flags of all standard caches created by kmem_cache_init() from malloc_sizes[]/cache_names[].

kfree() would then just test this flag.
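To make the suggestion concrete, here is a toy userspace model of the flag test. It is not slab code: struct toy_cache and the TOY_/toy_ names are hypothetical, and the flag value is simply the one shown in the mail. A reply elsewhere in this thread reports that a kernel patch along these lines failed on an x86_64 test machine, so treat this purely as an illustration of the check itself.

```c
/* Flag value as written in the mail above; illustrative only. */
#define TOY_SLAB_KFREE_NOWARNING 0x0020UL

/* Hypothetical stand-in for a cache descriptor. */
struct toy_cache {
	const char *name;
	unsigned long flags;
};

/*
 * Returns 1 if freeing an object from this cache through the generic
 * free path should trigger a warning, 0 if the cache is marked as one
 * of the standard general-purpose caches.
 */
static int toy_kfree_should_warn(const struct toy_cache *cachep)
{
	return !(cachep->flags & TOY_SLAB_KFREE_NOWARNING);
}
```

The appeal of this form is that the check is a single bit test on the cache the object belongs to, with no loop over the table of general-purpose cache sizes.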
Re: [PATCH 2.6.20] kobject net ifindex + rename
On Wed, Feb 28, 2007 at 10:34:37AM +0100, Jarek Poplawski wrote:
> On 28-02-2007 02:27, Jean Tourrilhes wrote:
...
> > +	/* This function is only used for network interface.
> > +	 * Some hotplug package track interfaces by their name and
> > +	 * therefore want to know when the name is changed by the user. */
> > +	if(!error)
> > +		kobject_uevent_env(&class_dev->kobj, KOBJ_RENAME, envp);
> > +
> >  	class_device_put(class_dev);
> >
> > +	kfree(devname_string);
>
> Maybe I miss something, but it seems kobject_uevent_env copies
> pointers from envp instead of buffers' contents.

And it's enough - sorry.

Jarek P.
Re: [PATCH 2.6.20] kobject net ifindex + rename
On 28-02-2007 02:27, Jean Tourrilhes wrote: > Hi all, ... > Patch for 2.6.20 is attached. The patch was tested on a system > running the hotplug scripts, and on another system running udev. > > Have fun... > > Jean > > Signed-off-by: Jean Tourrilhes <[EMAIL PROTECTED]> > > - ... > diff -u -p linux/net/core/net-sysfs.j1.c linux/net/core/net-sysfs.c > --- linux/net/core/net-sysfs.j1.c 2007-02-27 15:01:08.0 -0800 > +++ linux/net/core/net-sysfs.c2007-02-27 15:06:49.0 -0800 > @@ -412,6 +412,17 @@ static int netdev_uevent(struct class_de > if ((size <= 0) || (i >= num_envp)) > return -ENOMEM; > > + /* pass ifindex to uevent. > + * ifindex is useful as it won't change (interface name may change) > + * and is what RtNetlink uses natively. */ > + envp[i++] = buf; > + n = snprintf(buf, size, "IFINDEX=%d", dev->ifindex) + 1; > + buf += n; > + size -= n; > + > + if ((size <= 0) || (i >= num_envp)) Btw.: 1. if size == 10 and snprintf returns 9 (without NULL) then n == 10 (with NULL), so isn't it enough (here and above): if ((size < 0) || (i >= num_envp)) 2. shouldn't there be (here and above): envp[--i] = NULL; > + return -ENOMEM; > + > envp[i] = NULL; > return 0; > } ... 
> diff -u -p linux/drivers/base/class.j1.c linux/drivers/base/class.c > --- linux/drivers/base/class.j1.c 2007-02-26 18:38:10.0 -0800 > +++ linux/drivers/base/class.c2007-02-27 15:52:37.0 -0800 > @@ -841,6 +841,8 @@ int class_device_rename(struct class_dev > { > int error = 0; > char *old_class_name = NULL, *new_class_name = NULL; > + char *devname_string = NULL; > + char *envp[2]; > > class_dev = class_device_get(class_dev); > if (!class_dev) > @@ -849,6 +851,15 @@ int class_device_rename(struct class_dev > pr_debug("CLASS: renaming '%s' to '%s'\n", class_dev->class_id, >new_name); > > + devname_string = kmalloc(strlen(class_dev->class_id) + 15, GFP_KERNEL); > + if (!devname_string) { > + class_device_put(class_dev); > + return -ENOMEM; > + } > + sprintf(devname_string, "INTERFACE_OLD=%s", class_dev->class_id); > + envp[0] = devname_string; > + envp[1] = NULL; > + > #ifdef CONFIG_SYSFS_DEPRECATED > if (class_dev->dev) > old_class_name = make_class_name(class_dev->class->name, > @@ -868,8 +879,16 @@ int class_device_rename(struct class_dev > sysfs_remove_link(&class_dev->dev->kobj, old_class_name); > } > #endif > + > + /* This function is only used for network interface. > + * Some hotplug package track interfaces by their name and > + * therefore want to know when the name is changed by the user. */ > + if(!error) > + kobject_uevent_env(&class_dev->kobj, KOBJ_RENAME, envp); > + > class_device_put(class_dev); > > + kfree(devname_string); Maybe I miss something, but it seems kobject_uevent_env copies pointers from envp instead of buffers' contents. Regards, Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
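The buffer accounting discussed in point 1 above can be checked in userspace. The sketch below is a hypothetical helper (toy_add_env() is not a kernel function) that appends one KEY=value string, counts the terminating NUL in n the way the quoted code does, and treats only n > size as a failure. It relies on the standard C snprintf() contract of returning the length that would have been written, excluding the NUL.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Append "key=value" at *buf, which has *size bytes left.
 * On success, advance *buf, shrink *size and return the number of
 * bytes consumed including the terminating NUL.
 * Returns -1 if the string (plus NUL) does not fit.
 */
static int toy_add_env(char **buf, int *size, const char *key, int value)
{
	int n;

	if (*size <= 0)
		return -1;

	/* snprintf() returns the untruncated length without the NUL. */
	n = snprintf(*buf, (size_t)*size, "%s=%d", key, value) + 1;
	if (n <= 0 || n > *size)
		return -1;	/* output error or truncated */

	*buf += n;
	*size -= n;	/* may legitimately reach exactly 0 */
	return n;
}
```

With a 10-byte buffer, "IFINDEX=7" (9 characters plus NUL) fits exactly and leaves size at 0, which is the case the remark is about. With 9 bytes the same string is reported as not fitting.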
Re: [PATCH 2/2] ehea: NAPI multi queue TX/RX path for SMP
Hi,

> > +static inline int ehea_hash_skb(struct sk_buff *skb, int num_qps)
> > +{
> > +	u32 tmp;
> > +	if ((skb->nh.iph->protocol == IPPROTO_TCP)
> > +	    && skb->protocol == ETH_P_IP) {
>
> skb->protocol has network byte order. The ETH_P_IP test should also
> logically come before checking the IP protocol.

Fixed.

> > +		tmp = (skb->h.th->source + (skb->h.th->dest << 16)) % 31;
>
> Only locally generated packets have a valid h.th pointer.

Good point. I'll fix that. I'll send a new patch set later today.

Thanks,
Jan-Bernd
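The two review points about the protocol test can be sketched in userspace as follows. This is an illustration only, not the ehea code: struct toy_pkt and the TOY_ constants are hypothetical stand-ins, with the ethertype field kept in network byte order as the review says skb->protocol is. The ethertype is compared against htons() of the constant, and it is checked first so the IP-level field is consulted only for IPv4 frames.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define TOY_ETH_P_IP	0x0800	/* IPv4 ethertype, host byte order */
#define TOY_IPPROTO_TCP	6	/* TCP protocol number */

/* Hypothetical packet summary; protocol_be is in network byte order. */
struct toy_pkt {
	uint16_t protocol_be;
	uint8_t ip_protocol;
};

/* Returns 1 for IPv4/TCP, 0 otherwise. */
static int toy_is_ipv4_tcp(const struct toy_pkt *p)
{
	/* The ethertype test comes first; && short-circuits on non-IPv4. */
	return p->protocol_be == htons(TOY_ETH_P_IP) &&
	       p->ip_protocol == TOY_IPPROTO_TCP;
}
```

Comparing against htons(TOY_ETH_P_IP) keeps the test correct on both little-endian and big-endian hosts.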
Re: [PATCH 2.6.20] kobject net ifindex + rename
Hi,

> Patch for 2.6.20 is attached.

... and in the meantime netdevices aren't class_device any more :) In other words, your patch isn't going to work any more. Also, I think wireless could benefit from this as well.

> The kobject framework is well designed, so adding these
> features is trivial change and won't run the risk of breaking anything
> (famous last words). Obviously, hotplug apps are free to ignore those
> additional features.

Why not just add this to the base kobject_rename instead? That way, userspace is notified of all renames in sysfs. The patch then collapses down to the change in net's sysfs code to add the ifindex to the environment, and another change in kobject to emit a new event when a name changes and show the old name.

johannes
Run-time kfree check for correct cache [was Re: [NET]: Fix kfree(skb)]
Attached patch detects in run-time things like: skb = alloc_skb(); kfree(skb); where provided to kfree pointer does not belong to kmalloc caches. It is turned on when slab debug config option is enabled. When problem is detected, following warning is printed with hint to what cache/function should be used instead: [ 168.085641] bhtest_init: skb: 81003e791478. [ 168.085698] kfree debug: i: 4, size: 15, caches: malloc: 81000119d8c0, dma: 81000119e100, free: 81003f19c940. [ 168.085776] kfree debug: likely you want to use something with 'skbuff_head_cache' in name instead of kfree(). [ 168.085853] BUG: at mm/slab.c:2847 kfree_debug_cahce_pointer() [ 168.085907] [ 168.085907] Call Trace: [ 168.086008] [] kfree+0xfd/0x274 [ 168.086064] [] :bhtest:bhtest_init+0x38/0x3f [ 168.086122] [] sys_init_module+0x163d/0x179d [ 168.086183] [] filp_close+0x5d/0x65 [ 168.086240] [] system_call+0x7e/0x83 [ 168.086295] Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> diff --git a/mm/slab.c b/mm/slab.c index c610062..acd3871 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -2829,6 +2829,27 @@ static void kfree_debugcheck(const void *objp) } } +static void kfree_debug_cahce_pointer(struct kmem_cache *cachep, void *objp) +{ + int size = obj_size(cachep), i; + struct cache_sizes *cs; + + for (i=0; ics_size) + break; + } + if ((i == ARRAY_SIZE(malloc_sizes)) || + (cs->cs_cachep != cachep && cs->cs_dmacachep != cachep)) { + printk("kfree debug: i: %d, size: %u, caches: malloc: %p, dma: %p, free: %p.\n", + i, ARRAY_SIZE(malloc_sizes), cs->cs_cachep, cs->cs_dmacachep, + cachep); + printk("kfree debug: likely you want to use something with '%s' in name instead of kfree().\n", + cachep->name); + WARN_ON(1); + } +} + static inline void verify_redzone_free(struct kmem_cache *cache, void *obj) { unsigned long redzone1, redzone2; @@ -2940,6 +2961,7 @@ bad: } #else #define kfree_debugcheck(x) do { } while(0) +#define kfree_debug_cahce_pointer(x, y) do { } while(0) #define cache_free_debugcheck(x,objp,z) 
(objp) #define check_slabp(x,y) do { } while(0) #endif @@ -3757,6 +3779,7 @@ void kfree(const void *objp) local_irq_save(flags); kfree_debugcheck(objp); c = virt_to_cache(objp); + kfree_debug_cahce_pointer(c, objp); debug_check_no_locks_freed(objp, obj_size(c)); __cache_free(c, (void *)objp); local_irq_restore(flags); -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html