Re: 2.6.20-2.6.21 - networking dies after random time
2007/7/26, Ingo Molnar [EMAIL PROTECTED]: (..) yeah - i meant to cover both arches but forgot about x86_64 - updated patch attached below. Ingo - Subject: x86: activate HARDIRQS_SW_RESEND From: Ingo Molnar [EMAIL PROTECTED] activate the software-triggered IRQ-resend logic. it appears some chipsets/cpus do not handle local-APIC driven IRQ resends all that well, so always use the soft mechanism to trigger the execution of pending interrupts. Signed-off-by: Ingo Molnar [EMAIL PROTECTED] --- arch/i386/Kconfig |4 arch/x86_64/Kconfig |4 kernel/irq/manage.c |8 3 files changed, 16 insertions(+) Index: linux-rt-rebase.q/arch/i386/Kconfig === --- linux-rt-rebase.q.orig/arch/i386/Kconfig +++ linux-rt-rebase.q/arch/i386/Kconfig @@ -1284,6 +1284,10 @@ config GENERIC_PENDING_IRQ depends on GENERIC_HARDIRQS SMP default y +config HARDIRQS_SW_RESEND + bool + default y + config X86_SMP bool depends on SMP !X86_VOYAGER Index: linux-rt-rebase.q/arch/x86_64/Kconfig === --- linux-rt-rebase.q.orig/arch/x86_64/Kconfig +++ linux-rt-rebase.q/arch/x86_64/Kconfig @@ -721,6 +721,10 @@ config GENERIC_PENDING_IRQ depends on GENERIC_HARDIRQS SMP default y +config HARDIRQS_SW_RESEND + bool + default y + menu Power management options source kernel/power/Kconfig Index: linux-rt-rebase.q/kernel/irq/manage.c === --- linux-rt-rebase.q.orig/kernel/irq/manage.c +++ linux-rt-rebase.q/kernel/irq/manage.c @@ -175,6 +175,14 @@ void enable_irq(unsigned int irq) desc-depth--; } spin_unlock_irqrestore(desc-lock, flags); +#ifdef CONFIG_HARDIRQS_SW_RESEND + /* +* Do a bh disable/enable pair to trigger any pending +* irq resend logic: +*/ + local_bh_disable(); + local_bh_enable(); +#endif } EXPORT_SYMBOL(enable_irq); This patch didn't help (tested on 2.6.22.1) - ne2k_pci timed out. ps: I retested all patches posted in this thread on top of 2.6.22.1 and behavior from 2.6.21.3 didn't changed. My next tests will be on 2.6.22.x only. Regards, Marcin Slusarz - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.
On Sun, 29 Jul 2007, Oliver Neukum wrote: Am Sonntag 29 Juli 2007 schrieb Jesper Juhl: On 29/07/07, Satyam Sharma [EMAIL PROTECTED] wrote: Hi, On 7/29/07, Jesper Juhl [EMAIL PROTECTED] wrote: Hello, This patch makes sure we don't dereference a NULL pointer in drivers/net/usb/pegasus.c::write_bulk_callback() in the initial struct net_device *net = pegasus-net; assignment. The existing code checks if 'pegasus' is NULL and bails out if it is, so we better not touch that pointer until after that check. [...] diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c index a05fd97..04cba6b 100644 --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -768,11 +768,13 @@ done: static void write_bulk_callback(struct urb *urb) { pegasus_t *pegasus = urb-context; - struct net_device *net = pegasus-net; + struct net_device *net; if (!pegasus) return; + net = pegasus-net; + if (!netif_device_present(net) || !netif_running(net)) return; Is it really possible that we're called into this function with urb-context == NULL? If not, I'd suggest let's just get rid of the check itself, instead. I'm not sure. I am not very familiar with this code. I just figured that moving the assignment is potentially a little safer and it is certainly no worse than the current code, so that's a safe and potentially benneficial change. Removing the check may be safe but I'm not certain enough to make that call... pegasus == NULL there would be a kernel bug. Silently ignoring it, like the code now wants to do is bad. As the oops has never been reported, I figure turning it into an explicit debugging test is overkill, so removal seems to be the best option. In the past urb-context was not guaranteed to be non-null for any asynchronous calls. If this is not the case anymore then it should be removed from at least two more locations in the driver. Attached you'll find the resulting patch. Petko--- drivers/net/usb/pegasus.c.old 2007-07-10 11:39:50.0 +0300 +++ drivers/net/usb/pegasus.c 2007-07-30 11:02:10.0 +0300 @@ -100,9 +100,6 @@ static void ctrl_callback(struct urb *ur { pegasus_t *pegasus = urb-context; - if (!pegasus) - return; - switch (urb-status) { case 0: if (pegasus-flags ETH_REGS_CHANGE) { @@ -609,15 +606,11 @@ static inline struct sk_buff *pull_skb(p static void read_bulk_callback(struct urb *urb) { pegasus_t *pegasus = urb-context; - struct net_device *net; + struct net_device *net = pegasus-net; int rx_status, count = urb-actual_length; u8 *buf = urb-transfer_buffer; __u16 pkt_len; - if (!pegasus) - return; - - net = pegasus-net; if (!netif_device_present(net) || !netif_running(net)) return; @@ -770,9 +763,6 @@ static void write_bulk_callback(struct u pegasus_t *pegasus = urb-context; struct net_device *net = pegasus-net; - if (!pegasus) - return; - if (!netif_device_present(net) || !netif_running(net)) return; @@ -805,13 +795,9 @@ static void write_bulk_callback(struct u static void intr_callback(struct urb *urb) { pegasus_t *pegasus = urb-context; - struct net_device *net; + struct net_device *net = pegasus-net; int status; - if (!pegasus) - return; - net = pegasus-net; - switch (urb-status) { case 0: break;
Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.
On Mon, 30 Jul 2007, Petko Manolov wrote: On Sun, 29 Jul 2007, Oliver Neukum wrote: [...] pegasus == NULL there would be a kernel bug. Silently ignoring it, like the code now wants to do is bad. As the oops has never been reported, I figure turning it into an explicit debugging test is overkill, so removal seems to be the best option. In the past urb-context was not guaranteed to be non-null for any asynchronous calls. If this is not the case anymore then it should be removed from at least two more locations in the driver. Attached you'll find the resulting patch. Given Oliver's earlier comment, it looks okay to me. Thanks. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: source interface ping bug ?
Can someone have a look a this and tell if it's kernel related or if I posted this in the wrong place ? Thanks. On 7/29/07, nano bug [EMAIL PROTECTED] wrote: Any news about this ? On 7/27/07, nano bug [EMAIL PROTECTED] wrote: Hello there, I'm facing the following issue : when I try to ping using source interface instead of a source ip address the ping utility starts to send arp requests instead of icmp requests though the ip address I'm pinging it's not in the subnets directly connected to my linux box. I've noticed this situation since I upgraded from kernel 2.6.20 to 2.6.21. On 2.6.20 and lower I haven't had this problem. Now I upgraded to 2.6.22 but it's the same. I'm using latest iproute and iputils. Here is an output of tcpdump when I try to ping an outside ip address, like for example www.yahoo.com, using source interface : [EMAIL PROTECTED]:~# uname -a Linux darkstar 2.6.22 #1 Thu Jul 26 21:22:11 EEST 2007 i686 Pentium II (Deschutes) GenuineIntel GNU/Linux [EMAIL PROTECTED]:~# ip -V ip utility, iproute2-ss070710 [EMAIL PROTECTED]:~# [EMAIL PROTECTED]:~# ip address show dev eth2 3: eth2: BROADCAST,MULTICAST,NOTRAILERS,UP,LOWER_UP mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 00:90:27:0f:79:f3 brd ff:ff:ff:ff:ff:ff inet 86.106.19.75/23 brd 86.106.19.255 scope global eth2 [EMAIL PROTECTED]:~# ip route get 87.248.113.14 from 86.106.19.75 oif eth2 87.248.113.14 from 86.106.19.75 via 86.106.18.1 dev eth2 cache mtu 1500 advmss 1460 hoplimit 64 [EMAIL PROTECTED]:~# [EMAIL PROTECTED]:~/iputils# ./ping -V ping utility, iputils-sss20070202 [EMAIL PROTECTED]:~/iputils# ./ping -I 86.106.19.75 87.248.113.14 -c 2 PING 87.248.113.14 (87.248.113.14) from 86.106.19.75 : 56(84) bytes of data. 64 bytes from 87.248.113.14: icmp_seq=1 ttl=51 time=60.5 ms 64 bytes from 87.248.113.14: icmp_seq=2 ttl=51 time=63.2 ms --- 87.248.113.14 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 60.574/61.924/63.274/1.350 ms [EMAIL PROTECTED]:~/iputils# [EMAIL PROTECTED]:~# tcpdump -i eth2 -vvv -n host 87.248.113.14 and host 86.106.19.75 tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes 01:18:09.572603 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 86.106.19.75 87.248.113.14: ICMP echo request, id 27166, seq 1, length 64 01:18:09.632861 IP (tos 0x0, ttl 51, id 6100, offset 0, flags [none], proto: ICMP (1), length: 84) 87.248.113.14 86.106.19.75: ICMP echo reply, id 27166, seq 1, length 64 01:18:10.572746 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: ICMP (1), length: 84) 86.106.19.75 87.248.113.14: ICMP echo request, id 27166, seq 2, length 64 01:18:10.634951 IP (tos 0x0, ttl 51, id 8790, offset 0, flags [none], proto: ICMP (1), length: 84) 87.248.113.14 86.106.19.75: ICMP echo reply, id 27166, seq 2, length 64 using source interface : [EMAIL PROTECTED]:~/iputils# ./ping -I eth2 87.248.113.14 PING 87.248.113.14 (87.248.113.14) from 86.106.19.75 eth2: 56(84) bytes of data. From 86.106.19.75 icmp_seq=1 Destination Host Unreachable From 86.106.19.75 icmp_seq=2 Destination Host Unreachable From 86.106.19.75 icmp_seq=3 Destination Host Unreachable From 86.106.19.75 icmp_seq=5 Destination Host Unreachable From 86.106.19.75 icmp_seq=6 Destination Host Unreachable From 86.106.19.75 icmp_seq=7 Destination Host Unreachable From 86.106.19.75 icmp_seq=9 Destination Host Unreachable From 86.106.19.75 icmp_seq=10 Destination Host Unreachable From 86.106.19.75 icmp_seq=11 Destination Host Unreachable --- 87.248.113.14 ping statistics --- 13 packets transmitted, 0 received, +9 errors, 100% packet loss, time 12006ms , pipe 3 [EMAIL PROTECTED]:~/iputils# [EMAIL PROTECTED]:~# tcpdump -i eth2 -vvv -n host 87.248.113.14 and host 86.106.19.75 tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes 01:19:24.292911 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:25.292897 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:26.292901 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:27.302906 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:28.302911 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:29.302912 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:31.302917 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:32.302921 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:33.302923 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:35.302932 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:36.302932 arp who-has 87.248.113.14 tell 86.106.19.75 01:19:37.302939 arp who-has 87.248.113.14 tell 86.106.19.75 12 packets captured 12 packets received by filter 0 packets dropped by kernel [EMAIL PROTECTED]:~# There is one exception though, it works when using eth0. I'm
Re: 2.6.20-2.6.21 - networking dies after random time
* Alan Cox [EMAIL PROTECTED] wrote: Ok the logic behind the 8390 is very simple: thanks for the explanation Alan! A few comments and a question: Things to know - IRQ delivery is asynchronous to the PCI bus - Blocking the local CPU IRQ via spin locks was too slow - The chip has register windows needing locking work So the path was once (I say once as people appear to have changed it in the mean time and it now looks rather bogus if the changes to use disable_irq_nosync_irqsave are disabling the local IRQ) Take the page lock Mask the IRQ on chip Disable the IRQ (but not mask locally- someone seems to have broken this with the lock validator stuff) [This must be _nosync as the page lock may otherwise deadlock us] ( side-note: you can ignore the lock validator stuff here, the validator changes are supposed to a NOP on the !lockdep case. Local irqs will only be disabled if the validator is running. This could cause dropped serial irqs on very old boxes but i doubt anyone will want to run the validator on those. ) Drop the page lock and turn IRQs back on At this point an existing IRQ may still be running but we can't get a new one Take the lock (so we know the IRQ has terminated) but don't mask the IRQs on the processor Set irqlock [for debug] Transmit (slow as ) re-enable the IRQ We have to use disable_irq because otherwise you will get delayed interrupts on the APIC bus deadlocking the transmit path. Quite hairy but the chip simply wasn't designed for SMP and you can't even ACK an interrupt without risking corrupting other parallel activities on the chip. So the whole locking is to be able to keep irqs enabled for a long time, without risking entry of the same IRQ handler on this same CPU, correct? Marcin's test results suggest that if an IRQ is resent right at the enable_irq() point [be that via the hw irq-resend mechanism or the sw irq-resend mechanism], the hang happens. In the previous 2.6.20 logic we'd not normally generate an IRQ at that point (because we masked the irq and the card itself deasserts the line so any level-triggered irq is now moot). Once Thomas hacked off this resend mechanism for level-triggered irqs, Marcin saw the hangs go away. So it seems to me that maybe the driver could be surprised via these spurious interrupts that happen right after the irq_enable(). Does the patch below make any sense in your opinion? Ingo Index: linux/drivers/net/lib8390.c === --- linux.orig/drivers/net/lib8390.c +++ linux/drivers/net/lib8390.c @@ -375,6 +375,8 @@ static int ei_start_xmit(struct sk_buff /* Turn 8390 interrupts back on. */ ei_local-irqlock = 0; ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR); + /* force POST: */ + ei_inb_p(e8390_base + EN0_IMR); spin_unlock(ei_local-page_lock); enable_irq_lockdep_irqrestore(dev-irq, flags); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-2.6.21 - networking dies after random time
* Marcin Ślusarz [EMAIL PROTECTED] wrote: Subject: x86: activate HARDIRQS_SW_RESEND From: Ingo Molnar [EMAIL PROTECTED] activate the software-triggered IRQ-resend logic. This patch didn't help (tested on 2.6.22.1) - ne2k_pci timed out. ok. This makes it more likely that the driver itself (or the card) gets confused by the resend. does the patch below fix those timeouts? It tests the theory whether any POST latency could expose this problem. Ingo Index: linux/drivers/net/lib8390.c === --- linux.orig/drivers/net/lib8390.c +++ linux/drivers/net/lib8390.c @@ -375,6 +375,8 @@ static int ei_start_xmit(struct sk_buff /* Turn 8390 interrupts back on. */ ei_local-irqlock = 0; ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR); + /* force POST: */ + ei_inb_p(e8390_base + EN0_IMR); spin_unlock(ei_local-page_lock); enable_irq_lockdep_irqrestore(dev-irq, flags); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linksys Gigabit USB2.0 adapter (asix) regression
David Hollis wrote: On Mon, 2007-06-25 at 19:05 +0200, Erik Slagter wrote: drivers/net/usb/asix.c: PHYID=0x01410cc2 Ok, it is using a Marvell PHY so that part should be fine. You mentioned that it looks like the packets are being transmitted, but are garbled in some way. The device does prepend a 'header' to ethernet packets as they are transmitted down the USB pipe. The device strips this off and puts the packets on the wire. This could be where the issue lies. Are you on x86 by chance or something else? They are either garbled are they are not passed on the wire. The transmitted packets are shown by tshark, but a tshark run on the other end of the line does not show them. Platform is indeed x86, to be precise: fedora 7, kernel 2.6.22-rc6, cpu pentium M, dell laptop inspiron 9300, ICH6. If you want me to test something please yell, it's no trouble at all to change a few lines in the driver's source and recompile the module. Please note I cannot send mail to you: (conversation with dhollis.dyndns.org[71.251.104.159] timed out while sending MAIL FROM) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2.6.23-rc1] PPPOL2TP: Add CONFIG_INET Kconfig dependency
[PPPOL2TP]: Add CONFIG_INET Kconfig dependency. PPPOL2TP uses UDP so it obviously depends on CONFIG_INET. Signed-off-by: James Chapman [EMAIL PROTECTED] --- Toralf Foerster reported that make rndconfig failed in 2.6.23-rc1 when selecting CONFIG_PPPOL2TP without CONFIG_INET. Index: linux-2.6.23-rc1/drivers/net/Kconfig === --- linux-2.6.23-rc1.orig/drivers/net/Kconfig +++ linux-2.6.23-rc1/drivers/net/Kconfig @@ -2851,7 +2851,7 @@ config PPPOATM config PPPOL2TP tristate PPP over L2TP (EXPERIMENTAL) - depends on EXPERIMENTAL PPP + depends on EXPERIMENTAL PPP INET help Support for PPP-over-L2TP socket family. L2TP is a protocol used by ISPs and enterprises to tunnel PPP traffic over UDP - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] make nf_ct_ipv6_skip_exthdr() static
Adrian Bunk wrote: nf_ct_ipv6_skip_exthdr() can now become static. Applied, thanks. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] HW VLAN filtering control
Mitch Williams wrote: This patchset adds the capability to disable hardware VLAN filtering at runtime via the existing vconfig utility. It's useful for debugging purposes. The first patch modifies the VLAN subsystem to define the flag, and to support passing the flag on to the base driver. The second patch modifies e1000 to support the flag. Since it's only one function, other drivers can be easily modified to support this functionality. vconfig is used without modification to enable or disable filtering: # vconfig [vlan-interface] 2 1 will disable filtering, and # vconfig [vlan-interface] 2 0 will enable filtering. I think an ethtool option for this would be more consistent with configuration of other device features and its questionable whether new feature should be added to the ioctl interface at all. The vlan driver could perform the necessary reconfiguration in the FEAT_CHANGE notifier call. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour whose device is an ipoib one, there is a struct ipoib_neigh buddy which is created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour) call. When using the bonding driver, neighbours are created by the net stack on behalf of the bonding (master) device. On the tx flow the bonding code gets an skb such that skb-dev points to the master device, it changes this skb to point on the slave device and calls the slave hard_start_xmit function. Under this scheme, ipoib_neigh_destructor assumption that for each struct neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev) can be casted to struct ipoib_dev_priv is buggy. To fix it, this patch adds a dev field to struct ipoib_neigh which is used instead of the struct neighbour dev one, when n-dev-flags has the IFF_MASTER bit set. Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib.h |4 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 +++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c |2 +- 3 files changed, 19 insertions(+), 4 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2007-07-25 14:56:13.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-07-25 14:57:48.095724495 +0300 @@ -328,6 +328,7 @@ struct ipoib_neigh { struct sk_buff_head queue; struct neighbour *neighbour; + struct net_device *dev; struct list_headlist; }; @@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip INFINIBAND_ALEN, sizeof(void *)); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh); +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh, + struct net_device *dev); void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh); extern struct workqueue_struct *ipoib_workqueue; Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 14:56:13.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 15:03:11.291291271 +0300 @@ -510,7 +510,7 @@ static void neigh_add_path(struct sk_buf struct ipoib_path *path; struct ipoib_neigh *neigh; - neigh = ipoib_neigh_alloc(skb-dst-neighbour); + neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (!neigh) { ++priv-stats.tx_dropped; dev_kfree_skb_any(skb); @@ -817,6 +817,16 @@ static void ipoib_neigh_cleanup(struct n unsigned long flags; struct ipoib_ah *ah = NULL; + if (n-dev-flags IFF_MASTER) { + /* n-dev is not an IPoIB device and we have to take priv from elsewhere */ + neigh = *to_ipoib_neigh(n); + if (neigh){ + priv = netdev_priv(neigh-dev); + ipoib_dbg(priv, neigh_destructor for bonding device: %s\n, + n-dev-name); + } else + return; + } ipoib_dbg(priv, neigh_cleanup for %06x IPOIB_GID_FMT \n, IPOIB_QPN(n-ha), @@ -838,7 +848,9 @@ static void ipoib_neigh_cleanup(struct n ipoib_put_ah(ah); } -struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour) +struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour, + struct net_device *dev) + { struct ipoib_neigh *neigh; @@ -847,6 +859,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st return NULL; neigh-neighbour = neighbour; + neigh-dev = dev; *to_ipoib_neigh(neighbour) = neigh; skb_queue_head_init(neigh-queue); ipoib_cm_set(neigh, NULL); Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 14:56:13.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 14:57:48.097724142 +0300 @@ -727,7 +727,7 @@ out: if (skb-dst skb-dst-neighbour !*to_ipoib_neigh(skb-dst-neighbour)) { - struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour); + struct ipoib_neigh *neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev); if (neigh) { kref_get(mcast-ah-ref); - To unsubscribe from this list: send the line
[PATCH V3 2/7] IB/ipoib: Verify address handle validity on send
When the bonding device senses a carrier loss of its active slave it replaces that slave with a new one. In between the times when the carrier of an IPoIB device goes down and ipoib_neigh is destroyed, it is possible that the bonding driver will send a packet on a new slave that uses an old ipoib_neigh. This patch detects and prevents this from happenning. Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c === --- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 14:57:48.0 +0300 +++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c 2007-07-25 15:02:55.525131034 +0300 @@ -685,9 +685,10 @@ static int ipoib_start_xmit(struct sk_bu goto out; } } else if (neigh-ah) { - if (unlikely(memcmp(neigh-dgid.raw, + if (unlikely((memcmp(neigh-dgid.raw, skb-dst-neighbour-ha + 4, - sizeof(union ib_gid { + sizeof(union ib_gid))) || +(neigh-dev != dev))) { spin_lock(priv-lock); /* * It's safe to call ipoib_put_ah() inside - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER
This patch changes some of the bond netdevice attributes and functions to be that of the active slave for the case of the enslaved device not being of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(), which are netdevice **type** dependent and hence might be not appropriate for devices of other types. It also enforces mutual exclusion on bonding slaves from dissimilar ether types, as was concluded over the v1 discussion. IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this IPoIB device is bounded to. The QP is a resource created by the IB HW and the GID is an identifier burned into the HCA (i have omitted here some details which are not important for the bonding RFC). Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 38 ++ 1 files changed, 38 insertions(+) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:02:10.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:24:30.913343981 +0300 @@ -1277,6 +1277,26 @@ static int bond_compute_features(struct return 0; } + +static void bond_setup_by_slave(struct net_device *bond_dev, + struct net_device *slave_dev) +{ + bond_dev-hard_header = slave_dev-hard_header; + bond_dev-rebuild_header= slave_dev-rebuild_header; + bond_dev-hard_header_cache = slave_dev-hard_header_cache; + bond_dev-header_cache_update = slave_dev-header_cache_update; + bond_dev-hard_header_parse = slave_dev-hard_header_parse; + + bond_dev-neigh_setup = slave_dev-neigh_setup; + + bond_dev-type = slave_dev-type; + bond_dev-hard_header_len = slave_dev-hard_header_len; + bond_dev-addr_len = slave_dev-addr_len; + + memcpy(bond_dev-broadcast, slave_dev-broadcast, + slave_dev-addr_len); +} + /* enslave device slave to bond device master */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) { @@ -1351,6 +1371,24 @@ int bond_enslave(struct net_device *bond goto err_undo_flags; } + /* set bonding device ether type by slave - bonding netdevices are +* created with ether_setup, so when the slave type is not ARPHRD_ETHER +* there is a need to override some of the type dependent attribs/funcs. +* +* bond ether type mutual exclusion - don't allow slaves of dissimilar +* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same bond +*/ + if (bond-slave_cnt == 0) { + if (slave_dev-type != ARPHRD_ETHER) + bond_setup_by_slave(bond_dev, slave_dev); + } else if (bond_dev-type != slave_dev-type) { + printk(KERN_ERR DRV_NAME : %s ether type (%d) is different from + other slaves (%d), can not enslave it.\n, slave_dev-name, + slave_dev-type, bond_dev-type); + res = -EINVAL; + goto err_undo_flags; + } + if (slave_dev-set_mac_address == NULL) { printk(KERN_ERR DRV_NAME : %s: Error: The slave device you specified does - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
This patch allows for enslaving netdevices which do not support the set_mac_address() function. In that case the bond mac address is the one of the active slave, where remote peers are notified on the mac address (neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs (this is already done by the bonding code). Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 88 +++- drivers/net/bonding/bonding.h |1 2 files changed, 61 insertions(+), 28 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-29 16:24:30.913343981 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:36:53.234602471 +0300 @@ -1127,6 +1127,14 @@ void bond_change_active_slave(struct bon if (new_active) { bond_set_slave_active_flags(new_active); } + + /* when bonding does not set the slave MAC address, the bond MAC +* address is the one of the active slave. +*/ + if (new_active !bond-do_set_mac_addr) + memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, + new_active-dev-addr_len); + bond_send_gratuitous_arp(bond); } } @@ -1390,13 +1398,22 @@ int bond_enslave(struct net_device *bond } if (slave_dev-set_mac_address == NULL) { - printk(KERN_ERR DRV_NAME - : %s: Error: The slave device you specified does - not support setting the MAC address. - Your kernel likely does not support slave - devices.\n, bond_dev-name); - res = -EOPNOTSUPP; - goto err_undo_flags; + if (bond-slave_cnt == 0) { + printk(KERN_WARNING DRV_NAME + : %s: Warning: The first slave device you + specified does not support setting the MAC + address. This bond MAC address would be that + of the active slave.\n, bond_dev-name); + bond-do_set_mac_addr = 0; + } else if (bond-do_set_mac_addr) { + printk(KERN_ERR DRV_NAME + : %s: Error: The slave device you specified + does not support setting the MAC addres,. + but this bond uses this practice. \n + , bond_dev-name); + res = -EOPNOTSUPP; + goto err_undo_flags; + } } new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL); @@ -1417,16 +1434,18 @@ int bond_enslave(struct net_device *bond */ memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN); - /* -* Set slave to master's mac address. The application already -* set the master's mac address to that of the first slave -*/ - memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); - addr.sa_family = slave_dev-type; - res = dev_set_mac_address(slave_dev, addr); - if (res) { - dprintk(Error %d calling set_mac_address\n, res); - goto err_free; + if (bond-do_set_mac_addr) { + /* +* Set slave to master's mac address. The application already +* set the master's mac address to that of the first slave +*/ + memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len); + addr.sa_family = slave_dev-type; + res = dev_set_mac_address(slave_dev, addr); + if (res) { + dprintk(Error %d calling set_mac_address\n, res); + goto err_free; + } } res = netdev_set_master(slave_dev, bond_dev); @@ -1651,9 +1670,11 @@ err_close: dev_close(slave_dev); err_restore_mac: - memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); - addr.sa_family = slave_dev-type; - dev_set_mac_address(slave_dev, addr); + if (bond-do_set_mac_addr) { + memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN); + addr.sa_family = slave_dev-type; + dev_set_mac_address(slave_dev, addr); + } err_free: kfree(new_slave); @@ -1831,10 +1852,12 @@ int bond_release(struct net_device *bond /* close slave before restoring its mac address */ dev_close(slave_dev); - /* restore original (permanent) mac address */ - memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN); - addr.sa_family =
[PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices
Allow to enslave devices when the bonding device is not up. Over the discussion held at the previous post this seemed to be the most clean way to go, where it is not expected to cause instabilities. Normally, the bonding driver is UP before any enslavement takes place. Once a netdevice is UP, the network stack acts to have it join some multicast groups (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called where for multicast joins taking place after the enslavement another ip_xxx_mc_map() is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND) Signed-off-by: Moni Shoua [EMAIL PROTECTED] Signed-off-by: Or Gerlitz [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |4 ++-- drivers/net/bonding/bond_sysfs.c |6 ++ 2 files changed, 4 insertions(+), 6 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:04:50.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:06:17.175820632 +0300 @@ -1325,8 +1325,8 @@ int bond_enslave(struct net_device *bond /* bond must be initialized by bond_open() before enslaving */ if (!(bond_dev-flags IFF_UP)) { - dprintk(Error, master_dev is not up\n); - return -EPERM; + printk(KERN_WARNING DRV_NAME +%s: master_dev is not up in bond_enslave\n, bond_dev-name); } /* already enslaved */ Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-07-25 14:18:12.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-07-25 15:06:17.176820452 +0300 @@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru /* Quick sanity check -- is the bond interface up? */ if (!(bond-dev-flags IFF_UP)) { - printk(KERN_ERR DRV_NAME - : %s: Unable to update slaves because interface is down.\n, + printk(KERN_WARNING DRV_NAME + : %s: doing slave updates when interface is down.\n, bond-dev-name); - ret = -EPERM; - goto out; } /* Note: We can't hold bond-lock here, as bond_create grabs it. */ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
This patch series is the third version (see below link to V2) of the suggested changes to the bonding driver so it would be able to support non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. The motivation is to enable the bonding driver on its HA mode to work with the IP over Infiniband (IPoIB) driver. With these patches I was able to enslave IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with fail-over and fail-back working fine. The working environment was the net-2.6 git. More over, as IPoIB is also the IB ARP provider for the RDMA CM driver which is used by native IB ULPs whose addressing scheme is based on IP (e.g. iSER, SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB devices **enables** HA for these ULPs. This holds as when the ULP is informed by the IB HW on the failure of the current IB connection, it just need to reconnect, where the bonding device will now issue the IB ARP over the active IPoIB slave. This series also includes patches to the IPoIB driver that fix some fix some neighboring related issues. There are still 2 open issues here: 1. When bonding enslaves an IPoIB device the bonding neighbor holds a reference to a cleanup function in the IPoIB drives. This makes it unsafe to unload the IPoIB module if there are bonding neighbors in the air. So, to avoid this race one must unload bonding before unloading IPoIB. 2. Patch No. 7 is a workaround to a problem where gratuitous were not sent quite often. I didn't find something better that fixes this and I would appreciate advices and comments regarding it. However, this doesn't seem to me as an issue related exclusively to IPoIB. Links to earlier discussion: 1. A discussion in netdev about bonding support for IPoIB. http://lists.openwall.net/netdev/2006/11/30/46 2. A discussion in openfabrics regarding changes in the IPoIB that enable using it as a slave for bonding. http://lists.openfabrics.org/pipermail/general/2007-March/034033.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device
bonding sometimes uses Ethernet constants (such as MTU and address length) which are not good when it enslaves non Ethernet devices (such as InfiniBand). Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c |2 +- drivers/net/bonding/bond_sysfs.c | 10 -- drivers/net/bonding/bonding.h|1 + 3 files changed, 10 insertions(+), 3 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:06:17.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.012883360 +0300 @@ -1255,7 +1255,7 @@ static int bond_compute_features(struct unsigned long features = BOND_INTERSECT_FEATURES; struct slave *slave; struct net_device *bond_dev = bond-dev; - unsigned short max_hard_header_len = ETH_HLEN; + u16 max_hard_header_len = max((u16)ETH_HLEN, bond_dev-hard_header_len); int i; bond_for_each_slave(bond, slave, i) { Index: net-2.6/drivers/net/bonding/bond_sysfs.c === --- net-2.6.orig/drivers/net/bonding/bond_sysfs.c 2007-07-25 15:06:17.0 +0300 +++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-07-25 15:20:10.224527636 +0300 @@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru char command[IFNAMSIZ + 1] = { 0, }; char *ifname; int i, res, found, ret = count; + u32 original_mtu; struct slave *slave; struct net_device *dev = NULL; struct bonding *bond = to_bond(d); @@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru } /* Set the slave's MTU to match the bond */ + original_mtu = dev-mtu; if (dev-mtu != bond-dev-mtu) { if (dev-change_mtu) { res = dev-change_mtu(dev, @@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru } rtnl_lock(); res = bond_enslave(bond-dev, dev); + bond_for_each_slave(bond, slave, i) + if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) + slave-original_mtu=original_mtu; rtnl_unlock(); if (res) { ret = res; @@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru bond_for_each_slave(bond, slave, i) if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) { dev = slave-dev; + original_mtu = slave-original_mtu; break; } if (dev) { @@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru } /* set the slave MTU to the default */ if (dev-change_mtu) { - dev-change_mtu(dev, 1500); + dev-change_mtu(dev, original_mtu); } else { - dev-mtu = 1500; + dev-mtu = original_mtu; } } else { Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:03:32.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.223527810 +0300 @@ -156,6 +156,7 @@ struct slave { s8 link;/* one of BOND_LINK_ */ s8 state; /* one of BOND_STATE_ */ u32original_flags; + u32original_mtu; u32link_failure_count; u16speed; u8 duplex; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure
Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 25 + drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 4 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 15:33:25.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-26 18:42:59.296296622 +0300 @@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state)){ + dprintk(delaying gratuitous arp on %s\n,bond-curr_active_slave-dev-name); + bond-send_grat_arp=1; + }else{ + bond_send_gratuitous_arp(bond); + } } } @@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n,__FUNCTION__); + else { + dprintk(sending delayed gratuitous arp on ond-curr_active_slave-dev-name\n); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp=0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str struct slave *slave = bond-curr_active_slave; struct vlan_entry *vlan; struct net_device *vlan_dev; + int i; dprintk(bond_send_grat_arp: bond %s slave %s\n, bond-dev-name, slave ? slave-dev-name : NULL); @@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str return; if (bond-master_ip) { - bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, - bond-master_ip, 0); + for (i=0;i3;i++) + bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, + bond-master_ip, 0); } list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4331,6 +4347,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp=0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-07-26 18:42:43.652087660 +0300 @@ -203,6 +203,7 @@ struct bonding { struct vlan_group *vlgrp; struct packet_type arp_mon_pt; s8 do_set_mac_addr; + int send_grat_arp; }; /** - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.20-2.6.21 - networking dies after random time
So the whole locking is to be able to keep irqs enabled for a long time, without risking entry of the same IRQ handler on this same CPU, correct? As implemented - on any CPU. We also need to know that the IRQ handler is not doing useful work on another processor which is why we take the lock after disabling the interrupt line everywhere. Without that we might be completing an IRQ on another CPU and that would race the transmit and make a nasty mess. So it seems to me that maybe the driver could be surprised via these spurious interrupts that happen right after the irq_enable(). Does the patch below make any sense in your opinion? For MMIO it does look like that may be needed. Looks sensible. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] Preparatory refactoring part 1.
Corey Hickey wrote: diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 9579573..8ae077f 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -77,6 +77,9 @@ #define SFQ_DEPTH128 #define SFQ_HASH_DIVISOR 1024 +#define SFQ_HEAD 0 +#define SFQ_TAIL 1 + /* This type should contain at least SFQ_DEPTH*2 values */ typedef unsigned char sfq_index; @@ -244,10 +247,8 @@ static unsigned int sfq_drop(struct Qdisc *sch) return 0; } -static int -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, unsigned int end) Please make sure to break at 80 chars and to keep the style in this file consistent (newline before function name). { - struct sfq_sched_data *q = qdisc_priv(sch); unsigned hash = sfq_hash(q, skb); sfq_index x; @@ -256,8 +257,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-ht[hash] = x = q-dep[SFQ_DEPTH].next; q-hash[x] = hash; } - sch-qstats.backlog += skb-len; Why not keep this instead of having both callers do it? - __skb_queue_tail(q-qs[x], skb); + + if (end == SFQ_TAIL) + __skb_queue_tail(q-qs[x], skb); + else + __skb_queue_head(q-qs[x], skb); + sfq_inc(q, x); if (q-qs[x].qlen == 1) { /* The flow is new */ if (q-tail == SFQ_DEPTH) { /* It is the first flow */ @@ -270,12 +275,21 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) q-tail = x; } } +} + +static int +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); newline please. + sfq_q_enqueue(skb, q, SFQ_TAIL); + sch-qstats.backlog += skb-len; if (++sch-q.qlen q-limit-1) { sch-bstats.bytes += skb-len; sch-bstats.packets++; return 0; } + sch-qstats.drops++; sfq_drop already increments this. sfq_drop(sch); return NET_XMIT_CN; } @@ -284,28 +298,8 @@ static int sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) { struct sfq_sched_data *q = qdisc_priv(sch); newline please - unsigned hash = sfq_hash(q, skb); - sfq_index x; - - x = q-ht[hash]; - if (x == SFQ_DEPTH) { - q-ht[hash] = x = q-dep[SFQ_DEPTH].next; - q-hash[x] = hash; - } + sfq_q_enqueue(skb, q, SFQ_HEAD); sch-qstats.backlog += skb-len; - __skb_queue_head(q-qs[x], skb); - sfq_inc(q, x); - if (q-qs[x].qlen == 1) { /* The flow is new */ - if (q-tail == SFQ_DEPTH) { /* It is the first flow */ - q-tail = x; - q-next[x] = x; - q-allot[x] = q-quantum; - } else { - q-next[x] = q-next[q-tail]; - q-next[q-tail] = x; - q-tail = x; - } - } if (++sch-q.qlen q-limit - 1) { sch-qstats.requeues++; return 0; @@ -316,13 +310,8 @@ sfq_requeue(struct sk_buff *skb, struct Qdisc* sch) return NET_XMIT_CN; } - - - -static struct sk_buff * -sfq_dequeue(struct Qdisc* sch) +static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q) Keep style consistent please. { - struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, old_a; @@ -335,8 +324,6 @@ sfq_dequeue(struct Qdisc* sch) /* Grab packet */ skb = __skb_dequeue(q-qs[a]); sfq_dec(q, a); - sch-q.qlen--; - sch-qstats.backlog -= skb-len; /* Is the slot empty? */ if (q-qs[a].qlen == 0) { @@ -353,6 +340,21 @@ sfq_dequeue(struct Qdisc* sch) a = q-next[a]; q-allot[a] += q-quantum; } + + return skb; +} + +static struct sk_buff +*sfq_dequeue(struct Qdisc* sch) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sk_buff *skb; + + skb = sfq_q_dequeue(q); + if (skb == NULL) + return NULL; + sch-q.qlen--; + sch-qstats.backlog -= skb-len; return skb; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] Preparatory refactoring part 2.
Corey Hickey wrote: diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 8ae077f..0c46938 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -380,71 +380,71 @@ static void sfq_perturbation(unsigned long arg) } } -static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt) { - struct sfq_sched_data *q = qdisc_priv(sch); struct tc_sfq_qopt *ctl = RTA_DATA(opt); - unsigned int qlen; + int i; - if (opt-rta_len RTA_LENGTH(sizeof(*ctl))) + if (opt opt-rta_len RTA_LENGTH(sizeof(*ctl))) opt is dereferenced above (RTA_DATA), so if it is NULL we've already crashed. return -EINVAL; - sch_tree_lock(sch); - q-quantum = ctl-quantum ? : psched_mtu(sch-dev); - q-perturb_period = ctl-perturb_period*HZ; - if (ctl-limit) - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + q-perturbation = 0; + q-max_depth = 0; + q-tail = q-limit = SFQ_DEPTH; + if (opt == NULL) { + q-perturb_period = 0; + } else { + struct tc_sfq_qopt *ctl = RTA_DATA(opt); + if (ctl-quantum) + q-quantum = ctl-quantum; + q-perturb_period = ctl-perturb_period*HZ; - qlen = sch-q.qlen; - while (sch-q.qlen = q-limit-1) - sfq_drop(sch); - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen); I hope that patch that makes changing possible brings this back .. checking .. it doesn't. Please either keep this or fix up 6/7 to bring it back. + if (ctl-limit) + q-limit = min_t(u32, ctl-limit, SFQ_DEPTH); + } - del_timer(q-perturb_timer); - if (q-perturb_period) { - q-perturb_timer.expires = jiffies + q-perturb_period; - add_timer(q-perturb_timer); + for (i=0; iSFQ_HASH_DIVISOR; i++) + q-ht[i] = SFQ_DEPTH; + for (i=0; iSFQ_DEPTH; i++) { + skb_queue_head_init(q-qs[i]); + q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH; + q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH; } - sch_tree_unlock(sch); + + for (i=0; iSFQ_DEPTH; i++) + sfq_link(q, i); return 0; } +static void sfq_q_destroy(struct sfq_sched_data *q) +{ + del_timer(q-perturb_timer); +} + static void sfq_destroy(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - del_timer(q-perturb_timer); + sfq_q_destroy(q); } That really does look a bit pointless. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] Make qdisc changeable.
Corey Hickey wrote: Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc change will work. Signed-off-by: Corey Hickey [EMAIL PROTECTED] --- net/sched/sch_sfq.c | 51 ++- 1 files changed, 50 insertions(+), 1 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index e6a6a21..e042cd0 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -485,6 +485,55 @@ static int sfq_init(struct Qdisc *sch, struct rtattr *opt) return 0; } +static int sfq_change(struct Qdisc *sch, struct rtattr *opt) +{ + struct sfq_sched_data *q = qdisc_priv(sch); + struct sfq_sched_data tmp; + struct sk_buff *skb; + int err; + + /* set up tmp queue */ + memset(tmp, 0, sizeof(struct sfq_sched_data)); + tmp.quantum = psched_mtu(sch-dev); /* default */ If no value is given it should use the old value instead of reinitializing to the default. + if ((err = sfq_q_init(tmp, opt))) + return err; This will also use defaults for all unspecified values. It would be more consistent with other qdiscs to only change those values that are actually specified, so something like tc qdisc change ... perturb 10 will *only* change the perturbation parameter. I'm not sure reusing the initialization function and copying the parameters is the best way to do this. + + /* copy packets from the old queue to the tmp queue */ + sch_tree_lock(sch); + while (sch-q.qlen = tmp.limit - 1) + sfq_drop(sch); + while ((skb = sfq_q_dequeue(q)) != NULL) + sfq_q_enqueue(skb, tmp, SFQ_TAIL); + + /* clean up the old queue */ + sfq_q_destroy(q); + + /* copy elements of the tmp queue into the old queue */ + q-perturb_period = tmp.perturb_period; + q-quantum= tmp.quantum; + q-limit = tmp.limit; + q-depth = tmp.depth; + q-hash_divisor = tmp.hash_divisor; + q-tail = tmp.tail; + q-max_depth = tmp.max_depth; + q-ht= tmp.ht; + q-dep = tmp.dep; + q-next = tmp.next; + q-allot = tmp.allot; + q-hash = tmp.hash; + q-qs= tmp.qs; + + /* finish up */ + if (q-perturb_period) { + q-perturb_timer.expires = jiffies + q-perturb_period; + add_timer(q-perturb_timer); + } else { + q-perturbation = 0; + } + sch_tree_unlock(sch); + return 0; +} + - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec
On 7/26/07, David Miller [EMAIL PROTECTED] wrote: From: Shannon Nelson [EMAIL PROTECTED] Date: Tue, 24 Jul 2007 17:36:06 -0700 (repost - original eaten by vger?) Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was unreachable and thus unused. The code originally was there to support in-kernel dma needs, but since it remains unused, we'll pull it out. Signed-off-by: Shannon Nelson [EMAIL PROTECTED] Applied, thanks Shannon. NET_DMA on kernel buffer is pretty useful in ndb, iSCSI target and initiators which uses kernel buffer to receive data. Are there any other issues with dma-memcpy on kernel buffers, if not then following patch makes dma_memcpy_to_kernel_iovec() reachable from tcp_recvmsg. I tested this patch and it work fine with unh iSCSI target. comments? --pravin. Index: linux-2.6.23-rc1/net/ipv4/tcp.c === --- linux-2.6.23-rc1.orig/net/ipv4/tcp.c2007-07-23 02:11:00.0 +0530 +++ linux-2.6.23-rc1/net/ipv4/tcp.c 2007-07-30 17:43:51.0 +0530 @@ -1115,7 +1115,7 @@ int target; /* Read at least this many bytes */ long timeo; struct task_struct *user_recv = NULL; - int copied_early = 0; + int copied_early = 0, kernel_dma = 0; struct sk_buff *skb; lock_sock(sk); @@ -1154,6 +1154,9 @@ !sysctl_tcp_low_latency __get_cpu_var(softnet_data).net_dma) { preempt_enable_no_resched(); + if (segment_eq(get_fs(), KERNEL_DS)) + kernel_dma = 1; + else tp-ucopy.pinned_list = dma_pin_iovec_pages(msg-msg_iov, len); } else { @@ -1362,7 +1365,7 @@ if (!(flags MSG_TRUNC)) { #ifdef CONFIG_NET_DMA - if (!tp-ucopy.dma_chan tp-ucopy.pinned_list) + if (!tp-ucopy.dma_chan (kernel_dma || tp-ucopy.pinned_list)) tp-ucopy.dma_chan = get_softnet_dma(); if (tp-ucopy.dma_chan) { - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC]: napi_struct V4
If you have a means in the device (like tg3, bnx2, e1000, and a score of others do) to force the device to trigger a HW interrupt, that's what you do if you detect that events are pending after re-enabling interrupt in the -poll() handler. It is possible to trigger an interrupt for IPoIB I think but it will lead to a fair bit of code, since one would have to do something like a dummy send operation to generate an event that leads to the interrupt. I think this will be rather complex/ugly to implement because we have to make sure we have space in the send queue to post the send operation, etc. Frankly I don't think the lock is a big deal and you need something like it anyways typically. If I understand this correctly, you're suggesting a spin_lock_irqsave() around the netif_rx_complete() in the poll routine, and a corresponding lock in the interrupt handler. That seems like a pretty big step backwards for performance to me. Especially since in my experience, fast machines handling full-MTU traffic often end up being basically interrupt driven because they drain the RX ring too quickly to stay in NAPI polling. Yes, it's only one more lock but look at the tricky smp_mb() usage that tg3, bnx2, etc have to avoid using a spinlock... IPoIB can cope but it really seems like an unfortunate feature of these changes that we can't do something like what we have today, which imposes no overhead unless an event actually lands in the race window. - R. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] ehea: Fix workqueue handling
Fix: Workqueue ehea_driver_wq was not destroyed Signed-off-by: Thomas Klein [EMAIL PROTECTED] --- drivers/net/ehea/ehea.h |2 +- drivers/net/ehea/ehea_main.c |1 + 2 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h index 8ee2c2c..d67f97b 100644 --- a/drivers/net/ehea/ehea.h +++ b/drivers/net/ehea/ehea.h @@ -39,7 +39,7 @@ #include asm/io.h #define DRV_NAME ehea -#define DRV_VERSIONEHEA_0072 +#define DRV_VERSIONEHEA_0073 /* eHEA capability flags */ #define DLPAR_PORT_ADD_REM 1 diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index 58702f5..d43ab0f 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -3099,6 +3099,7 @@ out: static void __exit ehea_module_exit(void) { + destroy_workqueue(ehea_driver_wq); driver_remove_file(ehea_driver.driver, driver_attr_capabilities); ibmebus_unregister_driver(ehea_driver); ehea_destroy_busmap(); -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] ehea: Simplify resource usage check
Use shorter method to determine whether adapter has configured ports Signed-off-by: Thomas Klein [EMAIL PROTECTED] --- drivers/net/ehea/ehea_main.c | 18 ++ 1 files changed, 6 insertions(+), 12 deletions(-) diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index d43ab0f..36ca322 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -2165,24 +2165,18 @@ static int ehea_clean_all_portres(struct ehea_port *port) return ret; } -static void ehea_remove_adapter_mr (struct ehea_adapter *adapter) +static void ehea_remove_adapter_mr(struct ehea_adapter *adapter) { - int i; - - for (i=0; i EHEA_MAX_PORTS; i++) - if (adapter-port[i]) - return; + if (adapter-active_ports) + return; ehea_rem_mr(adapter-mr); } -static int ehea_add_adapter_mr (struct ehea_adapter *adapter) +static int ehea_add_adapter_mr(struct ehea_adapter *adapter) { - int i; - - for (i=0; i EHEA_MAX_PORTS; i++) - if (adapter-port[i]) - return 0; + if (adapter-active_ports) + return 0; return ehea_reg_kernel_mr(adapter, adapter-mr); } -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] ehea: Eliminated some compiler warnings
Fixed wrongly casted pointers Signed-off-by: Thomas Klein [EMAIL PROTECTED] --- drivers/net/ehea/ehea_main.c | 25 ++--- 1 files changed, 10 insertions(+), 15 deletions(-) diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index 36ca322..9756211 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -1326,7 +1326,6 @@ static void write_swqe2_TSO(struct sk_buff *skb, u8 *imm_data = swqe-u.immdata_desc.immediate_data[0]; int skb_data_size = skb-len - skb-data_len; int headersize; - u64 tmp_addr; /* Packet is TCP with TSO enabled */ swqe-tx_control |= EHEA_SWQE_TSO; @@ -1347,9 +1346,8 @@ static void write_swqe2_TSO(struct sk_buff *skb, /* set sg1entry data */ sg1entry-l_key = lkey; sg1entry-len = skb_data_size - headersize; - - tmp_addr = (u64)(skb-data + headersize); - sg1entry-vaddr = ehea_map_vaddr(tmp_addr); + sg1entry-vaddr = + ehea_map_vaddr(skb-data + headersize); swqe-descriptors++; } } else @@ -1362,7 +1360,6 @@ static void write_swqe2_nonTSO(struct sk_buff *skb, int skb_data_size = skb-len - skb-data_len; u8 *imm_data = swqe-u.immdata_desc.immediate_data[0]; struct ehea_vsgentry *sg1entry = swqe-u.immdata_desc.sg_entry; - u64 tmp_addr; /* Packet is any nonTSO type * @@ -1379,8 +1376,8 @@ static void write_swqe2_nonTSO(struct sk_buff *skb, /* copy sg1entry data */ sg1entry-l_key = lkey; sg1entry-len = skb_data_size - SWQE2_MAX_IMM; - tmp_addr = (u64)(skb-data + SWQE2_MAX_IMM); - sg1entry-vaddr = ehea_map_vaddr(tmp_addr); + sg1entry-vaddr = + ehea_map_vaddr(skb-data + SWQE2_MAX_IMM); swqe-descriptors++; } } else { @@ -1395,7 +1392,6 @@ static inline void write_swqe2_data(struct sk_buff *skb, struct net_device *dev, struct ehea_vsgentry *sg_list, *sg1entry, *sgentry; skb_frag_t *frag; int nfrags, sg1entry_contains_frag_data, i; - u64 tmp_addr; nfrags = skb_shinfo(skb)-nr_frags; sg1entry = swqe-u.immdata_desc.sg_entry; @@ -1417,9 +1413,9 @@ static inline void write_swqe2_data(struct sk_buff *skb, struct net_device *dev, /* copy sg1entry data */ sg1entry-l_key = lkey; sg1entry-len = frag-size; - tmp_addr = (u64)(page_address(frag-page) - + frag-page_offset); - sg1entry-vaddr = ehea_map_vaddr(tmp_addr); + sg1entry-vaddr = + ehea_map_vaddr(page_address(frag-page) + + frag-page_offset); swqe-descriptors++; sg1entry_contains_frag_data = 1; } @@ -1431,10 +1427,9 @@ static inline void write_swqe2_data(struct sk_buff *skb, struct net_device *dev, sgentry-l_key = lkey; sgentry-len = frag-size; - - tmp_addr = (u64)(page_address(frag-page) -+ frag-page_offset); - sgentry-vaddr = ehea_map_vaddr(tmp_addr); + sgentry-vaddr = + ehea_map_vaddr(page_address(frag-page) + + frag-page_offset); swqe-descriptors++; } } -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Generic Large Receive Offload for TCP traffic Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- include/linux/inet_lro.h | 173 ++ net/ipv4/inet_lro.c | 590 ++ 2 files changed, 763 insertions(+), 0 deletions(-) create mode 100644 include/linux/inet_lro.h create mode 100644 net/ipv4/inet_lro.c diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h new file mode 100644 index 000..0957234 --- /dev/null +++ b/include/linux/inet_lro.h @@ -0,0 +1,173 @@ +/* + * linux/include/linux/inet_lro.h + * + * Large Receive Offload (ipv4 / tcp) + * + * (C) Copyright IBM Corp. 2007 + * + * Authors: + * Jan-Bernd Themann [EMAIL PROTECTED] + * Christoph Raisch [EMAIL PROTECTED] + * + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2, or (at your option) + * any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + */ + +#ifndef __INET_LRO_H_ +#define __INET_LRO_H_ + +#include net/ip.h +#include net/tcp.h + +/* + * LRO statistics + */ + +struct net_lro_stats { + unsigned long aggregated; + unsigned long flushed; + unsigned long no_desc; +}; + +/* + * LRO descriptor for a tcp session + */ +struct net_lro_desc { + struct sk_buff *parent; + struct sk_buff *last_skb; + struct skb_frag_struct *next_frag; + struct iphdr *iph; + struct tcphdr *tcph; + struct vlan_group *vgrp; + __wsum data_csum; + u32 tcp_rcv_tsecr; + u32 tcp_rcv_tsval; + u32 tcp_ack; + u32 tcp_next_seq; + u32 skb_tot_frags_len; + u16 ip_tot_len; + u16 tcp_saw_tstamp; /* timestamps enabled */ + u16 tcp_window; + u16 vlan_tag; + int pkt_aggr_cnt; /* counts aggregated packets */ + int vlan_packet; + int active; +}; + +/* + * Large Receive Offload (LRO) Manager + * + * Fields must be set by driver + */ + +struct net_lro_mgr { + struct net_device *dev; + struct net_lro_stats stats; + + /* LRO features */ + unsigned long features; +#define LRO_F_NAPI1 /* Pass packets to stack via NAPI */ +#define LRO_F_EXTRACT_VLAN_ID 2 /* Set flag if VLAN IDs are extracted + from received packets and eth protocol + is still ETH_P_8021Q */ + + u32 ip_summed; /* Options to be set in generated SKB in page mode */ + int max_desc; /* Max number of LRO descriptors */ + int max_aggr; /* Max number of LRO packets to be aggregated */ + + struct net_lro_desc *lro_arr; /* Array of LRO descriptors */ + + /* +* Optimized driver functions +* +* get_skb_header: returns tcp and ip header for packet in SKB +*/ + int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr, + void **tcpudp_hdr, u64 *hdr_flags, void *priv); + + /* hdr_flags: */ +#define LRO_IPV4 1 /* ip_hdr is IPv4 header */ +#define LRO_TCP 2 /* tcpudp_hdr is TCP header */ + + /* +* get_frag_header: returns mac, tcp and ip header for packet in SKB +* +* @hdr_flags: Indicate what kind of LRO has to be done +* (IPv4/IPv6/TCP/UDP) +*/ + int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr, + void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags, + void *priv); +}; + +/* + * Processes a SKB + * + * @lro_mgr: LRO manager to use + * @skb: SKB to aggregate + * @priv: Private data that may be used by driver functions + *(for example get_tcp_ip_hdr) + */ + +void lro_receive_skb(struct net_lro_mgr *lro_mgr, +struct sk_buff *skb, +void *priv); + +/* + * Processes a SKB with VLAN HW acceleration support + */ + +void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr, + struct sk_buff *skb, + struct vlan_group *vgrp, + u16 vlan_tag, + void *priv); + +/* + * Processes a fragment list + * + * This functions aggregate fragments and generate SKBs do pass + * the packets to the stack. + * + * @lro_mgr: LRO manager to use + * @frags: Fragment to be processed. Must contain entire header in first + * element. + * @len: Length
[PATCH 2/4][RFC] lro: Kconfig and Makefile
Kconfig and Makefile for LRO Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- net/ipv4/Kconfig |8 net/ipv4/Makefile |1 + 2 files changed, 9 insertions(+), 0 deletions(-) diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index fb79097..d894f61 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -394,6 +394,14 @@ config INET_XFRM_MODE_BEET If unsure, say Y. +config INET_LRO + tristate Large Receive Offload (ipv4/tcp) + + ---help--- + Support for Large Receive Offload (ipv4/tcp). + + If unsure, say Y. + config INET_DIAG tristate INET: socket monitoring interface default y diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile index fbf1674..a02c36d 100644 --- a/net/ipv4/Makefile +++ b/net/ipv4/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_INET_ESP) += esp4.o obj-$(CONFIG_INET_IPCOMP) += ipcomp.o obj-$(CONFIG_INET_XFRM_TUNNEL) += xfrm4_tunnel.o obj-$(CONFIG_INET_XFRM_MODE_BEET) += xfrm4_mode_beet.o +obj-$(CONFIG_INET_LRO) += inet_lro.o obj-$(CONFIG_INET_TUNNEL) += tunnel4.o obj-$(CONFIG_INET_XFRM_MODE_TRANSPORT) += xfrm4_mode_transport.o obj-$(CONFIG_INET_XFRM_MODE_TUNNEL) += xfrm4_mode_tunnel.o -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Hi, this patch set contains the latest generic LRO code, a Kconfig / Makefile and an eHEA patch demonstrating how the aggregate SKB interface has to to be used. Drew, could you provide a patch for the myri10ge driver to show how the receive in page interface works? Please check the Kconfig / Makefile patch. Is that the right place for the LRO entries? There is still one open question for the receive in page mode: How many data (length) has to be copied to skb-data for packets that do not work for LRO (other protocols?). Currently I choose 64 as default. Is that ok? Thanks, Jan-Bernd [PATCH 1/4][RFC] lro: Generic Large Receive Offload for TCP traffic [PATCH 2/4][RFC] lro: Kconfig and Makefile [PATCH 3/4][RFC] ehea: LRO support [PATCH 4/4][RFC] ehea: Kconfig Changes to http://www.spinics.net/lists/netdev/msg36912.html 1) A new field called features has been added to the net_lro_mgr struct. It is set by the driver to indicate: - LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack - LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN packets but does not modify ETH protocol (ETH_P_8021Q) 2) Padded frames are not aggregated for now. Bug fixed 3) Correct header length now used. No minimal header length for aggregated packets used anymore. 4) Statistic counters were introduced. They are stored in a new struct in the net_lro_mgr. This has the advantage that no locking is required in cases where the driver uses multiple lro_mgrs for different receive queues. Thus we get the following statistics per lro_mgr / eth device: - Number of aggregated packets - Number of flushed packets - Number of times we run out of lro_desc. The ratio of aggregated packets and flushed packets give you an idea how well LRO is working. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4][RFC] ehea: LRO support
Added LRO support using the SKB aggregate interface Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea.h |9 - drivers/net/ehea/ehea_ethtool.c | 15 +++ drivers/net/ehea/ehea_main.c| 82 +++--- 3 files changed, 98 insertions(+), 8 deletions(-) diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h index d67f97b..70e33fe 100644 --- a/drivers/net/ehea/ehea.h +++ b/drivers/net/ehea/ehea.h @@ -33,13 +33,14 @@ #include linux/ethtool.h #include linux/vmalloc.h #include linux/if_vlan.h +#include linux/inet_lro.h #include asm/ibmebus.h #include asm/abs_addr.h #include asm/io.h #define DRV_NAME ehea -#define DRV_VERSIONEHEA_0073 +#define DRV_VERSIONEHEA_0074 /* eHEA capability flags */ #define DLPAR_PORT_ADD_REM 1 @@ -58,6 +59,7 @@ #define EHEA_SMALL_QUEUES #define EHEA_NUM_TX_QP 1 +#define EHEA_LRO_MAX_AGGR 64 #ifdef EHEA_SMALL_QUEUES #define EHEA_MAX_CQE_COUNT 1023 @@ -84,6 +86,8 @@ #define EHEA_RQ2_PKT_SIZE 1522 #define EHEA_L_PKT_SIZE 256/* low latency */ +#define MAX_LRO_DESCRIPTORS 8 + /* Send completion signaling */ /* Protection Domain Identifier */ @@ -376,6 +380,8 @@ struct ehea_port_res { u64 tx_packets; u64 rx_packets; u32 poll_counter; + struct net_lro_mgr lro_mgr; + struct net_lro_desc lro_desc[MAX_LRO_DESCRIPTORS]; }; @@ -427,6 +433,7 @@ struct ehea_port { u32 msg_enable; u32 sig_comp_iv; u32 state; + u32 lro_max_aggr; u8 full_duplex; u8 autoneg; u8 num_def_qps; diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c index decec8c..29ef7a9 100644 --- a/drivers/net/ehea/ehea_ethtool.c +++ b/drivers/net/ehea/ehea_ethtool.c @@ -183,6 +183,9 @@ static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = { {PR5 free_swqes}, {PR6 free_swqes}, {PR7 free_swqes}, + {LRO aggregated}, + {LRO flushed}, + {LRO no_desc}, }; static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data) @@ -239,6 +242,18 @@ static void ehea_get_ethtool_stats(struct net_device *dev, for (k = 0; k 8; k++) data[i++] = atomic_read(port-port_res[k].swqe_avail); + for (k = 0, tmp = 0; k EHEA_MAX_PORT_RES; k++) + tmp |= port-port_res[k].lro_mgr.stats.aggregated; + data[i++] = tmp; + + for (k = 0, tmp = 0; k EHEA_MAX_PORT_RES; k++) + tmp |= port-port_res[k].lro_mgr.stats.flushed; + data[i++] = tmp; + + for (k = 0, tmp = 0; k EHEA_MAX_PORT_RES; k++) + tmp |= port-port_res[k].lro_mgr.stats.no_desc; + data[i++] = tmp; + } const struct ethtool_ops ehea_ethtool_ops = { diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c index 9756211..41bc075 100644 --- a/drivers/net/ehea/ehea_main.c +++ b/drivers/net/ehea/ehea_main.c @@ -52,6 +52,8 @@ static int rq2_entries = EHEA_DEF_ENTRIES_RQ2; static int rq3_entries = EHEA_DEF_ENTRIES_RQ3; static int sq_entries = EHEA_DEF_ENTRIES_SQ; static int use_mcs = 0; +static int use_lro = 0; +static int lro_max_aggr = EHEA_LRO_MAX_AGGR; static int num_tx_qps = EHEA_NUM_TX_QP; module_param(msg_level, int, 0); @@ -60,6 +62,8 @@ module_param(rq2_entries, int, 0); module_param(rq3_entries, int, 0); module_param(sq_entries, int, 0); module_param(use_mcs, int, 0); +module_param(use_lro, int, 0); +module_param(lro_max_aggr, int, 0); module_param(num_tx_qps, int, 0); MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS); @@ -77,6 +81,10 @@ MODULE_PARM_DESC(sq_entries, Number of entries for the Send Queue [2^x - 1], x = [6..14]. Default = __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) )); MODULE_PARM_DESC(use_mcs, 0:NAPI, 1:Multiple receive queues, Default = 1 ); +MODULE_PARM_DESC(lro_max_aggr, LRO: Max packets to be aggregated. Default = +__MODULE_STRING(EHEA_LRO_MAX_AGGR)); +MODULE_PARM_DESC(use_lro, Large Receive Offload, 1: enable, 0: disable, + Default = 0); static int port_name_cnt = 0; static LIST_HEAD(adapter_list); @@ -389,6 +397,60 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, int rq, return 0; } +static int get_skb_hdr(struct sk_buff *skb, void **iphdr, + void **tcph, u64 *hdr_flags, void *priv) +{ + struct ehea_cqe *cqe = priv; + unsigned int ip_len; + struct iphdr *iph; + +/* non tcp/udp packets */ + if (!cqe-header_length) + return -1; + +/* non tcp packet */ + skb_reset_network_header(skb); + iph = ip_hdr(skb); + if (iph-protocol != IPPROTO_TCP) + return -1; + + ip_len = ip_hdrlen(skb); + skb_set_transport_header(skb, ip_len); + *tcph = tcp_hdr(skb); + +/* check if ip header and tcp header are complete */ +
[PATCH 4/4][RFC] ehea: Kconfig
Kconfig changes for LRO Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/Kconfig |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index f8a602c..fec4004 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2399,6 +2399,7 @@ config CHELSIO_T3 config EHEA tristate eHEA Ethernet support depends on IBMEBUS + select INET_LRO ---help--- This driver supports the IBM pSeries eHEA ethernet adapter. -- 1.5.2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: source interface ping bug ?
nano bug wrote: Can someone have a look a this and tell if it's kernel related or if I posted this in the wrong place ? Thanks. Last I checked, ping did not do an SO_BINDTODEVICE even if you did -i ethX. I think it just looked up the IP for that port and treated it as -i a.b.c.d. That said, I'm not sure why the behaviour changes for you between kernel releases. Maybe an 'strace' of your ping command on the different kernels would help figure out what the problem is? Ben -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [NET]: fix multicast list when cloning sockets
The sock_copy() function uses memcpy() to clone the socket including the struct ip_mc_socklist *mc_list pointer. The ip_mc_drop_socket() function is called when socket is closed to free these objects leaving the other sockets cloned from the same master socket with invalid pointers. This patch sets mc_list of cloned socket to NULL. Signed-off-by: Flavio Leitner [EMAIL PROTECTED] diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index fbe7714..8ee0f54 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -506,6 +506,8 @@ struct sock *inet_csk_clone(struct sock *sk, const struct request_sock *req, newicsk-icsk_backoff = 0; newicsk-icsk_probes_out = 0; + inet_sk(inet)-mc_list = NULL; + /* Deinitialize accept_queue to trap illegal accesses. */ memset(newicsk-icsk_accept_queue, 0, sizeof(newicsk-icsk_accept_queue)); -- 1.5.2.4 -- Flavio - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Seems pretty good to me, save for one minor detail: patches #1/#2 should be combined together for greater git-bisect happiness. Ditto for patches #3/#4. Largely harmless in this case, but keeps the git history pollution to a minimum. Caveat reviewer: I'm not an expert of net/ipv4/* code, so I reviewed largely from the driver API perspective. David, thoughts on merging? I'm not We could stick this into your tree or mine. Whether yours or mine, I would like to keep the driver and net-core patches together in the same git tree. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: source interface ping bug ?
nano bug wrote: [...] using source interface : [EMAIL PROTECTED]:~/iputils# ./ping -I eth2 87.248.113.14 PING 87.248.113.14 (87.248.113.14) from 86.106.19.75 eth2: 56(84) bytes of data. From 86.106.19.75 icmp_seq=1 Destination Host Unreachable [EMAIL PROTECTED]:~# tcpdump -i eth2 -vvv -n host 87.248.113.14 and host 86.106.19.75 tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes 01:19:24.292911 arp who-has 87.248.113.14 tell 86.106.19.75 Are you using (or running) NAT locally? What do your routing tables look like? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4][RFC] lro: Kconfig and Makefile
On Mon, 30 Jul 2007 17:24:45 +0200 Jan-Bernd Themann [EMAIL PROTECTED] wrote: Kconfig and Makefile for LRO Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- net/ipv4/Kconfig |8 net/ipv4/Makefile |1 + 2 files changed, 9 insertions(+), 0 deletions(-) diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig index fb79097..d894f61 100644 --- a/net/ipv4/Kconfig +++ b/net/ipv4/Kconfig @@ -394,6 +394,14 @@ config INET_XFRM_MODE_BEET If unsure, say Y. +config INET_LRO + tristate Large Receive Offload (ipv4/tcp) + + ---help--- + Support for Large Receive Offload (ipv4/tcp). + + If unsure, say Y. + Why make this a user selectable option at all? Unless you want to deal with out of tree drivers (not my problem), it should be hidden to avoid having to explain an support it. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2/3] 2.6.23-rc1: known regressions v3
Hi all, Here is a list of some known regressions in 2.6.23-rc1. Feel free to add new regressions/remove fixed etc. http://kernelnewbies.org/known_regressions List of Aces NameRegressions fixed since 21-Jun-2007 Adrian Bunk6 Andi Kleen 4 Andrew Morton 4 Linus Torvalds 4 Al Viro3 Jens Axboe 3 Tejun Heo 3 David Woodhouse2 Hugh Dickins 2 CPUFREQ/ACPI Subject : ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not References : http://lkml.org/lkml/2007/7/27/298 http://lkml.org/lkml/2007/7/29/371 Last known good : ? Submitter : dth [EMAIL PROTECTED] Caused-By : ? Handled-By : ? Status : unknown Networking Subject : sky2 boot crash in sky2_mac_intr References : http://lkml.org/lkml/2007/7/24/91 Last known good : ? Submitter : Florian Lohoff [EMAIL PROTECTED] Caused-By : Handled-By : Stephen Hemminger [EMAIL PROTECTED] Status : unknown Subject : New wake ups from sky2 References : http://lkml.org/lkml/2007/7/20/386 Last known good : ? Submitter : Thomas Meyer [EMAIL PROTECTED] Caused-By : Stephen Hemminger [EMAIL PROTECTED] commit eb35cf60e462491249166182e3e755d3d5d91a28 Handled-By : Stephen Hemminger [EMAIL PROTECTED] Status : unknown Power Management Subject : New ACPI error/warning with Linus' latest GIT References : http://lkml.org/lkml/2007/7/26/395 Last known good : ? Submitter : Ismail Dönmez [EMAIL PROTECTED] Caused-By : ? Handled-By : ? Status : unknown SCSI Subject : lpfc_sli.c: off-by-10 References : http://lkml.org/lkml/2007/7/22/284 Last known good : ? Submitter : Adrian Bunk [EMAIL PROTECTED] Caused-By : ? Handled-By : ? Status : unknown Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4][RFC] lro: Kconfig and Makefile
Stephen Hemminger wrote: Why make this a user selectable option at all? Unless you want to deal with out of tree drivers (not my problem), it should be hidden to avoid having to explain an support it. In this case it's an optional library kernel module. That seems to be a common setup for library modules. We do the same with CONFIG_MII and drivers/net/mii.ko as well. Originally it was done purely in the Makefile, but that does not account for net drivers in sub-directories (or out of tree as you point out). Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4][RFC] ehea: LRO support
Jan-Bernd Themann wrote: Added LRO support using the SKB aggregate interface Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED] --- drivers/net/ehea/ehea.h |9 - drivers/net/ehea/ehea_ethtool.c | 15 +++ drivers/net/ehea/ehea_main.c| 82 +++--- 3 files changed, 98 insertions(+), 8 deletions(-) snip +module_param(use_lro, int, 0); +module_param(lro_max_aggr, int, 0); this should obviously (and Stephen H. probably agrees) be implemented in ethtool instead, and opens up the question as to what changes we need to add to ethtool. .. Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
On Mon, Jul 30, 2007 at 05:24:33PM +0200, Jan-Bernd Themann wrote: Changes to http://www.spinics.net/lists/netdev/msg36912.html 1) A new field called features has been added to the net_lro_mgr struct. It is set by the driver to indicate: - LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack - LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN packets but does not modify ETH protocol (ETH_P_8021Q) 2) Padded frames are not aggregated for now. Bug fixed 3) Correct header length now used. No minimal header length for aggregated packets used anymore. 4) Statistic counters were introduced. They are stored in a new struct in the net_lro_mgr. This has the advantage that no locking is required in cases where the driver uses multiple lro_mgrs for different receive queues. Thus we get the following statistics per lro_mgr / eth device: - Number of aggregated packets - Number of flushed packets - Number of times we run out of lro_desc. The ratio of aggregated packets and flushed packets give you an idea how well LRO is working. I'd like to see an edited form of this, together with an introduction to LRO, written up in the Documentation subdirectory. As someone with some driver experience, but not on te bleeding edge, some basc newbie questions pop into mind: -- what is LRO? -- Basic principles of operation? -- Can I use it in my driver? -- Does my hardware have to have some special feature before I can use it? -- What sort of performance improvement does it provide? Throughput? Latency? CPU usage? How does it affect DMA allocation? Does it improve only a certain type of traffic (large/small packets, etc.) -- Example code? What's the API? How should my driver use it? Right now, I can maybe find answers by doing lots of googling. I'd like to have some quick way of getting a grip on this. --linas - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4][RFC] lro: Kconfig and Makefile
Jeff Garzik wrote: Stephen Hemminger wrote: Why make this a user selectable option at all? Unless you want to deal with out of tree drivers (not my problem), it should be hidden to avoid having to explain an support it. In this case it's an optional library kernel module. That seems to be a common setup for library modules. We do the same with CONFIG_MII and drivers/net/mii.ko as well. Originally it was done purely in the Makefile, but that does not account for net drivers in sub-directories (or out of tree as you point out). speaking of that, shouldn't there be a NETIF_F_LRO ? Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] fib_trie: whitespace cleanup
On Thu, Jul 26, 2007 at 09:56:30PM -0700, Andrew Morton wrote: On Thu, 26 Jul 2007 08:44:21 -0700 Paul E. McKenney [EMAIL PROTECTED] wrote: On Thu, Jul 26, 2007 at 11:49:42AM +0100, Stephen Hemminger wrote: Whitespace cleanup run code through lindent then cleanup results. Applys after other two patches. --- a/net/ipv4/fib_trie.c 2007-07-26 10:17:21.0 +0100 +++ b/net/ipv4/fib_trie.c 2007-07-26 11:47:52.0 +0100 @@ -156,7 +156,8 @@ struct trie { }; static void put_child(struct trie *t, struct tnode *tn, int i, struct node *n); -static void tnode_put_child_reorg(struct tnode *tn, int i, struct node *n, int wasfull); +static void tnode_put_child_reorg(struct tnode *tn, int i, struct node *n, + int wasfull); static struct node *resize(struct trie *t, struct tnode *tn); static struct tnode *inflate(struct trie *t, struct tnode *tn); static struct tnode *halve(struct trie *t, struct tnode *tn); @@ -167,13 +168,12 @@ static struct trie *trie_local = NULL, * static inline struct tnode *node_parent(struct node *node) { - return rcu_dereference((struct tnode *) (node-parent ~NODE_TYPE_MASK)); + return rcu_dereference((struct tnode *)(node-parent ~NODE_TYPE_MASK)); The potential issue is applying rcu_dereference() to an rvalue as opposed to an lvalue. So how about the following? I did this: static inline struct tnode *node_parent(struct node *node) { struct tnode *ret; ret = (struct tnode *)(node-parent ~NODE_TYPE_MASK); return rcu_dereference(ret); } I would feel more comfortable with the rcu_dereference() covering the initial fetch from node-parent, but do not have any hard objections to your approach. Thanx, Paul - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6 0/2] [TCP]: Fix bidirectional brokeness
Hi Dave, While testing frto with bidirection TCP a while (months) ago, I encountered time-seq graphs which made absolutely no sense as if recoveries only completed after RTO. As a result, I noticed that rate-halving has problem when a flow is bidirection but testing the patch has been on my todo list for ages. ...Finally, here it is... ...Please think the first one a bit because there might be some corner cases with reno. While testing it I came across with an additional issue that can occur with bidirectional traffic. And, even better, the second fix seems to also solves a third issue which affects both unidirectional and bidirectional flows, though it's a marginal case (cumulative ACK that causes a larger number of new SACKed skbs). I've never seen that third one to occur but it's there if you get enough ACK losses, subsequent ACKs (if one gets them) do solve it so that's not a very bad issue to begin with... I've verified these from time-seq graphs on top of tcp-2.6, which had some additional (mostly cleanup and the rebase I promised you earlier) though I added the tp-fackets_out tp-reordering check afterwards as it seems necessary to avoid going to lost marker too often (wouldn't have had any effect in my test case anyway). ...Please consider to net-2.6 and to stable too. These will generate you some hassle when you rebase tcp-2.6. Btw, you forgot to push tcp-2.6 out last time though I could assume it's state... :-) In case you're going to now push it out, could you please drop [TCP]: Remove num_acked0 checks from cong.ctrl mods pkts_acked from it as it seems to be out of place in tcp-2.6, I can resubmit it to net-2.6.24 when you open it (unless you want to put it to net-2.6 directly as it's rather trivial one). -- i. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows
Actually, the ratehalving seems to work too well, as cwnd is reduced on every second ACK even though the packets in flight remains unchanged. Recoveries in a bidirectional flows suffer quite badly because of this, both NewReno and SACK are affected. After this patch, rate halving is performed for ACK only if packets in flight was supposedly changed too. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_input.c | 23 +-- 1 files changed, 13 insertions(+), 10 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 0163051..767f92c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1848,19 +1848,22 @@ static inline u32 tcp_cwnd_min(const struct sock *sk) } /* Decrease cwnd each second ack. */ -static void tcp_cwnd_down(struct sock *sk) +static void tcp_cwnd_down(struct sock *sk, int flag) { struct tcp_sock *tp = tcp_sk(sk); int decr = tp-snd_cwnd_cnt + 1; + + if ((flagFLAG_FORWARD_PROGRESS) || + (IsReno(tp) !(flagFLAG_NOT_DUP))) { + tp-snd_cwnd_cnt = decr1; + decr = 1; - tp-snd_cwnd_cnt = decr1; - decr = 1; + if (decr tp-snd_cwnd tcp_cwnd_min(sk)) + tp-snd_cwnd -= decr; - if (decr tp-snd_cwnd tcp_cwnd_min(sk)) - tp-snd_cwnd -= decr; - - tp-snd_cwnd = min(tp-snd_cwnd, tcp_packets_in_flight(tp)+1); - tp-snd_cwnd_stamp = tcp_time_stamp; + tp-snd_cwnd = min(tp-snd_cwnd, tcp_packets_in_flight(tp)+1); + tp-snd_cwnd_stamp = tcp_time_stamp; + } } /* Nothing was retransmitted or returned timestamp is less @@ -2057,7 +2060,7 @@ static void tcp_try_to_open(struct sock *sk, int flag) } tcp_moderate_cwnd(tp); } else { - tcp_cwnd_down(sk); + tcp_cwnd_down(sk, flag); } } @@ -2257,7 +2260,7 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una, if (is_dupack || tcp_head_timedout(sk)) tcp_update_scoreboard(sk); - tcp_cwnd_down(sk); + tcp_cwnd_down(sk, flag); tcp_xmit_retransmit_queue(sk); } -- 1.5.0.6
[PATCH net-2.6 2/2] [TCP]: Bidir flow must not disregard SACK blocks for lost marking
It's possible that new SACK blocks that should trigger new LOST markings arrive with new data (which previously made is_dupack false). In addition, I think this fixes a case where we get a cumulative ACK with enough SACK blocks to trigger the fast recovery (is_dupack would be false there too). I'm not completely pleased with this solution because readability of the code is somewhat questionable as 'is_dupack' in SACK case is no longer about dupacks only but would mean something like 'lost_marker_work_todo' too... But because of Eifel stuff done in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to the if statement which seems attractive solution. Nevertheless, I didn't like adding another variable just for that either... :-) Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] --- net/ipv4/tcp_input.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 767f92c..cfe6ac7 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2109,7 +2109,10 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una, { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); - int is_dupack = (tp-snd_una == prior_snd_una !(flagFLAG_NOT_DUP)); + int is_dupack = (tp-snd_una == prior_snd_una +(!(flagFLAG_NOT_DUP) || + ((flagFLAG_DATA_SACKED) + (tp-fackets_out tp-reordering; /* Some technical things: * 1. Reno does not count dupacks (sacked_out) automatically. */ -- 1.5.0.6
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Here is a quick reply before something more official can be written up: Linas Vepstas wrote: -- what is LRO? Large Receive Offload -- Basic principles of operation? LRO is analogous to a receive side version of TSO. The NIC (or driver) merges several consecutive segments from the same connection, fixing up checksums, etc. Rather than up to 45 separate 1500 byte frames (meaning up to 45 trips through the network stack), the driver merges them into one 65212 byte frame. It currently works only with TCP over IPv4. LRO was, AFAIK, first though of by Neterion. They had a paper about it at OLS2005. http://www.linuxinsight.com/files/ols2005/grossman-reprint.pdf -- Can I use it in my driver? Yes, it can be used in any driver. -- Does my hardware have to have some special feature before I can use it? No. -- What sort of performance improvement does it provide? Throughput? Latency? CPU usage? How does it affect DMA allocation? Does it improve only a certain type of traffic (large/small packets, etc.) The benefit is directly proportional to the packet rate. See my reply to the previous RFC for performance information. The executive summary is that for the myri10ge 10GbE driver on low end hardware with 1500b frames, I've seen it increase throughput by a factor of nearly 2.5x, while at the same time reducing CPU utilization by 17%. The affect for jumbo frames is less dramatic, but still impressive (1.10x, 14% CPU reduction) You can achieve better speedups if your driver receives into high-order pages. -- Example code? What's the API? How should my driver use it? The 3/4 in this patch showed an example of converting a driver to use LRO for skb based receive buffers. I'm working on a patch for myri10ge that shows how you would use it in a driver which receives into pages. Cheers, Drew - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec
On Mon, Jul 30, 2007 at 08:09:53PM +0530, pravin wrote: comments? please avoid using plain sendmsg/recvmsg from kernelspace. Adding these segment checks is more than a bad hack and not something that should clutter up fasthpathes. Add to that problems with kthreads vs blocking recvmsg and you have a bigger mess than you can handle. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMSC LAN911x and LAN921x vendor driver
Hi Peter, What's the problem with Dustin's driver? It seems to work fine here with a lan9117. Why not just add lan921x support to the existing driver? I have heard Dustin's driver works very well on PXA, but on others it doesn't even compile (hence why it depends on ARCH_PXA). Dustin's driver was based on the smc91x code, but these two ethernet devices share nothing other than a similar name. smsc911x is a new, completely platform-independent driver with workarounds for several hardware issues, and it doesn't suffer from quite as much macro abuse :-) Regards, -- Steve Glendinning SMSC GmbH m: +44 777 933 9124 e: [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linksys Gigabit USB2.0 adapter (asix) regression
On Mon, 2007-07-30 at 11:27 +0200, Erik Slagter wrote: They are either garbled are they are not passed on the wire. The transmitted packets are shown by tshark, but a tshark run on the other end of the line does not show them. Platform is indeed x86, to be precise: fedora 7, kernel 2.6.22-rc6, cpu pentium M, dell laptop inspiron 9300, ICH6. If you want me to test something please yell, it's no trouble at all to change a few lines in the driver's source and recompile the module. Could you send me a complete dmesg dump when the driver is compiled with DEBUG enabled (at least from then usb logs that the device was inserted to the end). I'll need to see what it reports the values of the registers. Have you tried using the F7 2.6.22 kernel? I know that has worked fine for me on my system. Please note I cannot send mail to you: (conversation with dhollis.dyndns.org[71.251.104.159] timed out while sending MAIL FROM) I've fixed that issue so my mail delivery isn't sporadic. -- David Hollis [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[TULIP] Need new maintainer
The Tulip network driver needs a new maintainer! I no longer have time to maintain the Tulip network driver and I'm stepping down. Jeff Garzik would be happy to get volunteers. The only current major outstanding patch I know of is Grant's shutdown race patch, which was incorrectly dropped as obsoleted from -mm (my fault, I was moving at the time): http://www.mail-archive.com/[EMAIL PROTECTED]/msg12161.html I have a very much non-working patch to do it with the preferred order, ask me for it and I'll see if I can dig it up. It's unpleasant partly because it pointed out a lot of latent bugs (e.g., del_timer_sync() in interrupt context). Also, someone is working on support for an emulated Tulip card (yes, Tulip will _never_ die), so expect possible patches for that. -VAL - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] tulip: Remove tulip maintainer
Remove Val Henson as tulip maintainer and let her roam free, FREE! Signed-off-by: Val Henson [EMAIL PROTECTED] --- linux-2.6.orig/MAINTAINERS +++ linux-2.6/MAINTAINERS @@ -3569,11 +3569,9 @@ W: http://www.auk.cx/tms380tr/ S: Maintained TULIP NETWORK DRIVER -P: Valerie Henson -M: [EMAIL PROTECTED] L: [EMAIL PROTECTED] W: http://sourceforge.net/projects/tulip/ -S: Maintained +S: Orphan TUN/TAP driver P: Maxim Krasnyansky - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] possible deadlock in tulip driver
(No longer maintainer, btw.) What situation have you tested this under? Thanks, -VAL On Tue, Jul 24, 2007 at 11:49:08AM +0400, Denis V. Lunev wrote: Calling flush_scheduled_work() may deadlock if called under rtnl_lock (from dev-stop) as linkwatch_event() may be on the workqueue and it will try to get the rtnl_lock Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] --- tulip_core.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- ./drivers/net/tulip/tulip_core.c.tulip2007-07-16 12:54:29.0 +0400 +++ ./drivers/net/tulip/tulip_core.c 2007-07-23 19:06:24.0 +0400 @@ -726,8 +726,6 @@ static void tulip_down (struct net_devic void __iomem *ioaddr = tp-base_addr; unsigned long flags; - flush_scheduled_work(); - del_timer_sync (tp-timer); #ifdef CONFIG_TULIP_NAPI del_timer_sync (tp-oom_timer); @@ -1788,6 +1786,8 @@ static void __devexit tulip_remove_one ( if (!dev) return; + flush_scheduled_work(); + tp = netdev_priv(dev); unregister_netdev(dev); pci_free_consistent (pdev, - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [TULIP] Need new maintainer
On Mon, Jul 30, 2007 at 01:04:13PM -0600, Valerie Henson wrote: The Tulip network driver needs a new maintainer! I no longer have time to maintain the Tulip network driver and I'm stepping down. Jeff Garzik would be happy to get volunteers. Since I already take care of a major consumer of these devices (parisc, which pretty much all have tulip) I'm willing to take care of this. Alternately, Grant is probably willing. Cheers, Kyle - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in write_bulk_callback() in drivers/net/usb/pegasus.c
Micah Gruber wrote: This patch fixes a potential null dereference bug where we dereference pegasus before a null check. This patch simply moves the dereferencing after the null check. Signed-off-by: Micah Gruber [EMAIL PROTECTED] --- --- a/drivers/net/usb/pegasus.c +++ b/drivers/net/usb/pegasus.c @@ -768,11 +768,13 @@ static void write_bulk_callback(struct urb *urb) { pegasus_t *pegasus = urb-context; - struct net_device *net = pegasus-net; + struct net_device *net; if (!pegasus) return; + net = pegasus-net; + applied (manually) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] gfar: Fix modpost warning
Kumar Gala wrote: Fix the following modpost warning: WARNING: vmlinux.o(.init.text+0x1aa6c): Section mismatch: reference to .exit.text:gfar_mdio_exit (between 'gfar_init' and 'gfar_mdio_init') Signed-off-by: Kumar Gala [EMAIL PROTECTED] --- drivers/net/gianfar_mii.c |2 +- drivers/net/gianfar_mii.h |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20-2.6.21 - networking dies after random time
Jarek Poplawski wrote: Hi, Very below is my patch proposal with a comment, which in my opinion is precious enough to save it for future help in reading and understanding the code. I hope Alan will not blame me I've not asked for his permission before sending, and he would ack this patch as it is or at least most of this. Thanks regards, Jarek P. On Wed, Jul 25, 2007 at 03:46:56PM +0100, Alan Cox wrote: The code in question lib8390.c does disable_irq(); fiddle_with_the_network_card_hardware() enable_irq(); ... No idea how this affects the network card, as the code there must be able to handle interrupts, which are not originated from the card due to interrupt sharing. I think, in this last yesterday's patch Ingo could be right, yet! The comment at the beginnig points this is done like that because of chip's slowness. And problems with timing are mysterious. On the other hand author of this code didn't use spin_lock_irqsave for some reason, probably after testing this option too. So, I hope this is the right path, but alas, I'm not sure this patch has to prove this 100%. The author (me) didn't use spin_lock_irqsave because the slowness of the card means that approach caused horrible problems like losing serial data at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA chips with FPGA front ends. Anyway, in my opinion this situation where interrupts could/have_to be used for such strange things should confirm the need of more options for handling irqs individually. Ok the logic behind the 8390 is very simple: Things to know - IRQ delivery is asynchronous to the PCI bus - Blocking the local CPU IRQ via spin locks was too slow - The chip has register windows needing locking work So the path was once (I say once as people appear to have changed it in the mean time and it now looks rather bogus if the changes to use disable_irq_nosync_irqsave are disabling the local IRQ) Take the page lock Mask the IRQ on chip Disable the IRQ (but not mask locally- someone seems to have broken this with the lock validator stuff) [This must be _nosync as the page lock may otherwise deadlock us] Drop the page lock and turn IRQs back on At this point an existing IRQ may still be running but we can't get a new one Take the lock (so we know the IRQ has terminated) but don't mask the IRQs on the processor Set irqlock [for debug] Transmit (slow as ) re-enable the IRQ We have to use disable_irq because otherwise you will get delayed interrupts on the APIC bus deadlocking the transmit path. Quite hairy but the chip simply wasn't designed for SMP and you can't even ACK an interrupt without risking corrupting other parallel activities on the chip. Alan -- From: Jarek Poplawski [EMAIL PROTECTED] Subject: lib8390: comment on locking by Alan Cox Additional explanation of problems with locking by Alan Cox. Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] Cc: Alan Cox [EMAIL PROTECTED] Cc: Paul Gortmaker [EMAIL PROTECTED] Cc: Jeff Garzik [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tulip: Remove tulip maintainer
Valerie Henson wrote: Remove Val Henson as tulip maintainer and let her roam free, FREE! Signed-off-by: Val Henson [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] forcedeth: mac address correct
Ayaz Abdulla wrote: Resending: In older chipsets, the mac address was stored in reversed order. However, in newer chipsets, the mac address is in correct order. This patch takes those newer chipsets into account and does not rely on a special bit setup by BIOS'. Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND 1/2] netxen: re-init station address after h/w init
[EMAIL PROTECTED] wrote: This is a workaround for firmware bug with 2nd port of multiport adapter, where MAC address is reset. Driver just needs to overwrite it with the value read from PROM. Signed-off-by: Dhananjay Phadke [EMAIL PROTECTED] applied 1-2 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netxen: drop redudant spinlock
Dhananjay Phadke wrote: Some leftover code that makes use of adapter-lock in tx_timeout function, which resets the interface under this lock. In close() when the workqueue is flushed, prints the warning about sleeping with interrupts disabled (when spinlock debug is enabled). The lock was required with private netxen IOCTLs, which were removed a while ago. applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.22 1/3]S2io: Mask spurious interrupts
Ramkrishna Vepa wrote: - Mask single and double bit ETQ ecc errors to inhibit spurious interrupts. (Resending; Removed HTML sections in the patch) Signed-off-by: Santosh Rastapur [EMAIL PROTECTED] applied 1-3 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in mace_interrupt() in drivers/net/pcmcia/nmclan_cs.c
Micah Gruber wrote: This patch fixes a potential null dereference bug where we dereference DEV before a null check. This patch simply moves the dereferencing after the null check. Signed-off-by: Micah Gruber [EMAIL PROTECTED] applied (git-am worked!) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.22 1/4]S2IO: Removing 3 buffer mode support from the driver
Veena Parat wrote: - Removed 3 buffer mode support from driver - unused feature - Incorporated Jeff Garzik's comments on elimination of inline typecasting - Code cleanup : Removed a few extra spaces Signed-off-by: Veena Parat [EMAIL PROTECTED] applied 1-4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Could someone add this to: http://linux-net.osdl.org Wiki? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [drivers/net/cxgb3] removed several unneeded zero initilization
Denis Cheng wrote: Cc: [EMAIL PROTECTED] Signed-off-by: Denis Cheng [EMAIL PROTECTED] --- drivers/net/cxgb3/cxgb3_main.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [NET] IOC3: Switch hw checksumming to ethtool configurable.
Ralf Baechle wrote: Signed-off-by: Ralf Baechle [EMAIL PROTECTED] applied to #upstream (2.6.24) I've previously sent out this patch a long time ago. At that time I was told NETIF_F_IP_CSUM wouldn't make any sense without NETIF_F_SG. IOC3's S/G abilities are very limited; it can do upto three segments of which the first one is upto 104 bytes and part of the packet's TX ring entry, the second and 3rd ones can be anywhere in the 64-bit PCI address space but may not cross a 16kB page boundary. So setting NETIF_F_SG isn't really an option unless the IOC3 was going to linearize any packet it can't cope with itself. So the big question, does NETIF_F_IP_CSUM without NETIF_F_SG make sense? Conventional wisdom has always been that NETIF_F_SG is required if NETIF_F_*CSUM is present, and vice versa. I admit I've not verified this in the past year or two. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.23 1/2] Make the iw_cxgb3 module parameters writable.
ugh, missed these before my last merge... anyway: why do we want to parameters writable? a good changelog tells me what, why and how, and this changelog just covered the what. Also, I assume you've checked that it's OK for these variables to change at any time? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure
Moni Shoua [EMAIL PROTECTED] wrote: Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit in dev-state field is on. This improves the chances for the arp packet to be transmitted. Under what circumstances were you seeing problems that delaying the gratuitous ARP until linkwatch is done improves things? Is this really an IB thing, or did you experience problems here over regular ethernet? Signed-off-by: Moni Shoua [EMAIL PROTECTED] --- drivers/net/bonding/bond_main.c | 25 + drivers/net/bonding/bonding.h |1 + 2 files changed, 22 insertions(+), 4 deletions(-) Index: net-2.6/drivers/net/bonding/bond_main.c === --- net-2.6.orig/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.0 +0300 +++ net-2.6/drivers/net/bonding/bond_main.c2007-07-26 18:42:59.296296622 +0300 @@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon if (new_active !bond-do_set_mac_addr) memcpy(bond-dev-dev_addr, new_active-dev-dev_addr, new_active-dev-addr_len); - - bond_send_gratuitous_arp(bond); + if (bond-curr_active_slave + test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state)){ + dprintk(delaying gratuitous arp on %s\n,bond-curr_active_slave-dev-name); + bond-send_grat_arp=1; + }else{ + bond_send_gratuitous_arp(bond); + } Style issues throughout the patch series: many lines are too long, many things are all smashed together, e.g., }else{ instead of } else {, send_grat_arp=1 instead of send_grat_arp = 1, and so on. } } @@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device * program could monitor the link itself if needed. */ + if (bond-send_grat_arp) { + if (bond-curr_active_slave test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state)) + dprintk(Needs to send gratuitous arp but not yet\n,__FUNCTION__); + else { + dprintk(sending delayed gratuitous arp on ond-curr_active_slave-dev-name\n); + bond_send_gratuitous_arp(bond); + bond-send_grat_arp=0; + } + } read_lock(bond-curr_slave_lock); oldcurrent = bond-curr_active_slave; read_unlock(bond-curr_slave_lock); @@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str struct slave *slave = bond-curr_active_slave; struct vlan_entry *vlan; struct net_device *vlan_dev; + int i; dprintk(bond_send_grat_arp: bond %s slave %s\n, bond-dev-name, slave ? slave-dev-name : NULL); @@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str return; if (bond-master_ip) { - bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, -bond-master_ip, 0); + for (i=0;i3;i++) + bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip, +bond-master_ip, 0); } If you delay the grat ARP until linkwatch is done, why is it also necessary to shotgun several ARPs instead of one? Why are the ARPs sent for VLANs not also shotgunned in a similar fashion? If shotgunning like this really is useful, would it not make more sense to space them out a bit, e.g., over successive monitor passes? list_for_each_entry(vlan, bond-vlan_list, vlan_list) { @@ -4331,6 +4347,7 @@ static int bond_init(struct net_device * bond-current_arp_slave = NULL; bond-primary_slave = NULL; bond-dev = bond_dev; + bond-send_grat_arp=0; INIT_LIST_HEAD(bond-vlan_list); /* Initialize the device entry points */ Index: net-2.6/drivers/net/bonding/bonding.h === --- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.0 +0300 +++ net-2.6/drivers/net/bonding/bonding.h 2007-07-26 18:42:43.652087660 +0300 @@ -203,6 +203,7 @@ struct bonding { struct vlan_group *vlgrp; struct packet_type arp_mon_pt; s8 do_set_mac_addr; + int send_grat_arp; This need not be a full int, and (this applies to do_set_mac_addr, also) could probably be squeezed into gaps already existing within the struct bonding somewhere. -J --- -Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Proposed interface for per-packet mesh-ttl
Hi Stephen, On 7/27/07, Stephen Hemminger [EMAIL PROTECTED] wrote: In this case perhaps you can have a table that maps skb-priority to mesh ttl? priorty can already by handled by existing setsockopt calls, and modified by netfilter and QoS managements. Thanks for the feedback. IMHO overloading the [SOL_SOCKET] SO_PRIORITY sockoption for something as different as changing the mesh-ttl is quite a stretch. But I admit that it meets the driver requirements of never add sockopt and not [ab]use netfilter hooks. If that's acceptable to all, I'll modify the patch in that direction. Cheers, Javier - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
Jan-Bernd Themann wrote: Hi, this patch set contains the latest generic LRO code, a Kconfig / Makefile and an eHEA patch demonstrating how the aggregate SKB interface has to to be used. Drew, could you provide a patch for the myri10ge driver to show how the receive in page interface works? Hi, Thank you for the many fixes, and especially for the counters! I was working on testing the myri10ge patch, and I ran into a few problems. I've attached a patch to inet_lro.c to fix some of them, and a patch to myri10ge.c to show how to use the page based interface. Both patches are signed off by Andrew Gallatin [EMAIL PROTECTED] First, the LRO_MAX_PG_HLEN is still a problem. Minimally sized 60 byte frames still cause problems in lro_gen_skb due to skb-len going negative. Fixed in the attached patch. It may be simpler to just drop LRO_MAX_PG_HLEN to ETH_ZLEN, but I'm not sure if that is enough. Are there smart NICs which might chop off padding themselves? Second, you still need to set skb-ip_summed = CHECKSUM_UNNECESSARY when modified packets are flushed, else the stack will see bad checksums for packets from CHECKSUM_COMPLETE drivers using the skb interface. Fixed in the attached patch. Third, for some reason I have yet to figure out, this version of the patch takes a while to ramp up, but only when the receiver MTU is 9000 and the sender MTU is 1500. If the receiver MTU is also 1500, even a 10 second netperf easily shows line rate. I don't see this with our LRO, and I'm so far at a loss to explain it. Here is the first 3 seconds of output from a simple program which diffs the interface counters once / sec. Ipkts IBytesOpkts Obytes rx MTU=9000: 0000 7299092 14 1102 105 1589707 420 750324 113598708417804 1068240 814427 123304247818529 740 rx MTU=1500 0000 44134466818016813132 788182 815938 123532865618716 1122960 817698 123799477218628 1117680 . Once it has ramped up, the bandwidth is fine, and there doesn't seem to be much difference between setting the receiver MTU to 1500 or 9000. But it takes a very long netperf run to overcome the ramp up period. Fourth, I did some traffic sniffing to try to figure out what's going on above, and saw tcpdump complain about bad checksums. Have you tried running tcpdump -s 65535 -vvv? Have you also seen bad checksums? I seem to see this for both page- and skb-based versions of the driver. Last, the pointer to the TCP header in __lro_proc_segment() in the unaccelerated vlan hdr case needs to also be offset by vlan_hdr_len from skb-data. I've fixed this in the attached patch. Thanks, Drew --- linux-2.6.23-rc1.orig/net/ipv4/inet_lro.c 2007-07-30 13:10:49.0 -0400 +++ linux-2.6.23-rc1/net/ipv4/inet_lro.c2007-07-30 15:17:51.0 -0400 @@ -126,6 +126,7 @@ static void lro_update_tcp_ip_header(str htons(lro_desc-ip_tot_len) - IP_HDR_LEN(iph), IPPROTO_TCP, lro_desc-data_csum); + lro_desc-parent-ip_summed = CHECKSUM_UNNECESSARY; } static __wsum lro_tcp_data_csum(struct iphdr *iph, struct tcphdr *tcph, int len) @@ -396,6 +397,9 @@ static struct sk_buff *lro_gen_skb(struc if (!skb) return NULL; + if (hlen len) + hlen = len; + skb-len = len; skb-data_len = len - hlen; skb-truesize += true_size; diff -urp a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c --- a/drivers/net/myri10ge/myri10ge.c 2007-07-24 15:57:12.0 -0400 +++ b/drivers/net/myri10ge/myri10ge.c 2007-07-30 16:12:06.0 -0400 @@ -48,6 +48,7 @@ #include linux/etherdevice.h #include linux/if_ether.h #include linux/if_vlan.h +#include linux/inet_lro.h #include linux/ip.h #include linux/inet.h #include linux/in.h @@ -62,6 +63,8 @@ #include linux/io.h #include linux/log2.h #include net/checksum.h +#include net/ip.h +#include net/tcp.h #include asm/byteorder.h #include asm/io.h #include asm/processor.h @@ -89,6 +92,7 @@ MODULE_LICENSE(Dual BSD/GPL); #define MYRI10GE_EEPROM_STRINGS_SIZE 256 #define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2) +#define MYRI10GE_MAX_LRO_DESCRIPTORS 8 #define MYRI10GE_NO_CONFIRM_DATA htonl(0x) #define MYRI10GE_NO_RESPONSE_RESULT 0x @@ -151,6 +155,8 @@ struct myri10ge_rx_done { dma_addr_t bus; int cnt; int idx; + struct net_lro_mgr lro_mgr; + struct net_lro_desc lro_desc[MYRI10GE_MAX_LRO_DESCRIPTORS]; }; struct myri10ge_priv { @@ -276,6 +282,14 @@ static int myri10ge_debug = -1;/* defau module_param(myri10ge_debug, int, 0);
RE: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec
On 7/26/07, David Miller [EMAIL PROTECTED] wrote: From: Shannon Nelson [EMAIL PROTECTED] Date: Tue, 24 Jul 2007 17:36:06 -0700 (repost - original eaten by vger?) Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was unreachable and thus unused. The code originally was there to support in-kernel dma needs, but since it remains unused, we'll pull it out. Signed-off-by: Shannon Nelson [EMAIL PROTECTED] Applied, thanks Shannon. NET_DMA on kernel buffer is pretty useful in ndb, iSCSI target and initiators which uses kernel buffer to receive data. Are there any other issues with dma-memcpy on kernel buffers, if not then following patch makes dma_memcpy_to_kernel_iovec() reachable from tcp_recvmsg. I tested this patch and it work fine with unh iSCSI target. comments? --pravin. Pravin, Currently, NET_DMA only has one provider in the kernel, namely TCP. As the driver stands now, I'd like to keep this function out since it really wasn't being used before. Shannon has posted patches that will hopefully make it into 2.6.24 that allow multiple clients to request DMA channels, so things like iSCSI, ndb, UDP, and NFS can ask for channels without depending on TCP. Once we get to that point, if we see the need for the function, we can add it back. We have tried getting iSCSI accelerated through NET_DMA already, and didn't see the performance boost that made it worthwhile to do. Once the multiple client channel patches are in, we can revisit that, since it can be done right. Cheers, -PJ Waskiewicz - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Proposed interface for per-packet mesh-ttl
On Mon, 30 Jul 2007 13:37:20 -0700 Javier Cardona [EMAIL PROTECTED] wrote: Hi Stephen, On 7/27/07, Stephen Hemminger [EMAIL PROTECTED] wrote: In this case perhaps you can have a table that maps skb-priority to mesh ttl? priorty can already by handled by existing setsockopt calls, and modified by netfilter and QoS managements. Thanks for the feedback. IMHO overloading the [SOL_SOCKET] SO_PRIORITY sockoption for something as different as changing the mesh-ttl is quite a stretch. But I admit that it meets the driver requirements of never add sockopt and not [ab]use netfilter hooks. If that's acceptable to all, I'll modify the patch in that direction. Cheers, Javier Take it as idea not a proscription. You can use TC rules to rewrite the priority. It is kind of like how priority is being mapped to TOS already in softmac. Alternatively, the TTL can be set with sockopt, but it is a IP so it would need an IP ttl to mesh mapping. The fundamental thing is to try and avoid topology specific options bleeding all the way up the socket layer, especially since the network layer is involved and may need to multipath. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver
1. When bonding enslaves an IPoIB device the bonding neighbor holds a reference to a cleanup function in the IPoIB drives. This makes it unsafe to unload the IPoIB module if there are bonding neighbors in the air. So, to avoid this race one must unload bonding before unloading IPoIB. I think we really want to resolve this somehow. Getting an oops by doing modprobe -r ipoib isn't that friendly. Also, what happened to the problem of having an address handle belonging to the wrong device on bond failover? Did you figure out a way to fix that one? - R. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
From: Jeff Garzik [EMAIL PROTECTED] Date: Mon, 30 Jul 2007 12:17:58 -0400 David, thoughts on merging? I'm not We could stick this into your tree or mine. Whether yours or mine, I would like to keep the driver and net-core patches together in the same git tree. No objections and I'll stick it into my net-2.6.24 tree once I cut that, I'll have the NAPI stuff in there too so I'll keep these two merge nightmares out of your hair :-) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] net driver fixes
On Mon, 30 Jul 2007, Jeff Garzik wrote: true, we should just remove the dev==NULL check Patch below: [PATCH] nmclan_cs: Remove bogus (dev==NULL) check in mace_interrupt() The (dev == NULL) check in drivers/net/pcmcia/nmclan_cs.c:mace_interrupt() handler is always false, so let's remove it. Signed-off-by: Satyam Sharma [EMAIL PROTECTED] --- drivers/net/pcmcia/nmclan_cs.c |6 -- 1 files changed, 0 insertions(+), 6 deletions(-) diff --git a/drivers/net/pcmcia/nmclan_cs.c b/drivers/net/pcmcia/nmclan_cs.c index 73da611..3446be9 100644 --- a/drivers/net/pcmcia/nmclan_cs.c +++ b/drivers/net/pcmcia/nmclan_cs.c @@ -1000,12 +1000,6 @@ static irqreturn_t mace_interrupt(int irq, void *dev_id) int status; int IntrCnt = MACE_MAX_IR_ITERATIONS; - if (dev == NULL) { -DEBUG(2, mace_interrupt(): irq 0x%X for unknown device.\n, - irq); -return IRQ_NONE; - } - if (lp-tx_irq_disabled) { printk( (lp-tx_irq_disabled? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC]: napi_struct V4
From: Roland Dreier [EMAIL PROTECTED] Date: Mon, 30 Jul 2007 08:04:24 -0700 IPoIB can cope but it really seems like an unfortunate feature of these changes that we can't do something like what we have today, which imposes no overhead unless an event actually lands in the race window. It is not a feature of these changes. These races exist in the current code, it is a bug fix. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pktgen - define and use pktgen_dbg,err,warn,info
From: Joe Perches [EMAIL PROTECTED] Date: Wed, 18 Jul 2007 16:21:00 -0700 On Wed, 2007-07-18 at 15:49 -0700, David Miller wrote: From: Joe Perches [EMAIL PROTECTED] Date: Wed, 18 Jul 2007 15:14:13 -0700 -#define VERSION pktgen v2.68: Packet Generator for packet performance testing.\n +#define PKTGEN_NAMEpktgen +#define PKTGEN_VERSION v2.68 +#define PKTGEN_DESCPacket Generator for packet performance testing +#define PKTGEN_FULLINFO PKTGEN_NAME PKTGEN_VERSION : PKTGEN_DESC .\n ... -static char version[] __initdata = VERSION; - Using PKTGEN_FULLINFO explicitly won't get that string into the __initdata section as the version[] will, that's why people do it that way with the explicit version[] array. Please put it back, assign PKTGEN_FULLINFO to it, and use it. Done. Here you are. fyi: pktgen_info(version) fails compile, pktgen_info(%s, version) works. I still don't know about this patch. Instead of the simple transformation: - printk(foo); + printk(KERN_INFO foo); we get this new macro, and the lines changes to use that macro. It seems very over-engineered and adds ifdefs to a foo.c file which is severly frowned upon. I'm going to add the needed KERN_* to pktgen.c, that's just so much nicer a way to handle this. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [IPVS]: Use skb_forward_csum
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 16:59:39 +0800 [IPVS]: Use skb_forward_csum As a path that forwards packets, IPVS should be using skb_forward_csum instead of directly setting ip_summed to CHECKSUM_NONE. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Patch applied, thanks Herbert. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/6] [NET]: Call uninit if necessary in register_netdevice
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 17:09:08 +0800 [NET]: Call uninit if necessary in register_netdevice This patch makes register_netdevice call dev-uninit if the regsitration fails after dev-init has completed successfully. Very few drivers use the init/uninit calls but at least one (drivers/net/wan/sealevel.c) may leak without this change. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Looks great, applied. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/6] [NET]: Take dev_base_lock when moving device name hash list entry
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 17:09:30 +0800 [NET]: Take dev_base_lock when moving device name hash list entry When we added name-based hashing the dev_base_lock was designated as the lock to take when changing the name hash list. Unfortunately, because it was a preexisting lock that just happened to be taken in the right spots we neglected to take it in dev_change_name. The race can affect calles of __dev_get_by_name that do so without taking the RTNL. They may end up walking down the wrong hash chain and end up missing the device that they're looking for. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Good catch, applied, thanks Herbert. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/6] [NET] loopback: Panic if registration fails
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 17:09:33 +0800 [NET] loopback: Panic if registration fails Because IPv4 and IPv6 both depend on the presence of the loopback device to function, failure in registration the loopback device should be fatal. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Applied. It might be nice to tightly integrate the loopback driver into net/core/dev.c because that is what is happening anyways and it would allow us to cons the thing up by hand and deal with these dependencies more reliably. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic
David Miller wrote: From: Jeff Garzik [EMAIL PROTECTED] Date: Mon, 30 Jul 2007 12:17:58 -0400 David, thoughts on merging? I'm not We could stick this into your tree or mine. Whether yours or mine, I would like to keep the driver and net-core patches together in the same git tree. No objections and I'll stick it into my net-2.6.24 tree once I cut that, I'll have the NAPI stuff in there too so I'll keep these two merge nightmares out of your hair :-) Works for me. Thanks, Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] [NET]: Allow netdev REGISTER/CHANGENAME events to fail
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 17:09:36 +0800 [NET]: Allow netdev REGISTER/CHANGENAME events to fail This patch adds code to allow errors to be passed up from event handlers of NETDEV_REGISTER and NETDEV_CHANGENAME. It also adds the notifier_from_errno/notifier_to_errnor helpers to pass the errno value up to the notifier caller. If an error is detected when a device is registered, it causes that operation to fail. A NETDEV_UNREGISTER will be sent to all event handlers. Similarly if NETDEV_CHANGENAME fails the original name is restored and a new NETDEV_CHANGENAME event is sent. As such all event handlers must be idempotent with respect to these events. When an event handler is registered NETDEV_REGISTER events are sent for all devices currently registered. Should any of them fail, we will send NETDEV_GOING_DOWN/NETDEV_DOWN/NETDEV_UNREGISTER events to that handler for the devices which have already been registered with it. The handler registration itself will fail. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Applied. Interesting encoding scheme :-) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] [IPV4/IPV6]: Fail registration if inet device construction fails
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 17:09:38 +0800 [IPV4/IPV6]: Fail registration if inet device construction fails Now that netdev notifications can fail, we can use this to signal errors during registration for IPv4/IPv6. In particular, if we fail to allocate memory for the inet device, we can fail the netdev registration. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Applied. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] [IPV6]: Remove circular dependency on if_inet6.h
From: Herbert Xu [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 17:09:40 +0800 [IPV6]: Remove circular dependency on if_inet6.h net/if_inet6.h includes linux/ipv6.h which also tries to include net/if_inet6.h. Since the latter only needs it for forward declarations, we can fix this by adding the declarations. A number of files are implicitly including net/if_inet6.h through linux/ipv6.h. They also use net/ipv6.h so this patch includes net/if_inet6.h there. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Also applied, thanks a lot. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability
On Mon, 30 Jul 2007 08:17:41 +0530 Satyam Sharma [EMAIL PROTECTED] wrote: [0/9] netconsole: Multiple targets and dynamic reconfigurability This is v3 of the patchset That all looks pretty reasonable, thanks. Are we all comfortable with the attributions? As I have it now, everything will go in as having been authored by yourself. There is no formal (ie: understood-by-git) way of recording joint authorship, unfortunately. But one can make a record of such things in the changelog. I'll queue this material up in the to-be-sent-to-davem queue. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs
On Mon, 30 Jul 2007 08:19:10 +0530 Satyam Sharma [EMAIL PROTECTED] wrote: +/* + * Wrapper over simple_strtol (base 10) with sanity and range checking. + * We return (signed) long only because we may want to return errors. + * Do not use this to convert numbers that are allowed to be negative. + */ +static long strtol10_check_range(const char *cp, long min, long max) +{ + long ret; + char *p = (char *) cp; + + WARN_ON(min 0); + WARN_ON(max min); + + ret = simple_strtol(p, p, 10); + + if (*p (*p != '\n')) { + printk(KERN_ERR netconsole: invalid input\n); + return -EINVAL; + } + if ((ret min) || (ret max)) { + printk(KERN_ERR netconsole: input %ld must be between + %ld and %ld\n, ret, min, max); + return -EINVAL; + } + + return ret; +} There's probably other code around the place which does this. It might be worth making this function a general-purpose thing. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc
From: Waskiewicz Jr, Peter P [EMAIL PROTECTED] Date: Fri, 27 Jul 2007 09:22:11 -0700 Are these queued for 2.6.24, or are they going to make it into 2.6.23? I know you're busy with patches and NAPI, but I was curious. Thanks! I've applied both fixes and will push them into 2.6.23 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc
-Original Message- From: David Miller [mailto:[EMAIL PROTECTED] Sent: Monday, July 30, 2007 5:14 PM To: Waskiewicz Jr, Peter P Cc: netdev@vger.kernel.org; [EMAIL PROTECTED] Subject: Re: [FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc From: Waskiewicz Jr, Peter P [EMAIL PROTECTED] Date: Fri, 27 Jul 2007 09:22:11 -0700 Are these queued for 2.6.24, or are they going to make it into 2.6.23? I know you're busy with patches and NAPI, but I was curious. Thanks! I've applied both fixes and will push them into 2.6.23 Many thanks Dave! Cheers, -PJ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPv6: ipv6_addr_type() doesn't know about RFC4193 addresses
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 13:34:52 -0500 (CDT) In article [EMAIL PROTECTED] (at Wed, 25 Jul 2007 19:49:09 -0400), Dave Johnson [EMAIL PROTECTED] says: ipv6_addr_type() doesn't check for 'Unique Local IPv6 Unicast Addresses' (RFC4193) and returns IPV6_ADDR_RESERVED for that range. Acked-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] Ok, I've applied Dave's patch. Dave, although it's customary and fine to use +foo-list email addresses for mailing list subscriptions and discussions, I ask that you don't add that cookie to your signoff lines in patch submissions and I've removed it from your's in this patch. Thanks a lot. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability
On Mon, 30 Jul 2007, Andrew Morton wrote: On Mon, 30 Jul 2007 08:17:41 +0530 Satyam Sharma [EMAIL PROTECTED] wrote: [0/9] netconsole: Multiple targets and dynamic reconfigurability This is v3 of the patchset That all looks pretty reasonable, thanks. Are we all comfortable with the attributions? As I have it now, everything will go in as having been authored by yourself. There is no formal (ie: understood-by-git) way of recording joint authorship, unfortunately. But one can make a record of such things in the changelog. Ok, making such a record would be good, if possible. I did see that the author field for git commits can have only one entry, unfortunately, and it turns out Signed-off-by and Acked-by have other meanings attached that have to do with the patch-journey-from-author-to-you chain. But I'd like these to be somehow marked as jointly authored, definitely. I'll queue this material up in the to-be-sent-to-davem queue. Ok, thanks. Satyam - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC]: napi_struct V4
From: Michael Chan [EMAIL PROTECTED] Date: Thu, 26 Jul 2007 14:38:09 -0700 If the driver wants a simple solution, it can do what you did in the patch: wrap the tx cleanup code with netif_tx_lock() and netif_tx_unlock(). If a NAPI driver wants to be more clever, it can do something such as this in tg3's poll_controller: if (netif_rx_schedule_prep(dev, tp-napi)) { tg3_tx(tp); netif_poll_enable(tp-napi); } Thanks Michael, that's a good suggestion and would work. In thinking about this some more over the weekend I've decided that my plan to rip out RX support from netpoll is a bit too ambitious. Therefore, for the time being, I'm going to add a special driver function for netpoll that will allow it to ask the driver to invoke -poll() over all the NAPI structs assosciated with the netdev. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs
On Mon, 30 Jul 2007, Andrew Morton wrote: On Mon, 30 Jul 2007 08:19:10 +0530 Satyam Sharma [EMAIL PROTECTED] wrote: +/* + * Wrapper over simple_strtol (base 10) with sanity and range checking. + * We return (signed) long only because we may want to return errors. + * Do not use this to convert numbers that are allowed to be negative. + */ +static long strtol10_check_range(const char *cp, long min, long max) +{ + long ret; + char *p = (char *) cp; + + WARN_ON(min 0); + WARN_ON(max min); + + ret = simple_strtol(p, p, 10); + + if (*p (*p != '\n')) { + printk(KERN_ERR netconsole: invalid input\n); + return -EINVAL; + } + if ((ret min) || (ret max)) { + printk(KERN_ERR netconsole: input %ld must be between + %ld and %ld\n, ret, min, max); + return -EINVAL; + } + + return ret; +} There's probably other code around the place which does this. It might be worth making this function a general-purpose thing. Probably, yes, I thought along similar lines. [ BTW somewhere in the 9/9 patch I remember having to define a __U16_MAX as well. That could be pushed out to some generic or appropriate header as well, possibly. ] Satyam - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH 1/3] PPPoE: improved hashing routine
From: Florian Zumbiehl [EMAIL PROTECTED] Date: Sun, 29 Jul 2007 08:04:23 +0200 Hi, I'm not sure whether this is really worth it, but it looked so extremely inefficient that I couldn't resist - so let's hope providers will keep PPPoE around for a while, at least until terabit dsl ;-) The new code produces the same results as the old version and is ~ 3 to 6 times faster for 4-bit hashes on the CPUs I tested. Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 9e51fcc..954328c 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -108,19 +108,24 @@ static inline int cmp_addr(struct pppoe_addr *a, unsigned long sid, char *addr) (memcmp(a-remote,addr,ETH_ALEN) == 0)); } -static int hash_item(unsigned long sid, unsigned char *addr) +#if 8%PPPOE_HASH_BITS +#error 8 must be a multiple of PPPOE_HASH_BITS +#endif Since PPPOE_HASH_BITS is 4 I would think this check will break the build. :-) Anyways, I looked at this myself and half of the problem comes from the fact that sid is passed around as an unsigned long throughout this entire file yet the thing is just a __u16. So the first thing to fix is to use __u16 consistently for sid values. Then the sid hash looks obviously wrong and is easy to fix. Then you end up with a hash_item() that simply looks like: static unsigned int hash_item(__u16 sid, unsigned char *addr) { unsigned int hash; hash = (((unsigned int)addr[0] 24) | ((unsigned int)addr[1] 16) | ((unsigned int)addr[2] 8) | ((unsigned int)addr[3] 0)); hash ^= (((unsigned int)addr[4] 8) | ((unsigned int)addr[5] 0)); hash ^= sid; return ((hash ^ (hash 8) ^ (hash 16)) (PPPOE_HASH_SIZE - 1)); } which is what I've checked into my tree. Thanks! - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH 2/3] PPPoX/E: return ENOTTY on unknown ioctl requests
From: Florian Zumbiehl [EMAIL PROTECTED] Date: Sun, 29 Jul 2007 08:04:36 +0200 here another patch for the PPPoX/E code that makes sure that ENOTTY is returned for unknown ioctl requests rather than 0 (and removes another unneeded initializer which I didn't bother creating a separate patch for). Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] Applied, thanks Florian. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH 3/3] PPPoE: move lock_sock() in pppoe_sendmsg() to the right location
From: Florian Zumbiehl [EMAIL PROTECTED] Date: Sun, 29 Jul 2007 08:04:46 +0200 and the last one for now: Acquire the sock lock in pppoe_sendmsg() before accessing the sock - and in particular avoid releasing the lock even though it hasn't been acquired. Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] Also applied, thanks Florian. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RFC: on [ab]use of skb-cb by VLAN code
I was going to forget this, but its been playing in the back of my head and wont go away Matt Carlson recently (while fixing the tg3 driver in my batching patches) pointed to me that skb-cb[] was being used to pass around vlan data. This seems like a bad use since there can be a lot of things between a real hardware driver and something that sets a vlan tag (qdiscs come to mind). Creating a new skb field be the reasonable thing to do here but i know that we are trying to avoid adding new fields. Thoughts? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] make pktgen.c:get_ipsec_sa() static and non-inline
From: Adrian Bunk [EMAIL PROTECTED] Date: Sun, 29 Jul 2007 16:58:40 +0200 Non-static inline code usually doesn't makes sense. In this case making is static and non-inline is the correct solution. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Applied, thanks Adrian. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6 patch] make nf_ct_ipv6_skip_exthdr() static
From: Adrian Bunk [EMAIL PROTECTED] Date: Sun, 29 Jul 2007 16:58:46 +0200 nf_ct_ipv6_skip_exthdr() can now become static. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Applied. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html