Re: 2.6.20-2.6.21 - networking dies after random time

2007-07-30 Thread Marcin Ślusarz
2007/7/26, Ingo Molnar [EMAIL PROTECTED]:
 (..)
 yeah - i meant to cover both arches but forgot about x86_64 - updated
 patch attached below.

 Ingo

 -
 Subject: x86: activate HARDIRQS_SW_RESEND
 From: Ingo Molnar [EMAIL PROTECTED]

 activate the software-triggered IRQ-resend logic.

 it appears some chipsets/cpus do not handle local-APIC driven IRQ
 resends all that well, so always use the soft mechanism to trigger
 the execution of pending interrupts.

 Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
 ---
  arch/i386/Kconfig   |4 
  arch/x86_64/Kconfig |4 
  kernel/irq/manage.c |8 
  3 files changed, 16 insertions(+)

 Index: linux-rt-rebase.q/arch/i386/Kconfig
 ===
 --- linux-rt-rebase.q.orig/arch/i386/Kconfig
 +++ linux-rt-rebase.q/arch/i386/Kconfig
 @@ -1284,6 +1284,10 @@ config GENERIC_PENDING_IRQ
 depends on GENERIC_HARDIRQS  SMP
 default y

 +config HARDIRQS_SW_RESEND
 +   bool
 +   default y
 +
  config X86_SMP
 bool
 depends on SMP  !X86_VOYAGER
 Index: linux-rt-rebase.q/arch/x86_64/Kconfig
 ===
 --- linux-rt-rebase.q.orig/arch/x86_64/Kconfig
 +++ linux-rt-rebase.q/arch/x86_64/Kconfig
 @@ -721,6 +721,10 @@ config GENERIC_PENDING_IRQ
 depends on GENERIC_HARDIRQS  SMP
 default y

 +config HARDIRQS_SW_RESEND
 +   bool
 +   default y
 +
  menu Power management options

  source kernel/power/Kconfig
 Index: linux-rt-rebase.q/kernel/irq/manage.c
 ===
 --- linux-rt-rebase.q.orig/kernel/irq/manage.c
 +++ linux-rt-rebase.q/kernel/irq/manage.c
 @@ -175,6 +175,14 @@ void enable_irq(unsigned int irq)
 desc-depth--;
 }
 spin_unlock_irqrestore(desc-lock, flags);
 +#ifdef CONFIG_HARDIRQS_SW_RESEND
 +   /*
 +* Do a bh disable/enable pair to trigger any pending
 +* irq resend logic:
 +*/
 +   local_bh_disable();
 +   local_bh_enable();
 +#endif
  }
  EXPORT_SYMBOL(enable_irq);

This patch didn't help (tested on 2.6.22.1) - ne2k_pci timed out.

ps: I retested all patches posted in this thread on top of 2.6.22.1
and behavior from 2.6.21.3 didn't changed. My next tests will be on
2.6.22.x only.

Regards,
Marcin Slusarz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-30 Thread Petko Manolov

On Sun, 29 Jul 2007, Oliver Neukum wrote:


Am Sonntag 29 Juli 2007 schrieb Jesper Juhl:

On 29/07/07, Satyam Sharma [EMAIL PROTECTED] wrote:

Hi,

On 7/29/07, Jesper Juhl [EMAIL PROTECTED] wrote:

Hello,

This patch makes sure we don't dereference a NULL pointer in
drivers/net/usb/pegasus.c::write_bulk_callback() in the initial
struct net_device *net = pegasus-net; assignment.
The existing code checks if 'pegasus' is NULL and bails out if
it is, so we better not touch that pointer until after that check.
[...]
diff --git a/drivers/net/usb/pegasus.c b/drivers/net/usb/pegasus.c
index a05fd97..04cba6b 100644
--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -768,11 +768,13 @@ done:
 static void write_bulk_callback(struct urb *urb)
 {
pegasus_t *pegasus = urb-context;
-   struct net_device *net = pegasus-net;
+   struct net_device *net;

if (!pegasus)
return;

+   net = pegasus-net;
+
if (!netif_device_present(net) || !netif_running(net))
return;


Is it really possible that we're called into this function with
urb-context == NULL? If not, I'd suggest let's just get rid of
the check itself, instead.


I'm not sure. I am not very familiar with this code. I just figured
that moving the assignment is potentially a little safer and it is
certainly no worse than the current code, so that's a safe and
potentially benneficial change. Removing the check may be safe but I'm
not certain enough to make that call...


pegasus == NULL there would be a kernel bug. Silently ignoring
it, like the code now wants to do is bad. As the oops has never been
reported, I figure turning it into an explicit debugging test is overkill,
so removal seems to be the best option.


In the past urb-context was not guaranteed to be non-null for any 
asynchronous calls.  If this is not the case anymore then it should be 
removed from at least two more locations in the driver.


Attached you'll find the resulting patch.


Petko--- drivers/net/usb/pegasus.c.old	2007-07-10 11:39:50.0 +0300
+++ drivers/net/usb/pegasus.c	2007-07-30 11:02:10.0 +0300
@@ -100,9 +100,6 @@ static void ctrl_callback(struct urb *ur
 {
 	pegasus_t *pegasus = urb-context;
 
-	if (!pegasus)
-		return;
-
 	switch (urb-status) {
 	case 0:
 		if (pegasus-flags  ETH_REGS_CHANGE) {
@@ -609,15 +606,11 @@ static inline struct sk_buff *pull_skb(p
 static void read_bulk_callback(struct urb *urb)
 {
 	pegasus_t *pegasus = urb-context;
-	struct net_device *net;
+	struct net_device *net = pegasus-net;
 	int rx_status, count = urb-actual_length;
 	u8 *buf = urb-transfer_buffer;
 	__u16 pkt_len;
 
-	if (!pegasus)
-		return;
-
-	net = pegasus-net;
 	if (!netif_device_present(net) || !netif_running(net))
 		return;
 
@@ -770,9 +763,6 @@ static void write_bulk_callback(struct u
 	pegasus_t *pegasus = urb-context;
 	struct net_device *net = pegasus-net;
 
-	if (!pegasus)
-		return;
-
 	if (!netif_device_present(net) || !netif_running(net))
 		return;
 
@@ -805,13 +795,9 @@ static void write_bulk_callback(struct u
 static void intr_callback(struct urb *urb)
 {
 	pegasus_t *pegasus = urb-context;
-	struct net_device *net;
+	struct net_device *net = pegasus-net;
 	int status;
 
-	if (!pegasus)
-		return;
-	net = pegasus-net;
-
 	switch (urb-status) {
 	case 0:
 		break;


Re: [linux-usb-devel] [PATCH] USB Pegasus driver - avoid a potential NULL pointer dereference.

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Petko Manolov wrote:

 On Sun, 29 Jul 2007, Oliver Neukum wrote:
 
  [...]
  pegasus == NULL there would be a kernel bug. Silently ignoring
  it, like the code now wants to do is bad. As the oops has never been
  reported, I figure turning it into an explicit debugging test is overkill,
  so removal seems to be the best option.
 
 In the past urb-context was not guaranteed to be non-null for any
 asynchronous calls.  If this is not the case anymore then it should be removed
 from at least two more locations in the driver.
 
 Attached you'll find the resulting patch.

Given Oliver's earlier comment, it looks okay to me. Thanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: source interface ping bug ?

2007-07-30 Thread nano bug
Can someone have a look a this and tell if it's kernel related or if I
posted this in the wrong place ? Thanks.

On 7/29/07, nano bug [EMAIL PROTECTED] wrote:
 Any news about this ?

 On 7/27/07, nano bug [EMAIL PROTECTED] wrote:
  Hello there,
 
  I'm facing the following issue : when I try to ping using source
  interface instead of a source ip address the ping utility starts to
  send arp requests instead of icmp requests though the ip address I'm
  pinging it's not in the subnets directly connected to my linux box.
  I've noticed this situation since I upgraded from kernel 2.6.20 to
  2.6.21. On 2.6.20 and lower I haven't had this problem. Now I upgraded
  to 2.6.22 but it's the same. I'm using latest iproute and iputils.
  Here is an output of tcpdump when I try to ping an outside ip address,
  like for example www.yahoo.com, using source interface :
 
  [EMAIL PROTECTED]:~# uname -a
  Linux darkstar 2.6.22 #1 Thu Jul 26 21:22:11 EEST 2007 i686 Pentium II
  (Deschutes) GenuineIntel GNU/Linux
 
  [EMAIL PROTECTED]:~# ip -V
  ip utility, iproute2-ss070710
  [EMAIL PROTECTED]:~#
 
 
  [EMAIL PROTECTED]:~# ip address show dev eth2
  3: eth2: BROADCAST,MULTICAST,NOTRAILERS,UP,LOWER_UP mtu 1500 qdisc
  pfifo_fast qlen 1000
 link/ether 00:90:27:0f:79:f3 brd ff:ff:ff:ff:ff:ff
 inet 86.106.19.75/23 brd 86.106.19.255 scope global eth2
  [EMAIL PROTECTED]:~# ip route get 87.248.113.14 from 86.106.19.75 oif eth2
  87.248.113.14 from 86.106.19.75 via 86.106.18.1 dev eth2
 cache  mtu 1500 advmss 1460 hoplimit 64
  [EMAIL PROTECTED]:~#
 
 
  [EMAIL PROTECTED]:~/iputils# ./ping -V
  ping utility, iputils-sss20070202
  [EMAIL PROTECTED]:~/iputils# ./ping -I 86.106.19.75 87.248.113.14 -c 2
  PING 87.248.113.14 (87.248.113.14) from 86.106.19.75 : 56(84) bytes of data.
  64 bytes from 87.248.113.14: icmp_seq=1 ttl=51 time=60.5 ms
  64 bytes from 87.248.113.14: icmp_seq=2 ttl=51 time=63.2 ms
 
  --- 87.248.113.14 ping statistics ---
  2 packets transmitted, 2 received, 0% packet loss, time 999ms
  rtt min/avg/max/mdev = 60.574/61.924/63.274/1.350 ms
  [EMAIL PROTECTED]:~/iputils#
 
 
  [EMAIL PROTECTED]:~# tcpdump -i eth2 -vvv -n host 87.248.113.14 and host
  86.106.19.75
  tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 96 
  bytes
  01:18:09.572603 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF],
  proto: ICMP (1), length: 84) 86.106.19.75  87.248.113.14: ICMP echo
  request, id 27166, seq 1, length 64
  01:18:09.632861 IP (tos 0x0, ttl  51, id 6100, offset 0, flags [none],
  proto: ICMP (1), length: 84) 87.248.113.14  86.106.19.75: ICMP echo
  reply, id 27166, seq 1, length 64
  01:18:10.572746 IP (tos 0x0, ttl  64, id 0, offset 0, flags [DF],
  proto: ICMP (1), length: 84) 86.106.19.75  87.248.113.14: ICMP echo
  request, id 27166, seq 2, length 64
  01:18:10.634951 IP (tos 0x0, ttl  51, id 8790, offset 0, flags [none],
  proto: ICMP (1), length: 84) 87.248.113.14  86.106.19.75: ICMP echo
  reply, id 27166, seq 2, length 64
 
 
 
  using source interface :
 
  [EMAIL PROTECTED]:~/iputils# ./ping -I eth2 87.248.113.14
  PING 87.248.113.14 (87.248.113.14) from 86.106.19.75 eth2: 56(84) bytes of 
  data.
  From 86.106.19.75 icmp_seq=1 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=2 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=3 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=5 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=6 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=7 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=9 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=10 Destination Host Unreachable
  From 86.106.19.75 icmp_seq=11 Destination Host Unreachable
 
  --- 87.248.113.14 ping statistics ---
  13 packets transmitted, 0 received, +9 errors, 100% packet loss, time 
  12006ms
  , pipe 3
  [EMAIL PROTECTED]:~/iputils#
 
 
  [EMAIL PROTECTED]:~# tcpdump -i eth2 -vvv -n host 87.248.113.14 and host
  86.106.19.75
  tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 96 
  bytes
  01:19:24.292911 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:25.292897 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:26.292901 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:27.302906 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:28.302911 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:29.302912 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:31.302917 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:32.302921 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:33.302923 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:35.302932 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:36.302932 arp who-has 87.248.113.14 tell 86.106.19.75
  01:19:37.302939 arp who-has 87.248.113.14 tell 86.106.19.75
 
  12 packets captured
  12 packets received by filter
  0 packets dropped by kernel
  [EMAIL PROTECTED]:~#
 
 
  There is one exception though, it works when using eth0. I'm
  

Re: 2.6.20-2.6.21 - networking dies after random time

2007-07-30 Thread Ingo Molnar

* Alan Cox [EMAIL PROTECTED] wrote:

 Ok the logic behind the 8390 is very simple:

thanks for the explanation Alan! A few comments and a question:

 Things to know
   - IRQ delivery is asynchronous to the PCI bus
   - Blocking the local CPU IRQ via spin locks was too slow
   - The chip has register windows needing locking work
 
 So the path was once (I say once as people appear to have changed it 
 in the mean time and it now looks rather bogus if the changes to use 
 disable_irq_nosync_irqsave are disabling the local IRQ)
 
 
   Take the page lock
   Mask the IRQ on chip
   Disable the IRQ (but not mask locally- someone seems to have
   broken this with the lock validator stuff)
   [This must be _nosync as the page lock may otherwise
   deadlock us]

( side-note: you can ignore the lock validator stuff here, the validator
  changes are supposed to a NOP on the !lockdep case. Local irqs will
  only be disabled if the validator is running. This could cause dropped
  serial irqs on very old boxes but i doubt anyone will want to run the
  validator on those. )

   Drop the page lock and turn IRQs back on
   
   At this point an existing IRQ may still be running but we can't
   get a new one
 
   Take the lock (so we know the IRQ has terminated) but don't mask
 the IRQs on the processor
   Set irqlock [for debug]
 
   Transmit (slow as )
 
   re-enable the IRQ
 
 
 We have to use disable_irq because otherwise you will get delayed 
 interrupts on the APIC bus deadlocking the transmit path.
 
 Quite hairy but the chip simply wasn't designed for SMP and you can't 
 even ACK an interrupt without risking corrupting other parallel 
 activities on the chip.

So the whole locking is to be able to keep irqs enabled for a long time, 
without risking entry of the same IRQ handler on this same CPU, correct?

Marcin's test results suggest that if an IRQ is resent right at the 
enable_irq() point [be that via the hw irq-resend mechanism or the sw 
irq-resend mechanism], the hang happens.

In the previous 2.6.20 logic we'd not normally generate an IRQ at that 
point (because we masked the irq and the card itself deasserts the line 
so any level-triggered irq is now moot).

Once Thomas hacked off this resend mechanism for level-triggered irqs, 
Marcin saw the hangs go away.

So it seems to me that maybe the driver could be surprised via these 
spurious interrupts that happen right after the irq_enable(). Does the 
patch below make any sense in your opinion?

Ingo

Index: linux/drivers/net/lib8390.c
===
--- linux.orig/drivers/net/lib8390.c
+++ linux/drivers/net/lib8390.c
@@ -375,6 +375,8 @@ static int ei_start_xmit(struct sk_buff 
/* Turn 8390 interrupts back on. */
ei_local-irqlock = 0;
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
+   /* force POST: */
+   ei_inb_p(e8390_base + EN0_IMR);
 
spin_unlock(ei_local-page_lock);
enable_irq_lockdep_irqrestore(dev-irq, flags);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-2.6.21 - networking dies after random time

2007-07-30 Thread Ingo Molnar

* Marcin Ślusarz [EMAIL PROTECTED] wrote:

  Subject: x86: activate HARDIRQS_SW_RESEND
  From: Ingo Molnar [EMAIL PROTECTED]
 
  activate the software-triggered IRQ-resend logic.

 This patch didn't help (tested on 2.6.22.1) - ne2k_pci timed out.

ok. This makes it more likely that the driver itself (or the card) gets 
confused by the resend.

does the patch below fix those timeouts? It tests the theory whether any 
POST latency could expose this problem.

Ingo

Index: linux/drivers/net/lib8390.c
===
--- linux.orig/drivers/net/lib8390.c
+++ linux/drivers/net/lib8390.c
@@ -375,6 +375,8 @@ static int ei_start_xmit(struct sk_buff 
/* Turn 8390 interrupts back on. */
ei_local-irqlock = 0;
ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
+   /* force POST: */
+   ei_inb_p(e8390_base + EN0_IMR);
 
spin_unlock(ei_local-page_lock);
enable_irq_lockdep_irqrestore(dev-irq, flags);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linksys Gigabit USB2.0 adapter (asix) regression

2007-07-30 Thread Erik Slagter
David Hollis wrote:
 On Mon, 2007-06-25 at 19:05 +0200, Erik Slagter wrote:
 drivers/net/usb/asix.c: PHYID=0x01410cc2
 
 Ok, it is using a Marvell PHY so that part should be fine.  You
 mentioned that it looks like the packets are being transmitted, but are
 garbled in some way.  The device does prepend a 'header' to ethernet
 packets as they are transmitted down the USB pipe.  The device strips
 this off and puts the packets on the wire.  This could be where the
 issue lies.  Are you on x86 by chance or something else?

They are either garbled are they are not passed on the wire. The
transmitted packets are shown by tshark, but a tshark run on the other
end of the line does not show them.

Platform is indeed x86, to be precise: fedora 7, kernel 2.6.22-rc6, cpu
pentium M, dell laptop inspiron 9300, ICH6.

If you want me to test something please yell, it's no trouble at all to
change a few lines in the driver's source and recompile the module.

Please note I cannot send mail to you: (conversation with
dhollis.dyndns.org[71.251.104.159] timed out while sending MAIL FROM)

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2.6.23-rc1] PPPOL2TP: Add CONFIG_INET Kconfig dependency

2007-07-30 Thread James Chapman
[PPPOL2TP]: Add CONFIG_INET Kconfig dependency.

PPPOL2TP uses UDP so it obviously depends on CONFIG_INET. 

Signed-off-by: James Chapman [EMAIL PROTECTED]

--- 
Toralf Foerster reported that make rndconfig failed in 2.6.23-rc1
when selecting CONFIG_PPPOL2TP without CONFIG_INET.

Index: linux-2.6.23-rc1/drivers/net/Kconfig
===
--- linux-2.6.23-rc1.orig/drivers/net/Kconfig
+++ linux-2.6.23-rc1/drivers/net/Kconfig
@@ -2851,7 +2851,7 @@ config PPPOATM
 
 config PPPOL2TP
tristate PPP over L2TP (EXPERIMENTAL)
-   depends on EXPERIMENTAL  PPP
+   depends on EXPERIMENTAL  PPP  INET
help
  Support for PPP-over-L2TP socket family. L2TP is a protocol
  used by ISPs and enterprises to tunnel PPP traffic over UDP
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] make nf_ct_ipv6_skip_exthdr() static

2007-07-30 Thread Patrick McHardy
Adrian Bunk wrote:
 nf_ct_ipv6_skip_exthdr() can now become static.


Applied, thanks.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] HW VLAN filtering control

2007-07-30 Thread Patrick McHardy
Mitch Williams wrote:
 This patchset adds the capability to disable hardware VLAN filtering at
 runtime via the existing vconfig utility.  It's useful for debugging
 purposes.
 
 The first patch modifies the VLAN subsystem to define the flag, and to
 support passing the flag on to the base driver.
 
 The second patch modifies e1000 to support the flag.  Since it's only
 one function, other drivers can be easily modified to support this
 functionality.
 
 vconfig is used without modification to enable or disable filtering:
   # vconfig [vlan-interface] 2 1
 will disable filtering, and
   # vconfig [vlan-interface] 2 0
 will enable filtering.


I think an ethtool option for this would be more consistent with
configuration of other device features and its questionable whether
new feature should be added to the ioctl interface at all. The
vlan driver could perform the necessary reconfiguration in the
FEAT_CHANGE notifier call.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 1/7] IB/ipoib: Bound the net device to the ipoib_neigh structue

2007-07-30 Thread Moni Shoua
IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb-dst-neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb-dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n-dev is an ipoib device and hence netdev_priv(n-dev)
can be casted to struct ipoib_dev_priv is buggy.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n-dev-flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib.h   |4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |   17 +++--
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |2 +-
 3 files changed, 19 insertions(+), 4 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h   2007-07-25 
14:56:13.0 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib.h2007-07-25 
14:57:48.095724495 +0300
@@ -328,6 +328,7 @@ struct ipoib_neigh {
struct sk_buff_head queue;
 
struct neighbour   *neighbour;
+   struct net_device *dev;
 
struct list_headlist;
 };
@@ -344,7 +345,8 @@ static inline struct ipoib_neigh **to_ip
 INFINIBAND_ALEN, sizeof(void *));
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh);
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neigh,
+ struct net_device *dev);
 void ipoib_neigh_free(struct net_device *dev, struct ipoib_neigh *neigh);
 
 extern struct workqueue_struct *ipoib_workqueue;
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-07-25 
14:56:13.0 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-07-25 
15:03:11.291291271 +0300
@@ -510,7 +510,7 @@ static void neigh_add_path(struct sk_buf
struct ipoib_path *path;
struct ipoib_neigh *neigh;
 
-   neigh = ipoib_neigh_alloc(skb-dst-neighbour);
+   neigh = ipoib_neigh_alloc(skb-dst-neighbour, skb-dev);
if (!neigh) {
++priv-stats.tx_dropped;
dev_kfree_skb_any(skb);
@@ -817,6 +817,16 @@ static void ipoib_neigh_cleanup(struct n
unsigned long flags;
struct ipoib_ah *ah = NULL;
 
+   if (n-dev-flags  IFF_MASTER) {
+   /* n-dev is not an IPoIB device and we have to take priv from 
elsewhere */
+   neigh = *to_ipoib_neigh(n);
+   if (neigh){
+   priv = netdev_priv(neigh-dev);
+   ipoib_dbg(priv, neigh_destructor for bonding device: 
%s\n,
+ n-dev-name);
+   } else
+   return;
+   }
ipoib_dbg(priv,
  neigh_cleanup for %06x  IPOIB_GID_FMT \n,
  IPOIB_QPN(n-ha),
@@ -838,7 +848,9 @@ static void ipoib_neigh_cleanup(struct n
ipoib_put_ah(ah);
 }
 
-struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour)
+struct ipoib_neigh *ipoib_neigh_alloc(struct neighbour *neighbour,
+ struct net_device *dev)
+
 {
struct ipoib_neigh *neigh;
 
@@ -847,6 +859,7 @@ struct ipoib_neigh *ipoib_neigh_alloc(st
return NULL;
 
neigh-neighbour = neighbour;
+   neigh-dev = dev;
*to_ipoib_neigh(neighbour) = neigh;
skb_queue_head_init(neigh-queue);
ipoib_cm_set(neigh, NULL);
Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2007-07-25 
14:56:13.0 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_multicast.c  2007-07-25 
14:57:48.097724142 +0300
@@ -727,7 +727,7 @@ out:
if (skb-dst
skb-dst-neighbour 
!*to_ipoib_neigh(skb-dst-neighbour)) {
-   struct ipoib_neigh *neigh = 
ipoib_neigh_alloc(skb-dst-neighbour);
+   struct ipoib_neigh *neigh = 
ipoib_neigh_alloc(skb-dst-neighbour, skb-dev);
 
if (neigh) {
kref_get(mcast-ah-ref);

-
To unsubscribe from this list: send the line 

[PATCH V3 2/7] IB/ipoib: Verify address handle validity on send

2007-07-30 Thread Moni Shoua
When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the 
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

Index: net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c
===
--- net-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c  2007-07-25 
14:57:48.0 +0300
+++ net-2.6/drivers/infiniband/ulp/ipoib/ipoib_main.c   2007-07-25 
15:02:55.525131034 +0300
@@ -685,9 +685,10 @@ static int ipoib_start_xmit(struct sk_bu
goto out;
}
} else if (neigh-ah) {
-   if (unlikely(memcmp(neigh-dgid.raw,
+   if (unlikely((memcmp(neigh-dgid.raw,
skb-dst-neighbour-ha + 4,
-   sizeof(union ib_gid {
+   sizeof(union ib_gid))) ||
+(neigh-dev != dev))) {
spin_lock(priv-lock);
/*
 * It's safe to call ipoib_put_ah() inside

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 3/7] net/bonding: Enable bonding to enslave non ARPHRD_ETHER

2007-07-30 Thread Moni Shoua
This patch changes some of the bond netdevice attributes and functions
to be that of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides those setting done by 
ether_setup(),
which are netdevice **type** dependent and hence might be not appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
from dissimilar ether types, as was concluded over the v1 discussion.

IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes
IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this
IPoIB device is bounded to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (i have omitted here some details which
are not important for the bonding RFC).

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |   38 ++
 1 files changed, 38 insertions(+)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 
15:02:10.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:24:30.913343981 
+0300
@@ -1277,6 +1277,26 @@ static int bond_compute_features(struct 
return 0;
 }
 
+
+static void bond_setup_by_slave(struct net_device *bond_dev,
+   struct net_device *slave_dev)
+{
+   bond_dev-hard_header   = slave_dev-hard_header;
+   bond_dev-rebuild_header= slave_dev-rebuild_header;
+   bond_dev-hard_header_cache = slave_dev-hard_header_cache;
+   bond_dev-header_cache_update   = slave_dev-header_cache_update;
+   bond_dev-hard_header_parse = slave_dev-hard_header_parse;
+
+   bond_dev-neigh_setup   = slave_dev-neigh_setup;
+
+   bond_dev-type  = slave_dev-type;
+   bond_dev-hard_header_len   = slave_dev-hard_header_len;
+   bond_dev-addr_len  = slave_dev-addr_len;
+
+   memcpy(bond_dev-broadcast, slave_dev-broadcast,
+   slave_dev-addr_len);
+}
+
 /* enslave device slave to bond device master */
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 {
@@ -1351,6 +1371,24 @@ int bond_enslave(struct net_device *bond
goto err_undo_flags;
}
 
+   /* set bonding device ether type by slave - bonding netdevices are
+* created with ether_setup, so when the slave type is not ARPHRD_ETHER
+* there is a need to override some of the type dependent attribs/funcs.
+*
+* bond ether type mutual exclusion - don't allow slaves of dissimilar
+* ether type (eg ARPHRD_ETHER and ARPHRD_INFINIBAND) share the same 
bond
+*/
+   if (bond-slave_cnt == 0) {
+   if (slave_dev-type != ARPHRD_ETHER)
+   bond_setup_by_slave(bond_dev, slave_dev);
+   } else if (bond_dev-type != slave_dev-type) {
+   printk(KERN_ERR DRV_NAME : %s ether type (%d) is different 
from 
+   other slaves (%d), can not enslave it.\n, 
slave_dev-name,
+   slave_dev-type, bond_dev-type);
+   res = -EINVAL;
+   goto err_undo_flags;
+   }
+
if (slave_dev-set_mac_address == NULL) {
printk(KERN_ERR DRV_NAME
: %s: Error: The slave device you specified does 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 4/7] net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()

2007-07-30 Thread Moni Shoua
This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |   88 +++-
 drivers/net/bonding/bonding.h   |1 

 2 files changed, 61 insertions(+), 28 deletions(-)
Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-29 
16:24:30.913343981 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-29 16:36:53.234602471 
+0300
@@ -1127,6 +1127,14 @@ void bond_change_active_slave(struct bon
if (new_active) {
bond_set_slave_active_flags(new_active);
}
+
+   /* when bonding does not set the slave MAC address, the bond MAC
+* address is the one of the active slave.
+*/
+   if (new_active  !bond-do_set_mac_addr)
+   memcpy(bond-dev-dev_addr,  new_active-dev-dev_addr,
+   new_active-dev-addr_len);
+
bond_send_gratuitous_arp(bond);
}
 }
@@ -1390,13 +1398,22 @@ int bond_enslave(struct net_device *bond
}
 
if (slave_dev-set_mac_address == NULL) {
-   printk(KERN_ERR DRV_NAME
-   : %s: Error: The slave device you specified does 
-   not support setting the MAC address. 
-   Your kernel likely does not support slave 
-   devices.\n, bond_dev-name);
-   res = -EOPNOTSUPP;
-   goto err_undo_flags;
+   if (bond-slave_cnt == 0) {
+   printk(KERN_WARNING DRV_NAME
+   : %s: Warning: The first slave device you 
+   specified does not support setting the MAC 
+   address. This bond MAC address would be that 
+   of the active slave.\n, bond_dev-name);
+   bond-do_set_mac_addr = 0;
+   } else if (bond-do_set_mac_addr) {
+   printk(KERN_ERR DRV_NAME
+   : %s: Error: The slave device you specified 
+   does not support setting the MAC addres,.
+   but this bond uses this practice. \n
+   , bond_dev-name);
+   res = -EOPNOTSUPP;
+   goto err_undo_flags;
+   }
}
 
new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
@@ -1417,16 +1434,18 @@ int bond_enslave(struct net_device *bond
 */
memcpy(new_slave-perm_hwaddr, slave_dev-dev_addr, ETH_ALEN);
 
-   /*
-* Set slave to master's mac address.  The application already
-* set the master's mac address to that of the first slave
-*/
-   memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len);
-   addr.sa_family = slave_dev-type;
-   res = dev_set_mac_address(slave_dev, addr);
-   if (res) {
-   dprintk(Error %d calling set_mac_address\n, res);
-   goto err_free;
+   if (bond-do_set_mac_addr) {
+   /*
+* Set slave to master's mac address.  The application already
+* set the master's mac address to that of the first slave
+*/
+   memcpy(addr.sa_data, bond_dev-dev_addr, bond_dev-addr_len);
+   addr.sa_family = slave_dev-type;
+   res = dev_set_mac_address(slave_dev, addr);
+   if (res) {
+   dprintk(Error %d calling set_mac_address\n, res);
+   goto err_free;
+   }
}
 
res = netdev_set_master(slave_dev, bond_dev);
@@ -1651,9 +1670,11 @@ err_close:
dev_close(slave_dev);
 
 err_restore_mac:
-   memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN);
-   addr.sa_family = slave_dev-type;
-   dev_set_mac_address(slave_dev, addr);
+   if (bond-do_set_mac_addr) {
+   memcpy(addr.sa_data, new_slave-perm_hwaddr, ETH_ALEN);
+   addr.sa_family = slave_dev-type;
+   dev_set_mac_address(slave_dev, addr);
+   }
 
 err_free:
kfree(new_slave);
@@ -1831,10 +1852,12 @@ int bond_release(struct net_device *bond
/* close slave before restoring its mac address */
dev_close(slave_dev);
 
-   /* restore original (permanent) mac address */
-   memcpy(addr.sa_data, slave-perm_hwaddr, ETH_ALEN);
-   addr.sa_family = 

[PATCH V3 5/7] net/bonding: Enable IP multicast for bonding IPoIB devices

2007-07-30 Thread Moni Shoua

Allow to enslave devices when the bonding device is not up. Over the discussion
held at the previous post this seemed to be the most clean way to go, where it
is not expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast 
groups
(eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding 
device
type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
where for multicast joins taking place after the enslavement another 
ip_xxx_mc_map()
is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
Signed-off-by: Or Gerlitz [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c  |4 ++--
 drivers/net/bonding/bond_sysfs.c |6 ++
 2 files changed, 4 insertions(+), 6 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 
15:04:50.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:06:17.175820632 
+0300
@@ -1325,8 +1325,8 @@ int bond_enslave(struct net_device *bond
 
/* bond must be initialized by bond_open() before enslaving */
if (!(bond_dev-flags  IFF_UP)) {
-   dprintk(Error, master_dev is not up\n);
-   return -EPERM;
+   printk(KERN_WARNING DRV_NAME
+%s: master_dev is not up in bond_enslave\n, 
bond_dev-name);
}
 
/* already enslaved */
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-07-25 
14:18:12.0 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-07-25 15:06:17.176820452 
+0300
@@ -266,11 +266,9 @@ static ssize_t bonding_store_slaves(stru
 
/* Quick sanity check -- is the bond interface up? */
if (!(bond-dev-flags  IFF_UP)) {
-   printk(KERN_ERR DRV_NAME
-  : %s: Unable to update slaves because interface is 
down.\n,
+   printk(KERN_WARNING DRV_NAME
+  : %s: doing slave updates when interface is down.\n,
   bond-dev-name);
-   ret = -EPERM;
-   goto out;
}
 
/* Note:  We can't hold bond-lock here, as bond_create grabs it. */

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver

2007-07-30 Thread Moni Shoua
This patch series is the third version (see below link to V2) of the 
suggested changes to the bonding driver so it would be able to support 
non ARPHRD_ETHER netdevices for its High-Availability (active-backup) mode. 

The motivation is to enable the bonding driver on its HA mode to work with 
the IP over Infiniband (IPoIB) driver. With these patches I was able to enslave 
IPoIB netdevices and run TCP, UDP, IP (UDP) Multicast and ICMP traffic with 
fail-over and fail-back working fine. The working environment was the net-2.6 
git. 

More over, as IPoIB is also the IB ARP provider for the RDMA CM driver which 
is used by native IB ULPs whose addressing scheme is based on IP (e.g. iSER, 
SDP, Lustre, NFSoRDMA, RDS), bonding support for IPoIB devices **enables** HA 
for these ULPs. This holds as when the ULP is informed by the IB HW on the 
failure of the current IB connection, it just need to reconnect, where the 
bonding device will now issue the IB ARP over the active IPoIB slave. 

This series also includes patches to the IPoIB driver that fix some fix 
some neighboring related issues. 

There are still 2 open issues here: 
1. When bonding enslaves an IPoIB device the bonding neighbor holds a 
reference to a cleanup function in the IPoIB drives. This makes it unsafe to 
unload the IPoIB module if there are bonding neighbors in the air. So, to 
avoid this race one must unload bonding before unloading IPoIB. 

2. Patch No. 7 is a workaround to a problem where gratuitous were not sent 
quite often.
I didn't find something better that fixes this and I would 
appreciate advices and comments regarding it. However, this doesn't seem to me 
as 
an issue related exclusively to IPoIB. 

Links to earlier discussion: 

1. A discussion in netdev about bonding support for IPoIB.
http://lists.openwall.net/netdev/2006/11/30/46

2. A discussion in openfabrics regarding changes in the IPoIB that 
enable using it as a slave for bonding.
http://lists.openfabrics.org/pipermail/general/2007-March/034033.html



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 6/7] net/bonding: Handlle wrong assumptions that slave is always an Ethernet device

2007-07-30 Thread Moni Shoua

bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not good when it enslaves non Ethernet devices (such as InfiniBand).

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c  |2 +-
 drivers/net/bonding/bond_sysfs.c |   10 --
 drivers/net/bonding/bonding.h|1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 
15:06:17.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-25 15:33:25.012883360 
+0300
@@ -1255,7 +1255,7 @@ static int bond_compute_features(struct 
unsigned long features = BOND_INTERSECT_FEATURES;
struct slave *slave;
struct net_device *bond_dev = bond-dev;
-   unsigned short max_hard_header_len = ETH_HLEN;
+   u16 max_hard_header_len = max((u16)ETH_HLEN, bond_dev-hard_header_len);
int i;
 
bond_for_each_slave(bond, slave, i) {
Index: net-2.6/drivers/net/bonding/bond_sysfs.c
===
--- net-2.6.orig/drivers/net/bonding/bond_sysfs.c   2007-07-25 
15:06:17.0 +0300
+++ net-2.6/drivers/net/bonding/bond_sysfs.c2007-07-25 15:20:10.224527636 
+0300
@@ -260,6 +260,7 @@ static ssize_t bonding_store_slaves(stru
char command[IFNAMSIZ + 1] = { 0, };
char *ifname;
int i, res, found, ret = count;
+   u32 original_mtu;
struct slave *slave;
struct net_device *dev = NULL;
struct bonding *bond = to_bond(d);
@@ -325,6 +326,7 @@ static ssize_t bonding_store_slaves(stru
}
 
/* Set the slave's MTU to match the bond */
+   original_mtu = dev-mtu;
if (dev-mtu != bond-dev-mtu) {
if (dev-change_mtu) {
res = dev-change_mtu(dev,
@@ -339,6 +341,9 @@ static ssize_t bonding_store_slaves(stru
}
rtnl_lock();
res = bond_enslave(bond-dev, dev);
+   bond_for_each_slave(bond, slave, i)
+   if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0)
+   slave-original_mtu=original_mtu;
rtnl_unlock();
if (res) {
ret = res;
@@ -351,6 +356,7 @@ static ssize_t bonding_store_slaves(stru
bond_for_each_slave(bond, slave, i)
if (strnicmp(slave-dev-name, ifname, IFNAMSIZ) == 0) {
dev = slave-dev;
+   original_mtu = slave-original_mtu;
break;
}
if (dev) {
@@ -365,9 +371,9 @@ static ssize_t bonding_store_slaves(stru
}
/* set the slave MTU to the default */
if (dev-change_mtu) {
-   dev-change_mtu(dev, 1500);
+   dev-change_mtu(dev, original_mtu);
} else {
-   dev-mtu = 1500;
+   dev-mtu = original_mtu;
}
}
else {
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-07-25 15:03:32.0 
+0300
+++ net-2.6/drivers/net/bonding/bonding.h   2007-07-25 15:20:10.223527810 
+0300
@@ -156,6 +156,7 @@ struct slave {
s8 link;/* one of BOND_LINK_ */
s8 state;   /* one of BOND_STATE_ */
u32original_flags;
+   u32original_mtu;
u32link_failure_count;
u16speed;
u8 duplex;

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure

2007-07-30 Thread Moni Shoua

Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev-state field is on. This improves the chances for the arp packet to
be transmitted.

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |   25 +
 drivers/net/bonding/bonding.h   |1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c2007-07-25 
15:33:25.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c 2007-07-26 18:42:59.296296622 
+0300
@@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon
if (new_active  !bond-do_set_mac_addr)
memcpy(bond-dev-dev_addr,  new_active-dev-dev_addr,
new_active-dev-addr_len);
-
-   bond_send_gratuitous_arp(bond);
+   if (bond-curr_active_slave 
+   test_bit(__LINK_STATE_LINKWATCH_PENDING, 
bond-curr_active_slave-dev-state)){
+   dprintk(delaying gratuitous arp on 
%s\n,bond-curr_active_slave-dev-name);
+   bond-send_grat_arp=1;
+   }else{
+   bond_send_gratuitous_arp(bond);
+   }
}
 }
 
@@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device 
 * program could monitor the link itself if needed.
 */
 
+   if (bond-send_grat_arp) {
+   if (bond-curr_active_slave  
test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state))
+   dprintk(Needs to send gratuitous arp but not 
yet\n,__FUNCTION__);
+   else {
+   dprintk(sending delayed gratuitous arp on 
ond-curr_active_slave-dev-name\n);
+   bond_send_gratuitous_arp(bond);
+   bond-send_grat_arp=0;
+   }
+   }
read_lock(bond-curr_slave_lock);
oldcurrent = bond-curr_active_slave;
read_unlock(bond-curr_slave_lock);
@@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str
struct slave *slave = bond-curr_active_slave;
struct vlan_entry *vlan;
struct net_device *vlan_dev;
+   int i;
 
dprintk(bond_send_grat_arp: bond %s slave %s\n, bond-dev-name,
slave ? slave-dev-name : NULL);
@@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str
return;
 
if (bond-master_ip) {
-   bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip,
- bond-master_ip, 0);
+   for (i=0;i3;i++)
+   bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip,
+ bond-master_ip, 0);
}
 
list_for_each_entry(vlan, bond-vlan_list, vlan_list) {
@@ -4331,6 +4347,7 @@ static int bond_init(struct net_device *
bond-current_arp_slave = NULL;
bond-primary_slave = NULL;
bond-dev = bond_dev;
+   bond-send_grat_arp=0;
INIT_LIST_HEAD(bond-vlan_list);
 
/* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h  2007-07-25 15:20:10.0 
+0300
+++ net-2.6/drivers/net/bonding/bonding.h   2007-07-26 18:42:43.652087660 
+0300
@@ -203,6 +203,7 @@ struct bonding {
struct   vlan_group *vlgrp;
struct   packet_type arp_mon_pt;
s8   do_set_mac_addr;
+   int  send_grat_arp;
 };
 
 /**

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.20-2.6.21 - networking dies after random time

2007-07-30 Thread Alan Cox
 So the whole locking is to be able to keep irqs enabled for a long time, 
 without risking entry of the same IRQ handler on this same CPU, correct?

As implemented - on any CPU.

We also need to know that the IRQ handler is not doing useful work on
another processor which is why we take the lock after disabling the
interrupt line everywhere. Without that we might be completing an IRQ on
another CPU and that would race the transmit and make a nasty mess.

 So it seems to me that maybe the driver could be surprised via these 
 spurious interrupts that happen right after the irq_enable(). Does the 
 patch below make any sense in your opinion?

For MMIO it does look like that may be needed. Looks sensible.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] Preparatory refactoring part 1.

2007-07-30 Thread Patrick McHardy
Corey Hickey wrote:
 diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
 index 9579573..8ae077f 100644
 --- a/net/sched/sch_sfq.c
 +++ b/net/sched/sch_sfq.c
 @@ -77,6 +77,9 @@
  #define SFQ_DEPTH128
  #define SFQ_HASH_DIVISOR 1024
  
 +#define SFQ_HEAD 0
 +#define SFQ_TAIL 1
 +
  /* This type should contain at least SFQ_DEPTH*2 values */
  typedef unsigned char sfq_index;
  
 @@ -244,10 +247,8 @@ static unsigned int sfq_drop(struct Qdisc *sch)
   return 0;
  }
  
 -static int
 -sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 +static void sfq_q_enqueue(struct sk_buff *skb, struct sfq_sched_data *q, 
 unsigned int end)


Please make sure to break at 80 chars and to keep the style
in this file consistent (newline before function name).

  {
 - struct sfq_sched_data *q = qdisc_priv(sch);
   unsigned hash = sfq_hash(q, skb);
   sfq_index x;
  
 @@ -256,8 +257,12 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch)
   q-ht[hash] = x = q-dep[SFQ_DEPTH].next;
   q-hash[x] = hash;
   }
 - sch-qstats.backlog += skb-len;


Why not keep this instead of having both callers do it?

 - __skb_queue_tail(q-qs[x], skb);
 +
 + if (end == SFQ_TAIL)
 + __skb_queue_tail(q-qs[x], skb);
 + else
 + __skb_queue_head(q-qs[x], skb);
 +
   sfq_inc(q, x);
   if (q-qs[x].qlen == 1) {   /* The flow is new */
   if (q-tail == SFQ_DEPTH) { /* It is the first flow */
 @@ -270,12 +275,21 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch)
   q-tail = x;
   }
   }
 +}
 +
 +static int
 +sfq_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 +{
 + struct sfq_sched_data *q = qdisc_priv(sch);


newline please.

 + sfq_q_enqueue(skb, q, SFQ_TAIL);
 + sch-qstats.backlog += skb-len;
   if (++sch-q.qlen  q-limit-1) {
   sch-bstats.bytes += skb-len;
   sch-bstats.packets++;
   return 0;
   }
  
 + sch-qstats.drops++;


sfq_drop already increments this.

   sfq_drop(sch);
   return NET_XMIT_CN;
  }
 @@ -284,28 +298,8 @@ static int
  sfq_requeue(struct sk_buff *skb, struct Qdisc* sch)
  {
   struct sfq_sched_data *q = qdisc_priv(sch);

newline please

 - unsigned hash = sfq_hash(q, skb);
 - sfq_index x;
 -
 - x = q-ht[hash];
 - if (x == SFQ_DEPTH) {
 - q-ht[hash] = x = q-dep[SFQ_DEPTH].next;
 - q-hash[x] = hash;
 - }
 + sfq_q_enqueue(skb, q, SFQ_HEAD);
   sch-qstats.backlog += skb-len;
 - __skb_queue_head(q-qs[x], skb);
 - sfq_inc(q, x);
 - if (q-qs[x].qlen == 1) {   /* The flow is new */
 - if (q-tail == SFQ_DEPTH) { /* It is the first flow */
 - q-tail = x;
 - q-next[x] = x;
 - q-allot[x] = q-quantum;
 - } else {
 - q-next[x] = q-next[q-tail];
 - q-next[q-tail] = x;
 - q-tail = x;
 - }
 - }
   if (++sch-q.qlen  q-limit - 1) {
   sch-qstats.requeues++;
   return 0;
 @@ -316,13 +310,8 @@ sfq_requeue(struct sk_buff *skb, struct Qdisc* sch)
   return NET_XMIT_CN;
  }
  
 -
 -
 -
 -static struct sk_buff *
 -sfq_dequeue(struct Qdisc* sch)
 +static struct sk_buff *sfq_q_dequeue(struct sfq_sched_data *q)


Keep style consistent please.

  {
 - struct sfq_sched_data *q = qdisc_priv(sch);
   struct sk_buff *skb;
   sfq_index a, old_a;
  
 @@ -335,8 +324,6 @@ sfq_dequeue(struct Qdisc* sch)
   /* Grab packet */
   skb = __skb_dequeue(q-qs[a]);
   sfq_dec(q, a);
 - sch-q.qlen--;
 - sch-qstats.backlog -= skb-len;
  
   /* Is the slot empty? */
   if (q-qs[a].qlen == 0) {
 @@ -353,6 +340,21 @@ sfq_dequeue(struct Qdisc* sch)
   a = q-next[a];
   q-allot[a] += q-quantum;
   }
 +
 + return skb;
 +}
 +
 +static struct sk_buff
 +*sfq_dequeue(struct Qdisc* sch)
 +{
 + struct sfq_sched_data *q = qdisc_priv(sch);
 + struct sk_buff *skb;
 +
 + skb = sfq_q_dequeue(q);
 + if (skb == NULL)
 + return NULL;
 + sch-q.qlen--;
 + sch-qstats.backlog -= skb-len;
   return skb;
  }
  

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] Preparatory refactoring part 2.

2007-07-30 Thread Patrick McHardy
Corey Hickey wrote:
 diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
 index 8ae077f..0c46938 100644
 --- a/net/sched/sch_sfq.c
 +++ b/net/sched/sch_sfq.c
 @@ -380,71 +380,71 @@ static void sfq_perturbation(unsigned long arg)
   }
  }
  
 -static int sfq_change(struct Qdisc *sch, struct rtattr *opt)
 +static int sfq_q_init(struct sfq_sched_data *q, struct rtattr *opt)
  {
 - struct sfq_sched_data *q = qdisc_priv(sch);
   struct tc_sfq_qopt *ctl = RTA_DATA(opt);
 - unsigned int qlen;
 + int i;
  
 - if (opt-rta_len  RTA_LENGTH(sizeof(*ctl)))
 + if (opt  opt-rta_len  RTA_LENGTH(sizeof(*ctl)))


opt is dereferenced above (RTA_DATA), so if it is NULL we've already
crashed.

   return -EINVAL;
  
 - sch_tree_lock(sch);
 - q-quantum = ctl-quantum ? : psched_mtu(sch-dev);
 - q-perturb_period = ctl-perturb_period*HZ;
 - if (ctl-limit)
 - q-limit = min_t(u32, ctl-limit, SFQ_DEPTH);
 + q-perturbation = 0;
 + q-max_depth = 0;
 + q-tail = q-limit = SFQ_DEPTH;
 + if (opt == NULL) {
 + q-perturb_period = 0;
 + } else {
 + struct tc_sfq_qopt *ctl = RTA_DATA(opt);
 + if (ctl-quantum)
 + q-quantum = ctl-quantum;
 + q-perturb_period = ctl-perturb_period*HZ;
  
 - qlen = sch-q.qlen;
 - while (sch-q.qlen = q-limit-1)
 - sfq_drop(sch);
 - qdisc_tree_decrease_qlen(sch, qlen - sch-q.qlen);


I hope that patch that makes changing possible brings this back ..
checking .. it doesn't. Please either keep this or fix up 6/7
to bring it back.

 + if (ctl-limit)
 + q-limit = min_t(u32, ctl-limit, SFQ_DEPTH);
 + }
  
 - del_timer(q-perturb_timer);
 - if (q-perturb_period) {
 - q-perturb_timer.expires = jiffies + q-perturb_period;
 - add_timer(q-perturb_timer);
 + for (i=0; iSFQ_HASH_DIVISOR; i++)
 + q-ht[i] = SFQ_DEPTH;
 + for (i=0; iSFQ_DEPTH; i++) {
 + skb_queue_head_init(q-qs[i]);
 + q-dep[i+SFQ_DEPTH].next = i+SFQ_DEPTH;
 + q-dep[i+SFQ_DEPTH].prev = i+SFQ_DEPTH;
   }
 - sch_tree_unlock(sch);
 +
 + for (i=0; iSFQ_DEPTH; i++)
 + sfq_link(q, i);
   return 0;
  }
  

 +static void sfq_q_destroy(struct sfq_sched_data *q)
 +{
 + del_timer(q-perturb_timer);
 +}
 +
  static void sfq_destroy(struct Qdisc *sch)
  {
   struct sfq_sched_data *q = qdisc_priv(sch);
 - del_timer(q-perturb_timer);
 + sfq_q_destroy(q);
  }

That really does look a bit pointless.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] Make qdisc changeable.

2007-07-30 Thread Patrick McHardy
Corey Hickey wrote:
 Re-implement sfq_change() and enable Qdisc_opts.change so tc qdisc
 change will work.
 
 Signed-off-by: Corey Hickey [EMAIL PROTECTED]
 ---
  net/sched/sch_sfq.c |   51 
 ++-
  1 files changed, 50 insertions(+), 1 deletions(-)
 
 diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
 index e6a6a21..e042cd0 100644
 --- a/net/sched/sch_sfq.c
 +++ b/net/sched/sch_sfq.c
 @@ -485,6 +485,55 @@ static int sfq_init(struct Qdisc *sch, struct rtattr 
 *opt)
   return 0;
  }
  
 +static int sfq_change(struct Qdisc *sch, struct rtattr *opt)
 +{
 + struct sfq_sched_data *q = qdisc_priv(sch);
 + struct sfq_sched_data tmp;
 + struct sk_buff *skb;
 + int err;
 + 
 + /* set up tmp queue */
 + memset(tmp, 0, sizeof(struct sfq_sched_data));
 + tmp.quantum = psched_mtu(sch-dev); /* default */


If no value is given it should use the old value instead of
reinitializing to the default.

 + if ((err = sfq_q_init(tmp, opt)))
 + return err;


This will also use defaults for all unspecified values. It would
be more consistent with other qdiscs to only change those values
that are actually specified, so something like tc qdisc change ...
perturb 10 will *only* change the perturbation parameter.
I'm not sure reusing the initialization function and copying the
parameters is the best way to do this.

 +
 + /* copy packets from the old queue to the tmp queue */
 + sch_tree_lock(sch);
 + while (sch-q.qlen = tmp.limit - 1)
 + sfq_drop(sch);
 + while ((skb = sfq_q_dequeue(q)) != NULL)
 + sfq_q_enqueue(skb, tmp, SFQ_TAIL);
 + 
 + /* clean up the old queue */
 + sfq_q_destroy(q);
 +
 + /* copy elements of the tmp queue into the old queue */
 + q-perturb_period = tmp.perturb_period;
 + q-quantum= tmp.quantum;
 + q-limit  = tmp.limit;
 + q-depth  = tmp.depth;
 + q-hash_divisor   = tmp.hash_divisor;
 + q-tail   = tmp.tail;
 + q-max_depth  = tmp.max_depth;
 + q-ht= tmp.ht;
 + q-dep   = tmp.dep;
 + q-next  = tmp.next;
 + q-allot = tmp.allot;
 + q-hash  = tmp.hash;
 + q-qs= tmp.qs;
 +
 + /* finish up */
 + if (q-perturb_period) {
 + q-perturb_timer.expires = jiffies + q-perturb_period;
 + add_timer(q-perturb_timer);
 + } else {
 + q-perturbation = 0;
 + }
 + sch_tree_unlock(sch);
 + return 0;
 +}
 +
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec

2007-07-30 Thread pravin
On 7/26/07, David Miller [EMAIL PROTECTED] wrote:
 From: Shannon Nelson [EMAIL PROTECTED]
 Date: Tue, 24 Jul 2007 17:36:06 -0700

  (repost - original eaten by vger?)
 
  Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was
  unreachable and thus unused.  The code originally was there to support
  in-kernel dma needs, but since it remains unused, we'll pull it out.
 
  Signed-off-by: Shannon Nelson [EMAIL PROTECTED]

 Applied, thanks Shannon.

NET_DMA on kernel buffer is pretty useful in ndb, iSCSI target and
initiators which uses kernel buffer to receive data. Are there any
other issues with dma-memcpy on kernel buffers, if not then following
patch makes dma_memcpy_to_kernel_iovec() reachable from tcp_recvmsg.
I tested this patch and it work fine with unh iSCSI target.


comments?

--pravin.

Index: linux-2.6.23-rc1/net/ipv4/tcp.c
===
--- linux-2.6.23-rc1.orig/net/ipv4/tcp.c2007-07-23 02:11:00.0 
+0530
+++ linux-2.6.23-rc1/net/ipv4/tcp.c 2007-07-30 17:43:51.0 +0530
@@ -1115,7 +1115,7 @@
int target; /* Read at least this many bytes */
long timeo;
struct task_struct *user_recv = NULL;
-   int copied_early = 0;
+   int copied_early = 0, kernel_dma = 0;
struct sk_buff *skb;

lock_sock(sk);
@@ -1154,6 +1154,9 @@
!sysctl_tcp_low_latency 
__get_cpu_var(softnet_data).net_dma) {
preempt_enable_no_resched();
+   if (segment_eq(get_fs(), KERNEL_DS))
+   kernel_dma = 1;
+   else
tp-ucopy.pinned_list =
dma_pin_iovec_pages(msg-msg_iov, len);
} else {
@@ -1362,7 +1365,7 @@

if (!(flags  MSG_TRUNC)) {
  #ifdef CONFIG_NET_DMA
-   if (!tp-ucopy.dma_chan  tp-ucopy.pinned_list)
+   if (!tp-ucopy.dma_chan   (kernel_dma || 
tp-ucopy.pinned_list))
tp-ucopy.dma_chan = get_softnet_dma();

if (tp-ucopy.dma_chan) {



 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC]: napi_struct V4

2007-07-30 Thread Roland Dreier
  If you have a means in the device (like tg3, bnx2, e1000, and a score
  of others do) to force the device to trigger a HW interrupt, that's
  what you do if you detect that events are pending after re-enabling
  interrupt in the -poll() handler.

It is possible to trigger an interrupt for IPoIB I think but it will
lead to a fair bit of code, since one would have to do something like
a dummy send operation to generate an event that leads to the
interrupt.  I think this will be rather complex/ugly to implement
because we have to make sure we have space in the send queue to post
the send operation, etc.

  Frankly I don't think the lock is a big deal and you need something
  like it anyways typically.

If I understand this correctly, you're suggesting a spin_lock_irqsave()
around the netif_rx_complete() in the poll routine, and a
corresponding lock in the interrupt handler.  That seems like a pretty
big step backwards for performance to me.  Especially since in my
experience, fast machines handling full-MTU traffic often end up being
basically interrupt driven because they drain the RX ring too quickly
to stay in NAPI polling.  Yes, it's only one more lock but look at
the tricky smp_mb() usage that tg3, bnx2, etc have to avoid using a
spinlock...

IPoIB can cope but it really seems like an unfortunate feature of
these changes that we can't do something like what we have today,
which imposes no overhead unless an event actually lands in the race
window.

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] ehea: Fix workqueue handling

2007-07-30 Thread Thomas Klein
Fix: Workqueue ehea_driver_wq was not destroyed

Signed-off-by: Thomas Klein [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 8ee2c2c..d67f97b 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0072
+#define DRV_VERSIONEHEA_0073
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 58702f5..d43ab0f 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -3099,6 +3099,7 @@ out:
 
 static void __exit ehea_module_exit(void)
 {
+   destroy_workqueue(ehea_driver_wq);
driver_remove_file(ehea_driver.driver, driver_attr_capabilities);
ibmebus_unregister_driver(ehea_driver);
ehea_destroy_busmap();
-- 
1.5.2


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] ehea: Simplify resource usage check

2007-07-30 Thread Thomas Klein
Use shorter method to determine whether adapter has configured ports

Signed-off-by: Thomas Klein [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_main.c |   18 ++
 1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index d43ab0f..36ca322 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2165,24 +2165,18 @@ static int ehea_clean_all_portres(struct ehea_port 
*port)
return ret;
 }
 
-static void ehea_remove_adapter_mr (struct ehea_adapter *adapter)
+static void ehea_remove_adapter_mr(struct ehea_adapter *adapter)
 {
-   int i;
-
-   for (i=0; i  EHEA_MAX_PORTS; i++)
-   if (adapter-port[i])
-   return;
+   if (adapter-active_ports)
+   return;
 
ehea_rem_mr(adapter-mr);
 }
 
-static int ehea_add_adapter_mr (struct ehea_adapter *adapter)
+static int ehea_add_adapter_mr(struct ehea_adapter *adapter)
 {
-   int i;
-
-   for (i=0; i  EHEA_MAX_PORTS; i++)
-   if (adapter-port[i])
-   return 0;
+   if (adapter-active_ports)
+   return 0;
 
return ehea_reg_kernel_mr(adapter, adapter-mr);
 }
-- 
1.5.2



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] ehea: Eliminated some compiler warnings

2007-07-30 Thread Thomas Klein
Fixed wrongly casted pointers

Signed-off-by: Thomas Klein [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_main.c |   25 ++---
 1 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 36ca322..9756211 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -1326,7 +1326,6 @@ static void write_swqe2_TSO(struct sk_buff *skb,
u8 *imm_data = swqe-u.immdata_desc.immediate_data[0];
int skb_data_size = skb-len - skb-data_len;
int headersize;
-   u64 tmp_addr;
 
/* Packet is TCP with TSO enabled */
swqe-tx_control |= EHEA_SWQE_TSO;
@@ -1347,9 +1346,8 @@ static void write_swqe2_TSO(struct sk_buff *skb,
/* set sg1entry data */
sg1entry-l_key = lkey;
sg1entry-len = skb_data_size - headersize;
-
-   tmp_addr = (u64)(skb-data + headersize);
-   sg1entry-vaddr = ehea_map_vaddr(tmp_addr);
+   sg1entry-vaddr =
+   ehea_map_vaddr(skb-data + headersize);
swqe-descriptors++;
}
} else
@@ -1362,7 +1360,6 @@ static void write_swqe2_nonTSO(struct sk_buff *skb,
int skb_data_size = skb-len - skb-data_len;
u8 *imm_data = swqe-u.immdata_desc.immediate_data[0];
struct ehea_vsgentry *sg1entry = swqe-u.immdata_desc.sg_entry;
-   u64 tmp_addr;
 
/* Packet is any nonTSO type
 *
@@ -1379,8 +1376,8 @@ static void write_swqe2_nonTSO(struct sk_buff *skb,
/* copy sg1entry data */
sg1entry-l_key = lkey;
sg1entry-len = skb_data_size - SWQE2_MAX_IMM;
-   tmp_addr = (u64)(skb-data + SWQE2_MAX_IMM);
-   sg1entry-vaddr = ehea_map_vaddr(tmp_addr);
+   sg1entry-vaddr =
+   ehea_map_vaddr(skb-data + SWQE2_MAX_IMM);
swqe-descriptors++;
}
} else {
@@ -1395,7 +1392,6 @@ static inline void write_swqe2_data(struct sk_buff *skb, 
struct net_device *dev,
struct ehea_vsgentry *sg_list, *sg1entry, *sgentry;
skb_frag_t *frag;
int nfrags, sg1entry_contains_frag_data, i;
-   u64 tmp_addr;
 
nfrags = skb_shinfo(skb)-nr_frags;
sg1entry = swqe-u.immdata_desc.sg_entry;
@@ -1417,9 +1413,9 @@ static inline void write_swqe2_data(struct sk_buff *skb, 
struct net_device *dev,
/* copy sg1entry data */
sg1entry-l_key = lkey;
sg1entry-len = frag-size;
-   tmp_addr =  (u64)(page_address(frag-page)
- + frag-page_offset);
-   sg1entry-vaddr = ehea_map_vaddr(tmp_addr);
+   sg1entry-vaddr =
+   ehea_map_vaddr(page_address(frag-page)
+  + frag-page_offset);
swqe-descriptors++;
sg1entry_contains_frag_data = 1;
}
@@ -1431,10 +1427,9 @@ static inline void write_swqe2_data(struct sk_buff *skb, 
struct net_device *dev,
 
sgentry-l_key = lkey;
sgentry-len = frag-size;
-
-   tmp_addr = (u64)(page_address(frag-page)
-+ frag-page_offset);
-   sgentry-vaddr = ehea_map_vaddr(tmp_addr);
+   sgentry-vaddr =
+   ehea_map_vaddr(page_address(frag-page)
+  + frag-page_offset);
swqe-descriptors++;
}
}
-- 
1.5.2


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Jan-Bernd Themann
Generic Large Receive Offload for TCP traffic

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 include/linux/inet_lro.h |  173 ++
 net/ipv4/inet_lro.c  |  590 ++
 2 files changed, 763 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 000..0957234
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,173 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *   Jan-Bernd Themann [EMAIL PROTECTED]
+ *   Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include net/ip.h
+#include net/tcp.h
+
+/*
+ * LRO statistics
+ */
+
+struct net_lro_stats {
+   unsigned long aggregated;
+   unsigned long flushed;
+   unsigned long no_desc;
+};
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+   struct sk_buff *parent;
+   struct sk_buff *last_skb;
+   struct skb_frag_struct *next_frag;
+   struct iphdr *iph;
+   struct tcphdr *tcph;
+   struct vlan_group *vgrp;
+   __wsum  data_csum;
+   u32 tcp_rcv_tsecr;
+   u32 tcp_rcv_tsval;
+   u32 tcp_ack;
+   u32 tcp_next_seq;
+   u32 skb_tot_frags_len;
+   u16 ip_tot_len;
+   u16 tcp_saw_tstamp; /* timestamps enabled */
+   u16 tcp_window;
+   u16 vlan_tag;
+   int pkt_aggr_cnt;   /* counts aggregated packets */
+   int vlan_packet;
+   int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+   struct net_device *dev;
+   struct net_lro_stats stats;
+
+   /* LRO features */
+   unsigned long features;
+#define LRO_F_NAPI1  /* Pass packets to stack via NAPI */
+#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
+   from received packets and eth protocol
+   is still ETH_P_8021Q */
+
+   u32 ip_summed; /* Options to be set in generated SKB in page mode */
+   int max_desc; /* Max number of LRO descriptors  */
+   int max_aggr; /* Max number of LRO packets to be aggregated */
+
+   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+   /*
+* Optimized driver functions
+*
+* get_skb_header: returns tcp and ip header for packet in SKB
+*/
+   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+ void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+   /* hdr_flags: */
+#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
+#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
+
+   /*
+* get_frag_header: returns mac, tcp and ip header for packet in SKB
+*
+* @hdr_flags: Indicate what kind of LRO has to be done
+* (IPv4/IPv6/TCP/UDP)
+*/
+   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+  void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+  void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *(for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+struct sk_buff *skb,
+void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr,
+ struct sk_buff *skb,
+ struct vlan_group *vgrp,
+ u16 vlan_tag,
+ void *priv);
+
+/*
+ * Processes a fragment list
+ *
+ * This functions aggregate fragments and generate SKBs do pass
+ * the packets to the stack.
+ *
+ * @lro_mgr: LRO manager to use
+ * @frags: Fragment to be processed. Must contain entire header in first
+ * element.
+ * @len: Length 

[PATCH 2/4][RFC] lro: Kconfig and Makefile

2007-07-30 Thread Jan-Bernd Themann
Kconfig and Makefile for LRO

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 net/ipv4/Kconfig  |8 
 net/ipv4/Makefile |1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index fb79097..d894f61 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -394,6 +394,14 @@ config INET_XFRM_MODE_BEET
 
  If unsure, say Y.
 
+config INET_LRO
+   tristate Large Receive Offload (ipv4/tcp)
+
+   ---help---
+ Support for Large Receive Offload (ipv4/tcp).
+
+ If unsure, say Y.
+
 config INET_DIAG
tristate INET: socket monitoring interface
default y
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index fbf1674..a02c36d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_INET_ESP) += esp4.o
 obj-$(CONFIG_INET_IPCOMP) += ipcomp.o
 obj-$(CONFIG_INET_XFRM_TUNNEL) += xfrm4_tunnel.o
 obj-$(CONFIG_INET_XFRM_MODE_BEET) += xfrm4_mode_beet.o
+obj-$(CONFIG_INET_LRO) += inet_lro.o
 obj-$(CONFIG_INET_TUNNEL) += tunnel4.o
 obj-$(CONFIG_INET_XFRM_MODE_TRANSPORT) += xfrm4_mode_transport.o
 obj-$(CONFIG_INET_XFRM_MODE_TUNNEL) += xfrm4_mode_tunnel.o
-- 
1.5.2

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Jan-Bernd Themann
Hi,

this patch set contains the latest generic LRO code, a Kconfig / Makefile
and an eHEA patch demonstrating how the aggregate SKB interface has to
to be used.
Drew, could you provide a patch for the myri10ge driver to show how the
receive in page interface works?

Please check the Kconfig / Makefile patch. Is that the right place for
the LRO entries?

There is still one open question for the receive in page mode:
How many data (length) has to be copied to skb-data for packets that
do not work for LRO (other protocols?). Currently I choose 64 as default.
Is that ok?

Thanks,
Jan-Bernd

[PATCH 1/4][RFC] lro: Generic Large Receive Offload for TCP traffic
[PATCH 2/4][RFC] lro: Kconfig and Makefile
[PATCH 3/4][RFC] ehea: LRO support
[PATCH 4/4][RFC] ehea: Kconfig


Changes to http://www.spinics.net/lists/netdev/msg36912.html

1) A new field called features has been added to the net_lro_mgr struct.
   It is set by the driver to indicate:
   - LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack

   - LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN
packets but does not modify ETH protocol (ETH_P_8021Q)

2) Padded frames are not aggregated for now. Bug fixed

3) Correct header length now used. No minimal header length for aggregated
   packets used anymore.

4) Statistic counters were introduced. They are stored in a new struct in
   the net_lro_mgr. This has the advantage that no locking is required in
   cases where the driver uses multiple lro_mgrs for different receive queues.
   Thus we get the following statistics per lro_mgr / eth device:
   - Number of aggregated packets
   - Number of flushed packets
   - Number of times we run out of lro_desc.

   The ratio of aggregated packets and flushed packets give you an
   idea how well LRO is working.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4][RFC] ehea: LRO support

2007-07-30 Thread Jan-Bernd Themann
Added LRO support using the SKB aggregate interface

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h |9 -
 drivers/net/ehea/ehea_ethtool.c |   15 +++
 drivers/net/ehea/ehea_main.c|   82 +++---
 3 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..70e33fe 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -33,13 +33,14 @@
 #include linux/ethtool.h
 #include linux/vmalloc.h
 #include linux/if_vlan.h
+#include linux/inet_lro.h
 
 #include asm/ibmebus.h
 #include asm/abs_addr.h
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0073
+#define DRV_VERSIONEHEA_0074
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -58,6 +59,7 @@
 
 #define EHEA_SMALL_QUEUES
 #define EHEA_NUM_TX_QP 1
+#define EHEA_LRO_MAX_AGGR 64
 
 #ifdef EHEA_SMALL_QUEUES
 #define EHEA_MAX_CQE_COUNT  1023
@@ -84,6 +86,8 @@
 #define EHEA_RQ2_PKT_SIZE   1522
 #define EHEA_L_PKT_SIZE 256/* low latency */
 
+#define MAX_LRO_DESCRIPTORS 8
+
 /* Send completion signaling */
 
 /* Protection Domain Identifier */
@@ -376,6 +380,8 @@ struct ehea_port_res {
u64 tx_packets;
u64 rx_packets;
u32 poll_counter;
+   struct net_lro_mgr lro_mgr;
+   struct net_lro_desc lro_desc[MAX_LRO_DESCRIPTORS];
 };
 
 
@@ -427,6 +433,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u32 lro_max_aggr;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c
index decec8c..29ef7a9 100644
--- a/drivers/net/ehea/ehea_ethtool.c
+++ b/drivers/net/ehea/ehea_ethtool.c
@@ -183,6 +183,9 @@ static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
{PR5 free_swqes},
{PR6 free_swqes},
{PR7 free_swqes},
+   {LRO aggregated},
+   {LRO flushed},
+   {LRO no_desc},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -239,6 +242,18 @@ static void ehea_get_ethtool_stats(struct net_device *dev,
for (k = 0; k  8; k++)
data[i++] = atomic_read(port-port_res[k].swqe_avail);
 
+   for (k = 0, tmp = 0; k  EHEA_MAX_PORT_RES; k++)
+   tmp |= port-port_res[k].lro_mgr.stats.aggregated;
+   data[i++] = tmp;
+
+   for (k = 0, tmp = 0; k  EHEA_MAX_PORT_RES; k++)
+   tmp |= port-port_res[k].lro_mgr.stats.flushed;
+   data[i++] = tmp;
+
+   for (k = 0, tmp = 0; k  EHEA_MAX_PORT_RES; k++)
+   tmp |= port-port_res[k].lro_mgr.stats.no_desc;
+   data[i++] = tmp;
+
 }
 
 const struct ethtool_ops ehea_ethtool_ops = {
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 9756211..41bc075 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -52,6 +52,8 @@ static int rq2_entries = EHEA_DEF_ENTRIES_RQ2;
 static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
+static int use_lro = 0;
+static int lro_max_aggr = EHEA_LRO_MAX_AGGR;
 static int num_tx_qps = EHEA_NUM_TX_QP;
 
 module_param(msg_level, int, 0);
@@ -60,6 +62,8 @@ module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
 module_param(use_mcs, int, 0);
+module_param(use_lro, int, 0);
+module_param(lro_max_aggr, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS);
@@ -77,6 +81,10 @@ MODULE_PARM_DESC(sq_entries,  Number of entries for the 
Send Queue  
 [2^x - 1], x = [6..14]. Default = 
 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ));
 MODULE_PARM_DESC(use_mcs,  0:NAPI, 1:Multiple receive queues, Default = 1 );
+MODULE_PARM_DESC(lro_max_aggr,  LRO: Max packets to be aggregated. Default = 
+__MODULE_STRING(EHEA_LRO_MAX_AGGR));
+MODULE_PARM_DESC(use_lro,  Large Receive Offload, 1: enable, 0: disable, 
+ Default = 0);
 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
@@ -389,6 +397,60 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, 
int rq,
return 0;
 }
 
+static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
+  void **tcph, u64 *hdr_flags, void *priv)
+{
+   struct ehea_cqe *cqe = priv;
+   unsigned int ip_len;
+   struct iphdr *iph;
+
+/* non tcp/udp packets */
+   if (!cqe-header_length)
+   return -1;
+
+/* non tcp packet */
+   skb_reset_network_header(skb);
+   iph = ip_hdr(skb);
+   if (iph-protocol != IPPROTO_TCP)
+   return -1;
+
+   ip_len = ip_hdrlen(skb);
+   skb_set_transport_header(skb, ip_len);
+   *tcph = tcp_hdr(skb);
+
+/* check if ip header and tcp header are complete */
+   

[PATCH 4/4][RFC] ehea: Kconfig

2007-07-30 Thread Jan-Bernd Themann
Kconfig changes for LRO

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f8a602c..fec4004 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2399,6 +2399,7 @@ config CHELSIO_T3
 config EHEA
tristate eHEA Ethernet support
depends on IBMEBUS
+   select INET_LRO
---help---
  This driver supports the IBM pSeries eHEA ethernet adapter.
 
-- 
1.5.2

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: source interface ping bug ?

2007-07-30 Thread Ben Greear

nano bug wrote:

Can someone have a look a this and tell if it's kernel related or if I
posted this in the wrong place ? Thanks.
  

Last I checked, ping did not do an SO_BINDTODEVICE even if you did -i ethX.
I think it just looked up the IP for that port and treated it as -i a.b.c.d.

That said, I'm not sure why the behaviour changes for you between kernel
releases.

Maybe an 'strace' of your ping command on the different kernels would help
figure out what the problem is?

Ben

--
Ben Greear [EMAIL PROTECTED] 
Candela Technologies Inc  http://www.candelatech.com



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [NET]: fix multicast list when cloning sockets

2007-07-30 Thread Flavio Leitner

The sock_copy() function uses memcpy() to clone the socket
including the struct ip_mc_socklist *mc_list pointer.

The ip_mc_drop_socket() function is called when socket is closed
to free these objects leaving the other sockets cloned from the
same master socket with invalid pointers.

This patch sets mc_list of cloned socket to NULL.

Signed-off-by: Flavio Leitner [EMAIL PROTECTED]

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index fbe7714..8ee0f54 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -506,6 +506,8 @@ struct sock *inet_csk_clone(struct sock *sk, const struct 
request_sock *req,
newicsk-icsk_backoff = 0;
newicsk-icsk_probes_out  = 0;
 
+   inet_sk(inet)-mc_list = NULL;
+
/* Deinitialize accept_queue to trap illegal accesses. */
memset(newicsk-icsk_accept_queue, 0, 
sizeof(newicsk-icsk_accept_queue));
 
-- 
1.5.2.4

-- 
Flavio
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Jeff Garzik
Seems pretty good to me, save for one minor detail:  patches #1/#2 
should be combined together for greater git-bisect happiness.  Ditto for 
patches #3/#4.  Largely harmless in this case, but keeps the git history 
pollution to a minimum.


Caveat reviewer:  I'm not an expert of net/ipv4/* code, so I reviewed 
largely from the driver API perspective.


David, thoughts on merging?  I'm not We could stick this into your tree 
or mine.  Whether yours or mine, I would like to keep the driver and 
net-core patches together in the same git tree.


Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: source interface ping bug ?

2007-07-30 Thread Patrick McHardy
nano bug wrote:
 [...]
 using source interface :
 
 [EMAIL PROTECTED]:~/iputils# ./ping -I eth2 87.248.113.14
 PING 87.248.113.14 (87.248.113.14) from 86.106.19.75 eth2: 56(84) bytes of 
 data.
From 86.106.19.75 icmp_seq=1 Destination Host Unreachable

 [EMAIL PROTECTED]:~# tcpdump -i eth2 -vvv -n host 87.248.113.14 and host
 86.106.19.75
 tcpdump: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
 01:19:24.292911 arp who-has 87.248.113.14 tell 86.106.19.75


Are you using (or running) NAT locally? What do your routing tables look
like?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4][RFC] lro: Kconfig and Makefile

2007-07-30 Thread Stephen Hemminger
On Mon, 30 Jul 2007 17:24:45 +0200
Jan-Bernd Themann [EMAIL PROTECTED] wrote:

 Kconfig and Makefile for LRO
 
 Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
 
 ---
  net/ipv4/Kconfig  |8 
  net/ipv4/Makefile |1 +
  2 files changed, 9 insertions(+), 0 deletions(-)
 
 diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
 index fb79097..d894f61 100644
 --- a/net/ipv4/Kconfig
 +++ b/net/ipv4/Kconfig
 @@ -394,6 +394,14 @@ config INET_XFRM_MODE_BEET
  
 If unsure, say Y.
  
 +config INET_LRO
 + tristate Large Receive Offload (ipv4/tcp)
 +
 + ---help---
 +   Support for Large Receive Offload (ipv4/tcp).
 +
 +   If unsure, say Y.
 +

Why make this a user selectable option at all? Unless you want
to deal with out of tree drivers (not my problem), it should be hidden
to avoid having to explain an support it.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2/3] 2.6.23-rc1: known regressions v3

2007-07-30 Thread Michal Piotrowski
Hi all,

Here is a list of some known regressions in 2.6.23-rc1.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

List of Aces

NameRegressions fixed since 21-Jun-2007
Adrian Bunk6
Andi Kleen 4
Andrew Morton  4
Linus Torvalds 4
Al Viro3
Jens Axboe 3
Tejun Heo  3
David Woodhouse2
Hugh Dickins   2



CPUFREQ/ACPI

Subject : ide problems: 2.6.22-git17 working, 2.6.23-rc1* is not
References  : http://lkml.org/lkml/2007/7/27/298
  http://lkml.org/lkml/2007/7/29/371
Last known good : ?
Submitter   : dth [EMAIL PROTECTED]
Caused-By   : ?
Handled-By  : ?
Status  : unknown



Networking

Subject : sky2 boot crash in sky2_mac_intr
References  : http://lkml.org/lkml/2007/7/24/91
Last known good : ?
Submitter   : Florian Lohoff [EMAIL PROTECTED]
Caused-By   : 
Handled-By  : Stephen Hemminger [EMAIL PROTECTED]
Status  : unknown

Subject : New wake ups from sky2
References  : http://lkml.org/lkml/2007/7/20/386
Last known good : ?
Submitter   : Thomas Meyer [EMAIL PROTECTED]
Caused-By   : Stephen Hemminger [EMAIL PROTECTED]
  commit eb35cf60e462491249166182e3e755d3d5d91a28
Handled-By  : Stephen Hemminger [EMAIL PROTECTED]
Status  : unknown



Power Management

Subject : New ACPI error/warning with Linus' latest GIT
References  : http://lkml.org/lkml/2007/7/26/395
Last known good : ?
Submitter   : Ismail Dönmez [EMAIL PROTECTED]
Caused-By   : ?
Handled-By  : ?
Status  : unknown



SCSI

Subject : lpfc_sli.c: off-by-10
References  : http://lkml.org/lkml/2007/7/22/284
Last known good : ?
Submitter   : Adrian Bunk [EMAIL PROTECTED]
Caused-By   : ?
Handled-By  : ?
Status  : unknown



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4][RFC] lro: Kconfig and Makefile

2007-07-30 Thread Jeff Garzik

Stephen Hemminger wrote:

Why make this a user selectable option at all? Unless you want
to deal with out of tree drivers (not my problem), it should be hidden
to avoid having to explain an support it.



In this case it's an optional library kernel module.  That seems to be a 
common setup for library modules.  We do the same with CONFIG_MII and 
drivers/net/mii.ko as well.


Originally it was done purely in the Makefile, but that does not account 
for net drivers in sub-directories (or out of tree as you point out).


Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4][RFC] ehea: LRO support

2007-07-30 Thread Kok, Auke

Jan-Bernd Themann wrote:

Added LRO support using the SKB aggregate interface

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h |9 -
 drivers/net/ehea/ehea_ethtool.c |   15 +++
 drivers/net/ehea/ehea_main.c|   82 +++---
 3 files changed, 98 insertions(+), 8 deletions(-)


snip


+module_param(use_lro, int, 0);
+module_param(lro_max_aggr, int, 0);


this should obviously (and Stephen H. probably agrees) be implemented in ethtool 
instead, and opens up the question as to what changes we need to add to ethtool. ..


Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Linas Vepstas
On Mon, Jul 30, 2007 at 05:24:33PM +0200, Jan-Bernd Themann wrote:
 
 Changes to http://www.spinics.net/lists/netdev/msg36912.html
 
 1) A new field called features has been added to the net_lro_mgr struct.
It is set by the driver to indicate:
- LRO_F_NAPI:Use NAPI / netif_rx to pass packets to stack
 
- LRO_F_EXTRACT_VLAN_ID: Set by driver if HW extracts VLAN IDs for VLAN
 packets but does not modify ETH protocol (ETH_P_8021Q)
 
 2) Padded frames are not aggregated for now. Bug fixed
 
 3) Correct header length now used. No minimal header length for aggregated
packets used anymore.
 
 4) Statistic counters were introduced. They are stored in a new struct in
the net_lro_mgr. This has the advantage that no locking is required in
cases where the driver uses multiple lro_mgrs for different receive queues.
Thus we get the following statistics per lro_mgr / eth device:
- Number of aggregated packets
- Number of flushed packets
- Number of times we run out of lro_desc.
 
The ratio of aggregated packets and flushed packets give you an
idea how well LRO is working.

I'd like to see an edited form of this, together with an introduction to
LRO, written up in the Documentation subdirectory.  

As someone with some driver experience, but not on te bleeding edge,
some basc newbie questions pop into mind:

-- what is LRO?
-- Basic principles of operation?
-- Can I use it in my driver?  
-- Does my hardware have to have some special feature before I can use it?
-- What sort of performance improvement does it provide? Throughput?
   Latency? CPU usage? How does it affect DMA allocation? Does it 
   improve only a certain type of traffic (large/small packets, etc.)
-- Example code? What's the API? How should my driver use it?

Right now, I can maybe find answers by doing lots of googling.  I'd like
to have some quick way of getting a grip on this.

--linas
   
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4][RFC] lro: Kconfig and Makefile

2007-07-30 Thread Kok, Auke

Jeff Garzik wrote:

Stephen Hemminger wrote:

Why make this a user selectable option at all? Unless you want
to deal with out of tree drivers (not my problem), it should be hidden
to avoid having to explain an support it.


In this case it's an optional library kernel module.  That seems to be a 
common setup for library modules.  We do the same with CONFIG_MII and 
drivers/net/mii.ko as well.


Originally it was done purely in the Makefile, but that does not account 
for net drivers in sub-directories (or out of tree as you point out).


speaking of that, shouldn't there be a NETIF_F_LRO ?

Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fib_trie: whitespace cleanup

2007-07-30 Thread Paul E. McKenney
On Thu, Jul 26, 2007 at 09:56:30PM -0700, Andrew Morton wrote:
 On Thu, 26 Jul 2007 08:44:21 -0700 Paul E. McKenney [EMAIL PROTECTED] 
 wrote:
 
  On Thu, Jul 26, 2007 at 11:49:42AM +0100, Stephen Hemminger wrote:
   Whitespace cleanup run code through lindent then cleanup results.
   Applys after other two patches.
   
   --- a/net/ipv4/fib_trie.c 2007-07-26 10:17:21.0 +0100
   +++ b/net/ipv4/fib_trie.c 2007-07-26 11:47:52.0 +0100
   @@ -156,7 +156,8 @@ struct trie {
};
   
static void put_child(struct trie *t, struct tnode *tn, int i, struct 
   node *n);
   -static void tnode_put_child_reorg(struct tnode *tn, int i, struct node 
   *n, int wasfull);
   +static void tnode_put_child_reorg(struct tnode *tn, int i, struct node 
   *n,
   +   int wasfull);
static struct node *resize(struct trie *t, struct tnode *tn);
static struct tnode *inflate(struct trie *t, struct tnode *tn);
static struct tnode *halve(struct trie *t, struct tnode *tn);
   @@ -167,13 +168,12 @@ static struct trie *trie_local = NULL, *
   
static inline struct tnode *node_parent(struct node *node)
{
   - return rcu_dereference((struct tnode *) (node-parent  
   ~NODE_TYPE_MASK));
   + return rcu_dereference((struct tnode *)(node-parent  
   ~NODE_TYPE_MASK));
  
  The potential issue is applying rcu_dereference() to an rvalue
  as opposed to an lvalue.  So how about the following?
  
 
 I did this:
 
 static inline struct tnode *node_parent(struct node *node)
 {
   struct tnode *ret;
 
   ret = (struct tnode *)(node-parent  ~NODE_TYPE_MASK);
   return rcu_dereference(ret);
 }

I would feel more comfortable with the rcu_dereference() covering the
initial fetch from node-parent, but do not have any hard objections
to your approach.

Thanx, Paul
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6 0/2] [TCP]: Fix bidirectional brokeness

2007-07-30 Thread Ilpo Järvinen
Hi Dave,

While testing frto with bidirection TCP a while (months) ago, I 
encountered time-seq graphs which made absolutely no sense as if 
recoveries only completed after RTO. As a result, I noticed that 
rate-halving has problem when a flow is bidirection but testing the
patch has been on my todo list for ages.  ...Finally, here it is...
...Please think the first one a bit because there might be some
corner cases with reno.

While testing it I came across with an additional issue that can
occur with bidirectional traffic. And, even better, the second fix
seems to also solves a third issue which affects both unidirectional
and bidirectional flows, though it's a marginal case (cumulative
ACK that causes a larger number of new SACKed skbs). I've never
seen that third one to occur but it's there if you get enough ACK
losses, subsequent ACKs (if one gets them) do solve it so that's
not a very bad issue to begin with...

I've verified these from time-seq graphs on top of tcp-2.6, which
had some additional (mostly cleanup and the rebase I promised you
earlier) though I added the tp-fackets_out  tp-reordering check
afterwards as it seems necessary to avoid going to lost marker too
often (wouldn't have had any effect in my test case anyway).

...Please consider to net-2.6 and to stable too.

These will generate you some hassle when you rebase tcp-2.6. Btw, you 
forgot to push tcp-2.6 out last time though I could assume it's
state... :-) In case you're going to now push it out, could you please
drop [TCP]: Remove num_acked0 checks from cong.ctrl mods pkts_acked
from it as it seems to be out of place in tcp-2.6, I can resubmit it
to net-2.6.24 when you open it (unless you want to put it to net-2.6 
directly as it's rather trivial one).


-- 
 i.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6 1/2] [TCP]: Fix ratehalving with bidirectional flows

2007-07-30 Thread Ilpo Järvinen

Actually, the ratehalving seems to work too well, as cwnd is
reduced on every second ACK even though the packets in flight
remains unchanged. Recoveries in a bidirectional flows suffer
quite badly because of this, both NewReno and SACK are affected.

After this patch, rate halving is performed for ACK only if
packets in flight was supposedly changed too.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   23 +--
 1 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0163051..767f92c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1848,19 +1848,22 @@ static inline u32 tcp_cwnd_min(const struct sock *sk)
 }
 
 /* Decrease cwnd each second ack. */
-static void tcp_cwnd_down(struct sock *sk)
+static void tcp_cwnd_down(struct sock *sk, int flag)
 {
struct tcp_sock *tp = tcp_sk(sk);
int decr = tp-snd_cwnd_cnt + 1;
+   
+   if ((flagFLAG_FORWARD_PROGRESS) ||
+   (IsReno(tp)  !(flagFLAG_NOT_DUP))) {
+   tp-snd_cwnd_cnt = decr1;
+   decr = 1;
 
-   tp-snd_cwnd_cnt = decr1;
-   decr = 1;
+   if (decr  tp-snd_cwnd  tcp_cwnd_min(sk))
+   tp-snd_cwnd -= decr;
 
-   if (decr  tp-snd_cwnd  tcp_cwnd_min(sk))
-   tp-snd_cwnd -= decr;
-
-   tp-snd_cwnd = min(tp-snd_cwnd, tcp_packets_in_flight(tp)+1);
-   tp-snd_cwnd_stamp = tcp_time_stamp;
+   tp-snd_cwnd = min(tp-snd_cwnd, tcp_packets_in_flight(tp)+1);
+   tp-snd_cwnd_stamp = tcp_time_stamp;
+   }
 }
 
 /* Nothing was retransmitted or returned timestamp is less
@@ -2057,7 +2060,7 @@ static void tcp_try_to_open(struct sock *sk, int flag)
}
tcp_moderate_cwnd(tp);
} else {
-   tcp_cwnd_down(sk);
+   tcp_cwnd_down(sk, flag);
}
 }
 
@@ -2257,7 +2260,7 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
 
if (is_dupack || tcp_head_timedout(sk))
tcp_update_scoreboard(sk);
-   tcp_cwnd_down(sk);
+   tcp_cwnd_down(sk, flag);
tcp_xmit_retransmit_queue(sk);
 }
 
-- 
1.5.0.6


[PATCH net-2.6 2/2] [TCP]: Bidir flow must not disregard SACK blocks for lost marking

2007-07-30 Thread Ilpo Järvinen

It's possible that new SACK blocks that should trigger new LOST
markings arrive with new data (which previously made is_dupack
false). In addition, I think this fixes a case where we get
a cumulative ACK with enough SACK blocks to trigger the fast
recovery (is_dupack would be false there too).

I'm not completely pleased with this solution because readability
of the code is somewhat questionable as 'is_dupack' in SACK case
is no longer about dupacks only but would mean something like
'lost_marker_work_todo' too... But because of Eifel stuff done
in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to
the if statement which seems attractive solution. Nevertheless,
I didn't like adding another variable just for that either... :-)

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 767f92c..cfe6ac7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2109,7 +2109,10 @@ tcp_fastretrans_alert(struct sock *sk, u32 prior_snd_una,
 {
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
-   int is_dupack = (tp-snd_una == prior_snd_una  !(flagFLAG_NOT_DUP));
+   int is_dupack = (tp-snd_una == prior_snd_una 
+(!(flagFLAG_NOT_DUP) ||
+ ((flagFLAG_DATA_SACKED) 
+  (tp-fackets_out  tp-reordering;
 
/* Some technical things:
 * 1. Reno does not count dupacks (sacked_out) automatically. */
-- 
1.5.0.6


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Andrew Gallatin


Here is a quick reply before something more official can
be written up:

Linas Vepstas wrote:

 -- what is LRO?

Large Receive Offload

 -- Basic principles of operation?

LRO is analogous to a receive side version of TSO.  The NIC (or
driver) merges several consecutive segments from the same connection,
fixing up checksums, etc.  Rather than up to 45 separate 1500 byte
frames (meaning up to 45 trips through the network stack), the driver
merges them into one 65212 byte frame.  It currently works only
with TCP over IPv4.

LRO was, AFAIK, first though of by Neterion.  They had a paper about
it at OLS2005.
http://www.linuxinsight.com/files/ols2005/grossman-reprint.pdf

 -- Can I use it in my driver?

Yes, it can be used in any driver.

 -- Does my hardware have to have some special feature before I can 
use it?


No.

 -- What sort of performance improvement does it provide? Throughput?
Latency? CPU usage? How does it affect DMA allocation? Does it
improve only a certain type of traffic (large/small packets, etc.)

The benefit is directly proportional to the packet rate.

See my reply to the previous RFC for performance information.  The
executive summary is that for the myri10ge 10GbE driver on low end
hardware with 1500b frames, I've seen it increase throughput by a
factor of nearly 2.5x, while at the same time reducing CPU utilization
by 17%.  The affect for jumbo frames is less dramatic, but still
impressive (1.10x, 14% CPU reduction)

You can achieve better speedups if your driver receives into
high-order pages.

 -- Example code? What's the API? How should my driver use it?

The 3/4 in this patch showed an example of converting a driver
to use LRO for skb based receive buffers.   I'm working on
a patch for myri10ge that shows how you would use it in a driver
which receives into pages.

Cheers,

Drew
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec

2007-07-30 Thread Christoph Hellwig
On Mon, Jul 30, 2007 at 08:09:53PM +0530, pravin wrote:
 comments?


please avoid using plain sendmsg/recvmsg from kernelspace.  Adding these
segment checks is more than a bad hack and not something that should
clutter up fasthpathes.  Add to that problems with kthreads vs blocking
recvmsg and you have a bigger mess than you can handle.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] SMSC LAN911x and LAN921x vendor driver

2007-07-30 Thread Steve . Glendinning
Hi Peter,

 What's the problem with Dustin's driver? It seems to work fine here
 with a lan9117. Why not just add lan921x support to the existing
 driver?

I have heard Dustin's driver works very well on PXA, but on others it 
doesn't even compile (hence why it depends on ARCH_PXA).

Dustin's driver was based on the smc91x code, but these two ethernet 
devices share nothing other than a similar name.  smsc911x is a new, 
completely platform-independent driver with workarounds for several 
hardware issues, and it doesn't suffer from quite as much macro abuse :-)

Regards,
--
Steve Glendinning
SMSC GmbH
m: +44 777 933 9124
e: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linksys Gigabit USB2.0 adapter (asix) regression

2007-07-30 Thread David Hollis
On Mon, 2007-07-30 at 11:27 +0200, Erik Slagter wrote:

 They are either garbled are they are not passed on the wire. The
 transmitted packets are shown by tshark, but a tshark run on the other
 end of the line does not show them.
 
 Platform is indeed x86, to be precise: fedora 7, kernel 2.6.22-rc6, cpu
 pentium M, dell laptop inspiron 9300, ICH6.
 
 If you want me to test something please yell, it's no trouble at all to
 change a few lines in the driver's source and recompile the module.
 

Could you send me a complete dmesg dump when the driver is compiled with
DEBUG enabled (at least from then usb logs that the device was inserted
to the end).  I'll need to see what it reports the values of the
registers.

Have you tried using the F7 2.6.22 kernel?  I know that has worked fine
for me on my system.

 Please note I cannot send mail to you: (conversation with
 dhollis.dyndns.org[71.251.104.159] timed out while sending MAIL FROM)

I've fixed that issue so my mail delivery isn't sporadic.

-- 
David Hollis [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[TULIP] Need new maintainer

2007-07-30 Thread Valerie Henson
The Tulip network driver needs a new maintainer!  I no longer have
time to maintain the Tulip network driver and I'm stepping down.  Jeff
Garzik would be happy to get volunteers.

The only current major outstanding patch I know of is Grant's shutdown
race patch, which was incorrectly dropped as obsoleted from -mm (my
fault, I was moving at the time):

http://www.mail-archive.com/[EMAIL PROTECTED]/msg12161.html

I have a very much non-working patch to do it with the preferred
order, ask me for it and I'll see if I can dig it up.  It's unpleasant
partly because it pointed out a lot of latent bugs (e.g.,
del_timer_sync() in interrupt context).

Also, someone is working on support for an emulated Tulip card (yes,
Tulip will _never_ die), so expect possible patches for that.

-VAL
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] tulip: Remove tulip maintainer

2007-07-30 Thread Valerie Henson
Remove Val Henson as tulip maintainer and let her roam free, FREE!

Signed-off-by: Val Henson [EMAIL PROTECTED]

--- linux-2.6.orig/MAINTAINERS
+++ linux-2.6/MAINTAINERS
@@ -3569,11 +3569,9 @@ W:   http://www.auk.cx/tms380tr/
 S: Maintained
 
 TULIP NETWORK DRIVER
-P: Valerie Henson
-M: [EMAIL PROTECTED]
 L: [EMAIL PROTECTED]
 W: http://sourceforge.net/projects/tulip/
-S: Maintained
+S: Orphan
 
 TUN/TAP driver
 P: Maxim Krasnyansky
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] possible deadlock in tulip driver

2007-07-30 Thread Valerie Henson
(No longer maintainer, btw.)

What situation have you tested this under?  Thanks,

-VAL

On Tue, Jul 24, 2007 at 11:49:08AM +0400, Denis V. Lunev wrote:
 Calling flush_scheduled_work() may deadlock if called under rtnl_lock
 (from dev-stop) as linkwatch_event() may be on the workqueue and it will try
 to get the rtnl_lock
 
 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
 ---
 
  tulip_core.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 --- ./drivers/net/tulip/tulip_core.c.tulip2007-07-16 12:54:29.0 
 +0400
 +++ ./drivers/net/tulip/tulip_core.c  2007-07-23 19:06:24.0 +0400
 @@ -726,8 +726,6 @@ static void tulip_down (struct net_devic
   void __iomem *ioaddr = tp-base_addr;
   unsigned long flags;
  
 - flush_scheduled_work();
 -
   del_timer_sync (tp-timer);
  #ifdef CONFIG_TULIP_NAPI
   del_timer_sync (tp-oom_timer);
 @@ -1788,6 +1786,8 @@ static void __devexit tulip_remove_one (
   if (!dev)
   return;
  
 + flush_scheduled_work();
 +
   tp = netdev_priv(dev);
   unregister_netdev(dev);
   pci_free_consistent (pdev,
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TULIP] Need new maintainer

2007-07-30 Thread Kyle McMartin
On Mon, Jul 30, 2007 at 01:04:13PM -0600, Valerie Henson wrote:
 The Tulip network driver needs a new maintainer!  I no longer have
 time to maintain the Tulip network driver and I'm stepping down.  Jeff
 Garzik would be happy to get volunteers.
 

Since I already take care of a major consumer of these devices (parisc,
which pretty much all have tulip) I'm willing to take care of this.
Alternately, Grant is probably willing.

Cheers,
Kyle
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in write_bulk_callback() in drivers/net/usb/pegasus.c

2007-07-30 Thread Jeff Garzik

Micah Gruber wrote:
This patch fixes a potential null dereference bug where we dereference 
pegasus before a null check. This patch simply moves the dereferencing 
after the null check.


Signed-off-by: Micah Gruber [EMAIL PROTECTED]

---

--- a/drivers/net/usb/pegasus.c
+++ b/drivers/net/usb/pegasus.c
@@ -768,11 +768,13 @@
static void write_bulk_callback(struct urb *urb)
{
   pegasus_t *pegasus = urb-context;
-   struct net_device *net = pegasus-net;
+   struct net_device *net;

   if (!pegasus)
   return;

+   net = pegasus-net;
+


applied (manually)


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] gfar: Fix modpost warning

2007-07-30 Thread Jeff Garzik

Kumar Gala wrote:

Fix the following modpost warning:

WARNING: vmlinux.o(.init.text+0x1aa6c): Section mismatch: reference to 
.exit.text:gfar_mdio_exit (between 'gfar_init' and 'gfar_mdio_init')

Signed-off-by: Kumar Gala [EMAIL PROTECTED]
---
 drivers/net/gianfar_mii.c |2 +-
 drivers/net/gianfar_mii.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][netdrvr] lib8390: comment on locking by Alan Cox Re: 2.6.20-2.6.21 - networking dies after random time

2007-07-30 Thread Jeff Garzik

Jarek Poplawski wrote:

Hi,

Very below is my patch proposal with a comment, which in my opinion
is precious enough to save it for future help in reading and
understanding the code.

I hope Alan will not blame me I've not asked for his permission before
sending, and he would ack this patch as it is or at least most of this.

Thanks  regards,
Jarek P.

On Wed, Jul 25, 2007 at 03:46:56PM +0100, Alan Cox wrote:

The code in question lib8390.c does

disable_irq();
fiddle_with_the_network_card_hardware()
enable_irq();

...

No idea how this affects the network card, as the code there must be
able to handle interrupts, which are not originated from the card due to
interrupt sharing.

I think, in this last yesterday's patch Ingo could be right, yet!
The comment at the beginnig points this is done like that because
of chip's slowness. And problems with timing are mysterious.

On the other hand author of this code didn't use spin_lock_irqsave
for some reason, probably after testing this option too. So, I hope
this is the right path, but alas, I'm not sure this patch has to
prove this 100%.

The author (me) didn't use spin_lock_irqsave because the slowness of the
card means that approach caused horrible problems like losing serial data
at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA
chips with FPGA front ends.


Anyway, in my opinion this situation where interrupts could/have_to
be used for such strange things should confirm the need of more
options for handling irqs individually.

Ok the logic behind the 8390 is very simple:

Things to know
- IRQ delivery is asynchronous to the PCI bus
- Blocking the local CPU IRQ via spin locks was too slow
- The chip has register windows needing locking work

So the path was once (I say once as people appear to have changed it
in the mean time and it now looks rather bogus if the changes to use
disable_irq_nosync_irqsave are disabling the local IRQ)


Take the page lock
Mask the IRQ on chip
Disable the IRQ (but not mask locally- someone seems to have
broken this with the lock validator stuff)
[This must be _nosync as the page lock may otherwise
deadlock us]
Drop the page lock and turn IRQs back on

At this point an existing IRQ may still be running but we can't
get a new one

Take the lock (so we know the IRQ has terminated) but don't mask
the IRQs on the processor
Set irqlock [for debug]

Transmit (slow as )

re-enable the IRQ


We have to use disable_irq because otherwise you will get delayed
interrupts on the APIC bus deadlocking the transmit path.

Quite hairy but the chip simply wasn't designed for SMP and you can't
even ACK an interrupt without risking corrupting other parallel
activities on the chip.

Alan


--

From: Jarek Poplawski [EMAIL PROTECTED]

Subject: lib8390: comment on locking by Alan Cox

Additional explanation of problems with locking by Alan Cox.

Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]
Cc: Alan Cox [EMAIL PROTECTED]
Cc: Paul Gortmaker [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tulip: Remove tulip maintainer

2007-07-30 Thread Jeff Garzik

Valerie Henson wrote:

Remove Val Henson as tulip maintainer and let her roam free, FREE!

Signed-off-by: Val Henson [EMAIL PROTECTED]


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] forcedeth: mac address correct

2007-07-30 Thread Jeff Garzik

Ayaz Abdulla wrote:

Resending:

In older chipsets, the mac address was stored in reversed order.
However, in newer chipsets, the mac address is in correct order. This
patch takes those newer chipsets into account and does not rely on a
special bit setup by BIOS'.

Signed-Off-By: Ayaz Abdulla [EMAIL PROTECTED]


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND 1/2] netxen: re-init station address after h/w init

2007-07-30 Thread Jeff Garzik

[EMAIL PROTECTED] wrote:

This is a workaround for firmware bug with 2nd port of multiport adapter,
where MAC address is reset. Driver just needs to overwrite it with the
value read from PROM.

Signed-off-by: Dhananjay Phadke [EMAIL PROTECTED]


applied 1-2


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netxen: drop redudant spinlock

2007-07-30 Thread Jeff Garzik

Dhananjay Phadke wrote:

Some leftover code that makes use of adapter-lock in tx_timeout function,
which resets the interface under this lock. In close() when the workqueue
is flushed, prints the warning about sleeping with interrupts disabled
(when spinlock debug is enabled). The lock was required with private netxen
IOCTLs, which were removed a while ago.


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.22 1/3]S2io: Mask spurious interrupts

2007-07-30 Thread Jeff Garzik

Ramkrishna Vepa wrote:

- Mask single and double bit ETQ ecc errors to inhibit spurious interrupts.

(Resending; Removed HTML sections in the patch)

Signed-off-by: Santosh Rastapur [EMAIL PROTECTED]


applied 1-3


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in mace_interrupt() in drivers/net/pcmcia/nmclan_cs.c

2007-07-30 Thread Jeff Garzik

Micah Gruber wrote:
This patch fixes a potential null dereference bug where we dereference 
DEV before a null check. This patch simply moves the dereferencing after 
the null check.


Signed-off-by: Micah Gruber [EMAIL PROTECTED]


applied (git-am worked!)


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.22 1/4]S2IO: Removing 3 buffer mode support from the driver

2007-07-30 Thread Jeff Garzik

Veena Parat wrote:

 - Removed 3 buffer mode support from driver - unused feature
 - Incorporated Jeff Garzik's comments on elimination of inline typecasting
 - Code cleanup : Removed a few extra spaces

Signed-off-by: Veena Parat [EMAIL PROTECTED]


applied 1-4


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Stephen Hemminger
Could someone add this to:
http://linux-net.osdl.org Wiki?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [drivers/net/cxgb3] removed several unneeded zero initilization

2007-07-30 Thread Jeff Garzik

Denis Cheng wrote:

Cc: [EMAIL PROTECTED]
Signed-off-by: Denis Cheng [EMAIL PROTECTED]
---
 drivers/net/cxgb3/cxgb3_main.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)


applied


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [NET] IOC3: Switch hw checksumming to ethtool configurable.

2007-07-30 Thread Jeff Garzik

Ralf Baechle wrote:

Signed-off-by: Ralf Baechle [EMAIL PROTECTED]


applied to #upstream (2.6.24)



I've previously sent out this patch a long time ago.  At that time I was
told NETIF_F_IP_CSUM wouldn't make any sense without NETIF_F_SG.  IOC3's
S/G abilities are very limited; it can do upto three segments of which
the first one is upto 104 bytes and part of the packet's TX ring entry,
the second and 3rd ones can be anywhere in the 64-bit PCI address space
but may not cross a 16kB page boundary.  So setting NETIF_F_SG isn't
really an option unless the IOC3 was going to linearize any packet it
can't cope with itself.

So the big question, does NETIF_F_IP_CSUM without NETIF_F_SG make sense?


Conventional wisdom has always been that NETIF_F_SG is required if 
NETIF_F_*CSUM is present, and vice versa.


I admit I've not verified this in the past year or two.

Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2.6.23 1/2] Make the iw_cxgb3 module parameters writable.

2007-07-30 Thread Roland Dreier
ugh, missed these before my last merge...

anyway:

why do we want to parameters writable?  a good changelog tells me
what, why and how, and this changelog just covered the what.  Also,
I assume you've checked that it's OK for these variables to change at
any time?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 7/7] net/bonding: Delay sending of gratuitous ARP to avoid failure

2007-07-30 Thread Jay Vosburgh

Moni Shoua [EMAIL PROTECTED] wrote:

Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev-state field is on. This improves the chances for the arp packet to
be transmitted.

Under what circumstances were you seeing problems that delaying
the gratuitous ARP until linkwatch is done improves things?  Is this
really an IB thing, or did you experience problems here over regular
ethernet?

Signed-off-by: Moni Shoua [EMAIL PROTECTED]
---
 drivers/net/bonding/bond_main.c |   25 +
 drivers/net/bonding/bonding.h   |1 +
 2 files changed, 22 insertions(+), 4 deletions(-)

Index: net-2.6/drivers/net/bonding/bond_main.c
===
--- net-2.6.orig/drivers/net/bonding/bond_main.c   2007-07-25 
15:33:25.0 +0300
+++ net-2.6/drivers/net/bonding/bond_main.c2007-07-26 18:42:59.296296622 
+0300
@@ -1134,8 +1134,13 @@ void bond_change_active_slave(struct bon
   if (new_active  !bond-do_set_mac_addr)
   memcpy(bond-dev-dev_addr,  new_active-dev-dev_addr,
   new_active-dev-addr_len);
-
-  bond_send_gratuitous_arp(bond);
+  if (bond-curr_active_slave 
+  test_bit(__LINK_STATE_LINKWATCH_PENDING, 
bond-curr_active_slave-dev-state)){
+  dprintk(delaying gratuitous arp on 
%s\n,bond-curr_active_slave-dev-name);
+  bond-send_grat_arp=1;
+  }else{
+  bond_send_gratuitous_arp(bond);
+  }

Style issues throughout the patch series: many lines are too
long, many things are all smashed together, e.g., }else{ instead of
} else {, send_grat_arp=1 instead of send_grat_arp = 1, and so on.

   }
 }

@@ -2120,6 +2125,15 @@ void bond_mii_monitor(struct net_device 
* program could monitor the link itself if needed.
*/

+  if (bond-send_grat_arp) {
+  if (bond-curr_active_slave  
test_bit(__LINK_STATE_LINKWATCH_PENDING, bond-curr_active_slave-dev-state))
+  dprintk(Needs to send gratuitous arp but not 
yet\n,__FUNCTION__);
+  else {
+  dprintk(sending delayed gratuitous arp on 
ond-curr_active_slave-dev-name\n);
+  bond_send_gratuitous_arp(bond);
+  bond-send_grat_arp=0;
+  }
+  }


   read_lock(bond-curr_slave_lock);
   oldcurrent = bond-curr_active_slave;
   read_unlock(bond-curr_slave_lock);
@@ -2513,6 +2527,7 @@ static void bond_send_gratuitous_arp(str
   struct slave *slave = bond-curr_active_slave;
   struct vlan_entry *vlan;
   struct net_device *vlan_dev;
+  int i;

   dprintk(bond_send_grat_arp: bond %s slave %s\n, bond-dev-name,
   slave ? slave-dev-name : NULL);
@@ -2520,8 +2535,9 @@ static void bond_send_gratuitous_arp(str
   return;

   if (bond-master_ip) {
-  bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip,
-bond-master_ip, 0);
+  for (i=0;i3;i++)
+  bond_arp_send(slave-dev, ARPOP_REPLY, bond-master_ip,
+bond-master_ip, 0);
   }

If you delay the grat ARP until linkwatch is done, why is it
also necessary to shotgun several ARPs instead of one?  Why are the ARPs
sent for VLANs not also shotgunned in a similar fashion?

If shotgunning like this really is useful, would it not make
more sense to space them out a bit, e.g., over successive monitor
passes?

   list_for_each_entry(vlan, bond-vlan_list, vlan_list) {
@@ -4331,6 +4347,7 @@ static int bond_init(struct net_device *
   bond-current_arp_slave = NULL;
   bond-primary_slave = NULL;
   bond-dev = bond_dev;
+  bond-send_grat_arp=0;
   INIT_LIST_HEAD(bond-vlan_list);

   /* Initialize the device entry points */
Index: net-2.6/drivers/net/bonding/bonding.h
===
--- net-2.6.orig/drivers/net/bonding/bonding.h 2007-07-25 15:20:10.0 
+0300
+++ net-2.6/drivers/net/bonding/bonding.h  2007-07-26 18:42:43.652087660 
+0300
@@ -203,6 +203,7 @@ struct bonding {
   struct   vlan_group *vlgrp;
   struct   packet_type arp_mon_pt;
   s8   do_set_mac_addr;
+  int  send_grat_arp;

This need not be a full int, and (this applies to
do_set_mac_addr, also) could probably be squeezed into gaps already
existing within the struct bonding somewhere.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposed interface for per-packet mesh-ttl

2007-07-30 Thread Javier Cardona
Hi Stephen,

On 7/27/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
 In this case perhaps you can have a table that maps skb-priority to
 mesh ttl? priorty can already by handled by existing setsockopt calls,
 and modified by netfilter and QoS managements.

Thanks for the feedback.  IMHO overloading the [SOL_SOCKET]
SO_PRIORITY  sockoption for something as different as changing the
mesh-ttl is quite a stretch.   But I admit that it meets the driver
requirements of never add sockopt and not [ab]use netfilter hooks.
 If that's acceptable to all, I'll modify the patch in that direction.

Cheers,

Javier
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Andrew Gallatin

Jan-Bernd Themann wrote:

Hi,

this patch set contains the latest generic LRO code, a Kconfig / Makefile
and an eHEA patch demonstrating how the aggregate SKB interface has to
to be used.
Drew, could you provide a patch for the myri10ge driver to show how the
receive in page interface works?


Hi,

Thank you for the many fixes, and especially for the counters!

I was working on testing the myri10ge patch, and I ran into a few
problems.  I've attached a patch to inet_lro.c to fix some of them,
and a patch to myri10ge.c to show how to use the page based
interface.  Both patches are signed off by Andrew Gallatin
[EMAIL PROTECTED]

First, the LRO_MAX_PG_HLEN is still a problem.  Minimally sized 60
byte frames still cause problems in lro_gen_skb due to skb-len
going negative.  Fixed in the attached patch.  It may be simpler
to just drop LRO_MAX_PG_HLEN to ETH_ZLEN, but I'm not sure if
that is enough.  Are there smart NICs which might chop off padding
themselves?

Second, you still need to set skb-ip_summed = CHECKSUM_UNNECESSARY
when modified packets are flushed, else the stack will see bad
checksums for packets from CHECKSUM_COMPLETE drivers using the
skb interface.  Fixed in the attached patch.

Third, for some reason I have yet to figure out, this version of the
patch takes a while to ramp up, but only when the receiver MTU
is 9000 and the sender MTU is 1500.  If the receiver MTU is also
1500, even a 10 second netperf easily shows line rate.  I don't
see this with our LRO, and I'm so far at a loss to explain it.

Here is the first 3 seconds of  output from a simple program
which diffs the interface counters once / sec.
   Ipkts   IBytesOpkts   Obytes

rx MTU=9000:
   0000
  7299092   14 1102
 105   1589707  420
  750324   113598708417804  1068240
  814427   123304247818529  740



rx MTU=1500
   0000
  44134466818016813132   788182
  815938   123532865618716  1122960
  817698   123799477218628  1117680
.

Once it has ramped up, the bandwidth is fine, and there
doesn't seem to be much difference between setting the
receiver MTU to 1500 or 9000.  But it takes a very long
netperf run to overcome the ramp up period.

Fourth, I did some traffic sniffing to try to figure out what's going
on above, and saw tcpdump complain about bad checksums.  Have you tried
running tcpdump -s 65535 -vvv?  Have you also seen bad checksums?
I seem to see this for both page- and skb-based versions of the driver.

Last, the pointer to the TCP header in __lro_proc_segment() in the
unaccelerated vlan hdr case needs to also be offset by vlan_hdr_len
from skb-data.   I've fixed this in the attached patch.



Thanks,

Drew
--- linux-2.6.23-rc1.orig/net/ipv4/inet_lro.c   2007-07-30 13:10:49.0 
-0400
+++ linux-2.6.23-rc1/net/ipv4/inet_lro.c2007-07-30 15:17:51.0 
-0400
@@ -126,6 +126,7 @@ static void lro_update_tcp_ip_header(str
htons(lro_desc-ip_tot_len) -
IP_HDR_LEN(iph), IPPROTO_TCP,
lro_desc-data_csum);
+   lro_desc-parent-ip_summed = CHECKSUM_UNNECESSARY;
 }
 
 static __wsum lro_tcp_data_csum(struct iphdr *iph, struct tcphdr *tcph, int 
len)
@@ -396,6 +397,9 @@ static struct sk_buff *lro_gen_skb(struc
if (!skb)
return NULL;
 
+   if (hlen  len)
+   hlen = len;
+
skb-len = len;
skb-data_len = len - hlen;
skb-truesize += true_size;
diff -urp a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
--- a/drivers/net/myri10ge/myri10ge.c   2007-07-24 15:57:12.0 -0400
+++ b/drivers/net/myri10ge/myri10ge.c   2007-07-30 16:12:06.0 -0400
@@ -48,6 +48,7 @@
 #include linux/etherdevice.h
 #include linux/if_ether.h
 #include linux/if_vlan.h
+#include linux/inet_lro.h
 #include linux/ip.h
 #include linux/inet.h
 #include linux/in.h
@@ -62,6 +63,8 @@
 #include linux/io.h
 #include linux/log2.h
 #include net/checksum.h
+#include net/ip.h
+#include net/tcp.h
 #include asm/byteorder.h
 #include asm/io.h
 #include asm/processor.h
@@ -89,6 +92,7 @@ MODULE_LICENSE(Dual BSD/GPL);
 
 #define MYRI10GE_EEPROM_STRINGS_SIZE 256
 #define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2)
+#define MYRI10GE_MAX_LRO_DESCRIPTORS 8
 
 #define MYRI10GE_NO_CONFIRM_DATA htonl(0x)
 #define MYRI10GE_NO_RESPONSE_RESULT 0x
@@ -151,6 +155,8 @@ struct myri10ge_rx_done {
dma_addr_t bus;
int cnt;
int idx;
+   struct net_lro_mgr lro_mgr;
+   struct net_lro_desc lro_desc[MYRI10GE_MAX_LRO_DESCRIPTORS];
 };
 
 struct myri10ge_priv {
@@ -276,6 +282,14 @@ static int myri10ge_debug = -1;/* defau
 module_param(myri10ge_debug, int, 0);
 

RE: [PATCH] NET_DMA: remove unused dma_memcpy_to_kernel_iovec

2007-07-30 Thread Waskiewicz Jr, Peter P
 On 7/26/07, David Miller [EMAIL PROTECTED] wrote:
  From: Shannon Nelson [EMAIL PROTECTED]
  Date: Tue, 24 Jul 2007 17:36:06 -0700
 
   (repost - original eaten by vger?)
  
   Al Viro pointed out that dma_memcpy_to_kernel_iovec() really was 
   unreachable and thus unused.  The code originally was there to 
   support in-kernel dma needs, but since it remains unused, 
 we'll pull it out.
  
   Signed-off-by: Shannon Nelson [EMAIL PROTECTED]
 
  Applied, thanks Shannon.
 
 NET_DMA on kernel buffer is pretty useful in ndb, iSCSI 
 target and initiators which uses kernel buffer to receive 
 data. Are there any other issues with dma-memcpy on kernel 
 buffers, if not then following patch makes 
 dma_memcpy_to_kernel_iovec() reachable from tcp_recvmsg.
 I tested this patch and it work fine with unh iSCSI target.
 
 
 comments?
 
 --pravin.

Pravin,
Currently, NET_DMA only has one provider in the kernel, namely
TCP.  As the driver stands now, I'd like to keep this function out since
it really wasn't being used before.  Shannon has posted patches that
will hopefully make it into 2.6.24 that allow multiple clients to
request DMA channels, so things like iSCSI, ndb, UDP, and NFS can ask
for channels without depending on TCP.  Once we get to that point, if we
see the need for the function, we can add it back.
We have tried getting iSCSI accelerated through NET_DMA already,
and didn't see the performance boost that made it worthwhile to do.
Once the multiple client channel patches are in, we can revisit that,
since it can be done right.

Cheers,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposed interface for per-packet mesh-ttl

2007-07-30 Thread Stephen Hemminger
On Mon, 30 Jul 2007 13:37:20 -0700
Javier Cardona [EMAIL PROTECTED] wrote:

 Hi Stephen,
 
 On 7/27/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
  In this case perhaps you can have a table that maps skb-priority to
  mesh ttl? priorty can already by handled by existing setsockopt calls,
  and modified by netfilter and QoS managements.
 
 Thanks for the feedback.  IMHO overloading the [SOL_SOCKET]
 SO_PRIORITY  sockoption for something as different as changing the
 mesh-ttl is quite a stretch.   But I admit that it meets the driver
 requirements of never add sockopt and not [ab]use netfilter hooks.
  If that's acceptable to all, I'll modify the patch in that direction.
 
 Cheers,
 
 Javier

Take it as idea not a proscription.  You can use TC rules to rewrite
the priority.  It is kind of like how priority is being mapped to TOS already
in softmac. Alternatively, the TTL can be set with sockopt, but it is a IP so
it would need an IP ttl to mesh mapping. The fundamental thing is to try
and avoid topology specific options bleeding all the way up the socket layer,
especially since the network layer is involved and may need to multipath.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 0/7] net/bonding: ADD IPoIB support for the bonding driver

2007-07-30 Thread Roland Dreier
  1. When bonding enslaves an IPoIB device the bonding neighbor holds a 
  reference to a cleanup function in the IPoIB drives. This makes it unsafe to 
  unload the IPoIB module if there are bonding neighbors in the air. So, to 
  avoid this race one must unload bonding before unloading IPoIB. 

I think we really want to resolve this somehow.  Getting an oops by
doing modprobe -r ipoib isn't that friendly.

Also, what happened to the problem of having an address handle
belonging to the wrong device on bond failover?  Did you figure out a
way to fix that one?

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread David Miller
From: Jeff Garzik [EMAIL PROTECTED]
Date: Mon, 30 Jul 2007 12:17:58 -0400

 David, thoughts on merging?  I'm not We could stick this into your tree 
 or mine.  Whether yours or mine, I would like to keep the driver and 
 net-core patches together in the same git tree.

No objections and I'll stick it into my net-2.6.24 tree once I cut
that, I'll have the NAPI stuff in there too so I'll keep these two
merge nightmares out of your hair :-)

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver fixes

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Jeff Garzik wrote:

 true, we should just remove the dev==NULL check

Patch below:

[PATCH] nmclan_cs: Remove bogus (dev==NULL) check in mace_interrupt()

The (dev == NULL) check in drivers/net/pcmcia/nmclan_cs.c:mace_interrupt()
handler is always false, so let's remove it.

Signed-off-by: Satyam Sharma [EMAIL PROTECTED]

---

 drivers/net/pcmcia/nmclan_cs.c |6 --
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/drivers/net/pcmcia/nmclan_cs.c b/drivers/net/pcmcia/nmclan_cs.c
index 73da611..3446be9 100644
--- a/drivers/net/pcmcia/nmclan_cs.c
+++ b/drivers/net/pcmcia/nmclan_cs.c
@@ -1000,12 +1000,6 @@ static irqreturn_t mace_interrupt(int irq, void *dev_id)
   int status;
   int IntrCnt = MACE_MAX_IR_ITERATIONS;
 
-  if (dev == NULL) {
-DEBUG(2, mace_interrupt(): irq 0x%X for unknown device.\n,
- irq);
-return IRQ_NONE;
-  }
-
   if (lp-tx_irq_disabled) {
 printk(
   (lp-tx_irq_disabled?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC]: napi_struct V4

2007-07-30 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Mon, 30 Jul 2007 08:04:24 -0700

 IPoIB can cope but it really seems like an unfortunate feature of
 these changes that we can't do something like what we have today,
 which imposes no overhead unless an event actually lands in the race
 window.

It is not a feature of these changes.

These races exist in the current code, it is a bug fix.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pktgen - define and use pktgen_dbg,err,warn,info

2007-07-30 Thread David Miller
From: Joe Perches [EMAIL PROTECTED]
Date: Wed, 18 Jul 2007 16:21:00 -0700

 On Wed, 2007-07-18 at 15:49 -0700, David Miller wrote:
  From: Joe Perches [EMAIL PROTECTED]
  Date: Wed, 18 Jul 2007 15:14:13 -0700
  
   -#define VERSION  pktgen v2.68: Packet Generator for packet performance 
   testing.\n
   +#define PKTGEN_NAMEpktgen
   +#define PKTGEN_VERSION v2.68
   +#define PKTGEN_DESCPacket Generator for packet performance testing
   +#define PKTGEN_FULLINFO PKTGEN_NAME   PKTGEN_VERSION :  PKTGEN_DESC 
   .\n
   ...
  
   -static char version[] __initdata = VERSION;
   -
  
  Using PKTGEN_FULLINFO explicitly won't get that string into the
  __initdata section as the version[] will, that's why people do it that
  way with the explicit version[] array.
  
  Please put it back, assign PKTGEN_FULLINFO to it, and use it.
 
 Done. Here you are.
 
 fyi:
 pktgen_info(version) fails compile,
 pktgen_info(%s, version) works.

I still don't know about this patch.  Instead of the simple
transformation:

-   printk(foo);
+   printk(KERN_INFO foo);

we get this new macro, and the lines changes to use that macro.

It seems very over-engineered and adds ifdefs to a foo.c file
which is severly frowned upon.

I'm going to add the needed KERN_* to pktgen.c, that's just so
much nicer a way to handle this.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPVS]: Use skb_forward_csum

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 16:59:39 +0800

 [IPVS]: Use skb_forward_csum
 
 As a path that forwards packets, IPVS should be using
 skb_forward_csum instead of directly setting ip_summed
 to CHECKSUM_NONE.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Patch applied, thanks Herbert.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/6] [NET]: Call uninit if necessary in register_netdevice

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 17:09:08 +0800

 [NET]: Call uninit if necessary in register_netdevice
 
 This patch makes register_netdevice call dev-uninit if the regsitration
 fails after dev-init has completed successfully.  Very few drivers use
 the init/uninit calls but at least one (drivers/net/wan/sealevel.c) may
 leak without this change.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Looks great, applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/6] [NET]: Take dev_base_lock when moving device name hash list entry

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 17:09:30 +0800

 [NET]: Take dev_base_lock when moving device name hash list entry
 
 When we added name-based hashing the dev_base_lock was designated as the
 lock to take when changing the name hash list.  Unfortunately, because
 it was a preexisting lock that just happened to be taken in the right
 spots we neglected to take it in dev_change_name.
 
 The race can affect calles of __dev_get_by_name that do so without taking
 the RTNL.  They may end up walking down the wrong hash chain and end up
 missing the device that they're looking for.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Good catch, applied, thanks Herbert.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/6] [NET] loopback: Panic if registration fails

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 17:09:33 +0800

 [NET] loopback: Panic if registration fails
 
 Because IPv4 and IPv6 both depend on the presence of the loopback device
 to function, failure in registration the loopback device should be fatal.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied.

It might be nice to tightly integrate the loopback driver into
net/core/dev.c because that is what is happening anyways and it would
allow us to cons the thing up by hand and deal with these dependencies
more reliably.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Jeff Garzik

David Miller wrote:

From: Jeff Garzik [EMAIL PROTECTED]
Date: Mon, 30 Jul 2007 12:17:58 -0400

David, thoughts on merging?  I'm not We could stick this into your tree 
or mine.  Whether yours or mine, I would like to keep the driver and 
net-core patches together in the same git tree.


No objections and I'll stick it into my net-2.6.24 tree once I cut
that, I'll have the NAPI stuff in there too so I'll keep these two
merge nightmares out of your hair :-)


Works for me.

Thanks,

Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] [NET]: Allow netdev REGISTER/CHANGENAME events to fail

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 17:09:36 +0800

 [NET]: Allow netdev REGISTER/CHANGENAME events to fail
 
 This patch adds code to allow errors to be passed up from event
 handlers of NETDEV_REGISTER and NETDEV_CHANGENAME.  It also adds
 the notifier_from_errno/notifier_to_errnor helpers to pass the
 errno value up to the notifier caller.
 
 If an error is detected when a device is registered, it causes
 that operation to fail.  A NETDEV_UNREGISTER will be sent to
 all event handlers.
 
 Similarly if NETDEV_CHANGENAME fails the original name is restored
 and a new NETDEV_CHANGENAME event is sent.
 
 As such all event handlers must be idempotent with respect to
 these events.
 
 When an event handler is registered NETDEV_REGISTER events are
 sent for all devices currently registered.  Should any of them
 fail, we will send NETDEV_GOING_DOWN/NETDEV_DOWN/NETDEV_UNREGISTER
 events to that handler for the devices which have already been
 registered with it.  The handler registration itself will fail.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied.

Interesting encoding scheme :-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] [IPV4/IPV6]: Fail registration if inet device construction fails

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 17:09:38 +0800

 [IPV4/IPV6]: Fail registration if inet device construction fails
 
 Now that netdev notifications can fail, we can use this to signal
 errors during registration for IPv4/IPv6.  In particular, if we
 fail to allocate memory for the inet device, we can fail the netdev
 registration.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] [IPV6]: Remove circular dependency on if_inet6.h

2007-07-30 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 17:09:40 +0800

 [IPV6]: Remove circular dependency on if_inet6.h
 
 net/if_inet6.h includes linux/ipv6.h which also tries to include
 net/if_inet6.h.  Since the latter only needs it for forward
 declarations, we can fix this by adding the declarations.
 
 A number of files are implicitly including net/if_inet6.h through
 linux/ipv6.h.  They also use net/ipv6.h so this patch includes
 net/if_inet6.h there.
 
 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Also applied, thanks a lot.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability

2007-07-30 Thread Andrew Morton
On Mon, 30 Jul 2007 08:17:41 +0530
Satyam Sharma [EMAIL PROTECTED] wrote:

 [0/9] netconsole: Multiple targets and dynamic reconfigurability
 
 This is v3 of the patchset

That all looks pretty reasonable, thanks.

Are we all comfortable with the attributions?  As I have it now, everything
will go in as having been authored by yourself.  There is no formal (ie:
understood-by-git) way of recording joint authorship, unfortunately.  But one
can make a record of such things in the changelog.

I'll queue this material up in the to-be-sent-to-davem queue.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs

2007-07-30 Thread Andrew Morton
On Mon, 30 Jul 2007 08:19:10 +0530
Satyam Sharma [EMAIL PROTECTED] wrote:

 +/*
 + * Wrapper over simple_strtol (base 10) with sanity and range checking.
 + * We return (signed) long only because we may want to return errors.
 + * Do not use this to convert numbers that are allowed to be negative.
 + */
 +static long strtol10_check_range(const char *cp, long min, long max)
 +{
 + long ret;
 + char *p = (char *) cp;
 +
 + WARN_ON(min  0);
 + WARN_ON(max  min);
 +
 + ret = simple_strtol(p, p, 10);
 +
 + if (*p  (*p != '\n')) {
 + printk(KERN_ERR netconsole: invalid input\n);
 + return -EINVAL;
 + }
 + if ((ret  min) || (ret  max)) {
 + printk(KERN_ERR netconsole: input %ld must be between 
 + %ld and %ld\n, ret, min, max);
 + return -EINVAL;
 + }
 +
 + return ret;
 +}

There's probably other code around the place which does this.  It might
be worth making this function a general-purpose thing.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc

2007-07-30 Thread David Miller
From: Waskiewicz Jr, Peter P [EMAIL PROTECTED]
Date: Fri, 27 Jul 2007 09:22:11 -0700

   Are these queued for 2.6.24, or are they going to make it into
 2.6.23?  I know you're busy with patches and NAPI, but I was
 curious.  Thanks!

I've applied both fixes and will push them into 2.6.23
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [FIX] NET: Fix sch_api and sch_prio to properly set and detect the root qdisc

2007-07-30 Thread Waskiewicz Jr, Peter P
 -Original Message-
 From: David Miller [mailto:[EMAIL PROTECTED] 
 Sent: Monday, July 30, 2007 5:14 PM
 To: Waskiewicz Jr, Peter P
 Cc: netdev@vger.kernel.org; [EMAIL PROTECTED]
 Subject: Re: [FIX] NET: Fix sch_api and sch_prio to properly 
 set and detect the root qdisc
 
 From: Waskiewicz Jr, Peter P [EMAIL PROTECTED]
 Date: Fri, 27 Jul 2007 09:22:11 -0700
 
  Are these queued for 2.6.24, or are they going to make it into 
  2.6.23?  I know you're busy with patches and NAPI, but I 
 was curious.  
  Thanks!
 
 I've applied both fixes and will push them into 2.6.23


Many thanks Dave!

Cheers,
-PJ
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv6: ipv6_addr_type() doesn't know about RFC4193 addresses

2007-07-30 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 13:34:52 -0500 (CDT)

 In article [EMAIL PROTECTED] (at Wed, 25 Jul 2007 19:49:09 -0400), Dave 
 Johnson [EMAIL PROTECTED] says:
 
  ipv6_addr_type() doesn't check for 'Unique Local IPv6 Unicast
  Addresses' (RFC4193) and returns IPV6_ADDR_RESERVED for that range.
 
 Acked-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

Ok, I've applied Dave's patch.

Dave, although it's customary and fine to use +foo-list email
addresses for mailing list subscriptions and discussions, I
ask that you don't add that cookie to your signoff lines in
patch submissions and I've removed it from your's in this patch.

Thanks a lot.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 -mm 0/9] netconsole: Multiple targets and dynamic reconfigurability

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Andrew Morton wrote:

 On Mon, 30 Jul 2007 08:17:41 +0530
 Satyam Sharma [EMAIL PROTECTED] wrote:
 
  [0/9] netconsole: Multiple targets and dynamic reconfigurability
  
  This is v3 of the patchset
 
 That all looks pretty reasonable, thanks.
 
 Are we all comfortable with the attributions?  As I have it now, everything
 will go in as having been authored by yourself.  There is no formal (ie:
 understood-by-git) way of recording joint authorship, unfortunately.  But one
 can make a record of such things in the changelog.

Ok, making such a record would be good, if possible. I did see that the
author field for git commits can have only one entry, unfortunately,
and it turns out Signed-off-by and Acked-by have other meanings
attached that have to do with the patch-journey-from-author-to-you chain.
But I'd like these to be somehow marked as jointly authored, definitely.

 I'll queue this material up in the to-be-sent-to-davem queue.

Ok, thanks.


Satyam
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC]: napi_struct V4

2007-07-30 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Thu, 26 Jul 2007 14:38:09 -0700

 If the driver wants a simple solution, it can do what you did in the
 patch:  wrap the tx cleanup code with netif_tx_lock() and
 netif_tx_unlock().
 
 If a NAPI driver wants to be more clever, it can do something such as
 this in tg3's poll_controller:
 
 if (netif_rx_schedule_prep(dev, tp-napi)) {
   tg3_tx(tp);
   netif_poll_enable(tp-napi);
 }

Thanks Michael, that's a good suggestion and would work.

In thinking about this some more over the weekend I've decided
that my plan to rip out RX support from netpoll is a bit too
ambitious.

Therefore, for the time being, I'm going to add a special driver
function for netpoll that will allow it to ask the driver to invoke
-poll() over all the NAPI structs assosciated with the netdev.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 -mm 9/9] netconsole: Support dynamic reconfiguration using configfs

2007-07-30 Thread Satyam Sharma


On Mon, 30 Jul 2007, Andrew Morton wrote:

 On Mon, 30 Jul 2007 08:19:10 +0530
 Satyam Sharma [EMAIL PROTECTED] wrote:
 
  +/*
  + * Wrapper over simple_strtol (base 10) with sanity and range checking.
  + * We return (signed) long only because we may want to return errors.
  + * Do not use this to convert numbers that are allowed to be negative.
  + */
  +static long strtol10_check_range(const char *cp, long min, long max)
  +{
  +   long ret;
  +   char *p = (char *) cp;
  +
  +   WARN_ON(min  0);
  +   WARN_ON(max  min);
  +
  +   ret = simple_strtol(p, p, 10);
  +
  +   if (*p  (*p != '\n')) {
  +   printk(KERN_ERR netconsole: invalid input\n);
  +   return -EINVAL;
  +   }
  +   if ((ret  min) || (ret  max)) {
  +   printk(KERN_ERR netconsole: input %ld must be between 
  +   %ld and %ld\n, ret, min, max);
  +   return -EINVAL;
  +   }
  +
  +   return ret;
  +}
 
 There's probably other code around the place which does this.  It might
 be worth making this function a general-purpose thing.

Probably, yes, I thought along similar lines.

[ BTW somewhere in the 9/9 patch I remember having to define a
  __U16_MAX as well. That could be pushed out to some generic
  or appropriate header as well, possibly. ]


Satyam
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND][PATCH 1/3] PPPoE: improved hashing routine

2007-07-30 Thread David Miller
From: Florian Zumbiehl [EMAIL PROTECTED]
Date: Sun, 29 Jul 2007 08:04:23 +0200

 Hi,
 
 I'm not sure whether this is really worth it, but it looked so
 extremely inefficient that I couldn't resist - so let's hope providers
 will keep PPPoE around for a while, at least until terabit dsl ;-)
 
 The new code produces the same results as the old version and is
 ~ 3 to 6 times faster for 4-bit hashes on the CPUs I tested.
 
 Florian
 
 ---
 Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]
 
 diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
 index 9e51fcc..954328c 100644
 --- a/drivers/net/pppoe.c
 +++ b/drivers/net/pppoe.c
 @@ -108,19 +108,24 @@ static inline int cmp_addr(struct pppoe_addr *a, 
 unsigned long sid, char *addr)
   (memcmp(a-remote,addr,ETH_ALEN) == 0));
  }
  
 -static int hash_item(unsigned long sid, unsigned char *addr)
 +#if 8%PPPOE_HASH_BITS
 +#error 8 must be a multiple of PPPOE_HASH_BITS
 +#endif

Since PPPOE_HASH_BITS is 4 I would think this check will break the
build. :-)

Anyways, I looked at this myself and half of the problem comes from
the fact that sid is passed around as an unsigned long throughout
this entire file yet the thing is just a __u16.

So the first thing to fix is to use __u16 consistently for sid values.
Then the sid hash looks obviously wrong and is easy to fix.

Then you end up with a hash_item() that simply looks like:

static unsigned int hash_item(__u16 sid, unsigned char *addr)
{
unsigned int hash;

hash = (((unsigned int)addr[0]  24) |
((unsigned int)addr[1]  16) |
((unsigned int)addr[2]   8) |
((unsigned int)addr[3]   0));

hash ^= (((unsigned int)addr[4]  8) |
 ((unsigned int)addr[5]  0));

hash ^= sid;

return ((hash ^ (hash  8) ^ (hash  16)) 
(PPPOE_HASH_SIZE - 1));
}

which is what I've checked into my tree.

Thanks!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND][PATCH 2/3] PPPoX/E: return ENOTTY on unknown ioctl requests

2007-07-30 Thread David Miller
From: Florian Zumbiehl [EMAIL PROTECTED]
Date: Sun, 29 Jul 2007 08:04:36 +0200

 here another patch for the PPPoX/E code that makes sure that ENOTTY is
 returned for unknown ioctl requests rather than 0 (and removes another
 unneeded initializer which I didn't bother creating a separate patch for).
 
 Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]

Applied, thanks Florian.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND][PATCH 3/3] PPPoE: move lock_sock() in pppoe_sendmsg() to the right location

2007-07-30 Thread David Miller
From: Florian Zumbiehl [EMAIL PROTECTED]
Date: Sun, 29 Jul 2007 08:04:46 +0200

 and the last one for now: Acquire the sock lock in pppoe_sendmsg()
 before accessing the sock - and in particular avoid releasing the lock
 even though it hasn't been acquired.
 
 Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED]

Also applied, thanks Florian.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: on [ab]use of skb-cb by VLAN code

2007-07-30 Thread jamal

I was going to forget this, but its been playing in the back of my head
and wont go away
Matt Carlson recently (while fixing the tg3 driver in my batching
patches) pointed to me that skb-cb[] was being used to pass around vlan
data. 
This seems like a bad use since there can be a lot of things between 
a real hardware driver and something that sets a vlan tag (qdiscs come
to mind).

Creating a new skb field be the reasonable thing to do here but i know
that we are trying to avoid adding new fields. Thoughts?

cheers,
jamal 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] make pktgen.c:get_ipsec_sa() static and non-inline

2007-07-30 Thread David Miller
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 29 Jul 2007 16:58:40 +0200

 Non-static inline code usually doesn't makes sense.
 
 In this case making is static and non-inline is the correct solution.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied, thanks Adrian.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [2.6 patch] make nf_ct_ipv6_skip_exthdr() static

2007-07-30 Thread David Miller
From: Adrian Bunk [EMAIL PROTECTED]
Date: Sun, 29 Jul 2007 16:58:46 +0200

 nf_ct_ipv6_skip_exthdr() can now become static.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >