[PATCH] ipv4: Only destroy inet devices when we receive an NETDEV_UNREGISTER event

2007-06-21 Thread Eric W. Biederman

Currently we destroy inet devices when we remove the last address
from an inet device, and during NETDEV_UNREGISTER.  We only create
them during the NETDEV_REGISTER event.  The result is that if you add
an ipv4 address to a device, delete it (so the device has no ipv4
addresses), and then attempt to add any ipv4 address to that device,
you will receive an -ENOBUFS error.

To correct the problem, this patch simply removes the excess
inetdev_destroy() call.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 net/ipv4/devinet.c |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index fa97b96..abf6352 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -327,12 +327,8 @@ static void __inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
 		}
 
 	}
-	if (destroy) {
+	if (destroy)
 		inet_free_ifa(ifa1);
-
-		if (!in_dev->ifa_list)
-			inetdev_destroy(in_dev);
-	}
 }
 
 static void inet_del_ifa(struct in_device *in_dev, struct in_ifaddr **ifap,
-- 
1.5.1.1.181.g2de0

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

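The failure mode described in the ipv4 patch above can be sketched as a user-space toy model. This is not kernel code: `toy_dev`, `toy_in_dev`, `toy_register()`, `toy_add_addr()`, `toy_del_addr()`, `toy_unregister()` and the `destroy_on_empty` switch are hypothetical stand-ins for the net_device/in_device pair and the paths the patch description mentions. Like the description, the model assumes an in_device is only ever created at NETDEV_REGISTER time, so once it has been destroyed an address add has nowhere to go and fails with -ENOBUFS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct in_device: tracks only the address count. */
struct toy_in_dev {
	int naddrs;
};

/* Hypothetical stand-in for a net_device and its ipv4 state. */
struct toy_dev {
	struct toy_in_dev *ip_ptr; /* NULL once the in_device is destroyed */
	bool destroy_on_empty;     /* true models the pre-patch behaviour */
};

/* Models NETDEV_REGISTER, the only point where an in_device is created. */
static int toy_register(struct toy_dev *dev, bool destroy_on_empty)
{
	assert(dev != NULL);
	dev->destroy_on_empty = destroy_on_empty;
	dev->ip_ptr = calloc(1, sizeof(*dev->ip_ptr));
	return dev->ip_ptr ? 0 : -ENOMEM;
}

/* Models adding an ipv4 address: impossible without an in_device. */
static int toy_add_addr(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->ip_ptr == NULL)
		return -ENOBUFS;
	dev->ip_ptr->naddrs++;
	return 0;
}

/* Models deleting an address; the old code also destroyed an emptied in_device. */
static void toy_del_addr(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->ip_ptr == NULL || dev->ip_ptr->naddrs == 0)
		return;
	dev->ip_ptr->naddrs--;
	if (dev->destroy_on_empty && dev->ip_ptr->naddrs == 0) {
		free(dev->ip_ptr); /* the removed inetdev_destroy() call */
		dev->ip_ptr = NULL;
	}
}

/* Models NETDEV_UNREGISTER, which after the patch is the only destroy point. */
static void toy_unregister(struct toy_dev *dev)
{
	assert(dev != NULL);
	free(dev->ip_ptr);
	dev->ip_ptr = NULL;
}
```

With `destroy_on_empty` set, the sequence add, delete, add returns -ENOBUFS on the second add; with it cleared, the second add succeeds, which is the behaviour the patch restores.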

Re: [IPV6] NDISC: Fix thinko to control Router Preference support.

2007-06-21 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Fri, 22 Jun 2007 15:15:27 +0900 (JST)

> Bug reported by Haruhito Watanabe <[EMAIL PROTECTED]>.
> 
> This is also appropriate for -stable releases.
> 
> Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

Thank you, I will apply this and push around soon.


[IPV6] NDISC: Fix thinko to control Router Preference support.

2007-06-21 Thread YOSHIFUJI Hideaki / 吉藤英明
Bug reported by Haruhito Watanabe <[EMAIL PROTECTED]>.

This is also appropriate for -stable releases.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
---
 net/ipv6/ndisc.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index d8b3645..0358e60 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1062,7 +1062,7 @@ static void ndisc_router_discovery(struct sk_buff *skb)
 	pref = ra_msg->icmph.icmp6_router_pref;
 	/* 10b is handled as if it were 00b (medium) */
 	if (pref == ICMPV6_ROUTER_PREF_INVALID ||
-	    in6_dev->cnf.accept_ra_rtr_pref)
+	    !in6_dev->cnf.accept_ra_rtr_pref)
 		pref = ICMPV6_ROUTER_PREF_MEDIUM;
 #endif
 
-- 
1.5.1

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

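The corrected condition in the NDISC patch above can be read as a small pure function. This is a sketch, not ndisc.c: `effective_router_pref()` is hypothetical, and the constants are local copies of the two-bit RFC 4191 encodings (00b medium, 01b high, 11b low, 10b reserved), which to our understanding match the ICMPV6_ROUTER_PREF_* values in include/linux/icmpv6.h. Without the negation, the advertised preference was honoured only when accept_ra_rtr_pref was disabled, the opposite of what the sysctl is meant to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the RFC 4191 two-bit preference encodings. */
#define ROUTER_PREF_MEDIUM  0x0
#define ROUTER_PREF_HIGH    0x1
#define ROUTER_PREF_INVALID 0x2 /* reserved 10b */
#define ROUTER_PREF_LOW     0x3

/* Hypothetical helper: the preference to use for a received RA.
 * The reserved value, and any value when router preference support is
 * disabled (accept_ra_rtr_pref == 0), are treated as medium. */
static int effective_router_pref(int pref, bool accept_ra_rtr_pref)
{
	assert(pref >= 0 && pref <= 3); /* the field is two bits wide */
	if (pref == ROUTER_PREF_INVALID || !accept_ra_rtr_pref)
		pref = ROUTER_PREF_MEDIUM;
	return pref;
}
```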

strange sky disable/enable in the kernel log

2007-06-21 Thread Tino Keitel
Hi,

I just found this in the kernel log:

2007-06-22_05:47:50.69894 kern.info: sky2 eth0: disabling interface
2007-06-22_05:47:50.72039 kern.info: sky2 eth0: enabling interface
2007-06-22_05:47:50.72240 kern.info: sky2 eth0: ram buffer 48K
2007-06-22_05:47:52.40903 kern.info: sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both

What could be the cause for this? I use kernel 2.6.22-rc5 with this
hardware:

sky2 :01:00.0: v1.14 addr 0x9020 irq 17 Yukon-EC (0xb6) rev 2

01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)

Regards,
Tino


Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Eric W. Biederman
Ben Greear <[EMAIL PROTECTED]> writes:

> Patrick McHardy wrote:
>> Eric W. Biederman wrote:
>
>>> For the macvlan code do we need to do anything special if we transmit
>>> to a mac we would normally receive?  Another unicast mac of the same
>>> nic for example.
>>
>> That doesn't happen under normal circumstances. I don't believe
>> it would work.
>
> Assuming you mean you want to send between two mac-vlans on the same physical
> nic...
>
> This can work if your mac-vlans are on different subnets and you are
> routing between them (and if you have my send-to-self patch or have
> another way to let a system send packets to itself).

Ok.  I didn't know if you could trigger this case without
having the endpoints in separate namespaces.  I suspected the
routing code would realize what we were doing, recognize that the
route is local, and route through lo.

> A normal ethernet switch will NOT turn a packet around on the same
> interface it was received, so that is why you must have them on different
> subnets and have a router in between.

Yes.  That is essentially the configuration I was wondering about.

> For sending directly to yourself, something like the 'veth' driver
> is probably more useful.

True.  And I think it has a place.  However the common case with
the tunnel devices is to just hook them all up to an ethernet
bridge as well as a real ethernet device.

The far ends of the ethernet tunnels are dropped into different namespaces.

Which gets a very similar effect to the mac vlan code.

I'm just wondering whether I can avoid setting up an ethernet tunnel
device when my primary purpose is to talk to the outside world but I
occasionally want a little in-the-box traffic.

Eric


Re: [BUG] Sky2 driver in 2.6.22-rc5-git1-cfs-v17

2007-06-21 Thread Stephen Hemminger
On Fri, 22 Jun 2007 04:45:25 +0200
Ian Kumlien <[EMAIL PROTECTED]> wrote:

> On tor, 2007-06-21 at 18:57 -0700, Stephen Hemminger wrote:
> > Redirected off LKML; netdev is the proper list.
> 
> Thanks =)
> 
> > On Thu, 21 Jun 2007 22:51:32 +0200
> > Ian Kumlien <[EMAIL PROTECTED]> wrote:
> > 
> > > Hi, 
> > > 
> > > recently have started to see this in my dmesg:
> > > 
> > > NETDEV WATCHDOG: eth0: transmit timed out
> > > sky2 eth0: tx timeout
> > > sky2 eth0: transmit ring 449 .. 408 report=449 done=449
> > > sky2 eth0: disabling interface
> > > sky2 eth0: enabling interface
> > > sky2 eth0: ram buffer 48K
> > > sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
> > > 
> > > I'm not using MSI since it seems to have caused problems in the past.
> > > 
> > > I run with a 9k mtu
> > > 
> > > sky2 eth0: transmit ring 18 .. 489 report=18 done=18
> > >  I assume ring max is 512 (ie 1-512) since:
> > >   Ring parameters for eth0:
> > >   Current hardware settings:
> > >   RX: 168
> > >   RX Mini:0
> > >   RX Jumbo:   0
> > >   TX: 511
> > >   
> > >   And 489 + 41 - 18 = 512
> > > 
> > > sky2 eth0: transmit ring 197 .. 156 report=197 done=197
> > > sky2 eth0: transmit ring 480 .. 439 report=480 done=480
> > > sky2 eth0: transmit ring 413 .. 372 report=413 done=413
> > > sky2 eth0: transmit ring 320 .. 279 report=320 done=320
> > > 
> > > Else, they are all off by 41.
> > > 
> > > Is this a known bug?
> > no
> 
> Damn =P
> 
> > > Comments? ideas?
> > >
> > which chip version. probably Yukon EC that seems to be the only one
> > that does gigabit with Ram buffer.
> 
> sky2 :02:00.0: v1.14 addr 0xdbffc000 irq 18 Yukon-EC (0xb6) rev 2
> 
> > Does it work alright if you set transmit ring size smaller with ethtool?
> > There might be an off-by-one bug in the worst case calculations about
> > list element usage.
> 
> I tried this... but not with a specific size, i think i did 480, and yes
> it timed out... any ideas on a more educated value?
> 
> -- 
> Ian Kumlien  -- http://pomac.netswarm.net

Also try setting the idle_timeout module parameter to something like 10 (ms).
It will fix problems with lost interrupts.

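Ian's "off by 41" observation, including the wrapped 18 .. 489 report, is plain modular distance on a ring. A small check of that arithmetic, assuming a 512-entry transmit ring as he infers from the ethtool output (the driver's real ring size is not confirmed here); `ring_distance()` is hypothetical and not part of sky2. It counts forward from the second index in each report to the first.

```c
#include <assert.h>

/* Slots from index `from` forward to index `to` in a ring of `size`
 * entries, accounting for wrap-around past the last slot. */
static unsigned int ring_distance(unsigned int from, unsigned int to,
				  unsigned int size)
{
	assert(size > 0 && from < size && to < size);
	return (to + size - from) % size;
}
```

For example, `ring_distance(408, 449, 512)` and the wrapped `ring_distance(489, 18, 512)` both evaluate to 41, so the wrapped report is consistent with the others rather than a separate anomaly.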

[patch 7/7] CAN: Add documentation

2007-06-21 Thread Urs Thuermann
This patch adds documentation for the PF_CAN protocol family.

Signed-Off-By: Oliver Hartkopp <[EMAIL PROTECTED]>
Signed-Off-By: Urs Thuermann <[EMAIL PROTECTED]>

---
 Documentation/networking/00-INDEX |2 
 Documentation/networking/can.txt  |  635 ++
 2 files changed, 637 insertions(+)

Index: linux-2.6.22-rc5/Documentation/networking/can.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5/Documentation/networking/can.txt   2007-06-20 14:11:31.0 +0200
@@ -0,0 +1,635 @@
+
+
+can.txt
+
+Readme file for the Controller Area Network Protocol Family (aka Socket CAN)
+
+This file contains
+
+  1 Overview / What is Socket CAN
+
+  2 Motivation / Why using the socket API
+
+  3 Socket CAN concept
+    3.1 receive lists
+    3.2 loopback
+    3.3 network security issues (capabilities)
+    3.4 network problem notifications
+
+  4 How to use Socket CAN
+    4.1 RAW protocol sockets with can_filters (SOCK_RAW)
+      4.1.1 RAW socket option CAN_RAW_FILTER
+      4.1.2 RAW socket option CAN_RAW_ERR_FILTER
+      4.1.3 RAW socket option CAN_RAW_LOOPBACK
+      4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+    4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+    4.3 connected transport protocols (SOCK_SEQPACKET)
+    4.4 unconnected transport protocols (SOCK_DGRAM)
+
+  5 Socket CAN core module
+    5.1 can.ko module params
+    5.2 procfs content
+    5.3 writing own CAN protocol modules
+
+  6 CAN network drivers
+    6.1 general settings
+    6.2 loopback
+    6.3 CAN controller hardware filters
+    6.4 currently supported CAN hardware
+    6.5 todo
+
+  7 Credits
+
+
+
+1. Overview / What is Socket CAN
+
+
+The socketcan package is an implementation of CAN protocols
+(Controller Area Network) for Linux.  CAN is a networking technology
+which has widespread use in automation, embedded devices, and
+automotive fields.  While there have been other CAN implementations
+for Linux based on character devices, Socket CAN uses the Berkeley
+socket API and the Linux network stack, and implements the CAN device
+drivers as network interfaces.  The CAN socket API has been designed
+to be as similar as possible to the TCP/IP protocols so that
+programmers familiar with network programming can easily learn how
+to use CAN sockets.
+
+2. Motivation / Why using the socket API
+
+
+There have been CAN implementations for Linux before Socket CAN, so the
+question arises why we have started another project.  Most existing
+implementations come as a device driver for some CAN hardware; they
+are based on character devices and provide comparatively little
+functionality.  Usually, there is only a hardware-specific device
+driver which provides a character device interface to send and
+receive raw CAN frames, directly to/from the controller hardware.
+Queueing of frames and higher-level transport protocols like ISO-TP
+have to be implemented in user space applications.  Also, most
+character-device implementations allow only a single process to
+open the device at a time, similar to a serial interface.  Exchanging
+the CAN controller requires the use of another device driver and
+often the adaptation of large parts of the application to the
+new driver's API.
+
+Socket CAN was designed to overcome all of these limitations.  A new
+protocol family has been implemented which provides a socket interface
+to user space applications and which builds upon the Linux network
+layer, so as to use all of the provided queueing functionality.  Device
+drivers for CAN controller hardware register themselves with the Linux
+network layer as a network device, so that CAN frames from the
+controller can be passed up to the network layer and on to the CAN
+protocol family module and also vice-versa.  Also, the protocol family
+module provides an API for transport protocol modules to register, so
+that any number of transport protocols can be loaded or unloaded
+dynamically.  In fact, the can core module alone does not provide any
+protocol and cannot be used without loading at least one additional
+protocol module.  Multiple sockets can be opened at the same time,
+on different or the same protocol module and they can listen/send
+frames on different or the same CAN IDs.  Several sockets listening on
+the same interface for frames with the same CAN ID are all passed the
+same received matching CAN frames.  An application wishing to
+communicate using a specific transport protocol, e.g. ISO-TP, just
+selects that protocol when opening the socket, and then can read and
+write application data byte streams, without having to deal with
+CAN-IDs, frames, etc.
+
+Similar functionality visible f

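Section 2 of the can.txt patch above says that several sockets listening on the same interface for frames with the same CAN ID are all passed the same matching frames. A toy model of that fan-out, under simplifying assumptions: each subscription matches one exact CAN ID on one interface (real CAN_RAW sockets use id/mask filters instead), and `toy_sub` and `toy_deliver()` are hypothetical names, not part of the Socket CAN core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one socket's subscription. */
struct toy_sub {
	int ifindex;           /* interface the socket listens on */
	uint32_t can_id;       /* CAN ID it wants */
	unsigned int rx_count; /* frames delivered to this "socket" */
};

/* Deliver one received frame to EVERY matching subscriber, not just
 * the first one found.  Returns the number of deliveries made. */
static size_t toy_deliver(struct toy_sub *subs, size_t nsubs,
			  int ifindex, uint32_t can_id)
{
	size_t delivered = 0;

	assert(subs != NULL || nsubs == 0);
	for (size_t i = 0; i < nsubs; i++) {
		if (subs[i].ifindex == ifindex && subs[i].can_id == can_id) {
			subs[i].rx_count++;
			delivered++;
		}
	}
	return delivered;
}
```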
[patch 4/7] CAN: Add broadcast manager (bcm) protocol

2007-06-21 Thread Urs Thuermann
This patch adds the CAN broadcast manager (bcm) protocol.

Signed-Off-By: Oliver Hartkopp <[EMAIL PROTECTED]>
Signed-Off-By: Urs Thuermann <[EMAIL PROTECTED]>

---
 include/linux/can/bcm.h |   65 +
 net/can/Kconfig |   28 
 net/can/Makefile|3 
 net/can/bcm.c   | 1750 
 4 files changed, 1846 insertions(+)

Index: linux-2.6.22-rc5-git5/include/linux/can/bcm.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5-git5/include/linux/can/bcm.h   2007-06-21 14:08:38.0 +0200
@@ -0,0 +1,65 @@
+/*
+ * linux/can/bcm.h
+ *
+ * Definitions for CAN Broadcast Manager (BCM)
+ *
+ * Author: Oliver Hartkopp <[EMAIL PROTECTED]>
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Send feedback to <[EMAIL PROTECTED]>
+ *
+ */
+
+#ifndef CAN_BCM_H
+#define CAN_BCM_H
+
+/**
+ * struct bcm_msg_head - head of messages to/from the broadcast manager
+ * @opcode:opcode, see enum below.
+ * @flags: special flags, see below.
+ * @count: number of frames to send before changing interval.
+ * @ival1: interval for the first @count frames.
+ * @ival2: interval for the following frames.
+ * @can_id:CAN ID of frames to be sent or received.
+ * @nframes:   number of frames appended to the message head.
+ * @frames:array of CAN frames.
+ */
+struct bcm_msg_head {
+   int opcode;
+   int flags;
+   int count;
+   struct timeval ival1, ival2;
+   canid_t can_id;
+   int nframes;
+   struct can_frame frames[0];
+};
+
+enum {
+   TX_SETUP = 1,   /* create (cyclic) transmission task */
+   TX_DELETE,  /* remove (cyclic) transmission task */
+   TX_READ,/* read properties of (cyclic) transmission task */
+   TX_SEND,/* send one CAN frame */
+   RX_SETUP,   /* create RX content filter subscription */
+   RX_DELETE,  /* remove RX content filter subscription */
+   RX_READ,/* read properties of RX content filter subscription */
+   TX_STATUS,  /* reply to TX_READ request */
+   TX_EXPIRED, /* notification on performed transmissions (count=0) */
+   RX_STATUS,  /* reply to RX_READ request */
+   RX_TIMEOUT, /* cyclic message is absent */
+   RX_CHANGED  /* updated CAN frame (detected content change) */
+};
+
+#define SETTIMER            0x0001
+#define STARTTIMER          0x0002
+#define TX_COUNTEVT         0x0004
+#define TX_ANNOUNCE         0x0008
+#define TX_CP_CAN_ID        0x0010
+#define RX_FILTER_ID        0x0020
+#define RX_CHECK_DLC        0x0040
+#define RX_NO_AUTOTIMER     0x0080
+#define RX_ANNOUNCE_RESUME  0x0100
+#define TX_RESET_MULTI_IDX  0x0200
+#define RX_RTR_FRAME        0x0400
+
+#endif /* CAN_BCM_H */
Index: linux-2.6.22-rc5-git5/net/can/Kconfig
===
--- linux-2.6.22-rc5-git5.orig/net/can/Kconfig  2007-06-21 14:05:26.0 +0200
+++ linux-2.6.22-rc5-git5/net/can/Kconfig   2007-06-21 14:08:38.0 +0200
@@ -42,6 +42,34 @@
  Say Y here if you want non-root users to be able to access CAN_RAW
  sockets.
 
+config CAN_BCM
+   tristate "Broadcast Manager CAN Protocol (with content filtering)"
+   depends on CAN
+   default N
+   ---help---
+ The Broadcast Manager offers content filtering, timeout monitoring,
+ sending of RTR-frames and cyclic CAN messages without permanent user
+ interaction. The BCM can be 'programmed' via the BSD socket API and
+ informs you on demand e.g. only on content updates / timeouts.
+ You probably want to use the bcm socket in most cases where cyclic
+ CAN messages are used on the bus (e.g. in automotive environments).
+ To use the Broadcast Manager, use AF_CAN with protocol CAN_BCM.
+
+config CAN_BCM_USER
+   bool "Allow non-root users to access CAN broadcast manager sockets"
+   depends on CAN_BCM
+   default N
+   ---help---
+ The Controller Area Network is a local field bus transmitting only
+ broadcast messages without any routing and security concepts.
+ In the majority of cases the user application has to deal with
+ raw CAN frames. Therefore it might be reasonable NOT to restrict
+ the CAN access only to the user root, as known from other networks.
+ Since CAN_BCM sockets can only send and receive frames to/from CAN
+ interfaces, this does not affect the security of other networks.
+ Say Y here if you want non-root users to be able to access CAN_BCM
+ sockets.
+
 config CAN_DEBUG_CORE
bool "CAN Core debugging messages"
depends on CAN
Index: linux-2.6.22-rc5-git5/net/can/Makefile
===
--- linux-2.6.22-rc5-

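The @count, @ival1 and @ival2 fields of struct bcm_msg_head in the BCM patch above describe a two-phase cyclic transmission. The sketch below computes when frames would go out under a deliberately simplified model: intervals are plain microsecond counts instead of struct timeval, the first frame is assumed to leave one ival1 after setup, and flags such as STARTTIMER or TX_COUNTEVT and special cases like zero intervals are ignored. `bcm_schedule()` is hypothetical and is not how bcm.c drives its timers.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..n-1] with absolute send times: the first `count` frames
 * are spaced ival1_us apart, every later frame ival2_us apart. */
static void bcm_schedule(int count, long ival1_us, long ival2_us,
			 long *out, size_t n)
{
	long t = 0;

	assert(count >= 0 && ival1_us >= 0 && ival2_us >= 0);
	assert(out != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		t += (i < (size_t)count) ? ival1_us : ival2_us;
		out[i] = t;
	}
}
```

With count = 3, ival1 = 10 and ival2 = 100, the first five send times are 10, 20, 30, 130 and 230.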
[patch 0/7] CAN: Add new PF_CAN protocol family, try #3

2007-06-21 Thread Urs Thuermann
Hello Dave,

this is the third post of the patch series that adds the PF_CAN
protocol family for the Controller Area Network.

Since our last post we have changed the code quite a lot:

* Use skb->sk and skb->pkt_type instead of skb->cb to pass loopback
  flags and originating socket down to the driver and back to the
  receiving socket.  Thanks to Patrick McHardy for pointing out our
  wrong use of skb->cb.

* Use skb->iif instead of skb->cb to pass receiving interface from
  raw_rcv() and bcm_rcv() up to raw_recvmsg() and bcm_recvmsg().
  
* Set skb->protocol when sending CAN frames to netdevices.

* Removed struct raw_opt and struct bcm_opt and integrated these
  directly into struct raw_sock and struct bcm_sock, respectively,
  like most other proto implementations do.

* We have found and fixed race conditions between raw_bind(),
  raw_{set,get}sockopt() and raw_notifier().  This resulted in
  - complete removal of our own notifier list infrastructure in
af_can.c.  raw.c and bcm.c now use normal netdevice notifiers.
  - removal of ro->lock spinlock.  We use lock_sock(sk) now.
  - changed deletion of dev_rcv_lists, which are now marked for
deletion in the netdevice notifier in af_can.c and are actually
deleted when all entries have been deleted using can_rx_unregister().

* Follow changes in 2.6.22 (e.g. ktime_t timestamps in skb).

* Removed obsolete code from vcan.c, as pointed out by Stephen Hemminger.
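The dev_rcv_lists change in the list above follows a mark-then-free pattern: the notifier only marks a list that still has receivers, and the last can_rx_unregister() frees it. A user-space sketch of that pattern with hypothetical names (`toy_rcv_list`, `toy_notifier_unregister()`, `toy_rx_unregister()`); freeing an already-empty list immediately in the notifier is an assumption, not something the cover letter states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device receive list. */
struct toy_rcv_list {
	int entries;         /* registered receivers */
	bool remove_pending; /* set by the modelled netdevice notifier */
};

/* Notifier path: free an empty list now, otherwise only mark it.
 * Returns the list if it is still allocated, NULL if it was freed. */
static struct toy_rcv_list *toy_notifier_unregister(struct toy_rcv_list *d)
{
	assert(d != NULL);
	if (d->entries == 0) {
		free(d);
		return NULL;
	}
	d->remove_pending = true;
	return d;
}

/* can_rx_unregister()-like path: drop one receiver and free the list
 * once a pending removal has no receivers left. */
static struct toy_rcv_list *toy_rx_unregister(struct toy_rcv_list *d)
{
	assert(d != NULL && d->entries > 0);
	d->entries--;
	if (d->remove_pending && d->entries == 0) {
		free(d);
		return NULL;
	}
	return d;
}
```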

This patch series applies against linux-2.6.22-rc5-git5 and is derived from
Subversion revision r390 of http://svn.berlios.de/svnroot/repos/socketcan.
It can be found in the directory
http://svn.berlios.de/svnroot/repos/socketcan/trunk/patch-series/.

This patch series doesn't touch anything in the kernel except for the allocation
of a couple of numbers for the protocol family, ARP hardware type, ethernet
packet type, and line discipline.

Please consider this patch series for integration into your tree.

Thanks very much for your work!

Best regards,

Urs Thuermann
Oliver Hartkopp
--


[patch 6/7] CAN: Add maintainer entries

2007-06-21 Thread Urs Thuermann
This patch adds entries in the CREDITS and MAINTAINERS files for CAN.

Signed-Off-By: Oliver Hartkopp <[EMAIL PROTECTED]>
Signed-Off-By: Urs Thuermann <[EMAIL PROTECTED]>

---
 CREDITS |   16 
 MAINTAINERS |9 +
 2 files changed, 25 insertions(+)

Index: linux-2.6.22-rc5/CREDITS
===
--- linux-2.6.22-rc5.orig/CREDITS   2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/CREDITS   2007-06-20 14:11:27.0 +0200
@@ -1330,6 +1330,14 @@
 S: 5623 HZ Eindhoven
 S: The Netherlands
 
+N: Oliver Hartkopp
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Andrew Haylett
 E: [EMAIL PROTECTED]
 D: Selection mechanism
@@ -3283,6 +3291,14 @@
 S: F-35042 Rennes Cedex
 S: France
 
+N: Urs Thuermann
+E: [EMAIL PROTECTED]
+W: http://www.volkswagen.de
+D: Controller Area Network (network layer core)
+S: Brieffach 1776
+S: 38436 Wolfsburg
+S: Germany
+
 N: Jon Tombs
 E: [EMAIL PROTECTED]
 W: http://www.esi.us.es/~jon
Index: linux-2.6.22-rc5/MAINTAINERS
===
--- linux-2.6.22-rc5.orig/MAINTAINERS   2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/MAINTAINERS   2007-06-20 14:11:27.0 +0200
@@ -951,6 +951,15 @@
 L: [EMAIL PROTECTED]
 S: Maintained
 
+CAN NETWORK LAYER
+P: Urs Thuermann
+M: [EMAIL PROTECTED]
+P: Oliver Hartkopp
+M: [EMAIL PROTECTED]
+L: [EMAIL PROTECTED]
+W: http://developer.berlios.de/projects/socketcan/
+S: Maintained
+
 CALGARY x86-64 IOMMU
 P: Muli Ben-Yehuda
 M: [EMAIL PROTECTED]

--


[patch 5/7] CAN: Add virtual CAN netdevice driver

2007-06-21 Thread Urs Thuermann
This patch adds the virtual CAN bus (vcan) network driver.
The vcan device is just a loopback device for CAN frames; no
real CAN hardware is involved.

Signed-Off-By: Oliver Hartkopp <[EMAIL PROTECTED]>
Signed-Off-By: Urs Thuermann <[EMAIL PROTECTED]>

---
 drivers/net/Makefile |1 
 drivers/net/can/Kconfig  |   25 
 drivers/net/can/Makefile |5 
 drivers/net/can/vcan.c   |  287 +++
 net/can/Kconfig  |3 
 5 files changed, 321 insertions(+)

Index: linux-2.6.22-rc5/drivers/net/Makefile
===
--- linux-2.6.22-rc5.orig/drivers/net/Makefile  2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/drivers/net/Makefile   2007-06-20 14:11:19.0 +0200
@@ -8,6 +8,7 @@
 obj-$(CONFIG_CHELSIO_T1) += chelsio/
 obj-$(CONFIG_CHELSIO_T3) += cxgb3/
 obj-$(CONFIG_EHEA) += ehea/
+obj-$(CONFIG_CAN) += can/
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_ATL1) += atl1/
 obj-$(CONFIG_GIANFAR) += gianfar_driver.o
Index: linux-2.6.22-rc5/drivers/net/can/Kconfig
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5/drivers/net/can/Kconfig   2007-06-20 14:11:19.0 +0200
@@ -0,0 +1,25 @@
+menu "CAN Device Drivers"
+   depends on CAN
+
+config CAN_VCAN
+   tristate "Virtual Local CAN Interface (vcan)"
+   depends on CAN
+   default N
+   ---help---
+ Similar to the network loopback devices, vcan offers a
+ virtual local CAN interface.
+
+ This driver can also be built as a module.  If so, the module
+ will be called vcan.
+
+config CAN_DEBUG_DEVICES
+   bool "CAN devices debugging messages"
+   depends on CAN
+   default N
+   ---help---
+ Say Y here if you want the CAN device drivers to produce a bunch of
+ debug messages to the system log.  Select this if you are having
+ a problem with CAN support and want to see more of what is going
+ on.
+
+endmenu
Index: linux-2.6.22-rc5/drivers/net/can/Makefile
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5/drivers/net/can/Makefile   2007-06-20 14:11:19.0 +0200
@@ -0,0 +1,5 @@
+#
+#  Makefile for the Linux Controller Area Network drivers.
+#
+
+obj-$(CONFIG_CAN_VCAN) += vcan.o
Index: linux-2.6.22-rc5/drivers/net/can/vcan.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5/drivers/net/can/vcan.c 2007-06-20 14:11:19.0 +0200
@@ -0,0 +1,287 @@
+/*
+ * vcan.c - Virtual CAN interface
+ *
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions, the following disclaimer and
+ *the referenced file 'COPYING'.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Volkswagen nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2 as distributed in the 'COPYING'
+ * file from the main directory of the linux kernel source.
+ *
+ * The provided data structures and external interfaces from this code
+ * are not restricted to be used by modules with a GPL compatible license.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ * Send feedback to <[EMAIL PROTECTED]>
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+s

[patch 3/7] CAN: Add raw protocol

2007-06-21 Thread Urs Thuermann
This patch adds the CAN raw protocol.

Signed-Off-By: Oliver Hartkopp <[EMAIL PROTECTED]>
Signed-Off-By: Urs Thuermann <[EMAIL PROTECTED]>

---
 include/linux/can/raw.h |   31 +
 net/can/Kconfig |   26 +
 net/can/Makefile|3 
 net/can/raw.c   |  751 
 4 files changed, 811 insertions(+)

Index: linux-2.6.22-rc5-git5/include/linux/can/raw.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5-git5/include/linux/can/raw.h   2007-06-21 14:05:26.0 +0200
@@ -0,0 +1,31 @@
+/*
+ * linux/can/raw.h
+ *
+ * Definitions for raw CAN sockets
+ *
+ * Authors: Oliver Hartkopp <[EMAIL PROTECTED]>
+ *  Urs Thuermann   <[EMAIL PROTECTED]>
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Send feedback to <[EMAIL PROTECTED]>
+ *
+ */
+
+#ifndef CAN_RAW_H
+#define CAN_RAW_H
+
+#include 
+
+#define SOL_CAN_RAW (SOL_CAN_BASE + CAN_RAW)
+
+/* for socket options affecting the socket (not the global system) */
+
+enum {
+   CAN_RAW_FILTER = 1, /* set 0 .. n can_filter(s)  */
+   CAN_RAW_ERR_FILTER, /* set filter for error frames   */
+   CAN_RAW_LOOPBACK,   /* local loopback (default:on)   */
+   CAN_RAW_RECV_OWN_MSGS   /* receive my own msgs (default:off) */
+};
+
+#endif
Index: linux-2.6.22-rc5-git5/net/can/Kconfig
===
--- linux-2.6.22-rc5-git5.orig/net/can/Kconfig  2007-06-21 14:03:50.0 +0200
+++ linux-2.6.22-rc5-git5/net/can/Kconfig   2007-06-21 14:05:26.0 +0200
@@ -16,6 +16,32 @@
  If you want CAN support, you should say Y here and also to the
  specific driver for your controller(s) below.
 
+config CAN_RAW
+   tristate "Raw CAN Protocol (raw access with CAN-ID filtering)"
+   depends on CAN
+   default N
+   ---help---
+ The Raw CAN protocol option offers access to the CAN bus via
+ the BSD socket API. You probably want to use the raw socket in
+ most cases where no higher level protocol is being used. The raw
+ socket has several filter options e.g. ID-Masking / Errorframes.
+ To receive/send raw CAN messages, use AF_CAN with protocol CAN_RAW.
+
+config CAN_RAW_USER
+   bool "Allow non-root users to access Raw CAN Protocol sockets"
+   depends on CAN_RAW
+   default N
+   ---help---
+ The Controller Area Network is a local field bus transmitting only
+ broadcast messages without any routing and security concepts.
+ In the majority of cases the user application has to deal with
+ raw CAN frames. Therefore it might be reasonable NOT to restrict
+ the CAN access only to the user root, as known from other networks.
+ Since CAN_RAW sockets can only send and receive frames to/from CAN
+ interfaces, this does not affect the security of other networks.
+ Say Y here if you want non-root users to be able to access CAN_RAW
+ sockets.
+
 config CAN_DEBUG_CORE
bool "CAN Core debugging messages"
depends on CAN
Index: linux-2.6.22-rc5-git5/net/can/Makefile
===
--- linux-2.6.22-rc5-git5.orig/net/can/Makefile 2007-06-21 14:03:50.0 +0200
+++ linux-2.6.22-rc5-git5/net/can/Makefile  2007-06-21 14:05:26.0 +0200
@@ -4,3 +4,6 @@
 
 obj-$(CONFIG_CAN)  += can.o
 can-objs   := af_can.o proc.o
+
+obj-$(CONFIG_CAN_RAW)  += can-raw.o
+can-raw-objs   := raw.o
Index: linux-2.6.22-rc5-git5/net/can/raw.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.22-rc5-git5/net/can/raw.c 2007-06-21 14:05:42.0 +0200
@@ -0,0 +1,751 @@
+/*
+ * raw.c - Raw sockets for protocol family CAN
+ *
+ * Copyright (c) 2002-2007 Volkswagen Group Electronic Research
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions, the following disclaimer and
+ *the referenced file 'COPYING'.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Volkswagen nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of

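The raw.h comment `set 0 .. n can_filter(s)` in patch 3/7 above suggests a socket can carry several filters. The sketch below shows the id/mask test commonly described for Socket CAN, under stated assumptions: a frame is accepted if any filter satisfies `(rx_id & mask) == (can_id & mask)`, an empty filter set accepts nothing, and `struct toy_can_filter` is a local mirror, since the real struct can_filter lives in linux/can.h, which is not part of this excerpt. `toy_raw_match()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of an id/mask filter pair (not the real struct can_filter). */
struct toy_can_filter {
	uint32_t can_id;
	uint32_t can_mask;
};

/* Returns 1 if rx_id passes ANY of the n filters, 0 otherwise.
 * With n == 0 nothing passes, i.e. reception is effectively disabled. */
static int toy_raw_match(const struct toy_can_filter *f, size_t n,
			 uint32_t rx_id)
{
	assert(f != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if ((rx_id & f[i].can_mask) == (f[i].can_id & f[i].can_mask))
			return 1;
	return 0;
}
```

With filters {0x120, 0x7F0} and {0x200, 0x7FF}, ID 0x123 passes the first (range 0x120..0x12F), 0x200 passes the second exactly, and 0x201 passes neither.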
[patch 1/7] CAN: Allocate protocol numbers for PF_CAN

2007-06-21 Thread Urs Thuermann
This patch adds a protocol/address family number, ARP hardware type,
ethernet packet type, and a line discipline number for the SocketCAN
implementation.
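The tty.h hunk below raises NR_LDISCS from 17 to 18 because the new N_SLCAN takes number 17, and line-discipline numbers index a table of NR_LDISCS entries (valid range 0 .. NR_LDISCS - 1). A compile-time sketch of that invariant, with the values copied locally from the hunk; `NR_LDISCS_OLD` and `ldisc_in_range()` are hypothetical illustrations, not the tty layer's actual code.

```c
#include <assert.h>

/* Values copied from the tty.h hunk; NR_LDISCS_OLD is the pre-patch size. */
#define N_SLCAN        17
#define NR_LDISCS_OLD  17
#define NR_LDISCS      18

/* 17 is out of range for a 17-entry table, in range for an 18-entry one. */
_Static_assert(N_SLCAN >= NR_LDISCS_OLD, "N_SLCAN did not fit the old table");
_Static_assert(N_SLCAN < NR_LDISCS, "N_SLCAN fits after the bump to 18");

/* Hypothetical range check mirroring how a table index must be validated. */
static int ldisc_in_range(int disc)
{
	return disc >= 0 && disc < NR_LDISCS;
}
```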

Signed-Off-By: Oliver Hartkopp <[EMAIL PROTECTED]>
Signed-Off-By: Urs Thuermann <[EMAIL PROTECTED]>

---
 include/linux/if_arp.h   |1 +
 include/linux/if_ether.h |1 +
 include/linux/socket.h   |2 ++
 include/linux/tty.h  |3 ++-
 net/core/sock.c  |4 ++--
 5 files changed, 8 insertions(+), 3 deletions(-)

Index: linux-2.6.22-rc5/include/linux/if_arp.h
===
--- linux-2.6.22-rc5.orig/include/linux/if_arp.h   2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/include/linux/if_arp.h 2007-06-20 14:11:00.0 +0200
@@ -52,6 +52,7 @@
 #define ARPHRD_ROSE 270
 #define ARPHRD_X25 271 /* CCITT X.25   */
 #define ARPHRD_HWX25   272 /* Boards with X.25 in firmware */
+#define ARPHRD_CAN 280 /* Controller Area Network  */
 #define ARPHRD_PPP 512
 #define ARPHRD_CISCO   513 /* Cisco HDLC   */
 #define ARPHRD_HDLC ARPHRD_CISCO
Index: linux-2.6.22-rc5/include/linux/if_ether.h
===
--- linux-2.6.22-rc5.orig/include/linux/if_ether.h  2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/include/linux/if_ether.h   2007-06-20 14:11:00.0 +0200
@@ -90,6 +90,7 @@
 #define ETH_P_WAN_PPP   0x0007  /* Dummy type for WAN PPP frames*/
 #define ETH_P_PPP_MP0x0008  /* Dummy type for PPP MP frames */
 #define ETH_P_LOCALTALK 0x0009 /* Localtalk pseudo type*/
+#define ETH_P_CAN  0x000C  /* Controller Area Network  */
 #define ETH_P_PPPTALK  0x0010  /* Dummy type for Atalk over PPP*/
 #define ETH_P_TR_802_2 0x0011  /* 802.2 frames */
 #define ETH_P_MOBITEX  0x0015  /* Mobitex ([EMAIL PROTECTED])  */
Index: linux-2.6.22-rc5/include/linux/socket.h
===
--- linux-2.6.22-rc5.orig/include/linux/socket.h   2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/include/linux/socket.h 2007-06-20 14:11:00.0 +0200
@@ -185,6 +185,7 @@
 #define AF_PPPOX   24  /* PPPoX sockets*/
 #define AF_WANPIPE 25  /* Wanpipe API Sockets */
 #define AF_LLC 26  /* Linux LLC*/
+#define AF_CAN 29  /* Controller Area Network  */
 #define AF_TIPC	30  /* TIPC sockets */
 #define AF_BLUETOOTH   31  /* Bluetooth sockets*/
 #define AF_IUCV	32  /* IUCV sockets */
@@ -220,6 +221,7 @@
 #define PF_PPPOX   AF_PPPOX
 #define PF_WANPIPE AF_WANPIPE
 #define PF_LLC AF_LLC
+#define PF_CAN AF_CAN
 #define PF_TIPC	AF_TIPC
 #define PF_BLUETOOTH   AF_BLUETOOTH
 #define PF_IUCV	AF_IUCV
Index: linux-2.6.22-rc5/include/linux/tty.h
===
--- linux-2.6.22-rc5.orig/include/linux/tty.h	2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/include/linux/tty.h	2007-06-20 14:11:00.0 +0200
@@ -24,7 +24,7 @@
 #define NR_PTYS	CONFIG_LEGACY_PTY_COUNT   /* Number of legacy ptys */
 #define NR_UNIX98_PTY_DEFAULT  4096  /* Default maximum for Unix98 ptys */
 #define NR_UNIX98_PTY_MAX  (1 << MINORBITS) /* Absolute limit */
-#define NR_LDISCS  17
+#define NR_LDISCS  18
 
 /* line disciplines */
 #define N_TTY  0
@@ -45,6 +45,7 @@
 #define N_SYNC_PPP 14  /* synchronous PPP */
 #define N_HCI  15  /* Bluetooth HCI UART */
 #define N_GIGASET_M101 16  /* Siemens Gigaset M101 serial DECT adapter */
+#define N_SLCAN	17  /* Serial / USB serial CAN Adaptors */
 
 /*
  * This character is the same as _POSIX_VDISABLE: it cannot be used as
Index: linux-2.6.22-rc5/net/core/sock.c
===
--- linux-2.6.22-rc5.orig/net/core/sock.c	2007-06-20 14:10:41.0 +0200
+++ linux-2.6.22-rc5/net/core/sock.c	2007-06-20 14:11:00.0 +0200
@@ -153,7 +153,7 @@
   "sk_lock-AF_ASH"   , "sk_lock-AF_ECONET"   , "sk_lock-AF_ATMSVC"   ,
   "sk_lock-21"   , "sk_lock-AF_SNA"  , "sk_lock-AF_IRDA" ,
   "sk_lock-AF_PPPOX" , "sk_lock-AF_WANPIPE"  , "sk_lock-AF_LLC"  ,
-  "sk_lock-27"   , "sk_lock-28"  , "sk_lock-29"  ,
+  "sk_lock-27"   , "sk_lock-28"  , "sk_lock-AF_CAN"  ,
   "sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV",
   "sk_lock-AF_RXRPC" , "sk_lock-AF_MAX"
 };
@@ -167,7 +167,7 @@
   "slock-AF_ASH"   , "slock-AF_ECONET"   , "slock-AF_ATMSVC"   ,
   "slock-21"   , "slock-AF_SNA"

Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Ben Greear

Patrick McHardy wrote:
> Eric W. Biederman wrote:
>> For the macvlan code do we need to do anything special if we transmit
>> to a mac we would normally receive?  Another unicast mac of the same
>> nic for example.
>
> That doesn't happen under normal circumstances. I don't believe
> it would work.

Assuming you mean you want to send between two mac-vlans on the same physical
nic...

This can work if your mac-vlans are on different subnets and you are
routing between them (and if you have my send-to-self patch or have
another way to let a system send packets to itself).

A normal ethernet switch will NOT turn a packet around on the same
interface on which it was received, which is why you must have them on
different subnets with a router in between.

For sending directly to yourself, something like the 'veth' driver
is probably more useful.

>> For the macvlan hash you just use an upper byte.  Is that just a
>> simple starting place, or do we not need a more complex hash.
>
> It comes from the original code, I think it should be good enough.

Ahhh, I knew my hash was lame for some reason!

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Ben Greear

Patrick McHardy wrote:
> Eric W. Biederman wrote:
>> For the macvlan hash you just use an upper byte.  Is that just a
>> simple starting place, or do we not need a more complex hash.
>
> That gave me an idea, since the default addresses are random
> anyway I'm now using an incrementing counter for the upper byte.

Is there not a (relatively) easy way to hash the entire 6 bytes?

I'd prefer to be able to set the MACs to anything I want, without
worrying about trivially hitting a worst-case hash scenario.
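The following user-space C sketch shows one way to use all six octets. It folds the address with 32-bit FNV-1a and keeps the low bits as the bucket index. Several details are assumptions for illustration rather than the macvlan code:

- the 256-bucket table size;
- the helper names;
- the choice of FNV-1a;
- modelling the single-byte scheme as the last octet.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_ALEN      6
#define MAC_HASH_BITS 8                       /* hypothetical: 256 buckets */
#define MAC_HASH_SIZE (1u << MAC_HASH_BITS)

/* Single-octet bucket (modelled here as the last octet). */
static unsigned int mac_hash_one_byte(const unsigned char *addr)
{
	return addr[ETH_ALEN - 1];
}

/* Hypothetical whole-address hash: 32-bit FNV-1a over all six octets,
 * keeping the low MAC_HASH_BITS bits as the bucket index. */
static unsigned int mac_hash_full(const unsigned char *addr)
{
	uint32_t h = 2166136261u;                 /* FNV-1a offset basis */
	int i;

	for (i = 0; i < ETH_ALEN; i++) {
		h ^= addr[i];
		h *= 16777619u;                   /* FNV-1a 32-bit prime */
	}
	return h & (MAC_HASH_SIZE - 1);
}

/* Count the buckets used by 256 addresses that differ only in octet 4. */
static unsigned int buckets_used(unsigned int (*hash)(const unsigned char *))
{
	unsigned char addr[ETH_ALEN] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
	unsigned char used[MAC_HASH_SIZE] = { 0 };
	unsigned int b, bucket, n = 0;

	for (b = 0; b < 256; b++) {
		addr[4] = (unsigned char)b;
		bucket = hash(addr);
		assert(bucket < MAC_HASH_SIZE);
		if (!used[bucket]) {
			used[bucket] = 1;
			n++;
		}
	}
	return n;
}
```

With a single-octet key, 256 addresses that share that octet all land in one bucket. Hashing all six octets spreads them over all 256 buckets in this sketch. In-kernel code would more likely use an existing hash helper (for example jhash) than a hand-rolled FNV.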

Thanks,
Ben







--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com



Re: [BUG] Sky2 driver in 2.6.22-rc5-git1-cfs-v17

2007-06-21 Thread Ian Kumlien
On Thu, 2007-06-21 at 18:57 -0700, Stephen Hemminger wrote:
> Redirected of LKML,  netdev is the proper list.

Thanks =)

> On Thu, 21 Jun 2007 22:51:32 +0200
> Ian Kumlien <[EMAIL PROTECTED]> wrote:
> 
> > Hi, 
> > 
> > recently have started to see this in my dmesg:
> > 
> > NETDEV WATCHDOG: eth0: transmit timed out
> > sky2 eth0: tx timeout
> > sky2 eth0: transmit ring 449 .. 408 report=449 done=449
> > sky2 eth0: disabling interface
> > sky2 eth0: enabling interface
> > sky2 eth0: ram buffer 48K
> > sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
> > 
> > I'm not using MSI since it seems to have caused problems in the past.
> > 
> > I run with a 9k mtu
> > 
> > sky2 eth0: transmit ring 18 .. 489 report=18 done=18
> >  I assume ring max is 512 (ie 1-512) since:
> > Ring parameters for eth0:
> > Current hardware settings:
> > RX: 168
> > RX Mini:0
> > RX Jumbo:   0
> > TX: 511
> > 
> > And 489 + 41 - 18 = 512
> > 
> > sky2 eth0: transmit ring 197 .. 156 report=197 done=197
> > sky2 eth0: transmit ring 480 .. 439 report=480 done=480
> > sky2 eth0: transmit ring 413 .. 372 report=413 done=413
> > sky2 eth0: transmit ring 320 .. 279 report=320 done=320
> > 
> > Else, they are all off by 41.
> > 
> > Is this a known bug?
> no

Damn =P

> > Comments? ideas?
> >
> which chip version. probably Yukon EC that seems to be the only one
> that does gigabit with Ram buffer.

sky2 :02:00.0: v1.14 addr 0xdbffc000 irq 18 Yukon-EC (0xb6) rev 2

> Does it work alright if you set transmit ring size smaller with ethtool?
> There might be an off-by-one bug in the worst case calculations about
> list element usage.

I tried this, but not with a specific size; I think I used 480, and yes,
it timed out. Any ideas on a more educated value?

-- 
Ian Kumlien  -- http://pomac.netswarm.net




Re: [BUG] Sky2 driver in 2.6.22-rc5-git1-cfs-v17

2007-06-21 Thread Stephen Hemminger
Redirected off LKML; netdev is the proper list.

On Thu, 21 Jun 2007 22:51:32 +0200
Ian Kumlien <[EMAIL PROTECTED]> wrote:

> Hi, 
> 
> recently have started to see this in my dmesg:
> 
> NETDEV WATCHDOG: eth0: transmit timed out
> sky2 eth0: tx timeout
> sky2 eth0: transmit ring 449 .. 408 report=449 done=449
> sky2 eth0: disabling interface
> sky2 eth0: enabling interface
> sky2 eth0: ram buffer 48K
> sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control rx
> 
> I'm not using MSI since it seems to have caused problems in the past.
> 
> I run with a 9k mtu
> 
> sky2 eth0: transmit ring 18 .. 489 report=18 done=18
>  I assume ring max is 512 (ie 1-512) since:
>   Ring parameters for eth0:
>   Current hardware settings:
>   RX: 168
>   RX Mini:0
>   RX Jumbo:   0
>   TX: 511
>   
>   And 489 + 41 - 18 = 512
> 
> sky2 eth0: transmit ring 197 .. 156 report=197 done=197
> sky2 eth0: transmit ring 480 .. 439 report=480 done=480
> sky2 eth0: transmit ring 413 .. 372 report=413 done=413
> sky2 eth0: transmit ring 320 .. 279 report=320 done=320
> 
> Else, they are all off by 41.
> 
> Is this a known bug?
no
> Comments? ideas?
>
Which chip version? Probably Yukon EC; that seems to be the only one
that does gigabit with a RAM buffer.

Does it work all right if you set the transmit ring size smaller with ethtool?
There might be an off-by-one bug in the worst-case calculation of
list element usage.
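The "off by 41" pattern in the report is modular ring arithmetic. On a power-of-two ring, the forward distance from one index to another is `(to - from) & (size - 1)`. The small sketch below reproduces the reported numbers. It assumes a 512-entry ring, as inferred in the report, and `ring_dist()` is a hypothetical helper, not the sky2 code:

```c
#include <assert.h>

/* Hypothetical helper: steps from index `from` forward to index `to`
 * on a ring of `size` entries; size must be a power of two. */
static unsigned int ring_dist(unsigned int from, unsigned int to,
			      unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (to - from) & (size - 1);
}
```

Each logged pair gives the same 41-entry gap from the second index forward to the first, and 512 - 41 = 471 entries the other way. A constant gap like this is at least consistent with a systematic accounting error, as suspected above. The sketch proves nothing about sky2 itself.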


Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Patrick McHardy

Eric W. Biederman wrote:
> For the macvlan hash you just use an upper byte.  Is that just a
> simple starting place, or do we not need a more complex hash.

That gave me an idea, since the default addresses are random
anyway I'm now using an incrementing counter for the upper byte.




Re: [PATCH] iproute2: Added support for RR qdisc (sch_rr)

2007-06-21 Thread Stephen Hemminger
On Thu, 21 Jun 2007 14:27:04 -0700
PJ Waskiewicz <[EMAIL PROTECTED]> wrote:

> Add tc support for the sch_rr qdisc.  This qdisc supports multiple queues
> on hardware.  The syntax for sch_rr is the same as sch_prio.
> 
> Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
> 

If the rr discipline makes it into mainline, I'll add it to iproute2.
If RR stays out of tree, then you need to maintain it yourself.


Re: [PATCH] NET: Multiqueue network device support.

2007-06-21 Thread Zhu Yi
On Thu, 2007-06-21 at 11:39 -0400, jamal wrote:
> I gave you two opportunities to bail out of this discussion, i am gonna
> take that your rejection to that offer implies you my friend wants to
> get to the bottom of this i.e you are on a mission to find the truth.
> So lets continue this.

It sounds stupid that I'm still trying to convince you why we need multiqueue
support in Qdisc when everybody else is already working on the code,
fixing bugs and preparing for merge. The only reason I keep the
conversation going is that I think you _might_ have some really good points
that got buried under everybody else's positive support for multiqueue. But
as the conversation goes on, that turns out not to be the case. Let me snip
the nonsense part below and focus only on the technical points.

> > Besides, the lower THL you choose, the more CPU time is wasted in busy
> > loop for the only PL case; 
> 
> Your choice of THL and THH has nothing to do with what i am proposing.
> I am not proposing you even touch that. What numbers do you have today?

We don't have THL and THH in our driver. They are what you suggested.
The queue wakeup number is 1/4 of the ring size.

> What i am saying is you use _some_ value for opening up the driver; some
> enlightened drivers such as the tg3 (and the e1000 - for which i
> unashamedly take credit) do have such parametrization. This has already
> been proven to be valuable.
> 
> The timer fires only if a ring shuts down the interface. Where is the
> busy loop? If packets go out, there is no timer.

The busy loop happens in the period after the ring is shut down and
before it is opened again. During this period, the Qdisc will keep
dequeuing and requeuing PL packets in the Tx SoftIRQ, where the busy
loop happens.
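Here is a toy single-ring model of the period described above. Once the driver stops the queue, every softirq pass that dequeues a low-priority packet for that ring has to requeue it. This continues until enough descriptors complete to wake the queue. The wake threshold of a quarter of the ring matches the figure given earlier in this message. The one-completion-per-pass assumption and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single TX ring. */
struct tx_ring {
	unsigned int size;   /* descriptors in the ring */
	unsigned int free;   /* descriptors currently free */
	bool stopped;        /* queue stopped by the driver */
};

/* Driver transmit: use one descriptor, stop the queue when full. */
static void ring_xmit(struct tx_ring *r)
{
	assert(!r->stopped && r->free > 0);
	if (--r->free == 0)
		r->stopped = true;
}

/* TX completion: free n descriptors, wake at a quarter of the ring. */
static void ring_complete(struct tx_ring *r, unsigned int n)
{
	r->free += n;
	if (r->stopped && r->free >= r->size / 4)
		r->stopped = false;
}

/* Count softirq passes that dequeue a packet for the stopped ring and
 * have to requeue it, assuming one descriptor completes per pass. */
static unsigned int wasted_passes(struct tx_ring *r)
{
	unsigned int wasted = 0;

	while (r->stopped) {
		wasted++;            /* dequeue, find ring stopped, requeue */
		ring_complete(r, 1);
	}
	return wasted;
}
```

In this model, a wake threshold of 1 still costs one wasted pass per stop. In a real driver, the number of passes depends on how long the hardware takes to free a descriptor, which is the congestion point made below.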

> > the higher THL you choose, the slower the PH
> > packets will be sent out than expected (the driver doesn't fully utilize
> > the device function -- multiple rings, 
> 
> I dont think you understood: Whatever value you choose for THL and THH
> today, keep those. OTOH, the wake threshold is what i was refering to.

I don't even care about the threshold. Even if you set it to 1, there is
still a busy loop during the period before this first packet is sent out
over the air. And you cannot ignore this small window, because it can be
longer when the wireless medium is congested with high prio packets.

Thanks,
-yi


Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Patrick McHardy

Waskiewicz Jr, Peter P wrote:
>> The dependencies seem to be very confused. SCHED_PRIO does 
>> not depend on anything new, SCH_RR also doesn't depend on 
>> anything. SCH_PRIO_MQ and SCH_RR_MQ (which is missing) depend 
>> on SCH_PRIO/SCH_RR. A single NET_SCH_MULTIQUEUE option seems 
>> better than adding one per scheduler though.
>
> I agree with a NET_SCH_MULTIQUEUE option.  However, SCH_RR does depend
> on SCH_PRIO being built since it's the same code, doesn't it?  Maybe I'm
> not understanding something about the build process.  I'll clean this
> up.

The easiest solution is to select SCH_PRIO from SCH_RR.
I had something else in mind initially, but that is
needlessly complicated.

>> For the tenth time now, the user should enable this at 
>> runtime. You can't just break things dependant on config options.
>
> I had this in sch_prio and tc before, and was told to remove it because
> of ABI issues.  I can put it back in, but I'm not sure what those
> previous ABI issues were.  Was it backwards compatibility that you
> referred to before that was broken?

Your tc changes changed the structure in a way that old tc binaries
wouldn't work anymore. This version breaks configurations that use
a number of bands not matching the HW queues when the user enables
the multiqueue compile time option.

Unfortunately prio does not use nested attributes, so the easiest
way is extending struct tc_prio_qopt at the end and checking the
attribute size to decide whether it's an old or a new version.
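Below is a user-space sketch of the size-check approach described above. It works as follows:

- The extended option struct keeps the existing `struct tc_prio_qopt` layout as its prefix: an int band count followed by TC_PRIO_MAX + 1 priomap bytes.
- The parser zero-fills its output and copies only what userspace actually sent.
- The band/queue check applies only when the new field asks for it, so old tc binaries keep working.

The extended struct, its `multiqueue` field and `prio_parse_opt()` are assumptions for illustration, not an agreed ABI:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TC_PRIO_MAX 15

/* Layout sent by existing tc binaries. */
struct tc_prio_qopt {
	int bands;
	unsigned char priomap[TC_PRIO_MAX + 1];
};

/* Hypothetical extension: same prefix, new field appended at the end. */
struct tc_prio_qopt_ext {
	int bands;
	unsigned char priomap[TC_PRIO_MAX + 1];
	int multiqueue;     /* hypothetical: 1 = bands must match HW queues */
};

/* Accept either layout; decide which one by the attribute payload size. */
static int prio_parse_opt(const void *data, size_t len, int hw_queues,
			  struct tc_prio_qopt_ext *out)
{
	assert(data != NULL && out != NULL);
	if (len < sizeof(struct tc_prio_qopt))
		return -EINVAL;               /* too short for either version */

	memset(out, 0, sizeof(*out));         /* old binaries: multiqueue = 0 */
	memcpy(out, data, len < sizeof(*out) ? len : sizeof(*out));

	/* Enforce the band count only when userspace asked for it. */
	if (out->multiqueue && out->bands != hw_queues)
		return -EINVAL;
	return 0;
}
```

This only covers the interim size check. The nested-attribute approach suggested as a better fix avoids growing a fixed struct at all.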

A better fix would be to introduce a new qdisc configuration
attribute that takes precedence over TCA_OPTIONS and have
userspace send both the old non-nested structure and a new
nested configuration. That would make sure we never run into
this problem again.




Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Patrick McHardy

Eric W. Biederman wrote:

Patrick McHardy <[EMAIL PROTECTED]> writes:

  

Eric W. Biederman wrote:


I'm trying to understand what the point of this patch is.

In one sense I find the concept of filtering and listening for multiple
mac addresses very interesting, especially if we could break out different
streams of traffic by destination mac address into separate network devices.
This would remove the need for any kind of ethernet tunnel and make multiple
network namespaces much more pleasant.

However this just seems to allow a card to decode multiple mac addresses
which in some oddball load balancing configurations may actually be
useful, but it seems fairly limited.

Do you have a specific use case you envision for this multiple mac
functionality?

  

Yes, please see the MACVLAN patch I posted one or two days earlier.



Thanks.  That is what I was envisioning.  I keep suspecting one of
the cool multi-rx queue nics may start doing some of the demux in hardware.
But whatever if it works and is relatively fast it is good enough for me.
  


When NICs support that, I guess the macvlan driver could be adapted
to take advantage of that.
  

8021q can also make use of it and Dave mentioned some virtualization
devices want this as well.



Right, makes sense.  And ethernet bridging (which is the general case
of the virtualization Dave mentioned) should also be able to take
advantage of multiple unicast addresses.  So this definitely makes
sense.
  



It needs promiscuous mode to learn, so I'm not sure how much
this will help bridging.

Have you done any performance testing with the macvlan code?  With
the ethernet tunnel device we keep getting copied unicast packets on
some path or other which slowed things down.  Simply not doing the
firewalling until the packets have made it through the macvlan device
should help here.
  


Performance should be at least as good as on a bridge device since
the macvlan driver does basically nothing and uses the same functions
for receiving and sending packets.


For the macvlan code do we need to do anything special if we transmit
to a mac we would normally receive?  Another unicast mac of the same
nic for example.


That doesn't happen under normal circumstances. I don't believe
it would work.


For the macvlan hash you just use an upper byte.  Is that just a
simple starting place, or do we not need a more complex hash.
  



It comes from the original code, I think it should be good enough.



RE: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Waskiewicz Jr, Peter P
> The dependencies seem to be very confused. SCHED_PRIO does 
> not depend on anything new, SCH_RR also doesn't depend on 
> anything. SCH_PRIO_MQ and SCH_RR_MQ (which is missing) depend 
> on SCH_PRIO/SCH_RR. A single NET_SCH_MULTIQUEUE option seems 
> better than adding one per scheduler though.

I agree with a NET_SCH_MULTIQUEUE option.  However, SCH_RR does depend
on SCH_PRIO being built since it's the same code, doesn't it?  Maybe I'm
not understanding something about the build process.  I'll clean this
up.

> 
> > --- a/net/sched/sch_prio.c
> > +++ b/net/sched/sch_prio.c
> > @@ -9,6 +9,8 @@
> >   * Authors:Alexey Kuznetsov, <[EMAIL PROTECTED]>
> >   * Fixes:   19990609: J Hadi Salim <[EMAIL PROTECTED]>:
> >   *  Init --  EINVAL when opt undefined
> > + * Additions:  Peter P. Waskiewicz Jr. <[EMAIL PROTECTED]>
> > + * Added round-robin scheduling for selection at load-time
> >   
> 
> git keeps changelogs, please don't add it here.

Roger.

> > struct tcf_proto *filter_list;
> > u8  prio2band[TC_PRIO_MAX+1];
> > struct Qdisc *queues[TCQ_PRIO_BANDS];
> > +   u16 band2queue[TC_PRIO_MAX + 1];
> >   
> 
> Why is this still here? Its a 1:1 mapping.

I'll fix this.

> > @@ -211,6 +265,22 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
> > return -EINVAL;
> > }
> >  
> > +   /* If we're prio multiqueue or are using round-robin, make
> > +* sure the number of incoming bands matches the number of
> > +* queues on the device we're associating with.
> > +*/
> > +#ifdef CONFIG_NET_SCH_RR
> > +   if (strcmp("rr", sch->ops->id) == 0)
> > +   if (qopt->bands != sch->dev->egress_subqueue_count)
> > +   return -EINVAL;
> > +#endif
> > +
> > +#ifdef CONFIG_NET_SCH_PRIO_MQ
> > +   if (strcmp("prio", sch->ops->id) == 0)
> > +   if (qopt->bands != sch->dev->egress_subqueue_count)
> > +   return -EINVAL;
> > +#endif
> >   
> 
> For the tenth time now, the user should enable this at 
> runtime. You can't just break things dependant on config options.

I had this in sch_prio and tc before, and was told to remove it because
of ABI issues.  I can put it back in, but I'm not sure what those
previous ABI issues were.  Was it backwards compatibility that you
referred to before that was broken?

As always, the feedback is very much appreciated.  I'll get these fixes
in as soon as possible.

-PJ


Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Patrick McHardy

PJ Waskiewicz wrote:
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index 475df84..ca0b352 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -102,8 +102,16 @@ config NET_SCH_ATM
>   To compile this code as a module, choose M here: the
>   module will be called sch_atm.
>  
> +config NET_SCH_BANDS
> +bool "Multi Band Queueing (PRIO and RR)"
> +---help---
> +  Say Y here if you want to use n-band multiqueue packet
> +  schedulers.  These include a priority-based scheduler and
> +  a round-robin scheduler.
> +
>  config NET_SCH_PRIO
> tristate "Multi Band Priority Queueing (PRIO)"
> +   depends on NET_SCH_BANDS
> ---help---
>   Say Y here if you want to use an n-band priority queue packet
>   scheduler.
> @@ -111,6 +119,30 @@ config NET_SCH_PRIO
>   To compile this code as a module, choose M here: the
>   module will be called sch_prio.
>  
> +config NET_SCH_PRIO_MQ
> +   bool "Multiple hardware queue support for PRIO"
> +   depends on NET_SCH_PRIO
> +   ---help---
> + Say Y here if you want to allow the PRIO qdisc to assign
> + flows to multiple hardware queues on an ethernet device.  This
> + will still work on devices with 1 queue.
> +
> + Consider this scheduler for devices that do not use
> + hardware-based scheduling policies.  Otherwise, use NET_SCH_RR.
> +
> + Most people will say N here.
> +
> +config NET_SCH_RR
> +   bool "Multi Band Round Robin Queuing (RR)"
> +   depends on NET_SCH_BANDS && NET_SCH_PRIO
> +   ---help---
> + Say Y here if you want to use an n-band round robin packet
> + scheduler.
> +
> + The module uses sch_prio for its framework and is aliased as
> + sch_rr, so it will load sch_prio, although it is referred
> + to using sch_rr.

The dependencies seem to be very confused. SCHED_PRIO does not depend
on anything new, SCH_RR also doesn't depend on anything. SCH_PRIO_MQ
and SCH_RR_MQ (which is missing) depend on SCH_PRIO/SCH_RR. A single
NET_SCH_MULTIQUEUE option seems better than adding one per scheduler
though.

> --- a/net/sched/sch_prio.c
> +++ b/net/sched/sch_prio.c
> @@ -9,6 +9,8 @@
>   * Authors:Alexey Kuznetsov, <[EMAIL PROTECTED]>
>   * Fixes:   19990609: J Hadi Salim <[EMAIL PROTECTED]>:
>   *  Init --  EINVAL when opt undefined
> + * Additions:  Peter P. Waskiewicz Jr. <[EMAIL PROTECTED]>
> + * Added round-robin scheduling for selection at load-time

git keeps changelogs, please don't add it here.

>   */
>  
>  #include 
> @@ -40,9 +42,13 @@
>  struct prio_sched_data
>  {
> int bands;
> +#ifdef CONFIG_NET_SCH_RR
> +   int curband; /* for round-robin */
> +#endif
> struct tcf_proto *filter_list;
> u8  prio2band[TC_PRIO_MAX+1];
> struct Qdisc *queues[TCQ_PRIO_BANDS];
> +   u16 band2queue[TC_PRIO_MAX + 1];

Why is this still here? It's a 1:1 mapping.

> @@ -211,6 +265,22 @@ static int prio_tune(struct Qdisc *sch, struct rtattr *opt)
> return -EINVAL;
> }
>  
> +	/* If we're prio multiqueue or are using round-robin, make
> +* sure the number of incoming bands matches the number of
> +* queues on the device we're associating with.
> +*/
> +#ifdef CONFIG_NET_SCH_RR
> +   if (strcmp("rr", sch->ops->id) == 0)
> +   if (qopt->bands != sch->dev->egress_subqueue_count)
> +   return -EINVAL;
> +#endif
> +
> +#ifdef CONFIG_NET_SCH_PRIO_MQ
> +   if (strcmp("prio", sch->ops->id) == 0)
> +   if (qopt->bands != sch->dev->egress_subqueue_count)
> +   return -EINVAL;
> +#endif

For the tenth time now, the user should enable this at
runtime. You can't just break things dependent on config
options.




Re: [patch] alpha: fix alignment problem in csum_ipv6_magic()

2007-06-21 Thread Andrew Morton
> On Sun, 17 Jun 2007 01:20:20 +0400 Ivan Kokshaysky <[EMAIL PROTECTED]> wrote:
> Hopefully this fixes http://bugzilla.kernel.org/show_bug.cgi?id=8635
> 
> The struct in6_addr passed to csum_ipv6_magic() is 4 byte aligned,
> so we can't use the regular 64-bit loads.
> Since the cost of handling of 4 byte and 1 byte aligned 64-bit data is
> roughly the same, this code can cope with any src/dst [mis]alignment.
> 
> Signed-off-by: Ivan Kokshaysky <[EMAIL PROTECTED]>
> 
> Ivan.
> 
> --- 2.6.22-rc4/arch/alpha/lib/ev6-csum_ipv6_magic.S   Sun Feb  4 21:44:54 2007
> +++ linux/arch/alpha/lib/ev6-csum_ipv6_magic.SSun Jun 17 00:41:53 2007
> @@ -46,6 +46,10 @@
>   * add the 3 low ushorts together, generating a uint
>   * a final add of the 2 lower ushorts
>   * truncating the result.
> + *
> + * Misalignment handling added by Ivan Kokshaysky <[EMAIL PROTECTED]>
> + * The cost is 16 instructions (~8 cycles), including two extra loads which
> + * may cause additional delay in rare cases (load-load replay traps).
>   */
>  
>   .globl csum_ipv6_magic
> @@ -55,25 +59,45 @@
>  csum_ipv6_magic:
>   .prologue 0
>  
> - ldq $0,0($16)   # L : Latency: 3
> + ldq_u   $0,0($16)   # L : Latency: 3
>   inslh   $18,7,$4# U : 00AABBCC
> - ldq $1,8($16)   # L : Latency: 3
> + ldq_u   $1,8($16)   # L : Latency: 3
>   sll $19,8,$7# U : U L U L : 0x 00aabb00
>  
> + and $16,7,$6# E : src misalignment
> + ldq_u   $5,15($16)  # L : Latency: 3
>   zapnot  $20,15,$20  # U : zero extend incoming csum
> - ldq $2,0($17)   # L : Latency: 3
> - sll $19,24,$19  # U : U L L U : 0x00aa bb00
> - inswl   $18,3,$18   # U : 00CCDD00
> + ldq_u   $2,0($17)   # L : U L U L : Latency: 3
> +
> + extql   $0,$6,$0# U :
> + extqh   $1,$6,$22   # U :
> + ldq_u   $3,8($17)   # L : Latency: 3
> + sll $19,24,$19  # U : U U L U : 0x00aa bb00
> +
> + cmoveq  $6,$31,$22  # E : src aligned?
> + ldq_u   $23,15($17) # L : Latency: 3
> + or  $18,$4,$18  # E : 00CCDDAABBCC
> + extql   $1,$6,$1# U : U L L U :
>  
> - ldq $3,8($17)   # L : Latency: 3
> - bis $18,$4,$18  # E : 00CCDDAABBCC
> + or  $0,$22,$0   # E : 1st src word complete
> + extqh   $5,$6,$5# U :
>   addl$19,$7,$19  # E : bbaabb00
> - nop # E : U L U L
> + and $17,7,$6# E : L U L U : dst misalignment
>  
> + inswl   $18,3,$18   # U : 00CCDD00
> + or  $1,$5,$1# E : 2nd src word complete
> + extql   $2,$6,$2# U :
> + extqh   $3,$6,$22   # U : U L U U :
> +
> + cmoveq  $6,$31,$22  # E : dst aligned?
> + extql   $3,$6,$3# U :
>   addq$20,$0,$20  # E : begin summing the words
> + extqh   $23,$6,$23  # U : L U L U :
> +
>   srl $18,16,$4   # U : 00CCDDAA
> + or  $2,$22,$2   # E : 1st dst word complete
>   zap $19,0x3,$19 # U : bbaa
> - nop # E : L U U L
> + or  $3,$23,$3   # E : U L U L : 2nd dst word complete
>  
>   cmpult  $20,$0,$0   # E :
>   addq$20,$1,$20  # E :
> --- 2.6.22-rc4/arch/alpha/lib/csum_ipv6_magic.S   Sun Feb  4 21:44:54 2007
> +++ linux/arch/alpha/lib/csum_ipv6_magic.SSun Jun 17 00:29:28 2007
> @@ -7,6 +7,9 @@
>   *__u32 len,
>   *unsigned short proto,
>   *unsigned int csum);
> + *
> + * Misalignment handling (which costs 16 instructions / 8 cycles) 
> + * added by Ivan Kokshaysky <[EMAIL PROTECTED]>
>   */
>  
>   .globl csum_ipv6_magic
> @@ -16,37 +19,57 @@
>  csum_ipv6_magic:
>   .prologue 0
>  
> - ldq $0,0($16)   # e0: load src & dst addr words
> + ldq_u   $0,0($16)   # e0: load src & dst addr words
>   zapnot  $20,15,$20  # .. e1 : zero extend incoming csum
>   extqh   $18,1,$4# e0: byte swap len & proto while we wait
> - ldq $1,8($16)   # .. e1 :
> + ldq_u   $21,7($16)  # .. e1 : handle misalignment
>  
>   extbl   $18,1,$5# e0:
> - ldq $2,0($17)   # .. e1 :
> + ldq_u   $1,8($16)   # .. e1 :
>   extbl   $18,2,$6# e0:
> - ldq $3,8($17)   # .. e1 :
> + ldq_u   $22,15($16) # .. e1 :
>  
>   extbl   $18,3,$18   # e0:
> + ldq_u   $2,0($17)   # .. e1 :
>   sra $4,32,$4# e0:
> + ldq_u   $23,7($17)  # .. e1 :
> +
> + extql   $0,$16,$0   # e0:
> + ldq_u   $3,8($17)   # .. e1 :
> + extqh   $21,$16,$21 # e0:
> + ldq_u   $24,15($17) # .. e1 :
> +
>   sll $5,16,$5# e0:
> +   
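For readers unfamiliar with the Alpha idiom used in this patch, two instructions do the work:

- `ldq_u` loads the aligned quadword that contains a given address.
- `extql`/`extqh` shift two neighbouring quadwords so that OR-ing them yields the unaligned 64-bit value.

Below is a portable C model of that sequence on a little-endian byte buffer, with offsets measured from an 8-byte-aligned base. The helper names are hypothetical. The model says nothing about the instruction scheduling the patch comments describe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes at p as a little-endian quadword (Alpha is little-endian). */
static uint64_t le64_at(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Model of: ldq_u lo,0(p); ldq_u hi,7(p); extql lo; extqh hi; or.
 * `off` is the byte offset from an 8-byte-aligned base `buf`. */
static uint64_t load_unaligned_q(const unsigned char *buf, size_t off)
{
	unsigned int shift = (unsigned int)(off & 7) * 8;
	uint64_t lo = le64_at(buf + (off & ~(size_t)7));        /* ldq_u 0(p) */
	uint64_t hi = le64_at(buf + ((off + 7) & ~(size_t)7));  /* ldq_u 7(p) */

	lo >>= shift;                                           /* extql */
	/* extqh. When aligned, (off + 7) & ~7 == off, so hi is the same
	 * quadword as lo and adds nothing; C cannot shift by 64, so drop
	 * it. The ev6 variant loads 8(p) instead, hence its cmoveq. */
	hi = shift ? hi << (64 - shift) : 0;
	return lo | hi;
}
```

The same two-load sequence works for any byte alignment at a fixed cost. This is consistent with the patch description's claim that the code copes with any src/dst misalignment.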

RE: [PATCH] NET: Multiple queue hardware support

2007-06-21 Thread Waskiewicz Jr, Peter P
> PJ Waskiewicz wrote:
> > I did not modify other users of netif_queue_stopped() in 
> > net/core/netpoll.c, net/core/dev.c, or net/core/pktgen.c, since no 
> > classification occurs for the skb being sent to the device.  
> > Therefore, packets should always be ending up in queue 0, 
> so there's no need to check the subqueue status either.
> >   
> 
> Thats not correct. Subqueue 0 may be full and the queue still running.
> 
> I'll look over the patches later.

I'm working something up to address this.  The last time I thought about
this, I had issues with software devices, such as loopback.  They
weren't allocating any subqueues at all, so they would call
netif_subqueue_stopped() and panic the kernel.  However, now with Dave's
request to index egress_subqueue, the first queue is allocated for
everyone, so loopback and other software devices should be happy.  Let
me put these checks back in, test it out, and resend if I don't see any
issues.

Sorry for the thrash,
-PJ Waskiewicz


Re: [PATCH] NET: Multiple queue hardware support

2007-06-21 Thread Patrick McHardy

PJ Waskiewicz wrote:

I did not modify other users of netif_queue_stopped() in net/core/netpoll.c,
net/core/dev.c, or net/core/pktgen.c, since no classification occurs for
the skb being sent to the device.  Therefore, packets should always be
ending up in queue 0, so there's no need to check the subqueue status either.
  


That's not correct. Subqueue 0 may be full and the queue still running.

I'll look over the patches later.



[PATCH] iproute2: Added support for RR qdisc (sch_rr)

2007-06-21 Thread PJ Waskiewicz
Add tc support for the sch_rr qdisc.  This qdisc supports multiple queues
on hardware.  The syntax for sch_rr is the same as sch_prio.

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 tc/Makefile |1 +
 tc/q_rr.c   |  113 +++
 2 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/tc/Makefile b/tc/Makefile
index 9d618ff..62e2697 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -9,6 +9,7 @@ TCMODULES += q_fifo.o
 TCMODULES += q_sfq.o
 TCMODULES += q_red.o
 TCMODULES += q_prio.o
+TCMODULES += q_rr.o
 TCMODULES += q_tbf.o
 TCMODULES += q_cbq.o
 TCMODULES += f_rsvp.o
diff --git a/tc/q_rr.c b/tc/q_rr.c
new file mode 100644
index 000..c5c1dc8
--- /dev/null
+++ b/tc/q_rr.c
@@ -0,0 +1,113 @@
+/*
+ * q_rr.c  RR.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:PJ Waskiewicz, <[EMAIL PROTECTED]>
+ * Original Authors:   Alexey Kuznetsov, <[EMAIL PROTECTED]> (from PRIO)
+ *
+ * Changes:
+ *
+ * Ole Husgaard <[EMAIL PROTECTED]>: 990513: prio2band map was always reset.
+ * J Hadi Salim <[EMAIL PROTECTED]>: 990609: priomap fix.
+ */
+
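+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>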
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... rr bands NUMBER priomap P1 P2...\n");
+}
+
+#define usage() return(-1)
+
+static int rr_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nlmsghdr *n)
+{
+	int ok = 0;
+	int pmap_mode = 0;
+	int idx = 0;
+	struct tc_prio_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 }};
+
+	while (argc > 0) {
+		if (strcmp(*argv, "bands") == 0) {
+			if (pmap_mode)
+				explain();
+			NEXT_ARG();
+			if (get_integer(&opt.bands, *argv, 10)) {
+				fprintf(stderr, "Illegal \"bands\"\n");
+				return -1;
+			}
+			ok++;
+		} else if (strcmp(*argv, "priomap") == 0) {
+			if (pmap_mode) {
+				fprintf(stderr, "Error: duplicate priomap\n");
+				return -1;
+			}
+			pmap_mode = 1;
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			unsigned band;
+			if (!pmap_mode) {
+				fprintf(stderr, "What is \"%s\"?\n", *argv);
+				explain();
+				return -1;
+			}
+			if (get_unsigned(&band, *argv, 10)) {
+				fprintf(stderr, "Illegal \"priomap\" element\n");
+				return -1;
+			}
+			/* valid bands are 0 .. bands-1 */
+			if (band >= opt.bands) {
+				fprintf(stderr, "\"priomap\" element is out of bands\n");
+				return -1;
+			}
+			if (idx > TC_PRIO_MAX) {
+				fprintf(stderr, "\"priomap\" index > TC_PRIO_MAX=%u\n", TC_PRIO_MAX);
+				return -1;
+			}
+			opt.priomap[idx++] = band;
+		}
+		argc--; argv++;
+	}
+
+	addattr_l(n, 1024, TCA_OPTIONS, &opt, sizeof(opt));
+	return 0;
+}
+
+int rr_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	int i;
+	struct tc_prio_qopt *qopt;
+
+	if (opt == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(opt) < sizeof(*qopt))
+		return -1;
+	qopt = RTA_DATA(opt);
+	fprintf(f, "bands %u priomap ", qopt->bands);
+	for (i=0; i <= TC_PRIO_MAX; i++)
+		fprintf(f, " %d", qopt->priomap[i]);
+	return 0;
+}
+
+struct qdisc_util rr_qdisc_util = {
+	.id = "rr",
+	.parse_qopt = rr_parse_opt,
+	.print_qopt = rr_print_opt,
+};


[PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread PJ Waskiewicz
Add the new sch_rr qdisc for multiqueue network device support.
Allow sch_prio to be compiled with or without multiqueue hardware
support.

sch_rr is part of sch_prio, and is referenced from MODULE_ALIAS.  This
was done since sch_prio and sch_rr only differ in their dequeue routine.
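Since the two qdiscs differ only in dequeue, the difference can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the sch_prio.c code: struct mband, take(), prio_pick() and rr_pick() are hypothetical, packets are reduced to per-band counters, and a band whose hardware subqueue is stopped is skipped (as the comment in the new prio_dequeue suggests). PRIO always rescans from band 0; RR resumes after the band it served last, which is what the new curband field is for.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BANDS 16

/* Hypothetical model of a multiband qdisc: each band holds a packet
 * count and maps one-to-one onto a hardware subqueue. */
struct mband {
	int bands;                        /* bands in use, <= MAX_BANDS */
	int curband;                      /* next band RR tries (cf. curband) */
	int band_len[MAX_BANDS];          /* packets waiting in each band */
	bool subqueue_stopped[MAX_BANDS]; /* per-subqueue XOFF state */
};

/* Take one packet from band b; return b on success, or -1 if the band
 * is empty or its hardware subqueue is stopped. */
static int take(struct mband *q, int b)
{
	assert(b >= 0 && b < q->bands);
	if (q->subqueue_stopped[b] || q->band_len[b] == 0)
		return -1;
	q->band_len[b]--;
	return b;
}

/* PRIO: strict priority, always scanning from band 0. */
int prio_pick(struct mband *q)
{
	for (int b = 0; b < q->bands; b++)
		if (take(q, b) >= 0)
			return b;
	return -1;
}

/* RR: resume after the band served last; visit each band at most once. */
int rr_pick(struct mband *q)
{
	for (int i = 0; i < q->bands; i++) {
		int b = q->curband;

		q->curband = (q->curband + 1) % q->bands;
		if (take(q, b) >= 0)
			return b;
	}
	return -1;
}
```

With two packets in each of four bands, repeated prio_pick() calls return bands 0, 0, 1, 1, while rr_pick() returns 0, 1, 2, 3.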

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 net/sched/Kconfig       |   32 
 net/sched/sch_generic.c |    3 +
 net/sched/sch_prio.c    |  123 ---
 3 files changed, 150 insertions(+), 8 deletions(-)

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 475df84..ca0b352 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -102,8 +102,16 @@ config NET_SCH_ATM
  To compile this code as a module, choose M here: the
  module will be called sch_atm.
 
+config NET_SCH_BANDS
+bool "Multi Band Queueing (PRIO and RR)"
+---help---
+  Say Y here if you want to use n-band multiqueue packet
+  schedulers.  These include a priority-based scheduler and
+  a round-robin scheduler.
+
 config NET_SCH_PRIO
tristate "Multi Band Priority Queueing (PRIO)"
+   depends on NET_SCH_BANDS
---help---
  Say Y here if you want to use an n-band priority queue packet
  scheduler.
@@ -111,6 +119,30 @@ config NET_SCH_PRIO
  To compile this code as a module, choose M here: the
  module will be called sch_prio.
 
+config NET_SCH_PRIO_MQ
+   bool "Multiple hardware queue support for PRIO"
+   depends on NET_SCH_PRIO
+   ---help---
+ Say Y here if you want to allow the PRIO qdisc to assign
+ flows to multiple hardware queues on an ethernet device.  This
+ will still work on devices with 1 queue.
+
+ Consider this scheduler for devices that do not use
+ hardware-based scheduling policies.  Otherwise, use NET_SCH_RR.
+
+ Most people will say N here.
+
+config NET_SCH_RR
+   bool "Multi Band Round Robin Queuing (RR)"
+   depends on NET_SCH_BANDS && NET_SCH_PRIO
+   ---help---
+ Say Y here if you want to use an n-band round robin packet
+ scheduler.
+
+ The module uses sch_prio for its framework and is aliased as
+ sch_rr, so it will load sch_prio, although it is referred
+ to using sch_rr.
+
 config NET_SCH_RED
tristate "Random Early Detection (RED)"
---help---
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 9461e8a..203d5c4 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -168,7 +168,8 @@ static inline int qdisc_restart(struct net_device *dev)
spin_unlock(&dev->queue_lock);
 
ret = NETDEV_TX_BUSY;
-   if (!netif_queue_stopped(dev))
+   if (!netif_queue_stopped(dev) &&
+   !netif_subqueue_stopped(dev, skb->queue_mapping))
/* churn baby churn .. */
ret = dev_hard_start_xmit(skb, dev);
 
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 6d7542c..4eb3ba5 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -9,6 +9,8 @@
  * Authors:Alexey Kuznetsov, <[EMAIL PROTECTED]>
  * Fixes:   19990609: J Hadi Salim <[EMAIL PROTECTED]>:
  *  Init --  EINVAL when opt undefined
+ * Additions:  Peter P. Waskiewicz Jr. <[EMAIL PROTECTED]>
+ * Added round-robin scheduling for selection at load-time
  */
 
 #include 
@@ -40,9 +42,13 @@
 struct prio_sched_data
 {
int bands;
+#ifdef CONFIG_NET_SCH_RR
+   int curband; /* for round-robin */
+#endif
struct tcf_proto *filter_list;
u8  prio2band[TC_PRIO_MAX+1];
struct Qdisc *queues[TCQ_PRIO_BANDS];
+   u16 band2queue[TC_PRIO_MAX + 1];
 };
 
 
@@ -70,14 +76,19 @@ prio_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 #endif
if (TC_H_MAJ(band))
band = 0;
+   skb->queue_mapping =
+   q->band2queue[q->prio2band[band&TC_PRIO_MAX]];
return q->queues[q->prio2band[band&TC_PRIO_MAX]];
}
band = res.classid;
}
band = TC_H_MIN(band) - 1;
-   if (band >= q->bands)
+   if (band >= q->bands) {
+   skb->queue_mapping = q->band2queue[q->prio2band[0]];
return q->queues[q->prio2band[0]];
+   }
 
+   skb->queue_mapping = q->band2queue[band];
return q->queues[band];
 }
 
@@ -144,17 +155,59 @@ prio_dequeue(struct Qdisc* sch)
struct Qdisc *qdisc;
 
for (prio = 0; prio < q->bands; prio++) {
-   qdisc = q->queues[prio];
-   skb = qdisc->dequeue(qdisc);
-   if (skb) {
-   sch->q.qlen--;
-   return skb;
+   /* Check if the target subqueue is available before
+* pulling an skb.  This way we avoid excessive req

[PATCH] iproute2: sch_rr support in tc

2007-06-21 Thread PJ Waskiewicz
This patch is to support the new sch_rr (round-robin) qdisc being proposed
in NET for multiqueue network device support in the Linux network stack.
It uses q_prio.c as the template, since the qdiscs are nearly identical,
outside of the ->dequeue() routine.

I'm soliciting feedback for a 2.6.23 multiqueue submission.  Thanks.

-- 
PJ Waskiewicz <[EMAIL PROTECTED]>


[PATCH 1/3] NET: [DOC] Multiqueue hardware support documentation

2007-06-21 Thread PJ Waskiewicz
Add a brief howto to Documentation/networking for multiqueue.  It
explains how to use the multiqueue API in a driver to support
multiqueue paths from the stack, as well as the qdiscs to use for
feeding a multiqueue device.

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 Documentation/networking/multiqueue.txt |  100 +++
 1 files changed, 100 insertions(+), 0 deletions(-)

diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt
new file mode 100644
index 000..55b2db8
--- /dev/null
+++ b/Documentation/networking/multiqueue.txt
@@ -0,0 +1,100 @@
+
+   HOWTO for multiqueue network device support
+   ===========================================
+
+Section 1: Base driver requirements for implementing multiqueue support
+Section 2: Qdisc support for multiqueue devices
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+
+
+Intro: Kernel support for multiqueue devices
+--------------------------------------------
+
+Kernel support for multiqueue devices is only an API that is presented to the
+netdevice layer for base drivers to implement.  This feature is part of the
+core networking stack, and all network devices will be running on the
+multiqueue-aware stack.  If a base driver only has one queue, then these
+changes are transparent to that driver.
+
+
+Section 1: Base driver requirements for implementing multiqueue support
+-----------------------------------------------------------------------
+
+Base drivers are required to use the new alloc_etherdev_mq() or
+alloc_netdev_mq() functions to allocate the subqueues for the device.  The
+underlying kernel API will take care of the allocation and deallocation of
+the subqueue memory, as well as netdev configuration of where the queues
+exist in memory.
+
+The base driver will also need to manage the queues as it does the global
+netdev->queue_lock today.  Therefore base drivers should use the
+netif_{start|stop|wake}_subqueue() functions to manage each queue while the
+device is still operational.  netdev->queue_lock is still used when the device
+comes online or when it's completely shut down (unregister_netdev(), etc.).
+
+Finally, the base driver should indicate that it is a multiqueue device.  The
+feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features
+bitmap on device initialization.  Below is an example from e1000:
+
+#ifdef CONFIG_E1000_MQ
+   if ( (adapter->hw.mac.type == e1000_82571) ||
+(adapter->hw.mac.type == e1000_82572) ||
+(adapter->hw.mac.type == e1000_80003es2lan))
+   netdev->features |= NETIF_F_MULTI_QUEUE;
+#endif
+
+
+Section 2: Qdisc support for multiqueue devices
+-----------------------------------------------
+
+Currently two qdiscs support multiqueue devices: sch_prio and a new
+round-robin qdisc, sch_rr.  The qdisc is responsible for classifying skbs to
+bands and queues, and stores the queue mapping in skb->queue_mapping.
+Use this field in the base driver to determine which queue to send the skb
+to.
+
+sch_rr has been added for hardware that doesn't want scheduling policies from
+software, so it's a straight round-robin qdisc.  It uses the same syntax and
+classification priomap that sch_prio uses, so it should be intuitive to
+configure for people who've used sch_prio.
+
+The PRIO qdisc naturally plugs into a multiqueue device.  If PRIO has been
+built with NET_SCH_PRIO_MQ, then upon load it will make sure the number of
+bands requested is equal to the number of queues on the hardware.  If they
+are equal, it sets up a one-to-one mapping between the queues and bands.  If
+they're not equal, it will not load the qdisc.  RR behaves the same way.
+Once the association is made, any skb that is classified will have
+skb->queue_mapping set, which will allow the driver to properly queue skbs
+to multiple queues.
+
+
+Section 3: Brief howto using PRIO and RR for multiqueue devices
+---------------------------------------------------------------
+
+The userspace command 'tc', part of the iproute2 package, is used to configure
+qdiscs.  To add the PRIO qdisc to your network device, assuming the device is
+called eth0, run the following command:
+
+# tc qdisc add dev eth0 root handle 1: prio bands 4
+
+This will create 4 bands, 0 being highest priority, and associate those bands
+to the queues on your NIC.  Assuming eth0 has 4 Tx queues, the band mapping
+would look like:
+
+band 0 => queue 0
+band 1 => queue 1
+band 2 => queue 2
+band 3 => queue 3
+
+Traffic will begin flowing through each queue if your TOS values are assigning
+traffic across the various bands.  For example, ssh traffic will always try to
+go out band 0 based on TOS -> Linux priority conversion (realtime traffic),
+so it will be sent out queue 0.  ICMP traffic (pings) falls into the "normal"
+traffic classification, which is band 1.  Therefore pings will be sen

[PATCH 2/3] NET: [CORE] Stack changes to add multiqueue hardware support API

2007-06-21 Thread PJ Waskiewicz
Add the multiqueue hardware device support API to the core network
stack.  Allow drivers to allocate multiple queues and manage them
at the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.
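A userspace model of the per-subqueue flow control and the transmit gate, as a sketch under assumptions: struct model_dev, the model_* helpers, can_transmit() and schedule_count are hypothetical stand-ins. The patch's netif_{start,stop,wake}_subqueue() use atomic bit operations with __LINK_STATE_XOFF on egress_subqueue[i].state; a plain bit is substituted here. can_transmit() mirrors the check added to qdisc_restart(): the skb reaches the driver only if neither the whole device queue nor the subqueue named by skb->queue_mapping is stopped.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_QUEUES 8
#define XOFF_BIT 0UL /* stands in for __LINK_STATE_XOFF */

/* Hypothetical stand-ins for struct net_device_subqueue / net_device. */
struct model_subqueue {
	unsigned long state;
};

struct model_dev {
	bool queue_stopped;          /* device-wide netif_queue_stopped() */
	int egress_subqueue_count;
	struct model_subqueue egress_subqueue[MODEL_MAX_QUEUES];
	int schedule_count;          /* counts __netif_schedule() calls */
};

void model_start_subqueue(struct model_dev *dev, unsigned q)
{
	dev->egress_subqueue[q].state &= ~(1UL << XOFF_BIT);
}

void model_stop_subqueue(struct model_dev *dev, unsigned q)
{
	dev->egress_subqueue[q].state |= 1UL << XOFF_BIT;
}

bool model_subqueue_stopped(const struct model_dev *dev, unsigned q)
{
	return dev->egress_subqueue[q].state & (1UL << XOFF_BIT);
}

/* Like netif_wake_subqueue(): clear XOFF, reschedule only if it was set. */
void model_wake_subqueue(struct model_dev *dev, unsigned q)
{
	if (model_subqueue_stopped(dev, q)) {
		model_start_subqueue(dev, q);
		dev->schedule_count++;
	}
}

/* The qdisc_restart() gate: device and mapped subqueue must both run. */
bool can_transmit(const struct model_dev *dev, unsigned queue_mapping)
{
	return !dev->queue_stopped &&
	       !model_subqueue_stopped(dev, queue_mapping);
}
```

A driver whose TX ring 2 fills would stop subqueue 2 only; skbs mapped to other rings keep flowing, and waking ring 2 triggers a single reschedule.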

Signed-off-by: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>
---

 include/linux/etherdevice.h |    3 +-
 include/linux/netdevice.h   |   62 ++-
 include/linux/skbuff.h      |    4 ++-
 net/core/dev.c              |   20 ++
 net/core/skbuff.c           |    3 ++
 net/ethernet/eth.c          |    9 +++---
 6 files changed, 87 insertions(+), 14 deletions(-)

diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index f48eb89..b3fbb54 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -39,7 +39,8 @@ extern void	eth_header_cache_update(struct hh_cache *hh, struct net_device *dev
 extern int eth_header_cache(struct neighbour *neigh,
 struct hh_cache *hh);
 
-extern struct net_device *alloc_etherdev(int sizeof_priv);
+extern struct net_device *alloc_etherdev_mq(int sizeof_priv, int queue_count);
+#define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
 
 /**
  * is_zero_ether_addr - Determine if give Ethernet address is all zeros.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e7913ee..6509eb4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -108,6 +108,14 @@ struct wireless_dev;
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
+struct net_device_subqueue
+{
+   /* Give a control state for each queue.  This struct may contain
+* per-queue locks in the future.
+*/
+   unsigned long   state;
+};
+
 /*
  * Network device statistics. Akin to the 2.0 ether stats but
  * with byte counters.
@@ -325,6 +333,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_MULTI_QUEUE	16384	/* Has multiple TX/RX queues */
 
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
@@ -543,6 +552,10 @@ struct net_device
 
/* rtnetlink link ops */
const struct rtnl_link_ops *rtnl_link_ops;
+
+   /* The TX queue control structures */
+   int egress_subqueue_count;
+   struct net_device_subqueue  egress_subqueue[0];
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -705,6 +718,48 @@ static inline int netif_running(const struct net_device *dev)
return test_bit(__LINK_STATE_START, &dev->state);
 }
 
+/*
+ * Routines to manage the subqueues on a device.  We only need start
+ * stop, and a check if it's stopped.  All other device management is
+ * done at the overall netdevice level.
+ * Also test the device if we're multiqueue.
+ */
+static inline void netif_start_subqueue(struct net_device *dev, u16 queue_index)
+{
+   clear_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_stop_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   set_bit(__LINK_STATE_XOFF, &dev->egress_subqueue[queue_index].state);
+}
+
+static inline int netif_subqueue_stopped(const struct net_device *dev,
+ u16 queue_index)
+{
+   return test_bit(__LINK_STATE_XOFF,
+   &dev->egress_subqueue[queue_index].state);
+}
+
+static inline void netif_wake_subqueue(struct net_device *dev, u16 queue_index)
+{
+#ifdef CONFIG_NETPOLL_TRAP
+   if (netpoll_trap())
+   return;
+#endif
+   if (test_and_clear_bit(__LINK_STATE_XOFF,
+  &dev->egress_subqueue[queue_index].state))
+   __netif_schedule(dev);
+}
+
+static inline int netif_is_multiqueue(const struct net_device *dev)
+{
+   return (!!(NETIF_F_MULTI_QUEUE & dev->features));
+}
 
 /* Use this variant when it is known for sure that it
  * is executing from interrupt context.
@@ -995,8 +1050,11 @@ static inline void netif_tx_disable(struct net_device *dev)
 extern void		ether_setup(struct net_device *dev);
 
 /* Support for loadable net-drivers */
-extern struct net_device *alloc_netdev(int sizeof_priv, const char *name,
-  void (*setup)(struct net_device *));
+extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
+ void (*setup)(struct net_device *),
+ int queue_count);
+#define alloc_netdev(sizeof_priv, name, setup) \
+   alloc_netdev_mq(

[PATCH] NET: Multiple queue hardware support

2007-06-21 Thread PJ Waskiewicz
Please consider these patches for 2.6.23 inclusion.

Updates since the last submission:

1. skb->queue_mapping moved into the iif cacheline.  I looked at moving
   iif and queue_mapping, but there wasn't enough room anywhere else to
   logically group these in a different cacheline that I could see.  Thanks
   Patrick McHardy.

2. netdev->egress_subqueue is now indexed thanks to Dave Miller.

3. sch_rr is now a MODULE_ALIAS of sch_prio.  Thanks Patrick McHardy.

4. Both sch_rr and multiqueue sch_prio expect the number of bands to
   equal the number of queues on the netdev.
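What item 4 implies for setup and classification can be sketched as follows. This is a hypothetical model: setup_band2queue() and classify_prio() are illustrative names, the -EINVAL return is an assumption, the check follows the howto's description (refuse unless bands equals the hardware queue count, then map band i to queue i), and the priomap is the default table from q_rr.c. Linux priority 6 is TC_PRIO_INTERACTIVE and 0 is TC_PRIO_BESTEFFORT, which is why interactive traffic lands in band 0 and best-effort pings in band 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PRIO_MAX 15     /* value of TC_PRIO_MAX */
#define PRIO_BANDS 16   /* value of TCQ_PRIO_BANDS */

/* Default priomap from q_prio.c / q_rr.c: Linux priority -> band. */
static const uint8_t default_priomap[PRIO_MAX + 1] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Hypothetical: refuse unless bands == hardware queues, then map 1:1. */
int setup_band2queue(uint16_t band2queue[], int bands, int hw_queues)
{
	if (bands < 1 || bands > PRIO_BANDS || bands != hw_queues)
		return -EINVAL;
	for (int b = 0; b < bands; b++)
		band2queue[b] = (uint16_t)b;
	return 0;
}

/* Hypothetical: Linux priority -> band -> hardware queue (queue_mapping). */
uint16_t classify_prio(const uint16_t band2queue[], unsigned priority)
{
	return band2queue[default_priomap[priority & PRIO_MAX]];
}
```

With 4 bands on a 4-queue NIC, priority 6 maps to queue 0 and priority 0 to queue 1; asking for 4 bands on an 8-queue NIC is refused.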

This patchset is an updated version of previous multiqueue network device
support patches.  The general approach of introducing a new API for multiqueue
network devices to register with the stack has remained.  The changes include
adding a round-robin qdisc, heavily based on sch_prio, which will allow
queueing to hardware with no OS-enforced queuing policy.  sch_prio still has
the multiqueue code in it, but has a Kconfig option to compile it out of the
qdisc.  This allows people with hardware containing scheduling policies to
use sch_rr (round-robin), and others without scheduling policies in hardware
to continue using sch_prio if they wish to have some notion of scheduling
priority.

The patches being sent are split into Documentation, Qdisc changes, and
core stack changes.  The requested e1000 changes are still being resolved,
and will be sent at a later date.

I did not modify other users of netif_queue_stopped() in net/core/netpoll.c,
net/core/dev.c, or net/core/pktgen.c, since no classification occurs for
the skb being sent to the device.  Therefore, packets should always end up
in queue 0, so there's no need to check the subqueue status either.

The patches to iproute2 for tc will be sent separately, to support sch_rr.

-- 
PJ Waskiewicz <[EMAIL PROTECTED]>


RE: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Caitlin Bestler
[EMAIL PROTECTED] wrote:
> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Thu, 21 Jun 2007 13:08:12 -0600
> 
>> However this just seems to allow a card to decode multiple mac
>> addresses which in some oddball load balancing configurations may
>> actually be useful, but it seems fairly limited.
>> 
>> Do you have a specific use case you envision for this multiple mac
>> functionality?
> 
> Virtualization.
> 
> If you can't tell the ethernet card that more than 1 MAC
> address are for it, you have to turn the thing into promiscuous mode.
> 
> Networking on virtualization is typically done by giving each
> guest a unique MAC address, the guests have a virtual network
> device that connects to the control node (or dom0 in Xen
> parlance) and/or other guests.
> 
> The control node has a switch that routes the packets from
> the guests either to other guests or out the real ethernet interface.
> 
> Each guest gets a unique MAC so that the switch can know
> which guest an incoming packet is for.

The same software switch could also throw away the excess
frames that promiscuous mode would have admitted. Unless
the misdirected frames were common it would not seem to 
be a major CPU burden.

Keep in mind that the only MAC addresses that would have
been transmitted are the ones that the input filter would
have listed. 
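The software-switch alternative, discarding the extra frames promiscuous mode admits, amounts to looking up each frame's destination MAC in the per-guest table. This is a hypothetical sketch: struct guest, switch_rx() and the linear search are illustrative, not code from Xen or the kernel bridge, and multicast/broadcast handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical guest table entry: one unique MAC per guest. */
struct guest {
	uint8_t mac[MAC_LEN];
	int id;
};

/* Return the id of the guest owning the frame's destination MAC, or -1
 * if no guest owns it (a frame admitted only by promiscuous mode). */
int switch_rx(const struct guest *guests, size_t n, const uint8_t *frame)
{
	const uint8_t *dest = frame; /* Ethernet dest MAC: first 6 bytes */

	for (size_t i = 0; i < n; i++)
		if (memcmp(guests[i].mac, dest, MAC_LEN) == 0)
			return guests[i].id;
	return -1; /* discard */
}
```

The per-frame cost is one table lookup, which supports the point that dropping occasional misdirected frames in software is cheap; it is still work the NIC filter would have avoided.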




Re: [WIP][PATCHES] Network xmit batching

2007-06-21 Thread Rick Jones
On Tue, 2007-06-19 at 17:21 +0400, Evgeniy Polyakov wrote:
> Hi.
> 
> On Thu, Jun 07, 2007 at 07:43:49AM -0400, jamal ([EMAIL PROTECTED]) wrote:
> > Folks, we need help. Please run this on different hardware. Evgeniy, i
> > thought this kind of stuff excites you, no? ;-> (wink, wink).
> > Only the sender needs the patch but the receiver must be a more powerful
> > machine (so that it is not the bottleneck).
> 
> I've run several simple tests with desktop e1000 adapter I managed to
> find.
> 
> Test machine is amd athlon64 3500+ with 1gb of ram.
> Another point is desktop core duo 3.4 ghz with 2 gb of ram and sky2
> driver.
> 
> Simple test included test -> desktop and vice versa traffic with 128 and
> 4096 block size in netperf-2.4.3 setup.

Is that in conjunction with setting the test-specific -D to set
TCP_NODELAY, or was Nagle left on?  If the latter, perhaps timing issues
could be why the confidence intervals weren't hit since the relative
batching of 128byte sends into larger segments is something of a race.
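For reference, the option netperf's test-specific -D sets is the standard TCP_NODELAY socket option. A minimal POSIX sketch of turning Nagle off on a socket and reading it back (the helper names set_nodelay and get_nodelay are ours):

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h> /* close() for callers */

/* Disable Nagle so small writes (e.g. 128-byte sends) go out immediately
 * instead of being held and coalesced while earlier data is unacked.
 * Returns 0 on success, -1 on error (errno set). */
int set_nodelay(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int get_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

With Nagle left on, how many 128-byte sends merge into one segment depends on ACK timing, which is the race described above.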

rick jones



Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Eric W. Biederman
Patrick McHardy <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> I'm trying to understand what the point of this patch is.
>>
>> In one sense I find the concept of filtering and listening for multiple
>> mac addresses very interesting, especially if we could break out different
>> streams of traffic by destination mac address into separate network devices.
>> This would remove the need for any kind of ethernet tunnel and makes multiple
>> network namespaces much more pleasant.
>>
>> However this just seems to allow a card to decode multiple mac addresses
>> which in some oddball load balancing configurations may actually be
>> useful, but it seems fairly limited.
>>
>> Do you have a specific use case you envision for this multiple mac
>> functionality?
>>
>
> Yes, please see the MACVLAN patch I posted one or two days earlier.

Thanks.  That is what I was envisioning.  I keep suspecting one of
the cool multi-rx queue nics may start doing some of the demux in hardware.
But whatever; if it works and is relatively fast, it is good enough for me.

> 8021q can also make use of it and Dave mentioned some virtualization
> devices want this as well.

Right, makes sense.  And ethernet bridging (which is the general case
of the virtualization Dave mentioned) should also be able to take
advantage of multiple unicast addresses.  So this definitely makes
sense.

Have you done any performance testing with the macvlan code?  With
the ethernet tunnel device we keep getting copied unicast packets on
some path or other which slowed things down.  Simply not doing the
firewalling until the packets have made it through the macvlan device
should help here.

For the macvlan code do we need to do anything special if we transmit
to a mac we would normally receive?  Another unicast mac of the same
nic for example.

For the macvlan hash you just use an upper byte.  Is that just a
simple starting place, or do we not need a more complex hash?

Eric



RE: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Waskiewicz Jr, Peter P
> > Yes.  This is the tc command I used to configure the qdisc (with 
> > q_rr.c attached from my patches iproute2 package):
> >
> > # tc qdisc add dev eth2 root handle 1: rr bands 8
> > RTNETLINK answers: No such file or directory
> 
> Again, I bet you don't have CONFIG_NET_SCH_RR enabled:

Chalk this up to serious user error.  Having CONFIG_NET_SCH_RR=m isn't
defining it...I'm not sure why I thought that was correct.  Thanks for
bearing with me Patrick.  Working as intended now.  :-)

-PJ


Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Patrick McHardy

Eric W. Biederman wrote:

> I'm trying to understand what the point of this patch is.
>
> In one sense I find the concept of filtering and listening for multiple
> mac addresses very interesting, especially if we could break out different
> streams of traffic by destination mac address into separate network devices.
> This would remove the need for any kind of ethernet tunnel and makes multiple
> network namespaces much more pleasant.
>
> However this just seems to allow a card to decode multiple mac addresses
> which in some oddball load balancing configurations may actually be
> useful, but it seems fairly limited.
>
> Do you have a specific use case you envision for this multiple mac
> functionality?

Yes, please see the MACVLAN patch I posted one or two days earlier.
8021q can also make use of it and Dave mentioned some virtualization
devices want this as well.




Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 21 Jun 2007 13:08:12 -0600

> However this just seems to allow a card to decode multiple mac addresses
> which in some oddball load balancing configurations may actually be
> useful, but it seems fairly limited.
> 
> Do you have a specific use case you envision for this multiple mac
> functionality?

Virtualization.

If you can't tell the ethernet card that more than 1 MAC address
are for it, you have to turn the thing into promiscuous mode.

Networking on virtualization is typically done by giving each
guest a unique MAC address, the guests have a virtual network
device that connects to the control node (or dom0 in Xen parlance)
and/or other guests.

The control node has a switch that routes the packets from the
guests either to other guests or out the real ethernet interface.

Each guest gets a unique MAC so that the switch can know which
guest an incoming packet is for.


Re: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Patrick McHardy

Waskiewicz Jr, Peter P wrote:
>> The code looks correct. Are you sure you had the config
>> option enabled during your test?
>
> Yes.  This is the tc command I used to configure the qdisc (with q_rr.c
> attached from my patches iproute2 package):
>
> # tc qdisc add dev eth2 root handle 1: rr bands 8
> RTNETLINK answers: No such file or directory

Again, I bet you don't have CONFIG_NET_SCH_RR enabled:


# lsmod|grep prio
# tc qdisc add dev dummy0 root handle 1: rr bands 8
# lsmod|grep prio
sch_prio                5760  1



Re: [RFC NET 00/02]: Secondary unicast address support

2007-06-21 Thread Eric W. Biederman
Patrick McHardy <[EMAIL PROTECTED]> writes:

> These two patches contain a first shot at secondary unicast address support.
> I'm still working on converting macvlan as an example, but since I'm about to
> leave for tonight I thought I'd get them out for some comments now.
>
> The patch adds two new functions dev_unicast_add and dev_unicast_delete to
> add/remove addresses. Similar to dev_mc_add/dev_mc_delete they do refcounting
> of the addresses and keep the addresses on a list associated with the device.
>
> dev_address_upload is responsible for uploading both the multicast and
> unicast list to the device. Devices that are capable of filtering multiple
> unicast addresses need to provide a function dev->set_address_list that
> deals with setting both unicast and multicast address filters. This seemed
> like the easiest way for chips containing filters that can be used for
> any address type, and parts of the logic deciding when to use HW filters are
> similar for unicast and multicast addresses. Devices not providing this
> function are put in promiscuous mode when secondary addresses are present
> and the old set_multicast_list function is called to take care of multicast
> filtering.
>
> The dev_uc_list structure is kept similar to dev_mc_list to allow easier
> integration in existing "fill address filters" loops.
>
> E1000 is converted as an example, the patch worked fine in some limited
> testing.

>
> Comments welcome.

I'm trying to understand what the point of this patch is.

In one sense I find the concept of filtering and listening for multiple
mac addresses very interesting, especially if we could break out different
streams of traffic by destination mac address into separate network devices.
This would remove the need for any kind of ethernet tunnel and makes multiple
network namespaces much more pleasant.

However this just seems to allow a card to decode multiple mac addresses
which in some oddball load balancing configurations may actually be
useful, but it seems fairly limited.

Do you have a specific use case you envision for this multiple mac
functionality?

Eric
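The refcounted secondary-address list described in the quoted RFC can be modeled in userspace. This is a sketch under stated assumptions: uc_add(), uc_delete(), needs_promisc(), struct uc_dev and the fixed-size array are hypothetical; the RFC itself keeps addresses in a dev_uc_list structure modeled on dev_mc_list. It shows the two behaviours the RFC describes: adding an address that is already present only bumps its refcount, and a device without a set_address_list hook must fall back to promiscuous mode while any secondary address is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6
#define MAX_UC 8

/* Hypothetical model of a device's secondary unicast address list. */
struct uc_entry {
	unsigned char addr[MAC_LEN];
	int users;                  /* reference count for this address */
};

struct uc_dev {
	struct uc_entry list[MAX_UC];
	size_t count;
	bool has_set_address_list;  /* can the NIC filter several unicast MACs? */
};

static struct uc_entry *uc_find(struct uc_dev *dev, const unsigned char *addr)
{
	for (size_t i = 0; i < dev->count; i++)
		if (memcmp(dev->list[i].addr, addr, MAC_LEN) == 0)
			return &dev->list[i];
	return NULL;
}

/* Take a reference on addr; returns 0, or -1 if the table is full. */
int uc_add(struct uc_dev *dev, const unsigned char *addr)
{
	struct uc_entry *e = uc_find(dev, addr);

	if (e) {
		e->users++;
		return 0;
	}
	if (dev->count == MAX_UC)
		return -1;
	e = &dev->list[dev->count++];
	memcpy(e->addr, addr, MAC_LEN);
	e->users = 1;
	return 0;
}

/* Drop a reference; the entry goes away with its last user.
 * Returns 0, or -1 if addr was not on the list. */
int uc_delete(struct uc_dev *dev, const unsigned char *addr)
{
	struct uc_entry *e = uc_find(dev, addr);

	if (!e)
		return -1;
	if (--e->users == 0)
		*e = dev->list[--dev->count]; /* move last entry into the hole */
	return 0;
}

/* Without a multi-address filter, any secondary address forces promisc. */
bool needs_promisc(const struct uc_dev *dev)
{
	return dev->count > 0 && !dev->has_set_address_list;
}
```

Two users (say macvlan and 8021q) adding the same MAC leave one entry with two references; the filter only changes when the last one deletes it.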


RE: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Waskiewicz Jr, Peter P
> Waskiewicz Jr, Peter P wrote:
> >> Please post the code.
> >>
> >> 
> >
> > Code is attached.  Please forgive the attachment and any whitespace 
> > damage...currently using Doubtlook to send this (cringe).
> >   
> 
> The code looks correct. Are you sure you had the config 
> option enabled during your test?
>

Yes.  This is the tc command I used to configure the qdisc (with q_rr.c
attached from my patches iproute2 package):

# tc qdisc add dev eth2 root handle 1: rr bands 8
RTNETLINK answers: No such file or directory

At this point, sch_prio gets loaded correctly, but it obviously fails to
finish loading the qdisc.  Using prio works though:

# tc qdisc add dev eth2 root handle 1: prio bands 8

And yes, the NIC I'm working with has 8 queues, just to be clear.  Any
help is definitely appreciated; I'm going to keep this copy of the code
for now, but am going to get the separate module written back up just in
case this can't be solved in the short-term.  This is the only piece
keeping me from sending these patches back for consideration, so I'll
keep the parallel effort going.

Thanks Patrick,

-PJ Waskiewicz


q_rr.c
Description: q_rr.c


Re: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Patrick McHardy

Waskiewicz Jr, Peter P wrote:

>> Please post the code.
>
> Code is attached.  Please forgive the attachment and any whitespace
> damage...currently using Doubtlook to send this (cringe).

The code looks correct. Are you sure you had the config option
enabled during your test?



RE: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Waskiewicz Jr, Peter P

> Please post the code.
> 

Code is attached.  Please forgive the attachment and any whitespace
damage...currently using Doubtlook to send this (cringe).

Thanks,
-PJ Waskiewicz


sch_prio.c
Description: sch_prio.c


Re: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Patrick McHardy

Waskiewicz Jr, Peter P wrote:
>> BTW, couldn't you just merge sch_rr with prio? AFAICT you
>> only need a new dequeue function, a new struct Qdisc_ops and
>> a MODULE_ALIAS.
>
> Ok, I have this somewhat working, but need to poll for some help from
> the community.  I used MODULE_ALIAS("sch_rr") in sch_prio.c, and
> modprobe is happily loading sch_prio.ko when I ask for sch_rr.ko.  It
> also recognizes the correct ops struct to associate with the instance of
> the module.  However, when I try to load the qdisc via tc (modified
> version that knows sch_rr), I'm getting No Such File or Directory from
> RTNETLINK.  It's looking for sch_rr.ko, and is bailing.  I've scoured
> the code looking for a reason why, and am drawing a blank.  I'll
> continue looking, but if this sounds familiar to someone who knows how
> to get around this, please reply and let me know.

Please post the code.



RE: [PATCH 2/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-06-21 Thread Waskiewicz Jr, Peter P

> BTw, couldn't you just merge sch_rr with prio? AFAICT you 
> only need a new dequeue function, a new struct Qdisc_ops and 
> a MODULE_ALIAS.

Ok, I have this somewhat working, but need to poll for some help from
the community.  I used MODULE_ALIAS("sch_rr") in sch_prio.c, and
modprobe is happily loading sch_prio.ko when I ask for sch_rr.ko.  It
also recognizes the correct ops struct to associate with the instance of
the module.  However, when I try to load the qdisc via tc (modified
version that knows sch_rr), I'm getting No Such File or Directory from
RTNETLINK.  It's looking for sch_rr.ko, and is bailing.  I've scoured
the code looking for a reason why, and am drawing a blank.  I'll
continue looking, but if this sounds familiar to someone who knows how
to get around this, please reply and let me know.
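Patrick's suggestion amounts to keeping sch_prio's band layout and swapping only the dequeue policy. A minimal user-space sketch of the difference between the two policies, assuming each band is just a packet counter; struct bands, prio_dequeue() and rr_dequeue() are hypothetical names for illustration, not the kernel Qdisc API.

```c
#include <assert.h>

#define NBANDS 3

/* Hypothetical stand-in for a multi-band qdisc: each band is only a
 * count of queued packets, and a dequeued packet is identified by its band. */
struct bands {
	int qlen[NBANDS];
	int next; /* round-robin cursor, used only by rr_dequeue() */
};

/* Strict priority (sch_prio style): always serve the lowest-numbered
 * non-empty band. Returns the band served, or -1 if all bands are empty. */
static int prio_dequeue(struct bands *b)
{
	for (int i = 0; i < NBANDS; i++) {
		if (b->qlen[i] > 0) {
			b->qlen[i]--;
			return i;
		}
	}
	return -1;
}

/* Round robin (the proposed sch_rr behaviour): start scanning at the band
 * after the one served last, so every non-empty band gets a turn. */
static int rr_dequeue(struct bands *b)
{
	for (int n = 0; n < NBANDS; n++) {
		int i = (b->next + n) % NBANDS;

		if (b->qlen[i] > 0) {
			b->qlen[i]--;
			b->next = (i + 1) % NBANDS;
			return i;
		}
	}
	return -1;
}
```

With bands holding {2, 1, 1} packets, prio serves 0, 0, 1, 2 while rr serves 0, 1, 2, 0; only the dequeue routine differs, which is why a second Qdisc_ops in the same module looks sufficient.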

Thanks,

-PJ Waskiewicz


[PATCH 2/2] NetXen: Make use of per port interrupt mask scheme

2007-06-21 Thread Mithlesh Thukral
NetXen: Make use of per port interrupt scheme.
This patch makes the driver inform the firmware that it supports the
per-port interrupt mask scheme. The driver also checks whether the
firmware supports the per-port interrupt scheme. If it does, interrupts
are enabled/disabled for each port individually instead of for the
entire card, as was done until now.
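The enable path in the diff below boils down to two decisions: whether the card-wide ISR_INT_MASK is written at all, and which legacy mask value to use for the board type. A user-space sketch of that selection logic, copied from the diff's conditions and mask values; the enum and the INTR_SCHEME_PERPORT value are hypothetical placeholders, since their real definitions are not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's constants; the real values
 * live in the NetXen headers and are not shown in this excerpt. */
enum board_type { BOARD_GBE, BOARD_XGBE, BOARD_OTHER };
#define INTR_SCHEME_UNKNOWN  (-1)
#define INTR_SCHEME_PERPORT  1   /* placeholder value */

/* The card-wide ISR_INT_MASK is only touched when the firmware reported
 * a scheme (not -1) and that scheme is not the per-port one. */
static bool uses_global_int_mask(int intr_scheme)
{
	return intr_scheme != INTR_SCHEME_UNKNOWN &&
	       intr_scheme != INTR_SCHEME_PERPORT;
}

/* Mask written on enable, by board type (values taken from the diff). */
static uint32_t legacy_enable_mask(enum board_type type)
{
	switch (type) {
	case BOARD_GBE:
		return 0x77b;
	case BOARD_XGBE:
		return 0x77f;
	default:
		return 0x7ff;
	}
}
```

In the per-port case the driver skips the global mask and only writes 0 (disable) or 1 (enable) to the port's own CRB_SW_INT_MASK_<n> register.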

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 drivers/net/netxen/netxen_nic.h  |  104 +
 drivers/net/netxen/netxen_nic_hw.c   |5 -
 drivers/net/netxen/netxen_nic_init.c |2 
 drivers/net/netxen/netxen_nic_main.c |   28 +++--
 drivers/net/netxen/netxen_nic_phan_reg.h |   14 ++
 5 files changed, 121 insertions(+), 32 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic.h b/drivers/net/netxen/netxen_nic.h
index 91f25e0..62aeab9 100644
--- a/drivers/net/netxen/netxen_nic.h
+++ b/drivers/net/netxen/netxen_nic.h
@@ -937,6 +937,7 @@ struct netxen_adapter {
struct netxen_ring_ctx *ctx_desc;
struct pci_dev *ctx_desc_pdev;
dma_addr_t ctx_desc_phys_addr;
+   int intr_scheme;
int (*enable_phy_interrupts) (struct netxen_adapter *);
int (*disable_phy_interrupts) (struct netxen_adapter *);
void (*handle_phy_intr) (struct netxen_adapter *);
@@ -1080,37 +1081,102 @@ struct net_device_stats *netxen_nic_get_
 
 static inline void netxen_nic_disable_int(struct netxen_adapter *adapter)
 {
-   /*
-* ISR_INT_MASK: Can be read from window 0 or 1.
-*/
-   writel(0x7ff, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+   uint32_t mask = 0x7ff;
+   int count = 0;
+
+   DPRINTK(1,INFO,"Entered ISR Disable \n");
+
+   switch(adapter->portnum) {
+   case 0:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
+   break;
+   case 1:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
+   break;
+   case 2:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
+   break;
+   case 3:
+   writel(0x0, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_3));
+   break;
+   }
+
+   if (adapter->intr_scheme != -1 &&
+   adapter->intr_scheme != INTR_SCHEME_PERPORT) {
+   writel(mask,
+   (void *)(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK)));
+   }
 
+   /* Window = 0 or 1 */
+   if (!(adapter->flags & NETXEN_NIC_MSI_ENABLED)) {
+   do {
+   writel(0x, (void *)
+   (PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_TARGET_STATUS)));
+   mask = readl((void *)
+   (pci_base_offset(adapter, ISR_INT_VECTOR)));
+   } while (((mask & 0x80) != 0) && (++count < 32));
+
+   if ((mask & 0x80) != 0) {
+   printk(KERN_NOTICE "Could not disable interrupt completely\n");
+   }
+   }
+
+   DPRINTK(1,INFO,"Done with Disable Int\n");
+
+   return;
 }
 
 static inline void netxen_nic_enable_int(struct netxen_adapter *adapter)
 {
u32 mask;
 
-   switch (adapter->ahw.board_type) {
-   case NETXEN_NIC_GBE:
-   mask = 0x77b;
-   break;
-   case NETXEN_NIC_XGBE:
-   mask = 0x77f;
-   break;
-   default:
-   mask = 0x7ff;
-   break;
-   }
+   DPRINTK(1, INFO, "Entered ISR Enable \n");
+
+   if (adapter->intr_scheme != -1 &&
+   adapter->intr_scheme != INTR_SCHEME_PERPORT) {
+   switch (adapter->ahw.board_type) {
+   case NETXEN_NIC_GBE:
+   mask  =  0x77b;
+   break;
+   case NETXEN_NIC_XGBE:
+   mask  =  0x77f;
+   break;
+   default:
+   mask  =  0x7ff;
+   break;
+   }
 
-   writel(mask, PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK));
+   writel(mask,
+   (void *)(PCI_OFFSET_SECOND_RANGE(adapter, ISR_INT_MASK)));
+   }
+   switch (adapter->portnum) {
+   case 0:
+   writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_0));
+   break;
+   case 1:
+   writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_1));
+   break;
+   case 2:
+   writel(0x1, NETXEN_CRB_NORMALIZE(adapter, CRB_SW_INT_MASK_2));
+

[PATCH 1/2] NetXen: Fix MSI issues by using PCI function 0

2007-06-21 Thread Mithlesh Thukral
NetXen: Fix issue of MSI not working correctly
The NetXen driver uses PCI function 0 to provide MSI functionality.
This patch makes the driver check the bus master bit for function 0
and enable it after card initialization.
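The new hunk reads the register at offset 0x4 and sets bit 0x4 only if it is clear, then waits. That layout appears to match the standard PCI command register, where offset 0x04 holds the command word and 0x4 is Bus Master Enable (PCI_COMMAND_MASTER in linux/pci_regs.h). A minimal sketch of that read-modify-write decision on a plain integer; the constant and function names are our own, redefined locally so nothing touches real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI command register layout: the register sits at config
 * offset 0x04, and bit 2 (value 0x4) enables bus mastering. */
#define CMD_BUS_MASTER 0x4

/* Mirrors the patch's logic: report whether a write is needed and
 * compute the value to write back. */
static bool needs_bus_master_enable(uint32_t cmd, uint32_t *new_cmd)
{
	if (cmd & CMD_BUS_MASTER) {
		*new_cmd = cmd;
		return false;   /* already set: no write, no delay */
	}
	*new_cmd = cmd | CMD_BUS_MASTER;
	return true;        /* caller writes new_cmd back, then waits (mdelay(100) in the patch) */
}
```

Checking before writing keeps the 100 ms delay off the path when firmware or BIOS has already enabled bus mastering.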

Signed-off-by: Milan Bag <[EMAIL PROTECTED]>
Signed-off-by: Wen Xiong <[EMAIL PROTECTED]>
Signed-off-by: Mithlesh Thukral <[EMAIL PROTECTED]>

---

 drivers/net/netxen/netxen_nic_main.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 6167b58..e68356b 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -355,13 +355,6 @@ #endif
/* initialize the adapter */
netxen_initialize_adapter_hw(adapter);
 
-#ifdef CONFIG_PPC
-   if ((adapter->ahw.boardcfg.board_type ==
-   NETXEN_BRDTYPE_P2_SB31_10G_IMEZ) &&
-   (pci_func_id == 2))
-   goto err_out_free_adapter;
-#endif /* CONFIG_PPC */
-
/*
 *  Adapter in our case is quad port so initialize it before
 *  initializing the ports
@@ -509,6 +502,12 @@ #endif
NETXEN_CAM_RAM(0x1fc)));
if (val == 0x) {
/* This is the first boot after power up */
+   netxen_nic_read_w0(adapter, NETXEN_PCIE_REG(0x4), &val);
+   if (!(val & 0x4)) {
+   val |= 0x4;
+   netxen_nic_write_w0(adapter, NETXEN_PCIE_REG(0x4), val);
+   mdelay(100);
+   }
val = readl(NETXEN_CRB_NORMALIZE(adapter,
NETXEN_ROMUSB_GLB_SW_RESET));
printk(KERN_INFO"NetXen: read 0x%08x for reset reg.\n",val);


[PATCH 0/2] NetXen: Updates and bug fixes for NetXen 1/10G driver

2007-06-21 Thread Mithlesh Thukral
Hi All,

I will be sending updates for the NetXen NIC 1/10G Ethernet driver
in the following emails. These are bug fixes and better interrupt
handling schemes. They have been tested on x86 machines and
PowerPC blades.

Thanks,
Mithlesh Thukral


Re: FSCKED clock sources WAS(Re: [WIP][PATCHES] Network xmit batching

2007-06-21 Thread Benjamin LaHaise
On Thu, Jun 21, 2007 at 12:08:19PM -0400, jamal wrote:
> The results in the table for opteron and xeon are swapped when
> cutnpasting from a larger test result. So Opteron is the one with better
> results.
> In any case - off for the day over here.

You should qualify that as 'Old P4 Xeon', as the Core 2 Xeons are leagues 
better.

-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[EMAIL PROTECTED]>.


Re: FSCKED clock sources WAS(Re: [WIP][PATCHES] Network xmit batching

2007-06-21 Thread Evgeniy Polyakov
On Thu, Jun 21, 2007 at 11:54:17AM -0400, jamal ([EMAIL PROTECTED]) wrote:
> Evgeniy, did you sync on the batching case with the git tree?

My tree contains the following commits:

Latest mainline commit: fa490cfd15d7ce0900097cc4e60cfd7a76381138
Latest batch commit: 9b8cc32088abfda8be7f394cfd5ee6ac694da39c

> Can you describe your hardware in /proc/cpuinfo and /proc/interupts?

Sure.
cpuinfo:
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 15
model name  : AMD Athlon(tm) 64 Processor 3500+
stepping: 0
cpu MHz : 2210.092
cache size  : 512 KB
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm
3dnowext 3dnow up
bogomips: 4423.20
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

interrupts:
           CPU0
  0:    1668864   IO-APIC-edge      timer
  1:         78   IO-APIC-edge      i8042
  8:          0   IO-APIC-edge      rtc
  9:          0   IO-APIC-fasteoi   acpi
 12:        102   IO-APIC-edge      i8042
 14:        465   IO-APIC-edge      ide0
 18:     774515   IO-APIC-fasteoi   eth1
 22:          0   IO-APIC-fasteoi   sata_nv
 23:       5068   IO-APIC-fasteoi   sata_nv
NMI:          0
LOC:    1668914
ERR:          0

I pulled the latest version recently and started a netperf test: both
netperf on the sending (batching) machine and netserver on the receiver
take about 16-25% of CPU time, which is likely a bug.
With a 4096 block size it is 819 Mbit/sec, which is slightly more than
the mainline result, but I cannot say that it is noticeably higher than
the noise level.

I did not check the CPU usage of previous releases, but the receiving
netserver was always around 15-16%.

Here is pktgen result:

Params: count 100  min_pkt_size: 60  max_pkt_size: 60 min_batch 0
 frags: 0  delay: 0  clone_skb: 1  ifname: eth1
 flows: 0 flowlen: 0
 dst_min: 192.168.4.81  dst_max: 
 src_min:   src_max: 
 src_mac: 00:0E:0C:B8:63:0A  dst_mac: 00:17:31:9A:E5:BE
 udp_src_min: 9  udp_src_max: 9  udp_dst_min: 9  udp_dst_max: 9
 src_mac_count: 0  dst_mac_count: 0
 Flags: 
Current:
 pkts-sofar: 100  errors: 0
 started: 1182456838614560us  stopped: 1182456842533487us idle: 15us alloc 
3780137us txt 130388us
 seq_num: 101  cur_dst_mac_offset: 0  cur_src_mac_offset: 0
 cur_saddr: 0x3000a8c0  cur_daddr: 0x5104a8c0
 cur_udp_dst: 9  cur_udp_src: 9
 flows: 0
Result: OK: T3918927(U3918912+I15+A3780137+T130388) usec, P100 
TE8511TS1(B60,-1frags)
  255171pps 122Mb/sec (122482080bps) errors: 0
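pktgen's rate line can be checked by hand: with 60-byte packets the bit rate is pps × 60 × 8, and the Mb/sec figure is that value divided by 10^6 and truncated. A small sketch of the arithmetic (the helper names are ours, not pktgen's); it reproduces the 122482080 bps figure above and, assuming the clone runs also used 60-byte packets, the 225 and 118 Mb/sec figures below.

```c
#include <assert.h>
#include <stdint.h>

/* Bits per second for a constant packet size:
 * packets/sec * bytes/packet * 8 bits/byte. */
static uint64_t pktgen_bps(uint64_t pps, uint64_t pkt_size)
{
	return pps * pkt_size * 8;
}

/* Whole megabits (10^6), truncated, as in the "Mb/sec" figures. */
static uint64_t to_mbit(uint64_t bps)
{
	return bps / 1000000;
}
```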

There is no cloning.
Without cloning, mainline shows 112 Mb/sec, which is less, but with
10k clones the results are:
mainline:       469857pps 225Mb/sec
latest batch:   246089pps 118Mb/sec

So that is definitely a sign that batching has some issues with skb
reuse.

> cheers,
> jamal


-- 
Evgeniy Polyakov


Re: [2/2] 2.6.22-rc5: known regressions with patches v2

2007-06-21 Thread Michal Piotrowski

Hi all,

Here is a list of some known regressions in 2.6.22-rc5
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

(BTW, there is a new category called "Will be fixed in 2.6.23".)



Memory management

Subject: bug in i386 MTRR initialization
References : http://lkml.org/lkml/2007/5/19/93
Submitter  : Andrea Righi <[EMAIL PROTECTED]>
Status : patch available



MMC

Subject: Oops in a driver while using SLUB as a SLAB allocator
References : http://lkml.org/lkml/2007/6/21/69
Submitter  : Nicolas Ferre <[EMAIL PROTECTED]>
Handled-By : Marc Pignat <[EMAIL PROTECTED]>
Patch  : http://lkml.org/lkml/2007/6/21/140
Status : patch available



Networking

Subject: ERROR: "__ucmpdi2" [drivers/net/s2io.ko] undefined!
References : http://lkml.org/lkml/2007/6/19/310
Submitter  : Olaf Hering <[EMAIL PROTECTED]>
Patch  : http://lkml.org/lkml/2007/6/19/382
Status : patch was suggested

Subject: no irda0 interface (2.6.21 was OK), smsc does not find chip
References : http://lkml.org/lkml/2007/6/3/16
Submitter  : Andrey Borzenkov <[EMAIL PROTECTED]>
Handled-By : Samuel Ortiz <[EMAIL PROTECTED]>
             Bjorn Helgaas <[EMAIL PROTECTED]>
Patch  : http://lkml.org/lkml/2007/6/7/237
Status : patch was suggested



Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/


[PATCH] NET: [CORE] Re-enable irqs before pushing pending DMA requests

2007-06-21 Thread Shannon Nelson
This moves the local_irq_enable() call in net_rx_action() to before
calling the CONFIG_NET_DMA's dma_async_memcpy_issue_pending() rather
than after.  This shortens the irq disabled window and allows for DMA
drivers that need to do their own irq hold.

Signed-off-by: Shannon Nelson <[EMAIL PROTECTED]>
---

 net/core/dev.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 2609062..ee051bb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2009,6 +2009,7 @@ static void net_rx_action(struct softirq_action *h)
}
}
 out:
+   local_irq_enable();
 #ifdef CONFIG_NET_DMA
/*
 * There may not be any more sk_buffs coming right now, so push
@@ -2022,7 +2023,6 @@ out:
rcu_read_unlock();
}
 #endif
-   local_irq_enable();
return;
 
 softnet_break:


Re: FSCKED clock sources WAS(Re: [WIP][PATCHES] Network xmit batching

2007-06-21 Thread jamal
On Thu, 2007-21-06 at 11:54 -0400, jamal wrote:

> The summary is: Batching always is better, jiffies is always the better
> clock source (and who would have thunk,eh? Opteron kicks a Xeons ass).

The results in the table for Opteron and Xeon were swapped when
cutting and pasting from a larger test result, so the Opteron is the one
with the better results.
In any case - off for the day over here.

cheers,
jamal




FSCKED clock sources WAS(Re: [WIP][PATCHES] Network xmit batching

2007-06-21 Thread jamal
On Tue, 2007-19-06 at 15:28 -0700, David Miller wrote:

> Converting pktgen over to ktime_t might be a nice cleanup.

Would that really solve it? i.e. doesn't it still depend on what the
clock source is?

I had a friend of mine (Robert, you know Jeremy) run the tests, and the
results are slightly different from what Evgeniy found.

The summary is: batching is always better, and jiffies is always the
better clock source (and who would have thunk, eh? Opteron kicks a
Xeon's ass). Results attached.

Evgeniy, did you sync the batching case with the git tree?
Can you describe your hardware via /proc/cpuinfo and /proc/interrupts?

cheers,
jamal
The test variables are:
--

1) An Intel Xeon[1] machine vs. an AMD Opteron[2].
2) A plain 2622-rc4 kernel vs. a 2622-rc4 kernel with batching
(from git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git)
3) Different clock sources: acpi-pm, jiffies and tsc

Test setup
---

pktgen was used to send from the system under test (where
test variables #2-#3 were adjusted) to a second box.
CPU affinity was pinned to cpu2 in all cases to reduce variables.

Test validation
---

Throughput results were confirmed to match on receiver
and sender (as reported by pktgen).

Results
---
The AMD Opteron always had better results.
The batching kernels were always better than non-batching.
The jiffies clock was always the most consistent and gave
the best performance.

Kernel-type + h/ware         | acpi-pm clock | jiffies clock | tsc clock
-----------------------------+---------------+---------------+----------
2622-rc4 plain, Intel Xeon   | 347 Kpps      | 1.40 Mpps     | 1.36 Mpps
2622-rc4 plain, AMD Opteron  | 342 Kpps      | 853 Kpps      | 821 Kpps
2622-rc4 batch, Intel Xeon   | 615 Kpps      | 1.46 Mpps     | 1.46 Mpps
2622-rc4 batch, AMD Opteron  | 633 Kpps      | 1.18 Mpps     | 1.17 Mpps
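To put numbers on "batching is always better", a small sketch that divides each batch result by the matching plain result from the table, with all values converted to Kpps. It only reuses the posted figures and keeps the row labels as posted, even though jamal's follow-up correction says the Xeon and Opteron labels were swapped; the array and function names are ours.

```c
#include <assert.h>
#include <stddef.h>

/* Results from the table, in Kpps, ordered acpi-pm, jiffies, tsc. */
static const double plain_xeon[3]    = {347.0, 1400.0, 1360.0};
static const double plain_opteron[3] = {342.0,  853.0,  821.0};
static const double batch_xeon[3]    = {615.0, 1460.0, 1460.0};
static const double batch_opteron[3] = {633.0, 1180.0, 1170.0};

/* Speedup of the batching kernel over the plain kernel for one clock
 * source index (0 = acpi-pm, 1 = jiffies, 2 = tsc). */
static double speedup(const double *batch, const double *plain, size_t clk)
{
	return batch[clk] / plain[clk];
}
```

By this arithmetic the largest gain is with acpi-pm (roughly 1.8x on both machines), while with jiffies and tsc the gain ranges from about 4% to about 43% depending on the machine.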

The two systems under test 
---

[1]-
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 1
cpu MHz : 2793.329
cache size  : 1024 KB
physical id : 3
siblings: 2
core id : 0
cpu cores   : 1
-

[2]-
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 33
model name  : Dual Core AMD Opteron(tm) Processor 275
stepping: 2
cpu MHz : 2194.778
cache size  : 1024 KB
physical id : 1
siblings: 2
core id : 1
cpu cores   : 2
-



Re: Fwd: [PATCH] [-mm] ACPI: export ACPI events via netlink

2007-06-21 Thread jamal
On Wed, 2007-20-06 at 13:25 +0200, Johannes Berg wrote:

> Ok. That's definitely a bug in nl80211 as we have it in development
> right now. 

Sorry, have never looked at that code.

> Btw, what happens if the group id gets larger than 31?

You can use setsockopt to set the multicast groups. What you can't do
with that is subscribe to many groups in one shot.
The call in iproute2 hasn't caught up with this yet.
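jamal's point in code form: the nl_groups field passed to bind() is a 32-bit bitmask, so it can only name groups 1 through 32 (group N is bit N-1), whereas setsockopt(NETLINK_ADD_MEMBERSHIP) takes a single group number with no such limit, at the cost of one call per group. A minimal sketch; bind_mask_for_group() and join_groups() are our own helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270   /* Linux value; older C libraries may not define it */
#endif

/* Bit to OR into sockaddr_nl.nl_groups for a bind()-time subscription,
 * or 0 if the group does not fit in the 32-bit mask. */
static unsigned int bind_mask_for_group(unsigned int group)
{
	if (group == 0 || group > 32)
		return 0;
	return 1U << (group - 1);
}

/* Subscribe a netlink socket to several groups. There is no batch form:
 * NETLINK_ADD_MEMBERSHIP takes one group number per call. */
static int join_groups(int fd, const unsigned int *groups, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int g = groups[i];

		if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
			       &g, sizeof(g)) < 0)
			return -errno;
	}
	return 0;
}
```

In practice fd would come from socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); the kernel may reject group numbers beyond what the family has registered, so errors from join_groups() should be expected for arbitrary values.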

> I'd really like to be able to reserve multicast groups with special
> semantics too, especially I might want to permit/deny non-CAP_NET_ADMIN
> users from binding specific multicast groups. That isn't actually
> possible with netlink nor genetlink right now afaict.

This would be hard, but doable via the SELinux interface. I think you
should be able to extend your tool to make calls to that interface.

> If we register multiple IDs then we'll end up filling up the generic
> netlink family space really soon. 

There's a huge number of these groups; moreover, considering
that some genetlink users may not be interested in such multicast
groups, it is quite workable to have many groups as long as we avoid
conflicts.

> I was under the impression that
> generic netlink was basically open-ended because the family is a large
> enough number, but with this arbitrary limit on multicast groups that's
> really not true and we might run out of multicast groups fairly soon
> since most users of generic netlink will want at least one...
> 

The multicast issue wasn't well addressed. We have a group magically
assigned to a user based on their allocated id. It should be feasible
to add a kernel API for registering many groups and to allow
user space to discover these groups before registering. Maybe that's
the path forward.

cheers,
jamal







Re: [PATCH] NET: Multiqueue network device support.

2007-06-21 Thread jamal

I gave you two opportunities to bail out of this discussion. I am going
to take your rejection of that offer to mean that you, my friend, want to
get to the bottom of this, i.e. you are on a mission to find the truth.
So let's continue.

On Wed, 2007-20-06 at 13:51 +0800, Zhu Yi wrote:

> No, because this is over-engineered. 
> Furthermore, don't you think the
> algorithm is complicated and unnecessary (i.e. one timer per h/w queue)?

The (one-shot) timer is only necessary when a ring shuts down the
driver. This is only for the case of wireless media; standard wired
Ethernet doesn't need it.

Note: you are not going to convince me by throwing around cliches like
"this is over-engineering", because that invites a response like "Not at
all. I think sending flow control messages back to the stack is
over-engineering." And where do we go then?

> Do you think the driver maintainer will accept such kind of workaround
> patch? 

Give me access to the manual for the wireless chip in my laptop, which
is a 3945ABG, and I can produce a very simple patch for you. Actually, if
you answer some questions for me, that may be good enough to produce such
a patch.

> You did too much to keep the Qdisc interface untouched!

What metric do you want to use for "too much": lines of code?
Complexity? I consider architectural cleanliness to be more important.

> Besides, the lower THL you choose, the more CPU time is wasted in busy
> loop for the only PL case; 

Your choice of THL and THH has nothing to do with what I am proposing.
I am not proposing you even touch those. What numbers do you have today?

What I am saying is that you use _some_ value for opening up the driver;
some enlightened drivers such as tg3 (and e1000, for which I
unashamedly take credit) do have such parametrization. This has already
been proven to be valuable.
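The stop/wake scheme described here (stop the queue when the ring cannot take another maximal packet, wake it only once a threshold of descriptors is free again) can be modelled in a few lines. A user-space sketch of that hysteresis; RING_SIZE, TX_WAKE_THRESHOLD, struct txring and the function names are hypothetical, and the text only says that drivers such as tg3 and e1000 carry a parameter of this kind.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE          64   /* hypothetical descriptor ring size */
#define TX_WAKE_THRESHOLD  16   /* hypothetical wake threshold */

struct txring {
	int used;       /* descriptors currently in flight */
	bool stopped;   /* netif_stop_queue()-style state */
};

static int ring_free(const struct txring *r)
{
	return RING_SIZE - r->used;
}

/* Transmit path: queue a packet needing 'descs' descriptors, then stop
 * the queue if a maximal packet ('max_descs') might no longer fit.
 * Returns false if the packet was refused. */
static bool tx_packet(struct txring *r, int descs, int max_descs)
{
	if (r->stopped || ring_free(r) < descs)
		return false;
	r->used += descs;
	if (ring_free(r) < max_descs)
		r->stopped = true;
	return true;
}

/* Completion path: reclaim descriptors, but wake the queue only once
 * enough are free, so it does not flap between stopped and running. */
static void tx_clean(struct txring *r, int descs)
{
	r->used -= descs;
	if (r->stopped && ring_free(r) >= TX_WAKE_THRESHOLD)
		r->stopped = false;
}
```

No timer or busy loop is involved: the wake decision happens only when completions free descriptors, which matches the claim that nothing extra runs while packets keep going out.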

The timer fires only if a ring shuts down the interface. Where is the
busy loop? If packets go out, there is no timer.
 
> the higher THL you choose, the slower the PH
> packets will be sent out than expected (the driver doesn't fully utilize
> the device function -- multiple rings, 

I don't think you understood: whatever values you choose for THL and THH
today, keep them. OTOH, the wake threshold is what I was referring to.

> which conlicts with a device driver's intention). 

I don't see how, given that I am talking about wake thresholds.

> You can never make a good trade off in this model.

Refer to above.

> I think I have fully understood you, 

Thanks for coming such a long way; before, you stated it couldn't be
done unless you sent feedback to the stack.

> but your point is invalid. The
> Qdisc must be changed to have the hardware queue information to support
> multiple hardware queues devices.
> 

Handwaving as above doesn't add value to a discussion. If you want
meaningful discussions, stop using these cliches.

cheers,
jamal



Re: [PATCH] Allow group ownership of TUN/TAP devices (fwd)

2007-06-21 Thread Jeff Dike
On Mon, Jun 18, 2007 at 10:36:54PM -0700, David Miller wrote:
> This patch looks fine.
> 
> I'd like it resubmitted with a proper changelog and signoff, and once
> I have that I will thus queue it up for the 2.6.23 merge window.

Introduce a new ioctl, TUNSETGROUP, for setting group ownership of tap
devices. The user is now allowed to send packets if either his euid or
his egid matches the one specified via tunctl (via -u or -g
respectively). If both gid and uid are set via tunctl, both have to
match.
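The permission rule can be stated as a small predicate. A user-space sketch mirroring the condition in tun_set_iff() from the diff, with (uid_t)-1 and (gid_t)-1 meaning "not set" as in tun_setup(); may_attach() is our own name, and the capable(CAP_NET_ADMIN) check is reduced to a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

#define UNSET_UID ((uid_t)-1)
#define UNSET_GID ((gid_t)-1)

/* Same logic as the patched check: refuse if a configured owner or group
 * does not match the caller, unless the caller has CAP_NET_ADMIN. */
static bool may_attach(uid_t owner, gid_t group, uid_t euid, gid_t egid,
		       bool cap_net_admin)
{
	bool owner_mismatch = owner != UNSET_UID && euid != owner;
	bool group_mismatch = group != UNSET_GID && egid != group;

	return cap_net_admin || !(owner_mismatch || group_mismatch);
}
```

If only one of owner/group is set, only that one must match; if both are set, both must match, as the changelog says.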

Signed-Off-By: Guido Guenther <[EMAIL PROTECTED]>
Signed-Off-By: Jeff Dike <[EMAIL PROTECTED]>

---
 drivers/net/tun.c  |   15 +--
 include/linux/if_tun.h |2 ++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a2c6caa..62b2b30 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -432,6 +432,7 @@ static void tun_setup(struct net_device *dev)
init_waitqueue_head(&tun->read_wait);
 
tun->owner = -1;
+   tun->group = -1;
 
SET_MODULE_OWNER(dev);
dev->open = tun_net_open;
@@ -467,8 +468,11 @@ static int tun_set_iff(struct file *file, struct ifreq *ifr)
return -EBUSY;
 
/* Check permissions */
-   if (tun->owner != -1 &&
-   current->euid != tun->owner && !capable(CAP_NET_ADMIN))
+   if (((tun->owner != -1 &&
+ current->euid != tun->owner) ||
+(tun->group != -1 &&
+ current->egid != tun->group)) &&
+!capable(CAP_NET_ADMIN))
return -EPERM;
}
else if (__dev_get_by_name(ifr->ifr_name))
@@ -610,6 +614,13 @@ static int tun_chr_ioctl(struct inode *inode, struct file *file,
DBG(KERN_INFO "%s: owner set to %d\n", tun->dev->name, tun->owner);
break;
 
+   case TUNSETGROUP:
+   /* Set group of the device */
+   tun->group= (gid_t) arg;
+
+   DBG(KERN_INFO "%s: group set to %d\n", tun->dev->name, tun->group);
+   break;
+
case TUNSETLINK:
/* Only allow setting the type when the interface is down */
if (tun->dev->flags & IFF_UP) {
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 88aef7b..42eb694 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -36,6 +36,7 @@ struct tun_struct {
unsigned long   flags;
int attached;
uid_t   owner;
+   gid_t   group;
 
wait_queue_head_t   read_wait;
struct sk_buff_head readq;
@@ -78,6 +79,7 @@ struct tun_struct {
 #define TUNSETPERSIST _IOW('T', 203, int) 
 #define TUNSETOWNER   _IOW('T', 204, int)
 #define TUNSETLINK_IOW('T', 205, int)
+#define TUNSETGROUP   _IOW('T', 206, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN0x0001


Re: [PATCH] fix race in AF_UNIX

2007-06-21 Thread Eric W. Biederman
Miklos Szeredi <[EMAIL PROTECTED]> writes:

> [CC'd Al Viro and Alan Cox, restored patch]
>
>> > There are races involving the garbage collector, that can throw away
>> > perfectly good packets with AF_UNIX sockets in them.
>> > 
>> > The problems arise when a socket goes from installed to in-flight or
>> > vice versa during garbage collection.  Since gc is done with a
>> > spinlock held, this only shows up on SMP.
>> > 
>> > Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]>
>> 
>> I'm going to hold off on this one for now.
>> 
>> Holding all of the read locks kind of defeats the purpose of using
>> the per-socket lock.
>> 
>> Can't you just lock purely around the receive queue operation?
>
> That's already protected by the receive queue spinlock.  The race
> however happens _between_ pushing the root set and marking of the
> in-flight but reachable sockets.
>
> If in that space any of the AF_UNIX sockets goes from in-flight to
> installed into a file descriptor, the garbage collector can miss it.
> If we want to protect against this using unix_sk(s)->readlock, then we
> have to hold all of them for the duration of the marking.
>
> Al, Alan, you have more experience with this piece of code.  Do you
> have better ideas about how to fix this?

I haven't looked at the code closely enough to be confident of
changing something in this area.  However the classic solution to this
kind of gc problem is to mark things that are manipulated during
garbage collection as dirty (not orphaned).

It should be possible to fix this problem by simply changing gc_tree
when we perform a problematic manipulation of a passed socket, such
as installing a passed socket into the file descriptors of a process.

Essentially the idea is moving the current code in the direction of
an incremental gc algorithm.
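A toy model of the "mark as dirty" idea: while a collection is in progress, any socket that goes from in-flight to installed is flagged, and the sweep never frees a flagged socket even if marking missed it. This is a single-threaded sketch of the general write-barrier technique only, not of net/unix/garbage.c, and it ignores the locking question entirely; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NSOCK 4

/* Hypothetical per-socket GC state. */
struct sock_state {
	bool reachable;   /* set by the mark phase */
	bool dirty;       /* set if touched while a collection runs */
	bool freed;
};

static bool gc_in_progress;

/* Start a cycle: clear marks; the mark phase then sets 'reachable'. */
static void gc_begin(struct sock_state *socks, int n)
{
	gc_in_progress = true;
	for (int i = 0; i < n; i++)
		socks[i].reachable = false;
}

/* Called when a passed socket is installed into a process's fd table
 * (the race window discussed here). Acts as a write barrier. */
static void install_passed_socket(struct sock_state *s)
{
	if (gc_in_progress)
		s->dirty = true;
}

/* Sweep: free only sockets that marking left unreachable and that were
 * not manipulated while the collection was running. */
static int gc_sweep(struct sock_state *socks, int n)
{
	int freed = 0;

	for (int i = 0; i < n; i++) {
		if (!socks[i].reachable && !socks[i].dirty && !socks[i].freed) {
			socks[i].freed = true;
			freed++;
		}
		socks[i].dirty = false;  /* reset for the next cycle */
	}
	gc_in_progress = false;
	return freed;
}
```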


If I understand the race properly, what happens is that we dequeue
a socket (which has packets in it passing sockets) before the
garbage collector gets to it.  Therefore the garbage collector
never processes that socket.  So it sounds like we just
need to call maybe_unmark_and_push or possibly just wait for
the garbage collector to complete when we do that and the packet
we have pulled out 



So just looking at this quickly.  It looks like we need to hold
the u->readlock mutex in garbage.c while looking at the receive
queue.  Otherwise we may be processing a packet and have
the file descriptors in limbo.  Either that or we need
to slightly extend the scope of the receive queue lock on the
receive side.

Additionally we need to modify unix_notinflight to mark the sockets
as inuse, or to at least wait for the garbage collection to complete.
While being very careful to not add a deadlock scenario.

The required changes to fix this without adding heavy-handed locking
look a bit nasty to make.

I half suspect switching to one of the simpler incremental garbage
collection algorithms might not be equally easy to implement and
give us more performance at the same time.  Having a garbage collector
that can block has advantages.

The practical problem is that you can't modify the list of pushed objects,
aka gc_tree, without holding a lock. And we never drop the
unix_table_lock.


Anyway, hopefully that is enough fodder for someone to come up with
a lightweight fix for this problem.

Eric


Re: [RESEND][PATCH][IPROUTE2] see SAD info

2007-06-21 Thread jamal
On Tue, 2007-19-06 at 16:25 -0700, Stephen Hemminger wrote:

> Using current xfrm.h from kernel headers, causes conflicts.
> Instead of XFRMA_SADCNT, it should be using XFRMA_SAD_CNT.
> 

Yeah, that changed in the kernel header. 
If it compiles, it should be fine; thanks Stephen.

cheers,
jamal



Re: [NET 00/02]: MACVLAN driver

2007-06-21 Thread Patrick McHardy

Prafulla Deuskar wrote:
> On 6/19/07, Jeff Garzik <[EMAIL PROTECTED]> wrote:
>> David Miller wrote:
>>> This is actually a real issue for virtualization, and many
>>> if not all current generation ethernet chips support
>>> programming several unicast ethernet addresses in the MAC.
>>>
>>> Networking switches in domain0 on virtualization hosts use
>>> this feature to support seperate MACs per guest node,
>>> and if the chip doesn't support this the chip is put into
>>> promiscuous mode.
>>>
>>> We don't have any clean interfaces by which to do this MAC
>>> programming, and we do need something for it soon.
>>
>> Yep, that's been on my long term wish list for a while, as well.
>>
>> Overall I would like to see a more flexible way of allowing the net
>> stack to learn each NIC's RX filter capabilities, and exploiting them.
>> Plenty of NICs, even 100Mbps ones, support RX filter management that
>> allows scanning for $hw_limit unicast addresses, before having to put
>> the hardware into promisc mode.
>>
>>    Jeff
>
> So how do we manage mac address to RX queue association?

This is not about multiple RX queues but filtering multiple unicast
addresses without going to promiscuous mode. The addresses are used
by something outside the driver, like macvlan.
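Jeff's description of RX filtering (program up to some hardware limit of exact unicast matches, fall back to promiscuous mode beyond it) reduces to a simple decision. A sketch with a hypothetical 8-slot filter table; struct rx_filter, hw_limit and rx_filter_sync() are our own names, not an existing netdev interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN   6
#define MAX_SLOTS 8

struct rx_filter {
	unsigned char addr[MAX_SLOTS][MAC_LEN]; /* hypothetical filter table */
	size_t hw_limit;                        /* slots the NIC actually has */
	size_t count;
	bool promisc;
};

/* Program the wanted unicast addresses; if they do not fit, give up on
 * exact filtering and put the NIC into promiscuous mode instead. */
static void rx_filter_sync(struct rx_filter *f,
			   const unsigned char (*addrs)[MAC_LEN], size_t n)
{
	assert(f->hw_limit <= MAX_SLOTS);
	if (n > f->hw_limit) {
		f->count = 0;
		f->promisc = true;
		return;
	}
	for (size_t i = 0; i < n; i++)
		memcpy(f->addr[i], addrs[i], MAC_LEN);
	f->count = n;
	f->promisc = false;
}
```

A user such as macvlan would call this each time its address list changes; the driver only has to report hw_limit.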




Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing, take 2

2007-06-21 Thread pradeep singh

Hi,
My mistake.
Resending after reformatting the patch by hand.
It looks like Gmail mangles plain-text patches.
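The patch guards against t1_get_board_info() returning NULL for a driver_data index outside the board table, before bi->port_number is dereferenced. A minimal sketch of that lookup-plus-check pattern; the table contents, get_board_info() and count_ports() are hypothetical stand-ins, not the cxgb2 code.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct board_info {
	const char *desc;
	int port_number;
};

/* Hypothetical board table standing in for the driver's. */
static const struct board_info boards[] = {
	{ "1-port board", 1 },
	{ "4-port board", 4 },
};

/* Return the entry for 'id', or NULL when id is out of range. */
static const struct board_info *get_board_info(size_t id)
{
	return id < ARRAY_SIZE(boards) ? &boards[id] : NULL;
}

/* Caller pattern from the patch: check before touching bi->port_number. */
static int count_ports(size_t id)
{
	const struct board_info *bi = get_board_info(id);

	if (!bi)
		return -1;  /* the driver logs CH_ERR and unwinds instead */
	return bi->port_number;
}
```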

Thanks

Signed-off-by: Pradeep Singh <[EMAIL PROTECTED]>
---
drivers/net/chelsio/cxgb2.c |5 +
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/chelsio/cxgb2.c b/drivers/net/chelsio/cxgb2.c
index 231ce43..006c634 100644
--- a/drivers/net/chelsio/cxgb2.c
+++ b/drivers/net/chelsio/cxgb2.c
@@ -1022,6 +1022,11 @@ static int __devinit init_one(struct pci_dev *pdev,
  mmio_start = pci_resource_start(pdev, 0);
  mmio_len = pci_resource_len(pdev, 0);
  bi = t1_get_board_info(ent->driver_data);
+
+   if (!bi) {
+CH_ERR("%s: Board info array index out of range\n",pci_name(pdev));
+goto out_disable_pdev;
+}

  for (i = 0; i < bi->port_number; ++i) {
  struct net_device *netdev;

Thanks

--
On 6/21/07, pradeep singh <[EMAIL PROTECTED]> wrote:

Hi,
This is second submission for a possible NULL dereference handling in
the Chelsio's 10G driver.

Thanks to Jens Axboe for pointing out my mistake of ignoring
subsequent dereferences in init_one routine.

Thanks

Signed-off-by: Pradeep Singh <[EMAIL PROTECTED]>
---
 drivers/net/chelsio/cxgb2.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/chelsio/cxgb2.c b/drivers/net/chelsio/cxgb2.c
index 231ce43..006c634 100644
--- a/drivers/net/chelsio/cxgb2.c
+++ b/drivers/net/chelsio/cxgb2.c
@@ -1022,6 +1022,11 @@ static int __devinit init_one(struct pci_dev *pdev,
mmio_start = pci_resource_start(pdev, 0);
mmio_len = pci_resource_len(pdev, 0);
bi = t1_get_board_info(ent->driver_data);
+
+	if (!bi) {
+		CH_ERR("%s: Board info array index out of range\n",pci_name(pdev));
+		goto out_disable_pdev;
+	}

for (i = 0; i < bi->port_number; ++i) {
struct net_device *netdev;
--
1.4.4.2

--
Pradeep




--
Pradeep


[PATCH 1/4] [Net] Support accelerated network plugin modules

2007-06-21 Thread Kieran Mansley
Add xenbus_for_each_[back,front]end functions to iterate each bus

Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> 

diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe.c
--- a/drivers/xen/xenbus/xenbus_probe.c Sun Jun 10 19:50:32 2007 +0100
+++ b/drivers/xen/xenbus/xenbus_probe.c Fri Jun 15 15:34:45 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -1085,3 +1086,10 @@ static int __init boot_wait_for_devices(
 
 late_initcall(boot_wait_for_devices);
 #endif
+
+
+int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *))
+{
+   return bus_for_each_dev(&xenbus_frontend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_frontend);
diff -r d5e0eb7dd069 drivers/xen/xenbus/xenbus_probe_backend.c
--- a/drivers/xen/xenbus/xenbus_probe_backend.c Sun Jun 10 19:50:32 2007 +0100
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c Fri Jun 15 15:34:55 2007 +0100
@@ -4,6 +4,7 @@
  * Copyright (C) 2005 Rusty Russell, IBM Corporation
  * Copyright (C) 2005 Mike Wray, Hewlett-Packard
  * Copyright (C) 2005, 2006 XenSource Ltd
+ * Copyright (C) 2007 Solarflare Communications, Inc.
  * 
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License version 2
@@ -285,3 +286,10 @@ void xenbus_backend_device_register(void
   xenbus_backend.error);
}
 }
+
+int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *))
+{
+   return bus_for_each_dev(&xenbus_backend.bus, NULL, arg, fn);
+}
+EXPORT_SYMBOL_GPL(xenbus_for_each_backend);
+
diff -r d5e0eb7dd069 include/xen/xenbus.h
--- a/include/xen/xenbus.h  Sun Jun 10 19:50:32 2007 +0100
+++ b/include/xen/xenbus.h  Thu Jun 14 15:04:31 2007 +0100
@@ -299,4 +299,7 @@ int xenbus_dev_is_online(struct xenbus_d
 int xenbus_dev_is_online(struct xenbus_device *dev);
 int xenbus_frontend_closed(struct xenbus_device *dev);
 
+extern int xenbus_for_each_backend(void *arg, int (*fn)(struct device *, void *));
+extern int xenbus_for_each_frontend(void *arg, int (*fn)(struct device *, void *));
+
 #endif /* _XEN_XENBUS_H */
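Both new helpers simply forward to bus_for_each_dev(), which calls fn on each device on the bus and stops at the first nonzero return, passing that value back to the caller. Below is a self-contained userspace model of that callback contract. The hyp_ types and the array-backed bus are hypothetical stand-ins, not the driver-core implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a device bus: a plain array of devices. */
struct hyp_device {
	const char *devicetype;   /* e.g. "vif" or "vbd" */
	int accel_attached;
};

struct hyp_bus {
	struct hyp_device *devs;
	size_t ndevs;
};

/* Models the iteration contract: call fn on each device in turn, stop
 * at the first nonzero return and hand that value back, else return 0. */
static int hyp_bus_for_each_dev(struct hyp_bus *bus, void *arg,
				int (*fn)(struct hyp_device *, void *))
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		int err = fn(&bus->devs[i], arg);

		if (err)
			return err;
	}
	return 0;
}

/* Thin wrapper in the spirit of xenbus_for_each_frontend(). */
static int hyp_for_each_frontend(struct hyp_bus *bus, void *arg,
				 int (*fn)(struct hyp_device *, void *))
{
	return hyp_bus_for_each_dev(bus, arg, fn);
}

/* Example callback: mark every "vif" device and count them in *arg.
 * Returning 0 keeps the walk going over the whole bus. */
static int hyp_mark_vifs(struct hyp_device *dev, void *arg)
{
	if (strcmp(dev->devicetype, "vif") == 0) {
		dev->accel_attached = 1;
		++*(int *)arg;
	}
	return 0;
}
```

Because a callback that returns 0 visits every device, a "tell every backend" style walk only needs to return nonzero when it wants to abort early.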



[PATCH 2/4] [Net] Support accelerated network plugin modules

2007-06-21 Thread Kieran Mansley
Add accel option to vif xend config

Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> 

diff -r 405eb3e22887 tools/python/xen/xend/server/netif.py
--- a/tools/python/xen/xend/server/netif.py Thu Jun 14 14:50:04 2007 +0100
+++ b/tools/python/xen/xend/server/netif.py Thu Jun 14 14:52:55 2007 +0100
@@ -107,6 +107,7 @@ class NetifController(DevController):
 uuid= config.get('uuid')
 ipaddr  = config.get('ip')
 model   = config.get('model')
+accel   = config.get('accel')
 
 if not typ:
 typ = xoptions.netback_type
@@ -131,6 +132,8 @@ class NetifController(DevController):
 back['uuid'] = uuid
 if model:
 back['model'] = model
+if accel:
+back['accel'] = accel
 
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 for x in back:
@@ -157,10 +160,10 @@ class NetifController(DevController):
 config_path = "device/%s/%d/" % (self.deviceClass, devid)
 devinfo = ()
 for x in ( 'script', 'ip', 'bridge', 'mac',
-   'type', 'vifname', 'rate', 'uuid', 'model' ):
+   'type', 'vifname', 'rate', 'uuid', 'model', 'accel'):
 y = self.vm._readVm(config_path + x)
 devinfo += (y,)
-(script, ip, bridge, mac, typ, vifname, rate, uuid, model) = devinfo
+(script, ip, bridge, mac, typ, vifname, rate, uuid, model, accel) = devinfo
 
 if script:
 result['script'] = script
@@ -180,5 +183,7 @@ class NetifController(DevController):
 result['uuid'] = uuid
 if model:
 result['model'] = model
+if accel:
+result['accel'] = accel
 
 return result
diff -r 405eb3e22887 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py Thu Jun 14 14:50:04 2007 +0100
+++ b/tools/python/xen/xm/create.py Thu Jun 14 14:52:55 2007 +0100
@@ -710,7 +710,7 @@ def configure_vifs(config_devs, vals):
 
 def f(k):
 if k not in ['backend', 'bridge', 'ip', 'mac', 'script', 'type',
- 'vifname', 'rate', 'model']:
+ 'vifname', 'rate', 'model', 'accel']:
 err('Invalid vif option: ' + k)
 
 config_vif.append([k, d[k]])



[PATCH 3/4] [Net] Support accelerated network plugin modules

2007-06-21 Thread Kieran Mansley
Backend net driver acceleration

Signed-off-by: Kieran Mansley <[EMAIL PROTECTED]> 

diff -r 30c836e0575e drivers/xen/netback/Makefile
--- a/drivers/xen/netback/Makefile  Fri Jun 15 15:35:17 2007 +0100
+++ b/drivers/xen/netback/Makefile  Fri Jun 15 15:37:41 2007 +0100
@@ -1,5 +1,5 @@ obj-$(CONFIG_XEN_NETDEV_BACKEND) := netb
 obj-$(CONFIG_XEN_NETDEV_BACKEND) := netbk.o
 obj-$(CONFIG_XEN_NETDEV_LOOPBACK) += netloop.o
 
-netbk-y   := netback.o xenbus.o interface.o
+netbk-y   := netback.o xenbus.o interface.o accel.o
 netloop-y := loopback.o
diff -r 30c836e0575e drivers/xen/netback/accel.c
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/drivers/xen/netback/accel.c   Wed Jun 20 14:58:09 2007 +0100
@@ -0,0 +1,207 @@
+/**
+ * drivers/xen/netback/accel.c
+ *
+ * Interface between backend virtual network device and accelerated plugin. 
+ * 
+ * Copyright (C) 2007 Solarflare Communications, Inc
+ * 
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ * 
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ * 
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ * 
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "common.h"
+
+#if 0
+#undef DPRINTK
+#define DPRINTK(fmt, args...)  \
+   printk("netback/accel (%s:%d) " fmt ".\n", __FUNCTION__, __LINE__, ##args)
+#endif
+
+/* 
+ * A list of available netback accelerator plugin modules (each list
+ * entry is of type struct netback_accelerator) 
+ */ 
+static struct list_head accelerators_list;
+/* Lock used to protect access to accelerators_list */
+static spinlock_t accelerators_lock;
+
+/* 
+ * Compare a backend to an accelerator, and decide if they are
+ * compatible (i.e. if the accelerator should be used by the
+ * backend) 
+ */
+static int match_accelerator(struct backend_info *be, 
+struct netback_accelerator *accelerator)
+{
+	/*
+	 * This could do with being more sophisticated.  For example,
+	 * determine which hardware is being used by each backend from
+	 * the bridge and network topology of the domain
+	 */
+   return be->accelerator == NULL;
+}
+
+/*
+ * Notify all suitable backends that a new accelerator is available
+ * and connected.  This will also notify the accelerator plugin module
+ * that it is being used for a device through the probe hook.
+ */
+static int netback_accelerator_tell_backend(struct device *dev, void *arg)
+{
+   struct netback_accelerator *accelerator = 
+   (struct netback_accelerator *)arg;
+   struct xenbus_device *xendev = to_xenbus_device(dev);
+
+   if (!strcmp("vif", xendev->devicetype)) {
+   struct backend_info *be = xendev->dev.driver_data;
+
+   if (match_accelerator(be, accelerator)) {
+   be->accelerator = accelerator;
+   atomic_inc(&be->accelerator->use_count);
+   be->accelerator->hooks->probe(xendev);
+   }
+   }
+   return 0;
+}
+
+
+/*
+ * Entry point for a netback accelerator plugin module.  Called to
+ * advertise its presence, and connect to any suitable backends.
+ */
+void netback_connect_accelerator(int id, const char *frontend, 
+struct netback_accel_hooks *hooks)
+{
+   struct netback_accelerator *new_accelerator = 
+   kmalloc(sizeof(struct netback_accelerator), GFP_KERNEL);
+   unsigned frontend_len, flags;
+
+   if (!new_accelerator) {
+   DPRINTK("%s: failed to allocate memory for accelerator\n",
+   __FUNCTION__);
+   return;
+   }
+
+   new_accelerator->id = id;

[PATCH 0/4] [Net] Support accelerated network plugin modules

2007-06-21 Thread Kieran Mansley
This is another iteration of some earlier patches sent to the xen-devel
mailing list, with a number of changes thanks to some useful
suggestions from others.

I've also CC'd netdev@vger.kernel.org at Herbert Xu's request as some of
the files being patched may be merged into upstream linux soon, and so
folks there may have opinions too.

Major changes from last time:
 - Modify protection in frontend to use an atomic ref count to reduce
the number of spinlocks that are required, as suggested by Keir Fraser
and Zhu Han.  This change required an improvement to the protection of
the hooks when they are installed for a second (or subsequent) time, to
prevent the new copy being inserted before the old ones have been
completely finished with.

 - Move the majority of the acceleration code out of existing
netfront/netback source files and into a separate accel.c source file in
each of those directories, as requested by Keir.  Unfortunately, separate
header files don't make a lot of sense due to mutual dependencies.

 - A number of coding style changes, again requested by Keir.  Apologies
for not getting this right the first time.
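The following sketch shows one way to read the atomic ref count change in the first item. Callers pin the hooks with a counter instead of taking a spinlock, and a replacement set of hooks is only published once the old set has no users left. It uses C11 atomics and is not the code from these patches. All hyp_ names are hypothetical, and a real driver would sleep or retry rather than just return false.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct hyp_hooks {
	int (*start_xmit)(int pkt);
};

struct hyp_frontend {
	_Atomic(const struct hyp_hooks *) hooks;
	atomic_int users;          /* in-flight callers of the current hooks */
};

/* Reader side: pin the hooks for the duration of one call. */
static bool hyp_hooks_get(struct hyp_frontend *fe, const struct hyp_hooks **out)
{
	atomic_fetch_add(&fe->users, 1);
	*out = atomic_load(&fe->hooks);
	if (*out == NULL) {
		atomic_fetch_sub(&fe->users, 1);
		return false;
	}
	return true;
}

static void hyp_hooks_put(struct hyp_frontend *fe)
{
	atomic_fetch_sub(&fe->users, 1);
}

/* Writer side: unpublish the old hooks, then install the new copy only
 * once every reader of the old one has dropped its reference. */
static bool hyp_hooks_replace(struct hyp_frontend *fe,
			      const struct hyp_hooks *new_hooks)
{
	atomic_store(&fe->hooks, NULL);
	if (atomic_load(&fe->users) != 0)
		return false;            /* old hooks still in use; retry later */
	atomic_store(&fe->hooks, new_hooks);
	return true;
}

/* Two example hook versions so the swap is observable. */
static int hyp_xmit_v1(int pkt) { return pkt + 1; }
static int hyp_xmit_v2(int pkt) { return pkt + 2; }
```

With the default sequentially consistent atomics, a reader that raises the count after the writer has checked it can only observe NULL or the new hooks, never the old ones.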

What follows is the full description from the earlier posting, included
here for ease of access should anyone need it:

This set of patches provides the hooks and support necessary for
accelerated network plugin modules to attach to Xen's netback and
netfront.  These modules provide a fast path for network traffic where
there is hardware support available for the netfront driver to send and
receive packets directly to and from a NIC (such as those available from
Solarflare).

As there are currently no available plugins, I've attached a couple of
dummy ones to illustrate how the hooks could be used.  These are
incomplete (and clearly wouldn't even compile) in that they only include
code to show the interface between the accelerated module and
netfront/netback.  A lot of the comments hint at what code should go
where.  They don't show any interface between the accelerated frontend
and accelerated backend, or hardware access, for example, as those would
both be specific to the implementation.  I hope they help illustrate
this, but if you have any questions I'm happy to provide more
information.

A brief overview of the operation of the plugins:  When the accelerated
modules are loaded, a VI is created by the accelerated backend to allow
the accelerated frontend to safely access portions of the NIC.  For RX,
when packets are received by the accelerated backend, it will examine
them and if appropriate insert filters into the NIC to deliver future
packets on that address directly to the accelerated frontend's VI.  For
TX, netfront gives each accelerated frontend the option of sending each
packet, which it can accept (if it wants to send it directly to the
hardware) or decline (if it thinks this is more appropriate to send via
the normal network path).
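Here is a sketch of the TX decision just described. Each packet is offered to the accelerated plugin first and goes through the normal path if the plugin declines. The hook type, the structures and the example policy (accept only one filtered destination) are hypothetical. The real interface between netfront and a plugin is not shown in this excerpt.

```c
#include <stdbool.h>
#include <stddef.h>

struct hyp_pkt {
	unsigned int dst_ip;
	size_t len;
};

/* Hook a plugin offers: return true if it took the packet. */
typedef bool (*hyp_accel_xmit_fn)(void *priv, const struct hyp_pkt *pkt);

struct hyp_netfront {
	hyp_accel_xmit_fn accel_xmit;   /* NULL if no plugin is attached */
	void *accel_priv;
	unsigned long slow_path_pkts;   /* packets sent via the normal path */
};

static void hyp_normal_path_xmit(struct hyp_netfront *nf,
				 const struct hyp_pkt *pkt)
{
	(void)pkt;                /* the shared-ring hand-off would go here */
	nf->slow_path_pkts++;
}

/* Give the accelerated plugin first refusal on each packet and fall
 * back to the normal path when it declines. */
static void hyp_start_xmit(struct hyp_netfront *nf, const struct hyp_pkt *pkt)
{
	if (nf->accel_xmit && nf->accel_xmit(nf->accel_priv, pkt))
		return;               /* plugin sent it directly to the NIC */
	hyp_normal_path_xmit(nf, pkt);
}

/* Example plugin policy: accept only a destination it has a filter for. */
struct hyp_accel_state {
	unsigned int filtered_ip;
	unsigned long fast_path_pkts;
};

static bool hyp_example_accel_xmit(void *priv, const struct hyp_pkt *pkt)
{
	struct hyp_accel_state *st = priv;

	if (pkt->dst_ip != st->filtered_ip)
		return false;         /* decline: leave it to the normal path */
	st->fast_path_pkts++;
	return true;
}
```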

We have found that, using this approach to accelerating network traffic,
domU-to-domU connections (across the network) can achieve close to the
performance of dom0-to-dom0 connections on 10Gbps Ethernet.  This is
roughly double the bandwidth seen with unmodified Xen.
/***/
/*! \file dumm_accel_backend.c Dummy accelerated plugin module

Copyright 2006 Solarflare Communications Inc,
   9501 Jeronimo Road, Suite 250,
   Irvine, CA 92618, USA

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License version 2 as published by the Free
Software Foundation, incorporated herein by reference.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

*/
/***/

static struct netback_accel_hooks accel_hooks = {
	&netback_accel_probe,
	&netback_accel_remove
};

static const char *frontend_name = "dummynetaccel";

static int __init netback_accel_init(void)
{
	/* Initialise the rest of the module... */

	/* Tell the netback that we're here */
	netback_connect_accelerator(0, frontend_name, &accel_hooks);
	return 0;
}
module_init(netback_accel_init);


static void __exit netback_accel_exit(void)
{
	netback_disconnect_accelerator(0, frontend_name);

	/* ...and take down the rest of the module */
}
module_exit(netback_accel_exit);




int netback_accel_probe(struct xenbus_device *dev)
{
struct backend_info *binfo;

/* Setup per-device internal state */

/* Store internal state for future access */
binfo = (

Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing, take 2

2007-06-21 Thread pradeep singh

On 6/21/07, pradeep singh <[EMAIL PROTECTED]> wrote:

Hi,
This is the second submission of a fix for a possible NULL dereference
in Chelsio's 10G driver.

Thanks to Jens Axboe for pointing out my mistake of ignoring
subsequent dereferences in the init_one() routine.

Thanks

Apologies, it looks like the patch formatting got mangled when sending.
Resending just in case.

Signed-off-by: Pradeep Singh <[EMAIL PROTECTED]>
---
drivers/net/chelsio/cxgb2.c |5 +
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/chelsio/cxgb2.c b/drivers/net/chelsio/cxgb2.c
index 231ce43..006c634 100644
--- a/drivers/net/chelsio/cxgb2.c
+++ b/drivers/net/chelsio/cxgb2.c
@@ -1022,6 +1022,11 @@ static int __devinit init_one(struct pci_dev *pdev,
mmio_start = pci_resource_start(pdev, 0);
mmio_len = pci_resource_len(pdev, 0);
bi = t1_get_board_info(ent->driver_data);
+
+	if (!bi) {
+		CH_ERR("%s: Board info array index out of range\n",pci_name(pdev));
+		goto out_disable_pdev;
+	}

for (i = 0; i < bi->port_number; ++i) {
struct net_device *netdev;
--
1.4.4.2

[snip]
--
Pradeep


[PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing, take 2

2007-06-21 Thread pradeep singh

Hi,
This is the second submission of a fix for a possible NULL dereference
in Chelsio's 10G driver.

Thanks to Jens Axboe for pointing out my mistake of ignoring
subsequent dereferences in the init_one() routine.

Thanks

Signed-off-by: Pradeep Singh <[EMAIL PROTECTED]>
---
drivers/net/chelsio/cxgb2.c |5 +
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/chelsio/cxgb2.c b/drivers/net/chelsio/cxgb2.c
index 231ce43..006c634 100644
--- a/drivers/net/chelsio/cxgb2.c
+++ b/drivers/net/chelsio/cxgb2.c
@@ -1022,6 +1022,11 @@ static int __devinit init_one(struct pci_dev *pdev,
mmio_start = pci_resource_start(pdev, 0);
mmio_len = pci_resource_len(pdev, 0);
bi = t1_get_board_info(ent->driver_data);
+
+	if (!bi) {
+		CH_ERR("%s: Board info array index out of range\n",pci_name(pdev));
+		goto out_disable_pdev;
+	}

for (i = 0; i < bi->port_number; ++i) {
struct net_device *netdev;
--
1.4.4.2

--
Pradeep


Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing

2007-06-21 Thread pradeep singh

On 6/21/07, Jens Axboe <[EMAIL PROTECTED]> wrote:

On Thu, Jun 21 2007, pradeep singh wrote:
> Hi
> On 6/21/07, Jens Axboe <[EMAIL PROTECTED]> wrote:
> >On Thu, Jun 21 2007, pradeep singh wrote:
> >> Hi,
> >>
> >> Chelsio's in-kernel 10G driver does not check the return value from
> >> t1_get_board_info() in cxgb2.c.
> >> t1_get_board_info may return a NULL and we still go on to dereference
> >> it in the for loop without checking for the NULL.
> >>
> >> This patch fixes this.
> >
> >Patch looks odd - bi is dereferenced a number of times after that loop
> >anyway, so I don't see your patch fixing much.
> Thanks for pointing that out, Jens.
> Sorry, I pushed it out in haste :(.
> Will check again and resubmit it.

You're welcome. The first thing to do is analyze whether a NULL return
from t1_get_board_info() makes any sense. From a quick look, driver_data
should be the index into the t1 pci table. So if t1_get_board_info()
returns NULL, it must be some core bug. So I'd say either don't handle
it, or mark it with BUG_ON(), or do the !bi check and CH_ERR() a warning
and goto out_disable_pdev.

Thanks for the advice, Jens :).

Yeah, this is what I am doing right now.
Will resubmit the patch.

Thanks a lot.
--pradeep


--
Jens Axboe





--
Pradeep


Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing

2007-06-21 Thread Jens Axboe
On Thu, Jun 21 2007, pradeep singh wrote:
> Hi
> On 6/21/07, Jens Axboe <[EMAIL PROTECTED]> wrote:
> >On Thu, Jun 21 2007, pradeep singh wrote:
> >> Hi,
> >>
> >> Chelsio's in-kernel 10G driver does not check the return value from
> >> t1_get_board_info() in cxgb2.c.
> >> t1_get_board_info may return a NULL and we still go on to dereference
> >> it in the for loop without checking for the NULL.
> >>
> >> This patch fixes this.
> >
> >Patch looks odd - bi is dereferenced a number of times after that loop
> >anyway, so I don't see your patch fixing much.
> Thanks for pointing that out, Jens.
> Sorry, I pushed it out in haste :(.
> Will check again and resubmit it.

You're welcome. The first thing to do is analyze whether a NULL return
from t1_get_board_info() makes any sense. From a quick look, driver_data
should be the index into the t1 pci table. So if t1_get_board_info()
returns NULL, it must be some core bug. So I'd say either don't handle
it, or mark it with BUG_ON(), or do the !bi check and CH_ERR() a warning
and goto out_disable_pdev.
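Here is a userspace sketch of the third option. It assumes a table lookup that returns NULL for an out-of-range index. The result is checked once, right after the lookup, and the function leaves through the existing cleanup label, so no later use of the pointer can see NULL. The hyp_ names stand in for t1_get_board_info(), init_one() and out_disable_pdev and are hypothetical. In the kernel the message would go through CH_ERR() instead of fprintf().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's board table and lookup. */
struct hyp_board_info {
	const char *desc;
	int port_number;
};

static const struct hyp_board_info hyp_boards[] = {
	{ "hypothetical 1-port board", 1 },
	{ "hypothetical 2-port board", 2 },
};

#define HYP_NBOARDS (sizeof(hyp_boards) / sizeof(hyp_boards[0]))

/* Returns NULL when the index does not name a table entry. */
static const struct hyp_board_info *hyp_get_board_info(unsigned int idx)
{
	return idx < HYP_NBOARDS ? &hyp_boards[idx] : NULL;
}

static int hyp_enable_device(void) { return 0; }
static void hyp_disable_device(void) { }

/* Probe routine: reject a bad index before the first use of bi, rather
 * than guarding only the first loop. Returns the number of ports set up,
 * or a negative value on failure. */
static int hyp_init_one(unsigned int driver_data)
{
	const struct hyp_board_info *bi;
	int err, i, ports = 0;

	err = hyp_enable_device();
	if (err)
		return err;

	bi = hyp_get_board_info(driver_data);
	if (!bi) {
		fprintf(stderr, "board info index %u out of range\n", driver_data);
		err = -1;            /* set an error so the caller sees failure */
		goto out_disable;
	}

	for (i = 0; i < bi->port_number; ++i)
		ports++;             /* per-port setup would go here */

	return ports;

out_disable:
	hyp_disable_device();
	return err;
}
```

The sketch sets an explicit error code before jumping. Otherwise the error path could return whatever value err held from the last successful call, and the probe would appear to succeed.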

-- 
Jens Axboe



Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing

2007-06-21 Thread pradeep singh

Hi
On 6/21/07, Jens Axboe <[EMAIL PROTECTED]> wrote:

On Thu, Jun 21 2007, pradeep singh wrote:
> Hi,
>
> Chelsio's in-kernel 10G driver does not check the return value from
> t1_get_board_info() in cxgb2.c.
> t1_get_board_info may return a NULL and we still go on to dereference
> it in the for loop without checking for the NULL.
>
> This patch fixes this.

Patch looks odd - bi is dereferenced a number of times after that loop
anyway, so I don't see your patch fixing much.

Thanks for pointing that out, Jens.
Sorry, I pushed it out in haste :(.
Will check again and resubmit it.

Thanks
--pradeep
[snip]
--
Pradeep


Re: [PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing

2007-06-21 Thread Jens Axboe
On Thu, Jun 21 2007, pradeep singh wrote:
> Hi,
> 
> Chelsio's in-kernel 10G driver does not check the return value from
> t1_get_board_info() in cxgb2.c.
> t1_get_board_info may return a NULL and we still go on to dereference
> it in the for loop without checking for the NULL.
> 
> This patch fixes this.

Patch looks odd - bi is dereferenced a number of times after that loop
anyway, so I don't see your patch fixing much.

-- 
Jens Axboe



[PATCH v2.6.22-rc5] cxgb2: handle possible NULL pointer dereferencing

2007-06-21 Thread pradeep singh

Hi,

Chelsio's in-kernel 10G driver does not check the return value from
t1_get_board_info() in cxgb2.c.
t1_get_board_info() may return NULL, and we still go on to dereference
it in the for loop without checking for NULL.

This patch fixes this.

Signed-off-by: Pradeep Singh <[EMAIL PROTECTED]>
---
drivers/net/chelsio/cxgb2.c |2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/chelsio/cxgb2.c b/drivers/net/chelsio/cxgb2.c
index 231ce43..edcfeba 100644
--- a/drivers/net/chelsio/cxgb2.c
+++ b/drivers/net/chelsio/cxgb2.c
@@ -1023,7 +1023,7 @@ static int __devinit init_one(struct pci_dev *pdev,
mmio_len = pci_resource_len(pdev, 0);
bi = t1_get_board_info(ent->driver_data);

-   for (i = 0; i < bi->port_number; ++i) {
+   for (i = 0; bi && i < bi->port_number; ++i) {
struct net_device *netdev;

netdev = alloc_etherdev(adapter ? 0 : sizeof(*adapter));
--
1.4.4.2
Thanks

--
Pradeep


Via Rhine II Network Card Failure

2007-06-21 Thread Mark Hannessen

Hi list,

I am having some trouble getting my network card to run.
When I run dmesg I can clearly see it being detected:

eth0: VIA Rhine II at 0xee006000, 00:e0:c5:54:88:a8, IRQ 11.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 45e1.

But when I try "ifconfig eth0 up", it fails with a "No such device" error:

SIOCSIFADDR: No such device eth0:
ERROR while getting interface flags: No such device
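ifconfig prints the kernel's ENODEV error, whose strerror() text is "No such device", when it asks for the flags of an interface name the kernel does not know. Below is a small sketch for checking a name directly with the SIOCGIFFLAGS ioctl. It assumes a Linux/glibc userspace. query_if_flags() is a hypothetical helper, not part of any existing tool.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for struct ifreq under strict -std modes */
#endif
#include <errno.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel for an interface's flags, as ifconfig does before
 * changing them. Returns 0 and fills *flags on success, or -errno on
 * failure (-ENODEV means the kernel has no interface by that name). */
static int query_if_flags(const char *name, short *flags)
{
	struct ifreq ifr;
	int fd, ret = 0;

	if (strlen(name) >= IFNAMSIZ)
		return -EINVAL;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -errno;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)
		ret = -errno;        /* save errno before close() can change it */
	else
		*flags = ifr.ifr_flags;

	close(fd);
	return ret;
}
```

If this returns -ENODEV for "eth0" while dmesg shows the Rhine probe succeeding, the interface is not registered under that name at that moment. Comparing with the names listed in /proc/net/dev would be a reasonable next check.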

I tried adding pci=routeirq too, but that didn't work either.

I know there is nothing wrong with the card itself because it works on
the Linux distro that came with it.

Does anyone have any hints as to what might get it up and running?

Thanks!
Mark

I attached the complete dmesg output just in case.
00:00.0 Host bridge: VIA Technologies, Inc. VT8623 [Apollo CLE266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
00:09.0 CardBus bridge: Texas Instruments PCI1510 PC card Cardbus Controller
00:10.0 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: VIA Technologies, Inc. VT8623 [Apollo CLE266] integrated CastleRock graphics (rev 03)
Linux version 2.6.20.14 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #2 PREEMPT Thu Jun 21 09:11:50 CEST 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009fc00 end: 0009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009fc00 size: 0400 end: 000a type: 2
copy_e820_map() start: 000f size: 0001 end: 0010 type: 2
copy_e820_map() start: 0010 size: 1f6f end: 1f7f type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1f7f size: 3000 end: 1f7f3000 type: 4
copy_e820_map() start: 1f7f3000 size: d000 end: 1f80 type: 3
copy_e820_map() start:  size: 0001 end: 0001 type: 2
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000f - 0010 (reserved)
 BIOS-e820: 0010 - 1f7f (usable)
 BIOS-e820: 1f7f - 1f7f3000 (ACPI NVS)
 BIOS-e820: 1f7f3000 - 1f80 (ACPI data)
 BIOS-e820:  - 0001 (reserved)
503MB LOWMEM available.
Entering add_active_range(0, 0, 129008) 0 entries of 256 used
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   129008
early_node_map[1] active PFN ranges
0:0 ->   129008
On node 0 totalpages: 129008
  DMA zone: 32 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 4064 pages, LIFO batch:0
  Normal zone: 975 pages used for memmap
  Normal zone: 123937 pages, LIFO batch:31
DMI 2.2 present.
ACPI: RSDP (v000 CLE266) @ 0x000f6690
ACPI: RSDT (v001 CLE266 AWRDACPI 0x42302e31 AWRD 0x) @ 0x1f7f3040
ACPI: FADT (v001 CLE266 AWRDACPI 0x42302e31 AWRD 0x) @ 0x1f7f30c0
ACPI: DSDT (v001 CLE266 AWRDACPI 0x1000 MSFT 0x010e) @ 0x
ACPI: PM-Timer IO Port: 0x4008
Allocating PCI resources starting at 2000 (gap: 1f80:e07f)
Detected 1000.450 MHz processor.
Built 1 zonelists.  Total pages: 128001
Kernel command line: BOOT_IMAGE=(hd0,1)/kernels/vmlinuz-2.6.20 root=/dev/sda2 splash=silent,fadein,tty=tty8,theme:riecom console=tty1 vga=791 pci=routeirq
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 500536k/516032k available (2425k kernel code, 14860k reserved, 1369k data, 168k init, 0k highmem)
virtual kernel memory layout:
fixmap  : 0x7000 - 0xf000   (  32 kB)
vmalloc : 0xe000 - 0x5000   ( 511 MB)
lowmem  : 0xc000 - 0xdf7f   ( 503 MB)
  .init : 0xc04b8000 - 0xc04e2000   ( 168 kB)
  .data : 0xc035e41c - 0xc04b4a10   (1369 kB)
  .text : 0xc010 - 0xc035e41c   (2425 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
C

Re: Via Rhine II Network Card Failure

2007-06-21 Thread Satyam Sharma

Hi Mark,

Networking stuff generally goes to netdev. [ Added to Cc: ]

On 6/21/07, Mark Hannessen <[EMAIL PROTECTED]> wrote:

Hi list,

I'm having some trouble getting my network card to work.
When I run dmesg, I can clearly see it being detected:

eth0: VIA Rhine II at 0xee006000, 00:e0:c5:54:88:a8, IRQ 11.
eth0: MII PHY found at address 1, status 0x786d advertising 05e1 Link 45e1.

But when I try "ifconfig eth0 up", it fails with a "No such device" error:

SIOCSIFADDR: No such device
eth0: ERROR while getting interface flags: No such device

I tried adding pci=routeirq too, but that didn't work either.

I know there is nothing wrong with the card itself, because it works on
the Linux distribution that came with it.

Does anyone have any hints as to what might get it up and running?

Thanks!
Mark

I have attached the complete dmesg output, just in case.


00:00.0 Host bridge: VIA Technologies, Inc. VT8623 [Apollo CLE266]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP]
00:09.0 CardBus bridge: Texas Instruments PCI1510 PC card Cardbus Controller
00:10.0 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 80)
00:10.1 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 80)
00:10.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 80)
00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
01:00.0 VGA compatible controller: VIA Technologies, Inc. VT8623 [Apollo CLE266] integrated CastleRock graphics (rev 03)
Linux version 2.6.20.14 ([EMAIL PROTECTED]) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #2 PREEMPT Thu Jun 21 09:11:50 CEST 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009fc00 end: 0009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009fc00 size: 0400 end: 000a type: 2
copy_e820_map() start: 000f size: 0001 end: 0010 type: 2
copy_e820_map() start: 0010 size: 1f6f end: 1f7f type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1f7f size: 3000 end: 1f7f3000 type: 4
copy_e820_map() start: 1f7f3000 size: d000 end: 1f80 type: 3
copy_e820_map() start:  size: 0001 end: 0001 type: 2
BIOS-e820:  - 0009fc00 (usable)
BIOS-e820: 0009fc00 - 000a (reserved)
BIOS-e820: 000f - 0010 (reserved)
BIOS-e820: 0010 - 1f7f (usable)
BIOS-e820: 1f7f - 1f7f3000 (ACPI NVS)
BIOS-e820: 1f7f3000 - 1f80 (ACPI data)
BIOS-e820:  - 0001 (reserved)
503MB LOWMEM available.
Entering add_active_range(0, 0, 129008) 0 entries of 256 used
Zone PFN ranges:
 DMA 0 -> 4096
 Normal   4096 ->   129008
early_node_map[1] active PFN ranges
   0:0 ->   129008
On node 0 totalpages: 129008
 DMA zone: 32 pages used for memmap
 DMA zone: 0 pages reserved
 DMA zone: 4064 pages, LIFO batch:0
 Normal zone: 975 pages used for memmap
 Normal zone: 123937 pages, LIFO batch:31
DMI 2.2 present.
ACPI: RSDP (v000 CLE266) @ 0x000f6690
ACPI: RSDT (v001 CLE266 AWRDACPI 0x42302e31 AWRD 0x) @ 0x1f7f3040
ACPI: FADT (v001 CLE266 AWRDACPI 0x42302e31 AWRD 0x) @ 0x1f7f30c0
ACPI: DSDT (v001 CLE266 AWRDACPI 0x1000 MSFT 0x010e) @ 0x
ACPI: PM-Timer IO Port: 0x4008
Allocating PCI resources starting at 2000 (gap: 1f80:e07f)
Detected 1000.450 MHz processor.
Built 1 zonelists.  Total pages: 128001
Kernel command line: BOOT_IMAGE=(hd0,1)/kernels/vmlinuz-2.6.20 root=/dev/sda2 splash=silent,fadein,tty=tty8,theme:riecom console=tty1 vga=791 pci=routeirq
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 500536k/516032k available (2425k kernel code, 14860k reserved, 1369k data, 168k init, 0k highmem)
virtual kernel memory layout:
   fixmap  : 0x7000 - 0xf000   (  32 kB)
   vmalloc : 0xe000 - 0x5000   ( 511 MB)
   lowmem  : 0xc000 - 0xdf7f   ( 503 MB)
 .init : 0xc04b8000 - 0xc04e2000   ( 168 kB)
 .data : 0xc035e41c - 0xc04b4a10   (1369 kB)
 .text : 0xc010 - 0xc035e41c

Re: [RFC][PATCH -mm take5 6/7] add ioctls for adding/removing target

2007-06-21 Thread Satyam Sharma

Hi Keiichi,


> Please do consider configfs. Note that we'll have to lose the sysfs
> symlink from your target's kobject to the kobject of the ethernet
> device if we switch to configfs, but was that symlink needed for
> some essential functionality or was it simply for informational
> purpose? IMHO, this patchset only needs to bring in functionality
> to be able to create, destroy, and modify netconsole targets at
> run-time, and all these reconfiguration tasks would be handled
> quite well by configfs, AFAICT.

It was for informational purposes. But if we used a symlink to the net_device
kobject in sysfs, we could easily keep up with network device name changes by
updating the symbolic link.


In that case (because it is non-essential) we can drop that link entirely.
(This means the associated code for the temporary modify_list, the
extra mutex and the kasprintf() stuff can be dropped too!) I guess
the netconsole_event() notifier could now become simply:

if (event == NETDEV_CHANGENAME) {
	spin_lock_irqsave(&target_list_lock, ...);
	list_for_each_entry(nt, ..., &target_list, ...)
		if (nt->np.dev == dev)
			strcpy(nt->np.dev_name, dev->name);
	spin_unlock_irqrestore(&target_list_lock, ...);
}


In the case of configfs, should we use a config_item attribute for the network
interface, since configfs doesn't have a symlink that refers to the net_device
kobject (e.g. a "network_interface" attribute in configfs whose value is "eth0")?


Yes, that's a good idea; this way we continue to give the information we
were giving previously. We could do this by defining another configfs
attribute of the target's config_item that acts as a proxy for
the char dev_name[IFNAMSIZ] member of the target's underlying netpoll
structure. (This attribute would naturally be read-only from userspace,
similar to what you're currently doing for the local_mac attribute.)

Please feel free to discuss anything else regarding this with me off-list,
if you want.

Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: 2.6.22: ERROR: "__ucmpdi2" [drivers/net/s2io.ko] undefined!

2007-06-21 Thread Sivakumar Subramani
Hi,

We will include this fix in our next set of patch submissions. Thanks for the
fix.

Thanks,
~Siva 
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Olaf Hering
Sent: Wednesday, June 20, 2007 2:11 AM
To: Stephen Hemminger
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org
Subject: Re: 2.6.22: ERROR: "__ucmpdi2" [drivers/net/s2io.ko] undefined!

On Tue, Jun 19, Stephen Hemminger wrote:

> On Tue, 19 Jun 2007 21:02:53 +0200
> Olaf Hering <[EMAIL PROTECTED]> wrote:
> 
> > 
> > What happened to __ucmpdi2 from David Woodhouse?
> > Google has a few hits about stuff like this on 32-bit powerpc with gcc 4.1.2:
> > 
> > ERROR: "__ucmpdi2" [drivers/net/s2io.ko] undefined!
> > 
> > using the drivers/net/s2io* files from 2.6.21 with 2.6.22-rc5 fixes 
> > the compile.
> > 
> > 25805dcf9d83098cf5492117ad2669cd14cc9b24 adds two u64 >>= 48
> > followed by a switch statement (lines 2889 and 6816).
> 
> Probably the "switch(err) {" needs a cast to a smaller type (like u8).

This change removes the calls to __ucmpdi2.

---
 drivers/net/s2io.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -2868,6 +2868,7 @@ static void tx_intr_handler(struct fifo_
struct tx_curr_get_info get_info, put_info;
struct sk_buff *skb;
struct TxD *txdlp;
+   u8 err_mask;
 
get_info = fifo_data->tx_curr_get_info;
memcpy(&put_info, &fifo_data->tx_curr_put_info,
sizeof(put_info)); @@ -2886,8 +2887,8 @@ static void
tx_intr_handler(struct fifo_
}
 
/* update t_code statistics */
-   err >>= 48;
-   switch(err) {
+   err_mask = err >> 48;
+   switch(err_mask) {
case 2:
 
nic->mac_control.stats_info->sw_stat.
 
tx_buf_abort_cnt++;
@@ -6805,6 +6806,7 @@ static int rx_osm_handler(struct ring_in
u16 l3_csum, l4_csum;
unsigned long long err = rxdp->Control_1 & RXD_T_CODE;
struct lro *lro;
+   u8 err_mask;
 
skb->dev = dev;
 
@@ -6813,8 +6815,8 @@ static int rx_osm_handler(struct ring_in
if (err & 0x1) {
 
sp->mac_control.stats_info->sw_stat.parity_err_cnt++;
}
-   err >>= 48;
-   switch(err) {
+   err_mask = err >> 48;
+   switch(err_mask) {
case 1:
sp->mac_control.stats_info->sw_stat.
rx_parity_err_cnt++;
@@ -6867,9 +6869,9 @@ static int rx_osm_handler(struct ring_in
* Note that in this case, since checksum will be incorrect,
* stack will validate the same.
*/
-   if (err != 0x5) {
-   DBG_PRINT(ERR_DBG, "%s: Rx error Value: 0x%llx\n",
-   dev->name, err);
+   if (err_mask != 0x5) {
+   DBG_PRINT(ERR_DBG, "%s: Rx error Value: 0x%x\n",
+   dev->name, err_mask);
sp->stats.rx_crc_errors++;
sp->mac_control.stats_info->sw_stat.mem_freed 
+= skb->truesize;


Re: [RFC][PATCH -mm take5 6/7] add ioctls for adding/removing target

2007-06-21 Thread Keiichi KII

Hello Satyam,


Hmm, I might've missed this thread, but my opinion on the
alternatives, fwiw:

1. I think adding new ioctls to the kernel is generally disliked for
obvious reasons. Perhaps Stephen meant to add some generic
ioctls above (and not separate ones specially implemented for
the dynamically reconfigurable netconsole driver)?


You're right.
At first, I implemented the ioctls on a misc device because I was using the misc device's sysfs.
But Andrew Morton said "Using an ioctl() against a miscdev is rather
untypical for networking.", so I implemented the ioctls on the tty_driver instead.



Please do consider configfs. Note that we'll have to lose the sysfs
symlink from your target's kobject to the kobject of the ethernet
device if we switch to configfs, but was that symlink needed for
some essential functionality or was it simply for informational
purpose? IMHO, this patchset only needs to bring in functionality
to be able to create, destroy, and modify netconsole targets at
run-time, and all these reconfiguration tasks would be handled
quite well by configfs, AFAICT.


It was for informational purposes. But if we used a symlink to the net_device
kobject in sysfs, we could easily keep up with network device name changes by
updating the symbolic link.
In the case of configfs, should we use a config_item attribute for the network
interface, since configfs doesn't have a symlink that refers to the net_device
kobject (e.g. a "network_interface" attribute in configfs whose value is "eth0")?

I'm going to look into configfs and change the interface to use configfs.

Thanks
--
Keiichi KII
NEC Corporation OSS Platform Development Division
E-mail: [EMAIL PROTECTED]



Re: [RFC][PATCH -mm take5 4/7] using symlink for the net_device

2007-06-21 Thread Keiichi KII

Hello Satyam,


In any case, however, the point to extend the critical section here
to encapsulate all the three parts still stands. We wouldn't want
ioctl(NETCON_REMOVE_TARGET) on the specified target to
return without removing the target that the user specified just
because that target's ethernet interface happens to be currently
undergoing a name change. The correct behaviour would be to
sleep on a mutex till the renaming has completed (which will
then relinquish the mutex) and then (after acquiring the mutex)
proceed to remove it, IMHO.


You're right; I misunderstood. All three parts need to be encapsulated in the critical section.


>> +static char *make_netdev_class_name(char *netdev_name)
>> +{
>> +   char *name;
>> +
>> +   name = kasprintf(GFP_KERNEL, "net:%s", netdev_name);
>
> Why the "net:" prefix in the filename?

Because I modeled it on dev_change_name() in net/core/dev.c.
The device_rename() call in that function uses the same "net:" prefix
for network devices.


I think you're referring to make_class_name() here? That function is
somewhat bulkier than a simple wrapper over kasprintf() such as
make_netdev_class_name() here. I'd definitely recommend
not obfuscating this simple functionality.


I understand. A wrapper function such as make_netdev_class_name() isn't
appropriate in this case.

Thanks
--
Keiichi KII
NEC Corporation OSS Platform Development Division
E-mail: [EMAIL PROTECTED]



Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-06-21 Thread Julian Anastasov

Hello,

On Wed, 20 Jun 2007, Balazs Scheidler wrote:

> Is there a chance that this, or a patch with similar spirit (e.g. a way
> to send packets from non-local IP addresses) could be merged?

The last patch from Krisztian is exactly what I was hoping
to see; I hope that someone has really tested it :). But I'm not the one
who decides, and so far we haven't seen comments from other interested
parties. Maybe there are other opinions?

Regards

--
Julian Anastasov <[EMAIL PROTECTED]>