Re: Multiqueue and virtualization WAS(Re: [PATCH 3/3] NET: [SCHED] Qdisc changes and sch_rr added for multiqueue

2007-07-08 Thread David Miller
From: jamal [EMAIL PROTECTED]
Date: Fri, 06 Jul 2007 10:39:15 -0400

 If the issue is usability of listing 1024 netdevices, i can think of
 many ways to resolve it.

I would agree with this if there were a reason for it, it's totally
unnecessary complication as far as I can see.

These virtual devices are an ethernet with the subnet details exposed
to the driver, nothing more.

I see zero benefit to having a netdev for each guest or node we can
speak to whatsoever.  It's a very heavy abstraction to use for
something that is so bloody simple.

My demux on -hard_start_xmit() is _5 DAMN LINES OF CODE_, you want to
replace that with a full netdev because of some minor difficulties in
figuring out to record the queuing state.  It's beyond unreasonable.

Netdevs are like salt, if you put too much in your food it tastes
awful.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] net: netdevice mtu assumptions documentation

2007-07-08 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 6 Jul 2007 11:31:17 -0700

 Document the expectations about device MTU handling.
 The documentation about oversize packet handling is probably too
 loose.
 
 IMHO devices should drop oversize packets for robustness,
 but many devices allow it now. For example, if you set mtu to 1200
 bytes, most ether devices will allow a 1500 byte frame in.
 
 Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied, thanks Stephen.

I agree that drivers should drop oversized frames, otherwise
setting up PMTU test scenerios is more difficult than it
really needs to be.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6.23 take 2] Per-datagram TTL and TOS via sendmsg()

2007-07-08 Thread Rémi Denis-Courmont
[This time with automatic carriage return off :-$]

This patch adds support for specifying IPv4 Time-To-Live (IP_TTL) and/or 
Type-Of-Service (IP_TOS) values on a per datagram basis through 
sendmsg() ancilliary data. Until then, it only worked for IPv6 sockets 
(using IPV6_HOPLIMIT and IPV6_TCLASS).

Signed-off-by: Rémi Denis-Courmont [EMAIL PROTECTED]

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 62daf21..7a6dc33 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -140,6 +140,8 @@ struct inet_sock {
int length; /* Total length of all frames */
__be32  addr;
struct flowifl;
+   __s16   ttl;
+   __s16   tos;
} cork;
 };
 
diff --git a/include/net/ip.h b/include/net/ip.h
index abf2820..dcfdb41 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -54,6 +54,8 @@ struct ipcm_cookie
__be32  addr;
int oif;
struct ip_options   *opt;
+   __s16   ttl;
+   __s16   tos;
 };
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)-cb))
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 02a899b..e10852d 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -392,8 +392,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, 
struct sk_buff *skb)
icmp_param-data.icmph.checksum = 0;
icmp_out_count(icmp_param-data.icmph.type);
 
-   inet-tos = ip_hdr(skb)-tos;
daddr = ipc.addr = rt-rt_src;
+   ipc.tos = ip_hdr(skb)-tos;
+   ipc.ttl = MULTICAST(daddr) ? inet-mc_ttl : inet-uc_ttl;
ipc.opt = NULL;
if (icmp_param-replyopts.optlen) {
ipc.opt = icmp_param-replyopts;
@@ -438,7 +439,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int 
code, __be32 info)
struct rtable *rt = (struct rtable *)skb_in-dst;
struct ipcm_cookie ipc;
__be32 saddr;
-   u8  tos;
 
if (!rt)
goto out;
@@ -526,9 +526,9 @@ void icmp_send(struct sk_buff *skb_in, int type, int 
code, __be32 info)
saddr = 0;
}
 
-   tos = icmp_pointers[type].error ? ((iph-tos  IPTOS_TOS_MASK) |
-  IPTOS_PREC_INTERNETCONTROL) :
- iph-tos;
+   ipc.tos = icmp_pointers[type].error ? ((iph-tos  IPTOS_TOS_MASK) |
+  IPTOS_PREC_INTERNETCONTROL) :
+ iph-tos;
 
if (ip_options_echo(icmp_param.replyopts, skb_in))
goto out_unlock;
@@ -545,7 +545,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int 
code, __be32 info)
icmp_param.skb= skb_in;
icmp_param.offset = skb_network_offset(skb_in);
icmp_out_count(icmp_param.data.icmph.type);
-   inet_sk(icmp_socket-sk)-tos = tos;
+   ipc.ttl = -1;
ipc.addr = iph-saddr;
ipc.opt = icmp_param.replyopts;
 
@@ -557,7 +557,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int 
code, __be32 info)
icmp_param.replyopts.faddr :
iph-saddr,
.saddr = saddr,
-   .tos = RT_TOS(tos)
+   .tos = RT_TOS(ipc.tos)
}
},
.proto = IPPROTO_ICMP,
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 34ea454..67ce657 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -806,6 +806,8 @@ int ip_append_data(struct sock *sk,
dst_mtu(rt-u.dst.path);
inet-cork.rt = rt;
inet-cork.length = 0;
+   inet-cork.ttl = ipc-ttl;
+   inet-cork.tos = ipc-tos;
sk-sk_sndmsg_page = NULL;
sk-sk_sndmsg_off = 0;
if ((exthdrlen = rt-u.dst.header_len) != 0) {
@@ -1233,7 +1235,9 @@ int ip_push_pending_frames(struct sock *sk)
if (inet-cork.flags  IPCORK_OPT)
opt = inet-cork.opt;
 
-   if (rt-rt_type == RTN_MULTICAST)
+   if (inet-cork.ttl != -1)
+   ttl = inet-cork.ttl;
+   else if (rt-rt_type == RTN_MULTICAST)
ttl = inet-mc_ttl;
else
ttl = ip_select_ttl(inet, rt-u.dst);
@@ -1245,7 +1249,7 @@ int ip_push_pending_frames(struct sock *sk)
iph-ihl += opt-optlen2;
ip_options_build(skb, opt, inet-cork.addr, rt, 0);
}
-   iph-tos = inet-tos;
+   iph-tos = (inet-cork.tos != -1) ? inet-cork.tos : inet-tos;
iph-tot_len = htons(skb-len);
iph-frag_off = df;
ip_select_ident(iph, rt-u.dst, sk);
@@ -1343,6 

[PATCH net-2.6.23 take 3] Per-datagram TTL and TOS via sendmsg()

2007-07-08 Thread Rémi Denis-Courmont
[Hmm, stupid me. Right this time. Sorry for the line noise.]

This patch adds support for specifying IPv4 Time-To-Live (IP_TTL) and/or 
Type-Of-Service (IP_TOS) values on a per datagram basis through 
sendmsg() ancilliary data. Until then, it only worked for IPv6 sockets 
(using IPV6_HOPLIMIT and IPV6_TCLASS).

Signed-off-by: Rémi Denis-Courmont [EMAIL PROTECTED]

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 62daf21..7a6dc33 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -140,6 +140,8 @@ struct inet_sock {
int length; /* Total length of all frames */
__be32  addr;
struct flowifl;
+   __s16   ttl;
+   __s16   tos;
} cork;
 };
 
diff --git a/include/net/ip.h b/include/net/ip.h
index abf2820..dcfdb41 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -54,6 +54,8 @@ struct ipcm_cookie
__be32  addr;
int oif;
struct ip_options   *opt;
+   __s16   ttl;
+   __s16   tos;
 };
 
 #define IPCB(skb) ((struct inet_skb_parm*)((skb)-cb))
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 02a899b..e10852d 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -392,8 +392,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct 
sk_buff *skb)
icmp_param-data.icmph.checksum = 0;
icmp_out_count(icmp_param-data.icmph.type);
 
-   inet-tos = ip_hdr(skb)-tos;
daddr = ipc.addr = rt-rt_src;
+   ipc.tos = ip_hdr(skb)-tos;
+   ipc.ttl = MULTICAST(daddr) ? inet-mc_ttl : inet-uc_ttl;
ipc.opt = NULL;
if (icmp_param-replyopts.optlen) {
ipc.opt = icmp_param-replyopts;
@@ -438,7 +439,6 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, 
__be32 info)
struct rtable *rt = (struct rtable *)skb_in-dst;
struct ipcm_cookie ipc;
__be32 saddr;
-   u8  tos;
 
if (!rt)
goto out;
@@ -526,9 +526,9 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, 
__be32 info)
saddr = 0;
}
 
-   tos = icmp_pointers[type].error ? ((iph-tos  IPTOS_TOS_MASK) |
-  IPTOS_PREC_INTERNETCONTROL) :
- iph-tos;
+   ipc.tos = icmp_pointers[type].error ? ((iph-tos  IPTOS_TOS_MASK) |
+  IPTOS_PREC_INTERNETCONTROL) :
+ iph-tos;
 
if (ip_options_echo(icmp_param.replyopts, skb_in))
goto out_unlock;
@@ -545,7 +545,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, 
__be32 info)
icmp_param.skb= skb_in;
icmp_param.offset = skb_network_offset(skb_in);
icmp_out_count(icmp_param.data.icmph.type);
-   inet_sk(icmp_socket-sk)-tos = tos;
+   ipc.ttl = -1;
ipc.addr = iph-saddr;
ipc.opt = icmp_param.replyopts;
 
@@ -557,7 +557,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, 
__be32 info)
icmp_param.replyopts.faddr :
iph-saddr,
.saddr = saddr,
-   .tos = RT_TOS(tos)
+   .tos = RT_TOS(ipc.tos)
}
},
.proto = IPPROTO_ICMP,
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 34ea454..67ce657 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -806,6 +806,8 @@ int ip_append_data(struct sock *sk,
dst_mtu(rt-u.dst.path);
inet-cork.rt = rt;
inet-cork.length = 0;
+   inet-cork.ttl = ipc-ttl;
+   inet-cork.tos = ipc-tos;
sk-sk_sndmsg_page = NULL;
sk-sk_sndmsg_off = 0;
if ((exthdrlen = rt-u.dst.header_len) != 0) {
@@ -1233,7 +1235,9 @@ int ip_push_pending_frames(struct sock *sk)
if (inet-cork.flags  IPCORK_OPT)
opt = inet-cork.opt;
 
-   if (rt-rt_type == RTN_MULTICAST)
+   if (inet-cork.ttl != -1)
+   ttl = inet-cork.ttl;
+   else if (rt-rt_type == RTN_MULTICAST)
ttl = inet-mc_ttl;
else
ttl = ip_select_ttl(inet, rt-u.dst);
@@ -1245,7 +1249,7 @@ int ip_push_pending_frames(struct sock *sk)
iph-ihl += opt-optlen2;
ip_options_build(skb, opt, inet-cork.addr, rt, 0);
}
-   iph-tos = inet-tos;
+   iph-tos = (inet-cork.tos != -1) ? inet-cork.tos : inet-tos;
iph-tot_len = htons(skb-len);
iph-frag_off = df;
ip_select_ident(iph, rt-u.dst, sk);

[PATCH] CXGB3: Replace kcalloc(1,...) with kmalloc() in cxgb3_offload.c.

2007-07-08 Thread Robert P. J. Day

Signed-off-by: Robert P. J. Day [EMAIL PROTECTED]

---

diff --git a/drivers/net/cxgb3/cxgb3_offload.c 
b/drivers/net/cxgb3/cxgb3_offload.c
index ebcf35e..999bc41 100644
--- a/drivers/net/cxgb3/cxgb3_offload.c
+++ b/drivers/net/cxgb3/cxgb3_offload.c
@@ -1105,7 +1105,7 @@ int cxgb3_offload_activate(struct adapter *adapter)
struct mtutab mtutab;
unsigned int l2t_capacity;

-   t = kcalloc(1, sizeof(*t), GFP_KERNEL);
+   t = kcalloc(sizeof(*t), GFP_KERNEL);
if (!t)
return -ENOMEM;

-- 

Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread James Chapman

Stephen Hemminger wrote:

I think rather than having a meta-discussion, which always seems to degenerate
into finger pointing. Let's look at the code. The problem with popular drivers
(as I too well know) is that user's seem to find every wart.  There are lots of
old drivers that never seem to get the same scrutiny, and if they did would
be tarred and feathered.


Auke has already posted an early snapshot, archived here:

http://marc.info/?l=linux-netdevm=118315951209350w=3

Jeff and I have already commented on the code.

--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] CXGB3: Replace kcalloc(1,...) with kmalloc() in cxgb3_offload.c.

2007-07-08 Thread Johannes Berg
On Sun, 2007-07-08 at 05:39 -0400, Robert P. J. Day wrote:

 - t = kcalloc(1, sizeof(*t), GFP_KERNEL);
 + t = kcalloc(sizeof(*t), GFP_KERNEL);
   ^
I don't think that's going to work, and shouldn't it probably use
kzalloc instead of kmalloc (as you wrote in subject)

johannes


signature.asc
Description: This is a digitally signed message part


[PATCH] ieee1394: first minimal NUMA awareness

2007-07-08 Thread Stefan Richter
Association of a host device with a node on NUMA machines optimizes
allocations of skbs given from the networking stack to eth1394.

Signed-off-by: Stefan Richter [EMAIL PROTECTED]
---

Note, in Linux 2.6.20 and earlier and 2.6.23 and later, eth1394 calls
or will call SET_NETDEV_DEV(eth1394_netdev, h-device).

 drivers/ieee1394/hosts.c |1 +
 1 file changed, 1 insertion(+)

Index: linux-2.6.22-rc7/drivers/ieee1394/hosts.c
===
--- linux-2.6.22-rc7.orig/drivers/ieee1394/hosts.c
+++ linux-2.6.22-rc7/drivers/ieee1394/hosts.c
@@ -154,6 +154,7 @@ struct hpsb_host *hpsb_alloc_host(struct
 
memcpy(h-device, nodemgr_dev_template_host, sizeof(h-device));
h-device.parent = dev;
+   set_dev_node(h-device, dev_to_node(dev));
snprintf(h-device.bus_id, BUS_ID_SIZE, fw-host%d, h-id);
 
h-host_dev.parent = h-device;

-- 
Stefan Richter
-=-=-=== -=== -=---
http://arcgraph.de/sr/

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [NET]: Another unnecessary net/tcp.h inclusion in net/dn.h

2007-07-08 Thread Ilpo Järvinen

No longer needed.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/dn.h |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/include/net/dn.h b/include/net/dn.h
index ac4ce90..6277783 100644
--- a/include/net/dn.h
+++ b/include/net/dn.h
@@ -3,7 +3,6 @@
 
 #include linux/dn.h
 #include net/sock.h
-#include net/tcp.h
 #include asm/byteorder.h
 
 #define dn_ntohs(x) le16_to_cpu(x)
-- 
1.5.0.6

Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

2007-07-08 Thread Muli Ben-Yehuda
On Fri, Jul 06, 2007 at 10:14:56AM -0700, Williams, Mitch A wrote:
 David Miller wrote:
  Okay, but then using SG lists makes skbuff's much bigger.
  
 fraglistscatterlistper skbuff
  32 bit 8   20  +12 * 18 = +216!
  64 bit 16  32  +16 * 18 = +288
  
  So never mind...
 
 I know, this is why nobody ever really tries to tackle this.
 
  I'll do a fraglist to scatter list set of routines, but not sure
  if it's worth it.
 
 It's better to add dma_map_skb() et al. interfaces to be honest.
 
 Also even with the scatterlist idea, we'd still need to do two
 map calls, one for skb-data and one for the page vector.
 
 FWIW, I tried this about a year ago to try to improve e1000
 performance on pSeries.  I was hoping to simplify the driver
 transmit code and make IOMMU mapping easier.  This was on 2.6.16 or
 thereabouts.
 
 Net result:  zilch.  No performance increase, no noticeable CPU
 utilization
 benefits.  Nothing.  So I dropped it.

Do you have pointers to the patches perchance?

 Slightly off topic:
 The real problem that I saw on pSeries is lock contention for the IOMMU.
 It's architected with a single table per slot, which is great in that
 two boards in separate slots won't have lock contention.  However, this
 all goes out the window when you drop a quad-port gigabit adapter in
 there.
 The time spent waiting for the IOMMU table lock goes up exponentially
 as you activate each additional port.
 
 In my opinion, IOMMU table locking is the major issue with this type
 of architecture.  Since both Intel and AMD are touting IOMMUs for
 virtual- ization support, this is an issue that's going to need a
 lot of scrutiny.

Agreed.

Cheers,
Muli
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/2] shrink size of scatterlist on common i386/x86-64

2007-07-08 Thread Muli Ben-Yehuda
On Fri, Jul 06, 2007 at 12:20:19PM -0700, David Miller wrote:
 From: Williams, Mitch A [EMAIL PROTECTED]
 Date: Fri, 6 Jul 2007 10:14:56 -0700
 
  In my opinion, IOMMU table locking is the major issue with this
  type of architecture.  Since both Intel and AMD are touting IOMMUs
  for virtual- ization support, this is an issue that's going to
  need a lot of scrutiny.
 
 For the allocation of IOMMU entries themselves you can play tricks
 using atomic operations on 64-bit words of the allocator bitmap to
 avoid locking that.

Hmm, any pointers?

 You can use per-cpu salts to determine where to start the search and
 avoid hitting the same cachelines as other cpus working on the same
 table.
 
 But you'll need to lock in order to flush the IOMMU tlb I'm afraid.
 The way to mitigate that is to only flush the IOMMU tlb once per
 allocator generation.

That works, but isn't optimal when you have an isolation-capable IOMMU
and you want the full isolation properties of the IOMMU. If you only
flush the IOTLB when the allocator wraps around, a stale entry in the
IOTLB can allow a DMA to go through for an IO entry that has already
been unmapped. One way to mitigate that and still retain full
isolation is to make sure no one else gets to use the frames that are
the targets of the DMA until the translation has been flushed out of
the IOTLB, but that requires pretty deep surgery.

Cheers,
Muli


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread Arjan van de Ven

Stephen Hemminger wrote:
I would really like to continue with my original plan that I posted that follows 
Christoph's idea. I hope you can all agree with that so we can get on with this.


I think rather than having a meta-discussion, which always seems to degenerate
into finger pointing. Let's look at the code. The problem with popular drivers
(as I too well know) is that user's seem to find every wart.  There are lots of
old drivers that never seem to get the same scrutiny, and if they did would
be tarred and feathered.



I'd second this; also lets be honest and fair about things and use a 
similar standard for all drivers; are we going to ask all driver 
submitters to remove NAPI, TSO and other stuff? I would hope not. Are 
all new drivers that have even a single bitfield going to be rejected? 
That's be a new rule but ok, if it applies to all drivers it's fair.


So the question in my opinion should be is this driver good enough 
for merging, and if not, what specifically is wrong enough to hold of 
merging not what would the perfect ideal 
we-have-all-the-time-in-the-world driver be and certainly not how is 
this going to work in an enterprise distro since the later is the 
problem for the enterprise disro that they get paid to solve, not for 
kernel.org kernels.


What is there now is a driver that works, is relatively clean (and yes 
if you look deep enough you can ALWAYS nitpick on any code, even 
Linus' or Jeff's best code) and provides good performance with all the 
features a modern ethernet driver is supposed to have.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread Jeff Garzik

Kok, Auke wrote:

This is not acceptable and hardly fair to expect from us.

It also exposes users to endless delays and uncertainties as to a final 
resolution. Not to mention that writing a driver from scratch for (just) 
ich9 will take significant time, is silly since it's almost identical to 
ich8 etc..


I don't think that anyone besides you and maybe one or two others are 
interested in doing this rewrite from scratch. The rest of us would 
rather see something much more similar to my original suggestion which 
relieves the ich9-wishes for everyone and provides a good starting point 
to go forward. Not to mention that a lot of that code is already there, 
and at least cleaned up quite a bit already.


Let's review the history here:

* For -years-, Intel has been told to stream structural cleanups into 
e1000 (and ixgb), along with feature enablement.  None of these cleanups 
happened, and the result was today's bloated driver.


* Intel posts a new e1000 driver, which is just as complex, if not more so.

* Intel proposes a Linux user transition plan that amounts to an abrupt 
NIC driver switch for users on a widely used NIC platform.


* Intel immediately dismisses all alternate paths to e1000 change, and 
all major e1000new structural comments.


Thus in effect you are presenting the Linux community with one option 
(and only one) -- your e1000new as it exists today -- and demanding that 
that option be accepted.


More importantly, Intel is asking the Linux community to TRUST that 
Intel will be a good driver MAINTAINER, despite lack of encouraging 
evidence in that regard.


Will we continue to see INATTENTION to basic driver maintenance, with 
100% of resources being devoted new features, and 0% devoted to thinking 
about how best to create a maintainable driver for those features long term?


With e1000new, I see a driver that's internally modular, but no more 
maintainable.  I also see a driver that runs directly counter to what we 
normally do in this situation -- split it up into multiple drivers. 
When you find yourself reinventing a net driver API, that's a big hint 
your engineering is ass-backwards.


It's not acceptable and hardly fair to expect the Linux community to 
blindly swallow the single option you've presented us.


We're talking about one of the most widely used NIC drivers here. 
Proposing to flip a switch and dump all users on the new driver in a 
couple months time does not serve Linux users at all.  Introducing the 
new driver more slowly -- starting with ICH9 and any other PCI IDs not 
covered by e1000 -- gives the driver a chance to prove itself in the 
field and be introduced more slowly, while at the same time not 
dramatically disturbing the existing, working e1000 installed base.


Starting with a new driver for the new hardware is the right way to go. 
   Get that driver into the field, get it right, and THEN use that 
field experience to see what PCI IDs you want to transition from the 
older driver.  That path is minimal disruption for Linux users, and 
maximizes the changes of Linux users running a working, proven driver. 
And coincidentally, that path also gives you the freedom to do heavy 
development on the new driver, with minimal disruption of the majority 
of your userbase.


So in sum, (a) Intel does not appear to have thought through the driver 
transition, (b) Intel has not proven itself to be a maintainer, and (c) 
e1000new needs to be restructured.


Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread Jeff Garzik

Arjan van de Ven wrote:
I'd second this; also lets be honest and fair about things and use a 
similar standard for all drivers; are we going to ask all driver 
submitters to remove NAPI, TSO and other stuff? I would hope not. Are 


That was merely a suggestion.  My general meaning was small driver is 
easier to get into the kernel and into the field.



all new drivers that have even a single bitfield going to be rejected? 
That's be a new rule but ok, if it applies to all drivers it's fair.


That's one of a gazillion checklist items that a new driver has to pass 
through.  You know how it works for new drivers.


it looks better than our last try and it looks better than that ugly 
driver over there does not grant you a free pass on review.


I'm not asking for perfection, just something other than

* e1000 gets feedback
* Intel disappears for months
* Intel reappears with e1000 rewrite
* Intel fights tooth and nail when the driver is not accepted verboten

And specifically I worry that three years down the road we will reach 
the same point again, when e1000new is even more complex.


Will Intel shrug its shoulders and decide its time for another rewrite?


So the question in my opinion should be is this driver good enough for 
merging, and if not, what specifically is wrong enough to hold of 
merging not what would the perfect ideal 
we-have-all-the-time-in-the-world driver be and certainly not how is 
this going to work in an enterprise distro since the later is the 
problem for the enterprise disro that they get paid to solve, not for 
kernel.org kernels.


What is there now is a driver that works, is relatively clean (and yes 
if you look deep enough you can ALWAYS nitpick on any code, even Linus' 
or Jeff's best code) and provides good performance with all the features 
a modern ethernet driver is supposed to have.


Is this is the attitude, what's the point of even posting the driver for 
review?  Intel posted e1000new on June 29.  Feedback was then posted. 
Ten days later, without a single revision, Intel declares its own driver 
clean and working and good enough for merging as-is.


Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread Jeff Garzik

Stephen Hemminger wrote:

I think rather than having a meta-discussion, which always seems to degenerate
into finger pointing. Let's look at the code. The problem with popular drivers
(as I too well know) is that user's seem to find every wart.  There are lots of
old drivers that never seem to get the same scrutiny, and if they did would
be tarred and feathered.



it's not as scruffy as that driver in the corner is no way to maintain 
a kernel.


Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?

2007-07-08 Thread Jeff Garzik

Arjan van de Ven wrote:

Kok, Auke wrote:

Jeff Garzik wrote:

Andrew Morton wrote:

On Fri, 29 Jun 2007 14:39:20 -0700
Kok, Auke [EMAIL PROTECTED] wrote:

That's why we want to introduce a second e1000 driver (named 
differently, pick any name) that contains the new code base, 
side-by-side into the kernel with the current e1000.
Sounds like a reasonable approach to me (it has plenty of 
precedent).  But

I forget what all the other issues were, so ignore me.


Given past history with duplicate drivers and the problems that they 
cause -- I know, I've caused some of those problems :( -- I strongly 
recommend against when it can be avoided.


Leaving e1000 with current hardware, and a new e1001 for newer 
hardware should be easier to manage for all involved, without the 
headaches that duplicate drivers cause.




Jeff,

ok first you hate the old e1000 and now you don't want to get rid of it ;)


No -- e1000 is big and bloated but also stable and working and a key 
popular NIC driver for a lot of people.


That means the transition has a wide impact, and thus should be 
considered a lot more carefully than (say) rewriting 8139cp.


I reject the notion that a flag day switchover for a huge mass of 
e1000 users is the correct path.  I do not think that best serves Linux 
users.


It is better to get a new driver out in the field and working for a 
small population, let time pass, then consider if we really want to 
transition the entire e1000 population to the new driver.  That leaves 
the mass of e1000 users with a working driver while the new driver 
proves itself.  Since a small population of users is required to use the 
new driver, by virtue of new/old PCI ID split, you are guaranteed a 
population of test users immediately for the new driver.


When the new driver proves itself stable, you will have plenty of 
knowledge from which to decide whether to transition 8257x users, all 
e1000 users, or whatever.



I appreciate the pain a temporary dual driver situation gives; it comes 
down to a few things that I can think of right now, if you see more 
please add to the list.


1) users who find a bug in the new one silently use the old one rather 
than reporting the bug; and only scream when the old one eventually goes 
away (see ALSA/OSS duplication)


2) users who enable both in KConfig may get a random one

3) distros really prefer only 1 driver per PCI ID for their 
infrastructure tools


4) there will be resistance against deleting the old one meaning it 
might not happen


You are missing the largest source of pain and headache:  Users will use 
the default driver, which means no field testing at all until flag day, 
with obvious results.


Furthermore, Linux kernel history demonstrates that temporary dual 
driver situations are rarely temporary.  Thus, selling it as such in 
the face of all contrary experience is pure hyperbole.


Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?

2007-07-08 Thread Jeff Garzik

Roland Dreier wrote:

one possibility would be to merge e1000new with support only for chips
not supported by e1000, and semi-freeze e1000 (fixes only, new device
support goes into e1000new).


indeed.

Jeff


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?

2007-07-08 Thread James Chapman

James Chapman wrote:

I envisage something like this:-

e1000_core.c- common code used by all drivers of the e1000 family.
  Exports functions used by actual drivers. Loadable
  as a separate module when built as a module.
e1000_82541.c- driver for 82541
e1000_82542.c- driver for 82542
e1000_x.c- driver for x


Please ignore the statement I made above. Having read the code and chip 
docs again, I realize I jumped to the wrong conclusions when I saw 
separate source files for each of the 7 chip variants in the new driver.


Just wanted to nip it in the bud before it causes any wasted time 
discussing it. :)


--
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread Andrew Grover

On 7/8/07, Jeff Garzik [EMAIL PROTECTED] wrote:

* e1000 gets feedback
* Intel disappears for months
* Intel reappears with e1000 rewrite


* you ask them for another complete (simpler) rewrite


* Intel fights tooth and nail when the driver is not accepted verboten


I don't think it must be as-is (i.e. replacing e1000 for all HW) but
don't throw away all the work they've done in architecting the driver
to cleanly handle multiple chip generations.

How about:

1) Considering e1000new's current design, but for ICH9 only
2) test test test
3) Sometime in the future, considering incrementally moving previous
PCIe generations' support from e1000 to e1000new (like I initially
wanted, since that at least means there would be some technical reason
for where the split occurs :-)

Regards -- Andy
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Splitting e1000 (Was: Re: e1000: backport ich9 support from 7.5.5 ?)

2007-07-08 Thread Arjan van de Ven

Jeff Garzik wrote:

Arjan van de Ven wrote:
I'd second this; also lets be honest and fair about things and use a 
similar standard for all drivers; are we going to ask all driver 
submitters to remove NAPI, TSO and other stuff? I would hope not. Are 


That was merely a suggestion.  My general meaning was small driver is 
easier to get into the kernel and into the field.


it didn't quite come across as a suggestion, but if it was only a 
suggestion then ok; it's not really a practical option.


all new drivers that have even a single bitfield going to be rejected? 
That's be a new rule but ok, if it applies to all drivers it's fair.


That's one of a gazillion checklist items that a new driver has to pass 
through.  You know how it works for new drivers.


it looks better than our last try and it looks better than that ugly 
driver over there does not grant you a free pass on review.


I know how it works for new drivers; they need to be good enough to 
maintain forward and form a burden on the stack/maintainers. Any 
remaining items need to be small enough to not be a 'risk' for users.



What is there now is a driver that works, is relatively clean (and yes 
if you look deep enough you can ALWAYS nitpick on any code, even 
Linus' or Jeff's best code) and provides good performance with all the 
features a modern ethernet driver is supposed to have.


Is this is the attitude, what's the point of even posting the driver for 
review?  Intel posted e1000new on June 29.  Feedback was then posted. 


and feedback is being incorperated.

Ten days later, without a single revision, Intel declares its own driver 
clean and working and good enough for merging as-is.


No. Arjan looked at driver and sees a rather clean driver compared to 
some of the other recently merged drivers, sees a list of relatively 
simple todos (some of which obviously aren't merge stoppers, others 
are) and hopes that this driver will be treated objectively without 
the appearance of having different levels of bars depending on what 
your email address ends with.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFR: New e1000 driver (e1000new), was: Re: e1000: backport ich9 support from 7.5.5 ?

2007-07-08 Thread Arjan van de Ven


I reject the notion that a flag day switchover for a huge mass of 
e1000 users is the correct path.  I do not think that best serves Linux 
users.


so the options we have is do it one pci id a time; and the suggestion 
by others like Christoph has been to make the split at the PCI Express 
switchover. Do you agree with that approach?




I appreciate the pain a temporary dual driver situation gives; it 
comes down to a few things that I can think of right now, if you see 
more please add to the list.


1) users who find a bug in the new one silently use the old one rather 
than reporting the bug; and only scream when the old one eventually 
goes away (see ALSA/OSS duplication)


2) users who enable both in KConfig may get a random one

3) distros really prefer only 1 driver per PCI ID for their 
infrastructure tools


4) there will be resistance against deleting the old one meaning it 
might not happen


You are missing the largest source of pain and headache:  Users will use 
the default driver, which means no field testing at all until flag day, 
with obvious results.


which is why the proposal that was below the text you quoted talks 
about working with the distro kernel maintainers and get the new 
driver default for their development releases etc etc that adds
a significant level of granularity. For example I'd not be surprised 
if Fedora and Ubuntu test releases have orders of magnitude more users 
than kernel.org self compiloe kernels.



Furthermore, Linux kernel history demonstrates that temporary dual 
driver situations are rarely temporary.  Thus, selling it as such in 
the face of all contrary experience is pure hyperbole.


history also demonstrates it can be done (just look at Adrian Bunk and 
the OSS drivers) as long as someone is determined to do it. It's not 
easy, especially cases where the two drivers have different 
maintainers who disagree, or represent different 
interfaces/functionality; but it's very not impossible either.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 8724] New: Unaligned acess in udp_recvmsg() on EV56

2007-07-08 Thread Andrew Morton
On Sun,  8 Jul 2007 14:30:17 -0700 (PDT) [EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=8724
 
Summary: Unaligned acess in udp_recvmsg() on EV56
Product: Platform Specific/Hardware
Version: 2.5
  KernelVersion: 2.6.22-rc7-git7
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: Alpha
 AssignedTo: [EMAIL PROTECTED]
 ReportedBy: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: Occurs in all 2.6.2[12] at
 least
 Distribution: Debian testing/lenny
 Hardware Environment: Digital PWS 433au (EV56)
 Software Environment:
 Linux utopia 2.6.22-rc7-git7 #1 Thu Jul 8 10:34:17 CDT 2027 alpha GNU/Linux
 
 Gnu C  4.2.1
 Gnu make   3.81
 binutils   (GNU Binutils for Debian) 2.17.50.20070426
 util-linux 2.12r
 mount  2.12r
 module-init-tools  3.3-pre11
 e2fsprogs  1.40-WIP
 xfsprogs   2.8.18
 Linux C Library libc.2.5
 Dynamic linker (ldd)   2.5
 Procps 3.2.7
 Net-tools  1.60
 Kbd85:
 Sh-utils   5.97
 udev   105
 Modules Loaded ipt_TOS xt_multiport xt_tcpudp xt_state ip6table_mangle
 ip6table_filter ip6_tables ipv6 iptable_nat nf_nat nf_conntrack_ipv4
 nf_conntrack iptable_mangle iptable_filter ip_tables x_tables dm_mod serio_raw
 mxser_new ide_generic via_rhine mii generic ide_core tulip bitrev crc32 sg
 sr_mod cdrom raid1 md_mod loop
 
 Problem Description:
 kernel unaligned acc: 2248 (pc=fc583de4,va=fc00071c382a)
 
 fc583bc0 T udp_recvmsg
 fc583ed0 T udp_destroy_sock
 
 kernel unaligned acc: 1231 (pc=fc585190,va=fc000795e02e)
 
 fc585120 T __udp4_lib_rcv
 fc585bf0 T udp_rcv
 
 This problem does NOT seem to affect my Tsunami/Shark (EV68AL) box running the
 same kernel version on the same LAN.  Not sure if it's CPU generation related
 (EV5 vs EV6), or if it's the NIC (via_rhine on the EV5 vs e100 on the EV6).
 
 Steps to reproduce:
 According to tshark, the only UDP packets are DHCP packets:
 
 1815064200.131003   10.5.128.1 - 255.255.255.255 DHCP DHCP Offer-
 Transaction ID 0x27c6
 
 

That output isn't terribly illuminating.  We'd need to work out
which code corresponds with pc=fc583de4 and
pc=fc585190.

I don't think there's necessarily a bug here: that's just the kernel
telling us that there are unaligned accesses which got successfully
fixed up, so we're being perhaps a bit inefficient.  Yes?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Cbe-oss-dev] [PATCH] ps3: gigabit ethernet driver for PS3, take3

2007-07-08 Thread Akira Tsukamoto
Hi,

On Fri, 6 Jul 2007 13:02:41 -0500, [EMAIL PROTECTED] (Linas Vepstas) mentioned: 
 of the spidernet device driver.  Please note that the old
 spidernet had absolutely disasterous performance for transmit;
 it also had a variety of crazy hangs and lockups under 
 high-stress conditions; or NFS operation, or certain back-to-back
 tcp usage scenarios. A few dozen bugfixes went in since
 the time that the gelic snapshot was taken.

I think we know the problems of old spidernet issues, we uses QS20 also 
in our lab.
Current gelic has fixed them all separated from your work and it have no 
remaining issues or performance problem.
In our measurement, PS3 network performance is better than IBM QS20 right
now.
I totally understand difficultness of your effort that fixing spidernet 
without decent documentation which we have, but the changes we made was 
significantly large as you see if you diff gelic driver with current 
spidernet driver (totally different), so the conclusion of discussion at 
3C common-linux wg (Sony, IBM and Toshiba) was to include main line 
differently first then consider unifying it later. Otherwise we won't 
able to see PS3 network stack working for long time.

Akira
-- 
Akira Tsukamoto
Sony Computer Entertainment Inc. 
Computer Development Div. Distributed OS Development Dept. 
Japan

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [2.6.22] Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c

2007-07-08 Thread Jeff Garzik

Micah Gruber wrote:
This patch fixes a potential null dereference bug where we dereference 
nic before a null check. This patch simply moves the dereferencing after 
the null check.


Signed-off-by: Micah Gruber  [EMAIL PROTECTED] 
mailto:[EMAIL PROTECTED]


any chance you can resend in an email format other than format=flowed?

Jeff



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [2.6.22] Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c

2007-07-08 Thread Micah Gruber

This patch fixes a potential null dereference bug where we dereference
nic before a null check. This patch simply moves the dereferencing
after the null check.

Signed-off-by: Micah Gruber  [EMAIL PROTECTED]

--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -789,12 +789,14 @@
   struct mac_info *mac_control;
   struct config_param *config;
   int lst_size, lst_per_page;
-   struct net_device *dev = nic-dev;
+   struct net_device *dev;
   int page_num = 0;

   if (!nic)
   return;

+   dev = nic-dev;
+
   mac_control = nic-mac_control;
   config = nic-config;
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [2.6.22] Remove unneeded pointer idev from addrconf_cleanup() in net/ipv6/addrconf.c

2007-07-08 Thread Micah Gruber

This trivial patch removes the unneeded pointer idev returned from
__in6_dev_get(), which is never used. The check for NULL can be simply
done by if (__in6_dev_get(dev) == NULL).

Signed-off-by: Micah Gruber  [EMAIL PROTECTED]

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4240,7 +4240,6 @@
void __exit addrconf_cleanup(void)
{
   struct net_device *dev;
-   struct inet6_dev *idev;
   struct inet6_ifaddr *ifa;
   int i;

@@ -4258,7 +4257,7 @@
*/

   for_each_netdev(dev) {
-   if ((idev = __in6_dev_get(dev)) == NULL)
+   if (__in6_dev_get(dev) == NULL)
   continue;
   addrconf_ifdown(dev, 1);
   }
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html