date:20070916

[PATCH] jumbo all-NICs ethtool count cleanup

2007-09-16 Thread Jeff Garzik

Just checked this in locally...

The hooks ->self_test_count() and ->get_stats_count() are now unused
in the main tree.

(based off of latest davem/net-2.6.24.git)

 drivers/net/3c59x.c |   11 +++-
 drivers/net/8139cp.c|   11 +++-
 drivers/net/8139too.c   |   11 +++-
 drivers/net/atl1/atl1_ethtool.c |   11 +++-
 drivers/net/b44.c   |   11 +++-
 drivers/net/bnx2.c  |   20 
 drivers/net/cassini.c   |   11 +++-
 drivers/net/chelsio/cxgb2.c |   11 +++-
 drivers/net/cxgb3/cxgb3_main.c  |   11 +++-
 drivers/net/e100.c  |   19 
 drivers/net/e1000/e1000_ethtool.c   |   22 -
 drivers/net/e1000e/ethtool.c|   21 -
 drivers/net/ehea/ehea_ethtool.c |   13 -
 drivers/net/forcedeth.c |   45 +---
 drivers/net/gianfar_ethtool.c   |   20 
 drivers/net/ibm_emac/ibm_emac_core.c|   12 +++--
 drivers/net/ibmveth.c   |   11 +++-
 drivers/net/ixgb/ixgb_ethtool.c |   11 +++-
 drivers/net/ixgbe/ixgbe_ethtool.c   |   11 +++-
 drivers/net/mv643xx_eth.c   |   12 +++--
 drivers/net/myri10ge/myri10ge.c |   11 +++-
 drivers/net/netxen/netxen_nic_ethtool.c |   21 -
 drivers/net/pcnet32.c   |   11 +++-
 drivers/net/qla3xxx.c   |2 
 drivers/net/r8169.c |   11 +++-
 drivers/net/s2io.c  |   47 
 drivers/net/sc92031.c   |   11 +++-
 drivers/net/skge.c  |   11 +++-
 drivers/net/sky2.c  |   11 +++-
 drivers/net/spider_net_ethtool.c|   11 +++-
 drivers/net/tc35815.c   |   12 -
 drivers/net/tg3.c   |   19 
 drivers/net/ucc_geth_ethtool.c  |   26 ++-
 drivers/net/veth.c  |   11 +++-
 drivers/net/wireless/libertas/ethtool.c |   72 ++--
 35 files changed, 346 insertions(+), 246 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index ad0f6a7..6295e94 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2834,9 +2834,14 @@ static void vortex_set_msglevel(struct net_device *dev, u32 dbg)
vortex_debug = dbg;
 }
 
-static int vortex_get_stats_count(struct net_device *dev)
+static int vortex_get_sset_count(struct net_device *dev, int sset)
 {
-   return VORTEX_NUM_STATS;
+   switch (sset) {
+   case ETH_SS_STATS:
+   return VORTEX_NUM_STATS;
+   default:
+   return -EOPNOTSUPP;
+   }
 }
 
 static void vortex_get_ethtool_stats(struct net_device *dev,
@@ -2893,7 +2898,7 @@ static const struct ethtool_ops vortex_ethtool_ops = {
.get_msglevel   = vortex_get_msglevel,
.set_msglevel   = vortex_set_msglevel,
.get_ethtool_stats  = vortex_get_ethtool_stats,
-   .get_stats_count= vortex_get_stats_count,
+   .get_sset_count = vortex_get_sset_count,
.get_settings   = vortex_get_settings,
.set_settings   = vortex_set_settings,
.get_link   = ethtool_op_get_link,
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index 58fad1b..eccaa16 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -1383,9 +1383,14 @@ static int cp_get_regs_len(struct net_device *dev)
return CP_REGS_SIZE;
 }
 
-static int cp_get_stats_count (struct net_device *dev)
+static int cp_get_sset_count (struct net_device *dev, int sset)
 {
-   return CP_NUM_STATS;
+   switch (sset) {
+   case ETH_SS_STATS:
+   return CP_NUM_STATS;
+   default:
+   return -EOPNOTSUPP;
+   }
 }
 
 static int cp_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
@@ -1563,7 +1568,7 @@ static void cp_get_ethtool_stats (struct net_device *dev,
 static const struct ethtool_ops cp_ethtool_ops = {
.get_drvinfo= cp_get_drvinfo,
.get_regs_len   = cp_get_regs_len,
-   .get_stats_count= cp_get_stats_count,
+   .get_sset_count = cp_get_sset_count,
.get_settings   = cp_get_settings,
.set_settings   = cp_set_settings,
.nway_reset = cp_nway_reset,
diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index 16b9196..565fbdb 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -2400,9 +2400,14 @@ static void rtl8139_get_regs(struct net_device *dev, struct ethtool_regs *regs,
 }
 #endif /* CONFIG_8139TOO_MMIO */
 
-static int rtl8139_get_stats_count(struct net_device *dev)
+static int rtl8139_get_sset_count(struct net_device *dev, int sset)
 {
-   return RTL_NUM_STATS;
+   switch (sset) {
+   case ETH_SS_STATS:
+   return

Re: [PATCH] jumbo all-NICs ethtool count cleanup

2007-09-16 Thread Sam Ravnborg

Hi Jeff.

You wrote:
 The hooks ->self_test_count() and ->get_stats_count() are now unused
 in the main tree.

So I'm surprised to see more lines added than deleted:
  35 files changed, 346 insertions(+), 246 deletions(-)

Puzzled - may need a bit more coffee (morning here)..

Sam
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Distributed storage. Move away from char device ioctls.

2007-09-16 Thread Kyle Moffett


On Sep 15, 2007, at 13:24:46, Andreas Dilger wrote:

On Sep 15, 2007  16:29 +0400, Evgeniy Polyakov wrote:
Yes, block device itself is not able to scale well, but it is the  
place for redundancy, since filesystem will just fail if  
underlying device does not work correctly and FS actually does not  
know about where it should place redundancy bits - it might happen  
to be the same broken disk, so I created a low-level device which  
distribute requests itself.


I actually think there is a place for this - and improvements are  
definitely welcome.  Even Lustre needs block-device level  
redundancy currently, though we will be working to make Lustre- 
level redundancy available in the future (the problem is WAY harder  
than it seems at first glance, if you allow writeback caches at the  
clients and servers).


I really think that to get proper non-block-device-level filesystem  
redundancy you need to base it on something similar to the GIT  
model.  Data replication is done in specific-sized chunks indexed by  
SHA-1 sum and you actually have a sort of merge algorithm for when  
local and remote changes differ.  The OS would only implement a very  
limited list of merge algorithms, IE one of:


(A)  Don't merge, each client gets its own branch and merges are manual
(B)  Most recent changed version is made the master every X-seconds/ 
open/close/write/other-event.
(C)  The tree at X (usually a particular client/server) is always  
used as the master when there are conflicts.


This lets you implement whatever replication policy you want:  You  
can require that some files are replicated (cached) on *EVERY*  
system, you can require that other files are cached on at least X  
systems.  You can say this needs to be replicated on at least X% of  
the online systems, or at most Y.  Moreover, the replication could  
be done pretty easily from userspace via a couple syscalls.  You also  
automatically keep track of history with some default purge policy.


The main point is that for efficiency and speed things are *not*  
always replicated; this also allows for offline operation.  You would  
of course have userspace merge drivers which notice that the tree  
on your laptop is not a subset/superset of the tree on your desktop  
and do various merges based on per-file metadata.  My address-book,  
for example, would have a custom little merge program which knows  
about how to merge changes between two address book files, asking me  
useful questions along the way.  Since a lot of this merging is  
mechanical, some of the code from GIT could easily be made into a  
merge library which knows how to do such things.


Moreover, this would allow me to have a shared root filesystem on  
my laptop and desktop.  It would have 'sub-project'-type trees, so  
that / would be an independent branch on each system. /etc would  
be separate branches but manually merged git-style as I make  
changes.  /home/* folders would be auto-created as separate  
subtrees so each user can version their own individually.  Specific  
subfolders (like address-book, email, etc) would be adjusted by the  
GUI programs that manage them to be separate subtrees with manual- 
merging controlled by that GUI program.


Backups/dumps/archival of such a system would be easy.  You would  
just need to clone the significant commits/trees/etc to a DVD and  
replace the old SHA-1-indexed objects with tiny object-deleted stubs;  
to rollback to an archived version you insert the DVD, mount it  
into the existing kernel SHA-1 index, and then mount the appropriate  
commit as a read-only volume somewhere to access.  The same procedure  
would also work for wide-area-network backups and such.


The effective result would be the ability to do things like the  
following:
  (A)  Have my homedir synced between both systems mostly- 
automatically as I make changes to different files on both systems
  (B)  Easily have 2 copies of all my files, so if one system's disk  
goes kaput I can just re-clone from the other.
  (C)  Keep archived copies of the last 5 years worth of work,  
including change history, on a stack of DVDs.
  (D)  Synchronize work between locations over a relatively slow  
link without much work.


As long as files were indirectly indexed by sub-block SHA1 (with the  
index depth based on the size of the file), and each individually- 
SHA1-ed object could have references, you could trivially have a 4TB- 
sized file where you modify 4 bytes at a thousand random locations  
throughout the file and only have to update about 5MB worth of on- 
disk data.  The actual overhead for that kind of operation under any  
existing filesystem would be 100% seek-dominated regardless whereas  
with this mechanism you would not directly be overwriting data and so  
you could append all the updates as a single 5MB chunk.  Data reads  
would be much more seek-y, but you could trivially have an on-line  
defragmenter tool which notices fragmented

Re: [PATCH] jumbo all-NICs ethtool count cleanup

2007-09-16 Thread Jeff Garzik


Sam Ravnborg wrote:

Hi Jeff.

You wrote:

The hooks ->self_test_count() and ->get_stats_count() are now unused
in the main tree.


So I'm surprised to see more lines added than deleted:

 35 files changed, 346 insertions(+), 246 deletions(-)


Puzzled - may need a bit more coffee (morning here)..


The new interface that supersedes these is ->get_sset_count(), which was 
added to provide additional functionality without having to add a new 
hook each time we want to return a new integer value.  This new 
interface also (intentionally) aligns with the existing ->get_strings() 
interface.  ("sset" in get_sset_count stands for "string set".)
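
[Editor's sketch] The idea of one count hook keyed by string set can be illustrated outside the kernel. This is a minimal userspace sketch, not the ethtool code: the EX_* names, the set values and the counts are all hypothetical stand-ins.

```c
#include <errno.h>

/* Hypothetical stand-ins for string-set identifiers (not the kernel's). */
enum example_stringset {
	EX_SS_TEST  = 0,
	EX_SS_STATS = 1,
};

#define EX_NUM_STATS 12	/* illustrative count of statistics strings */
#define EX_NUM_TESTS 3	/* illustrative count of self-test strings */

/*
 * A single hook answers "how many entries are in string set N?".
 * Supporting another set later means adding a case, not a new hook.
 */
static int example_get_sset_count(int sset)
{
	switch (sset) {
	case EX_SS_STATS:
		return EX_NUM_STATS;
	case EX_SS_TEST:
		return EX_NUM_TESTS;
	default:
		return -EOPNOTSUPP;
	}
}
```

The same set identifier could then be used to ask for the strings themselves, which is the alignment with ->get_strings() mentioned above.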


Jeff




Re: Please pull 'adm8211' branch of wireless-2.6

2007-09-16 Thread Jeff Garzik


Michael Wu wrote:

On Saturday 15 September 2007 20:56, Jeff Garzik wrote:

+   if (flags & IFF_PROMISC)
+       dev->flags |= IEEE80211_HW_RX_INCLUDES_FCS;
+   else
+       dev->flags &= ~IEEE80211_HW_RX_INCLUDES_FCS;

why does promisc dictate inclusion of FCS?

Because that's the way the hardware works.

Why not always include it, regardless of promisc?

I really do mean that's how the hardware works. If you turn on the promisc bit 
in the hardware (which IFF_PROMISC causes), it starts including the FCS, but 
if the bit is not set, the FCS is not included in frames.


OK, I was confused by the name.  Based on the constant's name, I was 
assuming that you could unconditionally enable it, promisc or not. 
Nevermind.  I thought that was a hardware rather than software bit.



What form of debugging are you talking about? I don't see how it makes a 
difference for debugging. The type checking provided by enums won't make a 


When you are tracing through with kgdb, the code is actually readable. 
You see


dev->flags |= IEEE80211_HW_RX_INCLUDES_FCS;

rather than the far more obtuse

dev->flags |= 8;

Ditto for any time you have to read pre-processed source code.  I do so 
at least once a month, since post-cpp code shows you precisely what the 
compiler is munching, after all the macro magic goes away.
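
[Editor's sketch] The point about enums can be shown in a few lines. An enum constant is not a macro, so its name is still present after preprocessing, whereas a #define would already have been replaced by its number. All names below are hypothetical; only the value 8 comes from the line quoted above.

```c
/* Hypothetical flag name; the value 8 matches the "dev->flags |= 8" example. */
enum example_hw_flags {
	EXAMPLE_HW_RX_INCLUDES_FCS = 1 << 3,
};

/* Stand-in for the device structure, reduced to the flags word. */
struct example_hw {
	unsigned int flags;
};

/* Set the flag when promiscuous mode is on, clear it otherwise. */
static void example_set_promisc(struct example_hw *dev, int promisc)
{
	if (promisc)
		dev->flags |= EXAMPLE_HW_RX_INCLUDES_FCS;
	else
		dev->flags &= ~EXAMPLE_HW_RX_INCLUDES_FCS;
}
```

Running this through the preprocessor alone leaves the two assignments spelled with the enum name.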


Jeff



Re: [ofa-general] [PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

2007-09-16 Thread Or Gerlitz


Steve Wise wrote:

RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all this.
Without doing full ND, rdma address resolution fails in the presence of
dropped arp bcast packets.


Jay,

Is there a way to deploy something similar for the gratuitous arp being 
sent by the bonding driver at bond_arp_send()?


We have seen rare situations where the skb was dropped by the stack and 
hence bonding fail-over was detected by the remote peer only when its 
neighboring subsystem probe failures dictated that a new arp must be issued.


Or.


Signed-off-by: Steve Wise [EMAIL PROTECTED]
---

 drivers/infiniband/core/addr.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index c5c33d3..5381c80 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -161,8 +161,7 @@ static void addr_send_arp(struct sockadd
 	if (ip_route_output_key(&rt, &fl))
 		return;
 
-	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
-		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+	neigh_event_send(rt->u.dst.neighbour, NULL);
ip_rt_put(rt);
 }



Re: - revert-8139too-clean-up-i-o-remapping.patch removed from -mm tree

2007-09-16 Thread Jeff Garzik


[EMAIL PROTECTED] wrote:

The patch titled
 revert 8139too: clean up I/O remapping
has been removed from the -mm tree.  Its filename was
 revert-8139too-clean-up-i-o-remapping.patch

This patch was dropped because it was merged into mainline or a subsystem tree

--
Subject: revert 8139too: clean up I/O remapping
From: Andrew Morton [EMAIL PROTECTED]

Revert git-netdev-all's 9ee6b32a47b9abc565466a9c3b127a5246b452e5.  Michal was
getting oopses.

Cc: Michal Piotrowski [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Andrew Morton [EMAIL PROTECTED]


Shit!

Thanks for reminding me that I need to fix that up before it goes 
upstream with the rest of net-2.6.24.


Jeff



Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24

2007-09-16 Thread Or Gerlitz


Roland Dreier wrote:

With 2.6.24 probably opening in the not-too-distant future, it's
probably a good time to review what my plans are for when the merge
window opens.



Core:
 - Sean's QoS changes.  These look fine at first glance, and I just
   plan to understand the backwards compatibility story (ie how this
   works with an old SM) and merge.  Anyone who objects let me know.


Hi Roland,

I have reviewed the QoS patches and provided comments, which were 
incorporated in v2 of the series. I also tested it (ipoib and iser, which is 
rdma-cm based) against the Voltaire SM/SA to see that nothing was 
broken. I will send you a Reviewed-by: signature.



ULPs:



[ofa-general] [PATCH RFC] IB/ipoib: enable IGMP for userpsace multicast IB apps

The IGMP enabling patch posted by me on September 2nd isn't on your list
http://lists.openfabrics.org/pipermail/general/2007-September/040250.html
can you add it?



 - Moni's IPoIB bonding support.  This seems mostly an issue of
   getting the core bonding maintainer's attention.  However getting a
   Reviewed-by: for the IPoIB changes wouldn't hurt too.


Jay Vosburgh, the bonding driver maintainer just sent an ack on all 
patch series. As for the IPoIB changes, there are three patches, where 
two of them, namely

[PATCH 02/11] IB/ipoib: Notify the world before doing unregister
[PATCH 04/11] IB/ipoib: Verify address handle validity on send

handle corner-case problems pointed out by Michael Tsirkin.
Michael, will you be able to look at them and provide a Reviewed-by 
signature? The third patch

[PATCH 03/11] IB/ipoib: Bound the net device to the ipoib_neigh structue

is much simpler; I don't think more review is needed for it.


 - Eli and Michael's IPoIB stateless offload (checksum offload, LSO,
   LRO, etc).  It's a big series that makes quite a few core changes.
   I think it needs some careful review and is probably at risk of
   missing this merge window.  Sorting in order of invasiveness so we
   can merge at least some of it (if splitting it makes sense) might
   be a good idea.


Just for the record, the 'etc' above relates to the interrupt moderation 
support (mlx4, core, ipoib: config through ethtool, usage). Among 
other things, what is not clear to me here is if/how this goes 
hand-in-hand with NAPI.


As you saw the patch adding checksum offload support had a long thread, 
and I think the discussion has reached the point where Michael is 
waiting for your take on it.


As for the LSO, LRO patches, I did not see any review comments.

I will see what I can review from the series; to begin with, I will send 
Eli some comments and questions.



HW specific:
 - Jack and Michael's mlx4 FMR support.  Will merge I guess, although
   I do hope to have time to address the DMA API abuse that is being
   copied from mthca, so that mlx4 and mthca work in Xen domU.


This patch series is fairly important, as without it iser is useless 
over ConnectX. It would be nice if you merged it and, at most, fixed the abuse later.


Or.


Re: Distributed storage. Move away from char device ioctls.

2007-09-16 Thread Evgeniy Polyakov

On Sat, Sep 15, 2007 at 11:24:46AM -0600, Andreas Dilger ([EMAIL PROTECTED]) 
wrote:
  When Chris Mason announced btrfs, I found that quite a few new ideas 
  are already implemented there, so I postponed the project (although
  the direction of btrfs development seems to be moving toward the zfs side,
  with some points that are questionable imho, so I think I can jump on the
  wagon of new filesystems right now). 
 
 This is an area I'm always a bit sad about in OSS development - the need
 everyone has to make a new {fs, editor, gui, etc} themselves instead of
 spending more time improving the work we already have.  Imagine where the

If that were true, we would still be in the stone age. 
Or rather, I think the first cell in the universe would not have bothered 
dividing in two, because it could have spent infinite time 
trying to make itself better.

 internet would be (or not) if there were 50 different network protocols
 instead of TCP/IP?  If you don't like some things about btrfs, maybe you
 can fix them?

Once an idea is implemented it is virtually impossible to change it;
one can only create a new implementation with the issues fixed. So we have
multiple ext versions, reiser and many others. I am not saying btrfs is
broken or has design problems, it is a really interesting filesystem, but
we all have our own opinions about how things should be done, that's it.

Btw, we do have so many network protocols for different purposes that
the number of (storage) filesystems is negligibly small in comparison. 
The Internet as it is popular today is just a subset of where networks are used.

And we do invent new protocols each time we need something new that
does not fit into existing models (for example, TCP by design cannot
work over very long-distance links with too long an RTT). We have SCTP to
fix some TCP issues. The number of IP-layer 'neighbours' is even greater.
The physical media layer has many different protocols too.
And that is just what exists in the Linux tree...

 To be honest, developing a new filesystem that is actually widely useful
 and used is a very time consuming task (see Reiserfs and Reiser4).  It
 takes many years before the code is reliable enough for people to trust it,
 so most likely any effort you put into this would be wasted unless you can
 come up with something that is dramatically better than something existing.

Yep, I know. 
Wasting my time is one of the most pleasant things I ever tried in my life.

 The part that bothers me is that this same effort could have been used to
 improve something that more people would use (btrfs in this case).  Of
 course, sometimes the new code is substantially better than what currently
 exists, and I think btrfs may have laid claim to the current generation of
 filesystems.

Call me a greedy bastard, but I do not care about world happiness; it is
just impossible to achieve. So I like what I do right now.
If it ends up resting under a layer of dust, I do not care. I like the
process of creating, so if it fails, I will just have gained new knowledge.

:)

 Cheers, Andreas
 --
 Andreas Dilger
 Principal Software Engineer
 Cluster File Systems, Inc.

-- 
Evgeniy Polyakov

[PATCH|[NET]: migrate HARD_TX_LOCK to header file

2007-09-16 Thread jamal

I wanted to get rid of the extraneous cpu argument and ended up moving this
to the header files since it looks like a common enough operation that could
be used elsewhere.
It is a trivial change - I could resend, leaving it in dev.c and
just getting rid of the cpu argument.


cheers,
jamal

[NET]: migrate HARD_TX_LOCK to header file

The HARD_TX_LOCK macro is a nice aggregation that could be used
in other spots. Move it to netdevice.h.
Also get rid of the superfluous cpu argument while doing this.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 1bc3a7393737ab1f5239bd8dc2f2953dcee5391e
tree 83a7f39b61fe45282eee825286996ba4bf72c0f6
parent 1f08657fc9b0b56039a9378ca030c2c8ed7bd8ac
author Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 11:29:48 -0400
committer Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 11:29:48 -0400

 include/linux/netdevice.h |   12 
 net/core/dev.c|   14 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc5e35f..c83e667 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1297,6 +1297,18 @@ static inline void netif_tx_unlock_bh(struct net_device *dev)
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
+#define HARD_TX_LOCK(dev) {                            \
+       if ((dev->features & NETIF_F_LLTX) == 0) {      \
+               netif_tx_lock(dev);                     \
+       }                                               \
+}
+
+#define HARD_TX_UNLOCK(dev) {                          \
+       if ((dev->features & NETIF_F_LLTX) == 0) {      \
+               netif_tx_unlock(dev);                   \
+       }                                               \
+}
+
 static inline void netif_tx_disable(struct net_device *dev)
 {
netif_tx_lock_bh(dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2897352..7934d28 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1574,18 +1574,6 @@ out_kfree_skb:
return 0;
 }
 
-#define HARD_TX_LOCK(dev, cpu) {                       \
-       if ((dev->features & NETIF_F_LLTX) == 0) {      \
-               netif_tx_lock(dev);                     \
-       }                                               \
-}
-
-#define HARD_TX_UNLOCK(dev) {                          \
-       if ((dev->features & NETIF_F_LLTX) == 0) {      \
-               netif_tx_unlock(dev);                   \
-       }                                               \
-}
-
 /**
  * dev_queue_xmit - transmit a buffer
  * @skb: buffer to transmit
@@ -1710,7 +1698,7 @@ gso:
 
 		if (dev->xmit_lock_owner != cpu) {
 
-			HARD_TX_LOCK(dev, cpu);
+			HARD_TX_LOCK(dev);
 
 			if (!netif_queue_stopped(dev) &&
 			    !netif_subqueue_stopped(dev, skb->queue_mapping)) {

[RFC][NET_SCHED] explict hold dev tx lock

2007-09-16 Thread jamal


While trying to port my batching changes to net-2.6.24 from this morning
i realized this is something i had wanted to probe people on

Challenge:
For N CPUs, with full-throttle traffic on all N CPUs funneling traffic
to the same ethernet device, the device's queue lock is contended by all
N CPUs constantly. The TX lock is only contended by a max of 2 CPUs. 
In the current mode of operation, after all the work of entering the
dequeue region, we may end up aborting the path if we are unable to get
the tx lock, and go back to contend for the queue lock. As N goes up,
this gets worse.
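
[Editor's sketch] The difference between "try the tx lock and requeue on failure" and "wait for the tx lock" can be modelled in userspace. This is a toy model under stated assumptions, not the qdisc code: the lock is a C11 atomic_flag spinlock and every name here is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock standing in for the driver tx lock. */
struct toy_lock {
	atomic_flag held;
};

static bool toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous value: false means we got it */
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void toy_lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* spin until the holder releases */
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum xmit_result { XMIT_SENT, XMIT_REQUEUED };

/* Trylock scheme: if the tx lock is busy, the dequeue work is abandoned. */
static enum xmit_result xmit_trylock(struct toy_lock *tx, int *sent)
{
	if (!toy_trylock(tx))
		return XMIT_REQUEUED;
	(*sent)++;
	toy_unlock(tx);
	return XMIT_SENT;
}

/* Blocking scheme: wait for the tx lock, so a dequeued packet always goes out. */
static enum xmit_result xmit_blocking(struct toy_lock *tx, int *sent)
{
	toy_lock_acquire(tx);
	(*sent)++;
	toy_unlock(tx);
	return XMIT_SENT;
}
```

The model ignores interrupt context entirely, which is exactly the part of the real change that needs care.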

Testing:
I did some testing on a 4-CPU machine (2x dual core) with no irq binding. I did
about 10 runs of 30M packets each from the stack with a udp app I wrote
which is intended to keep all 4 CPUs busy, and to my surprise I
found that we bail out less than 0.1% of the time. I may need a better test
case.

Changes:
I made the changes to the code path shown in the included patch
and noticed a slight increase (2-3%) in performance with both e1000 and
tg3, which was a relief because I thought the spin_lock_irq (which is
needed because some drivers grab the tx lock in interrupts) might have negative
effects. The fact that it didn't reduce performance was a good thing.
Note: This is the highest-end machine I've ever laid hands on, so this
may be misleading.
 
So - what side effects do people see in doing this? If none, i will
clean it up and submit.

cheers,
jamal
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc5e35f..ab9966f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1271,6 +1271,12 @@ static inline void netif_tx_lock(struct net_device *dev)
 	dev->xmit_lock_owner = smp_processor_id();
 }
 
+static inline void netif_tx_lock_irq(struct net_device *dev)
+{
+	spin_lock_irq(&dev->_xmit_lock);
+	dev->xmit_lock_owner = smp_processor_id();
+}
+
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
@@ -1291,6 +1297,12 @@ static inline void netif_tx_unlock(struct net_device *dev)
 	spin_unlock(&dev->_xmit_lock);
 }
 
+static inline void netif_tx_unlock_irq(struct net_device *dev)
+{
+	dev->xmit_lock_owner = -1;
+	spin_unlock_irq(&dev->_xmit_lock);
+}
+
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
 	dev->xmit_lock_owner = -1;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..f75a924 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,23 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
+	unsigned lockless = (dev->features & NETIF_F_LLTX);
 	int ret;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;
 
-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}
-
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);
 
+	if (!lockless)
+		netif_tx_lock_irq(dev);
+
 	ret = dev_hard_start_xmit(skb, dev);
 
 	if (!lockless)
-		netif_tx_unlock(dev);
+		netif_tx_unlock_irq(dev);
 
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;

Re: [PATCH] Configurable tap interface MTU

2007-09-16 Thread David Miller

From: Ed Swierk [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 09:54:35 -0700

 On 9/11/07, Herbert Xu [EMAIL PROTECTED] wrote:
  Please make it 65535 without an Ethernet header and 65521
  with an Ethernet header.

 Here is a revised patch that allows MTUs up to 65535 for tap
 interfaces and up to 65521 for tun interfaces.

 (If I set the MTU to 65521 on a tun interface, ping complains message
 too long when I send a 65521-byte packet; 65520 works okay, though.)
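
[Editor's sketch] The two limits Herbert asks for differ by exactly 14, the length of an Ethernet header (65535 - 14 = 65521). A minimal sketch of such a bound check follows; it is not the code from the patch. The names are hypothetical and the lower bound of 1 is an assumption made only to keep the example complete.

```c
#include <errno.h>

#define EXAMPLE_MAX_FRAME 65535	/* largest total size implied by the limits above */
#define EXAMPLE_ETH_HLEN  14	/* Ethernet header: 6 + 6 + 2 bytes */

/*
 * hdr_len is 0 for a device that carries no Ethernet header and
 * EXAMPLE_ETH_HLEN for one that does.
 */
static int example_check_mtu(int new_mtu, int hdr_len)
{
	if (new_mtu < 1 || new_mtu + hdr_len > EXAMPLE_MAX_FRAME)
		return -EINVAL;
	return 0;
}
```

With hdr_len set to 14 the largest accepted MTU is 65521, and with hdr_len set to 0 it is 65535.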

Applied to net-2.6.24

Please provide a proper Signed-off-by: line and a full
changelog with every patch submission and revision in
the future.

Thanks.


2.6.23-rc regression: bcm43xx does not work after commit 4cf92a3c

2007-09-16 Thread YOSHIFUJI Hideaki / 吉藤英明

Hello.

With latest git tree, bcm43xx driver does not work.
By bisect, I've found the commit 4cf92a3c is the first bad commit.

[PATCH] softmac: Fix ESSID problem

Victor Porton reported that the SoftMAC layer had random problem when
setting the ESSID:
http://bugzilla.kernel.org/show_bug.cgi?id=8686
After investigation, it turned out to be worse, the SoftMAC layer is left
in an inconsistent state. The fix is pretty trivial.

Signed-off-by: Jean Tourrilhes [EMAIL PROTECTED]
Acked-by: Michael Buesch [EMAIL PROTECTED]
Acked-by: Larry Finger [EMAIL PROTECTED]
Signed-off-by: John W. Linville [EMAIL PROTECTED]

After reverting this commit, the driver starts working again.

Regards,

-- 
YOSHIFUJI Hideaki @ USAGI Project  [EMAIL PROTECTED]
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

Re: [PATCH|[NET]: migrate HARD_TX_LOCK to header file

2007-09-16 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 11:48:45 -0400

 I wanted to get rid of the extraneous cpu argument and ended up moving this
 to the header files since it looks like a common enough operation that could
 be used elsewhere.
 It is a trivial change - I could resend, leaving it in dev.c and
 just getting rid of the cpu argument.

The only reason the cpu argument is superfluous is because
we don't provide a way to pass it on down to netif_tx_lock().

So instead netif_tx_lock() recomputes that value in this case which is
extra unnecessary work.

I would instead suggest, in netdevice.h:

static inline void __netif_tx_lock(struct net_device *dev, int cpu)
{
	spin_lock(&dev->_xmit_lock);
	dev->xmit_lock_owner = cpu;
}

static inline void netif_tx_lock(struct net_device *dev)
{
	__netif_tx_lock(dev, smp_processor_id());
}

And make the HARD_TX_LOCK() call __netif_tx_lock() and pass in
the already computed 'cpu' parameter.
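
[Editor's sketch] A userspace model of that suggestion, assuming hypothetical EX_/ex_ names throughout: the lock is reduced to an int and the feature bit value is made up. It only shows how the macro would forward the caller's already computed cpu instead of recomputing it.

```c
/* Hypothetical feature bit meaning "driver does its own tx locking". */
#define EX_F_LLTX 0x1000u

/* Stand-in for struct net_device, reduced to what the macro touches. */
struct ex_net_device {
	unsigned int features;
	int xmit_locked;	/* models the _xmit_lock spinlock */
	int xmit_lock_owner;
};

/* Takes the lock and records the owner passed in by the caller. */
static inline void ex__netif_tx_lock(struct ex_net_device *dev, int cpu)
{
	dev->xmit_locked = 1;	/* spin_lock() in the real code */
	dev->xmit_lock_owner = cpu;
}

/* Lock only when the driver does not do its own locking. */
#define EX_HARD_TX_LOCK(dev, cpu) do {				\
	if (((dev)->features & EX_F_LLTX) == 0)			\
		ex__netif_tx_lock((dev), (cpu));		\
} while (0)
```

The do/while (0) wrapper is the editor's choice so the macro behaves as a single statement; the macros in the patch use plain braces.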

Thanks.

Re: [RFC][NET_SCHED] explict hold dev tx lock

2007-09-16 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 12:14:34 -0400

 Changes:
 I made the changes to the code path shown in the included patch
 and noticed a slight increase (2-3%) in performance with both e1000 and
 tg3, which was a relief because I thought the spin_lock_irq (which is
 needed because some drivers grab the tx lock in interrupts) might have negative
 effects. The fact that it didn't reduce performance was a good thing.
 Note: This is the highest-end machine I've ever laid hands on, so this
 may be misleading.

 So - what side effects do people see in doing this? If none, i will
 clean it up and submit.

I tried this 4 years ago, it doesn't work.  :-)

Many drivers, particularly very old ones that PIO packets into
a device which can take a long time, absolutely depend upon
interrupts being enabled fully during ->hard_start_xmit()
so that other high priority devices (such as simpler serial
controllers) can have their interrupts serviced during this
slow operation.

I don't think we want to do it anyways, whatever performance
we gain from it is offset by the badness of disabling interrupts
during this reasonably lengthy stretch of code.

The -rt folks as a result would notice this too and spank us :-)

Re: e1000 driver and samba

2007-09-16 Thread Kok, Auke

James Chapman wrote:
 Kok, Auke wrote:
 James Chapman wrote:
 Kok, Auke wrote:
 
  rx_long_byte_count: 34124849453

 Are these long frames expected in your network? What is the MTU of
 the transmitting clients? Perhaps this might explain why reads work
 (because data is coming from the Linux box so the packets have
 smaller MTU) while writes cause delays or packet loss because the
 clients are sending long frames which are getting fragmented?

 those are not long frames but the number of bytes the hardware
 counted in its long data type based byte counter.
 
 Thanks for correcting me, Auke.
 
 Should this counter be renamed to avoid someone else making this mistake
 in the future? Just a thought.

well, that would break tools that read this value. And for all of these stats
we can say that you should read our SDM's to figure out what they really
mean anyway, hence my caution to interpret the other value at first.

Auke

Re: [PATCH|[NET]: migrate HARD_TX_LOCK to header file

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 12:28 -0700, David Miller wrote:

 The only reason the cpu argument is superfluous is because
 we don't provide a way to pass it on down to netif_tx_lock().
 
 So instead netif_tx_lock() recomputes that value in this case which is
 extra unnecessary work.
 
 I would instead suggest ..

sounds much better - will resend after a simple test.

cheers,
jamal


Re: [RFC][NET_SCHED] explict hold dev tx lock

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 12:31 -0700, David Miller wrote:
 From: jamal [EMAIL PROTECTED]
 Date: Sun, 16 Sep 2007 12:14:34 -0400

  So - what side effects do people see in doing this? If none, i will
  clean it up and submit.

 I tried this 4 years ago, it doesn't work.  :-)

;->

[good reasons removed here]

 I don't think we want to do it anyways, whatever performance
 we gain from it is offset by the badness of disabling interrupts
 during this reasonably lengthy stretch of code.

 The -rt folks as a result would notice this too and spank us :-)

indeed. 
Ok, maybe i am thinking too hard with that patch, so help me out:-
When i looked at that code path as it is today: i felt the softirq could
be interrupted on the same CPU it is running on while it already grabbed
that tx lock (if the trylock succeeds) and that the hardirq code when
attempting to grab the lock would result in a deadlock.
Did i misread that? 
When i experimented with tg3 and e1000 i did not see any such problems
with the non irq version of the lock.

cheers,
jamal


Re: [RFC][NET_SCHED] explict hold dev tx lock

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 16:41 -0400, jamal wrote:

 indeed. 
 Ok, maybe i am thinking too hard with that patch, so help me out:-

Ok, that was probably too much of an explanation. What i should say is
if i grabbed the lock explicitly without disabling irqs it won't be much
different than what is done today and should always work.
No?

cheers,
jamal


Re: [PATCH|[NET]: migrate HARD_TX_LOCK to header file

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 16:28 -0400, jamal wrote:

 
 sounds much better - will resend after a simple test.

Ok, here's the revised version

cheers,
jamal

[NET]: migrate HARD_TX_LOCK to header file
The HARD_TX_LOCK macro is a nice aggregation that could be used
in other spots. Move it to netdevice.h.
Also make sure the previously superfluous cpu argument is used.
Thanks to DaveM for the suggestions.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit e467e3cb7fca9b533543aa749395547b7ade4980
tree 5e03a405e32968cc8e9e875ecdaeec4e798b6809
parent f55ad5bb4809bdd07720387c62788fad5359d41c
author Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 16:54:44 -0400
committer Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 16:54:44 -0400

 include/linux/netdevice.h |   21 +++++++++++++++++++--
 net/core/dev.c            |   12 ------------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc5e35f..d529a0c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1265,10 +1265,15 @@ static inline void netif_rx_complete(struct net_device *dev,
  *
  * Get network device transmit lock
  */
-static inline void netif_tx_lock(struct net_device *dev)
+static inline void __netif_tx_lock(struct net_device *dev, int cpu)
 {
    spin_lock(&dev->_xmit_lock);
-   dev->xmit_lock_owner = smp_processor_id();
+   dev->xmit_lock_owner = cpu;
+}
+
+static inline void netif_tx_lock(struct net_device *dev)
+{
+   __netif_tx_lock(dev, smp_processor_id());
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
@@ -1297,6 +1302,18 @@ static inline void netif_tx_unlock_bh(struct net_device *dev)
    spin_unlock_bh(&dev->_xmit_lock);
 }
 
+#define HARD_TX_LOCK(dev, cpu) {                       \
+   if ((dev->features & NETIF_F_LLTX) == 0) {          \
+       __netif_tx_lock(dev, cpu);                      \
+   }                                                   \
+}
+
+#define HARD_TX_UNLOCK(dev) {                          \
+   if ((dev->features & NETIF_F_LLTX) == 0) {          \
+       netif_tx_unlock(dev);                           \
+   }                                                   \
+}
+
 static inline void netif_tx_disable(struct net_device *dev)
 {
netif_tx_lock_bh(dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2897352..a1f6ca6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1574,18 +1574,6 @@ out_kfree_skb:
return 0;
 }
 
-#define HARD_TX_LOCK(dev, cpu) {                       \
-   if ((dev->features & NETIF_F_LLTX) == 0) {          \
-       netif_tx_lock(dev);                             \
-   }                                                   \
-}
-
-#define HARD_TX_UNLOCK(dev) {                          \
-   if ((dev->features & NETIF_F_LLTX) == 0) {          \
-       netif_tx_unlock(dev);                           \
-   }                                                   \
-}
-
 /**
  * dev_queue_xmit - transmit a buffer
  * @skb: buffer to transmit

Re: [RFC][NET_SCHED] explict hold dev tx lock

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 16:52 -0400, jamal wrote:

 What i should say is
 if i grabbed the lock explicitly without disabling irqs it won't be much
 different than what is done today and should always work.
 No?

And to be more explicit, here's a patch using the macros from previous
patch. So far tested on 3 NICs.

cheers,
jamal

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..1ae905e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,18 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
 	int ret;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;
 
-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}
-
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);
 
+	HARD_TX_LOCK(dev, smp_processor_id());
 	ret = dev_hard_start_xmit(skb, dev);
-
-	if (!lockless)
-		netif_tx_unlock(dev);
+	HARD_TX_UNLOCK(dev);
 
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;

Re: [PATCH] tehuti: driver for Tehuti 10GbE network adapters

2007-09-16 Thread Andrew Morton

erp, changes in the net-2.6.24 tree break this.

drivers/net/tehuti.c: In function 'bdx_isr_napi':
drivers/net/tehuti.c:268: error: too few arguments to function 'netif_rx_schedule_prep'
drivers/net/tehuti.c:269: error: too few arguments to function '__netif_rx_schedule'
drivers/net/tehuti.c: In function 'bdx_poll':
drivers/net/tehuti.c:302: error: too few arguments to function 'netif_rx_complete'
drivers/net/tehuti.c: In function 'bdx_hw_start':
drivers/net/tehuti.c:414: error: implicit declaration of function 'netif_poll_enable'
drivers/net/tehuti.c: In function 'bdx_hw_stop':
drivers/net/tehuti.c:428: error: implicit declaration of function 'netif_poll_disable'
drivers/net/tehuti.c: In function 'bdx_rx_receive':
drivers/net/tehuti.c:1219: error: 'struct net_device' has no member named 'quota'
drivers/net/tehuti.c:1219: warning: type defaults to 'int' in declaration of '_y'
drivers/net/tehuti.c:1219: error: 'struct net_device' has no member named 'quota'
drivers/net/tehuti.c:1311: error: 'struct net_device' has no member named 'quota'
drivers/net/tehuti.c: In function 'bdx_probe':
drivers/net/tehuti.c:1994: error: 'struct net_device' has no member named 'poll'
drivers/net/tehuti.c:1995: error: 'struct net_device' has no member named 'weight'
drivers/net/tehuti.c:2058: error: implicit declaration of function 'SET_MODULE_OWNER'

There's a lot of churn in networking at present and I don't have
time/inclination to fix this one up, sorry.

Re: ne driver crashes when unloaded in 2.6.22.6

2007-09-16 Thread Dan Williams

On Sat, 2007-09-15 at 21:27 +0100, Chris Rankin wrote:
 --- Dan Williams [EMAIL PROTECTED] wrote:
  On Wed, 2007-09-12 at 19:23 +0100, Chris Rankin wrote:
   Hmm, apparently not. The light on the card goes out though, so could
   this just be a lack of driver support?
  
  Likely, yes.
 
 I've been trawling the Internet for 8390 specifications and have
 discovered that there is a Carrier Sense Loss flag on the Transmit
 Status Register. However, there doesn't seem to be an explicit media
 status test. Would this more likely be part of the NE2000's
 functionality? I can't find any signs of MII support, but then the
 NE2000 is so heavily cloned that NE2000-compatible seems to have become
 more of a generic description these days.
 
 Does anyone have any ideas, please? Does NetworkManager even need full
 carrier-detection support?

NM needs it if you want the interface to be automatically handled in
0.6.x and earlier.  In 0.7.x and later you'll be able to have NM just
set up the interface even if it doesn't have a link (if you set it to
autoconnect), but of course that means that whenever you start NM the
interface will be up with the settings you specify because, of course,
NM can't automatically figure out when the card is up or down and do
something intelligent.

dan



Re: [PATCH|[NET]: migrate HARD_TX_LOCK to header file

2007-09-16 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 16:57:14 -0400

 On Sun, 2007-16-09 at 16:28 -0400, jamal wrote:

  sounds much better - will resend after a simple test.

 Ok, here's the revised version

Applied, thanks Jamal.

Re: [PATCH] IPV6: fix source address selection

2007-09-16 Thread David Miller

From: Jiri Kosina [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 00:56:00 +0200 (CEST)

 From: Jiri Kosina [EMAIL PROTECTED]

 [PATCH] IPV6: fix source address selection

 The commit 95c385 broke proper source address selection for cases in which 
 there is an address which is marked 'deprecated'. The commit mistakenly 
 changed ifa->flags to ifa_result->flags (probably copy/paste error from a 
 few lines above) in the 'Rule 3' address selection code.

 The patch below restores the previous RFC-compliant behavior, please 
 apply.

 Cc: Jiri Bohac [EMAIL PROTECTED]
 Cc: Petr Baudis [EMAIL PROTECTED]
 Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

Excellent catch Jiri.

I'll apply this and push to -stable as well.

Thanks a lot!
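
For readers unfamiliar with the rule being discussed: Rule 3 of the
source address selection algorithm prefers a non-deprecated address over
a deprecated one, so the comparison has to look at the flags of the
candidate being examined, not only at those of the best match so far.
Below is a minimal user-space sketch of that comparison. The struct, the
flag name and value, and rule3_compare() are hypothetical names chosen
for illustration; this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag; the name and value are assumptions for this sketch. */
#define ADDR_F_DEPRECATED 0x20u

/* Hypothetical candidate address record. */
struct cand_addr {
	unsigned int flags;
};

/*
 * Rule 3 (avoid deprecated addresses): returns > 0 if 'cand' should
 * replace 'best', < 0 if 'best' should be kept, and 0 if the rule does
 * not decide and later rules must be consulted.
 */
int rule3_compare(const struct cand_addr *cand, const struct cand_addr *best)
{
	int cand_dep, best_dep;

	assert(cand != NULL && best != NULL);
	cand_dep = (cand->flags & ADDR_F_DEPRECATED) != 0;
	best_dep = (best->flags & ADDR_F_DEPRECATED) != 0;
	return best_dep - cand_dep;
}
```

Testing only the flags of the current best match, as the regression
described above did, would make the candidate's own deprecation state
invisible to the rule.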

Re: [PATCH] net: Fix the prototype of call_netdevice_notifiers

2007-09-16 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 13 Sep 2007 09:59:05 -0600

 This replaces the void * parameter with a struct net_device * which
 is what is actually required.

 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Eric.

Re: Network Namespace status

2007-09-16 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 13 Sep 2007 13:12:08 -0600

 The final blocker to having multiple useful instances of network
 namespaces is the loopback device.  We recognize the network namespace
 of incoming packets by looking at dev->nd_net.  Which means for
 packets to properly loopback within a network namespace we need a
 loopback device per network namespace.  There were some concerns
 expressed when we posted the cleanup part of the patches that allowed
 for multiple loopback devices a few weeks ago so resolving this one
 may be tricky.

There was a change posted recently to dynamically allocate the
loopback device.  I like that (sorry I don't have a reference
to the patch handy), and you can build on top of that to get
the namespace local loopback objects you want.

static struct net_device *loopback_dev(struct net_namespace *net)
{
...
}

You get the idea.

Re: [v3 PATCH 0/2] Add RCU locking to SCTPaddress management

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 15:34:35 -0400

 Thanks to Sridhar Samudral and Paul McKenney for all the help and comments.
 I think this is a final version, unless someone else can spot more problems.
 I've run this under heavy load and the patches behave well.

 I think patch 1 is a candidate for 2.6.23 since it fixes a bug, but splitting
 these seems a bit odd to me.  I'll leave it to DaveM to decide where to
 put them.

Since you tested this well, I've decided to put both of these
patches into net-2.6.

I agree it's stupid to split them up.

There'll be some merge hassles when I rebase net-2.6.24, but
that tree is such a monster that this is inevitable for every
bug fix I queue up for 2.6.23 :-)


Re: [PATCH for 2.6.24] SCTP: Move sysctl_sctp_[rw]mem definitions to protocol.c

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 17:03:45 -0400

 The sctp_[rw]mem definitions should really be in protocol.c
 since that is where they are initialized.  This also allows
 one to build a kernel without sysctl support.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Vlad.

Re: [PATCH 1/7] [PPP] pppoe: Fix skb_unshare_check call position

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:08:49 +0800

 [PPP] pppoe: Fix skb_unshare_check call position

 The skb_unshare_check call needs to be made before pskb_may_pull,
 not after.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Patch applied, thanks Herbert.

Re: [PATCH 2/7] [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:04 +0800

 [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value

 The function __pppoe_xmit modifies the skb data and therefore it needs
 to copy the skb data if it's cloned.

 In fact, it currently allocates a new skb so that it can return 0 in
 case of error without freeing the original skb.  This is totally wrong
 because returning zero is meant to indicate congestion whereupon pppoe
 is supposed to wake up the upper layer once the congestion subsides.

 This makes sense for ppp_async and ppp_sync but is out-of-place for
 pppoe.  This patch makes it always return 1 and free the skb.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.

Re: [PATCH 3/7] [PPP] pppoe: Fill in header directly in __pppoe_xmit

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:12 +0800

 [PPP] pppoe: Fill in header directly in __pppoe_xmit

 This patch removes the hdr variable (which is copied into the skb)
 and instead sets the header directly in the skb.

 It also uses __skb_push instead of skb_push since we've just checked
 using skb_cow for enough head room.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.

Re: [PATCH 4/7] [BRIDGE]: Kill clone argument to br_flood_*

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:14 +0800

 [BRIDGE]: Kill clone argument to br_flood_*

 The clone argument is only used by one caller and that caller can clone
 the packet itself.  This patch moves the clone call into the caller and
 kills the clone argument.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.

Re: [PATCH 5/7] [NET] skbuff: Add skb_cow_head

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:15 +0800

 [NET] skbuff: Add skb_cow_head

 This patch adds an optimised version of skb_cow that avoids the copy if
 the header can be modified even if the rest of the payload is cloned.

 This can be used in encapsulating paths where we only need to modify the
 header.  As it is, this can be used in PPPOE and bridging.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.

Re: [PATCH 7/7] [PPP] generic: Fix receive path data clobbering non-linear handling

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:17 +0800

 [PPP] generic: Fix receive path data clobbering & non-linear handling

 This patch adds missing pskb_may_pull calls to deal with non-linear
 packets that may arrive from pppoe or pppol2tp.

 It also copies cloned packets before writing over them.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.

Re: [NETLINK]: Avoid pointer in netlink_run_queue

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 20:09:30 +0800

 Hi Dave:

 [NETLINK]: Avoid pointer in netlink_run_queue

 I was looking at Patrick's fix to inet_diag and it occurred
 to me that we're using a pointer argument to return values
 unnecessarily in netlink_run_queue.  Changing it to return
 the value will allow the compiler to generate better code
 since the value won't have to be memory-backed.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Herbert.
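
The difference Herbert describes can be shown with a small user-space
sketch. The two functions below are hypothetical stand-ins, not the
netlink code: one reports its result through a pointer argument, the
other simply returns it. With the pointer form the caller's variable
must have an address, which can force it into memory; with the return
form the value is free to stay in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Out-parameter style: the result is written through a pointer. */
void count_pending_ptr(const int *queue, size_t len, unsigned int *qlen)
{
	size_t i;

	assert(qlen != NULL);
	*qlen = 0;
	for (i = 0; i < len; i++)
		if (queue[i] != 0)
			(*qlen)++;
}

/* Return-value style: the result never needs to be memory-backed. */
unsigned int count_pending_ret(const int *queue, size_t len)
{
	unsigned int qlen = 0;
	size_t i;

	for (i = 0; i < len; i++)
		if (queue[i] != 0)
			qlen++;
	return qlen;
}
```

Both variants compute the same count; only the calling convention
differs, and whether the compiler actually produces better code depends
on inlining and the surrounding caller.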

Re: [SKBUFF]: Fix up csum_start when head room changes

2007-09-16 Thread David Miller

From: Herbert Xu [EMAIL PROTECTED]
Date: Sat, 1 Sep 2007 09:13:33 +0800

 Hi Dave:

 [SKBUFF]: Fix up csum_start when head room changes

 Thanks for noticing the bug where csum_start is not updated
 when the head room changes.

 This patch fixes that.  It also moves the csum/ip_summed
 copying into copy_skb_header so that skb_copy_expand gets
 it too.  I've checked its callers and no one should be upset
 by this.

 Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Herbert, thanks for following up on this.

Although this is technically a bug fix we don't have anyone
explicitly triggering this and I don't feel comfortable pushing this
into net-2.6 without a reported failure case right now.

So I applied it to net-2.6.24 for now.

If you disagree, plead your case :-)


Re: [PATCH 3/3] netlink: use a statically allocated nl_table instead

2007-09-16 Thread David Miller

From: Denis Cheng [EMAIL PROTECTED]
Date: Sun,  2 Sep 2007 03:45:59 +0800

 if the table is always fixed size with MAX_LINKS entries, why not use a
 statically allocated table straightforwardly?

 Signed-off-by: Denis Cheng [EMAIL PROTECTED]

I made the explicit decision to dynamically allocate because
many systems have limits on how large the kernel image can
be and therefore the less we statically allocate huge tables
(constant size or not) the better.

Lockdep is the worst offender, for example, it's completely awful.  It
consumes 4MB of kernel BSS space when enabled on a 64-bit platform.

Patch not applied.
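
The trade-off can be sketched in user space as follows. This is only an
illustration under assumptions: the entry layout is hypothetical, the
value of MAX_LINKS is assumed, and calloc()/free() stand in for the
kernel allocator. The point is that the table's storage is obtained at
init time rather than being part of the image's static data.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LINKS 32	/* assumed table size for this sketch */

/* Hypothetical table entry; the real nl_table entry has more fields. */
struct table_entry {
	void *listeners;
	unsigned int registered;
};

/*
 * Static alternative (what the rejected patch proposed):
 *	struct table_entry nl_table[MAX_LINKS];
 * That reserves the storage for the whole lifetime of the image.
 */
struct table_entry *nl_table;

/* Allocate the zeroed table at init time; returns 0, or -1 on failure. */
int table_init(void)
{
	assert(nl_table == NULL);
	nl_table = calloc(MAX_LINKS, sizeof(*nl_table));
	return nl_table ? 0 : -1;
}

/* Release the table again. */
void table_exit(void)
{
	free(nl_table);
	nl_table = NULL;
}
```

The cost of the dynamic form is an extra failure path at init time,
which the static form does not have.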

Re: [PATCH 1/2] net/: all net/ cleanup with ARRAY_SIZE

2007-09-16 Thread David Miller

From: Denis Cheng [EMAIL PROTECTED]
Date: Sun,  2 Sep 2007 18:30:17 +0800

 Signed-off-by: Denis Cheng [EMAIL PROTECTED]

You already submitted the net/ipv4/af_inet.c case
separately, so I had to remove it from this patch for
it to apply properly.

Please keep your patches straight to avoid problems
like this.

Thanks.
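
For context, the cleanup in this thread replaces open-coded
sizeof(x)/sizeof(x[0]) expressions with ARRAY_SIZE(). A user-space
sketch of the idiom follows; the macro here is a simplified stand-in
(the kernel's version also rejects pointers at compile time), and the
table and function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE(). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table, for illustration only. */
const int example_protos[] = { 1, 6, 17, 132 };

/* Sum the table without hard-coding its length. */
int sum_example_protos(void)
{
	int sum = 0;
	size_t i;

	assert(ARRAY_SIZE(example_protos) > 0);
	for (i = 0; i < ARRAY_SIZE(example_protos); i++)
		sum += example_protos[i];
	return sum;
}
```

Adding or removing an entry changes the loop bound automatically, which
is the reason such cleanups are worth doing.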

Re: Network Namespace status

2007-09-16 Thread Eric W. Biederman

David Miller [EMAIL PROTECTED] writes:

 From: [EMAIL PROTECTED] (Eric W. Biederman)
 Date: Thu, 13 Sep 2007 13:12:08 -0600

 The final blocker to having multiple useful instances of network
 namespaces is the loopback device.  We recognize the network namespace
 of incoming packets by looking at dev->nd_net.  Which means for
 packets to properly loopback within a network namespace we need a
 loopback device per network namespace.  There were some concerns
 expressed when we posted the cleanup part of the patches that allowed
 for multiple loopback devices a few weeks ago so resolving this one
 may be tricky.

 There was a change posted recently to dynamically allocate the
 loopback device.  I like that (sorry I don't have a reference
 to the patch handy), and you can build on top of that to get
 the namespace local loopback objects you want.

 static struct net_device *loopback_dev(struct net_namespace *net)
 {
   ...
 }

 You get the idea.

Sure.  Thanks.

Since the change got dropped I figured it for a rejection, and that
I would have to rework that patch.

On a similar note. It recently occurred to me that I can make creating
multiple network namespaces depend on !CONFIG_SYSFS.  Which will allow
most of the rest of the patches I am sure of to be merged now.  And
give me just a little more time to work with Tejun and finish up the
sysfs support.

Eric


Re: [PATCH][NETNS] Use list_for_each_entry_continue_reverse in setup_net

2007-09-16 Thread David Miller

From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 22:07:14 +0200

 Could we just make it so dev->init is not allowed to fail? Then it
 can be a void function and the nasty unwind code can go?

Someone (not me :-) needs to do an audit to find all current
users of this function and determine if they all can live
without returning errors.

If so, sure let's make the change and simplify things.

Re: [PATCH] Add ICMPMsgStats MIB (RFC 4293) [rev 2]

2007-09-16 Thread David Miller

From: David Stevens [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 15:25:32 -0600

 Background: RFC 4293 deprecates existing individual, named ICMP
 type counters to be replaced with the ICMPMsgStatsTable. This table
 includes entries for both IPv4 and IPv6, and requires counting of all
 ICMP types, whether or not the machine implements the type.

 These patches remove (but not really) the existing counters, and
 replace them with the ICMPMsgStats tables for v4 and v6.
 It includes the named counters in the /proc places they were, but gets the
 values for them from the new tables. It also counts packets generated
 from raw socket output (e.g., OutEchoes, MLD queries, RA's from
 radvd, etc).

 Changes:
 1) create icmpmsg_statistics mib
 2) create icmpv6msg_statistics mib
 3) modify existing counters to use these
 4) modify /proc/net/snmp to add IcmpMsg with all ICMP types
 listed by number for easy SNMP parsing
 5) modify /proc/net/snmp printing for Icmp to get the named data
 from new counters.
 [new to 2nd revision]
 6) support per-interface ICMP stats
 7) use common macro for per-device stat macros

 IPv6 patch attached.

 +-DLS

 Signed-off-by: David L Stevens [EMAIL PROTECTED]

No objections, so patch applied to net-2.6.24

The following is not directed at this patch specifically, but rather
in general.

All of these crappy idev == NULL checks for nearly EVERY SINGLE ipv6
counter bump has gotten _WAY_ out of control.  By definition this
whole situation is broken if we need to test the thing basically
everywhere.

And it's the worst kind of disease because it's hidden inside all
kinds of macros so when you're reading the code you don't see this
nearly constant overhead spread all over the ipv6 stack in the most
critical paths we have.

How many remote OOPS'er DoS bugs have we had in ipv6 because of how
this stuff works?  I can remember at least 3, and that's 3 too many.

We need to fix this, and I don't care how, such that idev is never
NULL and at least points to some dummy ipv6 idev object.  And it
must be done in such a way that the cure is not worse than the
disease :-)
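
One way to read this suggestion is as a "dummy object" arrangement:
every device always points at some idev, and devices without a real one
share a dummy whose counters are simply discarded. The user-space sketch
below illustrates only that idea. All struct and function names are
hypothetical, it ignores locking and per-CPU counters, and it is not a
proposal for how the kernel should implement it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct dev_stats { unsigned long in_msgs; };
struct inet_dev  { struct dev_stats stats; };
struct net_dev   { struct inet_dev *idev; };

/* Shared sink used when a device has no real idev. */
struct inet_dev dummy_idev;

/* Attach an idev to a device; the pointer is never left NULL. */
void net_dev_attach(struct net_dev *dev, struct inet_dev *idev)
{
	assert(dev != NULL);
	dev->idev = idev ? idev : &dummy_idev;
}

/* Counter bump with no NULL test on the hot path. */
void count_in_msg(struct net_dev *dev)
{
	dev->idev->stats.in_msgs++;
}
```

The NULL test moves from every counter bump to the single attach point,
at the price of a shared object whose statistics are meaningless.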

Re: [PATCH][NETNS] Use list_for_each_entry_continue_reverse in setup_net

2007-09-16 Thread Eric W. Biederman

David Miller [EMAIL PROTECTED] writes:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Fri, 14 Sep 2007 22:07:14 +0200

 Could we just make it so dev->init is not allowed to fail? Then it
 can be a void function and the nasty unwind code can go?

 Someone (not me :-) needs to do an audit to find all current
 users of this function and determine if they all can live
 without returning errors.

 If so, sure let's make the change and simplify things.

I did that audit when I replied to Stephen the first time and I just
redid it to verify myself.  We are calling functions that can fail
from the init function (kmalloc is the most common).  So the
init function can fail.

So short of adding a bunch of BUG_ON's to the kernel to trap those
failure cases we can't remove the backwards list walk.  Especially
since I can initiate this code path as root by calling 
clone(CLONE_NEWNET...).

Eric
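
The backwards walk under discussion follows a common pattern: run the
init hooks in order and, if one fails, call the exit hooks of the ones
that already succeeded, newest first. A user-space sketch under
assumptions follows; it uses an array instead of a linked list, and
struct ns_ops, setup_all() and the three helper hooks are hypothetical
names, not the setup_net() code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-namespace operations entry. */
struct ns_ops {
	int (*init)(void);	/* may fail, e.g. when an allocation fails */
	void (*exit)(void);
};

int live_count;		/* number of currently initialised entries */

/* Example hooks used to exercise the unwind path. */
int ok_init(void)   { live_count++; return 0; }
int fail_init(void) { return -ENOMEM; }
void ok_exit(void)  { live_count--; }

/* Initialise all entries, or none: on failure, undo in reverse order. */
int setup_all(const struct ns_ops *table, size_t n)
{
	size_t i;
	int err = 0;

	assert(table != NULL || n == 0);
	for (i = 0; i < n; i++) {
		err = table[i].init();
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	/* Entry i failed and needs no cleanup; undo entries i-1 .. 0. */
	while (i-- > 0)
		table[i].exit();
	return err;
}
```

As long as any init hook can return an error, some form of this reverse
walk is needed to avoid leaking the earlier entries.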

Re: [PATCH][NETNS] Use list_for_each_entry_continue_reverse in setup_net

2007-09-16 Thread David Miller

From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sun, 16 Sep 2007 18:06:00 -0600

 I did that audit when I replied to Stephen the first time and I just
 redid it to verify myself.  We are calling functions that can fail
 from the init function (kmalloc is the most common).  So the
 init function can fail.

 So short of adding a bunch of BUG_ON's to the kernel to trap those
 failure cases we can't remove the backwards list walk.  Especially
 since I can initiate this code path as root by calling 
 clone(CLONE_NEWNET...).

I just noticed that posting and thanks for reiterating.

Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 16:17 -0700, David Miller wrote:

 The only major complaint I have about this patch series is that
 the IPoIB part should just be one big changeset. 

Dave, you do realize that i have been investing my time working on
batching as well, right? 

cheers,
jamal



Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 20:29:18 -0400

 On Sun, 2007-16-09 at 16:17 -0700, David Miller wrote:

  The only major complaint I have about this patch series is that
  the IPoIB part should just be one big changeset. 

 Dave, you do realize that i have been investing my time working on
 batching as well, right? 

I do.

And I'm reviewing and applying several hundred patches a day.

What's the point? :-)

Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 18:02 -0700, David Miller wrote:

 I do.
 
 And I'm reviewing and applying several hundred patches a day.
 
 What's the point? :-)

Reading the commentary made me think you were about to swallow that with
one more change by the time i wake up;-
I still think this work - despite my vested interest - needs more
scrutiny from a performance perspective.
I tend to send a url to my work, but it may be time to start posting
patches.

cheers,
jamal


Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 22:14:21 -0400

 I still think this work - despite my vested interest - needs more
 scrutiny from a performance perspective.

Absolutely.

There are tertiary issues I'm personally interested in, for example
how well this stuff works when we enable software GSO on a non-TSO
capable card.

In such a case the GSO segment should be split right before we hit the
driver and then all the sub-segments of the original GSO frame batched
in one shot down to the device driver.

In this way you'll get a large chunk of the benefit of TSO without
explicit hardware support for the feature.

There are several cards (some even 10GB) that will benefit immensely
from this.
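
A minimal sketch of the idea described above, in user-space C rather than kernel code: a large payload is cut into MSS-sized segments immediately before the driver, and the whole set is handed over in a single call. The names `struct segment`, `gso_split()` and `driver_xmit_batch()` are hypothetical and do not correspond to any kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one sub-segment handed to a driver. */
struct segment {
	size_t offset;	/* start of this segment within the original payload */
	size_t len;	/* bytes carried by this segment */
};

/*
 * Split a payload of total_len bytes into segments of at most mss bytes.
 * Returns the number of segments written to out[], or 0 if they would
 * not fit in max_segs entries.
 */
size_t gso_split(size_t total_len, size_t mss,
		 struct segment *out, size_t max_segs)
{
	size_t n = 0, off = 0;

	assert(mss > 0);
	while (off < total_len) {
		size_t len = total_len - off;

		if (len > mss)
			len = mss;
		if (n == max_segs)
			return 0;
		out[n].offset = off;
		out[n].len = len;
		n++;
		off += len;
	}
	return n;
}

/*
 * Hypothetical batched transmit: the driver sees every sub-segment in
 * one call instead of one call per segment. Returns bytes accepted.
 */
size_t driver_xmit_batch(const struct segment *segs, size_t n)
{
	size_t bytes = 0;

	for (size_t i = 0; i < n; i++)
		bytes += segs[i].len;
	return bytes;
}
```

With a 4000-byte payload and a 1448-byte MSS, `gso_split()` yields three segments, and the driver entry point is invoked once for all of them.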

Re: [PATCH 1/8] SCTP: protocol definitions for SCTP-AUTH implementation

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:52 -0400

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24

Re: [PATCH 2/8] SCTP: Implement SCTP-AUTH internals

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:53 -0400

 This patch implements the internal operations of the AUTH, such as
 key computation and storage.  It also adds necessary variables to
 the SCTP data structures.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, with lots of trailing whitespace fixed.

Please check your patches with GIT by using something such
as git apply --check --whitespace=error-all foo.diff
in the future, and you'll see stuff like this:

Adds trailing whitespace.
diff:696:   
Adds trailing whitespace.
diff:732:   return secret; 
Adds trailing whitespace.
diff:805:   
Adds trailing whitespace.
diff:815:/* 
Adds trailing whitespace.
diff:1034:  break; 
Adds trailing whitespace.
diff:1098:  
Adds trailing whitespace.
diff:1109:  
fatal: 7 lines add trailing whitespaces.

Thanks.
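
For readers who want to see what is being flagged, here is a rough sketch of the trailing-whitespace test in C. It only approximates the single complaint shown in the output above and is not git's implementation; `has_trailing_whitespace()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the line ends in a space or tab (ignoring the line
 * terminator), 0 otherwise. Simplified: git's whitespace rules cover
 * more cases than this one.
 */
int has_trailing_whitespace(const char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);

	/* Drop a trailing "\n" or "\r\n" before inspecting the last byte. */
	while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
		len--;

	return len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t');
}
```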

Re: [PATCH 3/8] SCTP: Implement SCTP-AUTH initializations.

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:54 -0400

 The patch initializes AUTH related members of the generic SCTP
 structures and provides a way to enable/disable auth extension.
 
 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24

Re: [PATCH 4/8] SCTP: Implement SCTP-AUTH parameter processing

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:55 -0400

 Implement processing for the CHUNKS, RANDOM, and HMAC parameters and
 deal with how these parameters are affected by association restarts.
 In particular, during unexpected INIT processing, we need to reply with
 parameters from the original INIT chunk.  Also, after restart, we need
 to update the old association with new peer parameters and change the
 association shared keys.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Re: [PATCH 5/8] SCTP: Enable the sending of the AUTH chunk.

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:56 -0400

 SCTP-AUTH, Section 6.2:

Endpoints MUST send all requested chunks authenticated where this has
been requested by the peer.  The other chunks MAY be sent
authenticated or not.  If endpoint pair shared keys are used, one of
them MUST be selected for authentication.

To send chunks in an authenticated way, the sender MUST include these
chunks after an AUTH chunk.  This means that a sender MUST bundle
chunks in order to authenticate them.

If the endpoint has no endpoint pair shared key for the peer, it MUST
use Shared Key Identifier 0 with an empty endpoint pair shared key.
If there are multiple endpoint shared keys the sender selects one and
uses the corresponding Shared Key Identifier

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Re: [PATCH 6/8] SCTP: Implement the receive and verification of AUTH chunk

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:57 -0400

 This patch implements the receive path needed to process authenticated
 chunks.  Add ability to process the AUTH chunk and handle edge cases
 for authenticated COOKIE-ECHO as well.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Re: [PATCH 7/8] SCTP: API updates to support SCTP-AUTH extensions.

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:58 -0400

 Add SCTP-AUTH API.  The API implemented here was
 agreed to between implementors at the 9th SCTP Interop.
 It will be documented in the next revision of the
 SCTP socket API spec.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Re: [PATCH 8/8] SCTP: Tie ADD-IP and AUTH functionality as required by spec.

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:59 -0400

 ADD-IP spec requires AUTH. It is, in fact, dangerous without AUTH.
 So, disable ADD-IP functionality if the peer claims to support
 ADD-IP, but not AUTH.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.

Re: [v2 PATCH 8/8] SCTP: Tie ADD-IP and AUTH functionality as required by spec.

2007-09-16 Thread David Miller

From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 15:14:50 -0400

 [.. forgot to refresh the patch, the other version has compile problems ..]

 ADD-IP spec requires AUTH. It is, in fact, dangerous without AUTH.
 So, disable ADD-IP functionality if the peer claims to support
 ADD-IP, but not AUTH.

 Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Aha, I caught this and applied the correct patch.

Re: [PATCH 3/3] Blackfin EMAC driver: Add phyabstraction layer supporting in bfin_emac driver

2007-09-16 Thread Robin Getz

On Sat 15 Sep 2007 22:57, Bryan Wu pondered:
 
  - add MDIO functions and register mdio bus
  - add phy abstraction layer (PAL) functions and use PAL API
  - test on STAMP537 board

Today, the Kconfig for the Blackfin just includes:

 config BFIN_MAC
 tristate "Blackfin 536/537 on-chip mac support"
 depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
 select CRC32
 select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
 help
   This is the driver for blackfin on-chip mac device. Say Y if you
 want it compiled into the kernel. This driver is also available as a module
 ( = code which can be inserted in and removed from the running kernel
 whenever you want). The module will be called bfin_mac.

Since you are adding requirement for the PHYLIB with this patch, should there
be a select for that?

-Robin

Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread jamal

On Sun, 2007-16-09 at 19:25 -0700, David Miller wrote:

 There are tertiary issues I'm personally interested in, for example
 how well this stuff works when we enable software GSO on a non-TSO
 capable card.
 
 In such a case the GSO segment should be split right before we hit the
 driver and then all the sub-segments of the original GSO frame batched
 in one shot down to the device driver.

I think GSO is still useful on top of this.
In my patches anything with gso gets put into the batch list and shot
down the driver. I've never considered checking whether the nic is TSO
capable, that may be something worth checking into. The netiron allows
you to shove up to 128 skbs utilizing one tx descriptor, which makes for
interesting possibilities.

 In this way you'll get a large chunk of the benefit of TSO without
 explicit hardware support for the feature.
 
 There are several cards (some even 10GB) that will benefit immensely
 from this.

indeed - I've always wondered if batching this way would make the NICs
behave differently from the way TSO does.

On a side note: My observation is that with large packets on a very busy
system; bulk transfer type app, one approaches wire speed; with or
without batching, the apps are mostly idling (I've seen up to 90% idle
time polling at the socket level for write to complete with a really
busy system). This is the case with or without batching. cpu seems a
little better with batching. As the aggregation of the apps gets more
aggressive (achievable by reducing their packet sizes), one can achieve
improved throughput and reduced cpu utilization. This all with UDP; i am
still studying tcp.
 
cheers,
jamal


Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread David Miller

From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 23:01:43 -0400

 I think GSO is still useful on top of this.
 In my patches anything with gso gets put into the batch list and shot
 down the driver. Ive never considered checking whether the nic is TSO
 capable, that may be something worth checking into. The netiron allows
 you to shove upto 128 skbs utilizing one tx descriptor, which makes for
 interesting possibilities.

We're talking past each other, but I'm happy to hear that for
sure your code does the right thing :-)

Right now only TSO capable hardware sets the TSO capable bit,
except perhaps for the XEN netfront driver.

What Herbert and I want to do is basically turn on TSO for
devices that can't do it in hardware, and rely upon the GSO
framework to do the segmenting in software right before we
hit the device.

This only makes sense for devices which can 1) scatter-gather
and 2) checksum on transmit.  Otherwise we make too many
copies and/or passes over the data.

And we can only get the full benefit if we can pass all the
sub-segments down to the driver in one ->hard_start_xmit()
call.

 On a side note: My observation is that with large packets on a very busy
 system; bulk transfer type app, one approaches wire speed; with or
 without batching, the apps are mostly idling (Ive seen upto 90% idle
 time polling at the socket level for write to complete with a really
 busy system). This is the case with or without batching. cpu seems a
 little better with batching. As the aggregation of the apps gets more
 aggressive (achievable by reducing their packet sizes), one can achieve
 improved throughput and reduced cpu utilization. This all with UDP; i am
 still studying tcp. 

UDP apps spraying data tend to naturally batch well and load balance
amongst themselves because each socket fills up to its socket send
buffer limit, then sleeps, and we then get a stream from the next UDP
socket up to its limit, and so on and so forth.

UDP is too easy a test case in fact :-)

Re: [PATCH 3/3] Blackfin EMAC driver: Add phyabstraction layer supporting in bfin_emac driver

2007-09-16 Thread Bryan Wu

On Sun, 2007-09-16 at 22:51 -0400, Robin Getz wrote:
 On Sat 15 Sep 2007 22:57, Bryan Wu pondered:
  
   - add MDIO functions and register mdio bus
   - add phy abstraction layer (PAL) functions and use PAL API
   - test on STAMP537 board
 
 Today, the Kconfig for the Blackfin just includes:
 
  config BFIN_MAC
  tristate "Blackfin 536/537 on-chip mac support"
  depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
  select CRC32
  select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
  help
This is the driver for blackfin on-chip mac device. Say Y if you
  want it compiled into the kernel. This driver is also available as a module
  ( = code which can be inserted in and removed from the running kernel
  whenever you want). The module will be called bfin_mac.
 
 Since you are adding requirement for the PHYLIB with this patch, should there
 be a select for that?
 
 -Robin

OK, I will send a patch for this update, since some people failed to
compile the kernel without selecting PHYLIB.

Thanks
-Bryan Wu

net-2.6.24 plans

2007-09-16 Thread David Miller


Most if not all of my 2 week backlog of patches is in the net-2.6.24
and net-2.6 tree now.  And any relevant -stable fixes will be
submitted in the next day or two.

Tomorrow (Monday) I want to rebase the net-2.6.24 tree one more time
to deal with all of the conflicts which exist between
linux-2.6/net-2.6 and net-2.6.24, but I'll likely defer that
until the net-2.6 fixes I just pushed to Linus are integrated.

It's to the point where every single bug fix put into Linus's tree
creates a merge conflict with net-2.6.24, we are simply touching that
much stuff. :-)

I expect some small network namespace fixes from Eric B., but that's
basically it as far as 2.6.24 is concerned.  Oh yes, there are also
the MAC_FMT/MAC_ARG bits from Joe Perches that I need to do a merge
of.

The transmit batching stuff needs a lot more analysis and discussion,
so I definitely see that stuff as 2.6.25 material.  I think if we can
avoid a food fight between Jamal and Mr. Kumar and have healthy
discussions, we can end up with a really nice implementation.  So
everyone put your boxing gloves away and let's get at it. :-)

We've touched so much in net-2.6.24 that we really should be auditing
the thing and fixing any bugs that have been added.  If you're bored
and looking for something to do, pick an odd NAPI driver and audit it
in the net-2.6.24 tree.

[PATCH] Blackfin EMAC driver: add a select for the PHYLIB of this driver

2007-09-16 Thread Bryan Wu

Since we are adding requirement for the PHYLIB for this driver, there should be 
a select for that

Cc: Robin Getz [EMAIL PROTECTED]
Signed-off-by: Bryan Wu [EMAIL PROTECTED]
---
 drivers/net/Kconfig |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 5b9e17b..5eef224 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -843,6 +843,8 @@ config BFIN_MAC
 	tristate "Blackfin 536/537 on-chip mac support"
 	depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
 	select CRC32
+	select MII
+	select PHYLIB
 	select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
 	help
 	  This is the driver for blackfin on-chip mac device. Say Y if you want it
-- 
1.5.2


Re: [PATCH 3/10 REV5] [sched] Modify qdisc_run to support batching

2007-09-16 Thread Krishna Kumar2

Hi Evgeniy,

Evgeniy Polyakov [EMAIL PROTECTED] wrote on 09/14/2007 05:45:19 PM:

  +  if (skb->next) {
  +     int count = 0;
  +
  +     do {
  +        struct sk_buff *nskb = skb->next;
  +
  +        skb->next = nskb->next;
  +        __skb_queue_tail(dev->skb_blist, nskb);
  +        count++;
  +     } while (skb->next);

 Could it be list_move()-like function for skb lists?
 I'm pretty sure if you change first and the last skbs and ke of the
 queue in one shot, result will be the same.

I have to do a bit more like update count, etc, but I agree it is do-able.
I had mentioned in my PATCH 0/10 that I will later try this suggestion
that you provided last time.
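
To make the two approaches concrete, here is a user-space sketch under simplifying assumptions: buffers are singly linked, and the queue type is a hypothetical stand-in for `struct sk_buff_head`. The first function mirrors the per-element loop quoted above; the second shows the splice-style alternative, which is only O(1) if the caller already knows the last element and the element count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff and sk_buff_head. */
struct buf {
	struct buf *next;
	int id;
};

struct buf_queue {
	struct buf *head, *tail;
	unsigned int qlen;
};

void queue_init(struct buf_queue *q)
{
	q->head = q->tail = NULL;
	q->qlen = 0;
}

/* Detach every buffer chained after 'head' and append it to 'q'. */
unsigned int move_chain_one_by_one(struct buf *head, struct buf_queue *q)
{
	unsigned int count = 0;

	while (head->next) {
		struct buf *n = head->next;

		head->next = n->next;
		n->next = NULL;
		if (q->tail)
			q->tail->next = n;
		else
			q->head = n;
		q->tail = n;
		q->qlen++;
		count++;
	}
	return count;
}

/*
 * Splice variant: relink the whole chain in one step. The caller must
 * supply the last element and the number of elements in the chain.
 */
void splice_chain(struct buf *head, struct buf *last, unsigned int count,
		  struct buf_queue *q)
{
	struct buf *first = head->next;

	if (!first)
		return;
	assert(last->next == NULL);
	head->next = NULL;
	if (q->tail)
		q->tail->next = first;
	else
		q->head = first;
	q->tail = last;
	q->qlen += count;
}
```

Both variants leave `head` itself outside the queue with an empty chain, and the count still has to be tracked by the caller in the splice case.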

 Actually how many skbs are usually batched in your load?

It depends, e.g. when the tx lock is not got, I get batching of up to 8-10
skbs (assuming that tx lock was not got quite a few times). But when the
queue gets blocked, I have seen batching up to 4K skbs (if tx_queue_len
is 4K).

  + /* Reset destructor for kfree_skb to work */
  + skb->destructor = DEV_GSO_CB(skb)->destructor;
  + kfree_skb(skb);

 Why do you free first skb in the chain?

This is the gso code which has segmented 'skb' to skb'1-n', and those
skb'1-n' are sent out and freed by driver, which means the dummy 'skb'
(without any data) remains to be freed.

Thanks,

- KK


Re: [PATCH 2/10 REV5] [core] Add skb_blist support for batching

2007-09-16 Thread Krishna Kumar2

Hi Evgeniy,

Evgeniy Polyakov [EMAIL PROTECTED] wrote on 09/14/2007 06:16:38 PM:

  +   if (dev->features & NETIF_F_BATCH_SKBS) {
  +      /* Driver supports batching skb */
  +      dev->skb_blist = kmalloc(sizeof *dev->skb_blist, GFP_KERNEL);
  +      if (dev->skb_blist)
  +         skb_queue_head_init(dev->skb_blist);
  +   }
  +

 A nitpick is that you should use sizeof(struct ...) and I think it
 requires flag clearing in case of failed initialization?

I thought it is better to use *var name in case the name of the structure
changes. Also, the flag is not cleared since I could try to enable batching
later, and it could succeed at that time. When skb_blist is allocated, then
batching is enabled otherwise it is disabled (while features flag just
indicates that driver supports batching).
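
The design described here can be sketched in user-space C as follows. The feature bit only says the driver is capable, while the allocated list pointer says batching is currently on, so a failed allocation can simply be retried later. All names (`FEATURE_BATCH`, `struct device`, `device_try_enable_batching()`) are hypothetical, and `malloc()` stands in for `kmalloc()`.

```c
#include <assert.h>
#include <stdlib.h>

#define FEATURE_BATCH 0x1u	/* hypothetical "driver can batch" bit */

struct batch_list {
	void *head, *tail;
	unsigned int qlen;
};

struct device {
	unsigned int features;		/* capability flags, never cleared here */
	struct batch_list *blist;	/* non-NULL means batching is enabled */
};

/* Batching is on exactly when the list has been allocated. */
int device_batching_enabled(const struct device *dev)
{
	return dev->blist != NULL;
}

/*
 * Try to enable batching. Returns 1 if batching is enabled afterwards.
 * On allocation failure the capability flag is left alone, so the
 * caller may try again later.
 */
int device_try_enable_batching(struct device *dev)
{
	if (!(dev->features & FEATURE_BATCH))
		return 0;

	if (!dev->blist) {
		/* sizeof *ptr keeps working if the struct is renamed. */
		dev->blist = malloc(sizeof *dev->blist);
		if (dev->blist) {
			dev->blist->head = dev->blist->tail = NULL;
			dev->blist->qlen = 0;
		}
	}
	return device_batching_enabled(dev);
}
```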

Thanks,

- KK


Re: [PATCH 10/10 REV5] [E1000] Implement batching

2007-09-16 Thread Krishna Kumar2

Hi Evgeniy,

Evgeniy Polyakov [EMAIL PROTECTED] wrote on 09/14/2007 06:17:14 PM:

  if (unlikely(skb->len <= 0)) {
     dev_kfree_skb_any(skb);
  -  return NETDEV_TX_OK;
  +  return NETDEV_TX_DROPPED;
  }

 This change could actually go as its own patch, although not sure it is
 ever used. Just a thought, not a stopper.

Since this flag is new and useful only for batching, I feel it is OK to
include it in this patch.

  +   if (!skb || (blist && skb_queue_len(blist))) {
  +  /*
  +   * Either batching xmit call, or single skb case but there are
  +   * skbs already in the batch list from previous failure to
  +   * xmit - send the earlier skbs first to avoid out of order.
  +   */
  +  if (skb)
  + __skb_queue_tail(blist, skb);
  +  skb = __skb_dequeue(blist);

 Why is it put at the end?

There is a bug that I had explained in rev4 (see XXX below) resulting
in sending out skbs out of order. The fix is that if the driver gets
called with a skb but there are older skbs already in the batch list
(which failed to get sent out), send those skbs first before this one.
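
The ordering rule can be illustrated with a small user-space sketch. It assumes a simple FIFO in place of the skb batch list; `struct pkt`, `struct pkt_queue` and `next_to_send()` are hypothetical names. A newly arriving packet is appended behind any packets left over from an earlier failed transmit, and the oldest packet is always the one sent next.

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
	int seq;
};

struct pkt_queue {
	struct pkt *head, *tail;
	unsigned int qlen;
};

void pkt_enqueue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->qlen++;
}

struct pkt *pkt_dequeue(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	p->next = NULL;
	q->qlen--;
	return p;
}

/*
 * Choose the packet to transmit next. If older packets are waiting in
 * the batch list, the new one goes to the tail and the oldest is sent
 * first, which preserves ordering.
 */
struct pkt *next_to_send(struct pkt_queue *blist, struct pkt *skb)
{
	if (!skb || (blist && blist->qlen)) {
		if (skb)
			pkt_enqueue_tail(blist, skb);
		skb = blist ? pkt_dequeue(blist) : NULL;
	}
	return skb;
}
```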

Thanks,

- KK

[XXX] Dave had suggested to use batching only in the net_tx_action case.
When I implemented that in earlier revisions, there were lots of TCP
retransmissions (about 18,000 to every 1 in regular code). I found the
reason for part of that problem as: skbs get queued up in dev->qdisc
(when tx lock was not got or queue blocked); when net_tx_action is called
later, it passes the batch list as argument to qdisc_run and this results
in skbs being moved to the batch list; then batching xmit also fails due
to tx lock failure; the next many regular xmit of a single skb will go
through the fast path (pass NULL batch list to qdisc_run) and send those
skbs out to the device while previous skbs are cooling their heels in the
batch list.

The first fix was to not pass NULL/batch-list to qdisc_run() but to always
check whether skbs are present in batch list when trying to xmit. This
reduced retransmissions by a third (from 18,000 to around 12,000), but led
to another problem while testing - iperf transmits almost zero data for
higher # of parallel flows like 64 or more (and when I run iperf for a
2 min run, it takes about 5-6 mins, and reports that it ran 0 secs and the
amount of data transferred is a few MB's). I don't know why this happens
with this being the only change (any ideas are very appreciated).

The second fix that resolved this was to revert back to Dave's suggestion
to use batching only in net_tx_action case, and modify the driver to see
if skbs are present in batch list and to send them out first before
sending the current skb. I still see huge retransmission for IPoIB (but
not for E1000), though it has reduced to 12,000 from the earlier 18,000
number.




Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread Krishna Kumar2

Hi Dave,

David Miller [EMAIL PROTECTED] wrote on 09/17/2007 04:47:48 AM:

 The only major complaint I have about this patch series is that
 the IPoIB part should just be one big changeset.  Otherwise the
 tree is not bisectable, for example the initial ipoib header file
 change breaks the build.

Right, I will change it accordingly.

 On a lower priority, I question the indirection of skb_blist by making
 it a pointer.  For what?  Saving 12 bytes on 64-bit?  That kmalloc()'d
 thing is a nearly guaranteed cache and/or TLB miss.  Just inline the
 thing, we generally don't do crap like this anywhere else.

The intention was to avoid having two flags (one that driver supports
batching and second to indicate that batching is on/off). So I could test
skb_blist as an indication of whether batching is on/off. But your point
on cache miss is absolutely correct, and I will change this part to be
inline.

thanks,

- KK


Re: [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching

2007-09-16 Thread Krishna Kumar2

Hi Randy,

Randy Dunlap [EMAIL PROTECTED] wrote on 09/15/2007 12:07:09 AM:

  +   To fix this problem, error cases where driver xmit gets called with a
  +   skb must code as follows:
  +  1. If driver xmit cannot get tx lock, return NETDEV_TX_LOCKED
  + as usual. This allows qdisc to requeue the skb.
  +  2. If driver xmit got the lock but failed to send the skb, it
  + should return NETDEV_TX_BUSY but before that it should have
  + queue'd the skb to the batch list. In this case, the qdisc

queued

  + does not requeue the skb.

Since this was a new section that I added to the documentation, this error
crept in. Thanks for catching it, and review comments/ack-off :)

thanks,

- KK


Re: net-2.6.24 plans

2007-09-16 Thread Jeff Garzik


David Miller wrote:

We've touched so much in net-2.6.24 that we really should be auditing
the thing and fixing any bugs that have been added.  If you're bored
and looking for something to do, pick an odd NAPI driver and audit it
in the net-2.6.24 tree.


You could try that weird post patches on the list thing for review.

I dunno about sparc64, but IMO any networking work you do yourself and 
commit yourself should also be sent to the list for review.


Jeff



Re: [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching

2007-09-16 Thread Jeff Garzik


Please remove me from the CC list.

I get this via netdev, and not having said a single thing in the thread, 
I don't feel the need to be CC'd on every email.


The CC list is pretty massive as it is, anyway.

Jeff




[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293) [RESEND]

2007-09-16 Thread David Stevens

Dave,
Thanks. That rev2 was for v6-only; I didn't see anything about the
v4 patch (below, in case it fell through the cracks).

+-DLS


- Forwarded by David Stevens/Beaverton/IBM on 09/16/2007 09:02 PM 
-

David Stevens/Beaverton/[EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
09/10/2007 07:25 PM

To
[EMAIL PROTECTED], [EMAIL PROTECTED]
cc
netdev@vger.kernel.org
Subject
[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293)






Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches remove (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add IcmpMsg with all ICMP types
listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for Icmp to get the named data
from new counters.
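
As a rough illustration of the table layout described above (and of the `field` / `field+256` macros in the patch that follows), here is a user-space sketch: the first 256 slots count inbound messages by ICMP type and the next 256 count outbound messages. The helper function names are hypothetical; the per-CPU handling of the real SNMP macros is deliberately left out.

```c
#include <assert.h>

#define ICMPMSG_MIB_MAX		512	/* In + Out for all 8-bit ICMP types */
#define ICMPMSG_OUT_BASE	256	/* hypothetical name for the Out offset */

#define ICMP_ECHOREPLY		0
#define ICMP_ECHO		8

struct icmpmsg_mib {
	unsigned long mibs[ICMPMSG_MIB_MAX];
};

/* Count a received ICMP message of the given type. */
void icmpmsg_in_inc(struct icmpmsg_mib *mib, unsigned char type)
{
	mib->mibs[type]++;
}

/* Count a transmitted ICMP message of the given type. */
void icmpmsg_out_inc(struct icmpmsg_mib *mib, unsigned char type)
{
	mib->mibs[type + ICMPMSG_OUT_BASE]++;
}

/* A named legacy counter can be read back from the per-type table. */
unsigned long icmp_in_echos(const struct icmpmsg_mib *mib)
{
	return mib->mibs[ICMP_ECHO];
}

unsigned long icmp_out_echo_reps(const struct icmpmsg_mib *mib)
{
	return mib->mibs[ICMP_ECHOREPLY + ICMPMSG_OUT_BASE];
}
```

Because every 8-bit type has a slot in each direction, types the machine does not implement are counted as well, which is what the table requires.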

IPv4 patch attached, IPv6 patch to follow.

+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPMSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h 2007-08-23 15:32:29.0 -0700
@@ -82,6 +82,8 @@ enum
 	__ICMP_MIB_MAX
 };
 
+#define __ICMPMSG_MIB_MAX 512  /* Out+In for all 8-bit ICMP types */
+
 /* icmp6 mib definitions */
 /*
  * RFC 2466:  ICMPv6-MIB
diff -ruNp linux-2.6.22.5/include/net/icmp.h linux-2.6.22.5_ICMPMSG/include/net/icmp.h
--- linux-2.6.22.5/include/net/icmp.h   2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h   2007-08-23 15:56:45.0 -0700
@@ -30,9 +30,16 @@ struct icmp_err {
 
 extern struct icmp_err icmp_err_convert[];
 DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics);
+DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics);
 #define ICMP_INC_STATS(field)		SNMP_INC_STATS(icmp_statistics, field)
 #define ICMP_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmp_statistics, field)
 #define ICMP_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmp_statistics, field)
+#define ICMPMSGOUT_INC_STATS(field)	SNMP_INC_STATS(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpmsg_statistics, field+256)
+#define ICMPMSGIN_INC_STATS(field)	SNMP_INC_STATS(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpmsg_statistics, field)
 
 struct dst_entry;
 struct net_proto_family;
@@ -42,6 +49,7 @@ extern void   icmp_send(struct sk_buff *sk
 extern int	icmp_rcv(struct sk_buff *skb);
 extern int	icmp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern void	icmp_init(struct net_proto_family *ops);
+extern void	icmp_out_count(unsigned char type);
 
 /* Move into dst.h ? */
 extern int	xrlim_allow(struct dst_entry *dst, int timeout);
diff -ruNp linux-2.6.22.5/include/net/snmp.h linux-2.6.22.5_ICMPMSG/include/net/snmp.h
--- linux-2.6.22.5/include/net/snmp.h   2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h   2007-08-23 14:42:50.0 -0700
@@ -82,6 +82,11 @@ struct icmp_mib {
 	unsigned long	mibs[ICMP_MIB_MAX];
 } __SNMP_MIB_ALIGN__;
 
+#define ICMPMSG_MIB_MAX	__ICMPMSG_MIB_MAX
+struct icmpmsg_mib {
+	unsigned long	mibs[ICMPMSG_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
 /* ICMP6 (IPv6-ICMP) */
 #define ICMP6_MIB_MAX	__ICMP6_MIB_MAX
 struct icmpv6_mib {
diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c
--- linux-2.6.22.5/net/ipv4/af_inet.c   2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c   2007-08-23 14:47:26.0 -0700
@@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void)
 			  sizeof(struct icmp_mib),
 			  __alignof__(struct icmp_mib)) < 0)
 		goto err_icmp_mib;
+	if (snmp_mib_init((void **)icmpmsg_statistics,
+			  sizeof(struct icmpmsg_mib),
+			  __alignof__(struct icmpmsg_mib)) < 0)
+		goto err_icmpmsg_mib;
 	if (snmp_mib_init((void **)tcp_statistics,
 			  sizeof(struct

Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

2007-09-16 Thread Krishna Kumar2

[Removing Jeff as requested from thread :) ]

Hi Dave,

David Miller [EMAIL PROTECTED] wrote on 09/17/2007 07:55:02 AM:

 From: jamal [EMAIL PROTECTED]
 Date: Sun, 16 Sep 2007 22:14:21 -0400

  I still think this work - despite my vested interest - needs more
  scrutiny from a performance perspective.

 Absolutely.

 There are tertiary issues I'm personally interested in, for example
 how well this stuff works when we enable software GSO on a non-TSO
 capable card.

 In such a case the GSO segment should be split right before we hit the
 driver and then all the sub-segments of the original GSO frame batched
 in one shot down to the device driver.

 In this way you'll get a large chunk of the benefit of TSO without
 explicit hardware support for the feature.

 There are several cards (some even 10GB) that will benefit immensely
 from this.

I have tried this on ehca which does not support TSO. I added GSO flag at
the ipoib layer (and that resulted in a panic/fix that is mentioned in
this patchset). I will re-run tests for this and submit results.

Thanks,

- KK

