[PATCH] jumbo all-NICs ethtool count cleanup
Just checked this in locally...

The hooks ->self_test_count() and ->get_stats_count() are now unused in the main tree.

(based off of latest davem/net-2.6.24.git)

 drivers/net/3c59x.c | 11 +++-
 drivers/net/8139cp.c | 11 +++-
 drivers/net/8139too.c | 11 +++-
 drivers/net/atl1/atl1_ethtool.c | 11 +++-
 drivers/net/b44.c | 11 +++-
 drivers/net/bnx2.c | 20
 drivers/net/cassini.c | 11 +++-
 drivers/net/chelsio/cxgb2.c | 11 +++-
 drivers/net/cxgb3/cxgb3_main.c | 11 +++-
 drivers/net/e100.c | 19
 drivers/net/e1000/e1000_ethtool.c | 22 -
 drivers/net/e1000e/ethtool.c | 21 -
 drivers/net/ehea/ehea_ethtool.c | 13 -
 drivers/net/forcedeth.c | 45 +---
 drivers/net/gianfar_ethtool.c | 20
 drivers/net/ibm_emac/ibm_emac_core.c | 12 +++--
 drivers/net/ibmveth.c | 11 +++-
 drivers/net/ixgb/ixgb_ethtool.c | 11 +++-
 drivers/net/ixgbe/ixgbe_ethtool.c | 11 +++-
 drivers/net/mv643xx_eth.c | 12 +++--
 drivers/net/myri10ge/myri10ge.c | 11 +++-
 drivers/net/netxen/netxen_nic_ethtool.c | 21 -
 drivers/net/pcnet32.c | 11 +++-
 drivers/net/qla3xxx.c | 2
 drivers/net/r8169.c | 11 +++-
 drivers/net/s2io.c | 47
 drivers/net/sc92031.c | 11 +++-
 drivers/net/skge.c | 11 +++-
 drivers/net/sky2.c | 11 +++-
 drivers/net/spider_net_ethtool.c | 11 +++-
 drivers/net/tc35815.c | 12 -
 drivers/net/tg3.c | 19
 drivers/net/ucc_geth_ethtool.c | 26 ++-
 drivers/net/veth.c | 11 +++-
 drivers/net/wireless/libertas/ethtool.c | 72 ++--
 35 files changed, 346 insertions(+), 246 deletions(-)

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index ad0f6a7..6295e94 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2834,9 +2834,14 @@ static void vortex_set_msglevel(struct net_device *dev, u32 dbg)
 	vortex_debug = dbg;
 }
 
-static int vortex_get_stats_count(struct net_device *dev)
+static int vortex_get_sset_count(struct net_device *dev, int sset)
 {
-	return VORTEX_NUM_STATS;
+	switch (sset) {
+	case ETH_SS_STATS:
+		return VORTEX_NUM_STATS;
+	default:
+		return -EOPNOTSUPP;
+	}
 }
 
 static void vortex_get_ethtool_stats(struct net_device *dev,
@@ -2893,7 +2898,7 @@ static const struct ethtool_ops vortex_ethtool_ops = {
 	.get_msglevel = vortex_get_msglevel,
 	.set_msglevel = vortex_set_msglevel,
 	.get_ethtool_stats = vortex_get_ethtool_stats,
-	.get_stats_count= vortex_get_stats_count,
+	.get_sset_count = vortex_get_sset_count,
 	.get_settings = vortex_get_settings,
 	.set_settings = vortex_set_settings,
 	.get_link = ethtool_op_get_link,
diff --git a/drivers/net/8139cp.c b/drivers/net/8139cp.c
index 58fad1b..eccaa16 100644
--- a/drivers/net/8139cp.c
+++ b/drivers/net/8139cp.c
@@ -1383,9 +1383,14 @@ static int cp_get_regs_len(struct net_device *dev)
 	return CP_REGS_SIZE;
 }
 
-static int cp_get_stats_count (struct net_device *dev)
+static int cp_get_sset_count (struct net_device *dev, int sset)
 {
-	return CP_NUM_STATS;
+	switch (sset) {
+	case ETH_SS_STATS:
+		return CP_NUM_STATS;
+	default:
+		return -EOPNOTSUPP;
+	}
 }
 
 static int cp_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
@@ -1563,7 +1568,7 @@ static void cp_get_ethtool_stats (struct net_device *dev,
 static const struct ethtool_ops cp_ethtool_ops = {
 	.get_drvinfo= cp_get_drvinfo,
 	.get_regs_len = cp_get_regs_len,
-	.get_stats_count= cp_get_stats_count,
+	.get_sset_count = cp_get_sset_count,
 	.get_settings = cp_get_settings,
 	.set_settings = cp_set_settings,
 	.nway_reset = cp_nway_reset,
diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index 16b9196..565fbdb 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -2400,9 +2400,14 @@ static void rtl8139_get_regs(struct net_device *dev, struct ethtool_regs *regs,
 }
 #endif /* CONFIG_8139TOO_MMIO */
 
-static int rtl8139_get_stats_count(struct net_device *dev)
+static int rtl8139_get_sset_count(struct net_device *dev, int sset)
 {
-	return RTL_NUM_STATS;
+	switch (sset) {
+	case ETH_SS_STATS:
+		return
Re: [PATCH] jumbo all-NICs ethtool count cleanup
Hi Jeff.

You wrote:
> The hooks ->self_test_count() and ->get_stats_count() are now unused in the main tree.

So I'm surprised to see more lines added than deleted:

> 35 files changed, 346 insertions(+), 246 deletions(-)

Puzzled - may need a bit more coffee (morning here)..

	Sam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Distributed storage. Move away from char device ioctls.
On Sep 15, 2007, at 13:24:46, Andreas Dilger wrote:
> On Sep 15, 2007 16:29 +0400, Evgeniy Polyakov wrote:
>> Yes, block device itself is not able to scale well, but it is the place for redundancy, since filesystem will just fail if underlying device does not work correctly and FS actually does not know about where it should place redundancy bits - it might happen to be the same broken disk, so I created a low-level device which distribute requests itself.
>
> I actually think there is a place for this - and improvements are definitely welcome. Even Lustre needs block-device level redundancy currently, though we will be working to make Lustre-level redundancy available in the future (the problem is WAY harder than it seems at first glance, if you allow writeback caches at the clients and servers).

I really think that to get proper non-block-device-level filesystem redundancy you need to base it on something similar to the GIT model. Data replication is done in specific-sized chunks indexed by SHA-1 sum and you actually have a sort of merge algorithm for when local and remote changes differ. The OS would only implement a very limited list of merge algorithms, IE one of:

  (A) Don't merge, each client gets its own branch and merges are manual
  (B) Most recent changed version is made the master every X-seconds/open/close/write/other-event.
  (C) The tree at X (usually a particular client/server) is always used as the master when there are conflicts.

This lets you implement whatever replication policy you want: You can require that some files are replicated (cached) on *EVERY* system, you can require that other files are cached on at least X systems. You can say this needs to be replicated on at least X% of the online systems, or at most Y. Moreover, the replication could be done pretty easily from userspace via a couple syscalls. You also automatically keep track of history with some default purge policy.

The main point is that for efficiency and speed things are *not* always replicated; this also allows for offline operation. You would of course have userspace merge drivers which notice that the tree on your laptop is not a subset/superset of the tree on your desktop and do various merges based on per-file metadata. My address-book, for example, would have a custom little merge program which knows about how to merge changes between two address book files, asking me useful questions along the way. Since a lot of this merging is mechanical, some of the code from GIT could easily be made into a merge library which knows how to do such things.

Moreover, this would allow me to have a shared root filesystem on my laptop and desktop. It would have 'sub-project'-type trees, so that / would be an independent branch on each system. /etc would be separate branches but manually merged git-style as I make changes. /home/* folders would be auto-created as separate subtrees so each user can version their own individually. Specific subfolders (like address-book, email, etc) would be adjusted by the GUI programs that manage them to be separate subtrees with manual-merging controlled by that GUI program.

Backups/dumps/archival of such a system would be easy. You would just need to clone the significant commits/trees/etc to a DVD and replace the old SHA-1-indexed objects to tiny object-deleted stubs; to rollback to an archived version you insert the DVD, mount it into the existing kernel SHA-1 index, and then mount the appropriate commit as a read-only volume somewhere to access. The same procedure would also work for wide-area-network backups and such.

The effective result would be the ability to do things like the following:

  (A) Have my homedir synced between both systems mostly-automatically as I make changes to different files on both systems
  (B) Easily have 2 copies of all my files, so if one system's disk goes kaput I can just re-clone from the other.
  (C) Keep archived copies of the last 5 years worth of work, including change history, on a stack of DVDs.
  (D) Synchronize work between locations over a relatively slow link without much work.

As long as files were indirectly indexed by sub-block SHA1 (with the index depth based on the size of the file), and each individually-SHA1-ed object could have references, you could trivially have a 4TB-sized file where you modify 4 bytes at a thousand random locations throughout the file and only have to update about 5MB worth of on-disk data. The actual overhead for that kind of operation under any existing filesystem would be 100% seek-dominated regardless whereas with this mechanism you would not directly be overwriting data and so you could append all the updates as a single 5MB chunk. Data reads would be much more seek-y, but you could trivially have an on-line defragmenter tool which notices fragmented
Re: [PATCH] jumbo all-NICs ethtool count cleanup
Sam Ravnborg wrote:
> Hi Jeff.
>
> You wrote:
>> The hooks ->self_test_count() and ->get_stats_count() are now unused in the main tree.
>
> So I'm surprised to see more lines added than deleted:
>
>> 35 files changed, 346 insertions(+), 246 deletions(-)
>
> Puzzled - may need a bit more coffee (morning here)..

The new interface that supersedes these is ->get_sset_count(), which was added to provide additional functionality without having to add a new hook each time we want to return a new integer value.

This new interface also (intentionally) aligns with the existing ->get_strings() interface.

("sset" in get_sset_count stands for "string set")

	Jeff
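To make the shape of the new hook concrete, here is a minimal user-space sketch. It is not driver code: `demo_get_sset_count()`, the `DEMO_*` counts and the `demo_ss` enum are hypothetical stand-ins for a driver's ->get_sset_count(), its stat and self-test counts, and the kernel's ETH_SS_* string-set IDs. It only illustrates how a single hook keyed on the string set can answer what previously needed one count hook per set.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's ETH_SS_* string-set IDs. */
enum demo_ss {
	DEMO_SS_TEST,	/* self-test result strings */
	DEMO_SS_STATS,	/* statistics strings */
};

#define DEMO_NUM_STATS	12	/* made-up counts, for illustration only */
#define DEMO_NUM_TESTS	3

/*
 * One hook answers "how many entries are in string set N?".
 * Unknown sets get -EOPNOTSUPP, mirroring the converted drivers.
 */
static int demo_get_sset_count(int sset)
{
	switch (sset) {
	case DEMO_SS_STATS:
		return DEMO_NUM_STATS;
	case DEMO_SS_TEST:
		return DEMO_NUM_TESTS;
	default:
		return -EOPNOTSUPP;
	}
}
```

This also suggests why the diffstat grows: each one-line return becomes a small switch, but adding a further string set later needs only a new case rather than a new hook.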
Re: Please pull 'adm8211' branch of wireless-2.6
Michael Wu wrote:
> On Saturday 15 September 2007 20:56, Jeff Garzik wrote:
>>> +	if (flags & IFF_PROMISC)
>>> +		dev->flags |= IEEE80211_HW_RX_INCLUDES_FCS;
>>> +	else
>>> +		dev->flags &= ~IEEE80211_HW_RX_INCLUDES_FCS;
>>
>> why does promisc dictate inclusion of FCS?
>
> Because that's the way the hardware works.
>
>> Why not always include it, regardless of promisc?
>
> I really do mean that's how the hardware works. If you turn on the promisc bit in the hardware (which IFF_PROMISC causes), it starts including the FCS, but if the bit is not set, the FCS is not included in frames.

OK, I was confused by the name. Based on the constant's name, I was assuming that you could unconditionally enable it, promisc or not. Nevermind. I thought that was a hardware rather than software bit.

> What form of debugging are you talking about? I don't see how it makes a difference for debugging. The type checking provided by enums won't make a

When you are tracing through with kgdb, the code is actually readable. You see

	dev->flags |= IEEE80211_HW_RX_INCLUDES_FCS;

rather than the far more obtuse

	dev->flags |= 8;

Ditto for any time you have to read pre-processed source code. I do so at least once a month, since post-cpp code shows you precisely what the compiler is munching, after all the macro magic goes away.

	Jeff
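The readability point can be seen in a small standalone sketch. The names below (`DEMO_MACRO_RX_INCLUDES_FCS`, `DEMO_HW_RX_INCLUDES_FCS`, `demo_set_promisc()`) are hypothetical, and the value 8 is simply taken from the example in the message. With a #define, the preprocessor replaces the name with the literal before the compiler sees it, so post-cpp source shows only the number; an enumerator is a real identifier that survives preprocessing.

```c
/* Macro form: after preprocessing the name is gone, only "8" remains. */
#define DEMO_MACRO_RX_INCLUDES_FCS 8

/* Enum form: the identifier survives preprocessing. */
enum demo_hw_flags {
	DEMO_HW_RX_INCLUDES_FCS = 8,
};

/* Set or clear the FCS flag depending on promiscuous mode. */
static unsigned int demo_set_promisc(unsigned int flags, int promisc)
{
	if (promisc)
		flags |= DEMO_HW_RX_INCLUDES_FCS;
	else
		flags &= ~(unsigned int)DEMO_HW_RX_INCLUDES_FCS;
	return flags;
}
```

Running such a file through `cc -E` shows the difference: uses of the macro appear as `8`, while the enumerator keeps its name.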
Re: [ofa-general] [PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.
Steve Wise wrote:
> RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.
>
> Calling arp_send() to initiate neighbour discovery (ND) doesn't do the full ND protocol. Namely, it doesn't handle retransmitting the arp request if it is dropped. The function neigh_event_send() does all this. Without doing full ND, rdma address resolution fails in the presence of dropped arp bcast packets.

Jay,

Is there a way to deploy something similar for the gratuitous arp being sent by the bonding driver at bond_arp_send()? We have seen rare situations where the skb was dropped by the stack and hence bonding fail-over was detected by the remote peer only when its neighboring subsystem probe failures dictated that a new arp must be issued.

Or.

> Signed-off-by: Steve Wise [EMAIL PROTECTED]
> ---
>
>  drivers/infiniband/core/addr.c | 3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
> index c5c33d3..5381c80 100644
> --- a/drivers/infiniband/core/addr.c
> +++ b/drivers/infiniband/core/addr.c
> @@ -161,8 +161,7 @@ static void addr_send_arp(struct sockadd
>  	if (ip_route_output_key(&rt, &fl))
>  		return;
>  
> -	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
> -		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
> +	neigh_event_send(rt->u.dst.neighbour, NULL);
>  	ip_rt_put(rt);
>  }
Re: - revert-8139too-clean-up-i-o-remapping.patch removed from -mm tree
[EMAIL PROTECTED] wrote:
> The patch titled
>      revert 8139too: clean up I/O remapping
> has been removed from the -mm tree. Its filename was
>      revert-8139too-clean-up-i-o-remapping.patch
>
> This patch was dropped because it was merged into mainline or a subsystem tree
>
> --
> Subject: revert 8139too: clean up I/O remapping
> From: Andrew Morton [EMAIL PROTECTED]
>
> Revert git-netdev-all's 9ee6b32a47b9abc565466a9c3b127a5246b452e5.
>
> Michal was getting oopses.
>
> Cc: Michal Piotrowski [EMAIL PROTECTED]
> Cc: Jeff Garzik [EMAIL PROTECTED]
> Signed-off-by: Andrew Morton [EMAIL PROTECTED]

Shit! Thanks for reminding me that I need to fix that up before it goes upstream with the rest of net-2.6.24.

	Jeff
Re: [ofa-general] InfiniBand/RDMA merge plans for 2.6.24
Roland Dreier wrote:
> With 2.6.24 probably opening in the not-too-distant future, it's probably a good time to review what my plans are for when the merge window opens.
>
> Core:
>
>  - Sean's QoS changes. These look fine at first glance, and I just plan to understand the backwards compatibility story (ie how this works with an old SM) and merge. Anyone who objects let me know.

Hi Roland,

I have reviewed the qos patches and provided comments which were deployed in v2 of the series. I also tested it (ipoib and iser which is rdma-cm based) against the Voltaire SM/SA to see that nothing was broken. I will send you a reviewed-by: signature.

> ULPs:

[ofa-general] [PATCH RFC] IB/ipoib: enable IGMP for userpsace multicast IB apps

The IGMP enabling patch posted by me on September 2nd isn't on your list

http://lists.openfabrics.org/pipermail/general/2007-September/040250.html

can you add it?

>  - Moni's IPoIB bonding support. This seems mostly an issue of getting the core bonding maintainer's attention. However getting a Reviewed-by: for the IPoIB changes wouldn't hurt too.

Jay Vosburgh, the bonding driver maintainer, just sent an ack on the whole patch series. As for the IPoIB changes, there are three patches, where two of them, namely

[PATCH 02/11] IB/ipoib: Notify the world before doing unregister
[PATCH 04/11] IB/ipoib: Verify address handle validity on send

are handling corner-case problems pointed out by Michael Tsirkin. Michael, will you be able to look at them and provide a reviewed-by signature?

The third patch

[PATCH 03/11] IB/ipoib: Bound the net device to the ipoib_neigh structue

is much simpler, and I don't think more review is needed for it.

>  - Eli and Michael's IPoIB stateless offload (checksum offload, LSO, LRO, etc). It's a big series that makes quite a few core changes. I think it needs some careful review and is probably at risk of missing this merge window. Sorting in order of invasiveness so we can merge at least some of it (if splitting it makes sense) might be a good idea.

Just for the record, the 'etc' above relates to the interrupt moderation support (mlx4, core, ipoib (config through ethtool, usage)). Among other things, what is not clear to me here is if/how this goes hand-in-hand with NAPI.

As you saw, the patch adding checksum offload support had a long thread, and I think the discussion has reached the point where Michael is waiting for your take on it.

As for the LSO and LRO patches, I did not see any review comments. I will see what I can review from the series; to begin with, I will send Eli some comments and questions.

> HW specific:
>
>  - Jack and Michael's mlx4 FMR support. Will merge I guess, although I do hope to have time to address the DMA API abuse that is being copied from mthca, so that mlx4 and mthca work in Xen domU.

This patch series is rather important, as without it iser is useless over connectx. It would be nice if you merged this and, at most, fixed the abuse later.

Or.
Re: Distributed storage. Move away from char device ioctls.
On Sat, Sep 15, 2007 at 11:24:46AM -0600, Andreas Dilger ([EMAIL PROTECTED]) wrote:
>> When Chris Mason announced btrfs, I found that quite a few new ideas are already implemented there, so I postponed project (although direction of the development of the btrfs seems to move to the zfs side with some questionable imho points, so I think I can jump to the wagon of new filesystems right now).
>
> This is an area I'm always a bit sad about in OSS development - the need everyone has to make a new {fs, editor, gui, etc} themselves instead of spending more time improving the work we already have. Imagine where the

If that would be true, we would be still in the stone age. Or not, actually I think the first cell in the universe would not bother itself dividing into the two just because it could spend infinite time trying to make itself better.

> internet would be (or not) if there were 50 different network protocols instead of TCP/IP? If you don't like some things about btrfs, maybe you can fix them?

When some idea is implemented it is virtually impossible to change it, only recreate new one with fixed issues. So, we have multiple ext, reiser and many others. I do not say btrfs is broken or has design problems, it is really interesting filesystem, but all we have our own opinions about how things should be done, that's it.

Btw, we do have so many network protocols for different purposes, that number of (storage) filesystems is negligibly small compared to it. Internet as is popular today is just a subset of where network is used. And we do invent new protocols each time we need something new, which does not fit into existing models (for example TCP by design can not work with very long-distance links with too long RTT). We have sctp to fix some tcp issues. Number of IP layer 'neighbours' is even more. Physical media layer has many different protocols too. And that is just what exists in the linux tree...

> To be honest, developing a new filesystem that is actually widely useful and used is a very time consuming task (see Reiserfs and Reiser4). It takes many years before the code is reliable enough for people to trust it, so most likely any effort you put into this would be wasted unless you can come up with something that is dramatically better than something existing.

Yep, I know. Wasting my time is one of the most pleasant things I ever tried in my life.

> The part that bothers me is that this same effort could have been used to improve something that more people would use (btrfs in this case). Of course, sometimes the new code is substantially better than what currently exists, and I think btrfs may have laid claim to the current generation of filesystems.

Call me greedy bastard, but I do not care about world happiness, it is just impossible to achieve. So I like what I do right now. If it will be rest under the layer of dust I do not care, I like the process of creating, so if it will fail, I just will get new knowledge. :)

> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.

--
	Evgeniy Polyakov
[PATCH|[NET]: migrate HARD_TX_LOCK to header file
I wanted to get rid of the extraneous cpu argument and ended up moving this to the header files since it looks like a common enough operation that could be used elsewhere. It is a trivial change - i could resend with leaving it in dev.c and just getting rid of the cpu argument.

cheers,
jamal

[NET]: migrate HARD_TX_LOCK to header file

HARD_TX_LOCK macro is a nice aggregation that could be used in other spots. move it to netdevice.h
Also get rid of superfluous cpu argument while doing this ..

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit 1bc3a7393737ab1f5239bd8dc2f2953dcee5391e
tree 83a7f39b61fe45282eee825286996ba4bf72c0f6
parent 1f08657fc9b0b56039a9378ca030c2c8ed7bd8ac
author Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 11:29:48 -0400
committer Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 11:29:48 -0400

 include/linux/netdevice.h | 12
 net/core/dev.c | 14 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc5e35f..c83e667 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1297,6 +1297,18 @@ static inline void netif_tx_unlock_bh(struct net_device *dev)
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
+#define HARD_TX_LOCK(dev) {			\
+	if ((dev->features & NETIF_F_LLTX) == 0) {	\
+		netif_tx_lock(dev);			\
+	}						\
+}
+
+#define HARD_TX_UNLOCK(dev) {				\
+	if ((dev->features & NETIF_F_LLTX) == 0) {	\
+		netif_tx_unlock(dev);			\
+	}						\
+}
+
 static inline void netif_tx_disable(struct net_device *dev)
 {
 	netif_tx_lock_bh(dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2897352..7934d28 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1574,18 +1574,6 @@ out_kfree_skb:
 	return 0;
 }
 
-#define HARD_TX_LOCK(dev, cpu) {			\
-	if ((dev->features & NETIF_F_LLTX) == 0) {	\
-		netif_tx_lock(dev);			\
-	}						\
-}
-
-#define HARD_TX_UNLOCK(dev) {				\
-	if ((dev->features & NETIF_F_LLTX) == 0) {	\
-		netif_tx_unlock(dev);			\
-	}						\
-}
-
 /**
  *	dev_queue_xmit - transmit a buffer
  *	@skb: buffer to transmit
@@ -1710,7 +1698,7 @@ gso:
 
 		if (dev->xmit_lock_owner != cpu) {
 
-			HARD_TX_LOCK(dev, cpu);
+			HARD_TX_LOCK(dev);
 
 			if (!netif_queue_stopped(dev) &&
 			    !netif_subqueue_stopped(dev, skb->queue_mapping)) {
[RFC][NET_SCHED] explict hold dev tx lock
While trying to port my batching changes to net-2.6.24 from this morning i realized this is something i had wanted to probe people on.

Challenge:
For N CPUs, with full throttle traffic on all N CPUs, funneling traffic to the same ethernet device, the device's queue lock is contended by all N CPUs constantly. The TX lock is only contended by a max of 2 CPUs. In the current mode of operation, after all the work of entering the dequeue region, we may end up aborting the path if we are unable to get the tx lock and go back to contend for the queue lock. As N goes up, this gets worse.

Testing:
I did some testing with a 4 cpu (2x dual core) machine with no irq binding. I ran about 10 runs of 30M packets each from the stack with a udp app i wrote which is intended to keep all 4 cpus busy - and to my surprise i found that we bail out less than 0.1% of the time. I may need a better test case.

Changes:
I made changes to the code path as defined in the included patch and noticed a slight increase (2-3%) in performance with both e1000 and tg3; which was a relief because i thought the spinlock_irq (which is needed because some drivers grab the tx lock in interrupts) may have negative effects. The fact it didn't reduce performance was a good thing.
Note: This is the highest end machine i've ever laid hands on, so this may be misleading.

So - what side effects do people see in doing this? If none, i will clean it up and submit.
cheers,
jamal

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc5e35f..ab9966f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1271,6 +1271,12 @@ static inline void netif_tx_lock(struct net_device *dev)
 	dev->xmit_lock_owner = smp_processor_id();
 }
 
+static inline void netif_tx_lock_irq(struct net_device *dev)
+{
+	spin_lock_irq(&dev->_xmit_lock);
+	dev->xmit_lock_owner = smp_processor_id();
+}
+
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
@@ -1291,6 +1297,12 @@ static inline void netif_tx_unlock(struct net_device *dev)
 	spin_unlock(&dev->_xmit_lock);
 }
 
+static inline void netif_tx_unlock_irq(struct net_device *dev)
+{
+	dev->xmit_lock_owner = -1;
+	spin_unlock_irq(&dev->_xmit_lock);
+}
+
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
 	dev->xmit_lock_owner = -1;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..f75a924 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,23 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
+	unsigned lockless = (dev->features & NETIF_F_LLTX);
 	int ret;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;
 
-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}
-
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);
 
+	if (!lockless)
+		netif_tx_lock_irq(dev);
+
 	ret = dev_hard_start_xmit(skb, dev);
 
 	if (!lockless)
-		netif_tx_unlock(dev);
+		netif_tx_unlock_irq(dev);
 
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;
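The difference between the existing path and the one proposed in this RFC can be sketched in user space. Everything in the sketch is hypothetical: `demo_lock`, `demo_trylock()` and the `DEMO_*` return codes are made up, and a C11 `atomic_flag` stands in for the driver tx lock. `demo_xmit_trylock()` mirrors the current behaviour (give up and requeue on contention), while `demo_xmit_blocking()` mirrors the proposed behaviour (wait for the lock). The sketch does not model interrupts, so it says nothing about the `_irq` variants themselves.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum demo_result {
	DEMO_SENT,	/* packet handed to the "driver" */
	DEMO_REQUEUED,	/* lock was busy, packet goes back on the queue */
};

struct demo_lock {
	atomic_flag held;
};

/* Returns true if the lock was acquired, false if someone else holds it. */
static bool demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void demo_lock_spin(struct demo_lock *l)
{
	while (!demo_trylock(l))
		;	/* busy-wait until the holder releases it */
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Current scheme: never wait for the tx lock, requeue on collision. */
static enum demo_result demo_xmit_trylock(struct demo_lock *tx, int *sent)
{
	if (!demo_trylock(tx))
		return DEMO_REQUEUED;
	(*sent)++;	/* stands in for the driver's transmit routine */
	demo_unlock(tx);
	return DEMO_SENT;
}

/* Proposed scheme: wait for the tx lock, so there is no requeue path. */
static enum demo_result demo_xmit_blocking(struct demo_lock *tx, int *sent)
{
	demo_lock_spin(tx);
	(*sent)++;
	demo_unlock(tx);
	return DEMO_SENT;
}
```

The trade-off under discussion is visible in the two functions: the first can waste the dequeue work when it loses the race, the second never does but may spin while another CPU holds the lock.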
Re: [PATCH] Configurable tap interface MTU
From: Ed Swierk [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 09:54:35 -0700

> On 9/11/07, Herbert Xu [EMAIL PROTECTED] wrote:
>> Please make it 65535 without an Ethernet header and 65521 with an Ethernet header.
>
> Here is a revised patch that allows MTUs up to 65535 for tap interfaces and up to 65521 for tun interfaces. (If I set the MTU to 65521 on a tun interface, ping complains "message too long" when I send a 65521-byte packet; 65520 works okay, though.)

Applied to net-2.6.24

Please provide a proper Signed-off-by: line and a full changelog with every patch submission and revision in the future.

Thanks.
2.6.23-rc regression: bcm43xx does not work after commit 4cf92a3c
Hello.

With latest git tree, bcm43xx driver does not work. By bisect, I've found the commit 4cf92a3c is the first bad commit.

    [PATCH] softmac: Fix ESSID problem

    Victor Porton reported that the SoftMAC layer had random problem when setting the ESSID :
    http://bugzilla.kernel.org/show_bug.cgi?id=8686

    After investigation, it turned out to be worse, the SoftMAC layer is left in an inconsistent state. The fix is pretty trivial.

    Signed-off-by: Jean Tourrilhes [EMAIL PROTECTED]
    Acked-by: Michael Buesch [EMAIL PROTECTED]
    Acked-by: Larry Finger [EMAIL PROTECTED]
    Signed-off-by: John W. Linville [EMAIL PROTECTED]

After reverting this commit, the driver starts working again.

Regards,

--
YOSHIFUJI Hideaki @ USAGI Project [EMAIL PROTECTED]
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
Re: [PATCH|[NET]: migrate HARD_TX_LOCK to header file
From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 11:48:45 -0400

> I wanted to get rid of the extraneous cpu argument and ended up moving this to the header files since it looks like a common enough operation that could be used elsewhere. It is a trivial change - i could resend with leaving it in dev.c and just getting rid of the cpu argument.

The only reason the cpu argument is superfluous is because we don't provide a way to pass it on down to netif_tx_lock(). So instead netif_tx_lock() recomputes that value in this case which is extra unnecessary work.

I would instead suggest, in netdevice.h:

static inline void __netif_tx_lock(struct net_device *dev, int cpu)
{
	spin_lock(&dev->_xmit_lock);
	dev->xmit_lock_owner = cpu;
}

static inline void netif_tx_lock(struct net_device *dev)
{
	__netif_tx_lock(dev, smp_processor_id());
}

And make the HARD_TX_LOCK() call __netif_tx_lock() and pass in the already computed 'cpu' parameter.

Thanks.
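As a rough user-space illustration of this suggestion, the sketch below splits a lock helper into a variant that takes the already-known owner ID and a wrapper that looks it up. Everything here is hypothetical: `demo_dev`, `demo_current_cpu()` and the C11 `atomic_flag` spinlock stand in for struct net_device, smp_processor_id() and the kernel spinlock; `demo_tx_lock_cpu()` plays the role of __netif_tx_lock(). The macros are wrapped in `do { } while (0)`, which is an addition of the sketch and not part of the posted patch.

```c
#include <stdatomic.h>

#define DEMO_F_LLTX 0x1u	/* hypothetical "driver locks for itself" flag */

struct demo_dev {
	unsigned int features;
	atomic_flag xmit_lock;	/* stand-in for the tx spinlock */
	int xmit_lock_owner;	/* -1 when unlocked */
};

/* Stand-in for smp_processor_id(); a fixed value keeps the sketch simple. */
static int demo_current_cpu(void)
{
	return 0;
}

/* The caller already knows which CPU it is on, so it passes it down. */
static void demo_tx_lock_cpu(struct demo_dev *dev, int cpu)
{
	while (atomic_flag_test_and_set_explicit(&dev->xmit_lock,
						 memory_order_acquire))
		;	/* spin */
	dev->xmit_lock_owner = cpu;
}

/* Convenience wrapper for callers that have not computed the CPU yet. */
static void demo_tx_lock(struct demo_dev *dev)
{
	demo_tx_lock_cpu(dev, demo_current_cpu());
}

static void demo_tx_unlock(struct demo_dev *dev)
{
	dev->xmit_lock_owner = -1;
	atomic_flag_clear_explicit(&dev->xmit_lock, memory_order_release);
}

/* Take the lock only when the driver is not lockless. */
#define DEMO_HARD_TX_LOCK(dev, cpu) do {			\
	if (((dev)->features & DEMO_F_LLTX) == 0)		\
		demo_tx_lock_cpu(dev, cpu);			\
} while (0)

#define DEMO_HARD_TX_UNLOCK(dev) do {				\
	if (((dev)->features & DEMO_F_LLTX) == 0)		\
		demo_tx_unlock(dev);				\
} while (0)
```

With this split, a caller that has already looked up its CPU (as dev_queue_xmit() has) does not pay for a second lookup, while other callers keep the simple one-argument form.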
Re: [RFC][NET_SCHED] explict hold dev tx lock
From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 12:14:34 -0400

> Changes:
> I made changes to the code path as defined in the included patch and noticed a slight increase (2-3%) in performance with both e1000 and tg3; which was a relief because i thought the spinlock_irq (which is needed because some drivers grab the tx lock in interrupts) may have negative effects. The fact it didn't reduce performance was a good thing.
> Note: This is the highest end machine i've ever laid hands on, so this may be misleading.
>
> So - what side effects do people see in doing this? If none, i will clean it up and submit.

I tried this 4 years ago, it doesn't work. :-)

Many drivers, particularly very old ones that PIO packets into a device which can take a long time, absolutely depend upon interrupts being enabled fully during ->hard_start_xmit() so that other high priority devices (such as simpler serial controllers) can have their interrupts serviced during this slow operation.

I don't think we want to do it anyways, whatever performance we gain from it is offset by the badness of disabling interrupts during this reasonably lengthy stretch of code.

The -rt folks as a result would notice this too and spank us :-)
Re: e1000 driver and samba
James Chapman wrote:
> Kok, Auke wrote:
>> James Chapman wrote:
>>> Kok, Auke wrote:
>>>> rx_long_byte_count: 34124849453
>>>
>>> Are these long frames expected in your network? What is the MTU of the transmitting clients? Perhaps this might explain why reads work (because data is coming from the Linux box so the packets have smaller MTU) while writes cause delays or packet loss because the clients are sending long frames which are getting fragmented?
>>
>> those are not long frames but the number of bytes the hardware counted in its long data type based byte counter.
>
> Thanks for correcting me, Auke. Should this counter be renamed to avoid someone else making this mistake in the future? Just a thought.

well, that would break tools that read this value. And for all of these stats we can say that you should read our SDM's to figure out what they really mean anyway, hence my caution to interpret the other value at first.

Auke
Re: [PATCH|[NET]: migrate HARD_TX_LOCK to header file
On Sun, 2007-16-09 at 12:28 -0700, David Miller wrote:
> The only reason the cpu argument is superfluous is because we don't provide a way to pass it on down to netif_tx_lock(). So instead netif_tx_lock() recomputes that value in this case which is extra unnecessary work.
>
> I would instead suggest ..

sounds much better - will resend after a simple test.

cheers,
jamal
Re: [RFC][NET_SCHED] explicit hold dev tx lock
On Sun, 2007-16-09 at 12:31 -0700, David Miller wrote:
> From: jamal [EMAIL PROTECTED]
> Date: Sun, 16 Sep 2007 12:14:34 -0400
>
> > So - what side effects do people see in doing this? If none, i will clean it up and submit.
>
> I tried this 4 years ago, it doesn't work. :-)

;->

[good reasons removed here]

> I don't think we want to do it anyway; whatever performance we gain from it is offset by the badness of disabling interrupts during this reasonably lengthy stretch of code. The -rt folks as a result would notice this too and spank us :-)

indeed. Ok, maybe i am thinking too hard with that patch, so help me out:->

When i looked at that code path as it is today, i felt the softirq could be interrupted on the same CPU it is running on while it already held that tx lock (if the trylock succeeds), and that the hardirq code, when attempting to grab the lock, would then deadlock. Did i misread that? When i experimented with tg3 and e1000 i did not see any such problems with the non-irq version of the lock.

cheers,
jamal
Re: [RFC][NET_SCHED] explicit hold dev tx lock
On Sun, 2007-16-09 at 16:41 -0400, jamal wrote:

> indeed. Ok, maybe i am thinking too hard with that patch, so help me out:->

Ok, that was probably too much of an explanation. What i should say is: if i grabbed the lock explicitly without disabling irqs, it won't be much different from what is done today and should always work. No?

cheers,
jamal
Re: [PATCH][NET]: migrate HARD_TX_LOCK to header file
On Sun, 2007-16-09 at 16:28 -0400, jamal wrote:

> sounds much better - will resend after a simple test.

Ok, here's the revised version

cheers,
jamal

[NET]: migrate HARD_TX_LOCK to header file

The HARD_TX_LOCK macro is a nice aggregation that could be used in other spots. Move it to netdevice.h.
Also make sure the previously superfluous cpu argument is used.

Thanks to DaveM for the suggestions.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

---
commit e467e3cb7fca9b533543aa749395547b7ade4980
tree 5e03a405e32968cc8e9e875ecdaeec4e798b6809
parent f55ad5bb4809bdd07720387c62788fad5359d41c
author Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 16:54:44 -0400
committer Jamal Hadi Salim [EMAIL PROTECTED] Sun, 16 Sep 2007 16:54:44 -0400

 include/linux/netdevice.h |   21 +++++++++++++++++++--
 net/core/dev.c            |   12 ------------
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc5e35f..d529a0c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1265,10 +1265,15 @@ static inline void netif_rx_complete(struct net_device *dev,
  *
  * Get network device transmit lock
  */
-static inline void netif_tx_lock(struct net_device *dev)
+static inline void __netif_tx_lock(struct net_device *dev, int cpu)
 {
 	spin_lock(&dev->_xmit_lock);
-	dev->xmit_lock_owner = smp_processor_id();
+	dev->xmit_lock_owner = cpu;
+}
+
+static inline void netif_tx_lock(struct net_device *dev)
+{
+	__netif_tx_lock(dev, smp_processor_id());
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
@@ -1297,6 +1302,18 @@ static inline void netif_tx_unlock_bh(struct net_device *dev)
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
+#define HARD_TX_LOCK(dev, cpu) {			\
+	if ((dev->features & NETIF_F_LLTX) == 0) {	\
+		__netif_tx_lock(dev, cpu);		\
+	}						\
+}
+
+#define HARD_TX_UNLOCK(dev) {				\
+	if ((dev->features & NETIF_F_LLTX) == 0) {	\
+		netif_tx_unlock(dev);			\
+	}						\
+}
+
 static inline void netif_tx_disable(struct net_device *dev)
 {
 	netif_tx_lock_bh(dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index 2897352..a1f6ca6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1574,18 +1574,6 @@ out_kfree_skb:
 	return 0;
 }
 
-#define HARD_TX_LOCK(dev, cpu) {			\
-	if ((dev->features & NETIF_F_LLTX) == 0) {	\
-		netif_tx_lock(dev);			\
-	}						\
-}
-
-#define HARD_TX_UNLOCK(dev) {				\
-	if ((dev->features & NETIF_F_LLTX) == 0) {	\
-		netif_tx_unlock(dev);			\
-	}						\
-}
-
 /**
  *	dev_queue_xmit - transmit a buffer
  *	@skb: buffer to transmit
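
For readers without a kernel tree at hand, the effect of the patch can be modelled in userspace. The following is only a sketch under stated assumptions: `struct model_dev`, `model_tx_lock()`, the `MODEL_*` macros and the `MODEL_F_LLTX` value are hypothetical stand-ins, a plain integer replaces the spinlock, and the macros use the common `do { } while (0)` idiom rather than the bare braces of the patch.

```c
#include <assert.h>

#define MODEL_F_LLTX 0x1000u	/* illustrative flag bit, for this model only */

/* Userspace stand-in for the few net_device fields the macros touch. */
struct model_dev {
	unsigned int features;
	int locked;		/* 1 while the tx lock is held */
	int xmit_lock_owner;	/* CPU that holds the lock, -1 if none */
};

static void model_dev_init(struct model_dev *dev, unsigned int features)
{
	dev->features = features;
	dev->locked = 0;
	dev->xmit_lock_owner = -1;
}

/* Counterpart of __netif_tx_lock(): the caller passes the CPU id in. */
static void model_tx_lock(struct model_dev *dev, int cpu)
{
	dev->locked = 1;
	dev->xmit_lock_owner = cpu;
}

static void model_tx_unlock(struct model_dev *dev)
{
	dev->xmit_lock_owner = -1;
	dev->locked = 0;
}

/* Only lock when the driver does not do its own (LLTX) locking. */
#define MODEL_HARD_TX_LOCK(dev, cpu) do {			\
	if (((dev)->features & MODEL_F_LLTX) == 0)		\
		model_tx_lock((dev), (cpu));			\
} while (0)

#define MODEL_HARD_TX_UNLOCK(dev) do {				\
	if (((dev)->features & MODEL_F_LLTX) == 0)		\
		model_tx_unlock(dev);				\
} while (0)
```

The point of the model is the second argument: the CPU id the caller already has is recorded as the lock owner instead of being recomputed inside the lock helper.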
Re: [RFC][NET_SCHED] explicit hold dev tx lock
On Sun, 2007-16-09 at 16:52 -0400, jamal wrote:

> What i should say is: if i grabbed the lock explicitly without disabling irqs, it won't be much different from what is done today and should always work. No?

And to be more explicit, here's a patch using the macros from the previous patch. So far tested on 3 NICs.

cheers,
jamal

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e970e8e..1ae905e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -134,34 +134,18 @@ static inline int qdisc_restart(struct net_device *dev)
 {
 	struct Qdisc *q = dev->qdisc;
 	struct sk_buff *skb;
-	unsigned lockless;
 	int ret;
 
 	/* Dequeue packet */
 	if (unlikely((skb = dev_dequeue_skb(dev, q)) == NULL))
 		return 0;
 
-	/*
-	 * When the driver has LLTX set, it does its own locking in
-	 * start_xmit. These checks are worth it because even uncongested
-	 * locks can be quite expensive. The driver can do a trylock, as
-	 * is being done here; in case of lock contention it should return
-	 * NETDEV_TX_LOCKED and the packet will be requeued.
-	 */
-	lockless = (dev->features & NETIF_F_LLTX);
-
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
-	}
-
 	/* And release queue */
 	spin_unlock(&dev->queue_lock);
 
+	HARD_TX_LOCK(dev, smp_processor_id());
 	ret = dev_hard_start_xmit(skb, dev);
-
-	if (!lockless)
-		netif_tx_unlock(dev);
+	HARD_TX_UNLOCK(dev);
 
 	spin_lock(&dev->queue_lock);
 	q = dev->qdisc;
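
The branch this patch removes is a trylock-and-requeue pattern. A minimal single-threaded sketch of that old path follows; `struct model_txdev` and `restart_with_trylock()` are hypothetical names, and the counters merely stand in for dev_hard_start_xmit() and the collision handler. The replacement path simply waits for the lock, which a single-threaded model cannot show.

```c
#include <assert.h>

enum model_xmit { MODEL_SENT, MODEL_REQUEUED };

struct model_txdev {
	int tx_locked;		/* 1 while another CPU holds the tx lock */
	unsigned int sent;
	unsigned int requeued;
};

/* Old behaviour: never wait for the tx lock, requeue on contention. */
static enum model_xmit restart_with_trylock(struct model_txdev *dev)
{
	if (dev->tx_locked) {
		/* trylock failed: another CPU grabbed the driver tx lock */
		dev->requeued++;
		return MODEL_REQUEUED;
	}
	dev->tx_locked = 1;
	dev->sent++;		/* stands in for dev_hard_start_xmit() */
	dev->tx_locked = 0;
	return MODEL_SENT;
}
```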
Re: [PATCH] tehuti: driver for Tehuti 10GbE network adapters
erp, changes in the net-2.6.24 tree break this.

drivers/net/tehuti.c: In function 'bdx_isr_napi':
drivers/net/tehuti.c:268: error: too few arguments to function 'netif_rx_schedule_prep'
drivers/net/tehuti.c:269: error: too few arguments to function '__netif_rx_schedule'
drivers/net/tehuti.c: In function 'bdx_poll':
drivers/net/tehuti.c:302: error: too few arguments to function 'netif_rx_complete'
drivers/net/tehuti.c: In function 'bdx_hw_start':
drivers/net/tehuti.c:414: error: implicit declaration of function 'netif_poll_enable'
drivers/net/tehuti.c: In function 'bdx_hw_stop':
drivers/net/tehuti.c:428: error: implicit declaration of function 'netif_poll_disable'
drivers/net/tehuti.c: In function 'bdx_rx_receive':
drivers/net/tehuti.c:1219: error: 'struct net_device' has no member named 'quota'
drivers/net/tehuti.c:1219: warning: type defaults to 'int' in declaration of '_y'
drivers/net/tehuti.c:1219: error: 'struct net_device' has no member named 'quota'
drivers/net/tehuti.c:1311: error: 'struct net_device' has no member named 'quota'
drivers/net/tehuti.c: In function 'bdx_probe':
drivers/net/tehuti.c:1994: error: 'struct net_device' has no member named 'poll'
drivers/net/tehuti.c:1995: error: 'struct net_device' has no member named 'weight'
drivers/net/tehuti.c:2058: error: implicit declaration of function 'SET_MODULE_OWNER'

There's a lot of churn in networking at present and I don't have time/inclination to fix this one up, sorry.
Re: ne driver crashes when unloaded in 2.6.22.6
On Sat, 2007-09-15 at 21:27 +0100, Chris Rankin wrote:
> --- Dan Williams [EMAIL PROTECTED] wrote:
> > On Wed, 2007-09-12 at 19:23 +0100, Chris Rankin wrote:
> > > Hmm, apparently not. The light on the card goes out though, so could this just be a lack of driver support?
> >
> > Likely, yes.
>
> I've been trawling the Internet for 8390 specifications and have discovered that there is a Carrier Sense Loss flag on the Transmit Status Register. However, there doesn't seem to be an explicit media status test. Would this more likely be part of the NE2000's functionality? I can't find any signs of MII support, but then the NE2000 is so heavily cloned that NE2000-compatible seems to have become more of a generic description these days.
>
> Does anyone have any ideas, please? Does NetworkManager even need full carrier-detection support?

NM needs it if you want the interface to be automatically handled in 0.6.x and earlier. In 0.7.x and later you'll be able to have NM just set up the interface even if it doesn't have a link (if you set it to autoconnect), but of course that means that whenever you start NM the interface will be up with the settings you specify because, of course, NM can't automatically figure out when the card is up or down and do something intelligent.

dan
Re: [PATCH][NET]: migrate HARD_TX_LOCK to header file
From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 16:57:14 -0400

> On Sun, 2007-16-09 at 16:28 -0400, jamal wrote:
> > sounds much better - will resend after a simple test.
>
> Ok, here's the revised version

Applied, thanks Jamal.
Re: [PATCH] IPV6: fix source address selection
From: Jiri Kosina [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 00:56:00 +0200 (CEST)

> From: Jiri Kosina [EMAIL PROTECTED]
>
> [PATCH] IPV6: fix source address selection
>
> The commit 95c385 broke proper source address selection for cases in which there is an address which is marked 'deprecated'. The commit mistakenly changed ifa->flags to ifa_result->flags (probably a copy/paste error from a few lines above) in the 'Rule 3' address selection code.
>
> The patch below restores the previous RFC-compliant behavior, please apply.
>
> Cc: Jiri Bohac [EMAIL PROTECTED]
> Cc: Petr Baudis [EMAIL PROTECTED]
> Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

Excellent catch Jiri. I'll apply this and push to -stable as well.

Thanks a lot!
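
To make the nature of the bug concrete, here is a small userspace sketch of a 'Rule 3' comparison (avoid deprecated addresses). It is not the kernel code: `struct model_ifaddr`, `rule3_compare()` and `MODEL_IFA_F_DEPRECATED` are hypothetical names for this illustration. The point is that the deprecated bit of the candidate must be read from the candidate itself; reading it from the current best result, as the broken commit is described as doing, makes both sides look alike and the rule never decides anything.

```c
#include <assert.h>

#define MODEL_IFA_F_DEPRECATED 0x20u	/* local flag bit for this model */

struct model_ifaddr {
	unsigned int flags;
};

/*
 * Rule 3 sketch: prefer an address that is not deprecated.
 * Returns  1 if the candidate should replace the current best,
 *         -1 if the current best should be kept,
 *          0 if the rule cannot decide (fall through to the next rule).
 */
static int rule3_compare(const struct model_ifaddr *best,
			 const struct model_ifaddr *cand)
{
	int best_dep = (best->flags & MODEL_IFA_F_DEPRECATED) != 0;
	/* Must test cand->flags here, not best->flags a second time. */
	int cand_dep = (cand->flags & MODEL_IFA_F_DEPRECATED) != 0;

	if (best_dep == cand_dep)
		return 0;
	return cand_dep ? -1 : 1;
}
```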
Re: [PATCH] net: Fix the prototype of call_netdevice_notifiers
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 13 Sep 2007 09:59:05 -0600

> This replaces the void * parameter with a struct net_device *, which is what is actually required.
>
> Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Eric.
Re: Network Namespace status
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 13 Sep 2007 13:12:08 -0600

> The final blocker to having multiple useful instances of network namespaces is the loopback device. We recognize the network namespace of incoming packets by looking at dev->nd_net. Which means for packets to properly loopback within a network namespace we need a loopback device per network namespace. There were some concerns expressed when we posted the cleanup part of the patches that allowed for multiple loopback devices a few weeks ago so resolving this one may be tricky.

There was a change posted recently to dynamically allocate the loopback device. I like that (sorry I don't have a reference to the patch handy), and you can build on top of that to get the namespace local loopback objects you want.

static struct net_device *loopback_dev(struct net_namespace *net)
{
	...
}

You get the idea.
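
The elided body of loopback_dev() is not shown anywhere in the thread, so the following is only a guess at the shape of the idea, written as a userspace model: each namespace owns a dynamically allocated loopback object and an accessor returns it. `struct model_net`, `struct model_net_device`, `model_net_init()` and `model_net_exit()` are hypothetical names and are not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

struct model_net;

struct model_net_device {
	const char *name;
	struct model_net *nd_net;	/* namespace this device belongs to */
};

struct model_net {
	struct model_net_device *loopback;
};

/* Allocate the namespace-local loopback device; returns 0 or -1. */
static int model_net_init(struct model_net *net)
{
	net->loopback = calloc(1, sizeof(*net->loopback));
	if (!net->loopback)
		return -1;
	net->loopback->name = "lo";
	net->loopback->nd_net = net;
	return 0;
}

static void model_net_exit(struct model_net *net)
{
	free(net->loopback);
	net->loopback = NULL;
}

/* Accessor in the spirit of the loopback_dev(net) suggestion above. */
static struct model_net_device *loopback_dev(struct model_net *net)
{
	return net->loopback;
}
```

With one object per namespace, a packet looped back through `loopback_dev(net)` arrives on a device whose namespace pointer identifies the right namespace.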
Re: [v3 PATCH 0/2] Add RCU locking to SCTP address management
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 15:34:35 -0400

> Thanks to Sridhar Samudral and Paul McKenney for all the help and comments. I think this is a final version, unless someone else can spot more problems. I've run this under heavy load and the patches behave well.
>
> I think patch 1 is a candidate for 2.6.23 since it fixes a bug, but splitting these seems a bit odd to me. I'll leave it to DaveM to decide where to put them.

Since you tested this well, I've decided to put both of these patches into net-2.6.

I agree it's stupid to split them up.

There'll be some merge hassles when I rebase net-2.6.24, but that tree is such a monster that this is inevitable for every bug fix I queue up for 2.6.23 :-)
Re: [PATCH for 2.6.24] SCTP: Move sysctl_sctp_[rw]mem definitions to protocol.c
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 13 Sep 2007 17:03:45 -0400

> The sctp_[rw]mem definitions should really be in protocol.c since that is where they are initialized. This also allows one to build a kernel without sysctl support.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Vlad.
Re: [PATCH 1/7] [PPP] pppoe: Fix skb_unshare_check call position
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:08:49 +0800

> [PPP] pppoe: Fix skb_unshare_check call position
>
> The skb_unshare_check call needs to be made before pskb_may_pull, not after.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Patch applied, thanks Herbert.
Re: [PATCH 2/7] [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:04 +0800

> [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
>
> The function __pppoe_xmit modifies the skb data and therefore it needs to copy the skb data if it's cloned.
>
> In fact, it currently allocates a new skb so that it can return 0 in case of error without freeing the original skb. This is totally wrong because returning zero is meant to indicate congestion, whereupon pppoe is supposed to wake up the upper layer once the congestion subsides.
>
> This makes sense for ppp_async and ppp_sync but is out-of-place for pppoe. This patch makes it always return 1 and free the skb.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.
Re: [PATCH 3/7] [PPP] pppoe: Fill in header directly in __pppoe_xmit
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:12 +0800

> [PPP] pppoe: Fill in header directly in __pppoe_xmit
>
> This patch removes the hdr variable (which is copied into the skb) and instead sets the header directly in the skb.
>
> It also uses __skb_push instead of skb_push since we've just checked using skb_cow for enough head room.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.
Re: [PATCH 4/7] [BRIDGE]: Kill clone argument to br_flood_*
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:14 +0800

> [BRIDGE]: Kill clone argument to br_flood_*
>
> The clone argument is only used by one caller and that caller can clone the packet itself. This patch moves the clone call into the caller and kills the clone argument.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.
Re: [PATCH 5/7] [NET] skbuff: Add skb_cow_head
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:15 +0800

> [NET] skbuff: Add skb_cow_head
>
> This patch adds an optimised version of skb_cow that avoids the copy if the header can be modified even if the rest of the payload is cloned.
>
> This can be used in encapsulating paths where we only need to modify the header. As it is, this can be used in PPPOE and bridging.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.
Re: [PATCH 7/7] [PPP] generic: Fix receive path data clobbering non-linear handling
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 17:09:17 +0800

> [PPP] generic: Fix receive path data clobbering non-linear handling
>
> This patch adds missing pskb_may_pull calls to deal with non-linear packets that may arrive from pppoe or pppol2tp.
>
> It also copies cloned packets before writing over them.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied, thanks Herbert.
Re: [NETLINK]: Avoid pointer in netlink_run_queue
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 31 Aug 2007 20:09:30 +0800

> Hi Dave:
>
> [NETLINK]: Avoid pointer in netlink_run_queue
>
> I was looking at Patrick's fix to inet_diag and it occurred to me that we're using a pointer argument to return values unnecessarily in netlink_run_queue. Changing it to return the value will allow the compiler to generate better code since the value won't have to be memory-backed.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Applied to net-2.6.24, thanks Herbert.
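
The change Herbert describes is a general C refactoring, so it can be illustrated without the real netlink code. In the sketch below both functions are hypothetical models (they are not netlink_run_queue itself): each consumes queued messages while budget remains, but the first reports the leftover budget through a pointer and the second returns it by value, which lets the compiler keep the counter in a register.

```c
#include <assert.h>

/* Before: the remaining budget travels through memory via a pointer. */
static void run_queue_ptr(unsigned int *pending, unsigned int *qlen)
{
	while (*qlen && *pending) {
		(*pending)--;	/* "process" one queued message */
		(*qlen)--;
	}
}

/* After: the budget is passed in and the remainder is returned. */
static unsigned int run_queue_val(unsigned int *pending, unsigned int qlen)
{
	while (qlen && *pending) {
		(*pending)--;
		qlen--;
	}
	return qlen;
}
```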
Re: [SKBUFF]: Fix up csum_start when head room changes
From: Herbert Xu [EMAIL PROTECTED]
Date: Sat, 1 Sep 2007 09:13:33 +0800

> Hi Dave:
>
> [SKBUFF]: Fix up csum_start when head room changes
>
> Thanks for noticing the bug where csum_start is not updated when the head room changes.
>
> This patch fixes that. It also moves the csum/ip_summed copying into copy_skb_header so that skb_copy_expand gets it too. I've checked its callers and no one should be upset by this.
>
> Signed-off-by: Herbert Xu [EMAIL PROTECTED]

Herbert, thanks for following up on this.

Although this is technically a bug fix, we don't have anyone explicitly triggering this and I don't feel comfortable pushing this into net-2.6 without a reported failure case right now.

So I applied it to net-2.6.24 for now. If you disagree, plead your case :-)
Re: [PATCH 3/3] netlink: use a statically allocated nl_table instead
From: Denis Cheng [EMAIL PROTECTED]
Date: Sun, 2 Sep 2007 03:45:59 +0800

> if the table is always fixed size with MAX_LINKS entries, why not use a statically allocated table straightforwardly?
>
> Signed-off-by: Denis Cheng [EMAIL PROTECTED]

I made the explicit decision to dynamically allocate because many systems have limits on how large the kernel image can be, and therefore the less we statically allocate huge tables (constant size or not) the better.

Lockdep is the worst offender, for example, it's completely awful. It consumes 4MB of kernel BSS space when enabled on a 64-bit platform.

Patch not applied.
Re: [PATCH 1/2] net/: all net/ cleanup with ARRAY_SIZE
From: Denis Cheng [EMAIL PROTECTED]
Date: Sun, 2 Sep 2007 18:30:17 +0800

> Signed-off-by: Denis Cheng [EMAIL PROTECTED]

You already submitted the net/ipv4/af_inet.c case separately, so I had to remove it from this patch for it to apply properly.

Please keep your patches straight to avoid problems like this.

Thanks.
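
For context, this kind of cleanup replaces open-coded `sizeof(x) / sizeof(x[0])` expressions with the ARRAY_SIZE() macro. A simplified userspace version is sketched below; the in-kernel macro additionally rejects non-array arguments at compile time, which this sketch does not attempt, and `model_proto_table` is a made-up table for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified form: element count of a true array (not a pointer). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const int model_proto_table[] = { 1, 6, 17, 132 };

/* Walk the table without hard-coding its length. */
static int model_proto_known(int proto)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(model_proto_table); i++)
		if (model_proto_table[i] == proto)
			return 1;
	return 0;
}
```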
Re: Network Namespace status
David Miller [EMAIL PROTECTED] writes:

> From: [EMAIL PROTECTED] (Eric W. Biederman)
> Date: Thu, 13 Sep 2007 13:12:08 -0600
>
> > The final blocker to having multiple useful instances of network namespaces is the loopback device. We recognize the network namespace of incoming packets by looking at dev->nd_net. Which means for packets to properly loopback within a network namespace we need a loopback device per network namespace. There were some concerns expressed when we posted the cleanup part of the patches that allowed for multiple loopback devices a few weeks ago so resolving this one may be tricky.
>
> There was a change posted recently to dynamically allocate the loopback device. I like that (sorry I don't have a reference to the patch handy), and you can build on top of that to get the namespace local loopback objects you want.
>
> static struct net_device *loopback_dev(struct net_namespace *net)
> {
> 	...
> }
>
> You get the idea.

Sure. Thanks. Since the change got dropped I figured it for a rejection, and that I would have to rework that patch.

On a similar note: it recently occurred to me that I can make creating multiple network namespaces depend on !CONFIG_SYSFS. That will allow most of the rest of the patches I am sure of to be merged now, and give me just a little more time to work with Tejun and finish up the sysfs support.

Eric
Re: [PATCH][NETNS] Use list_for_each_entry_continue_reverse in setup_net
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 22:07:14 +0200

> Could we just make it so dev->init is not allowed to fail? Then it can be a void function and the nasty unwind code can go?

Someone (not me :-) needs to do an audit to find all current users of this function and determine if they all can live without returning errors.

If so, sure let's make the change and simplify things.
Re: [PATCH] Add ICMPMsgStats MIB (RFC 4293) [rev 2]
From: David Stevens [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 15:25:32 -0600

> Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type.
>
> These patches remove (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc).
>
> Changes:
> 1) create icmpmsg_statistics mib
> 2) create icmpv6msg_statistics mib
> 3) modify existing counters to use these
> 4) modify /proc/net/snmp to add IcmpMsg with all ICMP types listed by number for easy SNMP parsing
> 5) modify /proc/net/snmp printing for Icmp to get the named data from new counters.
> [new to 2nd revision]
> 6) support per-interface ICMP stats
> 7) use common macro for per-device stat macros
>
> IPv6 patch attached.
>
> +-DLS
>
> Signed-off-by: David L Stevens [EMAIL PROTECTED]

No objections, so patch applied to net-2.6.24

The following is not directed at this patch specifically, but rather in general.

All of these crappy idev == NULL checks for nearly EVERY SINGLE ipv6 counter bump have gotten _WAY_ out of control.

By definition this whole situation is broken if we need to test the thing basically everywhere. And it's the worst kind of disease because it's hidden inside all kinds of macros, so when you're reading the code you don't see this nearly constant overhead spread all over the ipv6 stack in the most critical paths we have.

How many remote OOPS'er DoS bugs have we had in ipv6 because of how this stuff works? I can remember at least 3, and that's 3 too many.

We need to fix this, and I don't care how, such that idev is never NULL and at least points to some dummy ipv6 idev object. And it must be done in such a way that the cure is not worse than the disease :-)
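
One way to read the "dummy ipv6 idev object" suggestion is as the null-object pattern. The sketch below is a userspace illustration only: `struct model_idev`, `model_dummy_idev` and the helpers are hypothetical, and it says nothing about how the real ipv6 stack should be changed. The pointer is normalised once at attach time, so every counter bump can dereference it unconditionally.

```c
#include <assert.h>
#include <stddef.h>

struct model_idev {
	unsigned long icmp_out_msgs;
};

/* Shared sink for devices that have no real per-device ipv6 state. */
static struct model_idev model_dummy_idev;

struct model_netdev {
	struct model_idev *idev;	/* invariant: never NULL */
};

/* Normalise once, at attach time, instead of checking on every bump. */
static void model_attach_idev(struct model_netdev *dev,
			      struct model_idev *idev)
{
	dev->idev = idev ? idev : &model_dummy_idev;
}

/* Hot path: no NULL test needed. */
static void model_icmp_out_inc(struct model_netdev *dev)
{
	dev->idev->icmp_out_msgs++;
}
```

The counts collected by the dummy object are simply discarded, which is the price paid for removing the branch from the hot path.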
Re: [PATCH][NETNS] Use list_for_each_entry_continue_reverse in setup_net
David Miller [EMAIL PROTECTED] writes:

> From: Stephen Hemminger [EMAIL PROTECTED]
> Date: Fri, 14 Sep 2007 22:07:14 +0200
>
> > Could we just make it so dev->init is not allowed to fail? Then it can be a void function and the nasty unwind code can go?
>
> Someone (not me :-) needs to do an audit to find all current users of this function and determine if they all can live without returning errors.
>
> If so, sure let's make the change and simplify things.

I did that audit when I replied to Stephen the first time and I just redid it to verify myself.

We are calling functions that can fail from the init function (kmalloc being the most common). So the init function can fail.

So short of adding a bunch of BUG_ON's to the kernel to trap those failure cases we can't remove the backwards list walk. Especially since I can initiate this code path as root by calling clone(CLONE_NEWNET...).

Eric
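
The backwards walk Eric refers to can be sketched in userspace with an array in place of the kernel list. Everything here is a hypothetical model (`struct model_ops`, `model_setup_all()` and the helper callbacks are not kernel symbols): when one init fails, only the entries that were already initialised are torn down, in reverse order, which is what a reverse "continue" iteration from the failing entry achieves.

```c
#include <assert.h>
#include <errno.h>

struct model_ops {
	int (*init)(void);
	void (*exit)(void);
};

static int model_live_count;	/* subsystems currently initialised */

static int model_ok_init(void)
{
	model_live_count++;
	return 0;
}

static void model_ok_exit(void)
{
	model_live_count--;
}

/* Stands in for an init whose allocation failed. */
static int model_failing_init(void)
{
	return -ENOMEM;
}

/* Run every init in order; on failure undo the ones that succeeded. */
static int model_setup_all(const struct model_ops *ops, int n)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		err = ops[i].init();
		if (err)
			goto undo;
	}
	return 0;
undo:
	/* Walk backwards from the entry before the one that failed. */
	while (--i >= 0)
		ops[i].exit();
	return err;
}
```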
Re: [PATCH][NETNS] Use list_for_each_entry_continue_reverse in setup_net
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Sun, 16 Sep 2007 18:06:00 -0600

> I did that audit when I replied to Stephen the first time and I just redid it to verify myself.
>
> We are calling functions that can fail from the init function (kmalloc being the most common). So the init function can fail.
>
> So short of adding a bunch of BUG_ON's to the kernel to trap those failure cases we can't remove the backwards list walk. Especially since I can initiate this code path as root by calling clone(CLONE_NEWNET...).

I just noticed that posting and thanks for reiterating.
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
On Sun, 2007-16-09 at 16:17 -0700, David Miller wrote:

> The only major complaint I have about this patch series is that the IPoIB part should just be one big changeset.

Dave, you do realize that i have been investing my time working on batching as well, right?

cheers,
jamal
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 20:29:18 -0400

> On Sun, 2007-16-09 at 16:17 -0700, David Miller wrote:
> > The only major complaint I have about this patch series is that the IPoIB part should just be one big changeset.
>
> Dave, you do realize that i have been investing my time working on batching as well, right?

I do. And I'm reviewing and applying several hundred patches a day.

What's the point? :-)
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
On Sun, 2007-16-09 at 18:02 -0700, David Miller wrote:

> I do. And I'm reviewing and applying several hundred patches a day.
>
> What's the point? :-)

Reading the commentary made me think you were about to swallow that with one more change by the time i wake up;->

I still think this work - despite my vested interest - needs more scrutiny from a performance perspective. I tend to send a url to my work, but it may be time to start posting patches.

cheers,
jamal
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 22:14:21 -0400

> I still think this work - despite my vested interest - needs more scrutiny from a performance perspective.

Absolutely.

There are tertiary issues I'm personally interested in, for example how well this stuff works when we enable software GSO on a non-TSO capable card.

In such a case the GSO segment should be split right before we hit the driver and then all the sub-segments of the original GSO frame batched in one shot down to the device driver.

In this way you'll get a large chunk of the benefit of TSO without explicit hardware support for the feature.

There are several cards (some even 10GB) that will benefit immensely from this.
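
To give a feel for the batch sizes involved, the arithmetic of splitting one large GSO payload into MSS-sized sub-segments can be sketched as follows. The helper names are hypothetical and the functions only compute sizes; they do not model real skb handling or headers.

```c
#include <assert.h>

/* Number of sub-segments needed for payload_len bytes at a given MSS. */
static unsigned int model_gso_segs(unsigned int payload_len, unsigned int mss)
{
	if (mss == 0)
		return 0;
	return (payload_len + mss - 1) / mss;	/* round up */
}

/* Payload length of sub-segment idx (0-based); only the last may be short. */
static unsigned int model_gso_seg_len(unsigned int payload_len,
				      unsigned int mss, unsigned int idx)
{
	unsigned int segs = model_gso_segs(payload_len, mss);

	if (idx >= segs)
		return 0;
	if (idx < segs - 1)
		return mss;
	return payload_len - idx * mss;
}
```

With a 10000-byte payload and an MSS of 1460, for example, the driver would be handed seven sub-segments in one batch: six full ones and a final one of 1240 bytes.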
Re: [PATCH 1/8] SCTP: protocol definitions for SCTP-AUTH implementation
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:52 -0400

> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24
Re: [PATCH 2/8] SCTP: Implement SCTP-AUTH internals
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:53 -0400

> This patch implements the internal operations of the AUTH, such as key computation and storage. It also adds necessary variables to the SCTP data structures.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, with lots of trailing whitespace fixed.

Please check your patches with GIT by using something such as

	git apply --check --whitespace=error-all foo.diff

in the future, and you'll see stuff like this:

Adds trailing whitespace.
diff:696:
Adds trailing whitespace.
diff:732:	return secret;
Adds trailing whitespace.
diff:805:
Adds trailing whitespace.
diff:815:/*
Adds trailing whitespace.
diff:1034:		break;
Adds trailing whitespace.
diff:1098:
Adds trailing whitespace.
diff:1109:
fatal: 7 lines add trailing whitespaces.

Thanks.
Re: [PATCH 3/8] SCTP: Implement SCTP-AUTH initializations.
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:54 -0400

> The patch initializes AUTH related members of the generic SCTP structures and provides a way to enable/disable auth extension.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24
Re: [PATCH 4/8] SCTP: Implement SCTP-AUTH parameter processing
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:55 -0400

> Implement processing for the CHUNKS, RANDOM, and HMAC parameters and deal with how these parameters are affected by association restarts. In particular, during unexpected INIT processing, we need to reply with parameters from the original INIT chunk. Also, after restart, we need to update the old association with new peer parameters and change the association shared keys.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 5/8] SCTP: Enable the sending of the AUTH chunk.
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:56 -0400

> SCTP-AUTH, Section 6.2:
>
>    Endpoints MUST send all requested chunks authenticated where this has been requested by the peer. The other chunks MAY be sent authenticated or not. If endpoint pair shared keys are used, one of them MUST be selected for authentication.
>
>    To send chunks in an authenticated way, the sender MUST include these chunks after an AUTH chunk. This means that a sender MUST bundle chunks in order to authenticate them.
>
>    If the endpoint has no endpoint pair shared key for the peer, it MUST use Shared Key Identifier 0 with an empty endpoint pair shared key. If there are multiple endpoint shared keys the sender selects one and uses the corresponding Shared Key Identifier.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 6/8] SCTP: Implement the receive and verification of AUTH chunk
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:57 -0400

> This patch implements the receive path needed to process authenticated chunks. Add ability to process the AUTH chunk and handle edge cases for authenticated COOKIE-ECHO as well.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 7/8] SCTP: API updates to suport SCTP-AUTH extensions.
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:58 -0400

> Add SCTP-AUTH API. The API implemented here was agreed to between implementors at the 9th SCTP Interop. It will be documented in the next revision of the SCTP socket API spec.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [PATCH 8/8] SCTP: Tie ADD-IP and AUTH functionality as required by spec.
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 14:44:59 -0400

> ADD-IP spec requires AUTH. It is, in fact, dangerous without AUTH. So, disable ADD-IP functionality if the peer claims to support ADD-IP, but not AUTH.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Applied to net-2.6.24, thanks.
Re: [v2 PATCH 8/8] SCTP: Tie ADD-IP and AUTH functionality as required by spec.
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Fri, 14 Sep 2007 15:14:50 -0400

> [.. forgot to refresh the patch, the other version has compile problems ..]
>
> ADD-IP spec requires AUTH. It is, in fact, dangerous without AUTH. So, disable ADD-IP functionality if the peer claims to support ADD-IP, but not AUTH.
>
> Signed-off-by: Vlad Yasevich [EMAIL PROTECTED]

Aha, I caught this and applied the correct patch.
Re: [PATCH 3/3] Blackfin EMAC driver: Add phyabstraction layer supporting in bfin_emac driver
On Sat 15 Sep 2007 22:57, Bryan Wu pondered:
> - add MDIO functions and register mdio bus
> - add phy abstraction layer (PAL) functions and use PAL API
> - test on STAMP537 board

Today, the Kconfig for the Blackfin just includes:

config BFIN_MAC
	tristate "Blackfin 536/537 on-chip mac support"
	depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
	select CRC32
	select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
	help
	  This is the driver for blackfin on-chip mac device. Say Y if you want it
	  compiled into the kernel. This driver is also available as a module
	  ( = code which can be inserted in and removed from the running kernel
	  whenever you want). The module will be called bfin_mac.

Since you are adding a requirement on PHYLIB with this patch, should there be a select for that?

-Robin
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
On Sun, 2007-16-09 at 19:25 -0700, David Miller wrote:

> There are tertiary issues I'm personally interested in, for example how well this stuff works when we enable software GSO on a non-TSO capable card. In such a case the GSO segment should be split right before we hit the driver and then all the sub-segments of the original GSO frame batched in one shot down to the device driver.

I think GSO is still useful on top of this. In my patches anything with gso gets put into the batch list and shot down the driver. I've never considered checking whether the nic is TSO capable, that may be something worth checking into. The netiron allows you to shove up to 128 skbs utilizing one tx descriptor, which makes for interesting possibilities.

> In this way you'll get a large chunk of the benefit of TSO without explicit hardware support for the feature.
>
> There are several cards (some even 10GB) that will benefit immensely from this.

indeed - I've always wondered if batching this way would make the NICs behave differently from the way TSO does.

On a side note: My observation is that with large packets on a very busy system; bulk transfer type app, one approaches wire speed; with or without batching, the apps are mostly idling (I've seen up to 90% idle time polling at the socket level for write to complete with a really busy system). This is the case with or without batching. CPU seems a little better with batching. As the aggregation of the apps gets more aggressive (achievable by reducing their packet sizes), one can achieve improved throughput and reduced CPU utilization. This all with UDP; I am still studying TCP.

cheers,
jamal
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
From: jamal [EMAIL PROTECTED]
Date: Sun, 16 Sep 2007 23:01:43 -0400

> I think GSO is still useful on top of this. In my patches anything with gso gets put into the batch list and shot down the driver. I've never considered checking whether the nic is TSO capable, that may be something worth checking into. The netiron allows you to shove up to 128 skbs utilizing one tx descriptor, which makes for interesting possibilities.

We're talking past each other, but I'm happy to hear that for sure your code does the right thing :-)

Right now only TSO capable hardware sets the TSO capable bit, except perhaps for the XEN netfront driver.

What Herbert and I want to do is basically turn on TSO for devices that can't do it in hardware, and rely upon the GSO framework to do the segmenting in software right before we hit the device.

This only makes sense for devices which can 1) scatter-gather and 2) checksum on transmit. Otherwise we make too many copies and/or passes over the data.

And we can only get the full benefit if we can pass all the sub-segments down to the driver in one ->hard_start_xmit() call.

> On a side note: My observation is that with large packets on a very busy system; bulk transfer type app, one approaches wire speed; with or without batching, the apps are mostly idling (I've seen up to 90% idle time polling at the socket level for write to complete with a really busy system). This is the case with or without batching. CPU seems a little better with batching. As the aggregation of the apps gets more aggressive (achievable by reducing their packet sizes), one can achieve improved throughput and reduced CPU utilization. This all with UDP; I am still studying TCP.

UDP apps spraying data tend to naturally batch well and load balance amongst themselves because each socket fills up to its socket send buffer limit, then sleeps, and we then get a stream from the next UDP socket up to its limit, and so on and so forth.

UDP is too easy a test case in fact :-)
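To make the "split in software, then hand every sub-segment down in one call" idea concrete, here is a small user-space sketch. It is not the kernel GSO code: `struct seg`, `gso_split()` and `xmit_batch()` are names invented for this illustration, and the "driver" only adds up byte counts where a real one would post tx descriptors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one sub-segment of a large GSO frame. */
struct seg {
    size_t offset;  /* start of this segment within the payload */
    size_t len;     /* bytes carried by this segment */
};

/*
 * Split a payload of `total` bytes into segments of at most `mss` bytes.
 * Fills out[] (capacity `cap`) and returns the number of segments.
 * Returns 0 if mss is 0, if total is 0, or if the segments do not fit.
 */
size_t gso_split(size_t total, size_t mss, struct seg *out, size_t cap)
{
    size_t n = 0, off = 0;

    assert(out != NULL);
    if (mss == 0)
        return 0;
    while (off < total) {
        size_t len = (total - off < mss) ? total - off : mss;

        if (n == cap)
            return 0;           /* caller's array is too small */
        out[n].offset = off;
        out[n].len = len;
        n++;
        off += len;
    }
    return n;
}

/*
 * Hypothetical driver entry point: receives the whole batch in one call
 * and returns the number of payload bytes it accepted.
 */
size_t xmit_batch(const struct seg *segs, size_t n)
{
    size_t bytes = 0;

    for (size_t i = 0; i < n; i++)
        bytes += segs[i].len;   /* a real driver would post descriptors here */
    return bytes;
}
```

With a 4000-byte payload and a 1448-byte segment size, `gso_split()` yields three segments (1448, 1448 and 1104 bytes), and a single `xmit_batch()` call sees all of them, which is the property the thread says is needed to get the full benefit.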
Re: [PATCH 3/3] Blackfin EMAC driver: Add phyabstraction layer supporting in bfin_emac driver
On Sun, 2007-09-16 at 22:51 -0400, Robin Getz wrote:
> On Sat 15 Sep 2007 22:57, Bryan Wu pondered:
> > - add MDIO functions and register mdio bus
> > - add phy abstraction layer (PAL) functions and use PAL API
> > - test on STAMP537 board
>
> Today, the Kconfig for the Blackfin just includes:
>
> config BFIN_MAC
> 	tristate "Blackfin 536/537 on-chip mac support"
> 	depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
> 	select CRC32
> 	select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
> 	help
> 	  This is the driver for blackfin on-chip mac device. Say Y if you want it
> 	  compiled into the kernel. This driver is also available as a module
> 	  ( = code which can be inserted in and removed from the running kernel
> 	  whenever you want). The module will be called bfin_mac.
>
> Since you are adding requirement for the PHYLIB with this patch, should there be a select for that?
>
> -Robin

OK, I will send a patch for this update, since some people failed to compile the kernel without selecting PHYLIB.

Thanks
-Bryan Wu
net-2.6.24 plans
Most if not all of my 2 week backlog of patches is in the net-2.6.24 and net-2.6 tree now. And any relevant -stable fixes will be submitted in the next day or two.

Tomorrow (Monday) I want to rebase the net-2.6.24 tree one more time to deal with all of the conflicts which exist between linux-2.6/net-2.6 and net-2.6.24, but I'll likely defer that until the net-2.6 fixes I just pushed to Linus are integrated.

It's to the point where every single bug fix put into Linus's tree creates a merge conflict with net-2.6.24, we are simply touching that much stuff. :-)

I expect some small network namespace fixes from Eric B., but that's basically it as far as 2.6.24 is concerned. Oh yes, there are also the MAC_FMT/MAC_ARG bits from Joe Perches that I need to do a merge of.

The transmit batching stuff needs a lot more analysis and discussion, so I definitely see that stuff as 2.6.25 material. I think if we can avoid a food fight between Jamal and Mr. Kumar and have healthy discussions, we can end up with a really nice implementation. So everyone put your boxing gloves away and let's get at it. :-)

We've touched so much in net-2.6.24 that we really should be auditing the thing and fixing any bugs that have been added. If you're bored and looking for something to do, pick an odd NAPI driver and audit it in the net-2.6.24 tree.
[PATCH] Blackfin EMAC driver: add a select for the PHYLIB of this driver
Since we are adding a requirement on PHYLIB for this driver, there should be a select for it.

Cc: Robin Getz [EMAIL PROTECTED]
Signed-off-by: Bryan Wu [EMAIL PROTECTED]
---
 drivers/net/Kconfig |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 5b9e17b..5eef224 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -843,6 +843,8 @@ config BFIN_MAC
 	tristate "Blackfin 536/537 on-chip mac support"
 	depends on NET_ETHERNET && (BF537 || BF536) && (!BF537_PORT_H)
 	select CRC32
+	select MII
+	select PHYLIB
 	select BFIN_MAC_USE_L1 if DMA_UNCACHED_NONE
 	help
 	  This is the driver for blackfin on-chip mac device. Say Y if you want it
--
1.5.2
Re: [PATCH 3/10 REV5] [sched] Modify qdisc_run to support batching
Hi Evgeniy,

Evgeniy Polyakov [EMAIL PROTECTED] wrote on 09/14/2007 05:45:19 PM:

> > +	if (skb->next) {
> > +		int count = 0;
> > +
> > +		do {
> > +			struct sk_buff *nskb = skb->next;
> > +
> > +			skb->next = nskb->next;
> > +			__skb_queue_tail(dev->skb_blist, nskb);
> > +			count++;
> > +		} while (skb->next);
>
> Could it be list_move()-like function for skb lists? I'm pretty sure if you change first and the last skbs and ke of the queue in one shot, result will be the same.

I have to do a bit more like update count, etc, but I agree it is do-able. I had mentioned in my PATCH 0/10 that I will later try this suggestion that you provided last time.

> Actually how many skbs are usually batched in your load?

It depends, eg when the tx lock is not got, I get batching of up to 8-10 skbs (assuming that tx lock was not got quite a few times). But when the queue gets blocked, I have seen batching up to 4K skbs (if tx_queue_len is 4K).

> > +	/* Reset destructor for kfree_skb to work */
> > +	skb->destructor = DEV_GSO_CB(skb)->destructor;
> > +	kfree_skb(skb);
>
> Why do you free first skb in the chain?

This is the gso code which has segmented 'skb' to skb'1-n', and those skb'1-n' are sent out and freed by driver, which means the dummy 'skb' (without any data) remains to be freed.

Thanks,

- KK
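A rough user-space sketch of the "move the whole chain in one shot" suggestion follows. It uses a plain singly linked list, not `struct sk_buff_head`; `struct pkt`, `struct pkt_queue` and `splice_chain_tail()` are hypothetical names for this illustration only. As noted in the reply, the packet count still has to be updated, so this version walks the chain once to find its last element and length, then relinks the queue tail a single time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an skb and an skb queue head. */
struct pkt {
    struct pkt *next;
    int id;
};

struct pkt_queue {
    struct pkt *head;
    struct pkt *tail;
    unsigned int qlen;
};

/*
 * Detach the chain hanging off first->next and append it to the tail of q
 * with one relink. Returns the number of packets moved.
 */
unsigned int splice_chain_tail(struct pkt_queue *q, struct pkt *first)
{
    struct pkt *chain, *last;
    unsigned int count = 1;

    assert(q != NULL && first != NULL);
    chain = first->next;
    if (chain == NULL)
        return 0;

    /* One pass to find the last element and count the chain. */
    for (last = chain; last->next != NULL; last = last->next)
        count++;

    /* Single relink instead of one queue-tail operation per packet. */
    if (q->tail != NULL)
        q->tail->next = chain;
    else
        q->head = chain;
    q->tail = last;
    q->qlen += count;

    first->next = NULL;     /* the leading packet no longer owns the chain */
    return count;
}
```

The queue's head, tail and length are each written once per call here, where the quoted loop performs one queue-tail operation per packet.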
Re: [PATCH 2/10 REV5] [core] Add skb_blist support for batching
Hi Evgeniy,

Evgeniy Polyakov [EMAIL PROTECTED] wrote on 09/14/2007 06:16:38 PM:

> > +	if (dev->features & NETIF_F_BATCH_SKBS) {
> > +		/* Driver supports batching skb */
> > +		dev->skb_blist = kmalloc(sizeof *dev->skb_blist, GFP_KERNEL);
> > +		if (dev->skb_blist)
> > +			skb_queue_head_init(dev->skb_blist);
> > +	}
> > +
>
> A nitpick is that you should use sizeof(struct ...) and I think it requires flag clearing in case of failed initialization?

I thought it is better to use *var name in case the name of the structure changes. Also, the flag is not cleared since I could try to enable batching later, and it could succeed at that time. When skb_blist is allocated, then batching is enabled, otherwise it is disabled (while the features flag just indicates that the driver supports batching).

Thanks,

- KK
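The two points in this reply (sizing the allocation from the pointer, and treating the allocated list as the "batching is on" state while the feature flag only says "driver can batch") can be sketched in user-space C as below. `struct blist`, `struct dev`, `F_BATCH`, `dev_enable_batching()` and `dev_batching_on()` are hypothetical names, and `calloc()` stands in for `kmalloc()`.

```c
#include <assert.h>
#include <stdlib.h>

#define F_BATCH 0x1u    /* hypothetical "driver supports batching" flag */

/* Hypothetical stand-ins for the batch list and the net device. */
struct blist {
    void *head;
    void *tail;
    unsigned int qlen;
};

struct dev {
    unsigned int features;
    struct blist *skb_blist;    /* NULL means batching is currently off */
};

/*
 * Try to turn batching on. Returns 1 if batching is on afterwards, 0 if not.
 * The feature flag is left untouched on failure, so a later retry can succeed.
 */
int dev_enable_batching(struct dev *d)
{
    assert(d != NULL);
    if (!(d->features & F_BATCH))
        return 0;
    if (d->skb_blist == NULL) {
        /* sizeof *ptr keeps working if the struct is ever renamed. */
        d->skb_blist = calloc(1, sizeof *d->skb_blist);
    }
    return d->skb_blist != NULL;
}

/* The allocated list itself is the on/off indicator. */
int dev_batching_on(const struct dev *d)
{
    assert(d != NULL);
    return d->skb_blist != NULL;
}
```

A caller would check `dev_batching_on()` on the transmit path and could call `dev_enable_batching()` again later if the first allocation failed.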
Re: [PATCH 10/10 REV5] [E1000] Implement batching
Hi Evgeniy,

Evgeniy Polyakov [EMAIL PROTECTED] wrote on 09/14/2007 06:17:14 PM:

> >  	if (unlikely(skb->len <= 0)) {
> >  		dev_kfree_skb_any(skb);
> > -		return NETDEV_TX_OK;
> > +		return NETDEV_TX_DROPPED;
> >  	}
>
> This changes could actually go as own patch, although not sure it is ever used. just a thought, not a stopper.

Since this flag is new and useful only for batching, I feel it is OK to include it in this patch.

> > +	if (!skb || (blist && skb_queue_len(blist))) {
> > +		/*
> > +		 * Either batching xmit call, or single skb case but there are
> > +		 * skbs already in the batch list from previous failure to
> > +		 * xmit - send the earlier skbs first to avoid out of order.
> > +		 */
> > +		if (skb)
> > +			__skb_queue_tail(blist, skb);
> > +		skb = __skb_dequeue(blist);
>
> Why is it put at the end?

There is a bug that I had explained in rev4 (see XXX below) resulting in sending out skbs out of order. The fix is that if the driver gets called with a skb but there are older skbs already in the batch list (which failed to get sent out), send those skbs first before this one.

Thanks,

- KK

[XXX] Dave had suggested to use batching only in the net_tx_action case. When I implemented that in earlier revisions, there were lots of TCP retransmissions (about 18,000 to every 1 in regular code). I found the reason for part of that problem as: skbs get queue'd up in dev->qdisc (when tx lock was not got or queue blocked); when net_tx_action is called later, it passes the batch list as argument to qdisc_run and this results in skbs being moved to the batch list; then batching xmit also fails due to tx lock failure; the next many regular xmit of a single skb will go through the fast path (pass NULL batch list to qdisc_run) and send those skbs out to the device while previous skbs are cooling their heels in the batch list.

The first fix was to not pass NULL/batch-list to qdisc_run() but to always check whether skbs are present in batch list when trying to xmit. This reduced retransmissions by a third (from 18,000 to around 12,000), but led to another problem while testing - iperf transmits almost zero data for higher # of parallel flows like 64 or more (and when I run iperf for a 2 min run, it takes about 5-6 mins, and reports that it ran 0 secs and the amount of data transferred is a few MB's). I don't know why this happens with this being the only change (any ideas are very much appreciated).

The second fix that resolved this was to revert back to Dave's suggestion to use batching only in net_tx_action case, and modify the driver to see if skbs are present in batch list and to send them out first before sending the current skb. I still see huge retransmission for IPoIB (but not for E1000), though it has reduced to 12,000 from the earlier 18,000 number.
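The ordering rule described above (older packets stuck in the batch list go out before any newly submitted one) can be modeled in a few lines of user-space C. This is only a sketch of the rule, not the E1000 patch: `struct blist`, `struct wire`, `hw_send()` and `xmit()` are hypothetical, packets are plain integers, and a `budget` counter simulates the hardware refusing to send.

```c
#include <assert.h>
#include <stddef.h>

#define BLIST_CAP 16

/* Hypothetical batch list: a small FIFO of packet ids. */
struct blist {
    int ids[BLIST_CAP];
    size_t head;    /* index of the oldest queued id */
    size_t len;     /* number of queued ids */
};

/* Hypothetical hardware: records the order in which packets hit the wire. */
struct wire {
    int sent[32];
    size_t n;
    size_t budget;  /* sends that will still succeed; 0 models a tx failure */
};

static int blist_push(struct blist *b, int id)
{
    if (b->len == BLIST_CAP)
        return 0;   /* full: the packet is dropped, for brevity */
    b->ids[(b->head + b->len) % BLIST_CAP] = id;
    b->len++;
    return 1;
}

static int hw_send(struct wire *w, int id)
{
    if (w->budget == 0 || w->n == sizeof w->sent / sizeof w->sent[0])
        return 0;
    w->budget--;
    w->sent[w->n++] = id;
    return 1;
}

/*
 * id >= 0: single-packet xmit; id < 0: batch-only call with no new packet.
 * Anything already waiting in the batch list is sent before a new packet.
 */
void xmit(struct wire *w, struct blist *b, int id)
{
    assert(w != NULL && b != NULL);

    if (id >= 0 && b->len == 0) {
        /* Fast path: nothing older is waiting. */
        if (!hw_send(w, id))
            blist_push(b, id);  /* keep it for the next attempt */
        return;
    }
    if (id >= 0)
        blist_push(b, id);      /* new packet waits behind older ones */
    while (b->len > 0) {
        if (!hw_send(w, b->ids[b->head]))
            return;             /* still stuck; order is preserved */
        b->head = (b->head + 1) % BLIST_CAP;
        b->len--;
    }
}
```

If packets 1 and 2 fail to send and packet 3 arrives once the hardware recovers, the wire order is 1, 2, 3. Sending 3 straight through the fast path while 1 and 2 sit in the list would reproduce the reordering the message blames for part of the retransmissions.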
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
Hi Dave,

David Miller [EMAIL PROTECTED] wrote on 09/17/2007 04:47:48 AM:

> The only major complaint I have about this patch series is that the IPoIB part should just be one big changeset. Otherwise the tree is not bisectable, for example the initial ipoib header file change breaks the build.

Right, I will change it accordingly.

> On a lower priority, I question the indirection of skb_blist by making it a pointer. For what? Saving 12 bytes on 64-bit? That kmalloc()'d thing is a nearly guaranteed cache and/or TLB miss. Just inline the thing, we generally don't do crap like this anywhere else.

The intention was to avoid having two flags (one that driver supports batching and second to indicate that batching is on/off). So I could test skb_blist as an indication of whether batching is on/off. But your point on cache miss is absolutely correct, and I will change this part to be inline.

thanks,

- KK
Re: [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching
Hi Randy,

Randy Dunlap [EMAIL PROTECTED] wrote on 09/15/2007 12:07:09 AM:

> > +	To fix this problem, error cases where driver xmit gets called with a
> > +	skb must code as follows:
> > +		1. If driver xmit cannot get tx lock, return NETDEV_TX_LOCKED
> > +		   as usual. This allows qdisc to requeue the skb.
> > +		2. If driver xmit got the lock but failed to send the skb, it
> > +		   should return NETDEV_TX_BUSY but before that it should have
> > +		   queue'd the skb to the batch list. In this case, the qdisc queued
> > +		   does not requeue the skb.

Since this was a new section that I added to the documentation, this error creeped up. Thanks for catching it, and review comments/ack-off :)

thanks,

- KK
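The two error cases in the quoted documentation boil down to "who holds the skb after xmit returns". A minimal sketch of that rule in user-space C follows. The enum names are hypothetical and their values are illustrative, not the kernel's `NETDEV_TX_*` constants, and the function only models a batching-aware driver as described in the quoted text.

```c
#include <assert.h>

/* Hypothetical return codes; the names mirror the thread, the values do not. */
enum tx_status { TX_OK, TX_BUSY, TX_LOCKED };

/* Who is responsible for the skb once driver xmit has returned. */
enum skb_owner { OWNER_NONE, OWNER_QDISC, OWNER_DRIVER_BLIST };

/* Ownership after a batching-aware driver xmit returns `rc`. */
enum skb_owner owner_after_xmit(enum tx_status rc)
{
    switch (rc) {
    case TX_LOCKED:
        return OWNER_QDISC;         /* no tx lock: qdisc requeues as usual */
    case TX_BUSY:
        return OWNER_DRIVER_BLIST;  /* driver already queued it on the batch list */
    case TX_OK:
        return OWNER_NONE;          /* sent; nothing left to hold */
    }
    assert(!"unknown tx_status");
    return OWNER_NONE;
}
```

The point of the rule is that exactly one side keeps the skb in each failure case, so it is neither lost nor queued twice.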
Re: net-2.6.24 plans
David Miller wrote:
> We've touched so much in net-2.6.24 that we really should be auditing the thing and fixing any bugs that have been added. If you're bored and looking for something to do, pick an odd NAPI driver and audit it in the net-2.6.24 tree.

You could try that weird "post patches on the list" thing for review.

I dunno about sparc64, but IMO any networking work you do yourself and commit yourself should also be sent to the list for review.

Jeff
Re: [PATCH 1/10 REV5] [Doc] HOWTO Documentation for batching
Please remove me from the CC list. I get this via netdev, and not having said a single thing in the thread, I don't feel the need to be CC'd on every email. The CC list is pretty massive as it is, anyway.

Jeff
[PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293) [RESEND]
Dave,

Thanks. That rev2 was for v6-only; I didn't see anything about the v4 patch (below, in case it fell through the cracks).

+-DLS

- Forwarded by David Stevens/Beaverton/IBM on 09/16/2007 09:02 PM -

David Stevens/Beaverton/[EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
09/10/2007 07:25 PM

To [EMAIL PROTECTED], [EMAIL PROTECTED]
cc netdev@vger.kernel.org
Subject [PATCH][1/2] Add ICMPMsgStats MIB (RFC 4293)

Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type.

These patches remove (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add IcmpMsg with all ICMP types listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for Icmp to get the named data from new counters.

IPv4 patch attached, IPv6 patch to follow.
+-DLS

Signed-off-by: David L Stevens [EMAIL PROTECTED]

diff -ruNp linux-2.6.22.5/include/linux/snmp.h linux-2.6.22.5_ICMPMSG/include/linux/snmp.h
--- linux-2.6.22.5/include/linux/snmp.h 2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/linux/snmp.h 2007-08-23 15:32:29.0 -0700
@@ -82,6 +82,8 @@ enum
 	__ICMP_MIB_MAX
 };
 
+#define __ICMPMSG_MIB_MAX 512 /* Out+In for all 8-bit ICMP types */
+
 /* icmp6 mib definitions */
 /*
  * RFC 2466: ICMPv6-MIB
diff -ruNp linux-2.6.22.5/include/net/icmp.h linux-2.6.22.5_ICMPMSG/include/net/icmp.h
--- linux-2.6.22.5/include/net/icmp.h 2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/net/icmp.h 2007-08-23 15:56:45.0 -0700
@@ -30,9 +30,16 @@ struct icmp_err {
 
 extern struct icmp_err icmp_err_convert[];
 DECLARE_SNMP_STAT(struct icmp_mib, icmp_statistics);
+DECLARE_SNMP_STAT(struct icmpmsg_mib, icmpmsg_statistics);
 #define ICMP_INC_STATS(field)		SNMP_INC_STATS(icmp_statistics, field)
 #define ICMP_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmp_statistics, field)
 #define ICMP_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmp_statistics, field)
+#define ICMPMSGOUT_INC_STATS(field)	SNMP_INC_STATS(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpmsg_statistics, field+256)
+#define ICMPMSGOUT_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpmsg_statistics, field+256)
+#define ICMPMSGIN_INC_STATS(field)	SNMP_INC_STATS(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_BH(field)	SNMP_INC_STATS_BH(icmpmsg_statistics, field)
+#define ICMPMSGIN_INC_STATS_USER(field)	SNMP_INC_STATS_USER(icmpmsg_statistics, field)
 
 struct dst_entry;
 struct net_proto_family;
@@ -42,6 +49,7 @@ extern void icmp_send(struct sk_buff *sk
 extern int	icmp_rcv(struct sk_buff *skb);
 extern int	icmp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern void	icmp_init(struct net_proto_family *ops);
+extern void	icmp_out_count(unsigned char type);
 
 /* Move into dst.h ? */
 extern int	xrlim_allow(struct dst_entry *dst, int timeout);
diff -ruNp linux-2.6.22.5/include/net/snmp.h linux-2.6.22.5_ICMPMSG/include/net/snmp.h
--- linux-2.6.22.5/include/net/snmp.h 2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/include/net/snmp.h 2007-08-23 14:42:50.0 -0700
@@ -82,6 +82,11 @@ struct icmp_mib {
 	unsigned long	mibs[ICMP_MIB_MAX];
 } __SNMP_MIB_ALIGN__;
 
+#define ICMPMSG_MIB_MAX	__ICMPMSG_MIB_MAX
+struct icmpmsg_mib {
+	unsigned long	mibs[ICMPMSG_MIB_MAX];
+} __SNMP_MIB_ALIGN__;
+
 /* ICMP6 (IPv6-ICMP) */
 #define ICMP6_MIB_MAX	__ICMP6_MIB_MAX
 struct icmpv6_mib {
diff -ruNp linux-2.6.22.5/net/ipv4/af_inet.c linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c
--- linux-2.6.22.5/net/ipv4/af_inet.c 2007-08-22 16:23:54.0 -0700
+++ linux-2.6.22.5_ICMPMSG/net/ipv4/af_inet.c 2007-08-23 14:47:26.0 -0700
@@ -1296,6 +1296,10 @@ static int __init init_ipv4_mibs(void)
 			  sizeof(struct icmp_mib),
 			  __alignof__(struct icmp_mib)) < 0)
 		goto err_icmp_mib;
+	if (snmp_mib_init((void **)icmpmsg_statistics,
+			  sizeof(struct icmpmsg_mib),
+			  __alignof__(struct icmpmsg_mib)) < 0)
+		goto err_icmpmsg_mib;
 	if (snmp_mib_init((void **)tcp_statistics,
 			  sizeof(struct
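The table layout used by the patch above (512 slots, with inbound counts at index `type` and outbound counts at `type + 256`) can be shown with a small user-space sketch. The struct and the `ICMPMSG_MIB_MAX` value follow the patch; the helper functions `icmpmsg_in_idx()`, `icmpmsg_out_idx()`, `icmpmsg_count_in()` and `icmpmsg_count_out()` are hypothetical and ignore the per-CPU handling that the kernel's `SNMP_INC_STATS` macros provide.

```c
#include <assert.h>
#include <stddef.h>

#define ICMPMSG_MIB_MAX 512     /* Out+In for all 8-bit ICMP types */

struct icmpmsg_mib {
    unsigned long mibs[ICMPMSG_MIB_MAX];
};

/* Inbound counters occupy slots 0..255, indexed directly by ICMP type. */
size_t icmpmsg_in_idx(unsigned char type)
{
    return type;
}

/* Outbound counters occupy slots 256..511. */
size_t icmpmsg_out_idx(unsigned char type)
{
    return (size_t)type + 256;
}

void icmpmsg_count_in(struct icmpmsg_mib *m, unsigned char type)
{
    assert(m != NULL);
    m->mibs[icmpmsg_in_idx(type)]++;
}

void icmpmsg_count_out(struct icmpmsg_mib *m, unsigned char type)
{
    assert(m != NULL);
    m->mibs[icmpmsg_out_idx(type)]++;
}
```

Because the type is an 8-bit value, every possible ICMP type has a slot in both halves, which is what lets the table count types the machine does not itself implement.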
Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000
[Removing Jeff as requested from thread :) ]

Hi Dave,

David Miller [EMAIL PROTECTED] wrote on 09/17/2007 07:55:02 AM:

> From: jamal [EMAIL PROTECTED]
> Date: Sun, 16 Sep 2007 22:14:21 -0400
>
> > I still think this work - despite my vested interest - needs more scrutiny from a performance perspective.
>
> Absolutely.
>
> There are tertiary issues I'm personally interested in, for example how well this stuff works when we enable software GSO on a non-TSO capable card. In such a case the GSO segment should be split right before we hit the driver and then all the sub-segments of the original GSO frame batched in one shot down to the device driver.
>
> In this way you'll get a large chunk of the benefit of TSO without explicit hardware support for the feature.
>
> There are several cards (some even 10GB) that will benefit immensely from this.

I have tried this on ehca which does not support TSO. I added GSO flag at the ipoib layer (and that resulted in a panic/fix that is mentioned in this patchset). I will re-run tests for this and submit results.

Thanks,

- KK