Re: [PATCH] tcp: cubic scaling error
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Wed, 25 Oct 2006 10:52:29 -0700

Doug Leith observed a discrepancy between the version of CUBIC described in the papers and the version in 2.6.18. A math error related to scaling causes Cubic to grow too slowly.

Patch is from Sangtae Ha [EMAIL PROTECTED]. I validated that it does fix the problems.

See the following to show behavior over 500ms 100 Mbit link.

Sender (2.6.19-rc3) --- Bridge (2.6.18-rt7) --- Receiver (2.6.19-rc3)
                    1G      [netem]        100M

http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-orig.png
http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-fix.png

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

Applied, thanks a lot Stephen.

- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix integer overflow in H-TCP congestion control
From: Gavin McCullagh [EMAIL PROTECTED]
Date: Wed, 25 Oct 2006 09:47:26 +0100

When using H-TCP with a single flow on a 500Mbit connection (or less actually), alpha can exceed 65000, so alpha needs to be a u32.

Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]
Signed-off-by: Doug Leith [EMAIL PROTECTED]

Applied, thank you.
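To make the overflow concrete, here is a small userspace sketch. It is not the H-TCP code; the two helper names are hypothetical and only model storing the same value in a 16-bit and in a 32-bit field. A u16 holds at most 65535, so any larger alpha silently wraps.

```c
#include <stdint.h>

/* Hypothetical helper: models an alpha field declared as u16.
 * The conversion keeps only the low 16 bits of the value. */
static uint16_t store_alpha_u16(uint32_t alpha)
{
	return (uint16_t)alpha;
}

/* Hypothetical helper: models the same field after widening to u32. */
static uint32_t store_alpha_u32(uint32_t alpha)
{
	return alpha;
}
```

With an input of 70000, the 16-bit variant keeps 70000 - 65536 = 4464, while the 32-bit variant keeps the full value.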
Re: [PATCH] bridge: correct print message typo
From: Randy Dunlap [EMAIL PROTECTED]
Date: Tue, 24 Oct 2006 21:24:58 -0700

Correct message typo/spello.

Signed-off-by: Randy Dunlap [EMAIL PROTECTED]

Applied, thanks a lot Randy.
Re: Network virtualization/isolation
Stephen Hemminger wrote:

On Wed, 25 Oct 2006 17:51:28 +0200 Daniel Lezcano [EMAIL PROTECTED] wrote:

Hi Stephen,

currently the work to bring container enablement into the kernel is making good progress. The ipc, pid, utsname and filesystem system resources are isolated/virtualized relying on the namespaces concept. But network virtualization/isolation is still missing.

Two approaches are proposed: doing the isolation at layer 2 and at layer 3.

The first one instantiates a network device per namespace and adds a peer network device into the root namespace; all the routing resources are relative to the namespace. This work is done by Andrey Savochkin from the openvz project.

The second relies on the routes and associates the network namespace pointer with each route. When the traffic is incoming, the packet follows an input route and retrieves the associated network namespace. When the traffic is outgoing, the packet, identified by the network namespace it is coming from, follows only the routes matching the same network namespace. This work is done by me.

IMHO, we need both approaches: layer 2 to be able to bring *very* strong isolation for system containers, at a performance cost, and layer 3 to be able to have good isolation for lightweight containers or application containers when performance is more important.

Do you have some suggestions? What is your point of view on that?

Thanks in advance.

-- Daniel

Any solution should allow both and it should build on the existing netfilter infrastructure.

The problem is that netfilter cannot give good isolation, e.g. how can the netstat command be handled? Or how do we avoid seeing IP addresses assigned to another container when doing ifconfig? Furthermore, one of the biggest interests of network isolation is to bring mobility to a container, and that can only be done if the network resources inside the kernel can be identified by container in order to checkpoint/restart them.

The all-in-namespace solution, i.e. at layer 2, is very good in terms of isolation but it adds a non-negligible overhead. The layer 3 isolation has an insignificant overhead and good isolation, perfectly adapted for application containers.

Unfortunately, from the point of view of implementation, layer 3 cannot be a subset of layer 2 isolation when using all-in-namespace, and layer 2 isolation cannot be an extension of the layer 3 isolation.

I think the layer 2 and the layer 3 implementations can coexist. You can for example create a system container with layer 2 isolation and inside it add layer 3 isolation.

Does that make sense?

-- Daniel
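The layer 3 idea described above (a namespace pointer attached to each route) can be sketched with a toy routing table. This is an illustration only, not code from either patch set: struct net_ns, struct route, route_output() and route_input_ns() are hypothetical names, and a real implementation would hook into the kernel's FIB rather than scan an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical namespace handle. */
struct net_ns {
	int id;
};

/* Toy route entry: each route carries the namespace that owns it. */
struct route {
	uint32_t prefix;	/* network address, host byte order */
	uint32_t mask;		/* contiguous netmask, host byte order */
	const struct net_ns *ns;
};

/* Output path: only routes tagged with the sender's namespace are
 * candidates; among those, the longest matching prefix wins. */
static const struct route *route_output(const struct route *tbl, size_t n,
					const struct net_ns *ns,
					uint32_t daddr)
{
	const struct route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].ns != ns)
			continue;
		if ((daddr & tbl[i].mask) != tbl[i].prefix)
			continue;
		if (!best || tbl[i].mask > best->mask)
			best = &tbl[i];
	}
	return best;
}

/* Input path: the matching input route tells us which namespace
 * the packet belongs to. Returns NULL if no route matches. */
static const struct net_ns *route_input_ns(const struct route *tbl, size_t n,
					   uint32_t daddr)
{
	const struct route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((daddr & tbl[i].mask) != tbl[i].prefix)
			continue;
		if (!best || tbl[i].mask > best->mask)
			best = &tbl[i];
	}
	return best ? best->ns : NULL;
}
```

In this sketch a sender in one namespace simply gets no route for a destination that is only reachable through another namespace's routes, which is the isolation property the layer 3 approach relies on.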
Re: watchdog timeout panic in e1000 driver
Hi,

Thank you for your comment.

Anyway as I said in the same e-mail, we're working on reducing the lock timeout to a reasonable time. This will unfortunately take some time, as we need to change some major components in the driver to make sure this doesn't happen.

How about the following approach? If acquiring the semaphore fails inside the interrupt handler, the attempt is abandoned immediately without waiting for the timeout. However, I don't know whether this method affects other processes.

with the current hardware being accessed simultaneously from several users in the kernel, that would lead to large problems - the watchdog task accesses it every 2 seconds as it reads the PHY link status, so when one of those fails the driver would have no choice but to reset the entire device.

This problem occurs because the interrupt handler is executed while the interrupted code is still holding the semaphore. Acquiring the semaphore fails regardless of the timeout period. I think the watchdog task will fail trying to read the PHY link status, even if the lock timeout period has been reduced.

-- Kenzo Iwami ([EMAIL PROTECTED])
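The difference between retrying and giving up at once can be sketched in userspace with a C11 atomic flag. This is not e1000 code: hw_sem is a hypothetical stand-in for the hardware semaphore, and sem_get_retry(), sem_get_irq() and sem_put() are made-up names. The point it illustrates is the one made above: if the holder is the code that was interrupted, no amount of retrying inside the handler can succeed, so a single attempt loses nothing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared hardware semaphore. */
static atomic_flag hw_sem = ATOMIC_FLAG_INIT;

/* Process-context style: try up to max_tries times before giving up. */
static bool sem_get_retry(int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		if (!atomic_flag_test_and_set(&hw_sem))
			return true;	/* flag was clear, we now hold it */
	}
	return false;
}

/* Interrupt-context style: one attempt, fail immediately if held. */
static bool sem_get_irq(void)
{
	return !atomic_flag_test_and_set(&hw_sem);
}

/* Release the semaphore. */
static void sem_put(void)
{
	atomic_flag_clear(&hw_sem);
}
```

Whether the caller can tolerate the failed attempt is a separate question; the reply quoted in this thread suggests that a failed PHY read would force a device reset.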
[Announce] Netchannels ported to the latest git tree. Gigabit benchmark. Complete rout.
On Fri, Oct 20, 2006 at 01:53:05PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:

Netchannel [1] is a pure bridge between low-level hardware and the user, without any special protocol processing involved between them. Users are not limited to userspace only - I will use this netchannel infrastructure for a fast NAT implementation, which is a purely kernelspace user (although it is possible to create NAT in userspace, the price of the kernelspace border crossing is too high for something which only needs to change some fields in the header and recalculate the checksum).

Userspace network stack [2] is another user of the new netchannel subsystem.

Current netchannel version supports data transfer using copy*user().

Performance graph (speed and CPU usage) attached. Benchmark uses 128 bytes sending/receiving per syscall (no latency checks, only throughput). MB and KB mean not 1000, but 1024.

Receiving is about 8 MB/sec faster.
Receiving CPU usage is 3 times less (90% socket code vs. 30% netchannels+unetstack).

Sending is 10 MB/sec faster.
Sending CPU usage is 5 times less (up to 50% vs. up to 10%).

Number of syscalls is about 10 times less for netchannels.

Hardware.

System 1.
Netchannel kernel (2.6.19-rc3-git) or vanilla 2.6.19-rc3/2.6.18-1.2200.fc5.
amd64 athlon 3500+ cpu
1gb ram
r8169 nic

System 2.
2.6.17-2-686 debian etch
intel core duo 3.40GHz
2 gb ram
Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (sky2 driven)

All software used in tests (tcp_client.c/tcp_test.c and userspace network stack) can be found on the project's homepages (the userspace network stack requires a larger window scaling factor than the default).

Please consider the netchannel subsystem for inclusion.

1. Netchannels homepage.
http://tservice.net.ru/~s0mbre/old/?section=projectsitem=netchannel

2. Userspace network stack homepage.
http://tservice.net.ru/~s0mbre/old/?section=projectsitem=unetstack

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 2697e92..3231b22 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
 	.long sys_move_pages
 	.long sys_getcpu
 	.long sys_epoll_pwait
+	.long sys_netchannel_control
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index b4aa875..d35d4d8 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -718,4 +718,5 @@ #endif
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
+	.quad sys_netchannel_control
 ia32_syscall_end:
diff --git a/include/asm-i386/unistd.h b/include/asm-i386/unistd.h
index beeeaf6..33242f8 100644
--- a/include/asm-i386/unistd.h
+++ b/include/asm-i386/unistd.h
@@ -325,10 +325,11 @@ #define __NR_vmsplice 316
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
+#define __NR_netchannel_control	320

 #ifdef __KERNEL__

-#define NR_syscalls 320
+#define NR_syscalls 321
 #include <linux/err.h>

 /*
diff --git a/include/asm-x86_64/unistd.h b/include/asm-x86_64/unistd.h
index 777288e..16f1aac 100644
--- a/include/asm-x86_64/unistd.h
+++ b/include/asm-x86_64/unistd.h
@@ -619,8 +619,10 @@ #define __NR_vmsplice 278
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_netchannel_control	280
+__SYSCALL(__NR_netchannel_control, sys_netchannel_control)

-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_netchannel_control

 #ifdef __KERNEL__
 #include <linux/err.h>
diff --git a/include/linux/netchannel.h b/include/linux/netchannel.h
new file mode 100644
index 000..23e9f1e
--- /dev/null
+++ b/include/linux/netchannel.h
@@ -0,0 +1,88 @@
+/*
+ * 	netchannel.h
+ *
+ * 2006 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __NETCHANNEL_H
+#define __NETCHANNEL_H
+
+#include <linux/types.h>
+
+enum netchannel_commands {
+	NETCHANNEL_CREATE = 0,
+	NETCHANNEL_RECV,
+	NETCHANNEL_SEND,
+};
+
+enum
RE: [PATCH] s2io: add PCI error recovery support
Hi,

Can you try the attached patch?

The attached patch is simple. We set the card state as down in error_detected() so that all entry points return an error and don't proceed further. In slot_reset() we call s2io_card_down(), which will reset the adapter. In io_resume() we bring up the driver.

Ananda

-----Original Message-----
From: Linas Vepstas [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 25, 2006 1:55 PM
To: Ananda Raju
Cc: Wen Xiong; linux-kernel@vger.kernel.org; [EMAIL PROTECTED]; netdev@vger.kernel.org; Jeff Garzik; Andrew Morton
Subject: Re: [PATCH] s2io: add PCI error recovery support

On Wed, Oct 25, 2006 at 10:11:24AM -0500, Linas Vepstas wrote:

Also we have to add following if statement in beginning of s2io_isr().

Done, below,

If it is ok to do BAR0 read/write in error_detected() then patch is OK.

I re-wrote that section to avoid doing I/O. It seems to work well, and generates a few less messages in the process.

New, improved patch below, please ack and send upstream if you like it.

--linas

This patch adds PCI error recovery support to the s2io 10-Gigabit ethernet device driver. Tested, seems to work well.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Cc: Raghavendra Koushik [EMAIL PROTECTED]
Cc: Ananda Raju [EMAIL PROTECTED]
Cc: Wen Xiong [EMAIL PROTECTED]

 drivers/net/s2io.c |  103 +
 drivers/net/s2io.h |    5 ++
 2 files changed, 108 insertions(+)

Index: linux-2.6.19-rc1-git11/drivers/net/s2io.c
===
--- linux-2.6.19-rc1-git11.orig/drivers/net/s2io.c	2006-10-25 14:09:47.0 -0500
+++ linux-2.6.19-rc1-git11/drivers/net/s2io.c	2006-10-25 15:18:25.0 -0500
@@ -434,11 +434,18 @@ static struct pci_device_id s2io_tbl[] _

 MODULE_DEVICE_TABLE(pci, s2io_tbl);

+static struct pci_error_handlers s2io_err_handler = {
+	.error_detected = s2io_io_error_detected,
+	.slot_reset = s2io_io_slot_reset,
+	.resume = s2io_io_resume,
+};
+
 static struct pci_driver s2io_driver = {
 	.name = "S2IO",
 	.id_table = s2io_tbl,
 	.probe = s2io_init_nic,
 	.remove = __devexit_p(s2io_rem_nic),
+	.err_handler = &s2io_err_handler,
 };

 /* A simplifier macro used both by init and free shared_mem Fns(). */
@@ -4171,6 +4178,11 @@ static irqreturn_t s2io_isr(int irq, voi
 	mac_info_t *mac_control;
 	struct config_param *config;

+	if ((sp->pdev->error_state != pci_channel_io_normal)
+		&& (sp->pdev->error_state != 0)) {
+		return IRQ_HANDLED;
+	}
+
 	atomic_inc(&sp->isr_cnt);
 	mac_control = &sp->mac_control;
 	config = &sp->config;
@@ -7564,3 +7576,94 @@ static void lro_append_pkt(nic_t *sp, lr
 	sp->mac_control.stats_info->sw_stat.clubbed_frms_cnt++;
 	return;
 }
+
+/**
+ * s2io_io_error_detected - called when PCI error is detected
+ * @pdev: Pointer to PCI device
+ * @state: The current pci conneection state
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected.
+ */
+static pci_ers_result_t s2io_io_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	nic_t *sp = netdev->priv;
+
+	netif_device_detach(netdev);
+
+	if (netif_running(netdev)) {
+		unsigned long flags;
+
+		/* The folowing is an abreviated subset of the
+		 * steps taken by s2io_card_down(), avoiding
+		 * steps that touch the card itself.
+		 */
+		del_timer_sync(&sp->alarm_timer);
+		atomic_set(&sp->card_state, CARD_DOWN);
+
+		/* Kill tasklet. */
+		tasklet_kill(&sp->task);
+
+		/* Free all Tx buffers */
+		spin_lock_irqsave(&sp->tx_lock, flags);
+		free_tx_buffers(sp);
+		spin_unlock_irqrestore(&sp->tx_lock, flags);
+
+		/* Free all Rx buffers */
+		spin_lock_irqsave(&sp->rx_lock, flags);
+		free_rx_buffers(sp);
+		spin_unlock_irqrestore(&sp->rx_lock, flags);
+
+		clear_bit(0, &(sp->link_state));
+		sp->device_close_flag = TRUE;	/* Device is shut down. */
+	}
+	pci_disable_device(pdev);
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+/**
+ * s2io_io_slot_reset - called after the pci bus has been reset.
+ * @pdev: Pointer to PCI device
+ *
+ * Restart the card from scratch, as if from a cold-boot.
+ */
+static pci_ers_result_t s2io_io_slot_reset(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	nic_t *sp = netdev->priv;
+
+	if (pci_enable_device(pdev)) {
+		printk(KERN_ERR "s2io: Cannot re-enable PCI device after reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	pci_set_master(pdev);
+	s2io_reset(sp);
+
+	return
[patch 3/5] net: fix uaccess handling
Signed-off-by: Heiko Carstens [EMAIL PROTECTED]
---
 net/ipv4/raw.c           |   17 +++--
 net/ipv6/raw.c           |   17 +++--
 net/netlink/af_netlink.c |    5 +++--
 3 files changed, 25 insertions(+), 14 deletions(-)

Index: linux-2.6/net/ipv4/raw.c
===
--- linux-2.6.orig/net/ipv4/raw.c	2006-10-26 14:40:56.0 +0200
+++ linux-2.6/net/ipv4/raw.c	2006-10-26 14:42:12.0 +0200
@@ -329,7 +329,7 @@
 	return err;
 }

-static void raw_probe_proto_opt(struct flowi *fl, struct msghdr *msg)
+static int raw_probe_proto_opt(struct flowi *fl, struct msghdr *msg)
 {
 	struct iovec *iov;
 	u8 __user *type = NULL;
@@ -338,7 +338,7 @@
 	unsigned int i;

 	if (!msg->msg_iov)
-		return;
+		return 0;

 	for (i = 0; i < msg->msg_iovlen; i++) {
 		iov = &msg->msg_iov[i];
@@ -360,8 +360,9 @@
 				code = iov->iov_base;

 			if (type && code) {
-				get_user(fl->fl_icmp_type, type);
-				get_user(fl->fl_icmp_code, code);
+				if (get_user(fl->fl_icmp_type, type) ||
+				    get_user(fl->fl_icmp_code, code))
+					return -EFAULT;
 				probed = 1;
 			}
 			break;
@@ -372,6 +373,7 @@
 		if (probed)
 			break;
 	}
+	return 0;
 }

 static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
@@ -480,8 +482,11 @@
 				    .proto = inet->hdrincl ? IPPROTO_RAW :
 							     sk->sk_protocol,
 				  };
-		if (!inet->hdrincl)
-			raw_probe_proto_opt(&fl, msg);
+		if (!inet->hdrincl) {
+			err = raw_probe_proto_opt(&fl, msg);
+			if (err)
+				goto done;
+		}

 		security_sk_classify_flow(sk, &fl);
 		err = ip_route_output_flow(&rt, &fl, sk, !(msg->msg_flags&MSG_DONTWAIT));
Index: linux-2.6/net/ipv6/raw.c
===
--- linux-2.6.orig/net/ipv6/raw.c	2006-10-26 14:40:56.0 +0200
+++ linux-2.6/net/ipv6/raw.c	2006-10-26 14:42:12.0 +0200
@@ -604,7 +604,7 @@
 	return err;
 }

-static void rawv6_probe_proto_opt(struct flowi *fl, struct msghdr *msg)
+static int rawv6_probe_proto_opt(struct flowi *fl, struct msghdr *msg)
 {
 	struct iovec *iov;
 	u8 __user *type = NULL;
@@ -616,7 +616,7 @@
 	int i;

 	if (!msg->msg_iov)
-		return;
+		return 0;

 	for (i = 0; i < msg->msg_iovlen; i++) {
 		iov = &msg->msg_iov[i];
@@ -638,8 +638,9 @@
 				code = iov->iov_base;

 			if (type && code) {
-				get_user(fl->fl_icmp_type, type);
-				get_user(fl->fl_icmp_code, code);
+				if (get_user(fl->fl_icmp_type, type) ||
+				    get_user(fl->fl_icmp_code, code))
+					return -EFAULT;
 				probed = 1;
 			}
 			break;
@@ -650,7 +651,8 @@
 			/* check if type field is readable or not. */
 			if (iov->iov_len > 2 - len) {
 				u8 __user *p = iov->iov_base;
-				get_user(fl->fl_mh_type, &p[2 - len]);
+				if (get_user(fl->fl_mh_type, &p[2 - len]))
+					return -EFAULT;
 				probed = 1;
 			} else
 				len += iov->iov_len;
@@ -664,6 +666,7 @@
 		if (probed)
 			break;
 	}
+	return 0;
 }

 static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
@@ -787,7 +790,9 @@
 	opt = ipv6_fixup_options(&opt_space, opt);

 	fl.proto = proto;
-	rawv6_probe_proto_opt(&fl, msg);
+	err = rawv6_probe_proto_opt(&fl, msg);
+	if (err)
+		goto out;

 	ipv6_addr_copy(&fl.fl6_dst, daddr);
 	if (ipv6_addr_any(&fl.fl6_src) && !ipv6_addr_any(&np->saddr))
Index: linux-2.6/net/netlink/af_netlink.c
===
--- linux-2.6.orig/net/netlink/af_netlink.c	2006-10-26 14:40:56.0 +0200
+++ linux-2.6/net/netlink/af_netlink.c	2006-10-26 14:42:12.0 +0200
@@ -1075,8 +1075,9 @@
 			return -EINVAL;
 		len = sizeof(int);
 		val = nlk->flags
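The pattern the patch applies (check the return value of every user-space read and propagate -EFAULT to the caller) can be shown in a userspace sketch. Nothing here is kernel code: fake_get_user() is a hypothetical stand-in for get_user() in which a NULL source models an unreadable user address, and struct flow and probe_proto_opt() are simplified, made-up counterparts of struct flowi and raw_probe_proto_opt().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for get_user(): returns 0 on success and
 * -EFAULT when the source cannot be read (modelled here as NULL). */
static int fake_get_user(uint8_t *dst, const uint8_t *src)
{
	if (src == NULL)
		return -EFAULT;
	*dst = *src;
	return 0;
}

/* Simplified, made-up counterpart of the flow key being filled in. */
struct flow {
	uint8_t icmp_type;
	uint8_t icmp_code;
};

/* Returns int instead of void so that a failed read is reported
 * to the caller rather than silently ignored. */
static int probe_proto_opt(struct flow *fl, const uint8_t *type,
			   const uint8_t *code)
{
	if (fake_get_user(&fl->icmp_type, type) ||
	    fake_get_user(&fl->icmp_code, code))
		return -EFAULT;
	return 0;
}
```

A caller then does what the sendmsg hunks above do: store the result in err and leave through its error path when it is nonzero.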
Re: [Announce] Netchannels ported to the latest git tree. Gigabit benchmark. Complete rout.
On Thu, Oct 26, 2006 at 02:51:51PM +0400, Evgeniy Polyakov wrote:

Benchmark uses 128 bytes sending/receiving per syscall (no latency checks, only throughput.

Receiving CPU usage is 3 times less (90% socket code vs. 30%

Sending CPU usage is 5 times less (up to 50% vs. up to 10%).

Wow. I currently lack the hardware to reproduce your measurements, do you have any idea of how these numbers would be with 1024 byte system calls?

Thanks.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
Re: watchdog timeout panic in e1000 driver
Kenzo Iwami wrote:

Hi,

Thank you for your comment.

Anyway as I said in the same e-mail, we're working on reducing the lock timeout to a reasonable time. This will unfortunately take some time, as we need to change some major components in the driver to make sure this doesn't happen.

How about the following approach? If acquiring the semaphore fails inside the interrupt handler, the attempt is abandoned immediately without waiting for the timeout. However, I don't know whether this method affects other processes.

with the current hardware being accessed simultaneously from several users in the kernel, that would lead to large problems - the watchdog task accesses it every 2 seconds as it reads the PHY link status, so when one of those fails the driver would have no choice but to reset the entire device.

This problem occurs because the interrupt handler is executed while the interrupted code is still holding the semaphore. Acquiring the semaphore fails regardless of the timeout period. I think the watchdog task will fail trying to read the PHY link status, even if the lock timeout period has been reduced.

correct, we're not looking into reducing the lock timeout but towards reducing the total lock time. Once we have reduced that to something acceptable, we can reduce the timeout accordingly.

Cheers,

Auke
Re: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211
On Thu, 2006-10-26 at 00:00 +0200, Johannes Berg wrote:

On Wed, 2006-10-25 at 13:43 -0400, Luis R. Rodriguez wrote:

I guess my hope was that d80211 would just be more than a softmac implementation. When I hear wireless stack I don't think softmac implementation, I think a robust set of headers and device definitions which all wireless devices can share.

Not just that, a bunch of library functions for example for crypto would be nice too. That's part of why I've been proposing that the tkip stuff be library functions that the drivers can call if required, instead of the bitfields.

Currently, there's a lot of top-down stuff in d80211, it does things which depend on flags and then instructs the driver to do something. This is good for a bunch of things, but in some cases where devices vary wildly it may be better to go for library functions instead. IMHO the TKIP key computation is such a case, it's trivial for a driver to call phase1 and phase2 when required.

Also I thought we'd ditch WE as it seems we keep fixing it with gum (as seen by Linville's latest ABI compatibility fix).

Well, that was sort of necessary.

If that wasn't the case then I'm suggesting it -- can we consider ditching WE?

Well, no. We can make it a second-class citizen like I did with the cfg80211 work where I made it just one userspace interface for cfg80211 with admittedly sometimes strange behaviour, but it's still there and current operations should still work with it (and I'd consider not working a bug except if userspace never calls 'commit' and expects things to work)

I'd say let's just go for a userspace MLME as it's already written but I seriously think we need to ditch/replace WE first.

It seems no one has a plan on what to do though.
- Jiri's trying to fix the SMP issues. That's great.
- Jiri also would like to expand ieee80211_conf.c, the stuff I started for cfg80211
- I'd like to see a header cleanup, it's necessary. Part of the problem here is all the sub-ioctl WE foo. Clean that up by moving them into cfg80211 as required, there's basically one user, wpa_supplicant (and maybe hostapd), screw the others if there are any

While wpa_supplicant is certainly the main client for stuff directly related to setting up a connection, there are quite a few other users of general WE calls to pull information out of the card, or to receive scan events.

So if you want maximum compatibility for a limited amount of work, you can probably consider wpa_supplicant the only client of (s = set, g = get)

1) [s|g] ENCODEEXT
2) [s|g] AUTH
3) [s|g] MLME
4) [s] RATE
5) [s] FREQ
6) [s] SENS
7) [s] AP
8) [s|g] RTS
9) [s|g] FRAG
10) [s|g] GENIE
11) [s|g] PMKSA

Notable exceptions:

1) [s|g] ENCODE
2) [s|g] MODE (other stuff turns on promiscuous mode)
3) [s|g] SCAN (other stuff needs to do this too)
4) [s|g] POWER (power management does this, not wpa_supplicant)

Of course lots of stuff needs to get RATE, ESSID, AP, FREQ, etc.

Dan

- fix people's minds to not expect a perfect solution immediately but accept something that can be expanded on later. I think we need to accept some breakage in our development trees to get anywhere at all.

Actually, the last point should be first.

Enough rant from me for today,

johannes
Re: [RFC] tcp: setsockopt congestion control autoload
Evgeniy Polyakov wrote:

On Wed, Oct 25, 2006 at 11:08:43AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

If user asks for a congestion control type with setsockopt() then it may be available as a module not included in the kernel already. It should be autoloaded if needed. This is done already when the default selection is changed with sysctl, but not when an application requests it via setsockopt(). Only reservation is: are there any bad security implications from this?

What if system is badly configured, so it is possible to load malicious module by kernel?

The kernel module loader has a fixed path. So one would have to be able to create a module in /lib/modules/<kernel release> in order to get the malicious code loaded. If the intruder could put a module there, it would be just as easy to patch an existing module and have the hack available on reboot.
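For readers who have not used the interface under discussion, this is roughly what the per-socket request looks like from userspace on Linux. It is a minimal sketch using the TCP_CONGESTION socket option from netinet/tcp.h; the two wrapper function names are made up for this example, and it says nothing about whether the requested module gets autoloaded, which is exactly the question the RFC raises.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to use the named congestion control algorithm on
 * this socket. Returns 0 on success, -1 on failure with errno set
 * by setsockopt(). */
static int set_congestion_control(int fd, const char *name)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
			  name, (socklen_t)strlen(name));
}

/* Read back the algorithm currently selected on this socket.
 * Returns 0 on success, -1 on failure with errno set. */
static int get_congestion_control(int fd, char *buf, socklen_t len)
{
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

A typical call would be set_congestion_control(fd, "cubic") on a freshly created TCP socket; if the algorithm is neither built in nor loaded, the call fails unless the kernel loads the module on demand.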
Re: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211
On Thu, 2006-10-26 at 10:35 -0400, Dan Williams wrote:

- I'd like to see a header cleanup, it's necessary. Part of the problem here is all the sub-ioctl WE foo. Clean that up by moving them into cfg80211 as required, there's basically one user, wpa_supplicant (and maybe hostapd), screw the others if there are any

Oh, right, by sub-ioctl I was referring to the mess of the private ioctls d80211 has for WE, including sub-items again.

While wpa_supplicant is certainly the main client for stuff directly related to setting up a connection, there are quite a few other users of general WE calls to pull information out of the card, or to receive scan events.

Of course.

So if you want maximum compatibility for a limited amount of work, you can probably consider wpa_supplicant the only client of (s = set, g = get)

1) [s|g] ENCODEEXT
2) [s|g] AUTH
3) [s|g] MLME
4) [s] RATE
5) [s] FREQ
6) [s] SENS
7) [s] AP
8) [s|g] RTS
9) [s|g] FRAG
10) [s|g] GENIE
11) [s|g] PMKSA

Sounds about right to me. I did actually intend to keep these intact but drop the private ones with the 10xx sub-numbers.

4) [s|g] POWER (power management does this, not wpa_supplicant)

Does it work for any card?

johannes
Re: [RFC] tcp: setsockopt congestion control autoload
On Thu, Oct 26, 2006 at 07:34:57AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

Evgeniy Polyakov wrote:

On Wed, Oct 25, 2006 at 11:08:43AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

If user asks for a congestion control type with setsockopt() then it may be available as a module not included in the kernel already. It should be autoloaded if needed. This is done already when the default selection is changed with sysctl, but not when an application requests it via setsockopt(). Only reservation is: are there any bad security implications from this?

What if system is badly configured, so it is possible to load malicious module by kernel?

The kernel module loader has a fixed path. So one would have to be able to create a module in /lib/modules/<kernel release> in order to get the malicious code loaded. If the intruder could put a module there, it would be just as easy to patch an existing module and have the hack available on reboot.

It just calls /sbin/modprobe, which in turn runs tons of scripts in /etc/hotplug, modprobe and other places... In the paranoid case we should not allow any user to load kernel modules, even known ones. Should this option be guarded by some capability check?

-- Evgeniy Polyakov
Re: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211
On 10/26/06, Dan Williams [EMAIL PROTECTED] wrote:

While wpa_supplicant is certainly the main client for stuff directly related to setting up a connection, there are quite a few other users of general WE calls to pull information out of the card, or to receive scan events.

How about we just ditch iwconfig completely and move on to wpa_supplicant/wpa_cli as our next userspace application with nl80211/cfg80211 as our new API for userspace<->kernel communication? As you point out, wpa_supplicant already does a lot for us -- and several distributions already rely on it. Some work is required but I think it's worth it. If we do a complete move from WE to nl80211 it would be transparent to the users too.

Luis
Re: [Announce] Netchannels ported to the latest git tree. Gigabit benchmark. Complete rout.
On Thu, Oct 26, 2006 at 03:44:37PM +0200, bert hubert ([EMAIL PROTECTED]) wrote:

On Thu, Oct 26, 2006 at 02:51:51PM +0400, Evgeniy Polyakov wrote:

Benchmark uses 128 bytes sending/receiving per syscall (no latency checks, only throughput.

Receiving CPU usage is 3 times less (90% socket code vs. 30%

Sending CPU usage is 5 times less (up to 50% vs. up to 10%).

Wow. I currently lack the hardware to reproduce your measurements, do you have any idea of how these numbers would be with 1024 byte system calls?

Results are not that exciting in this case.

Receiving CPU usage is about the same: it steadily grows up to about 30-35% (netchannel stops growing after some time at about 28%, socket slowly continues), but netchannel's speed is lower. This can be explained by the fact that unetstack uses C-coded checksumming (the dumbest algo, I think) and an additional memory copy, which becomes visible in the case of big buffers (it can be eliminated though, I will think about a better interface).

The same applies to sending - CPU usage is lower, but speed is lower too (10% vs. 8% compared to 30 MB/sec vs. 24 MB/sec).

So netchannels with the userspace stack behave exactly the same in both the 128 and 1024 byte write/read cases.

But all of this is just a drawback of the userspace stack, not of netchannels, which do not have any protocol processing at all - a netchannel is just a queue between the low-level driver and users - some kind of high performance scalable packet socket with only selected addresses, or a tun/tap device.

Thanks.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services

-- Evgeniy Polyakov
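The unetstack checksum code is not shown in this thread, so the following is only a guess at what a plain C-coded checksum might look like: the textbook 16-bit one's-complement Internet checksum as described in RFC 1071, processed two bytes at a time with no unrolling or architecture-specific tricks. The function name inet_checksum() is chosen for this example. A loop like this touches every payload byte on the CPU, which is consistent with the cost becoming more visible as buffers grow.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward RFC 1071 style checksum over a byte buffer.
 * Bytes are combined into big-endian 16-bit words, summed, the
 * carries are folded back in, and the result is complemented.
 * The 32-bit accumulator is sufficient for packet-sized buffers. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += ((uint32_t)data[0] << 8) | data[1];
		data += 2;
		len -= 2;
	}

	/* An odd trailing byte is padded with a zero low byte. */
	if (len)
		sum += (uint32_t)data[0] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

For the byte sequence 00 01 f2 03 f4 f5 f6 f7 used as the worked example in RFC 1071, the folded sum is 0xddf2 and the function returns its complement, 0x220d.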
Re: [RFC] tcp: setsockopt congestion control autoload
On Thu, 26 Oct 2006 18:57:13 +0400 Evgeniy Polyakov [EMAIL PROTECTED] wrote:

On Thu, Oct 26, 2006 at 07:34:57AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

Evgeniy Polyakov wrote:

On Wed, Oct 25, 2006 at 11:08:43AM -0700, Stephen Hemminger ([EMAIL PROTECTED]) wrote:

If user asks for a congestion control type with setsockopt() then it may be available as a module not included in the kernel already. It should be autoloaded if needed. This is done already when the default selection is changed with sysctl, but not when an application requests it via setsockopt(). Only reservation is: are there any bad security implications from this?

What if system is badly configured, so it is possible to load malicious module by kernel?

The kernel module loader has a fixed path. So one would have to be able to create a module in /lib/modules/<kernel release> in order to get the malicious code loaded. If the intruder could put a module there, it would be just as easy to patch an existing module and have the hack available on reboot.

It just calls /sbin/modprobe, which in turn runs tons of scripts in /etc/hotplug, modprobe and other places... In the paranoid case we should not allow any user to load kernel modules, even known ones. Should this option be guarded by some capability check?

No capability check needed. Any additional paranoia belongs in /sbin/modprobe. There seems to be lots of existing usage where a user can cause a module to be loaded (see bin_fmt, xtables, etc).

-- Stephen Hemminger [EMAIL PROTECTED]
Re: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211
On Thu, 2006-10-26 at 11:04 -0400, Luis R. Rodriguez wrote:
On 10/26/06, Dan Williams [EMAIL PROTECTED] wrote:
While wpa_supplicant is certainly the main client for stuff directly related to setting up a connection, there are quite a few other users of general WE calls to pull information out of the card, or to receive scan events.

How about we just ditch iwconfig completely and move on to wpa_supplicant/wpa_cli as our next userspace application, with nl80211/cfg80211 as our new API for userspace-kernel communication? As you point out, wpa_supplicant already does a lot for us -- and several distributions already rely on it. Some work is required but I think it's worth it. If we do a complete move from WE to nl80211 it would be transparent to the users too.

The one blocker I can think of here is startup scripts on various distributions. Most of those are shell, and they usually rely on iwconfig quite heavily. Getting those converted to wpa_supplicant wouldn't be a trivial amount of work, but it wouldn't be a ton either.

Dan
Re: Network virtualization/isolation
On Thu, 26 Oct 2006 11:44:55 +0200 Daniel Lezcano [EMAIL PROTECTED] wrote:
Stephen Hemminger wrote:
On Wed, 25 Oct 2006 17:51:28 +0200 Daniel Lezcano [EMAIL PROTECTED] wrote:
Hi Stephen, currently the work to bring container enablement into the kernel is making good progress. The ipc, pid, utsname and filesystem system resources are isolated/virtualized relying on the namespaces concept. But network virtualization/isolation is still missing. Two approaches are proposed: doing the isolation at layer 2 and at layer 3. The first one instantiates a network device per namespace and adds a peer network device into the root namespace; all the routing resources are relative to the namespace. This work is done by Andrey Savochkin from the openvz project. The second relies on the routes and associates the network namespace pointer with each route. When the traffic is incoming, the packet follows an input route and retrieves the associated network namespace. When the traffic is outgoing, the packet, identified by the network namespace it is coming from, follows only the routes matching the same network namespace. This work is done by me. IMHO, we need both approaches: layer 2 to be able to bring *very* strong isolation for system containers at a performance cost, and layer 3 to be able to have good isolation for lightweight containers or application containers when performance is more important. Do you have some suggestions? What is your point of view on that? Thanks in advance. -- Daniel

Any solution should allow both and it should build on the existing netfilter infrastructure.

The problem is netfilter can not give good isolation, e.g. how can the netstat command be handled? Or how do we avoid seeing IP addresses assigned to another container when doing ifconfig?
Furthermore, one of the biggest interests of network isolation is to bring mobility to a container, and that can only be done if the network resources inside the kernel can be identified by container in order to checkpoint/restart them. The all-in-namespace solution, i.e. at layer 2, is very good in terms of isolation but it adds a non-negligible overhead. The layer 3 isolation has an insignificant overhead and a good isolation, perfectly adapted for application containers. Unfortunately, from the point of view of implementation, layer 3 can not be a subset of layer 2 isolation when using all-in-namespace, and layer 2 isolation can not be an extension of the layer 3 isolation. I think the layer 2 and the layer 3 implementations can coexist. You can for example create a system container with layer 2 isolation and inside it add layer 3 isolation. Does that make sense? -- Daniel

Assuming you are talking about pseudo-virtualized environments, there are several different discussions.
1. How should the namespace be isolated for the virtualized containered applications?
2. How should traffic be restricted into/out of those containers? This is where existing netfilter, classification, etc, should be used. The network code is overly rich as it is, we don't need another abstraction.
3. Can the virtualized containers be secure? No, we really can't keep a hostile root in a container from killing the system without going to a hypervisor.

-- Stephen Hemminger [EMAIL PROTECTED]
Re: [PATCH 2.6.19-rc3 2/2] ehea: 64K page support fix
Hi,

that is right, I'll send a new patch.

Thanks, Jan-Bernd

On Wednesday 25 October 2006 18:21, Anton Blanchard wrote:
Hi,
+#ifdef CONFIG_PPC_64K_PAGES
+ /* To support 64k pages we must round to 64k page boundary */
+ epas->kernel.addr =
+ ioremap((paddr_kernel 0x), PAGE_SIZE) +
+ (paddr_kernel 0x);
+#else
 epas->kernel.addr = ioremap(paddr_kernel, PAGE_SIZE);
+#endif
Can't you just use PAGE_MASK, ~PAGE_MASK and remove the ifdefs completely?
Anton
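Anton's suggestion can be illustrated with plain arithmetic in userspace. This is only a sketch, not the ehea driver code: DEMO_PAGE_SIZE, DEMO_PAGE_MASK and split_paddr() are hypothetical names, and the 64K page size is an assumption made for the demonstration. Masking with the page mask gives the page-aligned base to map, and masking with its complement gives the offset to add back, so no #ifdef on the page size is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the demo; in the kernel PAGE_SIZE/PAGE_MASK are provided. */
#define DEMO_PAGE_SIZE 0x10000ULL              /* 64K */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1)) /* clears the in-page offset bits */

struct split_addr {
	uint64_t base;   /* page-aligned address to hand to the mapping call */
	uint64_t offset; /* offset inside that page, added back afterwards */
};

/* Hypothetical helper: split a physical address into page base and offset. */
static struct split_addr split_paddr(uint64_t paddr)
{
	struct split_addr s;

	s.base = paddr & DEMO_PAGE_MASK;
	s.offset = paddr & ~DEMO_PAGE_MASK;
	/* The two parts must always recombine to the original address. */
	assert(s.base + s.offset == paddr);
	return s;
}
```

The same two expressions work unchanged for any power-of-two page size, which is the point of dropping the ifdefs.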
Re: [PATCH 2/2] usbnet: use MII hooks only if CONFIG_MII is enabled
On Wed, Oct 25, 2006 at 04:58:58PM -0700, Randy Dunlap wrote:
...
Build tested with CONFIG_MII=y, m, n.
...
--- linux-2619-rc3-pv.orig/drivers/usb/net/usbnet.c
+++ linux-2619-rc3-pv/drivers/usb/net/usbnet.c
@@ -47,6 +47,12 @@
 #define DRIVER_VERSION "22-Aug-2005"
+#if defined(CONFIG_MII) || defined(CONFIG_MII_MODULE)
+#define HAVE_MII 1
+#else
+#define HAVE_MII 0
+#endif
...

I'm too lame to test it, but I bet this will break with CONFIG_USB_USBNET=y, CONFIG_MII=m, and you'll actually need

#if defined(CONFIG_MII) || (defined(CONFIG_MII_MODULE) && defined(MODULE))

And then there's the question whether this amount of #ifdef's is actually worth avoiding the select MII...

cu Adrian
-- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed
Re: [PATCH 2/2] usbnet: use MII hooks only if CONFIG_MII is enabled
Adrian Bunk wrote:
On Wed, Oct 25, 2006 at 04:58:58PM -0700, Randy Dunlap wrote:
...
Build tested with CONFIG_MII=y, m, n.
...
--- linux-2619-rc3-pv.orig/drivers/usb/net/usbnet.c
+++ linux-2619-rc3-pv/drivers/usb/net/usbnet.c
@@ -47,6 +47,12 @@
 #define DRIVER_VERSION "22-Aug-2005"
+#if defined(CONFIG_MII) || defined(CONFIG_MII_MODULE)
+#define HAVE_MII 1
+#else
+#define HAVE_MII 0
+#endif
...

I'm too lame to test it, but I bet this will break with CONFIG_USB_USBNET=y, CONFIG_MII=m, and you'll actually need

#if defined(CONFIG_MII) || (defined(CONFIG_MII_MODULE) && defined(MODULE))

And then there's the question whether this amount of #ifdef's is actually worth avoiding the select MII...

Thanks, but that's OK, David posted a different patch for it.

-- ~Randy
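For readers following the thread, the case Adrian describes can be reproduced with the preprocessor alone. The sketch below is not usbnet code: it defines the config symbols by hand to simulate CONFIG_USB_USBNET=y with CONFIG_MII=m (so MODULE is not defined for this file), and mii_hooks_usable() is a hypothetical helper added only to make the result observable.

```c
/*
 * Simulated configuration: MII is built as a module, while the file
 * being compiled is built into the kernel (MODULE stays undefined).
 */
#define CONFIG_MII_MODULE 1
/* #define CONFIG_MII 1 */ /* would mean MII is built in */
/* #define MODULE 1 */     /* would mean this file is itself a module */

/* Adrian's stricter test: modular MII only counts for modular callers. */
#if defined(CONFIG_MII) || (defined(CONFIG_MII_MODULE) && defined(MODULE))
#define HAVE_MII 1
#else
#define HAVE_MII 0
#endif

/* Hypothetical helper: 1 when built-in code may call the MII helpers. */
static int mii_hooks_usable(void)
{
	return HAVE_MII;
}
```

With the original test, defined(CONFIG_MII) || defined(CONFIG_MII_MODULE), the same configuration would give HAVE_MII 1 and built-in code would reference symbols that only exist in a module.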
Re: [Bugme-new] [Bug 7421] New: Oops, EIP is at atalk_sendmsg
On Thu, 26 Oct 2006 04:08:36 -0700 [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=7421 Summary: Oops, EIP is at atalk_sendmsg Kernel Version: 2.6.18.1 Status: NEW Severity: normal Owner: [EMAIL PROTECTED] Submitter: [EMAIL PROTECTED] Distribution: Debian sarge Hardware Environment: i386 Problem Description: ct 26 10:01:03 localhost papd[3120]: restart (2.0.3) Oct 26 10:01:07 localhost kernel: BUG: unable to handle kernel NULL pointer \ dereference at virtual address Oct 26 10:01:07 localhost kernel: printing eip: Oct 26 10:01:07 localhost kernel: d0c16a8a Oct 26 10:01:07 localhost kernel: *pde = Oct 26 10:01:07 localhost kernel: Oops: [#1] Oct 26 10:01:07 localhost kernel: Modules linked in: appletalk psnap llc ipv6 \ pcmcia_core af_packet parport_pc parport floppy pcspkr sn d_maestro3 snd_ac97_codec \ snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd soundcore intel_agp uhci_hcd \ usbcore 3c59x mii agpgart mous edev tsdev joydev psmouse ide_cd cdrom rtc reiserfs \ ext3 jbd ide_disk ide_generic siimage aec62xx trm290 alim15x3 hpt34x hpt366 cmd64x \ piix rz1000 slc90e66 generic cs5530 cs5520 sc1200 triflex atiixp pdc202xx_old \ pdc202xx_new opti621 ns87415 cy82c693 amd74xx sis5513 via 82cxxx serverworks ide_core \ unix Oct 26 10:01:07 localhost kernel: CPU:0 Oct 26 10:01:07 localhost kernel: EIP:0060:[pg0+277633674/1070257152] Not \ tainted VLI Oct 26 10:01:07 localhost kernel: EFLAGS: 00010286 (2.6.17.14.2006-10-25 #1) Oct 26 10:01:07 localhost kernel: EIP is at atalk_sendmsg+0x15b/0x4e4 [appletalk] Oct 26 10:01:07 localhost kernel: eax: ebx: 002f ecx: \ edx: Oct 26 10:01:07 localhost kernel: esi: cadcb600 edi: ebp: cc9d7eec \ esp: cc9d7d6c Oct 26 10:01:07 localhost kernel: ds: 007b es: 007b ss: 0068 Oct 26 10:01:07 localhost kernel: Process afpd (pid: 3118, threadinfo=cc9d6000 \ task=cfe205d0) Oct 26 10:01:07 localhost kernel: Stack: c02b32c0 cc9d7ee8 cffbc500 \ d0c16f05 cffbc500 Oct 26 10:01:07 localhost kernel:cffbc500 cc9d7ec8 cadcb600 \ 
0400 cc9d7f48 001b Oct 26 10:01:07 localhost kernel:cc9d7ec8 cc9d7e1c cc9d7ee8 c01fe97a cc9d7e1c \ ca252600 cc9d7ec8 001b Oct 26 10:01:07 localhost kernel: Call Trace: Oct 26 10:01:07 localhost kernel: d0c16f05 atalk_recvmsg+0xf2/0x105 [appletalk] \ c01fe97a sock_sendmsg+0xd0/0xeb Oct 26 10:01:07 localhost kernel: c0157bfd touch_atime+0xb4/0xbb c0198b22 \ copy_from_user+0x34/0x5a Oct 26 10:01:07 localhost kernel: c012383e autoremove_wake_function+0x0/0x3a \ c0198b22 copy_from_user+0x34/0x5a Oct 26 10:01:07 localhost kernel: c01fe490 move_addr_to_kernel+0x24/0x39 \ c01ffaaa sys_sendto+0xe9/0x10d Oct 26 10:01:07 localhost kernel: c01fe67e sock_attach_fd+0x72/0xd2 c0143d52 \ get_empty_filp+0x3b/0xe4 Oct 26 10:01:07 localhost kernel: c0143d7b get_empty_filp+0x64/0xe4 c0198ae4 \ copy_to_user+0x32/0x3c Oct 26 10:01:07 localhost kernel: c02001de sys_socketcall+0xf2/0x180 c0102a03 \ syscall_call+0x7/0xb Oct 26 10:01:07 localhost kernel: Code: 0c 83 c0 04 eb 15 c6 44 24 1a 00 0f b7 86 26 \ 01 00 00 66 89 44 24 18 8d 44 24 18 50 e8 e0 eb ff ff 89 44 24 04 85 f6 5d 8b 14 24 \ 8b 12 89 54 24 04 74 1b 8b 86 84 00 00 00 f6 c4 04 74 10 52 53 Oct 26 10:01:07 localhost kernel: EIP: [pg0+277633674/1070257152] \ atalk_sendmsg+0x15b/0x4e4 [appletalk] SS:ESP 0068:cc9d7d6c Oct 26 10:01:21 localhost atalkd[3106]: as_timer gateway 8000.100 down Steps to reproduce: restart the machine, start papd after network initializing has finished a second start of papd works fine appletalk is loades as module same behaviour with 2.6.17.14 --- You are receiving this mail because: --- You are on the CC list for the bug, or are watching someone who is. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] tcp: setsockopt congestion control autoload
Stephen Hemminger wrote:
No capability check needed. Any additional paranoia belongs in /sbin/modprobe. There seems to be lots of existing usage where a user can cause a module to be loaded (see bin_fmt, xtables, etc).

x_tables is restricted to CAP_NET_ADMIN, but in net/ alone we have __sock_create (loads protocol families), sock_ioctl (loads bridge, vlan or dlci), the already mentioned netlink case, inet_create (loads IP protocols), inet6_create (similar to inet_create), and a few more.
[SOFTMAC] - level of verbosity
Hi. I'd just like to know whether it is be possible to reduce the verbosity level of softmac. My wireless is working fine, but my logs are polluted by: /* LOG */ printk: 20 messages suppressed. SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown. SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown. SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown. SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown. SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown. SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown. SoftMAC: Received deauthentication packet from 00:14:bf:03:40:68, but that network is unknown. SoftMAC: Received deauthentication packet from 00:14:bf:03:40:68, but that network is unknown. SoftMAC: Received deauthentication packet from 00:14:bf:03:40:68, but that network is unknown. SoftMAC: Received deauthentication packet from 00:14:bf:03:40:68, but that network is unknown. SoftMAC: Received deauthentication packet from 00:14:bf:03:40:68, but that network is unknown. SoftMAC: Received deauthentication packet from 00:14:bf:03:40:68, but that network is unknown. SoftMAC: Authentication response received from 00:14:bf:03:40:68 but no queue item exists. SoftMAC: Authentication response received from 00:14:bf:03:40:68 but no queue item exists. SoftMAC: Authentication response received from 00:14:bf:03:40:68 but no queue item exists. SoftMAC: Authentication response received from 00:14:bf:03:40:68 but no queue item exists. SoftMAC: Authentication response received from 00:0f:66:b9:3d:4c but no queue item exists. SoftMAC: Authentication response received from 00:0f:66:b9:3d:4c but no queue item exists. SoftMAC: Authentication response received from 00:0f:66:b9:3d:4c but no queue item exists. 
SoftMAC: Authentication response received from 00:0f:66:b9:3d:4c but no queue item exists.
SoftMAC: Authentication response received from 00:0f:66:b9:3d:4c but no queue item exists.
SoftMAC: Received deauthentication packet from 00:0a:79:52:84:6c, but that network is unknown.
/* LOG */

It's kind of polluting the interesting parts of the logs, and furthermore, when it's actually written to the file, it causes regular disk activity, which is really annoying. I built my kernel without the softmac debugging option:

$ cat .config | grep SOFTMAC
CONFIG_IEEE80211_SOFTMAC=y
# CONFIG_IEEE80211_SOFTMAC_DEBUG is not set

but it's still really talkative :-) So, is there any way to get rid of this? Anyway, thanks for your work, cheers, cf
Re: [RFC] tcp: setsockopt congestion control autoload
My reservation in doing this would be that as an administrator, I may want to choose exactly what congestion control is available at any given time. The different congestion control algorithms are not necessarily fair to each other. If the modules are autoloaded, I could still enforce this by moving the modules out of /lib/modules, but I think it's cleaner to do it by loading/unloading modules as appropriate.

-John

Stephen Hemminger wrote:
If user asks for a congestion control type with setsockopt() then it may be available as a module not included in the kernel already. It should be autoloaded if needed. This is done already when the default selection is changed with sysctl, but not when an application requests it via setsockopt(). Only reservation is are there any bad security implications from this?

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- orig/net/ipv4/tcp_cong.c 2006-10-25 13:55:34.0 -0700
+++ new/net/ipv4/tcp_cong.c 2006-10-25 13:58:39.0 -0700
@@ -153,9 +153,19 @@
 	rcu_read_lock();
 	ca = tcp_ca_find(name);
+	/* no change asking for existing value */
 	if (ca == icsk->icsk_ca_ops)
 		goto out;
+#ifdef CONFIG_KMOD
+	/* not found attempt to autoload module */
+	if (!ca) {
+		rcu_read_unlock();
+		request_module("tcp_%s", name);
+		rcu_read_lock();
+		ca = tcp_ca_find(name);
+	}
+#endif
 	if (!ca)
 		err = -ENOENT;
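The control flow of the patch (look up, try to load on a miss, look up again) can be modelled in userspace. This is a toy sketch, not kernel code: the registry arrays, ca_find(), fake_request_module() and the list of loadable names are all hypothetical stand-ins for tcp_ca_find(), request_module() and the module directory, and the RCU locking is only mentioned in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy registry standing in for the kernel's list of congestion controls. */
#define MAX_CA 8
static const char *registered[MAX_CA];
static size_t nr_registered;

/* Names that the pretend module loader is able to provide. */
static const char *loadable[] = { "cubic", "htcp" };

/* Stand-in for tcp_ca_find(): returns the registered name or NULL. */
static const char *ca_find(const char *name)
{
	for (size_t i = 0; i < nr_registered; i++)
		if (strcmp(registered[i], name) == 0)
			return registered[i];
	return NULL;
}

/* Stand-in for request_module(): registers the name if it is loadable. */
static void fake_request_module(const char *name)
{
	for (size_t i = 0; i < sizeof(loadable) / sizeof(loadable[0]); i++)
		if (strcmp(loadable[i], name) == 0 && nr_registered < MAX_CA)
			registered[nr_registered++] = loadable[i];
}

/* Look up, attempt a load on a miss, then look up again. 0 on success, -1 if unknown. */
static int set_congestion_control(const char *name)
{
	const char *ca;

	assert(name != NULL);
	ca = ca_find(name);
	if (!ca) {
		/* The patch drops the RCU read lock around this call. */
		fake_request_module(name);
		ca = ca_find(name);
	}
	return ca ? 0 : -1;
}
```

John's concern maps onto the loadable[] table here: anything listed there becomes reachable by any caller, so restricting the set means removing entries rather than simply not loading them.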
Re: TCP congestion graphs
Hi Stephen, is your rt-patch to netem publicly available?

Best regards HGN
-- Signed and/or encrypted mails preferd. Key-Id = 0x98350C22 Fingerprint = 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22 Key available under: www.jauu.net/download/gnupg_key
[Patch] kmemdup() cleanup in net/
hi, replace open coded kmemdup() to save some screen space, and allow inlining/not inlining to be triggered by gcc. Signed-off-by: Eric Sesterhenn [EMAIL PROTECTED] --- linux-2.6.19-rc3-git1/net/atm/lec.c.orig2006-10-26 20:21:48.0 +0200 +++ linux-2.6.19-rc3-git1/net/atm/lec.c 2006-10-26 20:23:28.0 +0200 @@ -1321,11 +1321,10 @@ static int lane2_resolve(struct net_devi if (table == NULL) return -1; - *tlvs = kmalloc(table-sizeoftlvs, GFP_ATOMIC); + *tlvs = kmemdup(table-tlvs, table-sizeoftlvs, GFP_ATOMIC); if (*tlvs == NULL) return -1; - memcpy(*tlvs, table-tlvs, table-sizeoftlvs); *sizeoftlvs = table-sizeoftlvs; return 0; @@ -1364,11 +1363,10 @@ static int lane2_associate_req(struct ne kfree(priv-tlvs); /* NULL if there was no previous association */ - priv-tlvs = kmalloc(sizeoftlvs, GFP_KERNEL); + priv-tlvs = kmemdup(tlvs, sizeoftlvs, GFP_KERNEL); if (priv-tlvs == NULL) return (0); priv-sizeoftlvs = sizeoftlvs; - memcpy(priv-tlvs, tlvs, sizeoftlvs); skb = alloc_skb(sizeoftlvs, GFP_ATOMIC); if (skb == NULL) --- linux-2.6.19-rc3-git1/net/ax25/ax25_out.c.orig 2006-10-26 20:23:59.0 +0200 +++ linux-2.6.19-rc3-git1/net/ax25/ax25_out.c 2006-10-26 20:24:15.0 +0200 @@ -70,11 +70,10 @@ ax25_cb *ax25_send_frame(struct sk_buff ax25-dest_addr = *dest; if (digi != NULL) { - if ((ax25-digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { + if ((ax25-digipeat = kmemdup(digi, sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { ax25_cb_put(ax25); return NULL; } - memcpy(ax25-digipeat, digi, sizeof(ax25_digi)); } switch (ax25-ax25_dev-values[AX25_VALUES_PROTOCOL]) { --- linux-2.6.19-rc3-git1/net/ax25/ax25_route.c.orig2006-10-26 20:24:23.0 +0200 +++ linux-2.6.19-rc3-git1/net/ax25/ax25_route.c 2006-10-26 20:24:50.0 +0200 @@ -432,11 +432,11 @@ int ax25_rt_autobind(ax25_cb *ax25, ax25 } if (ax25_rt-digipeat != NULL) { - if ((ax25-digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { + if ((ax25-digipeat = kmemdup(ax25_rt-digipeat, + sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { err = 
-ENOMEM; goto put; } - memcpy(ax25-digipeat, ax25_rt-digipeat, sizeof(ax25_digi)); ax25_adjust_path(addr, ax25-digipeat); } --- linux-2.6.19-rc3-git1/net/core/neighbour.c.orig 2006-10-26 20:25:20.0 +0200 +++ linux-2.6.19-rc3-git1/net/core/neighbour.c 2006-10-26 20:25:52.0 +0200 @@ -1266,10 +1266,9 @@ void pneigh_enqueue(struct neigh_table * struct neigh_parms *neigh_parms_alloc(struct net_device *dev, struct neigh_table *tbl) { - struct neigh_parms *p = kmalloc(sizeof(*p), GFP_KERNEL); + struct neigh_parms *p = kmemdup(tbl-parms, sizeof(*p), GFP_KERNEL); if (p) { - memcpy(p, tbl-parms, sizeof(*p)); p-tbl= tbl; atomic_set(p-refcnt, 1); INIT_RCU_HEAD(p-rcu_head); --- linux-2.6.19-rc3-git1/net/dccp/feat.c.orig 2006-10-26 20:26:12.0 +0200 +++ linux-2.6.19-rc3-git1/net/dccp/feat.c 2006-10-26 20:27:26.0 +0200 @@ -279,12 +279,11 @@ static int dccp_feat_nn(struct sock *sk, if (opt == NULL) return -ENOMEM; - copy = kmalloc(len, GFP_ATOMIC); + copy = kmemdup(val, len, GFP_ATOMIC); if (copy == NULL) { kfree(opt); return -ENOMEM; } - memcpy(copy, val, len); opt-dccpop_type = DCCPO_CONFIRM_R; /* NN can only confirm R */ opt-dccpop_feat = feature; @@ -501,20 +500,18 @@ int dccp_feat_clone(struct sock *oldsk, list_for_each_entry(opt, olddmsk-dccpms_pending, dccpop_node) { struct dccp_opt_pend *newopt; /* copy the value of the option */ - u8 *val = kmalloc(opt-dccpop_len, GFP_ATOMIC); + u8 *val = kmemdup(opt-dccpop_val, opt-dccpop_len, GFP_ATOMIC); if (val == NULL) goto out_clean; - memcpy(val, opt-dccpop_val, opt-dccpop_len); - newopt = kmalloc(sizeof(*newopt), GFP_ATOMIC); + newopt = kmemdup(opt, sizeof(*newopt), GFP_ATOMIC); if (newopt == NULL) { kfree(val); goto out_clean; } /* insert the option */ - memcpy(newopt, opt, sizeof(*newopt)); newopt-dccpop_val = val;
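The transformation applied throughout this patch can be shown outside the kernel. The sketch below is a userspace model only: demo_memdup() stands in for kmemdup() using malloc() instead of kmalloc() and has no GFP flags argument, and struct digi and digi_clone() are hypothetical names invented for the demonstration.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for kmemdup(): allocate len bytes and copy src into them. */
static void *demo_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Hypothetical structure used only for the demonstration. */
struct digi {
	int ndigi;
	char calls[4];
};

/* After the cleanup: one call replaces the allocate-then-memcpy pair. */
static struct digi *digi_clone(const struct digi *orig)
{
	return demo_memdup(orig, sizeof(*orig));
}
```

The caller still has to check the result for NULL, exactly as the converted call sites in the patch do.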
[PATCH] Rewrite e100_phys_id
The motivator for this was to fix the sparse warning: drivers/net/e100.c:2418:48: warning: cast truncates bits from constant value (83126e978d4fdf becomes 978d4fdf) drivers/net/e100.c:2419:37: warning: cast truncates bits from constant value (83126e978d4fdf becomes 978d4fdf) Initially, I tried a quick fix, but when it ran into difficulties, I looked at tg3.c to see how it does it. I liked their way better, so I rewrote e100.c to be similar. It shaves ~700 bytes off the size of the driver, and a few bytes off the size of struct nic, so I think it's a win all round. Tested on the internal interface of an HP Integrity rx2600. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] diff --git a/drivers/net/e100.c b/drivers/net/e100.c index a3a08a5..aade1e9 100644 --- a/drivers/net/e100.c +++ b/drivers/net/e100.c @@ -556,7 +556,6 @@ struct nic { struct params params; struct net_device_stats net_stats; struct timer_list watchdog; - struct timer_list blink_timer; struct mii_if_info mii; struct work_struct tx_timeout_task; enum loopback loopback; @@ -581,7 +580,6 @@ struct nic { u32 rx_over_length_errors; u8 rev_id; - u16 leds; u16 eeprom_wc; u16 eeprom[256]; spinlock_t mdio_lock; @@ -2168,23 +2166,6 @@ err_clean_rx: return err; } -#define MII_LED_CONTROL0x1B -static void e100_blink_led(unsigned long data) -{ - struct nic *nic = (struct nic *)data; - enum led_state { - led_on = 0x01, - led_off= 0x04, - led_on_559 = 0x05, - led_on_557 = 0x07, - }; - - nic-leds = (nic-leds led_on) ? led_off : - (nic-mac mac_82559_D101M) ? 
led_on_557 : led_on_559; - mdio_write(nic-netdev, nic-mii.phy_id, MII_LED_CONTROL, nic-leds); - mod_timer(nic-blink_timer, jiffies + HZ / 4); -} - static int e100_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd) { struct nic *nic = netdev_priv(netdev); @@ -2411,16 +2392,32 @@ static void e100_diag_test(struct net_de msleep_interruptible(4 * 1000); } +#define MII_LED_CONTROL0x1B static int e100_phys_id(struct net_device *netdev, u32 data) { struct nic *nic = netdev_priv(netdev); + int i; + + enum led_state { + led_off= 0x04, + led_on_559 = 0x05, + led_on_557 = 0x07, + }; + u16 leds = led_off; + + if (data == 0) + data = 2; + + for (i = 0; i (data * 2); i++) { + leds = (leds == led_off) ? + (nic-mac mac_82559_D101M) ? led_on_557 : led_on_559 : + led_off; + mdio_write(nic-netdev, nic-mii.phy_id, MII_LED_CONTROL, leds); + if (msleep_interruptible(500)) + break; + } - if(!data || data (u32)(MAX_SCHEDULE_TIMEOUT / HZ)) - data = (u32)(MAX_SCHEDULE_TIMEOUT / HZ); - mod_timer(nic-blink_timer, jiffies); - msleep_interruptible(data * 1000); - del_timer_sync(nic-blink_timer); - mdio_write(netdev, nic-mii.phy_id, MII_LED_CONTROL, 0); + mdio_write(netdev, nic-mii.phy_id, MII_LED_CONTROL, led_off); return 0; } @@ -2633,9 +2630,6 @@ #endif init_timer(nic-watchdog); nic-watchdog.function = e100_watchdog; nic-watchdog.data = (unsigned long)nic; - init_timer(nic-blink_timer); - nic-blink_timer.function = e100_blink_led; - nic-blink_timer.data = (unsigned long)nic; INIT_WORK(nic-tx_timeout_task, (void (*)(void *))e100_tx_timeout_task, netdev); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
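The shape of the rewritten e100_phys_id() loop can be checked in userspace by recording the values it would write instead of calling mdio_write(). This is a sketch under assumptions: blink_sequence() is a hypothetical function, the caller passes the "on" value directly rather than deriving it from the MAC type, and the half-second sleeps and the early exit on a signal are left out.

```c
#include <stddef.h>

enum led_state {
	led_off    = 0x04,
	led_on_559 = 0x05,
	led_on_557 = 0x07,
};

/*
 * Record the values the loop would write to the LED control register.
 * "seconds" mirrors the ethtool argument; 0 is treated as 2, as in the patch.
 * "on_value" is led_on_557 or led_on_559, chosen by the caller.
 * Returns the number of values stored in out (at most cap).
 */
static size_t blink_sequence(unsigned int seconds, enum led_state on_value,
			     unsigned char *out, size_t cap)
{
	unsigned char leds = led_off;
	size_t n = 0;

	if (seconds == 0)
		seconds = 2;

	/* Two half-second steps per second: on, off, on, off, ... */
	for (unsigned int i = 0; i < seconds * 2 && n < cap; i++) {
		leds = (leds == led_off) ? on_value : led_off;
		out[n++] = leds;
	}

	/* The patch always finishes by switching the LED off. */
	if (n < cap)
		out[n++] = led_off;
	return n;
}
```

Because the state lives in a local variable and the loop sleeps in place, neither the blink timer nor the leds field in struct nic is needed, which is where the size reduction mentioned in the patch description comes from.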
Re: [PATCH] Rewrite e100_phys_id
On Thu, Oct 26, 2006 at 01:11:55PM -0600, Matthew Wilcox wrote:
The motivator for this was to fix the sparse warning:
drivers/net/e100.c:2418:48: warning: cast truncates bits from constant value (83126e978d4fdf becomes 978d4fdf)
drivers/net/e100.c:2419:37: warning: cast truncates bits from constant value (83126e978d4fdf becomes 978d4fdf)
Initially, I tried a quick fix, but when it ran into difficulties, I looked at tg3.c to see how it does it. I liked their way better, so I rewrote e100.c to be similar. It shaves ~700 bytes off the size of the driver, and a few bytes off the size of struct nic, so I think it's a win all round. Tested on the internal interface of an HP Integrity rx2600.
Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]

Seems sane to me... I'll pick it up, if Auke doesn't...

Jeff
Re: TCP congestion graphs
On Thu, 26 Oct 2006 20:50:19 +0200 Hagen Paul Pfeifer [EMAIL PROTECTED] wrote:
Hi Stephen, is your rt-patch to netem publicly available? Best regards HGN

The tools are in the tcp directory http://developer.osdl.org/shemminger/tcp/netem-2.6.18-rt.patch

-- Stephen Hemminger [EMAIL PROTECTED]
Please pull bcm43xx-d80211 features and bugfixes
Hi John,

Please pull latest bcm43xx-d80211 features and bugfixes.

git pull http://bu3sch.de/git/wireless-dev.git for-linville

This will introduce hardware encryption for everything but TKIP on v4 firmware. For v3 firmware or TKIP it will transparently fall back to software encryption. It also fixes a bug that caused high CPU usage due to an IRQ not stopping to trigger.

bcm43xx-d80211: Fix runaway IRQ which caused high CPU usage.
bcm43xx-d80211: Rename IRQs
bcm43xx-d80211: Fix hardware based encryption for v4 firmware.
bcm43xx-d80211: Use software encryption for TKIP for now.
bcm43xx-d80211: No support for hw encryption with v3 firmware. Various hwenc fixes.
bcm43xx-d80211: Only set USEDEFKEYS hostflag for WEP.

drivers/net/wireless/d80211/bcm43xx/bcm43xx.h | 103 +++-
drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.c | 488 +---
drivers/net/wireless/d80211/bcm43xx/bcm43xx_main.h | 3
drivers/net/wireless/d80211/bcm43xx/bcm43xx_xmit.c | 93 +++-
drivers/net/wireless/d80211/bcm43xx/bcm43xx_xmit.h | 2
5 files changed, 461 insertions(+), 228 deletions(-)

-- Greetings Michael.
Re: [PATCH] Rewrite e100_phys_id
Jeff Garzik wrote:
On Thu, Oct 26, 2006 at 01:11:55PM -0600, Matthew Wilcox wrote:
The motivator for this was to fix the sparse warning:
drivers/net/e100.c:2418:48: warning: cast truncates bits from constant value (83126e978d4fdf becomes 978d4fdf)
drivers/net/e100.c:2419:37: warning: cast truncates bits from constant value (83126e978d4fdf becomes 978d4fdf)
Initially, I tried a quick fix, but when it ran into difficulties, I looked at tg3.c to see how it does it. I liked their way better, so I rewrote e100.c to be similar. It shaves ~700 bytes off the size of the driver, and a few bytes off the size of struct nic, so I think it's a win all round. Tested on the internal interface of an HP Integrity rx2600.
Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]

Seems sane to me... I'll pick it up, if Auke doesn't...

no objections, so I'll ACK it with the notion that I'm going to let our labs do some more testing on it with all the latest changes to it. Jeff, I will stack it on the patches I have for 2.6.20 and push those out before the weekend.

Cheers, Auke
Re: [Bugme-new] [Bug 7421] New: Oops, EIP is at atalk_sendmsg
On Thu, 26 Oct 2006, Andrew Morton wrote:

On Thu, 26 Oct 2006 04:08:36 -0700 [EMAIL PROTECTED] wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=7421

Summary: Oops, EIP is at atalk_sendmsg
Kernel Version: 2.6.18.1
Status: NEW
Severity: normal
Owner: [EMAIL PROTECTED]
Submitter: [EMAIL PROTECTED]
Distribution: Debian sarge
Hardware Environment: i386

Problem Description:

Oct 26 10:01:03 localhost papd[3120]: restart (2.0.3)
Oct 26 10:01:07 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address
Oct 26 10:01:07 localhost kernel: printing eip:
Oct 26 10:01:07 localhost kernel: d0c16a8a
Oct 26 10:01:07 localhost kernel: *pde =
Oct 26 10:01:07 localhost kernel: Oops: [#1]
Oct 26 10:01:07 localhost kernel: Modules linked in: appletalk psnap llc ipv6 pcmcia_core af_packet parport_pc parport floppy pcspkr snd_maestro3 snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd soundcore intel_agp uhci_hcd usbcore 3c59x mii agpgart mousedev tsdev joydev psmouse ide_cd cdrom rtc reiserfs ext3 jbd ide_disk ide_generic siimage aec62xx trm290 alim15x3 hpt34x hpt366 cmd64x piix rz1000 slc90e66 generic cs5530 cs5520 sc1200 triflex atiixp pdc202xx_old pdc202xx_new opti621 ns87415 cy82c693 amd74xx sis5513 via82cxxx serverworks ide_core unix
Oct 26 10:01:07 localhost kernel: CPU:0
Oct 26 10:01:07 localhost kernel: EIP:0060:[pg0+277633674/1070257152] Not tainted VLI
Oct 26 10:01:07 localhost kernel: EFLAGS: 00010286 (2.6.17.14.2006-10-25 #1)
Oct 26 10:01:07 localhost kernel: EIP is at atalk_sendmsg+0x15b/0x4e4 [appletalk]
Oct 26 10:01:07 localhost kernel: eax: ebx: 002f ecx: edx:
Oct 26 10:01:07 localhost kernel: esi: cadcb600 edi: ebp: cc9d7eec esp: cc9d7d6c
Oct 26 10:01:07 localhost kernel: ds: 007b es: 007b ss: 0068
Oct 26 10:01:07 localhost kernel: Process afpd (pid: 3118, threadinfo=cc9d6000 task=cfe205d0)
Oct 26 10:01:07 localhost kernel: Stack: c02b32c0 cc9d7ee8 cffbc500 d0c16f05 cffbc500
Oct 26 10:01:07 localhost kernel:cffbc500 cc9d7ec8 cadcb600 0400 cc9d7f48 001b
Oct 26 10:01:07 localhost kernel:cc9d7ec8 cc9d7e1c cc9d7ee8 c01fe97a cc9d7e1c ca252600 cc9d7ec8 001b
Oct 26 10:01:07 localhost kernel: Call Trace:
Oct 26 10:01:07 localhost kernel: d0c16f05 atalk_recvmsg+0xf2/0x105 [appletalk] c01fe97a sock_sendmsg+0xd0/0xeb
Oct 26 10:01:07 localhost kernel: c0157bfd touch_atime+0xb4/0xbb c0198b22 copy_from_user+0x34/0x5a
Oct 26 10:01:07 localhost kernel: c012383e autoremove_wake_function+0x0/0x3a c0198b22 copy_from_user+0x34/0x5a
Oct 26 10:01:07 localhost kernel: c01fe490 move_addr_to_kernel+0x24/0x39 c01ffaaa sys_sendto+0xe9/0x10d
Oct 26 10:01:07 localhost kernel: c01fe67e sock_attach_fd+0x72/0xd2 c0143d52 get_empty_filp+0x3b/0xe4
Oct 26 10:01:07 localhost kernel: c0143d7b get_empty_filp+0x64/0xe4 c0198ae4 copy_to_user+0x32/0x3c
Oct 26 10:01:07 localhost kernel: c02001de sys_socketcall+0xf2/0x180 c0102a03 syscall_call+0x7/0xb
Oct 26 10:01:07 localhost kernel: Code: 0c 83 c0 04 eb 15 c6 44 24 1a 00 0f b7 86 26 01 00 00 66 89 44 24 18 8d 44 24 18 50 e8 e0 eb ff ff 89 44 24 04 85 f6 5d 8b 14 24 8b 12 89 54 24 04 74 1b 8b 86 84 00 00 00 f6 c4 04 74 10 52 53
Oct 26 10:01:07 localhost kernel: EIP: [pg0+277633674/1070257152] atalk_sendmsg+0x15b/0x4e4 [appletalk] SS:ESP 0068:cc9d7d6c
Oct 26 10:01:21 localhost atalkd[3106]: as_timer gateway 8000.100 down

Steps to reproduce: restart the machine, then start papd after network initialization has finished. A second start of papd works fine. appletalk is loaded as a module. Same behaviour with 2.6.17.14.

Something like me too:

Unable to handle kernel NULL pointer dereference at virtual address
printing eip:
c036b1ef
*pde =
Oops: [#1] PREEMPT
Modules linked in: bonding
CPU:0
EIP:0060:[c036b1ef]Not tainted VLI
EFLAGS: 00010286 (2.6.15.1)
EIP is at atalk_sendmsg+0x158/0x557
eax: d468fee4 ebx: 0017 ecx: d468fd20 edx:
esi: edi: d7e88200 ebp: bfa7c480 esp: d468fd68
ds: 007b es: 007b ss: 0068
Process atalkd (pid: 551, threadinfo=d468e000 task=d6f55090)
Stack: d468ff40 d468fee0 d70d20a0 0003 c036b6e0 d70d20a0 d70d20a0 d468fec0 d7e88200 0400 d468ff40 0003 d468fec0 d468fe18 bfa7c480 c02e2d5e d468fe18 d7194540 d468fec0 0003
Call Trace: [c036b6e0] atalk_recvmsg+0xf2/0x105 [c02e2d5e] sock_sendmsg+0xce/0xe9 [c01212c2]
[IPROUTE] manpage for lnstat
Hello,

I wrote a manpage for lnstat; it would be great if it could be applied for the next release.

@Harald, I'm following your 'If somebody wants to do a manpage, feel free to send me a patch :)' of lnstat's README. :)

regards,
-mika-

.TH LNSTAT 1
.SH NAME
lnstat \- unified linux network statistics
.SH SYNOPSIS
.B lnstat
.RI [ options ]
.SH DESCRIPTION
This manual page documents briefly the
.B lnstat
command.
.PP
\fBlnstat\fP is a generalized and more feature-complete replacement for the old
rtstat program. In addition to routing cache statistics, it supports any kind of
statistics the linux kernel exports via a file in /proc/net/stat/.
.SH OPTIONS
lnstat follows the usual GNU command line syntax, with long options starting
with two dashes (`\-\-').
lnstat supports the following options.
.TP
.B \-h, \-\-help
Show summary of options.
.TP
.B \-V, \-\-version
Show version of program.
.TP
.B \-c, \-\-count count
Print count number of intervals.
.TP
.B \-d, \-\-dump
Dump list of available files/keys.
.TP
.B \-f, \-\-file file
Statistics file to use.
.TP
.B \-i, \-\-interval intv
Set interval to 'intv' seconds.
.TP
.B \-k, \-\-keys k,k,k,...
Display only keys specified.
.TP
.B \-s, \-\-subject [0-2]
Specify display of subject/header. '0' means no header at all, '1' prints a
header only at start of the program and '2' prints a header every 20 lines.
.TP
.B \-w, \-\-width n,n,n,...
Width for each field.
.SH USAGE EXAMPLES
.TP
.B # lnstat -d
Get a list of supported statistics files.
.TP
.B # lnstat -k arp_cache:entries,rt_cache:in_hit,arp_cache:destroys
Select the specified files and keys.
.TP
.B # lnstat -i 10
Use an interval of 10 seconds.
.TP
.B # lnstat -f ip_conntrack
Use only the specified file for statistics.
.TP
.B # lnstat -s 0
Do not print a header at all.
.TP
.B # lnstat -s 2
Print a header at start and every 20 lines.
.TP
.B # lnstat -c -1 -i 1 -f rt_cache -k entries,in_hit,in_slow_tot
Display statistics for keys entries, in_hit and in_slow_tot of file rt_cache
every second.
.SH SEE ALSO
.BR ip (8),
and /usr/share/doc/iproute-doc/README.lnstat (package iproute-doc on Debian)
.br
.SH AUTHOR
lnstat was written by Harald Welte [EMAIL PROTECTED].
.PP
This manual page was written by Michael Prokop [EMAIL PROTECTED] for the
Debian project (but may be used by others).
[PATCH] myri10ge: ServerWorks HT2000 PCI id is already defined in pci_ids.h
No need to keep defining PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE in the driver code since it is now defined in pci_ids.h.

Signed-off-by: Brice Goglin [EMAIL PROTECTED]
---
 drivers/net/myri10ge/myri10ge.c |    1 -
 1 file changed, 1 deletion(-)

Please apply for 2.6.19 since PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE has been added to pci_ids.h in -rc1 (commit 6397c75cbc4d7dbc3d07278b57c82a47dafb21b5).

Thanks,
Brice

Index: linux-rc/drivers/net/myri10ge/myri10ge.c
===
--- linux-rc.orig/drivers/net/myri10ge/myri10ge.c 2006-10-26 22:18:53.0 +0200
+++ linux-rc/drivers/net/myri10ge/myri10ge.c 2006-10-26 22:19:05.0 +0200
@@ -2410,7 +2410,6 @@
  * firmware image, and set tx.boundary to 4KB.
  */
 
-#define PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE 0x0132
 #define PCI_DEVICE_ID_INTEL_E5000_PCIE23 0x25f7
 #define PCI_DEVICE_ID_INTEL_E5000_PCIE47 0x25fa
Re: [RFC] tcp: setsockopt congestion control autoload
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Thu, 26 Oct 2006 18:57:13 +0400

It just calls /sbin/modprobe, which in turn runs tons of scripts in /etc/hotplug, modprobe and other places... In the paranoid case we should not allow any user to load kernel modules, even known ones. Should this option be guarded by some capability check?

Do you realize that sys_socket() already makes this kind of thing happen?
Re: [IPROUTE] manpage for lnstat
On Thu, Oct 26, 2006 at 10:41:26PM +0200, Michael Prokop wrote:

Hello, I wrote a manpage for lnstat, would be great if it could be applied to the next release.

Stephen: Please include it into your next release, looks fine to me.

@Harald, I'm following your 'If somebody wants to do a manpage, feel free to send me a patch :)' of lnstat's README. :)

thanks a lot!

--
- Harald Welte [EMAIL PROTECTED] http://gnumonks.org/
We all know Linux is great...it does infinite loops in 5 seconds. -- Linus
Re: [RFC] tcp: setsockopt congestion control autoload
From: John Heffner [EMAIL PROTECTED]
Date: Thu, 26 Oct 2006 13:29:26 -0400

My reservation in doing this would be that as an administrator, I may want to choose exactly what congestion control is available at any given time. The different congestion control algorithms are not necessarily fair to each other. If the modules are autoloaded, I could still enforce this by moving the modules out of /lib/modules, but I think it's cleaner to do it by loading/unloading modules as appropriate.

Fair enough, and for the folks doing tests of congestion control algorithms they can run as root or whatever.
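For readers following the thread, the per-socket selection under discussion is made through the TCP_CONGESTION socket option. The following is a minimal userspace sketch, assuming a Linux system whose netinet/tcp.h defines TCP_CONGESTION; the two helper names are made up for illustration and do not come from any patch in this thread.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative helper (hypothetical name): ask the kernel to use the named
 * congestion control algorithm on one TCP socket.
 * Returns 0 on success, -1 on failure with errno set. */
static int set_congestion_control(int fd, const char *name)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}

/* Illustrative helper (hypothetical name): read back the algorithm that is
 * currently attached to the socket into buf. */
static int get_congestion_control(int fd, char *buf, socklen_t len)
{
	return getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len);
}
```

Whether a call such as set_congestion_control(fd, "cubic") may trigger a module load, and who is allowed to make it, is exactly the policy question being argued in this thread.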
[PATCH] sealevel: uses arp_broken_ops
On Wed, 25 Oct 2006 18:03:13 +0200 Toralf Förster wrote:

WARNING: arp_broken_ops [drivers/net/wan/sealevel.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

Here's the config:
...
# CONFIG_INET is not set
CONFIG_SEALEVEL_4021=m

---
From: Randy Dunlap [EMAIL PROTECTED]

Sealevel uses arp_broken_ops so it needs to depend on INET.

Signed-off-by: Randy Dunlap [EMAIL PROTECTED]
---
 drivers/net/wan/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2619-rc3-pv.orig/drivers/net/wan/Kconfig
+++ linux-2619-rc3-pv/drivers/net/wan/Kconfig
@@ -127,7 +127,7 @@ config LANMEDIA
 # There is no way to detect a Sealevel board. Force it modular
 config SEALEVEL_4021
 	tristate "Sealevel Systems 4021 support"
-	depends on WAN && ISA && m && ISA_DMA_API
+	depends on WAN && ISA && m && ISA_DMA_API && INET
 	help
 	  This is a driver for the Sealevel Systems ACB 56 serial I/O adapter.
RE: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211
Getting people to use wpa_supplicant almost exclusively as the interface for wireless will improve a lot of things. One thing that might help here is to do some work on wpa_cli: make it easier for the startup scripts to do what they need, and make the command syntax easier for command line users.

Simon

-----Original Message-----
From: Dan Williams [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 26, 2006 8:33 AM
To: Luis R. Rodriguez
Cc: Johannes Berg; Michael Wu; Simon Barber; David Kimdon; netdev@vger.kernel.org; Jiri Benc; John W. Linville; Jean Tourrilhes; Hong Liu; Jouni Malinen
Subject: Re: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211

On Thu, 2006-10-26 at 11:04 -0400, Luis R. Rodriguez wrote:

On 10/26/06, Dan Williams [EMAIL PROTECTED] wrote:

While wpa_supplicant is certainly the main client for stuff directly related to setting up a connection, there are quite a few other users of general WE calls to pull information out of the card, or to receive scan events.

How about we just ditch iwconfig completely and move on to wpa_supplicant/wpa_cli as our next userspace application with nl80211/cfg80211 as our new API for userspace<->kernel communication? As you point out, wpa_supplicant already does a lot for us, and several distributions already rely on it. Some work is required but I think it's worth it. If we do a complete move from WE to nl80211 it would be transparent to the users too.

The one blocker I can think of here is startup scripts on various distributions. Most of those are shell, and they usually rely on iwconfig quite heavily. Getting those converted to wpa_supplicant wouldn't be a trivial amount of work, but it wouldn't be a ton either.

Dan
Re: [IPROUTE] manpage for lnstat
On Thu, 26 Oct 2006 22:41:26 +0200 Michael Prokop [EMAIL PROTECTED] wrote:

-mika-
[lnstat.1 text/plain (2087 bytes)]

Added, but I took the liberty of moving it to section 8 (lnstat.8) because that is where the other iproute2 commands are.

--
Stephen Hemminger [EMAIL PROTECTED]
Re: [RFC] [PATCH 0/3] Add Regulatory Domain support to d80211
On Thu, 2006-10-26 at 11:33 -0400, Dan Williams wrote:

The one blocker I can think of here is startup scripts on various distributions. Most of those are shell, and they usually rely on iwconfig quite heavily. Getting those converted to wpa_supplicant wouldn't be a trivial amount of work, but it wouldn't be a ton either.

But as I've said forever... I intend to leave WE working when we move to cfg80211. We can't rip out the old userspace API just like that.

johannes
Re: Network virtualization/isolation
Stephen Hemminger wrote:

On Thu, 26 Oct 2006 11:44:55 +0200 Daniel Lezcano [EMAIL PROTECTED] wrote:

[ ... ]

Assuming you are talking about pseudo-virtualized environments, there are several different discussions.

Yes, exactly, I forgot to mention that.

1. How should the namespace be isolated for the virtualized containered applications?

The network resources should be tied to the namespaces, especially the struct sock. That way, when a checkpoint is initiated for the container, you can identify the established connections, the timewait sockets, the request queues, ... that belong to the container in order to freeze the traffic and checkpoint them. IP addresses are not a valid discriminator for this identification, for example if you have several containers interconnected within the same host.

2. How should traffic be restricted into/out of those containers. This is where existing netfilter, classification, etc, should be used. The network code is overly rich as it is, we don't need another abstraction.

Using only netfilter, you will not be able to bind to the same INADDR_ANY,port in different containers. You will need to handle several IP addresses coming from IP aliasing, and check the source address to be sure it belongs to the right container and does not come from a primary interface that is probably assigned to a different container.

3. Can the virtualized containers be secure? No. we really can't keep hostile root in a container from killing system without going to a hypervisor.

That is totally true; containers don't aim to replace a fully virtualized environment.
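The INADDR_ANY,port point in the message above can be illustrated with a toy bind table whose key includes a namespace identifier. This is only a sketch under assumed names (bound_sock, bind_table and bind_in_ns are invented here, and addr == 0 stands in for INADDR_ANY); it is not how either proposed patch set implements the lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BOUND 16

/* Hypothetical record of one bound socket; ns_id identifies the container. */
struct bound_sock {
	int ns_id;
	uint32_t addr; /* 0 plays the role of INADDR_ANY */
	uint16_t port;
};

struct bind_table {
	struct bound_sock entries[MAX_BOUND];
	size_t count;
};

/* Try to bind addr:port inside namespace ns_id.
 * A conflict is only possible with sockets of the same namespace; within a
 * namespace a wildcard address collides with every address on that port.
 * Returns true if the binding was recorded, false on conflict or full table. */
static bool bind_in_ns(struct bind_table *t, int ns_id,
		       uint32_t addr, uint16_t port)
{
	for (size_t i = 0; i < t->count; i++) {
		const struct bound_sock *b = &t->entries[i];

		if (b->ns_id != ns_id || b->port != port)
			continue;
		if (b->addr == addr || b->addr == 0 || addr == 0)
			return false;
	}
	if (t->count == MAX_BOUND)
		return false;

	t->entries[t->count].ns_id = ns_id;
	t->entries[t->count].addr = addr;
	t->entries[t->count].port = port;
	t->count++;
	return true;
}
```

With the namespace in the key, two containers can each hold a wildcard bind on port 80, while a second wildcard bind inside the same container is still refused.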
Re: [RFC] tcp: setsockopt congestion control autoload
* John Heffner | 2006-10-26 13:29:26 [-0400]:

My reservation in doing this would be that as an administrator, I may want to choose exactly what congestion control is available at any given time. The different congestion control algorithms are not necessarily fair to each other.

ACK, completely right. A user without CAP_NET_ADMIN MUST NOT change the algorithm. We know that there is some unfairness out there. And maybe some day someone will introduce a satellite algorithm which is by definition completely unfair to vanilla TCP. We should guard this with a CAP_NET_ADMIN check, so that even built-in algorithms cannot be enabled without it.

HGN

--
Signed and/or encrypted mails preferred. Key-Id = 0x98350C22 Fingerprint = 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
Re: [RFC] tcp: setsockopt congestion control autoload
Hagen Paul Pfeifer wrote:

* John Heffner | 2006-10-26 13:29:26 [-0400]:

My reservation in doing this would be that as an administrator, I may want to choose exactly what congestion control is available at any given time. The different congestion control algorithms are not necessarily fair to each other.

ACK, completely right. A user without CAP_NET_ADMIN MUST NOT change the algorithm. We know that there is some unfairness out there. And maybe some day someone will introduce a satellite algorithm which is by definition completely unfair to vanilla TCP. We should guard this with a CAP_NET_ADMIN check, so that even built-in algorithms cannot be enabled without it.

I don't know if I'd want to go that far. For example, there's a nice protocol, TCP-LP, which is by design unfair in the other direction: it yields to other traffic so that you can basically run a scavenger service. If you really care about this, you could try to rank protocols based on aggressiveness (note this is not trivial) and do something like 'nice', where mortals can only nice up, not down. Practically speaking, I'm not sure this is necessary (worth the effort).

-John
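The 'nice'-like idea floated above could look roughly like the sketch below. Everything in it is hypothetical: the rank values, the enum and the function name are invented for illustration, and nothing in the thread says how algorithms would actually be ranked (John notes that this is not trivial).

```c
#include <stdbool.h>

/* Hypothetical aggressiveness ranks; a higher value means the algorithm
 * competes harder against other flows. */
enum cc_rank {
	CC_RANK_SCAVENGER = 0,  /* yields to other traffic, e.g. TCP-LP */
	CC_RANK_STANDARD = 1,   /* whatever the administrator made default */
	CC_RANK_AGGRESSIVE = 2  /* known to be unfair to standard flows */
};

/* Decide whether a socket may move from its current rank to the requested
 * one. Privileged callers may pick anything; everyone else may only stay
 * at the same level or become less aggressive, like nice(2) going up. */
static bool cc_change_allowed(enum cc_rank current, enum cc_rank requested,
			      bool privileged)
{
	if (privileged)
		return true;
	return requested <= current;
}
```

Under this policy an unprivileged user could switch a socket to a scavenger algorithm but could not select anything more aggressive than what the socket already had.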
[PATCH 3/8] netpoll per device txq
When the netpoll beast got really busy, it tended to clog things, so it stored them for later. But the beast was putting all its skb's in one basket. This was bad because maybe some pipes were clogged and others were not.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/linux/netpoll.h |    2 +
 net/core/netpoll.c      |   50 ++--
 2 files changed, 17 insertions(+), 35 deletions(-)

--- linux-2.6.orig/include/linux/netpoll.h
+++ linux-2.6/include/linux/netpoll.h
@@ -33,6 +33,8 @@ struct netpoll_info {
 	spinlock_t rx_lock;
 	struct netpoll *rx_np; /* netpoll that registered an rx_hook */
 	struct sk_buff_head arp_tx; /* list of arp requests to reply to */
+	struct sk_buff_head txq;
+	struct work_struct tx_work;
 };
 
 void netpoll_poll(struct netpoll *np);
--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -38,10 +38,6 @@
 
 static struct sk_buff_head skb_pool;
 
-static DEFINE_SPINLOCK(queue_lock);
-static int queue_depth;
-static struct sk_buff *queue_head, *queue_tail;
-
 static atomic_t trapped;
 
 #define NETPOLL_RX_ENABLED 1
@@ -56,46 +52,25 @@ static void arp_reply(struct sk_buff *sk
 
 static void queue_process(void *p)
 {
-	unsigned long flags;
+	struct netpoll_info *npinfo = p;
 	struct sk_buff *skb;
 
-	while (queue_head) {
-		spin_lock_irqsave(&queue_lock, flags);
-
-		skb = queue_head;
-		queue_head = skb->next;
-		if (skb == queue_tail)
-			queue_head = NULL;
-
-		queue_depth--;
-
-		spin_unlock_irqrestore(&queue_lock, flags);
-
+	while ((skb = skb_dequeue(&npinfo->txq)))
 		dev_queue_xmit(skb);
-	}
-}
 
-static DECLARE_WORK(send_queue, queue_process, NULL);
+}
 
 void netpoll_queue(struct sk_buff *skb)
 {
-	unsigned long flags;
+	struct net_device *dev = skb->dev;
+	struct netpoll_info *npinfo = dev->npinfo;
 
-	if (queue_depth == MAX_QUEUE_DEPTH) {
-		__kfree_skb(skb);
-		return;
+	if (!npinfo)
+		kfree_skb(skb);
+	else {
+		skb_queue_tail(&npinfo->txq, skb);
+		schedule_work(&npinfo->tx_work);
 	}
-
-	spin_lock_irqsave(&queue_lock, flags);
-	if (!queue_head)
-		queue_head = skb;
-	else
-		queue_tail->next = skb;
-	queue_tail = skb;
-	queue_depth++;
-	spin_unlock_irqrestore(&queue_lock, flags);
-
-	schedule_work(&send_queue);
 }
 
 static int checksum_udp(struct sk_buff *skb, struct udphdr *uh,
@@ -649,6 +624,9 @@ int netpoll_setup(struct netpoll *np)
 		npinfo->tries = MAX_RETRIES;
 		spin_lock_init(&npinfo->rx_lock);
 		skb_queue_head_init(&npinfo->arp_tx);
+		skb_queue_head_init(&npinfo->txq);
+		INIT_WORK(&npinfo->tx_work, queue_process, npinfo);
+
 		atomic_set(&npinfo->refcnt, 1);
 	} else {
 		npinfo = ndev->npinfo;
@@ -771,6 +749,8 @@ void netpoll_cleanup(struct netpoll *np)
 			np->dev->npinfo = NULL;
 			if (atomic_dec_and_test(&npinfo->refcnt)) {
 				skb_queue_purge(&npinfo->arp_tx);
+				skb_queue_purge(&npinfo->txq);
+				flush_scheduled_work();
 
 				kfree(npinfo);
 			}

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 6/8] netpoll retry cleanup
The netpoll beast was still not happy. If the beast got clogged pipes, it tended to stare blankly off in space for a long time. The problem couldn't be completely fixed because the beast talked with irq's disabled. But it could be made less painful and shorter.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/linux/netpoll.h |    1
 net/core/netpoll.c      |   71 ++--
 2 files changed, 33 insertions(+), 39 deletions(-)

--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -34,12 +34,12 @@
 #define MAX_UDP_CHUNK 1460
 #define MAX_SKBS 32
 #define MAX_QUEUE_DEPTH (MAX_SKBS / 2)
-#define MAX_RETRIES 2
 
 static struct sk_buff_head skb_pool;
 
 static atomic_t trapped;
 
+#define USEC_PER_POLL 50
 #define NETPOLL_RX_ENABLED 1
 #define NETPOLL_RX_DROP 2
 
@@ -72,6 +72,7 @@ static void queue_process(void *p)
 			schedule_delayed_work(&npinfo->tx_work, HZ/10);
 			return;
 		}
+
 		netif_tx_unlock_bh(dev);
 	}
 }
@@ -241,50 +242,44 @@ repeat:
 
 static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 {
-	int status;
-	struct netpoll_info *npinfo;
+	int status = NETDEV_TX_BUSY;
+	unsigned long tries;
+	struct net_device *dev = np->dev;
+	struct netpoll_info *npinfo = np->dev->npinfo;
+
+	if (!npinfo || !netif_running(dev) || !netif_device_present(dev)) {
+		__kfree_skb(skb);
+		return;
+	}
+
+	/* don't get messages out of order, and no recursion */
+	if ( !(np->drop == netpoll_queue && skb_queue_len(&npinfo->txq))
+	     && npinfo->poll_owner != smp_processor_id()
+	     && netif_tx_trylock(dev)) {
+
+		/* try until next clock tick */
+		for(tries = jiffies_to_usecs(1)/USEC_PER_POLL; tries > 0; --tries) {
+			if (!netif_queue_stopped(dev))
+				status = dev->hard_start_xmit(skb, dev);
 
-	if (!np || !np->dev || !netif_running(np->dev)) {
-		__kfree_skb(skb);
-		return;
-	}
+			if (status == NETDEV_TX_OK)
+				break;
+
+			/* tickle device maybe there is some cleanup */
+			netpoll_poll(np);
 
-	npinfo = np->dev->npinfo;
+			udelay(USEC_PER_POLL);
+		}
+		netif_tx_unlock(dev);
+	}
 
-	/* avoid recursion */
-	if (npinfo->poll_owner == smp_processor_id() ||
-	    np->dev->xmit_lock_owner == smp_processor_id()) {
+	if (status != NETDEV_TX_OK) {
+		/* requeue for later */
 		if (np->drop)
 			np->drop(skb);
 		else
 			__kfree_skb(skb);
-		return;
 	}
-
-	do {
-		npinfo->tries--;
-		netif_tx_lock(np->dev);
-
-		/*
-		 * network drivers do not expect to be called if the queue is
-		 * stopped.
-		 */
-		status = NETDEV_TX_BUSY;
-		if (!netif_queue_stopped(np->dev))
-			status = np->dev->hard_start_xmit(skb, np->dev);
-
-		netif_tx_unlock(np->dev);
-
-		/* success */
-		if(!status) {
-			npinfo->tries = MAX_RETRIES; /* reset */
-			return;
-		}
-
-		/* transmit busy */
-		netpoll_poll(np);
-		udelay(50);
-	} while (npinfo->tries > 0);
 }
 
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
@@ -640,7 +635,7 @@ int netpoll_setup(struct netpoll *np)
 		npinfo->rx_np = NULL;
 		spin_lock_init(&npinfo->poll_lock);
 		npinfo->poll_owner = -1;
-		npinfo->tries = MAX_RETRIES;
+
 		spin_lock_init(&npinfo->rx_lock);
 		skb_queue_head_init(&npinfo->arp_tx);
 		skb_queue_head_init(&npinfo->txq);
--- linux-2.6.orig/include/linux/netpoll.h
+++ linux-2.6/include/linux/netpoll.h
@@ -28,7 +28,6 @@ struct netpoll_info {
 	atomic_t refcnt;
 	spinlock_t poll_lock;
 	int poll_owner;
-	int tries;
 	int rx_flags;
 	spinlock_t rx_lock;
 	struct netpoll *rx_np; /* netpoll that registered an rx_hook */

--
Stephen Hemminger [EMAIL PROTECTED]
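The retry bound in this patch comes from the expression jiffies_to_usecs(1)/USEC_PER_POLL. As a worked example, the userspace sketch below reproduces that arithmetic under the assumption that one jiffy lasts 1000000/HZ microseconds (true when HZ divides one million evenly); the function name poll_tries_per_tick is made up for this illustration.

```c
#define USEC_PER_SEC 1000000UL
#define USEC_PER_POLL 50UL /* same value the patch defines */

/* Number of transmit attempts the loop would make before giving up, for a
 * kernel configured with the given HZ. Assumes one jiffy is exactly
 * USEC_PER_SEC / hz microseconds. */
static unsigned long poll_tries_per_tick(unsigned long hz)
{
	unsigned long usecs_per_jiffy = USEC_PER_SEC / hz;

	return usecs_per_jiffy / USEC_PER_POLL;
}
```

Under that assumption HZ=1000 gives 20 attempts, HZ=250 gives 80 and HZ=100 gives 200, compared with the MAX_RETRIES value of 2 that the patch removes. Each attempt delays 50 microseconds, so the delays alone add up to about one clock tick.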
[PATCH 8/8] netpoll header cleanup
As Steve left the netpoll beast, hopefully not to return soon, he noticed that the header was messy. He straightened it up and polished it a little, then waved goodbye.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/linux/netpoll.h |    7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.orig/include/linux/netpoll.h
+++ linux-2.6/include/linux/netpoll.h
@@ -12,16 +12,15 @@
 #include <linux/rcupdate.h>
 #include <linux/list.h>
 
-struct netpoll;
-
 struct netpoll {
 	struct net_device *dev;
-	char dev_name[16], *name;
+	char dev_name[IFNAMSIZ];
+	const char *name;
 	void (*rx_hook)(struct netpoll *, int, char *, int);
 
 	u32 local_ip, remote_ip;
 	u16 local_port, remote_port;
-	unsigned char local_mac[6], remote_mac[6];
+	u8 local_mac[ETH_ALEN], remote_mac[ETH_ALEN];
 };
 
 struct netpoll_info {

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 4/8] netpoll setup error handling
The beast was not always healthy. When it was sick, it tended to be laconic and not tell anyone the real problem. A few small changes had it telling the world about its problems, if they really wanted to hear.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/netconsole.c |    7 +--
 net/core/netpoll.c       |   20 +---
 2 files changed, 18 insertions(+), 9 deletions(-)

--- linux-2.6.orig/drivers/net/netconsole.c
+++ linux-2.6/drivers/net/netconsole.c
@@ -102,6 +102,8 @@ __setup("netconsole=", option_setup);
 
 static int init_netconsole(void)
 {
+	int err;
+
 	if(strlen(config))
 		option_setup(config);
 
@@ -110,8 +112,9 @@ static int init_netconsole(void)
 		return 0;
 	}
 
-	if(netpoll_setup(&np))
-		return -EINVAL;
+	err = netpoll_setup(&np);
+	if (err)
+		return err;
 
 	register_console(&netconsole);
 	printk(KERN_INFO "netconsole: network logging started\n");
--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -602,20 +602,23 @@ int netpoll_setup(struct netpoll *np)
 	struct in_device *in_dev;
 	struct netpoll_info *npinfo;
 	unsigned long flags;
+	int err;
 
 	if (np->dev_name)
 		ndev = dev_get_by_name(np->dev_name);
 	if (!ndev) {
 		printk(KERN_ERR "%s: %s doesn't exist, aborting.\n",
 		       np->name, np->dev_name);
-		return -1;
+		return -ENODEV;
 	}
 
 	np->dev = ndev;
 	if (!ndev->npinfo) {
 		npinfo = kmalloc(sizeof(*npinfo), GFP_KERNEL);
-		if (!npinfo)
+		if (!npinfo) {
+			err = -ENOMEM;
 			goto release;
+		}
 
 		npinfo->rx_flags = 0;
 		npinfo->rx_np = NULL;
@@ -636,6 +639,7 @@ int netpoll_setup(struct netpoll *np)
 	if (!ndev->poll_controller) {
 		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
 		       np->name, np->dev_name);
+		err = -ENOTSUPP;
 		goto release;
 	}
 
@@ -646,13 +650,14 @@ int netpoll_setup(struct netpoll *np)
 		       np->name, np->dev_name);
 
 		rtnl_lock();
-		if (dev_change_flags(ndev, ndev->flags | IFF_UP) < 0) {
+		err = dev_open(ndev);
+		rtnl_unlock();
+
+		if (err) {
 			printk(KERN_ERR "%s: failed to open %s\n",
-			       np->name, np->dev_name);
-			rtnl_unlock();
+			       np->name, ndev->name);
 			goto release;
 		}
-		rtnl_unlock();
 
 		atleast = jiffies + HZ/10;
 		atmost = jiffies + 4*HZ;
@@ -690,6 +695,7 @@ int netpoll_setup(struct netpoll *np)
 			rcu_read_unlock();
 			printk(KERN_ERR "%s: no IP address for %s, aborting\n",
 			       np->name, np->dev_name);
+			err = -EDESTADDRREQ;
 			goto release;
 		}
 
@@ -722,7 +728,7 @@ int netpoll_setup(struct netpoll *np)
 	kfree(npinfo);
 	np->dev = NULL;
 	dev_put(ndev);
-	return -1;
+	return err;
 }
 
 static int __init netpoll_init(void) {

--
Stephen Hemminger [EMAIL PROTECTED]
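The pattern this patch applies, setting a specific negative errno before jumping to a shared release label instead of returning a bare -1, can be shown in a small userspace sketch. The fake_dev structure and fake_setup function are hypothetical stand-ins, not kernel code, and EOPNOTSUPP is used where the patch uses ENOTSUPP because the latter is a kernel-internal value that userspace errno.h does not provide.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network device; not a kernel structure. */
struct fake_dev {
	int present;     /* device exists */
	int can_poll;    /* driver offers a poll hook */
	int has_address; /* an IP address is configured */
};

/* Returns 0 on success or a negative errno that names the failure. */
static int fake_setup(const struct fake_dev *dev)
{
	void *info;
	int err;

	if (!dev || !dev->present)
		return -ENODEV;

	info = malloc(64); /* stands in for the per-device state */
	if (!info)
		return -ENOMEM;

	if (!dev->can_poll) {
		err = -EOPNOTSUPP;
		goto release;
	}
	if (!dev->has_address) {
		err = -EDESTADDRREQ;
		goto release;
	}

	free(info); /* a real caller would keep the state; freed to avoid a leak here */
	return 0;

release:
	/* single cleanup path shared by every failure after the allocation */
	free(info);
	return err;
}
```

A caller such as init_netconsole() can then pass the code straight up, which is what lets the failure reason reach whoever tried to load the module.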
[PATCH 7/8] netpoll queue cleanup
The beast had a long and not very happy history. At one point, a friend (netdump) had asked that he open up a little. Well, the friend was long gone now, and the beast had this dangling piece hanging (netpoll_queue). It wasn't hard to stitch the netpoll_queue back in where it belonged and make everything tidy.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/netconsole.c |    1 -
 include/linux/netpoll.h  |    4 ++--
 net/core/netpoll.c       |   23 +++
 3 files changed, 5 insertions(+), 23 deletions(-)

--- linux-2.6.orig/drivers/net/netconsole.c
+++ linux-2.6/drivers/net/netconsole.c
@@ -60,7 +60,6 @@ static struct netpoll np = {
 	.local_port = 6665,
 	.remote_port = 6666,
 	.remote_mac = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
-	.drop = netpoll_queue,
 };
 static int configured = 0;
 
--- linux-2.6.orig/include/linux/netpoll.h
+++ linux-2.6/include/linux/netpoll.h
@@ -18,7 +18,7 @@ struct netpoll {
 	struct net_device *dev;
 	char dev_name[16], *name;
 	void (*rx_hook)(struct netpoll *, int, char *, int);
-	void (*drop)(struct sk_buff *skb);
+
 	u32 local_ip, remote_ip;
 	u16 local_port, remote_port;
 	unsigned char local_mac[6], remote_mac[6];
@@ -44,7 +44,7 @@ int netpoll_trap(void);
 void netpoll_set_trap(int trap);
 void netpoll_cleanup(struct netpoll *np);
 int __netpoll_rx(struct sk_buff *skb);
-void netpoll_queue(struct sk_buff *skb);
+
 
 #ifdef CONFIG_NETPOLL
 static inline int netpoll_rx(struct sk_buff *skb)
--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -77,19 +77,6 @@ static void queue_process(void *p)
 	}
 }
 
-void netpoll_queue(struct sk_buff *skb)
-{
-	struct net_device *dev = skb->dev;
-	struct netpoll_info *npinfo = dev->npinfo;
-
-	if (!npinfo)
-		kfree_skb(skb);
-	else {
-		skb_queue_tail(&npinfo->txq, skb);
-		schedule_work(&npinfo->tx_work);
-	}
-}
-
 static int checksum_udp(struct sk_buff *skb, struct udphdr *uh,
 			unsigned short ulen, u32 saddr, u32 daddr)
 {
@@ -253,7 +240,7 @@ static void netpoll_send_skb(struct netp
 	}
 
 	/* don't get messages out of order, and no recursion */
-	if ( !(np->drop == netpoll_queue && skb_queue_len(&npinfo->txq))
+	if ( skb_queue_len(&npinfo->txq) == 0
 	     && npinfo->poll_owner != smp_processor_id()
 	     && netif_tx_trylock(dev)) {
 
@@ -274,11 +261,8 @@ static void netpoll_send_skb(struct netp
 	}
 
 	if (status != NETDEV_TX_OK) {
-		/* requeue for later */
-		if (np->drop)
-			np->drop(skb);
-		else
-			__kfree_skb(skb);
+		skb_queue_tail(&npinfo->txq, skb);
+		schedule_work(&npinfo->tx_work);
 	}
 }
 
@@ -800,4 +784,3 @@ EXPORT_SYMBOL(netpoll_setup);
 EXPORT_SYMBOL(netpoll_cleanup);
 EXPORT_SYMBOL(netpoll_send_udp);
 EXPORT_SYMBOL(netpoll_poll);
-EXPORT_SYMBOL(netpoll_queue);

--
Stephen Hemminger [EMAIL PROTECTED]
[PATCH 5/8] netpoll deferred transmit path
When the netpoll beast got busy, he tended to babble. Instead of talking out of his large mouth as normal, he tended to try to snort out other orifices. This led to words (skbs) ending up in odd places (like NIT) that he did not intend. The normal way of talking wouldn't work, but he could at least change to using the same tone all the time.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 net/core/netpoll.c |   21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -55,9 +55,25 @@ static void queue_process(void *p)
 	struct netpoll_info *npinfo = p;
 	struct sk_buff *skb;
 
-	while ((skb = skb_dequeue(&npinfo->txq)))
-		dev_queue_xmit(skb);
+	while ((skb = skb_dequeue(&npinfo->txq))) {
+		struct net_device *dev = skb->dev;
 
+		if (!netif_device_present(dev) || !netif_running(dev)) {
+			__kfree_skb(skb);
+			continue;
+		}
+
+		netif_tx_lock_bh(dev);
+		if (netif_queue_stopped(dev) ||
+		    dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
+			skb_queue_head(&npinfo->txq, skb);
+			netif_tx_unlock_bh(dev);
+
+			schedule_delayed_work(&npinfo->tx_work, HZ/10);
+			return;
+		}
+		netif_tx_unlock_bh(dev);
+	}
 }
 
 void netpoll_queue(struct sk_buff *skb)
@@ -756,6 +772,7 @@ void netpoll_cleanup(struct netpoll *np)
 			if (atomic_dec_and_test(&npinfo->refcnt)) {
 				skb_queue_purge(&npinfo->arp_tx);
 				skb_queue_purge(&npinfo->txq);
+				cancel_rearming_delayed_work(&npinfo->tx_work);
 				flush_scheduled_work();
 
 				kfree(npinfo);

--
Stephen Hemminger [EMAIL PROTECTED]
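The key move in the new queue_process() is that a packet which cannot be sent goes back to the head of the queue, so ordering is preserved when the work is retried later. The userspace sketch below models only that ordering behaviour; the ring buffer, its function names and the send_below_3 stand-in device are all invented for illustration, and the locking and delayed-work rescheduling of the real patch are left out.

```c
#include <stddef.h>

#define TXQ_CAP 8

/* Tiny ring buffer standing in for the per-device skb queue. */
struct txq {
	int items[TXQ_CAP];
	size_t head; /* index of the oldest packet */
	size_t len;  /* number of queued packets */
};

static int txq_push_tail(struct txq *q, int pkt)
{
	if (q->len == TXQ_CAP)
		return -1;
	q->items[(q->head + q->len) % TXQ_CAP] = pkt;
	q->len++;
	return 0;
}

static int txq_pop_head(struct txq *q, int *pkt)
{
	if (q->len == 0)
		return -1;
	*pkt = q->items[q->head];
	q->head = (q->head + 1) % TXQ_CAP;
	q->len--;
	return 0;
}

/* Put a just-popped packet back in front; room is guaranteed after a pop. */
static void txq_push_head(struct txq *q, int pkt)
{
	q->head = (q->head + TXQ_CAP - 1) % TXQ_CAP;
	q->items[q->head] = pkt;
	q->len++;
}

/* Hypothetical device: accepts packets numbered below 3, then reports busy. */
static int send_below_3(int pkt)
{
	return pkt < 3;
}

/* Send queued packets in order; on the first failure requeue that packet at
 * the head and stop, leaving the rest for a later retry.
 * Returns the number of packets sent. */
static size_t txq_drain(struct txq *q, int (*send)(int))
{
	size_t sent = 0;
	int pkt;

	while (txq_pop_head(q, &pkt) == 0) {
		if (!send(pkt)) {
			txq_push_head(q, pkt);
			break;
		}
		sent++;
	}
	return sent;
}
```

Requeueing at the tail instead would let later messages overtake the one that failed, which is the reordering the patch comments elsewhere in this series say they want to avoid.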
[PATCH 2/8] netpoll info leak
After looking harder, Steve noticed that the netpoll beast leaked a little every time it shut down for a nap. Not a big leak, but a nuisance kind of thing. He took out his refcount duct tape and patched the leak. It was overkill since there was already other locking in that area, but it looked clean and wouldn't attract fleas.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/linux/netpoll.h |    1 +
 net/core/netpoll.c      |   25 +++--
 2 files changed, 20 insertions(+), 6 deletions(-)

--- linux-2.6.orig/include/linux/netpoll.h
+++ linux-2.6/include/linux/netpoll.h
@@ -25,6 +25,7 @@ struct netpoll {
 };
 
 struct netpoll_info {
+	atomic_t refcnt;
 	spinlock_t poll_lock;
 	int poll_owner;
 	int tries;
--- linux-2.6.orig/net/core/netpoll.c
+++ linux-2.6/net/core/netpoll.c
@@ -649,8 +649,11 @@ int netpoll_setup(struct netpoll *np)
 		npinfo->tries = MAX_RETRIES;
 		spin_lock_init(&npinfo->rx_lock);
 		skb_queue_head_init(&npinfo->arp_tx);
-	} else
+		atomic_set(&npinfo->refcnt, 1);
+	} else {
 		npinfo = ndev->npinfo;
+		atomic_inc(&npinfo->refcnt);
+	}
 
 	if (!ndev->poll_controller) {
 		printk(KERN_ERR "%s: %s doesn't support polling, aborting.\n",
@@ -757,12 +760,22 @@ void netpoll_cleanup(struct netpoll *np)
 
 	if (np->dev) {
 		npinfo = np->dev->npinfo;
-		if (npinfo && npinfo->rx_np == np) {
-			spin_lock_irqsave(&npinfo->rx_lock, flags);
-			npinfo->rx_np = NULL;
-			npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
-			spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+		if (npinfo) {
+			if (npinfo->rx_np == np) {
+				spin_lock_irqsave(&npinfo->rx_lock, flags);
+				npinfo->rx_np = NULL;
+				npinfo->rx_flags &= ~NETPOLL_RX_ENABLED;
+				spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+			}
+
+			np->dev->npinfo = NULL;
+			if (atomic_dec_and_test(&npinfo->refcnt)) {
+				skb_queue_purge(&npinfo->arp_tx);
+
+				kfree(npinfo);
+			}
 		}
+
 		dev_put(np->dev);
 	}
 

--
Stephen Hemminger [EMAIL PROTECTED]
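The reference counting added here follows the usual shape: the first user creates the shared state with a count of one, later users increment it, and whoever drops the count to zero frees it. A userspace sketch of that shape using C11 atomics is given below; shared_info, info_get and info_put are hypothetical names, and the kernel's atomic_t API is only mirrored, not used.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared per-device state. */
struct shared_info {
	atomic_int refcnt;
};

/* Take a reference. If no state exists yet, allocate it with a count of 1
 * (the atomic_set case in the patch); otherwise bump the count (the
 * atomic_inc case). Returns NULL only if allocation fails. */
static struct shared_info *info_get(struct shared_info *existing)
{
	struct shared_info *info;

	if (existing) {
		atomic_fetch_add(&existing->refcnt, 1);
		return existing;
	}

	info = malloc(sizeof(*info));
	if (!info)
		return NULL;
	atomic_init(&info->refcnt, 1);
	return info;
}

/* Drop a reference. Frees the state and returns 1 when this was the last
 * reference (the atomic_dec_and_test case), otherwise returns 0. */
static int info_put(struct shared_info *info)
{
	if (atomic_fetch_sub(&info->refcnt, 1) == 1) {
		free(info);
		return 1;
	}
	return 0;
}
```

atomic_fetch_sub returns the value held before the subtraction, so comparing it with 1 is the userspace counterpart of atomic_dec_and_test() returning true.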
[PATCH 0/8] netpoll: A Halloween horror mystery
It was a dull and cloudy day in Portland when Steve first went hunting for suspend bugs in the sky2 driver. The hunt was motivated by a certain mini owned by a penguin, but he could never get the license number. Anyway, he stumbled down some blind alleys and met the:

NETPOLL BEAST

It wasn't that the beast was ugly, like some of the other things he had seen. It was just an untidy mess, the kind of thing you didn't want to bring home to mother; instead he would rather have booted it over the fence to the Viro shredder. But since he was in the neighborhood, he got out his keyboard and went to work.

--
Re: [PATCH] s2io: add PCI error recovery support
Hi. On Thu, Oct 26, 2006 at 05:56:34AM -0400, Ananda Raju wrote: Hi, Can you try the attached patch? The attached patch is simple. We set card state as down in error_detected() so that all entry points return error and don't proceed further. In slot_reset() we do s2io_card_down(), which will reset the adapter. In io_resume() we bring up the driver. Simplicity is always better. However, some questions/comments: @@ -4175,6 +4186,10 @@ static irqreturn_t s2io_isr(int irq, voi mac_info_t *mac_control; struct config_param *config; + if (atomic_read(&sp->card_state) == CARD_DOWN) { + return IRQ_NONE; + } I used if (sp->pdev->error_state != pci_channel_io_normal) here for a reason: the pdev->error_state is set even in an interrupt context, that is, it gets set even if interrupts are disabled, and so it represents the actual state immediately. By contrast, the error callbacks do not get called until possibly much later, and so sp->card_state = CARD_DOWN might not get set for a while. If, for any reason, e.g. some obscure corner case, the s2io generates zillions of interrupts, this could result in a soft-lockup. I actually saw this in the symbios device driver, which will regenerate an interrupt until it's acknowledged -- and so it sat there, spinning. :-( I was returning IRQ_HANDLED instead of IRQ_NONE, so as to avoid falling into handle_bad_irq() or report_bad_irq(). I haven't seen this happen on s2io, but thought it would still be wise. If this can't happen, then there's no problem here. +/** + * s2io_io_slot_reset - called after the pci bus has been reset. + * @pdev: Pointer to PCI device + * + * Restart the card from scratch, as if from a cold-boot. 
+ */ +static pci_ers_result_t s2io_io_slot_reset(struct pci_dev *pdev) +{ At this point, the card has just experienced a hardware reset (the #RST wire was held low for 250 millisecs, followed by a settle time of 2 seconds, followed by whatever BIOS thinks it needed to do, followed by a restore of the pci config space to what it was after a cold boot). So the card is in a fresh state; in theory it's identical to a cold boot. So ... are you sure you want to down at this point? + s2io_card_down(sp); + sp->device_close_flag = TRUE; /* Device is shut down. */ One problem I'm having is that the watchdog timer sometimes pops and tries to reset the card before s2io_card_down() has a chance to run. I fixed this ... So -- just for grins, I thought to myself, "Maybe I can make s2io be the first adapter ever to fully recover without a hard reset of the card." The idea is simple: 1) enable MMIO, 2) call s2io_card_down() 3) enable DMA 4) call s2io_card_up() I have a patch that does this, but then hit a few more snags. I haven't yet nailed down all the trouble spots, maybe tomorrow. --linas
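The argument above about the interrupt handler can be pictured with a small userspace model. This is only a sketch: every type and name in it is a hypothetical stand-in rather than the s2io or PCI core API. It assumes the channel state is updated the moment the error is noticed, while the driver's own "card down" flag is only set later from the error callback. During that window the handler checks the channel state and reports the interrupt as handled without touching the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's channel state and IRQ results. */
enum model_channel_state { MODEL_CHANNEL_NORMAL, MODEL_CHANNEL_FROZEN };
enum model_irq_result { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_card {
	enum model_channel_state error_state; /* set as soon as the error is seen */
	bool card_down;                       /* set later, by the error callback */
	unsigned int serviced;                /* interrupts that touched the device */
};

/*
 * Model of the handler described above: test the channel state rather than
 * the driver's own flag, and claim the interrupt so that a device which
 * keeps re-raising it is not reported as a bad IRQ.
 */
enum model_irq_result model_isr(struct model_card *card)
{
	assert(card != NULL);

	if (card->error_state != MODEL_CHANNEL_NORMAL)
		return MODEL_IRQ_HANDLED; /* channel is off-line: do no device I/O */

	card->serviced++;                 /* normal interrupt processing */
	return MODEL_IRQ_HANDLED;
}
```

With `error_state` frozen and `card_down` still false, `model_isr()` returns without incrementing `serviced`, which is the window the reply is worried about.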
2.6.18 forcedeth GSO panic on send
Hello, I am using an AMD64 box with 32bit userspace / 64bit kernel. Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff over the net - for example, svn commit, scp are affected. 2.6.17.11 does not seem to be affected. Unfortunately even a 60-line screen is not big enough to catch the whole trace. There are at least two traces, and the first scrolls off. I have a photo at http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg Something bad is happening here, when the kernel tries to send some data: ... error_exit skb_over_panic skb_over_panic skb_segment tcp_tso_segment inet_gso_segment skb_gso_segment dev_hard_start_xmit dev_queue_xmit ... Looks like it is related to hardware accel in forcedeth. I will try disabling all hw accel. Please find in attached tarball: .config dmesg ethtool-k lspci lspci-v -- vda
[PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
Check if user has CAP_NET_ADMIN capability to change congestion control algorithm. Under normal circumstances an application programmer doesn't have enough information to choose the right algorithm (except if he is the pchar/pathchar maintainer). In 99.9% of cases only the local host administrator has the knowledge to select a proper standard, system-wide algorithm (the remaining 0.1% are for testing purposes). If we let the user select an alternative algorithm we introduce one potential weak spot - so we ban this eventuality. HGN Signed-off-by: Hagen Paul Pfeifer [EMAIL PROTECTED] diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index af0aca1..c1ae2e9 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -10,6 +10,7 @@ #include <linux/module.h> #include <linux/mm.h> #include <linux/types.h> #include <linux/list.h> +#include <linux/capability.h> #include <net/tcp.h> static DEFINE_SPINLOCK(tcp_cong_list_lock); @@ -151,6 +152,9 @@ int tcp_set_congestion_control(struct so struct tcp_congestion_ops *ca; int err = 0; + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + rcu_read_lock(); ca = tcp_ca_find(name); if (ca == icsk->icsk_ca_ops)
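For readers who have not used the interface this patch restricts, here is a minimal userspace sketch of how an application selects a per-socket congestion control algorithm through the TCP_CONGESTION socket option on Linux. The two helper names are made up for illustration. The assumption is that with the proposed check applied, an unprivileged caller of the first helper would get -1 with errno set to EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Ask the kernel to use the named algorithm (e.g. "reno") on one TCP socket.
 * Returns 0 on success, -1 with errno set on failure. Helper name is
 * hypothetical; the setsockopt() call is the real interface.
 */
int set_congestion_control(int fd, const char *name)
{
	assert(name != NULL);
	return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, strlen(name));
}

/* Read back the algorithm currently attached to the socket into buf. */
int get_congestion_control(int fd, char *buf, size_t len)
{
	socklen_t optlen = (socklen_t)len;

	if (len == 0)
		return -1;
	memset(buf, 0, len);
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &optlen) < 0)
		return -1;
	buf[len - 1] = '\0'; /* make sure the result is always terminated */
	return 0;
}
```

An application would call `set_congestion_control(fd, "reno")` right after `socket()` and treat a failure as "keep the system default", which is the behaviour the replies in this thread are arguing about.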
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
On 10/27/06, Hagen Paul Pfeifer [EMAIL PROTECTED] wrote: Check if user has CAP_NET_ADMIN capability to change congestion control algorithm. Under normal circumstances an application programmer doesn't have enough information to choose the right algorithm (except if he is the pchar/pathchar maintainer). In 99.9% of cases only the local host administrator has the knowledge to select a proper standard, system-wide algorithm (the remaining 0.1% are for testing purposes). If we let the user select an alternative algorithm we introduce one potential weak spot - so we ban this eventuality. I don't agree with this at all. I would love Firefox, BitTorrent etc to implement usage of TCP-LP for example so they use unused bandwidth only. With this change applications can't do this. If we are going to restrict by capabilities then I think we should only restrict module loading - this way the admin of the box can decide what algorithms can be used. Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
This is driving me crazy... Your email client turned the tabs into spaces in the patch making it useless. I want to ask why it is so hard for people to submit patches that are not corrupted? :-/ I type in this kind of email response at least 2 or 3 times every single day that I review patches. Spending the time to review a patch only to find out that it is corrupted and doesn't apply consumes a significant chunk of my time. Again, send the patch in an email to yourself and try to apply the patch from that email if you are in doubt.
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
From: Ian McDonald [EMAIL PROTECTED] Date: Fri, 27 Oct 2006 12:59:30 +1300 I don't agree with this at all. I would love Firefox, BitTorrent etc to implement usage of TCP-LP for example so they use unused bandwidth only. With this change applications can't do this. If we are going to restrict by capabilities then I think we should only restrict module loading - this way the admin of the box can decide what algorithms can be used. You are using an example of a (supposedly) safe case of this as a justification for allowing all cases. It is bad, very bad, to allow arbitrary users to select arbitrary congestion control algorithms. It is just as bad as allowing them to disable congestion control completely if that were an option. If someone, for example, builds all the algorithms statically into their kernel, for testing as root, this lets all users on the machine do the same which is not right.
Re: [PATCH 1/8] netpoll: skb private pool management
From: Stephen Hemminger [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 15:46:49 -0700 @@ -188,19 +186,14 @@ void netpoll_poll(struct netpoll *np) static void refill_skbs(void) { struct sk_buff *skb; - unsigned long flags; - spin_lock_irqsave(&skb_list_lock, flags); - while (nr_skbs < MAX_SKBS) { + while (skb_queue_len(&skb_pool) < MAX_SKBS) { Previously, the lock actually protected nr_skbs from going over MAX_SKBS properly, but the new code does not. skb_queue_len() is lockless. Stephen, I really appreciate your efforts to clean up netpoll, but on every iteration I am finding simple errors on the first patch every time.
Re: [PATCH] sealevel: uses arp_broken_ops
From: Randy Dunlap [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 14:08:08 -0700 Sealevel uses arp_broken_ops so it needs to depend on INET. Signed-off-by: Randy Dunlap [EMAIL PROTECTED] Applied, thanks Randy.
Re: [KJ] [Patch] kmemdup() cleanup in net/
From: Eric Sesterhenn [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 22:49:31 +0200 arg, i thought i compile tested everything, please use this version. Signed-off-by: Eric Sesterhenn [EMAIL PROTECTED] Definitely post-2.6.19 material, please resubmit when 2.6.20 merging opens up, thanks.
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
On 10/27/06, David Miller [EMAIL PROTECTED] wrote: From: Ian McDonald [EMAIL PROTECTED] Date: Fri, 27 Oct 2006 12:59:30 +1300 I don't agree with this at all. I would love Firefox, BitTorrent etc to implement usage of TCP-LP for example so they use unused bandwidth only. With this change applications can't do this. If we are going to restrict by capabilities then I think we should only restrict module loading - this way the admin of the box can decide what algorithms can be used. You are using an example of a (supposedly) safe case of this as a justification for allowing all cases. It is bad, very bad, to allow arbitrary users to select arbitrary congestion control algorithms. It is just as bad as allowing them to disable congestion control completely if that were an option. OK understand your point here but I think low priority TCP has its use. Don't agree it is just as bad, but it is bad under the wrong circumstances - it's still better than UDP which has no congestion control... Don't want to make it over complicated though. I think the most sense would be to restrict it as shown as tcp-lp is the exception and allow tcp-lp via another mechanism. That is a situation where the user could specify how low priority they want the traffic to be... If I ever get enough time I'll have a go at it but can't see it this year :-( It actually makes more sense to tie the congestion control algorithm to the route/destination IP if we are going to change it but that is a whole another exercise in itself. If someone, for example, builds all the algorithms statically into their kernel, for testing as root, this lets all users on the machine do the same which is not right. This is the state at present as I understand it. However that doesn't make it right. 
-- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand
Re: [Bugme-new] [Bug 7421] New: Oops, EIP is at atalk_sendmsg
From: Andrew Morton [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 09:44:38 -0700 Oct 26 10:01:07 localhost kernel: EIP is at atalk_sendmsg+0x15b/0x4e4 [appletalk] Oct 26 10:01:07 localhost kernel: eax: ebx: 002f ecx: \ edx: Oct 26 10:01:07 localhost kernel: esi: cadcb600 edi: ebp: cc9d7eec \ esp: cc9d7d6c Does this make the bug go away? This code has been like this for a long time, I'm surprised it never triggered before. We properly set dev = rt->dev right after the if (!rt) check, so the two settings removed by this patch were not only OOPS-prone, they were also superfluous. diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c index 708e2e0..485e35c 100644 --- a/net/appletalk/ddp.c +++ b/net/appletalk/ddp.c @@ -1584,7 +1584,6 @@ #endif if (usat->sat_addr.s_net || usat->sat_addr.s_node == ATADDR_ANYNODE) { rt = atrtr_find(&usat->sat_addr); - dev = rt->dev; } else { struct atalk_addr at_hint; @@ -1592,7 +1591,6 @@ #endif at_hint.s_net = at->src_net; rt = atrtr_find(&at_hint); - dev = rt->dev; } if (!rt) return -ENETUNREACH;
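The ordering problem fixed by this patch is easy to show outside the kernel. The sketch below is a userspace model with hypothetical types and function names, not the appletalk code: a lookup may return NULL, so the result is checked before any field of it is read, and the device pointer is taken only after that check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and a routing table entry. */
struct model_dev { int ifindex; };
struct model_route { struct model_dev *dev; };

/* Hypothetical lookup: returns NULL when no route matches ifindex. */
struct model_route *model_route_find(struct model_route *table, size_t n,
				     int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].dev && table[i].dev->ifindex == ifindex)
			return &table[i];
	return NULL;
}

/*
 * Model of the fixed send path: the route is dereferenced only after the
 * NULL check, so a failed lookup yields -ENETUNREACH instead of a crash.
 */
int model_pick_dev(struct model_route *table, size_t n, int ifindex,
		   struct model_dev **out)
{
	struct model_route *rt = model_route_find(table, n, ifindex);

	assert(out != NULL);
	if (!rt)
		return -ENETUNREACH;

	*out = rt->dev; /* safe: rt is known to be non-NULL here */
	return 0;
}
```

Reading `rt->dev` inside each branch before the check, as the removed lines did, would fault on exactly the inputs that the `-ENETUNREACH` path exists to handle.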
Re: [PATCH] Check if user has CAP_NET_ADMIN to change congestion control algorithm
On Fri, 27 Oct 2006 01:52:56 +0200 Hagen Paul Pfeifer [EMAIL PROTECTED] wrote: Check if user has CAP_NET_ADMIN capability to change congestion control algorithm. Under normal circumstances an application programmer doesn't have enough information to choose the right algorithm (except if he is the pchar/pathchar maintainer). In 99.9% of cases only the local host administrator has the knowledge to select a proper standard, system-wide algorithm (the remaining 0.1% are for testing purposes). If we let the user select an alternative algorithm we introduce one potential weak spot - so we ban this eventuality. HGN If you aren't doing experiments, don't compile it into your kernel. If distros are including unfair congestion control, file a bug report.
Re: [PATCH 1/8] netpoll: skb private pool management
From: Stephen Hemminger [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 18:04:02 -0700 On Thu, 26 Oct 2006 17:12:47 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 15:46:49 -0700 @@ -188,19 +186,14 @@ void netpoll_poll(struct netpoll *np) static void refill_skbs(void) { struct sk_buff *skb; - unsigned long flags; - spin_lock_irqsave(&skb_list_lock, flags); - while (nr_skbs < MAX_SKBS) { + while (skb_queue_len(&skb_pool) < MAX_SKBS) { Previously, the lock actually protected nr_skbs from going over MAX_SKBS properly, but the new code does not. skb_queue_len() is lockless. Stephen, I really appreciate your efforts to clean up netpoll, but on every iteration I am finding simple errors on the first patch every time. racing over by one is not a big issue. It's potentially racing by more than that, depending upon whether any cpus take interrupts and are stalled for significant time after making the decision to add. The upper bound is something like (2 * NCPUS) - 1. It's a bug, Stephen.
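To make the disagreement above concrete, here is a deterministic userspace replay of one unlucky schedule. It is only an illustration under simplifying assumptions: each simulated CPU adds at most one buffer, and all names and the limit of 32 are made up. It does not try to reproduce the (2 * NCPUS) - 1 estimate given in the reply. It only shows that an unlocked check followed by a separate add can overshoot by more than one, while doing both steps under one lock cannot.

```c
#include <assert.h>

#define MODEL_MAX_SKBS 32 /* hypothetical pool limit for this model */

/*
 * Unlocked variant: every CPU evaluates "len < MAX" against the same stale
 * value before any of them has enqueued, then each one that passed adds a
 * buffer. Returns the final pool length.
 */
unsigned int model_refill_unlocked(unsigned int start_len, unsigned int ncpus)
{
	unsigned int len = start_len;
	unsigned int will_add = 0;

	assert(start_len <= MODEL_MAX_SKBS);
	for (unsigned int cpu = 0; cpu < ncpus; cpu++)
		if (len < MODEL_MAX_SKBS) /* stale read: nobody has added yet */
			will_add++;

	return len + will_add;            /* all the decisions are acted on */
}

/*
 * Locked variant: the check and the add form one critical section, so each
 * CPU sees the buffers added by the CPUs that ran before it.
 */
unsigned int model_refill_locked(unsigned int start_len, unsigned int ncpus)
{
	unsigned int len = start_len;

	assert(start_len <= MODEL_MAX_SKBS);
	for (unsigned int cpu = 0; cpu < ncpus; cpu++)
		if (len < MODEL_MAX_SKBS)
			len++;

	return len;
}
```

Starting one short of the limit with four CPUs, the unlocked replay ends three buffers over the limit and the locked one ends exactly at it.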
Re: [PATCH 1/8] netpoll: skb private pool management
It was a dark and stormy night when Steve first saw the netpoll beast. The beast was odd, and misshapen but not extremely ugly. Let me take off one of your warts he said. This wart is where you tried to make an skb list yourself. If the beast had ever run out of memory, he would have stupefied himself unnecessarily. The first try was painful, so he tried again till the bleeding stopped. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- net/core/netpoll.c | 53 + 1 file changed, 21 insertions(+), 32 deletions(-) --- netpoll.orig/net/core/netpoll.c 2006-10-26 19:12:36.0 -0700 +++ netpoll/net/core/netpoll.c 2006-10-26 19:16:05.0 -0700 @@ -36,9 +36,7 @@ #define MAX_QUEUE_DEPTH (MAX_SKBS / 2) #define MAX_RETRIES 2 -static DEFINE_SPINLOCK(skb_list_lock); -static int nr_skbs; -static struct sk_buff *skbs; +static struct sk_buff_head skb_pool; static DEFINE_SPINLOCK(queue_lock); static int queue_depth; @@ -190,17 +188,15 @@ struct sk_buff *skb; unsigned long flags; - spin_lock_irqsave(skb_list_lock, flags); - while (nr_skbs MAX_SKBS) { + spin_lock_irqsave(skb_pool-lock, flags); + while (skb_pool.qlen MAX_SKBS) { skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC); if (!skb) break; - skb-next = skbs; - skbs = skb; - nr_skbs++; + __skb_queue_tail(skb_pool, skb); } - spin_unlock_irqrestore(skb_list_lock, flags); + spin_unlock_irqrestore(skb_pool-lock, flags); } static void zap_completion_queue(void) @@ -229,38 +225,25 @@ put_cpu_var(softnet_data); } -static struct sk_buff * find_skb(struct netpoll *np, int len, int reserve) +static struct sk_buff *find_skb(struct netpoll *np, int len, int reserve) { - int once = 1, count = 0; - unsigned long flags; - struct sk_buff *skb = NULL; + int count = 0; + struct sk_buff *skb; zap_completion_queue(); + refill_skbs(); repeat: - if (nr_skbs MAX_SKBS) - refill_skbs(); skb = alloc_skb(len, GFP_ATOMIC); - - if (!skb) { - spin_lock_irqsave(skb_list_lock, flags); - skb = skbs; - if (skb) { - skbs = skb-next; - skb-next = NULL; - nr_skbs--; - } - 
spin_unlock_irqrestore(skb_list_lock, flags); - } + if (!skb) + skb = skb_dequeue(skb_pool); if(!skb) { - count++; - if (once (count == 100)) { - printk(out of netpoll skbs!\n); - once = 0; + if (++count 10) { + netpoll_poll(np); + goto repeat; } - netpoll_poll(np); - goto repeat; + return NULL; } atomic_set(skb-users, 1); @@ -764,6 +747,12 @@ return -1; } +static int __init netpoll_init(void) { + skb_queue_head_init(skb_pool); + return 0; +} +core_initcall(netpoll_init); + void netpoll_cleanup(struct netpoll *np) { struct netpoll_info *npinfo; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
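The reworked find_skb() in this patch follows a common pattern: try a normal allocation, fall back to a private reserve, and give up after a bounded number of attempts instead of looping forever. The sketch below models that pattern in userspace. All names are hypothetical, the reserve is a plain array rather than an sk_buff queue, and the limit of 10 attempts is read from the patch on the assumption that the comparison lost in the archive was "less than".

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_TRIES 10 /* assumed from the "++count < 10" test in the patch */
#define MODEL_POOL_SLOTS 8

/* Hypothetical reserve pool: a small stack of preallocated buffers. */
struct model_pool {
	void *slots[MODEL_POOL_SLOTS];
	unsigned int count;
};

typedef void *(*model_alloc_fn)(size_t len);

/* Allocator that always fails, used to simulate memory pressure. */
void *model_alloc_fail(size_t len)
{
	(void)len;
	return NULL;
}

/* Take one buffer from the reserve, or NULL if it is empty. */
void *model_pool_take(struct model_pool *pool)
{
	return pool->count ? pool->slots[--pool->count] : NULL;
}

/*
 * Try the normal allocator first, then the reserve. Retry a bounded number
 * of times and report failure to the caller instead of spinning.
 */
void *model_find_buf(struct model_pool *pool, model_alloc_fn alloc, size_t len)
{
	assert(pool != NULL && alloc != NULL);

	for (int tries = 0; tries < MODEL_MAX_TRIES; tries++) {
		void *buf = alloc(len);

		if (!buf)
			buf = model_pool_take(pool);
		if (buf)
			return buf;
		/* The patch polls the device at this point before retrying. */
	}
	return NULL;
}
```

The caller now has to handle a NULL return, which is the behavioural change compared with the old loop that retried without any limit.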
[PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles
The following set of patches contain the source code for the NetEffect NE010 iWarp adapter running under the OpenFabrics Alliance software stack. This is a repost. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/Kconfig new/drivers/infiniband/Kconfig --- old/drivers/infiniband/Kconfig 2006-10-25 09:57:43.0 -0500 +++ new/drivers/infiniband/Kconfig 2006-10-25 10:48:40.0 -0500 @@ -41,6 +41,8 @@ source drivers/infiniband/hw/ehca/Kconf source drivers/infiniband/hw/amso1100/Kconfig +source drivers/infiniband/hw/nes/Kconfig + source drivers/infiniband/hw/cxgb3/Kconfig source drivers/infiniband/ulp/ipoib/Kconfig diff -ruNp old/drivers/infiniband/hw/nes/Kconfig new/drivers/infiniband/hw/nes/Kconfig --- old/drivers/infiniband/hw/nes/Kconfig 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/Kconfig 2006-10-25 10:50:18.0 -0500 @@ -0,0 +1,15 @@ +config INFINIBAND_NES + tristate NetEffect RNIC support + depends on PCI INET INFINIBAND + ---help--- + This is a low-level driver for NetEffect RDMA enabled + Network Interface Cards (RNIC). + +config INFINIBAND_NES_DEBUG + bool Verbose debugging output + depends on INFINIBAND_NES + default n + ---help--- + This option causes the NetEffect RNIC driver to produce debug + messages. Select this if you are developing the driver + or trying to diagnose a problem. 
diff -ruNp old/drivers/infiniband/hw/nes/Makefile new/drivers/infiniband/hw/nes/Makefile --- old/drivers/infiniband/hw/nes/Makefile 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/Makefile 2006-10-25 11:10:26.0 -0500 @@ -0,0 +1,27 @@ +EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/hw/nes/nes_tcpip/include + +ifdef CONFIG_INFINIBAND_NES_DEBUG +EXTRA_CFLAGS += -DNES_DEBUG +endif + +ifneq ($(KERNELRELEASE),) + obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o + + iw_nes-objs := \ + nes.o \ + nes_hw.o \ + nes_nic.o \ + nes_cm.o \ + nes_utils.o \ + nes_verbs.o +else + KERNELDIR ?= /usr/src/linux + PWD := $(shell pwd) + +default: + $(MAKE) -C $(KERNELDIR) M=$(PWD) modules + +clean: + $(MAKE) -C $(KERNELDIR) M=$(PWD) clean + +endif - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/9] NetEffect 10Gb RNIC Driver: main kernel driver c file
Kernel driver patch 2 of 9. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/hw/nes/nes.c new/drivers/infiniband/hw/nes/nes.c --- old/drivers/infiniband/hw/nes/nes.c 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/nes.c 2006-10-25 10:15:49.0 -0500 @@ -0,0 +1,653 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#include linux/module.h +#include linux/moduleparam.h +#include linux/etherdevice.h +#include linux/ethtool.h +#include linux/mii.h +#include linux/if_vlan.h +#include linux/crc32.h +#include linux/in.h +#include linux/init.h +#include linux/if_arp.h +#include asm/io.h +#include asm/irq.h +#include asm/byteorder.h + +#include rdma/ib_smi.h +#include rdma/ib_verbs.h +#include rdma/ib_pack.h +#include rdma/iw_cm.h + +#include nes.h + +MODULE_AUTHOR(NetEffect); +MODULE_DESCRIPTION(NetEffect RNIC Low-level iWARP Driver); +MODULE_LICENSE(Dual BSD/GPL); +MODULE_VERSION(DRV_VERSION); + +int max_mtu = ETH_DATA_LEN; + + +/* Interoperability */ +int mpa_version = 1; +module_param(mpa_version, int, 0); +MODULE_PARM_DESC(mpa_version, MPA version to be used int MPA Req/Resp (0 or 1)); + +/* Interoperability */ +int disable_mpa_crc = 0; +module_param(disable_mpa_crc, int, 0); +MODULE_PARM_DESC(disable_mpa_crc, Disable checking of MPA CRC); + + +unsigned int send_first = 0; +module_param(send_first, int, 0); +MODULE_PARM_DESC(send_first, Send RDMA Message First on Active Connection); + + +LIST_HEAD(nes_adapter_list); +LIST_HEAD(nes_dev_list); + +static int nes_device_event(struct notifier_block *notifier, unsigned long event, void *ptr); +static int nes_inetaddr_event(struct notifier_block *notifier, unsigned long event, void *ptr); +static void nes_print_macaddr(struct net_device *netdev); +static irqreturn_t nes_interrupt(int, void *, struct pt_regs *); +static int __devinit nes_probe(struct pci_dev *, const struct pci_device_id *); +static int nes_suspend(struct pci_dev *, pm_message_t); +static int nes_resume(struct pci_dev *); +static void __devexit nes_remove(struct pci_dev *); +static int __init nes_init_module(void); +static void __exit nes_exit_module(void); + +extern struct nes_dev *nes_ifs[]; + +// _the_ function interface handle to nes_tcpip module +struct nes_stack_ops *stack_ops_p; + +static struct pci_device_id nes_pci_table[] = { + {PCI_VENDOR_ID_NETEFFECT, 
PCI_DEVICE_ID_NETEFFECT_NE010, PCI_ANY_ID, PCI_ANY_ID}, + {0} +}; + +MODULE_DEVICE_TABLE(pci, nes_pci_table); + + +static struct notifier_block nes_dev_notifier = { + notifier_call: nes_device_event +}; + +static struct notifier_block nes_inetaddr_notifier = { + notifier_call: nes_inetaddr_event +}; + + +/** + * nes_device_event + * + * @param notifier + * @param event + * @param ptr + * + * @return int + */ +static int nes_device_event(struct notifier_block *notifier, + unsigned long event, void *ptr) +{ + struct net_device *netdev = (struct net_device *)ptr; + struct nes_dev *nesdev; + + dprintk(nes_device_event: notifier %p event=%ld netdev=%p, interface name = %s.\n, + notifier, event, netdev, netdev-name); + + list_for_each_entry(nesdev, nes_dev_list, list) { + dprintk(Nesdev list entry = 0x%p.\n, nesdev); + if (nesdev-netdev == netdev) { + switch (event) { +
Re: [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles
+source drivers/infiniband/hw/nes/Kconfig + source drivers/infiniband/hw/cxgb3/Kconfig This patch seems to be against some non-standard tree, since cxgb3 isn't upstream yet. And if cxgb3 were already upstream, it might be polite to add yourself after it rather than before ;) +config INFINIBAND_NES_DEBUG +bool Verbose debugging output +depends on INFINIBAND_NES +default n +---help--- + This option causes the NetEffect RNIC driver to produce debug + messages. Select this if you are developing the driver + or trying to diagnose a problem. I recommend making this option invisible unless EMBEDDED is set, and having the default be 'y', and making your debugging level changeable at run-time. That way everyone (in particular distros) will have this turned on and you'll be able to figure out problems without making end-users rebuild a kernel. +EXTRA_CFLAGS += -Idrivers/infiniband/include Not needed in the kernel tree. -Idrivers/infiniband/hw/nes/nes_tcpip/include I guess this is the mysterious TCP stack module. Anyway if you need this in the end, I would suggest removing the C flag and using #include nes_tcpip/blah.h in your source. +ifdef CONFIG_INFINIBAND_NES_DEBUG +EXTRA_CFLAGS += -DNES_DEBUG +endif There's no point to this -- just test CONFIG_INFINIBAND_NES_DEBUG directly. +ifneq ($(KERNELRELEASE),) +obj-$(CONFIG_INFINIBAND_NES) += iw_nes.o + +iw_nes-objs := \ +nes.o \ +nes_hw.o \ +nes_nic.o \ +nes_cm.o \ +nes_utils.o \ +nes_verbs.o +else This should be your whole Makefile -- we're not going to merge stuff into the kernel tree to build your module out of the kernel tree. Also it's more idiomatic to put all your component objects onto one (or a few) lines. - R. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/9] NetEffect 10Gb RNIC Driver: openfabrics connection manager c file
Kernel driver patch 3 of 9. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/hw/nes/nes_cm.c new/drivers/infiniband/hw/nes/nes_cm.c --- old/drivers/infiniband/hw/nes/nes_cm.c 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/nes_cm.c 2006-10-25 10:36:29.0 -0500 @@ -0,0 +1,1204 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ + +#define TCPOPT_TIMESTAMP 8 + +#include linux/module.h +#include linux/moduleparam.h +#include linux/etherdevice.h +#include linux/ethtool.h +#include linux/mii.h +#include linux/if_vlan.h +#include linux/crc32.h +#include linux/in.h +#include linux/ip.h +#include linux/tcp.h +#include linux/init.h +#include linux/if_arp.h +#include linux/notifier.h +#include linux/net.h +#include linux/types.h +#include asm/irq.h +#include asm/byteorder.h + +#include net/neighbour.h +#include net/route.h +#include net/ip_fib.h + +#include rdma/ib_smi.h +#include rdma/ib_verbs.h +#include rdma/ib_pack.h +#include rdma/iw_cm.h + +#include nes.h + +#define OS_LINUX +#define OS_LINUX_26 +#include nes.h +#include nes_sockets.h + +extern unsigned int send_first; + +struct nes_v4_quad +{ + UINT32 rsvd0; + UINT32 DstIpAdrIndex; /* Only most significant 5 bits are valid */ + UINT32 SrcIpadr; + UINT32 TcpPorts; /* src is low, dest is high */ +}; + +enum ietf_mpa_flags { + IETF_MPA_FLAGS_MARKERS = 0x80, /* receive Markers */ +IETF_MPA_FLAGS_CRC = 0x40, /* receive Markers */ +IETF_MPA_FLAGS_REJECT = 0x20, /* Reject */ +}; + +#define IEFT_MPA_KEY_REQ MPA ID Req Frame +#define IEFT_MPA_KEY_REP MPA ID Rep Frame + +struct ietf_mpa_req_resp_frame { + u8 key[16]; + u8 flags; + u8 rev; + u16 private_data_size; + u8 private_data[0]; +}; + +static void connect_worker(void *); +static void listen_worker(void *); + +extern int NesAdapterAdd(struct net_device *netdev); +extern int NesInitSockets(void); +extern void set_interface( + UINT32ip_addr, + UINT32mask, + UINT32bcastaddr, + UINT32type + ); +#define ADD_ADDR 1 +#define SET_ADDR 2 +#define DELETE_ADDR 3 + +extern void bdc_cleanup(void); +extern int mpa_version; + +unsigned char DriverNamePrefix[] = iw_nes; + +int nes_if_count = 0; + +#define MAX_NES_IFS 4 +struct nes_dev *nes_ifs[MAX_NES_IFS]= { 0 }; + + +/** + * nes_start_cm + * + * @param nesdev + * @param new_ifa + * + * @return int + */ +int nes_start_cm(struct nes_dev *nesdev, struct 
in_ifaddr *new_ifa) +{ + int result = 0; + dprintk(%s:%s:%u\n, __FILE__, __FUNCTION__, __LINE__); + + nes_ifs[0] = nesdev; + + stack_ops_p-dhcp_control(0x00); + + // set ip and subnet mask + stack_ops_p-set_ip_info(ntohl(new_ifa-ifa_address), + ntohl(new_ifa-ifa_mask)); + stack_ops_p-set_dev_name(nesdev-netdev-name); + + if (nesdev-nes_stack_start == 0) { + stack_ops_p-stack_init(nesdev-netdev); + /* TODO: Deal with multiple IP addresses */ + nesdev-local_ipaddr = new_ifa-ifa_address; + + nesdev-nes_stack_start = 1; + } + + return result; +} + + +/** + * nes_stop_cm + * + * @param nesdev + * + * @return int + */ +int nes_stop_cm(struct nes_dev *nesdev) +{ + if (nesdev-nes_stack_start) + { +
Re: [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles
From: Roland Dreier [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 16:58:41 -0700 -Idrivers/infiniband/hw/nes/nes_tcpip/include I guess this is the mysterious TCP stack module. What is this thing?
[PATCH 5/9] NetEffect 10Gb RNIC Driver: hardware interface c file
Kernel driver patch 5 of 9. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/hw/nes/nes_hw.c new/drivers/infiniband/hw/nes/nes_hw.c --- old/drivers/infiniband/hw/nes/nes_hw.c 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/nes_hw.c 2006-10-25 10:15:50.0 -0500 @@ -0,0 +1,1470 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include linux/module.h +#include linux/moduleparam.h +#include linux/etherdevice.h +#include linux/ip.h +#include linux/tcp.h + +#include nes.h + + +#if defined(SA1) +struct nes_init_values init_values[] = +{ + {0x0600,0x}, + {0x0604,0x}, + {0x2000,0x0001}, + {0x2004,0x0001}, + {0x2008,0x}, + {0x200C,0x0001}, + {0x2010,0x0241}, + {0x201C,0x75345678}, + {0x5100,0x0008}, + {0x6000,0x00e0}, + {0x6008,0x00e0}, +// {0x6018,0x0001}, +// {0x6028,0x0001}, + {0x6038,0x0003}, + {0x60B8,0x0002}, + {0x6090,0x}, + {0x0900,0x2001}, +// {0x01E8,0x000208c2}, + {0x01E8,0x000208c4}, + {0x01EC,0x5f1e8480}, + {0x01FC,0x00050005}, + {0x0B00,0x1000}, + {0x10C8,0x0003}, + {0x5008,0x1F1F1F1F}, + {0x5010,0x1F1F1F1F}, + {0x5018,0x1F1F1F1F}, + {0x5020,0x1F1F1F1F}, +// {0x60B8,0x0001}, + {0x60C0,0x0194}, + {0x60C8,0x0020}, + {0x,0x} +}; +#endif + + +/** + * nes_adapter_init - initialize adapter + * + * @param nesdev + * @param num_pds + * + * @return struct nes_adapter* + */ +struct nes_adapter *nes_adapter_init(struct nes_dev *nesdev, unsigned long num_pds) { + struct nes_adapter *nesadapter = NULL; + int i=0; + int found = 0; + u32 u32temp; + u16 max_rq_wrs; + u16 max_sq_wrs; + u32 max_mr; + u32 max_256pbl; + u32 max_4kpbl; + u32 max_qp; + u32 max_irrq; + u32 max_cq; + u32 hte_index_mask; + u32 adapter_size; + u32 arp_table_size; + + /* search the list of existing adapters */ + list_for_each_entry(nesadapter, nes_adapter_list, list) { + dprintk(Searching Adapter list for PCI devfn = 0x%X.\n, nesdev-pcidev-devfn); + if ((PCI_SLOT(nesadapter-devfn) == PCI_SLOT(nesdev-pcidev-devfn)) + (nesadapter-bus_number == nesdev-pcidev-bus-number)) { + found = 1; + break; + } + } + + if (!found) { + if (nes_read_indexed(nesdev-index_reg, + NES_IDX_QP_CONTROL+PCI_FUNC(nesdev-pcidev-devfn)*8)) { + nes_write32(nesdev-regs+NES_SOFTWARE_RESET, 0xd); + } + /* enable the ports */ + nes_write32(nesdev-regs+NES_SOFTWARE_RESET, 0); + + u32temp = 0; + while ( nes_read_indexed(nesdev-index_reg, + 
NES_IDX_INT_CPU_STATUS) != 0x80 ) { + if (u32temp++ 1) break; + mdelay(1); + } + + if (nes_read_indexed(nesdev-index_reg, NES_IDX_INT_CPU_STATUS) != 0x80) { + printk(KERN_ERR PFX Internal CPU not ready, status = %02X\n, +
Re: [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
From: Glenn Grundstrom [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 19:06:23 -0500 +#include nes_tcpip/include/nes_sockets.h I want to know what in the world this nes_tcpip thing is?
RE: [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
It is part of our connection manager and assists with connection setup and teardown only. Glenn. -Original Message- From: David Miller [mailto:[EMAIL PROTECTED] Sent: Thursday, October 26, 2006 7:10 PM To: Glenn Grundstrom; Glenn Grundstrom Cc: openib-general@openib.org; netdev@vger.kernel.org Subject: Re: [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files From: Glenn Grundstrom [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 19:06:23 -0500 +#include nes_tcpip/include/nes_sockets.h I want to know what in the world this nes_tcpip thing is?
[PATCH 6/9] NetEffect 10Gb RNIC Driver: kernel network interface c file
Kernel driver patch 6 of 9. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/hw/nes/nes_nic.c new/drivers/infiniband/hw/nes/nes_nic.c --- old/drivers/infiniband/hw/nes/nes_nic.c 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/nes_nic.c 2006-10-25 10:15:50.0 -0500 @@ -0,0 +1,567 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#include linux/module.h +#include linux/moduleparam.h +#include linux/etherdevice.h +#include linux/ip.h +#include linux/tcp.h +#include linux/if_arp.h + +#include nes.h + +static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK + | NETIF_MSG_IFUP | NETIF_MSG_IFDOWN; +static int debug = -1; + +static int nes_netdev_open(struct net_device *); +static int nes_netdev_stop(struct net_device *); +static int nes_netdev_start_xmit(struct sk_buff *, struct net_device *); +static struct net_device_stats *nes_netdev_get_stats(struct net_device *); +static void nes_netdev_tx_timeout(struct net_device *); +static int nes_netdev_set_mac_address(struct net_device *, void *); +static int nes_netdev_change_mtu(struct net_device *, int); + + +/** + * nes_netdev_open + * + * @param netdev + * + * @return int + */ +static int nes_netdev_open(struct net_device *netdev) +{ + struct nes_port *nes_port = netdev_priv(netdev); + struct nes_dev *nesdev = nes_port-nesdev; + u32 u32temp; + u32 nic_active_bit; + u32 nic_active; + u16 link_up = 0; + + dprintk(%s:%s:%u\n, __FILE__, __FUNCTION__, __LINE__); + + assert(nesdev != NULL); + + if (netif_msg_ifup(nes_port)) + dprintk(KERN_INFO PFX %s: enabling interface\n, netdev-name); + + /* clear the MAC interrupt status */ + u32temp = nes_read_indexed(nesdev-index_reg, NES_IDX_MAC_INT_STATUS ); + dprintk(Phy interrupt status = 0x%X.\n, u32temp); + nes_write_indexed(nesdev-index_reg, NES_IDX_MAC_INT_STATUS, u32temp); + + nes_phy_init(nesdev); + + nes_nic_qp_init(nesdev, netdev); + + // Set packet filters + nic_active_bit = 1PCI_FUNC(nesdev-pcidev-devfn); + nic_active = nes_read_indexed(nesdev-index_reg, NES_IDX_NIC_ACTIVE); + nic_active |= nic_active_bit; + nic_active |= 2; + nes_write_indexed(nesdev-index_reg, NES_IDX_NIC_ACTIVE, nic_active); + nic_active = nes_read_indexed(nesdev-index_reg, NES_IDX_NIC_MULTICAST_ALL); + nic_active |= nic_active_bit; + nes_write_indexed(nesdev-index_reg, 
NES_IDX_NIC_MULTICAST_ALL, nic_active); + nic_active = nes_read_indexed(nesdev-index_reg, NES_IDX_NIC_BROADCAST_ON); + nic_active |= nic_active_bit; + nes_write_indexed(nesdev-index_reg, NES_IDX_NIC_BROADCAST_ON, nic_active); + + + nes_write32(nesdev-regs+NES_CQE_ALLOC, NES_CQE_ALLOC_NOTIFY_NEXT | + nesdev-hnic_cq.cq_number ); + + // TODO: add proper way to setup packet filters + // TODO: move some of the code from init_netdev? + + if ( link_up ) { + /* Enable network packets */ + nes_port-linkup = 1; + netif_start_queue(netdev); + } else { + nes_port-linkup = 0; + netif_carrier_off(netdev); + } + + nes_write_indexed(nesdev-index_reg, NES_IDX_MAC_INT_MASK, + ~(NES_MAC_INT_LINK_STAT_CHG | NES_MAC_INT_XGMII_EXT | +
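Note on the packet-filter setup in nes_netdev_open(): the driver ORs one bit per PCI function into shared registers, and nes_adapter_init() matches adapters by PCI slot and bus number. The shift operator appears to have been lost in this archive copy, so `1PCI_FUNC(...)` is assumed here to have read `1 << PCI_FUNC(...)`. The userspace sketch below shows the devfn encoding those macros rely on. The DEMO_ macros mirror the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC definitions; the two helper functions are hypothetical and are not part of the posted driver.

```c
#include <stdint.h>

/*
 * Same bit layout as PCI_DEVFN(), PCI_SLOT() and PCI_FUNC() in the kernel:
 * devfn is 8 bits wide, slot in bits 7..3, function in bits 2..0.
 */
#define DEMO_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define DEMO_PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define DEMO_PCI_FUNC(devfn)       ((devfn) & 0x07)

/*
 * Hypothetical helper: set this function's bit in a register value that is
 * shared by all functions of the device (assumed reading of the
 * nic_active_bit computation in the patch).
 */
static uint32_t demo_set_function_bit(uint32_t reg, unsigned int devfn)
{
	return reg | (1u << DEMO_PCI_FUNC(devfn));
}

/*
 * Hypothetical helper: two functions belong to the same physical adapter
 * when they sit on the same bus and in the same slot.
 */
static int demo_same_adapter(unsigned int bus_a, unsigned int devfn_a,
			     unsigned int bus_b, unsigned int devfn_b)
{
	return bus_a == bus_b &&
	       DEMO_PCI_SLOT(devfn_a) == DEMO_PCI_SLOT(devfn_b);
}
```

For slot 3, function 2 the devfn value is 0x1a, so the function's bit is 1 << 2 and a register holding 0x1 becomes 0x5.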
Re: [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
From: Glenn Grundstrom [EMAIL PROTECTED] Date: Thu, 26 Oct 2006 19:14:19 -0500 It is part of our connection manager and assists with connection setup and teardown only. I fear this is exactly the kind of stuff that we didn't want to see start going into the kernel, and we've resisted the TCP/IP stack offload stuff in the infiniband layer exactly for this reason.
Re: [PATCH 2/9] NetEffect 10Gb RNIC Driver: main kernel driver c file
+static int nes_device_event(struct notifier_block *notifier, unsigned long event, void *ptr);
+static int nes_inetaddr_event(struct notifier_block *notifier, unsigned long event, void *ptr);
+static void nes_print_macaddr(struct net_device *netdev);
+static irqreturn_t nes_interrupt(int, void *, struct pt_regs *);
+static int __devinit nes_probe(struct pci_dev *, const struct pci_device_id *);
+static int nes_suspend(struct pci_dev *, pm_message_t);
+static int nes_resume(struct pci_dev *);
+static void __devexit nes_remove(struct pci_dev *);
+static int __init nes_init_module(void);
+static void __exit nes_exit_module(void);

Some of these declarations are already unneeded (eg at least nes_init_module and nes_exit_module), and it would be good to rearrange your code so that the rest can be removed too.

+// _the_ function interface handle to nes_tcpip module

We prefer /* */ style comments

+static struct notifier_block nes_dev_notifier = {
+	notifier_call: nes_device_event
+};

Standard C syntax (rather than the gcc extension) is preferred, like:

static struct notifier_block nes_dev_notifier = {
	.notifier_call = nes_device_event
};

+/**
+ * nes_device_event
+ *
+ * @param notifier
+ * @param event
+ * @param ptr
+ *
+ * @return int
+ */

There's no point to comments like this. I can read the function declaration just fine, so save the screen real estate unless you have something more to say.

+unsigned long reg0_start, reg0_flags, reg0_len;
+unsigned long reg1_start, reg1_flags, reg1_len;

PCI bars are type resource_size_t, which can be bigger than long...

+assert(pcidev != NULL);
+assert(ent != NULL);

BUG_ON() is more idiomatic. But this looks kind of useless anyway -- you'll get a nice enough oops if they are NULL.

+/* Enable PCI device */
+ret = pci_enable_device(pcidev);

This isn't major, but comments like this just waste screen space. I mean, someone who can't guess what pci_enable_device() does is probably not going to be helped by the comment either.

+/* pci tweaks */
+pci_write_config_word(pcidev, 0x000c, 0xfc10);
+pci_write_config_dword(pcidev, 0x0048, 0x00480007);

Looks rather magic and fragile. Register 0xc is the cacheline size and latency, right? Why are you tweaking that? And I assume 0x48 is somewhere in a capability structure. It's much better to use pci_find_capability() in that case. That way when the hardware guys tell you they have to rearrange the PCI header in the next rev of the chip, you don't have to touch the chip. However this tweaking probably needs to be justified too.

+/**
+ * nes_suspend - power management
+ */
+static int nes_suspend(struct pci_dev *pcidev, pm_message_t state)
+{
+	dprintk("pcidev=%p\n", pcidev);
+
+	return (0);
+}

Umm, just don't have suspend/resume methods if you don't support it.

+nes_adapter_free(nesdev->nesadapter);
+
+dprintk("nes_remove: calling iounmap.\n");
+/* Unmap adapter PA space */
+iounmap(nesdev->regs);
+
+/* Unregister with OpenFabrics */
+if (nesdev->of_device_registered) {
+	dprintk("nes_remove: calling nes_unregister_device.\n");
+	nes_unregister_device(nesdev);
+}

You can still have upper layers calling into you until ib_unregister_device() returns, so it looks bogus to do things like iounmap before then. I think your cleanup needs to be reordered. And I don't think you're unregistering with OpenFabrics -- you're just unregistering with the RDMA midlayer.

+return (pci_module_init(&nes_pci_driver));

Just use pci_register_driver().

- R.
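Note on the pci_find_capability() suggestion in the review above: the point is that a capability's position is discovered by walking a linked list in config space instead of being hard-coded as 0x48. The userspace sketch below walks a mock 256-byte config space in the same way. It is only an illustration under stated assumptions: demo_find_capability() and the DEMO_ constants are hypothetical names (the register offsets and capability IDs follow the standard PCI header layout), and the choice of a PCI-X capability at 0x48 is invented for the example, since the thread does not say which capability the driver was writing to.

```c
#include <stdint.h>

#define DEMO_PCI_STATUS          0x06 /* status register, low byte */
#define DEMO_PCI_STATUS_CAP_LIST 0x10 /* bit 4: capability list present */
#define DEMO_PCI_CAPABILITY_LIST 0x34 /* pointer to first capability */
#define DEMO_PCI_CAP_ID_PM       0x01 /* power management */
#define DEMO_PCI_CAP_ID_PCIX     0x07 /* PCI-X */

/*
 * Return the config-space offset of capability cap_id, or 0 if the device
 * does not advertise it. Each capability starts with an ID byte followed by
 * a pointer to the next capability; a pointer of 0 ends the list.
 */
static unsigned int demo_find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	unsigned int pos;
	int ttl = 48; /* bound the walk in case the list loops */

	if (!(cfg[DEMO_PCI_STATUS] & DEMO_PCI_STATUS_CAP_LIST))
		return 0;

	/* The low two bits of each pointer are reserved, so mask them off. */
	pos = cfg[DEMO_PCI_CAPABILITY_LIST] & ~3u;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & ~3u;
	}
	return 0;
}
```

If a later chip revision moved the capability from 0x48 to, say, 0x60, a driver that looks the offset up this way would keep working, while a write to the fixed offset 0x48 would land in whatever structure now lives there. In kernel code the lookup itself is done by pci_find_capability(), which returns 0 when the capability is absent.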
Re: [PATCH 1/9] NetEffect 10Gb RNIC Driver: kernel Kconfig and makefiles
David> What is this thing? Good question. I haven't gotten a straight answer yet, which is why I called it mysterious. - R.
[PATCH 7/9] NetEffect 10Gb RNIC Driver: utility routines c file
Kernel driver patch 7 of 9. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/hw/nes/nes_utils.c new/drivers/infiniband/hw/nes/nes_utils.c --- old/drivers/infiniband/hw/nes/nes_utils.c 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/nes_utils.c 2006-10-25 10:15:51.0 -0500 @@ -0,0 +1,488 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ +#include linux/module.h +#include linux/moduleparam.h +#include linux/etherdevice.h +#include linux/ethtool.h +#include linux/mii.h +#include linux/if_vlan.h +#include linux/crc32.h +#include linux/in.h +#include linux/ip.h +#include linux/tcp.h +#include linux/init.h + +#include asm/io.h +#include asm/irq.h +#include asm/byteorder.h + +#include rdma/ib_smi.h +#include rdma/ib_verbs.h +#include rdma/ib_pack.h +#include rdma/iw_cm.h +#include nes.h +#include nes_verbs.h + +#define BITMASK(X) (1L (X)) +#define NES_CRC_WID 32 + +static u32 nesCRCTable[256]; +static u32 nesCRCInitialized = 0; + +static u32 nesCRCWidMask(u32); +static u32 nes_crc_table_gen(u32 *, u32, u32, u32); +static u32 reflect(u32, u32); +static u32 byte_swap(u32, u32); + + +/** + * nes_read_eeprom_values - + * + * @param nesdev + * + * @return int + */ +int nes_read_eeprom_values(struct nes_dev *nesdev) +{ + struct nes_adapter *nesadapter = nesdev-nesadapter; + u32 mac_addr_low; + u16 mac_addr_high; + u16 eeprom_data; + u16 eeprom_offset; + + if (0 == nesadapter-firmware_eeprom_offset) { + /* Read the EEPROM Parameters */ + eeprom_data = nes_read16_eeprom(nesdev-regs, 0); + dprintk(EEPROM Offset 0 = 0x%04X\n, eeprom_data); + eeprom_offset = 2 + (((eeprom_data 0x007f)3)((eeprom_data 0x0080)7)); + dprintk(Firmware Offset = 0x%04X\n, eeprom_offset); + nesadapter-firmware_eeprom_offset = eeprom_offset; + eeprom_data = nes_read16_eeprom(nesdev-regs, eeprom_offset+4); + if (eeprom_data != 0x5746) { + dprintk(Not a valid Firmware Image = 0x%04X\n, eeprom_data); + return -1; + } + + eeprom_data = nes_read16_eeprom(nesdev-regs, eeprom_offset+2); + dprintk(EEPROM Offset %u = 0x%04X\n, eeprom_offset+2, eeprom_data); + eeprom_offset += ((eeprom_data 0x00ff)3)((eeprom_data 0x0100)8); + dprintk(Software Offset = 0x%04X\n, eeprom_offset); + nesadapter-software_eeprom_offset = eeprom_offset; + eeprom_data = nes_read16_eeprom(nesdev-regs, eeprom_offset); + dprintk(EEPROM Offset %u = 0x%04X\n, 
eeprom_offset, eeprom_data); + eeprom_data = nes_read16_eeprom(nesdev-regs, eeprom_offset+4); + if (eeprom_data != 0x5753) { + dprintk(Not a valid Software Image = 0x%04X\n, eeprom_data); + return -1; + } + + eeprom_offset = nesadapter-software_eeprom_offset; + eeprom_offset += 10; + mac_addr_high = nes_read16_eeprom(nesdev-regs, eeprom_offset); + eeprom_offset += 2; + mac_addr_low = (u32)nes_read16_eeprom(nesdev-regs, eeprom_offset); + eeprom_offset += 2; + mac_addr_low = 16; + mac_addr_low += (u32)nes_read16_eeprom(nesdev-regs, eeprom_offset); + dprintk(MAC Address = 0x%04X%08X\n, mac_addr_high,
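Note on the MAC address read at the end of nes_read_eeprom_values(): the address is assembled from three consecutive 16-bit EEPROM words, one into mac_addr_high and two into mac_addr_low. The statement shown as `mac_addr_low = 16;` has most likely lost its shift operator in this archive copy; the sketch below assumes it read `mac_addr_low <<= 16;`. The functions demo_read16() and demo_read_mac() are hypothetical userspace stand-ins (an in-memory array replaces nes_read16_eeprom()), and the byte order of the resulting 6-byte address is an assumption based on the driver's debug format, which prints the high word first.

```c
#include <stdint.h>

/* Hypothetical stand-in for nes_read16_eeprom(): 16-bit word at a byte offset. */
static uint16_t demo_read16(const uint16_t *eeprom, unsigned int byte_offset)
{
	return eeprom[byte_offset / 2];
}

/*
 * Assemble a 48-bit MAC address from three 16-bit words, following the
 * sequence in the patch: one word for the high part, then two words
 * combined into the 32-bit low part.
 */
static void demo_read_mac(const uint16_t *eeprom, unsigned int offset,
			  uint8_t mac[6])
{
	uint16_t mac_addr_high;
	uint32_t mac_addr_low;

	mac_addr_high = demo_read16(eeprom, offset);
	offset += 2;
	mac_addr_low = (uint32_t)demo_read16(eeprom, offset);
	offset += 2;
	mac_addr_low <<= 16; /* assumed original of the garbled "= 16" */
	mac_addr_low += (uint32_t)demo_read16(eeprom, offset);

	/* Assumed byte order: most significant byte first. */
	mac[0] = (uint8_t)(mac_addr_high >> 8);
	mac[1] = (uint8_t)(mac_addr_high & 0xff);
	mac[2] = (uint8_t)(mac_addr_low >> 24);
	mac[3] = (uint8_t)((mac_addr_low >> 16) & 0xff);
	mac[4] = (uint8_t)((mac_addr_low >> 8) & 0xff);
	mac[5] = (uint8_t)(mac_addr_low & 0xff);
}
```

With the words 0x0012, 0x3456 and 0x789a stored back to back, this yields the address 00:12:34:56:78:9a. Without the shift, the second word would be overwritten and only the last 16 bits of the low part would survive.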
[PATCH 9/9] NetEffect 10Gb RNIC Driver: openfabrics verbs header file
Kernel driver patch 9 of 9. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/drivers/infiniband/hw/nes/nes_verbs.h new/drivers/infiniband/hw/nes/nes_verbs.h --- old/drivers/infiniband/hw/nes/nes_verbs.h 1969-12-31 18:00:00.0 -0600 +++ new/drivers/infiniband/hw/nes/nes_verbs.h 2006-10-25 10:15:52.0 -0500 @@ -0,0 +1,144 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef NES_VERBS_H +#define NES_VERBS_H + +struct nes_dev; + +#define NES_MAX_USER_DB_REGIONS 4096 +#define NES_MAX_USER_WQ_REGIONS 4096 + +struct nes_ucontext { + struct ib_ucontext ibucontext; + struct nes_dev *nesdev; + /* need to track mmapped areas, start with bit vector? */ + unsigned long mmap_wq_offset; + unsigned long mmap_cq_offset; /* to be removed */ + int index; /* rnic index (minor) */ + unsigned long allocated_doorbells[BITS_TO_LONGS(NES_MAX_USER_DB_REGIONS)]; + u16 mmap_db_index[NES_MAX_USER_DB_REGIONS]; + u16 first_free_db; + unsigned long allocated_wqs[BITS_TO_LONGS(NES_MAX_USER_WQ_REGIONS)]; + struct nes_qp * mmap_nesqp[NES_MAX_USER_WQ_REGIONS]; + u16 first_free_wq; + struct list_head cq_reg_mem_list; +}; + +struct nes_pd { + struct ib_pd ibpd; + u16 pd_id; + atomic_t sqp_count; + u16 mmap_db_index; +}; + +struct nes_mr { + struct ib_mr ibmr; + u16 pbls_used; + u8 mode; + u8 pbl_4k; +}; + +struct nes_hw_pb { + u32 pa_low; + u32 pa_high; +}; + +struct nes_vpbl { + dma_addr_t pbl_pbase; + struct nes_hw_pb *pbl_vbase; +}; + +struct nes_root_vpbl { + dma_addr_t pbl_pbase; + struct nes_hw_pb *pbl_vbase; + struct nes_vpbl *leaf_vpbl; +}; + +struct nes_av; + +struct nes_cq { + struct ib_cq ibcq; + struct nes_hw_cq hw_cq; + u32 polled_completions; + u32 cq_mem_size; + spinlock_t lock; + u8 virtual_cq; + u8 pad[3]; +}; + +struct nes_wq { + spinlock_t lock; +}; + +struct iw_cm_id; + +struct nes_qp { + struct ib_qp ibqp; + enum ib_qp_stateibqp_state; + u32 iwarp_state; + void * allocated_buffer; + struct iw_cm_id *cm_id; + struct workqueue_struct *wq; + struct workqueue_struct *aewq; + struct socket *ksock; + struct nes_cq *nesscq; + struct nes_cq *nesrcq; + struct nes_pd *nespd; +struct ietf_mpa_req_resp_frame *ietf_frame; +dma_addr_t ietf_frame_pbase; + wait_queue_head_t state_waitq; + unsigned long socket; + struct nes_hw_qp hwqp; + struct work_struct work; + struct work_struct ae_work; + u32 hte_index; + u32 last_aeq; + u32 qp_mem_size; + 
atomic_t refcount; + u32 mmap_sq_db_index; + u32 mmap_rq_db_index; +spinlock_t lock; + /* TODO: should move these two to the hw qp? */ + struct nes_qp_context *nesqp_context; + dma_addr_t nesqp_context_pbase; + u32 bytes_sent; +u16 private_data_len; +u8 active_conn; +u8 skip_lsmm; +u8 user_mode; + u8 hte_added; +}; + +#endif /* NES_VERBS_H */
Re: [PATCH 4 of 9] NetEffect 10Gb RNIC Driver: kernel driver header files
I fear this is exactly the kind of stuff that we didn't want to see start going into the kernel, and we've resisted the TCP/IP stack offload stuff in the infiniband layer exactly for this reason. We're definitely not going to merge a second TCP stack in any form. - R.
[PATCH 2/5] NetEffect 10Gb RNIC Userspace Library: makefile generation
Kernel driver patch 2 of 5. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/src/userspace/libnes/libnes.spec.in new/src/userspace/libnes/libnes.spec.in --- old/src/userspace/libnes/libnes.spec.in 1969-12-31 18:00:00.0 -0600 +++ new/src/userspace/libnes/libnes.spec.in 2006-10-25 11:11:23.0 -0500 @@ -0,0 +1,57 @@ + +%define ver @VERSION@ + +Name: libnes +Version: 0.1 +Release: 0.%{?dist} +Summary: NetEffect RNIC Userspace Driver + +Group: System Environment/Libraries +License: GPL/BSD +Url: http://openib.org/ +Source: http://openib.org/downloads/%{name}-%{ver}.tar.gz +BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) + +BuildRequires: libibverbs-devel + +%description +libnes provides a device-specific userspace driver for NetEffect RNICs +for use with the libibverbs library. + +%package devel +Summary: Development files for the libnes driver +Group: System Environment/Libraries +Requires: %{name} = %{version}-%{release} + +%description devel +Static version of libnes that may be linked directly to an +application, which may be useful for debugging. 
+ +%prep +%setup -q -n %{name}-%{ver} + +%build +%configure +make %{?_smp_mflags} + +%install +rm -rf $RPM_BUILD_ROOT +%makeinstall +# remove unpackaged files from the buildroot +rm -f $RPM_BUILD_ROOT%{_libdir}/infiniband/*.la + +%clean +rm -rf $RPM_BUILD_ROOT + +%files +%defattr(-,root,root,-) +%{_libdir}/infiniband/nes.so +%doc AUTHORS COPYING ChangeLog README + +%files devel +%defattr(-,root,root,-) +%{_libdir}/infiniband/nes.a + +%changelog +* Wed May 10 2006 nesdev [EMAIL PROTECTED] - 1.0 +- First development Effort diff -ruNp old/src/userspace/libnes/Makefile.am new/src/userspace/libnes/Makefile.am --- old/src/userspace/libnes/Makefile.am1969-12-31 18:00:00.0 -0600 +++ new/src/userspace/libnes/Makefile.am2006-10-25 11:11:30.0 -0500 @@ -0,0 +1,25 @@ + +neslibdir = $(libdir)/infiniband + +neslib_LTLIBRARIES = src/nes.la + +src_nes_la_CFLAGS = -g -Wall -D_GNU_SOURCE + +if HAVE_LD_VERSION_SCRIPT +nes_version_script = -Wl,--version-script=$(srcdir)/src/nes.map +else +nes_version_script = +endif + +src_nes_la_SOURCES = src/nes_umain.c src/nes_uverbs.c +src_nes_la_LDFLAGS = -avoid-version -module \ +$(nes_version_script) + +DEBIAN = debian/changelog debian/compat debian/control debian/copyright \ +debian/libnes1.install debian/libnes-dev.install debian/rules + +EXTRA_DIST = src/nes.h src/nes-abi.h \ +src/nes.map libnes.spec.in $(DEBIAN) + +dist-hook: libnes.spec + cp libnes.spec $(distdir)
[PATCH 3/5] NetEffect 10Gb RNIC Userspace Library: userspace header files
Userspace driver patch 3 of 5. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/src/userspace/libnes/src/nes-abi.h new/src/userspace/libnes/src/nes-abi.h --- old/src/userspace/libnes/src/nes-abi.h 1969-12-31 18:00:00.0 -0600 +++ new/src/userspace/libnes/src/nes-abi.h 2006-10-25 10:27:58.0 -0500 @@ -0,0 +1,99 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */ +#ifndef nes_ABI_H +#define nes_ABI_H + +#include infiniband/kern-abi.h + +struct nes_ualloc_ucontext_resp { + struct ibv_get_context_resp ibv_resp; + __u32 max_pds; /* maximum pds allowed for this user process */ + __u32 max_qps; /* maximum qps allowed for this user process */ + __u32 wq_size; /* defines the size of the WQs (sq+rq) allocated to the mmaped area */ + __u32 reserved; +}; + +struct nes_ualloc_pd_resp { + struct ibv_alloc_pd_resp ibv_resp; + __u32 pd_id; + __u32 db_index; +}; + +struct nes_ucreate_cq { + struct ibv_create_cq ibv_cmd; + __u64 user_cq_buffer; +}; + +enum nes_umemreg_type { + NES_UMEMREG_TYPE_MEM = 0x, + NES_UMEMREG_TYPE_QP = 0x0001, + NES_UMEMREG_TYPE_CQ = 0x0002, +}; + +struct nes_ureg_mr { + struct ibv_reg_mr ibv_cmd; + __u32 reg_type; /* indicates if id is memory, QP or CQ */ + __u32 reserved; /* QP or CQ ID */ +}; + +struct nes_ucreate_cq_resp { + struct ibv_create_cq_resp ibv_resp; + __u32 cq_id; + __u32 cq_size; + __u32 mmap_db_index; + __u32 reserved; +}; + +struct nes_ucreate_qp { + struct ibv_create_qp ibv_cmd; +}; + +struct nes_ucreate_qp_resp { + struct ibv_create_qp_resp ibv_resp; + __u32 qp_id; + __u32 actual_sq_size; + __u32 actual_rq_size; + __u32 mmap_sq_db_index; + __u32 mmap_rq_db_index; + __u32 reserved; +}; + + +struct nes_cqe { + __u32 header; + __u32 len; + __u32 wrid_hi_stag; + __u32 wrid_low_msn; +}; + +#endif /* nes_ABI_H */ diff -ruNp old/src/userspace/libnes/src/nes.map new/src/userspace/libnes/src/nes.map --- old/src/userspace/libnes/src/nes.map1969-12-31 18:00:00.0 -0600 +++ new/src/userspace/libnes/src/nes.map2006-10-25 11:11:45.0 -0500 @@ -0,0 +1,6 @@ +{ + global: + ibv_driver_init; + openib_driver_init; + local: *; +}; diff -ruNp old/src/userspace/libnes/src/nes_umain.h new/src/userspace/libnes/src/nes_umain.h --- old/src/userspace/libnes/src/nes_umain.h1969-12-31 18:00:00.0 -0600 +++ new/src/userspace/libnes/src/nes_umain.h2006-10-25 10:27:59.0 -0500 @@ -0,0 +1,271 @@ +/* + * Copyright (c) 2006 
NetEffect, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * +
[PATCH 4/5] NetEffect 10Gb RNIC Userspace Library: userspace main c file
Userspace driver patch 4 of 5. Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED] == diff -ruNp old/src/userspace/libnes/src/nes_umain.c new/src/userspace/libnes/src/nes_umain.c --- old/src/userspace/libnes/src/nes_umain.c1969-12-31 18:00:00.0 -0600 +++ new/src/userspace/libnes/src/nes_umain.c2006-10-25 10:27:58.0 -0500 @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2006 NetEffect, Inc. All rights reserved. + * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ */
+#if HAVE_CONFIG_H
+#  include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <pthread.h>
+
+#include "nes_umain.h"
+#include "nes-abi.h"
+
+long int page_size;
+
+#ifdef HAVE_SYSFS_LIBSYSFS_H
+#include <sysfs/libsysfs.h>
+#endif
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#ifndef PCI_VENDOR_ID_NETEFFECT
+#define PCI_VENDOR_ID_NETEFFECT	0x1678
+#endif
+
+#ifndef PCI_DEVICE_ID_NETEFFECT_nes
+#define PCI_DEVICE_ID_NETEFFECT_nes	0x0100
+#endif
+
+
+#define HCA(v, d, t) \
+	{ .vendor = PCI_VENDOR_ID_##v, \
+	  .device = PCI_DEVICE_ID_NETEFFECT_##d,\
+	  .type = NETEFFECT_##t }
+
+struct {
+	unsigned vendor;
+	unsigned device;
+	enum nes_uhca_type type;
+} hca_table[] = {
+	HCA(NETEFFECT, nes, nes),};
+
+static struct ibv_context *nes_ualloc_context(struct ibv_device *, int);
+static void nes_ufree_context(struct ibv_context *);
+
+static struct ibv_context_ops nes_uctx_ops = {
+	.query_device = nes_uquery_device,
+	.query_port = nes_uquery_port,
+	.alloc_pd = nes_ualloc_pd,
+	.dealloc_pd = nes_ufree_pd,
+	.reg_mr = nes_ureg_mr,
+	.dereg_mr = nes_udereg_mr,
+	.create_cq = nes_ucreate_cq,
+	.poll_cq = nes_upoll_cq,
+	.req_notify_cq = nes_uarm_cq,
+	.cq_event = NULL,
+	.resize_cq = nes_uresize_cq,
+	.destroy_cq = nes_udestroy_cq,
+	.create_srq = NULL,
+	.modify_srq = NULL,
+	.query_srq = NULL,
+	.destroy_srq = NULL,
+	.post_srq_recv = NULL,
+	.create_qp = nes_ucreate_qp,
+	.modify_qp = nes_umodify_qp,
+	.destroy_qp = nes_udestroy_qp,
+	.post_send = nes_upost_send,
+	.post_recv = nes_upost_recv,
+	.create_ah = nes_ucreate_ah,
+	.destroy_ah = nes_udestroy_ah,
+	.attach_mcast = nes_uattach_mcast,
+	.detach_mcast = nes_udetach_mcast
+};
+
+
+/**
+ * nes_ualloc_context
+ *
+ * @param ibdev
+ * @param cmd_fd
+ *
+ * @return struct ibv_context*
+ */
+static struct ibv_context *nes_ualloc_context(struct ibv_device *ibdev, int cmd_fd)
+{
+	// void *mymmapp = NULL;
+	struct ibv_pd *ibv_pd;
+	struct nes_uvcontext *nesvctx;
+	struct ibv_get_context cmd;
+	struct nes_ualloc_ucontext_resp resp;
+
+	page_size = sysconf(_SC_PAGESIZE);
+
+	nesvctx = malloc(sizeof *nesvctx);
+	if (!nesvctx)
+		return NULL;
+
+	nesvctx->ibv_ctx.cmd_fd = cmd_fd;
+
+	if (ibv_cmd_get_context(&nesvctx->ibv_ctx, &cmd, sizeof cmd,
+				&resp.ibv_resp, sizeof resp))
+		goto err_free;
+
+	nesvctx->ibv_ctx.device = ibdev;
+	nesvctx->ibv_ctx.ops = nes_uctx_ops;
+	nesvctx->max_pds = resp.max_pds;
+	nesvctx->max_qps = resp.max_qps;
+	nesvctx->wq_size = resp.wq_size;
+
+	/* Get a doorbell region for the CQs */
+	ibv_pd =
[PATCH 5/5] NetEffect 10Gb RNIC Userspace Library: openfabrics verbs interface c file
Userspace driver patch 5 of 5.

Signed-off-by: Glenn Grundstrom [EMAIL PROTECTED]

==

diff -ruNp old/src/userspace/libnes/src/nes_uverbs.c new/src/userspace/libnes/src/nes_uverbs.c
--- old/src/userspace/libnes/src/nes_uverbs.c 1969-12-31 18:00:00.0 -0600
+++ new/src/userspace/libnes/src/nes_uverbs.c 2006-10-25 10:27:59.0 -0500
@@ -0,0 +1,933 @@
+/*
+ * Copyright (c) 2006 NetEffect, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#if HAVE_CONFIG_H
+#  include <config.h>
+#endif /* HAVE_CONFIG_H */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <signal.h>
+#include <errno.h>
+#include <pthread.h>
+#include <malloc.h>
+#include <sys/mman.h>
+#include <netinet/in.h>
+#include <linux/compiler.h>
+
+#include "nes_umain.h"
+#include "nes-abi.h"
+
+extern long int page_size;
+
+
+/**
+ * nes_uquery_device
+ *
+ * @param context
+ * @param attr
+ *
+ * @return int
+ */
+int nes_uquery_device(struct ibv_context *context, struct ibv_device_attr *attr)
+{
+	struct ibv_query_device cmd;
+	uint64_t reserved;
+	int ret;
+
+	ret = ibv_cmd_query_device(context, attr, &reserved, &cmd, sizeof cmd);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+
+/**
+ * nes_uquery_port
+ *
+ * @param context
+ * @param port
+ * @param attr
+ *
+ * @return int
+ */
+int nes_uquery_port(struct ibv_context *context, uint8_t port,
+		    struct ibv_port_attr *attr)
+{
+	struct ibv_query_port cmd;
+
+	return ibv_cmd_query_port(context, port, attr, &cmd, sizeof cmd);
+}
+
+
+/**
+ * nes_ualloc_pd
+ *
+ * @param context
+ *
+ * @return struct ibv_pd*
+ */
+struct ibv_pd *nes_ualloc_pd(struct ibv_context *context)
+{
+	struct ibv_alloc_pd cmd;
+	struct nes_ualloc_pd_resp resp;
+	struct nes_upd *nesupd;
+
+	nesupd = malloc(sizeof *nesupd);
+	if (!nesupd)
+		return NULL;
+
+	if (ibv_cmd_alloc_pd(context, &nesupd->ibv_pd, &cmd, sizeof cmd,
+			     &resp.ibv_resp, sizeof resp)) {
+		free(nesupd);
+		return NULL;
+	}
+	nesupd->pd_id = resp.pd_id;
+	nesupd->db_index = resp.db_index;
+
+	nesupd->udoorbell = mmap(NULL, 4096, PROT_WRITE | PROT_READ, MAP_SHARED,
+			context->cmd_fd, nesupd->db_index * 4096);
+
+	if (((void *)-1) == nesupd->udoorbell) {
+		free(nesupd);
+		return NULL;
+	}
+
+	return (&nesupd->ibv_pd);
+}
+
+
+/**
+ * nes_ufree_pd
+ *
+ * @param pd
+ *
+ * @return int
+ */
+int nes_ufree_pd(struct ibv_pd *pd)
+{
+	int ret;
+	struct nes_upd *nesupd;
+//	fprintf(stderr, PFX "%s\n", __FUNCTION__);
+
+	nesupd = to_nes_upd(pd);
+
+	ret = ibv_cmd_dealloc_pd(pd);
+	if (ret)
+		return ret;
+
+	munmap((void *)nesupd->udoorbell, 4096);
+	free(nesupd);
+	return 0;
+}
+
+
+/**
+ * nes_ureg_mr
+ *
+ * @param pd
+ * @param addr
+ * @param length
+ * @param access
+ *
+ * @return struct ibv_mr*
+ */
+struct ibv_mr *nes_ureg_mr(struct ibv_pd *pd, void *addr,
+			   size_t length, enum ibv_access_flags access)
+{
+	struct ibv_mr *mr;
+	struct nes_ureg_mr cmd;
+
+//	fprintf(stderr, PFX "%s: address = %p, length = %u.\n", __FUNCTION__, addr, length);
+
+	mr = malloc(sizeof *mr);
+	if (!mr)
+		return NULL;
+
+	cmd.reg_type = NES_UMEMREG_TYPE_MEM;
+	if (ibv_cmd_reg_mr(pd, addr, length, (uintptr_t) addr,
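A note on the doorbell mapping in nes_ualloc_pd() above: the page is mapped with mmap() at offset db_index * 4096 and the failure check compares against (void *)-1. Below is a minimal user-space sketch of the same pattern, assuming an ordinary temporary file stands in for the uverbs command fd, and using sysconf(_SC_PAGESIZE) and MAP_FAILED instead of the literals. map_page() and demo_map_second_page() are hypothetical names made up for this illustration; they are not part of libnes or libibverbs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical helper (not libnes): map page number `index` of `fd`.
 * Returns NULL on failure instead of MAP_FAILED. */
static void *map_page(int fd, unsigned int index, long page_size)
{
	void *p;

	assert(page_size > 0);
	p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 fd, (off_t)index * (off_t)page_size);
	return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical demo: a temporary file plays the role of the device fd.
 * Writes one byte through a mapping of page 1, then reads it back with
 * pread(). Returns the byte read, or -1 on any failure. */
static int demo_map_second_page(void)
{
	char path[] = "/tmp/doorbell-demo-XXXXXX";
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char byte = 0;
	unsigned char *p;
	int value = -1;
	int fd;

	if (page_size <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* file disappears once fd is closed */

	if (ftruncate(fd, 2 * (off_t)page_size) == 0) {
		p = map_page(fd, 1, page_size);
		if (p) {
			p[0] = 0x5a;
			msync(p, (size_t)page_size, MS_SYNC);
			munmap(p, (size_t)page_size);
			if (pread(fd, &byte, 1, (off_t)page_size) == 1)
				value = byte;
		}
	}
	close(fd);
	return value;
}
```

Whether the real doorbell offset should scale with the runtime page size rather than a fixed 4096 depends on the kernel side of the driver, which is not shown here, so treat that substitution as an assumption of the sketch.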
Re: [PATCH] Rewrite e100_phys_id
On Thu, Oct 26, 2006 at 01:04:32PM -0700, Auke Kok wrote:
> no objections, so I'll ACK it with the notion that I'm going to let our labs do some more testing on it with all the latest changes to it.

Thanks, Auke. Here's the equivalent patch for e1000. I don't have a convenient machine to test it on, but it reduces the size of the driver by 1.5k.

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index 7ecce43..1e22da6 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -257,9 +257,6 @@ #endif
 	struct work_struct reset_task;
 	uint8_t fc_autoneg;
 
-	struct timer_list blink_timer;
-	unsigned long led_status;
-
 	/* TX */
 	struct e1000_tx_ring *tx_ring;      /* One per active queue */
 	unsigned long tx_queue_len;
diff --git a/drivers/net/e1000/e1000_ethtool.c b/drivers/net/e1000/e1000_ethtool.c
index 773821e..620afa5 100644
--- a/drivers/net/e1000/e1000_ethtool.c
+++ b/drivers/net/e1000/e1000_ethtool.c
@@ -1819,61 +1819,15 @@ e1000_set_wol(struct net_device *netdev,
 	return 0;
 }
 
-/* toggle LED 4 times per second = 2 "blinks" per second */
-#define E1000_ID_INTERVAL	(HZ/4)
-
-/* bit defines for adapter->led_status */
-#define E1000_LED_ON		0
-
-static void
-e1000_led_blink_callback(unsigned long data)
-{
-	struct e1000_adapter *adapter = (struct e1000_adapter *) data;
-
-	if (test_and_change_bit(E1000_LED_ON, &adapter->led_status))
-		e1000_led_off(&adapter->hw);
-	else
-		e1000_led_on(&adapter->hw);
-
-	mod_timer(&adapter->blink_timer, jiffies + E1000_ID_INTERVAL);
-}
-
 static int
 e1000_phys_id(struct net_device *netdev, uint32_t data)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	if (!data || data > (uint32_t)(MAX_SCHEDULE_TIMEOUT / HZ))
-		data = (uint32_t)(MAX_SCHEDULE_TIMEOUT / HZ);
-
-	if (adapter->hw.mac_type < e1000_82571) {
-		if (!adapter->blink_timer.function) {
-			init_timer(&adapter->blink_timer);
-			adapter->blink_timer.function = e1000_led_blink_callback;
-			adapter->blink_timer.data = (unsigned long) adapter;
-		}
-		e1000_setup_led(&adapter->hw);
-		mod_timer(&adapter->blink_timer, jiffies);
-		msleep_interruptible(data * 1000);
-		del_timer_sync(&adapter->blink_timer);
-	} else if (adapter->hw.phy_type == e1000_phy_ife) {
-		if (!adapter->blink_timer.function) {
-			init_timer(&adapter->blink_timer);
-			adapter->blink_timer.function = e1000_led_blink_callback;
-			adapter->blink_timer.data = (unsigned long) adapter;
-		}
-		mod_timer(&adapter->blink_timer, jiffies);
-		msleep_interruptible(data * 1000);
-		del_timer_sync(&adapter->blink_timer);
-		e1000_write_phy_reg(&(adapter->hw), IFE_PHY_SPECIAL_CONTROL_LED, 0);
-	} else {
-		e1000_blink_led_start(&adapter->hw);
-		msleep_interruptible(data * 1000);
-	}
+	if (data == 0)
+		data = 2;
 
-	e1000_led_off(&adapter->hw);
-	clear_bit(E1000_LED_ON, &adapter->led_status);
-	e1000_cleanup_led(&adapter->hw);
+	e1000_blink_led(&adapter->hw, data);
 
 	return 0;
 }
diff --git a/drivers/net/e1000/e1000_hw.c b/drivers/net/e1000/e1000_hw.c
index 65077f3..db5e999 100644
--- a/drivers/net/e1000/e1000_hw.c
+++ b/drivers/net/e1000/e1000_hw.c
@@ -6071,7 +6071,7 @@ e1000_id_led_init(struct e1000_hw * hw)
  *
  * hw - Struct containing variables accessed by shared code
  */
-int32_t
+static int32_t
 e1000_setup_led(struct e1000_hw *hw)
 {
     uint32_t ledctl;
@@ -6123,50 +6123,11 @@ e1000_setup_led(struct e1000_hw *hw)
 /**
- * Used on 82571 and later Si that has LED blink bits.
- * Callers must use their own timer and should have already called
- * e1000_id_led_init()
- * Call e1000_cleanup led() to stop blinking
- *
- * hw - Struct containing variables accessed by shared code
- */
-int32_t
-e1000_blink_led_start(struct e1000_hw *hw)
-{
-    int16_t  i;
-    uint32_t ledctl_blink = 0;
-
-    DEBUGFUNC("e1000_id_led_blink_on");
-
-    if (hw->mac_type < e1000_82571) {
-        /* Nothing to do */
-        return E1000_SUCCESS;
-    }
-    if (hw->media_type == e1000_media_type_fiber) {
-        /* always blink LED0 for PCI-E fiber */
-        ledctl_blink = E1000_LEDCTL_LED0_BLINK |
-                     (E1000_LEDCTL_MODE_LED_ON << E1000_LEDCTL_LED0_MODE_SHIFT);
-    } else {
-        /* set the blink bit for each LED that's on (0x0E) in ledctl_mode2 */
-        ledctl_blink = hw->ledctl_mode2;
-        for
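The patch above replaces the timer-driven blinking in e1000_phys_id() with a single e1000_blink_led() call, whose body is cut off in this excerpt. As a rough user-space sketch of what such a synchronous loop could look like, assuming the rate of 4 toggles per second from the removed E1000_ID_INTERVAL comment and the 2-second default the patch applies when data is 0: struct fake_led, fake_led_set(), fake_sleep_ms() and blink_led() are hypothetical stand-ins invented for illustration, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOGGLES_PER_SECOND 4u	/* taken from the removed E1000_ID_INTERVAL comment */
#define TOGGLE_INTERVAL_MS (1000u / TOGGLES_PER_SECOND)

/* Hypothetical stand-in for the hardware: records state instead of
 * touching registers, and accounts for sleep time instead of sleeping. */
struct fake_led {
	bool on;
	uint64_t toggles;
	uint64_t slept_ms;
};

static void fake_led_set(struct fake_led *led, bool on)
{
	if (led->on != on)
		led->toggles++;
	led->on = on;
}

static void fake_sleep_ms(struct fake_led *led, unsigned int ms)
{
	led->slept_ms += ms;
}

/* Blink for `seconds` seconds (2 if 0 is passed, as in the patch),
 * toggling the LED every 250 ms and leaving it off afterwards. */
static void blink_led(struct fake_led *led, uint32_t seconds)
{
	uint64_t i, count;

	assert(led != NULL);
	if (seconds == 0)
		seconds = 2;

	/* 64-bit count so a large `seconds` value cannot overflow. */
	count = (uint64_t)seconds * TOGGLES_PER_SECOND;
	for (i = 0; i < count; i++) {
		fake_led_set(led, !led->on);
		fake_sleep_ms(led, TOGGLE_INTERVAL_MS);
	}
	fake_led_set(led, false);
}
```

A loop like this needs no timer_list or led_status bit in the adapter structure, which is consistent with the fields the patch removes from e1000.h; an in-kernel version would presumably sleep interruptibly rather than for a fixed total, but that is a guess.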
Re: [PATCH 2.6.19-rc3 1/2] ehea: kzalloc GFP_ATOMIC fix
On Wed, 25 Oct 2006 13:11:42 +0200 Jan-Bernd Themann [EMAIL PROTECTED] wrote:
> This patch fixes kzalloc parameters (GFP_ATOMIC instead of GFP_KERNEL)

why?
Re: 2.6.18 forcedeth GSO panic on send
On Thu, Oct 26, 2006 at 11:17:57PM +, Denis Vlasenko wrote:
> I am using an AMD64 box with 32bit userspace / 64bit kernel.
> Kernels 2.6.18 and 2.6.18.1 semi-randomly hang when I upload stuff over the net - for example, svn commit, scp are affected. 2.6.17.11 does not seem to be affected.
> Unfortunately even 60-line screen is not big enough to catch whole trace. There are at least two traces, and first scrolls off. I have a photo at http://busybox.net/~vda/gso_panic/forcedeth_gso_panic.jpg

Looks like a network stack bug rather than a driver problem. However, I'd really like to see the first oops including the print out from skb_over_panic. Could you try booting with pause_on_oops=1 or perhaps use a serial console?

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt