Re: [WIP][PATCHES] Network xmit batching
My somewhat confusing netperf script (to run on client) is attached below. Server just requires to run netserver. Client is not completely accurate since I am not using netperf4 (moving to that after some initial hiccups). thanks, - KK (See attached file: netperf.scp) J Hadi Salim <[EMAIL PROTECTED]> wrote on 06/06/2007 07:19:21 PM: > Folks, > > While Krishna and I have been attempting this on the side, progress has > been rather slow - so this is to solicit for more participation so we > can get this over with faster. Success (myself being conservative when > it comes to performance) requires testing on a wide variety of hardware. > > The results look promising - certainly from a pktgen perspective where > performance has been known in some cases to go up over 50%. > Tests by Sridhar on a low number of TCP flows also indicate improved > performance as well as lowered CPU use. > > I have setup the current state of my patches against Linus tree at: > git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git > > This is also clean against 2.6.22-rc4. So if you want just a diff that > will work against 2.6.22-rc4 - i can send it to you. > I also have a tree against Daves net-2.6 at > git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-net26.git > but iam abandoning that effort until we get this stable due to the > occasional bug that cropped up(like e1000). > > I am attaching a pktgen script. There is one experimental parameter > called "batch_low" - for starters just leave it at 0 in order to reduce > experimental variance. If you have solid results you can muck around > with it. > KK has a netperf script he has been using - if you know netperf your > help will really be appreciated in testing it on your hardware. > KK, can you please post your script? > Testing with forwarding and bridging will also be appreaciated. > Above that, suggestions to changes as long as they are based on > verifiable results or glaringly obvious changes are welcome. My > preference at the moment is to flesh out the patch as is and then > improve on it later if it shows it has some value on a wide variety of > apps. As the subject is indicating this is a WIP and as all eulas > suggest "subject to change without notice". > If you help out, when you post your results, can you please say what > hardware and setup was? > > The only real driver that has been changed is e1000 for now. KK is > working on something infiniband related and i plan (if noone beats me) > to get tg3 working. It would be nice if someone converted some 10G > ethernet driver. > > cheers, > jamal > [attachment "pktgen.batch-1-1" deleted by Krishna Kumar2/India/IBM] netperf.scp Description: Binary data
Re: extra layer below inet layer.
On Thu, 7 Jun 2007 10:10:44 +0530 "Tej Parkash" <[EMAIL PROTECTED]> wrote: > hi > > i just want to have something like tcp layer sitting below inet layer. > so that i can directly offload everything to NIC > and i want it to be placed dynamically. so depending on the nic i can > offload the feature or can make it normal flow. > i.e. both layer should exist and normal functionality should not break. > NIC protocol offloading is a contentious subject since it bypasses all the security and restricts usefulness of the device for VLAN's, bonding, bridging, etc. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
extra layer below inet layer.
hi i just want to have something like tcp layer sitting below inet layer. so that i can directly offload everything to NIC and i want it to be placed dynamically. so depending on the nic i can offload the feature or can make it normal flow. i.e. both layer should exist and normal functionality should not break. this code is in af_inet.c static struct inet_protosw inetsw_array[] = { { .type = SOCK_STREAM, .protocol = IPPROTO_TCP, .prot = &tcp_prot, .ops =&inet_stream_ops, .capability = -1, .no_check = 0, .flags = INET_PROTOSW_PERMANENT | INET_PROTOSW_ICSK, }, { .type = SOCK_DGRAM, .protocol = IPPROTO_UDP, .prot = &udp_prot, .ops =&inet_dgram_ops, .capability = -1, .no_check = UDP_CSUM_DEFAULT, .flags = INET_PROTOSW_PERMANENT, }, { .type = SOCK_RAW, .protocol = IPPROTO_IP,/* wild card */ .prot = &raw_prot, .ops =&inet_sockraw_ops, .capability = CAP_NET_RAW, .no_check = UDP_CSUM_DEFAULT, .flags = INET_PROTOSW_REUSE, } }; i was just going through the code i found that if i can regsiter above kind of registration dynamically i will be able to do that. i have tried to do that but since normal tcp registered this at boot up time i am not able to unregister this dynamically so is it possible to unregister tcp and register my protocol dynamically. thanks TEJ - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Failover-friendly TCP retransmission
Hi Andi, I thank you for your comments. Andi Kleen <[EMAIL PROTECTED]> writes: > > Your suggestion, to utilize NET_XMIT_* code returned from an > > underlying layer, is done in tcp_transmit_skb. > > > > But my problem is that tcp_transmit_skb is not called during a > > certain period of time. So I'm suggesting to cap RTO value so > > that tcp_transmit_skb gets called more frequently. > > The transmit code controls the transmission timeout. Or at least > it could change it if it really wanted. > > What I wanted to say is: if the loss still happens under control > of the sending end device and TCP knows this then it could change > the retransmit timer to fire earlier or even just wait for an > event from the device that tells it to retransmit early. I examined your suggestion to introduce some interface so that TCP can know, or be notified, to retransmit early. Then there are two options; pull from TCP or notify to TCP. The first option, pulling from TCP, must be done at the expiration of retransmission timer because there is no other context to do this. But if RTO is already large, this can easily miss events or status of underlying layer, such as symptom of failure, failover, etc. So I give up pulling from TCP. The second option, notifying to TCP, seems a bit promising. Upon such a notification, TCP may look into a timer structure to find retransmission events, update their timers so that they expire earlier, and possibly reset their RTO values. Perhaps this should be done for all TCP packets because TCP doesn't know which packet will be sent to the device of interest at that time. But I don't quite see if this better solves my problem. Such upcalls would be more complicated than capping RTO, and thus may be error-prone and harder to maintain. Problems might be solvable, but I'd prefer a simpler solution. > The problem with capping RTO is that when there is a loss > in the network for some other reasons (and there is no reason > bonding can't be used when talking to the internet) you > might be too aggressive or not aggressive enough anymore > to get the data through. I think capping RTO is robust and better than upcalls. The effects of capping RTO, as follows, are small enough. * It just makes retransmission more frequent. Since there is already a fast retransmission in TCP, retransmitting earlier itself does not break TCP. (I'm going to examine every occurrence of TCP_RTO_MAX, though.) * In the worst case, it does not increase the total number of retransmission packets, which is bounded by tcp_retries2. Thus the final retransmission timeout comes earlier with same tcp_retries2. If this is a case, one will have to raise tcp_retries2. * The average case, in a certain period of time (say 60[s]), it may slightly increase the number of retransmission packets. Starting from RTO = 0.2[s], the numbers of retransmission packets in first 60[s] are, 8 with TCP_RTO_MAX = 120[s], and 15 with TCP_RTO_MAX = 5[s]. I think that increasing several packets per minute per socket are acceptable. Therefore the side effects of capping RTO, even talking to the Internet, seems to be small enough. -- OBATA Noboru ([EMAIL PROTECTED]) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
On Wed, Jun 06, 2007 at 06:04:21PM -0700, Andrew Morton wrote: > There _should_ be some #ifdeffable thing which is being passed to cpp when > we run sparse (but I'm not sure what it is). #ifdef __CHECKER__ (See include/linux/compiler.h, this is how we implement __user & friends) Dave -- http://www.codemonkey.org.uk - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
On Wed, 2007-06-06 at 16:53 -0700, David Miller wrote: > There are of course other uses for multiple TX queues, and in > particular my finer-grained locking example. > > I'm still amazed the TX locking issue wasn't clear to you, > too nervous about tonight's game? :) It's too depressing - so i came back here for a break ;-> I cant even stand Don Cherry today. As a side note: You will have to do a lot of surgery to the current code to make tx run on multi CPUs. It needs some experimenting to get right. And i am begining to like Herberts changes ;-> I am not against multi-rings; iam just suggesting an alternative approach which is less disruptive. In regards to the tx lock - my thinking is resolving that via tx batching. You amortize the lock over multiple packets. There may be value in fine grained locking - i need to think about it. A small extension to the batching patches will provide the change i am proposing. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
On Wed, 06 Jun 2007 15:33:46 -0700 James Ketrenos <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos <[EMAIL PROTECTED]> wrote: > > > * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data > */ > #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > >>> This is identical to ARRAY_SIZE. > >>> > >>> And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE! Don't > >>> go > >>> off and create some private thing and leave everyone else twisting in the > >>> wind. > >> The code was resolving the sparse warnings. > > > ... > > Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE(). > > >From include/linux/kernel.h > > #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + > __must_be_array(arr)) o crap, sorry, I was looking at one of the other definitions of ARRAY_SIZE :( > >From drivers/net/wireless/mac80211/iwlwifi/iwl-helpers.h > > #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > > The '+ __must_be_array(arr)' part of ARRAY_SIZE was causing sparse to > complain > with: > > drivers/net/wireless/mac80211/iwlwifi/base.c:4646:22: error: cannot size > expression > ... OK, that's a problem in sparse, I guess. There _should_ be some #ifdeffable thing which is being passed to cpp when we run sparse (but I'm not sure what it is). If there is such a thing then we could disable the __must_be_array() thing during sparse runs. But longer-term, sparse should be taught about __must_be_array(). > When I had run my builds, I had restricted the sparse checks to just iwlwifi > (vs. checking the rest of the kernel). > > I just ran it C=2 CF=-Wall against the rest of the kernel and do see other > code > with the same problem, eg: > > sound/core/memalloc.c:521:14: error: cannot size expression > ... > > I had erroneously thought it was just a problem with iwlwifi... > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
On Wed, Jun 06, 2007 at 04:52:15PM -0700, David Miller wrote: > For the locking is makes a ton of sense. > > If you have sendmsg() calls going on N cpus, would you rather > they: > > 1) All queue up to the single netdev->tx_lock > > or > > 2) All take local per-hw-queue locks > > to transmit the data they are sending? > > I thought this was obvious... guess not :-) Agreed ++ For my part, I definitely want to see parallel Tx as well as parallel Rx. It's the only thing that makes sense for modern multi-core CPUs. Two warnings flags are raised in my brain though: 1) you need (a) well-designed hardware _and_ (b) a smart driver writer to avoid bottlenecking on internal driver locks. As you can see we have both (a) and (b) for tg3 ;-) But it's up in the air whether a multi-TX-queue scheme can be sanely locked internally on other hardware. At the moment we have to hope Intel gets it right in their driver... 2) I fear that the getting-it-into-the-Tx-queue part will take some thought in order to make this happen, too. Just like you have the SMP/SMT/Multi-core scheduler scheduling various resources, surely we will want some smarts so that sockets are not bouncing wildly across CPUs, absent other factors outside our control. Otherwise you will negate a lot of the value of the nifty multi-TX-lock driver API, by bouncing data across CPUs on each transmit anyway. IOW, you will have to sanely fill each of the TX queues. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv
On Wed, 2007-06-06 at 20:53 +0200, Thomas Klein wrote: > This patch fixes a possible kernel panic due to not checking the vlan group > when processing received VLAN packets and a malfunction in VLAN/hypervisor > registration. > > > Signed-off-by: Thomas Klein <[EMAIL PROTECTED]> > --- > > > diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea.h > patched_kernel/drivers/net/ehea/ehea.h > --- linux-2.6.22-rc4/drivers/net/ehea/ehea.h 2007-06-05 02:57:25.0 > +0200 > +++ patched_kernel/drivers/net/ehea/ehea.h2007-06-06 12:53:58.0 > +0200 > @@ -39,7 +39,7 @@ > #include > > #define DRV_NAME "ehea" > -#define DRV_VERSION "EHEA_0061" > +#define DRV_VERSION "EHEA_0064" > > #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \ > | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) > diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c > patched_kernel/drivers/net/ehea/ehea_main.c > --- linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c 2007-06-05 > 02:57:25.0 +0200 > +++ patched_kernel/drivers/net/ehea/ehea_main.c 2007-06-06 > 12:53:58.0 +0200 > @@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct > } > > index = (vid / 64); > - cb1->vlan_filter[index] |= ((u64)(1 << (vid & 0x3F))); > + cb1->vlan_filter[index] |= ((u64)(0x8000 >> (vid & 0x3F))); > > hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id, > H_PORT_CB1, H_PORT_CB1_ALL, cb1); > @@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct > } > > index = (vid / 64); > - cb1->vlan_filter[index] &= ~((u64)(1 << (vid & 0x3F))); > + cb1->vlan_filter[index] &= ~((u64)(0x8000 >> (vid & 0x3F))); These two seem ripe for splitting into some sort of helper routine. Which would leave only one place to get it right. cheers -- Michael Ellerman OzLabs, IBM Australia Development Lab wwweb: http://michael.ellerman.id.au phone: +61 2 6212 1183 (tie line 70 21183) We do not inherit the earth from our ancestors, we borrow it from our children. - S.M.A.R.T Person signature.asc Description: This is a digitally signed message part
Re: warnings in git-wireless
Andrew Morton wrote: > On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos <[EMAIL PROTECTED]> wrote: > * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data */ #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) >>> This is identical to ARRAY_SIZE. >>> >>> And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE! Don't go >>> off and create some private thing and leave everyone else twisting in the >>> wind. >> The code was resolving the sparse warnings. > ... > Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE(). >From include/linux/kernel.h #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr)) >From drivers/net/wireless/mac80211/iwlwifi/iwl-helpers.h #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) The '+ __must_be_array(arr)' part of ARRAY_SIZE was causing sparse to complain with: drivers/net/wireless/mac80211/iwlwifi/base.c:4646:22: error: cannot size expression ... When I had run my builds, I had restricted the sparse checks to just iwlwifi (vs. checking the rest of the kernel). I just ran it C=2 CF=-Wall against the rest of the kernel and do see other code with the same problem, eg: sound/core/memalloc.c:521:14: error: cannot size expression ... I had erroneously thought it was just a problem with iwlwifi... Thanks, James - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
On Wed, 6 Jun 2007 16:35:34 -0700 Andrew Morton wrote: > On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos <[EMAIL PROTECTED]> wrote: > > > > > >> * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data > > >> */ > > >> #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > > > > > > This is identical to ARRAY_SIZE. > > > > > > And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE! Don't > > > go > > > off and create some private thing and leave everyone else twisting in the > > > wind. > > > > The code was resolving the sparse warnings. GLOBAL_ARRAY_SIZE removes the > > part of the ARRAY_SIZE definition that causes sparse to complain ('+ > > __must_be_array(arr)'). I don't know enough about how sparse works to fix > > sparse, or to know if its a problem with __must_be_array. The code itself > > was fine -- using an array with ARRAY_SIZE. > > (These 340-column emails are rather hard to reply to) > > Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE(). > > If ARRAY_SIZE() is spitting some sparse warning then please report it and > we'll take a look into it. Looks like a sparse problem. See http://marc.info/?l=linux-sparse&m=118105261224378&w=2 > > Agreed; iwlwifi should be using lib/hexcump.c > > The hexdump.c interfaces are about to change, so I wouldn't do anything > with this until after 2.6.23-rc1. James, hexdump.c changes are already in 2.6.22-rc4-mm1 just in case you want to see them. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: jamal <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 19:54:47 -0400 > On Wed, 2007-06-06 at 16:48 -0700, Rick Jones wrote: > > > RX queues - yes, I can see; TX queues, it doesnt make sense to put > > > different rings on different CPUs. > > > > To what extent might that preclude some cachelines bouncing hither and > > yon between the CPUs? > > I think the bouncing will exist a lot more with the multi CPUs. But one > would assume if you go that path, you would also parallelize the stack > on egress to reduce such an effect. I guess the point i am not seeing is > the value. The tx, once hitting the NIC is an IO issue not a CPU issue. Disagred, that single TX lock kills cpu cycles. If all of the TX queues are independantly programmable of one another, the single TX lock kills performance. > off for the night. Enjoy the game. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: Rick Jones <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 16:48:59 -0700 > > RX queues - yes, I can see; TX queues, it doesnt make sense to put > > different rings on different CPUs. > > To what extent might that preclude some cachelines bouncing hither and > yon between the CPUs? I think per-TX-queue locking takes locality as another advantage. You only touch the TX descriptors for queue N, rather than a single globally shared one. Same goes for RX. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: jamal <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 19:35:46 -0400 > There is no potential for parallelizing on transmit that i can think of. > Dave, please explain it slowly so i can understand it. > > There is huge potential for parallelizing on receive. But i am certainly > missing the value in the transmit. I gave an example in another response, you have N processes queueing up data for TCP or UDP or whatever in parallel on different cpus, all going out the same 10gbit device. All of them enter into ->hard_start_xmit(), and thus all of them try to take the same netdev->tx_lock If they have multiple TX queues, independantly programmable, that single lock is stupid. We could use per-queue TX locks for such hardware, but we can't support that currently. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
On Wed, 2007-06-06 at 16:48 -0700, Rick Jones wrote: > > RX queues - yes, I can see; TX queues, it doesnt make sense to put > > different rings on different CPUs. > > To what extent might that preclude some cachelines bouncing hither and > yon between the CPUs? I think the bouncing will exist a lot more with the multi CPUs. But one would assume if you go that path, you would also parallelize the stack on egress to reduce such an effect. I guess the point i am not seeing is the value. The tx, once hitting the NIC is an IO issue not a CPU issue. OTOH, the receive path once a packet is received, that is a CPU problem (and therefore multi CPUs help). To be fair to Peter, that is not what his patches are trying to address (and infact, they cant solve that problem). off for the night. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: jamal <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 19:32:46 -0400 > So use a different scheduler. Dont use strict prio. Strict prio will > guarantee starvation of low prio packets as long as there are high prio > packets. Thats the intent. Ok, point taken. There are of course other uses for multiple TX queues, and in particular my finer-grained locking example. I'm still amazed the TX locking issue wasn't clear to you, too nervous about tonight's game? :) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: jamal <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 19:32:46 -0400 > On Wed, 2007-06-06 at 15:35 -0700, David Miller wrote: > > With the above for transmit, and having N "struct napi_struct" > > instances for MSI-X directed RX queues, we'll have no problem keeping > > a 10gbit (or even faster) port completely full with lots of cpu to > > spare on multi-core boxes. > > > > RX queues - yes, I can see; TX queues, it doesnt make sense to put > different rings on different CPUs. For the locking is makes a ton of sense. If you have sendmsg() calls going on N cpus, would you rather they: 1) All queue up to the single netdev->tx_lock or 2) All take local per-hw-queue locks to transmit the data they are sending? I thought this was obvious... guess not :-) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
> RX queues - yes, I can see; TX queues, it doesnt make sense to put different rings on different CPUs. To what extent might that preclude some cachelines bouncing hither and yon between the CPUs? rick jones - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
bnx2 driver reports WoL not supported when it actually works
Hi, Michael. We have some IBM blades with a BCM5708-based dual-port NIC that the bnx2 driver reports as not supporting wake-on-LAN. (Ethtool says "Supports Wake-on: d".) However, IBM says that WoL is supported by these NICs, and in fact when I tried putting the blade into soft-off and sending a magic packet to it, it did wake up! I tried both ports, and they both worked. Some more details: The driver identifies the NIC as follows: Broadcom NetXtreme II BCM5708 1000Base-SX (B2) PCI-X 64-bit 133MHz found at mem da00, IRQ 153, node addr 00145 ed6f8ea Here's the full ethtool output: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 1000baseT/Full Advertised auto-negotiation: Yes Speed: 1000Mb/s Duplex: Full Port: FIBRE PHYAD: 2 Transceiver: internal Auto-negotiation: on Supports Wake-on: d Wake-on: d Cannot get message level: Function not implemented Link detected: yes I added a printk to get the full contents of the chip_id register; it's 0x57081021. So the driver is deciding that the NIC doesn't support WoL because the SERDES bit is set in the CHIP_BOND_ID. (It's not the later CHIP_ID test since this NIC is a 5708_B2.) /* Disable WOL support if we are running on a SERDES chip. */ if (CHIP_BOND_ID(bp) & CHIP_BOND_ID_SERDES_BIT) { bp->phy_flags |= PHY_SERDES_FLAG; bp->flags |= NO_WOL_FLAG; if (CHIP_NUM(bp) == CHIP_NUM_5708) { bp->phy_addr = 2; reg = REG_RD_IND(bp, bp->shmem_base + BNX2_SHARED_HW_CFG_CONFIG); if (reg & BNX2_SHARED_HW_CFG_PHY_2_5G) bp->phy_flags |= PHY_2_5G_CAPABLE_FLAG; } } if ((CHIP_ID(bp) == CHIP_ID_5708_A0) || (CHIP_ID(bp) == CHIP_ID_5708_B0) || (CHIP_ID(bp) == CHIP_ID_5708_B1)) bp->flags |= NO_WOL_FLAG; It would be easy to "fix" this by commenting out the line that sets the NO_WOL_FLAG if the SERDES bit is set, but I assume the line is there for a reason -- I imagine there are *some* NICs that don't support WoL if the SERDES bit is set (?). Can you find out what the right fix is? (Or do we need to ask IBM?) It's important to me because my software is conservative and does not put a machine to sleep if ethtool says that the machine doesn't have a NIC that supports WoL. Thanks, -- Tim Mann work: [EMAIL PROTECTED] home: [EMAIL PROTECTED] http://www.vmware.com http://tim-mann.org - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
On Wed, 2007-06-06 at 15:40 -0700, David Miller wrote: > There are two core issues in my mind: > > 1) multi-queue on both RX and TX is going to be very pervasive very >soon, everyone is putting this into silicon. > >The parallelization gain potential is enormous, and we have to >design for this. > There is no potential for parallelizing on transmit that i can think of. Dave, please explain it slowly so i can understand it. There is huge potential for parallelizing on receive. But i am certainly missing the value in the transmit. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
On Wed, Jun 06, 2007 at 04:14:12PM -0700, Waskiewicz Jr, Peter P wrote: > > While I am growing in support of your changes, there are two things: > > > > 1) I want to study them more and hear more about what Patrick has to > >say about them when he returns from his trip on Sunday > > > > 2) I don't want to open up a net-2.6.23 tree yet so that people > >concentrate on bug fixes and regressions, pick an open bug or > >regression report and help out if you want net-2.6.23 cut faster > >:-) > > Where can I find such reports? Michal Piotrowski's list is a good place to start, while not network-specific, is turning into _the_ place to list regressions. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
On Wed, 06 Jun 2007 13:51:41 -0700 James Ketrenos <[EMAIL PROTECTED]> wrote: > > >> * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data > >> */ > >> #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > > > > This is identical to ARRAY_SIZE. > > > > And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE! Don't go > > off and create some private thing and leave everyone else twisting in the > > wind. > > The code was resolving the sparse warnings. GLOBAL_ARRAY_SIZE removes the > part of the ARRAY_SIZE definition that causes sparse to complain ('+ > __must_be_array(arr)'). I don't know enough about how sparse works to fix > sparse, or to know if its a problem with __must_be_array. The code itself > was fine -- using an array with ARRAY_SIZE. (These 340-column emails are rather hard to reply to) Your GLOBAL_ARRAY_SIZE() is, afaict, identical to ARRAY_SIZE(). If ARRAY_SIZE() is spitting some sparse warning then please report it and we'll take a look into it. > > >> /* Debug and printf string expansion helpers for printing bitfields */ > >> #define BIT_FMT8 "%c%c%c%c-%c%c%c%c" > >> #define BIT_FMT16 BIT_FMT8 ":" BIT_FMT8 > >> #define BIT_FMT32 BIT_FMT16 " " BIT_FMT16 > ... > >> #define BIT_ARG32(x) \ > >> BITC(x,31),BITC(x,30),BITC(x,29),BITC(x,28),\ > >> BITC(x,27),BITC(x,26),BITC(x,25),BITC(x,24),\ > >> BITC(x,23),BITC(x,22),BITC(x,21),BITC(x,20),\ > >> BITC(x,19),BITC(x,18),BITC(x,17),BITC(x,16),\ > >> BIT_ARG16(x) > > > > None of the above is appropriate to a driver-private header. > > Where would be the appropriate place? Dunno. include/linux/bitfield-helpers.h? > We use it in with iwlwifi; I don't know if others need it or want to use it. > Do you know of other projects using something similar? I haven't looked. > >> #define KELVIN_TO_CELSIUS(x) ((x)-273) > > > > Nor is that. > > Where would the appropriate place be? include/linux/temperature.h? acpi could use it, and there are other things we could put into temperature.h > >> static inline const struct ieee80211_hw_mode *iwl_get_hw_mode(struct > >> iwl_priv > >> *priv, int mode) > >> { > >>int i; > >> > >>for (i = 0; i < 3; i++) > >>if (priv->modes[i].mode == mode) > >>return &priv->modes[i]; > >> > >>return NULL; > >> } > > > > Far too large to inline, has five callsites. > > Currently CodingStyle states to use inline where you might have otherwise > used a macro, and then later if the function is not overly complex (citing "3 > lines" as a guideline). Is this too long because it has a for loop in it? > Or a loop and a branch? Anything more than 10-20 instructions turns out to be too large. > Removing static inline from the functions declared in header files means they > need to be moved to .c files, declared as extern, and pollute the namespace. > In prior drivers we had been beaten up about doing that... You were mis-beaten-up. Choose an appropriate namespace (iwl_* sounds OK), stick to it and you'll be fine. > >> #define WLAN_FC_GET_TYPE(fc)(((fc) & IEEE80211_FCTL_FTYPE)) > >> #define WLAN_FC_GET_STYPE(fc) (((fc) & IEEE80211_FCTL_STYPE)) > >> #define WLAN_GET_SEQ_FRAG(seq) ((seq) & 0x000f) > >> #define WLAN_GET_SEQ_SEQ(seq) ((seq) >> 4) > > > > These don't need to be macros > > What would you like these to be? /* * comment goes here */ static inline suitable_return_type wlan_fc_get_type(whatever_type_fc_has fc) { return fc & IEEE80211_FCTL_FTYPE; } > These currently exist as macros in ieee80211.h and there are other examples > in the kernel of similar macros. If a goal is to remove *all macros* then > that should be stated in CodingStyle, preferably in a way that helps > developers understand how they are supposed to write their code. These things come up again and again in lkml code-review. Could be that CodingStyle doesn't cover everything. Common sense and taste applies too. > >> #define QOS_CONTROL_LEN 2 > >> > >> static inline u16 *ieee80211_get_qos_ctrl(struct ieee80211_hdr *hdr) > >> { > >>int hdr_len = ieee80211_get_hdrlen(hdr->frame_control); > >>if (hdr->frame_control & IEEE80211_STYPE_QOS_DATA) > >>return (u16 *) ((u8 *) hdr + (hdr_len) - QOS_CONTROL_LEN); > >>return NULL; > >> } > > > > Two callsites, too large to inline. > > If something is used more than once then it is unsuitable for an inline? > Again maybe updating CodingStyle would be helpful? > > Change: > "Generally, inline functions are preferable to macros resembling > functions." > > To: > "Macros resembling functions and inline functions should *NOT* be used." > > IIRC, ieee80211_get_qos_ctrl used to be a macros, and was then changed to an > inline per CodingStyle. Now we don't want inlines, and instead want pure > functions (and consequently polluted namespace--or do we want to add to > CodingStyle that all function
Re: [PATCH] NET: Multiqueue network device support.
On Wed, 2007-06-06 at 15:35 -0700, David Miller wrote: > From: jamal <[EMAIL PROTECTED]> > Date: Wed, 06 Jun 2007 18:13:40 -0400 > There are other reasons to do interesting things in this area, > purely for parallelization reasons. > > For example, consider a chip that has N totally independant TX packet > queues going out to the same ethernet port. You can lock and transmit > on them independantly, and the chip internally arbitrates using DRR or > whatever to blast the queues out to the physical port in some fair'ish > manner. > > In that case you'd want to be able to do something like: > > struct mydev_tx_queue *q = &mydev->tx_q[smp_processor_id() % N]; > > or similar in the ->hard_start_xmit() driver. But something generic > to support this kind of parallelization would be great (and necessary) > because the TX lock is unary per netdev and destroys all of the > parallelization possible with something like the above. > I cant think of any egress scheduler that will benefit from that approach. The scheduler is the decider of which packet goes out next on the wire. > With the above for transmit, and having N "struct napi_struct" > instances for MSI-X directed RX queues, we'll have no problem keeping > a 10gbit (or even faster) port completely full with lots of cpu to > spare on multi-core boxes. > RX queues - yes, I can see; TX queues, it doesnt make sense to put different rings on different CPUs. > However, I have to disagree with your analysis of the multi-qdisc > situation, and I tend to agree with Patrick. > If you only have one qdisc to indicate status on, when is the queue > full? That is the core issue. I just described why it is not an issue. If you make the assumption it is an issue, then it becomes one. > Indicating full status when any of > the hardware queues are full is broken, because we should never > block out queuing of higher priority packets just because the > low priority queue can't take any more frames, _and_ vice versa. Dave, you didnt read anything i said ;-> The situation you describe is impossible. low prio will never block high prio. > I really want to believe your proofs but they are something out of > a fairy tale :-) They are a lot real than it seems. Please read again what i typed in ;-> And i will produce patches since this seems to be complex to explain. > > The only way PHL will ever shutdown the path to the hardware is when > > there are sufficient PHL packets. > > Corrollary, > > The only way PSL will ever shutdown the path to the hardware is when > > there are _NO_ PSH packets. > > The problem with this line of thinking is that it ignores the fact > that it is bad to not queue to the device when there is space > available, _even_ for lower priority packets. So use a different scheduler. Dont use strict prio. Strict prio will guarantee starvation of low prio packets as long as there are high prio packets. Thats the intent. > The more you keep all available TX queues full, the less likely > delays in CPU processing will lead to a device with nothing to > do. It is design intent - thats how the specific scheduler works. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/3] qla3xxx: cleanup checksum offload code
Jeff, You've already appled patch 1/3 in Stephen's series. I tested this one (2/3) and it looks good. I can resubmit this if you want. Regards, Ron Mercer > -Original Message- > From: Stephen Hemminger [mailto:[EMAIL PROTECTED] > Sent: Wednesday, May 30, 2007 2:23 PM > To: Jeff Garzik; Linux Driver > Cc: netdev@vger.kernel.org > Subject: [PATCH 2/3] qla3xxx: cleanup checksum offload code > > The code for checksum is more complex than needed when > dealing with VLAN's; > the higher layers already pass down the location of the IP header. > > Compile tested only, no hardware available. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > --- > drivers/net/qla3xxx.c | 37 +++-- > 1 file changed, 11 insertions(+), 26 deletions(-) > > --- a/drivers/net/qla3xxx.c 2007-05-30 14:09:25.0 -0700 > +++ b/drivers/net/qla3xxx.c 2007-05-30 14:09:31.0 -0700 > @@ -2433,37 +2433,22 @@ static int ql_get_seg_count(struct ql3_a > return -1; > } > > -static void ql_hw_csum_setup(struct sk_buff *skb, > +static void ql_hw_csum_setup(const struct sk_buff *skb, >struct ob_mac_iocb_req *mac_iocb_ptr) > { > - struct ethhdr *eth; > - struct iphdr *ip = NULL; > - u8 offset = ETH_HLEN; > - > - eth = (struct ethhdr *)(skb->data); > - > - if (eth->h_proto == __constant_htons(ETH_P_IP)) { > - ip = (struct iphdr *)&skb->data[ETH_HLEN]; > - } else if (eth->h_proto == htons(ETH_P_8021Q) && > -((struct vlan_ethhdr *)skb->data)-> > -h_vlan_encapsulated_proto == > __constant_htons(ETH_P_IP)) { > - ip = (struct iphdr *)&skb->data[VLAN_ETH_HLEN]; > - offset = VLAN_ETH_HLEN; > - } > - > - if (ip) { > - if (ip->protocol == IPPROTO_TCP) { > - mac_iocb_ptr->flags1 |= > OB_3032MAC_IOCB_REQ_TC | > + const struct iphdr *ip = ip_hdr(skb); > + > + mac_iocb_ptr->ip_hdr_off = skb_network_offset(skb); > + mac_iocb_ptr->ip_hdr_len = ip->ihl; > + > + if (ip->protocol == IPPROTO_TCP) { > + mac_iocb_ptr->flags1 |= OB_3032MAC_IOCB_REQ_TC | > OB_3032MAC_IOCB_REQ_IC; > - mac_iocb_ptr->ip_hdr_off = offset; > - mac_iocb_ptr->ip_hdr_len = ip->ihl; > - } else if (ip->protocol == IPPROTO_UDP) { > - mac_iocb_ptr->flags1 |= > OB_3032MAC_IOCB_REQ_UC | > + } else { > + mac_iocb_ptr->flags1 |= OB_3032MAC_IOCB_REQ_UC | > OB_3032MAC_IOCB_REQ_IC; > - mac_iocb_ptr->ip_hdr_off = offset; > - mac_iocb_ptr->ip_hdr_len = ip->ihl; > - } > } > + > } > > /* > > -- > Stephen Hemminger <[EMAIL PROTECTED]> > > - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] NET: Multiqueue network device support.
> While I am growing in support of your changes, there are two things: > > 1) I want to study them more and hear more about what Patrick has to >say about them when he returns from his trip on Sunday > > 2) I don't want to open up a net-2.6.23 tree yet so that people >concentrate on bug fixes and regressions, pick an open bug or >regression report and help out if you want net-2.6.23 cut faster >:-) Where can I find such reports? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]> Date: Wed, 6 Jun 2007 15:57:35 -0700 > Can we move forward on this please? If you are comfortable > that my patches give the kernel the ability to manage hardware > queues sufficiently, I'd like to request that 2.6.23 be opened (wink > wink) so I can submit the patches for inclusion to that kernel. While I am growing in support of your changes, there are two things: 1) I want to study them more and hear more about what Patrick has to say about them when he returns from his trip on Sunday 2) I don't want to open up a net-2.6.23 tree yet so that people concentrate on bug fixes and regressions, pick an open bug or regression report and help out if you want net-2.6.23 cut faster :-) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] NET: Multiqueue network device support.
> However, I have to disagree with your analysis of the > multi-qdisc situation, and I tend to agree with Patrick. > > If you only have one qdisc to indicate status on, when is the > queue full? That is the core issue. Indicating full status > when any of the hardware queues are full is broken, because > we should never block out queuing of higher priority packets > just because the low priority queue can't take any more > frames, _and_ vice versa. > > I really want to believe your proofs but they are something > out of a fairy tale :-) Dave, Can we move forward on this please? If you are comfortable that my patches give the kernel the ability to manage hardware queues sufficiently, I'd like to request that 2.6.23 be opened (wink wink) so I can submit the patches for inclusion to that kernel. Thanks, -PJ Waskiewicz [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
Andrew Morton wrote: > i386 allmodconfig isn't that hard, guys. > ... > drivers/net/wireless/mac80211/iwlwifi/iwl-4965.c:2050: warning: left shift > count >= width of type My mistake. I ran that and even fixed the warning at one point... anyway, I've committed a patch to our tree to fix the code and also added comments to the iwl_{get,set}_bits code. >> /* >> * make C=2 CF=-Wall will complain if you use ARRAY_SIZE on global data >> */ >> #define GLOBAL_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) > > This is identical to ARRAY_SIZE. > > And if there's some problem with ARRAY_SIZE then fix ARRAY_SIZE! Don't go > off and create some private thing and leave everyone else twisting in the > wind. The code was resolving the sparse warnings. GLOBAL_ARRAY_SIZE removes the part of the ARRAY_SIZE definition that causes sparse to complain ('+ __must_be_array(arr)'). I don't know enough about how sparse works to fix sparse, or to know if its a problem with __must_be_array. The code itself was fine -- using an array with ARRAY_SIZE. >> /* Debug and printf string expansion helpers for printing bitfields */ >> #define BIT_FMT8 "%c%c%c%c-%c%c%c%c" >> #define BIT_FMT16 BIT_FMT8 ":" BIT_FMT8 >> #define BIT_FMT32 BIT_FMT16 " " BIT_FMT16 ... >> #define BIT_ARG32(x) \ >> BITC(x,31),BITC(x,30),BITC(x,29),BITC(x,28),\ >> BITC(x,27),BITC(x,26),BITC(x,25),BITC(x,24),\ >> BITC(x,23),BITC(x,22),BITC(x,21),BITC(x,20),\ >> BITC(x,19),BITC(x,18),BITC(x,17),BITC(x,16),\ >> BIT_ARG16(x) > > None of the above is appropriate to a driver-private header. Where would be the appropriate place? We use it in with iwlwifi; I don't know if others need it or want to use it. Do you know of other projects using something similar? >> #define KELVIN_TO_CELSIUS(x) ((x)-273) > > Nor is that. Where would the appropriate place be? >> static inline const struct ieee80211_hw_mode *iwl_get_hw_mode(struct iwl_priv >>*priv, int mode) >> { >> int i; >> >> for (i = 0; i < 3; i++) >> if (priv->modes[i].mode == mode) >> return &priv->modes[i]; >> >> return NULL; >> } > > Far too large to inline, has five callsites. Currently CodingStyle states to use inline where you might have otherwise used a macro, and then later if the function is not overly complex (citing "3 lines" as a guideline). Is this too long because it has a for loop in it? Or a loop and a branch? Removing static inline from the functions declared in header files means they need to be moved to .c files, declared as extern, and pollute the namespace. In prior drivers we had been beaten up about doing that... >> #define WLAN_FC_GET_TYPE(fc)(((fc) & IEEE80211_FCTL_FTYPE)) >> #define WLAN_FC_GET_STYPE(fc) (((fc) & IEEE80211_FCTL_STYPE)) >> #define WLAN_GET_SEQ_FRAG(seq) ((seq) & 0x000f) >> #define WLAN_GET_SEQ_SEQ(seq) ((seq) >> 4) > > These don't need to be macros What would you like these to be? These currently exist as macros in ieee80211.h and there are other examples in the kernel of similar macros. If a goal is to remove *all macros* then that should be stated in CodingStyle, preferably in a way that helps developers understand how they are supposed to write their code. >> #define QOS_CONTROL_LEN 2 >> >> static inline u16 *ieee80211_get_qos_ctrl(struct ieee80211_hdr *hdr) >> { >> int hdr_len = ieee80211_get_hdrlen(hdr->frame_control); >> if (hdr->frame_control & IEEE80211_STYPE_QOS_DATA) >> return (u16 *) ((u8 *) hdr + (hdr_len) - QOS_CONTROL_LEN); >> return NULL; >> } > > Two callsites, too large to inline. If something is used more than once then it is unsuitable for an inline? Again maybe updating CodingStyle would be helpful? Change: "Generally, inline functions are preferable to macros resembling functions." To: "Macros resembling functions and inline functions should *NOT* be used." IIRC, ieee80211_get_qos_ctrl used to be a macros, and was then changed to an inline per CodingStyle. Now we don't want inlines, and instead want pure functions (and consequently polluted namespace--or do we want to add to CodingStyle that all functions should be implemented in a single file and marked static?) "A reasonable rule of thumb is to not put inline at functions that have more than 3 lines of code in them. " An assignment and a branch return isn't overly complex; it does call another function... is that the problem? If callsites is a key indicator of when to and not to use inline, please add that to CodingStyle. We look at other drivers and headers, we see what they do, and we replicate. We read CodingStyle and use it as a guideline. ... >> #define ieee80211_is_reassoc_response(fc) \ >>((WLAN_FC_GET_TYPE(fc) == IEEE80211_FTYPE_MGMT) && \ >>(WLAN_FC_GET_STYPE(fc) == IEEE80211_STYPE_REASSOC_RESP)) > > None of the above should be macros. So inline wit
Re: [PATCH] NET: Multiqueue network device support.
From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]> Date: Wed, 6 Jun 2007 15:30:39 -0700 > > [Which of course leads to the complexity (and not optimizing > > for the common - which is single ring NICs)]. > > The common for 100 Mbit and older 1Gbit is single ring NICs. Newer > PCI-X and PCIe NICs from 1Gbit to 10Gbit support multiple rings in the > hardware, and it's all headed in that direction, so it's becoming the > common case. I totally agree. No modern commodity 1gb and faster card is going to be without many queues on both TX and RX. > > Iam actually not against the subqueue control - i know Peter > > needs it for certain hardware; i am just against the mucking > > around of the common case (single ring NIC) just to get that working. > > Single-ring NICs see no difference here. Please explain why using my > patches with pfifo_fast, sch_prio, or any other existing qdisc will > change the behavior for single-ring NICs? I agree with the implication here, there is no penalty for existing devices. There are two core issues in my mind: 1) multi-queue on both RX and TX is going to be very pervasive very soon, everyone is putting this into silicon. The parallelization gain potential is enormous, and we have to design for this. 2) Queues are meant to be filled as much as possible, you can't do that by having only one qdisc attached to the device indicating unary full status, you simply can't. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
From: jamal <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 18:13:40 -0400 > From the above you can see they are simple. I am working on a couple of > things (batching and recovering pktgen ipsec patches)- I will work on > those patches soon after. > > Iam actually not against the subqueue control - i know Peter needs it > for certain hardware; i am just against the mucking around of the common > case (single ring NIC) just to get that working. There are other reasons to do interesting things in this area, purely for parallelization reasons. For example, consider a chip that has N totally independant TX packet queues going out to the same ethernet port. You can lock and transmit on them independantly, and the chip internally arbitrates using DRR or whatever to blast the queues out to the physical port in some fair'ish manner. In that case you'd want to be able to do something like: struct mydev_tx_queue *q = &mydev->tx_q[smp_processor_id() % N]; or similar in the ->hard_start_xmit() driver. But something generic to support this kind of parallelization would be great (and necessary) because the TX lock is unary per netdev and destroys all of the parallelization possible with something like the above. With the above for transmit, and having N "struct napi_struct" instances for MSI-X directed RX queues, we'll have no problem keeping a 10gbit (or even faster) port completely full with lots of cpu to spare on multi-core boxes. However, I have to disagree with your analysis of the multi-qdisc situation, and I tend to agree with Patrick. If you only have one qdisc to indicate status on, when is the queue full? That is the core issue. Indicating full status when any of the hardware queues are full is broken, because we should never block out queuing of higher priority packets just because the low priority queue can't take any more frames, _and_ vice versa. I really want to believe your proofs but they are something out of a fairy tale :-) > The only way PHL will ever shutdown the path to the hardware is when > there are sufficient PHL packets. > Corrollary, > The only way PSL will ever shutdown the path to the hardware is when > there are _NO_ PSH packets. The problem with this line of thinking is that it ignores the fact that it is bad to not queue to the device when there is space available, _even_ for lower priority packets. The more you keep all available TX queues full, the less likely delays in CPU processing will lead to a device with nothing to do. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] NET: Multiqueue network device support.
> [Which of course leads to the complexity (and not optimizing > for the common - which is single ring NICs)]. The common for 100 Mbit and older 1Gbit is single ring NICs. Newer PCI-X and PCIe NICs from 1Gbit to 10Gbit support multiple rings in the hardware, and it's all headed in that direction, so it's becoming the common case. > Infact for the wired case i see little value (there is some) > in using multiple rings. In the case of wireless (which is > strict prio based) it provides more value. There is value, hence why NIC manufacturers are building wired parts that support multiple rings today. And wireless may not want strict prio in software, and may just want round-robin from the stack. Either way, Yi Zhu has represented the wireless side in this discussion agreeing with these per-queue control patches. Is wireless not a common case to be considered? > > I would love to see your alternative patches. > > >From the above you can see they are simple. The description above won't provide what I'm trying to solve and what wireless has stated they want. > Iam actually not against the subqueue control - i know Peter > needs it for certain hardware; i am just against the mucking > around of the common case (single ring NIC) just to get that working. Single-ring NICs see no difference here. Please explain why using my patches with pfifo_fast, sch_prio, or any other existing qdisc will change the behavior for single-ring NICs? If the driver doesn't call alloc_etherdev_mq() explicity, or use the new sch_rr qdisc, then the Tx path is identical to the kernel today. What am I mucking around with? And these patches are not for specific hardware; rather they're for all the NICs today that have multiple rings, and want to control them in the OS instead of the driver, which is most of wireless and a handful of NICs from Intel and Neterion afaik. -PJ Waskiewicz - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
On Wed, 2007-06-06 at 17:11 +0200, Patrick McHardy wrote: > I haven't followed the entire discussion, but I still don't see a > alternative to touching the qdisc layer - multiple hardware queues > need multiple queue states if you want to avoid a busy hardware > queue stopping the qdisc entirely If you start with the above premise then ... > and thereby preventing the qdisc > to continue feeding packets to other active HW queues. And to make > use of the multiple queue states you need multiple queues. you will logically lead to the above conclusion. [Which of course leads to the complexity (and not optimizing for the common - which is single ring NICs)]. The problem is the premise is _innacurate_. Since you havent followed the discussion, i will try to be brief (which is hard). If you want verbosity it is in my previous emails: Consider a simple example of strict prio qdisc which is mirror configuration of a specific hardware. Then for sake of discussion, assume two prio queues in the qdisc - PSL and PSH and two hardware queues/rings in a NIC which does strict prio with queues PHL and PHH. The mapping is as follows: PSL --- maps to --- PHL PSH --- maps to --- PHH Assume the PxH has a higher prio than PxL. Strict prio will always favor H over L. Two scenarios: a) a lot of packets for PSL arriving on the stack. They only get sent from PSL -> PHL if and only if there are no packets from PSH->PHH. b)a lot of packets for PSH arriving from the stack. They will always be favored over PSL in sending to the hardware. >From the above: The only way PHL will ever shutdown the path to the hardware is when there are sufficient PHL packets. Corrollary, The only way PSL will ever shutdown the path to the hardware is when there are _NO_ PSH packets. So there is no need to do per queue control because the schedule will ensure things work out fine as long as what you have the correct qdisc; and it is a qdisc that will work just fine with a single ring with zero mods. What you need is a driver API to ask it to select the ring given an index. This is similar to the qdisc filter used to select a queue. You can extend the use case i described above to N queues. You can extend it to other schedulers (WRR or any non-work conserving queues) etc. It is consistent. Of course if you configure CBQ for a hardware that does strict prio - that is a misconfig etc. Infact for the wired case i see little value (there is some) in using multiple rings. In the case of wireless (which is strict prio based) it provides more value. > I would love to see your alternative patches. >From the above you can see they are simple. I am working on a couple of things (batching and recovering pktgen ipsec patches)- I will work on those patches soon after. Iam actually not against the subqueue control - i know Peter needs it for certain hardware; i am just against the mucking around of the common case (single ring NIC) just to get that working. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] b44: timer power saving
> I don't think we need +1, if you need to fire immediately > (on the next tick). The timer core will always fire > timers that are in the past immediately. Thanks, but is it worth bothering to change it again? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [PATCH] Virtual ethernet tunnel
From: Daniel Lezcano <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 22:38:11 +0200 > Perhaps, a name like "epipe" or "npipe", which reflects what does the > device, is more appropriate ? 'npipe' (Network PIPE) or 'epipe' (Ethernet PIPE) are fine with me. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Devel] Re: [PATCH] Virtual ethernet tunnel
David Miller wrote: From: Pavel Emelianov <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 19:11:38 +0400 Veth stands for Virtual ETHernet. It is a simple tunnel driver that works at the link layer and looks like a pair of ethernet devices interconnected with each other. I would suggest choosing a different name. 'veth' is also the name of the virtualized ethernet device found on IBM machines, driven by driver/net/ibmveth.[ch] Eric Biederman proposed the name "etun" which stands for "Ethernet TUNnel". The goal of the pair device is to pass network packets between namespaces. That reminds me the well known "pipe" which pass data between processes. All network devices have a name describing what they do : "eth" for ethernet, "dummy" for a device doing nothing, loopback, ... Perhaps, a name like "epipe" or "npipe", which reflects what does the device, is more appropriate ? -- Daniel - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PCNET32] Lock solid with netconsole
Emmanuel Fusté <[EMAIL PROTECTED]> : [...] > Did you plan to submit the "spin_lock_irqsave" patch to mainline ? No. I will not have enough time to figure why/if it fixes things. -- Ueimor - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Virtual ethernet tunnel
From: Pavel Emelianov <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 19:11:38 +0400 > Veth stands for Virtual ETHernet. It is a simple tunnel driver > that works at the link layer and looks like a pair of ethernet > devices interconnected with each other. I would suggest choosing a different name. 'veth' is also the name of the virtualized ethernet device found on IBM machines, driven by driver/net/ibmveth.[ch] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] NetXen: Fix link status messages
On Sun, 2007-06-03 at 11:51 -0400, Jeff Garzik wrote: > Mithlesh Thukral wrote: > > NetXen: Correct link status messages. > > This patch will fix the problem of wrong link status messages > > that were reported. > > > > Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > > Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > > --- > > > > drivers/net/netxen/netxen_nic.h |1 + > > drivers/net/netxen/netxen_nic_init.c | 21 + > > drivers/net/netxen/netxen_nic_isr.c | 23 +++ > > 3 files changed, 37 insertions(+), 8 deletions(-) > > dropped due to previous patches being dropped Hi Jeff, Thanks for your review the patch! We are going to re-submit the patch with correct problem description. Thanks, wendy - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Hello! > Good point, I didn't think of that. Is there a version of this patch > that already uses different namespaces so I can look at it? Pavel does not like the idea. It looks "not exactly pretty", like you said. :-) The alternative is to create pair in main namespace and then move one end to another namespace renaming it and changing index. Why I do not like it? Because this makes RTM_NEWLINK just useless step, all its work is undone and real work is remade when the device moves, with all the unrettiness moved to another place. >From another hand, some move operation is required in any case. Right now in openvz the problem is solved in tricky, but quite inerseting way: all the devices in main namespace are assigned with odd index, child devices get odd index. So that, when a device moves from main namespace to child, openvz does not need to change ifindex, conflict is impossible. Well, it is working approach. But it is not pretty either. > Are network namespace completely seperated or is there some hierarchy > with all lower namespaces visible above or something like that? Right now they are completely separate. It is possible to make child devices visible in parent namespace like it is done for process pids: i.e. there is an abstract identity which is seen under different names and indices in different namespaces. Sounds cool, but this add a lot of complexity, which has no meaning outside of context of device creation, I do not think it is worth to do. > The identity of the main device has no meaning within a different > namespace, but are there other reasons for hiding it? Perhaps, security. It is not a good idea to leak information about parent namespace to child namespace. Also, people will want to see emulated ethernet inside namespace looking exactly like ethernet. No freaking additional attributes. Alexey - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] b44: timer power saving
On Monday 04 June 2007 22:25:37 Stephen Hemminger wrote: > Make the PHY and statistic timer run on one second boundary > for powersaving. > > On resume, the driver should check for link up immediately, to > get online faster (rather than waiting for the next second). > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > > --- > drivers/net/b44.c |9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > --- a/drivers/net/b44.c 2007-06-04 12:31:27.0 -0700 > +++ b/drivers/net/b44.c 2007-06-04 12:31:34.0 -0700 > @@ -599,8 +599,7 @@ static void b44_timer(unsigned long __op > > spin_unlock_irq(&bp->lock); > > - bp->timer.expires = jiffies + HZ; > - add_timer(&bp->timer); > + mod_timer(&bp->timer, round_jiffies(jiffies + HZ)); > } > > static void b44_tx(struct b44 *bp) > @@ -2348,11 +2347,11 @@ static int b44_resume(struct pci_dev *pd > netif_device_attach(bp->dev); > spin_unlock_irq(&bp->lock); > > - bp->timer.expires = jiffies + HZ; > - add_timer(&bp->timer); > - > b44_enable_ints(bp); > netif_wake_queue(dev); > + > + mod_timer(&bp->timer, jiffies + 1); I don't think we need +1, if you need to fire immediately (on the next tick). The timer core will always fire timers that are in the past immediately. -- Greetings Michael. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] NetXen: Fix ping issue
You'll need to resend the patch... - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv
This patch fixes a possible kernel panic due to not checking the vlan group when processing received VLAN packets and a malfunction in VLAN/hypervisor registration. Signed-off-by: Thomas Klein <[EMAIL PROTECTED]> --- diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea.h patched_kernel/drivers/net/ehea/ehea.h --- linux-2.6.22-rc4/drivers/net/ehea/ehea.h2007-06-05 02:57:25.0 +0200 +++ patched_kernel/drivers/net/ehea/ehea.h 2007-06-06 12:53:58.0 +0200 @@ -39,7 +39,7 @@ #include #define DRV_NAME "ehea" -#define DRV_VERSION"EHEA_0061" +#define DRV_VERSION"EHEA_0064" #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \ | NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR) diff -Nurp -X dontdiff linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c --- linux-2.6.22-rc4/drivers/net/ehea/ehea_main.c 2007-06-05 02:57:25.0 +0200 +++ patched_kernel/drivers/net/ehea/ehea_main.c 2007-06-06 12:53:58.0 +0200 @@ -451,7 +451,8 @@ static struct ehea_cqe *ehea_proc_rwqes( processed_rq3++; } - if (cqe->status & EHEA_CQE_VLAN_TAG_XTRACT) + if ((cqe->status & EHEA_CQE_VLAN_TAG_XTRACT) + && port->vgrp) vlan_hwaccel_receive_skb(skb, port->vgrp, cqe->vlan_tag); else @@ -1910,10 +1911,7 @@ static void ehea_vlan_rx_register(struct goto out; } - if (grp) - memset(cb1->vlan_filter, 0, sizeof(cb1->vlan_filter)); - else - memset(cb1->vlan_filter, 0xFF, sizeof(cb1->vlan_filter)); + memset(cb1->vlan_filter, 0, sizeof(cb1->vlan_filter)); hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id, H_PORT_CB1, H_PORT_CB1_ALL, cb1); @@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct } index = (vid / 64); - cb1->vlan_filter[index] |= ((u64)(1 << (vid & 0x3F))); + cb1->vlan_filter[index] |= ((u64)(0x8000 >> (vid & 0x3F))); hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id, H_PORT_CB1, H_PORT_CB1_ALL, cb1); @@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct } index = (vid / 64); - cb1->vlan_filter[index] &= ~((u64)(1 << (vid & 0x3F))); + cb1->vlan_filter[index] &= ~((u64)(0x8000 >> (vid & 0x3F))); hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id, H_PORT_CB1, H_PORT_CB1_ALL, cb1); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros
On Wed, 2007-06-06 at 14:05 -0400, Jeff Garzik wrote: > On Wed, Jun 06, 2007 at 12:52:44PM -0500, wendy xiong wrote: > > On Sun, 2007-06-03 at 11:50 -0400, Jeff Garzik wrote: > > > Mithlesh Thukral wrote: > > > > NetXen: Add NETXEN prefix to a macro > > > > This patch will add the "NETXEN" prefix to "USER_START" macro. > > > > > > > > Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > > > > Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > > > > --- > > > > > > > > drivers/net/netxen/netxen_nic.h |4 ++-- > > > > drivers/net/netxen/netxen_nic_ethtool.c |2 +- > > > > drivers/net/netxen/netxen_nic_hw.c |4 ++-- > > > > drivers/net/netxen/netxen_nic_init.c|4 ++-- > > > > 4 files changed, 7 insertions(+), 7 deletions(-) > > > > > > Your patch description is useless. Clearly we know -what- it does, > > > simply by reading the patch. > > > > > > But it does not answer the simple question: why? why is this needed in > > > a bug fix Release Candidate series? > > > > Hi Jeff, > > > > When we backported netxen driver to 2.6.9 kernel on ppc, we saw the > > compile errors for "USER_START". Because there is another definition for > > USER_START which is used by arch/ppc64/mm/hash_utils.c and native.c. > > > > So we like to add NETXEN before USER_START in netxen device driver, then > > netxen driver can be compiled correctly in 2.6.9 kernel. > > > > Let me know if you any question for this patch. > > That answers my questions, but doesn't address the issue: You need > to regenerate the patch with the description improved as described. > > The patch description is copied verbatim into the kernel changelog, and > preserved verbatim for all eternity. Four years from now, we need to be > able to read the patch description and understand -why- the NETXEN_xxx > prefix was added. Simply stating that the NETXEN_xxx prefix was added > is useless, as it is self-evident from reading the patch itself. > > Jeff > > > Jeff, Thanks. We will re-generate the patch with correct issue title and description. Thanks, wendy - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] NetXen: Fix ping issue
On Sun, 2007-06-03 at 11:51 -0400, Jeff Garzik wrote: > Mithlesh Thukral wrote: > > NetXen: Fix initialization and subsequent ping issue > > This patch will fix the initialization and ping issues seen on > > certain PPC architecture blades. > > > > Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > > Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > > --- > > > > drivers/net/netxen/netxen_nic_main.c |7 +++ > > drivers/net/netxen/netxen_nic_niu.c |8 ++-- > > 2 files changed, 9 insertions(+), 6 deletions(-) > > Again, your patch description is useless. > > You should describe the problem being fixed, and how/why the changes > seen in the patch actually fix the issue. > Hi Jeff, Ping problem description: After we moved up netxen adapter's firmware to 3.4.19, we saw this ping problem on x/pBlade. After configured interface up, ping -c 1 10.10.10.10 failed. Netxen adapter couldn't accept ARP broadcast packet somehow. If I manually added MAC address in the ARP table, then ping start working. netxen adapter should finish initilization after system boot. But on some platform, looks netxen adapter didn't initilization correctly after system boot up, so have to re-load the firmware again in probe routine. Also re-initilization netxen_config_0 and netxen_config_1 registers. Let me know if you have any question for this patch. Thanks Wendy Xiong - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros
On Wed, Jun 06, 2007 at 12:52:44PM -0500, wendy xiong wrote: > On Sun, 2007-06-03 at 11:50 -0400, Jeff Garzik wrote: > > Mithlesh Thukral wrote: > > > NetXen: Add NETXEN prefix to a macro > > > This patch will add the "NETXEN" prefix to "USER_START" macro. > > > > > > Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > > > Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > > > --- > > > > > > drivers/net/netxen/netxen_nic.h |4 ++-- > > > drivers/net/netxen/netxen_nic_ethtool.c |2 +- > > > drivers/net/netxen/netxen_nic_hw.c |4 ++-- > > > drivers/net/netxen/netxen_nic_init.c|4 ++-- > > > 4 files changed, 7 insertions(+), 7 deletions(-) > > > > Your patch description is useless. Clearly we know -what- it does, > > simply by reading the patch. > > > > But it does not answer the simple question: why? why is this needed in > > a bug fix Release Candidate series? > > Hi Jeff, > > When we backported netxen driver to 2.6.9 kernel on ppc, we saw the > compile errors for "USER_START". Because there is another definition for > USER_START which is used by arch/ppc64/mm/hash_utils.c and native.c. > > So we like to add NETXEN before USER_START in netxen device driver, then > netxen driver can be compiled correctly in 2.6.9 kernel. > > Let me know if you any question for this patch. That answers my questions, but doesn't address the issue: You need to regenerate the patch with the description improved as described. The patch description is copied verbatim into the kernel changelog, and preserved verbatim for all eternity. Four years from now, we need to be able to read the patch description and understand -why- the NETXEN_xxx prefix was added. Simply stating that the NETXEN_xxx prefix was added is useless, as it is self-evident from reading the patch itself. Jeff - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 04/09]: Link creation API
Patrick McHardy <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >>>+if (linkinfo[IFLA_INFO_NAME]) { >>>+nla_strlcpy(name, linkinfo[IFLA_INFO_NAME], sizeof(name)); >>>+ops = rtnl_link_ops_get(name); >> >> >> Ugh. Shouldn't we have the request_module logic here? >> Otherwise it looks like we can skip the validate method and >> have other weird interactions. > > > Good catch. The easiest solution seems be to simply replay the > request after successful module load, which also avoids the > device lookup race. Looks reasonable to me. There might be a variable or two that we need to make certain is initialized but at a quick glance it looks ok. Eric > Something like this (untested). > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > index 8d2f817..f2868b0 100644 > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -930,6 +930,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct > nlmsghdr > *nlh, void *arg) > struct nlattr *linkinfo[IFLA_INFO_MAX+1]; > int err; > > +replay: > err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy); > if (err < 0) > return err; > @@ -1012,19 +1013,19 @@ static int rtnl_newlink(struct sk_buff *skb, struct > nlmsghdr *nlh, void *arg) > if (tb[IFLA_ADDRESS] || tb[IFLA_BROADCAST] || tb[IFLA_MAP]) > return -EOPNOTSUPP; > > + if (!ops) { > #ifdef CONFIG_KMOD > - if (!ops && kind[0]) { > - /* race condition: device may be created while rtnl is > - * unlocked, final register_netdevice will catch it. > - */ > - __rtnl_unlock(); > - request_module("rtnl-link-%s", kind); > - rtnl_lock(); > - ops = rtnl_link_ops_get(kind); > - } > + if (kind[0]) { > + __rtnl_unlock(); > + request_module("rtnl-link-%s", kind); > + rtnl_lock(); > + ops = rtnl_link_ops_get(kind); > + if (ops) > + goto replay; > + } > #endif > - if (!ops) > return -EOPNOTSUPP; > + } > > if (!ifname[0]) > snprintf(ifname, IFNAMSIZ, "%s%%d", ops->kind); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros
On Sun, 2007-06-03 at 11:50 -0400, Jeff Garzik wrote: > Mithlesh Thukral wrote: > > NetXen: Add NETXEN prefix to a macro > > This patch will add the "NETXEN" prefix to "USER_START" macro. > > > > Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > > Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > > --- > > > > drivers/net/netxen/netxen_nic.h |4 ++-- > > drivers/net/netxen/netxen_nic_ethtool.c |2 +- > > drivers/net/netxen/netxen_nic_hw.c |4 ++-- > > drivers/net/netxen/netxen_nic_init.c|4 ++-- > > 4 files changed, 7 insertions(+), 7 deletions(-) > > Your patch description is useless. Clearly we know -what- it does, > simply by reading the patch. > > But it does not answer the simple question: why? why is this needed in > a bug fix Release Candidate series? Hi Jeff, When we backported netxen driver to 2.6.9 kernel on ppc, we saw the compile errors for "USER_START". Because there is another definition for USER_START which is used by arch/ppc64/mm/hash_utils.c and native.c. So we like to add NETXEN before USER_START in netxen device driver, then netxen driver can be compiled correctly in 2.6.9 kernel. Let me know if you any question for this patch. Thanks, wendy Xiong - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy <[EMAIL PROTECTED]> writes: > Alexey Kuznetsov wrote: >>> I just suggested to >>>Pavel to create only a single device per newlink operation and binding >>>them later, >> >> >> I see some logical inconsistency here. >> >> Look, the second end is supposed to be in another namespace. >> It will have identity, which cannot be expressed in any way in namespace, >> which is allowed to create the pair: name, ifindex - nothing >> is shared between namespaces. > > > Good point, I didn't think of that. Is there a version of this patch > that already uses different namespaces so I can look at it? We have posted patches a couple of times, and veth or etun were always a part of it. But except for some book keeping details we really don't care. > Are network namespace completely seperated or is there some hierarchy > with all lower namespaces visible above or something like that? Completely separated. The goal is to look like two separate machines to user space, with respect to the network stack. There is a bit of a hierarchy usage wise. Because frequently only one namespace will have real hardware devices in it. So everything needs to route through there. But that detail is a usage detail and is easiest not to reflect in the actual implementation. > I imagined it more as a bind operation, pretty similar to enslave, so > it would only contain an ifindex, no parameters. But as you say that > doesn't work, so I guess we'd have to nest an entire ifinfomsg + the > attributes for the partner device under it .. not exactly pretty. In the model I'm working in, is that there is a separate operation: "move device to other namespace", which should work for any network device. So there should be an interval immediately after device creation when both devices are in the same namespace, and then one of the pair is moved to another namespace. >> As response to this action two replies are generated: one RTM_NEWLINK >> for one end of device with the whole desciption of partnet >> is broadcasted inside this namespace, another RTM_NETLINK with index/name >> of partner device is broadcasted inside the second namespace >> (and, probably, some attributes, which must be hidden inside namespace, >> f.e. identity of main device is suppressed). > > > The identity of the main device has no meaning within a different > namespace, but are there other reasons for hiding it? Not really. We can already recognized the type of the device. Eric - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 04/09]: Link creation API
Eric W. Biederman wrote: >>+ if (linkinfo[IFLA_INFO_NAME]) { >>+ nla_strlcpy(name, linkinfo[IFLA_INFO_NAME], sizeof(name)); >>+ ops = rtnl_link_ops_get(name); > > > Ugh. Shouldn't we have the request_module logic here? > Otherwise it looks like we can skip the validate method and > have other weird interactions. Good catch. The easiest solution seems be to simply replay the request after successful module load, which also avoids the device lookup race. Something like this (untested). diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 8d2f817..f2868b0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -930,6 +930,7 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) struct nlattr *linkinfo[IFLA_INFO_MAX+1]; int err; +replay: err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy); if (err < 0) return err; @@ -1012,19 +1013,19 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) if (tb[IFLA_ADDRESS] || tb[IFLA_BROADCAST] || tb[IFLA_MAP]) return -EOPNOTSUPP; + if (!ops) { #ifdef CONFIG_KMOD - if (!ops && kind[0]) { - /* race condition: device may be created while rtnl is -* unlocked, final register_netdevice will catch it. -*/ - __rtnl_unlock(); - request_module("rtnl-link-%s", kind); - rtnl_lock(); - ops = rtnl_link_ops_get(kind); - } + if (kind[0]) { + __rtnl_unlock(); + request_module("rtnl-link-%s", kind); + rtnl_lock(); + ops = rtnl_link_ops_get(kind); + if (ops) + goto replay; + } #endif - if (!ops) return -EOPNOTSUPP; + } if (!ifname[0]) snprintf(ifname, IFNAMSIZ, "%s%%d", ops->kind);
Re: [RFC RTNETLINK 04/09]: Link creation API
> +static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg) > +{ > + struct rtnl_link_ops *ops; > + struct net_device *dev; > + struct ifinfomsg *ifm; > + char name[MODULE_NAME_LEN]; > + char ifname[IFNAMSIZ]; > + struct nlattr *tb[IFLA_MAX+1]; > + struct nlattr *linkinfo[IFLA_INFO_MAX+1]; > + int err; > + > + err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy); > + if (err < 0) > + return err; > + > + if (tb[IFLA_IFNAME]) > + nla_strlcpy(ifname, tb[IFLA_IFNAME], IFNAMSIZ); > + else > + ifname[0] = '\0'; > + > + ifm = nlmsg_data(nlh); > + if (ifm->ifi_index > 0) > + dev = __dev_get_by_index(ifm->ifi_index); > + else if (ifname[0]) > + dev = __dev_get_by_name(ifname); > + else > + dev = NULL; > + > + if (tb[IFLA_LINKINFO]) { > + err = nla_parse_nested(linkinfo, IFLA_INFO_MAX, > +tb[IFLA_LINKINFO], ifla_info_policy); > + if (err < 0) > + return err; > + } else > + memset(linkinfo, 0, sizeof(linkinfo)); > + > + if (linkinfo[IFLA_INFO_NAME]) { > + nla_strlcpy(name, linkinfo[IFLA_INFO_NAME], sizeof(name)); > + ops = rtnl_link_ops_get(name); Ugh. Shouldn't we have the request_module logic here? Otherwise it looks like we can skip the validate method and have other weird interactions. > + } else { > + name[0] = '\0'; > + ops = NULL; > + } > + > + if (1) { > + struct nlattr *attr[ops ? ops->maxtype + 1 : 0], **data = NULL; > + > + if (ops) { > + if (ops->maxtype && linkinfo[IFLA_INFO_DATA]) { > + err = nla_parse_nested(attr, ops->maxtype, > + linkinfo[IFLA_INFO_DATA], > +ops->policy); > + if (err < 0) > + return err; > + data = attr; > + } > + if (ops->validate) { > + err = ops->validate(tb, data); > + if (err < 0) > + return err; > + } > + } > + > + if (dev) { > + int modified = 0; > + > + if (nlh->nlmsg_flags & NLM_F_EXCL) > + return -EEXIST; > + if (nlh->nlmsg_flags & NLM_F_REPLACE) > + return -EOPNOTSUPP; > + > + if (linkinfo[IFLA_INFO_DATA]) { > + if (!ops || ops != dev->rtnl_link_ops || > + !ops->changelink) > + return -EOPNOTSUPP; > + > + err = ops->changelink(dev, tb, data); > + if (err < 0) > + return err; > + modified = 1; > + } > + > + return do_setlink(dev, ifm, tb, ifname, modified); > + } > + > + if (!(nlh->nlmsg_flags & NLM_F_CREATE)) > + return -ENODEV; > + > + if (ifm->ifi_index) > + return -EINVAL; > + if (tb[IFLA_ADDRESS] || tb[IFLA_BROADCAST] || tb[IFLA_MAP]) > + return -EOPNOTSUPP; > + > +#ifdef CONFIG_KMOD > + if (!ops && name[0]) { > + /* race condition: device may be created while rtnl is > + * unlocked, final register_netdevice will catch it. > + */ > + __rtnl_unlock(); > + request_module("rtnl-link-%s", name); > + rtnl_lock(); > + ops = rtnl_link_ops_get(name); > + } > +#endif > + if (!ops) > + return -EOPNOTSUPP; > + > + if (!ifname[0]) > + snprintf(ifname, IFNAMSIZ, "%s%%d", ops->name); > + dev = alloc_netdev(ops->priv_size, ifname, ops->setup); > + if (!dev) > + return -ENOMEM; > + > + if (strchr(dev->name, '%')) { > + err = dev_alloc_name(dev, dev->name); > + if (err < 0) > + goto err_free; > + } > + dev->rtnl_link_ops = ops; > + > + if (tb[IFLA_MTU]) > + dev->mtu = nla_get_u32(tb[IFLA_MTU]); > + if (tb[IFLA_TXQLEN]) > + dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]); > + if (tb[IFLA_WEIGHT]) > + dev->weight = nla_get_u32(tb[IFLA_WEIGHT]); > + if (tb[IFLA_OPERSTATE]) > + set_ope
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Alexey Kuznetsov wrote: >> I just suggested to >>Pavel to create only a single device per newlink operation and binding >>them later, > > > I see some logical inconsistency here. > > Look, the second end is supposed to be in another namespace. > It will have identity, which cannot be expressed in any way in namespace, > which is allowed to create the pair: name, ifindex - nothing > is shared between namespaces. Good point, I didn't think of that. Is there a version of this patch that already uses different namespaces so I can look at it? Are network namespace completely seperated or is there some hierarchy with all lower namespaces visible above or something like that? > Moreover, do not forget we have two netlink spaces as well. > Events happening in one namespace are reported only inside that namespace. > > Actually, the only self-consistent solution, which I see right now > (sorry, did not think that much) is to create the whole pair as > one operation; required parameters (name of partner, identity of namespace) > can be passed as attributes. I guess IFLA_PARTNER approach suggests > the same thing, right? I imagined it more as a bind operation, pretty similar to enslave, so it would only contain an ifindex, no parameters. But as you say that doesn't work, so I guess we'd have to nest an entire ifinfomsg + the attributes for the partner device under it .. not exactly pretty. > As response to this action two replies are generated: one RTM_NEWLINK > for one end of device with the whole desciption of partnet > is broadcasted inside this namespace, another RTM_NETLINK with index/name > of partner device is broadcasted inside the second namespace > (and, probably, some attributes, which must be hidden inside namespace, > f.e. identity of main device is suppressed). The identity of the main device has no meaning within a different namespace, but are there other reasons for hiding it? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Virtual ethernet tunnel
On Wed, 06 Jun 2007 19:11:38 +0400 Pavel Emelianov <[EMAIL PROTECTED]> wrote: > Veth stands for Virtual ETHernet. It is a simple tunnel driver > that works at the link layer and looks like a pair of ethernet > devices interconnected with each other. > > Mainly it allows to communicate between network namespaces but > it can be used as is as well. > > Eric recently sent a similar driver called etun. This > implementation uses another interface - the RTM_NRELINK > message introduced by Patric. The patch fits today netdev > tree with Patrick's patches. > > The newlink callback is organized that way to make it easy > to create the peer device in the separate namespace when we > have them in kernel. > > The patch for an ip utility is also provided. > > Eric, since ethtool interface was from your patch, I add > your Signed-off-by line. > > Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> > Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]> > > --- > > diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig > index 7d57f4a..7e144be 100644 > --- a/drivers/net/Kconfig > +++ b/drivers/net/Kconfig > @@ -119,6 +119,12 @@ config TUN > > If you don't know what to use this for, you don't need it. > > +config VETH > + tristate "Virtual ethernet device" > + ---help--- > + The device is an ethernet tunnel. Devices are created in pairs. When > + one end receives the packet it appears on its pair and vice versa. > + > config NET_SB1000 > tristate "General Instruments Surfboard 1000" > depends on PNP > diff --git a/drivers/net/Makefile b/drivers/net/Makefile > index a77affa..4764119 100644 > --- a/drivers/net/Makefile > +++ b/drivers/net/Makefile > @@ -185,6 +185,7 @@ obj-$(CONFIG_MACSONIC) += macsonic.o > obj-$(CONFIG_MACMACE) += macmace.o > obj-$(CONFIG_MAC89x0) += mac89x0.o > obj-$(CONFIG_TUN) += tun.o > +obj-$(CONFIG_VETH) += veth.o > obj-$(CONFIG_NET_NETX) += netx-eth.o > obj-$(CONFIG_DL2K) += dl2k.o > obj-$(CONFIG_R8169) += r8169.o > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > new file mode 100644 > index 000..6746c91 > --- /dev/null > +++ b/drivers/net/veth.c > @@ -0,0 +1,391 @@ > +/* > + * drivers/net/veth.c > + * > + * Copyright (C) 2007 OpenVZ http://openvz.org, SWsoft Inc > + * > + */ > + > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#define DRV_NAME "veth" > +#define DRV_VERSION "1.0" > + > +struct veth_priv { > + struct net_device *peer; > + struct net_device *dev; > + struct list_head list; > + struct net_device_stats stats; > + unsigned ip_summed; > +}; > + > +static LIST_HEAD(veth_list); > + > +/* > + * ethtool interface > + */ > + > +static struct { > + const char string[ETH_GSTRING_LEN]; > +} ethtool_stats_keys[] = { > + { "peer_ifindex" }, > +}; Seems like a good usage of sysfs attributes, rather than ethtool. Then you can get rid of all the useless ethtool for what is basically a virtual device. > +/* > + * xmit > + */ > + > +static int veth_xmit(struct sk_buff *skb, struct net_device *dev) > +{ > + struct net_device *rcv = NULL; > + struct veth_priv *priv, *rcv_priv; > + int length; > + > + skb_orphan(skb); > + > + priv = netdev_priv(dev); > + rcv = priv->peer; > + rcv_priv = netdev_priv(rcv); > + > + if (!(rcv->flags & IFF_UP)) > + goto outf; > + > + skb->dev = rcv; > + skb->pkt_type = PACKET_HOST; > + skb->protocol = eth_type_trans(skb, rcv); > + if (dev->features & NETIF_F_NO_CSUM) > + skb->ip_summed = rcv_priv->ip_summed; > + > + dst_release(skb->dst); > + skb->dst = NULL; > + > + secpath_reset(skb); > + nf_reset(skb); > + > + length = skb->len; > + > + priv->stats.tx_bytes += length; > + priv->stats.tx_packets++; > + > + rcv_priv->stats.rx_bytes += length; > + rcv_priv->stats.rx_packets++; Per-cpu stats? This will cacheline thrash. > + netif_rx(skb); > + return 0; > + > +outf: > + kfree_skb(skb); > + priv->stats.tx_dropped++; > + return 0; > +} -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Eric W. Biederman wrote: > Patrick McHardy <[EMAIL PROTECTED]> writes: > > >>>I still think adding a IFLA_PARTNER or a custom attribute is cleaner >>>in this case. Slight semantic mismatches are the worst design bugs >>>to correct. >> >> >>Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to >>Pavel to create only a single device per newlink operation and binding >>them later, what do you think about that? > > > I don't think it solves much because we still need a way to report the > partner device. I was thinking of something like this: ip link add veth0 type veth ip link add veth1 partner veth0 type veth ip would resolve veth0 to an ifindex and set IFLA_PARTNER. But Alexey just raised a few good points, so this might not work. > On the actual using side I think it makes the core of the driver much > more difficult to get right. > > Basically if we can't count on having a partner device we have to > add NULL pointer checks and locking to the packet dispatch which > is otherwise unnecessary. All you'd need to do is keep the queue stopped until the device is bound. No changes to rx or tx path neccessary. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy <[EMAIL PROTECTED]> writes: >> I still think adding a IFLA_PARTNER or a custom attribute is cleaner >> in this case. Slight semantic mismatches are the worst design bugs >> to correct. > > > Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to > Pavel to create only a single device per newlink operation and binding > them later, what do you think about that? I don't think it solves much because we still need a way to report the partner device. On the actual using side I think it makes the core of the driver much more difficult to get right. Basically if we can't count on having a partner device we have to add NULL pointer checks and locking to the packet dispatch which is otherwise unnecessary. Eric - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
Christoph Hellwig wrote: > On Tue, Jun 05, 2007 at 01:12:03PM -0700, James Ketrenos wrote: >> Yes, we certainly don't want a driver to be "near mainline" that does >> things that the rest of the kernel and other drivers are doing. We should >> force them to stay out-of-tree until any and everything is resolved. >> Heaven forbid that the code should be merged, contributed, and improved >> upon as a community. >> >> You'd think these guys were trying to enable folks with a driver so they >> can use the hardware they bought or something. Its almost laughable. What >> idiots. > > Maybe you should make sure your companies publishes specs I'm flattered that you think I am high enough at Intel to do that. If it was in my power, accurate and complete specifications would be available for all of Intel's hardware--for everyone. I doubt you'll find a single Linux developer at Intel who is not fighting to try and reach that goal, including myself. The reality is, however, (and this may shock you) I am not a vice president or head of a major division at Intel. Its true. Intel doesn't do what I want just because I snap my fingers and say it should be so. Instead, we try and do what we can -- enable people with a driver (and a community to contribute into) so that the hardware they purchased can operate to its fullest potential on Linux. We realize within the community that some areas are difficult for people outside of Intel to initially contribute to given the lack of specs. However, there are vast quantities of features and functionality that can be enabled and improved given nothing more than the driver sources. Anyway, its obvious from your emails that our efforts are not sufficient and you seem to resent that we're even trying. I apologize. If you'd like to submit patches to help us make the perfect driver, we're very willing to review and merge those in. James - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Hello! >I just suggested to > Pavel to create only a single device per newlink operation and binding > them later, I see some logical inconsistency here. Look, the second end is supposed to be in another namespace. It will have identity, which cannot be expressed in any way in namespace, which is allowed to create the pair: name, ifindex - nothing is shared between namespaces. Moreover, do not forget we have two netlink spaces as well. Events happening in one namespace are reported only inside that namespace. Actually, the only self-consistent solution, which I see right now (sorry, did not think that much) is to create the whole pair as one operation; required parameters (name of partner, identity of namespace) can be passed as attributes. I guess IFLA_PARTNER approach suggests the same thing, right? As response to this action two replies are generated: one RTM_NEWLINK for one end of device with the whole desciption of partnet is broadcasted inside this namespace, another RTM_NETLINK with index/name of partner device is broadcasted inside the second namespace (and, probably, some attributes, which must be hidden inside namespace, f.e. identity of main device is suppressed). Alexey - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Virtual ethernet tunnel
Pavel Emelianov wrote: > +MODULE_DESCRIPTION("Virtual Ethernet Tunnel"); > +MODULE_LICENSE("GPL v2"); This seems to be missing MODULE_ALIAS_RTNL_LINK("veth"); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Eric W. Biederman wrote: > Patrick McHardy <[EMAIL PROTECTED]> writes: > >>You don't really need to patch it, installing a new iplink_XXX.so >>file is enough. Generalizing driver specific options more than >>what we currently have doesn't look very promising. I think your >>driver was simple enough to get along with the generic device >>attributes though (IFLA_LINK or IFLA_MASTER). > > > I need to know the other device in the pair of devices I am creating. > If ifindex was selectable IFLA_LINK or IFLA_MASTER might be > interesting however they are currently are not, and I'm not quite > certain about letting a user select the ifindex. Although there may > come a point when dealing with migration when it makes sense. It shouldn't be very hard to implement, so far I just didn't see any use for it. > Hmm. I guess if I had a reasonable default I could find out the > pair device by looking at the returned NEW_LINK message. > > Looking more close. IFLA_MASTER does not work because I don't > have a master/slave relationship. > > IFLA_LINK looks like it will work. I don't precisely match the > semantics though which makes me nervous. In particular my other > device is not something I am sending through but what I am sending > to. The way the IPv6 code uses iflink to get the link local address > starting with the hardware address of the iflink would be completely > the wrong thing to do in my case. Now my device won't have the magic > IPv6 tunnel arp type so that code won't trigger. Still it is > a challenge. > > I still think adding a IFLA_PARTNER or a custom attribute is cleaner > in this case. Slight semantic mismatches are the worst design bugs > to correct. Indeed, IFLA_PARTNER sounds like a better idea. I just suggested to Pavel to create only a single device per newlink operation and binding them later, what do you think about that? >>>I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in >>>a header file. So it is easy to get a list of all of the different >>>kinds and so we don't conflict. >> >> >>I don't think conflicts are going to be a problem (it would be >>nice if modpost would warn about duplicate aliases though). >>How is listing IFLA_KIND types in a header file going to help >>get a list? Presuming the user knows what kind of device he >>wants to set up and is not just looking for things to play >>around with I also don't see any real value in this. > > > This isn't about the user this is about maintaining the ABI. > > We have to control set of strings for IFLA_KIND. Having them all > in a single header file means that we can easily look when we add > support for a new kind to see if some other driver has already > used that kind. > > The same reason we stick the rest of the enumerations into a header > file. > > Strings don't conflict as easily as small integers do, but it is still > possible to have a conflict, so having something like an ifla_kind.h to > hold them would be useful. Mhh .. we have multiple string based APIs that do just fine. I'd prefer having someone adding a new driver do a quick grep for MODULE_ALIAS_RTNL_LINK to adding unused definitions. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Virtual ethernet tunnel
Pavel Emelianov wrote: > Veth stands for Virtual ETHernet. It is a simple tunnel driver > that works at the link layer and looks like a pair of ethernet > devices interconnected with each other. > > Mainly it allows to communicate between network namespaces but > it can be used as is as well. > > Eric recently sent a similar driver called etun. This > implementation uses another interface - the RTM_NRELINK > message introduced by Patric. The patch fits today netdev > tree with Patrick's patches. > > The newlink callback is organized that way to make it easy > to create the peer device in the separate namespace when we > have them in kernel. > > +struct veth_priv { > + struct net_device *peer; > + struct net_device *dev; > + struct list_head list; > + struct net_device_stats stats; You can use dev->stats instead. > +static int veth_xmit(struct sk_buff *skb, struct net_device *dev) > +{ > + struct net_device *rcv = NULL; > + struct veth_priv *priv, *rcv_priv; > + int length; > + > + skb_orphan(skb); > + > + priv = netdev_priv(dev); > + rcv = priv->peer; > + rcv_priv = netdev_priv(rcv); > + > + if (!(rcv->flags & IFF_UP)) > + goto outf; > + > + skb->dev = rcv; eth_type_trans already sets skb->dev. > + skb->pkt_type = PACKET_HOST; > + skb->protocol = eth_type_trans(skb, rcv); > + if (dev->features & NETIF_F_NO_CSUM) > + skb->ip_summed = rcv_priv->ip_summed; > + > + dst_release(skb->dst); > + skb->dst = NULL; > + > + secpath_reset(skb); > + nf_reset(skb); Is skb->mark supposed to survive communication between different namespaces? > +static const struct nla_policy veth_policy[VETH_INFO_MAX] = { > + [VETH_INFO_MAC] = { .type = NLA_BINARY, .len = ETH_ALEN }, > + [VETH_INFO_PEER]= { .type = NLA_STRING }, > + [VETH_INFO_PEER_MAC]= { .type = NLA_BINARY, .len = ETH_ALEN }, > +}; The rtnl_link codes looks fine. I don't like the VETH_INFO_MAC attribute very much though, we already have a generic device attribute for MAC addresses. Of course that only allows you to supply one MAC address, so I'm wondering what you think of allocating only a single device per newlink operation and binding them in a seperate enslave operation? > +enum { > + VETH_INFO_UNSPEC, > + VETH_INFO_MAC, > + VETH_INFO_PEER, > + VETH_INFO_PEER_MAC, > + > + VETH_INFO_MAX > +}; Please follow the #define VETH_INFO_MAX (__VETH_INFO_MAX - 1) convention here. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy <[EMAIL PROTECTED]> writes: > Eric W. Biederman wrote: >> Reading through the patches they look usable to me. >> >> Having to patch iproute to create the more interesting network >> devices sucks, but that problem seems fundamental. We might >> be able to avoid it if we allowed fields to be reused between >> different types of devices but that makes the error checking >> trickier, and we aren't likely to have that many types of >> devices so there likely isn't much value in generalizing. > > > You don't really need to patch it, installing a new iplink_XXX.so > file is enough. Generalizing driver specific options more than > what we currently have doesn't look very promising. I think your > driver was simple enough to get along with the generic device > attributes though (IFLA_LINK or IFLA_MASTER). I need to know the other device in the pair of devices I am creating. If ifindex was selectable IFLA_LINK or IFLA_MASTER might be interesting however they are currently are not, and I'm not quite certain about letting a user select the ifindex. Although there may come a point when dealing with migration when it makes sense. Hmm. I guess if I had a reasonable default I could find out the pair device by looking at the returned NEW_LINK message. Looking more close. IFLA_MASTER does not work because I don't have a master/slave relationship. IFLA_LINK looks like it will work. I don't precisely match the semantics though which makes me nervous. In particular my other device is not something I am sending through but what I am sending to. The way the IPv6 code uses iflink to get the link local address starting with the hardware address of the iflink would be completely the wrong thing to do in my case. Now my device won't have the magic IPv6 tunnel arp type so that code won't trigger. Still it is a challenge. I still think adding a IFLA_PARTNER or a custom attribute is cleaner in this case. Slight semantic mismatches are the worst design bugs to correct. To some extent this is a practical problematic point for me, because in the context of multiple network namespaces I could theoretically have both network devices have the same name and the same ifindex in different network namespaces. Although it really doesn't matter unless they are in the same network namespace in which case they can't have the same ifindex or ifname. >> I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in >> a header file. So it is easy to get a list of all of the different >> kinds and so we don't conflict. > > > I don't think conflicts are going to be a problem (it would be > nice if modpost would warn about duplicate aliases though). > How is listing IFLA_KIND types in a header file going to help > get a list? Presuming the user knows what kind of device he > wants to set up and is not just looking for things to play > around with I also don't see any real value in this. This isn't about the user this is about maintaining the ABI. We have to control set of strings for IFLA_KIND. Having them all in a single header file means that we can easily look when we add support for a new kind to see if some other driver has already used that kind. The same reason we stick the rest of the enumerations into a header file. Strings don't conflict as easily as small integers do, but it is still possible to have a conflict, so having something like an ifla_kind.h to hold them would be useful. Eric - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Module for ip utility to support veth device
Pavel Emelianov wrote: > diff --git a/ip/iplink.c b/ip/iplink.c > index 5170419..6975990 100644 > --- a/ip/iplink.c > +++ b/ip/iplink.c > @@ -287,7 +287,7 @@ static int iplink_modify(int cmd, unsign >strlen(type)); > > lu = get_link_type(type); > - if (lu) { > + if (lu && argc) { > struct rtattr * data = NLMSG_TAIL(&req.n); > addattr_l(&req.n, sizeof(req), IFLA_INFO_DATA, NULL, 0); I've folded this part into my iproute patch, thanks. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Module for ip utility to support veth device
The usage is # ip link add [name] type veth [peer ] [mac ] [peer_mac ] The Makefile is maybe not as beautiful as it could be. It is to be discussed. One thing I noticed during testing is the following. When launching this with link_veth.so module and not specifying any module specific parameters, the kernel refuses to accept the packet when parsing the IFLA_LINKINFO. So the hunk for ip/iplink.c doesn't add an empty extra header if no extra data expected. Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]> --- diff --git a/ip/Makefile b/ip/Makefile index 9a5bfe3..b46bce3 100644 --- a/ip/Makefile +++ b/ip/Makefile @@ -8,8 +8,9 @@ RTMONOBJ=rtmon.o ALLOBJ=$(IPOBJ) $(RTMONOBJ) SCRIPTS=ifcfg rtpr routel routef TARGETS=ip rtmon +LIBS=link_veth.so -all: $(TARGETS) $(SCRIPTS) +all: $(TARGETS) $(SCRIPTS) $(LIBS) ip: $(IPOBJ) $(LIBNETLINK) $(LIBUTIL) @@ -24,3 +25,6 @@ clean: LDLIBS += -ldl LDFLAGS+= -Wl,-export-dynamic + +%.so: %.c + $(CC) $(CFLAGS) -shared $< -o $@ diff --git a/ip/iplink.c b/ip/iplink.c index 5170419..6975990 100644 --- a/ip/iplink.c +++ b/ip/iplink.c @@ -287,7 +287,7 @@ static int iplink_modify(int cmd, unsign strlen(type)); lu = get_link_type(type); - if (lu) { + if (lu && argc) { struct rtattr * data = NLMSG_TAIL(&req.n); addattr_l(&req.n, sizeof(req), IFLA_INFO_DATA, NULL, 0); diff --git a/ip/link_veth.c b/ip/link_veth.c new file mode 100644 index 000..adfdef6 --- /dev/null +++ b/ip/link_veth.c @@ -0,0 +1,77 @@ +#include + +#include "utils.h" +#include "ip_common.h" +#include "veth.h" + +#define ETH_ALEN 6 + +static void usage(void) +{ + printf("Usage: ip link add ... " + "[peer ] [mac ] [peer_mac ]\n"); +} + +static int veth_parse_opt(struct link_util *lu, int argc, char **argv, + struct nlmsghdr *hdr) +{ + __u8 mac[ETH_ALEN]; + + for (; argc != 0; argv++, argc--) { + if (strcmp(*argv, "peer") == 0) { + argv++; + argc--; + if (argc == 0) { + usage(); + return -1; + } + + addattr_l(hdr, 1024, VETH_INFO_PEER, + *argv, strlen(*argv)); + + continue; + } + + if (strcmp(*argv, "mac") == 0) { + argv++; + argc--; + if (argc == 0) { + usage(); + return -1; + } + + if (hexstring_a2n(*argv, mac, sizeof(mac)) == NULL) + return -1; + + addattr_l(hdr, 1024, VETH_INFO_MAC, + mac, ETH_ALEN); + continue; + } + + if (strcmp(*argv, "peer_mac") == 0) { + argv++; + argc--; + if (argc == 0) { + usage(); + return -1; + } + + if (hexstring_a2n(*argv, mac, sizeof(mac)) == NULL) + return -1; + + addattr_l(hdr, 1024, VETH_INFO_PEER_MAC, + mac, ETH_ALEN); + continue; + } + + usage(); + return -1; + } + + return 0; +} + +struct link_util veth_link_util = { + .id = "veth", + .parse_opt = veth_parse_opt, +}; diff --git a/ip/veth.h b/ip/veth.h new file mode 100644 index 000..74c8e1e --- /dev/null +++ b/ip/veth.h @@ -0,0 +1,13 @@ +#ifndef __NET_VETH_H__ +#define __NET_VETH_H__ + +enum { + VETH_INFO_UNSPEC, + VETH_INFO_MAC, + VETH_INFO_PEER, + VETH_INFO_PEER_MAC, + + VETH_INFO_MAX +}; + +#endif - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Multiqueue network device support.
jamal wrote: > There will be no issue if a) multiple APIs would be allowed for driver > multi-rings[1] and b) you didnt touch the qdiscs. > > Given that #a is not a sensible thing to do since there can only be one > API and for #b you are not compromising, what do you want me to do? I haven't followed the entire discussion, but I still don't see a alternative to touching the qdisc layer - multiple hardware queues need multiple queue states if you want to avoid a busy hardware queue stopping the qdisc entirely and thereby preventing the qdisc to continue feeding packets to other active HW queues. And to make use of the multiple queue states you need multiple queues. I would love to see your alternative patches. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Virtual ethernet tunnel
Veth stands for Virtual ETHernet. It is a simple tunnel driver that works at the link layer and looks like a pair of ethernet devices interconnected with each other. Mainly it allows to communicate between network namespaces but it can be used as is as well. Eric recently sent a similar driver called etun. This implementation uses another interface - the RTM_NRELINK message introduced by Patric. The patch fits today netdev tree with Patrick's patches. The newlink callback is organized that way to make it easy to create the peer device in the separate namespace when we have them in kernel. The patch for an ip utility is also provided. Eric, since ethtool interface was from your patch, I add your Signed-off-by line. Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]> --- diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 7d57f4a..7e144be 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -119,6 +119,12 @@ config TUN If you don't know what to use this for, you don't need it. +config VETH + tristate "Virtual ethernet device" + ---help--- + The device is an ethernet tunnel. Devices are created in pairs. When + one end receives the packet it appears on its pair and vice versa. + config NET_SB1000 tristate "General Instruments Surfboard 1000" depends on PNP diff --git a/drivers/net/Makefile b/drivers/net/Makefile index a77affa..4764119 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -185,6 +185,7 @@ obj-$(CONFIG_MACSONIC) += macsonic.o obj-$(CONFIG_MACMACE) += macmace.o obj-$(CONFIG_MAC89x0) += mac89x0.o obj-$(CONFIG_TUN) += tun.o +obj-$(CONFIG_VETH) += veth.o obj-$(CONFIG_NET_NETX) += netx-eth.o obj-$(CONFIG_DL2K) += dl2k.o obj-$(CONFIG_R8169) += r8169.o diff --git a/drivers/net/veth.c b/drivers/net/veth.c new file mode 100644 index 000..6746c91 --- /dev/null +++ b/drivers/net/veth.c @@ -0,0 +1,391 @@ +/* + * drivers/net/veth.c + * + * Copyright (C) 2007 OpenVZ http://openvz.org, SWsoft Inc + * + */ + +#include +#include +#include +#include + +#include +#include +#include + +#define DRV_NAME "veth" +#define DRV_VERSION"1.0" + +struct veth_priv { + struct net_device *peer; + struct net_device *dev; + struct list_head list; + struct net_device_stats stats; + unsigned ip_summed; +}; + +static LIST_HEAD(veth_list); + +/* + * ethtool interface + */ + +static struct { + const char string[ETH_GSTRING_LEN]; +} ethtool_stats_keys[] = { + { "peer_ifindex" }, +}; + +static int veth_get_settings(struct net_device *dev, struct ethtool_cmd *cmd) +{ + cmd->supported = 0; + cmd->advertising= 0; + cmd->speed = SPEED_1; + cmd->duplex = DUPLEX_FULL; + cmd->port = PORT_TP; + cmd->phy_address= 0; + cmd->transceiver= XCVR_INTERNAL; + cmd->autoneg= AUTONEG_DISABLE; + cmd->maxtxpkt = 0; + cmd->maxrxpkt = 0; + return 0; +} + +static void veth_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info) +{ + strcpy(info->driver, DRV_NAME); + strcpy(info->version, DRV_VERSION); + strcpy(info->fw_version, "N/A"); +} + +static void veth_get_strings(struct net_device *dev, u32 stringset, u8 *buf) +{ + switch(stringset) { + case ETH_SS_STATS: + memcpy(buf, ðtool_stats_keys, sizeof(ethtool_stats_keys)); + break; + } +} + +static int veth_get_stats_count(struct net_device *dev) +{ + return ARRAY_SIZE(ethtool_stats_keys); +} + +static void veth_get_ethtool_stats(struct net_device *dev, + struct ethtool_stats *stats, u64 *data) +{ + struct veth_priv *priv; + + priv = netdev_priv(dev); + data[0] = priv->peer->ifindex; +} + +static u32 veth_get_rx_csum(struct net_device *dev) +{ + struct veth_priv *priv; + + priv = netdev_priv(dev); + return priv->ip_summed == CHECKSUM_UNNECESSARY; +} + +static int veth_set_rx_csum(struct net_device *dev, u32 data) +{ + struct veth_priv *priv; + + priv = netdev_priv(dev); + priv->ip_summed = data ? CHECKSUM_UNNECESSARY : CHECKSUM_NONE; + return 0; +} + +static u32 veth_get_tx_csum(struct net_device *dev) +{ + return (dev->features & NETIF_F_NO_CSUM) != 0; +} + +static int veth_set_tx_csum(struct net_device *dev, u32 data) +{ + if (data) + dev->features |= NETIF_F_NO_CSUM; + else + dev->features &= ~NETIF_F_NO_CSUM; + return 0; +} + +static struct ethtool_ops veth_ethtool_ops = { + .get_settings = veth_get_settings, + .get_drvinfo= veth_get_drvinfo, + .get_link = ethtool_op_get_link, + .get_rx_csum= veth_get_rx_csum, + .set_rx_csum= vet
Re: [PATCH] acenic: SET_NETDEV_DEV is always there these days
> "Geert" == Geert Uytterhoeven <[EMAIL PROTECTED]> writes: Geert> acenic: SET_NETDEV_DEV is always there these days Geert> Signed-off-by: Geert Uytterhoeven <[EMAIL PROTECTED]> Signed-off-by: Jes Sorensen <[EMAIL PROTECTED]> Geert> --- Disclaimer: not tested at all Geert> --- a/drivers/net/acenic.c +++ b/drivers/net/acenic.c @@ -159,10 Geert> +159,6 @@ static struct pci_device_id acenic_pci_t }; Geert> MODULE_DEVICE_TABLE(pci, acenic_pci_tbl); Geert> -#ifndef SET_NETDEV_DEV -#define SET_NETDEV_DEV(net, pdev) do{} Geert> while(0) -#endif - #define ace_sync_irq(irq) synchronize_irq(irq) Geert> #ifndef offset_in_page Geert> Gr{oetje,eeting}s, Geert> Geert Geert> -- Geert Uytterhoeven -- Sony Network and Software Technology Geert> Center Europe (NSCE) [EMAIL PROTECTED] --- The Geert> Corporate Village, Da Vincilaan 7-D1 Voice +32-2-7008453 Fax Geert> +32-2-7008622 B-1935 Zaventem, Belgium - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] acenic: SET_NETDEV_DEV is always there these days
acenic: SET_NETDEV_DEV is always there these days Signed-off-by: Geert Uytterhoeven <[EMAIL PROTECTED]> --- Disclaimer: not tested at all --- a/drivers/net/acenic.c +++ b/drivers/net/acenic.c @@ -159,10 +159,6 @@ static struct pci_device_id acenic_pci_t }; MODULE_DEVICE_TABLE(pci, acenic_pci_tbl); -#ifndef SET_NETDEV_DEV -#define SET_NETDEV_DEV(net, pdev) do{} while(0) -#endif - #define ace_sync_irq(irq) synchronize_irq(irq) #ifndef offset_in_page Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- Sony Network and Software Technology Center Europe (NSCE) [EMAIL PROTECTED] --- The Corporate Village, Da Vincilaan 7-D1 Voice +32-2-7008453 Fax +32-2-7008622 B-1935 Zaventem, Belgium - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[WIP][PATCHES] Network xmit batching
Folks, While Krishna and I have been attempting this on the side, progress has been rather slow - so this is to solicit for more participation so we can get this over with faster. Success (myself being conservative when it comes to performance) requires testing on a wide variety of hardware. The results look promising - certainly from a pktgen perspective where performance has been known in some cases to go up over 50%. Tests by Sridhar on a low number of TCP flows also indicate improved performance as well as lowered CPU use. I have setup the current state of my patches against Linus tree at: git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-lin26.git This is also clean against 2.6.22-rc4. So if you want just a diff that will work against 2.6.22-rc4 - i can send it to you. I also have a tree against Daves net-2.6 at git://git.kernel.org/pub/scm/linux/kernel/git/hadi/batch-net26.git but iam abandoning that effort until we get this stable due to the occasional bug that cropped up(like e1000). I am attaching a pktgen script. There is one experimental parameter called "batch_low" - for starters just leave it at 0 in order to reduce experimental variance. If you have solid results you can muck around with it. KK has a netperf script he has been using - if you know netperf your help will really be appreciated in testing it on your hardware. KK, can you please post your script? Testing with forwarding and bridging will also be appreaciated. Above that, suggestions to changes as long as they are based on verifiable results or glaringly obvious changes are welcome. My preference at the moment is to flesh out the patch as is and then improve on it later if it shows it has some value on a wide variety of apps. As the subject is indicating this is a WIP and as all eulas suggest "subject to change without notice". If you help out, when you post your results, can you please say what hardware and setup was? The only real driver that has been changed is e1000 for now. KK is working on something infiniband related and i plan (if noone beats me) to get tg3 working. It would be nice if someone converted some 10G ethernet driver. cheers, jamal pktgen.batch-1-1 Description: application/shellscript
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Patrick McHardy wrote: > The following patches contain the rtnetlink link creation API I promised, > as well as two simple driver conversion to use the API as an example. > I've also converted VLAN as a more complex example, but these patches > need some more work and are most likely not interesting to all the CCed > parties, so I'm sending them seperately. I've updated the patches to remove the broken VLAN ID change, added back some consts, renamed IFLA_INFO_NAME to IFLA_INFO_KIND and rebased to current net-2.6. The current patches and -git trees can be found at http://people.netfilter.org/kaber/rtnl_link/ - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Socket hangs in read() when putting interface down
>> This may be ok on a major interface like eth0 but it is a problem when >> using a ppp interface via GSM data or GPRS connection. >> Killing the pppd while the socket waits in read() produces the same >> behaviour. Also a broken connection looks the same. >> It is reproducable on several 2.6 kernel versions including the lastest >> stable 2.6.21. >> >> On my understanding "ifconfig down" should lead into a read() error. > > Why? Can't traffic for the endpoint arrive via some other interface on > the system? I would think that getting out of a blocking read() > requires the application to either have a timer running (and so get out > of read() with EINTR) or to have keepalives enabled on the TCP > connection (I'm assuming TCP for an AF_INET, SOCK_STREAM although I > suppose it could be SCTP) > > rick jones Of course, you are right. I am so focused to my small embedded world, that I simply have forgotten how IP-traffic works in big systems (to clarify my first post: I'm using TCP). Your hint with the self produced signal to get out of read() was very helpful for me. Thats what I was missing. I also learned another think, while reading the list FAQ: PEBCAC (=Problem exists between chair and computer) Obviously, thats what best described me! Thank you very much! Chris Schlund - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 00/09]: Netlink link creation API
Eric W. Biederman wrote: > Reading through the patches they look usable to me. > > Having to patch iproute to create the more interesting network > devices sucks, but that problem seems fundamental. We might > be able to avoid it if we allowed fields to be reused between > different types of devices but that makes the error checking > trickier, and we aren't likely to have that many types of > devices so there likely isn't much value in generalizing. You don't really need to patch it, installing a new iplink_XXX.so file is enough. Generalizing driver specific options more than what we currently have doesn't look very promising. I think your driver was simple enough to get along with the generic device attributes though (IFLA_LINK or IFLA_MASTER). > I do think we should specify the IFLA_KIND (was: IFLA_NAME) values in > a header file. So it is easy to get a list of all of the different > kinds and so we don't conflict. I don't think conflicts are going to be a problem (it would be nice if modpost would warn about duplicate aliases though). How is listing IFLA_KIND types in a header file going to help get a list? Presuming the user knows what kind of device he wants to set up and is not just looking for things to play around with I also don't see any real value in this. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC RTNETLINK 04/09]: Link creation API
Stephen Hemminger wrote: > On Wed, 06 Jun 2007 01:17:11 +0200 > Patrick McHardy <[EMAIL PROTECTED]> wrote: > >>>If you want I'll extend existing bridge netlink to use these. >> >> >>Are you talking about brige-port information or bridge device >>configuration? So far the API is not suitable for anything that >>currently uses IFLA_PROTINFO because the sender is not the driver >>which created the device and doesn't use AF_UNSPEC. For bridge >>device configuration it would certainly be nice to have, but I'm >>not sure yet how to handle enslave operations. So far my favourite >>idea is to add enslave/release operations to rtnl_link_ops and call >>them when IFLA_MASTER is set (so the netlink message would look like >>this: ifindex: eth0 master: br0 nlmsg_type: RTM_NETLINK). But I >>haven't really thought this through yet. > > > Was thinking AF_BRIDGE (we have it already so use it), and both > add/remove bridge, and enslave/unslave device. I think we should use two seperate families for bridge devices and bridge ports. I'll think about the enslave operation some more and try to add it next week. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 5/7] CAN: Add virtual CAN netdevice driver
Urs Thuermann wrote: > Patrick McHardy <[EMAIL PROTECTED]> writes: > >>I don't get why you can't directly check the socket option on the >>TX path. > > > We have several types of sockets in the PF_CAN family, two of which > are GPL'ed and which are in the patch series. These are CAN_RAW and > CAN_BCM. The protocol implementations use can_send() in af_can.c to > send a CAN frame and indicate to can_send() in an int argument, > whether this frame should be looped back. Only the raw protocol has a > socket option (setsockopt(2)) in struct raw_sock for this, bcm always > sets this to 1 to have the frame looped back. There is no option in > struct bcm_sock for this. In can_send() and in the driver we don't > know what type of socket skb->sk points to and can't check that > option. Changing this would mean we have to add such an option in the > same position in all CAN socket types and set it to fixed values in > some of them (e.g. to 1 for bcm). While it's doable, I wouldn't like > that very much. > > Is there anything that prevents can_send() from using skb->pkt_type to > pass the loopback flag down to the driver? No, that should be fine, in fact it should set it anyway even if you can't use it for this purpose. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] NetXen: Add NETXEN prefix to macros
He seems to be the same guy who has dropped the patches from RHEL5 bugzilla for now. -- Mithlesh Thukral On Tuesday 05 June 2007 00:28, Andy Gospodarek wrote: > On Sun, Jun 03, 2007 at 11:50:29AM -0400, Jeff Garzik wrote: > > Mithlesh Thukral wrote: > > >NetXen: Add NETXEN prefix to a macro > > >This patch will add the "NETXEN" prefix to "USER_START" macro. > > > > > >Signed-off by: Wen Xiong <[EMAIL PROTECTED]> > > >Signed-off by: Mithlesh Thukral <[EMAIL PROTECTED]> > > >--- > > > > > > drivers/net/netxen/netxen_nic.h |4 ++-- > > > drivers/net/netxen/netxen_nic_ethtool.c |2 +- > > > drivers/net/netxen/netxen_nic_hw.c |4 ++-- > > > drivers/net/netxen/netxen_nic_init.c|4 ++-- > > > 4 files changed, 7 insertions(+), 7 deletions(-) > > > > Your patch description is useless. Clearly we know -what- it does, > > simply by reading the patch. > > > > But it does not answer the simple question: why? why is this needed in > > a bug fix Release Candidate series? > > I can't see why this is needed in an RC branch, but the use of > generically named macros/variables has been problematic with this driver > on more 'obscure' arches. > > I submitted a bigger cleanup patch a in March that never got taken so > this is still a problem. Here is that patch: > > Signed-off-by: Andy Gospodarek <[EMAIL PROTECTED]> > --- > > netxen_nic.h | 47 > --- netxen_nic_ethtool.c |8 > > netxen_nic_hw.c | 10 +- > netxen_nic_init.c| 23 --- > 4 files changed, 45 insertions(+), 43 deletions(-) > > diff --git a/drivers/net/netxen/netxen_nic.h > b/drivers/net/netxen/netxen_nic.h index dd8ce35..d5f0c06 100644 > --- a/drivers/net/netxen/netxen_nic.h > +++ b/drivers/net/netxen/netxen_nic.h > @@ -68,9 +68,10 @@ > #define _NETXEN_NIC_LINUX_SUBVERSION 3 > #define NETXEN_NIC_LINUX_VERSIONID "3.3.3" > > -#define NUM_FLASH_SECTORS (64) > -#define FLASH_SECTOR_SIZE (64 * 1024) > -#define FLASH_TOTAL_SIZE (NUM_FLASH_SECTORS * FLASH_SECTOR_SIZE) > +#define NETXEN_NUM_FLASH_SECTORS (64) > +#define NETXEN_FLASH_SECTOR_SIZE (64 * 1024) > +#define NETXEN_FLASH_TOTAL_SIZE (NETXEN_NUM_FLASH_SECTORS \ > + * NETXEN_FLASH_SECTOR_SIZE) > > #define PHAN_VENDOR_ID 0x4040 > > @@ -671,28 +672,28 @@ struct netxen_new_user_info { > > /* Flash memory map */ > typedef enum { > - CRBINIT_START = 0, /* Crbinit section */ > - BRDCFG_START = 0x4000, /* board config */ > - INITCODE_START = 0x6000,/* pegtune code */ > - BOOTLD_START = 0x1, /* bootld */ > - IMAGE_START = 0x43000, /* compressed image */ > - SECONDARY_START = 0x20, /* backup images */ > - PXE_START = 0x3E, /* user defined region */ > - USER_START = 0x3E8000, /* User defined region for new boards */ > - FIXED_START = 0x3F /* backup of crbinit */ > + NETXEN_CRBINIT_START = 0, /* Crbinit section */ > + NETXEN_BRDCFG_START = 0x4000, /* board config */ > + NETXEN_INITCODE_START = 0x6000, /* pegtune code */ > + NETXEN_BOOTLD_START = 0x1, /* bootld */ > + NETXEN_IMAGE_START = 0x43000, /* compressed image */ > + NETXEN_SECONDARY_START = 0x20, /* backup images */ > + NETXEN_PXE_START = 0x3E,/* user defined region */ > + NETXEN_USER_START = 0x3E8000, /* User defined region for new boards */ > + NETXEN_FIXED_START = 0x3F /* backup of crbinit */ > } netxen_flash_map_t; > > -#define USER_START_OLD PXE_START /* for backward compatibility */ > - > -#define FLASH_START (CRBINIT_START) > -#define INIT_SECTOR (0) > -#define PRIMARY_START(BOOTLD_START) > -#define FLASH_CRBINIT_SIZE (0x4000) > -#define FLASH_BRDCFG_SIZE(sizeof(struct netxen_board_info)) > -#define FLASH_USER_SIZE (sizeof(struct > netxen_user_info)/sizeof(u32)) > -#define FLASH_SECONDARY_SIZE (USER_START-SECONDARY_START) > -#define NUM_PRIMARY_SECTORS (0x20) > -#define NUM_CONFIG_SECTORS (1) > +#define NETXEN_USER_START_OLD NETXEN_PXE_START /* for backward > compatibility */ + > +#define NETXEN_FLASH_START (NETXEN_CRBINIT_START) > +#define NETXEN_INIT_SECTOR (0) > +#define NETXEN_PRIMARY_START (NETXEN_BOOTLD_START) > +#define NETXEN_FLASH_CRBINIT_SIZE(0x4000) > +#define NETXEN_FLASH_BRDCFG_SIZE (sizeof(struct netxen_board_info)) > +#define NETXEN_FLASH_USER_SIZE (sizeof(struct > netxen_user_info)/sizeof(u32)) +#define NETXEN_FLASH_SECONDARY_SIZE > (NETXEN_USER_START-NETXEN_SECONDARY_START) +#define > NETXEN_NUM_PRIMARY_SECTORS(0x20) > +#define NETXEN_NUM_CONFIG_SECTORS(1) > #define PFX "NetXen: " > extern char netxen_nic_driver_name[]; > > diff --git a/drivers/net/netxen/netxen_nic_ethtool.c > b/drivers/net/netxen/netxen_nic_ethtool.c index ee1b5a2..4dfa76b 100644 > --- a/drivers/net/
[GENETLINK] get pid from userspace
Hi, I have a user space program which "connects" to different kernel modules with generic netlink. Each module provides its own generic netlink family. For each module to "connect" to I create a socket and bind it: fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); struct sockaddr_nl addr; memset(&addr, 0, sizeof(addr)); bind(fd, (struct sockaddr *)&addr, sizeof(addr)); As far as I understand, this lets the generic netlink controller choose a unique pid, which shows up in the kernel module in the callback function as genl_info.snd_pid. Is it possible to get this pid in userspace (without sending a message to the kernel module)? Thanks! Ariane - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix e100 rx path on ARM (was [PATCH] e100 rx: or s and el bits)
On Jun 5, 2007, at 9:26 PM, Kok, Auke wrote: Kok, Auke wrote: Hmm git-revert seems to do the job right. I checked it with git-show | patch -p1 -R and the results look OK. The two patches on top of the one we want to revert are unrelated enough to apply (manually it shows some fuzz, but otherwise it's OK). Jeff, please `git-revert d52df4a35af569071fda3f4eb08e47cc7023f094` to revert the following patch for now: --- commit d52df4a35af569071fda3f4eb08e47cc7023f094 Author: Scott Feldman <[EMAIL PROTECTED]> Date: Wed Nov 9 02:18:52 2005 -0500 [netdrvr e100] experiment with doing RX in a similar manner to eepro100 I was going to say that eepro100's speedo_rx_link() does the same DMA abuse as e100, but then I noticed one little detail: eepro100 sets both EL (end of list) and S (suspend) bits in the RFD as it chains it to the RFD list. e100 was only setting the EL bit. Hmmm, that's interesting. That means that if HW reads a RFD with the S-bit set, it'll process that RFD and then suspend the receive unit. The receive unit will resume when SW clears the S-bit. There is no need for SW to restart the receive unit. Which means a lot of the receive unit state tracking code in the driver goes away. So here's a patch against 2.6.14. (Sorry for inlining it; the mailer I'm using now will mess with the word wrap). I can't test this on XScale (unless someone has an e100 module for Gumstix :) . It should be doing exactly what eepro100 does with RFDs. I don't believe this change will introduce a performance hit because the S-bit and EL-bit go hand-in-hand meaning if we're going to suspend because of the S- bit, we're on the last resource anyway, so we'll have to wait for SW to replenish. (cherry picked from 29e79da9495261119e3b2e4e7c72507348e75976 commit) --- A little bit more is needed to explain why we're reverting it for now. Jeff, please insert this into the revert commit. Auke -- This patch attempted to fix e100 for non-cache coherent memory architectures by using the cb style code that eepro100 had and using the EL and s bits from the RFD list. Unfortunately the hardware doesn't work exactly like this and therefore this patch actually breaks e100 on those systems. on all systems. (Both the &| typo and the removed restart logic). Reverting the change brings it back to the previously known good state for 2.6.22. The pending rewrite in progress to this code can then be safely merged later. Signed-off-by: Auke Kok <[EMAIL PROTECTED]> --- milton - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix race in AF_UNIX
From: Miklos Szeredi <[EMAIL PROTECTED]> Date: Wed, 06 Jun 2007 10:08:29 +0200 > > > > Holding a global mutex over recvmsg() calls under AF_UNIX is pretty > > > > much a non-starter, this will kill performance for multi-threaded > > > > apps. > > > > > > That's an rwsem held for read. It's held for write in unix_gc() only > > > for a short duration, and unix_gc() should only rarely be called. So > > > I don't think there's any performance problem here. > > > > It pulls a non-local cacheline into the local thread, that's extremely > > expensive on SMP. > > OK, here's an updated patch, that uses ->readlock, and passes my > testing. Thanks a lot, I'll review this as soon as possible unless someone else beats me to it :) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fix race in AF_UNIX
> > > Holding a global mutex over recvmsg() calls under AF_UNIX is pretty > > > much a non-starter, this will kill performance for multi-threaded > > > apps. > > > > That's an rwsem held for read. It's held for write in unix_gc() only > > for a short duration, and unix_gc() should only rarely be called. So > > I don't think there's any performance problem here. > > It pulls a non-local cacheline into the local thread, that's extremely > expensive on SMP. OK, here's an updated patch, that uses ->readlock, and passes my testing. From: Miklos Szeredi <[EMAIL PROTECTED]> There are races involving the garbage collector, that can throw away perfectly good packets with AF_UNIX sockets in them. The problems arise when a socket goes from installed to in-flight or vice versa during garbage collection. Since gc is done with a spinlock held, this only shows up on SMP. Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux-2.6.22-rc2/net/unix/garbage.c === --- linux-2.6.22-rc2.orig/net/unix/garbage.c2007-06-03 23:58:11.0 +0200 +++ linux-2.6.22-rc2/net/unix/garbage.c 2007-06-06 09:48:36.0 +0200 @@ -186,7 +186,21 @@ void unix_gc(void) forall_unix_sockets(i, s) { - unix_sk(s)->gc_tree = GC_ORPHAN; + struct unix_sock *u = unix_sk(s); + + u->gc_tree = GC_ORPHAN; + + /* +* Hold ->readlock to protect against sockets going from +* in-flight to installed +* +* Can't sleep on this, because +* a) we are under spinlock +* b) skb_recv_datagram() could be waiting for a packet that +* is to be sent by this thread +*/ + if (!mutex_trylock(&u->readlock)) + goto lock_failed; } /* * Everything is now marked @@ -207,8 +221,6 @@ void unix_gc(void) forall_unix_sockets(i, s) { - int open_count = 0; - /* * If all instances of the descriptor are not * in flight we are in use. @@ -218,10 +230,20 @@ void unix_gc(void) * In this case (see unix_create1()) we set artificial * negative inflight counter to close race window. * It is trick of course and dirty one. +* +* Get the inflight counter first, then the open +* counter. This avoids problems if racing with +* sendmsg +* +* If just created socket is not yet attached to +* a file descriptor, assume open_count of 1 */ + int inflight_count = atomic_read(&unix_sk(s)->inflight); + int open_count = 1; + if (s->sk_socket && s->sk_socket->file) open_count = file_count(s->sk_socket->file); - if (open_count > atomic_read(&unix_sk(s)->inflight)) + if (open_count > inflight_count) maybe_unmark_and_push(s); } @@ -300,6 +322,7 @@ void unix_gc(void) spin_unlock(&s->sk_receive_queue.lock); } u->gc_tree = GC_ORPHAN; + mutex_unlock(&u->readlock); } spin_unlock(&unix_table_lock); @@ -309,4 +332,19 @@ void unix_gc(void) __skb_queue_purge(&hitlist); mutex_unlock(&unix_gc_sem); + return; + + lock_failed: + { + struct sock *s1; + forall_unix_sockets(i, s1) { + if (s1 == s) { + spin_unlock(&unix_table_lock); + mutex_unlock(&unix_gc_sem); + return; + } + mutex_unlock(&unix_sk(s1)->readlock); + } + BUG(); + } } - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sending IPv6 packets via kern_sendmsg
Anton wrote: > Hi, > > Following on from a previous (now lost :-)) posting, I have been trying > to send out IPv6 packets from a kernel module using the kern_sendmsg() > function. Since in theory I need this function to be started in > interrupt context (actually, softirq context), but since this is > impossible because kern_sendmsg() needs to sleep, I have created a work > queue which calls the kern_sendmsg() function separately. The work queue > is scheduled from the softirq context (actually, at the moment this > happens from a Netfilter hook). The work queue function creates a msghdr > structure, fills in a sin6_addr structure, calls sock_create_kern() and > then uses this socket to send an IPv6 packet, which consists of a header > (struct ipv6hdr *iphdr) and some data following on from this. The above > packet is placed in the msghdr structure by setting (after the > appropriate initializations): > msg.msg_iov->iov_base = (char *) ip6hdr; > msg.msg_iov->iov_len = sizeof( struct ipv6hdr + ntohs( > ip6hdr->payload_len ) ); Shouldn't this be: msg.msg_iov->iov_len = sizeof (struct ipv6hdr) + ntohs (ip6hdr->payload_len); Seems you use sizeof (nthos (...)), and that seems wrong. -- Roar B. Rotvik - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive
On Tue, Jun 05 2007, Evgeniy Polyakov wrote: > On Tue, Jun 05, 2007 at 10:05:43AM +0200, Jens Axboe ([EMAIL PROTECTED]) > wrote: > > Here's an implementation of tcp network splice receive support. It's > > originally based on the patch set that Intel posted some time ago, but > > has been (close to) 100% reworked. > > > > Now, I'm not a networking guru by any stretch of the imagination, so I'd > > like some input on the direction of the main patch. Is the approach > > feasible? Glaring errors? Missing bits? > > 263.709262] [ cut here ] > [ 263.713932] kernel BUG at include/linux/mm.h:285! > [ 263.718678] invalid opcode: [1] PREEMPT SMP > [ 263.723561] CPU 0 > [ 263.725665] Modules linked in: button loop snd_intel8x0 > snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore psmouse > snd_page_alloc k8temp i2c_nforcen > [ 263.755666] Pid: 2709, comm: splice-fromnet Not tainted > 2.6.22-rc4-splice #2 > [ 263.762759] RIP: 0010:[] [] > skb_splice_bits+0xac/0x1c9 > [ 263.771212] RSP: 0018:81003c79fc88 EFLAGS: 00010246 > [ 263.776564] RAX: RBX: 05a8 RCX: > 81003ff04778 > [ 263.783743] RDX: 81003ff04778 RSI: 0ab2 RDI: > 0003d52d > [ 263.790925] RBP: 81003c79fdd8 R08: R09: > 81003d927b78 > [ 263.798104] R10: 803b0181 R11: 81003c79fde8 R12: > 81003d52d000 > [ 263.805284] R13: 054e R14: 81003d927b78 R15: > 81003bbc6ea0 > [ 263.812463] FS: 2ac4089a86d0() GS:804fb000() > knlGS: > [ 263.820611] CS: 0010 DS: ES: CR0: 8005003b > [ 263.826396] CR2: 2ac4088320e0 CR3: 3c987000 CR4: > 06e0 > [ 263.833578] Process splice-fromnet (pid: 2709, threadinfo > 81003c79e000, task 81003755c380) > [ 263.842591] Stack: 81003ff04720 > 81003755c380 0046 > [ 263.850897] 00c6 0046 81003b0428b8 > 81003d0b5b10 > [ 263.858543] 00c6 81003d0b5b10 81003b0428b8 > 81003d0b5b10 > [ 263.865957] Call Trace: > [ 263.868683] [] _read_unlock_irq+0x31/0x4e > [ 263.874393] [] tcp_splice_data_recv+0x20/0x22 > [ 263.880447] [] tcp_read_sock+0xa2/0x1ab > [ 263.885983] [] tcp_splice_data_recv+0x0/0x22 > [ 263.891951] [] tcp_splice_read+0xae/0x1a3 > [ 263.897655] [] sock_def_readable+0x0/0x6f > [ 263.903366] [] sock_splice_read+0x15/0x17 > [ 263.909072] [] do_splice_to+0x76/0x88 > [ 263.914432] [] sys_splice+0x1a8/0x232 > [ 263.919795] [] system_call+0x7e/0x83 > [ 263.925067] > [ 263.926606] > [ 263.926607] Code: 0f 0b eb fe 44 89 e6 81 e6 ff 0f 00 00 90 ff 42 > 08 48 63 55 > [ 263.936418] RIP [] skb_splice_bits+0xac/0x1c9 > [ 263.942516] RSP > > This a vm_bug_on in get_page(). > > > +static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page > > *page, > > + unsigned int len, unsigned int offset) > > +{ > > + struct page *p; > > + > > + if (unlikely(spd->nr_pages == PIPE_BUFFERS)) > > + return 1; > > + > > +#ifdef NET_COPY_SPLICE > > + p = alloc_pages(GFP_KERNEL, 0); > > + if (!p) > > + return 1; > > + > > + memcpy(page_address(p) + offset, page_address(page) + offset, len); > > +#else > > + p = page; > > + get_page(p); > > +#endif > > Some pages have zero reference counter here. > > Is commented NET_COPY_SPLICE part from old implementation? > It will be always slower than existing approach due to allocation > overhead. NET_COPY_SPLICE is not supposed to be enabled, it's just a demonstration of a copy operation if you wanted to test functionality without just linking in the skb pages. At least that would allow you to test correctness of the rest of the code, since I don't know if the skb page linking is entirely correct yet. But it's somewhat annoying to get pages with zero reference counts there, I wonder how that happens. I guess if the skb->data originated from kmalloc() then we don't really know. The main intent there was just to ensure the page wasn't going away, but clearly it's not good enough to ensure that reuse isn't taking place. -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warnings in git-wireless
On Tue, Jun 05, 2007 at 01:12:03PM -0700, James Ketrenos wrote: > Yes, we certainly don't want a driver to be "near mainline" that does > things that the rest of the kernel and other drivers are doing. We should > force them to stay out-of-tree until any and everything is resolved. > Heaven forbid that the code should be merged, contributed, and improved > upon as a community. > > You'd think these guys were trying to enable folks with a driver so they > can use the hardware they bought or something. Its almost laughable. What > idiots. Maybe you should make sure your companies publishes specs or useable drivers earlier, and you should stop writing crap code instead of these hypocritical comments. We're a big project and we need to be up to some standards before we can put code in that we'll have to maintain then. > > James > - > To unsubscribe from this list: send the line "unsubscribe linux-wireless" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html ---end quoted text--- - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC] network splice receive
On Tue, Jun 05 2007, jamal wrote: > On Tue, 2007-05-06 at 14:20 +0200, Jens Axboe wrote: > > > > > > > 1800 4ff3 937f e000 6381 7275 0008 > > > > > > Perhaps that hex pattern rings a bell with someone intimate with the > > > networking. The remaining wrong bytes don't seem to have anything in > > > common. > > > > Ok, the source mac address is 00:18:F3:4F:7F:93 and the destination is > > 00:E0:81:63:75:72 which are the middle 12 bytes of the 16. > > > > It appears you may have endianness issues and perhaps a 16 bit > mis-offset. the 0008 at the end looks like a swapped 0x800 which > is the ethernet type for IPV4. Yeah, it looks like the first part of the ethernet frame. I'm using an e1000 on the receive side and a sky2 on the sender. > > Hope that helps someone clue me in as to which network part is reusing > > the data. Do I need to 'pin' the sk_buff until the pipe data has been > > consumed? > > I would worry about the driver level first - likely thats where your > corruption is. I'd just be a heck-of-a-lot more inclined to suspect my code! Hence I thought that data reuse for perhaps another packet would be the likely explanation, especially since it looked like valid network driver data ending up where it should not. -- Jens Axboe - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
typo in via-velocity.c
http://bugzilla.kernel.org/show_bug.cgi?id=8160 Signed-off-by: Dave Jones <[EMAIL PROTECTED]> diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c index 25b75b6..b670b97 100644 --- a/drivers/net/via-velocity.c +++ b/drivers/net/via-velocity.c @@ -1562,7 +1562,7 @@ static void velocity_print_link_status(struct velocity_info *vptr) if (vptr->mii_status & VELOCITY_LINK_FAIL) { VELOCITY_PRT(MSG_LEVEL_INFO, KERN_NOTICE "%s: failed to detect cable link\n", vptr->dev->name); } else if (vptr->options.spd_dpx == SPD_DPX_AUTO) { - VELOCITY_PRT(MSG_LEVEL_INFO, KERN_NOTICE "%s: Link autonegation", vptr->dev->name); + VELOCITY_PRT(MSG_LEVEL_INFO, KERN_NOTICE "%s: Link auto-negotiation", vptr->dev->name); if (vptr->mii_status & VELOCITY_SPEED_1000) VELOCITY_PRT(MSG_LEVEL_INFO, " speed 1000M bps"); -- http://www.codemonkey.org.uk - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Remove incorrect comment from hamradio/scc.
scc_rxint doesn't call this function at all. http://bugzilla.kernel.org/show_bug.cgi?id=8146 Signed-off-by: Dave Jones <[EMAIL PROTECTED]> diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c index 6fdaad5..30bed2a 100644 --- a/drivers/net/hamradio/scc.c +++ b/drivers/net/hamradio/scc.c @@ -1610,7 +1610,7 @@ static int scc_net_close(struct net_device *dev) return 0; } -/* > receive frame, called from scc_rxint() < */ +/* > receive frame < */ static void scc_net_rx(struct scc_channel *scc, struct sk_buff *skb) { -- http://www.codemonkey.org.uk - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html