Re: [patch] RFC: matching interface groups
On Tue, 2006-08-01 at 21:18 +0200, Sven Schuster wrote:
> Hi Phil,
>
> On Tue, Aug 01, 2006 at 11:46:55AM -0700, Phil Oester told us:
> > Since in this scenario userspace is able to determine ppp vs pptp,
> > could you not also do something like have an inbound_ppp and inbound_pptp
> > chain, then jump to the appropriate chain depending on type? If you
> > need per-interface rules, then create an inbound_pppX chain, populate
> > it with rules, then jump to that chain if -i pppX. In ip-down, just
> > delete the chain as well as the jump.
>
> if I understood Balazs correctly, one of the things he wanted to
> avoid is addition/deletion of iptables rules on every pppX interface
> up/down

Exactly.

> as this would require the complete chain (say, INPUT or
> OUTPUT) to be "downloaded" to userspace, modified and then again
> "uploaded" to the kernel. At least until iptables redesign to
> allow replacement/insertion/deletion of single rules is completed
> which if started at all will take quite some more time :-)

Iptables operates on a per-table basis, so it is not only the INPUT or
OUTPUT chain that needs to be downloaded and uploaded, but the whole
filter table.

In addition, in my humble opinion the iptables ruleset should be up to
the user to maintain. Once some kind of automatism starts to add and
remove rules on the fly, it becomes more difficult to make other,
independent changes to the table. For example, the user needs to save
the current ruleset using iptables-save, modify the resulting file, and
then load it again. If the ruleset is generated, as happens with a lot
of tools, this might not be so easy.

--
Bazsi

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] RFC: matching interface groups
On Tue, 2006-08-01 at 11:29 -0700, Stephen Hemminger wrote:
> On Tue, 01 Aug 2006 19:10:09 +0200
> Balazs Scheidler <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I would like to easily match a set of dynamically created interfaces
> > from my packet filter rules. The attached patch forms the basis of my
> > implementation and I would like to know whether something like this is
> > mergeable to mainline.
> >
> > The use-case is as follows:
> >
> > * I have two different subsystems creating interfaces dynamically (for
> >   example pptpd and serial pppd lines, each creating dynamic pppX
> >   interfaces),
> > * I would like to assign a different set of iptables rules for these
> >   clients,
> > * I would like to react to a new interface being added to a specific set
> >   in a userspace application,
> >
> > The reasons I see this needs new kernel functionality:
> >
> > * iptables supports wildcard interface matching (for example "iptables
> >   -i ppp+"), but as the names of the interfaces used by PPTPD and PPPD
> >   cannot be distinguished this way, this is not enough,
> > * Reloading the iptables ruleset every time a new interface comes up is
> >   not really feasible, as it interrupts packet processing, and validating
> >   the ruleset in the kernel can take a significant amount of time,
> > * the kernel change is very simple, adapting userspace to this change is
> >   also very simple, and in userspace various software packages can easily
> >   interoperate with each other once this is merged.
> >
> > The implementation:
> >
> > Each interface can belong to a single "group" at a time, an interface
> > comes up without being a member in any of the groups.
> >
> > Userspace can assign interfaces to groups after being created, this
> > would typically be performed in /etc/ppp/ip-up.d (and similar) scripts.
> >
> > In spirit "interface group" is somewhat similar to the "routing
> > protocol" field for routing entries, which contains information on which
> > routing daemon was responsible for adding the given route entry.
> >
> > [snip]
>
> I like the concept, but it probably needs more review.
>
> There is a bigger issue, which is how should the network device namespace
> exist? There are virtualization efforts, that want to virtualize it,
> and network device names have always lived in a parallel universe.
> I don't expect your patch to solve this...

I have read the OLS paper on virtualization. It states that the current
state of affairs is that struct net_device will be assigned to one
specific namespace. As my change modifies struct net_device itself, I
expect it to work without problems when virtualization comes: the
interface group can be interpreted on a per-namespace basis.

There will probably be several iptables rulesets when the time comes,
one for each namespace, but again, struct net_device will be assigned to
a namespace, and the proper iptables tables will be iterated based on
the net_device assignment.

Am I missing something?

--
Bazsi
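The difference between name wildcards and group matching in the RFC above can be illustrated with a small userspace sketch. This is not the patch itself: `struct iface`, `match_name_wildcard` and `match_group` are hypothetical names, and treating group 0 as "not in any group" is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of an interface; not the kernel's net_device. */
struct iface {
    const char *name;
    unsigned int group;   /* assumption: 0 means "not in any group" */
};

/* iptables-style name match: a trailing '+' matches any suffix. */
static int match_name_wildcard(const struct iface *dev, const char *pattern)
{
    size_t len = strlen(pattern);

    assert(len > 0);
    if (pattern[len - 1] == '+')
        return strncmp(dev->name, pattern, len - 1) == 0;
    return strcmp(dev->name, pattern) == 0;
}

/* Group match: compares only the group tag, whatever the name is. */
static int match_group(const struct iface *dev, unsigned int group)
{
    return group != 0 && dev->group == group;
}
```

With a pptpd interface `{ "ppp0", 1 }` and a serial pppd interface `{ "ppp1", 2 }`, the pattern `ppp+` matches both, while `match_group(dev, 1)` selects only the first, which is the distinction the RFC is after.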
Re: e1000 speed/duplex error
Hi, Auke.

Auke Kok wrote:
AK> Here's that part of the driver documentation:

AK> $ modprobe e1000 AutoNeg=0x08
AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD

AK> /* Auto-negotiation Advertisement Override
AK>  *
AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
AK>  *
AK>  * The AutoNeg value is a bit mask describing which speed and duplex
AK>  * combinations should be advertised during auto-negotiation.
AK>  * The supported speed and duplex modes are listed below
AK>  *
AK>  * Bit            7     6     5      4     3      2      1      0
AK>  * Speed (Mbps)   N/A   N/A   1000   N/A   100    100    10     10
AK>  * Duplex                     Full         Full   Half   Full   Half
AK>  *
AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
AK>  */

This is not what I'm thinking of. Say, for example, I have a bunch of
e1000 adapters in my box and want to dynamically change the speed/duplex
of one of them. For that to work the way you described, I need to stop
all of them and reload the driver with the AutoNeg parameter (can I pass
this parameter to only a single card?) and lose all connections I had on
the other adapters.
It would be better to handle it the ethtool way, since I discovered
that is the common behavior.
Thanks.

AK> hth,

AK> Auke

--
Best Regards,
Alexandr Kotov mailto:[EMAIL PROTECTED]
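For readers who want to check a given AutoNeg value against the table quoted above, here is a minimal sketch of a decoder. It only encodes the bit assignments from that quoted comment; the names `autoneg_modes`, `autoneg_decode` and `autoneg_mask_valid` are made up for this example and are not part of the e1000 driver.

```c
#include <stddef.h>
#include <stdio.h>

/* One row per usable bit of the AutoNeg mask, following the quoted table. */
struct autoneg_mode {
    unsigned int bit;
    int speed_mbps;
    const char *duplex;
};

static const struct autoneg_mode autoneg_modes[] = {
    { 0x01,   10, "Half" },
    { 0x02,   10, "Full" },
    { 0x04,  100, "Half" },
    { 0x08,  100, "Full" },
    { 0x20, 1000, "Full" },
};

/* Count the speed/duplex combinations a mask advertises; print them if verbose. */
static int autoneg_decode(unsigned int mask, int verbose)
{
    int count = 0;
    size_t i;

    for (i = 0; i < sizeof(autoneg_modes) / sizeof(autoneg_modes[0]); i++) {
        if (mask & autoneg_modes[i].bit) {
            if (verbose)
                printf("%d/%s\n", autoneg_modes[i].speed_mbps,
                       autoneg_modes[i].duplex);
            count++;
        }
    }
    return count;
}

/* Bits 4, 6 and 7 are N/A in the table, so the copper range is 0x01-0x0F, 0x20-0x2F. */
static int autoneg_mask_valid(unsigned int mask)
{
    return mask != 0 && (mask & ~0x2Fu) == 0;
}
```

Under this reading, `AutoNeg=0x08` advertises exactly one mode (bit 3, 100 Mbps full duplex), which agrees with the "AutoNeg advertising 100/FD" log line in the quoted message.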
Re: [RFC] gre: transparent ethernet bridging
On Mon, Jul 31, 2006 at 10:08:22PM -0700, Stephen Hemminger wrote:
> > > Why not use existing bridge code?
> >
> > It does use the existing bridge code. Perhaps the name is misleading.
> > All it does is encapsulate the full ethernet header in a gre packet,
> > rather than only layer 3. That is, currently gre uses ARPHRD_IPGRE,
> > but bridging requires ARPHRD_ETHER.
>
> I am not against making the bridge code smarter to handle other
> encapsulation.

What if you want to run ethernet directly over a GRE tunnel, without
using bridging?
Re: e1000 speed/duplex error
On 8/2/06, a1 <[EMAIL PROTECTED]> wrote:
> Hi, Auke.
>
> Auke Kok wrote:
> AK> Here's that part of the driver documentation:
>
> AK> $ modprobe e1000 AutoNeg=0x08
> AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD
>
> AK> /* Auto-negotiation Advertisement Override
> AK>  *
> AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
> AK>  *
> AK>  * The AutoNeg value is a bit mask describing which speed and duplex
> AK>  * combinations should be advertised during auto-negotiation.
> AK>  * The supported speed and duplex modes are listed below
> AK>  *
> AK>  * Bit            7     6     5      4     3      2      1      0
> AK>  * Speed (Mbps)   N/A   N/A   1000   N/A   100    100    10     10
> AK>  * Duplex                     Full         Full   Half   Full   Half
> AK>  *
> AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
> AK>  */
>
> This is not what I'm thinking of. Say, for example, I have a bunch of
> e1000 adapters in my box and want to dynamically change the speed/duplex
> of one of them. For that to work the way you described, I need to stop
> all of them and reload the driver with the AutoNeg parameter (can I pass
> this parameter to only a single card?) and lose all connections I had on
> the other adapters.
> It would be better to handle it the ethtool way, since I discovered
> that is the common behavior.
> Thanks.
>
> AK> hth,
>
> AK> Auke
>
> --
> Best Regards,
> Alexandr Kotov mailto:[EMAIL PROTECTED]

I agree, although ethtool does not have that functionality as of yet.
Feel free to provide a patch to the ethtool maintainer (Jeff Garzik)
if you would like. I will put it on my plate of things to do, but I
will admit that it is near the bottom of my list. Feel free to ping me
once in a while to remind me.

--
Cheers,
Jeff
Re: [take2 1/4] kevent: core files.
From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 10:39:18 +0400

> u64 is not aligned, so I prefer to use u32 as much as possible.

We have aligned_u64 exactly for this purpose; netfilter makes use of it
to avoid the x86_64 vs. x86 u64 alignment discrepancy.
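The alignment discrepancy mentioned above can be seen in a small userspace sketch. It assumes GCC or Clang (for the `aligned` attribute) and a C11 compiler (for `_Static_assert`); `aligned_u64_t`, `struct evt_plain` and `struct evt_fixed` are hypothetical stand-ins, not kevent or netfilter definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's aligned_u64: a u64 forced to 8-byte alignment. */
typedef uint64_t aligned_u64_t __attribute__((aligned(8)));

/*
 * Layout depends on the ABI: a plain 64-bit member needs only 4-byte
 * alignment on i386 but 8-byte alignment on x86_64, so 'cookie' ends up
 * at offset 4 or 8 depending on how the code was built.
 */
struct evt_plain {
    uint32_t id;
    uint64_t cookie;
};

/* Layout is the same on both ABIs: 'cookie' is always at offset 8. */
struct evt_fixed {
    uint32_t id;
    aligned_u64_t cookie;
};

/* These hold whether the code is built for 32-bit or 64-bit x86. */
_Static_assert(offsetof(struct evt_fixed, cookie) == 8,
               "cookie must sit at offset 8 on every ABI");
_Static_assert(sizeof(struct evt_fixed) == 16,
               "structure size must not depend on the ABI");
```

A structure shared between a 32-bit userspace and a 64-bit kernel keeps the same layout with the aligned type, which is why it can be preferable to shrinking fields to u32.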
Re: [take2 1/4] kevent: core files.
On Wed, Aug 02, 2006 at 12:25:05AM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> Date: Wed, 2 Aug 2006 10:39:18 +0400
>
> > u64 is not aligned, so I prefer to use u32 as much as possible.
>
> We have aligned_u64 exactly for this purpose, netfilter makes
> use of it to avoid the x86_64 vs. x86 u64 alignment discrepancy.

Ok, I will use that type.

--
	Evgeniy Polyakov
Re: [RFC] Mobile IPv6 introduction
Hugo Santos wrote:
> David,
>
> On Tue, Aug 01, 2006 at 05:35:35PM -0700, David Miller wrote:
>> This is partly why the multiple routing table code is being
>> added as the initial infrastructure, so that source based
>> things are possible.
>
>    There have been other approaches for partial source-based stuff. For
> instance, in my tree i brought Subtrees back to a point of being
> usable. But what i was referring to was route-caching (some places only
> check cookies based on dst because we "don't support source routing"),
> APIs, etc. A few mails back you pointed the extension of a public
> structure to include a "source" attribute -- this is the kind of stuff
> we must add and that i think it's an independent work (read: even if
> the rest isn't merged, this should).
>
>    Still regarding Subtrees, is there any interest in revitalizing that
> code? I have a couple patches that i could submit.

Hi Hugo,

I don't want to be dismissive towards your patches, but I've been
working with the subtree routing stuff for several years now. And let
me tell you: it has provided us with some nasty little surprises every
now and then. I'm only saying it's surprisingly difficult to get right
on the first try.

To name just one issue: the chicken and egg problem of source address
selection and source address based routing. I solved this problem by
letting policy rules (with a source prefix) add additional constraints
to the address selection. This did however mean the source address
selection had to be moved inside the routing code.

This is just one example; trust me, there are several more.

My latest incarnation of source address routing is against my previous
version of policy routing, which luckily isn't that different from the
current version by Thomas. Unless Yoshifuji-san has already ported my
code to Thomas's policy routing code, I'll start working on it.

>> Such a scheme would need provisions for handling the case where
>> the user eats the message, but never tells us what to do.
>> In such a case we'd need to emit some kind of ICMPv6 message,
>> even if it would be just a timeout generated parameter problem.
>
>    As i see it, the moment there is a raw socket open for dealing with a
> particular protocol, whoever opened that socket (handling the protocol)
> is responsible of generating any error messages associated with the
> protocol running. Which is the case, the kernel shouldn't need to know
> whether any of the Mobile IPv6 specific messages have problems. The
> particular patch i was referring to does partial MIPv6 message
> processing inside the kernel before handing it to the socket as you
> only have access to the full received headers there.
>
>> Such a layer would be needed if we ever put some kernel level
>> components of Mobile IPv4 into the tree, which I see no reason
>> not to, since it has this route optimization as well.
>
>    Yes, the functionality is needed. My only problem is with exposing
> MODE_ROUTEOPTIMIZATION, it isn't modular. But it's something i can live
> with.

But route optimization is just one form of packet transform; it just
adds a Routing Header type 2 and/or Home Address Option Destination
Header to the outgoing packet. Isn't xfrm just the right place for this?

You are right that we (HUT and USAGI) have mostly just looked at the
xfrm framework from a MIPv6+IPsec perspective, but even this has helped
us pinpoint several shortcomings in the current IPsec-only framework.
IMO, this doesn't hinder, but rather helps change xfrm into the generic
packet transform framework it was originally envisioned to be.

Regards,
Ville
Re: [PATCH 1/23] [PATCH] [XFRM]: Add XFRM_MODE_xxx for future use.
Herbert Xu wrote:
> Please rebase your tree on something that's more recent. We've had
> xfrm modes for more than two months now.

OK, I will rebase to catch up with the latest tree.
(This tree is just for review, so it is against 2.6.17 rather than the
latest.)

--
Masahide NAKAMURA
Re: [PATCH 0/20][IPV6/XFRM] MIPv6 CN (part B)
David Miller wrote:
> From: Masahide NAKAMURA <[EMAIL PROTECTED]>
> Date: Sat, 29 Jul 2006 18:37:04 +0900
>
>> Here is Part B patches, following this mail.
>>
>> Part B is also available as mip6cn-20060716-review branch at:
>>
>> git://git.skbuff.net:9419/gitroot/nakam/linux-2.6-mip6cn
>>
>> This tree includes part A, then it has all patches about
>> "Advanced XFRM for CN".
>
> These patches mainly deal with the specifics of ipv6
> mobility processing, they look mostly fine to me and
> I could not spot any obvious errors.

Thank you for reviewing.
Next time I will prepare the patches against the latest tree, with
fixes for the review comments.

Thanks,

--
Masahide NAKAMURA
Re: [PATCH 7/23] [PATCH] [XFRM] STATE: Add a hook to find where to be inserted header in outbound.
David Miller wrote:
> From: Masahide NAKAMURA <[EMAIL PROTECTED]>
> Date: Wed, 02 Aug 2006 11:20:30 +0900
>
>> David Miller wrote:
>>> I see a dangerous pattern of adding many, many, many methods
>>> to the xfrm_type structure which are only used by ipv6.
>>> But I cannot suggest another method.
>> Sometimes this is a difficult point for me to design.
>
> Do not worry so much about it right now, it is not a barrier
> for code integration. We can try to refine this later on.

OK, I will first improve my code within the current framework.
Thanks :-)

--
Masahide NAKAMURA
Re: e1000 speed/duplex error
Hi, Jeff.

JK> On 8/2/06, a1 <[EMAIL PROTECTED]> wrote:
>> Hi, Auke.
>>
>> Auke Kok wrote:
>> AK> Here's that part of the driver documentation:
>>
>> AK> $ modprobe e1000 AutoNeg=0x08
>> AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD
>>
>> AK> /* Auto-negotiation Advertisement Override
>> AK>  *
>> AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
>> AK>  *
>> AK>  * The AutoNeg value is a bit mask describing which speed and duplex
>> AK>  * combinations should be advertised during auto-negotiation.
>> AK>  * The supported speed and duplex modes are listed below
>> AK>  *
>> AK>  * Bit            7     6     5      4     3      2      1      0
>> AK>  * Speed (Mbps)   N/A   N/A   1000   N/A   100    100    10     10
>> AK>  * Duplex                     Full         Full   Half   Full   Half
>> AK>  *
>> AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
>> AK>  */
>>
>> This is not what I'm thinking of. Say, for example, I have a bunch of
>> e1000 adapters in my box and want to dynamically change one's spd/dplx.
>> For that works in the way you described I need to stop all of them and
>> load with autoneg parameter (can I pass this parameter only to single
>> card?) and loose all connection I had on other adapters.
>> It would be better to handle it in ethtool way, since I discovered
>> it's a common behavior.
>> Thanks.
>>
>> AK> hth,
>>
>> AK> Auke
>>
>> --
>> Best Regards,
>> Alexandr Kotov mailto:[EMAIL PROTECTED]

JK> I agree. Although ethtool does not have that functionality as of yet.
JK> Feel free to provide a patch to the ethtool maintainer (Jeff Garzik)
JK> if you would like. I will put it on my plate of things to do, but I
JK> will admit that it is near the bottom of the list of items to get done
JK> for me. Feel free to ping me once in awhile to remind me.
Ethtool already has support for that, but the e1000 driver doesn't
handle all values passed from ethtool correctly. For example, if I run
ethtool with the following parameters:

    ethtool -s eth0 speed 100 duplex full autoneg on

the parameters filled in by ethtool look like:

    ecmd->autoneg = AUTONEG_ENABLE;
    ecmd->advertising = ADVERTISED_100baseT_Full;

but when they are passed to the driver, the driver fills the structure
passed to the hw layer with all possible advertise values.

static int
e1000_set_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
{
	struct e1000_adapter *adapter = netdev_priv(netdev);
	struct e1000_hw *hw = &adapter->hw;

	/* When SoL/IDER sessions are active, autoneg/speed/duplex
	 * cannot be changed */
	if (e1000_check_phy_reset_block(hw)) {
		DPRINTK(DRV, ERR, "Cannot change link characteristics "
		        "when SoL/IDER is active.\n");
		return -EINVAL;
	}

	if (ecmd->autoneg == AUTONEG_ENABLE) {
		hw->autoneg = 1;
		if (hw->media_type == e1000_media_type_fiber)
			hw->autoneg_advertised = ADVERTISED_1000baseT_Full |
				     ADVERTISED_FIBRE |
				     ADVERTISED_Autoneg;
		else
--->			hw->autoneg_advertised = ADVERTISED_10baseT_Half |
						  ADVERTISED_10baseT_Full |
						  ADVERTISED_100baseT_Half |
						  ADVERTISED_100baseT_Full |
						  ADVERTISED_1000baseT_Full|
						  ADVERTISED_Autoneg |
						  ADVERTISED_TP;
		ecmd->advertising = hw->autoneg_advertised;
	} else
		if (e1000_set_spd_dplx(adapter, ecmd->speed + ecmd->duplex))
			return -EINVAL;

	/* reset the link */
	if (netif_running(adapter->netdev))
		e1000_reinit_locked(adapter);
	else
		e1000_reset(adapter);

	return 0;
}

If you change it this way, everything works as I expected:

--- e1000_ethtool.c.orig	Mon Jun 26 14:13:26 2006
+++ e1000_ethtool.c	Wed Aug 02 12:35:36 2006
@@ -225,13 +225,7 @@
 				     ADVERTISED_FIBRE |
 				     ADVERTISED_Autoneg;
 		else
-			hw->autoneg_advertised = ADVERTISED_10baseT_Half |
-						  ADVERTISED_10baseT_Full |
-						  ADVERTISED_100baseT_Half |
-						  ADVERTISED_100baseT_Full |
-
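The idea described in the message above, honoring the advertising mask supplied by ethtool instead of overwriting it, can be sketched in plain userspace C. This is not the (truncated) patch: `pick_advertised` and the `ADV_*` constants are hypothetical, the bit positions are assumed to mirror the `ADVERTISED_*` flags in `<linux/ethtool.h>`, and falling back to all modes when nothing usable was requested is a design choice made only for this example.

```c
#include <stdint.h>

/* Local stand-ins for the ADVERTISED_* flags; the values are an assumption. */
#define ADV_10_HALF    (1u << 0)
#define ADV_10_FULL    (1u << 1)
#define ADV_100_HALF   (1u << 2)
#define ADV_100_FULL   (1u << 3)
#define ADV_1000_FULL  (1u << 5)
#define ADV_AUTONEG    (1u << 6)
#define ADV_TP         (1u << 7)

/* Every copper link mode listed in the driver code quoted above. */
#define ADV_COPPER_ALL (ADV_10_HALF | ADV_10_FULL | ADV_100_HALF | \
                        ADV_100_FULL | ADV_1000_FULL)

/*
 * Hypothetical helper: keep only the requested link modes that copper
 * supports. If the caller requested none of them, advertise everything,
 * which matches the behaviour of the unpatched code.
 */
static uint32_t pick_advertised(uint32_t requested)
{
    uint32_t modes = requested & ADV_COPPER_ALL;

    if (modes == 0)
        modes = ADV_COPPER_ALL;
    return modes | ADV_AUTONEG | ADV_TP;
}
```

With this helper, a request for 100 Mbps full duplex only yields `ADV_100_FULL | ADV_AUTONEG | ADV_TP`, so the link partner is offered a single mode while auto-negotiation stays enabled.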
Re: Linville's L2 rant... -- Re: PATCH Fix bonding active-backup behavior for VLAN interfaces
On Tuesday 01 August 2006 19:21, you wrote:
> John W. Linville wrote:
> >>>I'm just not sure that cleverness is worth the headache, especially
> >>>since the most clever things usually only work by accident...
> >>
> >>Or, work by solid, modular design and small tweaks!
> >
> > Point taken. But stashing little hacks in the networking core for
> > specific virtual drivers isn't totally modular either. And even if
> > it were, "modular design" probably belongs on the list of "things
> > that can be taken too far", like "everything in userland", "never
> > use ioctl", and "microkernels are superior". :-)
>
> To be honest, I'm not over-joyed to see bridging hooks included
> in the VLAN code..but if that is what it takes to get bridging
> and VLANs to play well and be flexible, I think it is a fair price.
>
> It certainly wouldn't hurt to have someone take a holistic view of the
> various L2 device interactions. Just documenting current functionality
> on, say, the netdev wiki would be a good first step.

Ultimate flexibility could be provided by making the netif_rx routine
(and the others, including vlan etc.) a "virtual" routine. That way a
list of "filters" could be defined that allows any processing to be done
on the packet before it is handed off to the Linux kernel's higher
layers, including not delivering it on that interface, or delivering it
on another interface.

This would allow very complex implementations, such as a high-level L2
bridge with VLAN support and a number of protocols like RSTP, PVST+,
..., with relatively simple code that could be isolated from the main
kernel.

Would anyone be interested in signing off on such a patch? (It would
basically create netif_rx and vlan_acc_netif_rx lists in the net_device
structure, and then modify the bridging and bonding drivers to just use
them.)

Regards,

Christophe
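The "list of filters" proposal above might look roughly like the following userspace sketch. Nothing here comes from an actual patch: `struct rx_chain`, `rx_chain_run`, the verdict names and the two example filters are all hypothetical, and a fixed-size array stands in for whatever list the net_device structure would hold.

```c
#include <stddef.h>

/* Hypothetical verdicts a receive filter can return. */
enum rx_verdict { RX_CONTINUE, RX_DROP, RX_STOLEN };

/* Hypothetical, heavily simplified packet: receiving interface and VLAN id. */
struct packet {
    int ifindex;
    unsigned short vlan_id;
};

typedef enum rx_verdict (*rx_filter_fn)(struct packet *pkt);

#define MAX_RX_FILTERS 8

/* Per-device ordered list of receive filters. */
struct rx_chain {
    rx_filter_fn filters[MAX_RX_FILTERS];
    size_t count;
};

/* Append a filter; returns 0 on success, -1 if the chain is full. */
static int rx_chain_add(struct rx_chain *chain, rx_filter_fn fn)
{
    if (chain->count >= MAX_RX_FILTERS)
        return -1;
    chain->filters[chain->count++] = fn;
    return 0;
}

/* Run the filters in order; the first verdict other than RX_CONTINUE wins. */
static enum rx_verdict rx_chain_run(const struct rx_chain *chain,
                                    struct packet *pkt)
{
    size_t i;

    for (i = 0; i < chain->count; i++) {
        enum rx_verdict verdict = chain->filters[i](pkt);

        if (verdict != RX_CONTINUE)
            return verdict;
    }
    return RX_CONTINUE;
}

/* Example filter: do not deliver traffic tagged with VLAN 13. */
static enum rx_verdict drop_vlan_13(struct packet *pkt)
{
    return pkt->vlan_id == 13 ? RX_DROP : RX_CONTINUE;
}

/* Example filter: deliver on another interface, as a bond or bridge might. */
static enum rx_verdict redirect_to_if2(struct packet *pkt)
{
    pkt->ifindex = 2;
    return RX_CONTINUE;
}
```

A chain holding `drop_vlan_13` followed by `redirect_to_if2` drops VLAN 13 traffic before the redirect runs and hands every other packet onward as if it had arrived on interface 2.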
Re: [patch] RFC: matching interface groups
* Balazs Scheidler wrote, On 02/08/06 08:04:
> On Tue, 2006-08-01 at 21:18 +0200, Sven Schuster wrote:
>> as this would require the complete chain (say, INPUT or
>> OUTPUT) to be "downloaded" to userspace, modified and then again
>> "uploaded" to the kernel. At least until iptables redesign to
>> allow replacement/insertion/deletion of single rules is completed
>> which if started at all will take quite some more time :-)
>
> Iptables operates on a per-table basis, so it is not only the INPUT or
> OUTPUT chain that needs to be down and uploaded, but the whole filter
> table.
>
> And in addition, in my humble opinion the iptables ruleset should be up
> to the user to maintain, once some kind of automatism starts to
> add/remove rules on the fly, it becomes more difficult to do other
> changes to add independent rules to the table. For example the user
> needs to save the current ruleset using iptables-save, then modify the
> resulting file, and then load it again. If the ruleset is generated as
> it happens with a lot of tools, this might not be so easy.
>

Even without this scenario it is not really safe: if two interfaces
changed at the same time, two copies of the table would be downloaded
to user space, each modified differently, and the last one to be
uploaded would win, the other one losing its changes.

This has bitten me and is one of my reasons for liking ipt_condition.

Sam
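The race described above is a classic lost update, and a deterministic sketch makes it easy to see. The model below is purely illustrative: `struct table`, `table_download` and `table_upload` are hypothetical names that imitate the whole-table download, modify, upload cycle and have nothing to do with the real iptables or libiptc interfaces.

```c
#include <assert.h>

#define MAX_RULES 8

/* Hypothetical model of a whole table: rules are just integers here. */
struct table {
    int rules[MAX_RULES];
    int count;
};

/* "Download": userspace gets a private snapshot of the whole table. */
static struct table table_download(const struct table *kernel)
{
    return *kernel;
}

/* "Upload": the snapshot replaces the kernel's table wholesale. */
static void table_upload(struct table *kernel, const struct table *copy)
{
    *kernel = *copy;
}

static void table_append(struct table *t, int rule)
{
    assert(t->count < MAX_RULES);
    t->rules[t->count++] = rule;
}

static int table_has(const struct table *t, int rule)
{
    int i;

    for (i = 0; i < t->count; i++)
        if (t->rules[i] == rule)
            return 1;
    return 0;
}

/* Two updaters race: both download before either one uploads. */
static struct table lost_update_demo(void)
{
    struct table kernel = { { 0 }, 0 };
    struct table a = table_download(&kernel);
    struct table b = table_download(&kernel);

    table_append(&a, 100);      /* rule added for the first interface */
    table_append(&b, 200);      /* rule added for the second interface */
    table_upload(&kernel, &a);
    table_upload(&kernel, &b);  /* silently discards rule 100 */
    return kernel;
}
```

After `lost_update_demo()` the table contains only rule 200: the second upload replaced the first one rather than merging with it, which is the failure mode the message describes.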
Re: [RFC] Mobile IPv6 introduction
Hi,

   Thanks for the reply, however:

On Wed, Aug 02, 2006 at 12:24:30PM +0900, Masahide NAKAMURA wrote:
> Our patch is similar as you said. Our design is that kernel does nothing
> as possible about validation which can be done by user-space.
> As you mentioned ICMPv6 error is hard to be sent by user-space because it
> carries original packet causing error. MIPv6 RFC says when mobility header
> length is too short ICMPv6 error (parameter problem) is sent. We also
> discussed about design like your choice.
> but we have not taken it because ICMPv6 sending mechanism is already in
> kernel then it is reasonable to use it. We MIPL developers concluded that
> kernel should know mobility header types and their minimum length at least.
> I guess when we would support NEMO and FMIPv6, we just add their defines
> at that time.
> (Actually, their implementations based on MIPL2 exists.)
> If somebody would feel that such defines should be removed from kernel we
> have another idea to make new socket interface like ICMP filter to store
> mobility header type and its minimum length to kernel by user-space.

   Although the ICMP-filter approach would be better, it is not flexible
enough to handle this situation. We must also send ICMPv6 Parameter
Problems when ip6mh_proto isn't IPPROTO_NONE. I don't think it is too
much of a burden to handle ICMPv6 in the control daemon, because you
should already do so to react to ICMPv6 error messages from peers
concerning MIPv6 signalling. I'm strongly against doing these checks in
the kernel for the simple reason that it is not easily extendable. You
wouldn't be able to deploy a new daemon version over an existing kernel
with these changes if it supported a new control protocol with new
messages.

   I think we should follow a different path here and i propose either
having a hdrinc=1 mode (for reception only) for protocol raw sockets,
possibly together with a control message on reception that specifies
the offset of the UPL header; or having a control message to obtain the
network headers. For instance:

   put_cmsg(msg, SOL_IPV6, ..., (skb->h.raw - skb->nh.raw),
            skb->nh.raw);

   Hugo
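The length argument in the `put_cmsg` call suggested above is a pointer difference: everything between the start of the network header and the start of the upper-layer header. A userspace sketch under stated assumptions follows; `struct pkt_view`, `net_hdrs_len` and `copy_net_hdrs` are hypothetical names that merely mimic the two `sk_buff` pointers, and no real socket API is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the two sk_buff pointers used above. */
struct pkt_view {
    const unsigned char *nh;  /* start of the network (IPv6) header */
    const unsigned char *h;   /* start of the upper-layer header */
};

/* Bytes of IPv6 header plus extension headers preceding the upper layer. */
static size_t net_hdrs_len(const struct pkt_view *p)
{
    assert(p->h >= p->nh);
    return (size_t)(p->h - p->nh);
}

/*
 * Copy the network headers into 'out', the way a control message payload
 * would carry them. Returns the number of bytes copied, or 0 if 'out' is
 * too small to hold them.
 */
static size_t copy_net_hdrs(const struct pkt_view *p, unsigned char *out,
                            size_t out_len)
{
    size_t len = net_hdrs_len(p);

    if (len > out_len)
        return 0;
    memcpy(out, p->nh, len);
    return len;
}
```

For a packet with the 40-byte fixed IPv6 header followed by 24 bytes of extension headers, `net_hdrs_len` gives 64, which is also the offset of the upper-layer header that a daemon would need in order to build an ICMPv6 error itself.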
Re: [RFC] Mobile IPv6 introduction
Hi Ville,

On Wed, Aug 02, 2006 at 10:58:49AM +0300, Ville Nuorvala wrote:
> To name just one issue: the chicken and egg problem of source address
> selection and source address based routing. I solved this problem by
> letting policy rules (with a source prefix) add additional constraints
> to the address selection. This did however mean the source address
> selection had to be moved inside the routing code.

   To tell you the truth i don't know what MIPL does in terms of policy
management. In my implementation, all routing policies go into Subtrees
without any kind of extra routing tables. I also had the problem you
describe, but i opted for what i think is a simpler solution:

 - Access router default routes are installed with a source-address,
   the address that was generated from the announced prefix (which to
   be fair degenerates to several entries if a single router announces
   multiple prefixes). This is based on the assumption that access
   routers do perform source-based ingress filtering so you may only
   use a particular access router for global connectivity using a
   particular address.
 - The default home route is installed without a source-address for
   the "default" Home address (i may have several).

   This means Linux's source address selection works without
modifications: if no address is specified, it will pick the default
home route and then the Home address (which has a preference as well).
In this sense, subtrees have worked fine for me.

> But route optimization is just one form of packet transform; it just
> adds a Routing Header type 2 and/or Home Address Option Destination
> Header to the outgoing packet. Isn't xfrm just the right place for this?
>
> You are right that we (HUT and USAGI) have mostly just looked at the
> xfrm framework from a MIPv6+IPsec perspective, but even this has helped
> us pinpoint several shortcomings in the current only IPsec specific
> framework.

   XFRM is indeed the right place for this; i just would rather not
have the mode exposed and prefer wrapping any mode-specific stuff into
optional callbacks. It might not be as performant but would allow
adding new modes more easily.

   Hugo
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
On Wed, 2006-08-02 at 02:47 -0400, Catherine Zhang wrote:
> Hi, all,
>
> Enclosed please find the updated patch incorporating comments from
> Stephen and Dave.

Note that this patch is intended for 2.6.18 as a bug fix for the memory
leak introduced by the original dgram peersec patches.

> Again thanks for your help!
> Catherine
>
> --
>
>
> From: [EMAIL PROTECTED]
>
> This patch implements a cleaner fix for the memory leak problem of the
> original unix datagram getpeersec patch. Instead of creating a security
> context each time a unix datagram is sent, we only create the security
> context when the receiver requests it.
>
> This new design requires modification of the current unix_getsecpeer_dgram
> LSM hook and addition of two new hooks, namely, secid_to_secctx and
> release_secctx. The former retrieves the security context and the latter
> releases it. A hook is required for releasing the security context because
> it is up to the security module to decide how that's done. In the case of
> Selinux, it's a simple kfree operation.
Acked-by: Stephen Smalley <[EMAIL PROTECTED]> > > > --- > > include/linux/security.h | 41 +++-- > include/net/af_unix.h|6 ++ > include/net/scm.h| 29 + > net/ipv4/ip_sockglue.c |9 +++-- > net/unix/af_unix.c | 17 + > security/dummy.c | 14 -- > security/selinux/hooks.c | 38 -- > 7 files changed, 110 insertions(+), 44 deletions(-) > > diff -puN include/net/scm.h~af_unix-datagram-getpeersec-ml-fix > include/net/scm.h > --- linux-2.6.18-rc2/include/net/scm.h~af_unix-datagram-getpeersec-ml-fix > 2006-07-22 21:28:21.0 -0400 > +++ linux-2.6.18-rc2-cxzhang/include/net/scm.h2006-08-01 > 22:43:50.0 -0400 > @@ -3,6 +3,7 @@ > > #include > #include > +#include > > /* Well, we should have at least one descriptor open > * to accept passed FDs 8) > @@ -20,8 +21,7 @@ struct scm_cookie > struct ucredcreds; /* Skb credentials */ > struct scm_fp_list *fp;/* Passed files */ > #ifdef CONFIG_SECURITY_NETWORK > - char*secdata; /* Security context */ > - u32 seclen; /* Security length */ > + u32 secid; /* Passed security ID */ > #endif > unsigned long seq;/* Connection seqno */ > }; > @@ -32,6 +32,16 @@ extern int __scm_send(struct socket *soc > extern void __scm_destroy(struct scm_cookie *scm); > extern struct scm_fp_list * scm_fp_dup(struct scm_fp_list *fpl); > > +#ifdef CONFIG_SECURITY_NETWORK > +static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct > scm_cookie *scm) > +{ > + security_socket_getpeersec_dgram(sock, NULL, &scm->secid); > +} > +#else > +static __inline__ void unix_get_peersec_dgram(struct socket *sock, struct > scm_cookie *scm) > +{ } > +#endif /* CONFIG_SECURITY_NETWORK */ > + > static __inline__ void scm_destroy(struct scm_cookie *scm) > { > if (scm && scm->fp) > @@ -47,6 +57,7 @@ static __inline__ int scm_send(struct so > scm->creds.pid = p->tgid; > scm->fp = NULL; > scm->seq = 0; > + unix_get_peersec_dgram(sock, scm); > if (msg->msg_controllen <= 0) > return 0; > return __scm_send(sock, msg, scm); > @@ -55,8 +66,18 @@ static __inline__ int 
scm_send(struct so > #ifdef CONFIG_SECURITY_NETWORK > static inline void scm_passec(struct socket *sock, struct msghdr *msg, > struct scm_cookie *scm) > { > - if (test_bit(SOCK_PASSSEC, &sock->flags) && scm->secdata != NULL) > - put_cmsg(msg, SOL_SOCKET, SCM_SECURITY, scm->seclen, > scm->secdata); > + char *secdata; > + u32 seclen; > + int err; > + > + if (test_bit(SOCK_PASSSEC, &sock->flags)) { > + err = security_secid_to_secctx(scm->secid, &secdata, &seclen); > + > + if (!err) { > + put_cmsg(msg, SOL_SOCKET, SCM_SECURITY, seclen, > secdata); > + security_release_secctx(secdata, seclen); > + } > + } > } > #else > static inline void scm_passec(struct socket *sock, struct msghdr *msg, > struct scm_cookie *scm) > diff -puN net/unix/af_unix.c~af_unix-datagram-getpeersec-ml-fix > net/unix/af_unix.c > --- linux-2.6.18-rc2/net/unix/af_unix.c~af_unix-datagram-getpeersec-ml-fix > 2006-07-22 23:01:26.0 -0400 > +++ linux-2.6.18-rc2-cxzhang/net/unix/af_unix.c 2006-08-02 > 02:25:00.454243480 -0400 > @@ -128,23 +128,17 @@ static atomic_t unix_nr_socks = ATOMIC_I > #define UNIX_ABSTRACT(sk)(unix_sk(sk)->addr->hash != UNIX_HASH_SIZE) > > #ifdef CONFIG_SECURITY_NETWORK > -static void unix_get_peersec_dgram(struct sk_buff *skb) > +static void unix_get_secdata(struct scm_cookie *scm,
RE: [RFC 2/3] secid reconciliation on inbound: add LSM hooks
> > -	if (err)
> > -		goto out;
> > +	/* if (err) */
> > +		/* goto out; */
> >
> > -	err = selinux_xfrm_sock_rcv_skb(sksec->sid, skb, &ad);
> > -out:
> > +	/* err = selinux_xfrm_sock_rcv_skb(sksec->sid, skb, &ad); */
> > +out:
> >  	return err;
> >  }
>
> Did you mean to leave the call to selinux_xfrm_sock_rcv_skb()
> commented out?

I actually meant to take the call out entirely. Will fix this in the next round.
RE: [RFC 1/3] secid reconciliation on inbound
> On Tue, 1 Aug 2006, James Morris wrote:
>
> > On Tue, 1 Aug 2006, Venkat Yekkirala wrote:
> >
> > > +#define PACKET__COME_THRU 0x0008UL
> > > +#define PACKET__GO_THRU   0x0010UL
> >
> > These names seem awkward, and do we really need a separate perm for
> > each direction?
>
> Ok, I see we need separate permissions. The naming, still...

You are probably seeing something I haven't :), because I did consider using just one perm such as flow_thru for both directions, but then thought separate perms would make things easier to understand.

As for naming, how about "enter" and "leave"? Or "flow_in" and "flow_out"? Any other suggestions out there?
Re: [RFC] Mobile IPv6 introduction
Hugo Santos wrote: >Although the ICMP-filter approach would be better, it is not flexible > enough to handle this situation. We must also send ICMPv6 Parameter > Problems when ip6mh_proto isn't IPPROTO_NONE. I don't think it is too I don't think IPPROTO_NONE case is a suitable example here (it is also supported by our kernel patch). We don't have any problem about who checks next header field since its offset of mobility header never changes then its value can be checked as the same way for all type number. But anyway, > much of a burthen to handle ICMPv6 in the control daemon because you > should already do so to react to ICMPv6 error messages from peers > concerning MIPv6 signalling. I'm strongly against doing these checks in > the kernel for the simple reason that it is not easily extendable. You > wouldn't be able to deploy a new daemon version over an existing kernel > with these changes if it supported a new control protocol with new > messages. I think we should follow a different path here and i propose > either have a hdrinc=1 mode (for reception only) for protocol raw > sockets, possibly adding with control on reception which specifies the > offset of the UPL header; or have a control message to obtain the > network headers. For instance: > > put_cmsg(msg, SOL_IPV6, ..., (skb->h.raw - skb->nh.raw), >skb->nh.raw); I can agree such suggestion as new kernel feature but I'm not sure MIPv6 stuff should depend on it just for new message type to extend later. On our design MIPv6 signaling itself is almost done by user-space daemon. When developer wants to add new or original type number, it is enough for kernel to be added the number and its length. All other things can be modified at user-space application. If there is much requirement to add new type number without any modification of kernel code at all I would support ICMPv6 filter approach, too. 
-- Masahide NAKAMURA
[PATCH] DECnet Fix for DECnet routing bug
This patch fixes a bug in the DECnet routing code where we were selecting a loopback device in preference to an outward facing device even when the destination was known non-local. This patch should fix the problem.

Signed-off-by: Patrick Caulfield <[EMAIL PROTECTED]>
Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]>

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 1355614..743e9fc 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -925,8 +925,13 @@ static int dn_route_output_slow(struct d
 		for(dev_out = dev_base; dev_out; dev_out = dev_out->next) {
 			if (!dev_out->dn_ptr)
 				continue;
-			if (dn_dev_islocal(dev_out, oldflp->fld_src))
-				break;
+			if (!dn_dev_islocal(dev_out, oldflp->fld_src))
+				continue;
+			if ((dev_out->flags & IFF_LOOPBACK) &&
+			    oldflp->fld_dst &&
+			    !dn_dev_islocal(dev_out, oldflp->fld_dst))
+				continue;
+			break;
 		}
 		read_unlock(&dev_base_lock);
 		if (dev_out == NULL)
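For anyone who wants to see the selection rule from the DECnet patch in isolation, here is a minimal user-space sketch. Everything named toy_* is a hypothetical stand-in and not kernel code: toy_islocal() plays the role of dn_dev_islocal(), a zero local_addr plays the role of a missing dn_ptr, and each toy device carries a single address. Only the three skip conditions are meant to mirror the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IFF_LOOPBACK 0x8	/* stand-in for IFF_LOOPBACK */

/* Hypothetical, simplified device: one node address per device. */
struct toy_dev {
	const char *name;
	unsigned int flags;
	unsigned short local_addr;	/* 0 means "no DECnet data" */
	struct toy_dev *next;
};

/* Stand-in for dn_dev_islocal(): does addr belong to this device? */
static int toy_islocal(const struct toy_dev *dev, unsigned short addr)
{
	assert(dev != NULL);
	return dev->local_addr != 0 && dev->local_addr == addr;
}

/*
 * Walk the device list and return the first device that owns src,
 * skipping a loopback device when dst is set and not local to it.
 * Returns NULL when nothing matches.
 */
struct toy_dev *toy_pick_output_dev(struct toy_dev *base,
				    unsigned short src, unsigned short dst)
{
	struct toy_dev *dev;

	for (dev = base; dev; dev = dev->next) {
		if (dev->local_addr == 0)
			continue;	/* like !dev_out->dn_ptr */
		if (!toy_islocal(dev, src))
			continue;	/* source address not on this device */
		if ((dev->flags & TOY_IFF_LOOPBACK) && dst &&
		    !toy_islocal(dev, dst))
			continue;	/* non-local destination: skip loopback */
		break;
	}
	return dev;
}
```

With a loopback device listed ahead of an ethernet device that shares the same node address, a non-local destination now selects the ethernet device, while a local or unset destination still selects loopback.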
RE: [NET]: Fix ___pskb_trim when entire frag_list needs dropping
Herbert Xu wrote:
> On Thu, Jul 13, 2006 at 07:03:41PM +1000, herbert wrote:
> >
> > This needs to go into stable as well. In fact, there is another unrelated
> > bug with exactly the same symptoms which was inadvertently fixed by the
> > GSO patches. So I'll send a simpler fix for that to stable.
> >
> > [NET]: Update frag_list in pskb_trim
>
> Marco told me that he was still seeing the same problem. Turns out that my patch missed one important case. I hope this is really the last time I have to look at this bug :)
>
> [NET]: Fix ___pskb_trim when entire frag_list needs dropping
>
> When the trim point is within the head and there is no paged data, ___pskb_trim fails to drop the first element in the frag_list. This patch fixes this by moving the len <= offset case out of the page data loop.
>
> This patch also adds a missing kfree_skb on the frag that we just cloned.

The problem is fixed now. I have applied this patch to 2.6.18-rc2.

Many thanks, Herbert, for all the time spent debugging this problem.
Re: e1000 speed/duplex error
a1 wrote:
> JK> I agree. Although ethtool does not have that functionality as of yet.
> JK> Feel free to provide a patch to the ethtool maintainer (Jeff Garzik)
> JK> if you would like. I will put it on my plate of things to do, but I
> JK> will admit that it is near the bottom of the list of items to get done
> JK> for me. Feel free to ping me once in awhile to remind me.
>
> Ethtool already has support for that, but the e1000 driver doesn't treat all values passed from ethtool correctly. For example, if I run ethtool with the following parameters:
>
>   ethtool -s eth0 speed 100 duplex full autoneg on
>
> the parameters filled in by ethtool look like:
>
>   ecmd->autoneg = AUTONEG_ENABLE;
>   ecmd->advertising = ADVERTISED_100baseT_Full;
>
> but when they are passed to the driver, the driver fills the structure passed to the hw layer with all possible advertise values:
>
>   if (ecmd->autoneg == AUTONEG_ENABLE) {
>           hw->autoneg = 1;
>           if (hw->media_type == e1000_media_type_fiber)
>                   hw->autoneg_advertised = ADVERTISED_1000baseT_Full |
>                                            ADVERTISED_FIBRE |
>                                            ADVERTISED_Autoneg;
>           else
>   --->            hw->autoneg_advertised = ADVERTISED_10baseT_Half |
>                                            ADVERTISED_10baseT_Full |
>                                            ADVERTISED_100baseT_Half |
>                                            ADVERTISED_100baseT_Full |
>                                            ADVERTISED_1000baseT_Full|
>                                            ADVERTISED_Autoneg |
>                                            ADVERTISED_TP;
>           ecmd->advertising = hw->autoneg_advertised;
>   } else
>
> If you change it this way, everything works as I expected:
>
> --- e1000_ethtool.c.orig Mon Jun 26 14:13:26 2006
> +++ e1000_ethtool.c Wed Aug 02 12:35:36 2006
> @@ -225,13 +225,7 @@
>                                            ADVERTISED_FIBRE |
>                                            ADVERTISED_Autoneg;
>           else
> -                 hw->autoneg_advertised = ADVERTISED_10baseT_Half |
> -                                          ADVERTISED_10baseT_Full |
> -                                          ADVERTISED_100baseT_Half |
> -                                          ADVERTISED_100baseT_Full |
> -                                          ADVERTISED_1000baseT_Full|
> -                                          ADVERTISED_Autoneg |
> -                                          ADVERTISED_TP;
> +                 hw->autoneg_advertised = ecmd->advertising;

Don't you mean this?

  +                 hw->autoneg_advertised = ecmd->advertising |
  +                                          ADVERTISED_Autoneg |
  +                                          ADVERTISED_TP;

and we'd also have to do this for fibre...

>           ecmd->advertising = hw->autoneg_advertised;
>   } else
>           if (e1000_set_spd_dplx(adapter, ecmd->speed + ecmd->duplex))

but that's not really what you want: the way ethtool works currently only allows you to pass *one* speed/duplex tuple and autonegotiate with that, or all of them (by omitting any speed/duplex tuple).

ethtool needs some code that allows you to specify "autonegotiate 10_half or 100_full or 1000_full" (3 tuples, but not implying 100_half or 10_full). This is something mii-tool was able to do, but this functionality never made it into ethtool AFAIK :)

This is the most useful case for everyone: you can omit advertising a gig link if you only have 100mbit switches, and speed up link times that way, etc.

Auke
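To illustrate the "several tuples, nothing implied" idea from the mail above, here is a small stand-alone sketch of how an advertising mask could be built from an explicit list of speed/duplex tuples. It is not ethtool or e1000 code. The ADV_* constants are local stand-ins for the ADVERTISED_* bits (the bit positions are my recollection of <linux/ethtool.h> and should be checked against the header), and toy_tuple / toy_build_advertising() are hypothetical names. Treating 1000/half as unsupported is also an assumption, taken from the fact that the driver code quoted above never advertises it.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the ADVERTISED_* bits (assumed values). */
enum {
	ADV_10_HALF   = 1 << 0,
	ADV_10_FULL   = 1 << 1,
	ADV_100_HALF  = 1 << 2,
	ADV_100_FULL  = 1 << 3,
	ADV_1000_FULL = 1 << 5,
	ADV_AUTONEG   = 1 << 6,
	ADV_TP        = 1 << 7,
};

/* Hypothetical speed/duplex tuple as a user might request it. */
struct toy_tuple {
	int speed;		/* 10, 100 or 1000 */
	int full_duplex;	/* 0 = half, 1 = full */
};

/* Map one tuple to its advertising bit, or 0 if it is not supported. */
static unsigned int toy_tuple_to_adv(struct toy_tuple t)
{
	switch (t.speed) {
	case 10:   return t.full_duplex ? ADV_10_FULL : ADV_10_HALF;
	case 100:  return t.full_duplex ? ADV_100_FULL : ADV_100_HALF;
	case 1000: return t.full_duplex ? ADV_1000_FULL : 0;
	default:   return 0;
	}
}

/*
 * OR together exactly the requested tuples, then add the autoneg and
 * twisted-pair bits (copper case only). Returns 0 if the list is empty
 * or contains an unsupported tuple.
 */
unsigned int toy_build_advertising(const struct toy_tuple *t, size_t n)
{
	unsigned int mask = 0;
	size_t i;

	assert(t != NULL || n == 0);
	for (i = 0; i < n; i++) {
		unsigned int bit = toy_tuple_to_adv(t[i]);

		if (!bit)
			return 0;
		mask |= bit;
	}
	if (mask)
		mask |= ADV_AUTONEG | ADV_TP;
	return mask;
}
```

Passing the three tuples 10/half, 100/full and 1000/full yields a mask with just those three mode bits plus the autoneg and TP bits, without 100/half or 10/full sneaking in.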
Re: [PATCH] Fix more per-cpu typos
On Wed, Aug 02, 2006 at 05:02:11AM +0200, Andi Kleen wrote:
> > --- a/arch/x86_64/kernel/smp.c
> > +++ b/arch/x86_64/kernel/smp.c
> > @@ -203,7 +203,7 @@ int __cpuinit init_smp_flush(void)
> >  {
> >  	int i;
> >  	for_each_cpu_mask(i, cpu_possible_map) {
> > -		spin_lock_init(&per_cpu(flush_state.tlbstate_lock, i));
> > +		spin_lock_init(&per_cpu(flush_state, i).tlbstate_lock);
>
> What advantage does this have over the earlier form?

I grepped the tree after seeing "[PATCH] fix vmstat per cpu usage"¹. The rationales mentioned in that thread are:

1) invalid asm on s390
2) it only works because the per-cpu macros are very simple

> In general this should be split up into three patches.

Yep, I see Andrew has split them.

¹ http://marc.theaimsgroup.com/?l=linux-kernel&m=115445399826223&w=2
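Rationale 2) is easy to reproduce outside the kernel. The sketch below is a toy model, not the kernel's per-cpu implementation: toy_per_cpu() imitates a "paste a prefix, add a byte offset" macro, and toy_per_cpu_indexed() imitates a hypothetical implementation that indexes an array named after the variable. All toy_* names are made up for illustration, and __typeof__ is a GCC extension.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

struct toy_flush_state {
	int active;
	int tlbstate_lock;	/* stand-in for the spinlock */
};

/* One copy of the variable per CPU. */
static struct toy_flush_state toy_storage[TOY_NR_CPUS];

/* "Simple" implementation: paste a prefix, then add a per-CPU byte offset. */
#define toy_per_cpu__flush_state (toy_storage[0])
#define toy_per_cpu(var, cpu) \
	(*(__typeof__(&toy_per_cpu__##var)) \
	  ((char *)&toy_per_cpu__##var + (cpu) * sizeof(toy_storage[0])))

/* Alternative implementation: index an array named after the variable. */
#define toy_array__flush_state toy_storage
#define toy_per_cpu_indexed(var, cpu) (toy_array__##var[cpu])

/* Old form: member access smuggled into the macro argument. */
int *toy_lock_dotted(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return &toy_per_cpu(flush_state.tlbstate_lock, cpu);
}

/* New form: select the per-CPU object first, then the member. */
int *toy_lock_member(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return &toy_per_cpu(flush_state, cpu).tlbstate_lock;
}

/* The new form also survives a different macro implementation. */
int *toy_lock_member_indexed(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return &toy_per_cpu_indexed(flush_state, cpu).tlbstate_lock;
}

/*
 * The old form does not: toy_per_cpu_indexed(flush_state.tlbstate_lock, cpu)
 * would expand to toy_storage.tlbstate_lock[cpu], which fails to compile.
 */
```

Under the simple macro both spellings resolve to the same address, so the old form only works by accident of how the macro is written; the new form makes no such assumption.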
Re: [PATCH 20/23] [PATCH] [XFRM] POLICY: sub policy support.
On Sat, 29 Jul 2006, Masahide NAKAMURA wrote:

> Sub policy is introduced. Main and sub policy are applied the same flow.
> (Policy that current kernel uses is named as main.)
> It is required another transformation policy management to keep IPsec
> and Mobile IPv6 lives separate.
> Policy which lives shorter time in kernel should be a sub i.e. normally
> main is for IPsec and sub is for Mobile IPv6.
> (Such usage as two IPsec policies on different database can be used, too.)

Why can't IPSec & MIP transforms be bundled on the same policy?

Or, perhaps a different approach is needed, where the disposition of a policy can be to re-submit a packet for another policy match after the current bundle has been traversed (something like NF_REPEAT).

- James
--
James Morris <[EMAIL PROTECTED]>
Re: [PATCH] ipv4: don't call upper-layer disconnect function if not connected
> The socket could have been bind()'d to, in which case it will not move to connected state and we still need to invoke the disconnect methods such as udp_disconnect() to clear out that binding.

Ok.

> You seem to be groveling in random areas of the ipv4 and ipv6 stack, what are you working on?

Was looking into a customer-reported memory leak that seemed to be in this code path. It wasn't, but this tweak seemed sane at the time.

-Brian
Re: [PATCH] IPv6: only set err in rawv6_bind() when necessary
> Every other path going from this location in rawv6_bind() will clear err to zero, so your patch also doesn't fix any bug.

I knew it didn't fix a bug, I just hadn't noticed the C idiom you pointed out until I knew to look for it. rawv6_bind() even does this, duh.

-Brian
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
Hi Catherine,

On 02/08/06, Catherine Zhang <[EMAIL PROTECTED]> wrote:
> Hi, all,
>
> Enclosed please find the updated patch incorporating comments from Stephen and Dave.

Thanks!

> Again thanks for your help!
>
> Catherine

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/wiki/)
Re: [2.6.18-rc2][e1000][swsusp] - Regression - Suspend to disk and resume breaks e1000 - RESOLVED Bug #6867
Shawn Starr wrote:
> On Sunday 16 July 2006 12:33 pm, Auke Kok wrote:
> > [adding netdev to the cc]
> >
> > unfortunately I didn't. e1000 has a special e1000_pci_save_state/e1000_pci_restore_state set of routines that save and restore the configuration space. the fact that it works for suspend to memory to me suggests that there is nothing wrong with that.
>
> Hi Auke,
>
> It appears that with 2.6.18-rc3 this no longer occurs. I suspended to disk/ram and the interface pci registers were restored.
>
> Bugzilla #6867

I would not be surprised if all the suspend issues in 2.6.18rcX were not involved in this somehow... thanks for reporting back in.

Auke
Re: e1000 speed/duplex error
a1 wrote:
> Hi, Auke.
>
> Auke Kok wrote:
>
> AK> Here's that part of the driver documentation:
> AK> $ modprobe e1000 AutoNeg=0x08
> AK> e1000: :00:00.0: e1000_validate_option: AutoNeg advertising 100/FD
>
> AK> /* Auto-negotiation Advertisement Override
> AK>  *
> AK>  * Valid Range: 0x01-0x0F, 0x20-0x2F (copper); 0x20 (fiber)
> AK>  *
> AK>  * The AutoNeg value is a bit mask describing which speed and duplex
> AK>  * combinations should be advertised during auto-negotiation.
> AK>  * The supported speed and duplex modes are listed below
> AK>  *
> AK>  * Bit            7    6    5     4    3     2     1     0
> AK>  * Speed (Mbps)   N/A  N/A  1000  N/A  100   100   10    10
> AK>  * Duplex                   Full       Full  Half  Full  Half
> AK>  *
> AK>  * Default Value: 0x2F (copper); 0x20 (fiber)
> AK>  */
>
> This is not what I'm thinking of. Say, for example, I have a bunch of e1000 adapters in my box and want to dynamically change one's spd/dplx. For that to work in the way you described, I need to stop all of them and reload with the autoneg parameter (can I pass this parameter to only a single card?) and lose all the connections I had on the other adapters.

You can pass these parameters per card as such:

  $ modprobe e1000 AutoNeg=0x2f,0x28,0x2f,0x2f

This way card #2 will see the non-default value; the other 3 will run with the default value (0x2f).

Auke
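For reference, the bit table quoted from the driver documentation can be decoded mechanically. The following is a small stand-alone sketch, not driver code: toy_describe_autoneg() and the mode table are hypothetical helpers, the copper-only valid mask 0x2F is taken from the quoted comment, and the "100/FD" style labels follow the log line shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bit-to-mode table taken from the quoted AutoNeg comment (copper). */
struct toy_mode {
	unsigned int bit;
	const char *name;
};

static const struct toy_mode toy_modes[] = {
	{ 0x01, "10/HD" },
	{ 0x02, "10/FD" },
	{ 0x04, "100/HD" },
	{ 0x08, "100/FD" },
	{ 0x20, "1000/FD" },
};

/*
 * Write a space-separated list of the advertised modes into buf.
 * Returns the number of modes, or -1 if the mask is empty, uses a bit
 * the table marks N/A, or does not fit into buf.
 */
int toy_describe_autoneg(unsigned int mask, char *buf, size_t len)
{
	size_t i, used = 0;
	int count = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	if (mask == 0 || (mask & ~0x2Fu))
		return -1;
	for (i = 0; i < sizeof(toy_modes) / sizeof(toy_modes[0]); i++) {
		int n;

		if (!(mask & toy_modes[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? " " : "", toy_modes[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* buffer too small */
		used += (size_t)n;
		count++;
	}
	return count;
}
```

Decoded this way, 0x28 (the value given to card #2 in the modprobe example) advertises 100/FD and 1000/FD only, while the default 0x2f advertises all five modes.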
[PATCH 2.6.16.19 2/2] LARTC: trace control for netem: kernelspace
Trace Control for Netem: Emulate network properties such as long-range dependence and self-similarity of cross-traffic.

The delay, drop, duplication and corruption values are read out in user space and sent to kernel space via procfs. The kernel determines the time when new values should be sent by the use of SIGSTOP and SIGCONT signals.

So that packet action values are always ready to apply, there are two buffers that hold these values. Packet action values can be read from one buffer while the other buffer is refilled with new values. When a buffer is empty, the kernel switches to the other buffer and sends a SIGCONT signal in order to receive new packet action values.

Once the delay value has been applied to a packet, the packet is processed by the original netem functions.

Signed-off-by: Rainer Baumann <[EMAIL PROTECTED]>
---
Patch for linux kernel 2.6.16.19: http://tcn.hypert.net/tcnKernel.patch
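The two-buffer scheme described in this patch summary can be sketched in plain C as follows. This is only an illustration of the idea, not the code from the patch: all toy_* names are invented, the buffers are tiny fixed arrays, and a counter (refill_requests) stands in for sending SIGCONT to the user-space process.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUF_LEN 4

/* One buffer of packet action values (for example, delays). */
struct toy_buf {
	unsigned int val[TOY_BUF_LEN];
	size_t head;	/* next value to hand out */
	size_t count;	/* values still unread */
};

struct toy_trace {
	struct toy_buf buf[2];
	int cur;		/* index of the buffer being drained */
	int refill_requests;	/* stands in for sending SIGCONT */
};

/*
 * User-space side: copy n values into an empty buffer, trying the
 * current one first. Returns the index filled, or -1 if no buffer is
 * empty or n is out of range.
 */
int toy_refill(struct toy_trace *t, const unsigned int *vals, size_t n)
{
	int idx;
	size_t i;

	assert(t != NULL && vals != NULL);
	if (n == 0 || n > TOY_BUF_LEN)
		return -1;
	idx = t->cur;
	if (t->buf[idx].count != 0)
		idx = !idx;
	if (t->buf[idx].count != 0)
		return -1;	/* both buffers still hold values */
	for (i = 0; i < n; i++)
		t->buf[idx].val[i] = vals[i];
	t->buf[idx].head = 0;
	t->buf[idx].count = n;
	return idx;
}

/*
 * Kernel side: fetch the next value. When the current buffer is empty,
 * switch to the other one and request a refill of the drained buffer.
 * Returns 0 on success, -1 if both buffers are empty.
 */
int toy_next(struct toy_trace *t, unsigned int *out)
{
	struct toy_buf *b;

	assert(t != NULL && out != NULL);
	b = &t->buf[t->cur];
	if (b->count == 0) {
		t->cur = !t->cur;
		t->refill_requests++;
		b = &t->buf[t->cur];
		if (b->count == 0)
			return -1;
	}
	*out = b->val[b->head++];
	b->count--;
	return 0;
}
```

The point of the design is that the reader never waits on the writer as long as the refill of the drained buffer completes before the other buffer runs out.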
[PATCH 2.6.16.19 1/2] LARTC: trace control for netem: userspace
Trace Control for Netem: Emulate network properties such as long-range dependence and self-similarity of cross-traffic.

The directory tc/netem was split into two parts, one containing the original distributions and the other the tools to generate trace files as well as the program responsible for reading the delay values from the trace file and sending them to the kernel (called flowseed).

If the trace option is set, netem starts the flowseed process and initializes the kernel. To be able to kill the flowseed process in case the command was faulty, the PID of the flowseed process is passed to the netem kernel module. If the kernel receives packet delay data from an unregistered PID, that process will be killed. The flowseed process does not send data to the kernel until the registration is completed.

Signed-off-by: Rainer Baumann <[EMAIL PROTECTED]>
---
Patch for iproute2-2.6.16-060323: http://tcn.hypert.net/tcnIproute.patch
[PATCH 2.6.16.19 0/2] LARTC: trace control for netem
Hi,

We developed an extension to the network emulator netem that provides emulation of long-term network properties such as long-range dependence and self-similarity of cross-traffic. It is not possible to emulate these properties with the statistical tables for the packet delay values used by the original netem.

We read the values for the packet delay, drop, loss and corruption from a pre-generated trace file. This trace file is obtained by monitoring network traffic and writing all actions to a trace file. During the emulation the packets are processed according to the values in such a trace file. Detailed information is available on our website http://tcn.hypert.net

A new option (trace) has been added to the netem command. If the trace option is used, the values for packet delay etc. are read from a trace file; afterwards the packets are processed by the normal netem functions. The packet action values are read out from the trace file in user space and sent to kernel space via procfs.

The evaluation results show similar behavior for our enhancement and the original netem with respect to packet delay precision and packet loss at high load (e.g. 80'000 packets per second). It is possible to add, change or delete multiple netem qdiscs on-the-fly (original netem qdiscs and trace qdiscs mixed).

We are looking forward to any comments, feedback and suggestions!

Thanks,
Rainer
Re: [RFC] gre: transparent ethernet bridging
On Wed, 02 Aug 2006 16:17:42 +1000
Philip Craig <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> >> I am not against making the bridge code smarter to handle other
> >> encapsulation.
>
> Here's an updated patch that fixes all issues I am aware of.
>
> It generates a random mac address for gre ports, and also stores
> a copy of the mac address for ethernet ports, rather than checking
> dev->type everywhere.

That looks cleaner. I wonder if using a fixed OUI would be better than random addresses, but then choosing an OUI would be a problem.

You probably should add a comment about what this function is doing, and why.

> static int __br_nf_dev_queue_xmit(struct sk_buff *skb)
> +{
> +	if (skb->dst == (struct dst_entry *)&__fake_rtable) {
> +		dst_release(skb->dst);
> +		skb->dst = NULL;
> +	}
> +
> +	return br_dev_queue_push_xmit(skb);
> +}
> +
means to artificially alter the bandwidth of a system
Hi,

For research purposes we are considering developing a program to alter the bandwidth of a system via software. For instance: a machine has 100 MB/s and we change it to 1 MB/s.

Does something like this already exist? Or is there a way to do this without creating a program/kernel module?

Any help will be highly appreciated!

Irfan Habib
Re: Stackable devices.
Stephen Hemminger wrote: On Wed, 2 Aug 2006 11:02:20 +0200 Christophe Devriese <[EMAIL PROTECTED]> wrote: On Tuesday 01 August 2006 19:21, you wrote: John W. Linville wrote: I'm just not sure that cleverness is worth the headache, especially since the most clever things usually only work by accident... Or, work by solid, modular design and small tweaks! Point taken. But stashing little hacks in the networking core for specific virtual drivers isn't totally modular either. And even if it were, "modular design" probably belongs on the list of "things that can be taken too far", like "everything in userland", "never use ioctl", and "microkernels are superior". :-) To be honest, I'm not over-joyed to see bridging hooks included in the VLAN code..but if that is what it takes to get bridging and VLANs to play well and be flexible, I think it is a fair price. It certainly wouldn't hurt to have someone take a holistic view of the various L2 device interactions. Just documenting current functionality on, say, the netdev wiki would be a good first step. Ultimate flexibility could be provided by making the netif_rx routine (and the others, including vlan etc), a "virtual" routine. That way a list of "filters" could be defined that allow any processing to be done on the packet before it is handed of to the linux kernel's higher layers, including not delivering it on that interface, or delivering it on another interface. This would allow very complex implementations including stuff like a high-level l2 bridge, with vlan support, and a number of protocols like rstp, pvst+, ... with relatively simple code, that could be isolated from the main kernel. Would anyone be interested in signing off on such a patch ? (which basically creates netif_rx and vlan_acc_netif_rx lists in the net_device structure, and then modify bridging and bonding drivers to just use this) I have thought about this, but you end up reinventing System V streams. 
The problem is for simple up/down call, the stacking model works fine but once you add flow-control and multiplexing issues the problem becomes complex. It is hard to think of a good general solution where the performance wouldn't end up sucking. Currently, the bridge hook logic is something like: if (bridge-consumed-pkt) { return } // drop through to other layers There are several other hooks I'd like to see added (pktgen receive processing, mac-vlans, etc). Each of these hooks are logically similar to the bridge hook, ie if it consumes the pkt, return, else, drop through to the next hook untill we get to the regular protocol processing logic. I would like to be able to chain layer-2 handlers, such as bridge, mac-vlan, pktgen such that if one consumed, you break out of the handling, else, you try the next handler. The handlers can be dynamically registered and inserted in any order, controllable by user-space and/or module load/unload. For many of the handlers, the logic will re-insert the packet by re-calling the netif-rx logic, so there would need to be some protection to keep loops from occurring that would recurse too much and overflow the stack. Ben -- Ben Greear <[EMAIL PROTECTED]> Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
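The chained-handler idea with a guard against runaway re-injection could look roughly like this in user-space C. This is a sketch of the concept only: none of these names exist in the kernel, the registration order is simply call order, and the depth limit lives in the toy packet itself. The two example handlers (a remapper and a deliberately buggy re-injector) are invented to exercise the chain.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HANDLERS 8
#define TOY_MAX_DEPTH    4

struct toy_pkt {
	int ifindex;	/* interface the packet is currently on */
	int depth;	/* how many times it is being re-injected */
};

enum toy_verdict { TOY_PASS, TOY_CONSUMED };

typedef enum toy_verdict (*toy_handler_fn)(struct toy_pkt *pkt);

static toy_handler_fn toy_chain[TOY_MAX_HANDLERS];
static size_t toy_chain_len;
static int toy_dropped;	/* packets refused by the depth guard */

/* Append a handler to the chain; returns 0, or -1 if full or NULL. */
int toy_register(toy_handler_fn fn)
{
	if (fn == NULL || toy_chain_len >= TOY_MAX_HANDLERS)
		return -1;
	toy_chain[toy_chain_len++] = fn;
	return 0;
}

/*
 * Run the packet through the chain. Returns 1 if a handler consumed it,
 * 0 if it fell through to regular protocol processing, and -1 if it was
 * refused because it had been re-injected too many times.
 */
int toy_rx(struct toy_pkt *pkt)
{
	size_t i;

	assert(pkt != NULL);
	if (pkt->depth >= TOY_MAX_DEPTH)
		return -1;
	pkt->depth++;
	for (i = 0; i < toy_chain_len; i++) {
		if (toy_chain[i](pkt) == TOY_CONSUMED) {
			pkt->depth--;
			return 1;
		}
	}
	pkt->depth--;
	return 0;
}

/* Example handler: remaps interface 1 to interface 2 and re-injects. */
enum toy_verdict toy_remap_handler(struct toy_pkt *pkt)
{
	if (pkt->ifindex != 1)
		return TOY_PASS;
	pkt->ifindex = 2;
	toy_rx(pkt);
	return TOY_CONSUMED;
}

/* Deliberately buggy handler: keeps re-injecting interface 9. */
enum toy_verdict toy_loop_handler(struct toy_pkt *pkt)
{
	if (pkt->ifindex != 9)
		return TOY_PASS;
	if (toy_rx(pkt) < 0)
		toy_dropped++;
	return TOY_CONSUMED;
}
```

A per-packet depth counter is just one way to bound the recursion; a per-CPU counter would serve the same purpose without growing the packet structure.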
Stackable devices.
On Wed, 2 Aug 2006 11:02:20 +0200 Christophe Devriese <[EMAIL PROTECTED]> wrote: > On Tuesday 01 August 2006 19:21, you wrote: > > John W. Linville wrote: > > >>>I'm just not sure that cleverness is worth the headache, especially > > >>>since the most clever things usually only work by accident... > > >> > > >>Or, work by solid, modular design and small tweaks! > > > > > > Point taken. But stashing little hacks in the networking core for > > > specific virtual drivers isn't totally modular either. And even if > > > it were, "modular design" probably belongs on the list of "things > > > that can be taken too far", like "everything in userland", "never > > > use ioctl", and "microkernels are superior". :-) > > > > To be honest, I'm not over-joyed to see bridging hooks included > > in the VLAN code..but if that is what it takes to get bridging > > and VLANs to play well and be flexible, I think it is a fair price. > > > > It certainly wouldn't hurt to have someone take a holistic view of the > > various L2 device interactions. Just documenting current functionality > > on, say, the netdev wiki would be a good first step. > > Ultimate flexibility could be provided by making the netif_rx routine (and > the > others, including vlan etc), a "virtual" routine. > > That way a list of "filters" could be defined that allow any processing to be > done on the packet before it is handed of to the linux kernel's higher > layers, including not delivering it on that interface, or delivering it on > another interface. > > This would allow very complex implementations including stuff like a > high-level l2 bridge, with vlan support, and a number of protocols like rstp, > pvst+, ... with relatively simple code, that could be isolated from the main > kernel. > > Would anyone be interested in signing off on such a patch ? 
(which basically > creates netif_rx and vlan_acc_netif_rx lists in the net_device structure, and > then modify bridging and bonding drivers to just use this) I have thought about this, but you end up reinventing System V streams. The problem is for simple up/down call, the stacking model works fine but once you add flow-control and multiplexing issues the problem becomes complex. It is hard to think of a good general solution where the performance wouldn't end up sucking. -- Stephen Hemminger <[EMAIL PROTECTED]> "And in the Packet there writ down that doome" - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.16.19 0/2] LARTC: trace control for netem
On Wed, 02 Aug 2006 19:21:27 +0200
Rainer Baumann <[EMAIL PROTECTED]> wrote:

> Hi,
>
> We developed an extension to the network emulator netem, that provides
> emulation of long term network properties such as long-range dependence
> and self-similarity of cross-traffic. It is not possible to emulate
> these properties with the statistical tables for the packet delay
> values used by the original netem.
>
> We read the values for the packet delay, drop, loss and corruption from
> a pre-generated trace file. This trace file is obtained by monitoring
> network traffic and writing all actions to a trace file. During the
> emulation the packets get processed according the values in such a trace
> file. Detailed information are available on our
> website http://tcn.hypert.net
>
> A new option (trace) has been added to the netem command. If the trace
> option is used, the values for packet delay etc. are read from a trace
> file, afterwards the packets are processed by the normal netem functions.
> The packet action values are readout from the trace file in user space
> and sent to kernel space via procfs.
>
> The evaluation results show similar behavior for our enhancement and the
> original netem with respect to packet delay precision and packet loss at
> high load (e.g. 80'000 packets per second).
> It is possible to add, change or delete multiple netem qdiscs on-the-fly
> (original netem qdiscs and trace qdiscs mixed).
>
> We are looking forward for any comments, feedback and suggestions!
>
> Thanks,
> Rainer

I like the idea and want to get it incorporated.

Major things that need fixing:
* Don't extend the size of tc_netem_qopt; instead use a new netlink payload.
  + add a type to the TCA_NETEM_ enum
  + add a new structure containing the payload
  This allows for binary compatibility.
* Don't use proc as an interface to netem features. Use netlink. Either add a new command (or option) to the iproute2 commands to handle the flow table, or add a new payload.

Minor stuff:
* the bzero macro in netem is a BSDism, just use memset
* bad indentation and style issues
* minor whitespace damage in several places in the patch
Re: [PATCH][SECURITY] secmark: nul-terminate secdata
* James Morris ([EMAIL PROTECTED]) wrote:
> cc'd Chris Wright, as this patch seems like a candidate for the stable
> tree.

Would be, but I thought secmark went in post 2.6.17. And I expect Dave will push this well before 2.6.18.

thanks,
-chris
Re: [bug] e100: checksum mismatch on 82551ER rev10
[cc-ing netdev] [adding original thread authors back, please do not strip CC] Charlie Brady wrote: Molle Bestefich wrote: The NICs are working perfectly. How can you tell? Do you know if jumbo frames work correctly? Is the device properly checksumming? is flow control working properly? These and many, many more settings are determined by the EEPROM. Seemingly it may work correctly, but there is no guarantee whatsoever that it will work correctly at all if the checksum is bad. Again, you can lose data, or worse, you could corrupt memory in the system causing massive failure (DMA timings, etc). Unlikely? sure, but not impossible. Let's assume that these things are all true, and the NIC currently does not work perfectly, just imperfectly, but acceptably. With the recent driver change, it now does not work at all. That's surely a bug in the driver. There is no logic in that sentence at all. You're saying that the driver is broken because it doesn't fix an error in the EEPROM? We're trying extremely hard to fix real errors here (especially when we find that hardware resellers send out hardware with EEPROM problems) and you are asking for a workaround that will (likely) introduce random errors and failure into your kernel. I do not want to accept responsability for that and I also do not think any other kernel developer would like me to release such a risk into the kernel. I'd probably get whistled back instantly :) If you want to edit your own kernel then I am fine with it. If you want to recalculate the checksum yourself and put it in the EEPROM then I am also fine with that. As long as you never ask for support for that NIC. But we can't support an option that allows all users to willingly enable a piece of non-properly-working hardware. Because that is what it is: Not properly configured hardware. 
The bottom line is that your problem is that a specific hardware vendor is/was selling badly configured hardware, and you buy it from them, even after it's End Of Lifed for that vendor. Even though that vendor did buy the units properly configured and had all the tools needed to configure them properly. I can maybe fix your problem by seeing if we can get you an eeprom update, but I can not break everyone elses kernel for that. Auke - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [bug] e100: checksum mismatch on 82551ER rev10
On Wed, 2 Aug 2006, Auke Kok wrote: [cc-ing netdev] [adding original thread authors back, please do not strip CC] [There were no Cc's visible in the lkml archive I used as source of my quotes.] Charlie Brady wrote: Let's assume that these things are all true, and the NIC currently does not work perfectly, just imperfectly, but acceptably. With the recent driver change, it now does not work at all. That's surely a bug in the driver. There is no logic in that sentence at all. You're saying that the driver is broken because it doesn't fix an error in the EEPROM? I am not asking the driver to fix errors in the EEPROM. I'm asking it to send and receive packets, as it has done in the past. We're trying extremely hard to fix real errors here (especially when we find that hardware resellers send out hardware with EEPROM problems) ... I do not expect the kernel to perform QA tests on my hardware, just work. and you are asking for a workaround that will (likely) introduce random errors and failure into your kernel. I do not want to accept responsability for that ... You publish your code under the GPL. You explicitly disclaim any warranty. If you want to edit your own kernel then I am fine with it. I suspect that if all/many T23 laptops perform as mine does then some major vendors will also edit their kernels. I'm sure they would rather not do that. If you want to recalculate the checksum yourself and put it in the EEPROM then I am also fine with that. Can you provide a reference as to how I might do that? As long as you never ask for support for that NIC. But we can't support an option that allows all users to willingly enable a piece of non-properly-working hardware. Because that is what it is: Not properly configured hardware. Which it may be. But it doesn't work at all with the new kernel, where it has in the past. 
The bottom line is that your problem is that a specific hardware vendor is/was selling badly configured hardware, and you buy it from them, even after it's End Of Lifed for that vendor. Even though that vendor did buy the units properly configured and had all the tools needed to configure them properly. I don't think either of us knows that. I can maybe fix your problem by seeing if we can get you an eeprom update... That'd be great. Thanks! Regards --- Charlie
Re: [bug] e100: checksum mismatch on 82551ER rev10
Charlie Brady wrote: Let's assume that these things are all true, and the NIC currently does not work perfectly, just imperfectly, but acceptably. With the recent driver change, it now does not work at all. That's surely a bug in the driver. There is no logic in that sentence at all. You're saying that the driver is broken because it doesn't fix an error in the EEPROM? I am not asking the driver to fix errors in the EEPROM. I'm asking it to send and receive packets, as it has done in the past. Maybe you are confusing e100 with eepro100. e100 has done this since it made it into 2.6.4 or so. Auke
Re: [PATCH 2.6.16.19 0/2] LARTC: trace control for netem
Thanx for your feedback! We will try to fix this. Rainer Stephen Hemminger wrote: > On Wed, 02 Aug 2006 19:21:27 +0200 > Rainer Baumann <[EMAIL PROTECTED]> wrote: > > >> Hi, >> >> We developed an extension to the network emulator netem, that provides >> emulation of long term network properties such as long-range dependence >> and self-similarity of cross-traffic. It is not possible to emulate >> these properties with the statistical tables for the packet delay >> values used by the original netem. >> >> We read the values for the packet delay, drop, loss and corruption from >> a pre-generated trace file. This trace file is obtained by monitoring >> network traffic and writing all actions to a trace file. During the >> emulation the packets get processed according to the values in such a trace >> file. Detailed information is available on our >> website: http://tcn.hypert.net >> >> A new option (trace) has been added to the netem command. If the trace >> option is used, the values for packet delay etc. are read from a trace >> file, afterwards the packets are processed by the normal netem functions. >> The packet action values are read out from the trace file in user space >> and sent to kernel space via procfs. >> >> The evaluation results show similar behavior for our enhancement and the >> original netem with respect to packet delay precision and packet loss at >> high load (e.g. 80'000 packets per second). >> It is possible to add, change or delete multiple netem qdiscs on-the-fly >> (original netem qdiscs and trace qdiscs mixed). >> >> We are looking forward to any comments, feedback and suggestions! >> >> Thanks, >> Rainer >> > > I like the idea and want to get it incorporated. > > Major things that need fixing: > * Don't extend size of tc_netem_qopt instead use a new netlink > payload. > + add type to TCA_NETEM_ enum > + new structure containing the payload > This allows for binary compatibility. > > * Don't use proc for an interface to netem features. 
Use netlink. > Either add a new command (or option) to the iproute2 commands > to handle flow table, or add a new payload. > > > Minor stuff: > * the bzero macro in netem is a BSDism, just use memset > * bad indentation and style issues. > * minor whitespace damage in several places in patch
Re: [PATCH] SMSC LAN911x and LAN921x vendor driver
Hi John, Thanks for all your feedback. > > +/* waits for MAC not busy, with timeout. Assumes MacPhyAccessLock has > > + * already been acquired */ > > +static int smsc911x_mac_notbusy(struct smsc911x_data *pdata) > > +{ > > +int i; > > + > > +for (i = 0; i < 40; i++) { > > +if ((smsc911x_reg_read(pdata, MAC_CSR_CMD) > > + & MAC_CSR_CMD_CSR_BUSY_) == 0) { > > +return 1; > > +} > > +} > > +SMSC_WARNING("Timed out waiting for MAC not BUSY. " > > + "MAC_CSR_CMD: 0x%08X", smsc911x_reg_read(pdata, > > + MAC_CSR_CMD)); > > +return 0; > > +} > > How is the length of this timeout controlled? IOW, what prevents > it from being too short when the Omegatron 128 running at 10GHz hits > the market? Are you relying on the MII clock rate? The LAN911x and LAN921x devices uses an SRAM-like bus interface with a minimum cycle time of 45ns, so smsc_reg_read() and smsc_reg_write() are guaranteed to take at least 45ns. The MAC operates a little slower, but the operation shouldn't take longer than 225ns (5 read cycles). The PHY is accessed via slave registers in the MAC (which then relays the command over mii), so its timeout works in the same way. The timeouts are only there to prevent total lockup if the hardware fails, if the part is working it should take nowhere near 40 iterations. > > +/* Auto-detect PHY */ > > +for (address = 0; address <= 31; address++) { > > +pdata->phy_address = address; > > +phyid1 = smsc911x_phy_read(pdata, MII_PHYSID1); > > +phyid2 = smsc911x_phy_read(pdata, MII_PHYSID2); > > +if ((phyid1 != 0xU) || (phyid2 != 0xU)) { > > + SMSC_TRACE("Detected PHY at address = " > > + "0x%02X = %d", address, address); > > +break; > > +} > > +} > > Does this need the magic "for (addr=1; addr <=32; addr++)" trick that > has become idiomatic for PHY discovery in our drivers? I don't understand the question - surely 32 is not a valid PHY address? 
> > +/* SMSC911x registers and bitfields */ > > +#define RX_DATA_FIFO0x00 > > + > > +#define TX_DATA_FIFO0x20 > > +#define TX_CMD_A_ON_COMP_ 0x8000 > > +#define TX_CMD_A_BUF_END_ALGN_ 0x0300 > > +#define TX_CMD_A_4_BYTE_ALGN_ 0x > > +#define TX_CMD_A_16_BYTE_ALGN_ 0x0100 > > +#define TX_CMD_A_32_BYTE_ALGN_ 0x0200 > > +#define TX_CMD_A_DATA_OFFSET_ 0x001F > > +#define TX_CMD_A_FIRST_SEG_ 0x2000 > > +#define TX_CMD_A_LAST_SEG_ 0x1000 > > +#define TX_CMD_A_BUF_SIZE_ 0x07FF > > +#define TX_CMD_B_PKT_TAG_ 0x > > +#define TX_CMD_B_ADD_CRC_DISABLE_ 0x2000 > > +#define TX_CMD_B_DISABLE_PADDING_ 0x1000 > > +#define TX_CMD_B_PKT_BYTE_LENGTH_ 0x07FF > > Looks like something went haywire w/ your tabbing in this file...? Its just the "+ " in the patch, once applied it looks quite pretty! Best Regards, -- Steve Glendinning SMSC GmbH m: +44 777 933 9124 e: [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
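The timing argument in the reply above can be checked with a little arithmetic: at a minimum bus cycle of 45 ns per register read, 40 polling iterations take at least 40 x 45 = 1800 ns, while the MAC is expected to finish within 225 ns (5 read cycles). The following is only a userspace sketch of that bounded poll, not the driver's code. The names poll_not_busy, fake_read, struct fake_mac and BUSY_BIT are hypothetical stand-ins (BUSY_BIT for MAC_CSR_CMD_CSR_BUSY_, the callback for smsc911x_reg_read()).

```c
#include <stdint.h>

#define MIN_BUS_CYCLE_NS 45u          /* minimum bus cycle quoted in the mail */
#define POLL_ITERATIONS  40u          /* loop limit used in smsc911x_mac_notbusy() */
#define BUSY_BIT         0x80000000u  /* hypothetical stand-in for MAC_CSR_CMD_CSR_BUSY_ */

/* Hypothetical register accessor; in the driver this would be a bus read. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until the busy bit clears or the iteration limit is hit.
 * Returns 1 on success and 0 on timeout; *reads_used reports how many
 * register reads were issued.
 */
static int poll_not_busy(reg_read_fn read_reg, void *ctx, unsigned *reads_used)
{
    unsigned i;

    for (i = 0; i < POLL_ITERATIONS; i++) {
        if ((read_reg(ctx) & BUSY_BIT) == 0) {
            *reads_used = i + 1;
            return 1;
        }
    }
    *reads_used = POLL_ITERATIONS;
    return 0;
}

/* Lower bound on the timeout: every read costs at least one bus cycle. */
static unsigned min_timeout_ns(void)
{
    return POLL_ITERATIONS * MIN_BUS_CYCLE_NS;
}

/* Fake device that reports busy for a fixed number of reads. */
struct fake_mac {
    unsigned busy_reads_left;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_mac *m = ctx;

    if (m->busy_reads_left > 0) {
        m->busy_reads_left--;
        return BUSY_BIT;
    }
    return 0;
}
```

Because the bound comes from the bus cycle time rather than from CPU speed, a faster host cannot shorten the timeout below 1800 ns under these assumptions, which is the point the reply makes.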
Re: [PATCH] SMSC LAN911x and LAN921x vendor driver
Hi Francois, Thanks again for all your feedback. I have implemented most of your suggestions, > > /* Enable phy clocks to the MAC */ > > hwcfg &= (~HW_CFG_PHY_CLK_SEL_); > > hwcfg |= HW_CFG_PHY_CLK_SEL_EXT_PHY_; > > smsc911x_reg_write(hwcfg, pdata, HW_CFG); > > udelay(10); /* Enough time for clocks to restart */ > > (back to my original question that I should have reworded in a different > thread) > > Does the platform guarantees that the register write has actually reached > the real register when the udelay is issued ? I think so, but maybe you can help me check. The LAN911x device is always directly connected to a simple SRAM-like host bus, and smsc911x_reg_write is implemented using readl. Does this implicitly guarantee it to be volatile? > > > > if (!pdata->software_irq_signal) { > > printk(KERN_WARNING "%s: ISR failed signaling test (IRQ %d)\n", > > dev->name, dev->irq); > > return -ENODEV; > > } > > SMSC_TRACE("IRQ handler passed test using IRQ %d", dev->irq); > > > > printk(KERN_INFO "%s: SMSC911x/921x identified at %#08lx, IRQ: %d\n", > > dev->name, (unsigned long)pdata->ioaddr, dev->irq); > > > > spin_lock_irqsave(&pdata->phy_lock, flags); > > flags useless: ->open() is issued in irq-enabled context. How do you mean? I thought an irq-enabled context meant i DO have to disable irqs? > > unsigned long flags; > > > > SMSC_TRACE("ioctl cmd 0x%x", cmd); > > switch (cmd) { > > case SIOCGMIIPHY: > > case SIOCDEVPRIVATE: > > The SIOCDEVPRIVATE can/should be removed. I have removed these, they were only in as a quick fix because mii-tool here sends SIOCDEVPRIVATE instead of SIOCGMIIPHY. I fixed my copy of mii-tool instead :o) Best Regards, -- Steve Glendinning SMSC GmbH m: +44 777 933 9124 e: [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMSC LAN911x and LAN921x vendor driver
On Wed, Aug 02, 2006 at 08:23:40PM +0100, [EMAIL PROTECTED] wrote: > > Does this need the magic "for (addr=1; addr <=32; addr++)" trick that > > has become idiomatic for PHY discovery in our drivers? > > I don't understand the question - surely 32 is not a valid PHY address? That's why it is magic! :-) The idea is to probe PHY addr 0 last in the series. Apparently some PHYs don't like seeing addr 0 or somesuch, so you try it last to avoid screwing them up. It may well be folklore and legend at this point. Still, you will find several examples in the various drivers. The sundance driver is one example. John -- John W. Linville [EMAIL PROTECTED]
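The probe order John describes can be sketched as follows. This is an illustration only: the helper name phy_probe_order is hypothetical, and masking the loop counter with 0x1f is my understanding of how drivers turn the out-of-range value 32 back into address 0, not something stated in the mail.

```c
#define PHY_ADDR_COUNT 32  /* MII management addresses are 5 bits wide: 0..31 */

/*
 * Fill order[] with the probe sequence 1, 2, ..., 31, 0.
 * The loop runs the counter from 1 to 32 and masks it to 5 bits,
 * so the final iteration (32 & 0x1f == 0) probes address 0 last.
 */
static void phy_probe_order(int order[PHY_ADDR_COUNT])
{
    int n = 0;
    int addr;

    for (addr = 1; addr <= PHY_ADDR_COUNT; addr++)
        order[n++] = addr & 0x1f;
}
```

In a real driver the masked value would be passed to the PHY read routine instead of being stored, and the loop would stop at the first address that returns a valid ID.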
Re: [patch 0/1]SNMPv2 "ipv6IfStatsInHdrErrors" counter error
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Date: Mon, 31 Jul 2006 18:36:25 +0900 (JST) > Hello. > > Next time, please put your "Signed-off-by" line before the patch. > Thank you. > > In article <[EMAIL PROTECTED]> (at Tue, 01 Aug 2006 05:45:33 -0400), weidong > <[EMAIL PROTECTED]> says: > > > signed-off-by:Wei Dong <[EMAIL PROTECTED]> > Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Patch applied, thanks.
Re: means to artificially alter the bandwidth of a system
>Hi, > >For research purposes we are considering to develop a program to alter >the bandwidth of a system via the software, so instance: a machine has >100 MB/s and we change it to 1MB/s. > >Does something like this already exist? Or is there a way to do this >without creating a program/kernel module Of course: see http://linux-net.osdl.org/index.php/Iproute2 (especially tc) >Any help will be highly appreciated! > >Irfan Habib HGN You may also want to look at Netem http://linux-net.osdl.org/index.php/Netem if you want to play with delay, loss as well. The examples there are good but I can send scripts for you as well if you wish. Ian -- Ian McDonald Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group Department of Computer Science University of Waikato New Zealand
Re: means to artificially alter the bandwidth of a system
* Irfan Habib | 2006-08-02 23:04:41 [+0500]: >Hi, > >For research purposes we are considering to develop a program to alter >the bandwidth of a system via the software, so instance: a machine has >100 MB/s and we change it to 1MB/s. > >Does something like this already exist? Or is there a way to do this >without creating a program/kernel module Of course: see http://linux-net.osdl.org/index.php/Iproute2 (especially tc) >Any help will be highly appreciated! > >Irfan Habib HGN
Re: [patch 1/1]SNMPv2 "ipv6IfStatsOutFragCreates" counter error
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Date: Mon, 31 Jul 2006 18:43:14 +0900 (JST) > The patch seems sane to me. > > In article <[EMAIL PROTECTED]> (at Tue, 01 Aug 2006 05:45:39 -0400), weidong > <[EMAIL PROTECTED]> says: > > > signed-off-by: Wei Dong <[EMAIL PROTECTED]> > Acked-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]> Also applied, thank you.
Re: PATCH Fix bonding active-backup behavior for VLAN interfaces
From: Christophe Devriese <[EMAIL PROTECTED]> Date: Mon, 31 Jul 2006 10:15:40 +0200 Thanks for the detailed explanation. > If you bond 2 vlan subinterfaces, the patch is not necessary at all. In that > case also the source device will be changed from eth0. to bond. So > that's correct behavior no ? > > In the second case, you create vlan subifs on a bonding device, vlan > subinterfaces will be created on the slave interfaces. In that case the vlan > code will reassign the skb->dev node, and because skb_bond needs to know the > actual input device in order to make an informed drop decision before passing > this code (skb active-backup mode needs to drop packets from the backup slave > interface, if you don't do that you get big problems with broadcasts). > > The same struct vlan_group is assigned to all slave devices and so the only > vlan subinterfaces that exist in this case are the bond. > subinterfaces, and the vlan path for both slaves will assign the > bond. interface to skb->dev, thereby erasing the information about > where the packet came from. Assuming it is correct to do the skb_bond() here in the VLAN hwaccel RX path, then there is still one piece missing from what I can see. Notice that in the netif_receive_skb() path, the return value from skb_bond() is used as the third argument to the deliver_skb() routine and friends which in turn gets passed to the packet_type functions. Bonding, in particular, makes use of this third argument, see: bond_3ad_lacpdu_recv() rlb_arp_recv() So if the new "orig_dev" you are computing in the VLAN hwaccel RX path is the correct one, somehow this has to propagate down to the third argument of the packet type ->func() invocations, right? Finally, I'm still a little stumped about why this change is necessary still, to be honest. When you configure the bond, the slaves should be the VLAN devices as far as I can tell. Therefore it should be the "vlan_device->master" that we are interested in, not the top-level "dev->master". 
If the ethernet is on a VLAN, and the administrator configures the underlying ethernet device as the slaves of the bond, this to me seems like a misconfiguration rather than something we should put hacks in to support. The fact that you do not propagate the "orig_dev" returned from skb_bond() down to the packet type functions seems to support this. From my perspective, this looks like a hack for a bonding misconfiguration.
Re: neigh_lookup lockdep bug.
From: Arjan van de Ven <[EMAIL PROTECTED]> Date: Wed, 02 Aug 2006 04:26:49 +0200 > fwiw the patch is at > http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc2/2.6.18-rc2-mm1/broken-out/lockdep-split-the-skb_queue_head_init-lock-class.patch > and a followup cleanup at > http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc2/2.6.18-rc2-mm1/broken-out/lockdep-split-the-skb_queue_head_init-lock-class-tidy.patch Both applied, thanks.
Re: [PATCH] NET: fix kernel panic from no dev->hard_header_len space
David Miller <[EMAIL PROTECTED]> writes: > I think Alexey is saying that setting ->hard_header() creates an > agreement between the device and IP that IP will make sure > that dev->hard_header_len bytes are available in the header > area. I think I now understand it: hard_header_len is guaranteed while calling hard_header() (because the check is done just before the call) but not elsewhere, particularly not in hard_start_xmit(). dev->hard_header being NULL or not doesn't change anything. I think hard_start_xmit() can be called without calling hard_header() first, for example with things like PF_PACKET. This way the hard_header_len check is skipped. It looks it needs to be audited. I think either: a) dev->hard_header_len must be eliminated completely and skb allocations have to assume some sane amount of header space (32 bytes or so), or b) all dev->hard_header() and dev->hard_start_xmit() calling paths must be checked to contain at least dev->hard_header_len header space, or c) dev->hard_header_len must be clearly marked as advisory, no core code changes (all drivers must be audited and fixed). d) another idea? What do you prefer? a) would IMHO the best code quality, reallocations where they are needed and no strange semantics which can be easily broken by accident (nobody would count on nonexistent hard_header_len either). Fast path would not need to reallocate skb data, though the check would still be in place. We could test it by reducing "default" header space to zero, possibly a "hacking" kernel config option may be useful? b) my patch is the starting point but I'm not sure it's practical. c) IMHO the worst by all means. I think I could do a) in a couple of weeks so that it could go into 2.6.19. Back to my patch. I understand the part about ip_output() is ok for 2.6.18, isn't it? What about the psched_mtu() thing? While it's not kernel panic, I think we should fix it. I'm not sure it should return dev->mtu + dev->hard_header_len or just dev->mtu, though. 
-- Krzysztof Halasa
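Option b) above amounts to checking, on every path that reaches the driver, that the buffer has enough room in front of the payload before a link-layer header is prepended. The following is a minimal userspace model of that check, not kernel code. The type struct pkt_buf and the helpers pkt_headroom() and pkt_push_checked() are hypothetical and only loosely modeled on the kernel's skb_headroom() and skb_push().

```c
#include <stddef.h>

/* Simplified packet buffer: head is the start of the allocation, data the start of the payload. */
struct pkt_buf {
    unsigned char *head;
    unsigned char *data;
};

/* Bytes available in front of the payload for prepending headers. */
static size_t pkt_headroom(const struct pkt_buf *p)
{
    return (size_t)(p->data - p->head);
}

/*
 * Prepend hdr_len bytes if there is room.
 * Returns the new start of data, or NULL when the headroom is too small,
 * in which case the caller would have to reallocate instead of
 * writing in front of the allocation.
 */
static unsigned char *pkt_push_checked(struct pkt_buf *p, size_t hdr_len)
{
    if (pkt_headroom(p) < hdr_len)
        return NULL;
    p->data -= hdr_len;
    return p->data;
}
```

A path such as PF_PACKET that hands a buffer straight to the transmit routine would need the same check (or a guaranteed minimum headroom, as in option a) for a device that relies on hard_header_len bytes being present.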
Re: [PATCH] SMSC LAN911x and LAN921x vendor driver
[EMAIL PROTECTED] <[EMAIL PROTECTED]> : > Mezigues : [...] > > Does the platform guarantees that the register write has actually > reached > > the real register when the udelay is issued ? > > I think so, but maybe you can help me check. The LAN911x device is always > directly connected to a simple SRAM-like host bus, and smsc911x_reg_write > is implemented using readl. Does this implicitly guarantee it to be > volatile? (s/readl/writel/) It's probably safe if it's non-cached and SRAM-like, but I strongly suggest reading Documentation/DocBook/deviceiobook.tmpl. It explains it better than I can. [...] > > > spin_lock_irqsave(&pdata->phy_lock, flags); > > > > flags useless: ->open() is issued in irq-enabled context. > > How do you mean? I thought an irq-enabled context meant i DO have to > disable irqs? Yes, but you can disable unconditionally and later enable unconditionally because you know that irqs are _always_ enabled before the lock (in ->open()). 'flags' saves the state. If the state is constant, you can either: - s/spin_{lock_irqsave/unlock_irqrestore}/spin_{lock/unlock}_irq/ (irq always on before the lock) or: - s/spin_{lock_irqsave/unlock_irqrestore}/spin_{lock/unlock}/ (irq always off before the lock) -- Ueimor
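The difference between the two locking variants discussed above is only in what happens to the interrupt-enable state on unlock. The sketch below models that state with a plain boolean in userspace; it does not take any real lock, and all model_* names are hypothetical stand-ins for spin_lock_irqsave(), spin_unlock_irqrestore(), spin_lock_irq() and spin_unlock_irq().

```c
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag. */
static bool irqs_enabled = true;

/* Model of spin_lock_irqsave(): remember the previous state, then disable. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

/* Model of spin_unlock_irqrestore(): put back whatever state was saved. */
static void model_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/* Model of spin_lock_irq(): disable without saving anything. */
static void model_lock_irq(void)
{
    irqs_enabled = false;
}

/* Model of spin_unlock_irq(): enable unconditionally. */
static void model_unlock_irq(void)
{
    irqs_enabled = true;
}
```

When interrupts are known to be enabled on entry, as in ->open(), both pairs leave the same state behind, so the saved flags carry no information. The pairs only differ when the caller may already have interrupts disabled: the unconditional variant would then re-enable them, which is why it is safe only where the entry state is constant.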
Re: [RFC] Mobile IPv6 introduction
From: Masahide NAKAMURA <[EMAIL PROTECTED]> Date: Wed, 02 Aug 2006 22:03:16 +0900 > If there is much requirement to add new type number without any > modification of kernel code at all I would support ICMPv6 filter > approach, too. There is no such requirement, please just continue to prepare your current patches for inclusion. There is a limit to how much nit-picking we can do for such a large body of work, and we should thus take evolutionary approach to this work. We can make all kinds of refinements later to improve the implementation.
Re: [NET]: Fix ___pskb_trim when entire frag_list needs dropping
From: "Marco Berizzi" <[EMAIL PROTECTED]> Date: Wed, 02 Aug 2006 17:01:17 +0200 > The problem is fixed now. I have applied this > patch to 2.6.18-rc2 > Many thanks Herbert for all the time spent > to debug this problem. Thank you for testing.
Re: [PATCH] DECnet Fix for DECnet routing bug
From: Patrick Caulfield <[EMAIL PROTECTED]> Date: Wed, 02 Aug 2006 15:19:24 +0100 > This patch fixes a bug in the DECnet routing code where we were selecting > a loopback device in preference to an outward facing device even when > the destination was known non-local. This patch should fix the problem. > > Signed-off-by: Patrick Caulfield <[EMAIL PROTECTED]> > Signed-off-by: Steven Whitehouse <[EMAIL PROTECTED]> Applied, thanks Patrick.
Please pull 'upstream-fixes' branch of wireless-2.6
The following changes since commit 49b1e3ea19b1c95c2f012b8331ffb3b169e4c042: Linus Torvalds: Merge branch 'merge' of git://git.kernel.org/.../paulus/powerpc are found in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-fixes Daniel Drake: zd1211rw: Pass more management frame types up to host zd1211rw: Fix software encryption/decryption zd1211rw: Remove bogus assert Ulrich Kunitz: zd1211rw: Fixes radiotap header zd1211rw: Fixed endianess issue with length info tag detection zd1211rw: Packet filter fix for managed (STA) mode drivers/net/wireless/zd1211rw/zd_chip.c |4 ++-- drivers/net/wireless/zd1211rw/zd_chip.h | 10 ++ drivers/net/wireless/zd1211rw/zd_mac.c | 16 drivers/net/wireless/zd1211rw/zd_usb.c |7 +++ 4 files changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/net/wireless/zd1211rw/zd_chip.c b/drivers/net/wireless/zd1211rw/zd_chip.c index efc9c4b..da9d06b 100644 --- a/drivers/net/wireless/zd1211rw/zd_chip.c +++ b/drivers/net/wireless/zd1211rw/zd_chip.c @@ -797,7 +797,7 @@ static int zd1211_hw_init_hmac(struct zd { CR_ADDA_MBIAS_WARMTIME, 0x3808 }, { CR_ZD1211_RETRY_MAX, 0x2 }, { CR_SNIFFER_ON,0 }, - { CR_RX_FILTER, AP_RX_FILTER }, + { CR_RX_FILTER, STA_RX_FILTER }, { CR_GROUP_HASH_P1, 0x00 }, { CR_GROUP_HASH_P2, 0x8000 }, { CR_REG1, 0xa4 }, @@ -844,7 +844,7 @@ static int zd1211b_hw_init_hmac(struct z { CR_ZD1211B_AIFS_CTL2, 0x008C003C }, { CR_ZD1211B_TXOP, 0x01800824 }, { CR_SNIFFER_ON,0 }, - { CR_RX_FILTER, AP_RX_FILTER }, + { CR_RX_FILTER, STA_RX_FILTER }, { CR_GROUP_HASH_P1, 0x00 }, { CR_GROUP_HASH_P2, 0x8000 }, { CR_REG1, 0xa4 }, diff --git a/drivers/net/wireless/zd1211rw/zd_chip.h b/drivers/net/wireless/zd1211rw/zd_chip.h index 8051210..069d2b4 100644 --- a/drivers/net/wireless/zd1211rw/zd_chip.h +++ b/drivers/net/wireless/zd1211rw/zd_chip.h @@ -461,10 +461,15 @@ #define CR_UNDERRUN_CNT CTL_REG(0x0688 #define CR_RX_FILTER CTL_REG(0x068c) #define RX_FILTER_ASSOC_RESPONSE 0x0002 +#define 
RX_FILTER_REASSOC_RESPONSE 0x0008 #define RX_FILTER_PROBE_RESPONSE 0x0020 #define RX_FILTER_BEACON 0x0100 +#define RX_FILTER_DISASSOC 0x0400 #define RX_FILTER_AUTH 0x0800 -/* Sniff modus sets filter to 0xf */ +#define AP_RX_FILTER 0x0400feff +#define STA_RX_FILTER 0x + +/* Monitor mode sets filter to 0xf */ #define CR_ACK_TIMEOUT_EXT CTL_REG(0x0690) #define CR_BCN_FIFO_SEMAPHORE CTL_REG(0x0694) @@ -546,9 +551,6 @@ #define CR_ZD1211B_AIFS_CTL2CTL_REG(0x #define CR_ZD1211B_TXOPCTL_REG(0x0b20) #define CR_ZD1211B_RETRY_MAX CTL_REG(0x0b28) -#define AP_RX_FILTER 0x0400feff -#define STA_RX_FILTER 0x - #define CWIN_SIZE 0x007f043f diff --git a/drivers/net/wireless/zd1211rw/zd_mac.c b/drivers/net/wireless/zd1211rw/zd_mac.c index 3bdc54d..d6f3e02 100644 --- a/drivers/net/wireless/zd1211rw/zd_mac.c +++ b/drivers/net/wireless/zd1211rw/zd_mac.c @@ -108,7 +108,9 @@ int zd_mac_init_hw(struct zd_mac *mac, u if (r) goto disable_int; - r = zd_set_encryption_type(chip, NO_WEP); + /* We must inform the device that we are doing encryption/decryption in +* software at the moment. */ + r = zd_set_encryption_type(chip, ENC_SNIFFER); if (r) goto disable_int; @@ -136,10 +138,8 @@ static int reset_mode(struct zd_mac *mac { struct ieee80211_device *ieee = zd_mac_to_ieee80211(mac); struct zd_ioreq32 ioreqs[3] = { - { CR_RX_FILTER, RX_FILTER_BEACON|RX_FILTER_PROBE_RESPONSE| - RX_FILTER_AUTH|RX_FILTER_ASSOC_RESPONSE }, + { CR_RX_FILTER, STA_RX_FILTER }, { CR_SNIFFER_ON, 0U }, - { CR_ENCRYPTION_TYPE, NO_WEP }, }; if (ieee->iw_mode == IW_MODE_MONITOR) { @@ -713,10 +713,10 @@ static int zd_mac_tx(struct zd_mac *mac, struct zd_rt_hdr { struct ieee80211_radiotap_header rt_hdr; u8 rt_flags; + u8 rt_rate; u16 rt_channel; u16 rt_chbitmask; - u16 rt_rate; -}; +} __attribute__((packed)); static void fill_rt_header(void *buffer, struct zd_mac *mac,
Re: [PATCH 2/6] zd1211rw: Pass more management frame types up to host
On Tue, Aug 01, 2006 at 11:43:31PM +0200, Ulrich Kunitz wrote: > From: Daniel Drake <[EMAIL PROTECTED]> > > We'll be needing these at some point... This one doesn't really seem like a fix. But since the later fixes seem to depend on it, I guess it makes sense to take it. I just didn't want you to think I wasn't looking... :-) John -- John W. Linville [EMAIL PROTECTED]
Re: [PATCH 20/23] [PATCH] [XFRM] POLICY: sub policy support.
From: James Morris <[EMAIL PROTECTED]> Date: Wed, 2 Aug 2006 12:04:31 -0400 (EDT) > Why can't IPSec & MIP transforms be bundled on the same policy? At the first year of netconf, Yoshifuji went into detail as to why the IPSEC and MIP transformations had to live separately. It's partly a side effect of different userland daemons controlling IPSEC vs. MIP configuration. > Or, perhaps a different approach is needed, where the disposition of a > policy can be to re-submit a packet for another policy match after the > current bundle has been traversed (something like NF_REPEAT). We can consider an approach like this as a future refinement. It would allow arbitrary nesting of sub-transforms, for sure, just like netfilter's NF_REPEAT.
Re: r8169 driver problem with RTL8110SB chip (on iop3xx ARM board)
Martin Michlmayr <[EMAIL PROTECTED]> : [...] > Sorry, to pester you, but I was wondering if you had a chance to look > at the register dump. No problem. It would have been easier with a decoded output of the register dump though (see Lennert dump below). Lines prefixed by '>' come from Realtek's driver. I have outlined the differences which seem relevant to me. 0x00: MAC Address 00:14:fd:10:27:74 > 000x00 > 010x14 > 020xfd > 030x10 > 040x27 > 050x74 > 060x00 > 070x00 0x08: Multicast Address Filter 0x 0x > 080x00 > 090x00 > 100x00 > 110x80 ^^ /me scratches head > 120x00 > 130x00 > 140x00 > 150x00 0x10: Dump Tally Counter Command 0xd7bbfec0 0xfb74b6fb > 160xc0 > 170xfe > 180xbb > 190xdf > 200x7b > 210xb6 > 220x74 > 230xff > 240x00 > 250x00 > 260x00 > 270x00 > 280x00 > 290x00 > 300x00 > 310x00 0x20: Tx Normal Priority Ring Addr 0x 0x > 320x00 > 330x80 > 340x20 > 350x07 > 360x00 > 370x00 > 380x00 > 390x00 0x28: Tx High Priority Ring Addr 0xfffc3f00 0xfef7f6ad > 400x00 > 410x3f > 420xfc > 430xff > 440xac > 450xf7 > 460xf7 > 470xfe 0x30: Flash memory read/write 0x > 480x00 > 490x00 > 500x00 > 510x00 0x34: Early Rx Byte Count 0 > 520x00 > 530x00 0x36: Early Rx Status 0x00 > 540x08 0x37: Command 0x00 Rx off, Tx off > 550x0c ^^ -> CmdRxEnb | CmdTxEnb If ChipCmd is not set, the driver will hardly work. Lennert, your dump was taken while the kernel driver was down, right ? > 560x00 > 570x00 > 580x00 > 590x00 0x3C: Interrupt Mask 0x > 600x7f ^^ > 610x00 RxFIFOOver = 0x40, LinkChg= 0x20, RxOverflow = 0x10, TxErr = 0x08, TxOK = 0x04, RxErr = 0x02, RxOK = 0x01, 0x is typical of rtl8169_irq_mask_and_ack(). The device seems down. 0x3E: Interrupt Status0x > 620x00 > 630x00 0x40: Tx Configuration0x1000 > 640x00 > 650x07 ^^ Realtek's driver uses an unlimited DMA burst (7) instead of 1024 (6, see TX_DMA_BURST). Probably harmless. 
> 660x00 > 670x13 0x44: Rx Configuration0x0002 > 680x0e ^^ These bit should be set by the kernel driver through rtl8169_set_rx_mode() when the device is brought up The exact same value would require to replace (in rtl8169_set_rx_mode): rx_mode = AcceptBroadcast | AcceptMyPhys; by: rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys; > 690xe7 > 700x00 > 710x00 0x48: Timer count 0x95887845 > 720x24 > 730x57 > 740xaf > 750x31 0x4C: Missed packet counter 0x00 > 760x00 > 770x00 > 780x00 > 790x00 0x50: EEPROM Command0x00 > 800x00 0x51: Config 0 0x04 > 810x04 0x52: Config 1 0x1f > 820x1f 0x53: Config 2 0x10 > 830x10 0x54: Config 3 0x20 > 840x20 0x55: Config 4 0x80 > 850x80 0x56: Config 5 0x01 > 860x03 > 870x00 0x58: Timer interrupt 0x > 880x00 > 890x00 > 900x00 > 910x00 0x5C: Multiple Interrupt Select 0x > 920x00 > 930x00 > 940x10 > 950x00 0x60: PHY access 0x80001000 > 960x6d > 970x79 > 980x01 > 990x80 0x64: TBI control and status 0x > 100 0x00 > 101 0x00 > 102 0x00 > 103 0x00 0x68: TBI Autonegotiation advertisement (ANAR)0x > 104 0x00 > 105 0x00 0x6A: TBI Link partner ability (LPAR) 0x > 106 0x00 > 107 0x00 0x6C: PHY status0x6b > 108 0x6b > 109 0x00 > 110 0x00 > 111 0x00 > 112 0xff > 113 0xfd > 114 0xfb > 115 0xff > 116 0xfc > 117 0x03 > 118 0x00 > 119 0x00 > 120 0x00 > 121 0xff > 122 0xff > 123 0xff > 124 0x00 > 125 0x00 > 126 0x00 > 127 0x00 > 128 0x00 > 129 0x00 > 130 0x00 > 131 0x00 0x84: PM wakeup frame 00xbcfeef7f 0xde9f > 132 0x7f > 133 0xef > 134 0xfe > 135 0xfc > 136 0x9f > 137 0xfe > 138 0xff > 139 0xdf 0x8C: PM wakeup frame 10xabf7cf3f 0xfffbdbbf > 140 0x3f > 141 0xcf > 142 0xf7 > 143 0xab > 144 0xb
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
Catherine you really must begin to remember to add
proper "Signed-off-by: " lines to your patch submissions.

I'll sign off on this bug fix, but in the future I will not
do so for you any more as you've been told at least 3 or 4
times about this.

Thank you.
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
David,

I will remember this in the future, I promise.

thank you,
Catherine

David Miller <[EMAIL PROTECTED]> wrote on 08/02/2006 05:11:03 PM:

>
> Catherine you really must begin to remember to add
> proper "Signed-off-by: " lines to your patch submissions.
>
> I'll sign off on this bug fix, but in the future I will not
> do so for you any more as you've been told at least 3 or 4
> times about this.
>
> Thank you.
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
From: Xiaolan Zhang <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 17:14:31 -0400

> I will remember this in the future, I promise.

Can you also remember to test your patches with CONFIG_SECURITY
disabled, as you also promised in the past several times?!??!?!

In file included from init/main.c:34:
include/linux/security.h: In function 'security_release_secctx':
include/linux/security.h:2757: warning: 'return' with a value, in function returning void

I'll fix this one up, but this is getting ridiculous.
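As an illustration of how this kind of warning arises, here is a minimal sketch of the usual pattern: a header provides real prototypes when an option is enabled and inline stubs when it is not, so a mistake in a stub only shows up in the "disabled" build. All names (CONFIG_DEMO_SECURITY, demo_secid_to_secctx, demo_release_secctx) are hypothetical stand-ins; this is not the actual include/linux/security.h code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a Kconfig symbol. It is left undefined here
 * so that the stub branch below is the one that gets compiled.
 */
/* #define CONFIG_DEMO_SECURITY 1 */

#ifdef CONFIG_DEMO_SECURITY
/* Real implementations would live elsewhere in this configuration. */
int demo_secid_to_secctx(unsigned int secid, char **secdata,
			 unsigned int *seclen);
void demo_release_secctx(char *secdata, unsigned int seclen);
#else
/* Stub with a return value: reports "not supported". */
static inline int demo_secid_to_secctx(unsigned int secid, char **secdata,
				       unsigned int *seclen)
{
	(void)secid;
	(void)secdata;
	(void)seclen;
	return -EOPNOTSUPP;
}

/*
 * Stub returning void: writing "return 0;" here is what produces
 * "'return' with a value, in function returning void".
 */
static inline void demo_release_secctx(char *secdata, unsigned int seclen)
{
	(void)secdata;
	(void)seclen;
}
#endif
```

Because the stub branch is only compiled with the option off, building both configurations (and comparing the warning output) is the only way to catch this class of mistake.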
orinoco driver causes *lots* of lockdep spew
Wow. Nearly 400 lines of debug spew, from a simple 'ifup eth1'. Dave ADDRCONF(NETDEV_UP): eth1: link is not ready eth1: New link status: Disconnected (0002) == [ INFO: hard-safe -> hard-unsafe lock order detected ] -- events/0/5 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (af_callback_keys + sk->sk_family){-.--}, at: [] sock_def_readable+0x19/0x6f and this task is already holding: (&priv->lock){++..}, at: [] orinoco_send_wevents+0x28/0x8b [orinoco] which would create a new lock dependency: (&priv->lock){++..} -> (af_callback_keys + sk->sk_family){-.--} but this new dependency connects a hard-irq-safe lock: (&priv->lock){++..} ... which became hard-irq-safe at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] orinoco_interrupt+0x4d/0xf49 [orinoco] [] handle_IRQ_event+0x2b/0x64 [] __do_IRQ+0xae/0x114 [] do_IRQ+0xf7/0x107 [] common_interrupt+0x64/0x65 to a hard-irq-unsafe lock: (af_callback_keys + sk->sk_family){-.--} ... which became hard-irq-unsafe at: ... [] lock_acquire+0x4a/0x69 [] _write_lock_bh+0x29/0x36 [] netlink_release+0x139/0x2ca [] sock_release+0x19/0x9b [] sock_close+0x33/0x3a [] __fput+0xc6/0x1a8 [] fput+0x13/0x16 [] filp_close+0x64/0x70 [] sys_close+0x93/0xb0 [] system_call+0x7d/0x83 other info that might help us debug this: 1 lock held by events/0/5: #0: (&priv->lock){++..}, at: [] orinoco_send_wevents+0x28/0x8b [orinoco] the hard-irq-safe lock's dependencies: -> (&priv->lock){++..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irq+0x2a/0x38 [] orinoco_init+0x934/0x966 [orinoco] [] register_netdevice+0xe6/0x375 [] register_netdev+0x5a/0x69 [] orinoco_cs_probe+0x3d7/0x475 [orinoco_cs] [] pcmcia_device_probe+0x7f/0x124 [] driver_probe_device+0x5b/0xb1 [] __driver_attach+0x88/0xdb [] bus_for_each_dev+0x48/0x7a [] driver_attach+0x1b/0x1e [] bus_add_driver+0x88/0x138 [] driver_register+0x8e/0x93 [] pcmcia_register_driver+0xd0/0xda [] 0x880a9024 [] sys_init_module+0x16f2/0x18b7 [] system_call+0x7d/0x83 in-hardirq-W at: [] 
lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] orinoco_interrupt+0x4d/0xf49 [orinoco] [] handle_IRQ_event+0x2b/0x64 [] __do_IRQ+0xae/0x114 [] do_IRQ+0xf7/0x107 [] common_interrupt+0x64/0x65 in-softirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] orinoco_interrupt+0x4d/0xf49 [orinoco] [] handle_IRQ_event+0x2b/0x64 [] __do_IRQ+0xae/0x114 [] do_IRQ+0xf7/0x107 [] common_interrupt+0x64/0x65 [] scheduler_tick+0xc1/0x362 [] call_softirq+0x1d/0x28 [] irq_exit+0x56/0x59 [] smp_apic_timer_interrupt+0x5c/0x62 [] apic_timer_interrupt+0x69/0x70 } ... key at: [] __key.22351+0x0/0x27fa [orinoco] -> (&cwq->lock){++..} ops: 0 { initial-use at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] __queue_work+0x17/0x5e [] queue_work+0x4d/0x57 [] call_usermodehelper_keys+0x119/0x137 [] kobject_uevent+0x3e5/0x42e [] class_device_add+0x314/0x471 [] class_device_register+0x18/0x1d [] class_device_create+0xf7/0x129 [] vtconsole_class_init+0x74/0xbb [] init+0x1fc/0x3cd [] child_rip+0x7/0x12 in-hardirq-W at: [] lock_acquire+0x4a/0x69 [] _spin_lock_irqsave+0x2b/0x3c [] __queue_work+0x17/0x5e [] queue_work+0x4d/0x57 [] kblockd_schedule_work+0x15/0x18 [] __cfq_slice_expired+0x63/0xe6 [] cfq_completed_request+0x116/0x154 [] elv_completed_request+0x38/0x85 [] __blk_put_request+0x35/0x9f [] end_that_request_la
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
David,

I did test it with CONFIG_SECURITY disabled, but did not catch the warning
-- I verified that the build completes with a valid vmlinux image. There
are many warnings (device drivers, and others) during the build and I
didn't do a grep to find which one is specific to my patch. Next time
I'll do a diff on warnings too.

thanks,
Catherine

David Miller <[EMAIL PROTECTED]> wrote on 08/02/2006 05:32:04 PM:

>
> Can you also remember to test your patches with CONFIG_SECURITY
> disabled, as you also promised in the past several times?!??!?!
>
> In file included from init/main.c:34:
> include/linux/security.h: In function 'security_release_secctx':
> include/linux/security.h:2757: warning: 'return' with a value, in
> function returning void
>
> I'll fix this one up, but this is getting ridiculous.
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
From: Xiaolan Zhang <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 18:18:07 -0400

> I did test it with CONFIG_SECURITY disabled, but did not catch the warning
> -- I verified that the build completes with a valid vmlinux image. There
> are many warnings (device drivers, and others) during the build and I
> didn't do a grep to find which one is specific to my patch. Next time
> I'll do a diff on warnings too.

Some platforms build their platform code under arch/${ARCH}/foo with
-Werror added to CFLAGS, sparc64 is one such platform. So the build
did break for me.
Re: [Patch] kernel memory leak fix for af_unix datagram getpeersec patch
I see. The build was fine under x86 and there are so many warnings that a
-Werror probably won't work for me.

thanks,
Catherine

David Miller <[EMAIL PROTECTED]> wrote on 08/02/2006 06:19:06 PM:

> From: Xiaolan Zhang <[EMAIL PROTECTED]>
> Date: Wed, 2 Aug 2006 18:18:07 -0400
>
> > I did test it with CONFIG_SECURITY disabled, but did not catch the warning
> > -- I verified that the build completes with a valid vmlinux image. There
> > are many warnings (device drivers, and others) during the build and I
> > didn't do a grep to find which one is specific to my patch. Next time
> > I'll do a diff on warnings too.
>
> Some platforms build their platform code under arch/${ARCH}/foo with
> -Werror added to CFLAGS, sparc64 is one such platform. So the build
> did break for me.
[PATCH v4 0/2][RFC] iWARP Core Support
Roland,

Here is the iWARP Core Support patchset merged to your latest for-2.6.19
branch. It has gone through 3 reviews on lkml and netdev a while ago, and
I think it's ready to be pulled in.

Steve.

This patchset defines the modifications to the Linux infiniband subsystem
to support iWARP devices. The patchset consists of 2 patches:

	1 - New iWARP CM implementation.
	2 - Core changes to support iWARP.

Signed-off-by: Tom Tucker <[EMAIL PROTECTED]>
Signed-off-by: Steve Wise <[EMAIL PROTECTED]>
[PATCH v4 2/2] iWARP Core Changes.
This patch contains modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP. V2 Review updates: V1 Review updates: - copy_addr() -> rdma_copy_addr() - dst_dev_addr param in rdma_copy_addr to const. - various spacing nits with recasting - include linux/inetdevice.h to get ip_dev_find() prototype. - dev_put() after successful ip_dev_find() --- drivers/infiniband/core/Makefile |4 drivers/infiniband/core/addr.c | 19 + drivers/infiniband/core/cache.c |8 - drivers/infiniband/core/cm.c |3 drivers/infiniband/core/cma.c| 356 +++--- drivers/infiniband/core/device.c |6 drivers/infiniband/core/mad.c| 11 + drivers/infiniband/core/sa_query.c |5 drivers/infiniband/core/smi.c| 18 + drivers/infiniband/core/sysfs.c | 18 + drivers/infiniband/core/ucm.c|5 drivers/infiniband/core/user_mad.c |9 - drivers/infiniband/hw/ipath/ipath_verbs.c|2 drivers/infiniband/hw/mthca/mthca_provider.c |2 drivers/infiniband/ulp/ipoib/ipoib_main.c|8 + drivers/infiniband/ulp/srp/ib_srp.c |2 include/rdma/ib_addr.h | 16 + include/rdma/ib_verbs.h | 39 ++- 18 files changed, 438 insertions(+), 93 deletions(-) diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index 68e73ec..163d991 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -1,7 +1,7 @@ infiniband-$(CONFIG_INFINIBAND_ADDR_TRANS) := ib_addr.o rdma_cm.o obj-$(CONFIG_INFINIBAND) +=ib_core.o ib_mad.o ib_sa.o \ - ib_cm.o $(infiniband-y) + ib_cm.o iw_cm.o $(infiniband-y) obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=ib_uverbs.o ib_ucm.o @@ -14,6 +14,8 @@ ib_sa-y :=sa_query.o ib_cm-y := cm.o +iw_cm-y := iwcm.o + rdma_cm-y := cma.o ib_addr-y := addr.o diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index d294bbc..83f84ef 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -32,6 +32,7 @@ #include #include #include #include +#include #include #include 
#include @@ -60,12 +61,15 @@ static LIST_HEAD(req_list); static DECLARE_WORK(work, process_req, NULL); static struct workqueue_struct *addr_wq; -static int copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, -unsigned char *dst_dev_addr) +int rdma_copy_addr(struct rdma_dev_addr *dev_addr, struct net_device *dev, +const unsigned char *dst_dev_addr) { switch (dev->type) { case ARPHRD_INFINIBAND: - dev_addr->dev_type = IB_NODE_CA; + dev_addr->dev_type = RDMA_NODE_IB_CA; + break; + case ARPHRD_ETHER: + dev_addr->dev_type = RDMA_NODE_RNIC; break; default: return -EADDRNOTAVAIL; @@ -77,6 +81,7 @@ static int copy_addr(struct rdma_dev_add memcpy(dev_addr->dst_dev_addr, dst_dev_addr, MAX_ADDR_LEN); return 0; } +EXPORT_SYMBOL(rdma_copy_addr); int rdma_translate_ip(struct sockaddr *addr, struct rdma_dev_addr *dev_addr) { @@ -88,7 +93,7 @@ int rdma_translate_ip(struct sockaddr *a if (!dev) return -EADDRNOTAVAIL; - ret = copy_addr(dev_addr, dev, NULL); + ret = rdma_copy_addr(dev_addr, dev, NULL); dev_put(dev); return ret; } @@ -160,7 +165,7 @@ static int addr_resolve_remote(struct so /* If the device does ARP internally, return 'done' */ if (rt->idev->dev->flags & IFF_NOARP) { - copy_addr(addr, rt->idev->dev, NULL); + rdma_copy_addr(addr, rt->idev->dev, NULL); goto put; } @@ -180,7 +185,7 @@ static int addr_resolve_remote(struct so src_in->sin_addr.s_addr = rt->rt_src; } - ret = copy_addr(addr, neigh->dev, neigh->ha); + ret = rdma_copy_addr(addr, neigh->dev, neigh->ha); release: neigh_release(neigh); put: @@ -244,7 +249,7 @@ static int addr_resolve_local(struct soc if (ZERONET(src_ip)) { src_in->sin_family = dst_in->sin_family; src_in->sin_addr.s_addr = dst_ip; - ret = copy_addr(addr, dev, dev->dev_addr); + ret = rdma_copy_addr(addr, dev, dev->dev_addr); } else if (LOOPBACK(src_ip)) { ret = rdma_translate_ip((struct sockaddr *)dst_in, addr); if (!ret) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache
[PATCH v4 1/2] iWARP Connection Manager.
This patch provides the new files implementing the iWARP Connection Manager. This module is a logical instance of the xx_cm where xx is the transport type (ib or iw). The symbols exported are used by the transport independent rdma_cm module, and are available also for transport dependent ULPs. V2 Review Changes: - BUG_ON(1) -> BUG() - Don't typecast whan assigning between something* and void* - pre-allocate iwcm_work objects to avoid allocating them in the interrupt context. - copy private data on connect request and connect reply events. - #if !defined() -> #ifndef V1 Review Changes: - sizeof -> sizeof() - removed printks - removed TT debug code - cleaned up lock/unlock around switch statements. - waitqueue -> completion for destroy path. --- drivers/infiniband/core/iwcm.c | 1008 include/rdma/iw_cm.h | 255 ++ include/rdma/iw_cm_private.h | 63 +++ 3 files changed, 1326 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c new file mode 100644 index 000..fe43c00 --- /dev/null +++ b/drivers/infiniband/core/iwcm.c @@ -0,0 +1,1008 @@ +/* + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2004 Topspin Corporation. All rights reserved. + * Copyright (c) 2004, 2005 Voltaire Corporation. All rights reserved. + * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved. + * Copyright (c) 2005 Network Appliance, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +MODULE_AUTHOR("Tom Tucker"); +MODULE_DESCRIPTION("iWARP CM"); +MODULE_LICENSE("Dual BSD/GPL"); + +static struct workqueue_struct *iwcm_wq; +struct iwcm_work { + struct work_struct work; + struct iwcm_id_private *cm_id; + struct list_head list; + struct iw_cm_event event; + struct list_head free_list; +}; + +/* + * The following services provide a mechanism for pre-allocating iwcm_work + * elements. The design pre-allocates them based on the cm_id type: + * LISTENING IDS: Get enough elements preallocated to handle the + * listen backlog. 
+ * ACTIVE IDS: 4: CONNECT_REPLY, ESTABLISHED, DISCONNECT, CLOSE + * PASSIVE IDS:3: ESTABLISHED, DISCONNECT, CLOSE + * + * Allocating them in connect and listen avoids having to deal + * with allocation failures on the event upcall from the provider (which + * is called in the interrupt context). + * + * One exception is when creating the cm_id for incoming connection requests. + * There are two cases: + * 1) in the event upcall, cm_event_handler(), for a listening cm_id. If + *the backlog is exceeded, then no more connection request events will + *be processed. cm_event_handler() returns -ENOMEM in this case. Its up + *to the provider to reject the connectino request. + * 2) in the connection request workqueue handler, cm_conn_req_handler(). + *If work elements cannot be allocated for the new connect request cm_id, + *then IWCM will call the provider reject method. This is ok since + *cm_conn_req_handler() runs in the workqueue thread context. + */ + +static struct iwcm_work *get_work(struct iwcm_id_private *cm_id_priv) +{ + struct iwcm_work *work; + + if (list_empty(&cm_id_priv->work_free_list)) + return NULL; + work = list_entry(cm_id_priv->work_free_list
[PATCH 3/6] htb: if HTB_HYSTERESIS cleanup
Change the conditional compilation around HTB_HYSTERESIS since code
was splitting mid expression.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/sched/sch_htb.c |   27 +--
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c0b80b7..d8c1a6b 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -483,6 +483,20 @@ static void htb_deactivate_prios(struct
 	htb_remove_class_from_row(q,cl,mask);
 }
 
+#if HTB_HYSTERESIS
+static inline long htb_lowater(const struct htb_class *cl)
+{
+	return cl->cmode != HTB_CANT_SEND ? -cl->cbuffer : 0;
+}
+static inline long htb_hiwater(const struct htb_class *cl)
+{
+	return cl->cmode == HTB_CAN_SEND ? -cl->buffer : 0;
+}
+#else
+#define htb_lowater(cl)	(0)
+#define htb_hiwater(cl)	(0)
+#endif
+
 /**
  * htb_class_mode - computes and returns current class mode
  *
@@ -499,19 +513,12 @@ htb_class_mode(struct htb_class *cl,long
 {
     long toks;
 
-    if ((toks = (cl->ctokens + *diff)) < (
-#if HTB_HYSTERESIS
-	    cl->cmode != HTB_CANT_SEND ? -cl->cbuffer :
-#endif
-	    0)) {
+    if ((toks = (cl->ctokens + *diff)) < htb_lowater(cl)) {
 	    *diff = -toks;
 	    return HTB_CANT_SEND;
     }
-    if ((toks = (cl->tokens + *diff)) >= (
-#if HTB_HYSTERESIS
-	    cl->cmode == HTB_CAN_SEND ? -cl->buffer :
-#endif
-	    0))
+
+    if ((toks = (cl->tokens + *diff)) >= htb_hiwater(cl))
 	    return HTB_CAN_SEND;
 
     *diff = -toks;
-- 
1.4.0
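To see the effect of the water marks outside the kernel, here is a small stand-alone sketch of the same pattern: the conditional lives in two helper functions instead of in the middle of an expression. The demo_* names, the simplified struct and the final "may borrow" return are assumptions made for illustration; this is not the sch_htb.c code itself.

```c
/* Hypothetical, simplified stand-ins for the HTB types. */
#define DEMO_HYSTERESIS 1

enum demo_mode { DEMO_CANT_SEND, DEMO_MAY_BORROW, DEMO_CAN_SEND };

struct demo_class {
	enum demo_mode cmode;	/* current mode of the class */
	long tokens, ctokens;	/* rate and ceil token counters */
	long buffer, cbuffer;	/* rate and ceil bucket depths */
};

#if DEMO_HYSTERESIS
/* A class that is not yet blocked may go cbuffer into debt first. */
static inline long demo_lowater(const struct demo_class *cl)
{
	return cl->cmode != DEMO_CANT_SEND ? -cl->cbuffer : 0;
}

/* A class that can already send keeps sending down to -buffer. */
static inline long demo_hiwater(const struct demo_class *cl)
{
	return cl->cmode == DEMO_CAN_SEND ? -cl->buffer : 0;
}
#else
#define demo_lowater(cl)	(0)
#define demo_hiwater(cl)	(0)
#endif

/* Compute the new mode; *diff is updated to the token shortfall. */
static enum demo_mode demo_class_mode(const struct demo_class *cl, long *diff)
{
	long toks;

	if ((toks = cl->ctokens + *diff) < demo_lowater(cl)) {
		*diff = -toks;
		return DEMO_CANT_SEND;
	}

	if ((toks = cl->tokens + *diff) >= demo_hiwater(cl))
		return DEMO_CAN_SEND;

	*diff = -toks;
	return DEMO_MAY_BORROW;
}
```

With buffer = cbuffer = 100 and a diff of -50, a class currently in DEMO_CAN_SEND stays there, while the same numbers for a class in DEMO_CANT_SEND keep it blocked; that asymmetry is the hysteresis, and with DEMO_HYSTERESIS set to 0 both thresholds collapse to zero.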
[PATCH 4/6] htb: Lindent
Code was a mess in terms of indentation. Run through Lindent script, and cleanup the damage. Also, don't use, vim magic comment, and substitute inline for __inline__. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/sched/sch_htb.c | 1001 +++ 1 files changed, 526 insertions(+), 475 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index d8c1a6b..528d5c5 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -1,4 +1,4 @@ -/* vim: ts=8 sw=8 +/* * net/sched/sch_htb.c Hierarchical token bucket, feed tree version * * This program is free software; you can redistribute it and/or @@ -68,11 +68,11 @@ #include one less than their parent. */ -#define HTB_HSIZE 16 /* classid hash size */ -#define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */ -#define HTB_RATECM 1/* whether to use rate computer */ -#define HTB_HYSTERESIS 1/* whether to use mode hysteresis for speedup */ -#define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */ +#define HTB_HSIZE 16 /* classid hash size */ +#define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */ +#define HTB_RATECM 1 /* whether to use rate computer */ +#define HTB_HYSTERESIS 1 /* whether to use mode hysteresis for speedup */ +#define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */ #if HTB_VER >> 16 != TC_HTB_PROTOVER #error "Mismatched sch_htb.c and pkt_sch.h" @@ -80,154 +80,152 @@ #endif /* used internaly to keep status of single class */ enum htb_cmode { -HTB_CANT_SEND, /* class can't send and can't borrow */ -HTB_MAY_BORROW,/* class can't send but may borrow */ -HTB_CAN_SEND /* class can send */ + HTB_CANT_SEND, /* class can't send and can't borrow */ + HTB_MAY_BORROW, /* class can't send but may borrow */ + HTB_CAN_SEND/* class can send */ }; /* interior & leaf nodes; props specific to leaves are marked L: */ -struct htb_class -{ -/* general class parameters */ -u32 classid; -struct gnet_stats_basic bstats; -struct 
gnet_stats_queue qstats; -struct gnet_stats_rate_est rate_est; -struct tc_htb_xstats xstats;/* our special stats */ -int refcnt;/* usage count of this class */ +struct htb_class { + /* general class parameters */ + u32 classid; + struct gnet_stats_basic bstats; + struct gnet_stats_queue qstats; + struct gnet_stats_rate_est rate_est; + struct tc_htb_xstats xstats;/* our special stats */ + int refcnt; /* usage count of this class */ #ifdef HTB_RATECM -/* rate measurement counters */ -unsigned long rate_bytes,sum_bytes; -unsigned long rate_packets,sum_packets; + /* rate measurement counters */ + unsigned long rate_bytes, sum_bytes; + unsigned long rate_packets, sum_packets; #endif -/* topology */ -int level; /* our level (see above) */ -struct htb_class *parent; /* parent class */ -struct list_head hlist;/* classid hash list item */ -struct list_head sibling; /* sibling list item */ -struct list_head children; /* children list */ - -union { - struct htb_class_leaf { - struct Qdisc *q; - int prio; - int aprio; - int quantum; - int deficit[TC_HTB_MAXDEPTH]; - struct list_head drop_list; - } leaf; - struct htb_class_inner { - struct rb_root feed[TC_HTB_NUMPRIO]; /* feed trees */ - struct rb_node *ptr[TC_HTB_NUMPRIO]; /* current class ptr */ -/* When class changes from state 1->2 and disconnects from - parent's feed then we lost ptr value and start from the - first child again. Here we store classid of the - last valid ptr (used when ptr is NULL). */ - u32 last_ptr_id[TC_HTB_NUMPRIO]; - } inner; -} un; -struct rb_node node[TC_HTB_NUMPRIO]; /* node for self or feed tree */ -struct rb_node pq_node; /* node for event queue */ -unsigned long pq_key; /* the same type as jiffies global */ - -int prio_activity; /* for which prios are we active */ -enum htb_cmode cmode; /* current mode of the class */ - -/* class attached filters */ -struct tcf_proto *filter_list; -int filter_cnt; - -int warned;/* only one warning about non work conserving .. 
*/ - -/* token bucket parameters */ -struct qdisc_rate_table *rate; /* rate table of the class itself */ -struct qdisc_rate_table *ceil; /* ceiling rate (limits borrows too) */ -long buffer,cbuffer; /*
[PATCH 1/6] htb: remove broken debug code
The HTB network scheduler had debug code that wouldn't compile and confused and obfuscated the code, remove it. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/sched/sch_htb.c | 302 ++- 1 files changed, 34 insertions(+), 268 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 880a339..73094e7 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -70,7 +70,6 @@ #include #define HTB_HSIZE 16 /* classid hash size */ #define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */ -#undef HTB_DEBUG /* compile debugging support (activated by tc tool) */ #define HTB_RATECM 1/* whether to use rate computer */ #define HTB_HYSTERESIS 1/* whether to use mode hysteresis for speedup */ #define HTB_QLOCK(S) spin_lock_bh(&(S)->dev->queue_lock) @@ -81,51 +80,6 @@ #if HTB_VER >> 16 != TC_HTB_PROTOVER #error "Mismatched sch_htb.c and pkt_sch.h" #endif -/* debugging support; S is subsystem, these are defined: - 0 - netlink messages - 1 - enqueue - 2 - drop & requeue - 3 - dequeue main - 4 - dequeue one prio DRR part - 5 - dequeue class accounting - 6 - class overlimit status computation - 7 - hint tree - 8 - event queue - 10 - rate estimator - 11 - classifier - 12 - fast dequeue cache - - L is level; 0 = none, 1 = basic info, 2 = detailed, 3 = full - q->debug uint32 contains 16 2-bit fields one for subsystem starting - from LSB - */ -#ifdef HTB_DEBUG -#define HTB_DBG_COND(S,L) (((q->debug>>(2*S))&3) >= L) -#define HTB_DBG(S,L,FMT,ARG...) 
if (HTB_DBG_COND(S,L)) \ - printk(KERN_DEBUG FMT,##ARG) -#define HTB_CHCL(cl) BUG_TRAP((cl)->magic == HTB_CMAGIC) -#define HTB_PASSQ q, -#define HTB_ARGQ struct htb_sched *q, -#define static -#undef __inline__ -#define __inline__ -#undef inline -#define inline -#define HTB_CMAGIC 0xFEFAFEF1 -#define htb_safe_rb_erase(N,R) do { BUG_TRAP((N)->rb_color != -1); \ - if ((N)->rb_color == -1) break; \ - rb_erase(N,R); \ - (N)->rb_color = -1; } while (0) -#else -#define HTB_DBG_COND(S,L) (0) -#define HTB_DBG(S,L,FMT,ARG...) -#define HTB_PASSQ -#define HTB_ARGQ -#define HTB_CHCL(cl) -#define htb_safe_rb_erase(N,R) rb_erase(N,R) -#endif - - /* used internaly to keep status of single class */ enum htb_cmode { HTB_CANT_SEND, /* class can't send and can't borrow */ @@ -136,9 +90,6 @@ enum htb_cmode { /* interior & leaf nodes; props specific to leaves are marked L: */ struct htb_class { -#ifdef HTB_DEBUG - unsigned magic; -#endif /* general class parameters */ u32 classid; struct gnet_stats_basic bstats; @@ -238,7 +189,6 @@ struct htb_sched int nwc_hit; /* this to disable mindelay complaint in dequeue */ int defcls;/* class where unclassified flows go to */ -u32 debug; /* subsystem debug levels */ /* filters for qdisc itself */ struct tcf_proto *filter_list; @@ -354,75 +304,21 @@ #endif return cl; } -#ifdef HTB_DEBUG -static void htb_next_rb_node(struct rb_node **n); -#define HTB_DUMTREE(root,memb) if(root) { \ - struct rb_node *n = (root)->rb_node; \ - while (n->rb_left) n = n->rb_left; \ - while (n) { \ - struct htb_class *cl = rb_entry(n, struct htb_class, memb); \ - printk(" %x",cl->classid); htb_next_rb_node (&n); \ - } } - -static void htb_debug_dump (struct htb_sched *q) -{ - int i,p; - printk(KERN_DEBUG "htb*g j=%lu lj=%lu\n",jiffies,q->jiffies); - /* rows */ - for (i=TC_HTB_MAXDEPTH-1;i>=0;i--) { - printk(KERN_DEBUG "htb*r%d m=%x",i,q->row_mask[i]); - for (p=0;prow[i][p].rb_node) continue; - printk(" p%d:",p); - HTB_DUMTREE(q->row[i]+p,node[p]); - } - printk("\n"); - } 
- /* classes */ - for (i = 0; i < HTB_HSIZE; i++) { - struct list_head *l; - list_for_each (l,q->hash+i) { - struct htb_class *cl = list_entry(l,struct htb_class,hlist); - long diff = PSCHED_TDIFF_SAFE(q->now, cl->t_c, (u32)cl->mbuffer); - printk(KERN_DEBUG "htb*c%x m=%d t=%ld c=%ld pq=%lu df=%ld ql=%d " - "pa=%x f:", - cl->classid,cl->cmode,cl->tokens,cl->ctokens, - cl->pq_node.rb_color==-1?0:cl->pq_key,diff, - cl->level?0:cl->un.leaf.q->q.qlen,cl->prio_activity); - if (cl->level) - for (p=0;pun.inner.feed[p].rb_node) continue; - printk(" p%d a=%x:",p,cl->un.inner.ptr[p]?rb_entry(cl->un.inner.ptr[p], struct htb_class,node[p])->classid:0); - HTB_DUMTREE(cl->un.inner.feed+p,node[p]); - } -
[PATCH 6/6] htb: rbtree cleanup
Add code to initialize rb tree nodes, and check for double deletion. This is not a real fix, but I can make it trap sometimes and may be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681 Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/sched/sch_htb.c | 34 +++--- 1 files changed, 27 insertions(+), 7 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 7853c6f..3f3e9df 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -366,7 +366,7 @@ static void htb_add_to_wait_tree(struct * When we are past last key we return NULL. * Average complexity is 2 steps per call. */ -static void htb_next_rb_node(struct rb_node **n) +static inline void htb_next_rb_node(struct rb_node **n) { *n = rb_next(*n); } @@ -388,6 +388,18 @@ static inline void htb_add_class_to_row( } } +/* If this triggers, it is a bug in this code, but it need not be fatal */ +static void htb_safe_rb_erase(struct rb_node *rb, struct rb_root *root) +{ + if (RB_EMPTY_NODE(rb)) { + WARN_ON(1); + } else { + rb_erase(rb, root); + RB_CLEAR_NODE(rb); + } +} + + /** * htb_remove_class_from_row - removes class from its row * @@ -401,10 +413,12 @@ static inline void htb_remove_class_from while (mask) { int prio = ffz(~mask); + mask &= ~(1 << prio); if (q->ptr[cl->level][prio] == cl->node + prio) htb_next_rb_node(q->ptr[cl->level] + prio); - rb_erase(cl->node + prio, q->row[cl->level] + prio); + + htb_safe_rb_erase(cl->node + prio, q->row[cl->level] + prio); if (!q->row[cl->level][prio].rb_node) m |= 1 << prio; } @@ -472,7 +486,7 @@ static void htb_deactivate_prios(struct p->un.inner.ptr[prio] = NULL; } - rb_erase(cl->node + prio, p->un.inner.feed + prio); + htb_safe_rb_erase(cl->node + prio, p->un.inner.feed + prio); if (!p->un.inner.feed[prio].rb_node) mask |= 1 << prio; @@ -739,7 +753,7 @@ #define HTB_ACCNT(T,B,R) toks = diff + c htb_change_class_mode(q, cl, &diff); if (old_mode != cl->cmode) { if (old_mode != HTB_CAN_SEND) - rb_erase(&cl->pq_node, q->wait_pq + 
cl->level); + htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level); if (cl->cmode != HTB_CAN_SEND) htb_add_to_wait_tree(q, cl, diff); } @@ -782,7 +796,7 @@ static long htb_do_events(struct htb_sch if (time_after(cl->pq_key, q->jiffies)) { return cl->pq_key - q->jiffies; } - rb_erase(p, q->wait_pq + level); + htb_safe_rb_erase(p, q->wait_pq + level); diff = PSCHED_TDIFF_SAFE(q->now, cl->t_c, (u32) cl->mbuffer); htb_change_class_mode(q, cl, &diff); if (cl->cmode != HTB_CAN_SEND) @@ -1279,7 +1293,7 @@ static void htb_destroy_class(struct Qdi htb_deactivate(q, cl); if (cl->cmode != HTB_CAN_SEND) - rb_erase(&cl->pq_node, q->wait_pq + cl->level); + htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level); kfree(cl); } @@ -1370,6 +1384,8 @@ static int htb_change_class(struct Qdisc if (!cl) { /* new class */ struct Qdisc *new_q; + int prio; + /* check for valid classid */ if (!classid || TC_H_MAJ(classid ^ sch->handle) || htb_find(classid, sch)) @@ -1389,6 +1405,10 @@ static int htb_change_class(struct Qdisc INIT_HLIST_NODE(&cl->hlist); INIT_LIST_HEAD(&cl->children); INIT_LIST_HEAD(&cl->un.leaf.drop_list); + RB_CLEAR_NODE(&cl->pq_node); + + for (prio = 0; prio < TC_HTB_NUMPRIO; prio++) + RB_CLEAR_NODE(&cl->node[prio]); /* create leaf qdisc early because it uses kmalloc(GFP_KERNEL) so that can't be used inside of sch_tree_lock @@ -1404,7 +1424,7 @@ static int htb_change_class(struct Qdisc /* remove from evt list because of level change */ if (parent->cmode != HTB_CAN_SEND) { - rb_erase(&parent->pq_node, q->wait_pq); + htb_safe_rb_erase(&parent->pq_node, q->wait_pq); parent->cmode = HTB_CAN_SEND; } parent->level = (parent->parent ? parent->parent->level -- 1.4.0 - To unsubscribe from this list: send the line "unsubscr
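The idea behind htb_safe_rb_erase() can be shown without the kernel rbtree: mark a node as "not in any container" when it is initialised and again when it is removed, and have the removal routine check that mark first. The sketch below uses a circular doubly linked list purely to stay self-contained; the demo_* names are hypothetical, and the real patch relies on RB_EMPTY_NODE() and RB_CLEAR_NODE() instead.

```c
#include <stdio.h>

struct demo_node {
	struct demo_node *prev, *next;
};

/* An unlinked node points to itself; also used to set up a list head. */
static void demo_clear_node(struct demo_node *n)
{
	n->prev = n->next = n;
}

static int demo_node_empty(const struct demo_node *n)
{
	return n->next == n;
}

/* Insert n right after head. */
static void demo_add(struct demo_node *n, struct demo_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/*
 * Remove n if it is linked. A second removal is reported and ignored
 * instead of corrupting the list. Returns 0 on success, -1 otherwise.
 */
static int demo_safe_erase(struct demo_node *n)
{
	if (demo_node_empty(n)) {
		fprintf(stderr, "warning: erase of unlinked node\n");
		return -1;
	}
	n->prev->next = n->next;
	n->next->prev = n->prev;
	demo_clear_node(n);
	return 0;
}
```

The check only works if every node is cleared before first use, which is why the patch also adds RB_CLEAR_NODE() calls where a new class is set up.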
[PATCH 5/6] htb: use hlist for hash lists.
Use hlist instead of list for the hash list. This saves space, and we can check for double delete better. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/sched/sch_htb.c | 49 +++-- 1 files changed, 27 insertions(+), 22 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 528d5c5..7853c6f 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -104,7 +104,7 @@ #endif /* topology */ int level; /* our level (see above) */ struct htb_class *parent; /* parent class */ - struct list_head hlist; /* classid hash list item */ + struct hlist_node hlist;/* classid hash list item */ struct list_head sibling; /* sibling list item */ struct list_head children; /* children list */ @@ -163,8 +163,8 @@ static inline long L2T(struct htb_class struct htb_sched { struct list_head root; /* root classes list */ - struct list_head hash[HTB_HSIZE]; /* hashed by classid */ - struct list_head drops[TC_HTB_NUMPRIO]; /* active leaves (for drops) */ + struct hlist_head hash[HTB_HSIZE]; /* hashed by classid */ + struct list_head drops[TC_HTB_NUMPRIO];/* active leaves (for drops) */ /* self list - roots of self generating tree */ struct rb_root row[TC_HTB_MAXDEPTH][TC_HTB_NUMPRIO]; @@ -220,12 +220,13 @@ #endif static inline struct htb_class *htb_find(u32 handle, struct Qdisc *sch) { struct htb_sched *q = qdisc_priv(sch); - struct list_head *p; + struct hlist_node *p; + struct htb_class *cl; + if (TC_H_MAJ(handle) != sch->handle) return NULL; - list_for_each(p, q->hash + htb_hash(handle)) { - struct htb_class *cl = list_entry(p, struct htb_class, hlist); + hlist_for_each_entry(cl, p, q->hash + htb_hash(handle), hlist) { if (cl->classid == handle) return cl; } @@ -675,7 +676,9 @@ static void htb_rate_timer(unsigned long { struct Qdisc *sch = (struct Qdisc *)arg; struct htb_sched *q = qdisc_priv(sch); - struct list_head *p; + struct hlist_node *p; + struct htb_class *cl; + /* lock queue so that we can muck with it */ spin_lock_bh(&sch->dev->queue_lock); @@ 
-686,9 +689,8 @@ static void htb_rate_timer(unsigned long /* scan and recompute one bucket at time */ if (++q->recmp_bucket >= HTB_HSIZE) q->recmp_bucket = 0; - list_for_each(p, q->hash + q->recmp_bucket) { - struct htb_class *cl = list_entry(p, struct htb_class, hlist); + hlist_for_each_entry(cl,p, q->hash + q->recmp_bucket, hlist) { RT_GEN(cl->sum_bytes, cl->rate_bytes); RT_GEN(cl->sum_packets, cl->rate_packets); } @@ -1041,10 +1043,10 @@ static void htb_reset(struct Qdisc *sch) int i; for (i = 0; i < HTB_HSIZE; i++) { - struct list_head *p; - list_for_each(p, q->hash + i) { - struct htb_class *cl = - list_entry(p, struct htb_class, hlist); + struct hlist_node *p; + struct htb_class *cl; + + hlist_for_each_entry(cl, p, q->hash + i, hlist) { if (cl->level) memset(&cl->un.inner, 0, sizeof(cl->un.inner)); else { @@ -1091,7 +1093,7 @@ static int htb_init(struct Qdisc *sch, s INIT_LIST_HEAD(&q->root); for (i = 0; i < HTB_HSIZE; i++) - INIT_LIST_HEAD(q->hash + i); + INIT_HLIST_HEAD(q->hash + i); for (i = 0; i < TC_HTB_NUMPRIO; i++) INIT_LIST_HEAD(q->drops + i); @@ -1269,7 +1271,8 @@ static void htb_destroy_class(struct Qdi struct htb_class, sibling)); /* note: this delete may happen twice (see htb_delete) */ - list_del(&cl->hlist); + if (!hlist_unhashed(&cl->hlist)) + hlist_del(&cl->hlist); list_del(&cl->sibling); if (cl->prio_activity) @@ -1317,7 +1320,9 @@ static int htb_delete(struct Qdisc *sch, sch_tree_lock(sch); /* delete from hash and active; remainder in destroy_class */ - list_del_init(&cl->hlist); + if (!hlist_unhashed(&cl->hlist)) + hlist_del(&cl->hlist); + if (cl->prio_activity) htb_deactivate(q, cl); @@ -1381,7 +1386,7 @@ static int htb_change_class(struct Qdisc cl->refcnt = 1; INIT_LIST_HEAD(&cl->sibling); - INIT_LIST_HEAD(&cl->hlist); + INIT_HLIST_NODE(&cl->hlist); INIT_LIST_HEAD(&cl->children); INIT_LIST_HEAD(&cl->un.leaf.drop_list); @@ -1420,7 +1425,7 @@ static int htb_change_class(struct Qdisc
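The double-delete guard in this patch relies on hlist_unhashed(), which reports whether a node's pprev pointer is NULL. Below is a minimal userspace sketch of that idea. The types and helpers are simplified re-implementations written for illustration (they are not the kernel's <linux/list.h>), and `demo_class` and `demo_unhash()` are hypothetical stand-ins for struct htb_class and its removal path. In the kernel, plain hlist_del() poisons the node's pointers rather than clearing them, so the sketch uses a delete-and-reinitialize helper in order for the guard to see the node as unhashed afterwards.

```c
#include <stddef.h>

/* Simplified userspace copies of the kernel's hlist types (illustrative only). */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;	/* address of the pointer that points at us */
};

struct hlist_head {
	struct hlist_node *first;
};

static void init_hlist_node(struct hlist_node *n)
{
	n->next = NULL;
	n->pprev = NULL;
}

/* A node with pprev == NULL is not on any list. */
static int hlist_unhashed(const struct hlist_node *n)
{
	return n->pprev == NULL;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink the node and reset it so that a later hlist_unhashed() is true. */
static void hlist_del_init(struct hlist_node *n)
{
	if (hlist_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	init_hlist_node(n);
}

/* Hypothetical stand-in for struct htb_class. */
struct demo_class {
	unsigned int classid;
	struct hlist_node hlist;
};

/* Guarded removal: returns 1 if the class was unlinked, 0 if already gone. */
static int demo_unhash(struct demo_class *cl)
{
	if (hlist_unhashed(&cl->hlist))
		return 0;
	hlist_del_init(&cl->hlist);
	return 1;
}
```

Calling demo_unhash() twice on the same class is harmless in this sketch: the second call sees the node as unhashed and does nothing.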
[PATCH 0/6] htb: cleanup
The HTB scheduler code is a mess, this patch set does some basic house cleaning. The first four should cause no code change, but the last two need more testing. -- Stephen Hemminger <[EMAIL PROTECTED]> "And in the Packet there writ down that doome" - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] htb: remove lock macro
Get rid of the macro's being used to obscure the locking. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/sched/sch_htb.c | 18 -- 1 files changed, 8 insertions(+), 10 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 73094e7..c0b80b7 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -72,8 +72,6 @@ #define HTB_HSIZE 16 /* classid hash siz #define HTB_EWMAC 2/* rate average over HTB_EWMAC*HTB_HSIZE sec */ #define HTB_RATECM 1/* whether to use rate computer */ #define HTB_HYSTERESIS 1/* whether to use mode hysteresis for speedup */ -#define HTB_QLOCK(S) spin_lock_bh(&(S)->dev->queue_lock) -#define HTB_QUNLOCK(S) spin_unlock_bh(&(S)->dev->queue_lock) #define HTB_VER 0x30011/* major must be matched with number suplied by TC as version */ #if HTB_VER >> 16 != TC_HTB_PROTOVER @@ -667,7 +665,7 @@ static void htb_rate_timer(unsigned long struct list_head *p; /* lock queue so that we can muck with it */ - HTB_QLOCK(sch); + spin_lock_bh(&sch->dev->queue_lock); q->rttim.expires = jiffies + HZ; add_timer(&q->rttim); @@ -681,7 +679,7 @@ static void htb_rate_timer(unsigned long RT_GEN (cl->sum_bytes,cl->rate_bytes); RT_GEN (cl->sum_packets,cl->rate_packets); } - HTB_QUNLOCK(sch); + spin_unlock_bh(&sch->dev->queue_lock); } #endif @@ -1089,7 +1087,7 @@ static int htb_dump(struct Qdisc *sch, s unsigned char*b = skb->tail; struct rtattr *rta; struct tc_htb_glob gopt; - HTB_QLOCK(sch); + spin_lock_bh(&sch->dev->queue_lock); gopt.direct_pkts = q->direct_pkts; gopt.version = HTB_VER; @@ -1100,10 +1098,10 @@ static int htb_dump(struct Qdisc *sch, s RTA_PUT(skb, TCA_OPTIONS, 0, NULL); RTA_PUT(skb, TCA_HTB_INIT, sizeof(gopt), &gopt); rta->rta_len = skb->tail - b; - HTB_QUNLOCK(sch); + spin_unlock_bh(&sch->dev->queue_lock); return skb->len; rtattr_failure: - HTB_QUNLOCK(sch); + spin_unlock_bh(&sch->dev->queue_lock); skb_trim(skb, skb->tail - skb->data); return -1; } @@ -1116,7 +1114,7 @@ static int htb_dump_class(struct Qdisc * struct rtattr 
*rta; struct tc_htb_opt opt; - HTB_QLOCK(sch); + spin_lock_bh(&sch->dev->queue_lock); tcm->tcm_parent = cl->parent ? cl->parent->classid : TC_H_ROOT; tcm->tcm_handle = cl->classid; if (!cl->level && cl->un.leaf.q) @@ -1133,10 +1131,10 @@ static int htb_dump_class(struct Qdisc * opt.level = cl->level; RTA_PUT(skb, TCA_HTB_PARMS, sizeof(opt), &opt); rta->rta_len = skb->tail - b; - HTB_QUNLOCK(sch); + spin_unlock_bh(&sch->dev->queue_lock); return skb->len; rtattr_failure: - HTB_QUNLOCK(sch); + spin_unlock_bh(&sch->dev->queue_lock); skb_trim(skb, b - skb->data); return -1; } -- 1.4.0 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] htb: rbtree cleanup
Add code to initialize rb tree nodes, and check for double deletion. This is not a real fix, but I can make it trap sometimes and may be a bandaid for: http://bugzilla.kernel.org/show_bug.cgi?id=6681 Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/sched/sch_htb.c | 34 +++--- 1 files changed, 27 insertions(+), 7 deletions(-) diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index 7853c6f..3f3e9df 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -366,7 +366,7 @@ static void htb_add_to_wait_tree(struct * When we are past last key we return NULL. * Average complexity is 2 steps per call. */ -static void htb_next_rb_node(struct rb_node **n) +static inline void htb_next_rb_node(struct rb_node **n) { *n = rb_next(*n); } @@ -388,6 +388,18 @@ static inline void htb_add_class_to_row( } } +/* If this triggers, it is a bug in this code, but it need not be fatal */ +static void htb_safe_rb_erase(struct rb_node *rb, struct rb_root *root) +{ + if (RB_EMPTY_NODE(rb)) { + WARN_ON(1); + } else { + rb_erase(rb, root); + RB_CLEAR_NODE(rb); + } +} + + /** * htb_remove_class_from_row - removes class from its row * @@ -401,10 +413,12 @@ static inline void htb_remove_class_from while (mask) { int prio = ffz(~mask); + mask &= ~(1 << prio); if (q->ptr[cl->level][prio] == cl->node + prio) htb_next_rb_node(q->ptr[cl->level] + prio); - rb_erase(cl->node + prio, q->row[cl->level] + prio); + + htb_safe_rb_erase(cl->node + prio, q->row[cl->level] + prio); if (!q->row[cl->level][prio].rb_node) m |= 1 << prio; } @@ -472,7 +486,7 @@ static void htb_deactivate_prios(struct p->un.inner.ptr[prio] = NULL; } - rb_erase(cl->node + prio, p->un.inner.feed + prio); + htb_safe_rb_erase(cl->node + prio, p->un.inner.feed + prio); if (!p->un.inner.feed[prio].rb_node) mask |= 1 << prio; @@ -739,7 +753,7 @@ #define HTB_ACCNT(T,B,R) toks = diff + c htb_change_class_mode(q, cl, &diff); if (old_mode != cl->cmode) { if (old_mode != HTB_CAN_SEND) - rb_erase(&cl->pq_node, q->wait_pq + 
cl->level); + htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level); if (cl->cmode != HTB_CAN_SEND) htb_add_to_wait_tree(q, cl, diff); } @@ -782,7 +796,7 @@ static long htb_do_events(struct htb_sch if (time_after(cl->pq_key, q->jiffies)) { return cl->pq_key - q->jiffies; } - rb_erase(p, q->wait_pq + level); + htb_safe_rb_erase(p, q->wait_pq + level); diff = PSCHED_TDIFF_SAFE(q->now, cl->t_c, (u32) cl->mbuffer); htb_change_class_mode(q, cl, &diff); if (cl->cmode != HTB_CAN_SEND) @@ -1279,7 +1293,7 @@ static void htb_destroy_class(struct Qdi htb_deactivate(q, cl); if (cl->cmode != HTB_CAN_SEND) - rb_erase(&cl->pq_node, q->wait_pq + cl->level); + htb_safe_rb_erase(&cl->pq_node, q->wait_pq + cl->level); kfree(cl); } @@ -1370,6 +1384,8 @@ static int htb_change_class(struct Qdisc if (!cl) { /* new class */ struct Qdisc *new_q; + int prio; + /* check for valid classid */ if (!classid || TC_H_MAJ(classid ^ sch->handle) || htb_find(classid, sch)) @@ -1389,6 +1405,10 @@ static int htb_change_class(struct Qdisc INIT_HLIST_NODE(&cl->hlist); INIT_LIST_HEAD(&cl->children); INIT_LIST_HEAD(&cl->un.leaf.drop_list); + RB_CLEAR_NODE(&cl->pq_node); + + for (prio = 0; prio < TC_HTB_NUMPRIO; prio++) + RB_CLEAR_NODE(&cl->node[prio]); /* create leaf qdisc early because it uses kmalloc(GFP_KERNEL) so that can't be used inside of sch_tree_lock @@ -1404,7 +1424,7 @@ static int htb_change_class(struct Qdisc /* remove from evt list because of level change */ if (parent->cmode != HTB_CAN_SEND) { - rb_erase(&parent->pq_node, q->wait_pq); + htb_safe_rb_erase(&parent->pq_node, q->wait_pq); parent->cmode = HTB_CAN_SEND; } parent->level = (parent->parent ? parent->parent->level -- 1.4.0 - To unsubscribe from this list: send the line "unsubscr
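The loop in htb_remove_class_from_row touched by this patch walks the set bits of a priority mask: ffz(~mask) yields the lowest set bit of mask, and `mask &= ~(1 << prio)` clears it before the next pass. A small userspace sketch of that pattern follows. It assumes a portable helper, `lowest_set_bit()`, in place of the kernel's ffz(), and `collect_prios()` is a hypothetical driver that exists only for this illustration.

```c
#include <stddef.h>

/* Portable stand-in for the kernel's ffz(~mask): index of the lowest set bit.
 * The caller must pass a non-zero mask. */
static int lowest_set_bit(unsigned int mask)
{
	int bit = 0;

	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* Visit every priority whose bit is set in mask, lowest first.
 * Stores the priorities in out[] (up to max entries) and returns the count. */
static size_t collect_prios(unsigned int mask, int *out, size_t max)
{
	size_t n = 0;

	while (mask && n < max) {
		int prio = lowest_set_bit(mask);

		mask &= ~(1u << prio);	/* clear the bit we just handled */
		out[n++] = prio;
	}
	return n;
}
```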
Re: [PATCH 0/6] htb: cleanup
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Wed, 2 Aug 2006 12:56:36 -0700

> The HTB scheduler code is a mess, this patch set does some basic
> house cleaning. The first four should cause no code change, but the
> last two need more testing.

These patches look fine to me. Once everyone thinks they are ready,
just let me know and I'll push them into net-2.6.19.
[PATCH] bridge: netlink status fix
Fix code that passes back netlink status messages about
bridge changes.

Submitted by [EMAIL PROTECTED]
Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>

---
 net/bridge/br_netlink.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 06abb66..53086fb 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -85,7 +85,7 @@ void br_ifinfo_notify(int event, struct
 		goto err_out;
 
 	err = br_fill_ifinfo(skb, port, current->pid, 0, event, 0);
-	if (err)
+	if (err < 0)
 		goto err_kfree;
 
 	NETLINK_CB(skb).dst_group = RTNLGRP_LINK;
--
1.4.0
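The one-line change in this patch matters when a fill helper reports success with a positive value (for example the message length) and failure with a negative one: `if (err)` then treats every success as an error. Here is a minimal sketch of that convention. It assumes a hypothetical `fill_message()` and is not the real br_fill_ifinfo(), whose return values are not shown in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies msg into buf and returns the number of bytes
 * written (including the terminating NUL) on success, or -1 if buf is too
 * small. */
static int fill_message(char *buf, size_t buflen, const char *msg)
{
	size_t len = strlen(msg) + 1;

	if (len > buflen)
		return -1;
	memcpy(buf, msg, len);
	return (int)len;
}

/* Buggy check: any non-zero return, including a positive length, is an error. */
static int notify_buggy(char *buf, size_t buflen, const char *msg)
{
	int err = fill_message(buf, buflen, msg);

	if (err)
		return -1;	/* taken even when fill_message() succeeded */
	return 0;
}

/* Fixed check: only negative returns are errors. */
static int notify_fixed(char *buf, size_t buflen, const char *msg)
{
	int err = fill_message(buf, buflen, msg);

	if (err < 0)
		return -1;
	return 0;
}
```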
Re: [RFC] gre: transparent ethernet bridging
Stephen Hemminger wrote:
> On Wed, 02 Aug 2006 16:17:42 +1000
> Philip Craig <[EMAIL PROTECTED]> wrote:
>> It generates a random mac address for gre ports, and also stores
>> a copy of the mac address for ethernet ports, rather than checking
>> dev->type everywhere.
>
> That looks cleaner. I wonder if using a fixed OUI would be better
> than random addresses but then choosing an OUI would be a problem.

random_ether_addr() sets the local assignment bit. This is what
various other virtual devices do (including tap devices, which can
also be bridged).

> You probably should add a comment about what this function is doing,
> and why.

Okay.
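The "local assignment bit" mentioned in this reply is bit 1 of the first address octet, and bit 0 of the same octet is the multicast bit. A minimal sketch of how an arbitrary (for example random) address can be turned into a locally administered unicast one is shown here. The helper names are illustrative and this is not the kernel's implementation of random_ether_addr().

```c
#include <stdint.h>

#define ETH_ALEN 6

/* Force addr to be unicast and locally administered. */
static void make_local_unicast(uint8_t addr[ETH_ALEN])
{
	addr[0] &= 0xfe;	/* clear the multicast (group) bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}

/* Non-zero if the multicast (group) bit is set. */
static int is_multicast(const uint8_t addr[ETH_ALEN])
{
	return addr[0] & 0x01;
}

/* Non-zero if the locally administered bit is set. */
static int is_local(const uint8_t addr[ETH_ALEN])
{
	return (addr[0] & 0x02) != 0;
}
```

A caller would fill the six bytes from a random source first and then apply make_local_unicast(), so the result cannot collide with a vendor-assigned OUI.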
Re: Bug in IPSEC?
On Wed, Aug 02, 2006 at 05:08:39PM +0200, Louis Croisez wrote:
>
> I think that 96 bits for the truncated version of the hmac is not
> enough with respect to RFC 2104, p5 §1 :
> "... We recommend that the output length to be not less than half the
> length of the hash output ... and not less than 80 bits ..."
>
> I think that the truncated length should be 128 bits in this case...
> Do you agree?

(To recap, our sha256 IPsec implementation truncates the output to 96
bits while the last IETF draft on sha256 and the general HMAC RFC
require 128 bits)

Yes I agree with your assessment. Changing it is nasty though since
we don't know how many Linux users have deployed this.

Also, we should keep in mind that the IETF has given up on sha256
altogether.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
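The RFC 2104 recommendation quoted in this message reduces to a small calculation: the truncated length should be at least half the hash output and at least 80 bits. The sketch below encodes just that rule. The helper names are hypothetical and it makes no claim about what any particular IPsec stack enforces.

```c
/* Minimum recommended truncated HMAC length in bits, following the RFC 2104
 * text quoted above: at least half the hash output and at least 80 bits. */
static unsigned int hmac_min_trunc_bits(unsigned int hash_bits)
{
	unsigned int half = hash_bits / 2;

	return half > 80 ? half : 80;
}

/* Returns 1 if trunc_bits meets the recommendation for this hash size. */
static int hmac_trunc_ok(unsigned int hash_bits, unsigned int trunc_bits)
{
	return trunc_bits >= hmac_min_trunc_bits(hash_bits) &&
	       trunc_bits <= hash_bits;
}
```

For a 256-bit hash the rule gives 128 bits, so a 96-bit truncation falls short. For a 160-bit hash such as SHA-1 the floor is 80 bits, which 96 bits satisfies.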
Re: [RFC] gre: transparent ethernet bridging
Lennert Buytenhek wrote:
> On Mon, Jul 31, 2006 at 10:08:22PM -0700, Stephen Hemminger wrote:
> Why not use existing bridge code?
>>> It does use the existing bridge code. Perhaps the name is misleading.
>>> All it does is encapsulate the full ethernet header in a gre packet,
>>> rather than only layer 3. That is, currently gre uses ARPHRD_IPGRE,
>>> but bridging requires ARPHRD_ETHER.
>> I am not against making the bridge code smarter to handle other
>> encapsulation.
>
> What if you want to run ethernet directly over a GRE tunnel, without
> using bridging?

But on the other hand, this method allows you to send both ethernet
and non-ethernet traffic over the same GRE tunnel. Is that useful?
Actually, this feature is what makes the handling of the
LLC_SAP_BSPAN packets simple.

The patch to bridging is a lot cleaner than the patch to GRE, and
it also sidesteps the userspace configuration issues, so I don't
want to go back to modifying the GRE device.

Both could be achieved by creating a new virtual device that sits
between GRE and bridging.
Re: [PATCH] Create IP100A Driver
Dear Jeff:

I have discussed this with our people. We decided to use sundance.c to
support IP100A. We will also submit some bug fixes to this driver.

Thanks for your suggestion.

Best Regards,
Jesse Huang

- Original Message -
From: "Jeff Garzik" <[EMAIL PROTECTED]>
To: "Jesse Huang" <[EMAIL PROTECTED]>
Cc: "John W. Linville" <[EMAIL PROTECTED]>; ; ; <[EMAIL PROTECTED]>
Sent: Friday, July 28, 2006 6:14 PM
Subject: Re: [PATCH] Create IP100A Driver

Although it is occasionally OK to duplicate a driver, I do not see a
compelling case with ip100a. The stronger case for a single codebase
is won on the strengths of lower long-term maintenance costs,
increased strength of review, not breaking existing sundance driver
uses, and re-use of existing testing benefits.

If you feel strongly about not showing "sundance" to your users, you
can always submit a one-line MODULE_ALIAS() change which permits users
to load "ip100a" (really sundance.c). Using MODULE_ALIAS() seems quite
reasonable, given that IC Plus appears to be taking the lead in future
Sundance-like chip development.

So, please resubmit as changes to the existing sundance.c. This is
better for the standard Linux kernel engineering process.

Thanks,
Jeff
[patch] RFC: matching interface groups
Balazs Scheidler writes:
> I would like to easily match a set of dynamically created interfaces
> from my packet filter rules. The attached patch forms the basis of my
> implementation and I would like to know whether something like this is
> mergeable to mainline.
[snip]
> The implementation:
>
> Each interface can belong to a single "group" at a time, an interface
> comes up without being a member in any of the groups.

You can get a similar effect by (ab)using the iflink field i.e. set
the iflink to the parent interface and modify
ip_tables.c:ip_packet_match to check the ifindex (or iflink if
defined) for a match. An advantage of this is that it doesn't require
adding any new fields and the only kernel change is to
ip_tables.c:ip_packet_match (and its caller).

That said, an explicit group (or zone as various firewall vendors
call it) is cleaner.
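The iflink suggestion in this reply can be sketched as a match rule that keys on iflink when it is set and on ifindex otherwise. This is only an illustration of the idea under stated assumptions: `demo_netdev` is a hypothetical stand-in carrying the two fields involved, an iflink of 0 is assumed to mean "not set", and `rule_matches()` is not the real ip_packet_match().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two net_device fields the idea needs. */
struct demo_netdev {
	int ifindex;	/* this interface's own index */
	int iflink;	/* index of the "parent" interface, 0 if unset (assumed) */
};

/* Match a rule's interface index against iflink when set, else ifindex. */
static bool rule_matches(const struct demo_netdev *dev, int rule_ifindex)
{
	int key = dev->iflink ? dev->iflink : dev->ifindex;

	return key == rule_ifindex;
}
```

With this rule, every dynamically created interface whose iflink points at the same parent matches a single rule written against that parent's index.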
bonding questions: carrier based link monitoring / slave device state flags
First, I'd like to verify which parameter setting makes the bonding
driver use netif_carrier_ok(slave_device) as the means for link
detection. Is setting use_carrier = 1 enough, or does one need to set
miimon to non-zero as well (where the value of miimon translates to
the link monitoring frequency)?

Second, I understand that a device being enslaved must **not** be UP,
so when enslaving a device the bonding driver calls
dev_open(slave_device) and makes sure the device is UP, correct?

What I want to better understand here is whether, for the bonding
driver to declare a slave as "being able to carry traffic", it assumes
the slave will move from UP to RUNNING state (and later
netif_carrier_ok would return TRUE) without an IP address being set
for the slave device.

Is (per the bonding driver) the **time** it should take the slave to
get from UP to (RUNNING && carrier_ok) state limited and/or
controlled?

Or.
Re: [PATCH 2/6] zd1211rw: Pass more management frame types up to host
On 06-08-02 17:58 John W. Linville wrote:

> On Tue, Aug 01, 2006 at 11:43:31PM +0200, Ulrich Kunitz wrote:
> > From: Daniel Drake <[EMAIL PROTECTED]>
> >
> > We'll be needing these at some point...
>
> This one doesn't really seem like a fix. But since the later fixes
> seem to depend on it, I guess it makes sense to take it.
>
> I just didn't want you to think I wasn't looking... :-)

John, you are absolutely right. The patch is needed, because the
patch sequence would break without it. Would it be acceptable if I
hint at such "bridge" patches in the future, or should we create a
clean patch sequence? The latter would require us to rewrite patches
manually.

BTW we will have a greater number of patches for 2.6.19 (around 30),
which support more devices and also clean things up. Some work is
however still required.

Regards,

Uli

--
Uli Kunitz