Re: Patch: RFC2863 #1 (incomplete)
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-09 17:13 > Am Mittwoch 09 November 2005 15:25 schrieb Thomas Graf: > > > It's still behind eth3 but not reachable at the moment. If you know > > that the network can be reached via other interfaces then just don't > > statically assign a route for it and have the routing daemon take care > > of it, where's the problem? Nobody forces you to configure your > > addresses as prefixes, it's just a usability thing, really. > > This doesn't work. What do you think a routing daemon does when it is told to > announce a network/mask? It announces all interfaces that are in this range > with their network address, calculated by address and mask configured on that > interface. With your suggestion, it would announce /32, not the network > behind it. Obviously you would tell the routing daemon of the network, it would then add it to the kernel and to its internal connected routes list. > And if that's not enough, the system would be totally unreachable without a > running routing daemon in this configuration. Never ever give your phone > number to an admin of such a router. Yes, what's your next argument? Running dhcp on every interface by default because the admin could forget adding the address at bootup and the system would be unreachable? > Wait, not totally. Broadcasts would always go out of this interface due to an > implicit route in the local routing table that should not be touched from > userspace. The broadcast routes in the local table are not generated for /31 and /32 prefixes. > > That should work, you have have two default routes, fib_detect_death() > > should notice that the first default route is dead and use the other. > > It implies the neighbour entry of the default gateway to be unreachable > > though so it might take a while to notice. > > For todays' failover requirements, It takes ages for an arp entry to timeout. I agree, we might just want to force a neighbour probe when the carrier is lost to speed this up. OTOH, userspace can always replace the default route when receiving !IFF_RUNNING notifications. My point is that in this example, disabling the route because of a carrier loss is just non-sense, the carrier state wouldn't help in many cases because f.e. although the carrier is up, the gateway is unreachable. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf <[EMAIL PROTECTED]> writes: > Sorry but OPER_DORMANT_L3DOWN is what this discussion was all about, > if you're talking of something else you should say so. OPER_DORMANT_L3DOWN, yes. So we now have about 1 day if we want (do we still?) to have something in 2.6.15 if I calculate correctly, right? -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf <[EMAIL PROTECTED]> writes: > OK, I'm gonna explain how I see it a bit more detailed to hopefully > make things clearer. Somehow it seems to be assumed that addresses have > to be configured as prefixes, already including the netmask of the > subnet it is on. This feature added recently i.e., to 4.3BSD IIRC? No offense, but this all is crossing limits of insanity. We can workaround that problem with routing daemon but it doesn't mean there is no problem. -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Hasso Tepper <[EMAIL PROTECTED]> writes: > No. This mandates usage of routing daemon even in my laptop which is > nonsense. Precisely. If the routes are dropped on !IFF_RUNNING, one could ifconfig 10.0.0.1 peer 10.0.0.2 dev eth0 ifconfig 10.0.0.1 peer 10.0.0.2 dev irdanet0 ifconfig 10.0.0.1 peer 10.0.0.2 dev usbnet0 ifconfig 10.0.0.1 peer 10.0.0.2 dev adsl0 and it would work fine. -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Am Mittwoch 09 November 2005 15:25 schrieb Thomas Graf: > It's still behind eth3 but not reachable at the moment. If you know > that the network can be reached via other interfaces then just don't > statically assign a route for it and have the routing daemon take care > of it, where's the problem? Nobody forces you to configure your > addresses as prefixes, it's just a usability thing, really. This doesn't work. What do you think a routing daemon does when it is told to announce a network/mask? It announces all interfaces that are in this range with their network address, calculated by address and mask configured on that interface. With your suggestion, it would announce /32, not the network behind it. Now you try redistribute static routes. However, you cannot convert an external route as intra-area in OSPF, as you would need to. And if that's not enough, the system would be totally unreachable without a running routing daemon in this configuration. Never ever give your phone number to an admin of such a router. Wait, not totally. Broadcasts would always go out of this interface due to an implicit route in the local routing table that should not be touched from userspace. Should the routing daemon do another bit of guesswork here? It would also be interesting to see what a dhcp server does if it cannot find the interface it should serve leases on. Or keepalived. This idea is so insane that you should bury it as soon as possible. Really. > IFF_RUNNING is a very weak way to detect if an interface is unusable, > not all drivers correctly report the carrier state and overubscriptions, > or carrier losses on virtual links are a far more likely cause for a link > getting unusable. We should solve this issue correctly and not just work > around it. Carrier detection is being added to more and more (ethernet) drivers. Devices that support it can benefit from this feature, the rest cannot. > That should work, you have have two default routes, fib_detect_death() > should notice that the first default route is dead and use the other. > It implies the neighbour entry of the default gateway to be unreachable > though so it might take a while to notice. For todays' failover requirements, It takes ages for an arp entry to timeout. Anyway, we might be getting too off topic. Let's first see whether my RFC2863 idea or the dormant/protocol flag makes it. Stefan - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
> > We're talking about the auto routes in the main table, right? There > > is no reason for the addresses to be configured as a prefix, they > > can, and in my opinion should, be added as /~0 if the net is not > > guaranteed to be reachable at exactly this interface at all times. > > e.g. a user adding the address 1.1.1.1/24 on eth3 clearly says that > > 1.1.1.0/24 is always behind eth3, no matter what. > > Really? How it can be behind eth3 if state of eth3 says that "no packets > to this link - it's DOWN"? It's still behind eth3 but not reachable at the moment. If you know that the network can be reached via other interfaces then just don't statically assign a route for it and have the routing daemon take care of it, where's the problem? Nobody forces you to configure your addresses as prefixes, it's just a usability thing, really. > I'm not talking about new operational state of interface. I'm talking > about simple rule - if link is known to be unusable (!IFF_RUNNING), don't > use routes pointed to this link. Whether they are connected routes > created by kernel or static routes created by user. Sorry but OPER_DORMANT_L3DOWN is what this discussion was all about, if you're talking of something else you should say so. IFF_RUNNING is a very weak way to detect if an interface is unusable, not all drivers correctly report the carrier state and overubscriptions, or carrier losses on virtual links are a far more likely cause for a link getting unusable. We should solve this issue correctly and not just work around it. Of course you could force everyone to use loopback devices implementing a heartbeat protocol which would control IFF_RUNNING but that's probably not what we want. In fact I think we should have a userspace daemon checking the links and putting them into dormant state upon failure so IFF_RUNNING gets cleared. > Simple scenario with my personal laptop - wired ethernet > (192.168.1.10/24) and wireless (172.16.100.10/24). Two defaults, wireless > one is with higher metric. If I walk away from my table and disconnect > wired ethernet cable, I'd expect that it wouldn't use default pointed to > wired network _and_ 192.168.1.0/24 pointed to it any more. Default > pointed to the wireless is there, wireless link is known to be usable > (IFF_RUNNING) and 192.168.1.0/24 is reachable via wireless as well. That should work, you have have two default routes, fib_detect_death() should notice that the first default route is dead and use the other. It implies the neighbour entry of the default gateway to be unreachable though so it might take a while to notice. If not this would be a bug and we'd have to look into this separately. * Paul Jakma <[EMAIL PROTECTED]> 2005-11-09 11:40 > The automatic kernel "connected" route should either be removed, or > ignored by the kernel until the interface is fully up (UP&RUNNING). > > E.g: > > A-B > | | >- > | > C > > B and A have a PtP link to each other and each have an interface to a > shared network, which has C on it. B's link to the subnet fails > (carrier is lost, IFF_RUNNING cleared), B now no longer can > communicate with C, even if it has a route to C via A (e.g. a default > route) because the kernel routing table still contains a valid > connected route for the shared subnet pointing out the ~IFF_RUNNING > interface. > > That connected route of the kernel should be removed (or marked > invalid/unusable in some way) while the interface is ~IFF_RUNNING. OK, I'm gonna explain how I see it a bit more detailed to hopefully make things clearer. Somehow it seems to be assumed that addresses have to be configured as prefixes, already including the netmask of the subnet it is on. This feature added recently is really only there to make life easier for the casual static routing case. It makes setting up static routes painless. As soon as the routes aren't static anymore they should be added by the daemon in charge, commonly the routing daemon. If the routing daemon sees the interface going down and has an alternative route it is in his responsibility to replace the route with the alternative route. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Hi Thomas, On Wed, 9 Nov 2005, Thomas Graf wrote: Now even if this is not acceptable and you really need a way to favour routes added by userspace over auto/connected routes then we should fix this in the routing itself and not with a new opertional state interface state. Yes, exactly. The automatic kernel "connected" route should either be removed, or ignored by the kernel until the interface is fully up (UP&RUNNING). E.g: A-B | | - | C B and A have a PtP link to each other and each have an interface to a shared network, which has C on it. B's link to the subnet fails (carrier is lost, IFF_RUNNING cleared), B now no longer can communicate with C, even if it has a route to C via A (e.g. a default route) because the kernel routing table still contains a valid connected route for the shared subnet pointing out the ~IFF_RUNNING interface. That connected route of the kernel should be removed (or marked invalid/unusable in some way) while the interface is ~IFF_RUNNING. regards, -- Paul Jakma [EMAIL PROTECTED] [EMAIL PROTECTED] Key ID: 64A2FF6A Fortune: Hackers are just a migratory lifeform with a tropism for computers. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf wrote: > OK, my point might not be the best, still I think it is valid. The fact > that certain routes have to be excluded from the routing calculation > even if their link is up and running makes it obvious that the routing > daemon can take care of DORMANT_L3DOWN itself and we don't need the > kernel to hide them. Nothing more. No. Routing daemon deals only with info from neighbours. While you can say that "if link is down => neighbour is unreachable", you can't say absolutely nothing about link if neighbour is down. Even with PtP media. If neighbour is down, it just means that neighbour is down, but best route to the neighbour is still via this link. > We're talking about the auto routes in the main table, right? There > is no reason for the addresses to be configured as a prefix, they > can, and in my opinion should, be added as /~0 if the net is not > guaranteed to be reachable at exactly this interface at all times. > e.g. a user adding the address 1.1.1.1/24 on eth3 clearly says that > 1.1.1.0/24 is always behind eth3, no matter what. Really? How it can be behind eth3 if state of eth3 says that "no packets to this link - it's DOWN"? > If the route is supposed to be non-static then the address should be > added as /32 to avoid the auto route in main table and the broadcast > routes in the local table. Instead the routing daemon should be > responsible of manging 1.1.1.0/24 at all times. > > Does this make sense? No. This mandates usage of routing daemon even in my laptop which is nonsense. > Now even if this is not acceptable and you really need a way to > favour routes added by userspace over auto/connected routes then > we should fix this in the routing itself and not with a new > opertional state interface state. I'm not talking about new operational state of interface. I'm talking about simple rule - if link is known to be unusable (!IFF_RUNNING), don't use routes pointed to this link. Whether they are connected routes created by kernel or static routes created by user. Simple scenario with my personal laptop - wired ethernet (192.168.1.10/24) and wireless (172.16.100.10/24). Two defaults, wireless one is with higher metric. If I walk away from my table and disconnect wired ethernet cable, I'd expect that it wouldn't use default pointed to wired network _and_ 192.168.1.0/24 pointed to it any more. Default pointed to the wireless is there, wireless link is known to be usable (IFF_RUNNING) and 192.168.1.0/24 is reachable via wireless as well. -- Hasso Tepper Elion Enterprises Ltd. WAN administrator - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Hasso Tepper <[EMAIL PROTECTED]> 2005-11-08 17:14 > OSPF doesn't care if there is connected route or not if !IFF_RUNNING. And > it doesn't matter if it's demand circuit or usual one. If kernel reports > link down (!IFF_RUNNING), no any packets are sent over this link - > hellos, LS updates etc. Hellos and other heartbeat mechanisms are used to > discover neighbours, not whether link works or not. If link is down, yes, > it means that neighbour is down as well. But if neighbour is down, it > doesn't mean that link is down. It just means that neighbour isn't > considered for routing calculations, but forming adjacencies are still > attempted via sending periodical hellos (or whatever keepalive packets). Great, that's how it should be and what I meant actually. > Note, that it really doesn't matter if routes are removed or marked as > unusable. Just argument about routing protocols keepalives is completely > irrelevant. OK, my point might not be the best, still I think it is valid. The fact that certain routes have to be excluded from the routing calculation even if their link is up and running makes it obvious that the routing daemon can take care of DORMANT_L3DOWN itself and we don't need the kernel to hide them. Nothing more. > The only problem is kernel itself > - routing daemon may add really usable route to the network behind > directly connected interface, but it will not be used by kernel, because > there is connected route in the kernel. We're talking about the auto routes in the main table, right? There is no reason for the addresses to be configured as a prefix, they can, and in my opinion should, be added as /~0 if the net is not guaranteed to be reachable at exactly this interface at all times. e.g. a user adding the address 1.1.1.1/24 on eth3 clearly says that 1.1.1.0/24 is always behind eth3, no matter what. If the route is supposed to be non-static then the address should be added as /32 to avoid the auto route in main table and the broadcast routes in the local table. Instead the routing daemon should be responsible of manging 1.1.1.0/24 at all times. Does this make sense? Now even if this is not acceptable and you really need a way to favour routes added by userspace over auto/connected routes then we should fix this in the routing itself and not with a new opertional state interface state. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf wrote: > Whatever is used, there is a link state detection implemented as a > heartbeat protocol. Even on DC the probing can be done periodically > while the link is up. My point is that there are such things and it's > basically the same as the kernel reporting !IFF_RUNNING. I cannot > imagine that you want the kernel to disable the relevant connected > routes to handle this case. OSPF doesn't care if there is connected route or not if !IFF_RUNNING. And it doesn't matter if it's demand circuit or usual one. If kernel reports link down (!IFF_RUNNING), no any packets are sent over this link - hellos, LS updates etc. Hellos and other heartbeat mechanisms are used to discover neighbours, not whether link works or not. If link is down, yes, it means that neighbour is down as well. But if neighbour is down, it doesn't mean that link is down. It just means that neighbour isn't considered for routing calculations, but forming adjacencies are still attempted via sending periodical hellos (or whatever keepalive packets). Note, that it really doesn't matter if routes are removed or marked as unusable. Just argument about routing protocols keepalives is completely irrelevant. And in fact Quagga (and probably other routing protocol implementations as well) ignores any connected route add/delete messages at all from kernel. It manages his own connected routes in his own RIB and connected routes are removed from there if !IFF_RUNNING. So, from point of view of routing calculations all works already. The only problem is kernel itself - routing daemon may add really usable route to the network behind directly connected interface, but it will not be used by kernel, because there is connected route in the kernel. -- Hasso Tepper Elion Enterprises Ltd. WAN administrator - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Hasso Tepper <[EMAIL PROTECTED]> 2005-11-08 15:39 > The same is true for any OSPF link. Only periodical hellos are used for > that in standard case. Demand circuits are there to suppress periodical > hellos and link state update refreshes - ie. if there is no real > application traffic going over demand circuit, link may stay down. What > RFC3883 does, is that it defines mechanism to detect if neighbour is > really still alive if link will be brought up and used for traffic. > That's all. Whatever is used, there is a link state detection implemented as a heartbeat protocol. Even on DC the probing can be done periodically while the link is up. My point is that there are such things and it's basically the same as the kernel reporting !IFF_RUNNING. I cannot imagine that you want the kernel to disable the relevant connected routes to handle this case. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf wrote: > Yes there is, it's called neighbour probing. Link state updates are > sent periodically and acked. It's there to handle crashed OSPF software > and oversubscription. Then after DCs were introcued in rfc1793 it got > unuseable and wasn't reintroduced until rfc3883, it's back there and > in use. The same is true for any OSPF link. Only periodical hellos are used for that in standard case. Demand circuits are there to suppress periodical hellos and link state update refreshes - ie. if there is no real application traffic going over demand circuit, link may stay down. What RFC3883 does, is that it defines mechanism to detect if neighbour is really still alive if link will be brought up and used for traffic. That's all. -- Hasso Tepper Elion Enterprises Ltd. WAN administrator - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Hasso Tepper <[EMAIL PROTECTED]> 2005-11-08 14:55 > Thomas Graf wrote: > > Then how are you going to implement OSPF which has its own link state > > detection? > > OSPF doesn't have any link state detection. If carrier is up, hello > packets are sent out to the interface, if carrier is down, no hellos. > With hellos adjacencies can be formed to exchange routing information. > I don't see any concept of "OSPF's own link state detection" here. Yes there is, it's called neighbour probing. Link state updates are sent periodically and acked. It's there to handle crashed OSPF software and oversubscription. Then after DCs were introcued in rfc1793 it got unuseable and wasn't reintroduced until rfc3883, it's back there and in use. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf wrote: > Then how are you going to implement OSPF which has its own link state > detection? OSPF doesn't have any link state detection. If carrier is up, hello packets are sent out to the interface, if carrier is down, no hellos. With hellos adjacencies can be formed to exchange routing information. I don't see any concept of "OSPF's own link state detection" here. regards, -- Hasso Tepper Elion Enterprises Ltd. WAN administrator - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
On Mon, 7 Nov 2005, Krzysztof Halasa wrote: Stefan Rompf <[EMAIL PROTECTED]> writes: But what happens when the link from (2) to (C) fails? The routing protocol would realize it (actually it was me how has added IFF_RUNNING support to quagga) and add a route to (C) via (3) on (B). However, the "connected" route would still have precedence and the linux IP stack on (2) would happily try to continue using the broken link to (C), even if we add juju magic to the link layer to recognize and drop L3 packets. Linux should just IMHO drop direct route when !IFF_RUNNING. ACK, please do. It's the kernel's route, the kernel should manage it. The reason setup of the connected route became automatic in 2.3 was, iirc, because its pointless to have an UP&RUNNING interface without a connected route - exact same reasoning suggests that interface which is !(UP&RUNNING) should have the kernel connected route removed by kernel. regards, -- Paul Jakma [EMAIL PROTECTED] [EMAIL PROTECTED] Key ID: 64A2FF6A Fortune: Leela: Is there some way to keep them from breeding? Paul: Cold showers don't work on Antarctic creatures. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Stefan Rompf <[EMAIL PROTECTED]> writes: > I don't want the kernel to remove these routes, that would be broken. Why exactly? They would be brought back when the carrier raises again, of course. > I want > them be marked as inactive or unusable and being ignored by the routing > decision while in this state. Why do we need inactive or unusable routes in the kernel? -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf <[EMAIL PROTECTED]> writes: > Well, firstly we cannot provide a perfect notification schema. Of course, that's exactly was I said several times :-) > What I actually want to say is that we probably cannot guarentee > an up-to-date informal of all parties, what we can do is avoid > is inconsistency. So considering the above case, netif_carrier_on() > has to be syncronized by the caller, this means the carrier state > may not change while we're inside the function. Yes. But netif_dormant_off() isn't synchronized and it can cause the "inconsistency". That's precisely the same effect with my patch which you try fighting with yours. Note that this isn't a real inconsistency, e.g., we never signal IFF_RUNNING if no carrier was detected at the time frame, or if the device was simply down etc. It's just rather like Heisenberg's uncertainty principle, you can't really know the flag state if it's changing in this millisecond. > Secondly we know > that when the function is called, there must have been no carrier > for some time _if_ the carrier is currently set to off. This > means that the above code is only issued for legal carrier > resumes. Right. But a layer driving "dormant" flag may have not yet noticed and might now be doing dormant_off(). > The fact that we go to dormant state before changing the > carrier state makes dev_oper_state() behave correctly because in > case we actually call dev_oper_state() between netif_dormant_on() > and setting the carrier bit the interface will still be seen as > down and then switches to dormant in a single atomic operation. That's correct. But it may switch to "up" just thereafter, due to notification latencies. >> Protocol driver must still call netif_dormant_on() itself or IFF_RUNNING >> may be stuck for long. > > The IFF_WAIT flag can only be set by root, it is in his reponsibly > to handle this correctly. It is intended for userspace although > it can of course be used by drivers or stackable interfaces as well. I assume all flag changes from userspace require root. > Theoretically yes, it has the same "issues" as all other link states. Precisely. This is the same theory as with my patch. Thus, because we can't have such "strict" signaling without locks, and because we don't really need it (I can't imagine any routing solution which could need it or even notify the difference), I suggest giving it up and do all flag changes on their respective layers. -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Am Montag 07 November 2005 22:04 schrieb Thomas Graf: > > As the kernel maintains this connected route, userspace is IMHO not > > responsible. Quagga could add its routes with a higher metric > > (administrative distance in cisco terms), kernel should use them as soon > > as the connected route becomes unavailable due to carrier failure (or > > dormant for wireless interfaces). > > Then how are you going to implement OSPF which has its own link state > detection? It's basically the same just that checking the carrier is > a very weak solution. What do you mean with OSPF link state detection? That a router and the (backup) designated router of a network must see each others hello before they exchange routing information? This has nothing to do with the question whether the kernel uses a kernel created route pointing to the oper down interface or not. > The routes, especially the local routes, must > exist while there is no carrier. I don't want the kernel to remove these routes, that would be broken. I want them be marked as inactive or unusable and being ignored by the routing decision while in this state. But maybe such a behaviour change should be configurable. > I think this is a very weak excuse to move something into the kernel > that can exist in userspace. Even then, the kernel creates this connected route implicitely when the interface is brought admin up, and is therefore responsible for it. Obviously we disagree here. > I'm not sure about what exactly you're argueing. Using a char for the > operational state may imply bit operations on some architectures. Ok, you got that point. > There goes your atomicity. dev->flags is certainly not protected by > dev_base_lock, I'm not even sure if the rtnl semaphore is acquired for > all write accesses to dev->flags. dev->flags is, look f.e. at dev_ioctl(); dev->state is not. But beside a short locking howto in core/dev.c, this is really not very well documented. May be this is the reason why some stuff in dev.c does not look totally race free ;-) > So you mean we should not follow the RFC in this case? What operational > status would it be then? OPER_UNKNOWN? Well actually I meant you shouldn't have this difference in your dev_oper_state() function: > + if (!netif_running(dev)) > + return IF_OPER_DOWN; > + > + if (!netif_carrier_ok(dev)) > + return IF_OPER_LOWERLAYERDOWN; Both should be IF_OPER_DOWN then. > > Can't see that requirement from RFC, and devices are normally initialized > > in their open()-"methods". > > It's an SNMP RFC, it's not supposed to tell us how to manage interface > states in the kernel. The carrier state is event driven, which means > that when we don't memorize this information while we announce the > interface as down we'd have to ask the drivers about the carrier state > when we put it up again which is just non-sense. OTOH, (ethernet) drivers usually check link state when they are opened, so there should be a valid operstate after dev->open() returns. Anyway, there is indeed no reason to force operstate to some value from core, beside setting it to unknown when struct net_device is allocated. Stefan - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-07 20:25 > dose:~ # ip addr add 192.168.200.1/24 dev eth1 > dose:~ # ip link set eth1 up > dose:~ # ifconfig eth1 | fgrep UP > UP BROADCAST MULTICAST MTU:1500 Metric:1 > dose:~ # ip route show 192.168.200.0/24 > 192.168.200.0/24 dev eth1 proto kernel scope link src 192.168.200.1 > dose:~ # ping 192.168.200.10 > PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data. > From 192.168.200.1: icmp_seq=2 Destination Host Unreachable > dose:~ # arp > 192.168.200.10 (incomplete) eth1 > > Even though the interface is not IFF_RUNNING and queueing is therefore > disabled, the kernel created a route pointing to it and uses it. Yes, just like it is supposed to do. > > Isn't it the routing daemon's fault when preferring a route > > which has the IFF_RUNNING flag cleared? I'm sorry fot not getting it. ;-) > > As the kernel maintains this connected route, userspace is IMHO not > responsible. Quagga could add its routes with a higher metric (administrative > distance in cisco terms), kernel should use them as soon as the connected > route becomes unavailable due to carrier failure (or dormant for wireless > interfaces). Then how are you going to implement OSPF which has its own link state detection? It's basically the same just that checking the carrier is a very weak solution. The routes, especially the local routes, must exist while there is no carrier. > Btw, adding a userspace workaround would be dangerous. If the routing daemon > crashes while a link is down, the userspace removed connected route would not > come back, leaving the router unreachable on this interface. *Very* bad. I think this is a very weak excuse to move something into the kernel that can exist in userspace. The routing daemon has to check the interfaces on start-up eventually bringing up interfaces so all you have to make sure is that it doesn't crash all the time and that you supervise the process somehow. > > The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it > > used to be is exactly atomicty. Since you write to oper_state from > > various locations the operation must be atomic to avoid corruption. > > No, it's because the IFF_* flags are a bit field that can only be changed > from > process context protected by rtnl/dev_base_lock, different to > __LINK_STATE_NO_CARRIER (or a operational state field) that must be > accessible from interrupts. I'm not sure about what exactly you're argueing. Using a char for the operational state may imply bit operations on some architectures. There goes your atomicity. dev->flags is certainly not protected by dev_base_lock, I'm not even sure if the rtnl semaphore is acquired for all write accesses to dev->flags. > > > No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag. > > > > Right but the RFC specifies that admin. down implies operational down: > > Indeed, but we should not set OPER_DOWN if and only if admin is down ;-) So you mean we should not follow the RFC in this case? What operational status would it be then? OPER_UNKNOWN? > > This also describes why your scheme cannot work, we have to memorize > > the status of for example carrier state. > > Can't see that requirement from RFC, and devices are normally initialized in > their open()-"methods". It's an SNMP RFC, it's not supposed to tell us how to manage interface states in the kernel. The carrier state is event driven, which means that when we don't memorize this information while we announce the interface as down we'd have to ask the drivers about the carrier state when we put it up again which is just non-sense. I have a hard time seeing you first not following the RFC and then relying on it like it would be the overall perfect solution. > Ok, so I hope a finally managed to make my point clear about the different > dormant states ;-) I understand your idea but I don't agree. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Am Montag 07 November 2005 18:45 schrieb Thomas Graf: > I'm sorry, I don't get the point. I assume the above is not a carrier > failure but something else to be detected by a keepalive protocol. No, it's a carrier failure. Try this, eth1 is an ethernet interface without the cable connected: dose:~ # ip addr add 192.168.200.1/24 dev eth1 dose:~ # ip link set eth1 up dose:~ # ifconfig eth1 | fgrep UP UP BROADCAST MULTICAST MTU:1500 Metric:1 dose:~ # ip route show 192.168.200.0/24 192.168.200.0/24 dev eth1 proto kernel scope link src 192.168.200.1 dose:~ # ping 192.168.200.10 PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data. >From 192.168.200.1: icmp_seq=2 Destination Host Unreachable dose:~ # arp 192.168.200.10 (incomplete) eth1 Even though the interface is not IFF_RUNNING and queueing is therefore disabled, the kernel created a route pointing to it and uses it. > Isn't it the routing daemon's fault when preferring a route > which has the IFF_RUNNING flag cleared? I'm sorry fot not getting it. ;-) As the kernel maintains this connected route, userspace is IMHO not responsible. Quagga could add its routes with a higher metric (administrative distance in cisco terms), kernel should use them as soon as the connected route becomes unavailable due to carrier failure (or dormant for wireless interfaces). And now if we support this, we'd need a dormant state with the route disabled and one with it enabled. Btw, adding a userspace workaround would be dangerous. If the routing daemon crashes while a link is down, the userspace removed connected route would not come back, leaving the router unreachable on this interface. *Very* bad. > > OTOH, we don't need to be completely atomic as the > > netif_carrier_*-functions already require driver controlled > > synchronisation. We just need to make sure that the caches are coherent > > before linkwatch kernel thread runs. > > The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it > used to be is exactly atomicty. Since you write to oper_state from > various locations the operation must be atomic to avoid corruption. No, it's because the IFF_* flags are a bit field that can only be changed from process context protected by rtnl/dev_base_lock, different to __LINK_STATE_NO_CARRIER (or a operational state field) that must be accessible from interrupts. > > No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag. > > Right but the RFC specifies that admin. down implies operational down: Indeed, but we should not set OPER_DOWN if and only if admin is down ;-) > This also describes why your scheme cannot work, we have to memorize > the status of for example carrier state. Can't see that requirement from RFC, and devices are normally initialized in their open()-"methods". > We're not far apart, the difference is for the additional L3 disabled > dormant state which I don't understand yet and secondaly that I continue > to keep the current link states with the addition of a new dormant state > and then translate it into the RFC2863 operational status. Ok, so I hope a finally managed to make my point clear about the different dormant states ;-) Stefan - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Krzysztof Halasa <[EMAIL PROTECTED]> 2005-11-07 18:04 > Thomas Graf <[EMAIL PROTECTED]> writes: > > void netif_carrier_on(struct net_device *dev) > > { > > + if ((dev->flags & IFF_WAIT) && > > + test_bit(__LINK_STATE_NOCARRIER, &dev->state)) > > + netif_dormant_on(dev); > ^^ > Does not prevent spurious momentary IFF_RUNNING: netif_dormant_off() > may come after it before "the session" is "established" as an effect > of pre-carrier-flap activity (due to asynchronous notification) - so, > in comparison to my patch, I don't see the gain. Not sure what locking > (short of a single spinlock for all those flags) could solve that but > I'm not a spinlock expert. Well, firstly we cannot provide a perfect notification schema. Take for example the carrier, when we reach netif_carrier_off(), the carrier might in fact be up already. We then fire linkwatch and while we wait for the workqueue to handle it, the carrier state may theoretically change again. Events to userspace can get lost due to various reasons, for example memory pressure. What I actually want to say is that we probably cannot guarentee an up-to-date informal of all parties, what we can do is avoid is inconsistency. So considering the above case, netif_carrier_on() has to be syncronized by the caller, this means the carrier state may not change while we're inside the function. Secondly we know that when the function is called, there must have been no carrier for some time _if_ the carrier is currently set to off. This means that the above code is only issued for legal carrier resumes. The fact that we go to dormant state before changing the carrier state makes dev_oper_state() behave correctly because in case we actually call dev_oper_state() between netif_dormant_on() and setting the carrier bit the interface will still be seen as down and then switches to dormant in a single atomic operation. > Protocol driver must still call netif_dormant_on() itself or IFF_RUNNING > may be stuck for long. The IFF_WAIT flag can only be set by root, it is in his reponsibly to handle this correctly. It is intended for userspace although it can of course be used by drivers or stackable interfaces as well. > > if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) > > linkwatch_fire_event(dev); > > - if (netif_running(dev)) > > + > > + if (netif_running(dev) && !netif_dormant(dev)); > > __netdev_watchdog_up(dev); > ^ > Could be called even if the above netif_dormant_on() set the status. > Hope the protocol resets it when it gets 'carrier down' event. Theoretically yes, it has the same "issues" as all other link states. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Stefan Rompf <[EMAIL PROTECTED]> writes: > But what happens when the link from (2) to (C) fails? The routing protocol > would realize it (actually it was me how has added IFF_RUNNING support to > quagga) and add a route to (C) via (3) on (B). However, the "connected" > route > would still have precedence and the linux IP stack on (2) would happily try > to continue using the broken link to (C), even if we add juju magic to the > link layer to recognize and drop L3 packets. Linux should just IMHO drop direct route when !IFF_RUNNING. Another thing is if you have quagga (never used it, but I done things like that with mrtd and even without IFF_RUNNING I think) you can configure /32 mask on the managed interface and let quagga add dynamic routes. That's what quagga and not the kernel is for - dynamic routing, right? > OPER_DORMANTL3UP is another beast. Dial on demand interfaces may be dormant, > but of course layer L3 must consider routes pointing to them, or they would > never dial. I'm not entirely against L3DOWN/UP idea. Normally dial-up devices show IFF_RUNNING and the internal state is not visible. I would rather do it with another flag (and another patch, when they are users), signaling that the device isn't yet connected even if it's IFF_RUNNING. > OTOH, we don't need to be completely atomic as the netif_carrier_*-functions > already require driver controlled synchronisation. We can't assume anything like this. > We just need to make sure > that the caches are coherent before linkwatch kernel thread runs. wmb() is for gcc and bus reordering etc, not needed here. > Bad > problem > here: To allow software/userspace controlled probe and association packets, > wireless devices must not go to netif_carrier_off() anymore when the > association is lost I'm told it doesn't seem a problem not limitation at all, right? -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
From: Krzysztof Halasa <[EMAIL PROTECTED]> Date: Mon, 07 Nov 2005 17:37:40 +0100 > David? I'm still 20,000+ emails deep in backlog after my 2 week vaction, so I won't be able to look into this for at least a few more days, at best. Also, please use "[EMAIL PROTECTED]" for emails to me. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Stefan Rompf <[EMAIL PROTECTED]> writes: > + /* Operation state */ > + unsigned char operstate_kernel; > + unsigned char operstate_useroverride; 8 bits don't make sense. OTOH I think all writes and possibly reads would need spinlocks. > +void netif_set_kernel_operstate(struct net_device *dev, unsigned char > newstate) { > + char oldstate = dev->operstate_kernel; > + if (oldstate != newstate) { > + dev->operstate_kernel = newstate; > + wmb(); > + linkwatch_fire_event(dev); > + } > +} You can't assume any serialization for this function as it could be called from anywhere anytime - it would require a spinlock (even with simple Ethernet). OTOH I think we could just do that spinlock and forget about it, given all read and write accesses are atomic (so we can safely read without spinlock). Even without "user override", I don't know how do you plan entering the following device states: + OPER_UNKNOWN = 0, + OPER_DOWN = 16, + OPER_LOWERLAYERDOWN, + OPER_TESTING = 32, + OPER_DORMANTL3DOWN = 48, /* OS queue start from here */ + OPER_DORMANTL3UP = 64, /* L3 should try using interface */ + OPER_UP = 96 I understand OPEN_DOWN and OPEN_UP (despite their names) aren't used for setting administrative status? What's their exact meaning (WRT administrative status)? I don't exactly understand relation between operstate_kernel and __LINK_STATE_* flags. Do you plan removing them or not? -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-07 17:41 > > Can't we put the driver, stacked interface, etc. in charge for dropping > > packets which shouldn't be let through yet? I would favour this and have > > the dormant state limited to link layer responsibilities. > > Imagine this situation: > > +---+ > | 1 | > +---+ >|+---+ +---+ >+| 2 |---| 3 | > A +---+ B +---+ > | | +---+ > +---+-| 4 | > C +---+ > > 1,2,3,4 are acting as routers with the quagga daemon running. A,B,C are > networks. > > Now (1) wants to reach (4). Normally, the packet would travel to (2) via (A), > from there to (4) via (C). > > But what happens when the link from (2) to (C) fails? The routing protocol > would realize it (actually it was me how has added IFF_RUNNING support to > quagga) and add a route to (C) via (3) on (B). However, the "connected" route > would still have precedence and the linux IP stack on (2) would happily try > to continue using the broken link to (C), even if we add juju magic to the > link layer to recognize and drop L3 packets. > > That's why I think L3 protocols should be responsible for not using a broken > link and why we need OPER_DORMANTL3DOWN. I'm sorry, I don't get the point. I assume the above is not a carrier failure but something else to be detected by a keepalive protocol. That keepalive protocol could put the interface into dormant state and thus control the IFF_RUNNING flag with the queues enabled so it can continue sending. Isn't it the routing daemon's fault when preferring a route which has the IFF_RUNNING flag cleared? I'm sorry fot not getting it. ;-) > OPER_DORMANTL3UP is another beast. Dial on demand interfaces may be dormant, > but of course layer L3 must consider routes pointing to them, or they would > never dial. This is clear, the packet must get to the interface in order to actually trigger the dial. > OTOH, we don't need to be completely atomic as the netif_carrier_*-functions > already require driver controlled synchronisation. We just need to make sure > that the caches are coherent before linkwatch kernel thread runs. The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it used to be is exactly atomicty. Since you write to oper_state from various locations the operation must be atomic to avoid corruption. > > +#define IFF_LINK_UP0x2000 /* lower link up > > */ > > +#define IFF_DORMANT0x4000 /* waiting for external event > > */ > > +#define IFF_WAIT 0x8000 /* auto. enter dormant state*/ > > > +int dev_oper_state(struct net_device *dev) > > +{ > > + if (!netif_device_present(dev)) > > + return IF_OPER_NOTPRESENT; > > + > > + if (!netif_running(dev)) > > + return IF_OPER_DOWN; > > No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag. Right but the RFC specifies that admin. down implies operational down: If ifAdminStatus is down(2) then ifOperStatus should be down(2). If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic; it should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection); it should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state; it should remain in the notPresent(6) state if the interface has missing (typically, hardware) components." This also describes why your scheme cannot work, we have to memorize the status of for example carrier state. > > + if (!netif_carrier_ok(dev)) > > + return IF_OPER_LOWERLAYERDOWN; > > No, this is OPER_DOWN. LOWERLAYERDOWN is IMHO f.e. for VLANs stacked on a > physical interface that is OPER_DOWN. Correct, I agree although I don't understand this. The physical lower link is just like a the lower interface(s) of a stackable interface. I can't seem to understand how the RFC proposes to retreive the carrier state while the interface is administratively down. > > void netif_carrier_on(struct net_device *dev) > > { > > + if ((dev->flags & IFF_WAIT) && > > + test_bit(__LINK_STATE_NOCARRIER, &dev->state)) > > + netif_dormant_on(dev); > > That would be for avoiding the userspace supplicant race I assume. Bad > problem > here: To allow software/userspace controlled probe and association packets, > wireless devices must not go to netif_carrier_off() anymore when the > association is lost or must have a way to transmit packets without using the > kernel scheduler ;-( May be wireless developers could live with this > limitation. The wireless driver can just issue netif_dormant_on(). the automatic dormant state when the carrier is on again is just for user
Re: Patch: RFC2863 #1 (incomplete)
Thomas Graf <[EMAIL PROTECTED]> writes: > Here is how I see things, it's not complete and lacks proper locking but > enough to get the point. Much better IMHO, though my opinion may of course be subjective due to similarity to my own patch. > void netif_carrier_on(struct net_device *dev) > { > + if ((dev->flags & IFF_WAIT) && > + test_bit(__LINK_STATE_NOCARRIER, &dev->state)) > + netif_dormant_on(dev); ^^ Does not prevent spurious momentary IFF_RUNNING: netif_dormant_off() may come after it before "the session" is "established" as an effect of pre-carrier-flap activity (due to asynchronous notification) - so, in comparison to my patch, I don't see the gain. Not sure what locking (short of a single spinlock for all those flags) could solve that but I'm not a spinlock expert. Protocol driver must still call netif_dormant_on() itself or IFF_RUNNING may be stuck for long. > if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) > linkwatch_fire_event(dev); > - if (netif_running(dev)) > + > + if (netif_running(dev) && !netif_dormant(dev)); > __netdev_watchdog_up(dev); ^ Could be called even if the above netif_dormant_on() set the status. Hope the protocol resets it when it gets 'carrier down' event. -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Am Montag 07 November 2005 14:50 schrieb Thomas Graf: > > -migrate VLAN + BONDING to use OPER_LOWERLAYERDOWN > > I think that bonding should check on the administrative status of > its slaves as well. Why not check for OPER_UP? Yes, currently it seems to ignore admin status and does not even maintain its own operative status. > I still don't see the point in OPER_DORMANTL3DOWN vs. OPER_DORMANTL3UP. Seems my English is really bad ;-) > Can't we put the driver, stacked interface, etc. in charge for dropping > packets which shouldn't be let through yet? I would favour this and have > the dormant state limited to link layer responsibilities. Imagine this situation: +---+ | 1 | +---+ |+---+ +---+ +| 2 |---| 3 | A +---+ B +---+ | | +---+ +---+-| 4 | C +---+ 1,2,3,4 are acting as routers with the quagga daemon running. A,B,C are networks. Now (1) wants to reach (4). Normally, the packet would travel to (2) via (A), from there to (4) via (C). But what happens when the link from (2) to (C) fails? The routing protocol would realize it (actually it was me how has added IFF_RUNNING support to quagga) and add a route to (C) via (3) on (B). However, the "connected" route would still have precedence and the linux IP stack on (2) would happily try to continue using the broken link to (C), even if we add juju magic to the link layer to recognize and drop L3 packets. That's why I think L3 protocols should be responsible for not using a broken link and why we need OPER_DORMANTL3DOWN. OPER_DORMANTL3UP is another beast. Dial on demand interfaces may be dormant, but of course layer L3 must consider routes pointing to them, or they would never dial. I see this clearly as a kernel problem, we should not expect userspace to workaround this limitation. And it's even useful without a routing daemon if a host has more than one network interface. > char access is not guaranteed to be atomic, just use int for the states. OTOH, we don't need to be completely atomic as the netif_carrier_*-functions already require driver controlled synchronisation. We just need to make sure that the caches are coherent before linkwatch kernel thread runs. > +#define IFF_LINK_UP 0x2000 /* lower link up*/ > +#define IFF_DORMANT 0x4000 /* waiting for external event */ > +#define IFF_WAIT 0x8000 /* auto. enter dormant state*/ > +int dev_oper_state(struct net_device *dev) > +{ > + if (!netif_device_present(dev)) > + return IF_OPER_NOTPRESENT; > + > + if (!netif_running(dev)) > + return IF_OPER_DOWN; No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag. > + if (!netif_carrier_ok(dev)) > + return IF_OPER_LOWERLAYERDOWN; No, this is OPER_DOWN. LOWERLAYERDOWN is IMHO f.e. for VLANs stacked on a physical interface that is OPER_DOWN. > void netif_carrier_on(struct net_device *dev) > { > + if ((dev->flags & IFF_WAIT) && > + test_bit(__LINK_STATE_NOCARRIER, &dev->state)) > + netif_dormant_on(dev); That would be for avoiding the userspace supplicant race I assume. Bad problem here: To allow software/userspace controlled probe and association packets, wireless devices must not go to netif_carrier_off() anymore when the association is lost or must have a way to transmit packets without using the kernel scheduler ;-( May be wireless developers could live with this limitation. Your solution would work for Krzysztof and for WPA/802.1X, but with the limitation I've just mentioned. However, as it does not define a clear point for L3 protocols to start, it would not enhance support for routing daemons and dialup interfaces, what my idea can do. As I've already used linux boxes as routers in a redundant topology, I consider this important. What to do now? I have the impression that most arguments are said. Maybe someone like acme or davem should chime in and decide which direction to take? Stefan - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
Stefan Rompf <[EMAIL PROTECTED]> writes: > Unfortunately, me and Krzysztof have totally different opinions if this > complexity is needed. I wouldn't call it "unfortunate". I'll happily use your interface if it's available and I assume it will be available when it's needed, clean, robust and complete. David? Jeff? Your opinion maybe? Not sure if you have my patch sent a week ago, can resend. -- Krzysztof Halasa - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Patch: RFC2863 #1 (incomplete)
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-07 00:42 > -userspace interaction, including userspace supplicant + locking of > operstate_useroverride > -migrate VLAN + BONDING to use OPER_LOWERLAYERDOWN I think that bonding should check on the administrative status of its slaves as well. Why not check for OPER_UP? > I'd leave making IPV4 not consider routes when interface is for 2.6.16. > Unfortunately, me and Krzysztof have totally different opinions if this > complexity is needed. I think yes, as it will make it possible to use a > userspace supplicant for connection completion (WPA), layer 3 protocols not > considering down interfaces for routing decisions (imporant for quagga), dial > on demand interfaces, clear state forwarding for stacked interfaces. This is > stuff required now, not in a distant future, and if we touch these flags > anyway, let's do it right. I still don't see the point in OPER_DORMANTL3DOWN vs. OPER_DORMANTL3UP. Can't we put the driver, stacked interface, etc. in charge for dropping packets which shouldn't be let through yet? I would favour this and have the dormant state limited to link layer responsibilities. > This is not yet for applying, just compilation tested, comments welcome. Is > my > wmb() to avoid 4 byte atomic_t the right idea? char access is not guaranteed to be atomic, just use int for the states. Here is how I see things, it's not complete and lacks proper locking but enough to get the point. Index: linux-2.6/include/linux/if.h === --- linux-2.6.orig/include/linux/if.h +++ linux-2.6/include/linux/if.h @@ -42,8 +42,11 @@ #define IFF_SLAVE 0x800 /* slave of a load balancer */ #define IFF_MULTICAST 0x1000 /* Supports multicast */ +#define IFF_LINK_UP0x2000 /* lower link up*/ +#define IFF_DORMANT0x4000 /* waiting for external event */ +#define IFF_WAIT 0x8000 /* auto. enter dormant state*/ -#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_RUNNING) +#define IFF_VOLATILE (IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_LINK_UP|IFF_DORMANT|IFF_RUNNING) #define IFF_PORTSEL0x2000 /* can set media type */ #define IFF_AUTOMEDIA 0x4000 /* auto media select active */ @@ -80,6 +83,16 @@ #define IF_PROTO_FR_ETH_PVC 0x200B #define IF_PROTO_RAW0x200C /* RAW Socket */ +/* RFC 2863 operational status */ +enum { + IF_OPER_UNKNOWN, + IF_OPER_NOTPRESENT, + IF_OPER_DOWN, + IF_OPER_LOWERLAYERDOWN, + IF_OPER_TESTING, + IF_OPER_DORMANT, + IF_OPER_UP, +}; /* * Device mapping structure. I'd just gone off and designed a Index: linux-2.6/include/linux/netdevice.h === --- linux-2.6.orig/include/linux/netdevice.h +++ linux-2.6/include/linux/netdevice.h @@ -230,7 +230,8 @@ enum netdev_state_t __LINK_STATE_SCHED, __LINK_STATE_NOCARRIER, __LINK_STATE_RX_SCHED, - __LINK_STATE_LINKWATCH_PENDING + __LINK_STATE_LINKWATCH_PENDING, + __LINK_STATE_DORMANT, }; @@ -751,6 +752,23 @@ static inline void netif_device_attach(s } } +static inline void netif_dormant_on(struct net_device *dev) +{ + if (!test_and_set_bit(__LINK_STATE_DORMANT, &dev->state)) + linkwatch_fire_event(dev); +} + +static inline void netif_dormant_off(struct net_device *dev) +{ + if (test_and_clear_bit(__LINK_STATE_DORMANT, &dev->state)) + linkwatch_fire_event(dev); +} + +static inline int netif_dormant(struct net_device *dev) +{ + return test_bit(__LINK_STATE_DORMANT, &dev->state); +} + /* * Network interface message level settings */ @@ -926,6 +944,7 @@ extern void dev_mcast_init(void); extern int netdev_max_backlog; extern int weight_p; extern int netdev_set_master(struct net_device *dev, struct net_device *master); +extern int dev_oper_state(struct net_device *dev); extern int skb_checksum_help(struct sk_buff *skb, int inward); /* rx skb timestamps */ extern voidnet_enable_timestamp(void); Index: linux-2.6/net/core/dev.c === --- linux-2.6.orig/net/core/dev.c +++ linux-2.6/net/core/dev.c @@ -994,6 +994,23 @@ int call_netdevice_notifiers(unsigned lo return notifier_call_chain(&netdev_chain, val, v); } +int dev_oper_state(struct net_device *dev) +{ + if (!netif_device_present(dev)) + return IF_OPER_NOTPRESENT; + + if (!netif_running(dev)) + return IF_OPER_DOWN; + + if (!netif_carrier_ok(dev)) + return IF_OPER_LOWERLAYERDOWN; + + if (netif_dormant(dev)) + return IF_OPER_DO
Patch: RFC2863 #1 (incomplete)
Hi, attached is my current state of RFC2863 implementation. It defines the new status fields and operations. Yet not implemented but in my 2.6.15 time frame is -userspace interaction, including userspace supplicant + locking of operstate_useroverride -migrate VLAN + BONDING to use OPER_LOWERLAYERDOWN -Documentation ;-) I'd leave making IPV4 not consider routes when interface is --- linux-2.6.14/include/linux/netdevice.h.old 2005-11-02 11:08:10.0 +0100 +++ linux-2.6.14/include/linux/netdevice.h 2005-11-07 00:23:12.0 +0100 @@ -228,11 +228,35 @@ enum netdev_state_t __LINK_STATE_START, __LINK_STATE_PRESENT, __LINK_STATE_SCHED, - __LINK_STATE_NOCARRIER, + __OBSOLETE_LINK_STATE_NOCARRIER, __LINK_STATE_RX_SCHED, __LINK_STATE_LINKWATCH_PENDING }; +/* Operational state of a network device, mostly taken from RFC2863 */ +enum netdev_operstate_t +{ + OPER_UNKNOWN = 0, + OPER_NOTPRESENT, /* unused, placeholder */ + OPER_DOWN = 16, + OPER_LOWERLAYERDOWN, + OPER_TESTING = 32, + OPER_DORMANTL3DOWN = 48, /* OS queue start from here */ + OPER_DORMANTL3UP = 64, /* L3 should try using interface */ + OPER_UP = 96 +}; + +/* Userspace application may limit visible external state to + * DORMANTL3DOWN or DORMANTL3UP */ +enum netdev_operoverride_t { + OPERU_IGNORE, /* report state as defined by operstate_kernel */ + OPERU_DORMANTL3DOWN, /* never report >= DORMANTL3DOWN */ + OPERU_DORMANTL3UP, /* never report >= DORMANTL3UP */ + OPERU_UP, /* report kernel state up to UP */ + OPERU_UP_UNTIL_DORMANTL3DOWN, /* change user override to OPERU_DORMANTL3DOWN on kernel <= OPER_DORMANTL3UP */ + OPERU_UP_UNTIL_DORMANTL3UP /* change user override to OPERU_DORMANTL3UP on kernel <= OPER_DORMANTL3UP */ +}; + /* * This structure holds at boot time configured netdevice settings. They @@ -346,7 +370,10 @@ struct net_device struct net_device *master; /* Pointer to master device of a group, * which this device is member of. */ - + /* Operation state */ + unsigned char operstate_kernel; + unsigned char operstate_useroverride; + /* Interface address info. */ unsigned char perm_addr[MAX_ADDR_LEN]; /* permanent hw address */ unsigned char addr_len; /* hardware address length */ @@ -709,23 +736,36 @@ static inline void dev_put(struct net_de #define __dev_put(dev) atomic_dec(&(dev)->refcnt) #define dev_hold(dev) atomic_inc(&(dev)->refcnt) -/* Carrier loss detection, dial on demand. The functions netif_carrier_on - * and _off may be called from IRQ context, but it is caller +/* Carrier loss detection, dial on demand. The function + * netif_set_kernel_operstate may be called from IRQ context, but it is caller * who is responsible for serialization of these calls. */ extern void linkwatch_fire_event(struct net_device *dev); -static inline int netif_carrier_ok(const struct net_device *dev) -{ - return !test_bit(__LINK_STATE_NOCARRIER, &dev->state); -} - extern void __netdev_watchdog_up(struct net_device *dev); -extern void netif_carrier_on(struct net_device *dev); +extern void netif_set_kernel_operstate(struct net_device *dev, unsigned char newstate); + +static inline unsigned char netif_get_kernel_operstate(const struct net_device *dev) { + return dev->operstate_kernel; +} + +/* Consolidated state. Should be called from process context */ +extern unsigned char netif_get_operstate(const struct net_device *dev); -extern void netif_carrier_off(struct net_device *dev); +/* Backward compatibility functions, use netif_set_kernel_operstate + instead */ +static inline void netif_carrier_on(struct net_device *dev) { + netif_set_kernel_operstate(dev, OPER_UP); +} +static inline void netif_carrier_off(struct net_device *dev) { + netif_set_kernel_operstate(dev, OPER_DOWN); +} +static inline int netif_carrier_ok(const struct net_device *dev) { + return (dev->operstate_kernel >= OPER_DORMANTL3UP || + dev->operstate_kernel == OPER_UNKNOWN); +} /* Hot-plugging. */ static inline int netif_device_present(struct net_device *dev) --- linux-2.6.14/net/core/link_watch.c.old 2005-06-17 21:48:29.0 +0200 +++ linux-2.6.14/net/core/link_watch.c 2005-11-06 18:06:14.0 +0100 @@ -75,7 +75,14 @@ void linkwatch_run_queue(void) clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state); if (dev->flags & IFF_UP) { - if (netif_carrier_ok(dev)) { + if (netif_get_kernel_operstate(dev) < OPER_UP) { +if (dev->operstate_useroverride == OPERU_UP_UNTIL_DORMANTL3DOWN) + dev->operstate_useroverride = OPERU_DORMANTL3DOWN; +else if (dev->operstate_useroverride == OPERU_UP_UNTIL_DORMANTL3UP) + dev->operstate_useroverride = OPERU_DORMANTL3UP; + } + + if (netif_get_operstate(dev) >= OPER_DORMANTL3DOWN) { WARN_ON(dev->qdisc_sleeping == &noop_qdisc); dev_activate(dev); } else @@ -141,4 +148,32 @@ void linkwatch_fire_event(struct net_dev } } -EXPORT_SYMBOL(linkwatch_fire_event); +void netif_set_kernel_operstate(struct net_device *dev, unsigned char newstate) {