Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Thomas Graf
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-09 17:13
> Am Mittwoch 09 November 2005 15:25 schrieb Thomas Graf:
> 
> > It's still behind eth3 but not reachable at the moment. If you know
> > that the network can be reached via other interfaces then just don't
> > statically assign a route for it and have the routing daemon take care
> > of it, where's the problem? Nobody forces you to configure your
> > addresses as prefixes, it's just a usability thing, really.
> 
> This doesn't work. What do you think a routing daemon does when it is told to 
> announce a network/mask? It announces all interfaces that are in this range 
> with their network address, calculated by address and mask configured on that 
> interface. With your suggestion, it would announce /32, not the network 
> behind it.

Obviously you would tell the routing daemon of the network, it would
then add it to the kernel and to its internal connected routes list.

> And if that's not enough, the system would be totally unreachable without a 
> running routing daemon in this configuration. Never ever give your phone 
> number to an admin of such a router.

Yes, what's your next argument? Running dhcp on every interface by
default because the admin could forget adding the address at bootup
and the system would be unreachable?
 
> Wait, not totally. Broadcasts would always go out of this interface due to an 
> implicit route in the local routing table that should not be touched from 
> userspace.

The broadcast routes in the local table are not generated for /31
and /32 prefixes.

> > That should work, you have have two default routes, fib_detect_death()
> > should notice that the first default route is dead and use the other.
> > It implies the neighbour entry of the default gateway to be unreachable
> > though so it might take a while to notice.
> 
> For todays' failover requirements, It takes ages for an arp entry to timeout.

I agree, we might just want to force a neighbour probe when the carrier
is lost to speed this up. OTOH, userspace can always replace the default
route when receiving !IFF_RUNNING notifications.

My point is that in this example, disabling the route because of a
carrier loss is just non-sense, the carrier state wouldn't help in 
many cases because f.e. although the carrier is up, the gateway is
unreachable.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Krzysztof Halasa
Thomas Graf <[EMAIL PROTECTED]> writes:

> Sorry but OPER_DORMANT_L3DOWN is what this discussion was all about,
> if you're talking of something else you should say so.

OPER_DORMANT_L3DOWN, yes. So we now have about 1 day if we want (do we
still?) to have something in 2.6.15 if I calculate correctly, right?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Krzysztof Halasa
Thomas Graf <[EMAIL PROTECTED]> writes:

> OK, I'm gonna explain how I see it a bit more detailed to hopefully
> make things clearer. Somehow it seems to be assumed that addresses have
> to be configured as prefixes, already including the netmask of the
> subnet it is on. This feature added recently

i.e., to 4.3BSD IIRC?
No offense, but this all is crossing limits of insanity.

We can workaround that problem with routing daemon but it doesn't mean
there is no problem.
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Krzysztof Halasa
Hasso Tepper <[EMAIL PROTECTED]> writes:

> No. This mandates usage of routing daemon even in my laptop which is
> nonsense.

Precisely. If the routes are dropped on !IFF_RUNNING, one could

ifconfig 10.0.0.1 peer 10.0.0.2 dev eth0
ifconfig 10.0.0.1 peer 10.0.0.2 dev irdanet0
ifconfig 10.0.0.1 peer 10.0.0.2 dev usbnet0
ifconfig 10.0.0.1 peer 10.0.0.2 dev adsl0

and it would work fine.
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Stefan Rompf
Am Mittwoch 09 November 2005 15:25 schrieb Thomas Graf:

> It's still behind eth3 but not reachable at the moment. If you know
> that the network can be reached via other interfaces then just don't
> statically assign a route for it and have the routing daemon take care
> of it, where's the problem? Nobody forces you to configure your
> addresses as prefixes, it's just a usability thing, really.

This doesn't work. What do you think a routing daemon does when it is told to 
announce a network/mask? It announces all interfaces that are in this range 
with their network address, calculated by address and mask configured on that 
interface. With your suggestion, it would announce /32, not the network 
behind it.

Now you try redistribute static routes. However, you cannot convert an 
external route as intra-area in OSPF, as you would need to.

And if that's not enough, the system would be totally unreachable without a 
running routing daemon in this configuration. Never ever give your phone 
number to an admin of such a router.

Wait, not totally. Broadcasts would always go out of this interface due to an 
implicit route in the local routing table that should not be touched from 
userspace. Should the routing daemon do another bit of guesswork here?

It would also be interesting to see what a dhcp server does if it cannot find 
the interface it should serve leases on. Or keepalived.

This idea is so insane that you should bury it as soon as possible. Really.

> IFF_RUNNING is a very weak way to detect if an interface is unusable,
> not all drivers correctly report the carrier state and overubscriptions,
> or carrier losses on virtual links are a far more likely cause for a link
> getting unusable.  We should solve this issue correctly and not just work
> around it.

Carrier detection is being added to more and more (ethernet) drivers. Devices 
that support it can benefit from this feature, the rest cannot.

> That should work, you have have two default routes, fib_detect_death()
> should notice that the first default route is dead and use the other.
> It implies the neighbour entry of the default gateway to be unreachable
> though so it might take a while to notice.

For todays' failover requirements, It takes ages for an arp entry to timeout.

Anyway, we might be getting too off topic. Let's first see whether my RFC2863 
idea or the dormant/protocol flag makes it.

Stefan
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Thomas Graf
> > We're talking about the auto routes in the main table, right? There
> > is no reason for the addresses to be configured as a prefix, they
> > can, and in my opinion should, be added as /~0 if the net is not
> > guaranteed to be reachable at exactly this interface at all times.
> > e.g. a user adding the address 1.1.1.1/24 on eth3 clearly says that
> > 1.1.1.0/24 is always behind eth3, no matter what.
> 
> Really? How it can be behind eth3 if state of eth3 says that "no packets
> to this link - it's DOWN"?

It's still behind eth3 but not reachable at the moment. If you know
that the network can be reached via other interfaces then just don't
statically assign a route for it and have the routing daemon take care
of it, where's the problem? Nobody forces you to configure your
addresses as prefixes, it's just a usability thing, really.

> I'm not talking about new operational state of interface. I'm talking
> about simple rule - if link is known to be unusable (!IFF_RUNNING), don't
> use routes pointed to this link. Whether they are connected routes
> created by kernel or static routes created by user.

Sorry but OPER_DORMANT_L3DOWN is what this discussion was all about,
if you're talking of something else you should say so.

IFF_RUNNING is a very weak way to detect if an interface is unusable,
not all drivers correctly report the carrier state and overubscriptions,
or carrier losses on virtual links are a far more likely cause for a link
getting unusable.  We should solve this issue correctly and not just work
around it. Of course you could force everyone to use loopback devices
implementing a heartbeat protocol which would control IFF_RUNNING but
that's probably not what we want. In fact I think we should have a
userspace daemon checking the links and putting them into dormant state
upon failure so IFF_RUNNING gets cleared.

> Simple scenario with my personal laptop - wired ethernet
> (192.168.1.10/24) and wireless (172.16.100.10/24). Two defaults, wireless
> one is with higher metric. If I walk away from my table and disconnect
> wired ethernet cable, I'd expect that it wouldn't use default pointed to
> wired network _and_ 192.168.1.0/24 pointed to it any more. Default
> pointed to the wireless is there, wireless link is known to be usable
> (IFF_RUNNING) and 192.168.1.0/24 is reachable via wireless as well.

That should work, you have have two default routes, fib_detect_death()
should notice that the first default route is dead and use the other.
It implies the neighbour entry of the default gateway to be unreachable
though so it might take a while to notice. If not this would be a bug
and we'd have to look into this separately.

* Paul Jakma <[EMAIL PROTECTED]> 2005-11-09 11:40
> The automatic kernel "connected" route should either be removed, or 
> ignored by the kernel until the interface is fully up (UP&RUNNING).
> 
> E.g:
> 
>   A-B
> | |
>-
> |
> C
> 
> B and A have a PtP link to each other and each have an interface to a 
> shared network, which has C on it. B's link to the subnet fails 
> (carrier is lost, IFF_RUNNING cleared), B now no longer can 
> communicate with C, even if it has a route to C via A (e.g. a default 
> route) because the kernel routing table still contains a valid 
> connected route for the shared subnet pointing out the ~IFF_RUNNING 
> interface.
> 
> That connected route of the kernel should be removed (or marked 
> invalid/unusable in some way) while the interface is ~IFF_RUNNING.

OK, I'm gonna explain how I see it a bit more detailed to hopefully
make things clearer. Somehow it seems to be assumed that addresses have
to be configured as prefixes, already including the netmask of the
subnet it is on. This feature added recently is really only there to
make life easier for the casual static routing case. It makes setting
up static routes painless. As soon as the routes aren't static anymore
they should be added by the daemon in charge, commonly the routing
daemon. If the routing daemon sees the interface going down and has an
alternative route it is in his responsibility to replace the route
with the alternative route.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Paul Jakma

Hi Thomas,

On Wed, 9 Nov 2005, Thomas Graf wrote:

Now even if this is not acceptable and you really need a way to 
favour routes added by userspace over auto/connected routes then we 
should fix this in the routing itself and not with a new opertional 
state interface state.


Yes, exactly.

The automatic kernel "connected" route should either be removed, or 
ignored by the kernel until the interface is fully up (UP&RUNNING).


E.g:

A-B
| |
   -
|
C

B and A have a PtP link to each other and each have an interface to a 
shared network, which has C on it. B's link to the subnet fails 
(carrier is lost, IFF_RUNNING cleared), B now no longer can 
communicate with C, even if it has a route to C via A (e.g. a default 
route) because the kernel routing table still contains a valid 
connected route for the shared subnet pointing out the ~IFF_RUNNING 
interface.


That connected route of the kernel should be removed (or marked 
invalid/unusable in some way) while the interface is ~IFF_RUNNING.


regards,
--
Paul Jakma  [EMAIL PROTECTED]   [EMAIL PROTECTED]   Key ID: 64A2FF6A
Fortune:
Hackers are just a migratory lifeform with a tropism for computers.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-09 Thread Hasso Tepper
Thomas Graf wrote:
> OK, my point might not be the best, still I think it is valid. The fact
> that certain routes have to be excluded from the routing calculation
> even if their link is up and running makes it obvious that the routing
> daemon can take care of DORMANT_L3DOWN itself and we don't need the
> kernel to hide them. Nothing more.

No. Routing daemon deals only with info from neighbours. While you can
say that "if link is down => neighbour is unreachable", you can't say
absolutely nothing about link if neighbour is down. Even with PtP media.
If neighbour is down, it just means that neighbour is down, but best
route to the neighbour is still via this link.
 
> We're talking about the auto routes in the main table, right? There
> is no reason for the addresses to be configured as a prefix, they
> can, and in my opinion should, be added as /~0 if the net is not
> guaranteed to be reachable at exactly this interface at all times.
> e.g. a user adding the address 1.1.1.1/24 on eth3 clearly says that
> 1.1.1.0/24 is always behind eth3, no matter what.

Really? How it can be behind eth3 if state of eth3 says that "no packets
to this link - it's DOWN"?
 
> If the route is supposed to be non-static then the address should be
> added as /32 to avoid the auto route in main table and the broadcast
> routes in the local table. Instead the routing daemon should be
> responsible of manging 1.1.1.0/24 at all times.
>
> Does this make sense?

No. This mandates usage of routing daemon even in my laptop which is
nonsense.
 
> Now even if this is not acceptable and you really need a way to
> favour routes added by userspace over auto/connected routes then
> we should fix this in the routing itself and not with a new
> opertional state interface state.

I'm not talking about new operational state of interface. I'm talking
about simple rule - if link is known to be unusable (!IFF_RUNNING), don't
use routes pointed to this link. Whether they are connected routes
created by kernel or static routes created by user.

Simple scenario with my personal laptop - wired ethernet
(192.168.1.10/24) and wireless (172.16.100.10/24). Two defaults, wireless
one is with higher metric. If I walk away from my table and disconnect
wired ethernet cable, I'd expect that it wouldn't use default pointed to
wired network _and_ 192.168.1.0/24 pointed to it any more. Default
pointed to the wireless is there, wireless link is known to be usable
(IFF_RUNNING) and 192.168.1.0/24 is reachable via wireless as well.


-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-08 Thread Thomas Graf
* Hasso Tepper <[EMAIL PROTECTED]> 2005-11-08 17:14
> OSPF doesn't care if there is connected route or not if !IFF_RUNNING. And
> it doesn't matter if it's demand circuit or usual one. If kernel reports
> link down (!IFF_RUNNING), no any packets are sent over this link -
> hellos, LS updates etc. Hellos and other heartbeat mechanisms are used to
> discover neighbours, not whether link works or not. If link is down, yes,
> it means that neighbour is down as well. But if neighbour is down, it
> doesn't mean that link is down. It just means that neighbour isn't
> considered for routing calculations, but forming adjacencies are still
> attempted via sending periodical hellos (or whatever keepalive packets).

Great, that's how it should be and what I meant actually.

> Note, that it really doesn't matter if routes are removed or marked as
> unusable. Just argument about routing protocols keepalives is completely
> irrelevant.

OK, my point might not be the best, still I think it is valid. The fact
that certain routes have to be excluded from the routing calculation
even if their link is up and running makes it obvious that the routing
daemon can take care of DORMANT_L3DOWN itself and we don't need the
kernel to hide them. Nothing more.

> The only problem is kernel itself
> - routing daemon may add really usable route to the network behind
> directly connected interface, but it will not be used by kernel, because
> there is connected route in the kernel.

We're talking about the auto routes in the main table, right? There
is no reason for the addresses to be configured as a prefix, they
can, and in my opinion should, be added as /~0 if the net is not
guaranteed to be reachable at exactly this interface at all times.
e.g. a user adding the address 1.1.1.1/24 on eth3 clearly says that
1.1.1.0/24 is always behind eth3, no matter what.

If the route is supposed to be non-static then the address should be
added as /32 to avoid the auto route in main table and the broadcast
routes in the local table. Instead the routing daemon should be
responsible of manging 1.1.1.0/24 at all times.

Does this make sense?

Now even if this is not acceptable and you really need a way to
favour routes added by userspace over auto/connected routes then
we should fix this in the routing itself and not with a new
opertional state interface state.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-08 Thread Hasso Tepper
Thomas Graf wrote:
> Whatever is used, there is a link state detection implemented as a
> heartbeat protocol. Even on DC the probing can be done periodically
> while the link is up. My point is that there are such things and it's
> basically the same as the kernel reporting !IFF_RUNNING. I cannot
> imagine that you want the kernel to disable the relevant connected
> routes to handle this case.

OSPF doesn't care if there is connected route or not if !IFF_RUNNING. And
it doesn't matter if it's demand circuit or usual one. If kernel reports
link down (!IFF_RUNNING), no any packets are sent over this link -
hellos, LS updates etc. Hellos and other heartbeat mechanisms are used to
discover neighbours, not whether link works or not. If link is down, yes,
it means that neighbour is down as well. But if neighbour is down, it
doesn't mean that link is down. It just means that neighbour isn't
considered for routing calculations, but forming adjacencies are still
attempted via sending periodical hellos (or whatever keepalive packets).

Note, that it really doesn't matter if routes are removed or marked as
unusable. Just argument about routing protocols keepalives is completely
irrelevant. 

And in fact Quagga (and probably other routing protocol implementations
as well) ignores any connected route add/delete messages at all from
kernel. It manages his own connected routes in his own RIB and connected
routes are removed from there if !IFF_RUNNING. So, from point of view of
routing calculations all works already. The only problem is kernel itself
- routing daemon may add really usable route to the network behind
directly connected interface, but it will not be used by kernel, because
there is connected route in the kernel.


-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-08 Thread Thomas Graf
* Hasso Tepper <[EMAIL PROTECTED]> 2005-11-08 15:39
> The same is true for any OSPF link. Only periodical hellos are used for
> that in standard case. Demand circuits are there to suppress periodical
> hellos and link state update refreshes - ie. if there is no real
> application traffic going over demand circuit, link may stay down. What
> RFC3883 does, is that it defines mechanism to detect if neighbour is
> really still alive if link will be brought up and used for traffic.
> That's all.

Whatever is used, there is a link state detection implemented as a
heartbeat protocol. Even on DC the probing can be done periodically
while the link is up. My point is that there are such things and it's
basically the same as the kernel reporting !IFF_RUNNING. I cannot
imagine that you want the kernel to disable the relevant connected
routes to handle this case.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-08 Thread Hasso Tepper
Thomas Graf wrote:
> Yes there is, it's called neighbour probing. Link state updates are
> sent periodically and acked. It's there to handle crashed OSPF software
> and oversubscription. Then after DCs were introcued in rfc1793 it got
> unuseable and wasn't reintroduced until rfc3883, it's back there and
> in use.

The same is true for any OSPF link. Only periodical hellos are used for
that in standard case. Demand circuits are there to suppress periodical
hellos and link state update refreshes - ie. if there is no real
application traffic going over demand circuit, link may stay down. What
RFC3883 does, is that it defines mechanism to detect if neighbour is
really still alive if link will be brought up and used for traffic.
That's all.

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-08 Thread Thomas Graf
* Hasso Tepper <[EMAIL PROTECTED]> 2005-11-08 14:55
> Thomas Graf wrote:
> > Then how are you going to implement OSPF which has its own link state
> > detection?
> 
> OSPF doesn't have any link state detection. If carrier is up, hello
> packets are sent out to the interface, if carrier is down, no hellos.
> With hellos adjacencies can be formed to exchange routing information.
> I don't see any concept of "OSPF's own link state detection" here.

Yes there is, it's called neighbour probing. Link state updates are
sent periodically and acked. It's there to handle crashed OSPF software
and oversubscription. Then after DCs were introcued in rfc1793 it got
unuseable and wasn't reintroduced until rfc3883, it's back there and
in use.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-08 Thread Hasso Tepper
Thomas Graf wrote:
> Then how are you going to implement OSPF which has its own link state
> detection?

OSPF doesn't have any link state detection. If carrier is up, hello
packets are sent out to the interface, if carrier is down, no hellos.
With hellos adjacencies can be formed to exchange routing information.
I don't see any concept of "OSPF's own link state detection" here.

regards,

-- 
Hasso Tepper
Elion Enterprises Ltd.
WAN administrator
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Paul Jakma

On Mon, 7 Nov 2005, Krzysztof Halasa wrote:


Stefan Rompf <[EMAIL PROTECTED]> writes:


But what happens when the link from (2) to (C) fails? The routing protocol
would realize it (actually it was me how has added IFF_RUNNING support to
quagga) and add a route to (C) via (3) on (B). However, the "connected"
route
would still have precedence and the linux IP stack on (2) would happily try
to continue using the broken link to (C), even if we add juju magic to the
link layer to recognize and drop L3 packets.


Linux should just IMHO drop direct route when !IFF_RUNNING.


ACK, please do. It's the kernel's route, the kernel should manage it. 
The reason setup of the connected route became automatic in 2.3 was, 
iirc, because its pointless to have an UP&RUNNING interface without a 
connected route - exact same reasoning suggests that interface which 
is !(UP&RUNNING) should have the kernel connected route removed by 
kernel.


regards,
--
Paul Jakma  [EMAIL PROTECTED]   [EMAIL PROTECTED]   Key ID: 64A2FF6A
Fortune:
 Leela: Is there some way to keep them from breeding?
 Paul: Cold showers don't work on Antarctic creatures.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Krzysztof Halasa
Stefan Rompf <[EMAIL PROTECTED]> writes:

> I don't want the kernel to remove these routes, that would be broken.

Why exactly?
They would be brought back when the carrier raises again, of course.

> I want 
> them be marked as inactive or unusable and being ignored by the routing 
> decision while in this state.

Why do we need inactive or unusable routes in the kernel?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Krzysztof Halasa
Thomas Graf <[EMAIL PROTECTED]> writes:

> Well, firstly we cannot provide a perfect notification schema.

Of course, that's exactly was I said several times :-)

> What I actually want to say is that we probably cannot guarentee
> an up-to-date informal of all parties, what we can do is avoid
> is inconsistency. So considering the above case, netif_carrier_on()
> has to be syncronized by the caller, this means the carrier state
> may not change while we're inside the function.

Yes. But netif_dormant_off() isn't synchronized and it can cause
the "inconsistency". That's precisely the same effect with my patch
which you try fighting with yours.

Note that this isn't a real inconsistency, e.g., we never signal
IFF_RUNNING if no carrier was detected at the time frame, or if the
device was simply down etc. It's just rather like Heisenberg's
uncertainty principle, you can't really know the flag state if it's
changing in this millisecond.

> Secondly we know
> that when the function is called, there must have been no carrier
> for some time _if_ the carrier is currently set to off. This
> means that the above code is only issued for legal carrier
> resumes.

Right. But a layer driving "dormant" flag may have not yet noticed
and might now be doing dormant_off().

> The fact that we go to dormant state before changing the
> carrier state makes dev_oper_state() behave correctly because in
> case we actually call dev_oper_state() between netif_dormant_on()
> and setting the carrier bit the interface will still be seen as
> down and then switches to dormant in a single atomic operation.

That's correct. But it may switch to "up" just thereafter, due to
notification latencies.

>> Protocol driver must still call netif_dormant_on() itself or IFF_RUNNING
>> may be stuck for long.
>
> The IFF_WAIT flag can only be set by root, it is in his reponsibly
> to handle this correctly. It is intended for userspace although
> it can of course be used by drivers or stackable interfaces as well.

I assume all flag changes from userspace require root.

> Theoretically yes, it has the same "issues" as all other link states.

Precisely. This is the same theory as with my patch. Thus, because we
can't have such "strict" signaling without locks, and because we don't
really need it (I can't imagine any routing solution which could need
it or even notify the difference), I suggest giving it up and do all
flag changes on their respective layers.
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Stefan Rompf
Am Montag 07 November 2005 22:04 schrieb Thomas Graf:

> > As the kernel maintains this connected route, userspace is IMHO not
> > responsible. Quagga could add its routes with a higher metric
> > (administrative distance in cisco terms), kernel should use them as soon
> > as the connected route becomes unavailable due to carrier failure (or
> > dormant for wireless interfaces).
>
> Then how are you going to implement OSPF which has its own link state
> detection? It's basically the same just that checking the carrier is
> a very weak solution.

What do you mean with OSPF link state detection? That a router and the 
(backup) designated router of a network must see each others hello before 
they exchange routing information? This has nothing to do with the question 
whether the kernel uses a kernel created route pointing to the oper down 
interface or not.

> The routes, especially the local routes, must 
> exist while there is no carrier.

I don't want the kernel to remove these routes, that would be broken. I want 
them be marked as inactive or unusable and being ignored by the routing 
decision while in this state. But maybe such a behaviour change should be 
configurable.

> I think this is a very weak excuse to move something into the kernel
> that can exist in userspace.

Even then, the kernel creates this connected route implicitely when the 
interface is brought admin up, and is therefore responsible for it.

Obviously we disagree here.

> I'm not sure about what exactly you're argueing. Using a char for the
> operational state may imply bit operations on some architectures.

Ok, you got that point.

> There goes your atomicity. dev->flags is certainly not protected by
> dev_base_lock, I'm not even sure if the rtnl semaphore is acquired for
> all write accesses to dev->flags.

dev->flags is, look f.e. at dev_ioctl(); dev->state is not. But beside a short 
locking howto in core/dev.c, this is really not very well documented. May be 
this is the reason why some stuff in dev.c does not look totally race 
free ;-)

> So you mean we should not follow the RFC in this case? What operational
> status would it be then? OPER_UNKNOWN?

Well actually I meant you shouldn't have this difference in your 
dev_oper_state() function:

> +   if (!netif_running(dev))
> +   return IF_OPER_DOWN;
> +
> +   if (!netif_carrier_ok(dev))
> +   return IF_OPER_LOWERLAYERDOWN;

Both should be IF_OPER_DOWN then.

> > Can't see that requirement from RFC, and devices are normally initialized
> > in their open()-"methods".
>
> It's an SNMP RFC, it's not supposed to tell us how to manage interface
> states in the kernel. The carrier state is event driven, which means
> that when we don't memorize this information while we announce the
> interface as down we'd have to ask the drivers about the carrier state
> when we put it up again which is just non-sense.

OTOH, (ethernet) drivers usually check link state when they are opened, so 
there should be a valid operstate after dev->open() returns. Anyway, there is 
indeed no reason to force operstate to some value from core, beside setting 
it to unknown when struct net_device is allocated.

Stefan
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Thomas Graf
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-07 20:25
> dose:~ # ip addr add 192.168.200.1/24 dev eth1
> dose:~ # ip link set eth1 up
> dose:~ # ifconfig eth1 | fgrep UP
>   UP BROADCAST MULTICAST  MTU:1500  Metric:1
> dose:~ # ip route show 192.168.200.0/24
> 192.168.200.0/24 dev eth1  proto kernel  scope link  src 192.168.200.1
> dose:~ # ping 192.168.200.10
> PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data.
> From 192.168.200.1: icmp_seq=2 Destination Host Unreachable
> dose:~ # arp
> 192.168.200.10   (incomplete) eth1
> 
> Even though the interface is not IFF_RUNNING and queueing is therefore 
> disabled, the kernel created a route pointing to it and uses it.

Yes, just like it is supposed to do.

> > Isn't it the routing daemon's fault when preferring a route 
> > which has the IFF_RUNNING flag cleared? I'm sorry fot not getting it. ;-)
> 
> As the kernel maintains this connected route, userspace is IMHO not 
> responsible. Quagga could add its routes with a higher metric (administrative 
> distance in cisco terms), kernel should use them as soon as the connected 
> route becomes unavailable due to carrier failure (or dormant for wireless 
> interfaces).

Then how are you going to implement OSPF which has its own link state
detection? It's basically the same just that checking the carrier is
a very weak solution. The routes, especially the local routes, must
exist while there is no carrier.

> Btw, adding a userspace workaround would be dangerous. If the routing daemon 
> crashes while a link is down, the userspace removed connected route would not 
> come back, leaving the router unreachable on this interface. *Very* bad.

I think this is a very weak excuse to move something into the kernel
that can exist in userspace. The routing daemon has to check the
interfaces on start-up eventually bringing up interfaces so all you have
to make sure is that it doesn't crash all the time and that you supervise
the process somehow.

> > The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it
> > used to be is exactly atomicty. Since you write to oper_state from
> > various locations the operation must be atomic to avoid corruption.
> 
> No, it's because the IFF_* flags are a bit field that can only be changed 
> from 
> process context protected by rtnl/dev_base_lock, different to 
> __LINK_STATE_NO_CARRIER (or a operational state field) that must be 
> accessible from interrupts.

I'm not sure about what exactly you're argueing. Using a char for the
operational state may imply bit operations on some architectures.
There goes your atomicity. dev->flags is certainly not protected by
dev_base_lock, I'm not even sure if the rtnl semaphore is acquired for
all write accesses to dev->flags.

> > > No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag.
> >
> > Right but the RFC specifies that admin. down implies operational down:
> 
> Indeed, but we should not set OPER_DOWN if and only if admin is down ;-)

So you mean we should not follow the RFC in this case? What operational
status would it be then? OPER_UNKNOWN?

> > This also describes why your scheme cannot work, we have to memorize
> > the status of for example carrier state.
> 
> Can't see that requirement from RFC, and devices are normally initialized in 
> their open()-"methods".

It's an SNMP RFC, it's not supposed to tell us how to manage interface
states in the kernel. The carrier state is event driven, which means
that when we don't memorize this information while we announce the
interface as down we'd have to ask the drivers about the carrier state
when we put it up again which is just non-sense.

I have a hard time seeing you first not following the RFC and then
relying on it like it would be the overall perfect solution.

> Ok, so I hope a finally managed to make my point clear about the different 
> dormant states ;-)

I understand your idea but I don't agree.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Stefan Rompf
Am Montag 07 November 2005 18:45 schrieb Thomas Graf:

> I'm sorry, I don't get the point. I assume the above is not a carrier
> failure but something else to be detected by a keepalive protocol. 

No, it's a carrier failure. Try this, eth1 is an ethernet interface without 
the cable connected:

dose:~ # ip addr add 192.168.200.1/24 dev eth1
dose:~ # ip link set eth1 up
dose:~ # ifconfig eth1 | fgrep UP
  UP BROADCAST MULTICAST  MTU:1500  Metric:1
dose:~ # ip route show 192.168.200.0/24
192.168.200.0/24 dev eth1  proto kernel  scope link  src 192.168.200.1
dose:~ # ping 192.168.200.10
PING 192.168.200.10 (192.168.200.10) 56(84) bytes of data.
>From 192.168.200.1: icmp_seq=2 Destination Host Unreachable
dose:~ # arp
192.168.200.10   (incomplete) eth1

Even though the interface is not IFF_RUNNING and queueing is therefore 
disabled, the kernel created a route pointing to it and uses it.

> Isn't it the routing daemon's fault when preferring a route 
> which has the IFF_RUNNING flag cleared? I'm sorry fot not getting it. ;-)

As the kernel maintains this connected route, userspace is IMHO not 
responsible. Quagga could add its routes with a higher metric (administrative 
distance in cisco terms), kernel should use them as soon as the connected 
route becomes unavailable due to carrier failure (or dormant for wireless 
interfaces).

And now if we support this, we'd need a dormant state with the route disabled 
and one with it enabled.

Btw, adding a userspace workaround would be dangerous. If the routing daemon 
crashes while a link is down, the userspace removed connected route would not 
come back, leaving the router unreachable on this interface. *Very* bad.

> > OTOH, we don't need to be completely atomic as the
> > netif_carrier_*-functions already require driver controlled
> > synchronisation. We just need to make sure that the caches are coherent
> > before linkwatch kernel thread runs.
>
> The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it
> used to be is exactly atomicty. Since you write to oper_state from
> various locations the operation must be atomic to avoid corruption.

No, it's because the IFF_* flags are a bit field that can only be changed from 
process context protected by rtnl/dev_base_lock, different to 
__LINK_STATE_NO_CARRIER (or a operational state field) that must be 
accessible from interrupts.

> > No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag.
>
> Right but the RFC specifies that admin. down implies operational down:

Indeed, but we should not set OPER_DOWN if and only if admin is down ;-)

> This also describes why your scheme cannot work, we have to memorize
> the status of for example carrier state.

Can't see that requirement from RFC, and devices are normally initialized in 
their open()-"methods".

> We're not far apart, the difference is for the additional L3 disabled
> dormant state which I don't understand yet and secondaly that I continue
> to keep the current link states with the addition of a new dormant state
> and then translate it into the RFC2863 operational status.

Ok, so I hope a finally managed to make my point clear about the different 
dormant states ;-)

Stefan
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Thomas Graf
* Krzysztof Halasa <[EMAIL PROTECTED]> 2005-11-07 18:04
> Thomas Graf <[EMAIL PROTECTED]> writes:
> >  void netif_carrier_on(struct net_device *dev)
> >  {
> > +   if ((dev->flags & IFF_WAIT) &&
> > +   test_bit(__LINK_STATE_NOCARRIER, &dev->state))
> > +   netif_dormant_on(dev);
> ^^
> Does not prevent spurious momentary IFF_RUNNING: netif_dormant_off()
> may come after it before "the session" is "established" as an effect
> of pre-carrier-flap activity (due to asynchronous notification) - so,
> in comparison to my patch, I don't see the gain. Not sure what locking
> (short of a single spinlock for all those flags) could solve that but
> I'm not a spinlock expert.

Well, firstly we cannot provide a perfect notification schema.
Take for example the carrier, when we reach netif_carrier_off(),
the carrier might in fact be up already. We then fire linkwatch
and while we wait for the workqueue to handle it, the carrier
state may theoretically change again. Events to userspace can
get lost due to various reasons, for example memory pressure.

What I actually want to say is that we probably cannot guarentee
an up-to-date informal of all parties, what we can do is avoid
is inconsistency. So considering the above case, netif_carrier_on()
has to be syncronized by the caller, this means the carrier state
may not change while we're inside the function. Secondly we know
that when the function is called, there must have been no carrier
for some time _if_ the carrier is currently set to off. This
means that the above code is only issued for legal carrier
resumes. The fact that we go to dormant state before changing the
carrier state makes dev_oper_state() behave correctly because in
case we actually call dev_oper_state() between netif_dormant_on()
and setting the carrier bit the interface will still be seen as
down and then switches to dormant in a single atomic operation.

> Protocol driver must still call netif_dormant_on() itself or IFF_RUNNING
> may be stuck for long.

The IFF_WAIT flag can only be set by root, it is in his reponsibly
to handle this correctly. It is intended for userspace although
it can of course be used by drivers or stackable interfaces as well.

> > if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state))
> > linkwatch_fire_event(dev);
> > -   if (netif_running(dev))
> > +
> > +   if (netif_running(dev) && !netif_dormant(dev));
> > __netdev_watchdog_up(dev);
> ^
> Could be called even if the above netif_dormant_on() set the status.
> Hope the protocol resets it when it gets 'carrier down' event.

Theoretically yes, it has the same "issues" as all other link states.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Krzysztof Halasa
Stefan Rompf <[EMAIL PROTECTED]> writes:

> But what happens when the link from (2) to (C) fails? The routing protocol 
> would realize it (actually it was me how has added IFF_RUNNING support to 
> quagga) and add a route to (C) via (3) on (B). However, the "connected"
> route 
> would still have precedence and the linux IP stack on (2) would happily try 
> to continue using the broken link to (C), even if we add juju magic to the 
> link layer to recognize and drop L3 packets.

Linux should just IMHO drop direct route when !IFF_RUNNING.

Another thing is if you have quagga (never used it, but I done things
like that with mrtd and even without IFF_RUNNING I think) you can
configure /32 mask on the managed interface and let quagga add dynamic
routes. That's what quagga and not the kernel is for - dynamic routing,
right?

> OPER_DORMANTL3UP is another beast. Dial on demand interfaces may be dormant, 
> but of course layer L3 must consider routes pointing to them, or they would 
> never dial.

I'm not entirely against L3DOWN/UP idea. Normally dial-up devices show
IFF_RUNNING and the internal state is not visible. I would rather do it
with another flag (and another patch, when they are users), signaling
that the device isn't yet connected even if it's IFF_RUNNING.

> OTOH, we don't need to be completely atomic as the netif_carrier_*-functions 
> already require driver controlled synchronisation.

We can't assume anything like this.

> We just need to make sure 
> that the caches are coherent before linkwatch kernel thread runs.

wmb() is for gcc and bus reordering etc, not needed here.

> Bad
> problem 
> here: To allow software/userspace controlled probe and association packets, 
> wireless devices must not go to netif_carrier_off() anymore when the 
> association is lost

I'm told it doesn't seem a problem not limitation at all, right?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread David S. Miller
From: Krzysztof Halasa <[EMAIL PROTECTED]>
Date: Mon, 07 Nov 2005 17:37:40 +0100

> David?

I'm still 20,000+ emails deep in backlog after my 2 week
vaction, so I won't be able to look into this for at least
a few more days, at best.

Also, please use "[EMAIL PROTECTED]" for emails to me.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Krzysztof Halasa
Stefan Rompf <[EMAIL PROTECTED]> writes:

> + /* Operation state */
> + unsigned char operstate_kernel;
> + unsigned char operstate_useroverride;

8 bits don't make sense. OTOH I think all writes and possibly reads would
need spinlocks.

> +void netif_set_kernel_operstate(struct net_device *dev, unsigned char 
> newstate) {
> + char oldstate = dev->operstate_kernel;
> + if (oldstate != newstate) {
> + dev->operstate_kernel = newstate;
> + wmb();
> + linkwatch_fire_event(dev);
> + }
> +}

You can't assume any serialization for this function as it could be
called from anywhere anytime - it would require a spinlock (even with
simple Ethernet).

OTOH I think we could just do that spinlock and forget about it, given
all read and write accesses are atomic (so we can safely read without
spinlock).


Even without "user override", I don't know how do you plan entering
the following device states:

+   OPER_UNKNOWN = 0,
+   OPER_DOWN = 16,
+   OPER_LOWERLAYERDOWN,
+   OPER_TESTING = 32,
+   OPER_DORMANTL3DOWN = 48, /* OS queue start from here */
+   OPER_DORMANTL3UP = 64, /* L3 should try using interface */
+   OPER_UP = 96

I understand OPEN_DOWN and OPEN_UP (despite their names) aren't used
for setting administrative status? What's their exact meaning (WRT
administrative status)?

I don't exactly understand relation between operstate_kernel and
__LINK_STATE_* flags. Do you plan removing them or not?
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Thomas Graf
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-07 17:41
> > Can't we put the driver, stacked interface, etc. in charge for dropping
> > packets which shouldn't be let through yet? I would favour this and have
> > the dormant state limited to link layer responsibilities.
> 
> Imagine this situation:
> 
>  +---+
>  | 1 |
>  +---+
>|+---+   +---+
>+| 2 |---| 3 |
> A   +---+ B +---+
>   |   | +---+
>   +---+-| 4 |
>   C +---+
> 
> 1,2,3,4 are acting as routers with the quagga daemon running. A,B,C are 
> networks.
> 
> Now (1) wants to reach (4). Normally, the packet would travel to (2) via (A), 
> from there to (4) via (C).
> 
> But what happens when the link from (2) to (C) fails? The routing protocol 
> would realize it (actually it was me how has added IFF_RUNNING support to 
> quagga) and add a route to (C) via (3) on (B). However, the "connected" route 
> would still have precedence and the linux IP stack on (2) would happily try 
> to continue using the broken link to (C), even if we add juju magic to the 
> link layer to recognize and drop L3 packets.
> 
> That's why I think L3 protocols should be responsible for not using a broken 
> link and why we need OPER_DORMANTL3DOWN.

I'm sorry, I don't get the point. I assume the above is not a carrier
failure but something else to be detected by a keepalive protocol. That
keepalive protocol could put the interface into dormant state and thus
control the IFF_RUNNING flag with the queues enabled so it can continue
sending. Isn't it the routing daemon's fault when preferring a route
which has the IFF_RUNNING flag cleared? I'm sorry fot not getting it. ;-)

> OPER_DORMANTL3UP is another beast. Dial on demand interfaces may be dormant, 
> but of course layer L3 must consider routes pointing to them, or they would 
> never dial.

This is clear, the packet must get to the interface in order to actually
trigger the dial.

> OTOH, we don't need to be completely atomic as the netif_carrier_*-functions 
> already require driver controlled synchronisation. We just need to make sure 
> that the caches are coherent before linkwatch kernel thread runs.

The reason we now have netif_carrier_* instead of |= IFF_RUNNING as it
used to be is exactly atomicty. Since you write to oper_state from
various locations the operation must be atomic to avoid corruption.

> > +#define IFF_LINK_UP0x2000  /* lower link up
> > */
> > +#define IFF_DORMANT0x4000  /* waiting for external event   
> > */
> > +#define IFF_WAIT   0x8000  /* auto. enter dormant state*/
> 
> > +int dev_oper_state(struct net_device *dev)
> > +{
> > +   if (!netif_device_present(dev))
> > +   return IF_OPER_NOTPRESENT;
> > +
> > +   if (!netif_running(dev))
> > +   return IF_OPER_DOWN;
> 
> No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag.

Right but the RFC specifies that admin. down implies operational down:

 If ifAdminStatus is down(2) then ifOperStatus
should be down(2). If ifAdminStatus is changed to up(1)
then ifOperStatus should change to up(1) if the interface is
ready to transmit and receive network traffic; it should
change to dormant(5) if the interface is waiting for
external actions (such as a serial line waiting for an
incoming connection); it should remain in the down(2) state
if and only if there is a fault that prevents it from going
to the up(1) state; it should remain in the notPresent(6)
state if the interface has missing (typically, hardware)
components."

This also describes why your scheme cannot work, we have to memorize
the status of for example carrier state.

> > +   if (!netif_carrier_ok(dev))
> > +   return IF_OPER_LOWERLAYERDOWN;
> 
> No, this is OPER_DOWN. LOWERLAYERDOWN is IMHO f.e. for VLANs stacked on a 
> physical interface that is OPER_DOWN.

Correct, I agree although I don't understand this. The physical lower
link is just like a the lower interface(s) of a stackable interface. I
can't seem to understand how the RFC proposes to retreive the carrier
state while the interface is administratively down.

> >  void netif_carrier_on(struct net_device *dev)
> >  {
> > +   if ((dev->flags & IFF_WAIT) &&
> > +   test_bit(__LINK_STATE_NOCARRIER, &dev->state))
> > +   netif_dormant_on(dev);
> 
> That would be for avoiding the userspace supplicant race I assume. Bad 
> problem 
> here: To allow software/userspace controlled probe and association packets, 
> wireless devices must not go to netif_carrier_off() anymore when the 
> association is lost or must have a way to transmit packets without using the 
> kernel scheduler ;-( May be wireless developers could live with this 
> limitation.

The wireless driver can just issue netif_dormant_on(). the automatic
dormant state when the carrier is on again is just for user

Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Krzysztof Halasa
Thomas Graf <[EMAIL PROTECTED]> writes:

> Here is how I see things, it's not complete and lacks proper locking but
> enough to get the point.

Much better IMHO, though my opinion may of course be subjective due to
similarity to my own patch.

>  void netif_carrier_on(struct net_device *dev)
>  {
> + if ((dev->flags & IFF_WAIT) &&
> + test_bit(__LINK_STATE_NOCARRIER, &dev->state))
> + netif_dormant_on(dev);
^^
Does not prevent spurious momentary IFF_RUNNING: netif_dormant_off()
may come after it before "the session" is "established" as an effect
of pre-carrier-flap activity (due to asynchronous notification) - so,
in comparison to my patch, I don't see the gain. Not sure what locking
(short of a single spinlock for all those flags) could solve that but
I'm not a spinlock expert.

Protocol driver must still call netif_dormant_on() itself or IFF_RUNNING
may be stuck for long.

>   if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state))
>   linkwatch_fire_event(dev);
> - if (netif_running(dev))
> +
> + if (netif_running(dev) && !netif_dormant(dev));
>   __netdev_watchdog_up(dev);
^
Could be called even if the above netif_dormant_on() set the status.
Hope the protocol resets it when it gets 'carrier down' event.
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Stefan Rompf
Am Montag 07 November 2005 14:50 schrieb Thomas Graf:

> > -migrate VLAN + BONDING to use OPER_LOWERLAYERDOWN
>
> I think that bonding should check on the administrative status of
> its slaves as well. Why not check for OPER_UP?

Yes, currently it seems to ignore admin status and does not even maintain its 
own operative status.

> I still don't see the point in OPER_DORMANTL3DOWN vs. OPER_DORMANTL3UP.

Seems my English is really bad ;-)

> Can't we put the driver, stacked interface, etc. in charge for dropping
> packets which shouldn't be let through yet? I would favour this and have
> the dormant state limited to link layer responsibilities.

Imagine this situation:

 +---+
 | 1 |
 +---+
   |+---+   +---+
   +| 2 |---| 3 |
A   +---+ B +---+
  |   | +---+
  +---+-| 4 |
  C +---+

1,2,3,4 are acting as routers with the quagga daemon running. A,B,C are 
networks.

Now (1) wants to reach (4). Normally, the packet would travel to (2) via (A), 
from there to (4) via (C).

But what happens when the link from (2) to (C) fails? The routing protocol 
would realize it (actually it was me how has added IFF_RUNNING support to 
quagga) and add a route to (C) via (3) on (B). However, the "connected" route 
would still have precedence and the linux IP stack on (2) would happily try 
to continue using the broken link to (C), even if we add juju magic to the 
link layer to recognize and drop L3 packets.

That's why I think L3 protocols should be responsible for not using a broken 
link and why we need OPER_DORMANTL3DOWN.

OPER_DORMANTL3UP is another beast. Dial on demand interfaces may be dormant, 
but of course layer L3 must consider routes pointing to them, or they would 
never dial.

I see this clearly as a kernel problem, we should not expect userspace to 
workaround this limitation. And it's even useful without a routing daemon if 
a host has more than one network interface.

> char access is not guaranteed to be atomic, just use int for the states.

OTOH, we don't need to be completely atomic as the netif_carrier_*-functions 
already require driver controlled synchronisation. We just need to make sure 
that the caches are coherent before linkwatch kernel thread runs.

> +#define IFF_LINK_UP  0x2000  /* lower link up*/
> +#define IFF_DORMANT  0x4000  /* waiting for external event   */
> +#define IFF_WAIT 0x8000  /* auto. enter dormant state*/

> +int dev_oper_state(struct net_device *dev)
> +{
> + if (!netif_device_present(dev))
> + return IF_OPER_NOTPRESENT;
> +
> + if (!netif_running(dev))
> + return IF_OPER_DOWN;

No, !netif_running() is ADMIN down, mostly representing the IFF_UP flag.

> + if (!netif_carrier_ok(dev))
> + return IF_OPER_LOWERLAYERDOWN;

No, this is OPER_DOWN. LOWERLAYERDOWN is IMHO f.e. for VLANs stacked on a 
physical interface that is OPER_DOWN.

>  void netif_carrier_on(struct net_device *dev)
>  {
> + if ((dev->flags & IFF_WAIT) &&
> + test_bit(__LINK_STATE_NOCARRIER, &dev->state))
> + netif_dormant_on(dev);

That would be for avoiding the userspace supplicant race I assume. Bad problem 
here: To allow software/userspace controlled probe and association packets, 
wireless devices must not go to netif_carrier_off() anymore when the 
association is lost or must have a way to transmit packets without using the 
kernel scheduler ;-( May be wireless developers could live with this 
limitation.

Your solution would work for Krzysztof and for WPA/802.1X, but with the 
limitation I've just mentioned. However, as it does not define a clear point 
for L3 protocols to start, it would not enhance support for routing daemons 
and dialup interfaces, what my idea can do. As I've already used linux boxes 
as routers in a redundant topology, I consider this important.

What to do now? I have the impression that most arguments are said. Maybe 
someone like acme or davem should chime in and decide which direction to 
take?

Stefan
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Krzysztof Halasa
Stefan Rompf <[EMAIL PROTECTED]> writes:

> Unfortunately, me and Krzysztof have totally different opinions if this 
> complexity is needed.

I wouldn't call it "unfortunate".
I'll happily use your interface if it's available and I assume
it will be available when it's needed, clean, robust and complete.

David? Jeff? Your opinion maybe? Not sure if you have my patch sent
a week ago, can resend.
-- 
Krzysztof Halasa
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Patch: RFC2863 #1 (incomplete)

2005-11-07 Thread Thomas Graf
* Stefan Rompf <[EMAIL PROTECTED]> 2005-11-07 00:42
> -userspace interaction, including userspace supplicant + locking of 
> operstate_useroverride
> -migrate VLAN + BONDING to use OPER_LOWERLAYERDOWN

I think that bonding should check on the administrative status of
its slaves as well. Why not check for OPER_UP?

> I'd leave making IPV4 not consider routes when interface is  for 2.6.16.

> Unfortunately, me and Krzysztof have totally different opinions if this 
> complexity is needed. I think yes, as it will make it possible to use a 
> userspace supplicant for connection completion (WPA), layer 3 protocols not 
> considering down interfaces for routing decisions (imporant for quagga), dial 
> on demand interfaces, clear state forwarding for stacked interfaces. This is 
> stuff required now, not in a distant future, and if we touch these flags 
> anyway, let's do it right.

I still don't see the point in OPER_DORMANTL3DOWN vs. OPER_DORMANTL3UP.
Can't we put the driver, stacked interface, etc. in charge for dropping
packets which shouldn't be let through yet? I would favour this and have
the dormant state limited to link layer responsibilities.

> This is not yet for applying, just compilation tested, comments welcome. Is 
> my 
> wmb() to avoid 4 byte atomic_t the right idea?

char access is not guaranteed to be atomic, just use int for the states.

Here is how I see things, it's not complete and lacks proper locking but
enough to get the point.

Index: linux-2.6/include/linux/if.h
===
--- linux-2.6.orig/include/linux/if.h
+++ linux-2.6/include/linux/if.h
@@ -42,8 +42,11 @@
 #define IFF_SLAVE  0x800   /* slave of a load balancer */
 
 #define IFF_MULTICAST  0x1000  /* Supports multicast   */
+#define IFF_LINK_UP0x2000  /* lower link up*/
+#define IFF_DORMANT0x4000  /* waiting for external event   */
+#define IFF_WAIT   0x8000  /* auto. enter dormant state*/
 
-#define IFF_VOLATILE   
(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_RUNNING)
+#define IFF_VOLATILE   
(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_MASTER|IFF_SLAVE|IFF_LINK_UP|IFF_DORMANT|IFF_RUNNING)
 
 #define IFF_PORTSEL0x2000  /* can set media type   */
 #define IFF_AUTOMEDIA  0x4000  /* auto media select active */
@@ -80,6 +83,16 @@
 #define IF_PROTO_FR_ETH_PVC 0x200B
 #define IF_PROTO_RAW0x200C  /* RAW Socket   */
 
+/* RFC 2863 operational status */
+enum {
+   IF_OPER_UNKNOWN,
+   IF_OPER_NOTPRESENT,
+   IF_OPER_DOWN,
+   IF_OPER_LOWERLAYERDOWN,
+   IF_OPER_TESTING,
+   IF_OPER_DORMANT,
+   IF_OPER_UP,
+};
 
 /*
  * Device mapping structure. I'd just gone off and designed a 
Index: linux-2.6/include/linux/netdevice.h
===
--- linux-2.6.orig/include/linux/netdevice.h
+++ linux-2.6/include/linux/netdevice.h
@@ -230,7 +230,8 @@ enum netdev_state_t
__LINK_STATE_SCHED,
__LINK_STATE_NOCARRIER,
__LINK_STATE_RX_SCHED,
-   __LINK_STATE_LINKWATCH_PENDING
+   __LINK_STATE_LINKWATCH_PENDING,
+   __LINK_STATE_DORMANT,
 };
 
 
@@ -751,6 +752,23 @@ static inline void netif_device_attach(s
}
 }
 
+static inline void netif_dormant_on(struct net_device *dev)
+{
+   if (!test_and_set_bit(__LINK_STATE_DORMANT, &dev->state))
+   linkwatch_fire_event(dev);
+}
+
+static inline void netif_dormant_off(struct net_device *dev)
+{
+   if (test_and_clear_bit(__LINK_STATE_DORMANT, &dev->state))
+   linkwatch_fire_event(dev);
+}
+
+static inline int netif_dormant(struct net_device *dev)
+{
+   return test_bit(__LINK_STATE_DORMANT, &dev->state);
+}
+
 /*
  * Network interface message level settings
  */
@@ -926,6 +944,7 @@ extern void dev_mcast_init(void);
 extern int netdev_max_backlog;
 extern int weight_p;
 extern int netdev_set_master(struct net_device *dev, struct 
net_device *master);
+extern int dev_oper_state(struct net_device *dev);
 extern int skb_checksum_help(struct sk_buff *skb, int inward);
 /* rx skb timestamps */
 extern voidnet_enable_timestamp(void);
Index: linux-2.6/net/core/dev.c
===
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -994,6 +994,23 @@ int call_netdevice_notifiers(unsigned lo
return notifier_call_chain(&netdev_chain, val, v);
 }
 
+int dev_oper_state(struct net_device *dev)
+{
+   if (!netif_device_present(dev))
+   return IF_OPER_NOTPRESENT;
+
+   if (!netif_running(dev))
+   return IF_OPER_DOWN;
+
+   if (!netif_carrier_ok(dev))
+   return IF_OPER_LOWERLAYERDOWN;
+
+   if (netif_dormant(dev))
+   return IF_OPER_DO

Patch: RFC2863 #1 (incomplete)

2005-11-06 Thread Stefan Rompf
Hi,

attached is my current state of RFC2863 implementation. It defines the new 
status fields and operations. Yet not implemented but in my 2.6.15 time frame 
is

-userspace interaction, including userspace supplicant + locking of 
operstate_useroverride
-migrate VLAN + BONDING to use OPER_LOWERLAYERDOWN
-Documentation ;-)

I'd leave making IPV4 not consider routes when interface is --- linux-2.6.14/include/linux/netdevice.h.old	2005-11-02 11:08:10.0 +0100
+++ linux-2.6.14/include/linux/netdevice.h	2005-11-07 00:23:12.0 +0100
@@ -228,11 +228,35 @@ enum netdev_state_t
 	__LINK_STATE_START,
 	__LINK_STATE_PRESENT,
 	__LINK_STATE_SCHED,
-	__LINK_STATE_NOCARRIER,
+	__OBSOLETE_LINK_STATE_NOCARRIER,
 	__LINK_STATE_RX_SCHED,
 	__LINK_STATE_LINKWATCH_PENDING
 };
 
+/* Operational state of a network device, mostly taken from RFC2863 */
+enum netdev_operstate_t
+{
+	OPER_UNKNOWN = 0,
+	OPER_NOTPRESENT, /* unused, placeholder */
+	OPER_DOWN = 16,
+	OPER_LOWERLAYERDOWN,
+	OPER_TESTING = 32,
+	OPER_DORMANTL3DOWN = 48, /* OS queue start from here */
+	OPER_DORMANTL3UP = 64, /* L3 should try using interface */
+	OPER_UP = 96
+};
+
+/* Userspace application may limit visible external state to
+ * DORMANTL3DOWN or DORMANTL3UP */
+enum netdev_operoverride_t {
+	OPERU_IGNORE, /* report state as defined by operstate_kernel */
+	OPERU_DORMANTL3DOWN, /* never report >= DORMANTL3DOWN */
+	OPERU_DORMANTL3UP, /* never report >= DORMANTL3UP */
+	OPERU_UP, /* report kernel state up to UP */
+	OPERU_UP_UNTIL_DORMANTL3DOWN, /* change user override to OPERU_DORMANTL3DOWN on kernel <= OPER_DORMANTL3UP */
+	OPERU_UP_UNTIL_DORMANTL3UP /* change user override to OPERU_DORMANTL3UP on kernel <= OPER_DORMANTL3UP */
+};
+
 
 /*
  * This structure holds at boot time configured netdevice settings. They
@@ -346,7 +370,10 @@ struct net_device
 	struct net_device	*master; /* Pointer to master device of a group,
 	  * which this device is member of.
 	  */
-
+	/* Operation state */
+	unsigned char operstate_kernel;
+	unsigned char operstate_useroverride;
+	
 	/* Interface address info. */
 	unsigned char		perm_addr[MAX_ADDR_LEN]; /* permanent hw address */
 	unsigned char		addr_len;	/* hardware address length	*/
@@ -709,23 +736,36 @@ static inline void dev_put(struct net_de
 #define __dev_put(dev) atomic_dec(&(dev)->refcnt)
 #define dev_hold(dev) atomic_inc(&(dev)->refcnt)
 
-/* Carrier loss detection, dial on demand. The functions netif_carrier_on
- * and _off may be called from IRQ context, but it is caller
+/* Carrier loss detection, dial on demand. The function
+ * netif_set_kernel_operstate may be called from IRQ context, but it is caller
  * who is responsible for serialization of these calls.
  */
 
 extern void linkwatch_fire_event(struct net_device *dev);
 
-static inline int netif_carrier_ok(const struct net_device *dev)
-{
-	return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
-}
-
 extern void __netdev_watchdog_up(struct net_device *dev);
 
-extern void netif_carrier_on(struct net_device *dev);
+extern void netif_set_kernel_operstate(struct net_device *dev, unsigned char newstate);
+
+static inline unsigned char netif_get_kernel_operstate(const struct net_device *dev) {
+	return dev->operstate_kernel;
+}
+
+/* Consolidated state. Should be called from process context */
+extern unsigned char netif_get_operstate(const struct net_device *dev);
 
-extern void netif_carrier_off(struct net_device *dev);
+/* Backward compatibility functions, use netif_set_kernel_operstate
+   instead */
+static inline void netif_carrier_on(struct net_device *dev) {
+	netif_set_kernel_operstate(dev, OPER_UP);
+}
+static inline void netif_carrier_off(struct net_device *dev) {
+	netif_set_kernel_operstate(dev, OPER_DOWN);
+}
+static inline int netif_carrier_ok(const struct net_device *dev) {
+	return (dev->operstate_kernel >= OPER_DORMANTL3UP ||
+		dev->operstate_kernel == OPER_UNKNOWN);
+}
 
 /* Hot-plugging. */
 static inline int netif_device_present(struct net_device *dev)
--- linux-2.6.14/net/core/link_watch.c.old	2005-06-17 21:48:29.0 +0200
+++ linux-2.6.14/net/core/link_watch.c	2005-11-06 18:06:14.0 +0100
@@ -75,7 +75,14 @@ void linkwatch_run_queue(void)
 		clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
 
 		if (dev->flags & IFF_UP) {
-			if (netif_carrier_ok(dev)) {
+			if (netif_get_kernel_operstate(dev) < OPER_UP) {
+if (dev->operstate_useroverride == OPERU_UP_UNTIL_DORMANTL3DOWN)
+	dev->operstate_useroverride = OPERU_DORMANTL3DOWN;
+else if (dev->operstate_useroverride == OPERU_UP_UNTIL_DORMANTL3UP)
+	dev->operstate_useroverride = OPERU_DORMANTL3UP;
+			}
+
+			if (netif_get_operstate(dev) >= OPER_DORMANTL3DOWN) {
 WARN_ON(dev->qdisc_sleeping == &noop_qdisc);
 dev_activate(dev);
 			} else
@@ -141,4 +148,32 @@ void linkwatch_fire_event(struct net_dev
 	}
 }
 
-EXPORT_SYMBOL(linkwatch_fire_event);
+void netif_set_kernel_operstate(struct net_device *dev, unsigned char newstate) {