Hello James, sorry for the long graphomaniac post again.

It seems that some of my opinions are often different from yours, and the
foundations seem weak. Maybe one of us can make the other one change the 
point of view to a more correct one ;)

> In any event, I think we're way off the path. It's recognized as a
> bug (so no more argument is really needed) and it's something that we
> know we want to work on.

I'm afraid we (I) can go on and on about this. Maybe we should continue 
if my ideas, opinions and examples are of any use. But it takes time from
both of us, so I'd rather stop, too :)

Besides,
> You're always welcome and encouraged to contribute code instead of complaint. 
> ;-}
So far I'm limited to ideas. But the idea of becoming a respected kernel hacker
like thou is attractive, I'll give it some thought and time when I straighten
out the ideas and come to terms ;)

>> However this matching interface is not necessarily used to push the packet
>> onto the network. This kinda makes sense in ipmp, lacp and other cases of
>> many-to-many interface-to-address relations, as well as for virtual
>> interfaces (i.e. a public IP hosted on the loopback); but this seems flawed
>> for one-to-one relations of an [aliased] interface and an IP address.

> I don't follow that part.

What I meant is: the current algorithm only uses the destination address when
choosing a route, and doesn't use the source address to pick the gateway and
the outgoing interface. Good or bad, it's a fact. It's not the point I'm
arguing about.

This is okay in many scenarios (I have not seen it cause a problem in 10 years),
and it may even be desired in scenarios where the source IP address is not bound
to exactly one physical interface (that is, to none or to more than one). It
happens to be a problem in cases like mine, where not all default routes are
created equal, and the choice of the gateway and of the physical interface
depends on the source IP address (the problem is indeed valid for both a
"mere host" and a "router" in the terms above).

> it occurs in the same way that routing lookups occur -- by looking for the 
> longest prefix match.
> ...local addresses configured on the interfaces are represented as /32 (host 
> route) entries in the forwarding table. Thus, if they match at all, they're 
> always the best match possible (longest prefix length).
> ... In fact, interface subnets are represented in the routing 
> table with specially marked routing table entries...

Correct :-}
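
To make sure we are talking about the same thing, here is a minimal sketch of
that lookup as I understand it. It is only an illustration, not the Solaris
kernel code: the table contents, the documentation-range addresses and the
interface names (if0, if1, lo0) are all made up for the example.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, gateway, interface).
# A gateway of None stands for a local or directly connected entry.
ROUTES = [
    ("0.0.0.0/0", "192.0.2.1", "if0"),        # default route via first ISP
    ("0.0.0.0/0", "198.51.100.1", "if1"),     # default route via second ISP
    ("192.0.2.0/24", None, "if0"),            # interface subnet entry
    ("198.51.100.0/24", None, "if1"),         # interface subnet entry
    ("192.0.2.10/32", None, "lo0"),           # local address as a /32 host route
]


def lookup(dst):
    """Return every entry that shares the longest matching prefix for dst.

    Only the destination address is consulted, which mirrors the
    behaviour discussed in this thread.
    """
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, gateway, ifname in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, gateway, ifname))
    if not matches:
        return []
    best = max(net.prefixlen for net, _, _ in matches)
    return [(str(net), gw, ifname)
            for net, gw, ifname in matches
            if net.prefixlen == best]
```

A local address hits its /32 entry and nothing can beat it; a remote
destination returns both default routes as equally good matches, and that tie
is exactly where the trouble starts.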

If the packet's destination IP field is matched as being in some interface's
connected subnet, the routing table lookup returns a "specially marked entry"
and the packet goes out *from the correct interface* (I hope I'm not mistaken
in the last part) packed into an L2 (ethernet) frame with a MAC address found
by the ARP table lookup of the destination host's IP address. Dammit, I even
think this piece works correctly when the packet is sent to a previously
chosen gateway for a remote destination.

I kind of expected "route add ... -ifp" to be the flag I need, however it only 
seems to enforce sending of packets to a certain gateway through a certain 
interface (it's not the problem I had). Again, it does so with no relation to 
the source addresses.

# netstat -rn | grep default
Routing Table: IPv4
  Destination           Gateway           Flags  Ref     Use     Interface 
-------------------- -------------------- ----- ----- ---------- --------- 
default              194.67.186.65        UG        1        817 e1000g0   
default              81.5.113.1           UGS       1       1142 e1000g81000 

> The missing part (and the part we've been belaboring here) is that
> Solaris doesn't make any intelligent use of multiple equivalent
> matches. If there are equal matches on the destination address, we
> just pick one arbitrarily rather than using the source address to add
> some "flavor."
> That's arguably a bug, and is at the root of what you're seeing.

The choice among equivalent gateways (and thus output interfaces) is what needs
to be enhanced, in my opinion. And from your posts I see that the need for some
enhancement in this area is also something evident/accepted, and does not need 
to be argued about either :}

> (Of course, there are other scenarios that would be harder to deal with. 

Scenarios, variations and brainstorming for possible/best solutions is another 
interesting direction, though maybe it shouldn't be done in this very thread. 
But am I wrong to think that OpenSolaris "networking-discuss" is the wrong 
forum to toss and kick these ideas around? :)

> For instance, if a better route [longer prefix] matches and
> puts the packet on the "wrong" interface given the source address,
> what do you do? I would argue that this is exactly what's supposed to
> happen based on the kernel forwarding table, and if you want a
> different result, then you need something like finer-grained routes,
> or even something exotic like source-based routing.)

It depends. This may be redirected traffic for transparent proxy situations.
Or a route to some hidden NAT/IDS/LB system - and the admin wants the 
traffic to get there regardless of the "non-matching" interface address.

For an L3 router there would be no "matching" interface anyway.

I think clever use of "route -ifp" can solve the issue of picking an interface
for the gateway for a particular admin's situation. At least in the current
routing model, where the destination alone determines the route selection ;)

> For what it's worth, I think of this in completely different terms.
> "Routers" are just hosts that happen to be configured to forward.

I don't think so. The "routers" are specifically configured for their 
job. More attention is paid to networking configuration of "routers", 
and when they break - it's everybody's problem. Unlike "hosts" which 
are jumpstarted in dozens or thousands all with the same template, 
and possibly use routing setup received via DHCP. It is inefficient 
to pay as much attention (time -> salary) to configuration of each 
host in the legion (and due to this simplicity they should best work 
with default or easily assignable settings).

Configuration to forward is just one aspect of a router. They may 
also have more weird setups of ipfilter (NAT, redirections, as well 
as simply firewalls), strange routing algorithms and routing software 
(in.routed, quagga/zebra, gated are just a few), they can implement 
VPN software endpoints, etc.

> The fact that this issue happens to affect this one host that you're
> working with makes it visible to you as a host-related problem, but I
> don't think that makes it less interesting for routers.

> The same issue is visible with L3 routers, because the very same 
> decision process is going on internally -- if you have a choice of 
> interfaces to send on, picking one with a subnet match for the source
> address of the packet would be a good idea, if such a choice is possible.

True, my complaint so far involved the "host" scenario where the gateway 
selection based on the packet's source address can be easily implemented by 
comparing this source address with the system's currently configured 
addresses, and picking the output interface (and gateway) as a result.
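
Roughly what I have in mind for the "host" case, as a sketch only. The
interface names, addresses and the fallback to the first candidate are my
assumptions for the example; this is not how any existing Solaris code works.

```python
import ipaddress

# Hypothetical configured (and UP) interfaces: name -> address/prefix.
INTERFACES = {
    "if0": ipaddress.ip_interface("192.0.2.10/24"),
    "if1": ipaddress.ip_interface("198.51.100.10/24"),
}

# Equally good candidates from the destination lookup: (gateway, interface).
DEFAULTS = [("192.0.2.1", "if0"), ("198.51.100.1", "if1")]


def pick_route(src, candidates, interfaces=INTERFACES):
    """Break a tie between equal destination matches using the source address."""
    if not candidates:
        return None
    src_addr = ipaddress.ip_address(src)
    # First preference: the interface that owns the source address itself.
    for gateway, ifname in candidates:
        iface = interfaces.get(ifname)
        if iface is not None and iface.ip == src_addr:
            return gateway, ifname
    # Second preference: an interface whose subnet contains the source.
    for gateway, ifname in candidates:
        iface = interfaces.get(ifname)
        if iface is not None and src_addr in iface.network:
            return gateway, ifname
    # No hint from the source address: keep today's behaviour (first entry).
    return candidates[0]
```

With these made-up tables, a packet sourced from 198.51.100.10 leaves via
198.51.100.1 on if1, one sourced from 192.0.2.10 leaves via 192.0.2.1 on if0,
and anything else gets the first candidate, just as it would now.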

However on a "router" one can arguably afford going through the trouble 
of devising, testing and maintaining more exotic configurations. Unlike 
a "host" referencing its configured (and UP'ed) interfaces, you'd have 
to store subnet-specific configuration somewhere on a "router", at the
very least.

Let me elaborate on an example, so that, hopefully, I can explain my 
idea of real-life differences between a "mere host" and a "router".

As one pointer which I haven't yet thoroughly checked myself, the 
quagga/zebra routing software suite seems to support the syntax of 
Cisco PBR (policy-based routing). It is mentioned in the project's 
documentation, but I haven't used that under Solaris so I don't know 
if this feature is actually implemented and backed by the OS kernel
(we use it successfully on one of our Cisco 3750s, though with serious 
usage limitations imposed by the vendor in the firmware). 

This feature allows matching IP ACLs against source addresses and taking
some actions as a result - set a next-hop gateway, set an output
interface, modify the QOS/TOS bits, whatever in general (limited
by firmware and/or hardware in practice).
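
In pseudo-real terms, the match-then-act idea looks something like the sketch
below. It is neither Cisco nor quagga syntax; the rule list, the action names
(next_hop, tos) and the first-match-wins ordering are my own assumptions for
illustration.

```python
import ipaddress

# Hypothetical policy: ordered list of (source ACL, actions to apply).
POLICY = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"next_hop": "192.0.2.1"}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"next_hop": "198.51.100.1", "tos": 0x10}),
]


def apply_policy(src, policy=POLICY):
    """Return the actions of the first rule whose ACL matches the source.

    None means no rule matched, so the caller should fall back to the
    ordinary destination-based routing lookup.
    """
    addr = ipaddress.ip_address(src)
    for acl, actions in policy:
        if addr in acl:
            return dict(actions)  # copy, so callers cannot alter the policy
    return None
```

The point is that such a rule list has to be stored and maintained somewhere,
which is tolerable on one router and painful on a thousand hosts.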

On one hand, zebra.conf could be quite an acceptable place to store 
all these subnet-specific configurations to pick the "more correct" 
next-hop gateways and/or interfaces. On the other hand, I'd hate to 
maintain thousands of copies of this file on each and every host,
and change them all whenever some specific subnet is added. This
should all be taken care of centrally, on one router (or a small
manageable amount of them in failover, etc).

> They're otherwise identical in terms of IP behavior, and that 
> identical behavior is a good thing: it means that IP's algorithms
> work the same everywhere. In a datagram network, special == bad.

When you put it this way, I find it hard to come up with counterarguments.
But something remains "fishy". I hope my arguments above uncover some
of that ;)

Nonetheless, I hold on to my opinion that due to any number of 
causes not limited to poor planning and lack of unlimited resources, 
there happen to be special cases out there, even in networking.
It is a heterogeneous world, for better or worse.

When this happens, these special cases are best processed at some
centralized point (even if at the price of non-default configuration
and utilizing less common software stacks).

Let's call this centralized weird-configurations point a "router" 
and contrast it with standardized rarely-changing simple-configurations 
"hosts". And keep in mind that "hosts" can be unmodifiable in their
nature (i.e. you just can't influence the TCP/IP stack of a smartphone,
but you may still have to provide it with multi-ISP connectivity with
your Solaris router and a wi-fi access point).

> At a base level, I just reject the distinction between "host" and
> "router." It's never been terribly useful, and has actually
> contributed a lot of harm to the Internet. (For instance, ICMP
> redirects and router discovery, two completely horrible mechanisms
> that ape routing protocols poorly, are based on this false
> distinction.)

I wouldn't be so hard on them. I have working examples where ICMP
redirects help keep the end-hosts configuration simple (one default
router and the local IP+subnet are the only predefined entries), 
while several routers on the same subnet are actually used to 
connect to different special routes. That is, these connections
don't pass "through" the default router all the time; it refers 
the sending hosts to the correct specific local routers.
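
A toy model of the decision the default router makes in that setup, as I
understand it. The LAN prefix, the special routes and the router addresses are
invented for the example, and real redirect rules have more conditions than
this.

```python
import ipaddress

# Hypothetical shared LAN where the hosts and all the routers live.
LAN = ipaddress.ip_network("192.0.2.0/24")

# Hypothetical special routes known to the default router:
# destination prefix -> the local router that actually serves it.
SPECIAL_ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "192.0.2.2",
    ipaddress.ip_network("10.2.0.0/16"): "192.0.2.3",
}


def redirect_for(src, dst):
    """Return the better first-hop router to suggest to src, or None."""
    src_addr = ipaddress.ip_address(src)
    dst_addr = ipaddress.ip_address(dst)
    if src_addr not in LAN:
        # The sender cannot reach the other local routers directly,
        # so pointing it at one of them would be useless.
        return None
    for prefix, router in SPECIAL_ROUTES.items():
        if dst_addr in prefix and ipaddress.ip_address(router) in LAN:
            return router
    # Nothing better than the default router itself: just forward.
    return None
```

The hosts keep a single default route, and only the default router needs to
know this table.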

Arguably, local RIPv2 (or OSPF, whatever) would be better, but:
1) not all routers (including wi-fi access points) can send/process 
   RIP messages;
2) not all host devices' OSes have RIPv2 clients at all. None have
   it enabled by default. Enabling them won't solve the problems 
   for all hosts, but would be a large undertaking in itself.

Apparently, maintaining static routes on each host (if possible at
all) is even less practical. It is not their job.

> The lines are much blurrier than I think you're suggesting.

While this discussion is limited to Solaris software, I'd like to
point out that "routers" often differ from "hosts" not only in their
software (tasks, tunings), but also in hardware which is better 
suited for certain networking tasks (and may be less suited for
general processing). And when the difference is some quirk of a
general-purpose CPU architecture (like longer processing queues
and arguably inefficient processing of "quick short" interrupts)
this becomes important for Solaris-related discussion.

One can dedicate a server better suited for networking to be a
router, and use other servers optimized for heavy number-crunching 
to heat the planet. And they would all run the same build of Sun
Solaris - the favorite OS ;)

And with their different roles they may still have to handle 
similar-looking tasks very differently.

//Jim
-- 
This message posted from opensolaris.org