Jim Klimov writes:
> As the morning comes, some of my thoughts got straight(er).
>
> >> My understanding is that in order for L2 ethernet frames with payload
> >> to go between two hosts, their L3 IP "addresses" must be in the same
> >> L3 "subnet" as defined by network address bits and the mask size.
> > That doesn't sound quite right.
>
> Well, I do think now that I phrased it clumsily ;(
>
> It referred to the second case of the lookup algorithm you explained.
> Specifically the part where "destination address matches an IP subnet
> configured on some interface but is not a local address". This
> "matching" is the result of binary ANDing of the two host addresses
> with a subnet mask and comparing the resulting "network address"
> values. If they are the same, the two hosts are "on the same subnet".
Correct.

> The culprit I think is that "matching" in all the cases you described
> occurs with local IP addresses configured on *some* (any) local
> interface.

More precisely, it occurs in the same way that routing lookups occur --
by looking for the longest prefix match. In fact, interface subnets are
represented in the routing table with specially marked routing table
entries. It's all one machine, though the text descriptions break these
things down to make it a little easier to understand.

> In this sense the host's local addresses are somewhat involved in the
> process, but with no regard to one of them being the L3 packet's
> source address (as you reminded so many times ;) ).

The "somewhat involved" part is that the local addresses configured on
the interfaces are represented as /32 (host route) entries in the
forwarding table. Thus, if they match at all, they're always the best
match possible (longest prefix length).

> However this matching interface is not necessarily used to push the
> packet onto the network. This kinda makes sense in ipmp, lacp and
> other cases of many-to-many interface-to-address relations, as well
> as for virtual interfaces (i.e. a public IP hosted on the loopback);
> but this seems flawed for one-to-one relations of an [aliased]
> interface and an IP address.

I don't follow that part. The matching of the route (based on
destination IP address alone) specifies the output -- period. That's
how it works. There's no other magic going on for normal IP
transmission.

IPMP does throw a twist into it, but it's still understandable in the
above scheme: the group is treated as though it were one interface as
far as IP is concerned, and the outputs are fungible. ECMP (not
implemented on Solaris yet) throws in another twist, but still
understandable: you can have multiple routes that match equally well
(same subnet mask / prefix length), and you select among them using a
flow-based hash.
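To make the lookup described above concrete, here is a minimal Python
sketch of a longest-prefix-match table in which local addresses appear
as /32 host routes, plus an ECMP-style choice among equally good
matches. It is an illustration only, not Solaris code: the table
contents, the interface names (bge0, bge1, bge2, lo0), the function
names, and the use of CRC32 as the flow hash are all assumptions made
up for the example.

```python
import ipaddress
import zlib

# Hypothetical forwarding table: (prefix, output interface).
# All prefixes and interface names are invented for illustration.
TABLE = [
    ("0.0.0.0/0", "bge0"),        # default route
    ("192.168.1.0/24", "bge0"),   # interface subnet entry
    ("192.168.1.10/32", "lo0"),   # local address, held as a /32 host route
    ("10.0.0.0/8", "bge1"),       # two routes with the same prefix length,
    ("10.0.0.0/8", "bge2"),       # i.e. equally good matches
]


def best_matches(dst, table=TABLE):
    """Return the outputs of all entries tied for the longest matching prefix.

    Only the destination address takes part in the match.
    """
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, ifname in table:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, ifname))
    if not matches:
        return []
    longest = max(plen for plen, _ in matches)
    return [ifname for plen, ifname in matches if plen == longest]


def select_output(src, dst, table=TABLE):
    """Pick one output among equal matches with a simple flow hash.

    Hashing (src, dst) keeps a given flow on one output. CRC32 is an
    arbitrary stand-in for whatever hash a real stack would use.
    """
    candidates = best_matches(dst, table)
    if not candidates:
        return None
    key = f"{src}>{dst}".encode()
    return candidates[zlib.crc32(key) % len(candidates)]
```

With this table, best_matches("192.168.1.10") yields only lo0 because
the /32 entry beats both the /24 and the default route, while
best_matches("10.1.2.3") yields both bge1 and bge2, which is the tie
that a flow hash (or some other policy) has to break.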
The missing part (and the part we've been belaboring here) is that
Solaris doesn't make any intelligent use of multiple equivalent
matches. If there are equal matches on the destination address, we just
pick one arbitrarily rather than using the source address to add some
"flavor." That's arguably a bug, and is at the root of what you're
seeing.

(Of course, there are other scenarios that would be harder to deal
with. For instance, if a better route [longer prefix] matches and puts
the packet on the "wrong" interface given the source address, what do
you do? I would argue that this is exactly what's supposed to happen
based on the kernel forwarding table, and if you want a different
result, then you need something like finer-grained routes, or even
something exotic like source-based routing.)

> >> Yes - but only for an L3 router device ;)
> > The differences between an L3 host and an L3 router are vanishingly
> > small in real life. Plus, Solaris *is* a fully functional L3 routing
> > system.
>
> In fact, the differences exist (or I get them differently than you do,
> again ;} ).
>
> These differences are implemented in Solaris as interfaces ifconfig'ed
> with or without a ROUTER flag, as ndd keywords like ip_forwarding and
> so on.

Those two are actually the same thing -- they're tied together
internally. They enable IP forwarding for a given interface. (The ndd
interface is legacy and should mostly be going away. The ROUTER flag is
the "real" one.)

> So while Solaris is indeed an L3 OS, it does behave differently when
> it's configured as a mere host or as a router/forwarder. And that's a
> good thing (availability of choice as well as configuration methods).
>
> Semantically, an L3 host is a host with an L3 (IP) address assigned.
> This host is only responsible for processing IP addresses assigned to
> some of its interfaces.
The L3 "host" still must deal with any configured interfaces it has
(with whatever subnet masks are set) and any routes that have been
installed on it by any means (static or from a routing protocol of some
sort). It's in that part that the output interface choice based on
destination address shows up, and it's actually not different from how
a router does the same job. The lines are much blurrier than I think
you're suggesting.

> An L3 router is a host also responsible for processing IP addresses
> not assigned to one of its interfaces (i.e. forwarding other hosts'
> packets between its physical or aliased interfaces).

True. For what it's worth, I think of this in completely different
terms. "Routers" are just hosts that happen to be configured to
forward. They're otherwise identical in terms of IP behavior, and that
identical behavior is a good thing: it means that IP's algorithms work
the same everywhere. In a datagram network, special == bad.

> I think most of this discussion with the idea that the source IP
> address should be used in selection of a source L2 interface and the
> intermediate gateway (from the same IP subnet as the source IP
> address) is most relevant in the "L3 host" usecase.

I fundamentally disagree with that. The same issue is visible with L3
routers, because the very same decision process is going on internally
-- if you have a choice of interfaces to send on, picking one with a
subnet match for the source address of the packet would be a good idea,
if such a choice is possible.

The fact that this issue happens to affect this one host that you're
working with makes it visible to you as a host-related problem, but I
don't think that makes it less interesting for routers.

At a base level, I just reject the distinction between "host" and
"router." It's never been terribly useful, and has actually contributed
a lot of harm to the Internet.
(For instance, ICMP redirects and router discovery, two completely
horrible mechanisms that ape routing protocols poorly, are based on
this false distinction.)

> Perhaps this "src interface" addition to the address lookup algorithm
> should be enabled by another ndd keyword (dladm keyword, whatever).
> After all the QA cycles this keyword may become enabled by default :)

I'd just implement it and be done with it. If anyone needs randomness,
we can go in later and add a flag to enable random (or at least
unpredictable) output selection. But adding yet another flag to a
confusing mix as a preemptive move doesn't look like a good idea.

> It is my belief that an L3 host as a more "mass product" should work
> out of the box with minimal configuration for the most common
> scenarios, at least by default ;)

Yep. Which is why I'm opposed to having a flag for this. The bug we're
talking about should just be fixed.

> > Nobody at Sun thinks it's perfect, either. ... That's being fixed now.
> May I put these on a wall someday? Since some CR's are dated in the
> 1990's, this quote should hang someplace near Mozilla's
> "three-year-old new bug" ;}

Yes, the bug fixing process can indeed be slow, depending on the nature
of the bug. You're always welcome and encouraged to contribute code
instead of complaint. ;-}

--
James Carlson 42.703N 71.076W <[email protected]>
_______________________________________________
networking-discuss mailing list
[email protected]
