Jim Klimov writes:
> As the morning comes, some of my thoughts got straight(er).
> 
> >> My understanding is that in order for L2 ethernet frames with payload to go
> >> between two hosts, their L3 IP "addresses" must be in the same L3 "subnet"
> >> as defined by network address bits and the mask size.
> > That doesn't sound quite right.
> 
> Well, I do think now that I phrased it clumsily ;(
> 
> It referred to the second case of the lookup algorithm you explained.
> Specifically the part where "destination address matches an IP subnet
> configured on some interface but is not a local address". This "matching" is
> the result of binary ANDing of the two host addresses with a subnet mask and
> comparing the resulting "network address" values. If they are the same, the
> two hosts are "on the same subnet".

Correct.
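
To make the masking step concrete, here is a minimal sketch in Python.
It is only an illustration of the AND-and-compare idea described above,
not code from any real IP stack; the function name and the example
addresses are made up.

```python
import ipaddress

def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both IPv4 addresses yield the same network address."""
    # Build the 32-bit mask from the prefix length (e.g. 24 -> 255.255.255.0).
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    # AND each address with the mask and compare the resulting network parts.
    return (a & mask) == (b & mask)

print(same_subnet("192.168.1.10", "192.168.1.200", 24))  # True
print(same_subnet("192.168.1.10", "192.168.2.10", 24))   # False
```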

> The culprit I think is that "matching" in all the cases you described occurs
> with local IP addresses configured on *some* (any) local interface.

More precisely, it occurs in the same way that routing lookups occur
-- by looking for the longest prefix match.  In fact, interface
subnets are represented in the routing table with specially marked
routing table entries.  It's all one machine, though the text
descriptions break these things down to make it a little easier to
understand.

> In this sense the host's local addresses are somewhat involved in the
> process, but with no regard to one of them being the L3 packet's source
> address (as you reminded so many times ;) ).

The "somewhat involved" part is that the local addresses configured on
the interfaces are represented as /32 (host route) entries in the
forwarding table.  Thus, if they match at all, they're always the best
match possible (longest prefix length).
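
As a rough illustration of why a /32 entry always wins, here is a small
longest-prefix-match sketch in Python.  The table layout, the interface
name "net0" and the output labels are assumptions made up for the
example; a real forwarding table is a far more elaborate structure.

```python
import ipaddress

def longest_prefix_match(table, dst):
    """Return the (network, output) entry with the longest matching prefix."""
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, output in table:
        net = ipaddress.ip_network(prefix)
        # Keep the entry only if it matches and is more specific than the
        # best one found so far.
        if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, output)
    return best

# Hypothetical table: a default route, an interface subnet, and the
# local address represented as a /32 host route.
TABLE = [
    ("0.0.0.0/0", "gateway via net0"),
    ("192.168.1.0/24", "on-link net0"),
    ("192.168.1.5/32", "local delivery"),
]

print(longest_prefix_match(TABLE, "192.168.1.5")[1])   # local delivery
print(longest_prefix_match(TABLE, "192.168.1.77")[1])  # on-link net0
print(longest_prefix_match(TABLE, "8.8.8.8")[1])       # gateway via net0
```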

> However this matching interface is not necessarily used to push the packet
> onto the network. This kinda makes sense in ipmp, lacp and other cases of
> many-to-many interface-to-address relations, as well as for virtual
> interfaces (i.e. a public IP hosted on the loopback); but this seems flawed
> for one-to-one relations of an [aliased] interface and an IP address.

I don't follow that part.

The matching of the route (based on destination IP address alone)
specifies the output -- period.  That's how it works.  There's no
other magic going on for normal IP transmission.

IPMP does throw a twist into it, but it's still understandable in the
above scheme: the group is treated as though it were one interface as
far as IP is concerned, and the outputs are fungible.

ECMP (not implemented on Solaris yet) throws in another twist, but
still understandable: you can have multiple routes that match equally
well (same subnet mask / prefix length), and you select among them
using a flow-based hash.
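
For the curious, a flow-based hash can be sketched roughly as below.
This is a generic illustration, not how any particular stack does it:
the choice of the 5-tuple as the flow key, the use of SHA-256, and the
function name are all assumptions picked for the example.

```python
import hashlib

def pick_ecmp_route(routes, src, dst, proto, sport, dport):
    """Pick one of several equally good routes based on the flow's 5-tuple."""
    # Every packet of a given flow produces the same key, so the flow
    # stays on one path and its packets are not reordered across paths.
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(routes)
    return routes[index]

# Two hypothetical equal-cost next hops.
routes = ["via 10.0.0.1", "via 10.0.1.1"]
first = pick_ecmp_route(routes, "192.168.1.5", "8.8.8.8", "tcp", 40000, 443)
again = pick_ecmp_route(routes, "192.168.1.5", "8.8.8.8", "tcp", 40000, 443)
print(first == again)  # True: the same flow always maps to the same route
```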

The missing part (and the part we've been belaboring here) is that
Solaris doesn't make any intelligent use of multiple equivalent
matches.  If there are equal matches on the destination address, we
just pick one arbitrarily rather than using the source address to add
some "flavor."

That's arguably a bug, and is at the root of what you're seeing.

(Of course, there are other scenarios that would be harder to deal
with.  For instance, if a better route [longer prefix] matches and
puts the packet on the "wrong" interface given the source address,
what do you do?  I would argue that this is exactly what's supposed to
happen based on the kernel forwarding table, and if you want a
different result, then you need something like finer-grained routes,
or even something exotic like source-based routing.)
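
Here is a rough sketch of the tie-break I have in mind, in Python.  It
is a thought experiment only, not the Solaris code or a proposed patch;
the route tuple layout and the interface names "net0" and "net1" are
invented for the example.  Note that the source address is consulted
only among equally long matches, so a longer prefix still wins even if
it points at the "wrong" interface.

```python
import ipaddress

def select_output(routes, dst, src):
    """routes: list of (prefix, ifname, ifsubnet) tuples.

    Pick the longest-prefix match on dst; among equal-length matches,
    prefer an interface whose subnet contains the source address.
    """
    dst_ip = ipaddress.ip_address(dst)
    src_ip = ipaddress.ip_address(src)
    matches = []
    for prefix, ifname, ifsubnet in routes:
        net = ipaddress.ip_network(prefix)
        if dst_ip in net:
            matches.append((net, ifname, ipaddress.ip_network(ifsubnet)))
    if not matches:
        return None
    best_len = max(net.prefixlen for net, _, _ in matches)
    ties = [m for m in matches if m[0].prefixlen == best_len]
    # Use the source address only to break ties between equal matches.
    for _, ifname, subnet in ties:
        if src_ip in subnet:
            return ifname
    # No interface subnet matches the source: fall back to the first entry.
    return ties[0][1]

# Hypothetical table: two default routes plus one more specific route.
ROUTES = [
    ("0.0.0.0/0", "net0", "192.168.1.0/24"),
    ("0.0.0.0/0", "net1", "10.0.0.0/24"),
    ("8.8.8.0/24", "net0", "192.168.1.0/24"),
]

print(select_output(ROUTES, "1.2.3.4", "10.0.0.5"))  # net1 (tie broken by source)
print(select_output(ROUTES, "8.8.8.8", "10.0.0.5"))  # net0 (longer prefix wins)
```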

> >> Yes - but only for an L3 router device ;) 
> > The differences between an L3 host and an L3 router are vanishingly
> > small in real life. Plus, Solaris *is* a fully functional L3 routing system.
> 
> In fact, the differences exist (or I get them differently than you do,
> again ;} ).
> 
> These differences are implemented in Solaris as interfaces ifconfig'ed with or
> without a ROUTER flag, as ndd keywords like ip_forwarding and so on.

Those two are actually the same thing -- they're tied together
internally.  They enable IP forwarding for a given interface.

(The ndd interface is legacy and should mostly be going away.  The
ROUTER flag is the "real" one.)

> So while
> Solaris is indeed an L3 OS, it does behave differently when it's configured as
> a mere host or as a router/forwarder. And that's a good thing (availability of
> choice as well as configuration methods).
> 
> Semantically, an L3 host is a host with an L3 (IP) address assigned. This
> host is only responsible for processing IP addresses assigned to some of its
> interfaces.

The L3 "host" still must deal with any configured interfaces it has
(with whatever subnet masks are set) and any routes that have been
installed on it by any means (static or from a routing protocol of
some sort).

It's in that part that the output interface choice based on
destination address shows up, and it's actually not different from how
a router does the same job.

The lines are much blurrier than I think you're suggesting.

> An L3 router is a host also responsible for processing IP addresses not
> assigned to one of its interfaces (i.e. forwarding other hosts' packets
> between its physical or aliased interfaces).

True.  For what it's worth, I think of this in completely different
terms.  "Routers" are just hosts that happen to be configured to
forward.  They're otherwise identical in terms of IP behavior, and
that identical behavior is a good thing: it means that IP's algorithms
work the same everywhere.

In a datagram network, special == bad.

> I think most of this discussion with the idea that the source IP address
> should be used in selection of a source L2 interface and the intermediate
> gateway (from the same IP subnet as the source IP address) is most relevant
> in the "L3 host" usecase.

I fundamentally disagree with that.  The same issue is visible with L3
routers, because the very same decision process is going on internally
-- if you have a choice of interfaces to send on, picking one with a
subnet match for the source address of the packet would be a good
idea, if such a choice is possible.

The fact that this issue happens to affect this one host that you're
working with makes it visible to you as a host-related problem, but I
don't think that makes it less interesting for routers.

At a base level, I just reject the distinction between "host" and
"router."  It's never been terribly useful, and has actually
contributed a lot of harm to the Internet.  (For instance, ICMP
redirects and router discovery, two completely horrible mechanisms
that ape routing protocols poorly, are based on this false
distinction.)

> Perhaps this "src interface" addition to the address lookup algorithm 
> should be enabled by another ndd keyword (dladm keyword, whatever). After all 
> the QA cycles this keyword may become enabled by default :)

I'd just implement it and be done with it.  If anyone needs
randomness, we can go in later and add a flag to enable random (or at
least unpredictable) output selection.  But adding yet another flag to
an already confusing mix as a preemptive move doesn't look like a good
idea.

> It is my belief that an L3 host as a more "mass product" should work out of
> the box with minimal configuration for the most common scenarios, at least
> by default ;)

Yep.  Which is why I'm opposed to having a flag for this.  The bug
we're talking about should just be fixed.

> > Nobody at Sun thinks it's perfect, either. ... That's being fixed now.
> May I put these on a wall someday? Since some CR's are dated in the 1990's, 
> this quote should hang someplace near Mozilla's "three-year-old new bug" ;}

Yes, the bug fixing process can indeed be slow, depending on the
nature of the bug.  You're always welcome and encouraged to contribute
code instead of complaint.  ;-}

-- 
James Carlson         42.703N 71.076W         <[email protected]>
_______________________________________________
networking-discuss mailing list
[email protected]
