Re: [networking-discuss] Multihomed host default routes problem

Jim Klimov Thu, 25 Jun 2009 01:56:42 -0700

Hi, Darren

> Having read this thread, I wonder if this problem came about because:
> * originally the initial route used with a socket was
> determined by the path the initial packet was received
> (seems like a reasonable optimisation);


If by "originally" you mean "many years ago", then maybe so - but I have no
idea. CRs to fix the thing (back?) seem to date back at least 12 years,
though ;)

That optimisation is roughly the behavior I would expect, but what I see in
practice is different: incoming packets arrive from an ISP's router on a
specific (well-determined) interface, where I expect them, but outgoing
packets are sent via any interface that shares a subnet with any matching
gateway (the default gateways, in my case), sometimes to the "wrong" ISP's
router.

As a result some connections remain half-open (stuck in SYN_RCVD): the
initial packet arrived fine, but the dialog never progresses because no
two-way communication is established.

> * this was perceived by someone as a "bug" because it
> didn't "properly" select "any" network interface for
> packets going out;
> * the "bug" got fixed.

Maybe so. As I said, I don't know the prehistory. It seems related to the
"Strong ES" model, to which James objects, and it is not within my
competence to decide (or even vote on) whether it should be fully
implemented and/or made the default. But it seems that we already have some
parts of it, need some other parts, and don't want to have the whole thing ;)

As far as I understand it, the "Strong End Systems model" requires two
behaviors from a multihomed host:
1) Drop incoming packets whose destination IP address is not configured on
the interface they arrive on. In Solaris this can be enabled with the ndd
flag ip_strict_dst_multihoming=1.
2) Restrict outgoing datagrams to the interface that corresponds to the
source IP address. AFAIK, this is not implemented in Solaris.
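
To make the two rules concrete, here is a rough Python sketch. The interface
names and addresses are made up for illustration, and this only models the
idea; it is not how the Solaris IP stack implements anything.

```python
import ipaddress

# Hypothetical interface table: name -> configured address/prefix.
# Names and addresses are illustrative assumptions, not from a real host.
INTERFACES = {
    "isp_a0": ipaddress.ip_interface("192.0.2.10/24"),
    "isp_b0": ipaddress.ip_interface("198.51.100.10/24"),
}

def accept_inbound(arrival_if, dst_ip):
    """Rule 1: accept only if dst_ip is configured on the arrival interface."""
    return INTERFACES[arrival_if].ip == ipaddress.ip_address(dst_ip)

def outbound_interface(src_ip):
    """Rule 2: return the one interface allowed to send from this source."""
    src = ipaddress.ip_address(src_ip)
    for name, iface in INTERFACES.items():
        if iface.ip == src:
            return name
    return None  # source address is not configured on any local interface
```

With this table, a packet for 198.51.100.10 arriving on isp_a0 would be
dropped under rule 1, and a datagram sourced from 198.51.100.10 could only
leave via isp_b0 under rule 2.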

As RFC1122 summary of Strong ES vs. Weak ES states, "the source address 
is included as a parameter in order to select a gateway that is directly 
reachable on the corresponding physical interface" - that's more or less 
what I want. However, "Note that this model logically requires that in
general there be at least one default gateway, and preferably multiple 
defaults, for each IP source address" - that's what nobody seems to want
nowadays ;)
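
The gateway selection that the RFC text describes could look roughly like
the sketch below. All addresses are invented, and the function name
gateways_for_source is hypothetical; it only illustrates "one default
gateway per source address", under the assumption that each ISP router sits
in the same subnet as one local address.

```python
import ipaddress

# Hypothetical default gateways, one per ISP (illustrative addresses).
DEFAULT_GATEWAYS = ["192.0.2.1", "198.51.100.1"]

# Hypothetical local addresses with their prefixes, one per ISP link.
LOCAL_ADDRESSES = ["192.0.2.10/24", "198.51.100.10/24"]

def gateways_for_source(src_ip):
    """Return the default gateways directly reachable from src_ip's subnet."""
    src = ipaddress.ip_address(src_ip)
    for local in LOCAL_ADDRESSES:
        iface = ipaddress.ip_interface(local)
        if iface.ip == src:
            # Keep only gateways that live in the source address's subnet.
            return [gw for gw in DEFAULT_GATEWAYS
                    if ipaddress.ip_address(gw) in iface.network]
    return []  # unknown source address: no usable default gateway
```

This also shows why the model wants a default gateway per source address:
a local address whose subnet contains no default gateway gets an empty list.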

And I did not find any relevant documented flags to effect the algorithm I
need. So it either never existed, or it was fixed "for good" with no
fallback methods.

In my previous posts I suggested that if this whole problem is fixed by tweaking
the algorithm (to make some affinity between source addresses of an outgoing 
interface and a gateway address for a packet) - this algorithm change should be
en-/dis-abled with a flag.
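
A flagged version of the algorithm might be sketched like this. The flag
name source_affinity is hypothetical, the addresses are invented, and
modelling today's behavior as a simple rotation over the default gateways is
my assumption for illustration only.

```python
import ipaddress

# Hypothetical default gateways, one per ISP (illustrative addresses).
DEFAULT_GATEWAYS = [
    ipaddress.ip_address("192.0.2.1"),
    ipaddress.ip_address("198.51.100.1"),
]

# Hypothetical local source addresses mapped to their subnets.
LOCAL_NETWORKS = {
    ipaddress.ip_address("192.0.2.10"): ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_address("198.51.100.10"): ipaddress.ip_network("198.51.100.0/24"),
}

def pick_gateway(src_ip, source_affinity, attempt=0):
    """Pick a default gateway; 'attempt' only drives the legacy rotation."""
    if source_affinity:
        src_net = LOCAL_NETWORKS.get(ipaddress.ip_address(src_ip))
        matching = [gw for gw in DEFAULT_GATEWAYS
                    if src_net is not None and gw in src_net]
        if matching:
            return matching[0]
    # Flag off, or no gateway in the source subnet: fall back to the assumed
    # legacy behavior, which ignores the source address entirely.
    return DEFAULT_GATEWAYS[attempt % len(DEFAULT_GATEWAYS)]
```

With the flag off, a reply sourced from 198.51.100.10 may be handed to
192.0.2.1 (the "wrong" ISP); with the flag on it always goes to
198.51.100.1, and set-ups relying on the old behavior simply leave the flag
off.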

James objected to flagging that change ("Yep. Which is why I'm opposed to
having a flag for this. The bug we're talking about should just be fixed"),
but I still think it's reasonable to flag it. If Darren's guess that the bug
was already "fixed" once is correct, and if it was done in a similar one-way
manner, that's another solid argument to make it a flagged feature ;)

Maybe it should be a flag for a kernel route (like the existing -ifp which does
not do this trick), or a flag to an interface (not unlike the ROUTER flag), or
a system-wide behavioral change - I don't know, and it's not for me to decide.

It is my understanding that since this behavior has been around for so long
(if not always), there may be set-ups which rely on it. For these users such
behavior would actually be not a "bug" but a "feature". That's why I think
it should remain possible to revert to the currently existing algorithm.
That's also why I think that turning the fixed algorithm on by default
should wait a few OS versions or patch releases - so that admins become
generally aware of the forthcoming change and can explicitly set the flag to
the value they want on their current systems.

I know there's little pleasure in a system stopping working the way it used to 
work forever, after patching, just because some software engineer thought his
new features are so way cool that they should be default. I did that myself a 
few times when developing useful stuff... I found that undeclared changes which 
break the dreaded "legacy compatibility" work poorly when I'm not the only one 
using the code (or even when I'm mass-updating older systems with obsolete
versions of my own software - and my defaults changed between versions). 
Now I try to plan ahead so I'd never do that again ;}

And I respect Sun Microsystems for taking the effort to support a decade or
more of its customers' investment in training, existing practices and
configuration of *working* systems ("If it ain't broke - don't fix it" is
quite a motto - please don't unpredictably break existing deployments).

//Jim
-- 
This message posted from opensolaris.org