It looks like I missed a fun discussion while I was out on vacation.
[email protected] writes:
> I looked at the design doc. It describes the existing behavior and
> mentions that we are not currently implementing a "dad in progress"
> state, though we could in the future. Other than that, I did not see
> any discussion of the trade-offs. Then I looked at the code to see
Appendix B probably has the discussion that you're looking for.
The central issue is that the IFF_UP flag serves two distinct
purposes, and *BOTH* of these purposes are well-known to third-party
applications and thus can't be changed easily. It's been this way
since the early days of BSD. Those two distinct roles are:
1. The flag indicates administrative intent. To set an interface
up, you read the current flags with SIOCG{,L}IFFLAGS, set the
IFF_UP bit, then write back the new value with SIOCS{,L}IFFLAGS.
Yes, the interface that those Berkeley people designed has a
patently obvious race condition if you have more than one
process running, but it's well known and (at least historically)
works often enough to be usable.
2. The flag indicates interface usability. Applications can read
the list of interfaces with SIOCG{,L}IFCONF, get the flags, and
skip over interfaces that don't have IFF_UP set. The check for
IFF_UP is de rigueur; everyone has to do that, because many of
the IP interfaces will _FAIL_ if you ignore it and try to use
down interfaces. On a good day, you might see that the
application also checks for IFF_RUNNING. Many applications
don't, because on old BSD systems, IFF_RUNNING was almost always
nailed up and had nothing to do with actual interface status.
With respect to the DAD project, I wanted compatibility with existing
applications, *and* I wanted duplicate detection to work with those
applications, and not just have it bolted on the side of particular
utilities as it was before.
Because of this, I wasn't able to introduce a new flag or command to
mean "please bring this up when you're able," because that would
either break or render ineffective existing administrative
applications. Nor could I leave IFF_UP clear until DAD was done, as
that would also break administrative applications. It was a careful
dance, all of which I documented in that design specification.
> /*
> * If the user is trying to mark an interface with a
> * duplicate address as "down," or convert a duplicate
> * test address to a data address, then fetch the
> * address and set it. This will cause IP to clear
> * the IFF_DUPLICATE flag and stop the automatic
> * recovery timer.
> */
> (I think the comment could use some clarification- in the first
> line, it's trying to mark the *logical* interface as down, not the
> physical, right?)
Yes. It's impossible to mark a physical interface as "down" on
Solaris in any explicit way, because the IFF_UP flag is only on the
ipif. So, yes, this comment refers to the logical interface.
> In the case described as "convert a duplicate test
> address to a data address", the data address is still going to be
> duplicate until DAD succeeds, right?
Not exactly.
When you administratively set the address on an interface (regardless
of what you set it to), the kernel assumes that this is a fresh start
for that address: the administrator has provided a "new" address (even
if it's one we previously used) that should be subjected to the same
DAD test. Therefore, it clears out the IFF_DUPLICATE flag and
disables the recovery timer (if any), and resets the address ready
flag.
The ifconfig code is taking advantage of that logic. If someone is
trying to mark an interface as "down," and it's already in a
DAD-failed state, then the only logical interpretation is that he
wants to clear out the DAD failure and leave it marked "down."
An alternative (but equivalent) explanation is that by saying
"ifconfig xxx down", the administrator wants to have the interface put
in a specific known state, without possibility of odd side-effects,
and that's exactly what this does: the interface consistently ends up
as "down" (~IFF_UP) with no DAD failure (~IFF_DUPLICATE).
Or consider the race condition: someone looks at an interface and sees
that it's "UP." He realizes that's a mistake and utters "ifconfig
<intf> down". If the interface "sometimes" ends up in IFF_DUPLICATE
state and sometimes doesn't, then that seems like more of a bug than a
feature.
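
To make the state transitions concrete, here is a toy model in C of the
behavior described above. It is not the kernel or ifconfig code: the
struct, the M_UP and M_DUPLICATE bit values, and the function names are
all invented for illustration. It only encodes the two rules stated in
this message: setting an address is a fresh start, and "down" on a
DAD-failed address re-sets the address before clearing the up bit.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model only; not the kernel's values. */
#define	M_UP		0x1u
#define	M_DUPLICATE	0x2u

struct model_ipif {
	unsigned int	flags;
	unsigned int	addr;		/* stand-in for the IP address */
	bool		recovery_timer;	/* automatic recovery pending? */
	bool		addr_ready;	/* has this address passed DAD? */
};

/*
 * Setting an address, even to the value it already has, is treated as
 * a fresh start: clear the duplicate state, stop the recovery timer,
 * and reset the address-ready indication so DAD runs again.
 */
void
model_set_addr(struct model_ipif *ipif, unsigned int addr)
{
	ipif->addr = addr;
	ipif->flags &= ~M_DUPLICATE;
	ipif->recovery_timer = false;
	ipif->addr_ready = false;
}

/*
 * Model of "ifconfig <intf> down": if the address is marked duplicate,
 * fetch it and set it again to clear the DAD failure, then clear the
 * up bit.  The end state is always "not up, not duplicate".
 */
void
model_ifconfig_down(struct model_ipif *ipif)
{
	if (ipif->flags & M_DUPLICATE)
		model_set_addr(ipif, ipif->addr);
	ipif->flags &= ~M_UP;
}
```

Whether the model starts out up or in the duplicate state, calling
model_ifconfig_down() leaves flags at zero with no timer running, which
is the consistency argument made above.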
> A duplicate address should already be down, so why can't we just delete
> that address?
I assume "delete" here means unplumb. If so, I don't follow, because
those (down and unplumb) are currently very different operations.
Erik Nordmark writes:
> 1. The assumption that a SIOCSLIFFLAGS with IFF_UP for a logical
> interface is faster/more efficient than a SIOCSLIFADDIF. (Which I doubt
> is the case.) Perhaps the performance reason is about avoiding to dig
> through the SMF repo to find the information before passing it to
> SIOCSLIFADDIF?
I doubt that's the issue.
> 2. The convenience of using the kernel as a repository for inactive
> information. I don't know how far we want to push in that direction.
That looks to me like the real issue, but it's much more than just
allowing the use of the kernel as a repository:
a. This allows the user to set up the interface exactly as desired
using exactly the same tools that would be used to configure a
"live" interface. The only difference between the two would be
the state of the IFF_UP bit, and minimal difference means high
confidence in configuration.
b. The same tools can be used to observe the intended configuration
and verify it. Storing the configuration "elsewhere" would mean
that the user would have to use different programmatic
interfaces, and likely separate administrative interfaces, to
read out configuration depending on whether the interface is
currently "live" or "standby."
c. Exactly the same kernel-based validation of configuration
parameters is done. There's no danger that the duplicated checks
built into some user-space repository of interface configuration
information will somehow be out of step with the kernel's
checks, resulting in configuration that "looks good" but then
fails right at the moment when you need it most -- when
activating the standby interface after a failure.
d. ifIndex assignment is done at plumbing time. Having the
interface all plumbed up and ready to go means that you've got a
nice ifIndex on which to hang your hat. It's useful if you're
going to (for instance) poll the interface statistics.
e. Storing information differently for "live" versus "standby"
means that you've introduced timing problems for applications
that read or write that configuration. This sounds like the old
IPMP again.
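
As a small illustration of point (d), the ifIndex of an interface can
be looked up by name with the standard if_nametoindex() call. The
wrapper name plumbed_ifindex() is made up for this note, and the sketch
assumes (as I'd expect on the systems I know) that the lookup succeeds
for any interface that exists, whatever the state of IFF_UP.

```c
#include <net/if.h>

/*
 * Return the ifIndex for an interface that exists (is plumbed), or 0
 * if there is no interface by that name.  The lookup is by name only;
 * this function never looks at the interface flags.
 */
unsigned int
plumbed_ifindex(const char *ifname)
{
	return (if_nametoindex(ifname));
}
```

A monitoring tool could call this once for a standby interface and use
the returned index when polling statistics, before the interface is
ever marked up.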
--
James Carlson, Solaris Networking <[email protected]>
Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677
_______________________________________________
networking-discuss mailing list
[email protected]