Re: [networking-discuss] Brussels II design doc (1.6)

James Carlson Tue, 28 Apr 2009 10:19:58 -0700

It looks like I missed a fun discussion while I was out on vacation.

[email protected] writes:
> I looked at the design doc. It describes the existing behavior and 
> mentions that we are not currently implementing a "dad in progress"
> state, though we could in the future. Other than that, I did not see
> any discussion of the trade-offs. Then I looked at the code to see

Appendix B probably has the discussion that you're looking for.

The central issue is that the IFF_UP flag serves two distinct
purposes, and *BOTH* of these purposes are well-known to third-party
applications and thus can't be changed easily.  It's been this way
since the early days of BSD.  Those two distinct roles are:

  1.  The flag indicates administrative intent.  To set an interface
      up, you read the current flags with SIOCG{,L}IFFLAGS, set the
      IFF_UP bit, then write back the new value with SIOCS{,L}IFFLAGS.
      Yes, the interface that those Berkeley people designed has a
      patently obvious race condition if you have more than one
      process running, but it's well known and (at least historically)
      works often enough to be usable.

  2.  The flag indicates interface usability.  Applications can read
      the list of interfaces with SIOCG{,L}IFCONF, get the flags, and
      skip over interfaces that don't have IFF_UP set.  The check for
      IFF_UP is de rigueur; everyone has to do that, because many of
      the IP programming interfaces will _FAIL_ if you ignore it and try
      to use down interfaces.  On a good day, you might see that the
      application also checks for IFF_RUNNING.  Many applications
      don't, because on old BSD systems, IFF_RUNNING was almost always
      nailed up and had nothing to do with actual interface status.

With respect to the DAD project, I wanted compatibility with existing
applications, *and* I wanted duplicate detection to work with those
applications, and not just have it bolted on the side of particular
utilities as it was before.

Because of this, I wasn't able to introduce a new flag or command to
mean "please bring this up when you're able," because that would
either break or render ineffective existing administrative
applications.  Nor could I leave IFF_UP down until DAD was done, as
that would also break administrative applications.  It was a careful
dance, all of which I documented in that design specification.

>                         /*
>                          * If the user is trying to mark an interface with a
>                          * duplicate address as "down," or convert a duplicate
>                          * test address to a data address, then fetch the
>                          * address and set it.  This will cause IP to clear
>                          * the IFF_DUPLICATE flag and stop the automatic
>                          * recovery timer.
>                          */
> (I think the comment could use some clarification- in the first
> line, it's trying to mark the *logical* interface as down, not the
> physical, right?)

Yes.  It's impossible to mark a physical interface as "down" on
Solaris in any explicit way, because the IFF_UP flag is only on the
ipif.  So, yes, this comment refers to the logical interface.

> In the case described as "convert a duplicate test
> address to a data address", the data address is still going to be
> duplicate until DAD succeeds, right?

Not exactly.

When you administratively set the address on an interface (regardless
of what you set it to), the kernel assumes that this is a fresh start
for that address: the administrator has provided a "new" address (even
if it's one we previously used) that should be subjected to the same
DAD test.  Therefore, it clears the IFF_DUPLICATE flag, disables the
recovery timer (if any), and resets the address-ready flag.

The ifconfig code is taking advantage of that logic.  If someone is
trying to mark an interface as "down," and it's already in a
DAD-failed state, then the only logical interpretation is that he
wants to clear out the DAD failure and leave it marked "down."

An alternative (but equivalent) explanation is that by saying
"ifconfig xxx down", the administrator wants to have the interface put
in a specific known state, without possibility of odd side-effects,
and that's exactly what this does: the interface consistently ends up
as "down" (~IFF_UP) with no DAD failure (~IFF_DUPLICATE).
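
The behavior described above can be sketched as a toy state model.
This is not the kernel or ifconfig source: the struct, the M_* flag
values, and the function names are all invented for illustration, and
the only things taken from the discussion are the effects (setting an
address clears the duplicate state, stops the recovery timer, and
resets the address-ready flag; "down" on a duplicate re-sets the
address first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for IFF_UP and IFF_DUPLICATE. */
enum { M_UP = 0x1, M_DUPLICATE = 0x2 };

/* Hypothetical model of one logical interface. */
struct model_ipif {
	unsigned int	flags;
	unsigned int	addr;
	bool		recovery_timer_armed;
	bool		addr_ready;
};

/*
 * Administratively setting an address is treated as a fresh start:
 * any earlier DAD failure is forgotten and the address must be tested
 * again before it is considered ready.
 */
void
model_set_addr(struct model_ipif *ip, unsigned int addr)
{
	assert(ip != NULL);
	ip->addr = addr;
	ip->flags &= ~(unsigned int)M_DUPLICATE;
	ip->recovery_timer_armed = false;
	ip->addr_ready = false;
}

/*
 * "ifconfig xxx down": if the address is in the DAD-failed state,
 * fetch it and set it again to clear that state, then clear the up
 * bit.  The result is always ~M_UP and ~M_DUPLICATE.
 */
void
model_ifconfig_down(struct model_ipif *ip)
{
	assert(ip != NULL);
	if (ip->flags & M_DUPLICATE)
		model_set_addr(ip, ip->addr);
	ip->flags &= ~(unsigned int)M_UP;
}
```

Whatever state the model starts in, model_ifconfig_down() leaves it
with neither bit set and no timer pending, which is the "specific known
state" argument in miniature.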

Or consider the race condition: someone looks at an interface and sees
that it's "UP."  He realizes that's a mistake and utters "ifconfig
<intf> down".  If the interface "sometimes" ends up in IFF_DUPLICATE
state and sometimes doesn't, then that seems like more of a bug than a
feature.

> A duplicate address should already be down, so why can't we just delete
> that address? 

Delete here means unplumb.  I don't follow, because those (down and
unplumb) are currently very different operations.

Erik Nordmark writes:
> 1. The assumption that a SIOCSLIFFLAGS with IFF_UP for a logical 
> interface is faster/more efficient than a SIOCSLIFADDIF. (Which I doubt 
> is the case.) Perhaps the performance reason is about avoiding to dig 
> through the SMF repo to find the information before passing it to 
> SIOCSLIFADDIF?

I doubt that's the issue.

> 2. The convenience of using the kernel as a repository for inactive 
> information. I don't know how far we want to push in that direction.

That looks to me like the real issue, but it's much more than just
allowing the use of the kernel as a repository:

  a.  This allows the user to set up the interface exactly as desired
      using exactly the same tools that would be used to configure a
      "live" interface.  The only difference between the two would be
      the state of the IFF_UP bit, and minimal difference means high
      confidence in configuration.

  b.  The same tools can be used to observe the intended configuration
      and verify it.  Storing the configuration "elsewhere" would mean
      that the user would have to use different programmatic
      interfaces, and likely separate administrative interfaces, to
      read out configuration depending on whether the interface is
      currently "live" or "standby."

  c.  Exactly the same kernel-based validation of configuration
      parameters is done.  There's no danger that the duplicated checks
      built into some user-space repository of interface configuration
      information will somehow be out of step with the kernel's
      checks, resulting in configuration that "looks good" but then
      fails right at the moment when you need it most -- when
      activating the standby interface after a failure.

  d.  ifIndex assignment is done at plumbing time.  Having the
      interface all plumbed up and ready to go means that you've got a
      nice ifIndex on which to hang your hat.  It's useful if you're
      going to (for instance) poll the interface statistics.

  e.  Storing information differently for "live" versus "standby"
      means that you've introduced timing problems for applications
      that read or write that configuration.  This sounds like the old
      IPMP again.
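
To make point (d) concrete, here is a small sketch of looking up the
ifIndex of an already-plumbed interface, using the standard
if_nametoindex() call from <net/if.h>.  The wrapper name
lookup_ifindex() is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <net/if.h>

/*
 * Look up the index of a named interface.  if_nametoindex() returns 0
 * when no such interface exists, so an interface that has not been
 * plumbed yet has no index to report.  Returns 0 and fills in *idx_out
 * on success; returns -1 and leaves *idx_out untouched otherwise.
 */
int
lookup_ifindex(const char *ifname, unsigned int *idx_out)
{
	unsigned int idx;

	assert(ifname != NULL && idx_out != NULL);
	idx = if_nametoindex(ifname);
	if (idx == 0)
		return (-1);
	*idx_out = idx;
	return (0);
}
```

A statistics poller could call this once for a standby interface and
keep using the same index after the interface is marked up, which only
works if the interface was plumbed in advance.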

-- 
James Carlson, Solaris Networking              <[email protected]>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677