Webrev with the latest diffs:

    http://cr.opensolaris.org/~meem/clearview-ipmp-cr-diff6

 > > I'd suspect some third-party IP monitoring/administration utilities will
 > > benefit from it, but I didn't have anything specific in mind.
 > 
 > It looks like something worth a heads-up of some sort, I think, though
 > I'm not sure where it'd do the most good.

OK, I've put this on my list of post-integration tasks.

 > >  > +++ IPMP-aware userland changes
 > >  > 
 > >  > usr/src/cmd/cmd-inet/sbin/dhcpagent/bound.c
 > >  > 
 > >  >   360: what happens if the IPMP test addresses handed out via DHCP are
 > >  >   on a separate subnet and thus need a separate set of probe targets
 > >  >   (i.e., "routers")?  One of the weird IPMP things we still do is that
 > >  >   we use the gateway addresses as probe targets, and this code will
 > >  >   limit the source of that information.
 > > 
 > > In that case, the administrator is expected to explicitly configure any
 > > probe targets -- e.g., by using dhcpinfo.  As you know, I regard the fact
 > > that the DHCP client directly configures routes at all as a bug.
 > 
 > It's an odd bit of functionality, then.  Maybe the right question for
 > me to ask is, "what would happen if you allowed pif_under_ipmp
 > interfaces to use default routes like the others?"
 > 
 > Would it fail?

The kernel will convert the route add request so that the requested
interface is its associated IPMP interface, and that route would likely
already exist, and thus fail with EEXIST.

 > >  > usr/src/cmd/cmd-inet/usr.lib/in.ndpd/main.c
 > >
 > > Indeed, but I think it could be done as:
 > > 
 > >         packetlen += add_opt_lla(pi, (struct nd_opt_lla *)pptr); 
 > >         pptr = &packet[packetlen]; 
 > 
 > That won't work.  packet[] is defined as an array of uint64_t.  If you
 > want to do that, you'll need to do this for the second line:
 > 
 >      pptr = (char *)packet + packetlen;

Indeed; the compiler discovered this too ;-)

 > >  > usr/src/cmd/cmd-inet/usr.sbin/snoop/snoop_capture.c
 > >  > 
 > >  >   142: does snoop support for the IPMP virtual interfaces arrive in
 > >  >   some separate project?  (I'm not clear on how IPMP rearchitecture
 > >  >   and Clearview observability relate.)
 > > 
 > > No, it arrives with this wad.  The check here is because this (highly
 > > questionable) loop is trying to choose a default datalink (not IP
 > > interface) to snoop on.  Since IPMP has no datalink, we skip IPMP IP
 > > interfaces.  Ultimately, snoop should use dladm_walk() to do this, not
 > > SIOCGLIFCONF.  But changing that seemed out-of-scope.
 > 
 > I'm still unclear on how that works.  It looks like this will just
 > find the member interfaces by default, and will not find an IPMP
 > interface.

It will find datalinks, not interfaces.  (This code leads to
/dev/net/<foo> being opened, not /dev/ipnet/<foo>)

 > Doesn't the Clearview observability node give you "/dev/net/ipmp0" to
 > snoop on, even if IPMP itself doesn't have a "datalink?"

No, it gives you a /dev/ipnet/ipmp0, which is only available with the -I
snoop option.  Further, the -I option requires an argument -- so this code
applies only to snooping datalinks, not IP interfaces.

 > >  > usr/src/lib/libinetutil/common/ifaddrlistx.c
 > >  > 
 > >  >   79: not that I care much, but why alloca here?  I thought most
 > >  >   avoided it unless it was needed or obviously safe.
 > > 
 > > Just following the same logic as ifaddrlist().  It could be a problem in a
 > > multithreaded program if there are lots and lots of IP addresses, though,
 > > so I'm willing to change both if you feel strongly.
 > 
 > I don't feel strongly about it; I was just surprised that alloca
 > showed up here.  Some others may be much more opposed to it, though.
 > (I know I've had to justify my use of it in the past.)

The more I think about it, the more I feel there's a chance we could blow
the stack on big configurations.  I've reworked the code to use realloc().
(Unfortunately, I accidentally pushed this before generating the rest of
the webrev; I'll get it reviewed separately.)

 > >  > usr/src/lib/libsocket/inet/interface_id.c
 > >  > 
 > >  >   147: while here, please fix this; bump the count up and (if
 > >  >   possible) add the redo logic.  I doubt anyone's going to touch this
 > >  >   stuff again soon.
 > > 
 > > The function is junk, as is if_nameindex().  I'd want to rewrite them
 > > rather than tweak them, but that's just too far out-of-scope for IPMP.
 > > I'll file a CR if you'd like though.
 > 
 > I wasn't suggesting a rewrite, but rather a minor sprucing as long as
 > you're in the neighborhood.  There's no reason these functions should
 > be allowed to fail just because plumbing activity is going on at the
 > same time the call is made.

I've tried three times now to go in and "spruce" it up, and I can't do it.
It always ends up becoming a rewrite, and then I have to abandon ship.
My offer to file a CR is still open :-)

 > Peter Memishian writes:
 > >  > +++ IPMP boot/DR support
 > >  > 
 > >  > usr/src/cmd/cmd-inet/usr.sbin/if_mpadm.c
 >
 > >  >   159-163,218-222: this was changed from an actual error case to a
 > >  >   warning.  Why?
 > > 
 > > There wasn't a strong reason, but the reasoning was that those are just
 > > sanity checks and not needed for a successful offline/undo-offline
 > > operation.  I've never seen them trigger though, so if you feel strongly,
 > > I can change them to fatal errors.
 > 
 > I was more curious about how it was related to testing: whether you
 > saw an actual problem here.  If you're comfortable with the new
 > functionality, then I'm ok with it.

I didn't see any problems.

 > >  > usr/src/cmd/rcm_daemon/common/ip_rcm.c
 > [...]
 > >  >   63-70: negative logic is hard to read (and wtf?).
 > > 
 > > Not my code, but this is a clever trick to allow lint to do format string
 > > checking (since otherwise the function call obscures the format string for
 > > lint).
 > 
 > If lint's argument checking doesn't work with the normal gettext
 > stuff, then I think that's a bug.  We should have the necessary
 > #define in a central place, not littered in source files.

I could file a CR to have something added to <libintl.h>, if you like.

 > >  >   I suspect that this would break some known Zones-related cases.
 > >  >   (E.g., one interface plumbed and up on the global zone only, another
 > >  >   plumbed but not up in the global zone, with shared-stack non-global
 > >  >   zones using logicals that second interface.)
 > > 
 > > If there's no test address (which is the case we're concerned about), then
 > > there will just be a 0.0.0.0 address instead (the data addresses are all
 > > off on the IPMP IP interface).  So I don't think bringing it up will have
 > > a big effect.
 > 
 > OK.  I was worried instead about the non-IPMP case where you have a
 > bunch of addresses up on the interface in separate zones, but the
 > zeroth instance is not up.

OK, I've added a check to make sure we're in an IPMP-configuration.

 > >  > +++ IPMP userland core
 > 
 > >  > usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_tables.c
 > [...]
 > >  >   1384-1387: nit(?): it looks like you can't detach an interface with
 > >  >   a duplicate MAC address.  I guess that shouldn't happen, but it
 > >  >   seems like an odd corner case.
 > > 
 > > True.  I'm not sure if it's worth worrying about this case, but I think we
 > > could handle it by adding an explicit "offline reason" argument to
 > > phyint_offline(), and have phyint_offline() do the close even if
 > > pi_hwaddrdup is set as long as the offline reason is not "hwdup".
 > 
 > If that did happen ("local-mac-address?=false", maybe), how would the
 > user recover?

They'd need to either put it into another group or change the hardware
address through ifconfig.

 > >  > usr/src/uts/common/inet/arp/arp.c
 > 
 > >  >   620: I don't think I understand this logic.  Why delete on
 > >  >   ace_xmit_arl match?  Shouldn't that just cause us to spin the
 > >  >   ar_ipmp_lookup_xmit_arl wheel instead to pick a new output?
 > >  > 
 > > 
 > > Because if there's an ace_xmit_arl match in the non-IPMP case, then
 > > ace_arl will also match (no change in behavior), and if there's an
 > > ace_xmit_arl match in the IPMP case, then this is an intentional purge of
 > > the ACE triggered by an unplumb of an IP interface in a group, which IP
 > > controls.  For instance, if this is an ACE_F_PERMANENT ACE, then after the
 > > ar_ce_delete_per_arl(), IP will ask ARP to create a new ACE for that IP
 > > address with a different ace_hw_addr.  There's no sense in having the old
 > > ACE_F_PERMANENT hanging around with a NULL ace_xmit_arl and a stale
 > > ace_hw_addr.
 > 
 > OK ... I guess what's unclear to me is when IP is in control of the
 > interface selection and when the ARP module does it.  It seems like
 > there are cases where the ARP module can do this on its own.

Basically, for ACE_F_PERMANENT entries, IP is in control of the creation
and interface selection by passing AR_ENTRY_ADD messages that have a
hardware address specified.

I agree that there are some subtle cases -- e.g., cases where ARP can
safely delete the ACE "knowing" that IP will recreate it.  I think we can
make these interactions more obvious once IP and ARP are combined, but I
was reluctant to expand the ARP/IP message-passing interface beyond what I
needed to get the job done.

 > >  >   631: this could use a different name, as it's really the handler for
 > >  >   an arl leaving the group.
 > > 
 > > Do you have something in mind?
 > 
 > Dunno ... ar_ce_group_leave, maybe?

But ARP doesn't track leaves, only deactivations.  Went with
ar_ce_ipmp_deactivate()

 > >  >   654-686: if we just did 'ace->ace_arl = arl;', would that be enough?
 > > 
 > > I don't think it's appropriate to always keep these ACEs -- e.g., if it
 > > was an ACE for a proxy ARP entry on the IPMP interface, we don't want that
 > > ACE to linger behind on the removed interface.  (Note that all we're doing
 > > here is ar_ce_delete(); the comments make the code seem more daunting.)
 > 
 > What bugs me is that we're removing entries (possibly "permanent"
 > ones) that are on the upper-level IPMP interface, and we're removing
 > them only because the currently selected lower-level interface went
 > away.

But that went away because IP told us it was going away, and since IP
knows what ACE_F_PERMANENT entries it put where, it's smart enough to know
how to recreate them.  (This ties in with my "subtle cases" comment above.)

 > If the entries are supposed to be there (if they are, say, proxy ARP
 > entries configured by an administrator), what will bring them back?

The IPMP logic is aware of this case -- e.g., inside ipmp_ill_deactivate():

        if (!ill->ill_isv6) {
                /*
                 * Tell ARP `ill' is no longer active in the group.
                 */
                mp = ill->ill_ardeact_mp;
                ill->ill_ardeact_mp = NULL;
                ASSERT(mp != NULL);
                putnext(illg->ig_ipmp_ill->ill_rq, mp);

                /*
                 * Refresh any ARP entries that had been using `ill'.
                 */
                ipmp_illgrp_refresh_arpent(illg);
        }

Note the call to ipmp_illgrp_refresh_arpent(), which will walk through all
of the proxy ARP entries associated with IPMP data addresses and will
issue AR_ENTRY_ADD messages to ARP, asking those ACEs to be recreated on
other active ARLs in the group (if no active ARLs exist, then the ACEs
will be recreated once an interface becomes active via ipmp_ill_activate()
-> ipmp_illgrp_refresh_arpent()).

 > >  >   992: as above, I don't think that updating unresolved entries for
 > >  >   other groups makes sense.
 > > 
 > > My rationale was to preserve parity with the non-IPMP case.  That is, the
 > > original code updated unresolved ACEs on different ARLs in the non-IPMP
 > > case, and the equivalent behavior in the IPMP case is to update unresolved
 > > ACEs on ARLs in different groups.  Could you elaborate on how you see the
 > > two cases as requiring different treatment?
 > 
 > It did that only because it was ignorant of IPMP, and the sketchy
 > functionality of resolving ACEs on a separate ARL was there so that
 > the IPMP code would kinda work.
 > 
 > Now that ARP knows about IPMP, I don't think we should need to be
 > permissive about this: either the ACE is in the same group (and thus
 > any ARL in that group will do), or it's in a different group (and no
 > match exists).
 > 
 > This is a long-standing bug.  We shouldn't be receiving an ARP message
 > on one interface and turning around to update the ARP cache for a
 > completely different network!

Good point, and agreed.  Changed.  That said, I wonder if we'll see any
problems with (offiically unsupported, but somewhat functional) configs
where multiple IP interfaces are on the same network without being placed
into an IPMP group.

 > >  >   3086: why did 'arl' get removed here?  It seems like it's handy as
 > >  >   we dereference ace->ace_arl multiple times.
 > > 
 > > My concern was that it'd be easy to use `arl' and not realize you were
 > > using ace->ace_arl where you really wanted ace->ace_xmit_arl.  I could
 > > reintroduce it and name it ace_arl, but it was unclear to me that it
 > > was useful to save five characters (ace->ace_arl vs. ace_arl).
 > 
 > It's not a big deal, but it'd be nice to see.

Found a simpler alternative: all places that used ace->ace_arl used it to
get to arl_wq, so I declared a local `arl_wq' (which removed a line wrap).

 > >  > usr/src/cmd/cmd-inet/usr.sbin/ipmpstat/ipmpstat.c
 > 
 > >  >   869-870: not sure what this means.  (I think it refers to sysevents
 > >  >   as 'GPECs' ... but I'm not sure what the comment is trying to say.)
 > > 
 > > General Purpose Event Channels -- PSARC/2002/321.  I'll add the PSARC case
 > > reference for clarity.
 > 
 > It's not just the reference, but why it matters here.

Because GPECs provide a `void *arg' that we can use to pass in context.
Further expanded comment.

 > >  >   1282: why both gettext here and a .xcl file?
 > > 
 > > Something has to translate "<field>" to its localized value.  I could do
 > > it as part of the gettext() on 1284, but that led to an ugly line wrap.
 > 
 > The part that confuses me is why you need to do gettext() at all.  I
 > *thought* you were using an ipmpstat.xcl file with this program so
 > that you didn't have to wrap all the strings with gettext().
 >
 > So ... why do these strings need to be wrapped, but the others don't?

The gettext() in the source does two things:

        1. Clue in xgettext(1) on which strings need to be translated
           at build-time.
        2. Actually perform the translation at run-time.

The second one is required -- and is why we still need gettext() calls.
But because we're using an .xcl file and "xgettext -a", we don't need to
do (1), so we can do the gettext() calls right before calling fprintf(),
rather than littering the call sites that pass in the strings.

I did notice one violation (that should never happen in practice):
enum2str() put "<unknown value %d>" into a buffer that won't be localized.
That string would also end up being hideous in tabular output, so I've
changed it to just be "<%d>" (again, if that code is reached, there's a
bug somewhere).

-- 
meem

Reply via email to