[clearview-dev] [networking-discuss] Clearview IPMP Phase 2 code review [due 10/24]

James Carlson Mon, 27 Oct 2008 17:12:26 -0400

Peter Memishian writes:
>  > >  > +++ IPMP-aware userland changes
>  > >  > 
>  > >  > usr/src/cmd/cmd-inet/sbin/dhcpagent/bound.c
>  > >  > 
>  > >  >   360: what happens if the IPMP test addresses handed out via DHCP are
>  > >  >   on a separate subnet and thus need a separate set of probe targets
>  > >  >   (i.e., "routers")?  One of the weird IPMP things we still do is that
>  > >  >   we use the gateway addresses as probe targets, and this code will
>  > >  >   limit the source of that information.
>  > > 
>  > > In that case, the administrator is expected to explicitly configure any
>  > > probe targets -- e.g., by using dhcpinfo.  As you know, I regard the fact
>  > > that the DHCP client directly configures routes at all as a bug.
>  > 
>  > It's an odd bit of functionality, then.  Maybe the right question for
>  > me to ask is, "what would happen if you allowed pif_under_ipmp
>  > interfaces to use default routes like the others?"
>  > 
>  > Would it fail?
> 
> The kernel will convert the route add request so that the requested
> interface is its associated IPMP interface, and that route would likely
> already exist, and thus fail with EEXIST.


As I noted above, I'm interested in the case where the IPMP interface
is intentionally on a different subnet.  That's not likely to lead to
EEXIST.

I agree that there's a bunch of unresolved brokenness underneath here,
though.

>  > Doesn't the Clearview observability node give you "/dev/net/ipmp0" to
>  > snoop on, even if IPMP itself doesn't have a "datalink?"
> 
> No, it gives you a /dev/ipnet/ipmp0, which is only available with the -I
> snoop option.  Further, the -I option requires an argument -- so this code
> applies only to snooping datalinks, not IP interfaces.

OK, that's the part I was missing.

It seems a _little_ lame, as the intentional functionality in snoop
was that if you run it without "-d", you get the "primary" interface
(whatever that means), and now you don't really get that.

But ok.

>  > >  > usr/src/lib/libsocket/inet/interface_id.c
>  > >  > 
>  > >  >   147: while here, please fix this; bump the count up and (if
>  > >  >   possible) add the redo logic.  I doubt anyone's going to touch this
>  > >  >   stuff again soon.
>  > > 
>  > > The function is junk, as is if_nameindex().  I'd want to rewrite them
>  > > rather than tweak them, but that's just too far out-of-scope for IPMP.
>  > > I'll file a CR if you'd like though.
>  > 
>  > I wasn't suggesting a rewrite, but rather a minor sprucing as long as
>  > you're in the neighborhood.  There's no reason these functions should
>  > be allowed to fail just because plumbing activity is going on at the
>  > same time the call is made.
> 
> I've tried three times now to go in and "spruce" it up, and I can't do it.
> It always ends up becoming a rewrite, and then I have to abandon ship.
> My offer to file a CR is still open :-)

Adding "+ 10" on line 147 so that numifs isn't so likely to be too
small leads to a rewrite?  I don't quite follow.  I agree that the
code needs more help than this ... but I don't see how the sweater
unravels from that one line.

>  > >  > usr/src/cmd/rcm_daemon/common/ip_rcm.c
>  > [...]
>  > >  >   63-70: negative logic is hard to read (and wtf?).
>  > > 
>  > > Not my code, but this is a clever trick to allow lint to do format string
>  > > checking (since otherwise the function call obscures the format string 
> for
>  > > lint).
>  > 
>  > If lint's argument checking doesn't work with the normal gettext
>  > stuff, then I think that's a bug.  We should have the necessary
>  > #define in a central place, not littered in source files.
> 
> I could file a CR to have something added to <libintl.h>, if you like.

Yes, please.  This just looks bogus to me.

>  > >  > +++ IPMP userland core
>  > 
>  > >  > usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_tables.c
>  > [...]
>  > >  >   1384-1387: nit(?): it looks like you can't detach an interface with
>  > >  >   a duplicate MAC address.  I guess that shouldn't happen, but it
>  > >  >   seems like an odd corner case.
>  > > 
>  > > True.  I'm not sure if it's worth worrying about this case, but I think 
> we
>  > > could handle it by adding an explicit "offline reason" argument to
>  > > phyint_offline(), and have phyint_offline() do the close even if
>  > > pi_hwaddrdup is set as long as the offline reason is not "hwdup".
>  > 
>  > If that did happen ("local-mac-address?=false", maybe), how would the
>  > user recover?
> 
> They'd need to either put it into another group or change the hardware
> address through ifconfig.

How could you put it into another group if you can't detach it?

I guess we'll need to document the "invent a temporary hardware
address" work-around somewhere.

>  > >  > usr/src/uts/common/inet/arp/arp.c
>  > 
>  > >  >   620: I don't think I understand this logic.  Why delete on
>  > >  >   ace_xmit_arl match?  Shouldn't that just cause us to spin the
>  > >  >   ar_ipmp_lookup_xmit_arl wheel instead to pick a new output?
>  > >  > 
>  > > 
>  > > Because if there's an ace_xmit_arl match in the non-IPMP case, then
>  > > ace_arl will also match (no change in behavior), and if there's an
>  > > ace_xmit_arl match in the IPMP case, then this is an intentional purge of
>  > > the ACE triggered by an unplumb of an IP interface in a group, which IP
>  > > controls.  For instance, if this is an ACE_F_PERMANENT ACE, then after 
> the
>  > > ar_ce_delete_per_arl(), IP will ask ARP to create a new ACE for that IP
>  > > address with a different ace_hw_addr.  There's no sense in having the old
>  > > ACE_F_PERMANENT hanging around with a NULL ace_xmit_arl and a stale
>  > > ace_hw_addr.
>  > 
>  > OK ... I guess what's unclear to me is when IP is in control of the
>  > interface selection and when the ARP module does it.  It seems like
>  > there are cases where the ARP module can do this on its own.
> 
> Basically, for ACE_F_PERMANENT entries, IP is in control of the creation
> and interface selection by passing AR_ENTRY_ADD messages that have a
> hardware address specified.

What has me confused here is the logic in ar_ipmp_lookup_ipmp_arl.  It
seems that the ARP module is trying to help out IP by trying several
different means to find a usable arl_t -- first MAC address, then IP
address, and then finally just "any," but that we do this only once.

Why is it ok for this function to choose a different arl_t than the
one specified by IP, but not ok for ARP to choose a different arl_t
when needed?

> I agree that there are some subtle cases -- e.g., cases where ARP can
> safely delete the ACE "knowing" that IP will recreate it.  I think we can
> make these interactions more obvious once IP and ARP are combined, but I
> was reluctant to expand the ARP/IP message-passing interface beyond what I
> needed to get the job done.

OK.

>  > >  >   654-686: if we just did 'ace->ace_arl = arl;', would that be enough?
>  > > 
>  > > I don't think it's appropriate to always keep these ACEs -- e.g., if it
>  > > was an ACE for a proxy ARP entry on the IPMP interface, we don't want 
> that
>  > > ACE to linger behind on the removed interface.  (Note that all we're 
> doing
>  > > here is ar_ce_delete(); the comments make the code seem more daunting.)
>  > 
>  > What bugs me is that we're removing entries (possibly "permanent"
>  > ones) that are on the upper-level IPMP interface, and we're removing
>  > them only because the currently selected lower-level interface went
>  > away.
> 
> But that went away because IP told us it was going away, and since IP
> knows what ACE_F_PERMANENT entries it put where, it's smart enough to know
> how to recreate them.  (This ties in with my "subtle cases" comment above.)

This doesn't have any exceptions in it so that it just deletes
ACE_F_PERMANENT or things controlled by IP.  It deletes everything,
including outstanding queries (and the queued packets on them), static
ARP entries set up by a user, entries trying to do DAD, and learned
addresses.

> Note the call to ipmp_illgrp_refresh_arpent(), which will walk through all
> of the proxy ARP entries associated with IPMP data addresses and will
> issue AR_ENTRY_ADD messages to ARP, asking those ACEs to be recreated on
> other active ARLs in the group (if no active ARLs exist, then the ACEs
> will be recreated once an interface becomes active via ipmp_ill_activate()
> -> ipmp_illgrp_refresh_arpent()).

OK ... the missing bit of logic, at least for proxy ARP, is in the
"IPMP kernel core" code that I didn't review.  ;-}

>  > This is a long-standing bug.  We shouldn't be receiving an ARP message
>  > on one interface and turning around to update the ARP cache for a
>  > completely different network!
> 
> Good point, and agreed.  Changed.  That said, I wonder if we'll see any
> problems with (offiically unsupported, but somewhat functional) configs
> where multiple IP interfaces are on the same network without being placed
> into an IPMP group.

If you're on the same network and not in an IPMP group, then you
should be receiving the same ARP message multiple times -- once on
each interface.  It should just work.

> The second one is required -- and is why we still need gettext() calls.
> But because we're using an .xcl file and "xgettext -a", we don't need to
> do (1), so we can do the gettext() calls right before calling fprintf(),
> rather than littering the call sites that pass in the strings.

OK; it's the use of fprintf() rather than die().  Seems obvious now.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

[clearview-dev] [networking-discuss] Clearview IPMP Phase 2 code review [due 10/24]

Reply via email to