Peter Memishian writes:
> > > > +++ IPMP-aware userland changes
> > > >
> > > > usr/src/cmd/cmd-inet/sbin/dhcpagent/bound.c
> > > >
> > > > 360: what happens if the IPMP test addresses handed out via DHCP are
> > > > on a separate subnet and thus need a separate set of probe targets
> > > > (i.e., "routers")? One of the weird IPMP things we still do is that
> > > > we use the gateway addresses as probe targets, and this code will
> > > > limit the source of that information.
> > >
> > > In that case, the administrator is expected to explicitly configure any
> > > probe targets -- e.g., by using dhcpinfo. As you know, I regard the fact
> > > that the DHCP client directly configures routes at all as a bug.
> >
> > It's an odd bit of functionality, then. Maybe the right question for
> > me to ask is, "what would happen if you allowed pif_under_ipmp
> > interfaces to use default routes like the others?"
> >
> > Would it fail?
>
> The kernel will convert the route add request so that the requested
> interface is its associated IPMP interface, and that route would likely
> already exist, and thus fail with EEXIST.
As I noted above, I'm interested in the case where the IPMP interface
is intentionally on a different subnet. That's not likely to lead to
EEXIST.
I agree that there's a bunch of unresolved brokenness underneath here,
though.
> > Doesn't the Clearview observability node give you "/dev/net/ipmp0" to
> > snoop on, even if IPMP itself doesn't have a "datalink?"
>
> No, it gives you a /dev/ipnet/ipmp0, which is only available with the -I
> snoop option. Further, the -I option requires an argument -- so this code
> applies only to snooping datalinks, not IP interfaces.
OK, that's the part I was missing.
It seems a _little_ lame, as the intentional functionality in snoop
was that if you run it without "-d", you get the "primary" interface
(whatever that means), and now you don't really get that.
But ok.
> > > > usr/src/lib/libsocket/inet/interface_id.c
> > > >
> > > > 147: while here, please fix this; bump the count up and (if
> > > > possible) add the redo logic. I doubt anyone's going to touch this
> > > > stuff again soon.
> > >
> > > The function is junk, as is if_nameindex(). I'd want to rewrite them
> > > rather than tweak them, but that's just too far out-of-scope for IPMP.
> > > I'll file a CR if you'd like though.
> >
> > I wasn't suggesting a rewrite, but rather a minor sprucing as long as
> > you're in the neighborhood. There's no reason these functions should
> > be allowed to fail just because plumbing activity is going on at the
> > same time the call is made.
>
> I've tried three times now to go in and "spruce" it up, and I can't do it.
> It always ends up becoming a rewrite, and then I have to abandon ship.
> My offer to file a CR is still open :-)
Adding "+ 10" on line 147 so that numifs isn't so likely to be too
small leads to a rewrite? I don't quite follow. I agree that the
code needs more help than this ... but I don't see how the sweater
unravels from that one line.
> > > > usr/src/cmd/rcm_daemon/common/ip_rcm.c
> > [...]
> > > > 63-70: negative logic is hard to read (and wtf?).
> > >
> > > Not my code, but this is a clever trick to allow lint to do format string
> > > checking (since otherwise the function call obscures the format string
> for
> > > lint).
> >
> > If lint's argument checking doesn't work with the normal gettext
> > stuff, then I think that's a bug. We should have the necessary
> > #define in a central place, not littered in source files.
>
> I could file a CR to have something added to <libintl.h>, if you like.
Yes, please. This just looks bogus to me.
> > > > +++ IPMP userland core
> >
> > > > usr/src/cmd/cmd-inet/usr.lib/in.mpathd/mpd_tables.c
> > [...]
> > > > 1384-1387: nit(?): it looks like you can't detach an interface with
> > > > a duplicate MAC address. I guess that shouldn't happen, but it
> > > > seems like an odd corner case.
> > >
> > > True. I'm not sure if it's worth worrying about this case, but I think
> we
> > > could handle it by adding an explicit "offline reason" argument to
> > > phyint_offline(), and have phyint_offline() do the close even if
> > > pi_hwaddrdup is set as long as the offline reason is not "hwdup".
> >
> > If that did happen ("local-mac-address?=false", maybe), how would the
> > user recover?
>
> They'd need to either put it into another group or change the hardware
> address through ifconfig.
How could you put it into another group if you can't detach it?
I guess we'll need to document the "invent a temporary hardware
address" work-around somewhere.
> > > > usr/src/uts/common/inet/arp/arp.c
> >
> > > > 620: I don't think I understand this logic. Why delete on
> > > > ace_xmit_arl match? Shouldn't that just cause us to spin the
> > > > ar_ipmp_lookup_xmit_arl wheel instead to pick a new output?
> > > >
> > >
> > > Because if there's an ace_xmit_arl match in the non-IPMP case, then
> > > ace_arl will also match (no change in behavior), and if there's an
> > > ace_xmit_arl match in the IPMP case, then this is an intentional purge of
> > > the ACE triggered by an unplumb of an IP interface in a group, which IP
> > > controls. For instance, if this is an ACE_F_PERMANENT ACE, then after
> the
> > > ar_ce_delete_per_arl(), IP will ask ARP to create a new ACE for that IP
> > > address with a different ace_hw_addr. There's no sense in having the old
> > > ACE_F_PERMANENT hanging around with a NULL ace_xmit_arl and a stale
> > > ace_hw_addr.
> >
> > OK ... I guess what's unclear to me is when IP is in control of the
> > interface selection and when the ARP module does it. It seems like
> > there are cases where the ARP module can do this on its own.
>
> Basically, for ACE_F_PERMANENT entries, IP is in control of the creation
> and interface selection by passing AR_ENTRY_ADD messages that have a
> hardware address specified.
What has me confused here is the logic in ar_ipmp_lookup_ipmp_arl. It
seems that the ARP module is trying to help out IP by trying several
different means to find a usable arl_t -- first MAC address, then IP
address, and then finally just "any," but that we do this only once.
Why is it ok for this function to choose a different arl_t than the
one specified by IP, but not ok for ARP to choose a different arl_t
when needed?
> I agree that there are some subtle cases -- e.g., cases where ARP can
> safely delete the ACE "knowing" that IP will recreate it. I think we can
> make these interactions more obvious once IP and ARP are combined, but I
> was reluctant to expand the ARP/IP message-passing interface beyond what I
> needed to get the job done.
OK.
> > > > 654-686: if we just did 'ace->ace_arl = arl;', would that be enough?
> > >
> > > I don't think it's appropriate to always keep these ACEs -- e.g., if it
> > > was an ACE for a proxy ARP entry on the IPMP interface, we don't want
> that
> > > ACE to linger behind on the removed interface. (Note that all we're
> doing
> > > here is ar_ce_delete(); the comments make the code seem more daunting.)
> >
> > What bugs me is that we're removing entries (possibly "permanent"
> > ones) that are on the upper-level IPMP interface, and we're removing
> > them only because the currently selected lower-level interface went
> > away.
>
> But that went away because IP told us it was going away, and since IP
> knows what ACE_F_PERMANENT entries it put where, it's smart enough to know
> how to recreate them. (This ties in with my "subtle cases" comment above.)
This doesn't have any exceptions in it so that it just deletes
ACE_F_PERMANENT or things controlled by IP. It deletes everything,
including outstanding queries (and the queued packets on them), static
ARP entries set up by a user, entries trying to do DAD, and learned
addresses.
> Note the call to ipmp_illgrp_refresh_arpent(), which will walk through all
> of the proxy ARP entries associated with IPMP data addresses and will
> issue AR_ENTRY_ADD messages to ARP, asking those ACEs to be recreated on
> other active ARLs in the group (if no active ARLs exist, then the ACEs
> will be recreated once an interface becomes active via ipmp_ill_activate()
> -> ipmp_illgrp_refresh_arpent()).
OK ... the missing bit of logic, at least for proxy ARP, is in the
"IPMP kernel core" code that I didn't review. ;-}
> > This is a long-standing bug. We shouldn't be receiving an ARP message
> > on one interface and turning around to update the ARP cache for a
> > completely different network!
>
> Good point, and agreed. Changed. That said, I wonder if we'll see any
> problems with (offiically unsupported, but somewhat functional) configs
> where multiple IP interfaces are on the same network without being placed
> into an IPMP group.
If you're on the same network and not in an IPMP group, then you
should be receiving the same ARP message multiple times -- once on
each interface. It should just work.
> The second one is required -- and is why we still need gettext() calls.
> But because we're using an .xcl file and "xgettext -a", we don't need to
> do (1), so we can do the gettext() calls right before calling fprintf(),
> rather than littering the call sites that pass in the strings.
OK; it's the use of fprintf() rather than die(). Seems obvious now.
--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677