> > http://cr.opensolaris.org/~meem/clearview-ipmp-cr/
Thanks again. As before, I've elided things that required no further
discussion. A webrev of the diffs is at:
http://cr.opensolaris.org/~meem/clearview-ipmp-cr-diff4
> +++ IPMP ARP support
>
> usr/src/uts/common/inet/arp.h
>
> 193: why +1?
The original (flawed) thinking was that an interface name could be up to
LIFNAMSIZ bytes without a NUL terminator. Fixed.
> usr/src/uts/common/inet/arp/arp.c
>
> 158-162: it might be simpler to use arl->arl_ipmp_arl = arl instead
> of NULL to signal "this is not in a group." (But perhaps not worth
> the effort at this point.)
I'd find that confusing; arl_ipmp_arl should always be an IPMP arl.
> 620: I don't think I understand this logic. Why delete on
> ace_xmit_arl match? Shouldn't that just cause us to spin the
> ar_ipmp_lookup_xmit_arl wheel instead to pick a new output?
>
Because if there's an ace_xmit_arl match in the non-IPMP case, then
ace_arl will also match (no change in behavior), and if there's an
ace_xmit_arl match in the IPMP case, then this is an intentional purge of
the ACE triggered by an unplumb of an IP interface in a group, which IP
controls. For instance, if this is an ACE_F_PERMANENT ACE, then after the
ar_ce_delete_per_arl(), IP will ask ARP to create a new ACE for that IP
address with a different ace_hw_addr. There's no sense in having the old
ACE_F_PERMANENT hanging around with a NULL ace_xmit_arl and a stale
ace_hw_addr.
> If ace_arl matches and ace_xmit_arl is non-NULL, then why not
> promote the latter by doing this?
>
> ace->ace_arl = ace->ace_xmit_arl;
> ace->ace_xmit_arl = NULL;
>
> That would effectively kick the ace back out of the group. You'd
> only need to delete the ace if both ace_arl and ace_xmit_arl fall to
> NULL.
We don't want to kick the ACE out of the group, we want to torch the ACE,
and let IP recreate it on another IP interface in the group if it wants.
Also, FWIW, most ACE consumers assume ace_xmit_arl is always non-NULL.
> The full code might look something like:
>
> if (ace->ace_xmit_arl == arl && (ace->ace_arl->arl_flags & ARL_F_IPMP))
> ace->ace_xmit_arl = ar_ipmp_lookup_xmit_arl(ace->ace_arl,
> ace->ace_hw_addr, ace->ace_hw_addr_length, NULL);
> if ((ace->ace_arl == arl && ace->ace_xmit_arl == arl) ||
> ace->ace_xmit_arl == NULL) {
> ace->ace_flags &= ~ACE_F_PERMANENT;
> ar_ce_delete(ace);
> } else if (ace->ace_arl == arl) {
> ace->ace_arl = ace->ace_xmit_arl;
> }
>
> 631: this could use a different name, as it's really the handler for
> an arl leaving the group.
Do you have something in mind?
> 640-645: the comment describes what the code does, but it's unclear
> why it does this. Shouldn't it be safe to leave all of these aces
> alone? They're already completely committed to the arl that's
> leaving the group. Why delete some of them?
I can't recall why it was written that way. I've changed the code to
leave them be; we'll see if anything happens :-)
> 654-686: if we just did 'ace->ace_arl = arl;', would that be enough?
I don't think it's appropriate to always keep these ACEs -- e.g., if it
was an ACE for a proxy ARP entry on the IPMP interface, we don't want that
ACE to linger behind on the removed interface. (Note that all we're doing
here is ar_ce_delete(); the comments make the code seem more daunting.)
> 870-872: under what conditions do we ever have to update an
> unresolved entry on some arl in a *different* group? It seems to me
> that the correct logic should be just this:
>
> If the resolving ARL is in the same group as a matching ACE's
> ARL, then update the entry. Otherwise, make no updates.
See my comment for 992.
> 944-955: this leaves an existing timing hole open. If we move an
> address from one arl to another, and we get one of our old ARP
> messages echoed back after the switch, we'll misidentify it as being
> a failure. It'd be a bit better if we could check whether the
> source hardware address matches any of ours and just ignore the
> message if so.
So you're suggesting:
if (ace->ace_flags & ACE_F_MYADDR) {
arl_t *hw_arl = as->as_arl_head;
arlphy_t *ap;
for (; hw_arl != NULL; hw_arl = hw_arl->arl_next) {
ap = hw_arl->arl_phy;
if (ap != NULL && ap->ap_hw_addrlen == hlen &&
bcmp(ap->ap_hw_addr, src_haddr, hlen) == 0)
return (AR_LOOPBACK);
}
}
... right? If not, please describe.
> 992: as above, I don't think that updating unresolved entries for
> other groups makes sense.
My rationale was to preserve parity with the non-IPMP case. That is, the
original code updated unresolved ACEs on different ARLs in the non-IPMP
case, and the equivalent behavior in the IPMP case is to update unresolved
ACEs on ARLs in different groups. Could you elaborate on how you see the
two cases as requiring different treatment?
> 1163-1176: the suggested changes to ar_ce_delete_per_arl would make
> this extra walk unnecessary.
Perhaps, but I think you're proposing a different breakdown for the way
ARP/IP manage ACEs (especially ACE_F_PERMANENT ones), and it's not clear
to me that there's an issue with the breakdown I've implemented.
> 2224: I think we assume elsewhere that the string must be less than
> LIFNAMSIZ, don't we? I'm not sure this is needed anyway, as the
> callee just uses this for strcmp, and that's bounded by the arl_t's
> actual name. We can't run off the end.
I've changed it to LIFNAMSIZ-1 to handle your arp.h comment. I prefer
explicitly NUL-terminating the string since ar_ipmp_lookup() shouldn't
know about the implementation of ar_ll_lookup_by_name().
> 3086: why did 'arl' get removed here? It seems like it's handy as
> we dereference ace->ace_arl multiple times.
My concern was that it'd be easy to use `arl' and not realize you were
using ace->ace_arl where you really wanted ace->ace_xmit_arl. I could
reintroduce it and name it ace_arl, but it was unclear to me that it
was useful to save five characters (ace->ace_arl vs. ace_arl).
> 3599: should this be client_arl or something else?
> 3601-3603: I don't see how this code is reachable.
These two issues (combined) are the result of a mismerge with 6691986;
thanks for catching this.
> +++ ipmpstat
>
> usr/src/cmd/cmd-inet/usr.sbin/ipmpstat/ipmpstat.c
>
> 150: should note here that these are bit flags, and not just
> options. (Otherwise, the explicit assignments look strange.)
Added (there used to be more output modes, so it was clearer).
> 194,1288-1304: it seems a shame that the libc version of this
> function isn't good enough.
Indeed, but it isn't (need gettext()).
> 551,733,747,761,778,793,797,802,817,821,826,840,855: should this
> return "?" in buf? (Unclear on '--' versus '?' usage.)
Good point. Basically, we want to print "?" when the value could not be
determined, and "--" when the value is simply not set. Added a new
sfunc_nvwarn() routine which properly sets `buf' to "?" on nvlist errors.
> 567: it doesn't seem like this 'active' buffer is needed. You could
> just write into 'buf' directly.
Yes, but I think the current logic is more symmetric/elegant as-written.
> 732,746,760,777,796,801,820,825,839,854: nvlist_* functions don't
> set errno, so this warn() call won't print the right error.
New sfunc_nvwarn() adds "\n" to warn() call so that it won't call
strerror().
> 793,817: no warning?
I've now split the two cases up (not having an IPMP_PROBE_STATE is an
error; not being in the IPMP_PROBE_ACKED state is not).
> 843,858: what if the value is zero?
If the probe never comes back, in.mpathd passes these values as zero --
but ipmpstat needs to print "--" rather than "0".
> 869-870: not sure what this means. (I think it refers to sysevents
> as 'GPECs' ... but I'm not sure what the comment is trying to say.)
General Purpose Event Channels -- PSARC/2002/321. I'll add the PSARC case
reference for clarity.
> 986: why do we display v4 targets if both are disabled? The comment
> above doesn't seem to explain what's going on here.
Comment has been expanded. Basically, we want to tell the admin that
the mode is "DISABLED" so we just pick one of the two.
> 1029: ofmt_head could be NULL here; does this need a die() check
> like line 1066?
How could it be NULL? It'd require the first field to have an f_width >
IPMPSTAT_NCOL, which would mean deeper problems are afoot.
> 1183: I think we're guaranteed at this point that valwidth < width.
> If it weren't, then line 1177 would have been true.
Changed.
> 1282: why both gettext here and a .xcl file?
Something has to translate "<field>" to its localized value. I could do
it as part of the gettext() on 1284, but that led to an ugly line wrap.
> usr/src/cmd/cmd-inet/usr.sbin/ipmpstat/ipmpstat.xcl
>
> General: not sure I'm wild about this sort of 'exclude' list. It'll
> get stale and grow weird. I know gettext() is irritating, but it
> seems a little less bad to me.
I've tried both approaches, and I prefer the exclude list. The worst that
happens with a stale exclude list is some string are translated and those
translations are never used. On the flipside, gettext() is a constant
eyesore and when you forget it (easy to do), translation doesn't happen.
> usr/src/lib/libipmp/common/ipmp_admin.c
>
> 67: nit: should probably set errno only if result.me_mpathd_error ==
> IPMP_FAILURE.
When would it matter? The value of errno should only be inspected if
me_mpathd_error is IPMP_FAILURE.
> usr/src/lib/libipmp/common/ipmp_query.c
>
> 172,173,199,200: is it possible for these to be non-NULL at this
> point?
I think they'd still contain the pointers from in.mpathd's address space,
since we don't bother clearing them out as part of sending them (e.g., in
send_groupinfo()). These are low-level internal utility routines that can
only be used by libipmp, so I think that's OK.
> 1076: may need a comment here that the lack of a
> ipmp_read*info_lists call in this case means that we don't have to
> bother with freeing anything.
Comment added.
> usr/src/lib/libipmp/common/ipmp_query.h
>
> 136,153,154: won't it_name just duplicate what's already in if_name?
Yes, but it makes the targinfo self-describing, which simplifies the code
in ipmpstat that uses it (otherwise, the sfunc_targ_*() functions would
instead need to take another structure type that had both if_name and a
targinfop so that sfunc_targ_ifname() could work).
--
meem