> > At long last, we're ready to start the Clearview IPMP code review.
 > > The review is divided into two phases:
 > > 
 > >     * Phase 1 covers the removal of the existing Nevada IPMP kernel
 > >       implementation.  We're starting this now.
 > > 
 > >     * Phase 2 covers the Clearview IPMP kernel implementation, plus all
 > >       userland changes.  This is the real meat of the project, and will
 > >       start in a couple of weeks.
 > 
 > Cool!  This is a great way to organize a review of a rewrite.  It
 > would have been completely illegible otherwise.  I hope others
 > (including myself ;-}) can learn from this.

Thanks -- and thanks for the thorough review :-)  I've elided parts where
you had no comments.  In general, a number of your comments covered
additional cleanup which I've mostly accepted; see below for more.

A webrev showing the diffs due to your requested changes is at:

        http://cr.opensolaris.org/~meem/clearview-noipmp-cr-diff1

Please let me know if you'd also like a new overall webrev.

 > >     http://cr.opensolaris.org/~meem/clearview-noipmp-cr/
 > 
 > General:
 > 
 >   We probably need a new ARC case to obsolete ipmp_hook_emulation and
 >   let customers know that this is going away.

The removal is covered in PSARC/2007/272, though it was officially
Uncommitted to start with.

 >   IPQoS is probably tougher.  if_groupname is part of the publicly
 >   documented configuration interface for it:
 > 
 >   http://docs.sun.com/app/docs/doc/816-4554/ipqos-reference-1?a=view

Well, I could preserve it if it's an issue, but I would tend to think we
could just notify folks that it's gone, as there appears to be no evidence
anyone has actually used it.  (In general, I would expect the set of IPQoS
selectors to be free to evolve from release to release.)  I will make sure
this issue is covered in the commitment for PSARC/2007/272.

 >   Why does SIOCSIPMPFAILBACK still exist in sys/sockio.h (and truss
 >   and in.mpathd)?  It's gone from the kernel, and I wouldn't expect
 >   the new IPMP to make use of it, so I don't quite see how it's
 >   useful.

In general, I left symbols used by userland around because I didn't want
to drag accompanying userland modifications into this wad (since in
general IPMP userland is not being entirely rewritten).  For instance,
removing SIOCSIPMPFAILBACK means in.mpathd no longer compiles.

 >   Can the ilm discrepancy (ipif for IPv4, but ill for IPv6) go away
 >   now?  I think this was done for IPMP.

No, it predates IPMP considerably and has to do with the userland API for
doing multicast joins.  That said, I hear Erik has cleaned this up in his
IP refactoring wad.

 > usr/src/uts/common/inet/ip.h
 > 
 >   758: this XXX comment appears related to the old IPMP code.

I'm not exactly sure what that comment means, but it was introduced with
IPv6, several years prior to IPMP.

 >   1218: note: will likely need to scan for contract holders on
 >   ipif_t.  (CGTP and Cluster seem possible here.)

I didn't find any contracts for any of the removed fields.  FWIW, we've
got the Clearview IPMP bits up and running on Sun Cluster, so I think
we're good there.

 >   1411: nit: while you're here, a translation of this to English would
 >   be helpful.

This whole comment has been extensively rewritten in the Clearview IPMP
wad; will that suffice? :-)

 >   1438,1456: as long as you're modifying this, changing it to a bona
 >   fide enum (rather than #define-like usage) would be nice.

Hmm, that becomes a bit of a merge mess since this field has moved to
ipx_waitfor in the Clearview IPMP bits (and thus the enum name and the
prefix name I'd add for the enum values would change as well).  So I'd
rather not do this at this point.

 >   1476: unreferenced; remove.

Fixed.

 >   2710: remove.

Unrelated to my changes, but removed.  (All three of these symbols
(MATCH_{V4,V6,ILL}_ONLY) were introduced as orphans with FireEngine.)

 > usr/src/uts/common/inet/ip/igmp.c
 > 
 >   1402-1403: I'm not sure I understand this comment.  This project
 >   makes the main reason to enter the IPSQ here historical (and thus
 >   irrelevant).  What things used by the timer are protected by the
 >   IPSQ?  Is it more than ilm_filter?

It's not entirely historical -- as you state, ilm_filter is used by
igmp_timeout_handler() -> igmp_timeout_handler_per_ill().  I didn't see
any other immediate issues, but also didn't think that reworking this to
try to avoid using the IPSQ was worthwhile given that Erik has an
extensive rework of IP multicast to not use the IPSQ in his wad (which
is built atop my changes).

 > usr/src/uts/common/inet/ip/ip.c
 > 
 >   7537: shouldn't this be ret_ill == NULL for the error case?
 >   Otherwise, this function always returns NULL and sometimes leaks a
 >   reference.

Indeed, this is a typo I introduced recently when cleaning more crud
out of that function.  Thanks for catching that!

 >   7851-7853: still references load spreading, but the code to do that
 >   load spreading (using ipif_rand()) is gone.

Fixed.

 >   7863-7867: as for IPv6, do we ever really need ire_ipif?

Yes, VNI uses it.  Moreover, it's part of the new Clearview IPMP design,
so removing it would introduce more code churn in the Clearview IPMP gate.

 >   7863: as for IPv6, RTF_MULTIRT code seems to have disappeared.

The code is still accounted for: all it did was skip IPMP load spreading,
which is now the common case.

 >   7869: how does this happen?

I've removed it, though it will be reintroduced in the Clearview IPMP gate.

 >   24927: this doesn't look right to me.  The original assert said
 >   "!attach_if || ill_index != 0".  Now, since attach if is going away,
 >   this means that attach_if is always false, and thus !attach_if
 >   should always be true.  If the original assert was right, that means
 >   we don't know anything about ill_index ... doesn't it?

True.  I misread the original ASSERT(), which I now see was trying to say
that if attach_if was being used, you had to specify an ill_index.  I've
removed the ASSERT().

 >   26310: ip_ioctl_finish?

No; that will be done after we're finally able to enter the IPSQ and
perform the operation.  This failure just means we couldn't enter the
IPSQ synchronously.

 > usr/src/uts/common/inet/ip/ip6.c
 > 
 >   101: duplicate of line 110.

Strange; must've crept in during a merge, as I didn't introduce any code
that depends on <sys/ethernet.h>.  Removed.

 >   1845-1847: it's unclear to me what the lock here is protecting,
 >   given that 'srcp' is later dereferenced without the lock held.

Good point; this lock was protecting illgrp_ill_count in the old code.
Removed.  Also removed ips_icmp_redirect_v6_src_index.

 >   4407: inbound load spreading?

Removed.

 >   4415-4418: looks like an old IPMP comment.

Removed.

 >   4420-4424: do we ever need to look at ire_ipif?  When?

As above.

 >   4420: CGTP RTF_MULTIRT code seems to have been nuked.

As above.

 >   4435-4446: I don't think I understand this code.  When is dst_ill !=
 >   ill?  What does it mean?  Does it signal a serious design flaw (in
 >   which case ip0dbg isn't strong enough; assert seems right)?  Or does
 >   it signal a configuration problem (in which case ip0dbg is way too
 >   strong)?
 > 
 >   At a guess, we shouldn't really be doing ip0dbg here.

I'm not sure either, but this isn't tied to the IPMP removal as we could
only get here before when dst_ill->ill_group == NULL.

 >   5479-5486: how can this code be reached?  Won't we panic on line
 >   5478 if dst_ill is NULL?  (I think this was left-over code for
 >   handling ip_newroute_get_dst_ill_v6 failure ... but that's not
 >   called anymore.)

Yes; removed.

 >   5941: this comment doesn't seem to match the code.

Unrelated to my changes, but I think the comment dates back to when this
codepath was only used by multicast, as opposed to being generally used to
send packets on a specific ipif.  It seems like a bug that we don't check
to see whether the packet is multicast.

 >   11398,11400: locks here are probably ineffective.

Agreed; removed.

 > usr/src/uts/common/inet/ip/ip6_if.c
 > 
 >   2347: nit: s/iset/set/

Fixed.

 >   2567: nit: would prefer != and remove 'continue'.

Changed.

 >   2770: not your code, but what's this (char *) cast about?  It
 >   doesn't seem to be needed.

Removed.

 >   3045,3047: these assignments don't do anything; the variables are
 >   used only in the 'bad' branch, and there are no more jumps there
 >   after line 3022.  (These were here because of the group-formation
 >   code, now removed.)

True; ire_added removed.  I'm reluctant to remove the clearing of
ip6_asp_table_held as it seems good to have it reflect the correct state
(that is, ip6_asp_table_held shouldn't be B_TRUE when we don't have a
hold, even if no one later in the code currently examines the field).

 >   3108: this can never be true.

Removed ire_added, as above.

 > usr/src/uts/common/inet/ip/ip6_ire.c
 > 
 >   535-537: is this still true?  Is it possible for ire_ipif->ipif_ill
 >   and ire_stq->q_ptr to go to different ill_t instances?  I thought
 >   that was the case only for IPMP.  (Maybe usesrc/vni is a factor here
 >   ...)

Yes, VNI, as you state.  Also, as I'll need to expand on the comment in
the Clearview IPMP bits, I'd prefer not to rework it more than necessary.

 >   1351-1352,1436-1437,1908-1909: odd that these design constraints
 >   aren't assertions.

Unrelated to my changes -- agreed, but not part of this wad.

 >   1749-1759: this comment is all wet.

Indeed.  Looking back at the history, it seems that prior to the original
IPMP implementation, it simply set MATCH_IRE_ILL.  Then, when IPMP was
introduced, MATCH_IRE_ILL_GROUP was introduced and they decided that it
was good to match across the group as long as the destination's off-link,
and ASSERT()'d this was the case.  Then someone tripped over the RTM_GET
case (4303907) and they added that awkward comment that's now sprinkled
all over the code.  With the removal of the old IPMP implementation, we're
back to just MATCH_IRE_ILL, and thus that comment can be removed and the
original logic restored.

As an aside, Clearview IPMP doesn't have MATCH_IRE_ILL_GROUP.  Instead, it
always matches across the group unless an IRE for test traffic is being
built (identified by MATCH_IRE_MARK_TESTHIDDEN, which is in turn set if
the source address of the packet is a test packet).

 >   2095-2096: should this be an assertion, or maybe just removed?  The
 >   deleted comment said that we should "never be here for onlink."  How
 >   do we get here in a case where we have to match the ill?
 > 
 >   The function is named "ire_ihandle_lookup_offlink_v6" ... shouldn't
 >   the ire be offlink and thus lack ire_ipif?

I'd expect offlink IREs to often have an ire_ipif -- e.g., an IRE_DEFAULT
for a route created with -ifp will have a matching ire_ipif->ipif_ill.

 >   2131-2133: nit: duplicated code; could perhaps do more simply with:
 > 
 >      match_flags &= ~MATCH_IRE_HANDLE & ~MATCH_IRE_MASK;

Unrelated to my changes, but I find the existing code clearer, especially
given the giant comment that separates the two initializations.

 > usr/src/uts/common/inet/ip/ip_ftable.c
 > 
 >   160-161: more odd things that might be assertions.

Unrelated to my changes -- agreed, but not part of this wad.

 >   290-298: comment no longer seems to have much to do with the
 >   adjacent code.  (This is another case where the previous comment
 >   said that we should *not* be doing MATCH_IRE_ILL ... but the new
 >   code uses that flag.  Is this right?)

As per my comments above, I think it's right.  I moved the existing
MATCH_IRE_COMPLETE comment to be closer to the approriate code.

 >   722-723: nit: this does nothing.

Unrelated to my changes, but fixed.

 > usr/src/uts/common/inet/ip/ip_if.c
 > 
 >   192: this doesn't exist.

Unrelated to my changes, but fixed.

 >   475: removing the "unused" IPMP flags means that they can't be
 >   reported in the ndd output.  It's unclear to me whether the flags
 >   are eventually going away completely (and thus the IFF_* ones should
 >   be scratched too) or if this is potentially broken.

They've all been added back in the Clearview IPMP gate.  I admit I wasn't
too uniform in my extermination of these flags, as I knew they'd be back.

 >   1496: nit: still-valid comment removed.

I couldn't decipher what its point was, or why it didn't apply to any of
the other similar cases.  Could you?

 >   1509-1513,1602-1604: is this still true?  It was true with IPMP, as
 >   the previous comment said, but what makes it still true after IPMP
 >   is gone?  (usesrc?)

Right (and Clearview IPMP adds the IPMP case back in, of course).

 >   1515-1517: a simpler check would be just for IRE_MARK_CONDEMNED.
 >   Note that ill_down already walks all the ire_ts before walking the
 >   conn_ts, so we're guaranteed here that if conn_ire_cache is still
 >   non-NULL, and it points to this ill_t, it'll already have been
 >   condemned by ill_downi().

Is there a chance conn_ire_cache() refers to an IRE that isn't in a list?
In that case ill_downi() -> ire_delete() will have done nothing.  I'm also
not sure conn_cleanup_ill() should rely on when it's sequenced relative
to ill_down().  Is the current code that offensive? ;-)

 >   Simpler still: we could just use conn_cleanup_stale_ire.

That internally grabs conn_lock, so we'd need to drop it first.  It also
seems a bit odd to directly invoke a walker callback.

 >   1586: second argument is void *, not char *.

Unrelated to my changes, but fixed.

 >   3875-3876: comment is still stale; there's no "for each" here.

Fixed.

 >   4751-4757: why is this a loop over all phyints?  This should just be
 >   a simple AVL lookup by index now, shouldn't it?

Nice find.  Fixed.

 >   4971-4975: this is "lo0" that we're plumbing here ... how can its
 >   ipsq be merged with anything else?  Isn't there at most one phyint
 >   for lo0?  (Or am I misunderstanding how v4/v6 now works ... ?)

It works just like any other phyint/ill pair, in that the IPv4 phyint/ill
and IPv6 phyint/ill start out separate, and then are merged into a single
phyint with two ills.  When that happens, the IPSQ from the merged-out
phyint needs to be deleted.

 >   13516: "some assembly required"

:-) It seemed less intrusive for the follow-up wad to leave the stubs in
place.

 >   14278: where are the others?  Wasn't ill_move the only culprit?

Fixed. (There's a new one with Clearview IPMP, so I had cheated and
recycled the comment.  But I'll just add the comment in that wad).

 >   15476-15479: no longer relevant; there are no more jumps to 'bad'
 >   past this point.

Fixed.

 >   15489: this assignment doesn't do anything; the flag isn't used
 >   until 'bad', and the last jump to that branch was at line 15469.

Fixed.

 >   15584: this can't be true due to the group formation removal.

Fixed.

 >   15833: small nit: should exclude any IPIF_DEPRECATED interfaces from
 >   ipif_allzones.  They'll be picked up at line 15842 if they're usable
 >   at all, and we should stick with non-deprecated for the allzones
 >   choosing logic.

Unrelated to my changes, but OK.

 >   15854: this doesn't necessarily look good to me, though I might be
 >   missing something.  At one point, we had code here that
 >   intentionally used ipif_rand() to choose among a set of equally
 >   viable ipif_t entries for a source address.  Now we don't, even if
 >   the user creates many of them.

It's true that we used to this even in the non-IPMP case, but it's unclear
what the value was given that all the IP addresses will map to the same
hardware address.  Indeed, S8 FCS and earlier (pre-IPMP and of course
before numerous other bits of complexity tied to address selection, like
zones) just picked the first usable non-deprecated ipif, and for IPv6 I
recall that Erik had concerns about randomizing source address selection
when IPMP was not in-use (though I can't recall the issues).

 >   It looks to me like the assumption here is that the _only_ reason
 >   you'd ever create multiple ipifs on the same subnet is to run IPMP.
 >   Do we know that this is true ... ?

I'm not sure I follow.  Surely, separately from IPMP it can be useful to
have multiple IP addresses on the same subnet and same ill (e.g.,
shared-stack zones).

 > usr/src/uts/common/inet/ip/ip_ire.c
 > 
 >   2058: nit: cast not needed.

Unrelated to my changes, but removed.

 >   4123-4124: similar questions about MATCH_IRE_ILL as with v6;
 >   original comment seemed to say that this couldn't happen.

Same answer as above.

 >   5520-5521: do any of these 'in-lines' still exist?  If not, then
 >   does this need to be done here?

None of the inlines still exist, but I'm not sure what else may have built
assumptions around ire_nce being assigned here, so I'm reluctant to change
it given that there's no apparent harm from doing the initialization here.

 > usr/src/uts/common/inet/ip/ip_ndp.c
 > 
 >   570,586: not your code, but cast not needed.

Fixed.

 >   1454: ipif_state_flags, right?  (Are you fixing this with your wad?)

I wasn't, but I am now :-)  (I'll leave the RE for 6738310 clear for now in
case someone else beats me to it.)

 >   2041-2047,2087: remove this junk; it was part of the group-handling
 >   loop that used to be here.  The passed-in 'ill' pointer is ref-held
 >   by the caller if needed, so none of this code is now necessary.

Good catch; done.  (The check for ill != NULL is also spurious and saves a
level of indenting.)

 >   2755-2770: this code doesn't seem to do much.  Maybe this whole
 >   function can go away.

Unfortunately, these corner cases have to be added back in with Clearview
IPMP, so I'd prefer to leave it there to reduce code churn.

 > usr/src/uts/common/inet/ip_if.h
 > 
 >   100-103,129: unused; could be removed.  IPMP flags are not examined
 >   within the kernel on phyint or ipif anymore.  (I don't much care if
 >   they remain, though it might be nice to have some way of making sure
 >   we don't accidentally start using them again.)

Clearview IPMP reuses those flags, so I've left them behind.

 >   135-139: looks like this can become a boolean now.

Clearview IPMP adds new values (also, I find the enum more readable than a
B_FALSE/B_TRUE argument).

 > usr/src/uts/common/inet/ip_impl.h
 > 
 >   397: nit: while you're here, s/Definitons/Definitions/

Fixed.

 > usr/src/uts/common/inet/ip_ire.h
 > 
 >   97: nit: unused value is 0x0800, not 0x1000.

Fixed.

 > usr/src/uts/common/inet/ip_ndp.h
 > 
 >   144: NCE_F_UNSOL_ADV doesn't seem to be used anymore.

Clearview IPMP reuses the flag, so I've left it behind.

 > usr/src/uts/common/inet/ip_stack.h
 > 
 >   This change will break 'pidentd' and probably other open source
 >   tools that grovel in kernel structures.  We'll probably need to
 >   contact the authors so they know what to do.

The same comment applies to any shifting around of the ip_stack_t, right?
I'm not sure there's much we can do to help them, as we frequently change
the contents of this, but I'll put it on my list to contact them.

 > usr/src/uts/common/inet/tcp/tcp.c
 > 
 >   4975-4978: entire comment is probably uninteresting now.

Removed.

 >   19388: doesn't this leak a reference to conn_outgoing_ill?  The
 >   conn_set_outgoing_ill() function returns a ref-held ill_t there, but
 >   the variable at 19386 seems to go out of scope without the reference
 >   being dropped.

This bug is unrelated to the IPMP removal, but agreed.  Looking further,
there seem to be additional problems here: (a) conn_set_outgoing_ill()
expects a broadcast IRE, but TCP may not give it one, (b) without IPMP
(and even with Clearview IPMP) broadcast IREs will have the right ill
without any of this and (c) the whole logic of conn_set_outgoing_ill() is
needless without IPMP (it's just a wrapper around conn_get_held_ill()).
As such, I've removed conn_set_outgoing_ill() and rewritten the former
call sites.

 > usr/src/uts/common/ipp/ipgpc/classifierddi.c
 > 
 >   455: nit: s/fields/field/

Fixed.

 >   458: nit: s/and if_group//

Fixed (and just when I'd thought I'd caught every misspelling ;-)

-- 
meem

Reply via email to