Peter Memishian writes:
> A webrev showing the diffs due to your requested changes is at:
> 
>       http://cr.opensolaris.org/~meem/clearview-noipmp-cr-diff1

The changes look good.

> Please let me know if you'd also like a new overall webrev.

No, I have plenty of work.  Thanks for the offer, though.  ;-}

>  >   We probably need a new ARC case to obsolete ipmp_hook_emulation and
>  >   let customers know that this is going away.
> 
> The removal is covered in PSARC/2007/272, though it was officially
> Uncommitted to start with.

OK.

>  >   IPQoS is probably tougher.  if_groupname is part of the publicly
>  >   documented configuration interface for it:
>  > 
>  >   http://docs.sun.com/app/docs/doc/816-4554/ipqos-reference-1?a=view
> 
> Well, I could preserve it if it's an issue, but I would tend to think we
> could just notify folks that it's gone, as there appears to be no evidence
> anyone has actually used it.  (In general, I would expect the set of IPQoS
> selectors to be free to evolve from release to release.)

In the possibly unusual case that someone uses both IPMP and IPQoS, I
wouldn't expect that previously-valid configuration files would
suddenly become invalid and fail to load on an upgrade.  Having new
selectors and features show up is unsurprising from a user's point of
view, but having things break is not expected.

I agree that IPQoS is and was a questionable feature at best with
dubious design and feature set, and I'm as anxious as any to have it
removed (I obviously didn't want it there at all in the first place),
but I think the administrative interface piece needs to be handled in
the same way as any other feature removal.

>  I will make sure
> this issue is covered in the commitment for PSARC/2007/272.

OK.

>  >   Why does SIOCSIPMPFAILBACK still exist in sys/sockio.h (and truss
>  >   and in.mpathd)?  It's gone from the kernel, and I wouldn't expect
>  >   the new IPMP to make use of it, so I don't quite see how it's
>  >   useful.
> 
> In general, I left symbols used by userland around because I didn't want
> to drag accompanying userland modifications into this wad (since in
> general IPMP userland is not being entirely rewritten).  For instance,
> removing SIOCSIPMPFAILBACK means in.mpathd no longer compiles.

OK; I see.

I think it might have been been better to include some of the user
space removals as well.  I assume that the above ioctl does in fact go
away with the final version of code, right?

>  >   Can the ilm discrepancy (ipif for IPv4, but ill for IPv6) go away
>  >   now?  I think this was done for IPMP.
> 
> No, it predates IPMP considerably and has to do with the userland API for
> doing multicast joins.  That said, I hear Erik has cleaned this up in his
> IP refactoring wad.

OK.

>  > usr/src/uts/common/inet/ip.h
>  > 
>  >   758: this XXX comment appears related to the old IPMP code.
> 
> I'm not exactly sure what that comment means, but it was introduced with
> IPv6, several years prior to IPMP.

Hmm.  OK.  It looked to me like it was related to the ifindex logic
that you just removed from the multicast code.  If not, then that's
fine, though I think the comment is probably a bit terse.

>  >   1218: note: will likely need to scan for contract holders on
>  >   ipif_t.  (CGTP and Cluster seem possible here.)
> 
> I didn't find any contracts for any of the removed fields.

It's not just the fields themselves, but the offset of any fields that
follow.  There are contracts that include access to ipif_t -- see, for
example, 'contract-cgtp' in PSARC 2000/539.

There well may be no issue here, but we're supposed to notify.

>  FWIW, we've
> got the Clearview IPMP bits up and running on Sun Cluster, so I think
> we're good there.

Good to hear.

>  >   1411: nit: while you're here, a translation of this to English would
>  >   be helpful.
> 
> This whole comment has been extensively rewritten in the Clearview IPMP
> wad; will that suffice? :-)

Yes.  ;-}

>  >   1438,1456: as long as you're modifying this, changing it to a bona
>  >   fide enum (rather than #define-like usage) would be nice.
> 
> Hmm, that becomes a bit of a merge mess since this field has moved to
> ipx_waitfor in the Clearview IPMP bits (and thus the enum name and the
> prefix name I'd add for the enum values would change as well).  So I'd
> rather not do this at this point.

OK.

>  >   2710: remove.
> 
> Unrelated to my changes, but removed.  (All three of these symbols
> (MATCH_{V4,V6,ILL}_ONLY) were introduced as orphans with FireEngine.)

Cool.  ;-}

>  > usr/src/uts/common/inet/ip/igmp.c
>  > 
>  >   1402-1403: I'm not sure I understand this comment.  This project
>  >   makes the main reason to enter the IPSQ here historical (and thus
>  >   irrelevant).  What things used by the timer are protected by the
>  >   IPSQ?  Is it more than ilm_filter?
> 
> It's not entirely historical -- as you state, ilm_filter is used by
> igmp_timeout_handler() -> igmp_timeout_handler_per_ill().  I didn't see
> any other immediate issues, but also didn't think that reworking this to
> try to avoid using the IPSQ was worthwhile given that Erik has an
> extensive rework of IP multicast to not use the IPSQ in his wad (which
> is built atop my changes).

Oh, ok.  I didn't know about that.

>  > usr/src/uts/common/inet/ip/ip.c
[...]
>  >   7863-7867: as for IPv6, do we ever really need ire_ipif?
> 
> Yes, VNI uses it.  Moreover, it's part of the new Clearview IPMP design,
> so removing it would introduce more code churn in the Clearview IPMP gate.

OK ... I guess that's one of the few hassles with reviewing a separate
removal and then rewrite.

>  >   7863: as for IPv6, RTF_MULTIRT code seems to have disappeared.
> 
> The code is still accounted for: all it did was skip IPMP load spreading,
> which is now the common case.

Oh, ok.  I see it now; thanks.

>  >   24927: this doesn't look right to me.  The original assert said
>  >   "!attach_if || ill_index != 0".  Now, since attach if is going away,
>  >   this means that attach_if is always false, and thus !attach_if
>  >   should always be true.  If the original assert was right, that means
>  >   we don't know anything about ill_index ... doesn't it?
> 
> True.  I misread the original ASSERT(), which I now see was trying to say
> that if attach_if was being used, you had to specify an ill_index.  I've
> removed the ASSERT().

OK.

>  >   26310: ip_ioctl_finish?
> 
> No; that will be done after we're finally able to enter the IPSQ and
> perform the operation.  This failure just means we couldn't enter the
> IPSQ synchronously.

Right; I missed the queuing here.

>  > usr/src/uts/common/inet/ip/ip6.c
[...]
>  >   4435-4446: I don't think I understand this code.  When is dst_ill !=
>  >   ill?  What does it mean?  Does it signal a serious design flaw (in
>  >   which case ip0dbg isn't strong enough; assert seems right)?  Or does
>  >   it signal a configuration problem (in which case ip0dbg is way too
>  >   strong)?
>  > 
>  >   At a guess, we shouldn't really be doing ip0dbg here.
> 
> I'm not sure either, but this isn't tied to the IPMP removal as we could
> only get here before when dst_ill->ill_group == NULL.

I'm still not terribly happy with log-spamming ip0dbg here, but ok.
The reason I noted it was that having two ills 'unexpectedly' is sort
of the hallmark of IPMP, so this looked like it may have been related
code (even if it was in the ill_group NULL case).

The functionality just isn't clear to me.

>  >   5941: this comment doesn't seem to match the code.
> 
> Unrelated to my changes, but I think the comment dates back to when this
> codepath was only used by multicast, as opposed to being generally used to
> send packets on a specific ipif.  It seems like a bug that we don't check
> to see whether the packet is multicast.

OK.

>  > usr/src/uts/common/inet/ip/ip6_if.c
[...]
>  >   3045,3047: these assignments don't do anything; the variables are
>  >   used only in the 'bad' branch, and there are no more jumps there
>  >   after line 3022.  (These were here because of the group-formation
>  >   code, now removed.)
> 
> True; ire_added removed.  I'm reluctant to remove the clearing of
> ip6_asp_table_held as it seems good to have it reflect the correct state
> (that is, ip6_asp_table_held shouldn't be B_TRUE when we don't have a
> hold, even if no one later in the code currently examines the field).

OK; I guess I don't feel strongly about that one.

>  > usr/src/uts/common/inet/ip/ip6_ire.c
>  > 
>  >   535-537: is this still true?  Is it possible for ire_ipif->ipif_ill
>  >   and ire_stq->q_ptr to go to different ill_t instances?  I thought
>  >   that was the case only for IPMP.  (Maybe usesrc/vni is a factor here
>  >   ...)
> 
> Yes, VNI, as you state.  Also, as I'll need to expand on the comment in
> the Clearview IPMP bits, I'd prefer not to rework it more than necessary.

OK.

>  >   1749-1759: this comment is all wet.
> 
> Indeed.  Looking back at the history, it seems that prior to the original
> IPMP implementation, it simply set MATCH_IRE_ILL.  Then, when IPMP was
> introduced, MATCH_IRE_ILL_GROUP was introduced and they decided that it
> was good to match across the group as long as the destination's off-link,
> and ASSERT()'d this was the case.  Then someone tripped over the RTM_GET
> case (4303907) and they added that awkward comment that's now sprinkled
> all over the code.  With the removal of the old IPMP implementation, we're
> back to just MATCH_IRE_ILL, and thus that comment can be removed and the
> original logic restored.

OK; sounds good.

> As an aside, Clearview IPMP doesn't have MATCH_IRE_ILL_GROUP.  Instead, it
> always matches across the group unless an IRE for test traffic is being
> built (identified by MATCH_IRE_MARK_TESTHIDDEN, which is in turn set if
> the source address of the packet is a test packet).

OK ... I'll be interested to see this.

>  >   2095-2096: should this be an assertion, or maybe just removed?  The
>  >   deleted comment said that we should "never be here for onlink."  How
>  >   do we get here in a case where we have to match the ill?
>  > 
>  >   The function is named "ire_ihandle_lookup_offlink_v6" ... shouldn't
>  >   the ire be offlink and thus lack ire_ipif?
> 
> I'd expect offlink IREs to often have an ire_ipif -- e.g., an IRE_DEFAULT
> for a route created with -ifp will have a matching ire_ipif->ipif_ill.

I guess the basic question here is: why do we need to test ire_ipif?
It looks like we're painting around the assertions that the subsequent
lookup ends up making (if you set MATCH_IRE_ILL, you have to pass in a
non-NULL ill_t pointer), but unless these two cases (with and without
ire_ipif set) actually _mean_ something, it just looks sloppy to me.

Why bother asserting inside the lookup functions that ill_t isn't NULL
if the callers just paint around that check by turning off the flag
when it is?

>  >   2131-2133: nit: duplicated code; could perhaps do more simply with:
>  > 
>  >    match_flags &= ~MATCH_IRE_HANDLE & ~MATCH_IRE_MASK;
> 
> Unrelated to my changes, but I find the existing code clearer, especially
> given the giant comment that separates the two initializations.

OK.

>  > usr/src/uts/common/inet/ip/ip_ftable.c
>  > 
>  >   160-161: more odd things that might be assertions.
> 
> Unrelated to my changes -- agreed, but not part of this wad.

OK.

>  >   290-298: comment no longer seems to have much to do with the
>  >   adjacent code.  (This is another case where the previous comment
>  >   said that we should *not* be doing MATCH_IRE_ILL ... but the new
>  >   code uses that flag.  Is this right?)
> 
> As per my comments above, I think it's right.  I moved the existing
> MATCH_IRE_COMPLETE comment to be closer to the approriate code.

OK; thanks.

>  > usr/src/uts/common/inet/ip/ip_if.c
[...]
>  >   475: removing the "unused" IPMP flags means that they can't be
>  >   reported in the ndd output.  It's unclear to me whether the flags
>  >   are eventually going away completely (and thus the IFF_* ones should
>  >   be scratched too) or if this is potentially broken.
> 
> They've all been added back in the Clearview IPMP gate.  I admit I wasn't
> too uniform in my extermination of these flags, as I knew they'd be back.

OK.

>  >   1496: nit: still-valid comment removed.
> 
> I couldn't decipher what its point was, or why it didn't apply to any of
> the other similar cases.  Could you?

It's saying that we're going to force the socket to fall back to "late
binding" -- picking an output ill_t when the user transmits -- even
though the user explicitly specified a particular interface to use.
It's a silent failure case.

I guess I don't care very much, but I was a little surprised to see
the comment go.

>  >   1515-1517: a simpler check would be just for IRE_MARK_CONDEMNED.
>  >   Note that ill_down already walks all the ire_ts before walking the
>  >   conn_ts, so we're guaranteed here that if conn_ire_cache is still
>  >   non-NULL, and it points to this ill_t, it'll already have been
>  >   condemned by ill_downi().
> 
> Is there a chance conn_ire_cache() refers to an IRE that isn't in a list?

That'd stink.  ;-}

If there are ire_ts that aren't in lists that still refer to a given
ill_t that's being nuked, then how do we know that this is the *only*
place that can point to that ire_t?  How would we know that we're not
missing other non-list ire_ts?

And how'd such an ire_t ever get here?  The assignments to
conn_ire_cache (there are 5 of them) are _all_ using an 'ire' pointer
returned from ire_cache_lookup_v6(), ire_cache_lookup() or
ire_ctable_lookup_v6().  We go out of our way in tcp_adapt_ire() to
make sure that we never place a stray ire_t in conn_ire_cache (see the
"ire_cacheable" flag).

Grumble ... ok ... if you want to, then leave it.  But I still think
this can be safely simplified.

> In that case ill_downi() -> ire_delete() will have done nothing.  I'm also
> not sure conn_cleanup_ill() should rely on when it's sequenced relative
> to ill_down().  Is the current code that offensive? ;-)

No, not really.  It's just unclear why it needs to work like this.

>  >   Simpler still: we could just use conn_cleanup_stale_ire.
> 
> That internally grabs conn_lock, so we'd need to drop it first.  It also
> seems a bit odd to directly invoke a walker callback.

True; ok.

>  >   4971-4975: this is "lo0" that we're plumbing here ... how can its
>  >   ipsq be merged with anything else?  Isn't there at most one phyint
>  >   for lo0?  (Or am I misunderstanding how v4/v6 now works ... ?)
> 
> It works just like any other phyint/ill pair, in that the IPv4 phyint/ill
> and IPv6 phyint/ill start out separate, and then are merged into a single
> phyint with two ills.  When that happens, the IPSQ from the merged-out
> phyint needs to be deleted.

Oh, ok.  Yes, I guess that makes sense to me now.

>  >   13516: "some assembly required"
> 
> :-) It seemed less intrusive for the follow-up wad to leave the stubs in
> place.

OK.

>  >   15854: this doesn't necessarily look good to me, though I might be
>  >   missing something.  At one point, we had code here that
>  >   intentionally used ipif_rand() to choose among a set of equally
>  >   viable ipif_t entries for a source address.  Now we don't, even if
>  >   the user creates many of them.
> 
> It's true that we used to this even in the non-IPMP case, but it's unclear
> what the value was given that all the IP addresses will map to the same
> hardware address.

True; I keep forgetting that we're losing the ill groups.

>  Indeed, S8 FCS and earlier (pre-IPMP and of course
> before numerous other bits of complexity tied to address selection, like
> zones) just picked the first usable non-deprecated ipif, and for IPv6 I
> recall that Erik had concerns about randomizing source address selection
> when IPMP was not in-use (though I can't recall the issues).

OK.  I guess it doesn't matter at this level.

>  >   It looks to me like the assumption here is that the _only_ reason
>  >   you'd ever create multiple ipifs on the same subnet is to run IPMP.
>  >   Do we know that this is true ... ?
> 
> I'm not sure I follow.  Surely, separately from IPMP it can be useful to
> have multiple IP addresses on the same subnet and same ill (e.g.,
> shared-stack zones).

I'm not talking about cases like that, because we'd filter out on
zoneid first.  I was talking only about the "regular" case -- no IPMP,
no Zones, nothing special like that; just two interfaces configured
with the same subnet.

The new code seems to assume that either this configuration is never
used or that if it is used, then the first ipif_t we encounter is
always sufficient.

>  > usr/src/uts/common/inet/ip/ip_ire.c
[...]
>  >   5520-5521: do any of these 'in-lines' still exist?  If not, then
>  >   does this need to be done here?
> 
> None of the inlines still exist, but I'm not sure what else may have built
> assumptions around ire_nce being assigned here, so I'm reluctant to change
> it given that there's no apparent harm from doing the initialization here.

OK.  Then it's just a comment on the comment.

>  > usr/src/uts/common/inet/ip/ip_ndp.c
[...]
>  >   1454: ipif_state_flags, right?  (Are you fixing this with your wad?)
> 
> I wasn't, but I am now :-)  (I'll leave the RE for 6738310 clear for now in
> case someone else beats me to it.)

OK.

>  >   2755-2770: this code doesn't seem to do much.  Maybe this whole
>  >   function can go away.
> 
> Unfortunately, these corner cases have to be added back in with Clearview
> IPMP, so I'd prefer to leave it there to reduce code churn.

Really?  OK.  The only thing I see here is ip6i cruft that appeared to
be related to the old attach_ill bits, and I though that was mostly
being nuked.

>  > usr/src/uts/common/inet/ip_if.h
>  > 
>  >   100-103,129: unused; could be removed.  IPMP flags are not examined
>  >   within the kernel on phyint or ipif anymore.  (I don't much care if
>  >   they remain, though it might be nice to have some way of making sure
>  >   we don't accidentally start using them again.)
> 
> Clearview IPMP reuses those flags, so I've left them behind.

OK.

>  >   135-139: looks like this can become a boolean now.
> 
> Clearview IPMP adds new values (also, I find the enum more readable than a
> B_FALSE/B_TRUE argument).

OK.

>  > usr/src/uts/common/inet/ip_ndp.h
>  > 
>  >   144: NCE_F_UNSOL_ADV doesn't seem to be used anymore.
> 
> Clearview IPMP reuses the flag, so I've left it behind.

OK.  I guess it was just an eye test.  ;-}

>  > usr/src/uts/common/inet/ip_stack.h
>  > 
>  >   This change will break 'pidentd' and probably other open source
>  >   tools that grovel in kernel structures.  We'll probably need to
>  >   contact the authors so they know what to do.
> 
> The same comment applies to any shifting around of the ip_stack_t, right?

Yes.

> I'm not sure there's much we can do to help them, as we frequently change
> the contents of this, but I'll put it on my list to contact them.

OK; thanks.

>  > usr/src/uts/common/inet/tcp/tcp.c
[...]
>  >   19388: doesn't this leak a reference to conn_outgoing_ill?  The
>  >   conn_set_outgoing_ill() function returns a ref-held ill_t there, but
>  >   the variable at 19386 seems to go out of scope without the reference
>  >   being dropped.
> 
> This bug is unrelated to the IPMP removal, but agreed.  Looking further,
> there seem to be additional problems here: (a) conn_set_outgoing_ill()
> expects a broadcast IRE, but TCP may not give it one, (b) without IPMP
> (and even with Clearview IPMP) broadcast IREs will have the right ill
> without any of this and (c) the whole logic of conn_set_outgoing_ill() is
> needless without IPMP (it's just a wrapper around conn_get_held_ill()).
> As such, I've removed conn_set_outgoing_ill() and rewritten the former
> call sites.

Thanks!  Looks much better to me.

>  > usr/src/uts/common/ipp/ipgpc/classifierddi.c
[...]
>  >   458: nit: s/and if_group//
> 
> Fixed (and just when I'd thought I'd caught every misspelling ;-)

Might need to leave one behind for posterity.  ;-}

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

Reply via email to