[clearview-dev] IPMP ARP fix

Peter Memishian Mon, 8 Dec 2008 00:48:44 -0500

 > > Xiang hit an assertion failure during testing that is basically due to a
 > > Nevada bug where some codepaths call ipif_arp_down() even on a
 > > non-ILLF_XRESOLV IPv6 interface.  However, I think the right fix is to
 > > rename ipif_arp_down() to ipif_resolver_down() and have it work like
 > > ipif_resolver_up(), in the sense that it will simply return if it's called
 > > on an IPv6 interface that's not ILLF_XRESOLV.  This simplifies the code a
 > > bit and resolves the bug.  Seb, can you take a look?
 > > 
 > >    http://zhadum.east/ws/clearview/clearview-ipmpdev/webrev
 > 
 > I have no comments.  I also reviewed the other fixes in the webrev.


I've updated the webrev to account for an interesting bug that was found
after fixing the multicast issue you reviewed along with the ARP fix
above.  Specifically, the multicast issue was that for packets sent to
ff02::1, we may end up not using the nominated cast ill.  This happens
because when we build an NCE for a local IPv6 address, we send unsolicited
neighbor advertisements to ff02::1 and that takes us through ipif_ndp_up()
-> ndp_lookup_then_add_v6() -> ndp_add_v6() -> nce_xmit_advert() ->
nce_xmit() -> ip_output_v6() -> ip_newroute_ipif_v6() on an underlying
interface.  As you saw, I fixed that by having ip_newroute_ipif_v6() use
the nominated cast ill even when on an underlying interface (as long as
it's not sending probe traffic).
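
To make the selection rule concrete, here is a minimal userland sketch of
it.  This is not the code in the webrev: the sketch_* types and function
are hypothetical stand-ins for ill_t and the IPMP group, and falling back
to the original ill when no cast ill is nominated is an assumption of the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_grp;

/* Hypothetical, simplified stand-in for ill_t. */
typedef struct sketch_ill {
	const char		*name;
	struct sketch_grp	*grp;	/* NULL if not in an IPMP group */
} sketch_ill_t;

/* Hypothetical, simplified stand-in for an IPMP group. */
typedef struct sketch_grp {
	sketch_ill_t	*cast_ill;	/* nominated cast ill, or NULL */
} sketch_grp_t;

/*
 * Pick the ill to send a multicast packet on.  Probe traffic stays on
 * the ill it was handed; everything else uses the group's nominated
 * cast ill, even if `ill' is an underlying interface.  If there is no
 * group or no nominated cast ill, fall back to `ill' (an assumption).
 */
sketch_ill_t *
sketch_mcast_xmit_ill(sketch_ill_t *ill, bool is_probe)
{
	assert(ill != NULL);

	if (is_probe || ill->grp == NULL || ill->grp->cast_ill == NULL)
		return (ill);
	return (ill->grp->cast_ill);
}
```

With bge1 nominated as the cast ill, an unsolicited advertisement handed
to bge2 would go out bge1, while a probe handed to bge2 stays on bge2.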

Now, the interesting part is that fixing this led to IPv6 DAD coming
unhinged, and us reporting strange messages like:

   node 00:00:00:00:00:00 is using our IP address
   2007:56::f0a3:df91:b7dd:432d on bge2

It turns out this is due to two problems:

        1. The bogus diagnostic message is due to a bug in Nevada,
           covered by 6781883.  I've added this fix to my IPMP wad.

        2. The system thinks there's another node with the same IPv6 address
           because it sees its own unsolicited neighbor advertisements via
           the IPv6 multicast loopback path.  (I'm not sure if this is by
           design or by accident; see the conditions under which
           IP_FF_NO_MCAST_LOOP is set).  Usually these are ignored because
           of this code in ndp_input_advert():

                                /*
                                 * Someone just announced one of our local
                                 * addresses.  If it wasn't us, then this is a
                                 * conflict.  Defend the address or shut it
                                 * down.
                                 */
                                if (dl_mp != NULL &&
                                    (haddr == NULL ||
                                    nce_cmp_ll_addr(dst_nce, haddr,
                                    ill->ill_nd_lla_len))) {
                                        ip_ndp_conflict(ill, mp, dl_mp,
                                            dst_nce);
                                }

           (The above code is from Nevada; there's similar code in the
           Clearview IPMP gate.)  Previously, this code sufficed to filter
           out the unsolicited neighbor advertisements because we always
           sent those advertisements over the same ill that's tied to the
           nce.  With the fix for the multicast issue above, that's no
           longer true and thus we erroneously think it's a duplicate.  The
           fix is to check whether the hardware address in the LLA matches
           that of any of the IP interfaces in the IPMP group.  One issue
           is that to safely do this, we need to either become writer on
           the IPSQ or hold ill_g_lock as reader.  I've updated the code
           to do the latter, which kinda sucks but it's no worse than the
           Nevada code.
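
As a rough illustration of the group-wide check described in (2), the
sketch below compares an advertised hardware address against every
interface in a group.  It is a userland approximation, not the webrev
code: the sketch_* names, the fixed-size address table, and the 6-byte
address length are all assumptions, and the locking is only marked in
comments where the real code would hold ill_g_lock as reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define	SKETCH_MAX_ILLS	8
#define	SKETCH_HWLEN	6	/* assumed Ethernet-sized address */

/* Hypothetical stand-in for the IPMP group's list of interfaces. */
typedef struct sketch_grp {
	size_t		nills;
	unsigned char	hwaddr[SKETCH_MAX_ILLS][SKETCH_HWLEN];
} sketch_grp_t;

/*
 * Return true if `haddr' is the hardware address of any interface in
 * the group.  A missing or wrong-length address never matches.
 */
bool
sketch_hwaddr_in_group(const sketch_grp_t *grp, const unsigned char *haddr,
    size_t hlen)
{
	bool found = false;
	size_t i;

	assert(grp != NULL && grp->nills <= SKETCH_MAX_ILLS);

	if (haddr == NULL || hlen != SKETCH_HWLEN)
		return (false);

	/* The real code would take ill_g_lock as reader here. */
	for (i = 0; i < grp->nills; i++) {
		if (memcmp(grp->hwaddr[i], haddr, hlen) == 0) {
			found = true;
			break;
		}
	}
	/* ... and drop it here. */
	return (found);
}

/*
 * An advertisement for one of our local addresses is a conflict only
 * if it did not come from one of the group's own interfaces.
 */
bool
sketch_advert_is_conflict(const sketch_grp_t *grp, const unsigned char *haddr,
    size_t hlen)
{
	return (!sketch_hwaddr_in_group(grp, haddr, hlen));
}
```

As in the Nevada excerpt, an advertisement with no link-layer address is
still treated as a conflict, since it cannot be matched to any of our
interfaces.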

Please have a look at the webrev.  BTW, I've already integrated this fix
into the gate to ensure it'll be in tomorrow's BFU archives, but of course
I'm happy to tweak it further.

--
meem
