[ CC'ing clearview-dev ]

 > No I haven't studied the new IPSQ logic fully. In fact I didn't get a 
 > chance to look at the comments I wrote a 2nd time.   If I find any 
 > potential issues this week I will let you know.

Thanks!

My initial responses are below.

 > 1. kstat stuff
 > 
 > Unplumb of the IPMP meta-interface and a kstat read operation could 
 > deadlock ??
 > 
 > T1. unplumb(IPMP meta-interface) ...->ill_glist_delete -> ill_phyint_free ->
 >     ipmp_grp_destroy -> ipmp_grp_destroy_kstats -> kstat_delete_netstack ->
 >     kstat_delete -> kstat_hold_bykid -> kstat_hold
 > 
 > T2. read_kstat_data -> kstat_hold_bykid -> kstat_hold
 >                 <-
 >                 -> ks_update() i.e. ipmp_grp_update_kstats()
 > 
 > Interleaving between T1 and T2 could result in the following sequence.
 > 
 > T1 holds ipst->ips_ill_g_lock as writer in ill_glist_delete and attempts
 > to acquire the kstat ownership in kstat_hold and cv_waits for the current
 > owner in e->e_owner (say T2) to finish.
 > 
 > T2 gets ownership of the kstat in e->e_owner in kstat_hold and
 > continues to ipmp_grp_update_kstats and attempts to acquire
 > ipst->ips_ill_g_lock which is held by T1.
 > 
 > T1 and T2 deadlock.

Yes, this is an issue.  I will look into possible fixes.

 > We end up holding IP layer locks across calls to GLD in the following
 > sequence ipmp_grp_update_kstats -> ipmp_phint_get_kstats ->
 > KSTAT_UPDATE.  This is something to be aware of and be cautious. - at
 > least put in a comment

OK.  It may be unnecessary depending on how the deadlock you mention above
is solved.

 > 2. ip_newroute: L7983/4 should it really be
 > IS_UNDER_IPMP || IS_IPMP ? In all cases we need to select the ipmp interface
 > rather than under. Case of talking to onlink host - routing could have
 > returned the interface ire of the nofailover/under rather than the IPMP
 > if it were ahead in the routing table ?

Why would the routing table return that entry?  All IRE_INTERFACEs for
underlying interfaces are tagged with IRE_MARK_TESTHIDDEN, and are only
available if the caller requests MATCH_IRE_MARK_TESTHIDDEN.  For IPv4,
that only happens if the application has explicitly bound to an underlying
interface (e.g., in.mpathd does an IP_BOUND_IF for each of its underlying
interfaces to send test traffic), via this code in ip_output_options():

send_from_ill:
                if (xmit_ill != NULL) {
                        ipif_t *ipif;

                        /*
                         * Mark this packet as originated locally
                         */
                        mp->b_prev = mp->b_next = NULL;

                        /*
                         * Could be SO_DONTROUTE case also.
                         * Verify that at least one ipif is up on the ill.
                         */
                        if (xmit_ill->ill_ipif_up_count == 0) {
                                ip1dbg(("ip_output: xmit_ill %s is down\n",
                                    xmit_ill->ill_name));
                                goto drop_pkt;
                        }

                        ipif = ipif_get_next_ipif(NULL, xmit_ill);
                        if (ipif == NULL) {
                                ip1dbg(("ip_output: xmit_ill %s NULL ipif\n",
                                    xmit_ill->ill_name));
                                goto drop_pkt;
                        }

                        match_flags = 0;
-->                     if (IS_UNDER_IPMP(xmit_ill))
-->                             match_flags |= MATCH_IRE_MARK_TESTHIDDEN;

 > 3. We create the ill_grp only in I_PLINK of the IPMP ill. However the
 > IPMP ill becomes known globally as soon as the ill_glist_insert happens
 > as part of the SLIFNAME ioctl. There is a small window between the two
 > ioctls and it is not clear if something bad can happen in that
 > window. There is a check in ip_process_ioctl that fail writer
 > operations on the IPMP ill before the PLINK.  But other operations may
 > also end up accessing ill_grp.
 > 
 > a. ND_GET/SET. These are forked off ip_sioctl_copyin_setup.
 > ill_foward-set expects a non-null ill_grp in the case of an IPMP ill.
 > 
 > b. ioctls that don't need IPI_WR. In this case ip_process_ioctl lets them.
 > SIOCGXARP: ip_sioctl_arp() expects a non-null ill_grp in the case of 
 > IPMP ill.
 > 
 > c. ip_opt_add_group_v6 through T_SVR4_OPTMGMT_REQ. In this case can we end
 > up accessing ill_grp in ip_ll_send_enabmulti_req ?
 > 
 > If the above reasoning is right, then I suspect the same holds in the
 > interval between I_PUNLINK and ill_delete.

Your reasoning is right and I've always been a bit uncomfortable with
having ill_grp be NULL after SIOCSLIFNAME has been done for the IPMP
meta-interface.  I think there may be a more robust alternative: we can
always do the create/destroy as part of SIOCSLIFNAME/close, but do the
ipmp_illgrp_{un,}link_grp() in I_PLINK/I_PUNLINK.  This would still give
us the ability to fail the unplumb operation but ensure that there's
always a valid ill_grp pointer.  However, that ill_grp pointer will not be
accessible via the gr_v4/gr_v6 lists until the I_PLINK is done.  I'll give
this a whirl.

Thanks!
-- 
meem

Reply via email to