Updated responses follow.

  > 1. kstat stuff
  > 
  > Unplumb of the IPMP meta-interface and a kstat read operation could 
  > deadlock ??
  > 
  > T1. unplumb(IPMP meta-interface) ...->ill_glist_delete -> ill_phyint_free ->
  >     ipmp_grp_destroy -> ipmp_grp_destroy_kstats -> kstat_delete_netstack ->
  >     kstat_delete -> kstat_hold_bykid -> kstat_hold
  > 
  > T2. read_kstat_data -> kstat_hold_bykid -> kstat_hold
  >                 <-
  >                 -> ks_update() i.e. ipmp_grp_update_kstats()
  > 
  > Interleaving between T1 and T2 could result in the following sequence.
  > 
  > T1 holds ipst->ips_ill_g_lock as writer in ill_glist_delete and attempts
  > to acquire the kstat ownership in kstat_hold and cv_waits for the current
  > owner in e->e_owner (say T2) to finish.
  > 
  > T2 gets ownership of the kstat in e->e_owner in kstat_hold and
  > continues to ipmp_grp_update_kstats and attempts to acquire
  > ipst->ips_ill_g_lock which is held by T1.
  > 
  > T1 and T2 deadlock.

One core problem here is that we're holding ill_g_lock when we're tearing
down the IPMP group in ill_phyint_free().  The IPMP code has no need for
ill_g_lock to be held, and in fact it will cause other problems -- e.g.
ipmp_phyint_leave_grp() (also called from ill_phyint_free()) may generate
routing socket messages, which we will putnext() with ill_g_lock held,
which also isn't good.

Looking at this more, the root of the bug is that ill_phyint_free()
doesn't just free the phyint, it also pulls the phyint out of the global
AVL lists and NULLs out various bits of linkage, and *then* does a bunch
of free-related stuff.  I propose we change ill_phyint_free() to *just* do
the free-related stuff, which does not need ill_g_lock.  We can then
either inline the linkage logic into ill_glist_delete() (its one caller),
or create a new ill_phyint_unlink() function to do that part.  I suspect
inlining will be better because it makes the locking more obvious.

 >  > We end up holding IP layer locks across calls to GLD in the following
 >  > sequence ipmp_grp_update_kstats -> ipmp_phint_get_kstats ->
 >  > KSTAT_UPDATE.  This is something to be aware of and be cautious. - at
 >  > least put in a comment

Indeed -- and even with the fix above, holding this lock across layers
still makes me uncomfortable.  One possible solution would be to build up
a dynamically-allocated list of phyints (and refhold an ill on each one so
it can't go away) with ill_g_lock held, then drop ill_g_lock and walk that
list when calling down to GLDv3.

 >  > 3. We create the ill_grp only in I_PLINK of the IPMP ill. However the
 >  > IPMP ill becomes known globally as soon as the ill_glist_insert happens
 >  > as part of the SLIFNAME ioctl. There is a small window between the two
 >  > ioctls and it is not clear if something bad can happen in that
 >  > window. There is a check in ip_process_ioctl that fail writer
 >  > operations on the IPMP ill before the PLINK.  But other operations may
 >  > also end up accessing ill_grp.
 >  > 
 >  > a. ND_GET/SET. These are forked off ip_sioctl_copyin_setup.
 >  > ill_foward-set expects a non-null ill_grp in the case of an IPMP ill.
 >  > 
 >  > b. ioctls that don't need IPI_WR. In this case ip_process_ioctl lets them.
 >  > SIOCGXARP: ip_sioctl_arp() expects a non-null ill_grp in the case of 
 >  > IPMP ill.
 >  > 
 >  > c. ip_opt_add_group_v6 through T_SVR4_OPTMGMT_REQ. In this case can we end
 >  > up accessing ill_grp in ip_ll_send_enabmulti_req ?
 >  > 
 >  > If the above reasoning is right, then I suspect the same holds in the
 >  > interval between I_PUNLINK and ill_delete.

I've coded up a fix for this.  Basically, ipmp_illgrp_create() is now
called in ipif_allocate() (right after we set PHYI_IPMP, which is what
IS_IPMP() tests), and ipmp_illgrp_destroy() is now called from
ill_delete_tail(), right before we call ill_glist_delete().  I think this
will eliminate this class of bugs, and still preserve the benefit of
failing an "unplumb" of an active group in I_PUNLINK.  Please have a look:

        http://cr.opensolaris.org/~meem/clearview-ipmp-illgrp/

-- 
meem

Reply via email to