[ CC'ing clearview-dev ]
> No I haven't studied the new IPSQ logic fully. In fact I didn't get a
> chance to look at the comments I wrote a 2nd time. If I find any
> potential issues this week I will let you know.
Thanks!
My initial responses are below.
> 1. kstat stuff
>
> Unplumb of the IPMP meta-interface and a kstat read operation could
> deadlock ??
>
> T1. unplumb(IPMP meta-interface) ...->ill_glist_delete -> ill_phyint_free ->
> ipmp_grp_destroy -> ipmp_grp_destroy_kstats -> kstat_delete_netstack ->
> kstat_delete -> kstat_hold_bykid -> kstat_hold
>
> T2. read_kstat_data -> kstat_hold_bykid -> kstat_hold
> <-
> -> ks_update() i.e. ipmp_grp_update_kstats()
>
> Interleaving between T1 and T2 could result in the following sequence.
>
> T1 holds ipst->ips_ill_g_lock as writer in ill_glist_delete and attempts
> to acquire the kstat ownership in kstat_hold and cv_waits for the current
> owner in e->e_owner (say T2) to finish.
>
> T2 gets ownership of the kstat in e->e_owner in kstat_hold and
> continues to ipmp_grp_update_kstats and attempts to acquire
> ipst->ips_ill_g_lock which is held by T1.
>
> T1 and T2 deadlock.
Yes, this is an issue. I will look into possible fixes.
> We end up holding IP layer locks across calls to GLD in the following
> sequence ipmp_grp_update_kstats -> ipmp_phint_get_kstats ->
> KSTAT_UPDATE. This is something to be aware of and be cautious. - at
> least put in a comment
OK. It may be unnecessary depending on how the deadlock you mention above
is solved.
> 2. ip_newroute: L7983/4 should it really be
> IS_UNDER_IPMP || IS_IPMP ? In all cases we need to select the ipmp interface
> rather than under. Case of talking to onlink host - routing could have
> returned the interface ire of the nofailover/under rather than the IPMP
> if it were ahead in the routing table ?
Why would the routing table return that entry? All IRE_INTERFACEs for
underlying interfaces are tagged with IRE_MARK_TESTHIDDEN, and are only
available if the caller requests MATCH_IRE_MARK_TESTHIDDEN. For IPv4,
that only happens if the application has explicitly bound to an underlying
interface (e.g., in.mpathd does an IP_BOUND_IF for each of its underlying
interfaces to send test traffic), via this code in ip_output_options():
send_from_ill:
if (xmit_ill != NULL) {
ipif_t *ipif;
/*
* Mark this packet as originated locally
*/
mp->b_prev = mp->b_next = NULL;
/*
* Could be SO_DONTROUTE case also.
* Verify that at least one ipif is up on the ill.
*/
if (xmit_ill->ill_ipif_up_count == 0) {
ip1dbg(("ip_output: xmit_ill %s is down\n",
xmit_ill->ill_name));
goto drop_pkt;
}
ipif = ipif_get_next_ipif(NULL, xmit_ill);
if (ipif == NULL) {
ip1dbg(("ip_output: xmit_ill %s NULL ipif\n",
xmit_ill->ill_name));
goto drop_pkt;
}
match_flags = 0;
--> if (IS_UNDER_IPMP(xmit_ill))
--> match_flags |= MATCH_IRE_MARK_TESTHIDDEN;
> 3. We create the ill_grp only in I_PLINK of the IPMP ill. However the
> IPMP ill becomes known globally as soon as the ill_glist_insert happens
> as part of the SLIFNAME ioctl. There is a small window between the two
> ioctls and it is not clear if something bad can happen in that
> window. There is a check in ip_process_ioctl that fail writer
> operations on the IPMP ill before the PLINK. But other operations may
> also end up accessing ill_grp.
>
> a. ND_GET/SET. These are forked off ip_sioctl_copyin_setup.
> ill_foward-set expects a non-null ill_grp in the case of an IPMP ill.
>
> b. ioctls that don't need IPI_WR. In this case ip_process_ioctl lets them.
> SIOCGXARP: ip_sioctl_arp() expects a non-null ill_grp in the case of
> IPMP ill.
>
> c. ip_opt_add_group_v6 through T_SVR4_OPTMGMT_REQ. In this case can we end
> up accessing ill_grp in ip_ll_send_enabmulti_req ?
>
> If the above reasoning is right, then I suspect the same holds in the
> interval between I_PUNLINK and ill_delete.
Your reasoning is right and I've always been a bit uncomfortable with
having ill_grp be NULL after SIOCSLIFNAME has been done for the IPMP
meta-interface. I think there may be a more robust alternative: we can
always do the create/destroy as part of SIOCSLIFNAME/close, but do the
ipmp_illgrp_{un,}link_grp() in I_PLINK/I_PUNLINK. This would still give
us the ability to fail the unplumb operation but ensure that there's
always a valid ill_grp pointer. However, that ill_grp pointer will not be
accessible via the gr_v4/gr_v6 lists until the I_PLINK is done. I'll give
this a whirl.
Thanks!
--
meem