On Sunday 28 June 2009 23:04, Yossi Etigin wrote:
> Jack Morgenstein wrote:
> >  in ipoib_mcast_leave():
> >          *** NEED TO WAIT HERE BEFORE CONTINUING (so that BUSY is cleared 
> > (mcast->mc is in error),
> >          *** or BUSY flag is set and mcast->mc is a valid, non-NULL pointer 
> > ****
> >          if (test_and_clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags))
> >                ib_sa_free_multicast(mcast->mc);
> >  
> > - Jack
> 
> How about making the leave/free mcast operation take place on the 
> ipoib_workqueue, on which
> the join operation takes place? this way we can avoid this race, and more 
> potential races
> of this kind.
> 
> --Yossi
> 
> 
It may be possible.  The only place where the leave does not take place on the 
ipoib_workqueue
is ipoib_stop (where, of course, the crash occurs).

Maybe we can transfer some of the stop operation to the workqueue?

static int ipoib_stop(struct net_device *dev)
{
        struct ipoib_dev_priv *priv = netdev_priv(dev);

        ipoib_dbg(priv, "stopping interface\n");

        clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
        napi_disable(&priv->napi);

        netif_stop_queue(dev);
************* Transfer to work queue starting here ****************
        ipoib_ib_dev_down(dev, 0);
        ipoib_ib_dev_stop(dev, 0);

        if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
                struct ipoib_dev_priv *cpriv;

                /* Bring down any child interfaces too */
                mutex_lock(&priv->vlan_mutex);
                list_for_each_entry(cpriv, &priv->child_intfs, list) {
                        int flags;

                        flags = cpriv->dev->flags;
                        if (!(flags & IFF_UP))
                                continue;

                        dev_change_flags(cpriv->dev, flags & ~IFF_UP);
                }
                mutex_unlock(&priv->vlan_mutex);
        }
****************** transfer to here *******
        return 0;
}

Or, maybe change the order of things as well, as below:

static int ipoib_stop(struct net_device *dev)
{
        struct ipoib_dev_priv *priv = netdev_priv(dev);

        ipoib_dbg(priv, "stopping interface\n");

        clear_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);
        napi_disable(&priv->napi);

        netif_stop_queue(dev);
        if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
                struct ipoib_dev_priv *cpriv;

                /* Bring down any child interfaces too */
                mutex_lock(&priv->vlan_mutex);
                list_for_each_entry(cpriv, &priv->child_intfs, list) {
                        int flags;

                        flags = cpriv->dev->flags;
                        if (!(flags & IFF_UP))
                                continue;

                        dev_change_flags(cpriv->dev, flags & ~IFF_UP);
                }
                mutex_unlock(&priv->vlan_mutex);
        }
************* Transfer to work queue starting here ****************
        ipoib_ib_dev_down(dev, 0);
        ipoib_ib_dev_stop(dev, 0);
****************** transfer to here *******
        return 0;
}

Yossi, Roland, what do you think?
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to