On 3/27/07, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
> Changed from v3
>       * added a flush_scheduled_work call before we restart the QP in order
>         to ensure that the pkey table we read from the cache is updated


+void ipoib_ib_dev_restart_qp(struct work_struct *work)
+{
+       struct ipoib_dev_priv *priv =
+               container_of(work, struct ipoib_dev_priv, restart_qp_task);
+       /* We only restart the QP in case of pkey change event */
+       ipoib_dbg(priv, "Flushing %s and restarting it's QP\n", 
priv->dev->name);
+       /* Ensures the pkey table we read  from the cache is updated properly */
+       flush_scheduled_work();
+       __ipoib_ib_dev_flush(priv, 1);
+}
+

I think doing flush_scheduled_work from inside the ipoib workqueue
can trigger deadlocks - which deadlocks the workqueue was
created to avoid, in the first place. Look at the comment
in ipoib_main.c where the WQ is created.

       /*
        * We create our own workqueue mainly because we want to be
        * able to flush it when devices are being removed.  We can't
        * use schedule_work()/flush_scheduled_work() because both
        * unregister_netdev() and linkwatch_event take the rtnl lock,
        * so flush_scheduled_work() can deadlock during device
        * removal.
        */


I read that few times and I understand it as : ipoib workqueue was
added because if the default system workqueue was used
unregister_netdev() and linkwatch_event() would deadlock. Am I wrong
at assuming that after we added the ipoib workqueue we can call
flush_scheduled_work ?



And, I don't think that depending on the fact that the cache
uses a default schedule queue internally is such a good idea.

You're definitely right. The coherency enforcement should be inside
the cache implementation. We can move the call to
flush_scheduled_work() inside the ib_find_cached_pkey and
ib_get_cached_pkey. What do you think ?


How about simply requeueing the work again if the cache query failed?

The problem is that it does not fail. It returns a non coherent
result, which just don't reflect the pkey table change.

-- Moni


--
MST

_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Reply via email to