Re: [PATCH 2/8] IPoIB: Make the carrier_on_task race aware

Doug Ledford Tue, 09 Sep 2014 00:18:08 -0700

On 09/04/2014 08:13 AM, Erez Shitrit wrote:

We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface.  Unfortunately, that's not
the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task, the
flush_workqueue waits on this thread too.  But this thread is deadlocked
on the rtnl lock.  The better thing here is to use trylock and loop on
that until we either get the lock or we see that FLAG_ADMIN_UP has
been cleared, in which case we don't need to do anything anyway and we
just return.


Signed-off-by: Doug Ledford <dledf...@redhat.com>
---
  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 21
+++++++++++++++------
  1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index a0a42859f12..7e9cd39b5ef 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -353,18 +353,27 @@ void ipoib_mcast_carrier_on_task(struct
work_struct *work)
                             carrier_on_task);
      struct ib_port_attr attr;
-    /*
-     * Take rtnl_lock to avoid racing with ipoib_stop() and
-     * turning the carrier back on while a device is being
-     * removed.
-     */
      if (ib_query_port(priv->ca, priv->port, &attr) ||
          attr.state != IB_PORT_ACTIVE) {
          ipoib_dbg(priv, "Keeping carrier off until IB port is
active\n");
          return;
      }
-    rtnl_lock();
+    /*
+     * Take rtnl_lock to avoid racing with ipoib_stop() and
+     * turning the carrier back on while a device is being
+     * removed.  However, ipoib_stop() will attempt to flush
+     * the workqueue while holding the rtnl lock, so loop
+     * on trylock until either we get the lock or we see
+     * FLAG_ADMIN_UP go away as that signals that we are bailing
+     * and can safely ignore the carrier on work
+     */
+    while (!rtnl_trylock()) {
+        if (!test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
+            return;


I think that this check shouldn't be connected to the rtnl_lock, if
IPOIB_FLAG_ADMIN_UP is clear no need to take the carrier on, just wait
to the next loop when ipoib_open will be called.

Actually, I was optimizing for the common case. The common case is thatsomeone is *not* running a while true; do ifup/ifdown; done torturetest. In that situation, you will get the rtnl lock, which you have tohave before doing to two config items below, and so you can totally skipthe check for FLAG_ADMIN_UP because A) the code that drops this flaggets the rtnl lock first and B) that same code flushes this workqueuewhile holding the lock, so if we ever get the lock, we already know thestate of the ADMIN_UP flag. This is all moot though in the sense thatonce the devices are switched to per device work queues, the need to dothis loop goes away and we can simplify the code. This was only neededbecause our flush task and carrier on task shared a work queue and wecould deadlock without this little loop.

+        else
+            msleep(20);
+    }
      if (!ipoib_cm_admin_enabled(priv->dev))
          dev_set_mtu(priv->dev, min(priv->mcast_mtu, priv->admin_mtu));
      netif_carrier_on(priv->dev);


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/8] IPoIB: Make the carrier_on_task race aware

Reply via email to