On Tue, Aug 12, 2014 at 4:38 PM, Doug Ledford <dledf...@redhat.com> wrote:
> Commit a9c8ba5884 (IPoIB: Fix usage of uninitialized multicast objects)
> added a new flag MCAST_JOIN_STARTED, but was not very strict in how it
> was used.  We didn't always initialize the completion struct before we
> set the flag, and we didn't always call complete on the completion
> struct from all paths that complete it.  This made it less than totally
> effective, and certainly made its use confusing.  And in the flush
> function we would use the presence of this flag to signal that we should
> wait on the completion struct, but we never cleared this flag, ever.
> This is further muddied by the fact that we overload the MCAST_FLAG_BUSY
> flag to mean two different things: we have a join in flight, and we have
> succeeded in getting an ib_sa_join_multicast.
>

[snip]

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> index 7e9cd39b5ef..f5e8da530d9 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> @@ -271,16 +271,27 @@ ipoib_mcast_sendonly_join_complete(int status,
>         struct ipoib_mcast *mcast = multicast->context;
>         struct net_device *dev = mcast->dev;
>
> +       /*
> +        * We have to take the mutex to force mcast_sendonly_join to
> +        * return from ib_sa_multicast_join and set mcast->mc to a
> +        * valid value.  Otherwise we were racing with ourselves in
> +        * that we might fail here, but get a valid return from
> +        * ib_sa_multicast_join after we had cleared mcast->mc here,
> +        * resulting in mis-matched joins and leaves and a deadlock
> +        */
> +       mutex_lock(&mcast_mutex);
> +
>         /* We trap for port events ourselves. */
>         if (status == -ENETRESET)
> -               return 0;
> +               goto out;
>

[snip]


> +out:
> +       clear_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags);
> +       if (status)
> +               mcast->mc = NULL;
> +       complete(&mcast->done);
> +       if (status == -ENETRESET)
> +               status = 0;
> +       mutex_unlock(&mcast_mutex);
>         return status;
>  }
>

I have not gone thru the whole patch yet. However, here is something quick ...

Would this "return 0 if (status == -ENETRESET)" be wrong in the
original code ?  Say ..... when it returns to process_group_error(),
without proper error return code, it'll skip ib_sa_free_multicast().
Should we worry about it ?

-- Wendy
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to