On 12/30/17 2:32 PM, Sowmini Varadhan wrote:
On (12/30/17 13:37), santosh.shilim...@oracle.com wrote:

[...]

Thats what I thought as well initially but since the reported case,
the rs seems to be valid where as sk seems to be freed up as part of
sock_release callback.

I dont understand the statement above- how can "rs be valid, and sk
be freed"?

rs_sk is embedded in the struct rds_sock, it is not a pointer.

I was going with order of evaluation of if () but you made good point.
rs_sk isn't a pointer so sk can't be null.

let's find and fix the refcount bug. See stack trace in commit comment.
The socket release is happening prematurely and existing WARN_ONs
are not catching it.

Right. This was loop transport in action so xmit will just flip
the direction with receive. And rds_recv_incoming() can race with socket_release. rds_find_bound() is suppose to add ref count on
socket for rds_recv_incoming() but by that time socket is DEAD &
freed by socket release callback.
And rds_release is suppose to be synced with rs_recv_lock. But
release callback is marking the sk orphan before syncing
up with receive path and updating the bind table. Probably it
can pushed down further after the socket clean up buut need
to think bit more.


diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index b405f77..11e1426 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -65,7 +65,6 @@ static int rds_release(struct socket *sock)

        rs = rds_sk_to_rs(sk);

-       sock_orphan(sk);
        /* Note - rds_clear_recv_queue grabs rs_recv_lock, so
         * that ensures the recv path has completed messing
         * with the socket. */
@@ -85,6 +84,7 @@ static int rds_release(struct socket *sock)

        rds_trans_put(rs->rs_transport);

+       sock_orphan(sk);
        sock->sk = NULL;
        sock_put(sk);
 out:

Reply via email to