On Wed, 2011-02-23 at 08:11 -0800, Hefty, Sean wrote: > Doug Ledford and RedHat reported a crash when running the rdma_cm on a real > time OS. The crash has the following call trace: > > cm_process_work > cma_req_handler > cma_disable_callback > rdma_create_id > kzalloc > init_completion > cma_get_net_info > cma_save_net_info > cma_any_addr > cma_zero_addr > rdma_translate_ip > rdma_copy_addr > cma_acquire_dev > rdma_addr_get_sgid > ib_find_cached_gid > cma_attach_to_dev > ucma_event_handler > kzalloc > ib_copy_ah_attr_to_user > cma_comp > > [ preempted ] > > cma_write > copy_from_user > ucma_destroy_id > copy_from_user > _ucma_find_context > ucma_put_ctx > ucma_free_ctx > rdma_destroy_id > cma_exch > cma_cancel_operation > rdma_node_get_transport > > rt_mutex_slowunlock > bad_area_nosemaphore > oops_enter > > They were able to reproduce the crash multiple times with the following > details: > > Crash seems to always happen on the: > mutex_unlock(&conn_id->handler_mutex); > as conn_id looks to have been freed during this code path. > > An examination of the code shows that a race exists in the request handlers. > When a new connection request is received, the rdma_cm allocates a new > connection identifier. This identifier has a single reference count on it. > If a user calls rdma_destroy_id() from another thread after receiving a > callback, rdma_destroy_id will proceed to destroy the id and free > the associated memory. However, the request handlers may still be in > the process of running. When control returns to the request handlers, > they can attempt to access the newly created identifiers. > > Fix this by holding a reference on the newly created rdma_cm_id until > the request handler is through accessing it. > > Signed-off-by: Sean Hefty <sean.he...@intel.com>
Acked-by: Doug Ledford <dledf...@redhat.com> > --- > This is obviously serious enough to cause a crash, but the bug has also been > there > for a while before anyone detected it. I have no real opinion on whether this > targets 2.6.38 (if that's even possible at this point) or 2.6.39. > > drivers/infiniband/core/cma.c | 15 +++++++++++++++ > 1 files changed, 15 insertions(+), 0 deletions(-) > > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c > index 6884da2..e450c5a 100644 > --- a/drivers/infiniband/core/cma.c > +++ b/drivers/infiniband/core/cma.c > @@ -1210,6 +1210,11 @@ static int cma_req_handler(struct ib_cm_id *cm_id, > struct ib_cm_event *ib_event) > cm_id->context = conn_id; > cm_id->cm_handler = cma_ib_handler; > > + /* > + * Protect against the user destroying conn_id from another thread > + * until we're done accessing it. > + */ > + atomic_inc(&conn_id->refcount); > ret = conn_id->id.event_handler(&conn_id->id, &event); > if (!ret) { > /* > @@ -1222,8 +1227,10 @@ static int cma_req_handler(struct ib_cm_id *cm_id, > struct ib_cm_event *ib_event) > ib_send_cm_mra(cm_id, CMA_CM_MRA_SETTING, NULL, 0); > mutex_unlock(&lock); > mutex_unlock(&conn_id->handler_mutex); > + cma_deref_id(conn_id); > goto out; > } > + cma_deref_id(conn_id); > > /* Destroy the CM ID by returning a non-zero value. */ > conn_id->cm_id.ib = NULL; > @@ -1425,17 +1432,25 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id, > event.param.conn.private_data_len = iw_event->private_data_len; > event.param.conn.initiator_depth = attr.max_qp_init_rd_atom; > event.param.conn.responder_resources = attr.max_qp_rd_atom; > + > + /* > + * Protect against the user destroying conn_id from another thread > + * until we're done accessing it. > + */ > + atomic_inc(&conn_id->refcount); > ret = conn_id->id.event_handler(&conn_id->id, &event); > if (ret) { > /* User wants to destroy the CM ID */ > conn_id->cm_id.iw = NULL; > cma_exch(conn_id, CMA_DESTROYING); > mutex_unlock(&conn_id->handler_mutex); > + cma_deref_id(conn_id); > rdma_destroy_id(&conn_id->id); > goto out; > } > > mutex_unlock(&conn_id->handler_mutex); > + cma_deref_id(conn_id); > > out: > if (dev) > > -- Doug Ledford <dledf...@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
signature.asc
Description: This is a digitally signed message part