Hi gang,

We're chasing some bugs in RDS.  In trying to explore possible causes I found
that I don't really understand the sequence of events needed to safely tear
down a cm_id.

I'm worried that we have cm event callbacks being processed in the ib_cm thread
racing with our krds thread which is tearing down the cm_id.

If you get rid of unrelated rds-specific teardown and error checking, our cm_id
teardown (rds_ib_conn_shutdown()) simplifies down to:

                rdma_disconnect(cm_id);
                rdma_destroy_qp(cm_id);
                rdma_destroy_id(cm_id);

We blow through all of those without waiting for anything specifically CM
related.  We could wait for some send and receive work completions, sure, but
we might not if the universe aligns just right and everything is idle when we
tear things down.

Is it safe to blow through those calls while the ib_cm thread might be
processing a callback for the given cm_id?  Or should we be explicitly
serializing teardown against the callbacks?
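
To make the question concrete, here is a rough userspace model of the kind of
serialization I have in mind.  It is only a sketch built on pthreads: every
name in it (fake_cm_id, fake_cm_deliver_event(), fake_cm_teardown(), and so
on) is made up for illustration and is not part of the rdma_cm or ib_cm API,
and it says nothing about what those layers actually guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cm_id; NOT the kernel's struct rdma_cm_id. */
struct fake_cm_id {
	pthread_mutex_t lock;	/* serializes callbacks against teardown */
	bool dying;		/* set once teardown has started */
	int handled;		/* events handled before teardown began */
	int dropped;		/* events that arrived after teardown began */
};

static void fake_cm_id_init(struct fake_cm_id *id)
{
	pthread_mutex_init(&id->lock, NULL);
	id->dying = false;
	id->handled = 0;
	id->dropped = 0;
}

/* Models the CM thread delivering one event to the consumer's handler. */
static void fake_cm_deliver_event(struct fake_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	if (id->dying)
		id->dropped++;	/* teardown won: leave the id's resources alone */
	else
		id->handled++;	/* teardown cannot mark the id dying until we unlock */
	pthread_mutex_unlock(&id->lock);
}

/* Models the teardown thread: fence off callbacks, then release resources. */
static void fake_cm_teardown(struct fake_cm_id *id)
{
	pthread_mutex_lock(&id->lock);	/* waits for any in-flight callback */
	id->dying = true;
	pthread_mutex_unlock(&id->lock);
	/* The disconnect / destroy_qp / destroy_id equivalents would go here. */
}

/* Hypothetical CM thread body: delivers a fixed burst of 1000 events. */
static void *fake_cm_thread(void *arg)
{
	struct fake_cm_id *id = arg;

	for (int i = 0; i < 1000; i++)
		fake_cm_deliver_event(id);
	return NULL;
}

/* Runs event delivery and teardown concurrently; returns handled + dropped. */
int fake_cm_race_demo(void)
{
	struct fake_cm_id id;
	pthread_t cm;

	fake_cm_id_init(&id);
	if (pthread_create(&cm, NULL, fake_cm_thread, &id) != 0)
		return -1;
	fake_cm_teardown(&id);
	pthread_join(cm, NULL);
	assert(id.dying);
	pthread_mutex_destroy(&id.lock);
	return id.handled + id.dropped;
}
```

However the two threads interleave, every event is either handled before
teardown starts or dropped after it, so fake_cm_race_demo() accounts for all
1000 events.  What I can't tell is whether the CM already gives us something
equivalent to that lock, or whether we have to provide it ourselves.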

I think this code was initially written under the assumption that
rdma_disconnect() would only return once all CM callbacks had completed.  I'm
worried that rdma_disconnect() doesn't work that way.

- z