Patch set of bug fixes as a result of scale-out testing on 128 nodes/1538 cores.

1/9 scm: remove modify QP to ERR state during disconnect on UD type QP
2/9 ucm: increase default UCM retry count for connect reply to 15
3/9 cma, ucm: cleanup issues with dat_ep_free on a connected EP without disconnecting.
4/9 ucm: UD mode, active side cm object released too soon, the RTU could be lost.
5/9 scm: SOCKOPT ERR Connection timed out on large clusters
6/9 scm: cr_thread occasionally segv's when disconnecting all-to-all MPI static connections
7/9 scm: add option to use other network devices with environment variable DAPL_SCM_NETDEV
8/9 scm, cma: fini code can be called multiple times and hang via fork
9/9 scm: check for hca object before signaling thread

A disconnect on a UD type QP should not modify the QP to the error
state, since a UD QP is shared. The disconnect should be treated
as a NOP on the UD type QP; the QP should only be transitioned during
the QP destroy (dat_ep_free). RC QPs are still moved to the error
state on disconnect, as before.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index afd0d93..7465190 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -458,13 +458,13 @@ DAT_RETURN dapli_socket_disconnect(dp_ib_cm_handle_t cm_ptr)
        dapl_os_unlock(&cm_ptr->lock);
        
        /* send disc date, close socket, schedule destroy */
-       dapl_os_lock(&cm_ptr->ep->header.lock);
-       dapls_modify_qp_state(cm_ptr->ep->qp_handle, IBV_QPS_ERR, 0,0,0);
-       dapl_os_unlock(&cm_ptr->ep->header.lock);
        send(cm_ptr->socket, (char *)&disc_data, sizeof(disc_data), 0);
 
        /* disconnect events for RC's only */
        if (cm_ptr->ep->param.ep_attr.service_type == DAT_SERVICE_TYPE_RC) {
+               dapl_os_lock(&cm_ptr->ep->header.lock);
+               dapls_modify_qp_state(cm_ptr->ep->qp_handle, IBV_QPS_ERR, 0,0,0);
+               dapl_os_unlock(&cm_ptr->ep->header.lock);
                if (cm_ptr->ep->cr_ptr) {
                        dapls_cr_callback(cm_ptr,
                                          IB_CME_DISCONNECTED,
-- 
1.5.2.5
