> #0  dapl_llist_remove_entry (head=0x636960, entry=0x7ffff0004bf8) at
>     dapl/common/dapl_llist.c:272
> #1  0x00007ffff799fb09 in dapl_sp_remove_cr (sp_ptr=0x6368c0,
>     cr_ptr=0x7ffff0004be0) at dapl/common/dapl_sp_util.c:229
> #2  0x00007ffff7998148 in dapli_connection_request (ib_cm_handle=<value
>     optimized out>, sp_ptr=0x6368c0, prd_ptr=<value optimized out>,
>     private_data_size=<value optimized out>, evd_ptr=0x633fb0) at
>     dapl/common/dapl_cr_callback.c:424
>
> ...
>
> Now, it seems that some time back, a new release of dapl
> (dapl-2.0.34-1.src.rpm) was introduced in OFED-1.5.4. So, I am just
> wondering if this is a regression in the new release of dapl?
> Or if anyone is aware of this issue and what could possibly lead to
> this dapltest-server segfault then, it would be helpful if someone can
> shed some light.

You should have seen a message like "WARNING: overflow event on EVD".
It appears that the default dapltest server allocates a CR EVD that is
too small for many client test configurations. When it hits the overflow
queue case, the CR callback incorrectly frees the CR before it is removed
from the SP list. In your case, I am guessing that another CR came in on
another thread and this memory was reallocated with the flink pointer
reinitialized.

Please try the following patches.

---------

Common: CR EVD overflow causes segfault.

The CR is freed up incorrectly before unlinking with SP.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>

diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c
index 3997b38..c58444b 100644
--- a/dapl/common/dapl_cr_callback.c
+++ b/dapl/common/dapl_cr_callback.c
@@ -414,7 +414,6 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
                                              (DAT_CR_HANDLE) cr_ptr);
 
        if (dat_status != DAT_SUCCESS) {
-               dapls_cr_free(cr_ptr);
                (void)dapls_ib_reject_connection(ib_cm_handle,
                                                 DAT_CONNECTION_EVENT_BROKEN,
                                                 0, NULL);
@@ -423,6 +422,7 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
                dapl_os_lock(&sp_ptr->header.lock);
                dapl_sp_remove_cr(sp_ptr, cr_ptr);
                dapl_os_unlock(&sp_ptr->header.lock);
+               dapls_cr_free(cr_ptr);
                return DAT_INSUFFICIENT_RESOURCES;
        }

----------

dapltest: server CR EVD is too small for multi-client configurations.

Increase default size from 8 to 32.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>

diff --git a/test/dapltest/test/dapl_server.c b/test/dapltest/test/dapl_server.c
index 443425c..92e0d21 100644
--- a/test/dapltest/test/dapl_server.c
+++ b/test/dapltest/test/dapl_server.c
@@ -34,7 +34,7 @@
 #undef DFLT_QLEN
 #endif
 
-#define DFLT_QLEN 8 /* default event queue length */
+#define DFLT_QLEN 32 /* default event queue length */
 
 int
 send_control_data(DT_Tdep_Print_Head * phead, unsigned char *buffp,
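
For anyone following along, the ordering problem in the CR callback can be
illustrated with a small standalone sketch. This is not the DAPL code: the
types and helpers below (struct link, struct cr, struct sp, sp_add_cr,
sp_remove_cr, reject_cr, demo_reject_path) are hypothetical stand-ins, and
the list is a plain circular list with a sentinel rather than dapl_llist.
The point is only that the unlink step reads the entry's own flink/blink
fields, so the entry has to stay allocated until after it is unlinked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT the DAPL types. */
struct link {
    struct link *flink;   /* forward link */
    struct link *blink;   /* backward link */
};

struct cr {
    struct link entry;    /* list linkage embedded in the CR itself */
    int id;
};

struct sp {
    struct link head;     /* sentinel of a circular list of pending CRs */
    int cr_count;
};

static void sp_init(struct sp *sp)
{
    sp->head.flink = sp->head.blink = &sp->head;
    sp->cr_count = 0;
}

/* Allocate a CR and append it to the SP's list. */
static struct cr *sp_add_cr(struct sp *sp, int id)
{
    struct cr *cr = malloc(sizeof(*cr));
    if (cr == NULL)
        return NULL;
    cr->id = id;
    cr->entry.flink = &sp->head;
    cr->entry.blink = sp->head.blink;
    sp->head.blink->flink = &cr->entry;
    sp->head.blink = &cr->entry;
    sp->cr_count++;
    return cr;
}

/* Unlink a CR. This dereferences cr->entry, so cr must still be valid. */
static void sp_remove_cr(struct sp *sp, struct cr *cr)
{
    assert(cr->entry.flink != NULL && cr->entry.blink != NULL);
    cr->entry.blink->flink = cr->entry.flink;
    cr->entry.flink->blink = cr->entry.blink;
    cr->entry.flink = cr->entry.blink = NULL;
    sp->cr_count--;
}

/* Error path with the corrected ordering: unlink first, free last.
 * Real code would hold the SP lock around the unlink. */
static void reject_cr(struct sp *sp, struct cr *cr)
{
    sp_remove_cr(sp, cr);
    free(cr);
}

/* Returns 0 if the list stays consistent after rejecting two CRs. */
int demo_reject_path(void)
{
    struct sp sp;
    sp_init(&sp);

    struct cr *a = sp_add_cr(&sp, 1);
    struct cr *b = sp_add_cr(&sp, 2);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }

    reject_cr(&sp, a);
    int ok = sp.cr_count == 1 &&
             sp.head.flink == &b->entry &&
             sp.head.blink == &b->entry;

    reject_cr(&sp, b);
    ok = ok && sp.cr_count == 0 && sp.head.flink == &sp.head;
    return ok ? 0 : 1;
}
```

Swapping the two statements in reject_cr() gives the failing order: the
unlink would then read flink/blink from freed memory, which another thread
may already have reallocated and reinitialized.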