RE: [PATCH] dapl: Fix segfault while freeing qp
Thanks, applied.

> -----Original Message-----
> From: Bharat Potnuri [mailto:bha...@chelsio.com]
> Sent: Tuesday, September 29, 2015 5:30 AM
> To: Davis, Arlin R
> Cc: linux-rdma@vger.kernel.org; sw...@opengridcomputing.com;
> nirran...@chelsio.com; Bharat Potnuri
> Subject: [PATCH] dapl: Fix segfault while freeing qp
>
> In function dapls_ib_qp_free(), pointers qp and cm_ptr->cm_id->qp are
> pointing to the same qp structure, initialized in function
> dapls_ib_qp_alloc().
> The memory pointed by these pointers are freed twice in function
> dapls_ib_qp_free(), using rdma_destroy_qp() for the case _OPENIB_CMA
> defined and then further using ibv_destroy_qp(), causing a segmentation fault
> while freeing the qp. Therefore assigned NULL value to qp to avoid freeing
> illegal memory.
>
> Fixes: 7ff4f840bf11 ("common: add CM-EP linking to support mutiple CM's and
> proper protection during destruction")
>
> Signed-off-by: Bharat Potnuri <bha...@chelsio.com>
> ---
>  dapl/openib_common/qp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/dapl/openib_common/qp.c b/dapl/openib_common/qp.c
> index 527fc1d4c46b..01f91ca2bd83 100644
> --- a/dapl/openib_common/qp.c
> +++ b/dapl/openib_common/qp.c
> @@ -397,6 +397,7 @@ DAT_RETURN dapls_ib_qp_free(IN DAPL_IA * ia_ptr, IN DAPL_EP * ep_ptr)
>  #ifdef _OPENIB_CMA_
>  		rdma_destroy_qp(cm_ptr->cm_id);
>  		cm_ptr->cm_id->qp = NULL;
> +		qp = NULL;
>  #endif
>
>  #ifdef _OPENIB_MCM_
> --
> 2.5.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
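The aliasing problem described in the patch can be reproduced in miniature. The following is only a sketch under stated assumptions: `mock_qp`, `mock_cm_id`, `mock_rdma_destroy_qp()` and `mock_ibv_destroy_qp()` are hypothetical stand-ins for the verbs and rdma_cm types and calls, and `free_qp_once()` mirrors only the control flow of the free path, not the real dapls_ib_qp_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ibv_qp and struct rdma_cm_id. */
struct mock_qp { int num; };
struct mock_cm_id { struct mock_qp *qp; };

static int destroy_calls; /* counts how often QP memory is released */

/* Plays the role of rdma_destroy_qp(): releases the QP owned by the cm_id. */
static void mock_rdma_destroy_qp(struct mock_cm_id *id)
{
	free(id->qp);
	destroy_calls++;
}

/* Plays the role of ibv_destroy_qp(): releases a QP through a direct pointer. */
static int mock_ibv_destroy_qp(struct mock_qp *qp)
{
	free(qp);
	destroy_calls++;
	return 0;
}

/* Mirrors the patched control flow; returns the number of destroy calls. */
static int free_qp_once(void)
{
	struct mock_cm_id id;
	struct mock_qp *qp;

	destroy_calls = 0;
	id.qp = malloc(sizeof(*id.qp));
	if (id.qp == NULL)
		return -1;
	qp = id.qp;              /* second alias to the same allocation */

	mock_rdma_destroy_qp(&id);
	id.qp = NULL;
	qp = NULL;               /* the one-line fix: drop the stale alias */

	if (qp != NULL)          /* generic teardown path now skips the QP */
		mock_ibv_destroy_qp(qp);

	return destroy_calls;
}
```

Without the `qp = NULL;` line the second destroy call would run on memory that has already been released, which is the double free the patch describes.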
[ANNOUNCE] dapl-2.1.6-1 release
New release for uDAPL (2.1.6) is available at http://downloads.openfabrics.org/dapl/

Vlad, please pull into OFED 3.18-1

md5sum: dce3ef7c943807d35bcb26dae72b1d88 dapl-2.1.6.tar.gz

For the v2.1 package, install the RPM packages as follows:

dapl-2.1.6-1
dapl-utils-2.1.6-1
dapl-devel-2.1.6-1
dapl-debuginfo-2.1.6-1

Release notes: http://downloads.openfabrics.org/dapl/documentation/uDAPL_release_notes.txt

Summary:

Release 2.1.6
  ucm: add cluster size environments to adjust CM timers
  mpxyd: proxy_in data transfers can improperly start before RTU received
  mcm: forward open/query for MFO devices in query only mode
  mpxyd: byte swap incorrect on WRC wr_len
  dtest: remove ERR message from flush QP function
  dapltest: Quit command with -n port number will core dump
  config: update dat.conf for MFO qib devices, 2 adapters/ports
  mpxyd: add MFO support on proxy side
  mcm: add MFO proxy commands, device, and CM support
  mcm: add MFO support to openib_common code base
  mcm: add full offload (MFO) mode to provider to support qib on MIC
  dtest: pre-allocated buffer too small for RMR, DTO ops timeout
  mpxyd: fix buffer initialization when no-inline support is active
  mpxyd: reduce log level on qp_flush to CM level
  mcm: intra-node proxy missing LID setup on rejects
  mcm: add intra-node support via ibscif device and mcm provider
  mcm: provide MIC address info with proxy device open
  mcm: add device info to non-debug log
  common: add DAPL_DTO_TYPE_EXTENSION_IMM for rdma_write_imm DTO type checking
  mpxyd: fix up some of the PI logging
  dtest: modify rdma_write_with_msg to support uni-direction streaming
  mcm,mpxyd: fix dreq processing to defer QP flush when proxy WRs still pending
  mpxyd: update byte_len and comp_cnt for PO to remote HST communications
  mcm: bug fixes for non-inline devices
  mcm: return CM_rej with CM_req_in errors
  mpxyd,mcm: RDMA write with immed data not signaled on request side
  mcm: add WC opcode and wc_flags in debug log message
  mpxyd: set options bug fix for mcm_ib_inline

Release commit:
http://git.openfabrics.org/?p=~ardavis/dapl.git;a=commit;h=91febc42f0070b2b9eaa81c0c113c6ff7ab8ea60

Regards,
Arlin
[ANNOUNCE] dapl-2.1.5-1 release
New release for uDAPL (2.1.5) is available at http://downloads.openfabrics.org/dapl/dapl-2.1.5.tar.gz

Vlad, please pull into OFED 3.18 RC3

md5sum: 8a7735bfe24dd7c446aec38db3577728 dapl-2.1.5.tar.gz

For the v2.1 package, install the RPM packages as follows:

dapl-2.1.5-1
dapl-utils-2.1.5-1
dapl-devel-2.1.5-1
dapl-debuginfo-2.1.5-1

Summary of changes:

Release 2.1.5
  update release notes, readme files
  dat.conf: update comments regarding versions
  dtest: add logging of provider private data size with -v
  scm: remove use of msg.resv field for process id logging
  cma: report correct CM req private data size on query
  mpxyd: memset ib_wr structure before post_send on WC and WR requests
  mcm: add HST side provider support for device without inline data capability
  ucm: CM changes for UD extended port space and indexer
  ucm: add device support for new port space hash table
  ucm: allocate/free AH hash table for UD endpoint types
  ucm: check for AH caching when destroying via UD extension
  ucm: optimizations for large scale UD communication management
  mpxyd: use wr opcode instead of wc opcode to support logging on error cases
  mcm: HST-MXS mode, using RDMA_WRITE_WITH_IMM, fails with dtest -w
  dapl: aarch64 support for linux
  dapltest: add scripts to dist, set default device to IPoIB
  mpxyd: add wc_flags to proxy work completions

Release commit:
http://git.openfabrics.org/?p=~ardavis/dapl.git;a=commit;h=facfb793374e4030ebf2ed539d77270a3e90d26e

Regards,
Arlin
[PATCH 0/5] dapl: ucm provider changes for large CM UD alltoall scaling
From: Arlin Davis <arlin.r.da...@intel.com>

Tested on 1200n, 28ppn cluster, MPI AlltoAll Intel MPI, UD mode,
static and dynamic, over 500m connections.

Arlin Davis (5):
  ucm: optimizations for large scale UD communication management
  ucm: check for AH caching when destroying via UD extension
  ucm: allocate/free AH hash table for UD endpoint types
  ucm: add device support for new port space hash table
  ucm: CM changes for UD extended port space and indexer

 dapl/openib_common/dapl_ib_common.h |   15 +-
 dapl/openib_common/ib_extensions.c  |   14 +-
 dapl/openib_common/qp.c             |   32 +-
 dapl/openib_ucm/cm.c                | 1255 +++
 dapl/openib_ucm/dapl_ib_util.h      |   37 +-
 dapl/openib_ucm/device.c            |  153 -
 6 files changed, 1030 insertions(+), 476 deletions(-)

-- 
1.7.3
[PATCH 5/5] dapl ucm: CM changes for UD extended port space and indexer
From: Arlin Davis arlin.r.da...@intel.com Change port manager to indexer and service ID manager to bitarray indexer. Reduces footprint for service IDs and allow direct lookup on CM messages. New insert, remove, lookup functions for processing ID based CM objects. Inbound requests, with the exception of new CM requests, will no longer parse list but use hash table lookups. AH caching is now used to prevent unnecessarily creating multiple AH's for same QP destination. Add 24-bit port space support to CM processing code and to wire protocol via DCM message reserve space. Add version check to limit to 16-bit for backward compatibility. Bump CM protocol version to 8 for xport and rtns fields. Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_ucm/cm.c | 1255 +- 1 files changed, 826 insertions(+), 429 deletions(-) diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c index 330b1c2..3d06c82 100644 --- a/dapl/openib_ucm/cm.c +++ b/dapl/openib_ucm/cm.c @@ -116,40 +116,9 @@ static int ucm_send(ib_hca_transport_t *tp, ib_cm_msg_t *msg, DAT_PVOID p_data, static void ucm_disconnect_final(dp_ib_cm_handle_t cm); DAT_RETURN dapli_cm_disconnect(dp_ib_cm_handle_t cm); DAT_RETURN dapli_cm_connect(DAPL_EP *ep, dp_ib_cm_handle_t cm); - -/* Service ids - port space */ -static uint16_t ucm_get_port(ib_hca_transport_t *tp, uint16_t port) -{ - int i = 0; - - dapl_os_lock(tp-plock); - /* get specific ID */ - if (port) { - if (tp-sid[port] == 0) { - tp-sid[port] = 1; - i = port; - } - goto done; - } - - /* get any free ID */ - for (i = 0x; i 0; i--) { - if (tp-sid[i] == 0) { - tp-sid[i] = 1; - break; - } - } -done: - dapl_os_unlock(tp-plock); - return i; -} - -static void ucm_free_port(ib_hca_transport_t *tp, uint16_t port) -{ - dapl_os_lock(tp-plock); - tp-sid[port] = 0; - dapl_os_unlock(tp-plock); -} +static int dapli_queue_listen(dp_ib_cm_handle_t cm, uint16_t sid); +static int dapli_queue_conn(dp_ib_cm_handle_t cm); +static dp_ib_cm_handle_t 
dapli_cm_lookup(ib_hca_transport_t *tp, int cm_id); static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer) { @@ -163,16 +132,19 @@ static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer) if ((time - cm-timer)/1000 = (cm-hca-ib_trans.rep_time cm-retries)) { dapl_log(DAPL_DBG_TYPE_CM_WARN, - CM_REQ retry %p %d [lid, port, cqp, iqp]: - %x %x %x %x - %x %x %x %x Time(ms) %d = %d\n, -cm, cm-retries+1, -ntohs(cm-msg.saddr.ib.lid), ntohs(cm-msg.sport), + CM_REQ %d retry %d: + %d %x %x %x %x - %d %x %x %x %x: %d %d(ms)\n, +cm-cm_id, cm-retries+1, +ntohl(cm-msg.s_id), ntohs(cm-msg.saddr.ib.lid), +UCM_PORT_NTOH(cm-msg.sportx, cm-msg.sport), ntohl(cm-msg.sqpn), ntohl(cm-msg.saddr.ib.qpn), -ntohs(cm-msg.daddr.ib.lid), ntohs(cm-msg.dport), +ntohl(cm-msg.d_id), ntohs(cm-msg.daddr.ib.lid), +UCM_PORT_NTOH(cm-msg.dportx, cm-msg.dport), ntohl(cm-msg.dqpn), ntohl(cm-msg.daddr.ib.qpn), (time - cm-timer)/1000, cm-hca-ib_trans.rep_time cm-retries); cm-retries++; + cm-msg.rtns = cm-retries; DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_REQ_RETRY); dapl_os_unlock(cm-lock); @@ -185,17 +157,20 @@ static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer) if ((time - cm-timer)/1000 = (cm-hca-ib_trans.rtu_time cm-retries)) { dapl_log(DAPL_DBG_TYPE_CM_WARN, - CM_REP retry %d %s [lid, port, cqp, iqp]: - %x %x %x %x - %x %x %x %x r_pid %x Time(ms) %d = %d\n, -cm-retries+1, + CM_REP %d retry %d %s: + %d %x %x %x %x - %d %x %x %x %x: %d %d(ms)\n, +cm-cm_id, cm-retries+1, dapl_cm_op_str(ntohs(cm-msg.op)), -ntohs(cm-msg.saddr.ib.lid), ntohs(cm-msg.sport), +ntohl(cm-msg.s_id),
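The commit message describes replacing list walks with direct lookups on an ID. A minimal sketch of that idea follows, assuming a two-level table whose chunks are allocated on first use. The names `struct indexer`, `idx_insert()`, `idx_lookup()` and `idx_remove()` and the small demo sizes are hypothetical and are not taken from cm.c, and the linear scan for a free slot is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRY_BITS 4                          /* 16 slots per chunk (demo size) */
#define ARRAY_BITS 8                          /* 256 slots in total (demo size) */
#define ENTRY_SIZE (1u << ENTRY_BITS)
#define ARRAY_SIZE (1u << (ARRAY_BITS - ENTRY_BITS))
#define ID_LIMIT   (1u << ARRAY_BITS)

struct indexer {
	void **chunk[ARRAY_SIZE];             /* chunks are allocated on first use */
};

/* Store obj in the first free slot and return its id.
 * Id 0 is reserved so that 0 can signal "table full" or "out of memory". */
static uint32_t idx_insert(struct indexer *ix, void *obj)
{
	uint32_t id;

	for (id = 1; id < ID_LIMIT; id++) {
		uint32_t c = id >> ENTRY_BITS;
		uint32_t s = id & (ENTRY_SIZE - 1);

		if (ix->chunk[c] == NULL) {
			ix->chunk[c] = calloc(ENTRY_SIZE, sizeof(void *));
			if (ix->chunk[c] == NULL)
				return 0;
		}
		if (ix->chunk[c][s] == NULL) {
			ix->chunk[c][s] = obj;
			return id;
		}
	}
	return 0;
}

/* Direct lookup: two array indexings, no list traversal. */
static void *idx_lookup(const struct indexer *ix, uint32_t id)
{
	if (id == 0 || id >= ID_LIMIT || ix->chunk[id >> ENTRY_BITS] == NULL)
		return NULL;
	return ix->chunk[id >> ENTRY_BITS][id & (ENTRY_SIZE - 1)];
}

/* Clear the slot and hand the stored object back to the caller. */
static void *idx_remove(struct indexer *ix, uint32_t id)
{
	void *obj = idx_lookup(ix, id);

	if (obj != NULL)
		ix->chunk[id >> ENTRY_BITS][id & (ENTRY_SIZE - 1)] = NULL;
	return obj;
}
```

With a table like this, an inbound message that carries the peer's ID can be matched to its CM object without holding a list lock for the duration of a scan, which appears to be the motivation given above.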
[PATCH 4/5] dapl ucm: add device support for new port space hash table
From: Arlin Davis arlin.r.da...@intel.com Allocate port space hash table during device open when creating CM services. Default settings are set to 4K entry chunks and 256K total port slots. Add environment variables for adjustments DAPL_UCM_ENTRY_BITS 11 DAPL_UCM_ARRAY_BITS 18 Add debug output for create CM service errors Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_ucm/device.c | 153 +++--- 1 files changed, 117 insertions(+), 36 deletions(-) diff --git a/dapl/openib_ucm/device.c b/dapl/openib_ucm/device.c index b9abbf0..94ce812 100644 --- a/dapl/openib_ucm/device.c +++ b/dapl/openib_ucm/device.c @@ -311,6 +311,9 @@ DAT_RETURN dapls_ib_open_hca(IN IB_HCA_NAME hca_name, if ((dapl_os_lock_init(hca_ptr-ib_trans.plock)) != DAT_SUCCESS) goto bail; + if ((dapl_os_lock_init(hca_ptr-ib_trans.ilock)) != DAT_SUCCESS) + goto bail; + /* EVD events without direct CQ channels, CNO support */ hca_ptr-ib_trans.ib_cq = ibv_create_comp_channel(hca_ptr-ib_hca_handle); @@ -367,11 +370,11 @@ done: hca_ptr-ib_trans.addr, sizeof(union dcm_addr)); - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + dapl_log(DAPL_DBG_TYPE_UTIL, %s open: dev %s port %d, GID %s, LID %x qpn %x sl %d\n, PROVIDER_NAME, hca_name, hca_ptr-port_num, inet_ntop(AF_INET6, hca_ptr-ib_trans.addr.ib.gid, - gid_str, sizeof(gid_str)), + gid_str, sizeof(gid_str)), ntohs(ucm_ia-ib.lid), ntohl(ucm_ia-ib.qpn), ucm_ia-ib.sl, ucm_ia-ib.qp_type); @@ -428,6 +431,7 @@ DAT_RETURN dapls_ib_close_hca(IN DAPL_HCA * hca_ptr) dapl_os_lock_destroy(hca_ptr-ib_trans.lock); dapl_os_lock_destroy(hca_ptr-ib_trans.llock); + dapl_os_lock_destroy(hca_ptr-ib_trans.ilock); destroy_os_signal(hca_ptr); ucm_service_destroy(hca_ptr); done: @@ -454,7 +458,7 @@ done: static void ucm_service_destroy(IN DAPL_HCA *hca) { ib_hca_transport_t *tp = hca-ib_trans; - int msg_size = sizeof(ib_cm_msg_t); + int i, msg_size = sizeof(ib_cm_msg_t); if (tp-mr_sbuf) ibv_dereg_mr(tp-mr_sbuf); @@ -475,26 +479,32 @@ static void ucm_service_destroy(IN DAPL_HCA *hca) 
ibv_destroy_comp_channel(tp-rch); if (tp-ah) { - int i; - - for (i = 0;i 0x; i++) { + for (i=0; iDCM_AH_SPACE; i++) { if (tp-ah[i]) ibv_destroy_ah(tp-ah[i]); } - dapl_os_free(tp-ah, (sizeof(*tp-ah) * 0x)); + dapl_os_free(tp-ah, (sizeof(*tp-ah) * DCM_AH_SPACE)); } if (tp-pd) ibv_dealloc_pd(tp-pd); if (tp-sid) - dapl_os_free(tp-sid, (sizeof(*tp-sid) * 0x)); + dapl_os_free(tp-sid, UCM_SID_SPACE/UCM_SID_ENTRY); if (tp-rbuf) dapl_os_free(tp-rbuf, (msg_size * tp-qpe)); if (tp-sbuf) dapl_os_free(tp-sbuf, (msg_size * tp-qpe)); + + if (tp-cm_idxr) { + for (i=0; i=tp-cm_idxr_cur; i++) { + dapl_os_free(tp-cm_idxr[i], +UCM_ENTRY_SIZE(tp-cm_entry_bits)); + tp-cm_idxr[i] = 0; + } + } } static int ucm_service_create(IN DAPL_HCA *hca) @@ -503,7 +513,7 @@ static int ucm_service_create(IN DAPL_HCA *hca) ib_hca_transport_t *tp = hca-ib_trans; struct ibv_recv_wr recv_wr, *recv_err; struct ibv_sge sge; - int i, mlen = sizeof(ib_cm_msg_t); + int i, array_sz, entry_sz, mlen = sizeof(ib_cm_msg_t); int hlen = sizeof(struct ibv_grh); /* hdr included with UD recv */ char *rbuf; @@ -518,31 +528,78 @@ static int ucm_service_create(IN DAPL_HCA *hca) tp-dreq_cnt = dapl_os_get_env_val(DAPL_UCM_DREQ_RETRY, DCM_DREQ_CNT); tp-drep_time = dapl_os_get_env_val(DAPL_UCM_DREP_TIME, DCM_DREP_TIME); tp-cm_timer = dapl_os_get_env_val(DAPL_UCM_TIMER, DCM_CM_TIMER); + /* default = 11-bit, 2KB entries; 18 bit, 256KB total */ + tp-cm_entry_bits = dapl_os_get_env_val(DAPL_UCM_ENTRY_BITS, UCM_ENTRY_BITS); + tp-cm_array_bits = DAPL_MAX(dapl_os_get_env_val(DAPL_UCM_ARRAY_BITS, UCM_ARRAY_BITS), tp-cm_entry_bits); + array_sz = UCM_ARRAY_SIZE(tp-cm_array_bits, tp-cm_entry_bits); + entry_sz = UCM_ENTRY_SIZE(tp-cm_entry_bits); + tp-pd = ibv_alloc_pd(hca-ib_hca_handle); -if (!tp-pd) -goto bail; +if (!tp-pd) { + dapl_log(DAPL_DBG_TYPE_ERR, +UCM: CM service: ERR ibv_pd (%s)\n, +strerror(errno)); + goto bail; +} -dapl_log(DAPL_DBG_TYPE_UTIL, -
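The relation between the two tunables can be checked with a little arithmetic. This is a sketch only: `struct idx_geometry` and `idx_geometry()` are hypothetical names, and the clamp imitates the DAPL_MAX() call that appears in the diff above.

```c
#include <assert.h>
#include <stdint.h>

struct idx_geometry {
	uint32_t entry_slots;   /* slots per chunk: 1 << entry_bits */
	uint32_t chunks;        /* chunk pointers: 1 << (array_bits - entry_bits) */
	uint32_t total_slots;   /* addressable ports: 1 << array_bits */
};

/* Derive table sizes from the two bit counts (both assumed to be below 32). */
static struct idx_geometry idx_geometry(unsigned entry_bits, unsigned array_bits)
{
	struct idx_geometry g;

	if (array_bits < entry_bits)    /* array bits never drop below entry bits */
		array_bits = entry_bits;

	g.entry_slots = 1u << entry_bits;
	g.chunks = 1u << (array_bits - entry_bits);
	g.total_slots = 1u << array_bits;
	return g;
}
```

With the defaults quoted in the patch (DAPL_UCM_ENTRY_BITS 11, DAPL_UCM_ARRAY_BITS 18) this gives 2048 slots per chunk, 128 chunks and 262144 slots in total.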
[PATCH 1/5] dapl ucm: optimizations for large scale UD communication management
From: Arlin Davis <arlin.r.da...@intel.com>

AH caching per QP, AH space set to 48K for LID unicast
Bump port space up to 24 bits
Reduce CM object and reduce private data to 68 bytes
Add xport space and rtns to DCM reserve fields.
New indexer macros for port space hash table management
Add hash table storage to ibtrans device objects

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/dapl_ib_common.h |   15 ++---
 dapl/openib_ucm/dapl_ib_util.h      |   37 --
 2 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/dapl/openib_common/dapl_ib_common.h b/dapl/openib_common/dapl_ib_common.h
index 180c876..7b3e5d0 100644
--- a/dapl/openib_common/dapl_ib_common.h
+++ b/dapl/openib_common/dapl_ib_common.h
@@ -43,11 +43,14 @@
 #define true 1
 #endif /*__cplusplus */
 
+#define DCM_AH_SPACE (0xC000) /* unicast LID range */
+
 /* Typedefs to map common DAPL provider types to IB verbs */
 struct dcm_ib_qp {
 	struct _ib_hca_transport *tp;
 	struct dapl_ep *ep;
 	struct ibv_qp *qp; /* local QP1 snd-rcv or rcv from PO */
+	struct ibv_ah **ah; /* UD AH cache, LID index */
 #ifdef _OPENIB_MCM_
 	struct dcm_ib_cq *req_cq; /* ref to req CQ for HST-MXS */
 	struct dcm_ib_cq *rcv_cq; /* ref to rcv CQ for HST-MXS */
@@ -98,11 +101,13 @@ typedef struct ibv_context *ib_hca_handle_t;
 typedef ib_hca_handle_t dapl_ibal_ca_t;
 
 /* QP info to exchange, wire protocol version for these CM's */
-#define DCM_VER 7
+/* Version 8, 24-bit port space and rtns value */
+#define DCM_VER 8
+#define DCM_VER_XPS 8 /* extended port space, rtns */
 #define DCM_VER_MIN 6 /* backward compatibility limit */
 
 /* CM private data areas, same for all operations */
-#define DCM_MAX_PDATA_SIZE 118
+#define DCM_MAX_PDATA_SIZE 68
 
 /*
  * UCM DAPL IB/QP address (lid, qp_num, gid) mapping to
@@ -127,7 +132,6 @@ union dcm_addr {
 	} ib;
 };
 
-/* 256 bytes total; default max_inline_send, min IB MTU size */
 typedef struct _ib_cm_msg
 {
 	uint16_t ver;
@@ -140,7 +144,10 @@ typedef struct _ib_cm_msg
 	uint32_t s_id; /* src pid */
 	uint32_t d_id; /* dst pid */
 	uint8_t rd_in; /* atomic_rd_in */
-	uint8_t resv[5];
+	uint8_t sportx; /* extend to 24 bits */
+	uint8_t dportx; /* extend to 24 bits */
+	uint8_t rtns; /* retransmissions */
+	uint8_t resv[2];
 	union dcm_addr saddr;
 	union dcm_addr daddr;
 	union dcm_addr saddr_alt;
diff --git a/dapl/openib_ucm/dapl_ib_util.h b/dapl/openib_ucm/dapl_ib_util.h
index 8665491..ece9c88 100644
--- a/dapl/openib_ucm/dapl_ib_util.h
+++ b/dapl/openib_ucm/dapl_ib_util.h
@@ -33,15 +33,39 @@
 #include "openib_osd.h"
 #include "dapl_ib_common.h"
 
+#define UCM_SID_BITS 16 /* 64K */
+#define UCM_SID_SPACE (1 << UCM_SID_BITS)
+#define UCM_SID_MASK (UCM_SID_SPACE-1)
+#define UCM_SID_ENTRY 8 /* 8 bit entry */
+
+#define UCM_CHK_SID(a,p) (a[p/UCM_SID_ENTRY] & (1 << (p%UCM_SID_ENTRY)))
+#define UCM_SET_SID(a,p) (a[p/UCM_SID_ENTRY] = (a[p/UCM_SID_ENTRY] | (1 << (p%UCM_SID_ENTRY))))
+#define UCM_CLR_SID(a,p) (a[p/UCM_SID_ENTRY] = (a[p/UCM_SID_ENTRY] & ~(1 << (p%UCM_SID_ENTRY))))
+
+#define UCM_PORT_BITS 24 /* 16M total, wire protocol max */
+#define UCM_PORT_SPACE (1 << UCM_SID_BITS)
+#define UCM_PORT_MASK (UCM_PORT_SPACE-1)
+#define UCM_PORT_NTOH(hi,lo) ((((hi & 0xff) << 16) | (ntohs(lo) & 0xffff)) & (UCM_PORT_MASK))
+#define UCM_PORT(p) (p & 0xffff)
+#define UCM_PORTX(p) ((p >> 16) & 0xff)
+
+#define UCM_ENTRY_BITS 11 /* 2K entries, default */
+#define UCM_ARRAY_BITS 18 /* 256K total ports, default */
+#define UCM_ENTRY_SIZE(ebits) (1 << ebits)
+#define UCM_ARRAY_SIZE(abits, ebits) (1 << (abits - ebits))
+#define UCM_ARRAY_IDX_MAX(abits) ((1 << abits) - 1)
+#define UCM_ARRAY_IDX(idx, abits) (idx >> abits)
+#define UCM_ENTRY_IDX(idx, abits) (idx & (abits - 1))
+
+
 /* DAPL CM objects MUST include list_entry, ref_count, event for EP linking */
 struct ib_cm_handle
 {
 	struct dapl_llist_entry list_entry;
 	struct dapl_llist_entry local_entry;
-	DAPL_OS_WAIT_OBJECT d_event;
-	DAPL_OS_WAIT_OBJECT f_event;
 	DAPL_OS_LOCK lock;
 	DAPL_OS_TIMEVAL timer;
+	uint32_t cm_id;
 	int ref_count;
 	int state;
 	int retries;
@@ -49,7 +73,6 @@ struct ib_cm_handle
 	struct dapl_sp *sp;
 	struct dapl_ep *ep;
 	struct dapl_cr *cr;
-	struct ibv_ah *ah;
 	uint16_t p_size; /* accept p_data, for retries */
 	uint8_t p_data[DCM_MAX_PDATA_SIZE];
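Two of the ideas in this patch, the one-bit-per-ID service ID map and the 24-bit port carried as a 16-bit field plus an extension byte, can be sketched as plain functions. This is an illustration under assumptions: `sid_set()`, `sid_test()`, `sid_clear()`, `struct wire_port`, `port_hton()` and `port_ntoh()` are hypothetical names, and only the bit layout follows the UCM_*_SID and UCM_PORT* macros in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons(), ntohs() */

#define SID_BITS  16
#define SID_SPACE (1u << SID_BITS)

/* One bit per service ID: 64K IDs fit in 8 KB instead of one byte each. */
static uint8_t sid_map[SID_SPACE / 8];

static int sid_test(uint32_t p)
{
	return (sid_map[p / 8] >> (p % 8)) & 1;
}

static void sid_set(uint32_t p)
{
	sid_map[p / 8] |= (uint8_t)(1u << (p % 8));
}

static void sid_clear(uint32_t p)
{
	sid_map[p / 8] &= (uint8_t)~(1u << (p % 8));
}

/* A 24-bit port on the wire: the low 16 bits stay in the existing
 * network-order port field, the top 8 bits go in a former reserved byte. */
struct wire_port {
	uint16_t lo;   /* network byte order */
	uint8_t  hi;   /* bits 16..23 */
};

static struct wire_port port_hton(uint32_t port)
{
	struct wire_port w;

	w.lo = htons((uint16_t)(port & 0xffff));
	w.hi = (uint8_t)((port >> 16) & 0xff);
	return w;
}

static uint32_t port_ntoh(struct wire_port w)
{
	return ((uint32_t)w.hi << 16) | ntohs(w.lo);
}
```

A peer that only understands the 16-bit layout would ignore the extension byte, which is presumably why the series also adds a protocol version check.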
[PATCH 2/5] dapl ucm: check for AH caching when destroying via UD extension
From: Arlin Davis <arlin.r.da...@intel.com>

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/ib_extensions.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/dapl/openib_common/ib_extensions.c b/dapl/openib_common/ib_extensions.c
index fc03d9c..25db541 100644
--- a/dapl/openib_common/ib_extensions.c
+++ b/dapl/openib_common/ib_extensions.c
@@ -202,7 +202,6 @@ dapl_extensions(IN DAT_HANDLE dat_handle,
 			status = DAT_ERROR(DAT_INVALID_HANDLE,
 					   DAT_INVALID_HANDLE_EP);
 		else {
-			cm->ah = NULL; /* consumer will free AH */
 			status = dapls_ud_cm_free(ep, cm);
 		}
 		break;
@@ -210,6 +209,7 @@ dapl_extensions(IN DAT_HANDLE dat_handle,
 	case DAT_IB_UD_AH_FREE_OP:
 	{
 		DAT_IB_ADDR_HANDLE *dat_ah;
+		uint16_t lid;
 		int ret;
 
 		dapl_dbg_log(DAPL_DBG_TYPE_RTN,
@@ -222,8 +222,18 @@ dapl_extensions(IN DAT_HANDLE dat_handle,
 			status = DAT_ERROR(DAT_INVALID_HANDLE,
 					   DAT_INVALID_HANDLE_EP);
 		} else {
+			lid = ntohs(((union dcm_addr *)&dat_ah->ia_addr)->ib.lid);
+
+			if (lid > DCM_AH_SPACE) {
+				status = DAT_ERROR(DAT_INVALID_PARAMETER,
+						   DAT_INVALID_ARG2);
+				break;
+			}
+
 			errno = 0;
-			ret = ibv_destroy_ah(dat_ah->ah);
+			if (!((DAPL_EP *)ep)->qp_handle->ah[lid])
+				ret = ibv_destroy_ah(dat_ah->ah);
+
 			status = dapl_convert_errno(errno, "destroy_ah");
 		}
 		break;
-- 
1.7.3
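The rule this patch adds can be stated as: an address handle that the QP still holds in its per-LID cache must not be destroyed by the consumer-facing free call. A hedged sketch of that rule follows; `mock_ah`, `mock_destroy_ah()` and `ah_free()` are hypothetical names, and the bounds check here rejects any LID that would index past the table, which is an assumption of this sketch rather than a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AH_SPACE 0xC000   /* table size, matching the unicast LID range above */

/* Hypothetical stand-in for struct ibv_ah. */
struct mock_ah { int id; };

static int destroyed;     /* counts calls that would reach ibv_destroy_ah() */

static void mock_destroy_ah(struct mock_ah *ah)
{
	(void)ah;
	destroyed++;
}

/* Free an AH on behalf of the consumer.
 * Returns 0 on success, -1 if the LID cannot index the cache. */
static int ah_free(struct mock_ah **cache, uint16_t lid, struct mock_ah *ah)
{
	if (lid >= AH_SPACE)
		return -1;

	/* A cached AH is owned by the QP and is released with the QP,
	 * so only destroy handles that are not in the cache. */
	if (cache[lid] == NULL)
		mock_destroy_ah(ah);

	return 0;
}
```

Leaving cached handles alone avoids a later use of a destroyed AH by other connections that share the same destination LID.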
[ANNOUNCE] dapl-2.1.4-1 release
New release for uDAPL (2.1.4) is available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 3.18 RC2

md5sum: d20bd85bc71c2f3391fb4c8d61e4e62d dapl-2.1.4.tar.gz

For the v2.1 package, install the RPM packages as follows:

dapl-2.1.4-1
dapl-utils-2.1.4-1
dapl-devel-2.1.4-1
dapl-debuginfo-2.1.4-1

Summary of changes:

Release 2.1.4 (targeting OFED 3.18)
  mpxyd: fix typo in configuration file
  cma: RR attributes moved to common ib_cm struct
  mpxyd: tx thread incorrectly sleeps with negative pi_rw_cnt value
  dat.conf: add entries for True Scale qib device
  mpxyd: add support for devices without inline data support
  ucm: long disconnect times with many-to-one applications
  openib: add inline data support check during device open
  cleanup ib/cm attribute management across openib providers
  dapltest: fix -Werror=format-security issue with printf

Release commit:
http://git.openfabrics.org/?p=~ardavis/dapl.git;a=commit;h=3d0f53bd26db0c2c5261740d8fefc6e03209996a

Regards,
Arlin
[ANNOUNCE] dapl-2.1.3-1 release
New release for DAPL (2.1.3) is available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 3.18 daily builds.

Latest Packages (see ChangeLog for recent changes, see README.mcm for MIC support):

md5sum: 04537bdd405b89c562d73bfdd6027c2b dapl-2.1.3.tar.gz

To install the package, install the RPM packages as follows:

dapl-2.1.3-1
dapl-utils-2.1.3-1
dapl-devel-2.1.3-1
dapl-debuginfo-2.1.3-1

Full list of changes since last release:

Amir Hanania (2):
  common: add srq support for openib verbs providers
  dtest: add dtestsrq for SRQ example and provider testing

Arlin Davis (21):
  add provider and proxy support for GUID across platform
  mpxyd: log warning if running in COMPAT mode
  mpxyd/mcm: add provider specific attribute DAT_IB_PROXY_VERSION
  extension: add IB UD extensions to reduce provider CM and AH memory footprint
  openib: add new TIMEWAIT state for CM
  openib: add IB UD cm_free/ah_free extension support in UCM provider
  dtestx: update IB extension example test with new v2.0.9 features
  mcm: provide CPU family/model attribute on both host and mic sides
  openib: add port_num to provider named attributes
  mpxyd: set global seg_sz to 128KB for proxy data service
  mcm: add segmentation to HST-MXS mode for improved performance
  mcm: HST-MXS mode incorrectly signals multiple fragments per WR
  mpxyd: DTO completion ERR: status 12, op RDMA_WRITE running MPI alltoall test
  mpxyd: increase max open files for service
  ucm: RTU not retransmitted in TIMEWAIT state
  dtestx: allow scale up to 1000 EP's
  common: dapl_ep_free must serialize CM object destroy
  ucm: add time wait override capability for CM services
  dapl: add rdma_write_imm and write only option to dtest
  dapl: mpxyd service changes to support multi-thread single-core

Regards,
Arlin
RE: [PATCH for-next 9/9] Samples: Peer memory client example
> -----Original Message-----
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-ow...@vger.kernel.org] On Behalf Of Hefty, Sean
> Sent: Wednesday, October 01, 2014 10:16 AM
> To: Yishai Hadas; rol...@kernel.org
> Cc: linux-rdma@vger.kernel.org; rain...@mellanox.com
> Subject: RE: [PATCH for-next 9/9] Samples: Peer memory client example
>
> > Adds an example of a peer memory client which implements the peer memory
> > API as defined under include/rdma/peer_mem.h.
> > It uses the HOST memory functionality to implement the APIs and
> > can be a good reference for peer memory client writers.
>
> Is there a real user of these changes?

CCL (co-processor communication link) Direct for Intel Xeon Phi, included in
OFED 3.12-1 and OFED-3.5-2-MIC, uses the peer-direct interface.
RE: [PATCH 4/5] dapl: add support for the s390x platform
> Subject: [PATCH 4/5] dapl: add support for the s390x platform
>
> This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec, and
> dapl_os_atomic_assign function implementations to the dapl userspace
> package to provide the DAPL API support on the s390x platform by adding
> Assembler language implementation of those platform specific functions.
>
> Signed-off-by: Alexey Ishchuk <aishc...@linux.vnet.ibm.com>

Acked-by: Arlin Davis <arlin.r.da...@intel.com>

Committed. Thanks!
[ANNOUNCE] dapl-2.1.2-1 release
New release for uDAPL (2.1.2) is available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 3.12-1 RC2

md5sum: dd757dec11cb23702aea8474e76a0037 dapl-2.1.2.tar.gz

For the v2.1 package, install the RPM packages as follows:

dapl-2.1.2-1
dapl-utils-2.1.2-1
dapl-devel-2.1.2-1
dapl-debuginfo-2.1.2-1

Summary of changes:

Release 2.1.2
  mpxyd: add global routing support for proxy connections
  mcm: only call mix_get_attr if running on MIC
  openib: modify check for link_layer to handle unspecified
  dapl: add support for the s390x platform
  dtest server exchange connection info with client
  mpxyd: 2 MICs in same numa_node will overlap CPU affinity, don't reset base
  mcm: implement proxy mix_prov_attr function, add fields CPU model and family
  mpxyd: tx thread may not be signaled on small segment writes

Release commit:
http://git.openfabrics.org/?p=~ardavis/dapl.git;a=commit;h=25568900892f9e72413e235ebc4ba77176343c84

Regards,
Arlin
[ANNOUNCE] dapl-2.1.1-1 release
New release for uDAPL v2 (2.1.1) is available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 3.12-1 RC1

md5sum: ffdfefd85ddded65286a0d728508fbce dapl-2.1.1.tar.gz

For the v2.1 package, install the RPM packages as follows:

dapl-2.1.1-1
dapl-utils-2.1.1-1
dapl-devel-2.1.1-1
dapl-debuginfo-2.1.1-1

Summary of v2.1 changes:

Release 2.1.1 (OFED 3.12-1)
  common: add provider name to log messages
  mpxyd: log warning message if numa_node invalid
  include debuginfo with build
  build: include debuginfo with build
  mpxyd: tx thread doesn't sleep during no pending IO state
  mpxyd: change MIC cpu_mask to per numa node instead of adapter
  mpxyd: set to MXS mode if device numa_node is invalid (-1)
  mpxyd: MXS based alltoall benchmark hangs or returns post_send timeout
  mpxyd: add IO profile capabilities to help debug alltoall stall cases
  mpxyd: retry stalled inline post_send, init m_idx only when signaled

Regards,
Arlin
[ANNOUNCE] dapl-2.1.0-1 release - MIC support
New release for uDAPL v2 (2.1.0) is available at http://www.openfabrics.org/downloads/dapl

MIC support has been added in this release and is provided with the new MCM
provider and MPXYD service. MCM requires the Intel(R) MPSS 3.x (YOCTO) release
for Linux to be installed on your system. MPSS 3.x for Linux can be downloaded
from: http://software.intel.com/mic-developer

Vlad, please pull into OFED 3.12-1

md5sum: 43bd0f2a7e72ef283d27f99f935e615a dapl-2.1.0.tar.gz

For the v2.1 package, install the RPM packages as follows:

dapl-2.1.0-1
dapl-utils-2.1.0-1
dapl-devel-2.1.0-1
dapl-debuginfo-2.1.0-1

Summary of v2.1 changes:

Release 2.1.0 (OFED 3.12-1)
  build: add missing NEWS file
  update autogen.sh
  add MCM provider and MPXYD service to build
  mpxyd: service startup script and configuration file
  add readme for MCM provider and MPXYD service
  update Copyright dates
  add new MIC RDMA proxy service daemon (MPXYD)
  add new dapl MIC provider (MCM) to support MIC RDMA proxy services
  MCM: new MIC provider and proxy service definitions
  cleanup build warnings
  common: add CQ,QP,MR abstractions for new MIC provider and data proxy service
  openib: cleanup, use inet_ntop for GIDs, remove some logs, destroy pipes on release
  common: new dapls_evd_cqe_to_event call, cqe to event
  common: init ring_buffer, assign hd/tl pos in range
  allow log level changes during device open
  ucm: fix cm rbuf setup, include grh pad on initialization
  ucm: remove duplicate async_event code, use common async event call
  new lightweight open_query/close_query IB extension for fast attribute query
  dtestcm: add more detailed debug during disconnect phase
  cma: long delays when opening cma provider with no IPoIB configured
  common: new debug levels for low system memory, IA stats, and package info
  build: remove library check for mverbs with --enable-fca
  IB extension: segfault in create collective group with non-vector type IA handle
  build: change configure help to correctly state collective default=none

MIC support overview:

The new MIC service is designed to provide MIC based DAPL Provider clients
with higher bandwidth access to IB fabrics when direct IB fabric access is
unavailable or constrained. It includes a DAPL provider (MCM) and a host based
proxy data service (MPXYD) for SND/RCV and RDMA write operations. RDMA write
with immediate data is the only IB extension supported. RDMA reads and atomics
are not supported. The MCM provider maintains the DAT level API semantics,
including ordering requirements of data flow.

This new service communicates within a server platform over PCI-E bus using
Symmetric Communications Interface (SCI) and a MCM specific MIX (MIC exchange)
messaging protocol. On the wire, the MCM provider uses a new CM and WR/WC
proxy protocol that can run from either a MIC or HOST. With this new protocol
the MCM endpoint send and receive channels are managed separately for optimal
data services based on physical locality of each endpoint (see below).

Refer to mpxyd.conf for tunable proxy service attributes.
See /etc/dat.conf for MCM provider device definitions.

The following shows connectivity modes and data paths supported:

  HST - HST to HCA
  MSS - MIC to HCA same socket
  MXS - MIC to HCA cross socket

  1. HST->HST: Host->HCA->fabric->HCA->Host            (direct->direct)
     HST<-HST: Host<-HCA<-fabric<-HCA<-Host            (direct<-direct)
  2. MSS->MSS: KNC->Host->HCA->fabric->HCA->KNC        (proxy->direct)
     MSS<-MSS: KNC<-HCA<-fabric<-HCA<-Host<-KNC        (direct<-proxy)
  3. MXS->MXS: KNC->Host->HCA->fabric->HCA->Host->KNC  (proxy->proxy)
     MXS<-MXS: KNC<-Host<-HCA<-fabric<-HCA<-Host<-KNC  (proxy<-proxy)
  4. MSS->MXS: KNC->Host->HCA->fabric->HCA->Host->KNC  (proxy->proxy)
     MSS<-MXS: KNC<-HCA<-fabric<-HCA<-Host<-KNC        (direct<-proxy)
  5. MSS->HST: KNC->Host->HCA->fabric->HCA->Host       (proxy->direct)
     MSS<-HST: KNC<-HCA<-fabric<-HCA<-Host             (direct<-direct)
  6. MXS->HST: KNC->Host->HCA->fabric->HCA->Host       (proxy->direct)
     MXS<-HST: KNC<-Host<-HCA<-fabric<-HCA<-Host       (proxy<-direct)

Regards,
Arlin
[ANNOUNCE] dapl-2.0.42 release
New release for uDAPL v2 (2.0.42) is available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 3.12 RC2.

md5sum: 08a9ccf071935c9149d4bd1d6a0f9d65 dapl-2.0.42.tar.gz

For the v2.0 package, install the RPM packages as follows:

dapl-2.0.42-1
dapl-utils-2.0.42-1
dapl-devel-2.0.42-1
dapl-debuginfo-2.0.42-1

Summary of v2.0 changes:

Release 2.0.42 fixes (OFED 3.12 GA)
  dapltest: increase DTO evd size to prevent CQ overflow on limit_rpost test
  dapltest: RSP limit test fails. Creation of reserved SP moves EP state to
    DAT_EP_STATE_RESERVED in error cases.
  dapl: fix string bug in dapls_dto_op_str

-arlin
[PATCH 5/5] dat: lower log level on load errors of provider library
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dat/udat/linux/dat_osd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dat/udat/linux/dat_osd.c b/dat/udat/linux/dat_osd.c
index 28ac9fa..cbb95ba 100644
--- a/dat/udat/linux/dat_osd.c
+++ b/dat/udat/linux/dat_osd.c
@@ -156,7 +156,7 @@ dat_os_library_load (
     }
     else
     {
-	dat_os_dbg_print (DAT_OS_DBG_TYPE_ERROR,
+	dat_os_dbg_print (DAT_OS_DBG_TYPE_GENERIC,
 			  "DAT: library load failure: %s\n",
 			  dlerror ());
 	return DAT_INTERNAL_ERROR;
-- 
1.7.3
[PATCH 1/5] dapltest: update scripts for regression testing purposes
cl.sh and srv.sh update to provide better examples and a method to quickly regression test any dapltest changes.

usage: srv.sh devicename

  where devicename is provider (default = ofa-v2-mlx4_0-1)

usage: cl.sh hostname testname devicename

  where testname

    stop - request DAPLtest server to exit.
    conn - simple connection with limited data transfer
    trans - single transaction test
    transm - transaction test: multiple transactions [RW SND, RDMA]
    transt - transaction test: multi-threaded, single transaction
    transme - transaction test: multi-endpoints per thread
    transmet - transaction test: multi: threads and endpoints per thread
    transmete - transaction test: multi threads == endpoints
    perf - Performance test
    threads - multi-threaded single transaction test.
    threadsm - multi: threads and endpoints, single transaction test.
    rdma-write - RDMA write
    rdma-read - RDMA read
    bw - bandwidth
    latb - latency tests, blocking for events
    latp - latency tests, polling for events
    lim - limit tests.
    regression - loop over a collection of all tests.

  where devicename is provider (default = ofa-v2-mlx4_0-1)

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/scripts/cl.sh  |  236 +-
 test/dapltest/scripts/srv.sh |   24 +++--
 2 files changed, 226 insertions(+), 34 deletions(-)

diff --git a/test/dapltest/scripts/cl.sh b/test/dapltest/scripts/cl.sh
index 9a8d64f..46c7edb 100755
--- a/test/dapltest/scripts/cl.sh
+++ b/test/dapltest/scripts/cl.sh
@@ -1,6 +1,7 @@
 #!/bin/sh
 #
 # Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
+# Copyright (c) 2014 Intel Corporation. All rights reserved.
 #
 # This Software is licensed under one of the following licenses:
 #
@@ -28,31 +29,218 @@
 # notice, one of the license notices in the documentation
 # and/or other materials provided with the distribution.
# -# Sample client invocation +# Sample DAPLtest client Usage: cl.sh hostname [testname] [device] # +# default device = ofa-v2-mlx4_0-1 # -me=`basename $0` -case $# in -0) host=dat-linux3 -device=0 ;; -1) host=$1 -device=0 ;; -2) host=$1 -device=$2 ;; -*) echo Usage: $me '[hostname [device] ]' 12 ; exit 1;; -esac -# -# -# ./dapltest -T T -V -d -t 2 -w 2 -i 1000111 -s ${host} -D ${device} \ -# client RW 4096 1server RW 2048 4 \ -# client RR 1024 2server RR 2048 2 \ -# client SR 1024 3 -f server SR 256 3 -f - ./dapltest -T T -P-d -t 2 -w 2 -i 1024 -s ${host} -D ${device} \ -client RW 4096 1server RW 2048 4 \ -client RR 1024 2server RR 2048 2 \ -client SR 1024 3 -f server SR 256 3 -f +DT=dapltest +D=ofa-v2-mlx4_0-1 +L=1 +X= +T= +E= + +# need some help? +if [ $1 == -h ] ; then +T= +else +S=$1 +if [ ! $2 == ] ; then +T=$2 +if [ ! $3 == ] ; then + D=$3 +fi +fi +fi + +if [ ! $X == ] ; then +DAT_OS_DBG_TYPE=$X +DAT_DBG_TYPE=$X +DAT_DBG_LEVEL=$X +DAPL_DBG_LEVEL=$X +DAPL_DBG_TYPE=$X +else +DAT_DBG_TYPE=0x1 +DAT_DBG_LEVEL=1 +fi + +echo +echo uDAPL client test $DT $T $D - $S +echo + +# Endpoint and Thread stress +if [ $T == epa ] ; then +T=10 +E=10 +LT=10 +LE=50 +for ((T=$T ; $T = $LT ; $((T++)) )) ; do + for ((E=$E ; $E = $LE ; $((E++)) )) ; do + echo $T $E: Multi: Threads[$T] Endpoints[$E] Send/Recv test - 4096 iterations, 3 8K segs + $DT -T T -s $S -D $D -i 4096 -t $T -w $E client SR 8192 3 server SR 8192 3 + if [ $? -ne 0 ] ; then + echo failed $X + exit 1 + fi +done +done +echo THREADS $LT and ENDPOINTS $LE loops completed. +exit +fi + +if [ $T == conn ] ; then +# Connectivity test - client sends one buffer with one 4KB segments, one time. +# add '-d' for debug output. 
+$DT -T T -s $S -D $D -i 1 -t 1 -w 1 client SR 4096 server SR 4096 +exit +fi + +if [ $T == trans ] ; then +echo Transaction test - 8192 iterations, 1 thread, SR 4KB buffers + $DT -T T -s $S -D $D -i 8192 -t 1 -w 1 client SR 4096 server SR 4096 +exit +fi + +if [ $T == transm ] ; then +echo Multiple RW, RR, SR transactions, 4096 iterations +$DT -T T -P -t 1 -w 1 -i 4096 -s $S -D $D client RW 4096 1 server RW 2048 4 server RR 1024 1 client RR 2048 1 client SR 1024 3 -f server SR 256 3 -f +exit +fi + +if [ $T == transmx ] ; then +echo Multiple RW, RR, SR transactions, 8192 iterations +$DT -T T -P -t 1 -w 1 -i 8192 -s $S -D $D \ + client RW 32768 4 server RW 32768 4 \ + server RR 32768 1 client RR 32768 1 \ + client SR 16384 4 -f server SR 16384 4 -f +exit +fi + +if [ $T == transt ] ; then +echo Multi-threaded[4] Transaction test - 4096 iterations, 1 thread, SR 4KB buffers
[PATCH 3/5] dapltest: set default limit max to 1000
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/cmd/dapl_limit_cmd.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/test/dapltest/cmd/dapl_limit_cmd.c b/test/dapltest/cmd/dapl_limit_cmd.c
index e59707e..77e1ae4 100644
--- a/test/dapltest/cmd/dapl_limit_cmd.c
+++ b/test/dapltest/cmd/dapl_limit_cmd.c
@@ -36,7 +36,7 @@ void DT_Limit_Cmd_Init(Limit_Cmd_t * cmd)
 	memset((void *)cmd, 0, sizeof(Limit_Cmd_t));
 	cmd->ReliabilityLevel = DAT_QOS_BEST_EFFORT;
 	cmd->width = 1;
-	cmd->maximum = ~0U;
+	cmd->maximum = 1000;
 	cmd->port = SERVER_PORT_NUMBER;
 }
 
@@ -197,6 +197,7 @@ void DT_Limit_Cmd_Usage(void)
 	DT_Mdep_printf("USAGE: [-d] : debug (zero)\n");
 	DT_Mdep_printf("USAGE: [-w width_of_resource_sets]\n");
 	DT_Mdep_printf("USAGE: [-m maximum_for_exhaustion_tests]\n");
+	DT_Mdep_printf("USAGE: (1000 - Default)\n");
 	DT_Mdep_printf("USAGE: [-R service reliability]\n");
 	DT_Mdep_printf("USAGE: (BE == QOS_BEST_EFFORT - Default)\n");
 	DT_Mdep_printf("USAGE: (HT == QOS_HIGH_THROUGHPUT)\n");
-- 
1.7.3
[PATCH 2/5] openib: add new provider specific attributes
DAT_IB_PROVIDER_NAME = UCM/CMA/SCM
DAT_IB_DEVICE_NAME = ibv_get_device_name
DAT_IB_CONNECTIVITY_MODE = DIRECT/PROXY
DAT_IB_RDMA_READ = TRUE/FALSE
DAT_IB_NODE_GUID = xxxx:xxxx:xxxx:xxxx
DAT_IB_PORT_STATE = ibv_port_state_str

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_cma/dapl_ib_util.h      |    4 ++-
 dapl/openib_common/dapl_ib_common.h |   13 -
 dapl/openib_common/dapl_ib_dto.h    |   10 +++
 dapl/openib_common/util.c           |   52 ---
 dapl/openib_scm/dapl_ib_util.h      |    4 ++-
 dapl/openib_ucm/dapl_ib_util.h      |    4 ++-
 test/dtest/dtest.c                  |   14 +-
 7 files changed, 86 insertions(+), 15 deletions(-)

diff --git a/dapl/openib_cma/dapl_ib_util.h b/dapl/openib_cma/dapl_ib_util.h
index 454f7e1..de95485 100755
--- a/dapl/openib_cma/dapl_ib_util.h
+++ b/dapl/openib_cma/dapl_ib_util.h
@@ -120,10 +120,12 @@ typedef struct _ib_hca_transport
 	uint8_t			hop_limit;
 	uint8_t			tclass;
 	uint8_t			mtu;
-	DAT_NAMED_ATTR		named_attr;
 	uint8_t			sl;
 	uint16_t		pkey;
 	int			pkey_idx;
+	uint64_t		guid;
+	char			guid_str[32];
+	ib_named_attr_t		na;
 #ifdef DAT_IB_COLLECTIVES
 	/* Collective member device and address information */
 	ib_thread_state_t	coll_thread_state;
diff --git a/dapl/openib_common/dapl_ib_common.h b/dapl/openib_common/dapl_ib_common.h
index ba805d0..dfc80a9 100644
--- a/dapl/openib_common/dapl_ib_common.h
+++ b/dapl/openib_common/dapl_ib_common.h
@@ -109,6 +109,17 @@ typedef struct _ib_cm_msg
 
 } ib_cm_msg_t;
 
+typedef struct _ib_named_attr
+{
+	const char *dev;
+	const char *mode;
+	const char *read;
+	const char *guid;
+	const char *mtu;
+	const char *port;
+
+} ib_named_attr_t;
+
 /* CM events */
 typedef enum {
 	IB_CME_CONNECTED,
@@ -304,7 +315,7 @@ int32_t dapls_ib_release(void);
 
 /* util.c */
 enum ibv_mtu dapl_ib_mtu(int mtu);
-char *dapl_ib_mtu_str(enum ibv_mtu mtu);
+const char *dapl_ib_mtu_str(enum ibv_mtu mtu);
 int getipaddr_netdev(char *name, char *addr, int addr_len);
 DAT_RETURN getlocalipaddr(char *addr, int addr_len);
 
diff --git a/dapl/openib_common/dapl_ib_dto.h b/dapl/openib_common/dapl_ib_dto.h
index b93565c..2bd6e7e 100644
--- a/dapl/openib_common/dapl_ib_dto.h
+++ b/dapl/openib_common/dapl_ib_dto.h
@@ -35,6 +35,16 @@
 
 STATIC _INLINE_ int dapls_cqe_opcode(ib_work_completion_t *cqe_p);
 
+#if defined(_OPENIB_CMA_)
+#define PROVIDER_NAME "CMA"
+#elif defined(_OPENIB_UCM_)
+#define PROVIDER_NAME "UCM"
+#elif defined(_OPENIB_SCM_)
+#define PROVIDER_NAME "SCM"
+#else
+#define PROVIDER_NAME ""
+#endif
+
 #define CQE_WR_TYPE_UD(id) \
 	(((DAPL_COOKIE *)(uintptr_t)id)->ep->qp_handle->qp_type == IBV_QPT_UD)
 
diff --git a/dapl/openib_common/util.c b/dapl/openib_common/util.c
index 20fb8b2..258d172 100644
--- a/dapl/openib_common/util.c
+++ b/dapl/openib_common/util.c
@@ -246,7 +246,7 @@ enum ibv_mtu dapl_ib_mtu(int mtu)
 	}
 }
 
-char *dapl_ib_mtu_str(enum ibv_mtu mtu)
+const char *dapl_ib_mtu_str(enum ibv_mtu mtu)
 {
 	switch (mtu) {
 	case IBV_MTU_256:
@@ -264,8 +264,6 @@ char *dapl_ib_mtu_str(enum ibv_mtu mtu)
 	}
 }
 
-
-
 /*
  * dapls_ib_query_hca
  *
@@ -377,10 +375,19 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA * hca_ptr,
 		DAPL_MAX(dev_attr.local_ca_ack_delay,
 			 hca_ptr->ib_trans.ack_timer);
 
-	/* set MTU in transport specific named attribute */
-	hca_ptr->ib_trans.named_attr.name = "DAT_IB_TRANSPORT_MTU";
-	hca_ptr->ib_trans.named_attr.value =
-	    dapl_ib_mtu_str(hca_ptr->ib_trans.mtu);
+	/* set provider/transport specific named attributes */
+	hca_ptr->ib_trans.na.dev = ia_attr->adapter_name;
+	hca_ptr->ib_trans.na.mtu = dapl_ib_mtu_str(hca_ptr->ib_trans.mtu);
+	hca_ptr->ib_trans.na.port = ibv_port_state_str(port_attr.state);
+	hca_ptr->ib_trans.guid = ntohll(ibv_get_device_guid(hca_ptr->ib_trans.ib_dev));
+	sprintf(hca_ptr->ib_trans.guid_str, "%04x:%04x:%04x:%04x",
+		(unsigned) (hca_ptr->ib_trans.guid >> 48) & 0xffff,
+		(unsigned) (hca_ptr->ib_trans.guid >> 32) & 0xffff,
+		(unsigned) (hca_ptr->ib_trans.guid >> 16) & 0xffff,
+		(unsigned) (hca_ptr->ib_trans.guid >> 0) & 0xffff);
+	hca_ptr->ib_trans.na.guid = hca_ptr->ib_trans.guid_str;
+	hca_ptr->ib_trans.na.mode = "DIRECT";
+	hca_ptr->ib_trans.na.read = "TRUE";
 
 	if (hca_ptr->ib_hca_handle->device->transport_type != IBV_TRANSPORT_IB)
 		goto skip_ib;
@@ -635,7 +642,9 @@ void dapli_async_event_cb(struct _ib_hca_transport
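For readers who want to try the GUID string format on its own, here is a minimal sketch of the same four-group hex layout. The helper name format_guid is hypothetical and not part of dapl; it assumes the GUID is already in host byte order, and it uses snprintf rather than the sprintf seen in the patch so that the buffer size is checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of dapl): format a 64-bit node GUID,
 * already converted to host byte order, as four colon-separated
 * 16-bit hex groups. Returns the number of characters written,
 * as reported by snprintf.
 */
static int format_guid(uint64_t guid, char *buf, size_t len)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%04x:%04x:%04x:%04x",
                    (unsigned)((guid >> 48) & 0xffff),
                    (unsigned)((guid >> 32) & 0xffff),
                    (unsigned)((guid >> 16) & 0xffff),
                    (unsigned)(guid & 0xffff));
}
```

The formatted string is 19 characters plus the terminator, so the 32-byte guid_str field added by the patch leaves room to spare.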
[PATCH 4/5] dat: dat_ia_open needs to close provider after failure
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dat/udat/udat.c |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/dat/udat/udat.c b/dat/udat/udat.c
index 03edcf9..842b36c 100755
--- a/dat/udat/udat.c
+++ b/dat/udat/udat.c
@@ -210,6 +210,12 @@ dat_ia_openv(IN const DAT_NAME_PTR name,
 				    async_event_handle, ia_handle);
 	if (dat_status == DAT_SUCCESS) {
 		*ia_handle = (DAT_IA_HANDLE) dats_set_ia_handle(*ia_handle);
+	} else {
+		(void)dat_dr_provider_close(&info);
+#ifndef DAT_NO_STATIC_REGISTRY
+		(void)dat_sr_provider_close(&info);
+#endif
+		return dat_status;
 	}
 
 	/*
-- 
1.7.3
[PATCH] udapl: move dapltest default server port outside ephemeral port range
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/include/dapl_common.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/test/dapltest/include/dapl_common.h b/test/dapltest/include/dapl_common.h
index a514973..7dfe471 100644
--- a/test/dapltest/include/dapl_common.h
+++ b/test/dapltest/include/dapl_common.h
@@ -33,7 +33,7 @@
 
 #include "dapl_proto.h"
 
-#define SERVER_PORT_NUMBER ((DAT_CONN_QUAL)0xB0de)
+#define SERVER_PORT_NUMBER ((DAT_CONN_QUAL)62000)
 
 typedef enum {
-- 
1.7.3
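To see why the old default could collide with dynamically assigned ports, a small check against the ephemeral range may help. This is only a sketch: it assumes the common Linux default for net.ipv4.ip_local_port_range (32768 to 60999), which an administrator can change, and the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed Linux default for net.ipv4.ip_local_port_range.
 * A given system may be configured with a different range.
 */
#define EPHEMERAL_LOW  32768u
#define EPHEMERAL_HIGH 60999u

/* Hypothetical helper: true if the port falls inside the assumed range. */
static bool in_default_ephemeral_range(uint16_t port)
{
    return port >= EPHEMERAL_LOW && port <= EPHEMERAL_HIGH;
}
```

Under that assumption the old value 0xB0de (45278 decimal) falls inside the range, while the new value 62000 sits above it.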
RE: [PATCH V2] dapltest: Add final send/recv sync for transaction tests.
> The transaction tests need both sides to send a sync message after running the test. This ensures that all remote operations are complete before dapltest deregisters memory and disconnects the endpoints. Without this logic, we see intermittent async errors on iwarp devices because a read response or write arrives after the rmr has been destroyed.
>
> I believe this is more likely to happen with iWARP than IB because iWARP completions only indicate the local buffer can be reused. It doesn't imply that the message has even arrived at the peer, let alone been placed in the peer application's memory.
>
> Changes from V1:
>
> - allocate new send/recv buffers for the Final Sync message.
>
> - post the Final Sync recv buffer at the beginning of the final iteration of a test.
>
> - tests ok on cxgb4 and mlx4 devices.
>
> Signed-off-by: Steve Wise <sw...@opengridcomputing.com>
> ---

Acked-by: Arlin Davis <arlin.r.da...@intel.com>

Thanks!
RE: [PATCH V2] dapltest: Add final send/recv sync for transaction tests.
> Hey Arlin, I'd like to get this fix into OFED-3.12 if possible.

Ok, I will include this fix in the next dapl package targeted for OFED-3.12 RC2.
RE: [PATCH 1/4] NULL undeclared on Fedora
> > --- a/dat/include/dat2/dat_platform_specific.h
> > +++ b/dat/include/dat2/dat_platform_specific.h
> > @@ -147,6 +147,7 @@ typedef DAT_UINT64 DAT_PADDR;
> >  #if defined(__KERNEL__)
> >  #include <linux/types.h>
> >  #else
> > +#include <stdio.h>
>
> You could use #include <stddef.h> if only NULL is needed
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stddef.h.html

There was an issue with Fedora where linux/stddef.h is empty. Looks like I need to clean this up and use stddef.h instead of linux/stddef.h for the non-kernel build.

Thanks.
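As a small illustration of the suggestion in this thread, the sketch below compiles with only <stddef.h> included, which is the standard C header that defines NULL and size_t. The helper first_non_null is hypothetical and exists only to exercise NULL; it is not taken from the DAT headers.

```c
#include <stddef.h>   /* defines NULL and size_t; stdio.h is not required */

/*
 * Hypothetical helper: return the first non-NULL entry in a list of
 * strings, or NULL if every entry is NULL.
 */
static const char *first_non_null(const char *const *items, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (items[i] != NULL)
            return items[i];
    }
    return NULL;
}
```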
[ANNOUNCE] dapl-2.0.40
A new release of dapl is available at http://www.openfabrics.org/downloads/dapl

Latest Package (see ChangeLog for recent changes):

md5sum: 7c6ef6e0573672ffb19f75db305b609c dapl-2.0.40.tar.gz

Install following RPM packages:

dapl-2.0.40-1
dapl-utils-2.0.40-1
dapl-devel-2.0.40-1
dapl-debuginfo-2.0.40-1

Release 2.0.40 fixes (OFED 3.12)

- build/dist: ib collective extension include files missing
- dapltest: the quit command is missing changes for -n option
- dat.conf: remove v1, add Mellanox Connect-IB and Intel Xeon Phi MIC
- NULL undefined on Fedora, incorrectly using kernel stddef.h

Arlin
[PATCH 3/4] dapltest: the quit command is missing changes for -n option.
Server-port was not being set properly during param init phase on the client side.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/cmd/dapl_params.c       |    1 +
 test/dapltest/cmd/dapl_quit_cmd.c     |   10 +-
 test/dapltest/include/dapl_quit_cmd.h |    1 +
 3 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/test/dapltest/cmd/dapl_params.c b/test/dapltest/cmd/dapl_params.c
index e7a2006..f038324 100644
--- a/test/dapltest/cmd/dapl_params.c
+++ b/test/dapltest/cmd/dapl_params.c
@@ -199,6 +199,7 @@ bool DT_Params_Parse(int argc, char *argv[], Params_t * params_ptr)
 			params_ptr->ReliabilityLevel =
 			    Quit_Cmd->ReliabilityLevel;
 			params_ptr->debug = Quit_Cmd->debug;
+			params_ptr->server_port = Quit_Cmd->port;
 			DT_NetAddrLookupHostAddress(&params_ptr->server_netaddr,
 						    Quit_Cmd->server_name);
 			break;
diff --git a/test/dapltest/cmd/dapl_quit_cmd.c b/test/dapltest/cmd/dapl_quit_cmd.c
index d8536a7..d8930e4 100644
--- a/test/dapltest/cmd/dapl_quit_cmd.c
+++ b/test/dapltest/cmd/dapl_quit_cmd.c
@@ -35,6 +35,7 @@ void DT_Quit_Cmd_Init(Quit_Cmd_t * cmd)
 {
 	memset((void *)cmd, 0, sizeof(Quit_Cmd_t));
 	cmd->ReliabilityLevel = DAT_QOS_BEST_EFFORT;
+	cmd->port = SERVER_PORT_NUMBER;
 }
 
 /*- */
@@ -45,7 +46,7 @@ DT_Quit_Cmd_Parse(Quit_Cmd_t * cmd,
 	int c;
 
 	for (;;) {
-		c = DT_mygetopt_r(my_argc, my_argv, "ds:D:R:", opts);
+		c = DT_mygetopt_r(my_argc, my_argv, "ds:D:R:n", opts);
 		if (c == EOF) {
 			break;
 		}
@@ -72,6 +73,11 @@ DT_Quit_Cmd_Parse(Quit_Cmd_t * cmd,
 				    DT_ParseQoS(opts->optarg);
 				break;
 			}
+		case 'n':
+			{
+				cmd->port = atoi(opts->optarg);
+				break;
+			}
 		case '?':
 		default:
 			{
@@ -113,6 +119,7 @@ void DT_Quit_Cmd_Usage(void)
 	DT_Mdep_printf("USAGE: QUIT TEST \n");
 	DT_Mdep_printf("USAGE: dapltest -T Q\n");
 	DT_Mdep_printf("USAGE: -s server Name\n");
+	DT_Mdep_printf("USAGE: -n server port number\n");
 	DT_Mdep_printf("USAGE: [-D device Name]\n");
 	DT_Mdep_printf("USAGE: [-d] : debug (zero)\n");
 	DT_Mdep_printf("USAGE: [-R service reliability]\n");
@@ -129,4 +136,5 @@ void DT_Quit_Cmd_Print(Quit_Cmd_t * cmd)
 {
 	DT_Mdep_printf("Quit_Cmd.server_name: %s\n", cmd->server_name);
 	DT_Mdep_printf("Quit_Cmd.device_name: %s\n", cmd->device_name);
+	DT_Mdep_printf("Quit_Cmd.port: %s\n", cmd->port);
 }
diff --git a/test/dapltest/include/dapl_quit_cmd.h b/test/dapltest/include/dapl_quit_cmd.h
index 8aba24e..8640541 100644
--- a/test/dapltest/include/dapl_quit_cmd.h
+++ b/test/dapltest/include/dapl_quit_cmd.h
@@ -38,6 +38,7 @@ typedef struct
     char		device_name[256];	/* -D */
     DAT_UINT32		debug;			/* -d */
     DAT_QOS		ReliabilityLevel;	/* -R */
+    DAT_CONN_QUAL	port;			/* -n */
 } Quit_Cmd_t;
 
 #pragma pack ()
-- 
1.7.3
[PATCH 1/4] NULL undeclared on Fedora
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dat/include/dat2/dat_platform_specific.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/dat/include/dat2/dat_platform_specific.h b/dat/include/dat2/dat_platform_specific.h
index ba4cfbc..8d62bd0 100644
--- a/dat/include/dat2/dat_platform_specific.h
+++ b/dat/include/dat2/dat_platform_specific.h
@@ -147,6 +147,7 @@ typedef DAT_UINT64 DAT_PADDR;
 #if defined(__KERNEL__)
 #include <linux/types.h>
 #else
+#include <stdio.h>
 #include <sys/types.h>
 #include <linux/stddef.h>
 #endif /* defined(__KERNEL__) */
-- 
1.7.3
[PATCH 2/4] dat.conf: remove v1, add Mellanox Connect-IB and Intel Xeon Phi MIC
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 doc/dat.conf |   36 +++-
 1 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/doc/dat.conf b/doc/dat.conf
index 60fb211..ad6ab05 100644
--- a/doc/dat.conf
+++ b/doc/dat.conf
@@ -32,12 +32,30 @@ ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
 ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
 ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
 ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
-OpenIB-cma u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib0 0" ""
-OpenIB-cma-1 u1.2 nonthreadsafe default libdaplcma.so.1 dapl.1.2 "ib1 0" ""
-OpenIB-mthca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 1" ""
-OpenIB-mthca0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mthca0 2" ""
-OpenIB-mlx4_0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 1" ""
-OpenIB-mlx4_0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "mlx4_0 2" ""
-OpenIB-ipath0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 1" ""
-OpenIB-ipath0-2 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ipath0 2" ""
-OpenIB-ehca0-1 u1.2 nonthreadsafe default libdaplscm.so.1 dapl.1.2 "ehca0 1" ""
+ofa-v2-mcm-1 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
+ofa-v2-mcm-2 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 2" ""
+ofa-v2-scif0 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "scif0 1" ""
+ofa-v2-scif0-u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "scif0 1" ""
+ofa-v2-mic0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mic0:ib 1" ""
+ofa-v2-mlx4_0-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
+ofa-v2-mlx4_0-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
+ofa-v2-mlx4_1-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_1 1" ""
+ofa-v2-mlx4_1-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_1 2" ""
+ofa-v2-mlx4_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 1" ""
+ofa-v2-mlx4_1-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 2" ""
+ofa-v2-mlx4_0-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
+ofa-v2-mlx4_0-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 2" ""
+ofa-v2-mlx4_1-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 1" ""
+ofa-v2-mlx4_1-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 2" ""
+ofa-v2-mlx5_0-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_0 1" ""
+ofa-v2-mlx5_0-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_0 2" ""
+ofa-v2-mlx5_1-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_1 1" ""
+ofa-v2-mlx5_1-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_1 2" ""
+ofa-v2-mlx5_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_0 1" ""
+ofa-v2-mlx5_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_0 2" ""
+ofa-v2-mlx5_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_1 1" ""
+ofa-v2-mlx5_1-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_1 2" ""
+ofa-v2-mlx5_0-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_0 1" ""
+ofa-v2-mlx5_0-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_0 2" ""
+ofa-v2-mlx5_1-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_1 1" ""
+ofa-v2-mlx5_1-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_1 2" ""
-- 
1.7.3
[PATCH 4/4] dist: ib collective extension include files missing
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 Makefile.am |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 7441348..47a9d9c 100755
--- a/Makefile.am
+++ b/Makefile.am
@@ -548,6 +548,8 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
 	     dapl/include/dapl_vendor.h \
 	     dapl/openib_common/dapl_ib_dto.h \
 	     dapl/openib_common/dapl_ib_common.h \
+	     dapl/openib_common/collectives/ib_collectives.h \
+	     dapl/openib_common/collectives/fca_provider.h \
 	     dapl/openib_cma/dapl_ib_util.h \
 	     dapl/openib_cma/linux/openib_osd.h \
 	     dapl/openib_scm/dapl_ib_util.h \
@@ -590,7 +592,7 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
 	     test/dapltest/include/dapl_transaction_stats.h \
 	     test/dapltest/include/dapl_transaction_test.h \
 	     test/dapltest/include/dapl_version.h \
-	     test/dapltest/mdep/linux/dapl_mdep_user.h $(XHEADERS)
+	     test/dapltest/mdep/linux/dapl_mdep_user.h
 
 dist-hook: dapl.spec
 	cp dapl.spec $(distdir)
-- 
1.7.3
[PATCH] DAPL v2.0: dapltest: fix endian swap issue with performance test
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/test/dapl_performance_client.c |    2 +-
 test/dapltest/test/dapl_performance_server.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/dapltest/test/dapl_performance_client.c b/test/dapltest/test/dapl_performance_client.c
index 96d5b47..169a212 100644
--- a/test/dapltest/test/dapl_performance_client.c
+++ b/test/dapltest/test/dapl_performance_client.c
@@ -427,7 +427,7 @@ DT_Performance_Test_Client_Exchange(Params_t * params_ptr,
 	 * we pass to the other side. The other side cannot (and
 	 * better not) interpret these values.
 	 */
-	if (DT_local_is_little_endian != test_ptr->is_remote_little_endian) {
+	if (DT_local_is_little_endian && !test_ptr->is_remote_little_endian) {
 		rmi->rmr_context = DT_EndianMemHandle(rmi->rmr_context);
 		rmi->mem_address.as_64 =
 		    DT_EndianMemAddress(rmi->mem_address.as_64);
diff --git a/test/dapltest/test/dapl_performance_server.c b/test/dapltest/test/dapl_performance_server.c
index 5083967..475a5fe 100644
--- a/test/dapltest/test/dapl_performance_server.c
+++ b/test/dapltest/test/dapl_performance_server.c
@@ -328,7 +328,7 @@ DT_Performance_Test_Server_Exchange(DT_Tdep_Print_Head * phead,
 	 * we pass to the other side. The other side cannot (and
 	 * better not) interpret these values.
 	 */
-	if (DT_local_is_little_endian != test_ptr->is_remote_little_endian) {
+	if (DT_local_is_little_endian && !test_ptr->is_remote_little_endian) {
 		rmi->rmr_context = DT_EndianMemHandle(rmi->rmr_context);
 		rmi->mem_address.as_64 =
 		    DT_EndianMemAddress(rmi->mem_address.as_64);
-- 
1.7.3
[PATCH] DAPL v2.0: SCM: getifaddrs modfications for better out of the box experience
socket cm will now walk the list of interfaces, ignoring loopback and ignoring IB devices unless the IB netdev is the only device. Works better in a heterogeneous environment with a mix of net devices. Tested with br0, mic0, and mic0:ib netdev mixes. Overriding with DAPL_SCM_NETDEV still works as is.

Signed-off-by: Patrick Mccormick <patrick.m.mccorm...@intel.com>
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/util.c |   70 +++-
 1 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/dapl/openib_common/util.c b/dapl/openib_common/util.c
index 8b97263..20fb8b2 100644
--- a/dapl/openib_common/util.c
+++ b/dapl/openib_common/util.c
@@ -28,6 +28,7 @@
 #include "dapl_osd.h"
 
 #include <stdlib.h>
+#include <ifaddrs.h>
 
 int g_dapl_loopback_connection = 0;
 
@@ -148,7 +149,6 @@ int getipaddr_netdev(char *name, char *addr, int addr_len)
 
 	/* Fill in the structure */
 	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name);
-	ifr.ifr_hwaddr.sa_family = ARPHRD_INFINIBAND;
 
 	/* Create a socket fd */
 	skfd = socket(PF_INET, SOCK_STREAM, 0);
@@ -178,51 +178,54 @@ int getipaddr_netdev(char *name, char *addr, int addr_len)
 	return ret;
 }
 
-DAT_RETURN getlocalipaddr(char *addr, int addr_len)
+/* IPv4 only, use IB if netdev set or it's the only interface */
+DAT_RETURN getlocalipaddr (char *addr, int addr_len)
 {
-	struct sockaddr_in *sin;
-	int ret, skfd, i;
+	struct ifaddrs *ifap, *ifa;
+	int ret, found=0, ib_ok=0;
 	char *netdev = getenv("DAPL_SCM_NETDEV");
-	struct ifreq ifr[10];
-	struct ifconf ifc;
 
-	/* use provided netdev instead of default hostname */
 	if (netdev != NULL) {
 		ret = getipaddr_netdev(netdev, addr, addr_len);
 		if (ret) {
-			dapl_log(DAPL_DBG_TYPE_ERR,
-				 "getlocalipaddr: NETDEV = %s"
-				 " but not configured on system? ERR = %s\n",
-				 netdev, strerror(ret));
-			return dapl_convert_errno(ret, "getlocalipaddr");
-		} else
+			dapl_log(DAPL_DBG_TYPE_ERR, "ERR: NETDEV = %s"
+				 " but not configured on system?\n", netdev);
+			return dapl_convert_errno(errno, "getlocalipaddr");
+		} else {
+			dapl_log(DAPL_DBG_TYPE_UTIL, "my_addr %s NETDEV = %s\n",
+				 inet_ntoa(((struct sockaddr_in *)addr)->sin_addr),
+				 netdev);
 			return DAT_SUCCESS;
+		}
 	}
 
-	if (addr_len < sizeof(*sin))
-		return DAT_INTERNAL_ERROR;
-
-	memset(&ifc,0,sizeof(ifc));
-	ifc.ifc_buf = (char *)ifr;
-	ifc.ifc_len = sizeof(ifr);
-
-	skfd = socket(PF_INET, SOCK_STREAM, 0);
-	ret = ioctl(skfd, SIOCGIFCONF, &ifc);
-	if (ret)
-		goto bail;
+	if ((ret = getifaddrs (&ifap)))
+		return dapl_convert_errno(errno, "getifaddrs");
 
-	/* first non-loopback interface in list */
-	for (i=0; i < ifc.ifc_len/sizeof(struct ifreq); i++) {
-		if (strcmp(ifr[i].ifr_name, "lo"))
-			break;
+retry:
+	for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
+		if (ifa->ifa_addr->sa_family == AF_INET) {
+			if (!found && !(ifa->ifa_flags & IFF_LOOPBACK) &&
+			    ((!ib_ok && dapl_os_pstrcmp("ib", ifa->ifa_name)) ||
+			     (ib_ok && !dapl_os_pstrcmp("ib", ifa->ifa_name)))) {
+				memcpy(addr, ifa->ifa_addr, sizeof(struct sockaddr_in));
+				found++;
+			}
+			dapl_log(DAPL_DBG_TYPE_UTIL,
+				 "getifaddrs: %s - %s\n", ifa->ifa_name,
+				 inet_ntoa(((struct sockaddr_in *)ifa->ifa_addr)->sin_addr));
+		}
+	}
+	if (!found && !ib_ok) {
+		ib_ok = 1;
+		goto retry;
 	}
-	memcpy(addr, &ifr[i].ifr_addr, sizeof(struct sockaddr_in));
+	dapl_log(DAPL_DBG_TYPE_UTIL, "my_addr %s\n",
+		 inet_ntoa(((struct sockaddr_in *)addr)->sin_addr));
 
-bail:
-	close(skfd);
-	return dapl_convert_errno(ret, "getlocalipaddr");
+	freeifaddrs(ifap);
+	return (found ? DAT_SUCCESS:DAT_INVALID_ADDRESS);
 }
-
 #endif
 
 enum ibv_mtu dapl_ib_mtu(int mtu)
@@ -811,3 +814,4 @@ ib_cm_events_t dapls_ib_get_cm_event(IN DAT_EVENT_NUMBER dat_event_num)
 	}
 	return ib_cm_event;
 }
+
-- 
1.7.3
[PATCH] DAPL v2.0: ucm, scm: UD mode triggers list_head assert with large scale alltoall test
With 1024+ ranks, IMB alltoall may hit an assert when running Intel MPI in UD mode.

CR cleanup was implemented with EP to CR references still linked. During cr_accept, the CR remote_ia_address is linked to the EP object by mistake in UD mode. UD mode may have multiple CRs per EP, so no direct mappings to CR memory can exist. Only RC mode always has a one EP to CR mapping.

In scm, ucm: for CM object free with CR references, the search and unlinking from the SP must be under the SP lock to serialize.

Also, change cleanup thread wakeup logic to only trigger the thread if the reference count indicates the need for more processing.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_cr_accept.c   |   10 ++
 dapl/openib_scm/cm.c           |   14 +++---
 dapl/openib_ucm/cm.c           |   26 +-
 dapl/openib_ucm/dapl_ib_util.h |    1 +
 4 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/dapl/common/dapl_cr_accept.c b/dapl/common/dapl_cr_accept.c
index 5df9458..4e48fea 100644
--- a/dapl/common/dapl_cr_accept.c
+++ b/dapl/common/dapl_cr_accept.c
@@ -180,11 +180,13 @@ dapl_cr_accept(IN DAT_CR_HANDLE cr_handle,
 	entry_ep_state = ep_ptr->param.ep_state;
 	entry_ep_handle = cr_ptr->param.local_ep_handle;
 	ep_ptr->param.ep_state = DAT_EP_STATE_COMPLETION_PENDING;
-	ep_ptr->cr_ptr = cr_ptr;
-	ep_ptr->param.remote_ia_address_ptr =
-	    cr_ptr->param.remote_ia_address_ptr;
-	cr_ptr->param.local_ep_handle = ep_handle;
 
+	/* UD supports multiple CR's per EP, provider will manage CR's */
+	if (ep_ptr->param.ep_attr.service_type == DAT_SERVICE_TYPE_RC) {
+		ep_ptr->cr_ptr = cr_ptr;
+		ep_ptr->param.remote_ia_address_ptr = cr_ptr->param.remote_ia_address_ptr;
+	}
+	cr_ptr->param.local_ep_handle = ep_handle;
 	dapl_os_unlock(&ep_ptr->header.lock);
 
 	dat_status = dapls_ib_accept_connection(cr_handle,
diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index d4964bd..f7838f2 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -439,25 +439,25 @@ void dapls_cm_free(dp_ib_cm_handle_t cm_ptr)
 		 cm_ptr, dapl_cm_state_str(cm_ptr->state), cm_ptr->ep,
 		 sp_ptr, cm_ptr->ref_count);
 
+	dapl_os_lock(&cm_ptr->lock);
 	if (sp_ptr && cm_ptr->state == DCM_CONNECTED &&
 	    cm_ptr->msg.daddr.ib.qp_type == IBV_QPT_UD) {
-		DAPL_CR *cr_ptr = dapl_sp_search_cr(sp_ptr, cm_ptr);
+		DAPL_CR *cr_ptr;
+
+		dapl_os_lock(&sp_ptr->header.lock);
+		cr_ptr = dapl_sp_search_cr(sp_ptr, cm_ptr);
 		if (cr_ptr != NULL) {
-			dapl_os_lock(&sp_ptr->header.lock);
 			dapl_sp_remove_cr(sp_ptr, cr_ptr);
-			dapl_os_unlock(&sp_ptr->header.lock);
 			dapls_cr_free(cr_ptr);
 		}
+		dapl_os_unlock(&sp_ptr->header.lock);
 	}
 
 	/* free from internal workq, wait until EP is last ref */
-	dapl_os_lock(&cm_ptr->lock);
 	cm_ptr->state = DCM_FREE;
-	dapl_os_unlock(&cm_ptr->lock);
-	dapli_cm_thread_signal(cm_ptr);
-	dapl_os_lock(&cm_ptr->lock);
 	if (cm_ptr->ref_count != 1) {
+		dapli_cm_thread_signal(cm_ptr);
 		dapl_os_unlock(&cm_ptr->lock);
 		dapl_os_wait_object_wait(&cm_ptr->event, DAT_TIMEOUT_INFINITE);
 		dapl_os_lock(&cm_ptr->lock);
diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index 05cff10..d6f923e 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -779,19 +779,19 @@ void dapli_cm_free(dp_ib_cm_handle_t cm)
 		 cm, dapl_cm_state_str(cm->state), cm->ep,
 		 sp_ptr, sp_ptr ? sp_ptr->cr_list_count:0, cm->ref_count);
 
+	dapl_os_lock(&cm->lock);
 	if (sp_ptr && cm->state == DCM_CONNECTED &&
 	    cm->msg.daddr.ib.qp_type == IBV_QPT_UD) {
-		DAPL_CR *cr_ptr = dapl_sp_search_cr(sp_ptr, cm);
-		dapl_log(DAPL_DBG_TYPE_CM, "dapli_cm_free: UD CR %p\n", cr_ptr);
-		if (cr_ptr != NULL) {
-			dapl_os_lock(&sp_ptr->header.lock);
-			dapl_sp_remove_cr(sp_ptr, cr_ptr);
-			dapl_os_unlock(&sp_ptr->header.lock);
-			dapls_cr_free(cr_ptr);
+		dapl_os_lock(&sp_ptr->header.lock);
+		cm->cr = dapl_sp_search_cr(sp_ptr, cm);
+		dapl_log(DAPL_DBG_TYPE_CM, "dapli_cm_free: UD CR %p\n", cm->cr);
+
+		if (cm->cr != NULL) {
+			dapl_sp_remove_cr(sp_ptr, cm->cr);
+			/* free CR at EP destroy */
 		}
+		dapl_os_unlock(&sp_ptr->header.lock);
 	}
-
-	dapl_os_lock(&cm->lock);
 	cm->state = DCM_FREE;
 	dapls_thread_signal(&cm->hca->ib_trans.signal);
 	dapl_os_unlock(&cm->lock);
@@ -809,12 +809,12 @@ void dapls_cm_free(dp_ib_cm_handle_t cm)
[ANNOUNCE] dapl-2.0.39
Rupert,

Please pull new dapl-2.0.39 package into OFED 3.5-2 RC2

Thanks,
Arlin

--

Latest Packages (see ChangeLog for recent changes):

md5sum: 9858bd36c4c21846a9c8a72bc0ad1339 dapl-2.0.39.tar.gz

Install RPM packages as follows:

dapl-2.0.39-1
dapl-utils-2.0.39-1
dapl-devel-2.0.39-1
dapl-debuginfo-2.0.39-1

Release 2.0.39 fixes (OFED 3.5-2)

dapltest: fix endian swap issue with performance test
SCM: getifaddrs modfications for better out of the box experience
ucm, scm: UD mode triggers list_head assert with large scale alltoall test
[ANNOUNCE] dapl-2.0.38
Rupert, please pull new dapl-2.0.38 released package into OFED 3.5.2

Thanks,
Arlin

--

Latest Packages (see ChangeLog for recent changes):

md5sum: 21b933fb24ed86d5c5413d9a269f913d dapl-2.0.38.tar.gz

For v2.0 package install RPM packages as follows:

dapl-2.0.38-1
dapl-utils-2.0.38-1
dapl-devel-2.0.38-1
dapl-debuginfo-2.0.38-1

Summary of v2.0 changes:

Release 2.0.38 fixes (OFED 3.5.2)

dapltest: add -n parameter to override default server port number (45278)
ucm,scm: UD mode creates many CR objects per EP that needs cleaned up
cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM
[ANNOUNCE] dapl-2.0.37
Rupert/Vlad, please pull this package into OFED 3.5.2

Thanks,
Arlin

--

Latest Packages (see ChangeLog for recent changes):

md5sum: 2e185e1aac2c09b3d9e529ee1aa1669e dapl-2.0.37.tar.gz

For v2.0 package install RPM packages as follows:

dapl-2.0.37-1
dapl-utils-2.0.37-1
dapl-devel-2.0.37-1
dapl-debuginfo-2.0.37-1

Summary of v2.0 changes:

Release 2.0.37 fixes (OFED 3.5.2):

common: add support for ia name during dat_ia_query
common: dapl_os_atomic_inc/dec() not working as expected on ppc64 machines.
dapltest: ppc64 endian issue with exchanged mem handle and address
[PATCH 2/3] uDAPL: common: dapl_os_atomic_inc/dec() not working as expected on ppc64 machines
Signed-off-by: Pradeep Satyanarayana <prad...@lus.ibm.com>
Acked-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/udapl/linux/dapl_osd.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dapl/udapl/linux/dapl_osd.h b/dapl/udapl/linux/dapl_osd.h
index 7198439..0412461 100644
--- a/dapl/udapl/linux/dapl_osd.h
+++ b/dapl/udapl/linux/dapl_osd.h
@@ -188,7 +188,7 @@ dapl_os_atomic_inc (
 		stwcx.	%0,0,%2\n\
 		bne-	1b"
 	: "=&r" (tmp), "+m" (v)
-	: "r" (v)
+	: "b" (v)
 	: "cc");
 #else  /* !__ia64__ */
     __asm__ __volatile__ (
@@ -227,7 +227,7 @@ dapl_os_atomic_dec (
 		stwcx.	%0,0,%2\n\
 		bne-	1b"
 	: "=&r" (tmp), "+m" (v)
-	: "r" (v)
+	: "b" (v)
 	: "cc");
 #else  /* !__ia64__ */
     __asm__ __volatile__ (
--
1.7.3
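For comparison only, and not part of this patch: the fix above changes the input constraint from "r" to "b", which in GCC's PowerPC machine constraints selects a base register (a general register other than r0). On compilers that provide the GCC/Clang __atomic builtins, the same increment and decrement could be written without hand-coded assembly, which avoids register-constraint choices altogether. The sketch below assumes such a compiler; sketch_atomic_inc and sketch_atomic_dec are hypothetical names, not uDAPL functions.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; uDAPL's own versions are
 * dapl_os_atomic_inc/dec() in dapl_osd.h and use inline assembly. */

/* Atomically add 1 to *v and return the new value. */
static inline int32_t sketch_atomic_inc(volatile int32_t *v)
{
	return __atomic_add_fetch(v, 1, __ATOMIC_SEQ_CST);
}

/* Atomically subtract 1 from *v and return the new value. */
static inline int32_t sketch_atomic_dec(volatile int32_t *v)
{
	return __atomic_sub_fetch(v, 1, __ATOMIC_SEQ_CST);
}
```

Whether the builtins are appropriate for every toolchain uDAPL supports is not something this thread establishes; the sketch only shows the shape of the alternative.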
[PATCH 1/3] uDAPL: dapltest: ppc64 endian issue with exchanged mem handle and address
Signed-off-by: Pradeep Satyanarayana <prad...@lus.ibm.com>
Acked-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/common/dapl_endian.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/test/dapltest/common/dapl_endian.c b/test/dapltest/common/dapl_endian.c
index d93fbb9..c77f2f2 100644
--- a/test/dapltest/common/dapl_endian.c
+++ b/test/dapltest/common/dapl_endian.c
@@ -77,8 +77,6 @@ DAT_UINT64 DT_Endian64(DAT_UINT64 val)
 
 DAT_UINT32 DT_EndianMemHandle(DAT_UINT32 val)
 {
-	if (DT_local_is_little_endian)
-		return val;
 	val = ((val & c1a32) << 8) | ((val & c1b32) >> 8);
 	val = ((val & c2a32) << 16) | ((val & c2b32) >> 16);
 	return (val);
@@ -88,8 +86,6 @@ DAT_UINT64 DT_EndianMemAddress(DAT_UINT64 val)
 {
 	DAT_UINT64 val64;
 
-	if (DT_local_is_little_endian)
-		return val;
 	val64 = val;
 	val64 = ((val64 & c1a64) << 8) | ((val64 & c1b64) >> 8);
 	val64 = ((val64 & c2a64) << 16) | ((val64 & c2b64) >> 16);
--
1.7.3
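The patch above removes the little-endian early return, so the memory handle and address are always byte-swapped. The swap itself is a two-step mask-and-shift: first exchange adjacent bytes, then exchange the two 16-bit halves. A minimal 32-bit sketch follows; the mask values are assumptions standing in for dapltest's c1a32/c1b32/c2a32/c2b32 constants, whose definitions are not shown in the patch, and sketch_swap32 is a hypothetical name.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value with two mask-and-shift steps.
 * The masks are assumed values, chosen so that each step swaps neighbours. */
static uint32_t sketch_swap32(uint32_t val)
{
	/* Step 1: swap the two bytes inside each 16-bit half. */
	val = ((val & 0x00FF00FFu) << 8) | ((val & 0xFF00FF00u) >> 8);
	/* Step 2: swap the two 16-bit halves. */
	val = ((val & 0x0000FFFFu) << 16) | ((val & 0xFFFF0000u) >> 16);
	return val;
}
```

Applying the function twice returns the original value, which is a quick sanity check for any swap routine of this kind.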
[PATCH 3/3] uDAPL common: add support for ia name during dat_ia_query
the device name was not being updated during a query. Copy the hca name into ia_attr->adapter_name for consumers.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/util.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_common/util.c b/dapl/openib_common/util.c
index 33629b8..8b97263 100644
--- a/dapl/openib_common/util.c
+++ b/dapl/openib_common/util.c
@@ -310,6 +310,9 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA * hca_ptr,
 
 	if (ia_attr != NULL) {
 		(void)dapl_os_memzero(ia_attr, sizeof(*ia_attr));
+		strncpy(ia_attr->adapter_name,
+			ibv_get_device_name(hca_ptr->ib_trans.ib_dev),
+			DAT_NAME_MAX_LENGTH - 1);
 		ia_attr->adapter_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->vendor_name[DAT_NAME_MAX_LENGTH - 1] = '\0';
 		ia_attr->ia_address_ptr =
@@ -317,7 +320,7 @@ DAT_RETURN dapls_ib_query_hca(IN DAPL_HCA * hca_ptr,
 
 		dapl_dbg_log(DAPL_DBG_TYPE_UTIL,
 			     " query_hca: %s %s \n",
-			     ibv_get_device_name(hca_ptr->ib_trans.ib_dev),
+			     ia_attr->adapter_name,
 			     inet_ntoa(((struct sockaddr_in *)
 					&hca_ptr->hca_address)->sin_addr));
--
1.7.3
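The copy in this patch relies on a common pattern: strncpy() with a limit of size minus one, followed by an explicit terminator in the last slot, because strncpy() does not terminate the destination when the source is too long. A small standalone sketch of that pattern; SKETCH_NAME_MAX is a hypothetical stand-in for DAT_NAME_MAX_LENGTH (whose real value is not given here), and sketch_copy_name is not a uDAPL function.

```c
#include <string.h>

#define SKETCH_NAME_MAX 8	/* hypothetical; stands in for DAT_NAME_MAX_LENGTH */

/* Copy src into a fixed-size name buffer, truncating if needed and
 * always leaving the result NUL-terminated. */
static void sketch_copy_name(char dst[SKETCH_NAME_MAX], const char *src)
{
	strncpy(dst, src, SKETCH_NAME_MAX - 1);
	dst[SKETCH_NAME_MAX - 1] = '\0';
}
```

With an 8-byte buffer, a short name such as "mlx4_0" is copied whole, while a longer name is cut to 7 characters plus the terminator.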
RE: Dapltest test error DAT_CONN_QUAL_IN_USE
http://openfabrics.org/bugzilla/index.cgi -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Vipul Pandya Sent: Friday, November 30, 2012 7:12 AM To: Davis, Arlin R Cc: Steve Wise; linux-rdma@vger.kernel.org; Kumar A S; Abhishek Agrawal; Divy Le Ray Subject: Re: Dapltest test error DAT_CONN_QUAL_IN_USE Arlin, Can you please refer to which bugzilla I should log a bug? Can you please provide me the url? Thanks, Vipul On 30-11-2012 05:21, Davis, Arlin R wrote: Vipul, Can you submit a bug in bugzilla for tracking? I will try to get to this next couple of days. -arlin -Original Message- From: Vipul Pandya [mailto:vi...@chelsio.com] Sent: Thursday, November 29, 2012 5:34 AM To: Davis, Arlin R Cc: Steve Wise; linux-rdma@vger.kernel.org; Kumar A S; Abhishek Agrawal; Divy Le Ray Subject: Re: Dapltest test error DAT_CONN_QUAL_IN_USE Hi Arlin, This issue is happening because there is a port collision between dapltest server port space and host TCP stack. The port collision happens because rdma_bind_addr is getting called from the two different places with different port arguments from dapltest. rdma_bind_addr is getting called from the following two places: 1. Once it is getting called from dapls_ib_setup_conn_listener function with starting port as 45278. Based on number of threads and eps, in subsequent call of dapls_ib_setup_conn_listener this port number will keep getting incremented. 2. 2nd time it is getting called from dapls_ib_qp_alloc function with port number as always 0. Now, when rdma_bind_addr gets called with port number 0 it will allocate any free random port number. Then when dapls_ib_setup_conn_listener calls the rdma_bind_addr with fix port number which is already allocate via dapls_ib_qp_alloc function rdma_bind_addr will return EADDRINUSE error, which in turn will result in DAT_CONN_QUAL_IN_USE error. 
I think solution here would be to call rdma_bind_addr from both the location passing port number from the same port range. Please let me know your thoughts on this. Our testing has been blocked because of this issue. We would like to get this fixed. Please let us know if we need to log a bug anywhere for this. Thanks, Vipul On 27-11-2012 01:24, Steve Wise wrote: Perhaps the port is in use by the host TCP stack? On 11/26/2012 1:30 PM, Davis, Arlin R wrote: dapltest server will start with port 45278 and increase by client thread count during each new client connection. If you never restart the server it will continue to increase the listen port based on new clients connecting. If you restart dapltest it will restart back at port 45278. I am not familiar with iWarp CM but the error is coming from rdma_bind_addr (EADDRINUSE|EBUSY|EADDRNOTAVAIL). I will have to defer to Steve for this error. -arlin -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Vipul Pandya Sent: Friday, November 23, 2012 5:54 AM To: linux-rdma@vger.kernel.org Cc: Kumar A S; Steve Wise; Abhishek Agrawal; Davis, Arlin R; Divy Le Ray Subject: Dapltest test error DAT_CONN_QUAL_IN_USE Hi All, I was running dapltest between my client and server machines with OFED- 3.5. While running the test it dapltest server throws an error DAT_CONN_QUAL_IN_USE if I increase number of threads and endpoints. Dapltest server: --- dapltest -T S -D chelsio1 Dapltest client: --- dapltest -T T -s 102.1.1.2 -D chelsio1 -R BE -i 1 -t 16 -w 8 server SR 8192 4 client SR 8192 4 Once I run the above test i get the following error on server side and client side stalls. 
$# dapltest -T S -D chelsio1 Dapltest: Service Point Ready - chelsio1 Test[b13f]: dat_psp_create #6 error: DAT_CONN_QUAL_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #0 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #1 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #2 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #3 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #4 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE
RE: Dapltest test error DAT_CONN_QUAL_IN_USE
Vipul, Can you submit a bug in bugzilla for tracking? I will try to get to this next couple of days. -arlin -Original Message- From: Vipul Pandya [mailto:vi...@chelsio.com] Sent: Thursday, November 29, 2012 5:34 AM To: Davis, Arlin R Cc: Steve Wise; linux-rdma@vger.kernel.org; Kumar A S; Abhishek Agrawal; Divy Le Ray Subject: Re: Dapltest test error DAT_CONN_QUAL_IN_USE Hi Arlin, This issue is happening because there is a port collision between dapltest server port space and host TCP stack. The port collision happens because rdma_bind_addr is getting called from the two different places with different port arguments from dapltest. rdma_bind_addr is getting called from the following two places: 1. Once it is getting called from dapls_ib_setup_conn_listener function with starting port as 45278. Based on number of threads and eps, in subsequent call of dapls_ib_setup_conn_listener this port number will keep getting incremented. 2. 2nd time it is getting called from dapls_ib_qp_alloc function with port number as always 0. Now, when rdma_bind_addr gets called with port number 0 it will allocate any free random port number. Then when dapls_ib_setup_conn_listener calls the rdma_bind_addr with fix port number which is already allocate via dapls_ib_qp_alloc function rdma_bind_addr will return EADDRINUSE error, which in turn will result in DAT_CONN_QUAL_IN_USE error. I think solution here would be to call rdma_bind_addr from both the location passing port number from the same port range. Please let me know your thoughts on this. Our testing has been blocked because of this issue. We would like to get this fixed. Please let us know if we need to log a bug anywhere for this. Thanks, Vipul On 27-11-2012 01:24, Steve Wise wrote: Perhaps the port is in use by the host TCP stack? On 11/26/2012 1:30 PM, Davis, Arlin R wrote: dapltest server will start with port 45278 and increase by client thread count during each new client connection. 
If you never restart the server it will continue to increase the listen port based on new clients connecting. If you restart dapltest it will restart back at port 45278. I am not familiar with iWarp CM but the error is coming from rdma_bind_addr (EADDRINUSE|EBUSY|EADDRNOTAVAIL). I will have to defer to Steve for this error. -arlin -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Vipul Pandya Sent: Friday, November 23, 2012 5:54 AM To: linux-rdma@vger.kernel.org Cc: Kumar A S; Steve Wise; Abhishek Agrawal; Davis, Arlin R; Divy Le Ray Subject: Dapltest test error DAT_CONN_QUAL_IN_USE Hi All, I was running dapltest between my client and server machines with OFED- 3.5. While running the test it dapltest server throws an error DAT_CONN_QUAL_IN_USE if I increase number of threads and endpoints. Dapltest server: --- dapltest -T S -D chelsio1 Dapltest client: --- dapltest -T T -s 102.1.1.2 -D chelsio1 -R BE -i 1 -t 16 -w 8 server SR 8192 4 client SR 8192 4 Once I run the above test i get the following error on server side and client side stalls. 
$# dapltest -T S -D chelsio1 Dapltest: Service Point Ready - chelsio1 Test[b13f]: dat_psp_create #6 error: DAT_CONN_QUAL_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #0 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #1 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #2 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #3 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #4 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #5 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #6 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Following link says DAT_CONN_QUAL_IN_USE error can come if rdma_cm returns an error due to bind failure. http://www.mail-archive.com/linux- r...@vger.kernel.org/msg01297.html rdma_cm from OFED-3.5 does not provide module parameter 'unify_tcp_port_space'. So, just
RE: Dapltest test error DAT_CONN_QUAL_IN_USE
dapltest server will start with port 45278 and increase by client thread count during each new client connection. If you never restart the server it will continue to increase the listen port based on new clients connecting. If you restart dapltest it will restart back at port 45278. I am not familiar with iWarp CM but the error is coming from rdma_bind_addr (EADDRINUSE|EBUSY|EADDRNOTAVAIL). I will have to defer to Steve for this error. -arlin -Original Message- From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- ow...@vger.kernel.org] On Behalf Of Vipul Pandya Sent: Friday, November 23, 2012 5:54 AM To: linux-rdma@vger.kernel.org Cc: Kumar A S; Steve Wise; Abhishek Agrawal; Davis, Arlin R; Divy Le Ray Subject: Dapltest test error DAT_CONN_QUAL_IN_USE Hi All, I was running dapltest between my client and server machines with OFED- 3.5. While running the test it dapltest server throws an error DAT_CONN_QUAL_IN_USE if I increase number of threads and endpoints. Dapltest server: --- dapltest -T S -D chelsio1 Dapltest client: --- dapltest -T T -s 102.1.1.2 -D chelsio1 -R BE -i 1 -t 16 -w 8 server SR 8192 4 client SR 8192 4 Once I run the above test i get the following error on server side and client side stalls. 
$# dapltest -T S -D chelsio1 Dapltest: Service Point Ready - chelsio1 Test[b13f]: dat_psp_create #6 error: DAT_CONN_QUAL_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #0 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #1 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #2 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #3 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #4 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #5 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Test[b13f]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE Test[b13f]: Warning: dat_ep_disconnect (abrupt) #6 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED Following link says DAT_CONN_QUAL_IN_USE error can come if rdma_cm returns an error due to bind failure. http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg01297.html rdma_cm from OFED-3.5 does not provide module parameter 'unify_tcp_port_space'. So, just to narrow down I installed OFED- 1.5.4.1 and ran the same test with unify_tcp_port_space=1. However with that also I was able to reproduced the same issue. Please note that if I decrease the numbers of endpoints to 4 then test works fine. i.e. If I give '-w 4' instead of '-w 8' in command line then test runs fine. 
I am using dapltest version 2.0.36 which comes from OFED-3.5. Can anyone give any pointers on this? Thanks, Vipul
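The collision discussed in this thread (one bind to port 0 picks a free port, and a later bind to a fixed port then finds it taken) can be reproduced in miniature with ordinary TCP sockets. The sketch below is an analogy only: it uses bind() on the loopback address rather than rdma_bind_addr(), and sketch_bind_tcp is a hypothetical helper, not part of dapltest or librdmacm.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a new TCP socket to 127.0.0.1:port (port 0 lets the stack choose).
 * Returns the socket fd, or -1 with errno set. On success *bound_port
 * holds the port actually assigned, in host byte order. */
static int sketch_bind_tcp(uint16_t port, uint16_t *bound_port)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&sin, &len) < 0) {
		int saved = errno;	/* keep the bind/getsockname error */

		close(fd);
		errno = saved;
		return -1;
	}

	*bound_port = ntohs(sin.sin_port);
	return fd;
}
```

Calling sketch_bind_tcp(0, &p) and then sketch_bind_tcp(p, &q) while the first socket is still open should make the second call fail with EADDRINUSE, which is the same class of error the thread reports surfacing as DAT_CONN_QUAL_IN_USE.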
[PATCH 2/7] windows: new version of getlocalipaddr not portable
revert to the original getaddrinfo method for windows Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_common/util.c | 144 + 1 files changed, 105 insertions(+), 39 deletions(-) diff --git a/dapl/openib_common/util.c b/dapl/openib_common/util.c index 06a6f3d..33629b8 100644 --- a/dapl/openib_common/util.c +++ b/dapl/openib_common/util.c @@ -75,6 +75,69 @@ release: return hr; } +DAT_RETURN getlocalipaddr(char *addr, int addr_len) +{ + struct sockaddr_in *sin; + struct addrinfo *res, hint, *ai; + int ret; + char hostname[256]; + char *netdev = getenv(DAPL_SCM_NETDEV); + +retry: + /* use provided netdev instead of default hostname */ + if (netdev != NULL) { + ret = getipaddr_netdev(netdev, addr, addr_len); + if (ret) { + dapl_log(DAPL_DBG_TYPE_ERR, + getlocalipaddr: NETDEV = %s + but not configured on system? ERR = %s\n, +netdev, strerror(ret)); + return dapl_convert_errno(ret, getlocalipaddr); + } else + return DAT_SUCCESS; + } + + if (addr_len sizeof(*sin)) { + return DAT_INTERNAL_ERROR; + } + + ret = gethostname(hostname, 256); + if (ret) + return dapl_convert_errno(ret, gethostname); + + memset(hint, 0, sizeof hint); + hint.ai_flags = AI_PASSIVE; + hint.ai_family = AF_INET; + hint.ai_socktype = SOCK_STREAM; + hint.ai_protocol = IPPROTO_TCP; + + ret = getaddrinfo(hostname, NULL, hint, res); + if (ret) { + dapl_log(DAPL_DBG_TYPE_ERR, + getaddrinfo ERR: %d %s\n, ret, gai_strerror(ret)); + return DAT_INVALID_ADDRESS; + } + + ret = DAT_INVALID_ADDRESS; + for (ai = res; ai; ai = ai-ai_next) { + sin = (struct sockaddr_in *)ai-ai_addr; + if (*((uint32_t *) sin-sin_addr) != htonl(0x7f01)) { + *((struct sockaddr_in *)addr) = *sin; + ret = DAT_SUCCESS; + break; + } + } + + freeaddrinfo(res); + + /* only loopback found, retry netdev eth0 */ + if (ret == DAT_INVALID_ADDRESS) { + netdev = eth0; + goto retry; + } + + return ret; +} #else // _WIN64 || WIN32 /* Get IP address using network device name */ @@ -114,43 +177,6 @@ int getipaddr_netdev(char *name, char 
*addr, int addr_len) close(skfd); return ret; } -#endif - -enum ibv_mtu dapl_ib_mtu(int mtu) -{ - switch (mtu) { - case 256: - return IBV_MTU_256; - case 512: - return IBV_MTU_512; - case 1024: - return IBV_MTU_1024; - case 2048: - return IBV_MTU_2048; - case 4096: - return IBV_MTU_4096; - default: - return IBV_MTU_1024; - } -} - -char *dapl_ib_mtu_str(enum ibv_mtu mtu) -{ - switch (mtu) { - case IBV_MTU_256: - return 256; - case IBV_MTU_512: - return 512; - case IBV_MTU_1024: - return 1024; - case IBV_MTU_2048: - return 2048; - case IBV_MTU_4096: - return 4096; - default: - return 1024; - } -} DAT_RETURN getlocalipaddr(char *addr, int addr_len) { @@ -163,13 +189,13 @@ DAT_RETURN getlocalipaddr(char *addr, int addr_len) /* use provided netdev instead of default hostname */ if (netdev != NULL) { ret = getipaddr_netdev(netdev, addr, addr_len); - if (ret) { + if (ret) { dapl_log(DAPL_DBG_TYPE_ERR, getlocalipaddr: NETDEV = %s but not configured on system? ERR = %s\n, netdev, strerror(ret)); return dapl_convert_errno(ret, getlocalipaddr); - } else + } else return DAT_SUCCESS; } @@ -197,6 +223,46 @@ bail: return dapl_convert_errno(ret, getlocalipaddr); } +#endif + +enum ibv_mtu dapl_ib_mtu(int mtu) +{ + switch (mtu) { + case 256: + return IBV_MTU_256; + case 512: + return IBV_MTU_512; + case 1024: + return IBV_MTU_1024; + case 2048: + return IBV_MTU_2048; + case 4096: + return IBV_MTU_4096; + default: + return IBV_MTU_1024; + } +} + +char *dapl_ib_mtu_str(enum ibv_mtu mtu) +{ + switch (mtu) { + case IBV_MTU_256: + return 256; + case
[PATCH 3/7] ucm: record and silently drop a duplicate reject CM message
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_ucm/cm.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index 357dbf7..4e6c527 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -415,6 +415,12 @@ static void ucm_process_recv(ib_hca_transport_t *tp,
 		}
 		dapl_os_unlock(&cm->lock);
 		break;
+	case DCM_REJECTED:
+		if (ntohs(msg->op) == DCM_REJ_USER) {
+			DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(&cm->hca->ia_list_head)), DCNT_IA_CM_USER_REJ_RX);
+			dapl_os_unlock(&cm->lock);
+			break;
+		}
 	default:
 		dapl_log(DAPL_DBG_TYPE_WARN,
 			" ucm_recv: Warning, UNKNOWN state
--
1.7.3
[PATCH 4/7] dat.conf: keep list of providers in order for backward compatibility
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 doc/dat.conf |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/dat.conf b/doc/dat.conf
index 0b020ba..60fb211 100644
--- a/doc/dat.conf
+++ b/doc/dat.conf
@@ -14,8 +14,6 @@
 # For uDAPL iWARP provider, ia_params is netdev device name and 0
 # For uDAPL RoCE provider, ia_params is device name and 0
 #
-ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
-ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
 ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
 ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
 ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
@@ -26,6 +24,8 @@ ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
 ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
 ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
 ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
+ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
+ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
 ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""
 ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""
 ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
--
1.7.3
[PATCH 1/7] dapltest: DFLT_QLEN is defined in multiple tests
Patch set for bug fixes.

add #ifdef checking in transaction test.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/test/dapl_transaction_test.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/test/dapltest/test/dapl_transaction_test.c b/test/dapltest/test/dapl_transaction_test.c
index 14c14b4..779ea86 100644
--- a/test/dapltest/test/dapl_transaction_test.c
+++ b/test/dapltest/test/dapl_transaction_test.c
@@ -43,6 +43,10 @@
  */
 #define SYNC_BUFF_SIZE 64
 
+#ifdef DFLT_QLEN
+#undef DFLT_QLEN
+#endif
+
 #define DFLT_QLEN 8		/* default event queue length */
 #define DFLT_TMO 10		/* default timeout (seconds) */
 #define MAX_CONN_RETRY 8
--
1.7.3
[PATCH 5/7] common: check for valid states during ep posting
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_ep_util.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/dapl/common/dapl_ep_util.c b/dapl/common/dapl_ep_util.c
index 5133f59..8ceb1be 100644
--- a/dapl/common/dapl_ep_util.c
+++ b/dapl/common/dapl_ep_util.c
@@ -352,6 +352,10 @@ dapl_ep_post_send_req(IN DAT_EP_HANDLE ep_handle,
 
 	ep_ptr = (DAPL_EP *) ep_handle;
 
+	if ((ep_ptr->param.ep_state != DAT_EP_STATE_CONNECTED) &&
+	    (ep_ptr->param.ep_state != DAT_EP_STATE_DISCONNECTED))
+		return(DAT_ERROR(DAT_INVALID_STATE, DAT_INVALID_STATE_EP_UNCONNECTED));
+
 	/*
 	 * Synchronization ok since this buffer is only used for send
 	 * requests, which aren't allowed to race with each other.
--
1.7.3
[PATCH 6/7] common: allow qp modify in init state
Allow consumer to modify attributes via dat_ep_modify in init state.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/qp.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/dapl/openib_common/qp.c b/dapl/openib_common/qp.c
index 16ce594..a8cc56e 100644
--- a/dapl/openib_common/qp.c
+++ b/dapl/openib_common/qp.c
@@ -287,6 +287,12 @@ dapls_ib_qp_modify(IN DAPL_IA * ia_ptr,
 					      IBV_QPS_ERR, 0, 0, 0));
 	}
 
+	/* consumer ep_modify, init state */
+	if (ep_ptr->qp_handle->state == IBV_QPS_INIT) {
+		return (dapls_modify_qp_state(ep_ptr->qp_handle,
+					      IBV_QPS_INIT, 0, 0, 0));
+	}
+
 	/*
 	 * Check if we have the right qp_state to modify attributes
 	 */
--
1.7.3
[PATCH 7/7] scm: increase ACK timeout to 20 for a default value to match other providers.
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/dapl_ib_util.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dapl/openib_scm/dapl_ib_util.h b/dapl/openib_scm/dapl_ib_util.h
index 0d7f9f3..2050c2c 100644
--- a/dapl/openib_scm/dapl_ib_util.h
+++ b/dapl/openib_scm/dapl_ib_util.h
@@ -60,8 +60,8 @@ typedef dp_ib_cm_handle_t ib_cm_srvc_handle_t;
 #define	INLINE_SEND_DEFAULT 200
 
 /* RC timer - retry count defaults */
-#define SCM_ACK_TIMER 16	/* 5 bits, 4.096us*2^ack_timer. 16== 268ms */
-#define SCM_ACK_RETRY 7		/* 3 bits, 7 * 268ms = 1.8 seconds */
+#define SCM_ACK_TIMER 20	/* 5 bits, 4.096us*2^ack_timer. 16== 268ms, 20==4.2s */
+#define SCM_ACK_RETRY 7		/* 3 bits, 7 * 4.2 == 30 seconds */
 #define SCM_RNR_TIMER 12	/* 5 bits, 12 =.64ms, 28 =163ms, 31 =491ms */
 #define SCM_RNR_RETRY 7		/* 3 bits, 7 == infinite */
 #define SCM_CR_RETRY 5		/* retries for busy server, connect refused */
--
1.7.3
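The comments in this patch use the formula 4.096us * 2^ack_timer. Working it through in integer nanoseconds gives about 268 ms for a timer value of 16 and about 4.29 s for 20 (the comment rounds this to 4.2s); seven retries at 20 then come to roughly 30 seconds, matching the new comment. The helpers below are a hypothetical sketch of that arithmetic only, not uDAPL code.

```c
#include <stdint.h>

/* ACK timeout in nanoseconds for a 5-bit timer value (0..31):
 * 4.096 us * 2^timer == 4096 ns << timer. */
static uint64_t sketch_ack_timeout_ns(unsigned timer)
{
	return 4096ULL << timer;
}

/* Total time spent over `retries` timeouts at the given timer value. */
static uint64_t sketch_ack_total_ns(unsigned timer, unsigned retries)
{
	return (uint64_t)retries * sketch_ack_timeout_ns(timer);
}
```

For example, sketch_ack_timeout_ns(16) is 268,435,456 ns and sketch_ack_total_ns(20, 7) is 30,064,771,072 ns.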
[ANNOUNCE] dapl-2.0.36
New uDAPL release (2.0.36) available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull v2.0 into OFED 3.x and remove the v1.2 compat-dapl package for OFED 3.x; it is no longer supported going forward.

Thanks,
Arlin

--

Latest Packages (see ChangeLog for recent changes):

md5sum: 8313a302685089502b44934183199dd5 dapl-2.0.36.tar.gz

For support, including development, install RPM packages as follows:

dapl-2.0.36-1
dapl-utils-2.0.36-1
dapl-devel-2.0.36-1
dapl-debuginfo-2.0.36-1

Summary of changes:

Release 2.0.36 fixes (OFED 3.x):

scm: increase ACK timeout to 20 for a default value to match other pr
common: allow qp modify in init state
common: check for valid states during ep posting
dat.conf: keep list of providers in order for backward compatibility
ucm: record and silently drop a duplicate reject CM message
windows: new version of getlocalipaddr not portable
dapltest: DFLT_QLEN is defined in multiple tests
[PATCH 02/15] uDAPL v2.0 common: ep_create should allow max_request_iov attribute setting of zero
When creating an EP without a request EVD (cq) the max_request_iov and max_request_sge will be 0. Allow this combination when checking attribute settings for ARG6.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_ep_create.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/dapl/common/dapl_ep_create.c b/dapl/common/dapl_ep_create.c
index e154b8d..c7dedde 100644
--- a/dapl/common/dapl_ep_create.c
+++ b/dapl/common/dapl_ep_create.c
@@ -171,7 +171,8 @@ dapl_ep_create(IN DAT_IA_HANDLE ia_handle,
 					   && ep_attr->max_request_dtos == 0)
 				       || (recv_evd_handle != DAT_HANDLE_NULL
 					   && ep_attr->max_recv_iov == 0)
-				       || ep_attr->max_request_iov == 0
+				       || (request_evd_handle == DAT_HANDLE_NULL
+					   && ep_attr->max_request_iov != 0)
 				       || (DAT_SUCCESS !=
 					   dapl_ep_check_recv_completion_flags
 					   (ep_attr->recv_completion_flags)))) {
--
1.7.3
[PATCH 01/15] uDAPL v2.0 common: add check for NULL handle on ext calls, SRQ free, and helper functions
Series of bug fixes, package cleanup, and debug counters.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dat/common/dat_api.c |   29 ++++++++++++++++++++++++++---
 1 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/dat/common/dat_api.c b/dat/common/dat_api.c
index f53ead7..50ffa2c 100755
--- a/dat/common/dat_api.c
+++ b/dat/common/dat_api.c
@@ -292,6 +292,11 @@ DAT_RETURN DAT_API dat_ia_query(IN DAT_IA_HANDLE ia_handle,
 DAT_RETURN DAT_API dat_set_consumer_context(IN DAT_HANDLE dat_handle,
 					    IN DAT_CONTEXT context)
 {
+	if (dat_handle == NULL) {
+		return DAT_ERROR(DAT_INVALID_HANDLE,
+				 DAT_INVALID_HANDLE1);
+	}
+
 	if (dats_is_ia_handle(dat_handle)) {
 		DAT_IA_HANDLE dapl_ia_handle;
 		DAT_RETURN dat_status;
@@ -301,7 +306,7 @@ DAT_RETURN DAT_API dat_set_consumer_context(IN DAT_HANDLE dat_handle,
 
 		/* failure to map the handle is unlikely but possible */
 		/* in a mult-threaded environment */
-		if (DAT_SUCCESS == dat_status) {
+		if (DAT_SUCCESS != dat_status) {
 			return DAT_ERROR(DAT_INVALID_HANDLE,
 					 DAT_INVALID_HANDLE1);
 		}
@@ -315,6 +320,11 @@ DAT_RETURN DAT_API dat_set_consumer_context(IN DAT_HANDLE dat_handle,
 DAT_RETURN DAT_API dat_get_consumer_context(IN DAT_HANDLE dat_handle,
 					    OUT DAT_CONTEXT * context)
 {
+	if (dat_handle == NULL) {
+		return DAT_ERROR(DAT_INVALID_HANDLE,
+				 DAT_INVALID_HANDLE1);
+	}
+
 	if (dats_is_ia_handle(dat_handle)) {
 		DAT_IA_HANDLE dapl_ia_handle;
 		DAT_RETURN dat_status;
@@ -324,7 +334,7 @@ DAT_RETURN DAT_API dat_get_consumer_context(IN DAT_HANDLE dat_handle,
 
 		/* failure to map the handle is unlikely but possible */
 		/* in a mult-threaded environment */
-		if (DAT_SUCCESS == dat_status) {
+		if (DAT_SUCCESS != dat_status) {
 			return DAT_ERROR(DAT_INVALID_HANDLE,
 					 DAT_INVALID_HANDLE1);
 		}
@@ -338,6 +348,11 @@ DAT_RETURN DAT_API dat_get_consumer_context(IN DAT_HANDLE dat_handle,
 DAT_RETURN DAT_API dat_get_handle_type(IN DAT_HANDLE dat_handle,
 				       OUT DAT_HANDLE_TYPE * type)
 {
+	if (dat_handle == NULL) {
+		return DAT_ERROR(DAT_INVALID_HANDLE,
+				 DAT_INVALID_HANDLE1);
+	}
+
 	if (dats_is_ia_handle(dat_handle)) {
 		DAT_IA_HANDLE dapl_ia_handle;
 		DAT_RETURN dat_status;
@@ -347,7 +362,7 @@ DAT_RETURN DAT_API dat_get_handle_type(IN DAT_HANDLE dat_handle,
 
 		/* failure to map the handle is unlikely but possible */
 		/* in a mult-threaded environment */
-		if (DAT_SUCCESS == dat_status) {
+		if (DAT_SUCCESS != dat_status) {
 			return DAT_ERROR(DAT_INVALID_HANDLE,
 					 DAT_INVALID_HANDLE1);
 		}
@@ -1009,6 +1024,9 @@ DAT_RETURN DAT_API dat_srq_create(IN DAT_IA_HANDLE ia_handle,
 
 DAT_RETURN DAT_API dat_srq_free(IN DAT_SRQ_HANDLE srq_handle)
 {
+	if (srq_handle == NULL) {
+		return DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_SRQ);
+	}
 	return DAT_SRQ_FREE(srq_handle);
 }
 
@@ -1063,6 +1081,11 @@ DAT_RETURN DAT_API dat_extension_op(IN DAT_HANDLE handle,
 	DAT_IA_HANDLE dapl_handle;
 	va_list args;
 
+	if (handle == NULL) {
+		return DAT_ERROR(DAT_INVALID_HANDLE,
+				 DAT_INVALID_HANDLE1);
+	}
+
 	/* If not IA handle then just passthrough */
 	if (dats_get_ia_handle(handle, &dapl_handle) != DAT_SUCCESS) {
 		dapl_handle = handle;
--
1.7.3
[PATCH 03/15] uDAPL v2.0 common: dapls_ep_flush_cq will segfault when no CQ is attached to EP
add check for NULL request/receive EVD (cq) before flushing.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_ep_util.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/dapl/common/dapl_ep_util.c b/dapl/common/dapl_ep_util.c
index 6646528..5133f59 100644
--- a/dapl/common/dapl_ep_util.c
+++ b/dapl/common/dapl_ep_util.c
@@ -620,9 +620,11 @@ static void dapli_ep_flush_evd(DAPL_EVD *evd_ptr)

 void dapls_ep_flush_cqs(DAPL_EP * ep_ptr)
 {
-	dapli_ep_flush_evd((DAPL_EVD *) ep_ptr->param.request_evd_handle);
-	while (dapls_cb_pending(&ep_ptr->recv_buffer))
-		dapli_ep_flush_evd((DAPL_EVD *) ep_ptr->param.recv_evd_handle);
+	if (ep_ptr->param.request_evd_handle)
+		dapli_ep_flush_evd((DAPL_EVD *) ep_ptr->param.request_evd_handle);
+	if (ep_ptr->param.recv_evd_handle)
+		while (dapls_cb_pending(&ep_ptr->recv_buffer))
+			dapli_ep_flush_evd((DAPL_EVD *) ep_ptr->param.recv_evd_handle);
 }

 /*
--
1.7.3
[PATCH 06/15] uDAPL v2.0 ucm: cleanup debug message, ntohl on p_size is incorrect
private data size is a short, change to ntohs on log message

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_ucm/cm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index 6efcad2..39ef28d 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -1003,7 +1003,7 @@ bail:
 		 "connect: ERR %s - cm_lid %x cm_qpn %x r_psp %x p_sz=%d\n",
 		 strerror(errno), htons(cm->msg.daddr.ib.lid),
 		 htonl(cm->msg.dqpn), htons(cm->msg.dport),
-		 htonl(cm->msg.p_size));
+		 htons(cm->msg.p_size));

 	dapli_cm_free(cm);
 	return DAT_INSUFFICIENT_RESOURCES;
--
1.7.3
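Why the width of the swap matters for a 16-bit field can be shown with hand-written swaps. This is only an illustration: `swap16` and `swap32` are hypothetical helpers that do what ntohs()/ntohl() do on a little-endian host, written out so the result does not depend on the machine running it.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit byte swap: what ntohs()/htons() do on a little-endian host. */
uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* 32-bit byte swap: what ntohl()/htonl() do on a little-endian host. */
uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Round-trip sanity check for the helpers above. */
void swap_selfcheck(void)
{
	assert(swap16(swap16(0x1234)) == 0x1234);
	assert(swap32(swap32(0x12345678u)) == 0x12345678u);
}
```

A private data size of 1 stored in network order reads as 0x0100 on a little-endian host. `swap16(0x0100)` gives 1, while `swap32(0x0100)` gives 0x00010000 (65536), so the 32-bit swap logs a wrong size for a short field.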
[PATCH 04/15] uDAPL v2.0 common: add DAPL_DBG_TYPE_CM_STATS (0x40000) to debug log options
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/include/dapl_debug.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/dapl/include/dapl_debug.h b/dapl/include/dapl_debug.h
index ff473e3..bb11c3d 100644
--- a/dapl/include/dapl_debug.h
+++ b/dapl/include/dapl_debug.h
@@ -70,7 +70,8 @@ typedef enum
 	DAPL_DBG_TYPE_THREAD = 0x4000,
 	DAPL_DBG_TYPE_CM_EST = 0x8000,
 	DAPL_DBG_TYPE_CM_WARN = 0x10000,
-	DAPL_DBG_TYPE_EXTENSION = 0x20000
+	DAPL_DBG_TYPE_EXTENSION = 0x20000,
+	DAPL_DBG_TYPE_CM_STATS = 0x40000

 } DAPL_DBG_TYPE;

--
1.7.3
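Each debug type is a single bit that is tested against a mask, which is why the new type takes the next free bit. A minimal sketch of that test follows; the shortened names and both helper functions are hypothetical, and only the three values shown (0x4000, 0x8000, 0x40000) are taken from this patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Shortened, hypothetical names; values as they appear in the patch. */
enum dbg_type {
	DBG_THREAD   = 0x4000,
	DBG_CM_EST   = 0x8000,
	DBG_CM_STATS = 0x40000
};

/* Parse a mask given as text; base 0 lets strtoul accept a "0x" prefix. */
unsigned long dbg_parse_mask(const char *text)
{
	return text ? strtoul(text, NULL, 0) : 0;
}

/* A type is enabled when its bit is set in the mask. */
int dbg_enabled(unsigned long mask, enum dbg_type type)
{
	unsigned long bit = (unsigned long)type;

	assert((bit & (bit - 1)) == 0);	/* each type must be a single bit */
	return (mask & bit) != 0;
}
```

For example, a mask of 0x48000 enables CM_EST and CM_STATS but not THREAD.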
[PATCH 05/15] uDAPL v2.0 cma, scm, ucm: allow EP (QP) creation without EVD (CQ)
Provide ability to create a EP/QP with no EVD/CQ on either the request or receive queue. The current implementation allows on receive queue but not request queue. Not all ofa devices support a null CQ so if necessary create a dummy CQ at the time of QP creation. Also, if no CQ is specified set appropriate QP max wr/sge attributes to zero. Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_common/qp.c | 41 + 1 files changed, 17 insertions(+), 24 deletions(-) diff --git a/dapl/openib_common/qp.c b/dapl/openib_common/qp.c index 94bb1ed..16ce594 100644 --- a/dapl/openib_common/qp.c +++ b/dapl/openib_common/qp.c @@ -79,29 +79,30 @@ dapls_ib_qp_alloc(IN DAPL_IA * ia_ptr, * Create a CQ with zero entries under the covers to support and * catch any invalid posting. */ - if (rcv_evd != DAT_HANDLE_NULL) - rcv_cq = rcv_evd-ib_cq_handle; - else if (!ia_ptr-hca_ptr-ib_trans.ib_cq_empty) - rcv_cq = ia_ptr-hca_ptr-ib_trans.ib_cq_empty; - else { + if ((!rcv_evd || !req_evd) !ia_ptr-hca_ptr-ib_trans.ib_cq_empty) { struct ibv_comp_channel *channel; channel = ibv_create_comp_channel(ia_ptr-hca_ptr-ib_hca_handle); if (!channel) - return (dapl_convert_errno(ENOMEM, create_cq)); + return (dapl_convert_errno(ENOMEM, create_cq_chan)); /* Call IB verbs to create CQ */ rcv_cq = ibv_create_cq(ia_ptr-hca_ptr-ib_hca_handle, - 0, NULL, channel, 0); + 1, NULL, channel, 0); if (rcv_cq == IB_INVALID_HANDLE) { ibv_destroy_comp_channel(channel); return (dapl_convert_errno(ENOMEM, create_cq)); } - ia_ptr-hca_ptr-ib_trans.ib_cq_empty = rcv_cq; } - if (req_evd != DAT_HANDLE_NULL) + + if (rcv_evd) + rcv_cq = rcv_evd-ib_cq_handle; + else + rcv_cq = ia_ptr-hca_ptr-ib_trans.ib_cq_empty; + + if (req_evd) req_cq = req_evd-ib_cq_handle; else req_cq = ia_ptr-hca_ptr-ib_trans.ib_cq_empty; @@ -133,9 +134,12 @@ dapls_ib_qp_alloc(IN DAPL_IA * ia_ptr, #endif /* Setup attributes and create qp */ dapl_os_memzero((void *)qp_create, sizeof(qp_create)); + qp_create.recv_cq = rcv_cq; + 
qp_create.cap.max_recv_wr = rcv_evd ? attr-max_recv_dtos:0; + qp_create.cap.max_recv_sge = rcv_evd ? attr-max_recv_iov:0; qp_create.send_cq = req_cq; - qp_create.cap.max_send_wr = attr-max_request_dtos; - qp_create.cap.max_send_sge = attr-max_request_iov; + qp_create.cap.max_send_wr = req_evd ? attr-max_request_dtos:0; + qp_create.cap.max_send_sge = req_evd ? attr-max_request_iov:0; qp_create.cap.max_inline_data = ia_ptr-hca_ptr-ib_trans.max_inline_send; qp_create.qp_type = IBV_QPT_RC; @@ -153,17 +157,6 @@ dapls_ib_qp_alloc(IN DAPL_IA * ia_ptr, } } #endif - - /* ibv assumes rcv_cq is never NULL, set to req_cq */ - if (rcv_cq == NULL) { - qp_create.recv_cq = req_cq; - qp_create.cap.max_recv_wr = 0; - qp_create.cap.max_recv_sge = 0; - } else { - qp_create.recv_cq = rcv_cq; - qp_create.cap.max_recv_wr = attr-max_recv_dtos; - qp_create.cap.max_recv_sge = attr-max_recv_iov; - } #ifdef _OPENIB_CMA_ if (rdma_create_qp(conn-cm_id, ib_pd_handle, qp_create)) { @@ -178,7 +171,7 @@ dapls_ib_qp_alloc(IN DAPL_IA * ia_ptr, ep_ptr-qp_handle = ibv_create_qp(ib_pd_handle, qp_create); if (!ep_ptr-qp_handle) return (dapl_convert_errno(ENOMEM, create_qp)); - + /* Setup QP attributes for INIT state on the way out */ if (dapls_modify_qp_state(ep_ptr-qp_handle, IBV_QPS_INIT, 0, 0, 0) != DAT_SUCCESS) { @@ -188,7 +181,7 @@ dapls_ib_qp_alloc(IN DAPL_IA * ia_ptr, } #endif dapl_dbg_log(DAPL_DBG_TYPE_EP, - qp_alloc: qpn %p type %d sq %d,%d rq %d,%d\n, + qp_alloc: qpn 0x%x type %d sq %d,%d rq %d,%d\n, ep_ptr-qp_handle-qp_num, ep_ptr-qp_handle-qp_type, qp_create.cap.max_send_wr, qp_create.cap.max_send_sge, qp_create.cap.max_recv_wr, qp_create.cap.max_recv_sge); -- 1.7.3 -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
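The selection logic described above (use the EVD's CQ when there is one, otherwise fall back to a shared dummy CQ and size that queue to zero) can be sketched without libibverbs. All types and the `qp_init_fill` function below are hypothetical stand-ins for illustration, not the verbs structures.

```c
#include <assert.h>
#include <stddef.h>

struct cq { int id; };	/* hypothetical stand-in for a completion queue */

struct qp_request {	/* hypothetical: what the consumer asked for */
	struct cq *req_cq;	/* NULL when no request EVD is attached */
	struct cq *rcv_cq;	/* NULL when no receive EVD is attached */
	unsigned max_send_wr, max_send_sge;
	unsigned max_recv_wr, max_recv_sge;
};

struct qp_init {	/* hypothetical: what is handed to the device */
	struct cq *send_cq, *recv_cq;
	unsigned max_send_wr, max_send_sge;
	unsigned max_recv_wr, max_recv_sge;
};

/* A missing CQ is replaced by the dummy CQ and its queue is sized to zero. */
void qp_init_fill(struct qp_init *out, const struct qp_request *in,
		  struct cq *dummy)
{
	assert(out != NULL && in != NULL && dummy != NULL);

	out->send_cq      = in->req_cq ? in->req_cq : dummy;
	out->max_send_wr  = in->req_cq ? in->max_send_wr : 0;
	out->max_send_sge = in->req_cq ? in->max_send_sge : 0;

	out->recv_cq      = in->rcv_cq ? in->rcv_cq : dummy;
	out->max_recv_wr  = in->rcv_cq ? in->max_recv_wr : 0;
	out->max_recv_sge = in->rcv_cq ? in->max_recv_sge : 0;
}
```

Treating both directions the same way is what lets the patch drop the special case that reused req_cq as the receive CQ.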
[PATCH 07/15] uDAPL v2.0 scm: fix retry count on connection pending timeout
Retry count not being decremented on connection TIMEOUT. Also, cleanup log messages on CONN and REP pending and add local port to output.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index a34965b..cac6a72 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -505,19 +505,18 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 	struct dapl_ep *ep_ptr = cm_ptr->ep;

 	if (err) {
-		dapl_log(DAPL_DBG_TYPE_WARN,
-			 "CONN_REQUEST: %s ERR %s - %s %d - %s %d\n",
+		dapl_log(DAPL_DBG_TYPE_CM_WARN,
+			 "CONN_PENDING: %s ERR %s - %s PORT L-%x R-%x %s cnt=%d\n",
 			 err == -1 ? "POLL" : "SOCKOPT",
 			 err == -1 ? strerror(dapl_socket_errno()) : strerror(err),
-			 inet_ntoa(((struct sockaddr_in *)
-				&cm_ptr->addr)->sin_addr),
-			 ntohs(((struct sockaddr_in *)
-				&cm_ptr->addr)->sin_port),
+			 inet_ntoa(((struct sockaddr_in *)&cm_ptr->addr)->sin_addr),
+			 ntohs(((struct sockaddr_in *)&cm_ptr->msg.daddr.so)->sin_port),
+			 ntohs(((struct sockaddr_in *)&cm_ptr->addr)->sin_port),
 			 (err == ETIMEDOUT || err == ECONNREFUSED) ?
 			 "RETRYING...":"ABORTING", cm_ptr->retry);

 		/* retry a timeout */
-		if ((err == ETIMEDOUT) || (err == ECONNREFUSED && --cm_ptr->retry)) {
+		if (((err == ETIMEDOUT) || (err == ECONNREFUSED)) && --cm_ptr->retry) {
 			closesocket(cm_ptr->socket);
 			cm_ptr->socket = DAPL_INVALID_SOCKET;
 			dapli_socket_connect(cm_ptr->ep, (DAT_IA_ADDRESS_PTR)&cm_ptr->addr,
@@ -715,14 +714,15 @@ static void dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr)
 	len = recv(cm_ptr->socket, (char *)&cm_ptr->msg, exp, 0);
 	if (len != exp || ntohs(cm_ptr->msg.ver) < DCM_VER_MIN) {
 		int err = dapl_socket_errno();
-		dapl_log(DAPL_DBG_TYPE_WARN,
-			 "CONN_RTU read: sk %d ERR 0x%x, rcnt=%d, v=%d - %s PORT L-%x R-%x PID L-%x R-%x\n",
+		dapl_log(DAPL_DBG_TYPE_CM_WARN,
+			 "CONN_REP_PENDING: sk %d ERR 0x%x, rcnt=%d, v=%d -"
+			 " %s PORT L-%x R-%x PID L-%x R-%x %d\n",
 			 cm_ptr->socket, err, len, ntohs(cm_ptr->msg.ver),
 			 inet_ntoa(((struct sockaddr_in *)&cm_ptr->addr)->sin_addr),
 			 ntohs(((struct sockaddr_in *)&cm_ptr->msg.daddr.so)->sin_port),
 			 ntohs(((struct sockaddr_in *)&cm_ptr->addr)->sin_port),
 			 ntohs(*(uint16_t*)&cm_ptr->msg.resv[0]),
-			 ntohs(*(uint16_t*)&cm_ptr->msg.resv[2]));
+			 ntohs(*(uint16_t*)&cm_ptr->msg.resv[2]),cm_ptr->retry);

 		/* Retry; corner case where server tcp stack resets under load */
 		if (err == ECONNRESET && --cm_ptr->retry) {
--
1.7.3
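The retry fix comes down to where the decrement sits relative to the `||`. A small sketch of the two predicates, using hypothetical function names and a plain `int` counter in place of the CM object:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Old form: a timeout always retries, because the budget is only consulted
 * (and decremented) on the ECONNREFUSED side of the ||. */
int should_retry_old(int err, int *retry)
{
	assert(retry != NULL);
	return (err == ETIMEDOUT) || (err == ECONNREFUSED && --*retry);
}

/* New form: either error consumes one retry, and retrying stops at zero. */
int should_retry_new(int err, int *retry)
{
	assert(retry != NULL);
	return ((err == ETIMEDOUT) || (err == ECONNREFUSED)) && --*retry;
}
```

Starting from a budget of 2, the new form allows one retry on ETIMEDOUT and then stops, while the old form keeps returning true for ETIMEDOUT and never touches the counter. Errors other than the two listed short-circuit before the decrement in both forms.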
[PATCH 08/15] uDAPL v2.0 ucm: UD send failures at scale, ucm_send ERR: get_smsg(hd=149,tl=150)
Full sendq should retry polling completions instead of failing. When sendq is full and all requests are pending the get send message code should retry polling for completions and not return error on first empty CQ attempt. Give HCA a chance to complete some batched requests. Also, clean up the send message error logging.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_ucm/cm.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index 39ef28d..6b5867a 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -234,38 +234,42 @@ static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer)
 static ib_cm_msg_t *ucm_get_smsg(ib_hca_transport_t *tp)
 {
 	ib_cm_msg_t *msg = NULL;
-	int ret, polled = 0, hd = tp->s_hd;
+	int ret, polled = 1, hd = tp->s_hd;

 	hd++;

 	if (hd == tp->qpe)
 		hd = 0;
 retry:
-	if (hd == tp->s_tl)
+	if (hd == tp->s_tl) {
 		msg = NULL;
+		if (polled % 100 == 0)
+			dapl_log(DAPL_DBG_TYPE_WARN,
+				 "ucm_get_smsg: FULLq hd %d == tl %d,"
+				 " completions stalled, polls=%d\n",
+				 hd, tp->s_tl, polled);
+	}
 	else {
 		msg = &tp->sbuf[hd];
 		tp->s_hd = hd; /* new hd */
 	}

 	/* if empty, process some completions */
-	if ((msg == NULL) && (!polled)) {
+	if (msg == NULL) {
 		struct ibv_wc wc;

 		/* process completions, based on UCM_TX_BURST */
 		ret = ibv_poll_cq(tp->scq, 1, &wc);
 		if (ret < 0) {
 			dapl_log(DAPL_DBG_TYPE_WARN,
-				"get_smsg: cq %p %s\n",
+				"get_smsg: cq %p %s\n",
 				tp->scq, strerror(errno));
+			return NULL;
 		}
 		/* free up completed sends, update tail */
-		if (ret > 0) {
+		if (ret > 0)
 			tp->s_tl = (int)wc.wr_id;
-			dapl_log(DAPL_DBG_TYPE_CM,
-				"get_smsg: wr_cmp (%d) s_tl=%d\n",
-				wc.status, tp->s_tl);
-		}
+
 		polled++;
 		goto retry;
 	}
@@ -1000,8 +1004,8 @@ dapli_cm_connect(DAPL_EP *ep, dp_ib_cm_handle_t cm)

 bail:
 	dapl_log(DAPL_DBG_TYPE_WARN,
-		 "connect: ERR %s - cm_lid %x cm_qpn %x r_psp %x p_sz=%d\n",
-		 strerror(errno), htons(cm->msg.daddr.ib.lid),
+		 "connect: snd ERR - cm_lid %x cm_qpn %x r_psp %x p_sz=%d\n",
+		 htons(cm->msg.daddr.ib.lid),
 		 htonl(cm->msg.dqpn), htons(cm->msg.dport),
 		 htons(cm->msg.p_size));

--
1.7.3
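The full-queue behaviour described in this patch (keep polling for completions until the tail moves, instead of giving up after one empty poll) can be modelled with a small ring. This is a sketch under stated assumptions: `struct sendq`, `sendq_get`, the callback type and `stub_poll` are hypothetical, and the `max_polls` bound is an addition for the sketch; the patch itself retries without a limit and only logs every 100 polls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical poll callback: returns 1 and stores the completed slot in
 * *done when a completion is available, 0 when none, <0 on error. */
typedef int (*poll_fn)(void *ctx, int *done);

struct sendq {
	int hd;		/* last slot handed out */
	int tl;		/* last slot known to be completed */
	int size;	/* number of slots */
};

/* Returns the next free slot, or -1 on a poll error or after max_polls
 * attempts while the queue is still full. */
int sendq_get(struct sendq *q, poll_fn poller, void *ctx, int max_polls)
{
	int next, polls = 0;

	assert(q != NULL && q->size > 0);
	next = q->hd + 1;
	if (next == q->size)
		next = 0;

	while (next == q->tl) {		/* full: wait for completions */
		int done, ret;

		if (polls++ >= max_polls)
			return -1;
		ret = poller(ctx, &done);
		if (ret < 0)
			return -1;
		if (ret > 0)
			q->tl = done;	/* completed sends move the tail */
	}
	q->hd = next;
	return next;
}

/* Demo stub: reports nothing for the first *ctx calls, then completes slot 2. */
int stub_poll(void *ctx, int *done)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0;
	}
	*done = 2;
	return 1;
}
```

With a queue of 4 slots, hd 0 and tl 1, the first request finds the queue full; three empty polls followed by a completion for slot 2 free slot 1 for use.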
[PATCH 09/15] uDAPL v2.0 scm: use ioctl SIOCGIFCONF to get complete list of configured netdev interfaces
Replace usage of getaddrinfo since it doesn't actually return bound addresses and can return the loopback address in some configurations. Some systems may not have eth0 configured, so you cannot assume eth0 as a non-loopback default netdev.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/util.c |   55 ++++++++++++++++++-------------------------------------
 1 files changed, 18 insertions(+), 37 deletions(-)

diff --git a/dapl/openib_common/util.c b/dapl/openib_common/util.c
index 053c376..c118ca9 100644
--- a/dapl/openib_common/util.c
+++ b/dapl/openib_common/util.c
@@ -155,12 +155,11 @@ char *dapl_ib_mtu_str(enum ibv_mtu mtu)
 DAT_RETURN getlocalipaddr(char *addr, int addr_len)
 {
 	struct sockaddr_in *sin;
-	struct addrinfo *res, hint, *ai;
-	int ret;
-	char hostname[256];
+	int ret, skfd, i;
 	char *netdev = getenv("DAPL_SCM_NETDEV");
+	struct ifreq ifr[10];
+	struct ifconf ifc;

-retry:
 	/* use provided netdev instead of default hostname */
 	if (netdev != NULL) {
 		ret = getipaddr_netdev(netdev, addr, addr_len);
@@ -174,46 +173,28 @@ retry:
 		return DAT_SUCCESS;
 	}

-	if (addr_len < sizeof(*sin)) {
+	if (addr_len < sizeof(*sin))
 		return DAT_INTERNAL_ERROR;
-	}

-	ret = gethostname(hostname, 256);
+	memset(&ifc,0,sizeof(ifc));
+	ifc.ifc_buf = (char *)ifr;
+	ifc.ifc_len = sizeof(ifr);
+
+	skfd = socket(PF_INET, SOCK_STREAM, 0);
+	ret = ioctl(skfd, SIOCGIFCONF, &ifc);
 	if (ret)
-		return dapl_convert_errno(ret, "gethostname");
-
-	memset(&hint, 0, sizeof hint);
-	hint.ai_flags = AI_PASSIVE;
-	hint.ai_family = AF_INET;
-	hint.ai_socktype = SOCK_STREAM;
-	hint.ai_protocol = IPPROTO_TCP;
-
-	ret = getaddrinfo(hostname, NULL, &hint, &res);
-	if (ret) {
-		dapl_log(DAPL_DBG_TYPE_ERR,
-			 "getaddrinfo ERR: %d %s\n", ret, gai_strerror(ret));
-		return DAT_INVALID_ADDRESS;
-	}
+		goto bail;

-	ret = DAT_INVALID_ADDRESS;
-	for (ai = res; ai; ai = ai->ai_next) {
-		sin = (struct sockaddr_in *)ai->ai_addr;
-		if (*((uint32_t *) &sin->sin_addr) != htonl(0x7f000001)) {
-			*((struct sockaddr_in *)addr) = *sin;
-			ret = DAT_SUCCESS;
+	/* first non-loopback interface in list */
+	for (i=0; i < ifc.ifc_len/sizeof(struct ifreq); i++) {
+		if (strcmp(ifr[i].ifr_name, "lo"))
 			break;
-		}
 	}
+	memcpy(addr, &ifr[i].ifr_addr, sizeof(struct sockaddr_in));

-	freeaddrinfo(res);
-
-	/* only loopback found, retry netdev eth0 */
-	if (ret == DAT_INVALID_ADDRESS) {
-		netdev = "eth0";
-		goto retry;
-	}
-
-	return ret;
+bail:
+	close(skfd);
+	return dapl_convert_errno(ret, "getlocalipaddr");
 }

 /*
--
1.7.3
[PATCH 10/15] uDAPL v2.0 common: add cm, link, and diag event counters in IB extended builds
Add additional event monitoring capabilities during runtime to help isolate issues during scaling in lieu of logging/printing warning messages. Counters have been added to provider CM services and counters have been added and mapped to sysfs ib_cm, device port and device diag counters. ibdev_path is used for device sysfs counters. uDAPL CM events are tracked on a per IA instance via internal provider counters. The ib_cm, link, and diag events are tracked on a per platform basis via sysfs. For these running counters a start and stop function is provided for sampling and mapping to DAPL 64 bit counters. All counters, along with new start and stop functions, are provided via dat_ib_extensions.h. New IB extension version is 2.0.7 New DCNT_IA_xx counters include 40 cm, 9 link, and 9 diag types. To enable new counters (default build is disabled): ./configure --enable-counters New bitmappings have been added to DAPL_DBG_TYPE environment variable to automatically start/stop counters and log errors if counters are enabled. 
The following will control CM, LINK, and DIAG respectively: DAPL_DBG_TYPE_CM_ERRS= 0x08, DAPL_DBG_TYPE_LINK_ERRS = 0x10, DAPL_DBG_TYPE_DIAG_ERRS = 0x40, Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- Makefile.am |3 + configure.in | 11 + dapl/common/dapl_debug.c | 431 +- dapl/common/dapl_ia_open.c |4 + dapl/common/dapl_ia_util.c | 12 +- dapl/include/dapl_debug.h| 13 +- dapl/openib_common/dapl_ib_common.h |2 +- dapl/openib_common/ib_extensions.c | 26 ++ dapl/udapl/linux/dapl_osd.h | 16 ++ dat/include/dat2/dat_ib_extensions.h | 95 - 10 files changed, 601 insertions(+), 12 deletions(-) diff --git a/Makefile.am b/Makefile.am index a9bdeda..edff7f8 100755 --- a/Makefile.am +++ b/Makefile.am @@ -20,6 +20,9 @@ XFLAGS = -DDAT_EXTENSIONS XPROGRAMS = dapl/openib_common/ib_extensions.c XHEADERS = XLIBS = +if DEFINE_COUNTERS +XFLAGS += -DDAPL_COUNTERS +endif if COLL_TYPE_FCA XFLAGS += -DDAT_IB_COLLECTIVES -DDAT_FCA_PROVIDER XPROGRAMS += dapl/openib_common/collectives/fca_provider.c diff --git a/configure.in b/configure.in index 71da96c..d577525 100644 --- a/configure.in +++ b/configure.in @@ -104,6 +104,17 @@ AC_ARG_ENABLE([ucm], [ucm=true]) AM_CONDITIONAL(DEFINE_UCM, test x$ucm = xtrue) +dnl Support to enable/disable IB extended counters (CM,LINK,DIAG) +AC_ARG_ENABLE([counters], + AS_HELP_STRING([--enable-counters],[enable counters provider build, default=disabled]), + [case ${enableval} in +yes) counters=true ;; +no) counters=false ;; +*) AC_MSG_ERROR(bad value ${enableval} for --enable-counters) ;; + esac], + [counters=false]) +AM_CONDITIONAL(DEFINE_COUNTERS, test x$counters = xtrue) + dnl Support ib_extension build - if enable-ext-type == ib AC_ARG_ENABLE(ext-type, [ --enable-ext-type Enable extensions support for library: ib, none, default=ib], diff --git a/dapl/common/dapl_debug.c b/dapl/common/dapl_debug.c index 7a0a199..cb45496 100644 --- a/dapl/common/dapl_debug.c +++ b/dapl/common/dapl_debug.c @@ -74,6 +74,328 @@ void dapl_internal_dbg_log(DAPL_DBG_TYPE type, 
const char *fmt, ...) #ifdef DAPL_COUNTERS +static int rd_ctr(const char *dev, + const char *file, + int port, + DAT_IA_COUNTER_TYPE type, + DAT_UINT64 *value) +{ + char *f_path; + int len, fd; + char vstr[21]; + char pstr[2]; + + sprintf(pstr, %d, port); + *value = 0; + + switch (type) { + case DCNT_IA_CM: + if (asprintf(f_path, /sys/class/infiniband_cm/%s/%s/%s, dev, pstr, file) 0) + return -1; + break; + case DCNT_IA_LNK: + if (asprintf(f_path, %s/ports/%s/counters/%s, dev, pstr, file) 0) + return -1; + break; + case DCNT_IA_DIAG: + if (asprintf(f_path, %s/diag_counters/%s, dev, file) 0) + return -1; + break; + default: + return -1; + } + + fd = open(f_path, O_RDONLY); + if (fd 0) { + free(f_path); + return -1; + } + + len = read(fd, vstr, 21); + + if (len 0 vstr[--len] == '\n') + vstr[len] = '\0'; + + *value = (DAT_UINT64)atoi(vstr); + + close(fd); + free(f_path); + return 0; +} + +#ifdef _OPENIB_CMA_ +static void dapl_start_cm_cntrs(DAT_HANDLE dh) +{ + DAPL_IA *ia = (DAPL_IA *)dh; + const char *dev = ibv_get_device_name(ia-hca_ptr-ib_trans.ib_dev); + int port = ia-hca_ptr-port_num; + DAT_UINT64 *cntrs = (DAT_UINT64 *)ia-cntrs; + + rd_ctr(dev,cm_tx_msgs/req, port, DCNT_IA_CM, cntrs[DCNT_IA_CM_REQ_TX]); +
[PATCH 11/15] uDAPL v2.0 scm: update socket cm provider to support new CM stat and error counters
Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_scm/cm.c | 34 ++ 1 files changed, 34 insertions(+), 0 deletions(-) diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c index cac6a72..b095c2f 100644 --- a/dapl/openib_scm/cm.c +++ b/dapl/openib_scm/cm.c @@ -487,6 +487,7 @@ DAT_RETURN dapli_socket_disconnect(dp_ib_cm_handle_t cm_ptr) NULL, 0, cm_ptr-ep); } } + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_DREQ_TX); /* release from workq */ dapli_cm_free(cm_ptr); @@ -526,6 +527,7 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err) dapli_cm_free(cm_ptr); return; } + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_ERR_TIMEOUT); goto bail; } @@ -572,9 +574,13 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err) htonll(*(uint64_t*)cm_ptr-msg.saddr.ib.gid[0]), (unsigned long long) htonll(*(uint64_t*)cm_ptr-msg.saddr.ib.gid[8])); + + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_REQ_TX); return; bail: + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_ERR); + /* mark CM object for cleanup */ dapli_cm_free(cm_ptr); dapl_evd_connection_callback(NULL, IB_CME_DESTINATION_REJECT, NULL, 0, ep_ptr); @@ -688,6 +694,8 @@ dapli_socket_connect(DAPL_EP * ep_ptr, dapli_cm_queue(cm_ptr); return DAT_SUCCESS; bail: + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_ERR); + dapl_log(DAPL_DBG_TYPE_ERR, connect ERROR: - %s r_qual %d\n, inet_ntoa(((struct sockaddr_in *)r_addr)-sin_addr), @@ -737,6 +745,7 @@ static void dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) } goto bail; } + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_REP_RX); /* keep the QP, address info in network order */ @@ -870,6 +879,8 @@ static void dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr) /* post the event with private data */ event = IB_CME_CONNECTED; 
dapl_dbg_log(DAPL_DBG_TYPE_EP, ACTIVE: connected!\n); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_RTU_TX); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_ACTIVE_EST); #ifdef DAT_EXTENSIONS ud_bail: @@ -900,6 +911,9 @@ ud_bail: cm_ptr-ah, ntohs(cm_ptr-msg.saddr.ib.lid), ntohl(cm_ptr-msg.saddr.ib.qpn)); + + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), + DCNT_IA_CM_AH_RESOLVED); } else event = DAT_IB_UD_CONNECTION_ERROR_EVENT; @@ -1014,6 +1028,8 @@ dapli_socket_listen(DAPL_IA * ia_ptr, DAT_CONN_QUAL serviceID, DAPL_SP * sp_ptr) setup listen: port %d cr %p s_fd %d\n, serviceID + 1000, cm_ptr, cm_ptr-socket); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_LISTEN); + return dat_status; bail: /* Never queued, destroy here */ @@ -1128,6 +1144,8 @@ static void dapli_socket_accept_data(ib_cm_srvc_handle_t acm_ptr) acm_ptr-state = DCM_ACCEPTING_DATA; dapl_os_unlock(acm_ptr-lock); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(acm_ptr-hca-ia_list_head)), DCNT_IA_CM_REQ_RX); + dapl_dbg_log(DAPL_DBG_TYPE_CM, ACCEPT: DST %s %x lid=0x%x, qpn=0x%x, psz=%d\n, inet_ntoa(((struct sockaddr_in *) @@ -1151,6 +1169,9 @@ static void dapli_socket_accept_data(ib_cm_srvc_handle_t acm_ptr) (DAT_COUNT) exp, (DAT_PVOID *) acm_ptr-msg.p_data, (DAT_PVOID *) xevent); + + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(acm_ptr-hca-ia_list_head)), + DCNT_IA_CM_AH_REQ_RX); } else #endif /* trigger CR event and return SUCCESS */ @@ -1318,6 +1339,8 @@ dapli_socket_accept_usr(DAPL_EP * ep_ptr, dapl_dbg_log(DAPL_DBG_TYPE_EP, PASSIVE: accepted!\n); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm_ptr-hca-ia_list_head)), DCNT_IA_CM_REP_TX); + return DAT_SUCCESS; bail: /* schedule cleanup from workq
[PATCH 12/15] uDAPL v2.0 ucm: update UD cm provider to support new CM stat and error counters
Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_ucm/cm.c | 45 +++-- 1 files changed, 43 insertions(+), 2 deletions(-) diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c index 6b5867a..357dbf7 100644 --- a/dapl/openib_ucm/cm.c +++ b/dapl/openib_ucm/cm.c @@ -173,6 +173,7 @@ static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer) (time - cm-timer)/1000, cm-hca-ib_trans.rep_time cm-retries); cm-retries++; + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_REQ_RETRY); dapl_os_unlock(cm-lock); dapli_cm_connect(cm-ep, cm); return; @@ -195,6 +196,7 @@ static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer) (time - cm-timer)/1000, cm-hca-ib_trans.rtu_time cm-retries); cm-retries++; + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_REP_RETRY); dapl_os_unlock(cm-lock); ucm_reply(cm); return; @@ -217,6 +219,7 @@ static void ucm_check_timers(dp_ib_cm_handle_t cm, int *timer) (time - cm-timer)/1000, cm-hca-ib_trans.rtu_time cm-retries); cm-retries++; + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_DREQ_RETRY); dapl_os_unlock(cm-lock); dapli_cm_disconnect(cm); return; @@ -273,6 +276,8 @@ retry: polled++; goto retry; } + DAPL_CNTR_DATA(((DAPL_IA *)dapl_llist_peek_head(tp-hca-ia_list_head)), DCNT_IA_CM_ERR_REQ_FULLQ, polled 1 ? 
1:0); + DAPL_CNTR_DATA(((DAPL_IA *)dapl_llist_peek_head(tp-hca-ia_list_head)), DCNT_IA_CM_REQ_FULLQ_POLL, polled - 1); return msg; } @@ -322,6 +327,7 @@ static int ucm_reject(ib_hca_transport_t *tp, ib_cm_msg_t *msg) ntohs(smsg.daddr.ib.lid), ntohl(smsg.dqpn), ntohs(smsg.dport)); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(tp-hca-ia_list_head)), DCNT_IA_CM_ERR_REJ_TX); return (ucm_send(tp, smsg, NULL, 0)); } @@ -366,7 +372,9 @@ static void ucm_process_recv(ib_hca_transport_t *tp, ntohl(cm-msg.d_id)); cm-msg.op = htons(DCM_RTU); - ucm_send(cm-hca-ib_trans, cm-msg, NULL, 0); + ucm_send(cm-hca-ib_trans, cm-msg, NULL, 0); + + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_RTU_RETRY); } dapl_os_unlock(cm-lock); break; @@ -393,6 +401,8 @@ static void ucm_process_recv(ib_hca_transport_t *tp, cm-msg.op = htons(DCM_DREP); ucm_send(cm-hca-ib_trans, cm-msg, NULL, 0); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_DREP_RETRY); + } else if (ntohs(msg-op) != DCM_DREP){ /* DREP ok to ignore, any other print warning */ dapl_log(DAPL_DBG_TYPE_WARN, @@ -401,6 +411,7 @@ static void ucm_process_recv(ib_hca_transport_t *tp, cm, dapl_cm_op_str(ntohs(msg-op)), dapl_cm_state_str(cm-state), ntohs(msg-sport), ntohl(msg-sqpn)); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_UNEXPECTED); } dapl_os_unlock(cm-lock); break; @@ -478,6 +489,8 @@ retry_listenq: ntohs(cm-msg.saddr.ib.lid), ntohs(cm-msg.sport), ntohl(cm-msg.sqpn), ntohl(cm-msg.saddr.ib.qpn)); + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_REQ_DUP); + return NULL; } } @@ -517,6 +530,10 @@ retry_listenq: ntohs(msg-sport), ntohl(msg-sqpn), ntohl(msg-saddr.ib.qpn), ntohl(msg-s_id), ntohl(msg-d_id)); + + if (ntohs(msg-op) == DCM_DREP) { + DAPL_CNTR(((DAPL_IA *)dapl_llist_peek_head(cm-hca-ia_list_head)), DCNT_IA_CM_ERR_DREP_DUP); + } } return found; @@ -878,6 +895,7 @@ DAT_RETURN 
dapli_cm_disconnect(dp_ib_cm_handle_t
[PATCH 13/15] uDAPL v2.0 windows: Provide auto-detect between RoCE and Infiniband for Windows.
For RoCE, enable transport global ID use.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/util.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/dapl/openib_common/util.c b/dapl/openib_common/util.c
index c118ca9..06a6f3d 100644
--- a/dapl/openib_common/util.c
+++ b/dapl/openib_common/util.c
@@ -352,6 +352,17 @@ skip_ib:
 		 port_attr.link_layer);
 #endif
 #endif
+
+#ifdef _WIN32
+#ifndef _OPENIB_CMA_
+	if (port_attr.transport != IBV_TRANSPORT_IB)
+		hca_ptr->ib_trans.global = 1;
+
+	dapl_log(DAPL_DBG_TYPE_UTIL,
+		 "query_hca: port.transport %d ib_trans.global %d\n",
+		 port_attr.transport, hca_ptr->ib_trans.global);
+#endif
+#endif
 	dapl_log(DAPL_DBG_TYPE_UTIL,
 		 "query_hca: (%x.%x) eps %d, sz %d evds %d, sz %d mtu %d - pkey %x p_idx %d sl %d\n",
--
1.7.3
[PATCH 15/15] uDAPL v2.0 config/build: remove post/postun hacking used to modify dat.conf
Return to the tried and true method of managing configuration files via %config directive and remove ugly sed editing methods. The dat.conf includes both v1 and v2 device entries to insure backward compatibility. Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- Makefile.am | 36 +++- dapl.spec.in | 35 +++ 2 files changed, 6 insertions(+), 65 deletions(-) diff --git a/Makefile.am b/Makefile.am index edff7f8..7441348 100755 --- a/Makefile.am +++ b/Makefile.am @@ -50,6 +50,8 @@ else AM_CFLAGS = -g -Wall -D_GNU_SOURCE -DDAT_CONF=\$(sysconfdir)/dat.conf\ endif +sysconf_DATA = doc/dat.conf + datlibdir = $(libdir) if DEFINE_CMA dapllibofadir = $(libdir) @@ -559,6 +561,7 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ LICENSE.txt \ LICENSE2.txt \ LICENSE3.txt \ +doc/dat.conf \ dapl.spec.in \ $(man_MANS) \ test/dapltest/include/dapl_bpool.h \ @@ -592,37 +595,4 @@ EXTRA_DIST = dat/common/dat_dictionary.h \ dist-hook: dapl.spec cp dapl.spec $(distdir) -install-exec-hook: - if ! test -d $(DESTDIR)$(sysconfdir); then \ - mkdir -p $(DESTDIR)$(sysconfdir); \ - fi; \ - if test -e $(DESTDIR)$(sysconfdir)/dat.conf; then \ - sed -e '/ofa-v2-.* u2/d' $(DESTDIR)$(sysconfdir)/dat.conf /tmp/ofadapl; \ - cp /tmp/ofadapl $(DESTDIR)$(sysconfdir)/dat.conf; \ - fi; \ - echo ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mlx4_0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mlx4_0 2 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 'ib0 0 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 'ib1 0 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mthca0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mthca0 2 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo 
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'ipath0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'ipath0 2 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'ehca0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 'eth2 0 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 'mlx4_0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 'mlx4_0 2 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 'mthca0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 'mthca0 2 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 'eth2 0 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 'eth3 0 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mlx4_0 1 ' $(DESTDIR)$(sysconfdir)/dat.conf; \ - echo ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mlx4_0 2 ' $(DESTDIR)$(sysconfdir)/dat.conf; - -uninstall-hook: - if test -e $(DESTDIR)$(sysconfdir)/dat.conf; then \ - sed -e '/ofa-v2-.* u2/d' $(DESTDIR)$(sysconfdir)/dat.conf /tmp/ofadapl; \ - cp /tmp/ofadapl $(DESTDIR)$(sysconfdir)/dat.conf; \ - fi; - SUBDIRS = . 
test/dtest test/dapltest diff --git a/dapl.spec.in b/dapl.spec.in index b779f9f..55b345e 100644 --- a/dapl.spec.in +++ b/dapl.spec.in @@ -85,46 +85,17 @@ rm -rf %{buildroot} make DESTDIR=%{buildroot} install # remove unpackaged files from the buildroot rm -f %{buildroot}%{_libdir}/*.la -rm -f %{buildroot}%{_sysconfdir}/*.conf %clean rm -rf %{buildroot} -%post -/sbin/ldconfig -if [ -e %{_sysconfdir}/dat.conf ]; then -sed -e '/ofa-v2-.* u2/d' %{_sysconfdir}/dat.conf /tmp/$$ofadapl -mv /tmp/$$ofadapl %{_sysconfdir}/dat.conf -fi -echo ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mlx4_0 1 ' %{_sysconfdir}/dat.conf -echo ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 'mlx4_0 2 ' %{_sysconfdir}/dat.conf -echo ofa-v2-ib0 u2.0 nonthreadsafe default
[PATCH 1/13] DAPL v2.0: common: CR EVD overflow causes segfault.
Clean up Bugzilla bugs. Patch set resulting from negative testing of connection protocol and DAT interfaces for all OpenFabrics DAPL providers (cma, scm, ucm).

The CR is freed up incorrectly before unlinking with SP.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_cr_callback.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c
index 3997b38..c58444b 100644
--- a/dapl/common/dapl_cr_callback.c
+++ b/dapl/common/dapl_cr_callback.c
@@ -414,7 +414,6 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
 					(DAT_CR_HANDLE) cr_ptr);
 	if (dat_status != DAT_SUCCESS) {
-		dapls_cr_free(cr_ptr);
 		(void)dapls_ib_reject_connection(ib_cm_handle,
 						 DAT_CONNECTION_EVENT_BROKEN,
 						 0, NULL);
@@ -423,6 +422,7 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
 		dapl_os_lock(&sp_ptr->header.lock);
 		dapl_sp_remove_cr(sp_ptr, cr_ptr);
 		dapl_os_unlock(&sp_ptr->header.lock);
+		dapls_cr_free(cr_ptr);
 		return DAT_INSUFFICIENT_RESOURCES;
 	}
--
1.7.3
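The ordering this patch establishes (unlink the CR from the SP first, free it last) can be illustrated outside of uDAPL. The following is a minimal sketch only: `struct ex_cr`, `struct ex_sp`, `ex_sp_remove_cr()` and `ex_sp_drop_cr()` are hypothetical stand-ins, not the real DAPL types or functions, and locking is omitted.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for DAPL_CR / DAPL_SP; not the real uDAPL types. */
struct ex_cr {
	struct ex_cr *next;
	int id;
};

struct ex_sp {
	struct ex_cr *cr_list;	/* singly linked list of pending CRs */
};

/* Unlink cr from the SP list if present; returns 1 if it was found. */
int ex_sp_remove_cr(struct ex_sp *sp, struct ex_cr *cr)
{
	struct ex_cr **pp;

	for (pp = &sp->cr_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == cr) {
			*pp = cr->next;
			cr->next = NULL;
			return 1;
		}
	}
	return 0;
}

/* Error path in the order the patch uses: unlink first, free last. */
void ex_sp_drop_cr(struct ex_sp *sp, struct ex_cr *cr)
{
	ex_sp_remove_cr(sp, cr);	/* the list no longer references cr */
	free(cr);			/* no dangling pointer is left behind */
}
```

Freeing before the unlink, as the old code did, would leave the list walk in the remove step reading memory that has already been released.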
[PATCH 2/13] DAPL v2.0: dapltest: server CR EVD is too small for multi-client configurations.
Increase default size from 8 to 32.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 test/dapltest/test/dapl_server.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/test/dapltest/test/dapl_server.c b/test/dapltest/test/dapl_server.c
index 443425c..92e0d21 100644
--- a/test/dapltest/test/dapl_server.c
+++ b/test/dapltest/test/dapl_server.c
@@ -34,7 +34,7 @@
 #undef DFLT_QLEN
 #endif
-#define DFLT_QLEN 8 /* default event queue length */
+#define DFLT_QLEN 32 /* default event queue length */
 int
 send_control_data(DT_Tdep_Print_Head * phead, unsigned char *buffp,
--
1.7.3
[PATCH 3/13] DAPL v2.0: scm: return correct event error code when remote host refuses requests
changed from TIMEOUT to NON_PEER_REJECTED

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index b45a4ab..586e1b0 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -578,7 +578,7 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 bail:
 	/* mark CM object for cleanup */
 	dapli_cm_free(cm_ptr);
-	dapl_evd_connection_callback(NULL, IB_CME_TIMEOUT, NULL, 0, ep_ptr);
+	dapl_evd_connection_callback(NULL, IB_CME_DESTINATION_REJECT, NULL, 0, ep_ptr);
 }
 /*
--
1.7.3
[PATCH 4/13] DAPL v2.0: common: cleanup debug message on EVD overflows
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_evd_util.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c
index 41423b2..a7b8c54 100644
--- a/dapl/common/dapl_evd_util.c
+++ b/dapl/common/dapl_evd_util.c
@@ -656,7 +656,7 @@ dapls_evd_post_overflow_event(IN DAPL_EVD * evd_ptr)
 	DAPL_EVD *async_evd_ptr = evd_ptr->header.owner_ia->async_error_evd;
 	DAT_EVENT *event_ptr;
-	dapl_log(DAPL_DBG_TYPE_WARN, "WARNING: overflow event on EVD %p/n", evd_ptr);
+	dapl_log(DAPL_DBG_TYPE_WARN, "WARNING: overflow event on EVD %p\n", evd_ptr);
 	dapl_os_lock(&async_evd_ptr->header.lock);
--
1.7.3
[PATCH 5/13] DAPL v2.0: common: clean up dat_rsp_create log message
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_rsp_create.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dapl/common/dapl_rsp_create.c b/dapl/common/dapl_rsp_create.c
index 3e36e81..13fe09b 100644
--- a/dapl/common/dapl_rsp_create.c
+++ b/dapl/common/dapl_rsp_create.c
@@ -85,7 +85,7 @@ dapl_rsp_create(IN DAT_IA_HANDLE ia_handle,
 	ia_ptr = (DAPL_IA *) ia_handle;
 	dapl_dbg_log(DAPL_DBG_TYPE_CM,
-		     "dapl_rsp_free conn_qual: %x EP: %p\n",
+		     "dapl_rsp_create conn_qual: %x EP: %p\n",
 		     conn_qual, ep_handle);
 	if (DAPL_BAD_HANDLE(ia_ptr, DAPL_MAGIC_IA)) {
--
1.7.3
[PATCH 7/13] DAPL v2.0: cma,scm,ucm: extra reference on EP, with RSP, causes dat_ep_free() to hang
Need to add check for RSP or PSP provider type service points during passive side accepts before taking CR reference on the EP. In these cases, the EP is already linked to inbound CR.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_cma/cm.c |    5 +++--
 dapl/openib_scm/cm.c |    5 +++--
 dapl/openib_ucm/cm.c |    5 +++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/dapl/openib_cma/cm.c b/dapl/openib_cma/cm.c
index 7980cf6..bd2c1f0 100644
--- a/dapl/openib_cma/cm.c
+++ b/dapl/openib_cma/cm.c
@@ -901,8 +901,9 @@ dapls_ib_accept_connection(IN DAT_CR_HANDLE cr_handle,
 		rdma_destroy_id(ep_conn->cm_id);
 		dapls_cm_release(ep_conn);
-		/* add new CM to EP linking, qp_handle unchanged */
-		dapl_ep_link_cm(ep_ptr, cr_conn);
+		/* add new CM to EP linking, qp_handle unchanged, !PSP !RSP */
+		if (!cr_conn->sp->ep_handle && !cr_conn->sp->psp_flags)
+			dapl_ep_link_cm(ep_ptr, cr_conn);
 		cr_conn->ep = ep_ptr;
 	} else {
 		dapl_log(DAPL_DBG_TYPE_ERR,
diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index 586e1b0..b6109f1 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -1277,8 +1277,9 @@ dapli_socket_accept_usr(DAPL_EP * ep_ptr,
 	cm_ptr->state = DCM_ACCEPTED;
 	dapl_os_unlock(&cm_ptr->lock);
-	/* Link CM to EP, already queued on work thread */
-	dapl_ep_link_cm(ep_ptr, cm_ptr);
+	/* Link CM to EP, already queued on work thread, !PSP !RSP */
+	if (!cm_ptr->sp->ep_handle && !cm_ptr->sp->psp_flags)
+		dapl_ep_link_cm(ep_ptr, cm_ptr);
 	cm_ptr->ep = ep_ptr;
 	local.p_size = htons(p_size);
diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index 2d2063e..762bd66 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -1581,8 +1581,9 @@ dapli_accept_usr(DAPL_EP *ep, DAPL_CR *cr, DAT_COUNT p_size, DAT_PVOID p_data)
 	cm->p_size = p_size;
 	dapl_os_memcpy(cm->p_data, p_data, p_size);
-	/* save state and setup valid reference to EP, HCA */
-	dapl_ep_link_cm(ep, cm);
+	/* save state and setup valid reference to EP, HCA. !PSP !RSP */
+	if (!cm->sp->ep_handle && !cm->sp->psp_flags)
+		dapl_ep_link_cm(ep, cm);
 	cm->ep = ep;
 	cm->hca = ia->hca_ptr;
--
1.7.3
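Why one extra link can keep an EP from ever being freed is easier to see in a toy reference-count model. This is only a sketch under simplifying assumptions: `struct ex_ep` and the `ex_*` functions are hypothetical, and the real provider tracks linked CM objects rather than a bare counter.

```c
/* Hypothetical model of an EP whose free path waits for all links to drop. */
struct ex_ep {
	int ref_count;	/* one reference per linked CM object */
};

void ex_ep_link_cm(struct ex_ep *ep)
{
	ep->ref_count++;
}

void ex_ep_unlink_cm(struct ex_ep *ep)
{
	ep->ref_count--;
}

/*
 * Accept path with the guard from the patch: link only when the service
 * point has not already tied this EP to the inbound CR (RSP case, or PSP
 * with provider flags).
 */
void ex_accept(struct ex_ep *ep, int sp_has_ep, int sp_psp_flags)
{
	if (!sp_has_ep && !sp_psp_flags)
		ex_ep_link_cm(ep);
}

/* A free that waits for ref_count == 0 can only finish when this is true. */
int ex_ep_can_free(const struct ex_ep *ep)
{
	return ep->ref_count == 0;
}
```

Without the guard, an RSP accept would raise the count to 2 while only one unlink ever happens, so the free condition in this model is never met.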
[PATCH 8/13] DAPL v2.0: dat: add check for NULL handle on IA calls
check added to dats_get_ia_handle()

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dat/common/dat_api.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/dat/common/dat_api.c b/dat/common/dat_api.c
index 4e6eadd..f53ead7 100755
--- a/dat/common/dat_api.c
+++ b/dat/common/dat_api.c
@@ -178,6 +178,9 @@ dats_get_ia_handle(IN DAT_IA_HANDLE handle, OUT DAT_IA_HANDLE * ia_handle_p)
 {
 	DAT_RETURN dat_status = DAT_SUCCESS;
+	if (handle == NULL)
+		return DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_IA);
+
 	/* handle to vector */
 	if (DAT_IA_HANDLE_TO_UL(handle) >= g_hv.handle_max) {
 		unsigned long i;
--
1.7.3
[PATCH 9/13] DAPL v2.0: scm: incorrectly sends user reject during CR callback errors
Add reason checking on provider rejects and set appropriate op type in reject message. Reject can be called from cr callback during failures. User reject will be IB_CM_REJ_REASON_CONSUMER_REJ.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index b6109f1..a34965b 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -1666,7 +1666,11 @@ dapls_ib_reject_connection(IN dp_ib_cm_handle_t cm_ptr,
 		return DAT_LENGTH_ERROR;
 	/* write reject data to indicate reject */
-	cm_ptr->msg.op = htons(DCM_REJ_USER);
+	if (reason == IB_CM_REJ_REASON_CONSUMER_REJ)
+		cm_ptr->msg.op = htons(DCM_REJ_USER);
+	else
+		cm_ptr->msg.op = htons(DCM_REJ_CM);
+
 	cm_ptr->msg.p_size = htons(psize);
 	iov[0].iov_base = (void *)&cm_ptr->msg;
--
1.7.3
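The selection logic is small enough to show in isolation. The sketch below assumes nothing about the real constant values: the `EX_*` enumerators are placeholders for IB_CM_REJ_REASON_CONSUMER_REJ, DCM_REJ_USER and DCM_REJ_CM, and byte-order conversion is left out.

```c
/* Placeholder enumerators; the real values live in the uDAPL/IB headers. */
enum ex_rej_reason {
	EX_REASON_CONSUMER_REJ,		/* consumer called reject itself */
	EX_REASON_PROVIDER_ERROR	/* reject issued from the CR callback on failure */
};

enum ex_rej_op {
	EX_REJ_USER,	/* stands in for DCM_REJ_USER */
	EX_REJ_CM	/* stands in for DCM_REJ_CM */
};

/* Pick the op type for the reject message based on who is rejecting. */
enum ex_rej_op ex_reject_op(enum ex_rej_reason reason)
{
	if (reason == EX_REASON_CONSUMER_REJ)
		return EX_REJ_USER;
	return EX_REJ_CM;
}
```

With this split the active side can tell a consumer reject, which may carry private data, from a reject generated by the provider.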
[PATCH 10/13] DAPL v2.0: common: change dbg level on CR callback if not listening on SP
Change from CM to CM_WARN level and include in non-debug build.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_cr_callback.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c
index 1f6dd6d..8bfbb3e 100644
--- a/dapl/common/dapl_cr_callback.c
+++ b/dapl/common/dapl_cr_callback.c
@@ -152,8 +152,8 @@ void dapls_cr_callback(IN dp_ib_cm_handle_t ib_cm_handle, IN const ib_cm_events_
 	dapl_os_lock(&sp_ptr->header.lock);
 	if (sp_ptr->listening == DAT_FALSE) {
 		dapl_os_unlock(&sp_ptr->header.lock);
-		dapl_dbg_log(DAPL_DBG_TYPE_CM,
-			     "---> dapls_cr_callback: conn event on down SP\n");
+		dapl_log(DAPL_DBG_TYPE_CM_WARN,
+			 "cr_callback: CR event on non-listening SP\n");
 		(void)dapls_ib_reject_connection(ib_cm_handle,
 						 DAT_CONNECTION_EVENT_UNREACHABLE,
 						 0, NULL);
--
1.7.3
[PATCH 11/13] DAPL v2.0: ucm: incorrectly sends user reject during CR callback errors
Add reason checking on provider rejects and set appropriate op type in reject message. Reject can be called from cr callback during failures. User reject will be IB_CM_REJ_REASON_CONSUMER_REJ. Add warning message on active side. Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dapl/openib_ucm/cm.c | 34 ++ 1 files changed, 26 insertions(+), 8 deletions(-) diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c index 762bd66..6efcad2 100644 --- a/dapl/openib_ucm/cm.c +++ b/dapl/openib_ucm/cm.c @@ -402,11 +402,12 @@ static void ucm_process_recv(ib_hca_transport_t *tp, break; default: dapl_log(DAPL_DBG_TYPE_WARN, -ucm_recv: UNKNOWN state -- op %s, %s spsp %x sqpn %x\n, - dapl_cm_op_str(ntohs(msg-op)), - dapl_cm_state_str(cm-state), - ntohs(msg-sport), ntohl(msg-sqpn)); +ucm_recv: Warning, UNKNOWN state +- op %s, %s spsp %x sqpn %x slid %x\n, + dapl_cm_op_str(ntohs(msg-op)), + dapl_cm_state_str(cm-state), + ntohs(msg-sport), ntohl(msg-sqpn), + ntohs(msg-saddr.ib.lid)); dapl_os_unlock(cm-lock); break; } @@ -1065,9 +1066,19 @@ static void ucm_connect_rtu(dp_ib_cm_handle_t cm, ib_cm_msg_t *msg) event = IB_CME_CONNECTED; else if (ntohs(msg-op) == DCM_REJ_USER) event = IB_CME_DESTINATION_REJECT_PRIVATE_DATA; - else + else { + dapl_log(DAPL_DBG_TYPE_WARN, + Warning, non-user CR REJECT: + cm %p op %s, st %s dlid %x iqp %x port %x - + slid %x iqp %x port %x\n, cm, +dapl_cm_op_str(ntohs(msg-op)), +dapl_cm_state_str(cm-state), +ntohs(msg-daddr.ib.lid), ntohl(msg-daddr.ib.qpn), +ntohs(msg-dport), ntohs(msg-saddr.ib.lid), +ntohl(msg-saddr.ib.qpn), ntohs(msg-sport)); + event = IB_CME_DESTINATION_REJECT; - + } if (event != IB_CME_CONNECTED) { dapl_log(DAPL_DBG_TYPE_CM, ACTIVE: CM_REQ REJECTED: @@ -1203,6 +1214,9 @@ ud_bail: (DAT_COUNT)ntohs(cm-msg.p_size), (DAT_PVOID *)cm-msg.p_data, (DAT_PVOID *)xevent); + + if (event != DAT_IB_UD_CONNECTION_EVENT_ESTABLISHED) + dapli_cm_free(cm); } else #endif { @@ -1942,8 +1956,12 @@ dapls_ib_reject_connection(IN dp_ib_cm_handle_t cm, 
cm-msg.saddr.ib.qp_type = cm-msg.daddr.ib.qp_type; dapl_os_memcpy(cm-msg.saddr.ib.gid[0], cm-hca-ib_trans.addr.ib.gid, 16); - cm-msg.op = htons(DCM_REJ_USER); + if (reason == IB_CM_REJ_REASON_CONSUMER_REJ) + cm-msg.op = htons(DCM_REJ_USER); + else + cm-msg.op = htons(DCM_REJ_CM); + if (ucm_send(cm-hca-ib_trans, cm-msg, pdata, psize)) { dapl_log(DAPL_DBG_TYPE_WARN, cm_reject: send ERR: %s\n, strerror(errno)); -- 1.7.3
[PATCH 13/13] DAPL v2.0: common: add missing sub-types to dat_strerror()
unknown minor error string returned with valid sub types. Update function for sub-type error codes in dat_error.h. Signed-off-by: Arlin Davis arlin.r.da...@intel.com --- dat/common/dat_strerror.c | 110 + 1 files changed, 110 insertions(+), 0 deletions(-) diff --git a/dat/common/dat_strerror.c b/dat/common/dat_strerror.c index 4480bef..915dfb0 100644 --- a/dat/common/dat_strerror.c +++ b/dat/common/dat_strerror.c @@ -233,6 +233,11 @@ dat_strerror_minor(IN DAT_RETURN value, OUT const char **message) *message = DAT_RESOURCE_CREDITS; return DAT_SUCCESS; } + case DAT_RESOURCE_SRQ: + { + *message = DAT_RESOURCE_SRQ; + return DAT_SUCCESS; + } case DAT_INVALID_HANDLE_IA: { *message = DAT_INVALID_HANDLE_IA; @@ -303,6 +308,66 @@ dat_strerror_minor(IN DAT_RETURN value, OUT const char **message) *message = DAT_INVALID_HANDLE_EVD_ASYNC; return DAT_SUCCESS; } + case DAT_INVALID_HANDLE_SRQ: + { + *message = DAT_INVALID_HANDLE_SRQ; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE_CSP: + { + *message = DAT_INVALID_HANDLE_CSP; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE1: + { + *message = DAT_INVALID_HANDLE1; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE2: + { + *message = DAT_INVALID_HANDLE2; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE3: + { + *message = DAT_INVALID_HANDLE3; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE4: + { + *message = DAT_INVALID_HANDLE4; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE5: + { + *message = DAT_INVALID_HANDLE5; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE6: + { + *message = DAT_INVALID_HANDLE6; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE7: + { + *message = DAT_INVALID_HANDLE7; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE8: + { + *message = DAT_INVALID_HANDLE8; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE9: + { + *message = DAT_INVALID_HANDLE9; + return DAT_SUCCESS; + } + case DAT_INVALID_HANDLE10: + { + *message = DAT_INVALID_HANDLE10; + return DAT_SUCCESS; + } case DAT_INVALID_ARG1: 
{ *message = DAT_INVALID_ARG1; @@ -408,6 +473,51 @@ dat_strerror_minor(IN DAT_RETURN value, OUT const char **message) *message = DAT_INVALID_STATE_EP_NOTREADY; return DAT_SUCCESS; } + case DAT_INVALID_STATE_EP_RECV_WATERMARK: + { + *message = DAT_INVALID_STATE_EP_RECV_WATERMARK; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_PZ: + { + *message = DAT_INVALID_STATE_EP_PZ; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_EVD_REQUEST: + { + *message = DAT_INVALID_STATE_EP_EVD_REQUEST; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_EVD_RECV: + { + *message = DAT_INVALID_STATE_EP_EVD_RCV; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_EVD_CONNECT: + { + *message = DAT_INVALID_STATE_EP_EVD_CONNECT; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_UNCONFIGURED: + { + *message = DAT_INVALID_STATE_EP_UNCONFIGURED; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_UNCONFRESERVED: + { + *message = DAT_INVALID_STATE_EP_UNCONFRESERVED; + return DAT_SUCCESS; + } + case DAT_INVALID_STATE_EP_UNCONFPASSIVE: + { + *message =
[PATCH 12/13] DAPL v2.0: common: extended CR event processing missing rejects on errors
When processing an inbound CR event callback a non-user reject should be sent to client in the case of a non-listening SP, allocation error, or EVD overrun. Changes made to dapls_evd_post_cr_event_ext callback.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_evd_util.c |   35 ---
 1 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c
index a7b8c54..78d2a1f 100644
--- a/dapl/common/dapl_evd_util.c
+++ b/dapl/common/dapl_evd_util.c
@@ -937,16 +937,13 @@ dapls_evd_post_cr_event_ext(IN DAPL_SP * sp_ptr,
 {
 	DAPL_CR *cr_ptr;
 	DAPL_EP *ep_ptr;
+	int reason = DAT_CONNECTION_EVENT_BROKEN;
 	dapl_os_lock(&sp_ptr->header.lock);
 	if (sp_ptr->listening == DAT_FALSE) {
 		dapl_os_unlock(&sp_ptr->header.lock);
-		dapl_dbg_log(DAPL_DBG_TYPE_CM,
-			     "---> post_cr_event_ext: conn event on down SP\n");
-		(void)dapls_ib_reject_connection(ib_cm_handle,
-						 DAT_CONNECTION_EVENT_UNREACHABLE,
-						 0, NULL);
-		return DAT_CONN_QUAL_UNAVAILABLE;
+		reason = DAT_CONNECTION_EVENT_UNREACHABLE;
+		goto bail;
 	}
 	/*
@@ -961,7 +958,7 @@ dapls_evd_post_cr_event_ext(IN DAPL_SP * sp_ptr,
 	/* allocate new connect request */
 	cr_ptr = dapls_cr_alloc(sp_ptr->header.owner_ia);
 	if (cr_ptr == NULL)
-		return DAT_INSUFFICIENT_RESOURCES;
+		goto bail;
 	/* Set up the CR */
 	cr_ptr->sp_ptr = sp_ptr;	/* maintain sp_ptr in case of reject */
@@ -994,8 +991,7 @@ dapls_evd_post_cr_event_ext(IN DAPL_SP * sp_ptr,
 		ep_ptr = dapl_ep_alloc(ia_ptr, NULL);
 		if (ep_ptr == NULL) {
 			dapls_cr_free(cr_ptr);
-			/* Invoking function will call dapls_ib_cm_reject() */
-			return DAT_INSUFFICIENT_RESOURCES;
+			goto bail;
 		}
 		ep_ptr->param.ia_handle = ia_ptr;
 		ep_ptr->param.local_ia_address_ptr =
@@ -1025,8 +1021,25 @@ dapls_evd_post_cr_event_ext(IN DAPL_SP * sp_ptr,
 	/* link the CR onto the SP so we can pick it up later */
 	dapl_sp_link_cr(sp_ptr, cr_ptr);
-	return dapls_evd_do_post_cr_event_ext(sp_ptr->evd_handle, event_number,
-					      sp_ptr, cr_ptr, ext_data);
+	if (dapls_evd_do_post_cr_event_ext(sp_ptr->evd_handle,
+					   event_number,
+					   sp_ptr, cr_ptr,
+					   ext_data) == DAT_SUCCESS) {
+		return DAT_SUCCESS;
+	}
+
+	/* error: take CR off the list, we can't use it */
+	dapl_os_lock(&sp_ptr->header.lock);
+	dapl_sp_remove_cr(sp_ptr, cr_ptr);
+	dapl_os_unlock(&sp_ptr->header.lock);
+	dapls_cr_free(cr_ptr);
+bail:
+	dapl_log(DAPL_DBG_TYPE_WARN,
+		 "cr_event_ext: ERROR reason = 0x%x\n", reason);
+
+	(void)dapls_ib_reject_connection(ib_cm_handle, reason, 0, NULL);
+
+	return DAT_INTERNAL_ERROR;
 }
 DAT_RETURN
--
1.7.3
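The shape of this change is the usual single-exit error path: each failure point records a reason and jumps to one label that logs and sends the reject. A minimal sketch of that pattern follows. Everything prefixed `ex_`/`EX_` is hypothetical, `ex_reject()` is a stub that only records its argument, and the flags passed in simulate the failure points rather than reflecting the real function signature.

```c
#include <stdio.h>
#include <stdlib.h>

enum { EX_SUCCESS = 0, EX_INTERNAL_ERROR = -1 };
enum { EX_EVENT_BROKEN = 1, EX_EVENT_UNREACHABLE = 2 };

int ex_last_reject_reason;	/* what the stub reject was last called with */

/* Stub standing in for the provider reject call. */
void ex_reject(int reason)
{
	ex_last_reject_reason = reason;
}

/*
 * listening, alloc_ok and post_ok simulate the three failure points.
 * On success the allocated object is handed back through cr_out.
 */
int ex_post_cr_event(int listening, int alloc_ok, int post_ok, void **cr_out)
{
	int reason = EX_EVENT_BROKEN;	/* default for resource/post failures */
	void *cr;

	if (!listening) {
		reason = EX_EVENT_UNREACHABLE;
		goto bail;
	}

	cr = alloc_ok ? malloc(16) : NULL;
	if (cr == NULL)
		goto bail;

	if (post_ok) {
		*cr_out = cr;
		return EX_SUCCESS;
	}

	free(cr);	/* post failed: release what we allocated */
bail:
	fprintf(stderr, "cr_event: ERROR reason = 0x%x\n", reason);
	ex_reject(reason);	/* every error path rejects exactly once */
	return EX_INTERNAL_ERROR;
}
```

The benefit is that no failure path can return without rejecting, which is the gap the patch description points at.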
RE: [PATCH] dapltest-server segfault seen on recent OFED-1.5.4 daily build
> Thank you for the two patches. I tried the two patches and now, I have not seen a segfault till now on dapl-server at least. However, after about 2 hours of test, some of dapl-client throws below error on console:
>
> Server Name: 3.4.5.1
> Server Net Address: 3.4.5.1
> DT_cs_Client: Starting Test ...
> FAIL: 16 Server test connections did not report ready.
> FAIL: 16 Server test connections did not report ready.
>
> dapl-client is stalled at this stage, and needs to be manually killed by Ctrl+C. And below errors are seen on dapl-server console:
>
> Test Error: Client_Mem_Info_Send-reaping DTO problem, status = FAILURE
> Test Error: Client_Mem_Info_Send-reaping DTO problem, status = FAILURE
> Test[b368]: Warning: dat_ep_disconnect (abrupt) #2 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED
> Test[b368]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE
> Test[b368]: Warning: dat_ep_disconnect (abrupt) #3 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED
> Test[b368]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE
> ...

Can you send me your scripts so I can attempt to duplicate? Did this test setup run successfully before OFED 1.5.4 RC3 release?

Thanks,
Arlin
[PATCH] DAPL v2.0: common: remote ia address null pointer creates seg fault
add NULL ptr check and return DAT_INVALID_PARAMETER, DAT_INVALID_ARG2

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_ep_connect.c |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/dapl/common/dapl_ep_connect.c b/dapl/common/dapl_ep_connect.c
index 80afead..590d0ed 100755
--- a/dapl/common/dapl_ep_connect.c
+++ b/dapl/common/dapl_ep_connect.c
@@ -81,6 +81,11 @@ dapl_ep_connect(IN DAT_EP_HANDLE ep_handle,
 	DAT_COUNT req_hdr_size;
 	void *private_data_ptr;
+	if (remote_ia_address == NULL) {
+		dat_status = DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG2);
+		goto bail;
+	}
+
 	dapl_dbg_log(DAPL_DBG_TYPE_API | DAPL_DBG_TYPE_CM,
 		     "dapl_ep_connect (%p, {%u.%u.%u.%u}, %X, %d, %d, %p, %x, %x)\n",
 		     ep_handle,
--
1.7.3
[PATCH] DAPL v2.0: common: posting events on full queue returns wrong error code
return DAT_QUEUE_FULL instead of DAT_INSUFFICIENT_RESOURCES

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_evd_util.c |   17 -
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/dapl/common/dapl_evd_util.c b/dapl/common/dapl_evd_util.c
index 9171415..41423b2 100644
--- a/dapl/common/dapl_evd_util.c
+++ b/dapl/common/dapl_evd_util.c
@@ -712,7 +712,7 @@ dapls_evd_post_cr_arrival_event(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 DAT_RETURN
@@ -741,7 +741,7 @@ dapls_evd_post_connection_event(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 DAT_RETURN
@@ -770,7 +770,7 @@ dapls_evd_post_async_error_event(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 DAT_RETURN
@@ -794,7 +794,7 @@ dapls_evd_post_software_event(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 /*
@@ -835,7 +835,7 @@ dapls_evd_post_generic_event(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 #ifdef DAT_EXTENSIONS
@@ -876,8 +876,7 @@ dapls_evd_post_event_ext(IN DAPL_EVD * evd_ptr,
 	if (event_ptr == NULL) {
 		dapl_os_unlock(&evd_ptr->header.lock);
-		return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES,
-				 DAT_RESOURCE_MEMORY);
+		return DAT_ERROR(DAT_QUEUE_FULL,0);
 	}
 	/* copy event and extended data */
@@ -926,7 +925,7 @@ dapls_evd_do_post_cr_event_ext(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 DAT_RETURN
@@ -1059,7 +1058,7 @@ dapls_evd_post_connection_event_ext(IN DAPL_EVD * evd_ptr,
 err:
 	dapl_os_unlock(&evd_ptr->header.lock);
 	dapls_evd_post_overflow_event(evd_ptr);
-	return DAT_ERROR(DAT_INSUFFICIENT_RESOURCES, DAT_RESOURCE_MEMORY);
+	return DAT_ERROR(DAT_QUEUE_FULL, 0);
 }
 #endif
--
1.7.3
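The distinction the patch draws, a full queue versus a failed allocation, can be shown with a fixed-size event queue. This is a sketch under stated assumptions: `struct ex_queue`, `ex_queue_post()` and the `EX_*` status values are hypothetical, and the overflow counter merely stands in for posting the overflow event to the async EVD.

```c
#include <stddef.h>

#define EX_QLEN 4	/* arbitrary queue depth for the example */

enum ex_status { EX_OK, EX_QUEUE_FULL };

struct ex_queue {
	int events[EX_QLEN];
	size_t head;		/* index of the oldest event */
	size_t count;		/* number of queued events */
	unsigned overflows;	/* stands in for the overflow notification */
};

/* Post one event; a full queue is reported as such, not as out-of-memory. */
enum ex_status ex_queue_post(struct ex_queue *q, int event)
{
	if (q->count == EX_QLEN) {
		q->overflows++;
		return EX_QUEUE_FULL;
	}
	q->events[(q->head + q->count) % EX_QLEN] = event;
	q->count++;
	return EX_OK;
}
```

A caller that sees the queue-full status knows that draining or resizing the queue is the remedy, which an insufficient-resources code would not tell it.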
[PATCH] DAPL v2.0: common: dat_ep_modify seg faults with null ep_param ptr
Add additional NULL ptr check for arg3 ep_param

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_ep_modify.c |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/dapl/common/dapl_ep_modify.c b/dapl/common/dapl_ep_modify.c
index 9f0095f..0545870 100644
--- a/dapl/common/dapl_ep_modify.c
+++ b/dapl/common/dapl_ep_modify.c
@@ -385,6 +385,12 @@ dapli_ep_modify_validate_parameters(IN DAT_EP_HANDLE ep_handle,
 		goto bail;
 	}
+	if (ep_param == NULL) {
+		dat_status =
+		    DAT_ERROR(DAT_INVALID_PARAMETER, DAT_INVALID_ARG3);
+		goto bail;
+	}
+
 	ep = (DAPL_EP *) ep_handle;
 	ia = ep->header.owner_ia;
--
1.7.3
[PATCH] DAPL v2.0: scm: change debug message level for listen/bind errors
reduce to CM_WARN instead of general WARN level.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index b9cb1bc..b45a4ab 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -994,7 +994,7 @@ dapli_socket_listen(DAPL_IA * ia_ptr, DAT_CONN_QUAL serviceID, DAPL_SP * sp_ptr)
 		if (err == EADDRINUSE)
 			dat_status = DAT_CONN_QUAL_IN_USE;
 		else {
-			dapl_log(DAPL_DBG_TYPE_WARN,
+			dapl_log(DAPL_DBG_TYPE_CM_WARN,
 				 "listen: ERROR 0x%x %s on port %d\n",
 				 err, strerror(err), serviceID + 1000);
 			dat_status = DAT_INVALID_PARAMETER;
--
1.7.3
[PATCH] DAPL v2.0: common: increase default IB ack timer from 16 to 20
For larger, more congested fabrics, a larger ACK timer is needed. Consumers can still change default with environment variable DAPL_ACK_TIMER if they need to increase or decrease. This applies to SCM and UCM providers only. The CMA provider, which uses rdma_cm, has no way to control ack timer with current API.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/dapl_ib_common.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dapl/openib_common/dapl_ib_common.h b/dapl/openib_common/dapl_ib_common.h
index a0dd772..e757b65 100644
--- a/dapl/openib_common/dapl_ib_common.h
+++ b/dapl/openib_common/dapl_ib_common.h
@@ -158,8 +158,8 @@ typedef uint16_t ib_hca_port_t;
 #define DAT_UD_QKEY 0x78654321
 /* RC timer - retry count defaults */
-#define DCM_ACK_TIMER 16 /* 5 bits, 4.096us*2^ack_timer. 16== 268ms */
-#define DCM_ACK_RETRY 7 /* 3 bits, 7 * 268ms = 1.8 seconds */
+#define DCM_ACK_TIMER 20 /* 5 bits, 4.096us*2^ack_timer. 16== 268ms, 20==4.2s */
+#define DCM_ACK_RETRY 7 /* 3 bits, 7 * 4.2 == 30 seconds */
 #define DCM_RNR_TIMER 12 /* 5 bits, 12 =.64ms, 28 =163ms, 31 =491ms */
 #define DCM_RNR_RETRY 7 /* 3 bits, 7 == infinite */
 #define DCM_IB_MTU 2048
--
1.7.3
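The figures in the comments follow from the formula quoted there, 4.096 us * 2^ack_timer, and can be checked with a few lines of integer arithmetic. The helpers below are a sketch with hypothetical names, not part of uDAPL, and the retry product is only the rough upper bound used in the comments, not a statement about exact HCA behavior.

```c
#include <stdint.h>

/*
 * Per-attempt ACK timeout in nanoseconds for a 5-bit timer value,
 * using 4.096 us * 2^ack_timer (4.096 us == 4096 ns).
 */
uint64_t ex_ack_timeout_ns(unsigned ack_timer)
{
	return (uint64_t)4096 << (ack_timer & 0x1f);
}

/* Rough total wait: per-attempt timeout multiplied by the retry count. */
uint64_t ex_ack_total_ns(unsigned ack_timer, unsigned retries)
{
	return ex_ack_timeout_ns(ack_timer) * retries;
}
```

For timer value 16 this gives 268,435,456 ns (about 268 ms) and, with 7 retries, about 1.9 seconds. For timer value 20 it gives 4,294,967,296 ns (about 4.3 s) and, with 7 retries, about 30 seconds, in line with the comments in the patch.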
RE: [PATCH] DAPL v2.0: common: increase default IB ack timer from 16 to 20
> > For larger, more congested fabrics, a larger ACK timer is needed. Consumers can still change default with environment variable DAPL_ACK_TIMER if they need to increase or decrease. This applies to SCM and UCM providers only. The CMA provider, which uses rdma_cm, has no way to control ack timer with current API.
>
> The rdma_cm uses the path lifetime to determine the QP timeout value. The path record can be manipulated by the user when calling rdma_create_ep().

I stand corrected. I guess I should have said "has no direct way". Yes, a user can modify path lifetime via rdma_create_ep() but there is no way for the user to know how much that will be manipulated and increased in the IB CM driver.
RE: [PATCH] DAPL v2.0: common: increase default IB ack timer from 16 to 20
> > Yes, a user can modify path lifetime via rdma_create_ep() but there is no way for the user to know how much that will be manipulated and increased in the IB CM driver.
>
> Sure there is. It's an open source driver. :) The ib_cm calculates the correct timeout based on the packet lifetime provided by the SA and the CA ack timeout. If user modifications to this value are necessary, then there is a bug in either the SA or HCA driver.

or there could be congestion on the fabric that the SA or HCA driver is unaware of.
[ANNOUNCE] dapl-2.0.34
New release for v2.0 (2.0.34) available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 1.5.4 RC3

Thanks,
Arlin
--
Latest Package (see ChangeLog for recent changes):

md5sum: d8114711c07fa9f4b7e52ab2803a9e8d dapl-2.0.34.tar.gz

For 1.2 and 2.0 support on the same system, including development, install RPM packages as follows:

dapl-2.0.34-1
dapl-utils-2.0.34-1
dapl-devel-2.0.34-1
dapl-debuginfo-2.0.34-1
compat-dapl-1.2.19-1
compat-dapl-devel-1.2.19-1

Summary of changes:

Release 2.0.34 fixes (OFED 1.5.4 GA):

scm: change debug message level for listen/bind errors
common: increase default IB ack timer from 16 to 20
common: remote ia address null pointer creates seg fault
common: posting events on full queue returns wrong error code
common: dat_ep_modify seg faults with null ep_param ptr
common: dat_evd_free seg faults with resized software EVD
common: remove assert for incorrect events during cm_request
dat: dat_cno_query with NULL cno_handle causes segmentation fault
scm: dat_psp_create returns wrong error code on bind/listen failure
scm: socket connect request count is reset improperly on retry
scm: when hostname has loopback addr assigned, default to eth0 instead of failing
scm: add port number to error log during hca_open failures
common: query calls return incorrect IA handle to consumer
common: srq create asserts with !dapl_llist_is_empty(head) failed
[PATCH] DAPL v2.0: common: dat_evd_free seg faults with resized software EVD
dapl_evd_resize is attempting to resize a CQ but there is no CQ attached to a software EVD. Add check for cq_handle before resizing.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_evd_resize.c |   13 -
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/dapl/common/dapl_evd_resize.c b/dapl/common/dapl_evd_resize.c
index 762fad4..f36270c 100644
--- a/dapl/common/dapl_evd_resize.c
+++ b/dapl/common/dapl_evd_resize.c
@@ -108,11 +108,14 @@ DAT_RETURN DAT_API dapl_evd_resize(IN DAT_EVD_HANDLE evd_handle,
 		goto bail;
 	}
-	dat_status = dapls_ib_cq_resize(evd_ptr->header.owner_ia,
-					evd_ptr, &evd_qlen);
-	if (dat_status != DAT_SUCCESS) {
-		dapl_os_unlock(&evd_ptr->header.lock);
-		goto bail;
+	if (evd_ptr->ib_cq_handle) {
+
+		dat_status = dapls_ib_cq_resize(evd_ptr->header.owner_ia,
+						evd_ptr, &evd_qlen);
+		if (dat_status != DAT_SUCCESS) {
+			dapl_os_unlock(&evd_ptr->header.lock);
+			goto bail;
+		}
 	}
 	dat_status = dapls_evd_event_realloc(evd_ptr, evd_qlen);
--
1.7.3
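The guard amounts to treating the hardware CQ as optional during a resize. A minimal sketch of that idea follows. `struct ex_cq`, `struct ex_evd` and `ex_evd_resize()` are hypothetical, and setting a depth field stands in for the real CQ resize call.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; a software EVD simply has cq == NULL. */
struct ex_cq {
	int depth;
};

struct ex_evd {
	struct ex_cq *cq;	/* NULL for a software EVD */
	int *events;		/* event storage */
	int qlen;		/* current queue length */
};

/* Resize the event storage, touching the CQ only when one is attached. */
int ex_evd_resize(struct ex_evd *evd, int qlen)
{
	int *events;

	if (qlen <= 0)
		return -1;

	if (evd->cq != NULL)
		evd->cq->depth = qlen;	/* stands in for the CQ resize call */

	events = realloc(evd->events, (size_t)qlen * sizeof *events);
	if (events == NULL)
		return -1;

	evd->events = events;
	evd->qlen = qlen;
	return 0;
}
```

The event storage is resized in both cases, so a software EVD still ends up with the requested queue length.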
[PATCH] DAPL v2.0: dat: dat_cno_query with NULL cno_handle causes segmentation fault
add check for NULL handle in dat library

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dat/udat/udat_api.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/dat/udat/udat_api.c b/dat/udat/udat_api.c
index 5948a4f..6c1549a 100644
--- a/dat/udat/udat_api.c
+++ b/dat/udat/udat_api.c
@@ -161,6 +161,9 @@ DAT_RETURN DAT_API dat_cno_query(IN DAT_CNO_HANDLE cno_handle,
 				 IN DAT_CNO_PARAM_MASK cno_param_mask,
 				 OUT DAT_CNO_PARAM * cno_param)
 {
+	if (cno_handle == NULL) {
+		return DAT_ERROR(DAT_INVALID_HANDLE, DAT_INVALID_HANDLE_CNO);
+	}
 	return DAT_CNO_QUERY(cno_handle, cno_param_mask, cno_param);
 }
--
1.7.3
[PATCH] DAPL v2.0: common: query calls return incorrect IA handle to consumer
The IA handle from the consumer perspective is an IA vector and not the
provider IA address handle. Need to convert IA handle to IA vector for
consumer calls.

Modify dats_get_ia_handle call to convert both ways depending on handle
type provided so a dapl provider can convert to vector on query calls.

This fix is backward compatible with older libdat2 libraries. Function is
already exported and syntax is unchanged.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_ep_query.c  |    1 +
 dapl/common/dapl_lmr_query.c |    1 +
 dapl/common/dapl_psp_query.c |    2 +-
 dapl/common/dapl_pz_query.c  |    2 +-
 dapl/common/dapl_rmr_query.c |    1 +
 dapl/common/dapl_rsp_query.c |    2 +-
 dapl/common/dapl_srq_query.c |    1 +
 dapl/include/dapl.h          |    4 ++++
 dapl/udapl/dapl_cno_query.c  |    2 +-
 dapl/udapl/dapl_evd_query.c  |    2 +-
 dat/common/dat_api.c         |   37 ++++++++++++++++++++++++-------------
 11 files changed, 37 insertions(+), 18 deletions(-)

diff --git a/dapl/common/dapl_ep_query.c b/dapl/common/dapl_ep_query.c
index f5f548f..5241b96 100644
--- a/dapl/common/dapl_ep_query.c
+++ b/dapl/common/dapl_ep_query.c
@@ -107,6 +107,7 @@ dapl_ep_query(IN DAT_EP_HANDLE ep_handle,
 			    (DAT_IA_ADDRESS_PTR) &ep_ptr->remote_ia_address;
 		}
 		*ep_param = ep_ptr->param;
+		dats_get_ia_handle(ep_ptr->param.ia_handle, &ep_param->ia_handle);
 	}
 
 bail:
diff --git a/dapl/common/dapl_lmr_query.c b/dapl/common/dapl_lmr_query.c
index 4ac37ec..d8688c8 100644
--- a/dapl/common/dapl_lmr_query.c
+++ b/dapl/common/dapl_lmr_query.c
@@ -80,6 +80,7 @@ dapl_lmr_query(IN DAT_LMR_HANDLE lmr_handle,
 	lmr = (DAPL_LMR *) lmr_handle;
 
 	dapl_os_memcpy(lmr_param, &lmr->param, sizeof(DAT_LMR_PARAM));
+	dats_get_ia_handle(lmr->param.ia_handle, &lmr_param->ia_handle);
 
 bail:
 	return dat_status;
diff --git a/dapl/common/dapl_psp_query.c b/dapl/common/dapl_psp_query.c
index d990ebd..bb01c9f 100644
--- a/dapl/common/dapl_psp_query.c
+++ b/dapl/common/dapl_psp_query.c
@@ -83,7 +83,7 @@ dapl_psp_query(IN DAT_PSP_HANDLE psp_handle,
 	/*
 	 * Fill in the PSP params
 	 */
-	psp_param->ia_handle = sp_ptr->header.owner_ia;
+	dats_get_ia_handle(sp_ptr->header.owner_ia, &psp_param->ia_handle);
 	psp_param->conn_qual = sp_ptr->conn_qual;
 	psp_param->evd_handle = sp_ptr->evd_handle;
 	psp_param->psp_flags = sp_ptr->psp_flags;
diff --git a/dapl/common/dapl_pz_query.c b/dapl/common/dapl_pz_query.c
index 5829af4..b95ca7d 100644
--- a/dapl/common/dapl_pz_query.c
+++ b/dapl/common/dapl_pz_query.c
@@ -79,7 +79,7 @@ dapl_pz_query(IN DAT_PZ_HANDLE pz_handle,
 	/* Since the DAT_PZ_ARGS values are easily accessible, */
 	/* don't bother checking the DAT_PZ_ARGS_MASK value    */
 
-	pz_param->ia_handle = (DAT_IA_HANDLE) pz->header.owner_ia;
+	dats_get_ia_handle(pz->header.owner_ia, &pz_param->ia_handle);
 
 bail:
 	return dat_status;
diff --git a/dapl/common/dapl_rmr_query.c b/dapl/common/dapl_rmr_query.c
index d18aa84..18d9984 100644
--- a/dapl/common/dapl_rmr_query.c
+++ b/dapl/common/dapl_rmr_query.c
@@ -87,6 +87,7 @@ dapl_rmr_query(IN DAT_RMR_HANDLE rmr_handle,
 	}
 
 	dapl_os_memcpy(rmr_param, &rmr->param, sizeof(DAT_RMR_PARAM));
+	dats_get_ia_handle(rmr->param.ia_handle, &rmr_param->ia_handle);
 
 bail:
 	return dat_status;
diff --git a/dapl/common/dapl_rsp_query.c b/dapl/common/dapl_rsp_query.c
index dfc8145..79b30d8 100644
--- a/dapl/common/dapl_rsp_query.c
+++ b/dapl/common/dapl_rsp_query.c
@@ -81,7 +81,7 @@ dapl_rsp_query(IN DAT_RSP_HANDLE rsp_handle,
 	/*
 	 * Fill in the RSP params
 	 */
-	rsp_param->ia_handle = sp_ptr->header.owner_ia;
+	dats_get_ia_handle(sp_ptr->header.owner_ia, &rsp_param->ia_handle);
 	rsp_param->conn_qual = sp_ptr->conn_qual;
 	rsp_param->evd_handle = sp_ptr->evd_handle;
 	rsp_param->ep_handle = sp_ptr->ep_handle;
diff --git a/dapl/common/dapl_srq_query.c b/dapl/common/dapl_srq_query.c
index af395d4..f9ad443 100644
--- a/dapl/common/dapl_srq_query.c
+++ b/dapl/common/dapl_srq_query.c
@@ -91,6 +91,7 @@ dapl_srq_query(IN DAT_SRQ_HANDLE srq_handle,
 	srq_ptr->param.outstanding_dto_count = DAT_VALUE_UNKNOWN;
 
 	*srq_param = srq_ptr->param;
+	dats_get_ia_handle(srq_ptr->header.owner_ia, &srq_param->ia_handle);
 
 bail:
 	return dat_status;
diff --git a/dapl/include/dapl.h b/dapl/include/dapl.h
index 68d0ea4..2fd5032 100755
--- a/dapl/include/dapl.h
+++ b/dapl/include/dapl.h
@@ -739,6 +739,10 @@ extern DAT_RETURN DAT_API dapl_get_handle_type (
 	IN DAT_HANDLE,
 	OUT DAT_HANDLE_TYPE * );
 
+extern DAT_RETURN DAT_API dats_get_ia_handle (
+	IN DAT_HANDLE,			/* dat_handle */
+	OUT DAT_IA_HANDLE * );		/* ia handle */
+
 /* CNO functions */
 
 #if !defined(__KERNEL__)
diff
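For readers unfamiliar with the vector idea, here is a minimal sketch of a lookup that converts in both directions depending on what it is given. It is not the libdat2 implementation (the dat_api.c hunk is not shown above); every name prefixed with sketch_ and the table size are assumptions made for illustration only. The sketch treats a small nonzero integer as a consumer-side vector and anything else as a provider pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from libdat2. */
#define SKETCH_MAX_IA 64

typedef void *sketch_handle;

/* Slot i holds the provider IA pointer for consumer vector i + 1 (0 is invalid). */
sketch_handle sketch_ia_table[SKETCH_MAX_IA];

/* A handle whose numeric value is a small nonzero integer is treated as a vector. */
int sketch_is_vector(sketch_handle h)
{
	uintptr_t v = (uintptr_t)h;

	return v != 0 && v <= SKETCH_MAX_IA;
}

/* Store a provider pointer; return the consumer-visible vector, or NULL if full. */
sketch_handle sketch_ia_register(sketch_handle provider_ia)
{
	for (size_t i = 0; i < SKETCH_MAX_IA; i++) {
		if (sketch_ia_table[i] == NULL) {
			sketch_ia_table[i] = provider_ia;
			return (sketch_handle)(uintptr_t)(i + 1);
		}
	}
	return NULL;
}

/*
 * Convert in either direction: vector in, provider pointer out, or
 * provider pointer in, vector out. Returns 0 on success, -1 if unknown.
 */
int sketch_get_ia_handle(sketch_handle in, sketch_handle *out)
{
	if (sketch_is_vector(in)) {
		sketch_handle p = sketch_ia_table[(uintptr_t)in - 1];

		if (p == NULL)
			return -1;
		*out = p;
		return 0;
	}
	if (in == NULL)
		return -1;
	for (size_t i = 0; i < SKETCH_MAX_IA; i++) {
		if (sketch_ia_table[i] == in) {
			*out = (sketch_handle)(uintptr_t)(i + 1);
			return 0;
		}
	}
	return -1;
}
```

With a scheme like this, a query path that only has the provider pointer can still hand the consumer the value the consumer originally received at open time, which is the behavior the patch description asks for.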
[PATCH] DAPL v2.0: common: srq create asserts with !dapl_llist_is_empty(head) failed
return DAT_NOT_IMPLEMENTED before allocating any resources
until there is a provider that supports SRQ's.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/common/dapl_srq_create.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/dapl/common/dapl_srq_create.c b/dapl/common/dapl_srq_create.c
index 7631a5e..77aeacd 100644
--- a/dapl/common/dapl_srq_create.c
+++ b/dapl/common/dapl_srq_create.c
@@ -114,6 +114,10 @@ dapl_srq_create(IN DAT_IA_HANDLE ia_handle,
 		goto bail;
 	}
 
+	/* SRQ provider not implemented */
+	dat_status = DAT_ERROR(DAT_NOT_IMPLEMENTED, DAT_NO_SUBTYPE);
+	goto bail;
+
 	/* Allocate SRQ */
 	srq_ptr = dapl_srq_alloc(ia_ptr, srq_attr);
 	if (srq_ptr == NULL) {
@@ -129,9 +133,6 @@ dapl_srq_create(IN DAT_IA_HANDLE ia_handle,
 	/*
 	 * XXX Allocate provider resource here!!!
 	 */
-	/* XXX */ dat_status = DAT_ERROR(DAT_NOT_IMPLEMENTED, DAT_NO_SUBTYPE);
-	/* XXX */ dapl_srq_dealloc(srq_ptr);
-	/* XXX */ goto bail;
 
 	/* Link it onto the IA */
 	dapl_ia_link_srq(ia_ptr, srq_ptr);
-- 
1.7.3
[PATCH] DAPL v2.0: scm: add port number to error log during hca_open failures
scm: add port number to error log during hca_open failures

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/device.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/dapl/openib_scm/device.c b/dapl/openib_scm/device.c
index 41fccdf..77c64e3 100644
--- a/dapl/openib_scm/device.c
+++ b/dapl/openib_scm/device.c
@@ -324,8 +324,9 @@ found:
 	if (ibv_query_port(hca_ptr->ib_hca_handle,
 			   (uint8_t) hca_ptr->port_num, &port_attr)) {
 		dapl_log(DAPL_DBG_TYPE_ERR,
-			 "open_hca: get lid ERR for %s, err=%s\n",
+			 "open_hca: get lid ERR for %s port=%d, err=%s\n",
 			 ibv_get_device_name(hca_ptr->ib_trans.ib_dev),
+			 hca_ptr->port_num,
 			 strerror(errno));
 		goto err;
 	} else {
-- 
1.7.3
[PATCH 2/3] DAPL v2.0: scm: socket connect request count is reset improperly on retry
Include current retry count with the new connect request call
and set accordingly after creating the new cm object.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index 305f85b..968d9b9 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -64,7 +64,7 @@
 static DAT_RETURN
 dapli_socket_connect(DAPL_EP * ep_ptr,
 		     DAT_IA_ADDRESS_PTR r_addr,
-		     DAT_CONN_QUAL r_qual, DAT_COUNT p_size, DAT_PVOID p_data);
+		     DAT_CONN_QUAL r_qual, DAT_COUNT p_size, DAT_PVOID p_data, int retries);
 
 #ifdef DAPL_DBG
 /* Check for EP linking to IA and proper connect state */
@@ -505,8 +505,8 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 	struct dapl_ep *ep_ptr = cm_ptr->ep;
 
 	if (err) {
-		dapl_log(DAPL_DBG_TYPE_ERR,
-			 "CONN_PENDING: %s ERR %s - %s %d - %s\n",
+		dapl_log(DAPL_DBG_TYPE_WARN,
+			 "CONN_REQUEST: %s ERR %s - %s %d - %s %d\n",
 			 err == -1 ? "POLL" : "SOCKOPT",
 			 err == -1 ? strerror(dapl_socket_errno()) : strerror(err),
 			 inet_ntoa(((struct sockaddr_in *)
@@ -514,7 +514,7 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 			 ntohs(((struct sockaddr_in *)
 				&cm_ptr->addr)->sin_port),
 			 (err == ETIMEDOUT || err == ECONNREFUSED) ?
-			 "RETRYING...":"ABORTING");
+			 "RETRYING...":"ABORTING", cm_ptr->retry);
 
 		/* retry a timeout */
 		if ((err == ETIMEDOUT) || (err == ECONNREFUSED && --cm_ptr->retry)) {
@@ -522,12 +522,11 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 			cm_ptr->socket = DAPL_INVALID_SOCKET;
 			dapli_socket_connect(cm_ptr->ep, (DAT_IA_ADDRESS_PTR)&cm_ptr->addr,
 					     ntohs(((struct sockaddr_in *)&cm_ptr->addr)->sin_port) - 1000,
-					     ntohs(cm_ptr->msg.p_size), cm_ptr->msg.p_data);
+					     ntohs(cm_ptr->msg.p_size), cm_ptr->msg.p_data, cm_ptr->retry);
 			dapl_ep_unlink_cm(cm_ptr->ep, cm_ptr);
 			dapli_cm_free(cm_ptr);
 			return;
 		}
-
 		goto bail;
 	}
 
@@ -579,7 +578,7 @@ static void dapli_socket_connected(dp_ib_cm_handle_t cm_ptr, int err)
 bail:
 	/* mark CM object for cleanup */
 	dapli_cm_free(cm_ptr);
-	dapl_evd_connection_callback(NULL, IB_CME_LOCAL_FAILURE, NULL, 0, ep_ptr);
+	dapl_evd_connection_callback(NULL, IB_CME_TIMEOUT, NULL, 0, ep_ptr);
 }
 
 /*
@@ -589,7 +588,7 @@ bail:
 static DAT_RETURN
 dapli_socket_connect(DAPL_EP * ep_ptr,
 		     DAT_IA_ADDRESS_PTR r_addr,
-		     DAT_CONN_QUAL r_qual, DAT_COUNT p_size, DAT_PVOID p_data)
+		     DAT_CONN_QUAL r_qual, DAT_COUNT p_size, DAT_PVOID p_data, int retries)
 {
 	dp_ib_cm_handle_t cm_ptr;
 	int ret;
@@ -604,6 +603,8 @@ dapli_socket_connect(DAPL_EP * ep_ptr,
 	if (cm_ptr == NULL)
 		return dat_ret;
 
+	cm_ptr->retry = retries;
+
 	/* create, connect, sockopt, and exchange QP information */
 	if ((cm_ptr->socket =
 	     socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == DAPL_INVALID_SOCKET) {
@@ -724,12 +725,12 @@ static void dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr)
 			 ntohs(*(uint16_t*)&cm_ptr->msg.resv[2]));
 
 		/* Retry; corner case where server tcp stack resets under load */
-		if (err == ECONNRESET) {
+		if (err == ECONNRESET && --cm_ptr->retry) {
 			closesocket(cm_ptr->socket);
 			cm_ptr->socket = DAPL_INVALID_SOCKET;
 			dapli_socket_connect(cm_ptr->ep, (DAT_IA_ADDRESS_PTR)&cm_ptr->addr,
 					     ntohs(((struct sockaddr_in *)&cm_ptr->addr)->sin_port) - 1000,
-					     ntohs(cm_ptr->msg.p_size), cm_ptr->msg.p_data);
+					     ntohs(cm_ptr->msg.p_size), cm_ptr->msg.p_data, cm_ptr->retry);
 			dapl_ep_unlink_cm(cm_ptr->ep, cm_ptr);
 			dapli_cm_free(cm_ptr);
 			return;
@@ -1455,7 +1456,7 @@ dapls_ib_connect(IN DAT_EP_HANDLE ep_handle,
 
 	return (dapli_socket_connect(ep_ptr, remote_ia_address,
 				     remote_conn_qual,
-				     private_data_size, private_data, SCM_CR_RETRY));
+				     private_data_size, private_data));
 }
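The reason the count has to travel with the call is that every retry builds a fresh cm object, so a counter that lives only in the object starts over each time. A minimal sketch of the pattern follows; it is not the provider code, and sketch_cm, sketch_connect, sketch_peer and the callback type are hypothetical names. For simplicity it retries only on ECONNREFUSED.

```c
#include <errno.h>

/* Hypothetical per-attempt connection object. */
struct sketch_cm {
	int retry;	/* retries remaining for this connect request */
};

/* Hypothetical transport hook: returns 0 on success or an errno value. */
typedef int (*sketch_attempt_fn)(void *ctx);

/*
 * Run one connect request. Returns 0 on success, otherwise the errno of
 * the failure that was not retryable or that exhausted the budget.
 */
int sketch_connect(sketch_attempt_fn attempt, void *ctx, int retries)
{
	for (;;) {
		/* a fresh object per attempt, seeded with the remaining budget */
		struct sketch_cm cm = { .retry = retries };
		int err = attempt(ctx);

		if (err == 0)
			return 0;
		if (err != ECONNREFUSED || --cm.retry <= 0)
			return err;
		retries = cm.retry;	/* hand the decremented count forward */
	}
}

/* Stand-in peer that refuses a fixed number of times, then accepts. */
struct sketch_peer {
	int refusals_left;
	int calls;
};

int sketch_refusing_peer(void *ctx)
{
	struct sketch_peer *p = ctx;

	p->calls++;
	if (p->refusals_left > 0) {
		p->refusals_left--;
		return ECONNREFUSED;
	}
	return 0;
}
```

If the line that seeds cm.retry used a constant instead of the passed-in value, a peer that always refuses would be retried forever, which is the failure mode the subject line describes.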
[PATCH 3/3] DAPL v2.0: scm: dat_psp_create returns wrong error code on bind/listen failure
The SCM provider changed to return DAT_INVALID_PARAMETER instead of
the incorrect DAT_CONN_QUAL_UNAVAILABLE error code on bind or listen
failures other than EADDRINUSE, which still returns DAT_CONN_QUAL_IN_USE.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_scm/cm.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index 968d9b9..b9cb1bc 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -991,13 +991,14 @@ dapli_socket_listen(DAPL_IA * ia_ptr, DAT_CONN_QUAL serviceID, DAPL_SP * sp_ptr)
 	if ((bind(cm_ptr->socket, (struct sockaddr *)&addr, sizeof(addr)) < 0)
 	    || (listen(cm_ptr->socket, 128) < 0)) {
 		int err = dapl_socket_errno();
-		dapl_log(DAPL_DBG_TYPE_CM,
-			 "listen: ERROR 0x%x %s on port %d\n",
-			 err, strerror(err), serviceID + 1000);
 		if (err == EADDRINUSE)
 			dat_status = DAT_CONN_QUAL_IN_USE;
-		else
-			dat_status = DAT_CONN_QUAL_UNAVAILABLE;
+		else {
+			dapl_log(DAPL_DBG_TYPE_WARN,
+				 "listen: ERROR 0x%x %s on port %d\n",
+				 err, strerror(err), serviceID + 1000);
+			dat_status = DAT_INVALID_PARAMETER;
+		}
 		goto bail;
 	}
 
-- 
1.7.3
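The resulting decision can be read as a small mapping from the socket errno to a status plus a "worth logging" flag. The sketch below restates that mapping outside the provider; the enum and function names are hypothetical and only mirror the DAT codes named in the patch.

```c
#include <errno.h>

/* Hypothetical stand-ins for the DAT return codes named in the patch. */
enum sketch_status {
	SKETCH_SUCCESS,
	SKETCH_CONN_QUAL_IN_USE,
	SKETCH_INVALID_PARAMETER
};

/*
 * Map the errno from a failed bind()/listen() to a status.
 * *warn is set to 1 only for the unexpected failures; an address
 * already in use is an ordinary outcome and is not logged.
 */
enum sketch_status sketch_listen_status(int err, int *warn)
{
	*warn = 0;
	if (err == 0)
		return SKETCH_SUCCESS;
	if (err == EADDRINUSE)
		return SKETCH_CONN_QUAL_IN_USE;
	*warn = 1;
	return SKETCH_INVALID_PARAMETER;
}
```

Keeping the in-use case quiet matters for callers that probe for a free connection qualifier by trying ports in sequence, where EADDRINUSE is expected many times over.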
RE: [ANNOUNCE] OFED-1.5.4-rc1 is available
I don't see FDR support in the changelog. Did FDR make it into RC1 kernel?
RE: [ANNOUNCE] dapl-2.0.33
The dapl-2.0.33 distribution was built without FCA configuration so it was
missing some of the optional headers for FCA support.

Please pull updated dapl-2.0.33.tar.gz. The git tree is unaffected.

md5sum: 1539095f223d11a3c4a69c5e3775f1cc dapl-2.0.33.tar.gz

Arlin
[ANNOUNCE] dapl-2.0.33
New release for v2.0 (2.0.33) available at http://www.openfabrics.org/downloads/dapl

Vlad, please pull into OFED 1.5.4 RC1:

Latest Package (see ChangeLog for recent changes):

md5sum: 1dbffd27c790cc822383b309f0a01312 dapl-2.0.33.tar.gz

For 1.2 and 2.0 support on same system, including development, install RPM packages as follows:

dapl-2.0.33-1
dapl-utils-2.0.33-1
dapl-devel-2.0.33-1
dapl-debuginfo-2.0.33-1
compat-dapl-1.2.19-1
compat-dapl-devel-1.2.19-1

New features for 2.0.33

IB transport extensions for MPI collectives and Mellanox FCA provider added as optional build.
Requires Mellanox mverbs, FCA library, and ConnectX-2 adapters.

Build example for FCA installed in default /opt/mellanox/fca directory:

./configure --enable-coll-type=fca LDFLAGS=-L/opt/mellanox/fca/lib CPPFLAGS=-I/opt/mellanox/fca/include/

Summary of v2.0 changes:

Release 2.0.33 fixes (OFED 1.5.4 RC1):

scm,ucm: fix compatibility issues and set minimum protocol support
build: link librdmacm dependency to ib_acm usage for ucm and scm providers
build: add selective enable/disable-xxx build switch for each provider
build: add extended header files to EXTRA_DIST and fix missing backslash
build: set IB extended coll-type to none by default
common: change errno mapping of EINVAL to DAT_INVALID_PARAMETER
build: add IB collective and FCA provider to dapl build package as an option
common: add new dapls_evd_post_event_ext call for extended events
ucm: add support for IB collective providers
scm: add support for IB collective providers
cma: add support for IB collective providers
common: add supported collective types in named attributes for query
common: add collective call mappings via standard dapli_post_ext()
common: new debug bitmask definition for extension logging
common: new IB collective provider for Mellanox Fabric Collective Agent
dat: add definitions for MPI offloaded collectives in IB transport extensions
common: cleanup debug messages when building with ibacm feature
[PATCH] DAPL v2.0: scm,ucm: fix compatibility issue and set minimum protocol support
allow latest version to work with previous version.
provide compatibility back to OFED 1.5, dapl-2.0.23.

if rdma_atomic_in is not exchanged, default back
to original settings set by consumer.

maintain compatibility moving forward from v6.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 dapl/openib_common/dapl_ib_common.h |    1 +
 dapl/openib_scm/cm.c                |   16 ++++++++++------
 dapl/openib_ucm/cm.c                |   14 +++++++++-----
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/dapl/openib_common/dapl_ib_common.h b/dapl/openib_common/dapl_ib_common.h
index 8993a24..a0dd772 100644
--- a/dapl/openib_common/dapl_ib_common.h
+++ b/dapl/openib_common/dapl_ib_common.h
@@ -59,6 +59,7 @@ typedef ib_hca_handle_t dapl_ibal_ca_t;
 
 /* QP info to exchange, wire protocol version for these CM's */
 #define DCM_VER 7
+#define DCM_VER_MIN 6 /* backward compatibility limit */
 
 /* CM private data areas, same for all operations */
 #define DCM_MAX_PDATA_SIZE 118
diff --git a/dapl/openib_scm/cm.c b/dapl/openib_scm/cm.c
index 1145f17..305f85b 100644
--- a/dapl/openib_scm/cm.c
+++ b/dapl/openib_scm/cm.c
@@ -712,7 +712,7 @@ static void dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr)
 	dapl_dbg_log(DAPL_DBG_TYPE_EP, "connect_rtu: recv peer QP data\n");
 
 	len = recv(cm_ptr->socket, (char *)&cm_ptr->msg, exp, 0);
-	if (len != exp || ntohs(cm_ptr->msg.ver) != DCM_VER) {
+	if (len != exp || ntohs(cm_ptr->msg.ver) < DCM_VER_MIN) {
 		int err = dapl_socket_errno();
 		dapl_log(DAPL_DBG_TYPE_WARN,
 			 "CONN_RTU read: sk %d ERR 0x%x, rcnt=%d, v=%d - %s PORT L-%x R-%x PID L-%x R-%x\n",
@@ -807,8 +807,10 @@ static void dapli_socket_connect_rtu(dp_ib_cm_handle_t cm_ptr)
 	}
 
 	/* rdma_out, initiator, cannot exceed remote rdma_in max */
-	ep_ptr->param.ep_attr.max_rdma_read_out =
-		DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_read_out, cm_ptr->msg.rd_in);
+	if (ntohs(cm_ptr->msg.ver) >= 7)
+		ep_ptr->param.ep_attr.max_rdma_read_out =
+			DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_read_out,
+				 cm_ptr->msg.rd_in);
 
 	/* modify QP to RTR and then to RTS with remote info */
 	dapl_os_lock(&ep_ptr->header.lock);
@@ -1089,7 +1091,7 @@ static void dapli_socket_accept_data(ib_cm_srvc_handle_t acm_ptr)
 
 	/* read in DST QP info, IA address. check for private data */
 	len = recv(acm_ptr->socket, (char *)&acm_ptr->msg, exp, 0);
-	if (len != exp || ntohs(acm_ptr->msg.ver) != DCM_VER) {
+	if (len != exp || ntohs(acm_ptr->msg.ver) < DCM_VER_MIN) {
 		int err = dapl_socket_errno();
 		dapl_log(DAPL_DBG_TYPE_ERR,
 			 "ACCEPT read: ERR 0x%x %s, rcnt=%d, ver=%d\n",
@@ -1209,8 +1211,10 @@ dapli_socket_accept_usr(DAPL_EP * ep_ptr,
 	}
 #endif
 	/* rdma_out, initiator, cannot exceed remote rdma_in max */
-	ep_ptr->param.ep_attr.max_rdma_read_out =
-		DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_read_out, cm_ptr->msg.rd_in);
+	if (ntohs(cm_ptr->msg.ver) >= 7)
+		ep_ptr->param.ep_attr.max_rdma_read_out =
+			DAPL_MIN(ep_ptr->param.ep_attr.max_rdma_read_out,
+				 cm_ptr->msg.rd_in);
 
 	/* modify QP to RTR and then to RTS with remote info already read */
 	dapl_os_lock(&ep_ptr->header.lock);
diff --git a/dapl/openib_ucm/cm.c b/dapl/openib_ucm/cm.c
index ec6a774..2d2063e 100644
--- a/dapl/openib_ucm/cm.c
+++ b/dapl/openib_ucm/cm.c
@@ -554,7 +554,7 @@ retry:
 			 (void*)wc[i].wr_id, wc[i].src_qp);
 
 		/* validate CM message, version */
-		if (ntohs(msg->ver) != DCM_VER) {
+		if (ntohs(msg->ver) < DCM_VER_MIN) {
 			dapl_log(DAPL_DBG_TYPE_WARN,
 				 "ucm_recv: UNKNOWN msg %p, ver %d\n",
 				 msg, msg->ver);
@@ -1092,8 +1092,10 @@ static void ucm_connect_rtu(dp_ib_cm_handle_t cm, ib_cm_msg_t *msg)
 	dapl_os_unlock(&cm->lock);
 
 	/* rdma_out, initiator, cannot exceed remote rdma_in max */
-	cm->ep->param.ep_attr.max_rdma_read_out =
-		DAPL_MIN(cm->ep->param.ep_attr.max_rdma_read_out, cm->msg.rd_in);
+	if (ntohs(cm->msg.ver) >= 7)
+		cm->ep->param.ep_attr.max_rdma_read_out =
+			DAPL_MIN(cm->ep->param.ep_attr.max_rdma_read_out,
+				 cm->msg.rd_in);
 
 	/* modify QP to RTR and then to RTS with remote info */
 	dapl_os_lock(&cm->ep->header.lock);
@@ -1526,8 +1528,10 @@ dapli_accept_usr(DAPL_EP *ep, DAPL_CR *cr, DAT_COUNT p_size, DAT_PVOID p_data)
 #endif
 
 	/* rdma_out, initiator, cannot exceed remote rdma_in max */
-	ep->param.ep_attr.max_rdma_read_out =
-
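The compatibility rule has two parts: accept any peer at or above the minimum wire version, and only trust fields that the peer's version actually carries. A minimal sketch under those assumptions is below; sketch_msg and the function names are hypothetical and the struct is not the real CM message layout.

```c
#include <stdint.h>
#include <arpa/inet.h>

#define SKETCH_VER_MIN 6	/* oldest wire version still accepted */
#define SKETCH_VER_RD_IN 7	/* first version that exchanges rd_in */

/* Hypothetical cut-down message; ver is in network byte order. */
struct sketch_msg {
	uint16_t ver;
	uint8_t rd_in;
};

/* Accept any peer at or above the minimum version, including newer ones. */
int sketch_ver_ok(const struct sketch_msg *m)
{
	return ntohs(m->ver) >= SKETCH_VER_MIN;
}

/*
 * Outbound RDMA read depth to use. The peer's rd_in caps the local value
 * only if the peer's version carries that field; otherwise the consumer's
 * original setting is kept.
 */
uint8_t sketch_rdma_read_out(const struct sketch_msg *m, uint8_t local_out)
{
	if (ntohs(m->ver) >= SKETCH_VER_RD_IN && m->rd_in < local_out)
		return m->rd_in;
	return local_out;
}
```

Without the version guard, a v6 peer's unset field would be read as zero and would clamp the local read depth to zero, which is why the patch falls back to the consumer's setting in that case.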
[PATCH] DAPL v2.0: build: set IB extended collective type to none by default
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 configure.in |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/configure.in b/configure.in
index 75c002c..56058a0 100644
--- a/configure.in
+++ b/configure.in
@@ -102,7 +102,7 @@ AC_ARG_ENABLE(coll-type,
 	echo Unknown IB collective type' type
 	exit -1
 fi
- ],[coll_type=fca])
+ ],[coll_type=none])
 AM_CONDITIONAL(COLL_TYPE_FCA, test $coll_type = fca)
 
 dnl Check for Redhat EL release 4
-- 
1.7.3
[PATCH] DAPL v2.0: build: add extended header files to EXTRA_DIST and fix missing backslash
Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 Makefile.am |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 55fe45b..1f74378 100755
--- a/Makefile.am
+++ b/Makefile.am
@@ -18,14 +18,18 @@ endif
 if EXT_TYPE_IB
 XFLAGS = -DDAT_EXTENSIONS
 XPROGRAMS = dapl/openib_common/ib_extensions.c
+XHEADERS =
+XLIBS =
 if COLL_TYPE_FCA
 XFLAGS += -DDAT_IB_COLLECTIVES -DDAT_FCA_PROVIDER
 XPROGRAMS += dapl/openib_common/collectives/fca_provider.c
-XLIBS = -lfca
+XHEADERS += dapl/openib_common/collectives/ib_collectives.h dapl/openib_common/collectives/fca_provider.h
+XLIBS += -lfca
 endif
 else
 XFLAGS =
 XPROGRAMS =
+XHEADERS =
 XLIBS =
 endif
 
@@ -504,7 +508,7 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
 	dapl/include/dapl_ipoib_names.h \
 	dapl/include/dapl_vendor.h \
 	dapl/openib_common/dapl_ib_dto.h \
-	dapl/openib_common/dapl_ib_common.h
+	dapl/openib_common/dapl_ib_common.h \
 	dapl/openib_cma/dapl_ib_util.h \
 	dapl/openib_cma/linux/openib_osd.h \
 	dapl/openib_scm/dapl_ib_util.h \
@@ -546,7 +550,7 @@ EXTRA_DIST = dat/common/dat_dictionary.h \
 	test/dapltest/include/dapl_transaction_stats.h \
 	test/dapltest/include/dapl_transaction_test.h \
 	test/dapltest/include/dapl_version.h \
-	test/dapltest/mdep/linux/dapl_mdep_user.h $(XPROGRAMS)
+	test/dapltest/mdep/linux/dapl_mdep_user.h $(XHEADERS)
 
 dist-hook: dapl.spec
 	cp dapl.spec $(distdir)
-- 
1.7.3
[PATCH] DAPL v2.0: build: add selective enable/disable-xxx build switch for each openfabrics provider
The following switches have been added to configure:

--disable-cma (disables the rdma_cm dapl provider build)
--disable-scm (disables the socket cm provider build)
--disable-ucm (disables the IB UD cm provider build)

all providers are enabled by default.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 Makefile.am  |   42 ++++++++++++++++++++++++++++++++++++------
 configure.in |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 1f74378..a430568 100755
--- a/Makefile.am
+++ b/Makefile.am
@@ -44,19 +44,31 @@ AM_CFLAGS = -g -Wall -D_GNU_SOURCE -DDAT_CONF=\$(sysconfdir)/dat.conf\
 endif
 
 datlibdir = $(libdir)
+if DEFINE_CMA
 dapllibofadir = $(libdir)
+endif
+if DEFINE_SCM
 daplliboscmdir = $(libdir)
+endif
+if DEFINE_UCM
 daplliboucmdir = $(libdir)
+endif
 
 datlib_LTLIBRARIES = dat/udat/libdat2.la
+if DEFINE_CMA
 dapllibofa_LTLIBRARIES = dapl/udapl/libdaplofa.la
+endif
+if DEFINE_SCM
 daplliboscm_LTLIBRARIES = dapl/udapl/libdaploscm.la
+endif
+if DEFINE_UCM
 daplliboucm_LTLIBRARIES = dapl/udapl/libdaploucm.la
+endif
 
 dat_udat_libdat2_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \
 	-I$(srcdir)/dat/include/ -I$(srcdir)/dat/udat/ \
 	-I$(srcdir)/dat/udat/linux -I$(srcdir)/dat/common/
-
+if DEFINE_CMA
 dapl_udapl_libdaplofa_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \
 	-DOPENIB -DCQ_WAIT_OBJECT \
 	-I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \
@@ -64,7 +76,8 @@ dapl_udapl_libdaplofa_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS
 	-I$(srcdir)/dapl/openib_common \
 	-I$(srcdir)/dapl/openib_cma \
 	-I$(srcdir)/dapl/openib_cma/linux
-
+endif
+if DEFINE_SCM
 dapl_udapl_libdaploscm_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \
 	-DOPENIB -DCQ_WAIT_OBJECT \
 	-I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \
@@ -72,7 +85,8 @@ dapl_udapl_libdaploscm_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAG
 	-I$(srcdir)/dapl/openib_common \
 	-I$(srcdir)/dapl/openib_scm \
 	-I$(srcdir)/dapl/openib_scm/linux
-
+endif
+if DEFINE_UCM
 dapl_udapl_libdaploucm_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAGS) \
 	-DOPENIB -DCQ_WAIT_OBJECT \
 	-I$(srcdir)/dat/include/ -I$(srcdir)/dapl/include/ \
@@ -80,18 +94,30 @@ dapl_udapl_libdaploucm_la_CFLAGS = $(AM_CFLAGS) -D_GNU_SOURCE $(OSFLAGS) $(XFLAG
 	-I$(srcdir)/dapl/openib_common \
 	-I$(srcdir)/dapl/openib_ucm \
 	-I$(srcdir)/dapl/openib_ucm/linux
-
+endif
 if HAVE_LD_VERSION_SCRIPT
 dat_version_script = -Wl,--version-script=$(srcdir)/dat/udat/libdat2.map
+if DEFINE_CMA
 daplofa_version_script = -Wl,--version-script=$(srcdir)/dapl/udapl/libdaplofa.map
+endif
+if DEFINE_SCM
 daploscm_version_script = -Wl,--version-script=$(srcdir)/dapl/udapl/libdaploscm.map
+endif
+if DEFINE_UCM
 daploucm_version_script = -Wl,--version-script=$(srcdir)/dapl/udapl/libdaploucm.map
+endif
 else
 dat_version_script =
+if DEFINE_CMA
 daplofa_version_script =
+endif
+if DEFINE_SCM
 daploscm_version_script =
+endif
+if DEFINE_UCM
 daploucm_version_script =
 endif
+endif
 
 #
 # uDAT: libdat2.so
@@ -108,6 +134,7 @@ dat_udat_libdat2_la_SOURCES = dat/udat/udat.c \
 	dat/common/dat_sr.c
 
 dat_udat_libdat2_la_LDFLAGS = -version-info 2:0:0 $(dat_version_script) -ldl
+if DEFINE_CMA
 #
 # uDAPL OpenFabrics rdma_cm version: libdaplofa.so
 #
@@ -221,7 +248,8 @@ dapl_udapl_libdaplofa_la_SOURCES = dapl/udapl/dapl_init.c \
 dapl_udapl_libdaplofa_la_LDFLAGS = -version-info 2:0:0 $(daplofa_version_script) \
 	-Wl,-init,dapl_init -Wl,-fini,dapl_fini \
 	-lpthread -libverbs -lrdmacm $(XLIBS)
-
+endif
+if DEFINE_SCM
 #
 # uDAPL OpenFabrics Socket CM version for IB: libdaplscm.so
 #
@@ -335,7 +363,8 @@ dapl_udapl_libdaploscm_la_SOURCES = dapl/udapl/dapl_init.c \
 dapl_udapl_libdaploscm_la_LDFLAGS = -version-info 2:0:0 $(daploscm_version_script) \
 	-Wl,-init,dapl_init -Wl,-fini,dapl_fini \
 	-lpthread -libverbs -lrdmacm $(XLIBS)
-
+endif
+if DEFINE_UCM
 #
 # uDAPL OpenFabrics UD CM version for IB: libdaplucm.so
 #
@@ -449,6 +478,7 @@ dapl_udapl_libdaploucm_la_SOURCES =
[PATCH] DAPL v2.0: build: link librdmacm dependency to ib_acm usage for ucm and scm providers
Add -lrdmacm to XLIBS for ucm and scm providers.
Only set library linking with conditional use
of ib_acm as defined by DAPL_USE_IBACM.

Signed-off-by: Arlin Davis <arlin.r.da...@intel.com>
---
 Makefile.am  |    8 ++++++--
 configure.in |    4 ++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index a430568..a9bdeda 100755
--- a/Makefile.am
+++ b/Makefile.am
@@ -33,6 +33,10 @@ XHEADERS =
 XLIBS =
 endif
 
+if DAPL_USE_IBACM
+XLIBS += -lrdmacm
+endif
+
 if DEFINE_ATTR_LINK_LAYER
 XFLAGS += -DDEFINE_ATTR_LINK_LAYER
 endif
@@ -362,7 +366,7 @@ dapl_udapl_libdaploscm_la_SOURCES = dapl/udapl/dapl_init.c \
 
 dapl_udapl_libdaploscm_la_LDFLAGS = -version-info 2:0:0 $(daploscm_version_script) \
 	-Wl,-init,dapl_init -Wl,-fini,dapl_fini \
-	-lpthread -libverbs -lrdmacm $(XLIBS)
+	-lpthread -libverbs $(XLIBS)
 endif
 if DEFINE_UCM
 #
@@ -477,7 +481,7 @@ dapl_udapl_libdaploucm_la_SOURCES = dapl/udapl/dapl_init.c \
 
 dapl_udapl_libdaploucm_la_LDFLAGS = -version-info 2:0:0 $(daploscm_version_script) \
 	-Wl,-init,dapl_init -Wl,-fini,dapl_fini \
-	-lpthread -libverbs -lrdmacm $(XLIBS)
+	-lpthread -libverbs $(XLIBS)
 endif
 
 libdatincludedir = $(includedir)/dat2
diff --git a/configure.in b/configure.in
index 30524d9..fb405ff 100644
--- a/configure.in
+++ b/configure.in
@@ -47,7 +47,11 @@ dnl End check for libraries
 
 if test "$with_ib_acm" != "" && test "$with_ib_acm" != "no"; then
 	AC_DEFINE(DAPL_USE_IBACM, 1, [set to 1 to use IB ACM services])
+	ac_use_acm=yes
+else
+	ac_use_acm=no
 fi
+AM_CONDITIONAL(DAPL_USE_IBACM, test $ac_use_acm = yes)
 
 AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script,
 if test -n `$LD --help < /dev/null 2>/dev/null | grep version-script`; then
-- 
1.7.3