Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler
On Tue, Sep 11, 2012 at 9:03 AM, Jack Morgenstein ja...@dev.mellanox.co.il wrote: On Monday 10 September 2012 16:27, Or Gerlitz wrote: I took a look on the practice/wrapping used over the mm subsystem for radix_tree_lookup calls, whose maintainer, Andrew Morton is signed on the patch Roland pointed to, its just rcu_read_lock/unlock, seems this is what to do as well. In addition, need to do a synchronize_rcu when deleting patch? -- To unsubscribe from this list: send the line unsubscribe linux-rdma in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IB: new module params. cm_response_timeout, max_cm_retries
On Monday, 10 September 2012 at 19:11 +, Hefty, Sean wrote:

> > Create two kernel parameters, in order to make variables configurable,
> > i.e. cma_cm_response_timeout for the CM response timeout, and
> > cma_max_cm_retries for the number of retries. They can now be
> > configured via command line for the kernel modules. For example:
> >
> > # modprobe ib_srp cma_cm_response_timeout=30 cma_max_cm_retries=60
>
> Rather than using a module parameter, I'd rather see these values be
> controlled through /proc/sys/net/rdma_cm, similar to how the rdma_ucm
> handles max_backlog.

Having them there is better, so that one can try different values without
unloading the module.

It would also be great to have default parameters that apply to every
loaded CM user, e.g. should the rdma_cm / ib_srp default parameters be
made available from ib_cm?

> For the rdma_cm, I also prefer something more generic. CM retries is fine,
> but exposing wonky IB timeout (4.096 x 2^X us) to the user is less than
> ideal.

Sure, but what kind of approximation is the kernel module going to do?

Regards.

--
Yann Droneaud
OPTEYA
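For reference, the arithmetic behind the IB timeout encoding mentioned above (4.096 x 2^X us) can be sketched in C. This is only an illustration: `ib_timeout_ns` and `ms_to_ib_timeout_exp` are hypothetical helper names, not functions from ib_cm or rdma_cm, and rounding up to the next exponent with a cap of 31 (a 5-bit field is assumed here) is just one possible answer to the approximation question.

```c
#include <stdint.h>

/* IB encodes timeouts as 4.096 us * 2^exp; 4.096 us is exactly 4096 ns. */
static uint64_t ib_timeout_ns(unsigned int exp)
{
	return (uint64_t)4096 << exp;
}

/*
 * Hypothetical helper (not a kernel API): smallest exponent whose
 * encoded timeout is at least `ms` milliseconds. The cap of 31
 * assumes the exponent is carried in a 5-bit field.
 */
static unsigned int ms_to_ib_timeout_exp(uint64_t ms)
{
	uint64_t want_ns = ms * 1000000ULL;
	unsigned int exp = 0;

	while (exp < 31 && ib_timeout_ns(exp) < want_ns)
		exp++;
	return exp;
}
```

With this rounding, a request for 1000 ms maps to exponent 18, i.e. about 1.07 s, so the user never gets a shorter timeout than asked for, at the cost of up to a factor of two in overshoot.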
[PATCH 1/6] opensm: Add .gitignore
Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 .gitignore | 43 +++
 1 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..4dea307
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,43 @@
+*.la
+*.lo
+*.o
+.deps
+.libs
+aclocal.m4
+autom4te.cache/
+config.log
+config.status
+config/config.guess
+config/config.sub
+config/depcomp
+config/install-sh
+config/libtool.m4
+config/ltmain.sh
+config/ltoptions.m4
+config/ltsugar.m4
+config/ltversion.m4
+config/lt~obsolete.m4
+config/missing
+config/ylwrap
+configure
+include/config.h
+include/config.h.in
+include/opensm/osm_config.h
+include/opensm/osm_version.h
+include/opensm/stamp-h2
+include/stamp-h1
+libtool
+Makefile
+Makefile.in
+man/opensm.8
+man/torus-2QoS.8
+man/torus-2QoS.conf.5
+opensm.spec
+opensm/opensm
+opensm/osm_qos_parser_l.c
+opensm/osm_qos_parser_y.c
+opensm/osm_qos_parser_y.h
+osmtest/osmtest
+scripts/opensm.init
+scripts/redhat-opensm.init
+scripts/sldd.sh
--
1.7.7
[PATCH 4/6] opensm: osm_pkey: Remove unused variables
Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 opensm/osm_pkey.c | 3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/opensm/osm_pkey.c b/opensm/osm_pkey.c
index 98e2aee..bb45f57 100644
--- a/opensm/osm_pkey.c
+++ b/opensm/osm_pkey.c
@@ -369,9 +369,6 @@ ib_net16_t osm_physp_find_common_pkey(IN const osm_physp_t * p_physp1,
 	uint64_t pkey1_base, pkey2_base;
 	const osm_pkey_tbl_t *pkey_tbl1, *pkey_tbl2;
 	cl_map_iterator_t map_iter1, map_iter2;
-	ib_net16_t key;
-	const osm_pkey_tbl_t *pkey_tbl;
-	cl_map_iterator_t map_iter, map_end;
 
 	pkey_tbl1 = osm_physp_get_pkey_tbl(p_physp1);
 	pkey_tbl2 = osm_physp_get_pkey_tbl(p_physp2);
--
1.7.7
[PATCH 5/6] opensm: Add command-line option --pidfile
This option is necessary to control opensm from an LSB-compliant init script.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 opensm/main.c | 26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/opensm/main.c b/opensm/main.c
index e9a0b4c..1a061a8 100644
--- a/opensm/main.c
+++ b/opensm/main.c
@@ -68,6 +68,7 @@
 volatile unsigned int osm_exit_flag = 0;
 static volatile unsigned int osm_hup_flag = 0;
 static volatile unsigned int osm_usr1_flag = 0;
+static char *pidfile;
 
 #define MAX_LOCAL_IBPORTS 64
 #define INVALID_GUID (0xULL)
@@ -498,10 +499,17 @@ static ib_net64_t get_port_guid(IN osm_opensm_t * p_osm, uint64_t port_guid)
 	return attr_array[choice].port_guid;
 }
 
+static void remove_pidfile(void)
+{
+	if (pidfile)
+		unlink(pidfile);
+}
+
 static int daemonize(osm_opensm_t * osm)
 {
 	pid_t pid;
 	int fd;
+	FILE *f;
 
 	fd = open("/dev/null", O_WRONLY);
 	if (fd < 0) {
@@ -523,6 +531,18 @@ static int daemonize(osm_opensm_t * osm)
 	} else if (pid > 0)
 		exit(0);
 
+	if (pidfile) {
+		remove_pidfile();
+		f = fopen(pidfile, "w");
+		if (f) {
+			fprintf(f, "%d\n", getpid());
+			fclose(f);
+		} else {
+			perror("fopen");
+			exit(1);
+		}
+	}
+
 	close(0);
 	close(1);
 	close(2);
@@ -649,6 +669,7 @@ int main(int argc, char *argv[])
 		{"console-port", 1, NULL, 'C'},
 #endif
 		{"daemon", 0, NULL, 'B'},
+		{"pidfile", 1, NULL, 'J'},
 		{"inactive", 0, NULL, 'I'},
 #ifdef ENABLE_OSM_PERF_MGR
 		{"perfmgr", 0, NULL, 1},
@@ -887,6 +908,10 @@ int main(int argc, char *argv[])
 			printf(" Creating new log file\n");
 			break;
 
+		case 'J':
+			pidfile = optarg;
+			break;
+
 		case 'P':
 			SET_STR_OPT(opt.partition_config_file, optarg);
 			break;
@@ -1212,6 +1237,7 @@ int main(int argc, char *argv[])
 
 Exit:
 	osm_opensm_destroy(&osm);
 	complib_exit();
+	remove_pidfile();
 	exit(0);
 }
--
1.7.7
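The pidfile handling in this patch can be tried outside opensm with a small stand-alone sketch. The helper names `write_pidfile` and `read_pidfile` are hypothetical and not part of opensm; unlike the patch, the sketch returns an error code instead of calling exit(), which is a deliberate simplification.

```c
#include <stdio.h>
#include <unistd.h>

/* Write the current PID to `path`; returns 0 on success, -1 on error. */
static int write_pidfile(const char *path)
{
	FILE *f;

	unlink(path);		/* drop a stale file first, as the patch does */
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", (int)getpid());
	return fclose(f) == 0 ? 0 : -1;
}

/* Read a PID back from `path`; returns -1 if it cannot be read. */
static long read_pidfile(const char *path)
{
	FILE *f = fopen(path, "r");
	long pid = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &pid) != 1)
		pid = -1;
	fclose(f);
	return pid;
}
```

An init script would do the equivalent of `read_pidfile()` to find the daemon to signal, which is why the file has to be written after the fork, from the child.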
[PATCH 6/6] opensm: /etc/init.d/opensmd: Port to Debian
Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 scripts/opensm.init.in | 12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/scripts/opensm.init.in b/scripts/opensm.init.in
index 01d2bb9..1b9348c 100644
--- a/scripts/opensm.init.in
+++ b/scripts/opensm.init.in
@@ -45,14 +45,20 @@ exec_prefix=@exec_prefix@
 
 # Source function library.
 if [[ -s /etc/init.d/functions ]]; then
+    # RHEL / Fedora.
     . /etc/init.d/functions
     rc_status() { :; }
     rc_exit() { exit $RETVAL; }
-fi
-if [[ -s /etc/rc.status ]]; then
+elif [[ -s /etc/rc.status ]]; then
     . /etc/rc.status
     failure() { rc_status -v; }
     success() { rc_status -v; }
+elif [[ -s /lib/lsb/init-functions ]]; then
+    # SLES / openSuSE / Debian.
+    . /lib/lsb/init-functions
+    rc_exit() { exit $RETVAL; }
+    failure() { log_failure_msg; }
+    success() { log_success_msg; }
 fi
 
 CONFIG=@sysconfdir@/sysconfig/opensm
@@ -62,7 +68,7 @@ fi
 
 start () {
     echo -n "Starting opensm: "
-    @sbindir@/opensm --daemon $OPTIONS > /dev/null
+    @sbindir@/opensm --daemon --pidfile /var/run/opensm.pid $OPTIONS > /dev/null
     if [[ $RETVAL -eq 0 ]]; then
         touch /var/lock/subsys/opensm
         success
--
1.7.7
Re: [PATCH 2/8] opensm/complib: define if statements with branch prediction hints
> I would not abstract the 'if' statement. If CL_PREDICT_FALSE/TRUE are not
> readable, then shorten those. if (PF(...)) is just as readable as if_PF(...)

OK, agree. I'll issue a v2 shortly; the only differences will be the change
to this macro and a rebase onto the updated trunk.

-- YK
[PATCH 2/8 v2] opensm/complib: define macros for if statements with branch prediction hints
Defined PT and PF for predict true and predict false respectively.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 include/complib/cl_types_osd.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/complib/cl_types_osd.h b/include/complib/cl_types_osd.h
index ce1a452..2538913 100644
--- a/include/complib/cl_types_osd.h
+++ b/include/complib/cl_types_osd.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 2002-2012 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -64,6 +64,18 @@ BEGIN_C_DECLS
 #include <inttypes.h>
 #include <assert.h>
 #include <string.h>
+
+/*
+ * Branch prediction hints
+ */
+#if defined(HAVE_BUILTIN_EXPECT)
+#define PT(exp)    __builtin_expect( ((uintptr_t)(exp)), 1 )
+#define PF(exp)    __builtin_expect( ((uintptr_t)(exp)), 0 )
+#else
+#define PT(exp)    (exp)
+#define PF(exp)    (exp)
+#endif
+
 #if defined (_DEBUG_)
 #define CL_ASSERT assert
 #else /* _DEBUG_ */
--
1.7.11.1
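A minimal stand-alone sketch of how such hint macros are used follows. It differs from the patch in two labeled ways: it tests `__GNUC__` rather than the build-system symbol HAVE_BUILTIN_EXPECT, and it normalizes the condition with `!!` instead of the `uintptr_t` cast. `checked_div` is a hypothetical caller invented for the example.

```c
/* Branch prediction hints; fall back to the plain expression elsewhere. */
#if defined(__GNUC__)
#define PT(exp) __builtin_expect(!!(exp), 1)	/* predict true */
#define PF(exp) __builtin_expect(!!(exp), 0)	/* predict false */
#else
#define PT(exp) (exp)
#define PF(exp) (exp)
#endif

/*
 * Hypothetical caller: the error path is hinted as unlikely, so the
 * compiler can lay out the common path as the fall-through.
 */
static int checked_div(int num, int den, int *out)
{
	if (PF(den == 0))
		return -1;
	*out = num / den;
	return 0;
}
```

The hint only affects code layout; the condition evaluates exactly as it would without the macro, which is why `if (PF(...))` can replace a plain `if (...)` mechanically.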
[PATCH 4/8 v2] opensm/libvendor/osm_vendor_ibumad_sa.c: use wrapper function instead of direct access
Use existing wrapper function to get to context instead of direct access.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 libvendor/osm_vendor_ibumad_sa.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libvendor/osm_vendor_ibumad_sa.c b/libvendor/osm_vendor_ibumad_sa.c
index 1d482c0..f715cf6 100644
--- a/libvendor/osm_vendor_ibumad_sa.c
+++ b/libvendor/osm_vendor_ibumad_sa.c
@@ -84,9 +84,8 @@ __osmv_sa_mad_rcv_cb(IN osm_madw_t * p_madw,
 	}
 
 	/* obtain the sent context since we store it during send in the ni_ctx */
-	p_query_req_copy =
-	    (osmv_query_req_t *) (uintptr_t)(p_req_madw->context.ni_context.
-					     node_guid);
+	p_query_req_copy = (osmv_query_req_t *)
+		(uintptr_t)(osm_madw_get_ni_context_ptr(p_req_madw)->node_guid);
 
 	/* provide the context of the original request in the result */
 	query_res.query_context = p_query_req_copy->query_context;
@@ -180,9 +179,8 @@ static void __osmv_sa_mad_err_cb(IN void *bind_context, IN osm_madw_t * p_madw)
 	OSM_LOG_ENTER(p_bind->p_log);
 
 	/* Obtain the sent context etc */
-	p_query_req_copy =
-	    (osmv_query_req_t *) (uintptr_t)(p_madw->context.ni_context.
-					     node_guid);
+	p_query_req_copy = (osmv_query_req_t *)
+		(uintptr_t)(osm_madw_get_ni_context_ptr(p_madw)->node_guid);
 
 	/* provide the context of the original request in the result */
 	query_res.query_context = p_query_req_copy->query_context;
@@ -433,7 +431,7 @@ __osmv_send_sa_req(IN osmv_sa_bind_info_t * p_bind,
 		goto Exit;
 	}
 	*p_query_req_copy = *p_query_req;
-	p_madw->context.ni_context.node_guid =
+	osm_madw_get_ni_context_ptr(p_madw)->node_guid =
 	    (ib_net64_t) (uintptr_t)p_query_req_copy;
 
 	/* we can support async as well as sync calls */
--
1.7.11.1
[PATCH 5/8 v2] opensm/libvendor/osm_vendor_ibumad.c:rename mad to p_mad to indicate pointer
Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il --- libvendor/osm_vendor_ibumad.c | 44 +-- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c index b068443..e0c9f90 100644 --- a/libvendor/osm_vendor_ibumad.c +++ b/libvendor/osm_vendor_ibumad.c @@ -288,7 +288,7 @@ static void *umad_receiver(void *p_ptr) osm_umad_bind_info_t *p_bind; osm_mad_addr_t osm_addr; osm_madw_t *p_madw, *p_req_madw; - ib_mad_t *mad; + ib_mad_t *p_mad; void *umad = 0; int mad_agent, length; @@ -340,11 +340,11 @@ static void *umad_receiver(void *p_ptr) continue; } - mad = (ib_mad_t *) umad_get_mad(umad); + p_mad = (ib_mad_t *) umad_get_mad(umad); ib_mad_addr_conv(umad, osm_addr, -mad-mgmt_class == IB_MCLASS_SUBN_LID || -mad-mgmt_class == IB_MCLASS_SUBN_DIR); +p_mad-mgmt_class == IB_MCLASS_SUBN_LID || +p_mad-mgmt_class == IB_MCLASS_SUBN_DIR); if (!(p_madw = osm_mad_pool_get(p_bind-p_mad_pool, (osm_bind_handle_t) p_bind, @@ -367,15 +367,15 @@ static void *umad_receiver(void *p_ptr) /* if status != 0 then we are handling recv timeout on send */ if (umad_status(p_madw-vend_wrap.umad)) { - if (!(p_req_madw = get_madw(p_vend, mad-trans_id, - mad-mgmt_class))) { + if (!(p_req_madw = get_madw(p_vend, p_mad-trans_id, + p_mad-mgmt_class))) { OSM_LOG(p_vend-p_log, OSM_LOG_ERROR, ERR 5412: Failed to obtain request madw for timed out MAD (class=0x%X method=0x%X attr=0x%X tid=0x%PRIx64) -- dropping\n, - mad-mgmt_class, mad-method, - cl_ntoh16(mad-attr_id), - cl_ntoh64(mad-trans_id)); + p_mad-mgmt_class, p_mad-method, + cl_ntoh16(p_mad-attr_id), + cl_ntoh64(p_mad-trans_id)); } else { p_req_madw-status = IB_TIMEOUT; log_send_error(p_vend, p_req_madw); @@ -394,30 +394,30 @@ static void *umad_receiver(void *p_ptr) } p_req_madw = 0; - if (ib_mad_is_response(mad) - !(p_req_madw = get_madw(p_vend, mad-trans_id, - mad-mgmt_class))) { + if (ib_mad_is_response(p_mad) + !(p_req_madw = get_madw(p_vend, p_mad-trans_id, + 
p_mad-mgmt_class))) { OSM_LOG(p_vend-p_log, OSM_LOG_ERROR, ERR 5413: Failed to obtain request madw for received MAD (class=0x%X method=0x%X attr=0x%X tid=0x%PRIx64) -- dropping\n, - mad-mgmt_class, mad-method, - cl_ntoh16((mad)-attr_id), - cl_ntoh64(mad-trans_id)); + p_mad-mgmt_class, p_mad-method, + cl_ntoh16(p_mad-attr_id), + cl_ntoh64(p_mad-trans_id)); osm_mad_pool_put(p_bind-p_mad_pool, p_madw); continue; } #ifndef VENDOR_RMPP_SUPPORT - if ((mad-mgmt_class != IB_MCLASS_SUBN_DIR) - (mad-mgmt_class != IB_MCLASS_SUBN_LID) - (ib_rmpp_is_flag_set((ib_rmpp_mad_t *) mad, + if ((p_mad-mgmt_class != IB_MCLASS_SUBN_DIR) + (p_mad-mgmt_class != IB_MCLASS_SUBN_LID) + (ib_rmpp_is_flag_set((ib_rmpp_mad_t *) p_mad, IB_RMPP_FLAG_ACTIVE))) { OSM_LOG(p_vend-p_log, OSM_LOG_ERROR, ERR 5414: class 0x%x method 0x%x RMPP version %d type %d flags 0x%x received -- dropping\n, - mad-mgmt_class, mad-method, - ((ib_rmpp_mad_t *) mad)-rmpp_version, - ((ib_rmpp_mad_t *) mad)-rmpp_type, - ((ib_rmpp_mad_t *) mad)-rmpp_flags); + p_mad-mgmt_class, p_mad-method, +
[PATCH 6/8 v2] opensm/libvendor/osm_vendor_ibumad.c: validate response MAD properties
Check that attribute ID, attribute modifier and transaction ID are the
same in request and response. Note that just by checking these we cover
a very wide range of possible bugs in SMAs. Attribute modifier is used
in PortInfo, LFT, MFT, and others.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 libvendor/osm_vendor_ibumad.c | 57 ++-
 1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c
index e0c9f90..ca320a6 100644
--- a/libvendor/osm_vendor_ibumad.c
+++ b/libvendor/osm_vendor_ibumad.c
@@ -288,7 +288,7 @@ static void *umad_receiver(void *p_ptr)
 	osm_umad_bind_info_t *p_bind;
 	osm_mad_addr_t osm_addr;
 	osm_madw_t *p_madw, *p_req_madw;
-	ib_mad_t *p_mad;
+	ib_mad_t *p_mad, *p_req_mad;
 	void *umad = 0;
 	int mad_agent, length;
 
@@ -394,18 +394,51 @@ static void *umad_receiver(void *p_ptr)
 		}
 
 		p_req_madw = 0;
-		if (ib_mad_is_response(p_mad) &&
-		    !(p_req_madw = get_madw(p_vend, &p_mad->trans_id,
-					    p_mad->mgmt_class))) {
-			OSM_LOG(p_vend->p_log, OSM_LOG_ERROR, "ERR 5413: "
-				"Failed to obtain request madw for received MAD"
-"(class=0x%X method=0x%X attr=0x%X tid=0x%"PRIx64") -- dropping\n",
-				p_mad->mgmt_class, p_mad->method,
-				cl_ntoh16(p_mad->attr_id),
-				cl_ntoh64(p_mad->trans_id));
-			osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
-			continue;
+		if (ib_mad_is_response(p_mad)) {
+			p_req_madw = get_madw(p_vend, &p_mad->trans_id,
+					      p_mad->mgmt_class);
+			if (PF(!p_req_madw)) {
+				OSM_LOG(p_vend->p_log, OSM_LOG_ERROR,
+					"ERR 5413: Failed to obtain request "
+					"madw for received MAD "
+					"(class=0x%X method=0x%X attr=0x%X "
+					"tid=0x%"PRIx64") -- dropping\n",
+					p_mad->mgmt_class, p_mad->method,
+					cl_ntoh16(p_mad->attr_id),
+					cl_ntoh64(p_mad->trans_id));
+				osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
+				continue;
+			}
+
+			/*
+			 * Check that request MAD was really a request,
+			 * and make sure that attribute ID, attribute
+			 * modifier and transaction ID are the same in
+			 * request and response.
+			 */
+			p_req_mad = osm_madw_get_mad_ptr(p_req_madw);
+			if (PF(ib_mad_is_response(p_req_mad) ||
+			       p_mad->attr_id != p_req_mad->attr_id ||
+			       p_mad->attr_mod != p_req_mad->attr_mod ||
+			       p_mad->trans_id != p_req_mad->trans_id)) {
+				OSM_LOG(p_vend->p_log, OSM_LOG_ERROR,
+					"ERR 541A: "
+					"Response MAD validation failed "
+					"(request attr=0x%X modif=0x%X "
+					"tid=0x%"PRIx64", "
+					"response attr=0x%X modif=0x%X "
+					"tid=0x%"PRIx64") -- dropping\n",
+					cl_ntoh16(p_req_mad->attr_id),
+					cl_ntoh32(p_req_mad->attr_mod),
+					cl_ntoh64(p_req_mad->trans_id),
+					cl_ntoh16(p_mad->attr_id),
+					cl_ntoh32(p_mad->attr_mod),
+					cl_ntoh64(p_mad->trans_id));
+				osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
+				continue;
+			}
 		}
+
 #ifndef VENDOR_RMPP_SUPPORT
 		if ((p_mad->mgmt_class != IB_MCLASS_SUBN_DIR) &&
 		    (p_mad->mgmt_class != IB_MCLASS_SUBN_LID) &&
--
1.7.11.1
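The validation rule from this patch can be exercised in isolation with a stand-alone sketch. Everything below is an assumption made for illustration: `struct mad_hdr` is a trimmed-down stand-in rather than the real `ib_mad_t` layout, and `mad_is_response` only tests the top bit of the method field, which is simpler than what opensm's `ib_mad_is_response()` does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down MAD header; NOT the real ib_mad_t layout. */
struct mad_hdr {
	uint8_t  method;	/* top bit assumed to mark a response */
	uint16_t attr_id;
	uint32_t attr_mod;
	uint64_t trans_id;
};

#define MAD_METHOD_RESP_MASK 0x80

/* Simplified response test (assumption, see lead-in). */
static bool mad_is_response(const struct mad_hdr *m)
{
	return (m->method & MAD_METHOD_RESP_MASK) != 0;
}

/*
 * A response is accepted only if the stored request really is a request
 * and attribute ID, attribute modifier and transaction ID all match.
 */
static bool response_matches_request(const struct mad_hdr *req,
				     const struct mad_hdr *resp)
{
	return !mad_is_response(req) &&
	       req->attr_id == resp->attr_id &&
	       req->attr_mod == resp->attr_mod &&
	       req->trans_id == resp->trans_id;
}
```

As in the patch, the fields are compared as stored, so no byte-order conversion is needed for the equality checks; conversion only matters when the values are printed.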
[PATCH 7/8 v2] opensm/osm_port_info_rcv.c: check received local_port_num
Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 opensm/osm_port_info_rcv.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index c3bc66c..442bc3f 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -505,6 +505,11 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
 
 	CL_ASSERT(p_smp->attr_id == IB_MAD_ATTR_PORT_INFO);
 
+	/*
+	 * Attribute modifier has already been validated upon MAD receive,
+	 * which means that port_num has to be valid - it originated from
+	 * the request attribute modifier.
+	 */
 	port_num = (uint8_t) cl_ntoh32(p_smp->attr_mod);
 
 	port_guid = p_context->port_guid;
@@ -554,6 +559,17 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
 	p_node = p_port->p_node;
 	CL_ASSERT(p_node);
 
+	if (p_pi->local_port_num > p_node->node_info.num_ports) {
+		CL_PLOCK_RELEASE(sm->p_lock);
+		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0F15: "
+			"Received PortInfo for port GUID 0x%" PRIx64 " is "
+			"non-compliant and is being ignored since the "
+			"local port num %u > num ports %u\n",
+			cl_ntoh64(port_guid), p_pi->local_port_num,
+			p_node->node_info.num_ports);
+		goto Exit;
+	}
+
 	/*
 	   If we were setting the PortInfo, then receiving
 	   this attribute was not part of sweeping the subnet.
--
1.7.11.1
[PATCH 8/8 v2] opensm/osm_port_info_rcv.c: use PF() hint on fatal conditions
Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 opensm/osm_port_info_rcv.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index 442bc3f..2a6d037 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2011 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 2002-2012 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  * Copyright (c) 2009 HNR Consulting. All rights reserved.
  *
@@ -69,7 +69,7 @@
 static void pi_rcv_check_and_fix_lid(osm_log_t * log, ib_port_info_t * pi,
 				     osm_physp_t * p)
 {
-	if (cl_ntoh16(pi->base_lid) > IB_LID_UCAST_END_HO) {
+	if (PF(cl_ntoh16(pi->base_lid) > IB_LID_UCAST_END_HO)) {
 		OSM_LOG(log, OSM_LOG_ERROR, "ERR 0F04: "
 			"Got invalid base LID %u from the network. Corrected to %u\n",
 			cl_ntoh16(pi->base_lid),
@@ -545,7 +545,7 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
 
 	CL_PLOCK_EXCL_ACQUIRE(sm->p_lock);
 	p_port = osm_get_port_by_guid(sm->p_subn, port_guid);
-	if (!p_port) {
+	if (PF(!p_port)) {
 		CL_PLOCK_RELEASE(sm->p_lock);
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0F06: "
 			"No port object for port with GUID 0x%" PRIx64
@@ -559,7 +559,7 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
 	p_node = p_port->p_node;
 	CL_ASSERT(p_node);
 
-	if (p_pi->local_port_num > p_node->node_info.num_ports) {
+	if (PF(p_pi->local_port_num > p_node->node_info.num_ports)) {
 		CL_PLOCK_RELEASE(sm->p_lock);
 		OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0F15: "
 			"Received PortInfo for port GUID 0x%" PRIx64 " is "
--
1.7.11.1
Re: [PATCH for-next V2 02/22] IB/core: change pkey table lookups to support full and partial membership for the same pkey
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
> Enhance the cached and non-cached pkey table lookups to enable limited and
> full members of the same pkey to co-exist in the pkey table.
>
> This is necessary for SRIOV to allow for a scheme where some guests would
> have the full membership pkey in their virtual pkey table, where other
> guests on the same hypervisor would have the limited one. In that sense,
> it's an extension of the IBTA model for non-virtualized nodes.

OK, maybe I'm not getting something, but I'm curious why we always pick
the full pkey in preference to the partial pkey. Shouldn't we pick the
pkey that's appropriate for the vHCA sending the message?

Also, given the rule of least surprise, don't you think it would be best
to rename this function ib_find_cached_full_or_partial_pkey and, in your
next patch, instead of naming it ib_find_exact_pkey, just call that one
ib_find_cached_pkey?

> To accomplish this, we need both the limited and full membership pkeys to
> be present in the master's (hypervisor physical port) pkey table.
>
> The algorithm for supporting pkey tables which contain both the limited
> and the full membership versions of the same pkey works as follows:
>
> When scanning the pkey table for a 15 bit pkey:
>
> A. If there is a full member version of that pkey anywhere
>    in the table, return its index (even if a limited-member
>    version of the pkey exists earlier in the table).
>
> B. If the full member version is not in the table,
>    but the limited-member version is in the table,
>    return the index of the limited pkey.
>
> Signed-off-by: Liran Liss lir...@mellanox.com
> Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
> Signed-off-by: Or Gerlitz ogerl...@mellanox.com
> ---
>  drivers/infiniband/core/cache.c  | 14 +++---
>  drivers/infiniband/core/device.c | 17 +
>  2 files changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 9353992..0f2f2b7 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -167,6 +167,7 @@ int ib_find_cached_pkey(struct ib_device *device,
>  	unsigned long flags;
>  	int i;
>  	int ret = -ENOENT;
> +	int partial_ix = -1;
>  
>  	if (port_num < start_port(device) || port_num > end_port(device))
>  		return -EINVAL;
> @@ -179,10 +180,17 @@ int ib_find_cached_pkey(struct ib_device *device,
>  
>  	for (i = 0; i < cache->table_len; ++i)
>  		if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) {
> -			*index = i;
> -			ret = 0;
> -			break;
> +			if (cache->table[i] & 0x8000) {
> +				*index = i;
> +				ret = 0;
> +				break;
> +			} else
> +				partial_ix = i;
>  		}
> +	if (ret && partial_ix >= 0) {
> +		*index = partial_ix;
> +		ret = 0;
> +	}
>  
>  	read_unlock_irqrestore(&device->cache.lock, flags);
>  
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index e711de4..a645c68 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -707,18 +707,27 @@ int ib_find_pkey(struct ib_device *device,
>  {
>  	int ret, i;
>  	u16 tmp_pkey;
> +	int partial_ix = -1;
>  
>  	for (i = 0; i < device->pkey_tbl_len[port_num - start_port(device)]; ++i) {
>  		ret = ib_query_pkey(device, port_num, i, &tmp_pkey);
>  		if (ret)
>  			return ret;
> -
>  		if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
> -			*index = i;
> -			return 0;
> +			/* if there is full-member pkey take it.*/
> +			if (tmp_pkey & 0x8000) {
> +				*index = i;
> +				return 0;
> +			}
> +			if (partial_ix < 0)
> +				partial_ix = i;
>  		}
>  	}
> -
> +	/*no full-member, if exists take the limited*/
> +	if (partial_ix >= 0) {
> +		*index = partial_ix;
> +		return 0;
> +	}
>  	return -ENOENT;
>  }
>  EXPORT_SYMBOL(ib_find_pkey);

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
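The lookup rule discussed in this message (prefer a full-member entry, fall back to a limited-member one) can be sketched in plain user-space C. The function `find_pkey_prefer_full` and the two macros are hypothetical names for illustration only; there is no locking and no device query, just the table scan. The sketch remembers the first limited-member match, as the ib_find_pkey() hunk does.

```c
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000	/* membership bit */
#define PKEY_BASE_MASK   0x7fff	/* low 15 bits identify the partition */

/*
 * Return the index of the full-member entry for `pkey` if one exists
 * anywhere in the table, otherwise the index of the first limited-member
 * entry, otherwise -1. Only the low 15 bits of `pkey` are compared.
 */
static int find_pkey_prefer_full(const uint16_t *table, size_t len,
				 uint16_t pkey)
{
	int partial_ix = -1;
	size_t i;

	for (i = 0; i < len; i++) {
		if ((table[i] & PKEY_BASE_MASK) != (pkey & PKEY_BASE_MASK))
			continue;
		if (table[i] & PKEY_FULL_MEMBER)
			return (int)i;		/* rule A: full member wins */
		if (partial_ix < 0)
			partial_ix = (int)i;	/* candidate for rule B */
	}
	return partial_ix;
}
```

Note that the result does not depend on the membership bit of the key being searched for, which is exactly the behavior the naming question above is about.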
Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
> When port pkey table potentially contains both full and partial
> membership copies for the same pkey, we need a function to find
> the exact (16-bit) pkey index.

The code in this patch is fine; just see my previous email about the
function naming...

> This is particularly necessary when the master forwards QP1 MADs
> sent by guests. If the guest has sent the MAD with a limited
> membership pkey, we wish to forward the MAD using the same limited
> membership pkey. Since master may have both the limited and
> the full member pkeys in its table, we must make sure to retrieve
> the limited membership pkey in this case.
>
> This requires the 16-bit pkey lookup function (which includes the
> membership bit).
>
> Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
> Signed-off-by: Or Gerlitz ogerl...@mellanox.com
> ---
>  drivers/infiniband/core/cache.c | 32 
>  include/rdma/ib_cache.h         | 16 
>  2 files changed, 48 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> index 0f2f2b7..d8a8c83 100644
> --- a/drivers/infiniband/core/cache.c
> +++ b/drivers/infiniband/core/cache.c
> @@ -198,6 +198,38 @@ int ib_find_cached_pkey(struct ib_device *device,
>  }
>  EXPORT_SYMBOL(ib_find_cached_pkey);
>  
> +int ib_find_exact_cached_pkey(struct ib_device *device,
> +			      u8                port_num,
> +			      u16               pkey,
> +			      u16              *index)
> +{
> +	struct ib_pkey_cache *cache;
> +	unsigned long flags;
> +	int i;
> +	int ret = -ENOENT;
> +
> +	if (port_num < start_port(device) || port_num > end_port(device))
> +		return -EINVAL;
> +
> +	read_lock_irqsave(&device->cache.lock, flags);
> +
> +	cache = device->cache.pkey_cache[port_num - start_port(device)];
> +
> +	*index = -1;
> +
> +	for (i = 0; i < cache->table_len; ++i)
> +		if (cache->table[i] == pkey) {
> +			*index = i;
> +			ret = 0;
> +			break;
> +		}
> +
> +	read_unlock_irqrestore(&device->cache.lock, flags);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(ib_find_exact_cached_pkey);
> +
>  int ib_get_cached_lmc(struct ib_device *device,
>  		      u8                port_num,
>  		      u8                *lmc)
> diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
> index 00a2b8e..ad9a3c2 100644
> --- a/include/rdma/ib_cache.h
> +++ b/include/rdma/ib_cache.h
> @@ -101,6 +101,22 @@ int ib_find_cached_pkey(struct ib_device *device,
>  			u16              *index);
>  
>  /**
> + * ib_find_exact_cached_pkey - Returns the PKey table index where a specified
> + *   PKey value occurs. Comparison uses the FULL 16 bits (incl membership bit)
> + * @device: The device to query.
> + * @port_num: The port number of the device to search for the PKey.
> + * @pkey: The PKey value to search for.
> + * @index: The index into the cached PKey table where the PKey was found.
> + *
> + * ib_find_exact_cached_pkey() searches the specified PKey table in
> + * the local software cache.
> + */
> +int ib_find_exact_cached_pkey(struct ib_device *device,
> +			      u8                port_num,
> +			      u16               pkey,
> +			      u16              *index);
> +
> +/**
>   * ib_get_cached_lmc - Returns a cached lmc table entry
>   * @device: The device to query.
>   * @port_num: The port number of the device to query.

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
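For contrast with the 15-bit lookup, the exact-match rule from this patch reduces to a plain 16-bit comparison. The sketch below is user-space C with a hypothetical name, `find_exact_pkey`; it leaves out the port range check and the cache lock that the kernel function takes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the entry equal to `pkey` in all 16 bits,
 * membership bit included, or -1 if there is no such entry.
 */
static int find_exact_pkey(const uint16_t *table, size_t len, uint16_t pkey)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (table[i] == pkey)
			return (int)i;
	return -1;
}
```

With a table holding both 0x8001 and 0x0001, a search for 0x0001 returns the limited-member slot, which is what forwarding a guest's limited-membership MAD requires.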
Re: [PATCH for-next V2 04/22] IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
> 1. Introduce the basic sriov paravirtualization context objects for
>    multiplexing and demultiplexing MADs.
> 2. Introduce support for the new proxy and tunnel QP types.
>
> This patch introduces the objects required by the master
> for managing QP paravirtualization for guests.
>
> struct mlx4_ib_sriov{} is created by the master only.
> It is a container for the following:
> 1. All the info required by the PPF to multiplex and de-multiplex MADs
>    (including those from the PF). (struct mlx4_ib_demux_ctx demux)

OK, so can we have at least a single reference to the various
abbreviations before using them exclusively? I know PF and PPF may be
common, but it would be nice if they were used once in full form before
being abbreviated in commit messages.

> 2. All the info required to manage alias GUIDs (i.e., the GUID at
>    index 0 that each guest perceives. In fact, this is not the GUID
>    which is actually at index 0, but is, in fact, the GUID which is
>    at index[VF number] in the physical table.

OK, this has been one of the things that has made reviewing this
difficult. I freely admit that I've steadfastly ignored SRIOV for as
long as I can, so maybe this is just me. But, in the context of this
driver, how am I supposed to know which code paths will be on the host
and which on the guest?

Also, I note that you do math every time you want to know if you are on
a parent device or a virtual device. Do you really want to do math all
the time, or would it be better to save off your status on device init
and just refer to that when you would do math in this patch?

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
> When port pkey table potentially contains both full and partial
> membership copies for the same pkey, we need a function to find
> the exact (16-bit) pkey index.
>
> This is particularly necessary when the master forwards QP1 MADs
> sent by guests. If the guest has sent the MAD with a limited
> membership pkey, we wish to forward the MAD using the same limited
> membership pkey. Since master may have both the limited and
> the full member pkeys in its table, we must make sure to retrieve
> the limited membership pkey in this case.
>
> This requires the 16-bit pkey lookup function (which includes the
> membership bit).

As a second note, I would like to know why Intel (previously QLogic)
does not use these functions in their driver and what it would take to
get all drivers to use the functions. Do we need to add more to them?
In my opinion these should be generally useful and used by all drivers.

Mike?

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match
On Tue, Sep 11, 2012 at 10:12 AM, Doug Ledford dledf...@redhat.com wrote:
> As a second note, I would like to know why Intel (previously QLogic)
> does not use these functions in their driver and what it would take to
> get all drivers to use the functions. Do we need to add more to them?
> In my opinion these should be generally useful and used by all drivers.

Use which functions? The P_Key lookup functions? What would a
low-level driver use them for? I thought these are for use by
upper-level protocols.

 - R.
Re: [PATCH] RDMA/cxgb4: move the dereference below the NULL test
applied, thanks
Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match
On Tue, Sep 11, 2012 at 1:34 PM, Doug Ledford dledf...@redhat.com wrote:
> Well, at this point, the mlx4 driver uses them, the rdmacm kernel
> driver uses them, and both QLogic/Intel drivers have their own
> internal pkey table implementation. So, it isn't so much upper layer
> as it is drivers.

rdmacm is an upper-level protocol (it's above the midlayer / hardware
abstraction). mlx4 and mthca look up P_Keys because of internal details
of how they send MADs, and really they should move to maintaining their
own P_Key table too.
[PATCH] librdmacm/rsockets: Document rsocket protocol and design
Include a brief overview of the rsocket protocol and underlying design
with the source code to make it easier for someone trying to decipher
the actual code.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 docs/rsocket | 144 ++
 1 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 docs/rsocket

diff --git a/docs/rsocket b/docs/rsocket
new file mode 100644
index 000..5399f6c
--- /dev/null
+++ b/docs/rsocket
@@ -0,0 +1,144 @@
+rsocket Protocol and Design Guide 9/10/2012
+
+Overview
+
+Rsockets is a protocol over RDMA that supports a socket-level API
+for applications. For details on the current state of the
+implementation, readers should refer to the rsocket man page. This
+document describes the rsocket protocol, general design, and
+some implementation details.
+
+Rsockets exchanges data by performing RDMA write operations into
+exposed data buffers. In addition to RDMA write data, rsockets uses
+small, 32-bit messages for internal communication. RDMA writes
+are used to transfer application data into remote data buffers
+and to notify the peer when new target data buffers are available.
+The following figure highlights the operation.
+
+[Figure: host A's target SGL, host B's remote SGL, and host B's receive buffer(s)]
+
+The remote SGL contains the address, size, and rkey of the target SGL. As
+receive buffers become available on host B, rsockets will issue an RDMA
+write against one of the entries in the target SGL on host A. The
+updated entry will reference an available receive buffer. Immediate data
+included with the RDMA write will indicate to host A that a target SGE
+has been updated.
+
+When host A has data to send, it will check its target SGL. The current
+target SGE will contain the address, size, and rkey of the next receive
+buffer on host B. If the data transfer is smaller than the size of the
+remote receive buffer, host A will update its target SGE to reflect the
+remaining size of the receive buffer. That is, once a receive buffer has
+been published to a remote peer, it will be fully consumed before a second
+buffer is used.
+
+Rsockets relies on immediate data to notify the remote peer when data has
+been transferred or when a target SGL has been updated. Because immediate
+data requires that the remote QP have a posted receive, rsockets also uses
+a credit based flow control mechanism. The number of credits is based on
+the size of the receive queue, with initial credits exchanged during
+connection setup. In order to transfer data, rsockets requires both
+available receive buffers (published via the target SGL) and data credits.
+
+Since immediate data is limited to 32-bits, messages may either indicate
+the arrival of application data or may be an internal message, but not both.
+To avoid credit deadlock, rsockets reserves a small number of available
+credits for control messages only, with the protocol relying on RNR NAKs
+and retries to make forward progress.
+
+
+Connection Establishment
+
+rsockets uses the RDMA CM for connection establishment. Struct rs_conn_data
+is exchanged during the connection exchange as private data in the request
+and reply messages.
+
+struct rs_sge {
+	uint64_t addr;
+	uint32_t key;
+	uint32_t length;
+};
+
+#define RS_CONN_FLAG_NET 1
+
+struct rs_conn_data {
+	uint8_t version;
+	uint8_t flags;
+	uint16_t credits;
+	uint32_t reserved2;
+	struct rs_sge target_sgl;
+	struct rs_sge data_buf;
+};
+
+Version - current version is 1
+Flags
+RS_CONN_FLAG_NET - Set to 1 if host is big Endian.
+                   Determines byte ordering for RDMA write messages
+Credits - number of initial receive credits
+Reserved2 - set to 0
+Target SGL - Address, size (# entries), and rkey of target SGL.
+             Remote side will copy this into their remote SGL.
+Data Buffer - Initial receive buffer address, size (in bytes), and rkey.
+              Remote side will copy this into their first target SGE.
+
+
+Message Format
+--
+Rsocket uses RDMA writes with immediate data for all message exchanges.
+RDMA writes of 0 length are used if no additional data beyond the message
+needs to be exchanged. Immediate data is limited to 32-bits. Rsockets
+defines the following format for messages.
+
+The upper 3 bits are used to define the type of message being
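[Editorial illustration] The sender-side bookkeeping described in the design guide above (host A shrinks its current target SGE as it writes into the peer's receive buffer, and needs both buffer space and a credit before sending) might be sketched as below. `struct rs_sge` is taken from the patch; the helpers `rs_can_send` and `rs_sge_consume` are hypothetical names introduced for this sketch and are not claimed to match the librdmacm source.

```c
#include <stdint.h>

/* Layout as given in the patch text. */
struct rs_sge {
	uint64_t addr;
	uint32_t key;
	uint32_t length;
};

/*
 * Hypothetical helper: data can be sent only when the current target SGE
 * still has room and at least one data credit is available.
 */
static int rs_can_send(const struct rs_sge *sge, uint16_t credits)
{
	return sge->length > 0 && credits > 0;
}

/*
 * Hypothetical helper: account for a transfer of up to len bytes into the
 * remote receive buffer described by sge. Returns the number of bytes that
 * fit, then advances the address and reduces the remaining length so the
 * next transfer continues where this one stopped.
 */
static uint32_t rs_sge_consume(struct rs_sge *sge, uint32_t len)
{
	uint32_t n = len < sge->length ? len : sge->length;

	sge->addr += n;
	sge->length -= n;
	return n;
}
```

In this sketch a transfer larger than the remaining space is simply clamped; the caller would move on to the next target SGE for the rest, which matches the statement that a published buffer is fully consumed before a second one is used.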
Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match
On 9/11/2012 4:43 PM, Roland Dreier wrote:
> On Tue, Sep 11, 2012 at 1:34 PM, Doug Ledford dledf...@redhat.com wrote:
>> Well, at this point, the mlx4 driver uses them, the rdmacm kernel
>> driver uses them, and both QLogic/Intel drivers have their own
>> internal pkey table implementation. So, it isn't so much upper layer
>> as it is drivers.
>
> rdmacm is an upper-level protocol (it's above the midlayer / hardware
> abstraction).

Yeah, I know. My point wasn't that it was a low level item, just that
it's the only upper layer consumer that I saw.

> mlx4 and mthca look up P_Keys because of internal details of how they
> send MADs, and really they should move to maintaining their own P_Key
> table too.

Why not make the routines useful for all users instead of multiple
implementations of the same thing?

--
Doug Ledford dledf...@redhat.com
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband