Re: [PATCH RFC for-next] net/mlx4_core: Fix racy flow in the driver CQ completion handler

2012-09-11 Thread Or Gerlitz
On Tue, Sep 11, 2012 at 9:03 AM, Jack Morgenstein
ja...@dev.mellanox.co.il wrote:
 On Monday 10 September 2012 16:27, Or Gerlitz wrote:
 I took a look at the wrapping practice used in the mm subsystem for
 radix_tree_lookup calls (its maintainer, Andrew Morton, signed off on
 the patch Roland pointed to). It's just rcu_read_lock/unlock, so it
 seems this is what we should do as well.

 In addition, need to do a synchronize_rcu when deleting

patch?
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB: new module params. cm_response_timeout, max_cm_retries

2012-09-11 Thread Yann Droneaud
On Monday 10 September 2012 at 19:11 +, Hefty, Sean wrote:
  Create two kernel parameters, in order to make variables configurable.
  i.e. cma_cm_response_timeout for CM response timeout,
   and cma_max_cm_retries for the number of retries.
  
  They can now be configured via command line for the kernel modules.
  For example:
  # modprobe ib_srp cma_cm_response_timeout=30 cma_max_cm_retries=60
 
 Rather than using a module parameter, I'd rather see these values be
 controlled through /proc/sys/net/rdma_cm, similar to how the rdma_ucm
 handles max_backlog.

Having them there is better, since one can then try different values
without unloading the module.

It would also be great to have default parameters that apply to every
CM loaded, e.g. should the rdma_cm / ib_srp defaults be made available
from ib_cm?

 For the rdma_cm, I also prefer something more generic.
 CM retries is fine, but exposing wonky IB timeout (4.096 x 2^X us) to 
 the user is less than ideal.

Sure, but what kind of approximation is the kernel module going to
do?

Regards.

-- 
Yann Droneaud
OPTEYA
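
For readers unfamiliar with the encoding discussed above: the IB timeout is expressed as an exponent X, meaning 4.096 us x 2^X. A minimal sketch of the kind of approximation in question follows, assuming the value is rounded up to the next exponent and capped at 31 (the largest value a 5-bit field can hold). The helper names are hypothetical and are not part of ib_cm or rdma_cm.

```c
#include <stdint.h>

/* Hypothetical helper: timeout in nanoseconds for exponent x,
 * i.e. 4.096 us * 2^x = 4096 ns << x. */
static uint64_t ib_timeout_ns(unsigned int x)
{
	return (uint64_t)4096 << x;
}

/* Hypothetical helper: smallest exponent whose timeout is at least
 * ms milliseconds. The cap of 31 assumes a 5-bit field; very large
 * ms values (where ms * 1000000 would overflow) are not handled. */
static unsigned int ms_to_ib_timeout(uint64_t ms)
{
	uint64_t ns = ms * 1000000ULL;
	unsigned int x = 0;

	while (x < 31 && ib_timeout_ns(x) < ns)
		x++;
	return x;
}
```

With this rounding, a request for one second maps to exponent 18 (about 1.07 s), so the user-visible value and the effective value can differ by up to a factor of two.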




[PATCH 1/6] opensm: Add .gitignore

2012-09-11 Thread Bart Van Assche
Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 .gitignore |   43 +++
 1 files changed, 43 insertions(+), 0 deletions(-)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 000..4dea307
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,43 @@
+*.la
+*.lo
+*.o
+.deps
+.libs
+aclocal.m4
+autom4te.cache/
+config.log
+config.status
+config/config.guess
+config/config.sub
+config/depcomp
+config/install-sh
+config/libtool.m4
+config/ltmain.sh
+config/ltoptions.m4
+config/ltsugar.m4
+config/ltversion.m4
+config/lt~obsolete.m4
+config/missing
+config/ylwrap
+configure
+include/config.h
+include/config.h.in
+include/opensm/osm_config.h
+include/opensm/osm_version.h
+include/opensm/stamp-h2
+include/stamp-h1
+libtool
+Makefile
+Makefile.in
+man/opensm.8
+man/torus-2QoS.8
+man/torus-2QoS.conf.5
+opensm.spec
+opensm/opensm
+opensm/osm_qos_parser_l.c
+opensm/osm_qos_parser_y.c
+opensm/osm_qos_parser_y.h
+osmtest/osmtest
+scripts/opensm.init
+scripts/redhat-opensm.init
+scripts/sldd.sh
-- 
1.7.7



[PATCH 4/6] opensm: osm_pkey: Remove unused variables

2012-09-11 Thread Bart Van Assche
Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 opensm/osm_pkey.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/opensm/osm_pkey.c b/opensm/osm_pkey.c
index 98e2aee..bb45f57 100644
--- a/opensm/osm_pkey.c
+++ b/opensm/osm_pkey.c
@@ -369,9 +369,6 @@ ib_net16_t osm_physp_find_common_pkey(IN const osm_physp_t * p_physp1,
uint64_t pkey1_base, pkey2_base;
const osm_pkey_tbl_t *pkey_tbl1, *pkey_tbl2;
cl_map_iterator_t map_iter1, map_iter2;
-   ib_net16_t key;
-   const osm_pkey_tbl_t *pkey_tbl;
-   cl_map_iterator_t map_iter, map_end;
 
pkey_tbl1 = osm_physp_get_pkey_tbl(p_physp1);
pkey_tbl2 = osm_physp_get_pkey_tbl(p_physp2);
-- 
1.7.7



[PATCH 5/6] opensm: Add command-line option --pidfile

2012-09-11 Thread Bart Van Assche
This option is necessary to control opensm from an LSB-compliant
init script.

Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 opensm/main.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/opensm/main.c b/opensm/main.c
index e9a0b4c..1a061a8 100644
--- a/opensm/main.c
+++ b/opensm/main.c
@@ -68,6 +68,7 @@ volatile unsigned int osm_exit_flag = 0;
 
 static volatile unsigned int osm_hup_flag = 0;
 static volatile unsigned int osm_usr1_flag = 0;
+static char *pidfile;
 
 #define MAX_LOCAL_IBPORTS 64
 #define INVALID_GUID (0xULL)
@@ -498,10 +499,17 @@ static ib_net64_t get_port_guid(IN osm_opensm_t * p_osm, uint64_t port_guid)
return attr_array[choice].port_guid;
 }
 
+static void remove_pidfile(void)
+{
+   if (pidfile)
+   unlink(pidfile);
+}
+
 static int daemonize(osm_opensm_t * osm)
 {
pid_t pid;
int fd;
+   FILE *f;
 
fd = open("/dev/null", O_WRONLY);
if (fd < 0) {
@@ -523,6 +531,18 @@ static int daemonize(osm_opensm_t * osm)
} else if (pid > 0)
exit(0);
 
+   if (pidfile) {
+   remove_pidfile();
+   f = fopen(pidfile, "w");
+   if (f) {
+   fprintf(f, "%d\n", getpid());
+   fclose(f);
+   } else {
+   perror("fopen");
+   exit(1);
+   }
+   }
+
close(0);
close(1);
close(2);
@@ -649,6 +669,7 @@ int main(int argc, char *argv[])
{"console-port", 1, NULL, 'C'},
 #endif
{"daemon", 0, NULL, 'B'},
+   {"pidfile", 1, NULL, 'J'},
{"inactive", 0, NULL, 'I'},
 #ifdef ENABLE_OSM_PERF_MGR
{"perfmgr", 0, NULL, 1},
@@ -887,6 +908,10 @@ int main(int argc, char *argv[])
printf(" Creating new log file\n");
break;
 
+   case 'J':
+   pidfile = optarg;
+   break;
+
case 'P':
SET_STR_OPT(opt.partition_config_file, optarg);
break;
@@ -1212,6 +1237,7 @@ int main(int argc, char *argv[])
 Exit:
osm_opensm_destroy(osm);
complib_exit();
+   remove_pidfile();
 
exit(0);
 }
-- 
1.7.7
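
As a stand-alone illustration of the pidfile handling in this patch, here is a minimal sketch with hypothetical helper names; it is not the opensm code. The writer must run in the daemonized child (after fork()), so that getpid() returns the PID an init script will later signal. The reader mimics what a status check might do.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the calling process's PID to path.
 * Returns 0 on success, -1 on any I/O error. */
static int write_pidfile(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", (int)getpid()) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read a PID back from path.
 * Returns the PID, or -1 if the file is missing or malformed. */
static long read_pidfile(const char *path)
{
	FILE *f = fopen(path, "r");
	long pid = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &pid) != 1)
		pid = -1;
	fclose(f);
	return pid;
}
```

Unlike the patch, this sketch reports errors to the caller instead of calling exit(1), which keeps the helpers usable outside a daemon.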



[PATCH 6/6] opensm: /etc/init.d/opensmd: Port to Debian

2012-09-11 Thread Bart Van Assche
Signed-off-by: Bart Van Assche bvanass...@acm.org
---
 scripts/opensm.init.in |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/scripts/opensm.init.in b/scripts/opensm.init.in
index 01d2bb9..1b9348c 100644
--- a/scripts/opensm.init.in
+++ b/scripts/opensm.init.in
@@ -45,14 +45,20 @@ exec_prefix=@exec_prefix@
 
 # Source function library.
 if [[ -s /etc/init.d/functions ]]; then
+# RHEL / Fedora.
 . /etc/init.d/functions
 rc_status() { :; }
 rc_exit() { exit $RETVAL; }
-fi
-if [[ -s /etc/rc.status ]]; then
+elif [[ -s /etc/rc.status ]]; then
 . /etc/rc.status
 failure() { rc_status -v; }
 success() { rc_status -v; }
+elif [[ -s /lib/lsb/init-functions ]]; then
+# SLES / openSuSE / Debian.
+. /lib/lsb/init-functions
+rc_exit() { exit $RETVAL; }
+failure() { log_failure_msg; }
+success() { log_success_msg; }
 fi
 
 CONFIG=@sysconfdir@/sysconfig/opensm
@@ -62,7 +68,7 @@ fi
 
 start () {
 echo -n "Starting opensm: "
-@sbindir@/opensm --daemon $OPTIONS > /dev/null
+@sbindir@/opensm --daemon --pidfile /var/run/opensm.pid $OPTIONS > /dev/null
 if [[ $RETVAL -eq 0 ]]; then
 touch /var/lock/subsys/opensm
 success
-- 
1.7.7



Re: [PATCH 2/8] opensm/complib: define if statements with branch prediction hints

2012-09-11 Thread Yevgeny Kliteynik
 
 I would not abstract the 'if' statement.  If CL_PREDICT_FALSE/TRUE are not 
 readable, then shorten those.
 
 if (PF(...))
 
 is just as readable as
 
 if_PF(...)

OK, agree.
I'll issue a v2 shortly - the only difference would be
change in this macro and rebase to the updated trunk.

-- YK



[PATCH 2/8 v2] opensm/complib: define macros for if statements with branch prediction hints

2012-09-11 Thread Yevgeny Kliteynik
Defined PT and PF for predict true
and predict false respectively.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 include/complib/cl_types_osd.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/complib/cl_types_osd.h b/include/complib/cl_types_osd.h
index ce1a452..2538913 100644
--- a/include/complib/cl_types_osd.h
+++ b/include/complib/cl_types_osd.h
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 2002-2012 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -64,6 +64,18 @@ BEGIN_C_DECLS
 #include <inttypes.h>
 #include <assert.h>
 #include <string.h>
+
+/*
+ * Branch prediction hints
+ */
+#if defined(HAVE_BUILTIN_EXPECT)
+#define PT(exp)    __builtin_expect( ((uintptr_t)(exp)), 1 )
+#define PF(exp)    __builtin_expect( ((uintptr_t)(exp)), 0 )
+#else
+#define PT(exp)    (exp)
+#define PF(exp)    (exp)
+#endif
+
 #if defined (_DEBUG_)
 #define CL_ASSERT  assert
 #else  /* _DEBUG_ */
-- 
1.7.11.1
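
A small usage sketch of such hint macros follows. It departs from the patch in two labeled ways: it tests __GNUC__ rather than HAVE_BUILTIN_EXPECT, and it normalizes the expression with !! instead of casting to uintptr_t. The function checked_len() is hypothetical and only exists to show a rare error path wrapped in PF().

```c
#include <stddef.h>

/* Assumption of this sketch: detect the builtin via __GNUC__;
 * the patch relies on a configure-time HAVE_BUILTIN_EXPECT instead. */
#if defined(__GNUC__)
#define PT(exp) __builtin_expect(!!(exp), 1)	/* predict true */
#define PF(exp) __builtin_expect(!!(exp), 0)	/* predict false */
#else
#define PT(exp) (exp)
#define PF(exp) (exp)
#endif

/* Hypothetical example: length of s, or -1 when s is NULL.
 * The NULL case is expected to be rare, hence PF(). */
static long checked_len(const char *s)
{
	long n = 0;

	if (PF(s == NULL))
		return -1;
	while (s[n] != '\0')
		n++;
	return n;
}
```

The hints only influence code layout; the result of the condition is unchanged, so a wrong hint costs performance but never correctness.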



[PATCH 4/8 v2] opensm/libvendor/osm_vendor_ibumad_sa.c: use wrapper function instead of direct access

2012-09-11 Thread Yevgeny Kliteynik
Use existing wrapper function to get to context instead of direct access.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 libvendor/osm_vendor_ibumad_sa.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/libvendor/osm_vendor_ibumad_sa.c b/libvendor/osm_vendor_ibumad_sa.c
index 1d482c0..f715cf6 100644
--- a/libvendor/osm_vendor_ibumad_sa.c
+++ b/libvendor/osm_vendor_ibumad_sa.c
@@ -84,9 +84,8 @@ __osmv_sa_mad_rcv_cb(IN osm_madw_t * p_madw,
}

/* obtain the sent context since we store it during send in the ni_ctx */
-   p_query_req_copy =
-   (osmv_query_req_t *) (uintptr_t)(p_req_madw->context.ni_context.
-   node_guid);
+   p_query_req_copy = (osmv_query_req_t *)
+(uintptr_t)(osm_madw_get_ni_context_ptr(p_req_madw)->node_guid);

/* provide the context of the original request in the result */
query_res.query_context = p_query_req_copy->query_context;
@@ -180,9 +179,8 @@ static void __osmv_sa_mad_err_cb(IN void *bind_context, IN osm_madw_t * p_madw)
OSM_LOG_ENTER(p_bind->p_log);

/* Obtain the sent context etc */
-   p_query_req_copy =
-   (osmv_query_req_t *) (uintptr_t)(p_madw->context.ni_context.
-   node_guid);
+   p_query_req_copy = (osmv_query_req_t *)
+(uintptr_t)(osm_madw_get_ni_context_ptr(p_madw)->node_guid);

/* provide the context of the original request in the result */
query_res.query_context = p_query_req_copy->query_context;
@@ -433,7 +431,7 @@ __osmv_send_sa_req(IN osmv_sa_bind_info_t * p_bind,
goto Exit;
}
*p_query_req_copy = *p_query_req;
-   p_madw->context.ni_context.node_guid =
+   osm_madw_get_ni_context_ptr(p_madw)->node_guid =
(ib_net64_t) (uintptr_t)p_query_req_copy;

/* we can support async as well as sync calls */
-- 
1.7.11.1



[PATCH 5/8 v2] opensm/libvendor/osm_vendor_ibumad.c: rename mad to p_mad to indicate pointer

2012-09-11 Thread Yevgeny Kliteynik

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il

---
 libvendor/osm_vendor_ibumad.c | 44 +--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c
index b068443..e0c9f90 100644
--- a/libvendor/osm_vendor_ibumad.c
+++ b/libvendor/osm_vendor_ibumad.c
@@ -288,7 +288,7 @@ static void *umad_receiver(void *p_ptr)
osm_umad_bind_info_t *p_bind;
osm_mad_addr_t osm_addr;
osm_madw_t *p_madw, *p_req_madw;
-   ib_mad_t *mad;
+   ib_mad_t *p_mad;
void *umad = 0;
int mad_agent, length;

@@ -340,11 +340,11 @@ static void *umad_receiver(void *p_ptr)
continue;
}

-   mad = (ib_mad_t *) umad_get_mad(umad);
+   p_mad = (ib_mad_t *) umad_get_mad(umad);

ib_mad_addr_conv(umad, &osm_addr,
-mad->mgmt_class == IB_MCLASS_SUBN_LID ||
-mad->mgmt_class == IB_MCLASS_SUBN_DIR);
+p_mad->mgmt_class == IB_MCLASS_SUBN_LID ||
+p_mad->mgmt_class == IB_MCLASS_SUBN_DIR);

if (!(p_madw = osm_mad_pool_get(p_bind->p_mad_pool,
(osm_bind_handle_t) p_bind,
@@ -367,15 +367,15 @@ static void *umad_receiver(void *p_ptr)

/* if status != 0 then we are handling recv timeout on send */
if (umad_status(p_madw->vend_wrap.umad)) {
-   if (!(p_req_madw = get_madw(p_vend, &mad->trans_id,
-   mad->mgmt_class))) {
+   if (!(p_req_madw = get_madw(p_vend, &p_mad->trans_id,
+   p_mad->mgmt_class))) {
OSM_LOG(p_vend->p_log, OSM_LOG_ERROR,
"ERR 5412: "
"Failed to obtain request madw for "
"timed out MAD"
" (class=0x%X method=0x%X attr=0x%X "
"tid=0x%" PRIx64 ") -- dropping\n",
-   mad->mgmt_class, mad->method,
-   cl_ntoh16(mad->attr_id),
-   cl_ntoh64(mad->trans_id));
+   p_mad->mgmt_class, p_mad->method,
+   cl_ntoh16(p_mad->attr_id),
+   cl_ntoh64(p_mad->trans_id));
} else {
p_req_madw-status = IB_TIMEOUT;
log_send_error(p_vend, p_req_madw);
@@ -394,30 +394,30 @@ static void *umad_receiver(void *p_ptr)
}

p_req_madw = 0;
-   if (ib_mad_is_response(mad) &&
-   !(p_req_madw = get_madw(p_vend, &mad->trans_id,
-   mad->mgmt_class))) {
+   if (ib_mad_is_response(p_mad) &&
+   !(p_req_madw = get_madw(p_vend, &p_mad->trans_id,
+   p_mad->mgmt_class))) {
OSM_LOG(p_vend->p_log, OSM_LOG_ERROR, "ERR 5413: "
"Failed to obtain request madw for received MAD"
" (class=0x%X method=0x%X attr=0x%X "
"tid=0x%" PRIx64 ") -- dropping\n",
-   mad->mgmt_class, mad->method,
-   cl_ntoh16((mad)->attr_id),
-   cl_ntoh64(mad->trans_id));
+   p_mad->mgmt_class, p_mad->method,
+   cl_ntoh16(p_mad->attr_id),
+   cl_ntoh64(p_mad->trans_id));
osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
continue;
}
 #ifndef VENDOR_RMPP_SUPPORT
-   if ((mad->mgmt_class != IB_MCLASS_SUBN_DIR) &&
-   (mad->mgmt_class != IB_MCLASS_SUBN_LID) &&
-   (ib_rmpp_is_flag_set((ib_rmpp_mad_t *) mad,
+   if ((p_mad->mgmt_class != IB_MCLASS_SUBN_DIR) &&
+   (p_mad->mgmt_class != IB_MCLASS_SUBN_LID) &&
+   (ib_rmpp_is_flag_set((ib_rmpp_mad_t *) p_mad,
 IB_RMPP_FLAG_ACTIVE))) {
OSM_LOG(p_vend->p_log, OSM_LOG_ERROR, "ERR 5414: "
"class 0x%x method 0x%x RMPP version %d type "
"%d flags 0x%x received -- dropping\n",
-   mad->mgmt_class, mad->method,
-   ((ib_rmpp_mad_t *) mad)->rmpp_version,
-   ((ib_rmpp_mad_t *) mad)->rmpp_type,
-   ((ib_rmpp_mad_t *) mad)->rmpp_flags);
+   p_mad->mgmt_class, p_mad->method,
+   

[PATCH 6/8 v2] opensm/libvendor/osm_vendor_ibumad.c: validate response MAD properties

2012-09-11 Thread Yevgeny Kliteynik
Check that attribute ID, attribute modifier and
transaction ID are the same in request and response.

Note that just by checking these we cover a very wide
range of possible bugs in SMAs. Attribute modifier is
used in PortInfo, LFT, MFT, and others.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 libvendor/osm_vendor_ibumad.c | 57 ++-
 1 file changed, 45 insertions(+), 12 deletions(-)

diff --git a/libvendor/osm_vendor_ibumad.c b/libvendor/osm_vendor_ibumad.c
index e0c9f90..ca320a6 100644
--- a/libvendor/osm_vendor_ibumad.c
+++ b/libvendor/osm_vendor_ibumad.c
@@ -288,7 +288,7 @@ static void *umad_receiver(void *p_ptr)
osm_umad_bind_info_t *p_bind;
osm_mad_addr_t osm_addr;
osm_madw_t *p_madw, *p_req_madw;
-   ib_mad_t *p_mad;
+   ib_mad_t *p_mad, *p_req_mad;
void *umad = 0;
int mad_agent, length;

@@ -394,18 +394,51 @@ static void *umad_receiver(void *p_ptr)
}

p_req_madw = 0;
-   if (ib_mad_is_response(p_mad) &&
-   !(p_req_madw = get_madw(p_vend, &p_mad->trans_id,
-   p_mad->mgmt_class))) {
-   OSM_LOG(p_vend->p_log, OSM_LOG_ERROR, "ERR 5413: "
-   "Failed to obtain request madw for received MAD"
-   " (class=0x%X method=0x%X attr=0x%X "
-   "tid=0x%" PRIx64 ") -- dropping\n",
-   p_mad->mgmt_class, p_mad->method,
-   cl_ntoh16(p_mad->attr_id),
-   cl_ntoh64(p_mad->trans_id));
-   osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
-   continue;
+   if (ib_mad_is_response(p_mad)) {
+   p_req_madw = get_madw(p_vend, &p_mad->trans_id,
+ p_mad->mgmt_class);
+   if (PF(!p_req_madw)) {
+   OSM_LOG(p_vend->p_log, OSM_LOG_ERROR,
+   "ERR 5413: Failed to obtain request "
+   "madw for received MAD "
+   "(class=0x%X method=0x%X attr=0x%X "
+   "tid=0x%" PRIx64 ") -- dropping\n",
+   p_mad->mgmt_class, p_mad->method,
+   cl_ntoh16(p_mad->attr_id),
+   cl_ntoh64(p_mad->trans_id));
+   osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
+   continue;
+   }
+
+   /*
+* Check that request MAD was really a request,
+* and make sure that attribute ID, attribute
+* modifier and transaction ID are the same in
+* request and response.
+*/
+   p_req_mad = osm_madw_get_mad_ptr(p_req_madw);
+   if (PF(ib_mad_is_response(p_req_mad) ||
+  p_mad->attr_id != p_req_mad->attr_id ||
+  p_mad->attr_mod != p_req_mad->attr_mod ||
+  p_mad->trans_id != p_req_mad->trans_id)) {
+   OSM_LOG(p_vend->p_log, OSM_LOG_ERROR,
+   "ERR 541A: "
+   "Response MAD validation failed "
+   "(request attr=0x%X modif=0x%X "
+   "tid=0x%" PRIx64 ", "
+   "response attr=0x%X modif=0x%X "
+   "tid=0x%" PRIx64 ") -- dropping\n",
+   cl_ntoh16(p_req_mad->attr_id),
+   cl_ntoh32(p_req_mad->attr_mod),
+   cl_ntoh64(p_req_mad->trans_id),
+   cl_ntoh16(p_mad->attr_id),
+   cl_ntoh32(p_mad->attr_mod),
+   cl_ntoh64(p_mad->trans_id));
+   osm_mad_pool_put(p_bind->p_mad_pool, p_madw);
+   continue;
+   }
}
+
 #ifndef VENDOR_RMPP_SUPPORT
if ((p_mad->mgmt_class != IB_MCLASS_SUBN_DIR) &&
(p_mad->mgmt_class != IB_MCLASS_SUBN_LID) &&
-- 
1.7.11.1
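
The check this patch adds can be summarized in a stand-alone sketch. The struct below is a hypothetical, simplified stand-in for the MAD header (not the real ib_mad_t layout), and treating method bit 0x80 as the response indicator is an assumption of the sketch; opensm's ib_mad_is_response() is the authority on that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the MAD header fields compared. */
struct mad_hdr {
	uint8_t  method;	/* assumption: bit 0x80 marks a response */
	uint16_t attr_id;
	uint32_t attr_mod;
	uint64_t trans_id;
};

static bool is_response(const struct mad_hdr *m)
{
	return (m->method & 0x80) != 0;
}

/* True when resp may be accepted as the answer to req: the stored MAD
 * must really be a request, and attribute ID, attribute modifier and
 * transaction ID must match. Fields are compared as stored, so no
 * byte-order conversion is needed for an equality test. */
static bool response_matches_request(const struct mad_hdr *req,
				     const struct mad_hdr *resp)
{
	return !is_response(req) &&
	       req->attr_id == resp->attr_id &&
	       req->attr_mod == resp->attr_mod &&
	       req->trans_id == resp->trans_id;
}
```

A response that fails this test would be dropped and logged, as the patch does with ERR 541A.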



[PATCH 7/8 v2] opensm/osm_port_info_rcv.c: check received local_port_num

2012-09-11 Thread Yevgeny Kliteynik

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 opensm/osm_port_info_rcv.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index c3bc66c..442bc3f 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -505,6 +505,11 @@ void osm_pi_rcv_process(IN void *context, IN void *data)

CL_ASSERT(p_smp->attr_id == IB_MAD_ATTR_PORT_INFO);

+   /*
+* Attribute modifier has already been validated upon MAD receive,
+* which means that port_num has to be valid - it originated from
+* the request attribute modifier.
+*/
port_num = (uint8_t) cl_ntoh32(p_smp->attr_mod);

port_guid = p_context->port_guid;
@@ -554,6 +559,17 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
p_node = p_port->p_node;
CL_ASSERT(p_node);

+   if (p_pi->local_port_num > p_node->node_info.num_ports) {
+   CL_PLOCK_RELEASE(sm->p_lock);
+   OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0F15: "
+   "Received PortInfo for port GUID 0x%" PRIx64 " is "
+   "non-compliant and is being ignored since the "
+   "local port num %u > num ports %u\n",
+   cl_ntoh64(port_guid), p_pi->local_port_num,
+   p_node->node_info.num_ports);
+   goto Exit;
+   }
+
/*
   If we were setting the PortInfo, then receiving
   this attribute was not part of sweeping the subnet.
-- 
1.7.11.1



[PATCH 8/8 v2] opensm/osm_port_info_rcv.c: use PF() hint on fatal conditions

2012-09-11 Thread Yevgeny Kliteynik

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 opensm/osm_port_info_rcv.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/opensm/osm_port_info_rcv.c b/opensm/osm_port_info_rcv.c
index 442bc3f..2a6d037 100644
--- a/opensm/osm_port_info_rcv.c
+++ b/opensm/osm_port_info_rcv.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2011 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 2002-2012 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  * Copyright (c) 2009 HNR Consulting. All rights reserved.
  *
@@ -69,7 +69,7 @@
 static void pi_rcv_check_and_fix_lid(osm_log_t * log, ib_port_info_t * pi,
 osm_physp_t * p)
 {
-   if (cl_ntoh16(pi->base_lid) > IB_LID_UCAST_END_HO) {
+   if (PF(cl_ntoh16(pi->base_lid) > IB_LID_UCAST_END_HO)) {
OSM_LOG(log, OSM_LOG_ERROR, "ERR 0F04: "
"Got invalid base LID %u from the network. "
"Corrected to %u\n", cl_ntoh16(pi->base_lid),
@@ -545,7 +545,7 @@ void osm_pi_rcv_process(IN void *context, IN void *data)

CL_PLOCK_EXCL_ACQUIRE(sm->p_lock);
p_port = osm_get_port_by_guid(sm->p_subn, port_guid);
-   if (!p_port) {
+   if (PF(!p_port)) {
CL_PLOCK_RELEASE(sm->p_lock);
OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0F06: "
"No port object for port with GUID 0x%" PRIx64
@@ -559,7 +559,7 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
p_node = p_port->p_node;
CL_ASSERT(p_node);

-   if (p_pi->local_port_num > p_node->node_info.num_ports) {
+   if (PF(p_pi->local_port_num > p_node->node_info.num_ports)) {
CL_PLOCK_RELEASE(sm->p_lock);
OSM_LOG(sm->p_log, OSM_LOG_ERROR, "ERR 0F15: "
"Received PortInfo for port GUID 0x%" PRIx64 " is "
-- 
1.7.11.1



Re: [PATCH for-next V2 02/22] IB/core: change pkey table lookups to support full and partial membership for the same pkey

2012-09-11 Thread Doug Ledford
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
 Enhance the cached and non-cached pkey table lookups to enable limited and
 full members of the same pkey to co-exist in the pkey table.
 
 This is necessary for SRIOV to allow for a scheme where some guests would
 have the full membership pkey in their virtual pkey table, where other
 guests on the same hypervisor would have the limited one. In that sense,
 it's an extension of the IBTA model for non virtualized nodes.

OK, maybe I'm not getting something, but I'm curious why we always pick
the full pkey in preference to the partial pkey.  Shouldn't we pick the
pkey that's appropriate for the vHCA sending the message?

Also, given the rule of least surprise, don't you think it would be best
to rename this function ib_find_cached_full_or_partial_pkey and, in your
next patch, instead of naming it ib_find_exact_pkey just call that one
ib_find_cached_pkey?

 To accomplish this, we need both the limited and full membership pkeys to be
 present in the master's (hypervisor physical port) pkey table.
 
 The algorithm for supporting pkey tables which contain both the limited and
 the full membership versions of the same pkey works as follows:
 
 When scanning the pkey table for a 15 bit pkey:
 
 A. If there is a full member version of that pkey anywhere
 in the table, return its index (even if a limited-member
 version of the pkey exists earlier in the table).
 
 B. If the full member version is not in the table,
 but the limited-member version is in the table,
 return the index of the limited pkey.
 
 Signed-off-by: Liran Liss lir...@mellanox.com
 Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/core/cache.c  |   14 +++---
  drivers/infiniband/core/device.c |   17 +
  2 files changed, 24 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
 index 9353992..0f2f2b7 100644
 --- a/drivers/infiniband/core/cache.c
 +++ b/drivers/infiniband/core/cache.c
 @@ -167,6 +167,7 @@ int ib_find_cached_pkey(struct ib_device *device,
   unsigned long flags;
   int i;
   int ret = -ENOENT;
 + int partial_ix = -1;
  
   if (port_num < start_port(device) || port_num > end_port(device))
   return -EINVAL;
 @@ -179,10 +180,17 @@ int ib_find_cached_pkey(struct ib_device *device,
  
   for (i = 0; i < cache->table_len; ++i)
   if ((cache->table[i] & 0x7fff) == (pkey & 0x7fff)) {
 - *index = i;
 - ret = 0;
 - break;
 + if (cache->table[i] & 0x8000) {
 + *index = i;
 + ret = 0;
 + break;
 + } else
 + partial_ix = i;
   }
 + if (ret && partial_ix >= 0) {
 + *index = partial_ix;
 + ret = 0;
 + }
  
   read_unlock_irqrestore(&device->cache.lock, flags);
  
 diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
 index e711de4..a645c68 100644
 --- a/drivers/infiniband/core/device.c
 +++ b/drivers/infiniband/core/device.c
 @@ -707,18 +707,27 @@ int ib_find_pkey(struct ib_device *device,
  {
   int ret, i;
   u16 tmp_pkey;
 + int partial_ix = -1;
  
   for (i = 0; i < device->pkey_tbl_len[port_num - start_port(device)]; ++i) {
   ret = ib_query_pkey(device, port_num, i, &tmp_pkey);
   if (ret)
   return ret;
 -
   if ((pkey & 0x7fff) == (tmp_pkey & 0x7fff)) {
 - *index = i;
 - return 0;
 + /* if there is full-member pkey take it.*/
 + if (tmp_pkey & 0x8000) {
 + *index = i;
 + return 0;
 + }
 + if (partial_ix < 0)
 + partial_ix = i;
   }
   }
 -
 + /*no full-member, if exists take the limited*/
 + if (partial_ix >= 0) {
 + *index = partial_ix;
 + return 0;
 + }
   return -ENOENT;
  }
  EXPORT_SYMBOL(ib_find_pkey);
 


-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband
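
For reference, the lookup rule from the quoted commit message (match on the low 15 bits, prefer a full-member entry, otherwise fall back to a limited-member one) can be sketched outside the kernel as below. The function name and the flat-array table are assumptions for illustration; this is not the ib_find_pkey() code itself. The fallback here is the first limited match, as in the device.c hunk.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the scan.
 * Bit 15 (0x8000) is the membership bit: set means full member.
 * Returns the table index, or -1 when the pkey is absent. */
static int find_pkey_prefer_full(const uint16_t *table, int len,
				 uint16_t pkey)
{
	int partial_ix = -1;
	int i;

	for (i = 0; i < len; i++) {
		if ((table[i] & 0x7fff) != (pkey & 0x7fff))
			continue;
		if (table[i] & 0x8000)
			return i;		/* full member wins */
		if (partial_ix < 0)
			partial_ix = i;		/* first limited match */
	}
	return partial_ix;
}
```

Note that the caller's own membership bit is ignored by this lookup, which is the behaviour questioned above.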





Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match

2012-09-11 Thread Doug Ledford
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
 When port pkey table potentially contains both full and partial
 membership copies for the same pkey, we need a function to find
 the exact (16-bit) pkey index.

The code on this patch is fine, just see my previous email about the
function naming...

 This is particularly necessary when the master forwards QP1 MADS
 sent by guests.  If the guest has sent the MAD with a limited
 membership pkey, we wish to forward the MAD using the same limited
 membership pkey.  Since master may have both the limited and
 the full member pkeys in its table, we must make sure to retrieve
 the limited membership pkey in this case.
 
 This requires the 16-bit pkey lookup function (which includes the
 membership bit).
 
 Signed-off-by: Jack Morgenstein ja...@dev.mellanox.co.il
 Signed-off-by: Or Gerlitz ogerl...@mellanox.com
 ---
  drivers/infiniband/core/cache.c |   32 
  include/rdma/ib_cache.h |   16 
  2 files changed, 48 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
 index 0f2f2b7..d8a8c83 100644
 --- a/drivers/infiniband/core/cache.c
 +++ b/drivers/infiniband/core/cache.c
 @@ -198,6 +198,38 @@ int ib_find_cached_pkey(struct ib_device *device,
  }
  EXPORT_SYMBOL(ib_find_cached_pkey);
  
 +int ib_find_exact_cached_pkey(struct ib_device *device,
 +   u8    port_num,
 +   u16   pkey,
 +   u16  *index)
 +{
 + struct ib_pkey_cache *cache;
 + unsigned long flags;
 + int i;
 + int ret = -ENOENT;
 +
 + if (port_num < start_port(device) || port_num > end_port(device))
 + return -EINVAL;
 +
 + read_lock_irqsave(&device->cache.lock, flags);
 +
 + cache = device->cache.pkey_cache[port_num - start_port(device)];
 +
 + *index = -1;
 +
 + for (i = 0; i < cache->table_len; ++i)
 + if (cache->table[i] == pkey) {
 + *index = i;
 + ret = 0;
 + break;
 + }
 +
 + read_unlock_irqrestore(&device->cache.lock, flags);
 +
 + return ret;
 +}
 +EXPORT_SYMBOL(ib_find_exact_cached_pkey);
 +
  int ib_get_cached_lmc(struct ib_device *device,
  u8 port_num,
  u8 *lmc)
 diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h
 index 00a2b8e..ad9a3c2 100644
 --- a/include/rdma/ib_cache.h
 +++ b/include/rdma/ib_cache.h
 @@ -101,6 +101,22 @@ int ib_find_cached_pkey(struct ib_device *device,
   u16 *index);
  
  /**
 + * ib_find_exact_cached_pkey - Returns the PKey table index where a specified
 + *   PKey value occurs. Comparison uses the FULL 16 bits (incl membership bit)
 + * @device: The device to query.
 + * @port_num: The port number of the device to search for the PKey.
 + * @pkey: The PKey value to search for.
 + * @index: The index into the cached PKey table where the PKey was found.
 + *
 + * ib_find_exact_cached_pkey() searches the specified PKey table in
 + * the local software cache.
 + */
 +int ib_find_exact_cached_pkey(struct ib_device *device,
 +   u8   port_num,
 +   u16  pkey,
 +   u16 *index);
 +
 +
 +/**
   * ib_get_cached_lmc - Returns a cached lmc table entry
   * @device: The device to query.
   * @port_num: The port number of the device to query.
 


-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband
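
To make the exact (16-bit) comparison concrete, here is a minimal user-space sketch. The function name and flat-array table are hypothetical; the kernel function additionally takes the cache lock and validates the port number.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the exact lookup: all 16 bits,
 * including the membership bit (0x8000), must match.
 * Returns the table index, or -1 when there is no exact match. */
static int find_pkey_exact(const uint16_t *table, int len, uint16_t pkey)
{
	int i;

	for (i = 0; i < len; i++)
		if (table[i] == pkey)
			return i;
	return -1;
}
```

With a table holding both 0x8001 (full) and 0x0001 (limited), a guest's limited-membership MAD resolves to the limited entry, which is the forwarding case the commit message describes.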





Re: [PATCH for-next V2 04/22] IB/mlx4: SRIOV IB context objects and proxy/tunnel sqp support

2012-09-11 Thread Doug Ledford
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
 1. Introduce the basic sriov paravirtualization context objects
for multiplexing and demultiplexing MADs.
 2. Introduce support for the new proxy and tunnel QP types.
 
 This patch introduces the objects required by the master
 for managing QP paravirtualization for guests.
 
 struct mlx4_ib_sriov{} is created by the master only.
 It is a container for the following:
 1. All the info required by the PPF to multiplex and de-multiplex MADs
(including those from the PF). (struct mlx4_ib_demux_ctx demux)

OK, so can we have at least a single reference to the various
abbreviations before using them exclusively?  I know PF and PPF may be
common, but it would be nice if they were written out in full once
before being abbreviated in commit messages.

 2. All the info required to manage alias GUIDs (i.e., the GUID at
index 0 that each guest perceives.  In fact, this is not the
GUID which is actually at index 0, but is, in fact, the GUID
which is at index[VF number] in the physical table.

OK, this has been one of the things that has made reviewing this
difficult.  I freely admit that I've steadfastly ignored SRIOV for as
long as I can, so maybe this is just me.  But, in the context of this
driver, how am I supposed to know which code paths will be on the host
and which on the guest?

Also, I note that you do math every time you want to know if you are on
a parent device or a virtual device.  Do you really want to do math all
the time, or would it be better to save off your status on device init
and just refer to that when you would do math in this patch?

-- 
Doug Ledford dledf...@redhat.com
  GPG KeyID: 0E572FDD
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband





Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match

2012-09-11 Thread Doug Ledford
On 8/3/2012 4:40 AM, Jack Morgenstein wrote:
 When port pkey table potentially contains both full and partial
 membership copies for the same pkey, we need a function to find
 the exact (16-bit) pkey index.
 
 This is particularly necessary when the master forwards QP1 MADS
 sent by guests.  If the guest has sent the MAD with a limited
 membership pkey, we wish to forward the MAD using the same limited
 membership pkey.  Since master may have both the limited and
 the full member pkeys in its table, we must make sure to retrieve
 the limited membership pkey in this case.
 
 This requires the 16-bit pkey lookup function (which includes the
 membership bit).

As a second note, I would like to know why Intel (previously QLogic)
does not use these functions in their driver and what it would take to
get all drivers to use the functions.  Do we need to add more to them?
In my opinion these should be generally useful and used by all drivers.
 Mike?






Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match

2012-09-11 Thread Roland Dreier
On Tue, Sep 11, 2012 at 10:12 AM, Doug Ledford dledf...@redhat.com wrote:
 As a second note, I would like to know why Intel (previously QLogic)
 does not use these functions in their driver and what it would take to
 get all drivers to use the functions.  Do we need to add more to them?
 In my opinion these should be generally useful and used by all drivers.

Use which functions?  The P_Key lookup functions?

What would a low-level driver use them for?  I thought these are for
use by upper-level protocols.

 - R.


Re: [PATCH] RDMA/cxgb4: move the dereference below the NULL test

2012-09-11 Thread Roland Dreier
applied, thanks


Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match

2012-09-11 Thread Roland Dreier
On Tue, Sep 11, 2012 at 1:34 PM, Doug Ledford dledf...@redhat.com wrote:
 Well, at this point, the mlx4 driver uses them, the rdmacm kernel driver
 uses them, and both QLogic/Intel drivers have their own internal pkey
 table implementation.  So, it isn't so much upper layer as it is drivers.

rdmacm is an upper-level protocol (it's above the midlayer / hardware
abstraction).

mlx4 and mthca look up P_Keys because of internal details of how they
send MADs, and really they should move to maintaining their own P_Key
table too.


[PATCH] librdmacm/rsockets: Document rsocket protocol and design

2012-09-11 Thread Hefty, Sean
Include a brief overview of the rsocket protocol and underlying design
with the source code to make it easier for someone trying to decipher
the actual code.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
 docs/rsocket |  144 ++
 1 files changed, 144 insertions(+), 0 deletions(-)
 create mode 100644 docs/rsocket

diff --git a/docs/rsocket b/docs/rsocket
new file mode 100644
index 000..5399f6c
--- /dev/null
+++ b/docs/rsocket
@@ -0,0 +1,144 @@
+rsocket Protocol and Design Guide   9/10/2012
+
+Overview
+--------
+Rsockets is a protocol over RDMA that supports a socket-level API
+for applications.  For details on the current state of the
+implementation, readers should refer to the rsocket man page.  This
+document describes the rsocket protocol, general design, and
+some implementation details. 
+
+Rsockets exchanges data by performing RDMA write operations into
+exposed data buffers.  In addition to RDMA write data, rsockets uses
+small, 32-bit messages for internal communication.  RDMA writes
+are used to transfer application data into remote data buffers
+and to notify the peer when new target data buffers are available.
+The following figure highlights the operation.
+
+   host A                   host B
+                          remote SGL
+  target SGL  <-------------  [  ]
+     [  ] ------
+     [  ] --    ------   receive buffer(s)
+            --        ----->  +--+
+              --              |  |
+                --            |  |
+                  --          |  |
+                    --        +--+
+                      --
+                        --->  +--+
+                              |  |
+                              |  |
+                              +--+
+
+The remote SGL contains the address, size, and rkey of the target SGL.  As
+receive buffers become available on host B, rsockets will issue an RDMA
+write against one of the entries in the target SGL on host A.  The
+updated entry will reference an available receive buffer.  Immediate data
+included with the RDMA write will indicate to host A that a target SGE
+has been updated.
+
+When host A has data to send, it will check its target SGL.  The current
+target SGE will contain the address, size, and rkey of the next receive
+buffer on host B.  If the data transfer is smaller than the size of the
+remote receive buffer, host A will update its target SGE to reflect the
+remaining size of the receive buffer.  That is, once a receive buffer has
+been published to a remote peer, it will be fully consumed before a second
+buffer is used.
+
+Rsockets relies on immediate data to notify the remote peer when data has
+been transferred or when a target SGL has been updated.  Because immediate
+data requires that the remote QP have a posted receive, rsockets also uses
+a credit-based flow control mechanism.  The number of credits is based on
+the size of the receive queue, with initial credits exchanged during
+connection setup.  In order to transfer data, rsockets requires both
+available receive buffers (published via the target SGL) and data credits.
+
+Since immediate data is limited to 32 bits, messages may either indicate
+the arrival of application data or may be an internal message, but not both.
+To avoid credit deadlock, rsockets reserves a small number of available
+credits for control messages only, with the protocol relying on RNR NAKs
+and retries to make forward progress.
+
+
+Connection Establishment
+------------------------
+rsockets uses the RDMA CM for connection establishment.  Struct rs_conn_data
+is exchanged during the connection exchange as private data in the request
+and reply messages.
+
+struct rs_sge {
+   uint64_t addr;
+   uint32_t key;
+   uint32_t length;
+};
+
+#define RS_CONN_FLAG_NET 1
+
+struct rs_conn_data {
+   uint8_t   version;
+   uint8_t   flags;
+   uint16_t  credits;
+   uint32_t  reserved2;
+   struct rs_sge target_sgl;
+   struct rs_sge data_buf;
+};
+
+Version - current version is 1
+Flags
+    RS_CONN_FLAG_NET - Set to 1 if host is big endian.
+                       Determines byte ordering for RDMA write messages
+Credits - number of initial receive credits
+Reserved2 - set to 0
+Target SGL - Address, size (# entries), and rkey of target SGL.
+             Remote side will copy this into their remote SGL.
+Data Buffer - Initial receive buffer address, size (in bytes), and rkey.
+              Remote side will copy this into their first target SGE.
+
+
+Message Format
+--------------
+Rsockets uses RDMA writes with immediate data for all message exchanges.
+RDMA writes of 0 length are used if no additional data beyond the message
+needs to be exchanged.  Immediate data is limited to 32 bits.  Rsockets
+defines the following format for messages.
+
+The upper 3 bits are used to define the type of message being 

Re: [PATCH for-next V2 03/22] IB/core: Add ib_find_exact_cached_pkey() to search for 16-bit pkey match

2012-09-11 Thread Doug Ledford
On 9/11/2012 4:43 PM, Roland Dreier wrote:
 On Tue, Sep 11, 2012 at 1:34 PM, Doug Ledford dledf...@redhat.com wrote:
 Well, at this point, the mlx4 driver uses them, the rdmacm kernel driver
 uses them, and both QLogic/Intel drivers have their own internal pkey
 table implementation.  So, it isn't so much upper layer as it is drivers.
 
 rdmacm is an upper-level protocol (it's above the midlayer / hardware
 abstraction).

Yeah, I know.  My point wasn't that it was a low level item, just that
it's the only upper layer consumer that I saw.

 mlx4 and mthca look up P_Keys because of internal details of how they
 send MADs, and really they should move to maintaining their own P_Key
 table too.

Why not make the routines useful for all users instead of multiple
implementations of the same thing?




