Re: [PATCH] IB core: Fix locking on device numbers allocation

2010-03-25 Thread Eli Cohen
On Wed, Mar 24, 2010 at 10:39:18AM -0700, Roland Dreier wrote:
 
> Looks like a good catch.  I assume you found this through inspection and
> not hitting it in practice?

Correct, I caught this from inspecting the code.

> Also it seems user_mad.c would need the same fix.

Yes, I missed that.

> Although looking at this I wonder if we do need that lock... we don't
> seem to do any locking when we do the clear_bit in the dev_map, and all
> of this is done through the device add/remove callback, which seems to
> be serialized by the device_mutex in device.c.  But we probably don't
> want to make that a requirement in case we parallelize in the future.
 
I missed the fact the clear_bit is not atomic. So to make this
complete I will send a new patch with protection on the clear bit.
Would you like me to send a patch for user_mad too or would you push
that?
--
To unsubscribe from this list: send the line unsubscribe linux-rdma in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] OFED-1.5.1 GA is available

2010-03-25 Thread Vladimir Sokolovsky

Hi,
I am pleased to announce that the OFED-1.5.1 GA release is done.

Notes:

The tarball is available on:
http://www.openfabrics.org/builds/ofed-1.5.1/release/OFED-1.5.1.tgz


To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/  for
OFED 1.5.1

Vladimir & Tziporet




Release information:

Linux Operating Systems:
  - RedHat EL4 up7      2.6.9-78.ELsmp
  - RedHat EL4 up8      2.6.9-89.ELsmp
  - RedHat EL5 up3      2.6.18-128.el5
  - RedHat EL5 up4      2.6.18-164.el5
  - SLES10 SP2          2.6.16.60-0.21-smp
  - SLES10 SP3          2.6.16.60-0.54-smp
  - SLES11              2.6.27.19-5-default
  - OEL 4 up7           2.6.9-78.ELsmp
  - OEL 4 up8           2.6.9-89.ELsmp
  - CentOS5.3           2.6.18-128.el5
  - CentOS5.4           2.6.18-164.el5
  - Fedora Core12       2.6.31.5-127.fc12 *
  - OpenSuSE 11.2       2.6.31.5-0.1-default *
  - kernel.org          2.6.29, 2.6.30, 2.6.31 and 2.6.32 *

 * Minimal QA for these versions

Systems:
 * x86_64
 * x86
 * ia64
 * ppc64

  Main Changes from OFED 1.5

1. Added RoCEE support - see RoCEE_README.txt
2. Added enhanced atomic operations to ConnectX (kernel only).
See mlx4_release_notes.txt.
3. Updated Open MPI to rev 1.4.1-2ofed
4. Updated MVAPICH2 to rev 1.4.1
5. Updated DAPL to rev 2.0.27
6. Updated libnes to rev 1.0.1
7. Updated librdmacm to rev 1.0.11
8. Removed tvflash RPM
9. NFS-RDMA is not supported on SLES10 SP3
10. Fixed IPv6 support and IPv4 routing corner cases for RDMA CM
11. Bug fixes
   See attached.



bug_id,bug_severity,priority,op_sys,assigned_to,bug_status,resolution,short_short_desc
138,normal,P2,All,m...@mellanox.co.il,RESOLVED,FIXED,getpeername, after other side closes, fails, which is not the behavior of TCP
592,normal,P3,Other,j...@mellanox.com,RESOLVED,FIXED,libsdp memory leak
668,critical,P2,All,dave.ol...@qlogic.com,RESOLVED,FIXED,iPath SMA does not generate traps
779,normal,P3,Other,j...@mellanox.com,RESOLVED,FIXED,sdp server accept: BUG: scheduling while atomic: ib_cm...
828,normal,P3,RHEL 5,j...@mellanox.com,RESOLVED,FIXED,SDP accept() fails with CONFIG_PREEMPT kernel
833,normal,P3,Other,ja...@dev.mellanox.co.il,RESOLVED,FIXED,IPoib hangout while running sdp with multiple
838,normal,P3,RHEL 5,j...@mellanox.com,RESOLVED,FIXED,connection refuse
894,major,P2,SLES 10,j...@mellanox.com,RESOLVED,FIXED,IPoIB connectivity lost during heavy testing on memfree
912,normal,P3,RHEL 4,j...@mellanox.com,RESOLVED,FIXED,When remote side is not accsessible and socket based application is running over SDP reference count is high
969,normal,P3,RHEL 4,j...@mellanox.com,RESOLVED,FIXED,HTTP over SDP with 200  connection cause to kernel panic in client side
977,normal,P3,All,j...@mellanox.com,RESOLVED,FIXED,binding 2 sockets to the same address fails with the wrong errno
998,normal,P3,Other,j...@mellanox.com,RESOLVED,FIXED,intermittent SDP BUG on ppc64
1087,minor,P5,SLES 10,am...@mellanox.co.il,RESOLVED,FIXED,recovery from rdma_create_qp()  is bad
1242,normal,P2,RHEL 4,dave.ol...@qlogic.com,RESOLVED,FIXED,kernel panic while running mpi2007 against ofed1.4 -- ib_ipath: ipath_sdma_verbs_send
1310,normal,P3,Other,am...@mellanox.co.il,RESOLVED,FIXED,stress_connect crash sometimes on SW220/SW221
1334,normal,P3,Other,tina.y...@oracle.com,RESOLVED,FIXED,possible lock ordering issue
1393,minor,P3,SLES 10,andy.gro...@oracle.com,RESOLVED,FIXED,Dmesg errors after running rds-gen/sink tests
1397,normal,P3,SLES 10,pa...@mellanox.co.il,RESOLVED,FIXED,Egle SDR agains Falcon QDR don’t run with new mvapich-1.1.0
1427,normal,P3,All,am...@mellanox.co.il,RESOLVED,FIXED,running netperf on ppc64, results in preload error and causes the client machine to hang
1440,blocker,P3,All,or...@dev.mellanox.co.il,RESOLVED,FIXED,mstvpd hangs on QDR HCAs
1445,normal,P3,All,am...@mellanox.co.il,RESOLVED,FIXED,removing a test module which is using the kernel socket api over sdp, causes the machine to hang
1453,minor,P3,Other,andy.gro...@oracle.com,RESOLVED,FIXED,RDS use of QPD_SQD state causes problems for ConnectX HCAs
1502,normal,P3,Other,am...@mellanox.co.il,RESOLVED,FIXED,2.6.16.46-0.12-SLERT-10-15: scheduling while atomic
1519,normal,P3,Other,andy.gro...@oracle.com,RESOLVED,FIXED,RDS may be doing to much at interrupt level
1552,normal,P3,RHEL 4,andy.gro...@oracle.com,RESOLVED,FIXED,spurious read events on  RDS socket
1590,normal,P3,Other,am...@mellanox.co.il,RESOLVED,FIXED,Unnecessary sock_hold in sdp_reset_sk()
1612,normal,P3,RHEL 5,am...@mellanox.co.il,RESOLVED,FIXED,killing stress_connect test results in Kernel BUG on ppc64 machine
1682,normal,P3,Other,am...@mellanox.co.il,RESOLVED,FIXED,Low 

[PATCH v2] IB core: Fix locking on device numbers allocation

2010-03-25 Thread Eli Cohen
When the driver needs to dynamically allocate char device numbers in systems
with more than IB_UVERBS_MAX_DEVICES devices, it releases map_lock, allocates a
new range and a new device number from that range, and only then re-acquires
the lock. This sequence must be protected for the same reason that the map_lock
spinlock is used in the first place. Without protection we could also end up
calling alloc_chrdev_region() a number of times and leak the extra ranges. Fix
this by replacing map_lock with a mutex and holding it across all of the
allocation code.

Signed-off-by: Eli Cohen e...@mellanox.co.il
---
Changes from previous version:
Also take the mutex around the clear_bit() calls on the allocation maps.


 drivers/infiniband/core/uverbs_main.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index d805cf3..a16d90d 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -72,7 +72,7 @@ DEFINE_IDR(ib_uverbs_cq_idr);
 DEFINE_IDR(ib_uverbs_qp_idr);
 DEFINE_IDR(ib_uverbs_srq_idr);
 
-static DEFINE_SPINLOCK(map_lock);
+static DEFINE_MUTEX(map_lock);
 static DECLARE_BITMAP(dev_map, IB_UVERBS_MAX_DEVICES);
 
 static ssize_t (*uverbs_cmd_table[])(struct ib_uverbs_file *file,
@@ -738,15 +738,15 @@ static void ib_uverbs_add_one(struct ib_device *device)
 	kref_init(&uverbs_dev->ref);
 	init_completion(&uverbs_dev->comp);
 
-	spin_lock(&map_lock);
+	mutex_lock(&map_lock);
 	devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
 	if (devnum >= IB_UVERBS_MAX_DEVICES) {
-		spin_unlock(&map_lock);
 		devnum = find_overflow_devnum();
-		if (devnum < 0)
+		if (devnum < 0) {
+			mutex_unlock(&map_lock);
 			goto err;
+		}
 
-		spin_lock(&map_lock);
 		uverbs_dev->devnum = devnum + IB_UVERBS_MAX_DEVICES;
 		base = devnum + overflow_maj;
 		set_bit(devnum, overflow_map);
@@ -755,7 +755,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
 		base = devnum + IB_UVERBS_BASE_DEV;
 		set_bit(devnum, dev_map);
 	}
-	spin_unlock(&map_lock);
+	mutex_unlock(&map_lock);
 
 	uverbs_dev->ib_dev           = device;
 	uverbs_dev->num_comp_vectors = device->num_comp_vectors;
@@ -787,10 +787,12 @@ err_class:
 
 err_cdev:
 	cdev_del(&uverbs_dev->cdev);
+	mutex_lock(&map_lock);
 	if (uverbs_dev->devnum < IB_UVERBS_MAX_DEVICES)
 		clear_bit(devnum, dev_map);
 	else
 		clear_bit(devnum, overflow_map);
+	mutex_unlock(&map_lock);
 
 err:
 	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
@@ -810,10 +812,12 @@ static void ib_uverbs_remove_one(struct ib_device *device)
 	device_destroy(uverbs_class, uverbs_dev->cdev.dev);
 	cdev_del(&uverbs_dev->cdev);
 
+	mutex_lock(&map_lock);
 	if (uverbs_dev->devnum < IB_UVERBS_MAX_DEVICES)
 		clear_bit(uverbs_dev->devnum, dev_map);
 	else
 		clear_bit(uverbs_dev->devnum - IB_UVERBS_MAX_DEVICES, overflow_map);
+	mutex_unlock(&map_lock);
 
 	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
 	wait_for_completion(&uverbs_dev->comp);
-- 
1.7.0.3
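
For readers following the locking argument, here is a minimal user-space sketch of the pattern the patch moves to: one lock held across the whole find-and-set sequence, including the overflow path, and again around clearing a bit. It uses pthreads instead of kernel primitives, and every name in it (MAX_DEVICES, alloc_devnum, free_devnum, reserve_overflow_range) is a hypothetical stand-in, not the uverbs code.

```c
#include <pthread.h>
#include <stdbool.h>

#define MAX_DEVICES 4   /* hypothetical; stands in for IB_UVERBS_MAX_DEVICES */

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static bool dev_map[MAX_DEVICES];
static bool overflow_map[MAX_DEVICES];
static int overflow_ranges;   /* how many times the overflow range was reserved */

/* Stand-in for alloc_chrdev_region(): it should run at most once. */
static void reserve_overflow_range(void)
{
	overflow_ranges++;
}

/* Returns the index of the first free slot, or -1 if the map is full. */
static int find_first_zero(const bool *map)
{
	for (int i = 0; i < MAX_DEVICES; i++)
		if (!map[i])
			return i;
	return -1;
}

/* Returns a device number, or -1 if both maps are full. */
int alloc_devnum(void)
{
	int n;

	pthread_mutex_lock(&map_lock);
	n = find_first_zero(dev_map);
	if (n >= 0) {
		dev_map[n] = true;
	} else {
		/* The lock is still held, so two callers cannot both reserve. */
		if (!overflow_ranges)
			reserve_overflow_range();
		n = find_first_zero(overflow_map);
		if (n >= 0) {
			overflow_map[n] = true;
			n += MAX_DEVICES;
		}
	}
	pthread_mutex_unlock(&map_lock);
	return n;
}

/* Releases a number previously returned by alloc_devnum(). */
void free_devnum(int n)
{
	pthread_mutex_lock(&map_lock);
	if (n < MAX_DEVICES)
		dev_map[n] = false;
	else
		overflow_map[n - MAX_DEVICES] = false;
	pthread_mutex_unlock(&map_lock);
}
```

Because the overflow reservation happens with the lock held, a sleeping lock is the natural fit here; that is presumably why the patch switches from a spinlock to a mutex instead of just widening the spinlock's critical section.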



[PATCH] RDMA/nes: correct cap.max_inline_data assignment in nes_query_qp

2010-03-25 Thread Chien Tung

cap.max_inline_data is incorrectly set in init_attr instead
of attr.  Set it in attr so subsequent init_attr.cap assignment
will get the correct value.

Signed-off-by: Chien Tung chien.tin.t...@intel.com
---
 drivers/infiniband/hw/nes/nes_verbs.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 6992829..36348bf 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -2820,11 +2820,10 @@ static int nes_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	attr->cap.max_send_wr = nesqp->hwqp.sq_size;
 	attr->cap.max_recv_wr = nesqp->hwqp.rq_size;
 	attr->cap.max_recv_sge = 1;
-	if (nes_drv_opt & NES_DRV_OPT_NO_INLINE_DATA) {
-		init_attr->cap.max_inline_data = 0;
-	} else {
-		init_attr->cap.max_inline_data = 64;
-	}
+	if (nes_drv_opt & NES_DRV_OPT_NO_INLINE_DATA)
+		attr->cap.max_inline_data = 0;
+	else
+		attr->cap.max_inline_data = 64;
 
 	init_attr->event_handler = nesqp->ibqp.event_handler;
 	init_attr->qp_context = nesqp->ibqp.qp_context;
-- 
1.6.4.2
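
The ordering problem described in the commit message can be reproduced in isolation. The sketch below is not the nes driver: the structures are hypothetical, trimmed-down stand-ins, and it assumes only what the message says, namely that init_attr's cap is assigned from attr's cap after the inline-data value has been set.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the verbs structures. */
struct qp_cap {
	uint32_t max_send_wr;
	uint32_t max_inline_data;
};

struct qp_attr      { struct qp_cap cap; };
struct qp_init_attr { struct qp_cap cap; };

/* Buggy order: the value written to init_attr is lost by the later copy. */
void query_qp_buggy(struct qp_attr *attr, struct qp_init_attr *init_attr)
{
	attr->cap.max_send_wr = 32;
	init_attr->cap.max_inline_data = 64;   /* set in the wrong structure */
	init_attr->cap = attr->cap;            /* overwrites the 64 above */
}

/* Fixed order: fill attr first, then copy it into init_attr. */
void query_qp_fixed(struct qp_attr *attr, struct qp_init_attr *init_attr)
{
	attr->cap.max_send_wr = 32;
	attr->cap.max_inline_data = 64;
	init_attr->cap = attr->cap;
}
```

With zero-initialized inputs, the buggy variant leaves max_inline_data at 0 in both structures, while the fixed variant reports 64 in both.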



[PATCH] opensm/osm_dump.c: dump SL2VL tables

2010-03-25 Thread Yevgeny Kliteynik
Hi Sasha,

Dumping SL2VL tables in ROUTING verbosity level when QoS is on.
This is needed for SL2VL tables analysis in general, and for
routing engines that are using IB VLs in particular, such as
torus-2QoS.

Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
---
 opensm/opensm/osm_dump.c |   61 +-
 1 files changed, 60 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_dump.c b/opensm/opensm/osm_dump.c
index 86e9c00..2c21591 100644
--- a/opensm/opensm/osm_dump.c
+++ b/opensm/opensm/osm_dump.c
@@ -1,7 +1,7 @@
 /*
  * Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved.
  * Copyright (c) 2004-2009 Voltaire, Inc. All rights reserved.
- * Copyright (c) 2002-2006 Mellanox Technologies LTD. All rights reserved.
+ * Copyright (c) 2002-2010 Mellanox Technologies LTD. All rights reserved.
  * Copyright (c) 1996-2003 Intel Corporation. All rights reserved.
  *
  * This software is available to you under a choice of one of two
@@ -456,6 +456,60 @@ static void dump_topology_node(cl_map_item_t * item, FILE * file, void *cxt)
 	}
 }
 
+static void dump_sl2vl_tbl(cl_map_item_t * item, FILE * file, void *cxt)
+{
+	osm_port_t *p_port = (osm_port_t *) item;
+	osm_node_t *p_node = p_port->p_node;
+	uint32_t in_port, out_port,
+		 num_ports = p_node->node_info.num_ports;
+	ib_net16_t base_lid = osm_port_get_base_lid(p_port);
+	osm_physp_t *p_physp;
+	ib_slvl_table_t *p_tbl;
+	int i, n;
+	char buf[1024];
+	char * header_line =	"#in out : 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15";
+	char * separator_line = "#--------------------------------------------------------";
+
+	if (!num_ports)
+		return;
+
+	fprintf(file, "%s 0x%016" PRIx64 ", base LID %d, "
+		"\"%s\"\n%s\n%s\n",
+		ib_get_node_type_str(p_node->node_info.node_type),
+		cl_ntoh64(p_port->guid), cl_ntoh16(base_lid),
+		p_node->print_desc, header_line, separator_line);
+
+	if (p_node->node_info.node_type == IB_NODE_TYPE_SWITCH) {
+		for (out_port = 0; out_port <= num_ports; out_port++){
+			p_physp = osm_node_get_physp_ptr(p_node, out_port);
+
+			/* no need to print SL2VL table for port that is down */
+			if (!p_physp->p_remote_physp)
+				continue;
+
+			for (in_port = 0; in_port <= num_ports; in_port++) {
+				p_tbl = osm_physp_get_slvl_tbl(p_physp, in_port);
+				for (i = 0, n = 0; i < 16; i++)
+					n += sprintf(buf + n, " %-2d",
+						ib_slvl_table_get(p_tbl, i));
+				fprintf(file, "%-3d %-3d :%s\n",
+					in_port, out_port, buf);
+			}
+		}
+	} else {
+		p_physp = p_port->p_physp;
+		CL_ASSERT(p_physp->p_remote_physp);
+		p_tbl = osm_physp_get_slvl_tbl(p_physp, 0);
+		for (i = 0, n = 0; i < 16; i++)
+			n += sprintf(buf + n, " %-2d",
+				ib_slvl_table_get(p_tbl, i));
+		fprintf(file, "%-3d %-3d :%s\n",
+			0, p_physp->port_num, buf);
+	}
+
+	fprintf(file, "%s\n\n", separator_line);
+}
+
 static void print_node_report(cl_map_item_t * item, FILE * file, void *cxt)
 {
 	osm_node_t *p_node = (osm_node_t *) item;
@@ -630,6 +684,11 @@ void osm_dump_all(osm_opensm_t * osm)
 		osm_dump_qmap_to_file(osm, "opensm.mcfdbs",
 				      &osm->subn.sw_guid_tbl,
 				      dump_mcast_routes, osm);
+		/* SL2VL tables */
+		if (osm->subn.opt.qos)
+			osm_dump_qmap_to_file(osm, "opensm-sl2vl.dump",
+					      &osm->subn.port_guid_tbl,
+					      dump_sl2vl_tbl, osm);
 	}
 	osm_dump_qmap_to_file(osm, "opensm-subnet.lst",
 			      &osm->subn.node_guid_tbl, dump_topology_node,
-- 
1.5.1.4
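
To make the dump layout concrete, here is a standalone sketch of how one table row lines up under the header used in the patch. It is not OpenSM code: format_sl2vl_row is a hypothetical helper, the VL values come from a plain array instead of ib_slvl_table_get(), and it uses snprintf with explicit bounds checks where the patch uses sprintf into a large buffer.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_SLS 16

/*
 * Formats one row as "<in> <out> :" followed by the VL mapped to each of
 * the 16 SLs. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if buf is too small.
 */
int format_sl2vl_row(char *buf, size_t len, unsigned in_port,
		     unsigned out_port, const uint8_t vl[NUM_SLS])
{
	int n = snprintf(buf, len, "%-3u %-3u :", in_port, out_port);

	if (n < 0 || (size_t) n >= len)
		return -1;

	for (int sl = 0; sl < NUM_SLS; sl++) {
		int r = snprintf(buf + n, len - n, " %-2u", (unsigned) vl[sl]);

		if (r < 0 || (size_t) r >= len - n)
			return -1;
		n += r;
	}
	return n;
}
```

Each SL column is three characters wide (a space plus a left-justified two-character field), so a row is exactly as long as the "#in out : 0  1  2 ..." header line.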



Re: [PATCH] opensm/osm_dump.c: dump SL2VL tables

2010-03-25 Thread Jim Schutt

Hi Yevgeny,

On Thu, 2010-03-25 at 09:56 -0600, Yevgeny Kliteynik wrote:
> Hi Sasha,
>
> Dumping SL2VL tables in ROUTING verbosity level when QoS is on.
> This is needed for SL2VL tables analysis in general, and for
> routing engines that are using IB VLs in particular, such as
> torus-2QoS.
>

Very cool.  Thanks.

-- Jim

> Signed-off-by: Yevgeny Kliteynik klit...@dev.mellanox.co.il
> ---
>  opensm/opensm/osm_dump.c |   61 +-
>  1 files changed, 60 insertions(+), 1 deletions(-)





[RFC] [PATCH 5/22 v2] [for 2.6.36] rdma/cm: update port reservation to support AF_IB

2010-03-25 Thread Sean Hefty
Update the port reservation code path to support AF_IB addresses.
AF_IB is limited to the port spaces defined by the RDMA CM IP Annex
which are already supported by the rdma cm.

AF_IB is used with a sockaddr_ib structure, which exposes a 64-bit
service ID, rather than a 16-bit port number used by IP.  As a result,
port numbers must be converted between the two formats as defined by
the above annex.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
changes from v1:
SID calculation in cma_bind_port for AF_IB was incorrect.

 drivers/infiniband/core/cma.c |   38 +++++++++++++++++++++++++++++++-------
 1 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ac57155..57f1521 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -657,18 +657,28 @@ static int cma_addr_cmp(struct sockaddr *src, struct sockaddr *dst)
 	case AF_INET:
 		return ((struct sockaddr_in *) src)->sin_addr.s_addr !=
 		       ((struct sockaddr_in *) dst)->sin_addr.s_addr;
-	default:
+	case AF_INET6:
 		return ipv6_addr_cmp(&((struct sockaddr_in6 *) src)->sin6_addr,
 				     &((struct sockaddr_in6 *) dst)->sin6_addr);
+	default:
+		return ib_addr_cmp(&((struct sockaddr_ib *) src)->sib_addr,
+				   &((struct sockaddr_ib *) dst)->sib_addr);
 	}
 }
 
-static inline __be16 cma_port(struct sockaddr *addr)
+/* AF_IB must be using the RDMA CM IP Annex */
+static __be16 cma_port(struct sockaddr *addr)
 {
-	if (addr->sa_family == AF_INET)
+	switch (addr->sa_family) {
+	case AF_INET:
 		return ((struct sockaddr_in *) addr)->sin_port;
-	else
+	case AF_INET6:
 		return ((struct sockaddr_in6 *) addr)->sin6_port;
+	case AF_IB:
+		return htons((u16) be64_to_cpu(((struct sockaddr_ib *) addr)->sib_sid));
+	default:
+		return 0;
+	}
 }
 
 static inline int cma_any_port(struct sockaddr *addr)
@@ -1945,10 +1955,24 @@ EXPORT_SYMBOL(rdma_resolve_addr);
 static void cma_bind_port(struct rdma_bind_list *bind_list,
 			  struct rdma_id_private *id_priv)
 {
-	struct sockaddr_in *sin;
+	struct sockaddr *addr;
+	__be16 port;
 
-	sin = (struct sockaddr_in *) &id_priv->id.route.addr.src_addr;
-	sin->sin_port = htons(bind_list->port);
+	addr = (struct sockaddr *) &id_priv->id.route.addr.src_addr;
+	port = htons(bind_list->port);
+
+	switch (addr->sa_family) {
+	case AF_INET:
+		((struct sockaddr_in *) addr)->sin_port = port;
+		break;
+	case AF_INET6:
+		((struct sockaddr_in6 *) addr)->sin6_port = port;
+		break;
+	case AF_IB:
+		((struct sockaddr_ib *) addr)->sib_sid =
+			cpu_to_be64(((u64) id_priv->id.ps << 16) + ntohs(port));
+		break;
+	}
 	id_priv->bind_list = bind_list;
 	hlist_add_head(&id_priv->node, &bind_list->owners);
 }
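
The conversion between a 16-bit port and a 64-bit service ID that this patch relies on can be sketched in plain C. This is an illustration only: it works entirely in host byte order (the kernel code additionally converts to and from big-endian), sid_from_port and port_from_sid are hypothetical names, and the two port-space constants are the RDMA_PS_TCP and RDMA_PS_UDP values.

```c
#include <stdint.h>

/* Port-space values as used by the RDMA CM (RDMA_PS_TCP, RDMA_PS_UDP). */
#define PS_TCP 0x0106
#define PS_UDP 0x0111

/* Builds a host-order service ID: port space above the low 16 port bits. */
uint64_t sid_from_port(uint16_t ps, uint16_t port)
{
	return ((uint64_t) ps << 16) + port;
}

/* Recovers the port, which is the low 16 bits of the service ID. */
uint16_t port_from_sid(uint64_t sid)
{
	return (uint16_t) sid;
}
```

For example, port 7471 (0x1D2F) in the TCP port space becomes service ID 0x0000000001061D2F, and truncating that value to 16 bits gives the port back.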





[RFC] [PATCH 22/22 v2] [for 2.6.36] rdma/ucm: allow user space to specify AF_IB when joining multicast

2010-03-25 Thread Sean Hefty
Allow user space applications to join multicast groups using MGIDs
directly.  MGIDs may be passed using AF_IB addresses.  Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.

Signed-off-by: Sean Hefty sean.he...@intel.com
---
changes from v1:
Forgot to include the changes to the cma.c module to support
AF_IB as part of a join request.

 drivers/infiniband/core/cma.c  |2 +
 drivers/infiniband/core/ucma.c |   55 
 include/rdma/rdma_user_cm.h|   12 -
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9e9a473..42d51b5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2897,6 +2897,8 @@ static void cma_set_mgid(struct rdma_id_private *id_priv,
 								 0xFF10A01B)) {
 		/* IPv6 address is an SA assigned MGID. */
 		memcpy(mgid, &sin6->sin6_addr, sizeof *mgid);
+	} else if (addr->sa_family == AF_IB) {
+		memcpy(mgid, &((struct sockaddr_ib *) addr)->sib_addr, sizeof *mgid);
 	} else if ((addr->sa_family == AF_INET6)) {
 		ipv6_ib_mc_map(&sin6->sin6_addr, dev_addr->broadcast, mc_map);
 		if (id_priv->id.ps == RDMA_PS_UDP)
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 2224a05..a9b917a 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1139,23 +1139,23 @@ static ssize_t ucma_notify(struct ucma_file *file, const char __user *inbuf,
 	return ret;
 }
 
-static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
-				      const char __user *inbuf,
-				      int in_len, int out_len)
+static ssize_t ucma_process_join(struct ucma_file *file,
+				 struct rdma_ucm_join_mcast *cmd,  int out_len)
 {
-	struct rdma_ucm_join_ip_mcast cmd;
 	struct rdma_ucm_create_id_resp resp;
 	struct ucma_context *ctx;
 	struct ucma_multicast *mc;
+	struct sockaddr *addr;
 	int ret;
 
 	if (out_len < sizeof(resp))
 		return -ENOSPC;
 
-	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
-		return -EFAULT;
+	addr = (struct sockaddr *) &cmd->addr;
+	if (cmd->reserved || !cmd->addr_size || (cmd->addr_size != rdma_addr_size(addr)))
+		return -EINVAL;
 
-	ctx = ucma_get_ctx(file, cmd.id);
+	ctx = ucma_get_ctx(file, cmd->id);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
@@ -1166,14 +1166,14 @@ static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
 		goto err1;
 	}
 
-	mc->uid = cmd.uid;
-	memcpy(&mc->addr, &cmd.addr, sizeof cmd.addr);
+	mc->uid = cmd->uid;
+	memcpy(&mc->addr, addr, cmd->addr_size);
 	ret = rdma_join_multicast(ctx->cm_id, (struct sockaddr *) &mc->addr, mc);
 	if (ret)
 		goto err2;
 
 	resp.id = mc->id;
-	if (copy_to_user((void __user *)(unsigned long)cmd.response,
+	if (copy_to_user((void __user *)(unsigned long) cmd->response,
 			 &resp, sizeof(resp))) {
 		ret = -EFAULT;
 		goto err3;
@@ -1198,6 +1198,38 @@ err1:
 	return ret;
 }
 
+static ssize_t ucma_join_ip_multicast(struct ucma_file *file,
+				      const char __user *inbuf,
+				      int in_len, int out_len)
+{
+	struct rdma_ucm_join_ip_mcast cmd;
+	struct rdma_ucm_join_mcast join_cmd;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	join_cmd.response = cmd.response;
+	join_cmd.uid = cmd.uid;
+	join_cmd.id = cmd.id;
+	join_cmd.addr_size = rdma_addr_size((struct sockaddr *) &cmd.addr);
+	join_cmd.reserved = 0;
+	memcpy(&join_cmd.addr, &cmd.addr, join_cmd.addr_size);
+
+	return ucma_process_join(file, &join_cmd, out_len);
+}
+
+static ssize_t ucma_join_multicast(struct ucma_file *file,
+				   const char __user *inbuf,
+				   int in_len, int out_len)
+{
+	struct rdma_ucm_join_mcast cmd;
+
+	if (copy_from_user(&cmd, inbuf, sizeof(cmd)))
+		return -EFAULT;
+
+	return ucma_process_join(file, &cmd, out_len);
+}
+
 static ssize_t ucma_leave_multicast(struct ucma_file *file,
 				    const char __user *inbuf,
 				    int in_len, int out_len)
@@ -1361,7 +1393,8 @@ static ssize_t (*ucma_cmd_table[])(struct ucma_file *file,
 	[RDMA_USER_CM_CMD_MIGRATE_ID]	= ucma_migrate_id,
 	[RDMA_USER_CM_CMD_QUERY]	= ucma_query,
 	[RDMA_USER_CM_CMD_BIND]		= ucma_bind,
-   
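
The validation that ucma_process_join applies to the new join command (reserved field must be zero, and the caller's addr_size must match the size implied by the address family) can be sketched in user space as follows. This is an assumption-laden illustration: addr_size and check_join_request are hypothetical helpers, sockaddr_ib_sketch is a simplified stand-in for sockaddr_ib, and AF_IB_SKETCH is a private name for the value 27 that Linux assigns to AF_IB.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* 27 is the AF_IB value on Linux; a private name avoids header clashes. */
#define AF_IB_SKETCH 27

/* Simplified stand-in for struct sockaddr_ib (layout is illustrative). */
struct sockaddr_ib_sketch {
	sa_family_t sib_family;
	uint16_t    sib_pkey;
	uint32_t    sib_flowinfo;
	uint8_t     sib_addr[16];
	uint64_t    sib_sid;
	uint64_t    sib_sid_mask;
	uint64_t    sib_scope_id;
};

/* Size implied by the address family, or 0 if the family is unknown. */
size_t addr_size(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	case AF_IB_SKETCH:
		return sizeof(struct sockaddr_ib_sketch);
	default:
		return 0;
	}
}

/* Returns 0 if the request passes the checks, -EINVAL otherwise. */
int check_join_request(unsigned reserved, size_t claimed_size,
		       const struct sockaddr *addr)
{
	if (reserved || !claimed_size || claimed_size != addr_size(addr))
		return -EINVAL;
	return 0;
}
```

An unknown family maps to size 0, so it can never match a non-zero claimed size and is rejected by the same comparison.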

[PATCH] [for-2.6.34] rdma/cm: set num_paths when manually assigning path records

2010-03-25 Thread Sean Hefty
When manually assigning the path records to use for a connection,
record the number of paths that were saved.  Otherwise, checks
against num_paths will show 0, even though path record data is
available.

This was discovered by manually setting the path records from
user space, then querying the kernel to see if the correct
path records were assigned, only to discover that the kernel
returned 0 path records to the query.

Signed-off-by: Sean Hefty sean.he...@intel.com
---

 drivers/infiniband/core/cma.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 42d51b5..38906f2 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1783,6 +1783,7 @@ int rdma_set_ib_paths(struct rdma_cm_id *id,
 	}
 
 	memcpy(id->route.path_rec, path_rec, sizeof *path_rec * num_paths);
+	id->route.num_paths = num_paths;
 	return 0;
 err:
 	cma_comp_exch(id_priv, CMA_ROUTE_RESOLVED, CMA_ADDR_RESOLVED);
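
The symptom described above (records present, but a query reports none) follows directly from a count that is never updated. A minimal sketch, with hypothetical trimmed-down types and helper names that are not the rdma_cm API:

```c
#include <string.h>

#define MAX_PATHS 2

struct path_rec { int dlid; };   /* hypothetical, trimmed down */

struct route {
	struct path_rec path_rec[MAX_PATHS];
	int num_paths;
};

/* Copies the records and, as in the fix, records how many were copied. */
int set_paths(struct route *route, const struct path_rec *recs, int n)
{
	if (n < 0 || n > MAX_PATHS)
		return -1;
	memcpy(route->path_rec, recs, sizeof *recs * n);
	route->num_paths = n;   /* without this line a query reports 0 paths */
	return 0;
}

/* A query only exposes the first num_paths records. */
int query_paths(const struct route *route, struct path_rec *out, int max)
{
	int n = route->num_paths < max ? route->num_paths : max;

	if (n < 0)
		n = 0;
	memcpy(out, route->path_rec, sizeof *out * n);
	return n;
}
```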





Re: [RFC] [PATCH 5/22 v2] [for 2.6.36] rdma/cm: update port reservation to support AF_IB

2010-03-25 Thread Jason Gunthorpe
On Thu, Mar 25, 2010 at 12:05:33PM -0700, Sean Hefty wrote:
> +	case AF_IB:
> +		((struct sockaddr_ib *) addr)->sib_sid =
> +			cpu_to_be64(((u64) id_priv->id.ps << 16) + ntohs(port));
> +		break;
> +	}

Could you elaborate a bit on how you are mixing the port space and the
SID?

IMHO, after thinking about it for a bit, I would prefer to see the
port space be unused from a user-space perspective when used with
AF_IB.

If a ps is needed in the kernel then it should pick the ps based on
the SID prefix that user space provided.. I guess prior to adding the
mask bits this wouldn't have made sense, but now that they are in it
seems like the way to go.

Jason


RE: [RFC] [PATCH 5/22 v2] [for 2.6.36] rdma/cm: update port reservation to support AF_IB

2010-03-25 Thread Sean Hefty
>Could you elaborate a bit on how you are mixing the port space and the
>SID?

The SID is divided into multiple, disjoint regions.  (Annex 3 defines some of
the regions.)  The port space selects a specific region.

>IMHO, after thinking about it for a bit, I would prefer to see the
>port space be unused from a user-space perspective when used with
>AF_IB.

The rdma_cm_id is associated with a port space on creation, before it's known
what address family will be used.  The kernel code enforces that the SID is
formatted correctly for the port space that was selected.

>If a ps is needed in the kernel then it should pick the ps based on
>the SID prefix that user space provided.. I guess prior to adding the
>mask bits this wouldn't have made sense, but now that they are in it
>seems like the way to go.

I added sib_mask to sockaddr_ib, but it's unused at this point.  I think what
you're saying makes sense, but for RDMA_PS_IB, once it's defined.  RDMA_PS_IB
may need to behave as RDMA_PS_TCP or RDMA_PS_UDP based on the sib_mask and SID,
and ensure that any selected SID is reserved from the correct underlying SID
region.

Does this make sense?

- Sean
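
One way to picture the RDMA_PS_IB behavior discussed above is a classifier that maps a SID/mask pair onto a known region. The sketch below is only an illustration of that idea, not proposed kernel code: sid_region and the REGION_* names are hypothetical, the values are in host byte order, the TCP and UDP prefixes are the port-space values 0x0106 and 0x0111 shifted above the 16 port bits, and treating a mask that does not cover the whole prefix as "no region" is a simplifying assumption.

```c
#include <stdint.h>

#define IP_PS_MASK 0xFFFFFFFFFFFF0000ULL   /* upper 48 bits select the region */
#define IP_PS_TCP  0x0000000001060000ULL
#define IP_PS_UDP  0x0000000001110000ULL

enum region { REGION_OTHER, REGION_TCP, REGION_UDP };

/*
 * Classifies a SID/mask pair. The pair is placed in a region only when the
 * mask pins down every prefix bit and the prefix matches that region.
 */
enum region sid_region(uint64_t sid, uint64_t mask)
{
	if ((mask & IP_PS_MASK) != IP_PS_MASK)
		return REGION_OTHER;   /* prefix not fully specified */

	switch (sid & IP_PS_MASK) {
	case IP_PS_TCP:
		return REGION_TCP;
	case IP_PS_UDP:
		return REGION_UDP;
	default:
		return REGION_OTHER;
	}
}
```

A bind that lands in REGION_TCP or REGION_UDP would then need to reserve its port from that port space, so that it cannot collide with an AF_INET or AF_INET6 listener on the same port.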



RE: [RFC] [PATCH 5/22 v2] [for 2.6.36] rdma/cm: update port reservation to support AF_IB

2010-03-25 Thread Sean Hefty
>Does this early association with the port space have any effect?

Doing it early versus later - no.  Doing it at all - yes.

>I would be inclined to ditch the port space concept entirely for
>AF_IB. Just ignore the input parameter and always base the selection
>on the SID region. It is confusing that there are two ways to
>specify the same thing.

The address family is not sufficient to determine the desired port space.  For
example, AF_INET is usable for RDMA_PS_TCP and RDMA_PS_UDP.

>Is there any reason the port space has to be known when the cm_id is
>created but before bind?

No - but it is still required for transport neutrality.  rdma_create_id() simply
stores the value.

So... the port space could be specified during rdma_bind_addr and
rdma_resolve_addr instead of rdma_create_id, with the port space being
determined based on the sid/mask if AF_IB, rather than given directly.  That
seems more complex to me than just specifying it up front when rdma_create_id is
called.



How to monitor traffic with IBoE/RoCEE?

2010-03-25 Thread Pradeep Satyanarayana
With IBoE/RoCEE, the traditional SM in IB clusters is not needed. Most of the
current IB tools rely on the SM and PM to get packet and error statistics and
so on. These won't be applicable with IBoE/RoCEE. netstat will have no value
since the kernel has been bypassed. So, how does one monitor traffic in such a
cluster?

The possibilities that I can think of are to get information from the switches
or if there are some special tools to get information from the adapter ports
themselves. Are such tools available with ConnectX adapters? How would one get
cluster-wide traffic information when such a cluster is deployed?

Thanks
Pradeep



RE: [RFC] [PATCH 5/22 v2] [for 2.6.36] rdma/cm: update port reservation to support AF_IB

2010-03-25 Thread Sean Hefty
>>Is there any reason the port space has to be known when the cm_id is
>>created but before bind?
>
>No - but it is still required for transport neutrality.  rdma_create_id()
>simply stores the value.

To correct this slightly, a user can call listen after calling create_id without
calling bind.  The bind is done internally to listen.

The port space really should be indicated at creation time.  If we can agree on
that, then the kernel simply needs to decide how to handle AF_IB at bind time.

These patches handle it by setting the well known portion of the SID and acting
on the other bits based on them being set.  The mask is unused.  For the
existing port spaces, we can leave this, or require the user provide a SID/mask
that is consistent with the chosen port space.

For RDMA_PS_IB, I think the flow that you outlined in your other email makes
sense.  Let RDMA_PS_IB cover the entire SID range.  When the SID/mask fall into
an existing port space, it needs to reserve a port from within that port space
to avoid collisions.

- Sean



Re: [RFC] [PATCH 5/22 v2] [for 2.6.36] rdma/cm: update port reservation to support AF_IB

2010-03-25 Thread Jason Gunthorpe
On Thu, Mar 25, 2010 at 08:56:23PM -0700, Sean Hefty wrote:
> >>Is there any reason the port space has to be known when the cm_id is
> >>created but before bind?
> >
> >No - but it is still required for transport neutrality.  rdma_create_id()
> >simply stores the value.
>
> To correct this slightly, a user can call listen after calling create_id without
> calling bind.  The bind is done internally to listen.
>
> The port space really should be indicated at creation time.  If we can agree on
> that, then the kernel simply needs to decide how to handle AF_IB at bind time.

I still think AF_IB should not have any port space other than
RDMA_PS_IB - since it is functionally inclusive of all the other
cases.

> These patches handle it by setting the well known portion of the SID and acting
> on the other bits based on them being set.  The mask is unused.  For the
> existing port spaces, we can leave this, or require the user provide a SID/mask
> that is consistent with the chosen port space.

See, to me this is very not nice. The address is specified with a
SID/mask, having the kernel ignore the mask and overwrite bits in the
SID makes no sense as an API.
 
> For RDMA_PS_IB, I think the flow that you outlined in your other email makes
> sense.  Let RDMA_PS_IB cover the entire SID range.  When the SID/mask fall into
> an existing port space, it needs to reserve a port from within that port space
> to avoid collisions.

Right :) I just don't understand why you think AF_IB/RDMA_PS_TCP has
to be a supported combination.

Is there some use case you see where having RDMA_PS_TCP/UDP work with
AF_IB is an advantage?

Jason