Re: [ewg] RAW_ETH support [PATCH 0/2]

2010-06-09 Thread Moni Shoua
Aleksey Senin wrote:
 These patches add a new RAW_ETH QP type to the kernel in order to support
 creation of RAW Ethernet packets for the iWARP and RDMAOE protocols.
 The reason for the new type is that the RAW_ETY QP type is already used by
 Mellanox drivers for another purpose. Another reason is that a RAW_ETH QP
 type is already present in userspace, but it is mapped to the RAW_ETY type
 in the kernel, which causes confusion when dealing with the code.
 ___
 ewg mailing list
 ewg@lists.openfabrics.org
 http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
 
Vlad,
With Mirek's approval, I think that nothing prevents us from accepting these
changes.
Thanks



Re: [ewg] [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs

2010-06-09 Thread Alexander Schmidt
On Wed, 02 Jun 2010 14:49:37 -0700
Roland Dreier rdre...@cisco.com wrote:

 So if I read this correctly this patch introduces almost a 50% overhead
 in the 1M case... and probably much worse (as a fraction) in say the 64K
 or 4K case.  I wonder if that's acceptable?

We don't think this is acceptable, so we like the third approach you suggested
very much. I've written the code and attached it below. This third version
does not introduce additional overhead when not using huge pages (verified
with 4k, 64k, 1m and 16m memory regions).

Problem description:

When fork support is enabled in libibverbs, madvise() is called for every
memory page that is registered as a memory region. Memory ranges that
are passed to madvise() must be page aligned and the size must be a
multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find
out the system page size and rounds all ranges passed to reg_mr() according
to this page size. When memory from libhugetlbfs is passed to reg_mr(), this
does not work as the page size for this memory range might be different
(e.g. 16 MB). So libibverbs would have to use the huge page size to
calculate a page aligned range for madvise.

As huge pages are provided to the application under the hood when
preloading libhugetlbfs, the application does not have any knowledge about
when it registers a huge page or a usual page.

To work around this issue, detect the use of huge pages in libibverbs and
align memory ranges passed to madvise according to the huge page size.
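
To illustrate the alignment problem described above, here is a minimal
sketch (not part of the patch) of widening a buffer to page boundaries.
The helper name align_range and its signature are hypothetical; the only
assumption is that the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): widen [base, base + size) to
 * the smallest range whose start and length are multiples of page_size.
 * page_size must be a power of two.
 */
static void align_range(const void *base, size_t size, size_t page_size,
			uintptr_t *start, size_t *length)
{
	uintptr_t mask = (uintptr_t)page_size - 1;
	uintptr_t first = (uintptr_t)base;
	uintptr_t last = first + size;	/* one past the final byte */

	assert(page_size != 0 && (page_size & mask) == 0);

	*start = first & ~mask;		/* round the start down */
	last = (last + mask) & ~mask;	/* round the end up */
	*length = (size_t)(last - *start);
}
```

With 4 KB pages, a 4 KB buffer at 0x1000800 widens to two pages (0x2000
bytes starting at 0x1000000). With 16 MB pages the same buffer widens to a
single 16 MB range, which is what madvise() would need for memory backed by
a huge page of that size.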

Changes since v1:

- detect use of huge pages at ibv_fork_init() time by walking through
  /sys/kernel/mm/hugepages/
- read huge page size from /proc/pid/smaps, which contains the page
  size of the mapping (thereby enabling support for multiple huge
  page sizes)
- code is independent of libhugetlbfs now, so huge pages can be provided
  to the application by any library

Changes since v2:

- reading from /proc/ to determine the huge page size is now only done
  when a call to madvise() using the system page size fails. So there
  is no overhead introduced when registering non-huge-page memory.
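
The "only on failure" strategy above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the patch code: all
names are hypothetical, the advise callback has the same signature as
madvise(2) so the real call could be passed in, and demo_advise and
demo_lookup are stand-ins that simulate a 2 MB huge page mapping so the
fallback can be exercised without real huge pages.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SYS_PAGE  4096UL
#define DEMO_HUGE_PAGE (2UL << 20)

/* Same signature as madvise(2), so the real call can be passed in. */
typedef int (*advise_fn)(void *addr, size_t length, int advice);
/* Hypothetical lookup of the page size backing addr; 0 means unknown. */
typedef size_t (*lookup_fn)(void *addr);

static int lookups;	/* counts how often the slow path ran */

/* Stand-in for madvise(): rejects ranges not aligned to the huge page. */
static int demo_advise(void *addr, size_t length, int advice)
{
	(void)advice;
	if ((uintptr_t)addr % DEMO_HUGE_PAGE || length % DEMO_HUGE_PAGE)
		return -1;
	return 0;
}

/* Stand-in for the /proc/self/smaps lookup. */
static size_t demo_lookup(void *addr)
{
	(void)addr;
	lookups++;
	return DEMO_HUGE_PAGE;
}

/* Widen the range to multiples of "page" and pass it to advise(). */
static int advise_aligned(void *base, size_t size, int advice, size_t page,
			  advise_fn advise)
{
	uintptr_t mask = (uintptr_t)page - 1;
	uintptr_t start = (uintptr_t)base & ~mask;
	uintptr_t end = ((uintptr_t)base + size + mask) & ~mask;

	return advise((void *)start, (size_t)(end - start), advice);
}

static int advise_with_fallback(void *base, size_t size, int advice,
				size_t sys_page, advise_fn advise,
				lookup_fn lookup)
{
	size_t huge;

	/* Fast path: system page size, no /proc access. */
	if (advise_aligned(base, size, advice, sys_page, advise) == 0)
		return 0;

	/* Slow path: look up the real page size and retry once. */
	huge = lookup(base);
	if (huge == 0 || huge == sys_page)
		return -1;
	return advise_aligned(base, size, advice, huge, advise);
}
```

The point of the design is that the lookup callback runs only after the
first attempt has failed, so registrations of ordinary memory never pay
for parsing /proc.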

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
---
 src/memory.c |   96 +++
 1 file changed, 90 insertions(+), 6 deletions(-)

--- libibverbs.git.orig/src/memory.c
+++ libibverbs.git/src/memory.c
@@ -40,6 +40,8 @@
 #include <unistd.h>
 #include <stdlib.h>
 #include <stdint.h>
+#include <stdio.h>
+#include <string.h>
 
 #include "ibverbs.h"
 
@@ -70,10 +72,64 @@ static pthread_mutex_t mm_mutex = PTHREA
 static int page_size;
 static int too_late;
 
+static unsigned long smaps_page_size(FILE *file)
+{
+	int n;
+	unsigned long size = 0;
+	char buf[1024];
+
+	while (fgets(buf, sizeof(buf), file) != NULL) {
+		if (!strstr(buf, "KernelPageSize:"))
+			continue;
+
+		n = sscanf(buf, "%*s %lu", &size);
+		if (n < 1)
+			continue;
+
+		/* page size is printed in Kb */
+		size = size * 1024;
+
+		break;
+	}
+
+	return size;
+}
+
+static unsigned long get_page_size(void *base)
+{
+	unsigned long ret = 0;
+	FILE *file;
+	char buf[1024];
+
+	file = fopen("/proc/self/smaps", "r");
+	if (!file)
+		goto out;
+
+	while (fgets(buf, sizeof(buf), file) != NULL) {
+		int n;
+		uintptr_t range_start, range_end;
+
+		n = sscanf(buf, "%lx-%lx", &range_start, &range_end);
+
+		if (n < 2)
+			continue;
+
+		if ((uintptr_t) base >= range_start && (uintptr_t) base < range_end) {
+			ret = smaps_page_size(file);
+			break;
+		}
+	}
+	fclose(file);
+
+out:
+	return ret;
+}
+
 int ibv_fork_init(void)
 {
-	void *tmp;
+	void *tmp, *tmp_aligned;
 	int ret;
+	unsigned long size;
 
 	if (mm_root)
 		return 0;
@@ -88,8 +144,17 @@ int ibv_fork_init(void)
 	if (posix_memalign(&tmp, page_size, page_size))
 		return ENOMEM;
 
-	ret = madvise(tmp, page_size, MADV_DONTFORK) ||
-	      madvise(tmp, page_size, MADV_DOFORK);
+	size = get_page_size(tmp);
+
+	if (size)
+		tmp_aligned = (void *)((uintptr_t)tmp & ~(size - 1));
+	else {
+		size = page_size;
+		tmp_aligned = tmp;
+	}
+
+	ret = madvise(tmp_aligned, size, MADV_DONTFORK) ||
+	      madvise(tmp_aligned, size, MADV_DOFORK);
 
 	free(tmp);
 
@@ -522,7 +587,8 @@ static struct ibv_mem_node *undo_node(st
 	return node;
 }
 
-static int ibv_madvise_range(void *base, size_t size, int advice)
+static int ibv_madvise_range(void *base, size_t size, int advice,
+			     unsigned long page_size)
 {
 	uintptr_t start, end;
 	struct ibv_mem_node 

[ewg] ofa_1_5_kernel 20100609-0200 daily build status

2010-06-09 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-194.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:


[ewg] New RAW ETH QP type v2 [ PATCH 1/1 ]

2010-06-09 Thread Aleksey Senin
The previous v1 was missing the implementation in the verbs.c file.

Add RAW ETH functionality to the verbs layer. This QP type is used to create
RAW Ethernet packets over the iWARP and RDMAOE protocols.



Signed-off-by: Aleksey Senin aleks...@voltaire.com
---
 drivers/infiniband/core/verbs.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 881850e..bb4dcd5 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -382,6 +382,7 @@ static const struct {
 				[IB_QPT_UD]  = (IB_QP_PKEY_INDEX		|
 						IB_QP_PORT			|
 						IB_QP_QKEY),
+				[IB_QPT_RAW_ETH] = IB_QP_PORT,
 				[IB_QPT_UC]  = (IB_QP_PKEY_INDEX		|
 						IB_QP_PORT			|
 						IB_QP_ACCESS_FLAGS),
@@ -1004,7 +1005,11 @@ int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
 
 	switch (rdma_node_get_transport(qp->device->node_type)) {
 	case RDMA_TRANSPORT_IB:
-		if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
+		if (qp->qp_type == IB_QPT_RAW_ETH) {
+			/* In raw Ethernet mgids the 63 msb's should be 0 */
+			if (gid->global.subnet_prefix & cpu_to_be64(~1ULL))
+				return -EINVAL;
+		} else if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
 			return -EINVAL;
 		break;
 	case RDMA_TRANSPORT_IWARP:
@@ -1023,7 +1028,11 @@ int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid)
 
 	switch (rdma_node_get_transport(qp->device->node_type)) {
 	case RDMA_TRANSPORT_IB:
-		if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
+		if (qp->qp_type == IB_QPT_RAW_ETH) {
+			/* In raw Ethernet mgids the 63 msb's should be 0 */
+			if (gid->global.subnet_prefix & cpu_to_be64(~1ULL))
+				return -EINVAL;
+		} else if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD)
 			return -EINVAL;
 		break;
 	case RDMA_TRANSPORT_IWARP:
-- 
1.6.5.2




[ewg] RAW ETH support for Mellanox v1 [PATCH 1/1]

2010-06-09 Thread Aleksey Senin
Add RAW ETH QP support for Mellanox adapters.


Signed-off-by: Aleksey Senin aleks...@voltaire.com
---
 drivers/infiniband/hw/mlx4/main.c |   13 +
 drivers/infiniband/hw/mlx4/qp.c   |   25 +
 drivers/net/mlx4/mcg.c|   22 +-
 include/linux/mlx4/device.h   |7 +--
 4 files changed, 48 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index c146b84..6841dc7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -684,7 +684,9 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct mlx4_ib_qp *mqp = to_mqp(ibqp);
 
 	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, !!(mqp->flags &
-				    MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK));
+				    MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
+				    (ibqp->qp_type == IB_QPT_RAW_ETH) ?
+				    MLX4_PROT_EN : MLX4_PROT_IB);
 	if (err)
 		return err;
 
@@ -695,7 +697,9 @@ static int mlx4_ib_mcg_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	return 0;
 
 err_add:
-	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw);
+	mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
+			      (ibqp->qp_type == IB_QPT_RAW_ETH) ?
+			      MLX4_PROT_EN : MLX4_PROT_IB);
 	return err;
 }
 
@@ -724,8 +728,9 @@ static int mlx4_ib_mcg_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
 	struct net_device *ndev;
 	struct gid_entry *ge;
 
-	err = mlx4_multicast_detach(mdev->dev,
-				    &mqp->mqp, gid->raw);
+	err = mlx4_multicast_detach(mdev->dev, &mqp->mqp, gid->raw,
+				    (ibqp->qp_type == IB_QPT_RAW_ETH) ?
+				    MLX4_PROT_EN : MLX4_PROT_IB);
 	if (err)
 		return err;
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 422d367..b6b484d 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -811,6 +811,7 @@ struct ib_qp *mlx4_ib_create_qp(struct ib_pd *pd,
 	case IB_QPT_RC:
 	case IB_QPT_UC:
 	case IB_QPT_UD:
+	case IB_QPT_RAW_ETH:
 	{
 		qp = kzalloc(sizeof *qp, GFP_KERNEL);
 		if (!qp)
@@ -902,6 +903,7 @@ static int to_mlx4_st(enum ib_qp_type type)
 	case IB_QPT_RAW_ETY:
 	case IB_QPT_SMI:
 	case IB_QPT_GSI:	return MLX4_QP_ST_MLX;
+	case IB_QPT_RAW_ETH:	return MLX4_QP_ST_MLX;
 	default:		return -1;
 	}
 }
@@ -1064,8 +1066,9 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 			break;
 		}
 	}
-
-	if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
+	if (ibqp->qp_type == IB_QPT_RAW_ETH)
+		context->mtu_msgmax = 0xff;
+	else if (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
 	    ibqp->qp_type == IB_QPT_RAW_ETY)
 		context->mtu_msgmax = (IB_MTU_4096 << 5) | 11;
 	else if (ibqp->qp_type == IB_QPT_UD) {
@@ -1237,12 +1240,16 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
 	if (cur_state == IB_QPS_INIT &&
 	    new_state == IB_QPS_RTR  &&
 	    (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
-	     ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY)) {
+	     ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY ||
+	     ibqp->qp_type == IB_QPT_RAW_ETH)) {
 		context->pri_path.sched_queue = (qp->port - 1) << 6;
 		if (is_qp0(dev, qp))
 			context->pri_path.sched_queue |= MLX4_IB_DEFAULT_QP0_SCHED_QUEUE;
 		else
 			context->pri_path.sched_queue |= MLX4_IB_DEFAULT_SCHED_QUEUE;
+
+		/* Default counter for non-RC QPs */
+		context->pri_path.counter_index = 0xff;
 	}
 
 	if (cur_state == IB_QPS_RTS && new_state == IB_QPS_SQD &&
@@ -1356,7 +1363,7 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		goto out;
 	}
 
-	if ((attr_mask & IB_QP_PORT) &&
+	if ((attr_mask & IB_QP_PORT) && (ibqp->qp_type != IB_QPT_RAW_ETH) &&
 	    (attr->port_num == 0 || attr->port_num > dev->num_ports)) {
 		mlx4_ib_dbg("qpn 0x%x: invalid port number (%d) specified "
 			    "for transition %d to %d. qp_type %d",
@@ -1365,6 +1372,16 @@ int mlx4_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 		goto out;
 	}
 
+	if ((attr_mask & IB_QP_PORT) && (ibqp->qp_type == IB_QPT_RAW_ETH) &&
+	    (rdma_port_link_layer(&dev->ib_dev, attr->port_num)
+	     != 

Re: [ewg] RAW ETH support for Mellanox v1 [PATCH 1/1]

2010-06-09 Thread Eli Cohen
On Wed, Jun 09, 2010 at 02:29:57PM +0300, Aleksey Senin wrote:
  
  	err = mlx4_multicast_attach(mdev->dev, &mqp->mqp, gid->raw, !!(mqp->flags &
 -				    MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK));
 +				    MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK),
 +				    (ibqp->qp_type == IB_QPT_RAW_ETH) ?
 +				    MLX4_PROT_EN : MLX4_PROT_IB);
  	if (err)
  		return err;

Usage of MLX4_PROT_EN and MLX4_PROT_IB is wrong in this context since
they are used for a totally different purpose. You need to define a
new enum and explicitly set values for it to reflect hardware
definitions.

 @@ -1237,12 +1240,16 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
  	if (cur_state == IB_QPS_INIT &&
  	    new_state == IB_QPS_RTR  &&
  	    (ibqp->qp_type == IB_QPT_GSI || ibqp->qp_type == IB_QPT_SMI ||
 -	     ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY)) {
 +	     ibqp->qp_type == IB_QPT_UD || ibqp->qp_type == IB_QPT_RAW_ETY ||
 +	     ibqp->qp_type == IB_QPT_RAW_ETH)) {
  		context->pri_path.sched_queue = (qp->port - 1) << 6;
  		if (is_qp0(dev, qp))
  			context->pri_path.sched_queue |= MLX4_IB_DEFAULT_QP0_SCHED_QUEUE;
  		else
  			context->pri_path.sched_queue |= MLX4_IB_DEFAULT_SCHED_QUEUE;
 +
 +		/* Default counter for non-RC QPs */
 +		context->pri_path.counter_index = 0xff;

Looks like this breaks hardware counters. Why are you using this
statement? Also it appears that the patches were not created against
latest OFED 1.5.2 sources.

Did you check that counters are still working for RoCEE after this
patch?



Re: [ewg] RAW ETH support for Mellanox v1 [PATCH 1/1]

2010-06-09 Thread Aleksey Senin

 Did you check that counters are still working for RoCEE after this
 patch?
 

I'll check the counters issue, but today's patches were created against
OFED-1.5.2-20100607-0636 and should be installed under the
kernel_patches/fixes directory.


Re: [ewg] [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs

2010-06-09 Thread Roland Dreier
Thanks, nice work.  I like this approach.  Alex (Vainman) any comments
on this?

 - R.
-- 
Roland Dreier rola...@cisco.com || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html


Re: [ewg] EWG/OFED meeting June 7, 2010 meeting minutes

2010-06-09 Thread Betsy Zeller
Tziporet - The fix so PSM doesn't try and build on unsupported systems
was submitted last week, so that should be covered. I was a bit
surprised to see it raised as an issue here, but you might not have had
a chance to catch up.

Also, we have done some testing on OFED 1.5.2, including with PSM, and
that is going well.

- Betsy


On Tue, 2010-06-08 at 01:44 -0700, Tziporet Koren wrote:
 Meeting summary:
 
 1. OFED 1.5.2 progress as planned
 2. We plan to have RC2 on Thursday this week (Jun 10, 2010)
 
 Meeting details:
 
 1. OFED 1.5.2 - features status
 - Add new OSes:
    - RHEL 5.5 - done
    - SLES11 SP1 - Jeff Backer volunteered to do it
    - Add RHEL6 beta - done
 - Update the management package - new package was provided (not final)
 - Update with new libibverbs 1.1.4 from Roland - in progress
 - Add-on packages that do not touch the core:
    - Qlogic wishes to add the PSM library - need to fix the PSM library so that
      it does not build on systems that are not supported (ia64 and PPC) -
      Betsy should be responsible for this
    - New libehca tarball - done
    - iWarp Multicast Acceleration (IBV_QPT_RAW_ETH) - done
    - Add IBV_QPT_RAW_ETH for mlx4 - Voltaire - in discussion between Voltaire
      and Mellanox. Moni to coordinate the change for the nes driver on
      RAW Eth QP
    - ACM - Sean - done
    - uDAPL package with bug fixes - better support for RoCE - done
    - SDP Zcopy in GA - in progress - nearing completion
 - Critical bug fixes - ongoing
 
 2. OFED 1.5.2 testing status - all
 Voltaire will start more testing after RC2
 Intel - Woody - continue testing - all good so far
 Nes - have one issue that they should fix
 IBM - not much testing. Will start on RC2
 Qlogic - no info
 HP - not testing
 Mellanox - regression is running, focused on SDP
 
 
 Open: Is anyone interested in adding more kernel.org support beyond the
 2.6.32 we already support?
 
 3. Schedule: 
 Beta - May 3 - done - used in the interop. Report on OFED issues will be 
 provided.
 RC1  - May 31 - done
 RC2  - Jun 10
 RC3  - Jun 22
 GA   - Jun 29
 
 Tziporet
 
 



[ewg] [GIT PULL] RDMA/nes: update for 1.5.2 RC2

2010-06-09 Thread Tung, Chien Tin
Vlad,

Please pull from:

  ssh://v...@sofa.openfabrics.org/home/ctung/scm/ofed-1.5.git ofed_kernel_1_5

for:

Chien Tung (1):
  RDMA/nes: get and print eeprom version number

Mirek Walukiewicz (1):
  RDMA/nes: Added missing mutex during memory registration

 kernel_patches/fixes/nes_0035_eeprom_version.patch |   34 
 kernel_patches/fixes/nes_0036_ima_mutex_fix.patch  |   22 +
 2 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 kernel_patches/fixes/nes_0035_eeprom_version.patch
 create mode 100644 kernel_patches/fixes/nes_0036_ima_mutex_fix.patch

Also please pull in Aleksey's RAW ETH support series and 
New RAW ETH QP type v2 [ PATCH 1/1 ] specific for nes.

Thanks,

Chien

