[ewg] [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs

2010-05-31 Thread Alexander Schmidt
Hi Roland,

we have been working on addressing your review comments and are looking for
feedback regarding v2 now.

Problem description:

When fork support is enabled in libibverbs, madvise() is called for every
memory page that is registered as a memory region. Memory ranges that
are passed to madvise() must be page aligned and the size must be a
multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find
out the system page size and rounds all ranges passed to reg_mr() according
to this page size. When memory from libhugetlbfs is passed to reg_mr(), this
does not work as the page size for this memory range might be different
(e.g. 16 MB). So libibverbs would have to use the huge page size to
calculate a page aligned range for madvise.

As huge pages are provided to the application under the hood when
preloading libhugetlbfs, the application does not have any knowledge about
when it registers a huge page or a usual page.

To work around this issue, detect the use of huge pages in libibverbs and
align memory ranges passed to madvise according to the huge page size.
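
The alignment step described above can be sketched as follows. This is only
an illustration and not code from the patch: the helper name align_range()
is hypothetical, and page_size is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): expand the byte range
 * [base, base + size) so that it starts and ends on page_size boundaries.
 * Assumes page_size is a power of two and size is non-zero.
 */
static void align_range(uintptr_t base, size_t size, uintptr_t page_size,
                        uintptr_t *start, uintptr_t *end)
{
    /* Round the start address down to a page boundary. */
    *start = base & ~(page_size - 1);
    /* Round the end address up to the next page boundary. */
    *end = (base + size + page_size - 1) & ~(page_size - 1);
}
```

For a buffer at 0x1001000 of length 0x2000, a 4 KB page size leaves the
range unchanged at 0x1001000-0x1003000, whereas a 16 MB page size expands
it to 0x1000000-0x2000000. The second range is what madvise() would need
if the buffer were backed by a 16 MB huge page.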

Changes since v1:

- detect use of huge pages at ibv_fork_init() time by walking through
  /sys/kernel/mm/hugepages/
- read huge page size from /proc/pid/smaps, which contains the page
  size of the mapping (thereby enabling support for multiple huge
  page sizes)
- code is independent of libhugetlbfs now, so huge pages can be provided
  to the application by any library
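
To illustrate the smaps lookup in isolation, here is a minimal sketch that
works on an in-memory copy of smaps-style text rather than on
/proc/pid/smaps itself. It is not the patch code: the function name
lookup_page_size() and its fallback parameter are assumptions made for this
example. It relies only on the smaps layout in which each mapping starts
with a "start-end" header line and is followed by a "KernelPageSize:" line
given in kB.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan smaps-style text for the mapping containing
 * addr and return its KernelPageSize in bytes. Returns fallback if no
 * mapping contains addr or no KernelPageSize line follows it.
 */
static unsigned long lookup_page_size(const char *smaps, uintptr_t addr,
                                      unsigned long fallback)
{
    int in_mapping = 0;
    const char *line = smaps;

    while (*line) {
        unsigned long start, end, kb;

        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            /* Mapping header line: remember whether addr lies inside. */
            in_mapping = (addr >= start && addr < end);
        } else if (in_mapping &&
                   sscanf(line, "KernelPageSize: %lu", &kb) == 1) {
            /* The value is printed in kB; convert to bytes. */
            return kb * 1024;
        }

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (!line)
            break;
        line++;
    }

    return fallback;
}
```

Given text with one mapping 00400000-0040b000 reporting "KernelPageSize: 4 kB"
and another mapping 40000000-41000000 reporting "KernelPageSize: 16384 kB"
(made-up values), an address inside the second mapping yields 16777216 bytes,
and an address outside both mappings yields the fallback.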

Performance:

PPC64 system with eHCA

without patch:
1M memory region    120usec
16M memory region  1970usec

with patch v2:
1M memory region    172usec
16M memory region  2030usec

with patch and 16M huge pages:
1M memory region    110usec
16M memory region   193usec

Signed-off-by: Alexander Schmidt al...@linux.vnet.ibm.com
---
 src/memory.c |  137 ---
 1 file changed, 131 insertions(+), 6 deletions(-)

--- libibverbs-1.1.2.orig/src/memory.c
+++ libibverbs-1.1.2/src/memory.c
@@ -40,6 +40,10 @@
 #include <unistd.h>
 #include <stdlib.h>
 #include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <dirent.h>
+#include <limits.h>
 
 #include "ibverbs.h"
 
@@ -68,12 +72,117 @@ struct ibv_mem_node {
 static struct ibv_mem_node *mm_root;
 static pthread_mutex_t mm_mutex = PTHREAD_MUTEX_INITIALIZER;
 static int page_size;
+static int huge_page_enabled;
 static int too_late;
 
+static int is_huge_page_enabled(void)
+{
+   int n, ret = 0;
+   char *bufp;
+   DIR *dir;
+   struct dirent *entry;
+   FILE *file;
+   unsigned long nr_hugepages;
+   char buf[1024];
+
+   dir = opendir("/sys/kernel/mm/hugepages/");
+   if (!dir)
+       return 0;
+
+   while ((entry = readdir(dir))) {
+       if (strncmp(entry->d_name, "hugepages-", 10))
+           continue;
+
+       snprintf(buf, sizeof(buf), "/sys/kernel/mm/hugepages/%s/nr_hugepages",
+            entry->d_name);
+
+       file = fopen(buf, "r");
+       if (!file)
+           continue;
+
+       bufp = fgets(buf, sizeof(buf), file);
+       fclose(file);
+       if (!bufp)
+           continue;
+
+       n = sscanf(buf, "%lu", &nr_hugepages);
+       if (n < 1)
+           continue;
+
+       if (nr_hugepages) {
+           ret = 1;
+           goto out;
+       }
+   }
+
+out:
+   closedir(dir);
+
+   return ret;
+}
+
+static unsigned long smaps_page_size(FILE *file)
+{
+   int n;
+   unsigned long size = page_size;
+   char buf[1024];
+
+   while (fgets(buf, sizeof(buf), file) != NULL) {
+       if (!strstr(buf, "KernelPageSize:"))
+           continue;
+
+       n = sscanf(buf, "%*s %lu", &size);
+       if (n < 1)
+           continue;
+
+       /* page size is printed in Kb */
+       size = size * 1024;
+
+       break;
+   }
+
+   return size;
+}
+
+static unsigned long get_page_size(void *base)
+{
+   unsigned long ret = page_size;
+   pid_t pid;
+   FILE *file;
+   char buf[1024];
+
+   pid = getpid();
+   snprintf(buf, sizeof(buf), "/proc/%d/smaps", pid);
+
+   file = fopen(buf, "r");
+   if (!file)
+       goto out;
+
+   while (fgets(buf, sizeof(buf), file) != NULL) {
+       int n;
+       uintptr_t range_start, range_end;
+
+       n = sscanf(buf, "%lx-%lx", &range_start, &range_end);
+
+       if (n < 2)
+           continue;
+
+       if ((uintptr_t) base >= range_start && (uintptr_t) base < range_end) {
+           ret = smaps_page_size(file);
+           break;
+       }
+   }
+   fclose(file);
+
+out:
+   return ret;
+}
+
 int ibv_fork_init(void)
 {
-   void *tmp;
+   void *tmp, *tmp_aligned;
    int ret;
+ 

[ewg] ofa_1_5_kernel 20100531-0200 daily build status

2010-05-31 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters: 

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.18-194.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [ANNOUNCE] OFED 1.5.2 rc1 is available

2010-05-31 Thread Vladimir Sokolovsky

Hi,
OFED 1.5.2-rc1 is available

Notes:

The tarball is available on:
http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/OFED-1.5.2-rc1.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/  for
OFED 1.5.2

Vladimir & Tziporet


Supported Platforms and Operating Systems
-----------------------------------------
o   CPU architectures:
  - x86_64
  - x86
  - ppc64
  - ia64

o   Linux Operating Systems:
  - RedHat EL4 up7        2.6.9-78.ELsmp
  - RedHat EL4 up8        2.6.9-89.ELsmp
  - RedHat EL5 up3        2.6.18-128.el5
  - RedHat EL5 up4        2.6.18-164.el5
  - RedHat EL5 up5        2.6.18-194.el5
  - SLES10 SP2            2.6.16.60-0.21-smp
  - SLES10 SP3            2.6.16.60-0.54-smp
  - SLES11                2.6.27.19-5-default
  - OEL 4 up7             2.6.9-78.ELsmp
  - OEL 4 up8             2.6.9-89.ELsmp
  - CentOS5.3             2.6.18-128.el5
  - CentOS5.4             2.6.18-164.el5
  - Fedora Core12         2.6.31.5-127.fc12 *
  - OpenSuSE 11.2         2.6.31.5-0.1-default *
  - kernel.org            2.6.29, 2.6.30,
                          2.6.31 and 2.6.32 *

* Minimal QA for these versions

Main changes from 1.5.1:
========================
1. Updated packages:
   libibverbs-1.1.3-0.8.g4d733f4.tar
   libehca-1.2.2-0.1.g69e1a88.tar.gz
   libnes-1.0.1-0.3.g8d69734.tar.gz
   compat-dapl-1.2.17.tar.gz
   librdmacm-1.0.12.tar.gz
   dapl-2.0.28.tar.gz

   - Management
 infiniband-diags-1.5.6.tar.gz
 libibmad-1.3.5.tar.gz
 libibumad-1.3.5.tar.gz
 opensm-3.3.6.tar.gz

   - MPI
 openmpi-1.4.2-1.src.rpm

2. Added RHEL 6 beta support


Known issues:
=============
librdmacm-1.0.12 compilation fails on RHEL4.x



commit b2f53cd53d470604bc002699ba905ee23892fa17
Author: Eldad Zinger eld...@mellanox.co.il
Date:   Sun May 23 11:40:02 2010 +0300

sdp: new debug function added, minor debug message change.

Signed-off-by: Eldad Zinger eld...@mellanox.co.il

commit 41cc7a8a7805a6ce8ce8028fc8d9742867b0506c
Author: Eldad Zinger eld...@mellanox.co.il
Date:   Sun May 30 14:03:43 2010 +0300

sdp: device removal rewritten for a stability improvement.

main changes:
1. device_removal_lock is better used.
2. sdp_dev is marked NULL in order to prevent new sockets born to the
   removed device.
3. new timeout functionality used when a reference count was taken for
   the CMA to return, but the CMA won't be invoked because rdma_id was
   destroyed.

Signed-off-by: Eldad Zinger eld...@mellanox.co.il

commit d96f1205fd59992c82fc64b63d3fcaa328976794
Author: Eldad Zinger eld...@mellanox.co.il
Date:   Tue May 18 13:35:56 2010 +0300

sdp: unnecessary local variable removed, 'const' declarations added

Signed-off-by: Eldad Zinger eld...@mellanox.co.il

commit 9ad5e04ed192cf5daae49864cf4da54b89701900
Author: Eldad Zinger eld...@mellanox.co.il
Date:   Tue May 18 13:32:27 2010 +0300

sdp: tx timer is deleted when socket goes to TCP_CLOSE

Signed-off-by: Eldad Zinger eld...@mellanox.co.il

commit 314569eb0bf4cf8fb40af7b5e82cbc362d001858
Author: Eldad Zinger eld...@mellanox.co.il
Date:   Tue May 18 11:48:54 2010 +0300

sdp: canceled a call to sdp_desroy_work() on send completion with error

No need to destroy resources after send completion with error.

Signed-off-by: Eldad Zinger eld...@mellanox.co.il

commit ba9dbc39149fb34fab383b190ca03c6d1017a92d
Author: Eldad Zinger eld...@mellanox.co.il
Date:   Tue May 18 11:34:28 2010 +0300

sdp: unnecessary wait-queue removed from sdp_sock structure.

Signed-off-by: Eldad Zinger eld...@mellanox.co.il

commit edda276773fd8f68d374990c2d1e91545ad2a5c0
Author: Eli Cohen e...@mellanox.co.il
Date:   Thu May 27 11:06:19 2010 +0300

Backport: Add missing dev_id sysfs file

Signed-off-by: Eli Cohen e...@mellanox.co.il

commit dcfc4bb33fa2b8f5498fb6e1966af8731a28b6f7
Author: Eli Cohen e...@mellanox.co.il
Date:   Thu May 27 09:59:15 2010 +0300

Backport dev_id to 2.6.9

Since dev_id does not exist as a member of struct net_device in kernel
2.6.9, avoid using it in mlx4_en. I will add means to read dev_id in
subsequent patches.

Signed-off-by: Eli Cohen e...@mellanox.co.il

commit 977323bd115b8737604393fcf1af144fcc205507
Author: Eli Cohen e...@mellanox.co.il
Date:   Thu May 27 09:02:02 2010 +0300

mlx4_en: use net_device dev_id to indicate port number

Today, there are no means to know which port of a hardware device a netdev
interface uses. struct net_device contains a field, dev_id, that can be used
for that. Use this field to save the port number in ConnectX that is being
used by the net device; port