[ewg] [PATCH v2] libibverbs: ibv_fork_init() and libhugetlbfs
Hi Roland,

we have been working on addressing your review comments and are now looking for feedback on v2.

Problem description:

When fork support is enabled in libibverbs, madvise() is called for every memory page that is registered as a memory region. Memory ranges that are passed to madvise() must be page aligned and the size must be a multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find out the system page size and rounds all ranges passed to reg_mr() according to this page size. When memory from libhugetlbfs is passed to reg_mr(), this does not work, as the page size for this memory range might be different (e.g. 16 MB). libibverbs would therefore have to use the huge page size to calculate a page-aligned range for madvise().

As huge pages are provided to the application under the hood when preloading libhugetlbfs, the application does not know whether it is registering a huge page or a regular page. To work around this issue, detect the use of huge pages in libibverbs and align memory ranges passed to madvise() according to the huge page size.
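To illustrate the alignment problem described above, here is a minimal sketch in C. The helper name align_to_page() and the struct aligned_range type are hypothetical and not part of libibverbs; the sketch only shows how the same buffer expands to a different range depending on whether the base page size or a huge page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of libibverbs: expand [addr, addr + len)
 * to the smallest enclosing range whose start and length are multiples
 * of page_size. page_size must be a power of two.
 */
struct aligned_range {
	uintptr_t start;
	size_t length;
};

static struct aligned_range align_to_page(uintptr_t addr, size_t len,
					  unsigned long page_size)
{
	struct aligned_range r;
	uintptr_t mask = (uintptr_t) page_size - 1;
	uintptr_t end;

	/* reject zero and non-power-of-two page sizes */
	assert(page_size != 0 && (page_size & mask) == 0);

	end = (addr + len + mask) & ~mask;	/* round the end up */
	r.start = addr & ~mask;			/* round the start down */
	r.length = end - r.start;

	return r;
}
```

With a 512-byte buffer at 0x1234100, a 4 KB page size gives the range 0x1234000 + 0x1000, while a 16 MB page size gives 0x1000000 + 0x1000000. Only the second range is correctly aligned if the buffer actually lives in a 16 MB huge page.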
Changes since v1:
- detect use of huge pages at ibv_fork_init() time by walking through /sys/kernel/mm/hugepages/
- read huge page size from /proc/pid/smaps, which contains the page size of the mapping (thereby enabling support for multiple huge page sizes)
- code is independent of libhugetlbfs now, so huge pages can be provided to the application by any library

Performance:

PPC64 system with eHCA

without patch:
1M memory region     120usec
16M memory region   1970usec

with patch v2:
1M memory region     172usec
16M memory region   2030usec

with patch and 16M huge pages:
1M memory region     110usec
16M memory region    193usec

Signed-off-by: Alexander Schmidt <al...@linux.vnet.ibm.com>
---
 src/memory.c | 137 ---
 1 file changed, 131 insertions(+), 6 deletions(-)

--- libibverbs-1.1.2.orig/src/memory.c
+++ libibverbs-1.1.2/src/memory.c
@@ -40,6 +40,10 @@
 #include <unistd.h>
 #include <stdlib.h>
 #include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <dirent.h>
+#include <limits.h>
 
 #include "ibverbs.h"
 
@@ -68,12 +72,117 @@ struct ibv_mem_node {
 static struct ibv_mem_node *mm_root;
 static pthread_mutex_t mm_mutex = PTHREAD_MUTEX_INITIALIZER;
 static int page_size;
+static int huge_page_enabled;
 static int too_late;
 
+static int is_huge_page_enabled(void)
+{
+	int n, ret = 0;
+	char *bufp;
+	DIR *dir;
+	struct dirent *entry;
+	FILE *file;
+	unsigned long nr_hugepages;
+	char buf[1024];
+
+	dir = opendir("/sys/kernel/mm/hugepages/");
+	if (!dir)
+		return 0;
+
+	while ((entry = readdir(dir))) {
+		if (strncmp(entry->d_name, "hugepages-", 10))
+			continue;
+
+		snprintf(buf, sizeof(buf), "/sys/kernel/mm/hugepages/%s/nr_hugepages",
+			 entry->d_name);
+
+		file = fopen(buf, "r");
+		if (!file)
+			continue;
+
+		bufp = fgets(buf, sizeof(buf), file);
+		fclose(file);
+		if (!bufp)
+			continue;
+
+		n = sscanf(buf, "%lu", &nr_hugepages);
+		if (n < 1)
+			continue;
+
+		if (nr_hugepages) {
+			ret = 1;
+			goto out;
+		}
+	}
+
+out:
+	closedir(dir);
+
+	return ret;
+}
+
+static unsigned long smaps_page_size(FILE *file)
+{
+	int n;
+	unsigned long size = page_size;
+	char buf[1024];
+
+	while (fgets(buf, sizeof(buf), file) != NULL) {
+		if (!strstr(buf, "KernelPageSize:"))
+			continue;
+
+		n = sscanf(buf, "%*s %lu", &size);
+		if (n < 1)
+			continue;
+
+		/* page size is printed in Kb */
+		size = size * 1024;
+
+		break;
+	}
+
+	return size;
+}
+
+static unsigned long get_page_size(void *base)
+{
+	unsigned long ret = page_size;
+	pid_t pid;
+	FILE *file;
+	char buf[1024];
+
+	pid = getpid();
+	snprintf(buf, sizeof(buf), "/proc/%d/smaps", pid);
+
+	file = fopen(buf, "r");
+	if (!file)
+		goto out;
+
+	while (fgets(buf, sizeof(buf), file) != NULL) {
+		int n;
+		uintptr_t range_start, range_end;
+
+		n = sscanf(buf, "%lx-%lx", &range_start, &range_end);
+
+		if (n < 2)
+			continue;
+
+		if ((uintptr_t) base >= range_start && (uintptr_t) base < range_end) {
+			ret = smaps_page_size(file);
+			break;
+		}
+	}
+	fclose(file);
+
+out:
+	return ret;
+}
+
 int ibv_fork_init(void)
 {
-	void *tmp;
+	void *tmp, *tmp_aligned;
 	int ret;
+
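As a standalone illustration of the KernelPageSize lookup that the patch performs on /proc/pid/smaps, the following sketch parses a single smaps line held in memory. The function name parse_kernel_page_size() is hypothetical and is not taken from the patch; the sketch assumes the usual smaps field layout of a key, a decimal value and the unit "kB".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: parse one line of
 * smaps-style text. Returns 1 and stores the page size in bytes if the
 * line is a "KernelPageSize:" entry, otherwise returns 0 and leaves
 * *bytes untouched.
 */
static int parse_kernel_page_size(const char *line, unsigned long *bytes)
{
	static const char key[] = "KernelPageSize:";
	unsigned long kb;

	assert(line != NULL && bytes != NULL);

	/* the key must appear at the start of the line */
	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return 0;

	/* the value follows the key and is given in kB */
	if (sscanf(line + sizeof(key) - 1, "%lu", &kb) != 1)
		return 0;

	*bytes = kb * 1024;

	return 1;
}
```

Passing "KernelPageSize:    16384 kB" stores 16777216 in *bytes, whereas a line such as "Rss: 4 kB" is skipped. Matching the key at the start of the line is a design choice of this sketch; the patch itself uses strstr().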
[ewg] ofa_1_5_kernel 20100531-0200 daily build status
This email was generated automatically, please do not reply

git_url: git://git.openfabrics.org/ofed_1_5/linux-2.6.git
git_branch: ofed_kernel_1_5

Common build parameters:

Passed:
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.21.1
Passed on i686 with linux-2.6.26
Passed on i686 with linux-2.6.24
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.27
Passed on x86_64 with linux-2.6.16.60-0.54.5-smp
Passed on x86_64 with linux-2.6.16.60-0.21-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.18-128.el5
Passed on x86_64 with linux-2.6.18-164.el5
Passed on x86_64 with linux-2.6.18-194.el5
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-93.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.26
Passed on x86_64 with linux-2.6.25
Passed on x86_64 with linux-2.6.27
Passed on x86_64 with linux-2.6.27.19-5-smp
Passed on x86_64 with linux-2.6.9-67.ELsmp
Passed on x86_64 with linux-2.6.9-78.ELsmp
Passed on x86_64 with linux-2.6.9-89.ELsmp
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.19
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.26
Passed on ia64 with linux-2.6.24
Passed on ia64 with linux-2.6.25
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.19

Failed:

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
[ewg] [ANNOUNCE] OFED 1.5.2 rc1 is available
Hi,

OFED 1.5.2-rc1 is available.

Notes:
The tarball is available on:
http://www.openfabrics.org/downloads/OFED/ofed-1.5.2/OFED-1.5.2-rc1.tgz

To get BUILD_ID run ofed_info

Please report any issues in bugzilla https://bugs.openfabrics.org/ for OFED 1.5.2

Vladimir & Tziporet

Supported Platforms and Operating Systems
-----------------------------------------
o CPU architectures:
  - x86_64
  - x86
  - ppc64
  - ia64

o Linux Operating Systems:
  - RedHat EL4 up7      2.6.9-78.ELsmp
  - RedHat EL4 up8      2.6.9-89.ELsmp
  - RedHat EL5 up3      2.6.18-128.el5
  - RedHat EL5 up4      2.6.18-164.el5
  - RedHat EL5 up5      2.6.18-194.el5
  - SLES10 SP2          2.6.16.60-0.21-smp
  - SLES10 SP3          2.6.16.60-0.54-smp
  - SLES11              2.6.27.19-5-default
  - OEL 4 up7           2.6.9-78.ELsmp
  - OEL 4 up8           2.6.9-89.ELsmp
  - CentOS5.3           2.6.18-128.el5
  - CentOS5.4           2.6.18-164.el5
  - Fedora Core12       2.6.31.5-127.fc12 *
  - OpenSuSE 11.2       2.6.31.5-0.1-default *
  - kernel.org          2.6.29, 2.6.30, 2.6.31 and 2.6.32 *

  * Minimal QA for these versions

Main changes from 1.5.1:
========================
1. Updated packages:
   libibverbs-1.1.3-0.8.g4d733f4.tar
   libehca-1.2.2-0.1.g69e1a88.tar.gz
   libnes-1.0.1-0.3.g8d69734.tar.gz
   compat-dapl-1.2.17.tar.gz
   librdmacm-1.0.12.tar.gz
   dapl-2.0.28.tar.gz
   - Management
     infiniband-diags-1.5.6.tar.gz
     libibmad-1.3.5.tar.gz
     libibumad-1.3.5.tar.gz
     opensm-3.3.6.tar.gz
   - MPI
     openmpi-1.4.2-1.src.rpm

2. Added RHEL 6 beta support

Known issues:
=============
librdmacm-1.0.12 compilation fails on RHEL4.x

commit b2f53cd53d470604bc002699ba905ee23892fa17
Author: Eldad Zinger <eld...@mellanox.co.il>
Date:   Sun May 23 11:40:02 2010 +0300

    sdp: new debug function added, minor debug message change.

    Signed-off-by: Eldad Zinger <eld...@mellanox.co.il>

commit 41cc7a8a7805a6ce8ce8028fc8d9742867b0506c
Author: Eldad Zinger <eld...@mellanox.co.il>
Date:   Sun May 30 14:03:43 2010 +0300

    sdp: device removal rewritten for a stability improvement.

    main changes:
    1. device_removal_lock is better used.
    2. sdp_dev is marked NULL in order to prevent new sockets born to the
       removed device.
    3. new timeout functionality used when a reference count was taken for
       the CMA to return, but the CMA won't be invoked because rdma_id was
       destroyed.

    Signed-off-by: Eldad Zinger <eld...@mellanox.co.il>

commit d96f1205fd59992c82fc64b63d3fcaa328976794
Author: Eldad Zinger <eld...@mellanox.co.il>
Date:   Tue May 18 13:35:56 2010 +0300

    sdp: unnecessary local variable removed, 'const' declarations added

    Signed-off-by: Eldad Zinger <eld...@mellanox.co.il>

commit 9ad5e04ed192cf5daae49864cf4da54b89701900
Author: Eldad Zinger <eld...@mellanox.co.il>
Date:   Tue May 18 13:32:27 2010 +0300

    sdp: tx timer is deleted when a socket goes to TCP_CLOSE

    Signed-off-by: Eldad Zinger <eld...@mellanox.co.il>

commit 314569eb0bf4cf8fb40af7b5e82cbc362d001858
Author: Eldad Zinger <eld...@mellanox.co.il>
Date:   Tue May 18 11:48:54 2010 +0300

    sdp: canceled a call to sdp_destroy_work() on send completion with error

    No need to destroy resources after send completion with error.

    Signed-off-by: Eldad Zinger <eld...@mellanox.co.il>

commit ba9dbc39149fb34fab383b190ca03c6d1017a92d
Author: Eldad Zinger <eld...@mellanox.co.il>
Date:   Tue May 18 11:34:28 2010 +0300

    sdp: unnecessary wait-queue removed from sdp_sock structure.

    Signed-off-by: Eldad Zinger <eld...@mellanox.co.il>

commit edda276773fd8f68d374990c2d1e91545ad2a5c0
Author: Eli Cohen <e...@mellanox.co.il>
Date:   Thu May 27 11:06:19 2010 +0300

    Backport: Add missing dev_id sysfs file

    Signed-off-by: Eli Cohen <e...@mellanox.co.il>

commit dcfc4bb33fa2b8f5498fb6e1966af8731a28b6f7
Author: Eli Cohen <e...@mellanox.co.il>
Date:   Thu May 27 09:59:15 2010 +0300

    Backport dev_id to 2.6.9

    Since dev_id does not exist as a member of struct net_device in kernel
    2.6.9, avoid using it in mlx4_en. I will add means to read dev_id in
    subsequent patches.

    Signed-off-by: Eli Cohen <e...@mellanox.co.il>

commit 977323bd115b8737604393fcf1af144fcc205507
Author: Eli Cohen <e...@mellanox.co.il>
Date:   Thu May 27 09:02:02 2010 +0300

    mlx4_en: use net_device dev_id to indicate port number

    Today, there are no means to know which port of a hardware device a
    netdev interface uses. struct net_device contains a field, dev_id, that
    can be used for that. Use this field to save the port number in ConnectX
    that is being used by the net device; port