[OMPI devel] 1.7.5 fails on simple test
$ /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,tcp /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
[vegas12:12724] *** Process received signal ***
[vegas12:12724] Signal: Segmentation fault (11)
[vegas12:12724] Signal code: (128)
[vegas12:12724] Failing at address: (nil)
[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
[vegas12:12724] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
[vegas12:12724] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
[vegas12:12724] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
[vegas12:12724] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
[vegas12:12724] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]
[vegas12:12724] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]
[vegas12:12724] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]
[vegas12:12724] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]
[vegas12:12724] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]
[vegas12:12724] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
[vegas12:12724] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
[vegas12:12724] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
[vegas12:12724] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
[vegas12:12724] *** End of error message ***
[vegas12:12731] *** Process received signal ***
[vegas12:12731] Signal: Segmentation fault (11)
[vegas12:12731] Signal code: (128)
[vegas12:12731] Failing at address: (nil)
[vegas12:12731] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
[vegas12:12731] [ 1] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
[vegas12:12731] [ 2] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
[vegas12:12731] [ 3] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
[vegas12:12731] [ 4] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
[vegas12:12731] [ 5] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]
[vegas12:12731] [ 6] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]
[vegas12:12731] [ 7] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]
[vegas12:12731] [ 8] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]
[vegas12:12731] [ 9] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]
[vegas12:12731] [10] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
[vegas12:12731] [11] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
[vegas12:12731] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
[vegas12:12731] [13] /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
[vegas12:12731] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 12724 on node vegas12 exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Re: [OMPI devel] Compilation error: 'OMPI_MPIHANDLES_DLL_PREFIX' undeclared
It is a compilation flag passed through the Makefile (when Automake is used). I guess you will have to modify the CMake files to pass it as well. You need it for the compilation of ompi/debuggers/ompi_debuggers.c, and it should point to the location of the installed libraries.

George.

On Feb 10, 2014, at 03:36 , Irvanda Kurniadi wrote:

> Hi,
>
> I'm porting OpenMPI to L4/fiasco. I found this error message while
> compiling OpenMPI:
> error: ‘OMPI_MPIHANDLES_DLL_PREFIX’ undeclared (first use in this function)
> error: ‘OMPI_MSGQ_DLL_PREFIX’ undeclared (first use in this function)
>
> I found the OMPI_MPIHANDLES_DLL_PREFIX in CMakeLists.txt like below:
> SET_TARGET_PROPERTIES(libmpi PROPERTIES COMPILE_FLAGS
>     "${OMPI_C_DEF_PRE}OMPI_MPIHANDLES_DLL_PREFIX=libompi_dbg_mpihandles
>      ${OMPI_C_DEF_PRE}OMPI_MSGQ_DLL_PREFIX=libompi_dbg_msgq")
>
> I don't know how to use this CMakeLists.txt in L4/fiasco. Or maybe this
> problem can be fixed without CMakeLists.txt. Anybody knows how to overcome
> this problem?
>
> regards,
> Irvanda
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
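As a concrete sketch of what that means when building by hand (a hand-rolled compile line, not part of the Open MPI build system; the define values come from the CMakeLists.txt snippet quoted above):

    cc -DOMPI_MPIHANDLES_DLL_PREFIX=libompi_dbg_mpihandles \
       -DOMPI_MSGQ_DLL_PREFIX=libompi_dbg_msgq \
       -c ompi/debuggers/ompi_debuggers.c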
Re: [OMPI devel] Compilation error: 'OMPI_MPIHANDLES_DLL_PREFIX' undeclared
Note that we have removed all CMake support from Open MPI starting with v1.7. Is there a reason you're using the CMake support instead of the Autotools support? The CMake support only existed for the sake of MS Windows builds, and the Windows support has since been removed (which is why the CMake support was removed as well).

On Feb 9, 2014, at 9:36 PM, Irvanda Kurniadi wrote:

> Hi,
>
> I'm porting OpenMPI to L4/fiasco. I found this error message while
> compiling OpenMPI:
> error: ‘OMPI_MPIHANDLES_DLL_PREFIX’ undeclared (first use in this function)
> error: ‘OMPI_MSGQ_DLL_PREFIX’ undeclared (first use in this function)

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
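For reference, the supported Autotools flow is just (the install prefix here is only an example):

    $ ./configure --prefix=/opt/openmpi
    $ make all install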
Re: [OMPI devel] 1.7.5 fails on simple test
I have seen this same issue although my core dump is a little bit different. I am running with tcp,self. The first entry in the list of BTLs is garbage, but then there is tcp and self in the list. Strange. This is my core dump. Line 208 in bml_r2.c is where I get the SEGV.

Program terminated with signal 11, Segmentation fault.
#0  0x7fb6dec981d0 in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64
(gdb) where
#0  0x7fb6dec981d0 in ?? ()
#1  <signal handler called>
#2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
    at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
#4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
    at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
#5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, requested=0, provided=0x7fff80487cc8)
    at ../../ompi/runtime/ompi_mpi_init.c:776
#6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) at pinit.c:84
#7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at MPI_Isend_ator_c.c:143
(gdb)
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, reachable=0x7fff80487b40)
    at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
208         rc = btl->btl_add_procs(btl, n_new_procs, new_procs, btl_endpoints, reachable);
(gdb) print *btl
$1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984,
  btl_rndv_eager_limit = 140423556235000, btl_max_send_size = 140423556235000,
  btl_rdma_pipeline_send_length = 140423556235016,
  btl_rdma_pipeline_frag_size = 140423556235016,
  btl_min_rdma_pipeline_size = 140423556235032,
  btl_exclusivity = 3895459608, btl_latency = 32694,
  btl_bandwidth = 3895459624, btl_flags = 32694,
  btl_seg_size = 140423556235048,
  btl_add_procs = 0x7fb6e82fff38, btl_del_procs = 0x7fb6e82fff38,
  btl_register = 0x7fb6e82fff48, btl_finalize = 0x7fb6e82fff48,
  btl_alloc = 0x7fb6e82fff58, btl_free = 0x7fb6e82fff58,
  btl_prepare_src = 0x7fb6e82fff68, btl_prepare_dst = 0x7fb6e82fff68,
  btl_send = 0x7fb6e82fff78, btl_sendi = 0x7fb6e82fff78,
  btl_put = 0x7fb6e82fff88, btl_get = 0x7fb6e82fff88,
  btl_dump = 0x7fb6e82fff98, btl_mpool = 0x7fb6e82fff98,
  btl_register_error = 0x7fb6e82fffa8, btl_ft_event = 0x7fb6e82fffa8}
(gdb)

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
Sent: Monday, February 10, 2014 4:23 AM
To: Open MPI Developers
Subject: [OMPI devel] 1.7.5 fails on simple test

$ /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun -np 8 -mca pml ob1 -mca btl self,tcp /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
[OMPI devel] Reviewing MPI_Dims_create
Hello,

I noticed some effort in improving the scalability of
MPI_Dims_create(int nnodes, int ndims, int dims[]).
Unfortunately there were some issues with the first attempt (r30539 and r30540), which were reverted.

So I decided to give it a short review based on r30606:
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606

1.) freeprocs is initialized to nnodes, and the subsequent divisions of freeprocs all have positive integers as divisors. So IMHO it would make more sense to check nnodes > 0 in the MPI_PARAM_CHECK section at the beginning instead of the following (see patch 0001):

 99     if (freeprocs < 1) {
100        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
101                                      FUNC_NAME);
102     }

2.) I rewrote the algorithm in getprimes(int num, int *nprimes, int **pprimes) to stop at sqrt(n), which makes mathematically more sense (as the largest prime factor of any number n cannot exceed \sqrt{n}) - and should produce the right result. ;) (see patch 0002)
Here the improvements:

module load mpi/openmpi/trunk-gnu.4.7.3
$ ./mpi-dims-old 1000000
time used for MPI_Dims_create(1000000, 3, {}): 8.104007
module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
$ ./mpi-dims-new 1000000
time used for MPI_Dims_create(1000000, 3, {}): 0.060400

3.) Memory allocation for the list of prime numbers may be reduced by up to a factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
\pi(x) < x/ln(x) * (1 + 1.2762/ln(x)) for x > 1
Unfortunately this saves us only 1.6 MB per process for 1mio nodes as reported by tcmalloc/pprof on a test program - but it may sum up with fatter nodes. :P

$ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
(pprof) top
Total: -1.6 MB
     0.3  -18.8%  -18.8%      0.3  -18.8%  getprimes2
     0.0   -0.0%  -18.8%     -1.6  100.0%  __libc_start_main
     0.0   -0.0%  -18.8%     -1.6  100.0%  main
    -1.9  118.8%  100.0%     -1.9  118.8%  getprimes

Find the patch for it attached as 0003.

If there are no issues I would like to commit this to trunk for further testing (+cmr for 1.7.5?) at the end of this week.

Best regards
Christoph

[1] http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html

-- 
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer

From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
From: Christoph Niethammer
Date: Mon, 10 Feb 2014 16:44:03 +0100
Subject: [PATCH 1/3] Move parameter check into appropriate code section at
 the begin.

---
 ompi/mpi/c/dims_create.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index d2c3858..3d0792f 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
             return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
                                            MPI_ERR_DIMS, FUNC_NAME);
         }
+
+        if (1 > nnodes) {
+            return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
+                                           MPI_ERR_DIMS, FUNC_NAME);
+        }
     }
 
     /* Get # of free-to-be-assigned processes and # of free dimensions */
@@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
                                       FUNC_NAME);
     }
 
-    if (freeprocs < 1) {
-        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
-                                      FUNC_NAME);
-    }
-    else if (freeprocs == 1) {
+    if (freeprocs == 1) {
         for (i = 0; i < ndims; ++i, ++dims) {
             if (*dims == 0) {
                 *dims = 1;
-- 
1.8.3.2

From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001
From: Christoph Niethammer
Date: Mon, 10 Feb 2014 18:50:51 +0100
Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact
 that the largest prime factor of any number n cannot exceed \sqrt{n}.

---
 ompi/mpi/c/dims_create.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index 3d0792f..1c1c381 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2004-2005 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
+ * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart,
  *                         University of Stuttgart.  All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
@@ -20,6 +20,8 @@
 #in
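To make the sqrt(n) idea of patch 0002 concrete, here is a tiny standalone sketch in C (illustrative only; the function name and the allocation bound are mine, not the patch's):

    /* Collect all primes <= sqrt(num) by trial division against the
     * primes found so far.  bound/2 + 2 is a loose upper bound on how
     * many primes fit below bound (2 is the only even one). */
    #include <math.h>
    #include <stdlib.h>

    static int primes_up_to_sqrt(int num, int *nprimes, int **pprimes)
    {
        int bound = (int) sqrt((double) num);
        int *p = (int *) malloc((bound / 2 + 2) * sizeof(int));
        int n = 0, i, j;
        if (NULL == p) {
            return -1;
        }
        for (i = 2; i <= bound; ++i) {
            int is_prime = 1;
            for (j = 0; j < n && p[j] * p[j] <= i; ++j) {
                if (0 == i % p[j]) {
                    is_prime = 0;
                    break;
                }
            }
            if (is_prime) {
                p[n++] = i;
            }
        }
        *nprimes = n;
        *pprimes = p;
        return 0;
    }

For num = 1000000 this only examines candidates up to 1000, which is where the speedup quoted above comes from.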
Re: [OMPI devel] Reviewing MPI_Dims_create
Nice! Can you verify that it passes the IBM test? I didn't look closely, and to be honest, I'm not sure why the previous improvement broke the IBM test, because it hypothetically did what you mentioned (stopped at sqrt(freenodes)).

I think patch 1 is a no-brainer. I'm not sure about #2 because I'm not sure how it's different from the previous one, nor did I have time to investigate why the previous one broke the IBM test. #3 seems like a good idea, too; I didn't check the paper, but I assume it's some kind of proof about the upper limit on the number of primes in a given range.

Two questions:

1. Should we cache generated prime numbers? (if so, it'll have to be done in a thread-safe way)

2. Should we just generate prime numbers and hard-code them into a table that is compiled into the code? We would only need primes up to the sqrt of 2 billion (i.e., signed int), right? I don't know how many that is -- if it's small enough, perhaps this is the easiest solution.

On Feb 10, 2014, at 1:30 PM, Christoph Niethammer wrote:

> Hello,
>
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[]).
> Unfortunately there were some issues with the first attempt (r30539 and
> r30540), which were reverted.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
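On question 2, a throwaway sieve answers the "how many primes is that" question (a sketch, not proposed code; LIMIT is just above sqrt(2^31 - 1)):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        enum { LIMIT = 46341 };             /* just above sqrt(2^31 - 1) */
        static char composite[LIMIT + 1];
        int i, j, count = 0;
        memset(composite, 0, sizeof(composite));
        for (i = 2; i <= LIMIT; ++i) {
            if (!composite[i]) {
                ++count;                    /* i is prime */
                for (j = 2 * i; j <= LIMIT; j += i) {
                    composite[j] = 1;
                }
            }
        }
        printf("%d primes <= %d\n", count, LIMIT);
        return 0;
    }

The count comes out at a few thousand, so a compiled-in table would indeed be small.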
Re: [OMPI devel] 1.7.5 fails on simple test
I have tracked this down. There is a missing commit that affects ompi_mpi_init.c, causing it to initialize the bml twice.

Ralph, can you apply r30310 to 1.7?

Thanks,
Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Monday, February 10, 2014 12:29 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] 1.7.5 fails on simple test

I have seen this same issue although my core dump is a little bit different. I am running with tcp,self. The first entry in the list of BTLs is garbage, but then there is tcp and self in the list. Strange. This is my core dump. Line 208 in bml_r2.c is where I get the SEGV.
Re: [OMPI devel] 1.7.5 fails on simple test
Done - thanks Rolf!!

On Feb 10, 2014, at 1:13 PM, Rolf vandeVaart wrote:

> I have tracked this down. There is a missing commit that affects
> ompi_mpi_init.c causing it to initialize bml twice.
>
> Ralph, can you apply r30310 to 1.7?
>
> Thanks,
> Rolf
Re: [OMPI devel] 1.7.5 fails on simple test
Ralph,

If you give me a heads-up when this makes it into a tarball, I will retest my failing ppc and sparc platforms.

-Paul

On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart wrote:

> I have tracked this down. There is a missing commit that affects
> ompi_mpi_init.c causing it to initialize bml twice.
>
> Ralph, can you apply r30310 to 1.7?
>
> Thanks,
> Rolf
Re: [OMPI devel] 1.7.5 fails on simple test
Generating it now - sorry for my lack of response; my OMPI email was down for some reason. I can now receive it, but still haven't gotten the backlog from the down period.

On Feb 10, 2014, at 1:23 PM, Paul Hargrove wrote:

> Ralph,
>
> If you give me a heads-up when this makes it into a tarball, I will
> retest my failing ppc and sparc platforms.
>
> -Paul
Re: [OMPI devel] 1.7.5 fails on simple test
Tarball is now posted

On Feb 10, 2014, at 1:31 PM, Ralph Castain wrote:

> Generating it now - sorry for my lack of response; my OMPI email was
> down for some reason. I can now receive it, but still haven't gotten
> the backlog from the down period.
Re: [OMPI devel] Speedup for MPI_Dims_create()
Jeff-

I've seen that you've reverted the patch as it was faulty. Sorry about that! I've attached a new patch, which applies against the current trunk.

The problem with the last patch was that it didn't catch a special case: of all prime factors of n, there may be at most one larger than sqrt(n). The old patch assumed that there was none. I've included a comment in the source code so that this becomes clear for later readers.

The attached patch is more complicated than the original code, as we now need to calculate the prime numbers and the number of their occurrences in the integer factorization simultaneously. We can't split both (as in the trunk) anymore, as the last prime might only be discovered during the original getfactors().

I've tested this code back to back with the original code with 1...1 nodes and 1...6 dimensions, just to be on the sure side this time.

Best
-Andreas

On 19:32 Mon 03 Feb , Jeff Squyres (jsquyres) wrote:
> Andreas --
>
> I added the sqrt() change, which is the most important change, and then
> did a 2nd commit with the whitespace cleanup. The sqrt change will
> likely be in 1.7.5. I credited you in the commit log; you'll likely
> also get credited in NEWS.
>
> Thank you for the patch!
>
> On Dec 19, 2013, at 9:37 AM, Andreas Schäfer wrote:
>
> > Dear all,
> >
> > please find attached a (trivial) patch to MPI_Dims_create(). When
> > computing the prime factors of nnodes, it is sufficient to check for
> > primes less or equal to sqrt(nnodes).
> >
> > This was not so much of a problem in the past, but now that Tier 0
> > systems are capable of running O(10^6) MPI processes, the difference
> > in execution time is on the order of seconds (e.g. 8.86s vs. 0.04s on
> > my notebook, with nnproc = 10^6).
> >
> > Best
> > -Andreas
> >
> > PS: oh, and the patch removes some trailing whitespace. Yuck. :-)

-- 
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!

Index: ompi/mpi/c/dims_create.c
===================================================================
--- ompi/mpi/c/dims_create.c	(revision 30654)
+++ ompi/mpi/c/dims_create.c	(working copy)
@@ -10,7 +10,9 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
  * Copyright (c) 2012      Los Alamos National Security, LLC.  All rights
- *                         reserved.
+ *                         reserved.
+ * Copyright (c) 2014      Friedrich-Alexander-Universitaet Erlangen-Nuernberg,
+ *                         All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -20,6 +22,8 @@
 
 #include "ompi_config.h"
 
+#include <math.h>
+
 #include "ompi/mpi/c/bindings.h"
 #include "ompi/runtime/params.h"
 #include "ompi/communicator/communicator.h"
@@ -37,8 +41,7 @@
 
 /* static functions */
 static int assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims);
-static int getfactors(int num, int nprime, int *primes, int **pcounts);
-static int getprimes(int num, int *pnprime, int **pprimes);
+static int getprimefactors(int num, int *nfactors, int **pprimes, int **pcounts);
 
 /*
@@ -50,7 +53,7 @@
     int i;
     int freeprocs;
     int freedims;
-    int nprimes;
+    int nfactors;
     int *primes;
     int *factors;
     int *procs;
@@ -108,20 +111,14 @@
         return MPI_SUCCESS;
     }
 
-    /* Compute the relevant prime numbers for factoring */
-    if (MPI_SUCCESS != (err = getprimes(freeprocs, &nprimes, &primes))) {
-        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
-                                      FUNC_NAME);
-    }
-
     /* Factor the number of free processes */
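To spell out that special case, a standalone sketch (illustrative only, not the attached patch; the caller must size the output arrays, and 10 slots suffice for the distinct prime factors of any 32-bit value):

    /* Trial-divide n by 2, 3, 4, ... up to sqrt(n); composite divisors
     * never hit because their prime factors were already divided out.
     * Whatever remains above 1 at the end is prime -- the at-most-one
     * factor larger than sqrt(n). */
    static int factorize(int n, int factors[], int counts[])
    {
        int nf = 0, d;
        for (d = 2; (long long) d * d <= n; ++d) {
            if (0 == n % d) {
                factors[nf] = d;
                counts[nf] = 0;
                while (0 == n % d) {
                    n /= d;
                    ++counts[nf];
                }
                ++nf;
            }
        }
        if (n > 1) {            /* e.g. n = 2 * 7919 ends up here */
            factors[nf] = n;
            counts[nf] = 1;
            ++nf;
        }
        return nf;              /* number of distinct prime factors */
    }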
Re: [OMPI devel] Reviewing MPI_Dims_create
Christoph- your patch has the same problem as my original patch: there may indeed be a prime factor p of n with p > sqrt(n). (For example, n = 10 has the prime factor 5 > sqrt(10) ≈ 3.16.) What's important is that there may be at most one such factor. I've submitted an updated patch (see my previous mail) which catches this special case.

Best
-Andreas

On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> Hello,
>
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[]).
> Unfortunately there were some issues with the first attempt (r30539 and
> r30540), which were reverted.
Re: [OMPI devel] 1.7.5 fails on simple test
The fastest of my systems that failed over the weekend (a ppc64) has completed tests successfully. I will report on the ppc32 and SPARC results when they have all passed or failed.

-Paul

On Mon, Feb 10, 2014 at 1:52 PM, Ralph Castain wrote:

> Tarball is now posted
0x7fb6e82fff88 >> , >> >> btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98 >> , >> >> btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 >> , >> >> btl_ft_event = 0x7fb6e82fffa8 } >> >> (gdb) >> >> >> >> >> >> *From:* devel [mailto:devel-boun...@open-mpi.org] >> *On Behalf Of *Mike Dubman >> *Sent:* Monday, February 10, 2014 4:23 AM >> *To:* Open MPI Developers >> *Subject:* [OMPI devel] 1.7.5 fails on simple test >> >> >> >> >> >> >> >> *$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun >> -np 8 -mca pml ob1 -mca btl self,tcp >> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi* >> >> *[vegas12:12724] *** Process received signal >> >> *[vegas12:12724] Signal: Segmentation fault (11)* >> >> *[vegas12:12724] Signal code: (128)* >> >> *[vegas12:12724] Failing at address: (nil)* >> >> *[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]* >> >> *[vegas12:12724] [ 1] >> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]* >> >> *[vegas12:12724] [ 2] >> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778
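Rolf's diagnosis above -- the BML initialized twice, leaving a garbage entry at the head of the BTL list -- is the kind of bug an idempotence guard makes harmless. For illustration only, a minimal standalone sketch; every name in it is made up for the example, and the actual fix is the missing r30310 commit, not this code:

    #include <stdbool.h>
    #include <stdio.h>

    #define OMPI_SUCCESS 0  /* stand-in for the real constant */

    static bool bml_inited = false;

    /* Sketch of an idempotence guard: a second call becomes a no-op
     * instead of re-running component selection over live state and
     * corrupting the BTL list. */
    static int bml_base_init_once(void)
    {
        if (bml_inited) {
            return OMPI_SUCCESS;
        }
        bml_inited = true;
        puts("selecting and initializing BML/BTL components");
        return OMPI_SUCCESS;
    }

    int main(void)
    {
        bml_base_init_once();
        bml_base_init_once();  /* the double call from the bug, now harmless */
        return 0;
    }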
Re: [OMPI devel] RFC: optimize probe in ob1
Nathan,

While this sounds like an optimization for highly specific application behavior, it is justifiable under some usage scenarios. I have several issues with the patch. Here are the minor ones:

1. It makes modifications that are not necessary to the patch itself (as an example, removal of the static keyword from the mca_pml_ob1_comm_proc_t class instance).

2. Moving add_fragment_to_unexpected changes the meaning of the code.

3. If this change gets pushed into the trunk, the only reason for the existence of last_probed disappears. Thus, the variable should disappear as well.

4. The last part of the patch is not related to this topic and should be pushed separately.

Now the major one. With this change you alter the most performance-critical piece of code, by adding a non-negligible number of potential cache misses (looking for the number of elements, adding/removing an element from a queue). This deserves a careful evaluation and consideration, not only for the less likely usage pattern you describe but for the more mainstream uses.

George.

On Feb 7, 2014, at 23:01, Nathan Hjelm wrote:

> What: The current probe algorithm in ob1 is linear with respect to the
> number of processes in the job. I wish to change the algorithm to be
> linear in the number of processes with unexpected messages. To do this I
> added an additional opal_list_t to the ob1 communicator and made the ob1
> process a list_item_t. When an unexpected message comes in on a proc it
> is added to that proc's unexpected message queue and the proc is added
> to the communicator's list of procs with unexpected messages
> (unexpected_procs) if it isn't already on that list. When matching a
> probe request this list is used to determine which procs to look at to
> find an unexpected message. The new list is protected by the matching
> lock so no extra locking is needed.
>
> Why: I have a benchmark that makes heavy use of MPI_Iprobe in one of its
> phases. I discovered that the primary reason this benchmark was running
> slow with Open MPI is the probe algorithm.
>
> When: This is another simple optimization. It only affects the
> unexpected message path and will speed up probe requests. This is
> intended to go into 1.7.5. Setting the timeout to next Tuesday (which
> gives me time to verify the improvement at scale -- 131,000 PEs).
>
> See the attached patch.
>
> -Nathan
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
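As a toy model of the data structure the RFC describes -- names and layouts here are illustrative assumptions, not the real ob1 code, and the matching-lock protection the real patch relies on is omitted:

    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-ins for opal_list_t and the ob1 structures. */
    typedef struct proc {
        struct proc *next_unexpected;  /* link on comm->unexpected_procs */
        bool         on_unexpected_list;
        int          n_unexpected;     /* stand-in for the per-proc frag queue */
        int          rank;
    } proc_t;

    typedef struct {
        proc_t *unexpected_procs;      /* only procs holding unexpected msgs */
        proc_t *procs;
        int     nprocs;
    } comm_t;

    /* Unexpected fragment arrives: queue it on the proc, and enlist the
     * proc on the communicator's unexpected list exactly once. Removal
     * when a proc's queue drains is omitted for brevity. */
    static void add_unexpected(comm_t *comm, proc_t *p)
    {
        p->n_unexpected++;
        if (!p->on_unexpected_list) {
            p->on_unexpected_list = true;
            p->next_unexpected = comm->unexpected_procs;
            comm->unexpected_procs = p;
        }
    }

    /* Probe for MPI_ANY_SOURCE: walk only the procs that actually hold
     * unexpected messages. */
    static proc_t *probe_any_source(comm_t *comm)
    {
        for (proc_t *p = comm->unexpected_procs; p; p = p->next_unexpected) {
            if (p->n_unexpected > 0) {
                return p;
            }
        }
        return NULL;
    }

    int main(void)
    {
        comm_t comm = { NULL, NULL, 0 };
        proc_t p = { NULL, false, 0, 42 };

        add_unexpected(&comm, &p);
        add_unexpected(&comm, &p);  /* enlisted only once */
        proc_t *hit = probe_any_source(&comm);
        printf("probe matched rank %d\n", hit ? hit->rank : -1);
        return 0;
    }

The point of the design shows in probe_any_source(): the walk is over procs that actually hold unexpected messages, so its cost no longer grows with the size of the communicator.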
Re: [OMPI devel] RFC: optimize probe in ob1
On Tue, Feb 11, 2014 at 12:29:57AM +0100, George Bosilca wrote:
> Nathan,
>
> While this sounds like an optimization for highly specific application
> behavior, it is justifiable under some usage scenarios. I have several issues
> with the patch. Here are the minor ones:
>
> 1. It makes modifications that are not necessary to the patch itself (as an
> example, removal of the static keyword from the mca_pml_ob1_comm_proc_t class
> instance).

Yeah. Not really part of the RFC. I should have removed it from the patch. That static modifier appears to be meaningless in that context.

> 2. Moving add_fragment_to_unexpected changes the meaning of the code.

The location looks wrong to me. A PERUSE receive event may be generated multiple times the way it was before. It doesn't matter much anymore, though, as PERUSE is on its way out.

> 3. If this change gets pushed into the trunk, the only reason for the
> existence of last_probed disappears. Thus, the variable should disappear as
> well.

I agree. That variable should go away. I will remove it from my branch now.

> 4. The last part of the patch is not related to this topic and should be
> pushed separately.

Bah. That shouldn't have been there either. That is a separate issue I can fix in another commit.

> Now the major one. With this change you alter the most performance-critical
> piece of code, by adding a non-negligible number of potential cache misses
> (looking for the number of elements, adding/removing an element from a
> queue). This deserves a careful evaluation and consideration, not only for
> the less likely usage pattern you describe but for the more mainstream uses.

I agree that this should be reviewed carefully. A majority of the changes are in the unexpected message path and not in the critical path, but due to the nature of cache misses it may still have an impact. I verified there was no impact on one system using vader and a ping-pong benchmark. I still need to verify there is no impact on message rate both on and off node, as well as verify there is no impact on other architectures (AMD, for example, is very sensitive to changes outside the critical path).

Thanks for your comments, George!

-Nathan
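For reference, a minimal ping-pong latency check of the kind Nathan mentions -- a generic sketch, not his actual benchmark:

    #include <mpi.h>
    #include <stdio.h>

    /* Two ranks bounce a single byte back and forth; rank 0 reports the
     * average half round-trip time in microseconds. */
    int main(int argc, char **argv)
    {
        const int iters = 100000;
        char byte = 0;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (0 == rank) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (1 == rank) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (0 == rank) {
            printf("half round trip: %.3f us\n",
                   (MPI_Wtime() - start) / iters / 2.0 * 1e6);
        }
        MPI_Finalize();
        return 0;
    }

Built against Open MPI and run with something like "mpirun -np 2 -mca pml ob1 -mca btl vader,self ./pingpong", comparing the reported latency before and after the patch would expose a critical-path regression.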
[OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset
WHAT: On trunk, force MPI_Count/MPI_Offset to be 32 bits when building in 32-bit mode (they are currently 64 bits, even in a 32-bit build). On v1.7, leave the sizes at 64 bits (for ABI reasons), but put error checking in the MPI API layer to ensure we won't over/underflow 32 bits.

WHY: See ticket #4205 (https://svn.open-mpi.org/trac/ompi/ticket/4205)

WHERE: On trunk, this can be solved entirely in configury. In v1.7/v1.8, make changes in the MPI API layer (e.g., check MPI_Send to ensure (count * size_of_datatype) < 2 billion).

TIMEOUT: I'll tentatively say next Tuesday's teleconf, Feb 18, 2014, but it can be pushed back -- there's no real rush; this isn't a hot issue (but it is wrong and should be fixed).

MORE DETAIL:

I noticed that MPI_Get_elements_x() and MPI_Type_size_x() were giving wrong answers when compiled in 32-bit mode on a 64-bit machine. This is because in that build:

- size_t: 4 bytes
- ptrdiff_t: 4 bytes
- MPI_Aint: 4 bytes
- MPI_Offset: 8 bytes
- MPI_Count: 8 bytes

Some data points:

1. MPI-3 says that MPI_Count must be big enough to hold both an MPI_Aint and an MPI_Offset.

2. The entire PML/BML/BTL/convertor infrastructure uses size_t as its underlying computation type.

3. The _x tests were failing in 32-bit builds because they take (count,datatype) input that intentionally results in a number of bytes larger than 2 billion, assign that value to a size_t (which is 32 bits), cause an overflow, and therefore get the wrong answer.

To solve this:

- On the trunk, we can just not allow MPI_Count (and therefore MPI_Offset) to be larger than size_t. This means that on 32-bit builds -- on both 32- and 64-bit systems -- sizeof(MPI_Aint) == sizeof(MPI_Offset) == sizeof(MPI_Count) == 4. There is a patch for this on #4205.

- Because of ABI issues, we cannot change the size of MPI_Count/MPI_Offset on v1.7, so we can just check for over/underflow in the MPI API. For example, we can check that (count * size_of_datatype) < 2 billion (other checks will also be necessary; this is just an example). I have no patch for this yet.

As a side effect, this means that -- for 32-bit builds -- we will not support large filesystems well (e.g., filesystems with 64-bit offsets). BlueGene is an example of such a system (not that OMPI supports BlueGene, but...). Specifically: for 32-bit builds, we'll only allow MPI_Offset to be 32 bits. I don't think this is a major issue, because 32-bit builds are not a huge issue for the OMPI community, but I raise the point in the spirit of full disclosure. Fixing it to allow a 32-bit MPI_Aint but a 64-bit MPI_Offset and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor infrastructure to use something other than size_t, and I have zero desire to do that! (please, no OMPI vendor reveal that they're going to seriously build giant 32-bit systems...)

Also, while investigating this issue, I discovered that the configury for determining the Fortran MPI_ADDRESS_KIND, MPI_OFFSET_KIND, and MPI_COUNT_KIND values was unrelated to the C types that we discovered for these concepts. The patch on #4205 fixes this issue as well -- the Fortran MPI_*_KIND values are now directly correlated with the C types that were discovered.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
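A sketch of what such an API-layer guard could look like -- hypothetical names, not the (yet unwritten) v1.7 patch; it promotes the product to 64 bits before comparing against the 2^31 - 1 limit:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Returns nonzero if count * dtype_size fits in the range a 32-bit
     * size_t-based infrastructure can handle safely. */
    static int transfer_size_fits(int count, size_t dtype_size)
    {
        /* Multiply in 64 bits so the check itself cannot overflow. */
        uint64_t total = (uint64_t)count * (uint64_t)dtype_size;
        return total <= (uint64_t)INT32_MAX;
    }

    int main(void)
    {
        /* 600M 4-byte elements = 2.4 GB: must be rejected in a 32-bit build. */
        printf("%d\n", transfer_size_fits(600000000, 4)); /* 0: too big */
        printf("%d\n", transfer_size_fits(1000, 4));      /* 1: fine    */
        return 0;
    }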
Re: [OMPI devel] Reviewing MPI_Dims_create
Hello,

If you mean the current version in the ompi-tests/ibm svn repository, I can confirm that it passes the topology/dimscreate test without errors. :)

The difference between the patches is as follows: the patch from Andreas only generated a table of prime numbers up to sqrt(freeprocs), while my patch still produces prime numbers up to freeprocs. And for factoring we really need all factors up to freeprocs. The standard sqrt optimization was just introduced in the wrong place. :)

You are right with #3: it's a better approximation for the upper bound, and the proof is something to be read under the Christmas tree. ;) I just have to rethink whether the ceil() is necessary in the code, as I am not sure about rounding issues in the floating-point calculations here... :P

Regarding your questions:

1.) I don't think we have to cache prime numbers, as MPI_Dims_create will not be used frequently for factorization. If anybody needs faster factorization he would use his own -- even more optimized -- code. If you find some free time beside Open MPI, go out for some harder problems at http://projecteuler.net. But please don't get frustrated by the assembler solutions. ;)

2.) Interesting idea: using the approximation from the cited paper, we should only need around 400 MB to store all primes in the int32 range. Potential for applying compression techniques still present. ^^

Regards
Christoph

--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer

----- Original Message -----
From: "Jeff Squyres (jsquyres)"
To: "Open MPI Developers"
Sent: Monday, February 10, 2014 20:12:08
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

Nice! Can you verify that it passes the ibm test?

I didn't look closely, and to be honest, I'm not sure why the previous improvement broke the IBM test, because it hypothetically did what you mentioned (stopped at sqrt(freenodes)). I think patch 1 is a no-brainer. I'm not sure about #2 because I'm not sure how it's different from the previous one, nor did I have time to investigate why the previous one broke the IBM test. #3 seems like a good idea, too; I didn't check the paper, but I assume it's some kind of proof about the upper limit on the number of primes in a given range.

Two questions:

1. Should we cache generated prime numbers? (if so, it'll have to be done in a thread-safe way)

2. Should we just generate prime numbers and hard-code them into a table that is compiled into the code? We would only need primes up to the sqrt of 2 billion (i.e., signed int), right? I don't know how many that is -- if it's small enough, perhaps this is the easiest solution.

On Feb 10, 2014, at 1:30 PM, Christoph Niethammer wrote:

> Hello,
>
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and
> r30540) which were reverted.
>
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
>
> 1.) freeprocs is initialized to be nnodes and the subsequent divisions of
> freeprocs have all positive integers as divisor.
> So IMHO it would make more sense to check if nnodes > 0 in the
> MPI_PARAM_CHECK section at the beginning instead of the following (see patch
> 0001):
>
> 99    if (freeprocs < 1) {
> 100       return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                     FUNC_NAME);
> 102    }
>
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int
> *nprimes, int **pprimes), which makes more sense mathematically (as the
> largest prime factor of any number n cannot exceed \sqrt{n}) -- and should
> produce the right result. ;) (see patch 0002)
> Here the improvements:
>
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 100
> time used for MPI_Dims_create(100, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 100
> time used for MPI_Dims_create(100, 3, {}): 0.060400
>
> 3.) Memory allocation for the list of prime numbers may be reduced by up to
> a factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/ln(x) * (1 + 1.2762/ln(x)) for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1mio nodes as
> reported by tcmalloc/pprof on a test program -- but it may sum up with
> fatter nodes. :P
>
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
>      0.3  -18.8%  -18.8%      0.3  -18.8% getprimes2
>      0.0   -0.0%  -18.8%     -1.6  100.0% __libc_start_main
>      0.0   -0.0%  -18.8%     -1.6  100.0% main
>     -1.9  118.8%  100.0%     -1.9  118.8% getprimes
>
> Find attached patch for it in 0003.
>
> If there are no issues I would like to commit this to trunk for further
> testing (+cmr for 1.7.5?) end of this week.
>
> Best regards
> Christoph
>
> [1] http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
Re: [OMPI devel] Reviewing MPI_Dims_create
On Feb 10, 2014, at 7:22 PM, Christoph Niethammer wrote: > 2.) Interesting idea: Using the approximation from the cited paper we should > only need around 400 MB to store all primes in the int32 range. Potential for > applying compression techniques still present. ^^ Per Andreas' last mail, we only need primes up to sqrt(2B) + 1 more. That *has* to be less than 400MB... right? sqrt(2B) = 46340. So the upper limit on the size required to hold all the primes from 2...46340 is 46340*sizeof(int) = 185,360 bytes (plus one more, per Andreas, so 185,364). This is all SWAGing, but I'm assuming the actual number must be *far* less than that... -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
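To put a number on that SWAG, a throwaway sieve of Eratosthenes (plain C, not OMPI code) counts the primes up to 46340:

    #include <stdbool.h>
    #include <stdio.h>

    #define LIMIT 46340  /* ~ sqrt(2 billion) */

    int main(void)
    {
        static bool composite[LIMIT + 1];
        int count = 0;

        for (int i = 2; i <= LIMIT; i++) {
            if (composite[i]) {
                continue;
            }
            count++;
            /* Mark all multiples of this prime, starting at i*i. */
            for (long j = (long)i * i; j <= LIMIT; j += i) {
                composite[j] = true;
            }
        }
        printf("%d primes <= %d, table of ints: %zu bytes\n",
               count, LIMIT, count * sizeof(int));
        return 0;
    }

The count comes out in the low thousands, so the hard-coded table would be on the order of 19 kB -- indeed far less than the upper bound.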
Re: [OMPI devel] Speedup for MPI_Dims_create()
Cool. See the other thread, where I'm wondering if we shouldn't just pre-generate all the primes, hard-code them into a table, and be done with this issue.

On Feb 10, 2014, at 5:19 PM, Andreas Schäfer wrote:

> Jeff-
>
> I've seen that you've reverted the patch as it was faulty. Sorry about
> that! I've attached a new patch, which applies against the current
> trunk. The problem with the last patch was that it didn't catch a
> special case: of all prime factors of n, there may be at most one
> larger than sqrt(n). The old patch assumed that there was none. I've
> included a comment in the source code so that this becomes clear for
> later readers.
>
> The attached patch is more complicated than the original code, as we
> now need to calculate the prime numbers and the number of their
> occurrences in the integer factorization simultaneously. We can't
> split both (as in the trunk) anymore, as the last prime might only be
> discovered during the original getfactors().
>
> I've tested this code back to back with the original code with
> 1...1 nodes and 1...6 dimensions, just to be on the safe side this
> time.
>
> Best
> -Andreas
>
> On 19:32 Mon 03 Feb, Jeff Squyres (jsquyres) wrote:
>> Andreas --
>>
>> I added the sqrt() change, which is the most important change, and then did
>> a 2nd commit with the whitespace cleanup. The sqrt change will likely be in
>> 1.7.5. I credited you in the commit log; you'll likely also get credited in
>> NEWS.
>>
>> Thank you for the patch!
>>
>> On Dec 19, 2013, at 9:37 AM, Andreas Schäfer wrote:
>>
>>> Dear all,
>>>
>>> please find attached a (trivial) patch to MPI_Dims_create(). When
>>> computing the prime factors of nnodes, it is sufficient to check for
>>> primes less than or equal to sqrt(nnodes).
>>>
>>> This was not so much of a problem in the past, but now that Tier 0
>>> systems are capable of running O(10^6) MPI processes, the difference
>>> in execution time is on the order of seconds (e.g. 8.86s vs. 0.04s on
>>> my notebook, with nnproc = 10^6).
>>>
>>> Best
>>> -Andreas
>>>
>>> PS: oh, and the patch removes some trailing whitespace. Yuck. :-)
>>>
>>> --
>>> ==
>>> Andreas Schäfer
>>> HPC and Grid Computing
>>> Chair of Computer Science 3
>>> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
>>> +49 9131 85-27910
>>> PGP/GPG key via keyserver
>>> http://www.libgeodecomp.org
>>> ==
>>>
>>> (\___/)
>>> (+'.'+)
>>> (")_(")
>>> This is Bunny. Copy and paste Bunny into your
>>> signature to help him gain world domination!
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> ==
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==
>
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
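The special case Andreas describes is easiest to see in a plain trial-division sketch (illustrative only; not the actual dims_create.c patch): divide out candidates up to the square root, and whatever remains above 1 is the single prime factor that may exceed sqrt(n).

    #include <stdio.h>

    /* Prints the prime factorization of n by trial division. */
    static void factor(int n)
    {
        for (int p = 2; (long)p * p <= n; p++) {
            while (n % p == 0) {
                printf("%d ", p);
                n /= p;
            }
        }
        if (n > 1) {
            /* At most one prime factor can be larger than the square
             * root of the original n; this is it. */
            printf("%d", n);
        }
        printf("\n");
    }

    int main(void)
    {
        factor(46);      /* 2 23 -- 23 > sqrt(46), the special case */
        factor(1000000); /* 2 2 2 2 2 2 5 5 5 5 5 5 */
        return 0;
    }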
Re: [OMPI devel] RFC: optimize probe in ob1
On Feb 11, 2014, at 01:05, Nathan Hjelm wrote:

> On Tue, Feb 11, 2014 at 12:29:57AM +0100, George Bosilca wrote:
>> Nathan,
>>
>> While this sounds like an optimization for highly specific application
>> behavior, it is justifiable under some usage scenarios. I have several
>> issues with the patch. Here are the minor ones:
>>
>> 1. It makes modifications that are not necessary to the patch itself (as an
>> example, removal of the static keyword from the mca_pml_ob1_comm_proc_t
>> class instance).
>
> Yeah. Not really part of the RFC. I should have removed it from the
> patch. That static modifier appears to be meaningless in that context.

The class is only usable in the context of a single .c file. As code protection, it makes perfect sense to me.

>> 2. Moving add_fragment_to_unexpected changes the meaning of the code.
>
> The location looks wrong to me. A PERUSE receive event may be generated
> multiple times the way it was before. It doesn't matter much anymore,
> though, as PERUSE is on its way out.

It's not yet, and I did not notice an RFC about it. The event I was referring to is only generated when the message is first noticed. In the particular instance affected by your patch it has been delayed until the communicator is created locally, but it still has to be generated once.

>> 3. If this change gets pushed into the trunk, the only reason for the
>> existence of last_probed disappears. Thus, the variable should disappear as
>> well.
>
> I agree. That variable should go away. I will remove it from my branch now.
>
>> 4. The last part of the patch is not related to this topic and should be
>> pushed separately.
>
> Bah. That shouldn't have been there either. That is a separate issue I
> can fix in another commit.
>
>> Now the major one. With this change you alter the most performance-critical
>> piece of code, by adding a non-negligible number of potential cache misses
>> (looking for the number of elements, adding/removing an element from a
>> queue). This deserves a careful evaluation and consideration, not only for
>> the less likely usage pattern you describe but for the more mainstream uses.
>
> I agree that this should be reviewed carefully. A majority of the
> changes are in the unexpected message path and not in the critical path,
> but due to the nature of cache misses it may still have an impact.

The size check and the removal from the list are still in the critical path. At some point we were down to a few hundred nanoseconds, enough to be hurt by one extra memory reference.

George.

> I verified there was no impact on one system using vader and a ping-pong
> benchmark. I still need to verify there is no impact on message rate
> both on and off node, as well as verify there is no impact on other
> architectures (AMD, for example, is very sensitive to changes outside the
> critical path).
>
> Thanks for your comments, George!
>
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] Reviewing MPI_Dims_create
Hi Andreas,

As mentioned in my former mail, I did not touch the factorization code. But to figure out that a number n is *not* prime it is sufficient to check divisors up to \sqrt{n}.

Proof: let n = p*q with q > \sqrt{n}. Then p < \sqrt{n}, so we would already have found the factor p before reaching \sqrt{n}; hence n is not prime and no further checks are needed. ;)

The mentioned factorization may indeed include one factor which is larger than \sqrt{n}. :)

Proof (by example) that one prime factor can be larger than \sqrt{n}:
6 = 2*3 and \sqrt{6} = 2.4494897427832... < 3
Q.E.D.

Proof that no more than one prime factor can be larger than \sqrt{n}:
let n = \prod_{i=0}^{K} p_i with prime factors p_i and K >= 1, and assume w.l.o.g. p_0 > \sqrt{n} and p_1 > \sqrt{n}.
Then 1 > \prod_{i=2}^{K} p_i, which is a contradiction, as a (possibly empty) product of primes is at least 1.
Q.E.D.

So your idea is still applicable with little effort, and we only need prime factors up to \sqrt{n} in the factorizer code for an additional optimization. :) First search all K' factors p_i <= \sqrt{n}. If then n \ne \prod_{i=0}^{K'} p_i, we can be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. No complication with counts IMHO.

I leave this without a patch as it is already 2:30 in the morning. :P

Regards
Christoph

--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer

----- Original Message -----
From: "Andreas Schäfer"
To: "Open MPI Developers"
Sent: Monday, February 10, 2014 23:24:24
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

Christoph-

your patch has the same problem as my original patch: indeed there may be a prime factor p of n with p > sqrt(n). What's important is that there may only be at most one. I've submitted an updated patch (see my previous mail) which catches this special case.

Best
-Andreas

On 19:30 Mon 10 Feb, Christoph Niethammer wrote:
> Hello,
>
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and
> r30540) which were reverted.
>
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
[OMPI devel] oshmem test suite
The Fortran programs in the oshmem test suite don't compile because my_pe and num_pes are already declared in OMPI's shmem.fh.

To be fair, I asked Mellanox to put those declarations in shmem.fh because I thought it was crazy that all applications would have to declare them. Apparently, the shmem community is crazy. :-\

So I'll rescind my previous recommendation (even though I still think it's the Right way to go). I'll remove the "integer my_pe, num_pes" declarations from shmem.fh, and put the declarations back in the shmem examples we have in examples/.

I still think it's crazy, but if the openshmem people are doing this in all their test programs, I assume it's a good representation of what the shmem community itself is doing.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] Reviewing MPI_Dims_create
sqrt(2^31)/log(sqrt(2^31)) * (1 + 1.2762/log(sqrt(2^31))) / 1024 * 4 byte = 18.850133965051 kByte should do it. ;)

Amazing -- I think our systems are still *too small* -- let's go for MPI with int64 types. ^^

----- Original Message -----
From: "Jeff Squyres (jsquyres)"
To: "Open MPI Developers"
Sent: Tuesday, February 11, 2014 01:32:53
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

On Feb 10, 2014, at 7:22 PM, Christoph Niethammer wrote:

> 2.) Interesting idea: using the approximation from the cited paper, we should
> only need around 400 MB to store all primes in the int32 range. Potential for
> applying compression techniques still present. ^^

Per Andreas' last mail, we only need primes up to sqrt(2B) + 1 more. That *has* to be less than 400MB... right?

sqrt(2B) = 46340. So the upper limit on the size required to hold all the primes from 2...46340 is 46340*sizeof(int) = 185,360 bytes (plus one more, per Andreas, so 185,364).

This is all SWAGing, but I'm assuming the actual number must be *far* less than that...

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
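That figure checks out; a quick standalone check (plain C, link with -lm) evaluates the Dusart bound at x = sqrt(2^31):

    #include <math.h>
    #include <stdio.h>

    /* Evaluates pi(x) < x/ln(x) * (1 + 1.2762/ln(x)) at x = sqrt(2^31)
     * and converts the resulting prime-table size (4-byte ints) to
     * kByte, reproducing the ~18.85 kByte estimate above. */
    int main(void)
    {
        double x = sqrt(2147483648.0);  /* sqrt(2^31) */
        double pi_bound = x / log(x) * (1.0 + 1.2762 / log(x));
        printf("pi(x) < %.0f primes -> %.2f kByte\n",
               pi_bound, pi_bound * 4.0 / 1024.0);
        return 0;
    }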
Re: [OMPI devel] 1.7.5 fails on simple test
All the platforms that failed over the weekend have passed today.

-Paul

On Mon, Feb 10, 2014 at 2:34 PM, Paul Hargrove wrote:

> The fastest of my systems that failed over the weekend (a ppc64) has
> completed tests successfully.
> I will report on the ppc32 and SPARC results when they have all passed or
> failed.
>
> -Paul
>
> On Mon, Feb 10, 2014 at 1:52 PM, Ralph Castain wrote:
>
>> Tarball is now posted
>>
>> On Feb 10, 2014, at 1:31 PM, Ralph Castain wrote:
>>
>> Generating it now - sorry for my lack of response, my OMPI email was down
>> for some reason. I can now receive it, but still haven't gotten the backlog
>> from the down period.
>>
>> On Feb 10, 2014, at 1:23 PM, Paul Hargrove wrote:
>>
>> Ralph,
>>
>> If you give me a heads-up when this makes it into a tarball, I will
>> retest my failing ppc and sparc platforms.
>>
>> -Paul
>>
>> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart wrote:
>>
>>> I have tracked this down. There is a missing commit that affects
>>> ompi_mpi_init.c, causing it to initialize the BML twice.
>>>
>>> Ralph, can you apply r30310 to 1.7?
>>>
>>> Thanks,
>>> Rolf