[OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Mike Dubman
$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
-np 8 -mca pml ob1 -mca btl self,tcp
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
[vegas12:12724] *** Process received signal ***
[vegas12:12724] Signal: Segmentation fault (11)
[vegas12:12724] Signal code:  (128)
[vegas12:12724] Failing at address: (nil)
[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
[vegas12:12724] [ 1]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
[vegas12:12724] [ 2]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
[vegas12:12724] [ 3]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
[vegas12:12724] [ 4]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
[vegas12:12724] [ 5]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]
[vegas12:12724] [ 6]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]
[vegas12:12724] [ 7]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]
[vegas12:12724] [ 8]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]
[vegas12:12724] [ 9]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]
[vegas12:12724] [10]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
[vegas12:12724] [11]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
[vegas12:12724] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
[vegas12:12724] [13]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
[vegas12:12724] *** End of error message ***
[vegas12:12731] *** Process received signal ***
[vegas12:12731] Signal: Segmentation fault (11)
[vegas12:12731] Signal code:  (128)
[vegas12:12731] Failing at address: (nil)
[vegas12:12731] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
[vegas12:12731] [ 1]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
[vegas12:12731] [ 2]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
[vegas12:12731] [ 3]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
[vegas12:12731] [ 4]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
[vegas12:12731] [ 5]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]
[vegas12:12731] [ 6]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]
[vegas12:12731] [ 7]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]
[vegas12:12731] [ 8]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]
[vegas12:12731] [ 9]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]
[vegas12:12731] [10]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]
[vegas12:12731] [11]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]
[vegas12:12731] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]
[vegas12:12731] [13]
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400b29]
[vegas12:12731] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 12724 on node vegas12
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
jenkins@vegas12 ~


Re: [OMPI devel] Compilation error: 'OMPI_MPIHANDLES_DLL_PREFIX' undeclared

2014-02-10 Thread George Bosilca
It is a compilation flag passed through the Makefile (when automake is used). I 
guess you will have to modify the CMake files to pass it as well. You need it 
for the compilation of ompi/debuggers/ompi_debuggers.c, and it should point to 
the location of the installed libraries.
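If you end up passing the flags by hand (e.g. from the L4/fiasco build
system), a minimal sketch of the compiler invocation would look like the
following -- the prefix values mirror the CMake snippet you quoted, but the
actual values should point at your install location, so treat this as
illustrative. (Depending on how ompi_debuggers.c consumes the macros, you may
also need to quote the values.)

cc -DOMPI_MPIHANDLES_DLL_PREFIX=libompi_dbg_mpihandles \
   -DOMPI_MSGQ_DLL_PREFIX=libompi_dbg_msgq \
   -c ompi/debuggers/ompi_debuggers.c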

  George.

On Feb 10, 2014, at 03:36 , Irvanda Kurniadi  wrote:

> Hi,
> 
> I'm porting OpenMPI to L4/fiasco. I found this error message while compiling 
> OpenMPI:
> error: ‘OMPI_MPIHANDLES_DLL_PREFIX’ undeclared (first use in this function)
> error: ‘OMPI_MSGQ_DLL_PREFIX’ undeclared (first use in this function)
> 
> I found the OMPI_MPIHANDLES_DLL_PREFIX in CMakeLists.txt like below:
> SET_TARGET_PROPERTIES(libmpi PROPERTIES COMPILE_FLAGS   
> "${OMPI_C_DEF_PRE}OMPI_MPIHANDLES_DLL_PREFIX=libompi_dbg_mpihandles
>  ${OMPI_C_DEF_PRE}OMPI_MSGQ_DLL_PREFIX=libompi_dbg_msgq")
> 
> I don't know how to use this CMakeLists.txt in L4/fiasco. Or maybe this 
> problem can be fixed without CMakeLists.txt. Does anybody know how to 
> overcome this problem?
> 
> regards,
> Irvanda
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Compilation error: 'OMPI_MPIHANDLES_DLL_PREFIX' undeclared

2014-02-10 Thread Jeff Squyres (jsquyres)
Note that we have removed all CMake support from Open MPI starting with v1.7.

Is there a reason you're using the CMake support instead of the Autotools 
support?  We only had the CMake support for MS Windows, and Windows support 
has since been removed (which is why the CMake support was removed as well).


On Feb 9, 2014, at 9:36 PM, Irvanda Kurniadi  wrote:

> Hi,
> 
> I'm porting OpenMPI to L4/fiasco. I found this error message while compiling 
> OpenMPI:
> error: ‘OMPI_MPIHANDLES_DLL_PREFIX’ undeclared (first use in this function)
> error: ‘OMPI_MSGQ_DLL_PREFIX’ undeclared (first use in this function)
> 
> I found the OMPI_MPIHANDLES_DLL_PREFIX in CMakeLists.txt like below:
> SET_TARGET_PROPERTIES(libmpi PROPERTIES COMPILE_FLAGS   
> "${OMPI_C_DEF_PRE}OMPI_MPIHANDLES_DLL_PREFIX=libompi_dbg_mpihandles
>  ${OMPI_C_DEF_PRE}OMPI_MSGQ_DLL_PREFIX=libompi_dbg_msgq")
> 
> I don't know how to use this CMakeLists.txt in L4/fiasco. Or maybe this 
> problem can be fixed without CMakeLists.txt. Does anybody know how to 
> overcome this problem?
> 
> regards,
> Irvanda
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Rolf vandeVaart
I have seen this same issue although my core dump is a little bit different.  I 
am running with tcp,self.  The first entry in the list of BTLs is garbage, but 
then there is tcp and self in the list.   Strange.  This is my core dump.  Line 
208 in bml_r2.c is where I get the SEGV.

Program terminated with signal 11, Segmentation fault.
#0  0x7fb6dec981d0 in ?? ()
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.107.el6_4.5.x86_64
(gdb) where
#0  0x7fb6dec981d0 in ?? ()
#1  <signal handler called>
#2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
#4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
#5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
requested=0, provided=0x7fff80487cc8)
at ../../ompi/runtime/ompi_mpi_init.c:776
#6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) 
at pinit.c:84
#7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
MPI_Isend_ator_c.c:143
(gdb)
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
btl_endpoints, reachable);
(gdb) print *btl
$1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
btl_rndv_eager_limit = 140423556235000,
  btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
140423556235016,
  btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 
140423556235032,
  btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
3895459624, btl_flags = 32694,
  btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 
,
  btl_del_procs = 0x7fb6e82fff38 , btl_register = 
0x7fb6e82fff48 ,
  btl_finalize = 0x7fb6e82fff48 , btl_alloc = 0x7fb6e82fff58 
,
  btl_free = 0x7fb6e82fff58 , btl_prepare_src = 0x7fb6e82fff68 
,
  btl_prepare_dst = 0x7fb6e82fff68 , btl_send = 0x7fb6e82fff78 
,
  btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88 
,
  btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98 
,
  btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 
,
  btl_ft_event = 0x7fb6e82fffa8 }
(gdb)


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
Sent: Monday, February 10, 2014 4:23 AM
To: Open MPI Developers
Subject: [OMPI devel] 1.7.5 fails on simple test






$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
 -np 8 -mca pml ob1 -mca btl self,tcp 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi

[vegas12:12724] *** Process received signal ***

[vegas12:12724] Signal: Segmentation fault (11)

[vegas12:12724] Signal code:  (128)

[vegas12:12724] Failing at address: (nil)

[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]

[vegas12:12724] [ 1] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]

[vegas12:12724] [ 2] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]

[vegas12:12724] [ 3] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]

[vegas12:12724] [ 4] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]

[vegas12:12724] [ 5] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]

[vegas12:12724] [ 6] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]

[vegas12:12724] [ 7] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]

[vegas12:12724] [ 8] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]

[vegas12:12724] [ 9] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]

[vegas12:12724] [10] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400c0b]

[vegas12:12724] [11] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi[0x400d4a]

[vegas12:12724] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x393741ecdd]

[vegas12:12724] [13] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/e

[OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Christoph Niethammer
Hello,

I noticed some effort in improving the scalability of
MPI_Dims_create(int nnodes, int ndims, int dims[]).
Unfortunately there were some issues with the first attempt (r30539 and 
r30540), which were reverted.

So I decided to give it a short review based on r30606
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606


1.) freeprocs is initialized to nnodes, and the subsequent divisions of 
freeprocs all have positive integers as divisors.
So IMHO it would make more sense to check if nnodes > 0 in the MPI_PARAM_CHECK 
section at the beginning instead of the following (see patch 0001):

99     if (freeprocs < 1) {
100        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
101                                      FUNC_NAME);
102    }


2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
*nprimes, int **pprimes),
which makes more sense mathematically (as the largest prime factor of any 
number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
(see patch 0002)
Here the improvements:

module load mpi/openmpi/trunk-gnu.4.7.3
$ ./mpi-dims-old 100
time used for MPI_Dims_create(100, 3, {}): 8.104007
module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
$ ./mpi-dims-new 100
time used for MPI_Dims_create(100, 3, {}): 0.060400
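For reference, a micro-benchmark along these lines reproduces the comparison 
(a sketch, not the exact test program; the 10^6 node count is an illustrative 
choice):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int dims[3] = {0, 0, 0};   /* all zero: let MPI_Dims_create choose */
    MPI_Init(&argc, &argv);
    double t = MPI_Wtime();
    MPI_Dims_create(1000000, 3, dims);
    printf("time used for MPI_Dims_create(1000000, 3, {}): %f\n",
           MPI_Wtime() - t);
    printf("dims = %d x %d x %d\n", dims[0], dims[1], dims[2]);
    MPI_Finalize();
    return 0;
}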


3.) Memory allocation for the list of prime numbers may be reduced by up to a 
factor of ~6 for 1 million nodes using the result from Dusart 1999 [1]:
\pi(x) < x/ln(x) (1 + 1.2762/ln(x))  for x > 1
Unfortunately this saves us only 1.6 MB per process for 1 million nodes as 
reported by tcmalloc/pprof on a test program - but it may add up with fatter 
nodes. :P

$ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
(pprof) top
Total: -1.6 MB
 0.3 -18.8% -18.8%  0.3 -18.8% getprimes2
 0.0  -0.0% -18.8% -1.6 100.0% __libc_start_main
 0.0  -0.0% -18.8% -1.6 100.0% main
-1.9 118.8% 100.0% -1.9 118.8% getprimes

Find attached patch for it in 0003.
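For illustration, the bound turns into an allocation size roughly as follows 
(a sketch with made-up names, assuming num > 1; not the code from patch 0003):

#include <math.h>

/* Dusart (1999): pi(x) < x/ln(x) * (1 + 1.2762/ln(x)) for x > 1.  Using this 
 * as the prime-array size in getprimes() is what saves the memory counted 
 * above, compared to overallocating. */
static int prime_count_bound(int num)
{
    double ln = log((double)num);
    return (int)ceil((double)num / ln * (1.0 + 1.2762 / ln));
}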


If there are no issues I would like to commit this to trunk for further testing 
(+cmr for 1.7.5?) end of this week.

Best regards
Christoph

[1] http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html



--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer

From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
From: Christoph Niethammer 
Date: Mon, 10 Feb 2014 16:44:03 +0100
Subject: [PATCH 1/3] Move parameter check into appropriate code section at the
 begin.

---
 ompi/mpi/c/dims_create.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index d2c3858..3d0792f 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
 return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD, 
MPI_ERR_DIMS, FUNC_NAME);
 }
+
+if (1 > nnodes) {
+return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
+   MPI_ERR_DIMS, FUNC_NAME);
+}
 }
 
 /* Get # of free-to-be-assigned processes and # of free dimensions */
@@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
  FUNC_NAME);
 }
 
-if (freeprocs < 1) {
-   return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
- FUNC_NAME);
-}
-else if (freeprocs == 1) {
+if (freeprocs == 1) {
 for (i = 0; i < ndims; ++i, ++dims) {
 if (*dims == 0) {
*dims = 1;
-- 
1.8.3.2

From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001
From: Christoph Niethammer 
Date: Mon, 10 Feb 2014 18:50:51 +0100
Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact
 that the largest prime factor of any number n cannot exceed \sqrt{n}.

---
 ompi/mpi/c/dims_create.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index 3d0792f..1c1c381 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2004-2005 The University of Tennessee and The University
  * of Tennessee Research Foundation.  All rights
  * reserved.
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
+ * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart, 
  * University of Stuttgart.  All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
@@ -20,6 +20,8 @@
 
 #in

Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Jeff Squyres (jsquyres)
Nice!  Can you verify that it passes the IBM test?  I didn't look closely, and 
to be honest, I'm not sure why the previous improvement broke the IBM test, 
because it hypothetically did what you mentioned (stopped at sqrt(freenodes)).

I think patch 1 is a no-brainer.  I'm not sure about #2 because I'm not sure 
how it differs from the previous one, nor did I have time to investigate 
why the previous one broke the IBM test.  #3 seems like a good idea, too; I 
didn't check the paper, but I assume it's some kind of proof about the upper 
limit on the number of primes in a given range.

Two questions:

1. Should we cache generated prime numbers?  (if so, it'll have to be done in a 
thread-safe way)

2. Should we just generate prime numbers and hard-code them into a table that 
is compiled into the code?  We would only need primes up to the sqrt of 
2 billion (i.e., signed int), right?  I don't know how many that is -- if it's 
small enough, perhaps this is the easiest solution.
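For what it's worth on #2: sqrt(2^31 - 1) is about 46341, so a plain sieve 
answers the "how many" question directly -- the count comes out at a few 
thousand, which would make a compiled-in table quite small. A self-contained 
sketch (illustrative, not OMPI code):

#include <stdio.h>

int main(void)
{
    enum { LIMIT = 46341 };            /* ~ sqrt(2^31 - 1) */
    static char composite[LIMIT + 1];  /* zero-initialized */
    int count = 0;

    /* Sieve of Eratosthenes: count the primes such a table would hold. */
    for (int i = 2; i <= LIMIT; ++i) {
        if (composite[i])
            continue;
        ++count;                       /* i is prime */
        for (long j = (long)i * i; j <= LIMIT; j += i)
            composite[j] = 1;
    }
    printf("primes <= %d: %d\n", LIMIT, count);
    return 0;
}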



On Feb 10, 2014, at 1:30 PM, Christoph Niethammer  wrote:

> Hello,
> 
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[]).
> Unfortunately there were some issues with the first attempt (r30539 and 
> r30540), which were reverted.
> 
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> 
> 
> 1.) freeprocs is initialized to nnodes, and the subsequent divisions of 
> freeprocs all have positive integers as divisors.
> So IMHO it would make more sense to check if nnodes > 0 in the 
> MPI_PARAM_CHECK section at the beginning instead of the following (see patch 
> 0001):
> 
> 99     if (freeprocs < 1) {
> 100        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                      FUNC_NAME);
> 102    }
> 
> 
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> *nprimes, int **pprimes),
> which makes more sense mathematically (as the largest prime factor of any 
> number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> (see patch 0002)
> Here the improvements:
> 
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 100
> time used for MPI_Dims_create(100, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 100
> time used for MPI_Dims_create(100, 3, {}): 0.060400
> 
> 
> 3.) Memory allocation for the list of prime numbers may be reduced by up to 
> a factor of ~6 for 1 million nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/ln(x) (1 + 1.2762/ln(x))  for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1 million nodes as 
> reported by tcmalloc/pprof on a test program - but it may add up with fatter 
> nodes. :P
> 
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
> 0.3 -18.8% -18.8%  0.3 -18.8% getprimes2
> 0.0  -0.0% -18.8% -1.6 100.0% __libc_start_main
> 0.0  -0.0% -18.8% -1.6 100.0% main
>-1.9 118.8% 100.0% -1.9 118.8% getprimes
> 
> Find attached patch for it in 0003.
> 
> 
> If there are no issues I would like to commit this to trunk for further 
> testing (+cmr for 1.7.5?) end of this week.
> 
> Best regards
> Christoph
> 
> [1] 
> http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> 
> 
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> <0001-Move-parameter-check-into-appropriate-code-section-a.patch>
> <0002-Speeding-up-detection-of-prime-numbers-using-the-fac.patch>
> <0003-Reduce-memory-usage-by-a-better-approximation-for-th.patch>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Rolf vandeVaart
I have tracked this down.  There is a missing commit that affects 
ompi_mpi_init.c causing it to initialize bml twice.
Ralph, can you apply r30310 to 1.7?

Thanks,
Rolf

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Monday, February 10, 2014 12:29 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] 1.7.5 fails on simple test

I have seen this same issue although my core dump is a little bit different.  I 
am running with tcp,self.  The first entry in the list of BTLs is garbage, but 
then there is tcp and self in the list.   Strange.  This is my core dump.  Line 
208 in bml_r2.c is where I get the SEGV.

Program terminated with signal 11, Segmentation fault.
#0  0x7fb6dec981d0 in ?? ()
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.107.el6_4.5.x86_64
(gdb) where
#0  0x7fb6dec981d0 in ?? ()
#1  <signal handler called>
#2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
#4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
#5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
requested=0, provided=0x7fff80487cc8)
at ../../ompi/runtime/ompi_mpi_init.c:776
#6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, argv=0x7fff80487d80) 
at pinit.c:84
#7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
MPI_Isend_ator_c.c:143
(gdb)
#3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
reachable=0x7fff80487b40)
at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
btl_endpoints, reachable);
(gdb) print *btl
$1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
btl_rndv_eager_limit = 140423556235000,
  btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
140423556235016,
  btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 
140423556235032,
  btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
3895459624, btl_flags = 32694,
  btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 
,
  btl_del_procs = 0x7fb6e82fff38 , btl_register = 
0x7fb6e82fff48 ,
  btl_finalize = 0x7fb6e82fff48 , btl_alloc = 0x7fb6e82fff58 
,
  btl_free = 0x7fb6e82fff58 , btl_prepare_src = 0x7fb6e82fff68 
,
  btl_prepare_dst = 0x7fb6e82fff68 , btl_send = 0x7fb6e82fff78 
,
  btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88 
,
  btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98 
,
  btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 
,
  btl_ft_event = 0x7fb6e82fffa8 }
(gdb)


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
Sent: Monday, February 10, 2014 4:23 AM
To: Open MPI Developers
Subject: [OMPI devel] 1.7.5 fails on simple test






$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
 -np 8 -mca pml ob1 -mca btl self,tcp 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi

[vegas12:12724] *** Process received signal ***

[vegas12:12724] Signal: Segmentation fault (11)

[vegas12:12724] Signal code:  (128)

[vegas12:12724] Failing at address: (nil)

[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]

[vegas12:12724] [ 1] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]

[vegas12:12724] [ 2] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]

[vegas12:12724] [ 3] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]

[vegas12:12724] [ 4] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]

[vegas12:12724] [ 5] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]

[vegas12:12724] [ 6] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]

[vegas12:12724] [ 7] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]

[vegas12:12724] [ 8] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MPI_Init+0x170)[0x778d4210]

[vegas12:12724] [ 9] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi_mpifh.so.2(PMPI_Init_f08+0x25)[0x77b71c25]

[vegas12:12724] [10] 
/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-n

Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Ralph Castain
Done - thanks Rolf!!


On Feb 10, 2014, at 1:13 PM, Rolf vandeVaart  wrote:

> I have tracked this down.  There is a missing commit that affects 
> ompi_mpi_init.c causing it to initialize bml twice.
> Ralph, can you apply r30310 to 1.7?
>  
> Thanks,
> Rolf
>  
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
> Sent: Monday, February 10, 2014 12:29 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
>  
> I have seen this same issue although my core dump is a little bit different.  
> I am running with tcp,self.  The first entry in the list of BTLs is garbage, 
> but then there is tcp and self in the list.   Strange.  This is my core dump. 
>  Line 208 in bml_r2.c is where I get the SEGV.
>  
> Program terminated with signal 11, Segmentation fault.
> #0  0x7fb6dec981d0 in ?? ()
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.107.el6_4.5.x86_64
> (gdb) where
> #0  0x7fb6dec981d0 in ?? ()
> #1  <signal handler called>
> #2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
> reachable=0x7fff80487b40)
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
> #4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
> #5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
> requested=0, provided=0x7fff80487cc8)
> at ../../ompi/runtime/ompi_mpi_init.c:776
> #6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, 
> argv=0x7fff80487d80) at pinit.c:84
> #7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
> MPI_Isend_ator_c.c:143
> (gdb)
> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
> reachable=0x7fff80487b40)
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
> btl_endpoints, reachable);
> (gdb) print *btl
> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
> btl_rndv_eager_limit = 140423556235000,
>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
> 140423556235016,
>   btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 
> 140423556235032,
>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
> 3895459624, btl_flags = 32694,
>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 
> ,
>   btl_del_procs = 0x7fb6e82fff38 , btl_register = 
> 0x7fb6e82fff48 ,
>   btl_finalize = 0x7fb6e82fff48 , btl_alloc = 0x7fb6e82fff58 
> ,
>   btl_free = 0x7fb6e82fff58 , btl_prepare_src = 
> 0x7fb6e82fff68 ,
>   btl_prepare_dst = 0x7fb6e82fff68 , btl_send = 
> 0x7fb6e82fff78 ,
>   btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88 
> ,
>   btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98 
> ,
>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 
> ,
>   btl_ft_event = 0x7fb6e82fffa8 }
> (gdb)
>  
>  
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
> Sent: Monday, February 10, 2014 4:23 AM
> To: Open MPI Developers
> Subject: [OMPI devel] 1.7.5 fails on simple test
>  
>  
>  
> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>  -np 8 -mca pml ob1 -mca btl self,tcp 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
> [vegas12:12724] *** Process received signal ***
> [vegas12:12724] Signal: Segmentation fault (11)
> [vegas12:12724] Signal code:  (128)
> [vegas12:12724] Failing at address: (nil)
> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
> [vegas12:12724] [ 1] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
> [vegas12:12724] [ 2] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
> [vegas12:12724] [ 3] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
> [vegas12:12724] [ 4] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
> [vegas12:12724] [ 5] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]
> [vegas12:12724] [ 6] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]
> [vegas12:12724] [ 7] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(ompi_mpi_init+0x52b)[0x778bffdb]
> [vegas12:12724] [ 8] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(MP

Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Paul Hargrove
Ralph,

If you give me a heads-up when this makes it into a tarball, I will retest
my failing ppc and sparc platforms.

-Paul


On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart wrote:

> I have tracked this down.  There is a missing commit that affects
> ompi_mpi_init.c causing it to initialize bml twice.
>
> Ralph, can you apply r30310 to 1.7?
>
>
>
> Thanks,
>
> Rolf
>
>
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf
> vandeVaart
> Sent: Monday, February 10, 2014 12:29 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
>
>
>
> I have seen this same issue although my core dump is a little bit
> different.  I am running with tcp,self.  The first entry in the list of
> BTLs is garbage, but then there is tcp and self in the list.   Strange.
> This is my core dump.  Line 208 in bml_r2.c is where I get the SEGV.
>
>
>
> Program terminated with signal 11, Segmentation fault.
>
> #0  0x7fb6dec981d0 in ?? ()
>
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.107.el6_4.5.x86_64
>
> (gdb) where
>
> #0  0x7fb6dec981d0 in ?? ()
>
> #1  <signal handler called>
>
> #2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
>
> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440,
> reachable=0x7fff80487b40)
>
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>
> #4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
>
> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
>
> #5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158,
> requested=0, provided=0x7fff80487cc8)
>
> at ../../ompi/runtime/ompi_mpi_init.c:776
>
> #6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c,
> argv=0x7fff80487d80) at pinit.c:84
>
> #7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at
> MPI_Isend_ator_c.c:143
>
> (gdb)
>
> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440,
> reachable=0x7fff80487b40)
>
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>
> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs,
> btl_endpoints, reachable);
>
> (gdb) print *btl
>
> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984,
> btl_rndv_eager_limit = 140423556235000,
>
>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length =
> 140423556235016,
>
>   btl_rdma_pipeline_frag_size = 140423556235016,
> btl_min_rdma_pipeline_size = 140423556235032,
>
>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth =
> 3895459624, btl_flags = 32694,
>
>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38
> ,
>
>   btl_del_procs = 0x7fb6e82fff38 , btl_register =
> 0x7fb6e82fff48 ,
>
>   btl_finalize = 0x7fb6e82fff48 , btl_alloc =
> 0x7fb6e82fff58 ,
>
>   btl_free = 0x7fb6e82fff58 , btl_prepare_src =
> 0x7fb6e82fff68 ,
>
>   btl_prepare_dst = 0x7fb6e82fff68 , btl_send =
> 0x7fb6e82fff78 ,
>
>   btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88
> ,
>
>   btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98
> ,
>
>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8
> ,
>
>   btl_ft_event = 0x7fb6e82fffa8 }
>
> (gdb)
>
>
>
>
>
> From: devel [mailto:devel-boun...@open-mpi.org]
> On Behalf Of Mike Dubman
> Sent: Monday, February 10, 2014 4:23 AM
> To: Open MPI Developers
> Subject: [OMPI devel] 1.7.5 fails on simple test
>
>
>
>
>
>
>
> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>  -np 8 -mca pml ob1 -mca btl self,tcp 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>
> [vegas12:12724] *** Process received signal ***
>
> [vegas12:12724] Signal: Segmentation fault (11)
>
> [vegas12:12724] Signal code:  (128)
>
> [vegas12:12724] Failing at address: (nil)
>
> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>
> [vegas12:12724] [ 1] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
>
> [vegas12:12724] [ 2] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
>
> [vegas12:12724] [ 3] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
>
> [vegas12:12724] [ 4] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
>
> [vegas12:12724] [ 5] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_pml_ob1.so(+0x51d8)[0x737481d8]
>
> [vegas12:12724] [ 6] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_pml_base_select+0x1e0)[0x778f31e0]
>
> [vegas12:12724] [ 7] 
> /scrap/jenkins/scr

Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Ralph Castain
Generating it now - sorry for my lack of response, my OMPI email was down for 
some reason. I can now receive it, but still haven't gotten the backlog from 
the down period.


On Feb 10, 2014, at 1:23 PM, Paul Hargrove  wrote:

> Ralph,
> 
> If you give me a heads-up when this makes it into a tarball, I will retest my 
> failing ppc and sparc platforms.
> 
> -Paul
> 
> 
> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart  
> wrote:
> I have tracked this down.  There is a missing commit that affects 
> ompi_mpi_init.c causing it to initialize bml twice.
> 
> Ralph, can you apply r30310 to 1.7?
> 
>  
> 
> Thanks,
> 
> Rolf
> 
>  
> 
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
> Sent: Monday, February 10, 2014 12:29 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
> 
>  
> 
> I have seen this same issue although my core dump is a little bit different.  
> I am running with tcp,self.  The first entry in the list of BTLs is garbage, 
> but then there is tcp and self in the list.   Strange.  This is my core dump. 
>  Line 208 in bml_r2.c is where I get the SEGV.
> 
>  
> 
> Program terminated with signal 11, Segmentation fault.
> 
> #0  0x7fb6dec981d0 in ?? ()
> 
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.107.el6_4.5.x86_64
> 
> (gdb) where
> 
> #0  0x7fb6dec981d0 in ?? ()
> 
> #1  <signal handler called>
> 
> #2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
> 
> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
> reachable=0x7fff80487b40)
> 
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
> 
> #4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
> 
> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
> 
> #5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
> requested=0, provided=0x7fff80487cc8)
> 
> at ../../ompi/runtime/ompi_mpi_init.c:776
> 
> #6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, 
> argv=0x7fff80487d80) at pinit.c:84
> 
> #7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
> MPI_Isend_ator_c.c:143
> 
> (gdb)
> 
> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
> reachable=0x7fff80487b40)
> 
> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
> 
> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
> btl_endpoints, reachable);
> 
> (gdb) print *btl
> 
> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
> btl_rndv_eager_limit = 140423556235000,
> 
>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
> 140423556235016,
> 
>   btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size = 
> 140423556235032,
> 
>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
> 3895459624, btl_flags = 32694,
> 
>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 
> ,
> 
>   btl_del_procs = 0x7fb6e82fff38 , btl_register = 
> 0x7fb6e82fff48 ,
> 
>   btl_finalize = 0x7fb6e82fff48 , btl_alloc = 0x7fb6e82fff58 
> ,
> 
>   btl_free = 0x7fb6e82fff58 , btl_prepare_src = 
> 0x7fb6e82fff68 ,
> 
>   btl_prepare_dst = 0x7fb6e82fff68 , btl_send = 
> 0x7fb6e82fff78 ,
> 
>   btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88 
> ,
> 
>   btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98 
> ,
> 
>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 
> ,
> 
>   btl_ft_event = 0x7fb6e82fffa8 }
> 
> (gdb)
> 
>  
> 
>  
> 
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
> Sent: Monday, February 10, 2014 4:23 AM
> To: Open MPI Developers
> Subject: [OMPI devel] 1.7.5 fails on simple test
> 
>  
> 
>  
>  
> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>  -np 8 -mca pml ob1 -mca btl self,tcp 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
> [vegas12:12724] *** Process received signal ***
> [vegas12:12724] Signal: Segmentation fault (11)
> [vegas12:12724] Signal code:  (128)
> [vegas12:12724] Failing at address: (nil)
> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
> [vegas12:12724] [ 1] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
> [vegas12:12724] [ 2] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
> [vegas12:12724] [ 3] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
> [vegas12:12724] [ 4] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_bml_base_init+0x99)[0x778e0cc9]
> [vegas12:12724] [ 5] 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install

Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Ralph Castain
Tarball is now posted

On Feb 10, 2014, at 1:31 PM, Ralph Castain  wrote:

> Generating it now - sorry for my lack of response, my OMPI email was down for 
> some reason. I can now receive it, but still haven't gotten the backlog from 
> the down period.
> 
> 
> On Feb 10, 2014, at 1:23 PM, Paul Hargrove  wrote:
> 
>> Ralph,
>> 
>> If you give me a heads-up when this makes it into a tarball, I will retest 
>> my failing ppc and sparc platforms.
>> 
>> -Paul
>> 
>> 
>> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart  
>> wrote:
>> I have tracked this down.  There is a missing commit that affects 
>> ompi_mpi_init.c causing it to initialize bml twice.
>> 
>> Ralph, can you apply r30310 to 1.7?
>> 
>>  
>> 
>> Thanks,
>> 
>> Rolf
>> 
>>  
>> 
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
>> Sent: Monday, February 10, 2014 12:29 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
>> 
>>  
>> 
>> I have seen this same issue although my core dump is a little bit different. 
>>  I am running with tcp,self.  The first entry in the list of BTLs is 
>> garbage, but then there is tcp and self in the list.   Strange.  This is my 
>> core dump.  Line 208 in bml_r2.c is where I get the SEGV.
>> 
>>  
>> 
>> Program terminated with signal 11, Segmentation fault.
>> 
>> #0  0x7fb6dec981d0 in ?? ()
>> 
>> Missing separate debuginfos, use: debuginfo-install 
>> glibc-2.12-1.107.el6_4.5.x86_64
>> 
>> (gdb) where
>> 
>> #0  0x7fb6dec981d0 in ?? ()
>> 
>> #1  <signal handler called>
>> 
>> #2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
>> 
>> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
>> reachable=0x7fff80487b40)
>> 
>> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>> 
>> #4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0, nprocs=2)
>> 
>> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
>> 
>> #5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158, 
>> requested=0, provided=0x7fff80487cc8)
>> 
>> at ../../ompi/runtime/ompi_mpi_init.c:776
>> 
>> #6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c, 
>> argv=0x7fff80487d80) at pinit.c:84
>> 
>> #7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at 
>> MPI_Isend_ator_c.c:143
>> 
>> (gdb)
>> 
>> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2, procs=0x2061440, 
>> reachable=0x7fff80487b40)
>> 
>> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>> 
>> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs, 
>> btl_endpoints, reachable);
>> 
>> (gdb) print *btl
>> 
>> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984, 
>> btl_rndv_eager_limit = 140423556235000,
>> 
>>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length = 
>> 140423556235016,
>> 
>>   btl_rdma_pipeline_frag_size = 140423556235016, btl_min_rdma_pipeline_size 
>> = 140423556235032,
>> 
>>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth = 
>> 3895459624, btl_flags = 32694,
>> 
>>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38 
>> ,
>> 
>>   btl_del_procs = 0x7fb6e82fff38 , btl_register = 
>> 0x7fb6e82fff48 ,
>> 
>>   btl_finalize = 0x7fb6e82fff48 , btl_alloc = 0x7fb6e82fff58 
>> ,
>> 
>>   btl_free = 0x7fb6e82fff58 , btl_prepare_src = 
>> 0x7fb6e82fff68 ,
>> 
>>   btl_prepare_dst = 0x7fb6e82fff68 , btl_send = 
>> 0x7fb6e82fff78 ,
>> 
>>   btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88 
>> ,
>> 
>>   btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98 
>> ,
>> 
>>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8 
>> ,
>> 
>>   btl_ft_event = 0x7fb6e82fffa8 }
>> 
>> (gdb)
>> 
>>  
>> 
>>  
>> 
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman
>> Sent: Monday, February 10, 2014 4:23 AM
>> To: Open MPI Developers
>> Subject: [OMPI devel] 1.7.5 fails on simple test
>> 
>>  
>> 
>>  
>>  
>> $/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>>  -np 8 -mca pml ob1 -mca btl self,tcp 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi
>> [vegas12:12724] *** Process received signal ***
>> [vegas12:12724] Signal: Segmentation fault (11)
>> [vegas12:12724] Signal code:  (128)
>> [vegas12:12724] Failing at address: (nil)
>> [vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]
>> [vegas12:12724] [ 1] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]
>> [vegas12:12724] [ 2] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778e14a7]
>> [vegas12:12724] [ 3] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x73ded6f2]
>> [vegas12:12724] [ 4] 
>

Re: [OMPI devel] Speedup for MPI_Dims_create()

2014-02-10 Thread Andreas Schäfer
Jeff-

I've seen that you've reverted the patch as it was faulty. Sorry about
that! I've attached a new patch, which applies against the current
trunk. The problem with the last patch was that it didn't catch a
special case: of all prime factors of n, there may be at most one
larger than sqrt(n). The old patch assumed that there was none. I've
included a comment in the source code so that this becomes clear for
later readers.

The attached patch is more complicated than the original code, as we
now need to calculate the prime numbers and the number of their
occurrences in the integer factorization simultaneously. We can't
split the two steps (as in the trunk) anymore, as the last prime might
only be discovered during the original getfactors().
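In sketch form, the combined loop looks roughly like this (illustrative only; 
the real patch also handles OMPI's allocation and error paths, and the caller 
must size the two arrays appropriately):

static int prime_factorization_sketch(int n, int *primes, int *counts)
{
    int nfactors = 0;

    /* Trial division.  A composite p can never divide the reduced n here,
     * so testing every p (not only primes) is still correct. */
    for (int p = 2; (long)p * p <= n; ++p) {
        if (n % p == 0) {
            primes[nfactors] = p;
            counts[nfactors] = 0;
            while (n % p == 0) {
                n /= p;
                ++counts[nfactors];
            }
            ++nfactors;
        }
    }

    /* The special case described above: at most one prime factor larger
     * than sqrt(n) can survive the loop, and it occurs exactly once. */
    if (n > 1) {
        primes[nfactors] = n;
        counts[nfactors] = 1;
        ++nfactors;
    }
    return nfactors;
}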

I've tested this code back to back with the original code with
1...1 nodes and 1...6 dimensions, just to be on the safe side this
time.

Best
-Andreas


On 19:32 Mon 03 Feb , Jeff Squyres (jsquyres) wrote:
> Andreas --
> 
> I added the sqrt() change, which is the most important change, and then did a 
> 2nd commit with the whitespace cleanup.  The sqrt change will likely be in 
> 1.7.5.  I credited you in the commit log; you'll likely also get credited in 
> NEWS.
> 
> Thank you for the patch!
> 
> 
> On Dec 19, 2013, at 9:37 AM, Andreas Schäfer  wrote:
> 
> > Dear all,
> > 
> > please find attached a (trivial) patch to MPI_Dims_create(). When
> > computing the prime factors of nnodes, it is sufficient to check for
> > primes less or equal to sqrt(nnodes).
> > 
> > This was not so much of a problem in the past, but now that Tier 0
> > systems are capable of running O(10^6) MPI processes, the difference
> > in execution time is on the order of seconds (e.g. 8.86s vs. 0.04s on
> > my notebook, with nnproc = 10^6).
> > 
> > Best
> > -Andreas
> > 
> > PS: oh, and the patch removes some trailing whitespace. Yuck. :-)
> > 
> > 
> > -- 
> > ==
> > Andreas Schäfer
> > HPC and Grid Computing
> > Chair of Computer Science 3
> > Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> > +49 9131 85-27910
> > PGP/GPG key via keyserver
> > http://www.libgeodecomp.org
> > ==
> > 
> > (\___/)
> > (+'.'+)
> > (")_(")
> > This is Bunny. Copy and paste Bunny into your
> > signature to help him gain world domination!
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
==
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!
Index: ompi/mpi/c/dims_create.c
===================================================================
--- ompi/mpi/c/dims_create.c	(revision 30654)
+++ ompi/mpi/c/dims_create.c	(working copy)
@@ -10,7 +10,9 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
  * Copyright (c) 2012  Los Alamos National Security, LLC.  All rights
- * reserved. 
+ * reserved.
+ * Copyright (c) 2014  Friedrich-Alexander-Universitaet Erlangen-Nuernberg,
+ * All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -20,6 +22,8 @@
 
 #include "ompi_config.h"
 
+#include <math.h>
+
 #include "ompi/mpi/c/bindings.h"
 #include "ompi/runtime/params.h"
 #include "ompi/communicator/communicator.h"
@@ -37,8 +41,7 @@
 
 /* static functions */
 static int assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims);
-static int getfactors(int num, int nprime, int *primes, int **pcounts);
-static int getprimes(int num, int *pnprime, int **pprimes);
+static int getprimefactors(int num, int *nfactors, int **pprimes, int **pcounts);
 
 
 /*
@@ -50,7 +53,7 @@
 int i;
 int freeprocs;
 int freedims;
-int nprimes;
+int nfactors;
 int *primes;
 int *factors;
 int *procs;
@@ -108,20 +111,14 @@
 return MPI_SUCCESS;
 }
 
-/* Compute the relevant prime numbers for factoring */
-if (MPI_SUCCESS != (err = getprimes(freeprocs, &nprimes, &primes))) {
-   return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
- FUNC_NAME);
-}
-
 /* Factor the number of free processes */

Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Andreas Schäfer
Christoph-

your patch has the same problem as my original patch: indeed there may
be a prime factor p of n with p > sqrt(n). What's important is that
there can be at most one. I've submitted an updated patch (see my
previous mail) which catches this special case.

Best
-Andreas


On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> Hello,
> 
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[]).
> Unfortunately there were some issues with the first attempt (r30539 and 
> r30540), which were reverted.
> 
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> 
> 
> 1.) freeprocs is initialized to nnodes, and the subsequent divisions of 
> freeprocs all have positive integers as divisors.
> So IMHO it would make more sense to check if nnodes > 0 in the 
> MPI_PARAM_CHECK section at the beginning instead of the following (see patch 
> 0001):
> 
> 99     if (freeprocs < 1) {
> 100        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                      FUNC_NAME);
> 102    }
> 
> 
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> *nprimes, int **pprimes),
> which makes more sense mathematically (as the largest prime factor of any 
> number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> (see patch 0002)
> Here the improvements:
> 
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 100
> time used for MPI_Dims_create(100, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 100
> time used for MPI_Dims_create(100, 3, {}): 0.060400
> 
> 
> 3.) Memory allocation for the list of prime numbers may be reduced by up to 
> a factor of ~6 for 1 million nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/ln(x) (1 + 1.2762/ln(x))  for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1 million nodes as 
> reported by tcmalloc/pprof on a test program - but it may add up with fatter 
> nodes. :P
> 
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
>  0.3 -18.8% -18.8%  0.3 -18.8% getprimes2
>  0.0  -0.0% -18.8% -1.6 100.0% __libc_start_main
>  0.0  -0.0% -18.8% -1.6 100.0% main
> -1.9 118.8% 100.0% -1.9 118.8% getprimes
> 
> Find attached patch for it in 0003.
> 
> 
> If there are no issues I would like to commit this to trunk for further 
> testing (+cmr for 1.7.5?) end of this week.
> 
> Best regards
> Christoph
> 
> [1] 
> http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> 
> 
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer

> From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
> From: Christoph Niethammer 
> Date: Mon, 10 Feb 2014 16:44:03 +0100
> Subject: [PATCH 1/3] Move parameter check into appropriate code section at the
>  begin.
> 
> ---
>  ompi/mpi/c/dims_create.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> index d2c3858..3d0792f 100644
> --- a/ompi/mpi/c/dims_create.c
> +++ b/ompi/mpi/c/dims_create.c
> @@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
>  return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD, 
> MPI_ERR_DIMS, FUNC_NAME);
>  }
> +
> +if (1 > nnodes) {
> +return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
> +   MPI_ERR_DIMS, FUNC_NAME);
> +}
>  }
>  
>  /* Get # of free-to-be-assigned processes and # of free dimensions */
> @@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
>   FUNC_NAME);
>  }
>  
> -if (freeprocs < 1) {
> -   return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> - FUNC_NAME);
> -}
> -else if (freeprocs == 1) {
> +if (freeprocs == 1) {
>  for (i = 0; i < ndims; ++i, ++dims) {
>  if (*dims == 0) {
> *dims = 1;
> -- 
> 1.8.3.2
> 

> From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001
> From: Christoph Niethammer 
> Date: Mon, 10 Feb 2014 18:50:51 +0100
> Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact
>  that the largest prime factor of any number n cannot exceed \sqrt{n}.
> 
> ---
>  ompi/mpi/c/dims_create.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> index 3d0792f..1c1c381 100644
> --- a/ompi/mpi/c/dims_create.c

Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Paul Hargrove
The fastest of my systems that failed over the weekend (a ppc64)  has
completed tests successfully.
I will report on the ppc32 and SPARC results when they have all passed or
failed.

-Paul


On Mon, Feb 10, 2014 at 1:52 PM, Ralph Castain  wrote:

> Tarball is now posted
>
> On Feb 10, 2014, at 1:31 PM, Ralph Castain  wrote:
>
> Generating it now - sorry for my lack of response, my OMPI email was down
> for some reason. I can now receive it, but still haven't gotten the backlog
> from the down period.
>
>
> On Feb 10, 2014, at 1:23 PM, Paul Hargrove  wrote:
>
> Ralph,
>
> If you give me a heads-up when this makes it into a tarball, I will retest
> my failing ppc and sparc platforms.
>
> -Paul
>
>
> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart 
> wrote:
>
>> I have tracked this down.  There is a missing commit that affects
>> ompi_mpi_init.c causing it to initialize bml twice.
>>
>> Ralph, can you apply r30310 to 1.7?
>>
>>
>>
>> Thanks,
>>
>> Rolf
>>
>>
>>
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf
>> vandeVaart
>> Sent: Monday, February 10, 2014 12:29 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] 1.7.5 fails on simple test
>>
>>
>>
>> I have seen this same issue although my core dump is a little bit
>> different.  I am running with tcp,self.  The first entry in the list of
>> BTLs is garbage, but then there is tcp and self in the list.   Strange.
>> This is my core dump.  Line 208 in bml_r2.c is where I get the SEGV.
>>
>>
>>
>> Program terminated with signal 11, Segmentation fault.
>>
>> #0  0x7fb6dec981d0 in ?? ()
>>
>> Missing separate debuginfos, use: debuginfo-install
>> glibc-2.12-1.107.el6_4.5.x86_64
>>
>> (gdb) where
>>
>> #0  0x7fb6dec981d0 in ?? ()
>>
>> #1  
>>
>> #2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
>>
>> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2,
>> procs=0x2061440, reachable=0x7fff80487b40)
>>
>> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>
>> #4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0,
>> nprocs=2)
>>
>> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
>>
>> #5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158,
>> requested=0, provided=0x7fff80487cc8)
>>
>> at ../../ompi/runtime/ompi_mpi_init.c:776
>>
>> #6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c,
>> argv=0x7fff80487d80) at pinit.c:84
>>
>> #7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at
>> MPI_Isend_ator_c.c:143
>>
>> (gdb)
>>
>> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2,
>> procs=0x2061440, reachable=0x7fff80487b40)
>>
>> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>
>> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs,
>> btl_endpoints, reachable);
>>
>> (gdb) print *btl
>>
>> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984,
>> btl_rndv_eager_limit = 140423556235000,
>>
>>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length =
>> 140423556235016,
>>
>>   btl_rdma_pipeline_frag_size = 140423556235016,
>> btl_min_rdma_pipeline_size = 140423556235032,
>>
>>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth =
>> 3895459624, btl_flags = 32694,
>>
>>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38
>> ,
>>
>>   btl_del_procs = 0x7fb6e82fff38 , btl_register =
>> 0x7fb6e82fff48 ,
>>
>>   btl_finalize = 0x7fb6e82fff48 , btl_alloc =
>> 0x7fb6e82fff58 ,
>>
>>   btl_free = 0x7fb6e82fff58 , btl_prepare_src =
>> 0x7fb6e82fff68 ,
>>
>>   btl_prepare_dst = 0x7fb6e82fff68 , btl_send =
>> 0x7fb6e82fff78 ,
>>
>>   btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88
>> ,
>>
>>   btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98
>> ,
>>
>>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8
>> ,
>>
>>   btl_ft_event = 0x7fb6e82fffa8 }
>>
>> (gdb)
>>
>>
>>
>>
>>
>> *From:* devel [mailto:devel-boun...@open-mpi.org]
>> *On Behalf Of *Mike Dubman
>> *Sent:* Monday, February 10, 2014 4:23 AM
>> *To:* Open MPI Developers
>> *Subject:* [OMPI devel] 1.7.5 fails on simple test
>>
>>
>>
>>
>>
>>
>>
>> *$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>>  -np 8 -mca pml ob1 -mca btl self,tcp 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi*
>>
>> *[vegas12:12724] *** Process received signal ****
>>
>> *[vegas12:12724] Signal: Segmentation fault (11)*
>>
>> *[vegas12:12724] Signal code:  (128)*
>>
>> *[vegas12:12724] Failing at address: (nil)*
>>
>> *[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]*
>>
>> *[vegas12:12724] [ 1] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_init+0x583)[0x7395f813]*
>>
>> *[vegas12:12724] [ 2] 
>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/lib/libmpi.so.1(mca_btl_base_select+0x117)[0x778

Re: [OMPI devel] RFC: optimize probe in ob1

2014-02-10 Thread George Bosilca
Nathan,

While this sounds like an optimization for highly specific application 
behavior, it is justifiable under some usage scenarios. I have several issues 
with the patch. Here are the minor ones:

1. It makes modifications that are not necessary to the patch itself (for 
example, the removal of the static keyword from the mca_pml_ob1_comm_proc_t class 
instance).

2. Moving add_fragment_to_unexpected changes the meaning of the code.

3. If this change gets pushed into the trunk, the only reason for the existence 
of last_probed disappears. Thus, the variable should disappear as well.

4. The last part of the patch is not related to this topic and should be pushed 
separately.

Now the major one. With this change you alter the most performance-critical 
piece of code by adding a non-negligible number of potential cache 
misses (looking up the number of elements, adding/removing an element from a 
queue). This deserves careful evaluation and consideration, not only for the 
less likely usage pattern you describe but for the more mainstream uses.

  George.

On Feb 7, 2014, at 23:01 , Nathan Hjelm  wrote:

> What: The current probe algorithm in ob1 is linear with respect to the
> number of processes in the job. I wish to change the algorithm to be
> linear in the number of processes with unexpected messages. To do this I
> added an additional opal_list_t to the ob1 communicator and made the ob1
> process a list_item_t. When an unexpected message comes in on a proc it
> is added to that proc's unexpected message queue and the proc is added
> to the communicator's list of procs with unexpected messages
> (unexpected_procs) if it isn't already on that list. When matching a
> probe request this list is used to determine which procs to look at to
> find an unexpected message. The new list is protected by the matching
> lock so no extra locking is needed.
> 
> Why: I have a benchmark that makes heavy use of MPI_Iprobe in one of its
> phases. I discovered that the primary reason this benchmark was running
> slow with Open MPI is the probe algorithm.
> 
> When: This is another simple optimization. It only affects the
> unexpected message path and will speed up probe requests. This is
> intended to go into 1.7.5. Setting the timeout to next Tuesday (which
> gives me time to verify the improvement at scale -- 131,000 PEs).
> 
> See the attached patch.
> 
> -Nathan
> 
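A rough sketch of the bookkeeping the RFC above describes (illustrative only -- 
the struct and function names are made up, and this is neither the actual ob1 
code nor the real opal_list API):

    /* Each proc keeps its own unexpected-message queue; the communicator
     * keeps a list of only those procs that currently hold unexpected
     * messages, so a probe scans O(#procs with unexpected messages)
     * instead of O(#procs).  All of this runs under the matching lock. */
    struct frag;

    typedef struct proc {
        struct frag *unexpected_head;    /* per-proc unexpected queue */
        struct proc *next_unexpected;    /* link in communicator's list */
        int          on_unexpected_list;
    } proc_t;

    typedef struct comm {
        proc_t *unexpected_procs;        /* procs with pending unexpected msgs */
    } comm_t;

    /* Called when an unexpected fragment arrives for 'proc'. */
    static void note_unexpected(comm_t *comm, proc_t *proc)
    {
        if (!proc->on_unexpected_list) {
            proc->next_unexpected    = comm->unexpected_procs;
            comm->unexpected_procs   = proc;
            proc->on_unexpected_list = 1;
        }
    }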



Re: [OMPI devel] RFC: optimize probe in ob1

2014-02-10 Thread Nathan Hjelm
On Tue, Feb 11, 2014 at 12:29:57AM +0100, George Bosilca wrote:
> Nathan,
> 
> While this sounds like an optimization for highly specific application 
> behavior, it is justifiable under some usage scenarios. I have several issues 
> with the patch. Here are the minor ones:
> 
> 1. It makes modifications that are not necessary to the patch itself (for 
> example, the removal of the static keyword from the mca_pml_ob1_comm_proc_t class 
> instance).

Yeah. Not really part of the RFC. I should have removed it from the
patch. That static modifier appears to be meaningless in that context.

> 2. Moving add_fragment_to_unexpected changes the meaning of the code.

The location looks wrong to me. A peruse receive event may be generated
multiple times the way it was before. Doesn't matter anymore though as
peruse is on its way out.

> 3. If this change gets pushed into the trunk, the only reason for the 
> existence of last_probed disappears. Thus, the variable should disappear as 
> well.

I agree. That variable should go away. I will remove it from my branch now.

> 4. The last part of the patch is not related to this topic and should be 
> pushed separately.

Bah. That shouldn't have been there either. That is a separate issue I
can fix in another commit.

> Now the major one. With this change you alter the most performance-critical 
> piece of code by adding a non-negligible number of potential cache 
> misses (looking up the number of elements, adding/removing an element from a 
> queue). This deserves careful evaluation and consideration, not only for the 
> less likely usage pattern you describe but for the more mainstream uses.

I agree that this should be reviewed carefully. A majority of the
changes are in the unexpected message path and not in the critical path
but due to the nature of icache misses it may still have an impact. I
verified there was no impact on one system using vader and a ping-pong
benchmark. I still need to verify there is no impact to message rate
both on and off node as well as verify there is no impact on other
architectures (AMD for example is very sensitive to changes outside the
critical path).

Thanks for your comments George!

-Nathan




[OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset

2014-02-10 Thread Jeff Squyres (jsquyres)
WHAT: On trunk, force MPI_Count/MPI_Offset to be 32 bits when building in 32 
bit mode (they are currently 64 bit, even in a 32 bit build).  On v1.7, leave 
the sizes at 64 bit (for ABI reasons), but put error checking in the MPI API 
layer to ensure we won't over/underflow 32 bits.

WHY: See ticket #4205 (https://svn.open-mpi.org/trac/ompi/ticket/4205)

WHERE: On trunk, this can be solved entirely in configury.  In v1.7/v1.8, make 
changes in the MPI API layer (e.g., check MPI_Send to ensure 
(count*size_of_datatype)<2B)

TIMEOUT: I'll tentatively say next Tuesday teleconf, Feb 18, 2014, but it can 
be pushed back -- there's no real rush; this isn't a hot issue (but it is wrong 
and should be fixed).

MORE DETAIL:

I noticed that MPI_Get_elements_x() and MPI_Type_size_x() were giving wrong 
answers when compiled in 32 bit mode on a 64 bit machine.  This is because in 
that build:

- size_t: 4 bytes
- ptrdiff_t: 4 bytes
- MPI_Aint: 4 bytes
- MPI_Offset: 8 bytes
- MPI_Count: 8 bytes

Some data points:

1. MPI-3 says that MPI_Count must be big enough to hold both an MPI_Aint and 
MPI_Offset.

2. The entire PML/BML/BTL/convertor infrastructure uses size_t as its 
underlying computation type.

3. The _x tests were failing in 32 bit builds because they take 
(count,datatype) input that intentionally results in a number of bytes larger 
than 2 billion; that value was assigned to a size_t (which is 32 bits), 
overflowed, and therefore produced the wrong answer.

To solve this:

- On the trunk, we can just not allow MPI_Count (and therefore MPI_Offset) to 
be larger than size_t.  This means that on 32 bit builds -- on both 32 and 64 
bit systems -- sizeof(MPI_Aint) == sizeof(MPI_Offset) == sizeof(MPI_Count) == 
4.  There is a patch for this on #4205.

- Because of ABI issues, we cannot change the size of MPI_Count/MPI_Offset on 
v1.7, so we can just check for over/underflow in the MPI API.  For example, we 
can check that (count * size_of_datatype) < 2 billion (other checks will also 
be necessary; this is just an example).  I have no patch for this yet.
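A minimal sketch of the overflow the tests hit and of the kind of API-layer 
guard described above (hypothetical names; not the actual OMPI code):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int      count      = 1200 * 1000 * 1000;  /* 1.2 billion elements */
        uint64_t dtype_size = 4;                   /* e.g. a 4-byte integer */

        /* In a 32-bit build size_t is 32 bits, so 4.8 billion bytes
         * silently wraps modulo 2^32: */
        uint32_t wrapped = (uint32_t)((uint64_t)count * dtype_size);
        uint64_t correct = (uint64_t)count * dtype_size;

        printf("as 32-bit size_t: %u bytes (wrong)\n", wrapped);
        printf("as 64-bit value:  %llu bytes\n", (unsigned long long)correct);

        /* The v1.7-style check: compute in 64 bits and reject anything
         * that would not fit the 32-bit infrastructure (~2 billion). */
        if ((uint64_t)count * dtype_size > (uint64_t)INT32_MAX)
            return 1;  /* would raise an MPI error in the real API layer */
        return 0;
    }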

As a side effect, this means that -- for 32 bit builds -- we will not support 
large filesystems well (e.g., filesystems with 64 bit offsets).  BlueGene is an 
example of such a system (not that OMPI supports BlueGene, but...).  
Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits.  I 
don't think that this is a major issue, because 32 bit builds are not a huge 
issue for the OMPI community, but I raise the point in the spirit of full 
disclosure.  Fixing it to allow 32 bit MPI_Aint but 64 bit MPI_Offset and 
MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor infrastructure 
to use something other than size_t, and I have zero desire to do that!  
(please, let no OMPI vendor reveal that they're seriously going to build giant 32 
bit systems...)

Also, while investigating this issue, I discovered that the configury for 
determining the Fortran MPI_ADDRESS_KIND, MPI_OFFSET_KIND, and MPI_COUNT_KIND 
values was unrelated to the C types that we discovered for these concepts.  
The patch on #4205 fixes this issue as well -- the Fortran MPI_*_KIND values are 
now directly correlated with the C types that were discovered.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Christoph Niethammer
Hello,

If you mean the current version in the ompi-tests/ibm svn repository, I can 
confirm that it passes the topology/dimscreate test without errors. :)

The difference between the patches is as follows: the patch from Andreas only 
generated a table of prime numbers up to sqrt(freeprocs), while my patch 
still produces prime numbers up to freeprocs. And for factoring we really need 
all prime factors up to freeprocs. The standard sqrt optimization was just 
introduced in the wrong place. :)

You are right about #3: it's a better approximation for the upper bound, and the 
proof is something to be read under the Christmas tree. ;)
I just have to rethink whether the ceil() is necessary in the code, as I am not 
sure about rounding issues in the floating point calculations here... :P
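For illustration, a small sketch of how the bound could be sized defensively 
(my own code, not from the patches; rounding up plus a +1 margin makes the 
floating-point rounding question moot):

    #include <math.h>

    /* Upper bound on pi(x), the number of primes <= x, after Dusart 1999:
     * pi(x) < x/ln(x) * (1 + 1.2762/ln(x)) for x > 1.  ceil() and the
     * extra +1 keep us safe against any FP rounding in log()/division. */
    static int primes_upper_bound(int x)
    {
        double lx = log((double)x);
        return (int)ceil((double)x / lx * (1.0 + 1.2762 / lx)) + 1;
    }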

Regarding your questions:
1.) I don't think we have to cache prime numbers, as MPI_Dims_create will not be 
used frequently for factorization. If anybody needs faster factorization he 
would use his own - even more optimized - code. If you find some free time 
beside Open MPI, go out for some harder problems at http://projecteuler.net. But 
please don't get frustrated by the assembler solutions. ;)

2.) Interesting idea: Using the approximation from the cited paper we should 
only need around 400 MB to store all primes in the int32 range. Potential for 
applying compression techniques still present. ^^

Regards
Christoph

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



- Original Message -
From: "Jeff Squyres (jsquyres)" 
To: "Open MPI Developers" 
Sent: Monday, February 10, 2014 20:12:08
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

Nice!  Can you verify that it passes the ibm test?  I didn't look closely, and 
to be honest, I'm not sure why the previous improvement broke the IBM test 
because it hypothetically did what you mentioned (stopped at sqrt(freenodes)).

I think patch 1 is a no-brainer.  I'm not sure about #2 because I'm not sure 
how it's different than the previous one, nor did I have time to investigate 
why the previous one broke the IBM test.  #3 seems like a good idea, too; I 
didn't check the paper, but I assume it's some kind of proof about the upper 
limit on the number of primes in a given range.

Two questions:

1. Should we cache generated prime numbers?  (if so, it'll have to be done in a 
thread-safe way)

2. Should we just generate prime numbers and hard-code them into a table that 
is compiled into the code?  We would only need primes up to the sqrt of 
2 billion (i.e., signed int), right?  I don't know how many that is -- if it's 
small enough, perhaps this is the easiest solution.
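On question 1, one way the caching could be made thread-safe (an illustrative 
sketch using plain pthreads; the names are made up, and OMPI would presumably 
use its own threading primitives):

    #include <pthread.h>

    static pthread_once_t prime_once = PTHREAD_ONCE_INIT;
    static int *prime_table;   /* primes up to sqrt(INT_MAX), ~46341 */
    static int  prime_count;

    static void build_prime_table(void)
    {
        /* sieve or trial division filling prime_table and prime_count;
         * body elided for brevity */
    }

    static const int *get_primes(int *count)
    {
        /* pthread_once guarantees build_prime_table runs exactly once,
         * even if many threads call get_primes concurrently. */
        pthread_once(&prime_once, build_prime_table);
        *count = prime_count;
        return prime_table;
    }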



On Feb 10, 2014, at 1:30 PM, Christoph Niethammer  wrote:

> Hello,
> 
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and 
> r30540) which were reverted.
> 
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> 
> 
> 1.) freeprocs is initialized to be nnodes, and the subsequent divisions of 
> freeprocs all have positive integers as divisors.
> So IMHO it would make more sense to check if nnodes > 0 in the 
> MPI_PARAM_CHECK section at the beginning instead of the following (see patch 
> 0001):
> 
> 99        if (freeprocs < 1) {
> 100           return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                         FUNC_NAME);
> 102       }
> 
> 
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> *nprimes, int **pprimes)
> which makes mathematically more sense (as the largest prime factor of any 
> number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> (see patch 0002)
> Here the improvements:
> 
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> 
> 
> 3.) Memory allocation for the list of prime numbers may be reduced up to a 
> factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/ln(x) * (1 + 1.2762/ln(x))   for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1mio nodes as 
> reported by tcmalloc/pprof on a test program - but it may sum up with fatter 
> nodes. :P
> 
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
> 0.3 -18.8% -18.8%  0.3 -18.8% getprimes2
> 0.0  -0.0% -18.8% -1.6 100.0% __libc_start_main
> 0.0  -0.0% -18.8% -1.6 100.0% main
>-1.9 118.8% 100.0% -1.9 118.8% getprimes
> 
> Find attached patch for it in 0003

Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Jeff Squyres (jsquyres)
On Feb 10, 2014, at 7:22 PM, Christoph Niethammer  wrote:

> 2.) Interesting idea: Using the approximation from the cited paper we should 
> only need around 400 MB to store all primes in the int32 range. Potential for 
> applying compression techniques still present. ^^

Per Andreas' last mail, we only need primes up to sqrt(2B) + 1 more.  That 
*has* to be less than 400MB... right?

sqrt(2B) = 46340.  So the upper limit on the size required to hold all the 
primes from 2...46340 is 46340*sizeof(int) = 185,360 bytes (plus one more, per 
Andreas, so 185,364).

This is all SWAGing, but I'm assuming the actual number must be *far* less than 
that...
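A quick sieve puts a number on that SWAG (throwaway code, not part of OMPI):

    #include <stdbool.h>
    #include <stdio.h>

    #define LIMIT 46340   /* floor(sqrt(2^31)) */

    int main(void)
    {
        static bool composite[LIMIT + 1];   /* zero-initialized */
        int count = 0;

        for (int p = 2; p <= LIMIT; ++p) {
            if (!composite[p]) {
                ++count;
                for (int q = 2 * p; q <= LIMIT; q += p)
                    composite[q] = true;
            }
        }
        /* Prints the real count -- a few thousand primes, i.e. well
         * under 20 KB as a table of ints. */
        printf("%d primes <= %d, %zu bytes\n",
               count, LIMIT, count * sizeof(int));
        return 0;
    }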

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Speedup for MPI_Dims_create()

2014-02-10 Thread Jeff Squyres (jsquyres)
Cool.

See the other thread where I'm wondering if we shouldn't just pre-generate all 
the primes, hard-code them into a table, and be done with this issue.


On Feb 10, 2014, at 5:19 PM, Andreas Schäfer  wrote:

> Jeff-
> 
> I've seen that you've reverted the patch as it was faulty. Sorry about
> that! I've attached a new patch, which applies against the current
> trunk. The problem with the last patch was that it didn't catch a
> special case: of all prime factors of n, there may be at most one
> larger than sqrt(n). The old patch assumed that there was none. I've
> included a comment in the source code so that this becomes clear for
> later readers.
> 
> The attached patch is more complicated than the original code, as we
> now need to calculate the prime numbers and the number of their
> occurrences in the integer factorization simultaneously. We can't
> split both (as in the trunk) anymore, as the last prime might only be
> discovered during the original getfactors().
> 
> I've tested this code back to back with the original code with
> 1...1 nodes and 1...6 dimensions, just to be on the sure side this
> time.
> 
> Best
> -Andreas
> 
> 
> On 19:32 Mon 03 Feb , Jeff Squyres (jsquyres) wrote:
>> Andreas --
>> 
>> I added the sqrt() change, which is the most important change, and then did 
>> a 2nd commit with the whitespace cleanup.  The sqrt change will likely be in 
>> 1.7.5.  I credited you in the commit log; you'll likely also get credited in 
>> NEWS.
>> 
>> Thank you for the patch!
>> 
>> 
>> On Dec 19, 2013, at 9:37 AM, Andreas Schäfer  wrote:
>> 
>>> Dear all,
>>> 
>>> please find attached a (trivial) patch to MPI_Dims_create(). When
>>> computing the prime factors of nnodes, it is sufficient to check for
>>> primes less or equal to sqrt(nnodes).
>>> 
>>> This was not so much of a problem in the past, but now that Tier 0
>>> systems are capable of running O(10^6) MPI processes, the difference
>>> in execution time is on the order of seconds (e.g. 8.86s vs. 0.04s on
>>> my notebook, with nnproc = 10^6).
>>> 
>>> Best
>>> -Andreas
>>> 
>>> PS: oh, and the patch removes some trailing whitespace. Yuck. :-)
>>> 
>>> 
>>> -- 
>>> ==
>>> Andreas Schäfer
>>> HPC and Grid Computing
>>> Chair of Computer Science 3
>>> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
>>> +49 9131 85-27910
>>> PGP/GPG key via keyserver
>>> http://www.libgeodecomp.org
>>> ==
>>> 
>>> (\___/)
>>> (+'.'+)
>>> (")_(")
>>> This is Bunny. Copy and paste Bunny into your
>>> signature to help him gain world domination!
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 
> -- 
> ==
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==
> 
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: optimize probe in ob1

2014-02-10 Thread George Bosilca

On Feb 11, 2014, at 01:05 , Nathan Hjelm  wrote:

> On Tue, Feb 11, 2014 at 12:29:57AM +0100, George Bosilca wrote:
>> Nathan,
>> 
>> While this sounds like an optimization for highly specific application 
>> behavior, it is justifiable under some usage scenarios. I have several 
>> issues with the patch. Here are the minor ones:
>> 
>> 1. It makes modifications that are not necessary to the patch itself (for 
>> example, the removal of the static keyword from the mca_pml_ob1_comm_proc_t class 
>> instance).
> 
> Yeah. Not really part of the RFC. I should have removed it from the
> patch. That static modifier appears to be meaningless in that context.

The class is only usable in the context of a single .c file. As a code 
protection it makes perfect sense to me.

>> 2. Moving add_fragment_to_unexpected changes the meaning of the code.
> 
> The location looks wrong to me. A peruse receive event may be generated
> multiple times the way it was before. Doesn't matter anymore though as
> peruse is on its way out.

It's not gone yet, and I did not notice an RFC about it. The event I was referring to 
is only generated when the message is first noticed. In the particular instance 
affected by your patch it has been delayed until the communicator is created 
locally, but it still has to be generated once.

>> 3. If this change gets pushed into the trunk, the only reason for the 
>> existence of last_probed disappears. Thus, the variable should disappear as 
>> well.
> 
> I agree. That variable should go away. I will remove it from my branch now.
> 
>> 4. The last part of the patch is not related to this topic and should be 
>> pushed separately.
> 
> Bah. That shouldn't have been there either. That is a separate issue I
> can fix in another commit.
> 
>> Now the major one. With this change you alter the most performance-critical 
>> piece of code by adding a non-negligible number of potential cache 
>> misses (looking up the number of elements, adding/removing an element from 
>> a queue). This deserves careful evaluation and consideration, not only for 
>> the less likely usage pattern you describe but for the more mainstream uses.
> 
> I agree that this should be reviewed carefully. A majority of the
> changes are in the unexpected message path and not in the critical path
> but due to the nature of cache misses it may still have an impact.

The size check and the removal from the list are still in the critical path. At 
some point we were down to a few hundred nanoseconds, enough to get bitten by 
one extra memory reference.

  George.


> I verified there was no impact on one system using vader and a ping-pong
> benchmark. I still need to verify there is no impact to message rate
> both on and off node as well as verify there is no impact on other
> architectures (AMD for example is very sensitive to changes outside the
> critical path).
> 
> Thanks for your comments George!
> 
> -Nathan



Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Christoph Niethammer
Hi Andreas,

As mentioned in my former mail, I did not touch the factorization code.
But to figure out whether a number n is *not* prime it is sufficient to 
check divisors up to \sqrt(n).
Proof:
let n = p*q with q > \sqrt(n)
--> p < \sqrt(n)
So we have already found the factor p before reaching \sqrt(n); hence n is not 
prime and no further checks are needed. ;)
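In code, the primality check this argument justifies looks like the following 
(an illustrative sketch, not the OMPI code):

    #include <stdbool.h>

    /* Trial division: only divisors up to sqrt(n) need to be tested. */
    static bool is_prime(int n)
    {
        if (n < 2)
            return false;
        for (int p = 2; (long long)p * p <= n; ++p)
            if (n % p == 0)
                return false;   /* found p <= sqrt(n); n is composite */
        return true;
    }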


The factorization itself may indeed include one prime factor which is larger 
than \sqrt(n). :)

Proof by example that a prime factor can be larger than \sqrt(n):
6 = 2*3
\sqrt(6) = 2.4494897427832... < 3   Q.E.D.


Proof that no more than one prime factor can be larger than \sqrt(n):
let n = \prod_{i=0}^K p_i with all p_i prime and K >= 1
and assume w.l.o.g.  p_0 > \sqrt(n)  and  p_1 > \sqrt(n)
--> 1 > \prod_{i=2}^K p_i
which is a contradiction, as that (possibly empty) product is at least 1.  Q.E.D.


So your idea is still applicable with little effort, and we only need primes 
up to \sqrt(n) in the factorizer code for an additional optimization. :)

First collect all prime factors p_i <= \sqrt(n), with multiplicity. If the 
product P of those factors is then still smaller than n, we can be sure that 
n / P is the one remaining prime factor larger than \sqrt(n). No complication 
with counts IMHO. I leave this without a patch, as it is already 2:30 in the 
morning. :P
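A sketch of that scheme (my illustration only, not a patch against the actual 
dims_create.c):

    /* Trial-divide n by the known primes up to sqrt(n); whatever is left
     * at the end is either 1 or the single prime factor > sqrt(n). */
    static int factorize(int n, const int *primes, int nprimes,
                         int *factors, int *counts)
    {
        int k = 0;
        for (int i = 0;
             i < nprimes && (long long)primes[i] * primes[i] <= n; ++i) {
            if (n % primes[i] == 0) {
                factors[k] = primes[i];
                counts[k]  = 0;
                while (n % primes[i] == 0) {
                    ++counts[k];
                    n /= primes[i];
                }
                ++k;
            }
        }
        if (n > 1) {        /* at most one prime factor > sqrt(n) remains */
            factors[k] = n;
            counts[k]  = 1;
            ++k;
        }
        return k;           /* number of distinct prime factors */
    }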

Regards
Christoph

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer

- Original Message -
From: "Andreas Schäfer" 
To: "Open MPI Developers" 
Sent: Monday, February 10, 2014 23:24:24
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

Christoph-

your patch has the same problem as my original patch: indeed there may
be a prime factor p of n with p > sqrt(n). What's important is that
there may only be at most one. I've submitted an updated patch (see my
previous mail) which catches this special case.

Best
-Andreas


On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> Hello,
> 
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[])
> Unfortunately there were some issues with the first attempt (r30539 and 
> r30540) which were reverted.
> 
> So I decided to give it a short review based on r30606
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> 
> 
> 1.) freeprocs is initialized to be nnodes, and the subsequent divisions of 
> freeprocs all have positive integers as divisors.
> So IMHO it would make more sense to check if nnodes > 0 in the 
> MPI_PARAM_CHECK section at the beginning instead of the following (see patch 
> 0001):
> 
> 99        if (freeprocs < 1) {
> 100           return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                         FUNC_NAME);
> 102       }
> 
> 
> 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> *nprimes, int **pprimes)
> which makes mathematically more sense (as the largest prime factor of any 
> number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> (see patch 0002)
> Here the improvements:
> 
> module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> 
> 
> 3.) Memory allocation for the list of prime numbers may be reduced up to a 
> factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> \pi(x) < x/ln(x) * (1 + 1.2762/ln(x))   for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1mio nodes as 
> reported by tcmalloc/pprof on a test program - but it may sum up with fatter 
> nodes. :P
> 
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
>  0.3 -18.8% -18.8%  0.3 -18.8% getprimes2
>  0.0  -0.0% -18.8% -1.6 100.0% __libc_start_main
>  0.0  -0.0% -18.8% -1.6 100.0% main
> -1.9 118.8% 100.0% -1.9 118.8% getprimes
> 
> Find attached patch for it in 0003.
> 
> 
> If there are no issues I would like to commit this to trunk for further 
> testing (+cmr for 1.7.5?) end of this week.
> 
> Best regards
> Christoph
> 
> [1] 
> http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> 
> 
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer

> From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
> From: Christoph Niethammer 
> Date: Mon, 10 Feb 2014 16:44:03 +0100
> Subject: [PATCH 1/3] Move parameter check into appropriate code section at the
>  begin.
> 
> ---
>  ompi/mpi/c/dims_create.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/ompi/mpi/c/dims

[OMPI devel] oshmem test suite

2014-02-10 Thread Jeff Squyres (jsquyres)
The Fortran programs in the oshmem test suite don't compile because my_pe and 
num_pes are already declared in OMPI's shmem.fh.

To be fair, I asked Mellanox to put those declarations in shmem.fh because I 
thought it was crazy that all applications would have to declare them.

Apparently, the shmem community is crazy.  :-\

So I'll rescind my previous recommendation (even though I still think it's the 
Right way to go).  I'll remove the "integer my_pe, num_pes" declarations from 
shmem.fh, and put the declarations back in the shmem examples we have in 
examples/.

I still think it's crazy, but if the openshmem people are doing this in all 
their test programs, I assume it's a good representation of what the shmem 
community itself is doing.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-10 Thread Christoph Niethammer
sqrt(2^31)/ln(sqrt(2^31)) * (1 + 1.2762/ln(sqrt(2^31))) / 1024 * 4 byte = 
18.850133965051 kbyte should do it. ;)
Amazing - I think our systems are still *too small* - let's go for MPI with 
int64 types. ^^

- Ursprüngliche Mail -
Von: "Jeff Squyres (jsquyres)" 
An: "Open MPI Developers" 
Gesendet: Dienstag, 11. Februar 2014 01:32:53
Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create

On Feb 10, 2014, at 7:22 PM, Christoph Niethammer  wrote:

> 2.) Interesting idea: Using the approximation from the cited paper we should 
> only need around 400 MB to store all primes in the int32 range. Potential for 
> applying compression techniques still present. ^^

Per Andreas' last mail, we only need primes up to sqrt(2B) + 1 more.  That 
*has* to be less than 400MB... right?

sqrt(2B) = 46340.  So the upper limit on the size required to hold all the 
primes from 2...46340 is 46340*sizeof(int) = 185,360 bytes (plus one more, per 
Andreas, so 185,364).

This is all SWAGing, but I'm assuming the actual number must be *far* less than 
that...

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 1.7.5 fails on simple test

2014-02-10 Thread Paul Hargrove
All the platforms that failed over the weekend have passed today.

-Paul


On Mon, Feb 10, 2014 at 2:34 PM, Paul Hargrove  wrote:

> The fastest of my systems that failed over the weekend (a ppc64) has
> completed tests successfully.
> I will report on the ppc32 and SPARC results when they have all passed or
> failed.
>
> -Paul
>
>
> On Mon, Feb 10, 2014 at 1:52 PM, Ralph Castain  wrote:
>
>> Tarball is now posted
>>
>> On Feb 10, 2014, at 1:31 PM, Ralph Castain  wrote:
>>
>> Generating it now - sorry for my lack of response, my OMPI email was down
>> for some reason. I can now receive it, but still haven't gotten the backlog
>> from the down period.
>>
>>
>> On Feb 10, 2014, at 1:23 PM, Paul Hargrove  wrote:
>>
>> Ralph,
>>
>> If you give me a heads-up when this makes it into a tarball, I will
>> retest my failing ppc and sparc platforms.
>>
>> -Paul
>>
>>
>> On Mon, Feb 10, 2014 at 1:13 PM, Rolf vandeVaart 
>> wrote:
>>
>>> I have tracked this down.  There is a missing commit that affects
>>> ompi_mpi_init.c causing it to initialize bml twice.
>>>
>>> Ralph, can you apply r30310 to 1.7?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Rolf
>>>
>>>
>>>
>>> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Rolf
>>> vandeVaart
>>> *Sent:* Monday, February 10, 2014 12:29 PM
>>> *To:* Open MPI Developers
>>> *Subject:* Re: [OMPI devel] 1.7.5 fails on simple test
>>>
>>>
>>>
>>> I have seen this same issue although my core dump is a little bit
>>> different.  I am running with tcp,self.  The first entry in the list of
>>> BTLs is garbage, but then there is tcp and self in the list.   Strange.
>>> This is my core dump.  Line 208 in bml_r2.c is where I get the SEGV.
>>>
>>>
>>>
>>> Program terminated with signal 11, Segmentation fault.
>>>
>>> #0  0x7fb6dec981d0 in ?? ()
>>>
>>> Missing separate debuginfos, use: debuginfo-install
>>> glibc-2.12-1.107.el6_4.5.x86_64
>>>
>>> (gdb) where
>>>
>>> #0  0x7fb6dec981d0 in ?? ()
>>>
>>> #1  
>>>
>>> #2  0x7fb6e82fff38 in main_arena () from /lib64/libc.so.6
>>>
>>> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2,
>>> procs=0x2061440, reachable=0x7fff80487b40)
>>>
>>> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>>
>>> #4  0x7fb6df50a751 in mca_pml_ob1_add_procs (procs=0x2060bc0,
>>> nprocs=2)
>>>
>>> at ../../../../../ompi/mca/pml/ob1/pml_ob1.c:332
>>>
>>> #5  0x7fb6e8570dca in ompi_mpi_init (argc=1, argv=0x7fff80488158,
>>> requested=0, provided=0x7fff80487cc8)
>>>
>>> at ../../ompi/runtime/ompi_mpi_init.c:776
>>>
>>> #6  0x7fb6e85a3606 in PMPI_Init (argc=0x7fff80487d8c,
>>> argv=0x7fff80487d80) at pinit.c:84
>>>
>>> #7  0x00401c56 in main (argc=1, argv=0x7fff80488158) at
>>> MPI_Isend_ator_c.c:143
>>>
>>> (gdb)
>>>
>>> #3  0x7fb6e4103de2 in mca_bml_r2_add_procs (nprocs=2,
>>> procs=0x2061440, reachable=0x7fff80487b40)
>>>
>>> at ../../../../../ompi/mca/bml/r2/bml_r2.c:208
>>>
>>> 208 rc = btl->btl_add_procs(btl, n_new_procs, new_procs,
>>> btl_endpoints, reachable);
>>>
>>> (gdb) print *btl
>>>
>>> $1 = {btl_component = 0x7fb6e82ffee8, btl_eager_limit = 140423556234984,
>>> btl_rndv_eager_limit = 140423556235000,
>>>
>>>   btl_max_send_size = 140423556235000, btl_rdma_pipeline_send_length =
>>> 140423556235016,
>>>
>>>   btl_rdma_pipeline_frag_size = 140423556235016,
>>> btl_min_rdma_pipeline_size = 140423556235032,
>>>
>>>   btl_exclusivity = 3895459608, btl_latency = 32694, btl_bandwidth =
>>> 3895459624, btl_flags = 32694,
>>>
>>>   btl_seg_size = 140423556235048, btl_add_procs = 0x7fb6e82fff38
>>> ,
>>>
>>>   btl_del_procs = 0x7fb6e82fff38 , btl_register =
>>> 0x7fb6e82fff48 ,
>>>
>>>   btl_finalize = 0x7fb6e82fff48 , btl_alloc =
>>> 0x7fb6e82fff58 ,
>>>
>>>   btl_free = 0x7fb6e82fff58 , btl_prepare_src =
>>> 0x7fb6e82fff68 ,
>>>
>>>   btl_prepare_dst = 0x7fb6e82fff68 , btl_send =
>>> 0x7fb6e82fff78 ,
>>>
>>>   btl_sendi = 0x7fb6e82fff78 , btl_put = 0x7fb6e82fff88
>>> ,
>>>
>>>   btl_get = 0x7fb6e82fff88 , btl_dump = 0x7fb6e82fff98
>>> ,
>>>
>>>   btl_mpool = 0x7fb6e82fff98, btl_register_error = 0x7fb6e82fffa8
>>> ,
>>>
>>>   btl_ft_event = 0x7fb6e82fffa8 }
>>>
>>> (gdb)
>>>
>>>
>>>
>>>
>>>
>>> *From:* devel 
>>> [mailto:devel-boun...@open-mpi.org]
>>> *On Behalf Of *Mike Dubman
>>> *Sent:* Monday, February 10, 2014 4:23 AM
>>> *To:* Open MPI Developers
>>> *Subject:* [OMPI devel] 1.7.5 fails on simple test
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *$/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/bin/mpirun
>>>  -np 8 -mca pml ob1 -mca btl self,tcp 
>>> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi_install1/examples/hello_usempi*
>>>
>>> *[vegas12:12724] *** Process received signal ****
>>>
>>> *[vegas12:12724] Signal: Segmentation fault (11)*
>>>
>>> *[vegas12:12724] Signal code:  (128)*
>>>
>>> *[vegas12:12724] Failing at address: (nil)*
>>>
>>> *[vegas12:12724] [ 0] /lib64/libpthread.so.0[0x3937c0f500]*
>>>
>>> *[vegas12:12724] [ 1] 
>>>