Hello,
I noticed some effort in improving the scalability of
MPI_Dims_create(int nnodes, int ndims, int dims[])
Unfortunately there were some issues with the first attempt (r30539 and r30540)
which were reverted.
So I decided to give it a short review based on r30606
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
1.) freeprocs is initialized to be nnodes and the subsequent divisions of
freeprocs have all positive integers as divisor.
So IMHO it would make more sense to check if nnodes > 0 in the MPI_PARAM_CHECK
section at the begin instead of the following (see patch 0001):
99 if (freeprocs < 1) {
100 return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
101 FUNC_NAME);
102 }
2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int
*nprimes, int **pprimes)
which makes mathematically more sens (as the largest prime factor of any number
n cannot exceed \sqrt{n}) - and should produce the right result. ;)
(see patch 0002)
Here the improvements:
module load mpi/openmpi/trunk-gnu.4.7.3
$ ./mpi-dims-old 1000000
time used for MPI_Dims_create(1000000, 3, {}): 8.104007
module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
$ ./mpi-dims-new 1000000
time used for MPI_Dims_create(1000000, 3, {}): 0.060400
3.) Memory allocation for the list of prime numbers may be reduced up to a
factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
\pi(x) < x/ln(x)(1+1.2762/ln(x)) for x > 1
Unfortunately this saves us only 1.6 MB per process for 1mio nodes as reported
by tcmalloc/pprof on a test program - but it may sum up with fatter nodes. :P
$ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
(pprof) top
Total: -1.6 MB
0.3 -18.8% -18.8% 0.3 -18.8% getprimes2
0.0 -0.0% -18.8% -1.6 100.0% __libc_start_main
0.0 -0.0% -18.8% -1.6 100.0% main
-1.9 118.8% 100.0% -1.9 118.8% getprimes
Find attached patch for it in 0003.
If there are no issues I would like to commit this to trunk for further testing
(+cmr for 1.7.5?) end of this week.
Best regards
Christoph
[1] http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
--
Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart
Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer
From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
From: Christoph Niethammer <christoph.nietham...@hlrs.de>
Date: Mon, 10 Feb 2014 16:44:03 +0100
Subject: [PATCH 1/3] Move parameter check into appropriate code section at the
begin.
---
ompi/mpi/c/dims_create.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index d2c3858..3d0792f 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
MPI_ERR_DIMS, FUNC_NAME);
}
+
+ if (1 > nnodes) {
+ return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
+ MPI_ERR_DIMS, FUNC_NAME);
+ }
}
/* Get # of free-to-be-assigned processes and # of free dimensions */
@@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
FUNC_NAME);
}
- if (freeprocs < 1) {
- return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
- FUNC_NAME);
- }
- else if (freeprocs == 1) {
+ if (freeprocs == 1) {
for (i = 0; i < ndims; ++i, ++dims) {
if (*dims == 0) {
*dims = 1;
--
1.8.3.2
From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001
From: Christoph Niethammer <christoph.nietham...@hlrs.de>
Date: Mon, 10 Feb 2014 18:50:51 +0100
Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact
that the largest prime factor of any number n cannot exceed \sqrt{n}.
---
ompi/mpi/c/dims_create.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index 3d0792f..1c1c381 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -5,7 +5,7 @@
* Copyright (c) 2004-2005 The University of Tennessee and The University
* of Tennessee Research Foundation. All rights
* reserved.
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
+ * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart,
* University of Stuttgart. All rights reserved.
* Copyright (c) 2004-2005 The Regents of the University of California.
* All rights reserved.
@@ -20,6 +20,8 @@
#include "ompi_config.h"
+#include <math.h>
+
#include "ompi/mpi/c/bindings.h"
#include "ompi/runtime/params.h"
#include "ompi/communicator/communicator.h"
@@ -293,13 +295,14 @@ getprimes(int num, int *pnprime, int **pprimes) {
primes[i++] = 2;
for (n = 3; n <= num; n += 2) {
- for (j = 1; j < i; ++j) {
+ double nsqrt = sqrt((double) n);
+ for(j = 0; (double) primes[j] <= nsqrt; j++) {
if ((n % primes[j]) == 0) {
break;
}
}
- if (j == i) {
+ if (primes[j] > nsqrt) {
if (i >= size) {
return MPI_ERR_DIMS;
}
--
1.8.3.2
From a012206cfbf7b8b22cef4cc5b811384e5e3ac32c Mon Sep 17 00:00:00 2001
From: Christoph Niethammer <christoph.nietham...@hlrs.de>
Date: Mon, 10 Feb 2014 19:02:03 +0100
Subject: [PATCH 3/3] Reduce memory usage by a better approximation for the
upper limit of the number of primes.
---
ompi/mpi/c/dims_create.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index 1c1c381..8dd3144 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -281,9 +281,17 @@ getprimes(int num, int *pnprime, int **pprimes) {
int n;
int size;
int *primes;
+ double lognum;
- /* Allocate the array of primes */
- size = (num / 2) + 1;
+ /* Allocate the array of primes
+ * see Pierre Dusart, Math. Comp. 68 (1999), no. 225, 411â415.*/
+ lognum = log(num);
+ if(num > 1) {
+ size = ceil(num/lognum * (1+1.2762/lognum));
+ }
+ else {
+ size = 1;
+ }
primes = (int *) malloc((unsigned) size * sizeof(int));
if (NULL == primes) {
return MPI_ERR_NO_MEM;
--
1.8.3.2