Hello,

After rethinking Jeff's comments about caching prime numbers I came to the 
conclusion that we can omit the prime numbers at all and go directly for the 
factorization. :D
We then only need at most   log_2(INT_MAX) * sizeof(int) = 32 * 4 byte = 128 
byte   of memory for the factors.

Computational costs reduce as well as the factorization itself is done by a 
loop with at most \sqrt(num) / 2 iterations - which is the same as in the 
original prime number detection loop.
I think this is the cleanest way which reduces also source code size. ;)

Find attache patch against the trunk.

Best regards
Christoph

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



----- Ursprüngliche Mail -----
Von: "Andreas Schäfer" <gent...@gmx.de>
An: "Open MPI Developers" <de...@open-mpi.org>
Gesendet: Dienstag, 11. Februar 2014 06:24:56
Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create

OK, so we were thinking the same thing :-) The optimization you
mention below is the same I've used in my updated patch.


On 02:29 Tue 11 Feb     , Christoph Niethammer wrote:
> As mentioned in my former mail I did not touch the factorization
> code.
> But to figure out if a number n is *not* a prime number it is sufficient to 
> check up to \sqrt(n).
> Proof:
> let n = p*q with q > \sqrt{n}
> --> p < \sqrt(n)
> So we have already found factor p before reaching \sqrt(n) and by this n is 
> no prime any more and no need for further checks. ;)
> 
> 
> The mentioned factorization may indeed include one factor which is larger 
> than \sqrt(n). :)
> 
> Proof that at least one prime factor can be larger than \sqrt(n) example:
> 6 = 2*3
> \sqrt(6) = 2.4494897427832... < 3   Q.E.D.
> 
> 
> Proof that no more than one factor can be larger than \sqrt(n):
> let n = \prod_{i=0}^K p_i with p_i \in N  and K > 2
> and assume w.l.o.g.  p_0 > \sqrt(n)  and  p_1 > \sqrt(n)
> --> 1 > \prod_{i=2}^K p_i
> which is a contradiction as all p_i \in N.  Q.E.D.
> 
> 
> So your idea is still applicable with not much effort and we only need prime 
> factors up to sqrt(n) in the factorizer code for an additional optimization. 
> :)
> 
> First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'} 
> p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. No 
> complication with counts IMHO. I leave this without patch as it is already 
> 2:30 in the morning. :P
> 
> Regards
> Christoph
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> ----- Ursprüngliche Mail -----
> Von: "Andreas Schäfer" <gent...@gmx.de>
> An: "Open MPI Developers" <de...@open-mpi.org>
> Gesendet: Montag, 10. Februar 2014 23:24:24
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> 
> Christoph-
> 
> your patch has the same problem as my original patch: indeed there may
> be a prime factor p of n with p > sqrt(n). What's important is that
> there may only be at most one. I've submitted an updated patch (see my
> previous mail) which catches this special case.
> 
> Best
> -Andreas
> 
> 
> On 19:30 Mon 10 Feb     , Christoph Niethammer wrote:
> > Hello,
> > 
> > I noticed some effort in improving the scalability of
> > MPI_Dims_create(int nnodes, int ndims, int dims[])
> > Unfortunately there were some issues with the first attempt (r30539 and 
> > r30540) which were reverted.
> > 
> > So I decided to give it a short review based on r30606
> > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> > 
> > 
> > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of 
> > freeprocs have all positive integers as divisor.
> > So IMHO it would make more sense to check if nnodes > 0 in the 
> > MPI_PARAM_CHECK section at the begin instead of the following (see patch 
> > 0001):
> > 
> > 99      if (freeprocs < 1) {
> > 100        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> > 101                                      FUNC_NAME);
> > 102     }
> > 
> > 
> > 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> > *nprimes, int **pprimes)
> > which makes mathematically more sens (as the largest prime factor of any 
> > number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> > (see patch 0002)
> > Here the improvements:
> > 
> > module load mpi/openmpi/trunk-gnu.4.7.3
> > $ ./mpi-dims-old 1000000
> > time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> > module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> > $ ./mpi-dims-new 1000000
> > time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> > 
> > 
> > 3.) Memory allocation for the list of prime numbers may be reduced up to a 
> > factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> > \pi(x)  < x/ln(x)(1+1.2762/ln(x))  for x > 1
> > Unfortunately this saves us only 1.6 MB per process for 1mio nodes as 
> > reported by tcmalloc/pprof on a test program - but it may sum up with 
> > fatter nodes. :P
> > 
> > $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> > (pprof) top
> > Total: -1.6 MB
> >      0.3 -18.8% -18.8%      0.3 -18.8% getprimes2
> >      0.0  -0.0% -18.8%     -1.6 100.0% __libc_start_main
> >      0.0  -0.0% -18.8%     -1.6 100.0% main
> >     -1.9 118.8% 100.0%     -1.9 118.8% getprimes
> > 
> > Find attached patch for it in 0003.
> > 
> > 
> > If there are no issues I would like to commit this to trunk for further 
> > testing (+cmr for 1.7.5?) end of this week.
> > 
> > Best regards
> > Christoph
> > 
> > [1] 
> > http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> > 
> > 
> > 
> > --
> > 
> > Christoph Niethammer
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstrasse 19
> > 70569 Stuttgart
> > 
> > Tel: ++49(0)711-685-87203
> > email: nietham...@hlrs.de
> > http://www.hlrs.de/people/niethammer
> 
> > From e3292b90cac42fad80ed27a555419002ed61ab66 Mon Sep 17 00:00:00 2001
> > From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> > Date: Mon, 10 Feb 2014 16:44:03 +0100
> > Subject: [PATCH 1/3] Move parameter check into appropriate code section at 
> > the
> >  begin.
> > 
> > ---
> >  ompi/mpi/c/dims_create.c | 11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> > index d2c3858..3d0792f 100644
> > --- a/ompi/mpi/c/dims_create.c
> > +++ b/ompi/mpi/c/dims_create.c
> > @@ -71,6 +71,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> >              return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD, 
> >                                             MPI_ERR_DIMS, FUNC_NAME);
> >          }
> > +
> > +        if (1 > nnodes) {
> > +            return OMPI_ERRHANDLER_INVOKE (MPI_COMM_WORLD,
> > +                                           MPI_ERR_DIMS, FUNC_NAME);
> > +        }
> >      }
> >  
> >      /* Get # of free-to-be-assigned processes and # of free dimensions */
> > @@ -95,11 +100,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
> >                                       FUNC_NAME);
> >      }
> >  
> > -    if (freeprocs < 1) {
> > -       return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> > -                                     FUNC_NAME);
> > -    }
> > -    else if (freeprocs == 1) {
> > +    if (freeprocs == 1) {
> >          for (i = 0; i < ndims; ++i, ++dims) {
> >              if (*dims == 0) {
> >                 *dims = 1;
> > -- 
> > 1.8.3.2
> > 
> 
> > From bc862c47ef8d581a8f6735c51983d6c9eeb95dfd Mon Sep 17 00:00:00 2001
> > From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> > Date: Mon, 10 Feb 2014 18:50:51 +0100
> > Subject: [PATCH 2/3] Speeding up detection of prime numbers using the fact
> >  that the largest prime factor of any number n cannot exceed \sqrt{n}.
> > 
> > ---
> >  ompi/mpi/c/dims_create.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> > index 3d0792f..1c1c381 100644
> > --- a/ompi/mpi/c/dims_create.c
> > +++ b/ompi/mpi/c/dims_create.c
> > @@ -5,7 +5,7 @@
> >   * Copyright (c) 2004-2005 The University of Tennessee and The University
> >   *                         of Tennessee Research Foundation.  All rights
> >   *                         reserved.
> > - * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
> > + * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart, 
> >   *                         University of Stuttgart.  All rights reserved.
> >   * Copyright (c) 2004-2005 The Regents of the University of California.
> >   *                         All rights reserved.
> > @@ -20,6 +20,8 @@
> >  
> >  #include "ompi_config.h"
> >  
> > +#include <math.h>
> > +
> >  #include "ompi/mpi/c/bindings.h"
> >  #include "ompi/runtime/params.h"
> >  #include "ompi/communicator/communicator.h"
> > @@ -293,13 +295,14 @@ getprimes(int num, int *pnprime, int **pprimes) {
> >     primes[i++] = 2;
> >  
> >     for (n = 3; n <= num; n += 2) {
> > -      for (j = 1; j < i; ++j) {
> > +      double nsqrt = sqrt((double) n);
> > +      for(j = 0; (double) primes[j] <= nsqrt; j++) {
> >           if ((n % primes[j]) == 0) {
> >               break;
> >            }
> >        }
> >  
> > -      if (j == i) {
> > +      if (primes[j] > nsqrt) {
> >          if (i >= size) {
> >             return MPI_ERR_DIMS;
> >           }
> > -- 
> > 1.8.3.2
> > 
> 
> > From a012206cfbf7b8b22cef4cc5b811384e5e3ac32c Mon Sep 17 00:00:00 2001
> > From: Christoph Niethammer <christoph.nietham...@hlrs.de>
> > Date: Mon, 10 Feb 2014 19:02:03 +0100
> > Subject: [PATCH 3/3] Reduce memory usage by a better approximation for the
> >  upper limit of the number of primes.
> > 
> > ---
> >  ompi/mpi/c/dims_create.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
> > index 1c1c381..8dd3144 100644
> > --- a/ompi/mpi/c/dims_create.c
> > +++ b/ompi/mpi/c/dims_create.c
> > @@ -281,9 +281,17 @@ getprimes(int num, int *pnprime, int **pprimes) {
> >     int n;
> >     int size;
> >     int *primes;
> > +   double lognum;
> >  
> > -   /* Allocate the array of primes */
> > -   size = (num / 2) + 1;
> > +   /* Allocate the array of primes
> > +    * see Pierre Dusart, Math. Comp. 68 (1999), no. 225, 411???415.*/
> > +   lognum = log(num);
> > +   if(num > 1) {
> > +       size = ceil(num/lognum * (1+1.2762/lognum));
> > +   }
> > +   else {
> > +       size = 1;
> > +   }
> >     primes = (int *) malloc((unsigned) size * sizeof(int));
> >     if (NULL == primes) {
> >         return MPI_ERR_NO_MEM;
> > -- 
> > 1.8.3.2
> > 
> 
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> ==========================================================
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==========================================================
> 
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
==========================================================
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==========================================================

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
From 19af07aac40d9e2b9c827ab043643b1e6e28fe0c Mon Sep 17 00:00:00 2001
From: Christoph Niethammer <christoph.nietham...@hlrs.de>
Date: Tue, 11 Feb 2014 11:58:30 +0100
Subject: [PATCH] Omit usage of pre calculated prime numbers and factorize
 directly.

---
 ompi/mpi/c/dims_create.c | 159 +++++++++++++++--------------------------------
 1 file changed, 50 insertions(+), 109 deletions(-)

diff --git a/ompi/mpi/c/dims_create.c b/ompi/mpi/c/dims_create.c
index 3d0792f..02a3413 100644
--- a/ompi/mpi/c/dims_create.c
+++ b/ompi/mpi/c/dims_create.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2004-2005 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
- * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
+ * Copyright (c) 2004-2014 High Performance Computing Center Stuttgart, 
  *                         University of Stuttgart.  All rights reserved.
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
@@ -20,6 +20,8 @@
 
 #include "ompi_config.h"
 
+#include <math.h>
+
 #include "ompi/mpi/c/bindings.h"
 #include "ompi/runtime/params.h"
 #include "ompi/communicator/communicator.h"
@@ -36,9 +38,8 @@
 static const char FUNC_NAME[] = "MPI_Dims_create";
 
 /* static functions */
-static int assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims);
-static int getfactors(int num, int nprime, int *primes, int **pcounts);
-static int getprimes(int num, int *pnprime, int **pprimes);
+static int assignnodes(int ndim, int nfactor, int *pfacts,int **pdims);
+static int getfactors(int num, int *nfators, int **factors);
 
 
 /*
@@ -50,8 +51,7 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
     int i;
     int freeprocs;
     int freedims;
-    int nprimes;
-    int *primes;
+    int nfactors;
     int *factors;
     int *procs;
     int *p;
@@ -109,20 +109,14 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
         return MPI_SUCCESS;
     }
 
-    /* Compute the relevant prime numbers for factoring */
-    if (MPI_SUCCESS != (err = getprimes(freeprocs, &nprimes, &primes))) {
-       return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
-                                     FUNC_NAME);
-    }
-
     /* Factor the number of free processes */
-    if (MPI_SUCCESS != (err = getfactors(freeprocs, nprimes, primes, &factors))) {
+    if (MPI_SUCCESS != (err = getfactors(freeprocs, &nfactors, &factors))) {
        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
                                      FUNC_NAME);
     }
 
     /* Assign free processes to free dimensions */
-    if (MPI_SUCCESS != (err = assignnodes(freedims, nprimes, primes, factors, &procs))) {
+    if (MPI_SUCCESS != (err = assignnodes(freedims, nfactors, factors, &procs))) {
        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, err,
                                      FUNC_NAME);
     }
@@ -135,7 +129,6 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
         }
     }
 
-    free((char *) primes);
     free((char *) factors);
     free((char *) procs);
 
@@ -154,12 +147,11 @@ int MPI_Dims_create(int nnodes, int ndims, int dims[])
  *  Accepts:    - # of dimensions
  *          - # of prime factors
  *          - array of prime factors
- *          - array of factor counts
  *          - ptr to array of dimensions (returned value)
  *  Returns:    - 0 or ERROR
  */
 static int
-assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
+assignnodes(int ndim, int nfactor, int *pfacts, int **pdims)
 {
     int *bins;
     int i, j;
@@ -185,17 +177,15 @@ assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
     
     /* Loop assigning factors from the highest to the lowest */
     for (j = nfactor - 1; j >= 0; --j) {
-       f = pfacts[j];
-       for (n = counts[j]; n > 0; --n) {
-            /* Assign a factor to the smallest bin */
-            pmin = bins;
-            for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) {
-                if (*p < *pmin) {
-                    pmin = p;
-                }
+        f = pfacts[j];
+        /* Assign a factor to the smallest bin */
+        pmin = bins;
+        for (i = 1, p = pmin + 1; i < ndim; ++i, ++p) {
+            if (*p < *pmin) {
+                pmin = p;
             }
-            *pmin *= f;
         }
+        *pmin *= f;
      }
     
      /* Sort dimensions in decreasing order (O(n^2) for now) */
@@ -217,96 +207,47 @@ assignnodes(int ndim, int nfactor, int *pfacts, int *counts, int **pdims)
  *
  *  Function:   - factorize a number
  *  Accepts:    - number
- *          - # of primes
- *          - array of primes
- *          - ptr to array of counts (returned value)
- *  Returns:    - 0 or ERROR
+ *          - # prime factors
+ *          - array of prime factors
+ *  Returns:    - MPI_SUCCESS or ERROR
  */
 static int
-getfactors(int num, int nprime, int *primes, int **pcounts)
-{
-    int *counts;
+getfactors(int num, int *nfactors, int **factors) {
+    int size;
+    int n;
+    int d;
     int i;
-    int *p;
-    int *c;
-    
-    if (0 >= nprime) {
-        return MPI_ERR_INTERN;
-    }
+    double lognum;
 
-    /* Allocate the factor counts array */
-    counts = (int *) malloc((unsigned) nprime * sizeof(int));
-    if (NULL == counts) {
-       return MPI_ERR_NO_MEM;
+    if(num  < 2) {
+        (*nfactors) = 0;
+        (*factors) = NULL;
+        return MPI_SUCCESS;
     }
-
-    *pcounts = counts;
-
-    /* Loop over the prime numbers */
-    i = nprime - 1;
-    p = primes + i;
-    c = counts + i;
-
-    for (; i >= 0; --i, --p, --c) {
-        *c = 0;
-        while ((num % *p) == 0) {
-            ++(*c);
-            num /= *p;
+    /* Allocate the array of prime factors which cannot exceed log_2(num) entries */
+    n = num;
+    size = ceil(log(num) / log(2));
+    *factors = (int *) malloc((unsigned) size * sizeof(int));
+
+    i = 0;
+    /* determine all occurences of factor 2 */
+    while((num % 2) == 0) {
+        num /= 2;
+        (*factors)[i++] = 2;
+    }
+    /* determine all occurences of uneven prime numbers up to sqrt(num) */
+    d = 3;
+    int sqrtnum = ceil(sqrt(num));
+    for(d = 3; (num > 1) && (d < sqrtnum); d += 2) {
+        while((num % d) == 0) {
+            num /= d;
+            (*factors)[i++] = d;
         }
     }
-
-    if (1 != num) {
-        return MPI_ERR_INTERN;
+    /* as we looped only up to sqrt(num) one factor > sqrt(num) may be left over */
+    if(num != 1) {
+        (*factors)[i++] = num;
     }
-
+    (*nfactors) = i;
     return MPI_SUCCESS;
 }
-
-/*
- *  getprimes
- *
- *  Function:   - get primes smaller than number
- *          - array of primes dynamically allocated
- *  Accepts:    - number
- *          - ptr # of primes (returned value)
- *          - ptr array of primes (returned values)
- *  Returns:    - 0 or ERROR
- */
-static int
-getprimes(int num, int *pnprime, int **pprimes) {
-
-   int i, j;
-   int n;
-   int size;
-   int *primes;
-
-   /* Allocate the array of primes */
-   size = (num / 2) + 1;
-   primes = (int *) malloc((unsigned) size * sizeof(int));
-   if (NULL == primes) {
-       return MPI_ERR_NO_MEM;
-   }
-   *pprimes = primes;
-
-   /* Find the prime numbers */
-   i = 0;
-   primes[i++] = 2;
-
-   for (n = 3; n <= num; n += 2) {
-      for (j = 1; j < i; ++j) {
-         if ((n % primes[j]) == 0) {
-             break;
-          }
-      }
-
-      if (j == i) {
-        if (i >= size) {
-           return MPI_ERR_DIMS;
-         }
-         primes[i++] = n;
-      }
-   }
-
-   *pnprime = i;
-   return MPI_SUCCESS;
-}
-- 
1.8.3.2

Reply via email to