Nice! Can you verify that it passes the IBM test? I didn't look closely, and to be honest, I'm not sure why the previous improvement broke the IBM test, because it supposedly did what you mentioned (stopped at sqrt(freenodes)).
I think patch 1 is a no-brainer. I'm not sure about #2, because I'm not sure how it differs from the previous one, nor did I have time to investigate why the previous one broke the IBM test. #3 seems like a good idea, too; I didn't check the paper, but I assume it's some kind of proof about the upper limit on the number of primes in a given range.

Two questions:

1. Should we cache generated prime numbers? (If so, it will have to be done in a thread-safe way.)

2. Should we just generate prime numbers and hard-code them into a table that is compiled into the code? We would only need primes up to the sqrt of 2 billion (i.e., signed int), right? I don't know how many that is -- if it's small enough, perhaps this is the easiest solution.

On Feb 10, 2014, at 1:30 PM, Christoph Niethammer <nietham...@hlrs.de> wrote:

> Hello,
> 
> I noticed some effort in improving the scalability of
> MPI_Dims_create(int nnodes, int ndims, int dims[]).
> Unfortunately there were some issues with the first attempt (r30539 and
> r30540), which were reverted.
> 
> So I decided to give it a short review based on r30606:
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> 
> 1.) freeprocs is initialized to nnodes, and the subsequent divisions of
> freeprocs all have positive integers as divisors.
> So IMHO it would make more sense to check that nnodes > 0 in the
> MPI_PARAM_CHECK section at the beginning, instead of the following check
> (see patch 0001):
> 
>  99     if (freeprocs < 1) {
> 100        return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> 101                                      FUNC_NAME);
> 102     }
> 
> 2.) I rewrote the algorithm in getprimes(int num, int *nprimes, int **pprimes)
> to stop at sqrt(n), which makes more sense mathematically (as the largest
> prime factor of any number n cannot exceed \sqrt{n}) - and should produce
> the right result.
> ;) (see patch 0002)
> Here are the improvements:
> 
> $ module load mpi/openmpi/trunk-gnu.4.7.3
> $ ./mpi-dims-old 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 8.104007
> $ module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> $ ./mpi-dims-new 1000000
> time used for MPI_Dims_create(1000000, 3, {}): 0.060400
> 
> 3.) Memory allocation for the list of prime numbers may be reduced by up to
> a factor of ~6 for 1 million nodes, using the result from Dusart 1999 [1]:
> \pi(x) < x/\ln(x) (1 + 1.2762/\ln(x)) for x > 1
> Unfortunately this saves us only 1.6 MB per process for 1 million nodes, as
> reported by tcmalloc/pprof on a test program - but it may add up with
> fatter nodes. :P
> 
> $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> (pprof) top
> Total: -1.6 MB
>      0.3  -18.8%  -18.8%      0.3  -18.8%  getprimes2
>      0.0   -0.0%  -18.8%     -1.6  100.0%  __libc_start_main
>      0.0   -0.0%  -18.8%     -1.6  100.0%  main
>     -1.9  118.8%  100.0%     -1.9  118.8%  getprimes
> 
> Find the patch for this attached as 0003.
> 
> If there are no issues, I would like to commit this to trunk for further
> testing (+ CMR for 1.7.5?) at the end of this week.
> 
> Best regards
> Christoph
> 
> [1] http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> 
> --
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> <0001-Move-parameter-check-into-appropriate-code-section-a.patch>
> <0002-Speeding-up-detection-of-prime-numbers-using-the-fac.patch>
> <0003-Reduce-memory-usage-by-a-better-approximation-for-th.patch>
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/