Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Andreas Schäfer
On 19:12 Mon 10 Feb , Jeff Squyres (jsquyres) wrote:
> 1. Should we cache generated prime numbers?  (if so, it'll have to be done in 
> a thread-safe way)

The code I've submitted in the other thread is much faster than the
original code (e.g. 100x faster for 1 nodes, the ratio increases
with the number of nodes). Given that MPI_Dims_create is a function
typically called once at application startup, I don't think caching
would be of great benefit.

> 2. Should we just generate prime numbers and hard-code them into a table that 
> is compiled into the code?  We would only need primes up to the sqrt of 
> 2billion (i.e., signed int), right?  I don't know how many that is -- if it's 
> small enough, perhaps this is the easiest solution.

Could be done, but it's not much faster than the current code. Plus
it's just calling for bugs (MPICH does this and their (initial) code
easily segfaulted).

Best
-Andreas


-- 
==
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!




Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Andreas Schäfer
Christoph-

Jeff is right, the largest prime p we need to precompute is p <
sqrt(2^31), so in total we'd get away with a dozen kB of RAM. The only
prime in the factorization we'd not catch this way can be retrieved
later on during the factorization.

But I agree with you that it's not really worth the effort.
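
Just to illustrate the size estimate, a quick sieve sketch (illustrative only,
not proposed for the actual code) that counts how many primes such a table
would need:

/* Quick sieve sketch (illustrative only): count how many primes <= sqrt(2^31)
 * a precomputed table would have to hold. The answer is a few thousand primes,
 * i.e. on the order of the dozen-or-so kB mentioned above when stored as ints. */
#include <stdio.h>

int main(void)
{
    enum { LIMIT = 46341 };            /* ceil(sqrt(2^31)) */
    static char composite[LIMIT + 1];  /* static, so zero-initialized */
    int count = 0;

    for (int i = 2; i * i <= LIMIT; i++) {
        if (!composite[i]) {
            for (int j = i * i; j <= LIMIT; j += i) {
                composite[j] = 1;
            }
        }
    }
    for (int i = 2; i <= LIMIT; i++) {
        if (!composite[i]) {
            count++;
        }
    }
    printf("%d primes <= %d, table size %zu bytes\n",
           count, LIMIT, count * sizeof(int));
    return 0;
}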

Cheers
-Andreas


On 02:39 Tue 11 Feb , Christoph Niethammer wrote:
> sqrt(2^31)/log(sqrt(2^31))*(1+1.2762/log(sqrt(2^31)))/1024 * 4byte = 
> 18,850133965051 kbyte should do it. ;)
> Amazing - I think our systems are still *too small* - lets go for MPI with 
> int64 types. ^^
> 
> - Ursprüngliche Mail -
> Von: "Jeff Squyres (jsquyres)" 
> An: "Open MPI Developers" 
> Gesendet: Dienstag, 11. Februar 2014 01:32:53
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> 
> On Feb 10, 2014, at 7:22 PM, Christoph Niethammer  wrote:
> 
> > 2.) Interesting idea: Using the approximation from the cited paper we 
> > should only need around 400 MB to store all primes in the int32 range. 
> > Potential for applying compression techniques still present. ^^
> 
> Per Andreas' last mail, we only need primes up to sqrt(2B) + 1 more.  That 
> *has* to be less than 400MB... right?
> 
> sqrt(2B) = 46340.  So the upper limit on the size required to hold all the 
> primes from 2...46340 is 46340*sizeof(int) = 185,360 bytes (plus one more, 
> per Andreas, so 185,364).
> 
> This is all SWAGing, but I'm assuming the actual number must be *far* less 
> than that...
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
==
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!




Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Andreas Schäfer
OK, so we were thinking the same thing :-) The optimization you
mention below is the same I've used in my updated patch.


On 02:29 Tue 11 Feb , Christoph Niethammer wrote:
> As mentioned in my former mail I did not touch the factorization
> code.
> But to figure out if a number n is *not* a prime number it is sufficient to 
> check up to \sqrt(n).
> Proof:
> let n = p*q with q > \sqrt{n}
> --> p < \sqrt(n)
> So we have already found factor p before reaching \sqrt(n) and by this n is 
> no prime any more and no need for further checks. ;)
> 
> 
> The mentioned factorization may indeed include one factor which is larger 
> than \sqrt(n). :)
> 
> Proof that at least one prime factor can be larger than \sqrt(n) example:
> 6 = 2*3
> \sqrt(6) = 2.4494897427832... < 3   Q.E.D.
> 
> 
> Proof that no more than one factor can be larger than \sqrt(n):
> let n = \prod_{i=0}^K p_i with p_i \in N  and K > 2
> and assume w.l.o.g.  p_0 > \sqrt(n)  and  p_1 > \sqrt(n)
> --> 1 > \prod_{i=2}^K p_i
> which is a contradiction as all p_i \in N.  Q.E.D.
> 
> 
> So your idea is still applicable with not much effort and we only need prime 
> factors up to sqrt(n) in the factorizer code for an additional optimization. 
> :)
> 
> First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'} 
> p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. No 
> complication with counts IMHO. I leave this without patch as it is already 
> 2:30 in the morning. :P
> 
> Regards
> Christoph
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> - Ursprüngliche Mail -
> Von: "Andreas Schäfer" 
> An: "Open MPI Developers" 
> Gesendet: Montag, 10. Februar 2014 23:24:24
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> 
> Christoph-
> 
> your patch has the same problem as my original patch: indeed there may
> be a prime factor p of n with p > sqrt(n). What's important is that
> there may only be at most one. I've submitted an updated patch (see my
> previous mail) which catches this special case.
> 
> Best
> -Andreas
> 
> 
> On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> > Hello,
> > 
> > I noticed some effort in improving the scalability of
> > MPI_Dims_create(int nnodes, int ndims, int dims[])
> > Unfortunately there were some issues with the first attempt (r30539 and 
> > r30540) which were reverted.
> > 
> > So I decided to give it a short review based on r30606
> > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> > 
> > 
> > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of 
> > freeprocs have all positive integers as divisor.
> > So IMHO it would make more sense to check if nnodes > 0 in the 
> > MPI_PARAM_CHECK section at the begin instead of the following (see patch 
> > 0001):
> > 
> > 99  if (freeprocs < 1) {
> > 100return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> > 101  FUNC_NAME);
> > 102 }
> > 
> > 
> > 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> > *nprimes, int **pprimes)
> > which makes mathematically more sens (as the largest prime factor of any 
> > number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> > (see patch 0002)
> > Here the improvements:
> > 
> > module load mpi/openmpi/trunk-gnu.4.7.3
> > $ ./mpi-dims-old 100
> > time used for MPI_Dims_create(100, 3, {}): 8.104007
> > module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> > $ ./mpi-dims-new 100
> > time used for MPI_Dims_create(100, 3, {}): 0.060400
> > 
> > 
> > 3.) Memory allocation for the list of prime numbers may be reduced up to a 
> > factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> > \pi(x)  < x/ln(x)(1+1.2762/ln(x))  for x > 1
> > Unfortunately this saves us only 1.6 MB per process for 1mio nodes as 
> > reported by tcmalloc/pprof on a test program - but it may sum up with 
> > fatter nodes. :P
> > 
> > $ pprof --base=$PWD/primes-old.0001.heap a.out primes-new.0002.heap
> > (pprof) top
> > Total: -1.6 MB
> >  0.3 -18.8% -18.8%  0.3 -18.8% getprimes2
> >  0.0  -0.0% -18.8% -1.6 100.0% __libc_start_main
> >  0.0  -0.0% -18.8% -1.6 100.0% main
> > -1.9 118.8% 100.0% -1.9 118.8% getprimes
> > 
> > Find attached patch for it in 0003.
> > 
> > 
> > If there are no issues I would like to commit this to trunk for further 
> > testing (+cmr for 1.7.5?) end of this week.
> > 
> > Best regards
> > Christoph
> > 
> > [1] 
> > http://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-01037-6/home.html
> > 
> > 
> > 
> > --
> > 
> > Christoph Niethammer
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstrasse 19
> > 70569 Stuttgart
> > 
> > Tel: ++49(0)711-685-87203

[OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-11 Thread Paul Kapinos

Dear Open MPI developer,

I.
we see peculiar behaviour in the new 1.7.4 version of Open MPI which is a change 
compared to previous versions:

- when calling "mpiexec", it returns "1" and exits silently.

The behaviour is reproducible; well, not that easily reproducible.

We have multiple InfiniBand islands in our cluster. All nodes are reachable from 
each other without a password in some way; some via IPoIB, and for some routes you 
also have to use ethernet cards and IB/TCP gateways.


One island (b) is configured to use the IB card as the main TCP interface. In 
this island, the variable OMPI_MCA_oob_tcp_if_include is set to "ib0" (*)


Another island (h) is configured in the conventional way: IB cards are also present 
and may be used for IPoIB within the island, but the "main interface" used for DNS 
and hostname binds is eth0.


When calling 'mpiexec' from (b) to start a process on (h), and OpenMPI version 
is 1.7.4, and OMPI_MCA_oob_tcp_if_include is set to "ib0", mpiexec just exits 
with return value "1" and no error/warning.


When OMPI_MCA_oob_tcp_if_include is unset, it works just fine.

All previous versions of Open MPI (1.6.x, 1.7.3) did not have this 
behaviour, so this is specific to v1.7.4 only. See the log below.


You may ask why the hell we start MPI processes on another IB island: because our 
front-end nodes are in island (b), but we sometimes need to start something on 
island (h) as well, which worked perfectly until 1.7.4.



(*) This is another long Spaghetti Western story. In short, we set 
OMPI_MCA_oob_tcp_if_include to 'ib0' in the subcluster where the IB card is 
configured to be the main network interface, in order to stop Open MPI trying to 
connect via (possibly unconfigured) ethernet cards - which sometimes led to 
endless waiting.

Cf. http://www.open-mpi.org/community/lists/users/2011/11/17824.php

--
pk224850@cluster:~[523]$ module switch $_LAST_MPI openmpi/1.7.3 

Unloading openmpi 1.7.3 
[ OK ]
Loading openmpi 1.7.3 for intel compiler 
[ OK ]
pk224850@cluster:~[524]$ $MPI_BINDIR/mpiexec  -H linuxscc004 -np 1 hostname ; 
echo $?

linuxscc004.rz.RWTH-Aachen.DE
0
pk224850@cluster:~[525]$ module switch $_LAST_MPI openmpi/1.7.4 

Unloading openmpi 1.7.3 
[ OK ]
Loading openmpi 1.7.4 for intel compiler 
[ OK ]
pk224850@cluster:~[526]$ $MPI_BINDIR/mpiexec  -H linuxscc004 -np 1 hostname ; 
echo $?

1
pk224850@cluster:~[527]$
--








II.
During some experiments with envvars and v1.7.4, I got the messages below.

--
Sorry!  You were supposed to get help about:
no-included-found
But I couldn't open the help file:
/opt/MPI/openmpi-1.7.4/linux/intel/share/openmpi/help-oob-tcp.txt: No such 
file or directory.  Sorry!

--
[linuxc2.rz.RWTH-Aachen.DE:13942] [[63331,0],0] ORTE_ERROR_LOG: Not available in 
file ess_hnp_module.c at line 314

--

Reproducing:
$MPI_BINDIR/mpiexec  -mca oob_tcp_if_include ib0   -H linuxscc004 -np 1 hostname

*from a node with no 'ib0' card*, i.e. also without InfiniBand. Yes, this is a 
bad idea, but 1.7.3 gave a more understandable "you are doing the wrong thing" message:

--
None of the networks specified to be included for out-of-band communications
could be found:

  Value given: ib0

Please revise the specification and try again.
--


No idea why the file share/openmpi/help-oob-tcp.txt has not been installed in 
1.7.4, as we compile this version in pretty much the same way as previous versions.





Best,
Paul Kapinos

--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915





Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Christoph Niethammer
Hello,

After rethinking Jeff's comments about caching prime numbers I came to the 
conclusion that we can omit the prime numbers altogether and go directly for the 
factorization. :D
We then only need at most log_2(INT_MAX) * sizeof(int) = 32 * 4 byte = 128 
byte of memory for the factors.

Computational costs are reduced as well, since the factorization itself is done by a 
loop with at most \sqrt(num) / 2 iterations - which is the same as in the 
original prime number detection loop.
I think this is the cleanest way, and it also reduces the source code size. ;)

Find attached the patch against the trunk.
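
For readers without the patch handy, the idea in a minimal trial-division
sketch (illustrative only - the actual patch differs in the details):

/* Illustrative trial-division sketch (not the actual patch): factor n > 1
 * into primes. A positive int has at most 30 prime factors
 * (2^30 <= INT_MAX < 2^31), so a fixed 32-entry buffer is plenty.
 * Whatever remains after the loop is the single prime factor larger
 * than sqrt(n), if there is one. */
static int factorize(int n, int factors[32])
{
    int count = 0;

    while ((n % 2) == 0) {            /* strip factors of 2 first ... */
        factors[count++] = 2;
        n /= 2;
    }
    /* ... then odd trial divisors up to sqrt(n): ~sqrt(n)/2 iterations */
    for (int d = 3; (long long)d * d <= n; d += 2) {
        while ((n % d) == 0) {
            factors[count++] = d;
            n /= d;
        }
    }
    if (n > 1) {                      /* at most one prime factor > sqrt(n) */
        factors[count++] = n;
    }
    return count;
}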

Best regards
Christoph

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



- Original Message -
From: "Andreas Schäfer" 
To: "Open MPI Developers" 
Sent: Tuesday, 11 February 2014 06:24:56
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

OK, so we were thinking the same thing :-) The optimization you
mention below is the same I've used in my updated patch.


On 02:29 Tue 11 Feb , Christoph Niethammer wrote:
> As mentioned in my former mail I did not touch the factorization
> code.
> But to figure out if a number n is *not* a prime number it is sufficient to 
> check up to \sqrt(n).
> Proof:
> let n = p*q with q > \sqrt{n}
> --> p < \sqrt(n)
> So we have already found factor p before reaching \sqrt(n) and by this n is 
> no prime any more and no need for further checks. ;)
> 
> 
> The mentioned factorization may indeed include one factor which is larger 
> than \sqrt(n). :)
> 
> Proof that at least one prime factor can be larger than \sqrt(n) example:
> 6 = 2*3
> \sqrt(6) = 2.4494897427832... < 3   Q.E.D.
> 
> 
> Proof that no more than one factor can be larger than \sqrt(n):
> let n = \prod_{i=0}^K p_i with p_i \in N  and K > 2
> and assume w.l.o.g.  p_0 > \sqrt(n)  and  p_1 > \sqrt(n)
> --> 1 > \prod_{i=2}^K p_i
> which is a contradiction as all p_i \in N.  Q.E.D.
> 
> 
> So your idea is still applicable with not much effort and we only need prime 
> factors up to sqrt(n) in the factorizer code for an additional optimization. 
> :)
> 
> First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'} 
> p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. No 
> complication with counts IMHO. I leave this without patch as it is already 
> 2:30 in the morning. :P
> 
> Regards
> Christoph
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> - Ursprüngliche Mail -
> Von: "Andreas Schäfer" 
> An: "Open MPI Developers" 
> Gesendet: Montag, 10. Februar 2014 23:24:24
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> 
> Christoph-
> 
> your patch has the same problem as my original patch: indeed there may
> be a prime factor p of n with p > sqrt(n). What's important is that
> there may only be at most one. I've submitted an updated patch (see my
> previous mail) which catches this special case.
> 
> Best
> -Andreas
> 
> 
> On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> > Hello,
> > 
> > I noticed some effort in improving the scalability of
> > MPI_Dims_create(int nnodes, int ndims, int dims[])
> > Unfortunately there were some issues with the first attempt (r30539 and 
> > r30540) which were reverted.
> > 
> > So I decided to give it a short review based on r30606
> > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> > 
> > 
> > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of 
> > freeprocs have all positive integers as divisor.
> > So IMHO it would make more sense to check if nnodes > 0 in the 
> > MPI_PARAM_CHECK section at the begin instead of the following (see patch 
> > 0001):
> > 
> > 99  if (freeprocs < 1) {
> > 100return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_DIMS,
> > 101  FUNC_NAME);
> > 102 }
> > 
> > 
> > 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> > *nprimes, int **pprimes)
> > which makes mathematically more sens (as the largest prime factor of any 
> > number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> > (see patch 0002)
> > Here the improvements:
> > 
> > module load mpi/openmpi/trunk-gnu.4.7.3
> > $ ./mpi-dims-old 100
> > time used for MPI_Dims_create(100, 3, {}): 8.104007
> > module swap mpi/openmpi/trunk-gnu.4.7.3 mpi/openmpi/trunk-gnu.4.7.3-testing
> > $ ./mpi-dims-new 100
> > time used for MPI_Dims_create(100, 3, {}): 0.060400
> > 
> > 
> > 3.) Memory allocation for the list of prime numbers may be reduced up to a 
> > factor of ~6 for 1mio nodes using the result from Dusart 1999 [1]:
> > \pi(x)  < x/ln(x)(1+1.2762/ln(x)) 

Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Andreas Schäfer
Hi,

ah, that's clever indeed!

Best
-Andreas


On 12:02 Tue 11 Feb , Christoph Niethammer wrote:
> Hello,
> 
> After rethinking Jeff's comments about caching prime numbers I came to the 
> conclusion that we can omit the prime numbers at all and go directly for the 
> factorization. :D
> We then only need at most   log_2(INT_MAX) * sizeof(int) = 32 * 4 byte = 128 
> byte   of memory for the factors.
> 
> Computational costs reduce as well as the factorization itself is done by a 
> loop with at most \sqrt(num) / 2 iterations - which is the same as in the 
> original prime number detection loop.
> I think this is the cleanest way which reduces also source code size. ;)
> 
> Find attache patch against the trunk.
> 
> Best regards
> Christoph
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> 
> 
> - Ursprüngliche Mail -
> Von: "Andreas Schäfer" 
> An: "Open MPI Developers" 
> Gesendet: Dienstag, 11. Februar 2014 06:24:56
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> 
> OK, so we were thinking the same thing :-) The optimization you
> mention below is the same I've used in my updated patch.
> 
> 
> On 02:29 Tue 11 Feb , Christoph Niethammer wrote:
> > As mentioned in my former mail I did not touch the factorization
> > code.
> > But to figure out if a number n is *not* a prime number it is sufficient to 
> > check up to \sqrt(n).
> > Proof:
> > let n = p*q with q > \sqrt{n}
> > --> p < \sqrt(n)
> > So we have already found factor p before reaching \sqrt(n) and by this n is 
> > no prime any more and no need for further checks. ;)
> > 
> > 
> > The mentioned factorization may indeed include one factor which is larger 
> > than \sqrt(n). :)
> > 
> > Proof that at least one prime factor can be larger than \sqrt(n) example:
> > 6 = 2*3
> > \sqrt(6) = 2.4494897427832... < 3   Q.E.D.
> > 
> > 
> > Proof that no more than one factor can be larger than \sqrt(n):
> > let n = \prod_{i=0}^K p_i with p_i \in N  and K > 2
> > and assume w.l.o.g.  p_0 > \sqrt(n)  and  p_1 > \sqrt(n)
> > --> 1 > \prod_{i=2}^K p_i
> > which is a contradiction as all p_i \in N.  Q.E.D.
> > 
> > 
> > So your idea is still applicable with not much effort and we only need 
> > prime factors up to sqrt(n) in the factorizer code for an additional 
> > optimization. :)
> > 
> > First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'} 
> > p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. 
> > No complication with counts IMHO. I leave this without patch as it is 
> > already 2:30 in the morning. :P
> > 
> > Regards
> > Christoph
> > 
> > --
> > 
> > Christoph Niethammer
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstrasse 19
> > 70569 Stuttgart
> > 
> > Tel: ++49(0)711-685-87203
> > email: nietham...@hlrs.de
> > http://www.hlrs.de/people/niethammer
> > 
> > - Ursprüngliche Mail -
> > Von: "Andreas Schäfer" 
> > An: "Open MPI Developers" 
> > Gesendet: Montag, 10. Februar 2014 23:24:24
> > Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> > 
> > Christoph-
> > 
> > your patch has the same problem as my original patch: indeed there may
> > be a prime factor p of n with p > sqrt(n). What's important is that
> > there may only be at most one. I've submitted an updated patch (see my
> > previous mail) which catches this special case.
> > 
> > Best
> > -Andreas
> > 
> > 
> > On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> > > Hello,
> > > 
> > > I noticed some effort in improving the scalability of
> > > MPI_Dims_create(int nnodes, int ndims, int dims[])
> > > Unfortunately there were some issues with the first attempt (r30539 and 
> > > r30540) which were reverted.
> > > 
> > > So I decided to give it a short review based on r30606
> > > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> > > 
> > > 
> > > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of 
> > > freeprocs have all positive integers as divisor.
> > > So IMHO it would make more sense to check if nnodes > 0 in the 
> > > MPI_PARAM_CHECK section at the begin instead of the following (see patch 
> > > 0001):
> > > 
> > > 99if (freeprocs < 1) {
> > > 100  return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, 
> > > MPI_ERR_DIMS,
> > > 101FUNC_NAME);
> > > 102   }
> > > 
> > > 
> > > 2.) I rewrote the algorithm stopping at sqrt(n) in getprimes(int num, int 
> > > *nprimes, int **pprimes)
> > > which makes mathematically more sens (as the largest prime factor of any 
> > > number n cannot exceed \sqrt{n}) - and should produce the right result. ;)
> > > (see patch 0002)
> > > Here the improvements:
> > > 
> > > module load mpi/openmpi/trunk-gnu.4.7.3
> > > $ ./mpi-dims-old 100
> > > time us

Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Christoph Niethammer
Hello,

Minor update, as a little bug and an unused variable were left over in the patch.
I'll commit this + the parameter check change later.

Anybody volunteering to review a CMR for 1.7.5? :)

Ah, and some results for MPI_Dims_create(100, 3, {}):

original: 8.110628 sec
optimized-primes: 0.048702 sec
optimized-factorization: 0.13 sec
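
For reference, a hypothetical timing harness along these lines (an assumption -
the actual mpi-dims test program may differ) should reproduce such numbers:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical harness (not the actual benchmark from this thread): time a
 * single MPI_Dims_create() call for a node count given on the command line. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nnodes  = (argc > 1) ? atoi(argv[1]) : 1000000;
    int dims[3] = {0, 0, 0};

    double t0 = MPI_Wtime();
    MPI_Dims_create(nnodes, 3, dims);
    double t1 = MPI_Wtime();

    printf("time used for MPI_Dims_create(%d, 3, {}): %f\n", nnodes, t1 - t0);
    printf("dims = %d x %d x %d\n", dims[0], dims[1], dims[2]);

    MPI_Finalize();
    return 0;
}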

Regards
Christoph

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer




- Original Message -
From: "Andreas Schäfer" 
To: "Open MPI Developers" 
Sent: Tuesday, 11 February 2014 12:16:53
Subject: Re: [OMPI devel] Reviewing MPI_Dims_create

Hi,

ah, that's clever indeed!

Best
-Andreas


On 12:02 Tue 11 Feb , Christoph Niethammer wrote:
> Hello,
> 
> After rethinking Jeff's comments about caching prime numbers I came to the 
> conclusion that we can omit the prime numbers at all and go directly for the 
> factorization. :D
> We then only need at most   log_2(INT_MAX) * sizeof(int) = 32 * 4 byte = 128 
> byte   of memory for the factors.
> 
> Computational costs reduce as well as the factorization itself is done by a 
> loop with at most \sqrt(num) / 2 iterations - which is the same as in the 
> original prime number detection loop.
> I think this is the cleanest way which reduces also source code size. ;)
> 
> Find attache patch against the trunk.
> 
> Best regards
> Christoph
> 
> --
> 
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
> 
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> 
> 
> 
> - Ursprüngliche Mail -
> Von: "Andreas Schäfer" 
> An: "Open MPI Developers" 
> Gesendet: Dienstag, 11. Februar 2014 06:24:56
> Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> 
> OK, so we were thinking the same thing :-) The optimization you
> mention below is the same I've used in my updated patch.
> 
> 
> On 02:29 Tue 11 Feb , Christoph Niethammer wrote:
> > As mentioned in my former mail I did not touch the factorization
> > code.
> > But to figure out if a number n is *not* a prime number it is sufficient to 
> > check up to \sqrt(n).
> > Proof:
> > let n = p*q with q > \sqrt{n}
> > --> p < \sqrt(n)
> > So we have already found factor p before reaching \sqrt(n) and by this n is 
> > no prime any more and no need for further checks. ;)
> > 
> > 
> > The mentioned factorization may indeed include one factor which is larger 
> > than \sqrt(n). :)
> > 
> > Proof that at least one prime factor can be larger than \sqrt(n) example:
> > 6 = 2*3
> > \sqrt(6) = 2.4494897427832... < 3   Q.E.D.
> > 
> > 
> > Proof that no more than one factor can be larger than \sqrt(n):
> > let n = \prod_{i=0}^K p_i with p_i \in N  and K > 2
> > and assume w.l.o.g.  p_0 > \sqrt(n)  and  p_1 > \sqrt(n)
> > --> 1 > \prod_{i=2}^K p_i
> > which is a contradiction as all p_i \in N.  Q.E.D.
> > 
> > 
> > So your idea is still applicable with not much effort and we only need 
> > prime factors up to sqrt(n) in the factorizer code for an additional 
> > optimization. :)
> > 
> > First search all K' factors p_i < \sqrt(n). If then n \ne \prod_{i=0}^{K'} 
> > p_i we should be sure that p_{K'+1} = n / \prod_{i=0}^{K'} p_i is a prime. 
> > No complication with counts IMHO. I leave this without patch as it is 
> > already 2:30 in the morning. :P
> > 
> > Regards
> > Christoph
> > 
> > --
> > 
> > Christoph Niethammer
> > High Performance Computing Center Stuttgart (HLRS)
> > Nobelstrasse 19
> > 70569 Stuttgart
> > 
> > Tel: ++49(0)711-685-87203
> > email: nietham...@hlrs.de
> > http://www.hlrs.de/people/niethammer
> > 
> > - Ursprüngliche Mail -
> > Von: "Andreas Schäfer" 
> > An: "Open MPI Developers" 
> > Gesendet: Montag, 10. Februar 2014 23:24:24
> > Betreff: Re: [OMPI devel] Reviewing MPI_Dims_create
> > 
> > Christoph-
> > 
> > your patch has the same problem as my original patch: indeed there may
> > be a prime factor p of n with p > sqrt(n). What's important is that
> > there may only be at most one. I've submitted an updated patch (see my
> > previous mail) which catches this special case.
> > 
> > Best
> > -Andreas
> > 
> > 
> > On 19:30 Mon 10 Feb , Christoph Niethammer wrote:
> > > Hello,
> > > 
> > > I noticed some effort in improving the scalability of
> > > MPI_Dims_create(int nnodes, int ndims, int dims[])
> > > Unfortunately there were some issues with the first attempt (r30539 and 
> > > r30540) which were reverted.
> > > 
> > > So I decided to give it a short review based on r30606
> > > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/c/dims_create.c?rev=30606
> > > 
> > > 
> > > 1.) freeprocs is initialized to be nnodes and the subsequent divisions of 
> > > freeprocs have all positive integers as divisor.
> > > So IMHO it would make more sense to check if nnodes > 0 in the 
> 

Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Andreas Schäfer
On 15:20 Tue 11 Feb , Christoph Niethammer wrote:
> Ah and some restults for MPI_Dims_create(100, 3, {})
> 
> original: 8.110628 sec
> optimized-primes: 0.048702 sec
> optimized-factorization: 0.13 sec

Awesome! I didn't expect that nested loop for checking whether a
factor is prime would still have such an impact.

-Andreas


-- 
==
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!




Re: [OMPI devel] RFC: Changing 32-bit build behavior/sizes for MPI_Count and MPI_Offset

2014-02-11 Thread Dave Goodell (dgoodell)
On Feb 10, 2014, at 6:14 PM, Jeff Squyres (jsquyres)  wrote:

> As a side effect, this means that -- for 32 bit builds -- we will not support 
> large filesystems well (e.g., filesystems with 64 bit offsets).  BlueGene is 
> an example of such a system (not that OMPI supports BlueGene, but...).

To clarify and head off unnecessary quibbling, I'll point out that by 
"BlueGene", Jeff means "Blue Gene/P" (/Q is 64-bit).  This issue applies to any 
machine with 32-bit addresses that might want to access files larger than 2 GiB.
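
A trivial, purely illustrative check of what sizes a particular build ends up
with (assuming an mpi.h recent enough to define MPI_Count):

#include <mpi.h>
#include <stdio.h>

/* Print the type sizes a given Open MPI build was configured with. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    printf("sizeof(MPI_Aint)   = %zu\n", sizeof(MPI_Aint));
    printf("sizeof(MPI_Offset) = %zu\n", sizeof(MPI_Offset));
    printf("sizeof(MPI_Count)  = %zu\n", sizeof(MPI_Count));
    MPI_Finalize();
    return 0;
}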

> Specifically: for 32 bit builds, we'll only allow MPI_Offset to be 32 bits.  
> I don't think that this is a major issue, because 32 bit builds are not a 
> huge issue for the OMPI community, but I raise the point in the spirit of 
> full disclosure.  Fixing it to allow 32 bit MPI_Aint but 64 bit MPI_Offset 
> and MPI_Count would likely mean re-tooling the PML/BML/BTL/convertor 
> infrastructure to use something other than size_t, and I have zero desire to 
> do that!  (please, no OMPI vendor reveal that they're going to seriously 
> build giant 32 bit systems...)

-Dave



Re: [OMPI devel] Reviewing MPI_Dims_create

2014-02-11 Thread Jeff Squyres (jsquyres)
+1

On Feb 11, 2014, at 9:25 AM, Andreas Schäfer 
 wrote:

> On 15:20 Tue 11 Feb , Christoph Niethammer wrote:
>> Ah and some restults for MPI_Dims_create(100, 3, {})
>> 
>> original: 8.110628 sec
>> optimized-primes: 0.048702 sec
>> optimized-factorization: 0.13 sec
> 
> Awesome! I didn't expect that nested loop for checking whether a
> factor would still have such an impact.
> 
> -Andreas
> 
> 
> -- 
> ==
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==
> 
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] v1.7.4 REGRESSION: build failure w/ old OFED

2014-02-11 Thread Jeff Squyres (jsquyres)
Excellent; thanks Paul.  We're having a look.

On Feb 8, 2014, at 6:50 PM, Paul Hargrove  wrote:

> With Ralph's announcement that oshmem had been merged to v1.7 I started tests 
> on lots of systems.
> When I found the problem described below, I tried the 1.7.4 release, I found 
> the problem exists there too!!
> 
> One system I tried is a fairly ancient x86-64/linux system w/ QLogic HCAs, 
> and thus builds and tests mtl:psm.
> As a guest on this system I had NOT been testing it with all the 1.7.4rc's, 
> but had tested at least once w/o problems 
> (http://www.open-mpi.org/community/lists/devel/2014/01/13661.php).  
> 
> However, with both the 1.7.4 release and the current tarball (1.7.5a1r30634) 
> I seem to be getting an ibv error that is probably due to the age of the OFED 
> on this system:
> 
>   CCLD otfmerge-mpi
> /home/phhargrove/OMPI/openmpi-1.7-latest-linux-x86_64-psm/BLD/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>  undefined reference to `ibv_event_type_str'
> collect2: ld returned 1 exit status
> 
> The problem seems to be originating in the usenic btl:
> $ grep -rl ibv_event_type_str .
> ./ompi/mca/btl/usnic/btl_usnic_module.c
> 
> -Paul
> 
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] v1.7.4 REGRESSION: build failure w/ old OFED

2014-02-11 Thread Dave Goodell (dgoodell)
Should be fixed on trunk by r30674.  It's been CMRed to v1.7.5 here: 
https://svn.open-mpi.org/trac/ompi/ticket/4254

-Dave

On Feb 11, 2014, at 11:00 AM, Jeff Squyres (jsquyres)  
wrote:

> Excellent; thanks Paul.  We're having a look.
> 
> On Feb 8, 2014, at 6:50 PM, Paul Hargrove  wrote:
> 
>> With Ralph's announcement that oshmem had been merged to v1.7 I started 
>> tests on lots of systems.
>> When I found the problem described below, I tried the 1.7.4 release, I found 
>> the problem exists there too!!
>> 
>> One system I tried is a fairly ancient x86-64/linux system w/ QLogic HCAs, 
>> and thus builds and tests mtl:psm.
>> As a guest on this system I had NOT been testing it with all the 1.7.4rc's, 
>> but had tested at least once w/o problems 
>> (http://www.open-mpi.org/community/lists/devel/2014/01/13661.php).  
>> 
>> However, with both the 1.7.4 release and the current tarball (1.7.5a1r30634) 
>> I seem to be getting an ibv error that is probably due to the age of the 
>> OFED on this system:
>> 
>>  CCLD otfmerge-mpi
>> /home/phhargrove/OMPI/openmpi-1.7-latest-linux-x86_64-psm/BLD/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
>>  undefined reference to `ibv_event_type_str'
>> collect2: ld returned 1 exit status
>> 
>> The problem seems to be originating in the usenic btl:
>> $ grep -rl ibv_event_type_str .
>> ./ompi/mca/btl/usnic/btl_usnic_module.c
>> 
>> -Paul
>> 
>> 
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] v1.7.4, mpiexec "exit 1" and no other warning - behaviour changed to previous versions

2014-02-11 Thread Ralph Castain
I've added better error messages in the trunk, scheduled to move over to 1.7.5. 
I don't see anything in the code that would explain why we don't pick up and use 
ib0 if it is present and specified in if_include - we should be doing it.

For now, can you run this with "-mca oob_base_verbose 100" on your cmd line and 
send me the output? Might help debug the behavior.

Thanks
Ralph

On Feb 11, 2014, at 1:22 AM, Paul Kapinos  wrote:

> Dear Open MPI developer,
> 
> I.
> we see peculiar behaviour in the new 1.7.4 version of Open MPI which is a 
> change to previous versions:
> - when calling "mpiexec", it returns "1" and exits silently.
> 
> The behaviour is reproducible; well not that easy reproducible.
> 
> We have multiple InfiniBand islands in our cluster. All nodes are 
> passwordless reachable from each other in somehow way; some via IPoIB, for 
> some routing you also have to use ethernet cards and IB/TCP gateways.
> 
> One island (b) is configured to use the IB card as the main TCP interface. In 
> this island, the variable OMPI_MCA_oob_tcp_if_include is set to "ib0" (*)
> 
> Another island (h) is configured in convenient way: IB cards also are here 
> and may be used for IPoIB in the island, but the "main interface" used for 
> DNS and Hostname binds is eth0.
> 
> When calling 'mpiexec' from (b) to start a process on (h), and OpenMPI 
> version is 1.7.4, and OMPI_MCA_oob_tcp_if_include is set to "ib0", mpiexec 
> just exits with return value "1" and no error/warning.
> 
> When OMPI_MCA_oob_tcp_if_include is unset it works pretty fine.
> 
> All previously versions of Open MPI (1.6.x, 1.7.3) ) did not have this 
> behaviour; so this is aligned to v1.7.4 only. See log below.
> 
> You ask why to hell starting MPI processes on other IB island? Because our 
> front-end nodes are in the island (b) but we sometimes need to start 
> something also on island (h), which has been worced perfectly until 1.7.4.
> 
> 
> (*) This is another Spaghetti Western long story. In short, we set 
> OMPI_MCA_oob_tcp_if_include to 'ib0' in the subcluster where the IB card is 
> configured to be the main network interface, in order to stop Open MPI trying 
> to connect via (possibly unconfigured) ethernet cards - which lead to endless 
> waiting, sometimes.
> Cf. http://www.open-mpi.org/community/lists/users/2011/11/17824.php
> 
> --
> pk224850@cluster:~[523]$ module switch $_LAST_MPI openmpi/1.7.3 
> Unloading openmpi 1.7.3 [ OK ]
> Loading openmpi 1.7.3 for intel compiler [ OK ]
> pk224850@cluster:~[524]$ $MPI_BINDIR/mpiexec  -H linuxscc004 -np 1 hostname ; 
> echo $?
> linuxscc004.rz.RWTH-Aachen.DE
> 0
> pk224850@cluster:~[525]$ module switch $_LAST_MPI openmpi/1.7.4 
> Unloading openmpi 1.7.3 [ OK ]
> Loading openmpi 1.7.4 for intel compiler [ OK ]
> pk224850@cluster:~[526]$ $MPI_BINDIR/mpiexec  -H linuxscc004 -np 1 hostname ; 
> echo $?
> 1
> pk224850@cluster:~[527]$
> --
> 
> 
> 
> 
> 
> 
> 
> 
> II.
> During some experiments with envvars and v1.7.4, got the below messages.
> 
> --
> Sorry!  You were supposed to get help about:
>no-included-found
> But I couldn't open the help file:
>/opt/MPI/openmpi-1.7.4/linux/intel/share/openmpi/help-oob-tcp.txt: No such 
> file or directory.  Sorry!
> --
> [linuxc2.rz.RWTH-Aachen.DE:13942] [[63331,0],0] ORTE_ERROR_LOG: Not available 
> in file ess_hnp_module.c at line 314
> --
> 
> Reproducing:
> $MPI_BINDIR/mpiexec  -mca oob_tcp_if_include ib0   -H linuxscc004 -np 1 
> hostname
> 
> *frome one node with no 'ib0' card*, also without infiniband. Yessir this is 
> a bad idea, and the 1.7.3 has said more understanding "you do wrong thing":
> --
> None of the networks specified to be included for out-of-band communications
> could be found:
> 
>  Value given: ib0
> 
> Please revise the specification and try again.
> --
> 
> 
> No idea, why the file share/openmpi/help-oob-tcp.txt has not been installed 
> in 1.7.4, as we compile this version in pretty the same way as previous 
> versions..
> 
> 
> 
> 
> Best,
> Paul Kapinos
> 
> -- 
> Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
> RWTH Aachen University, IT Center
> Seffenter Weg 23,  D 52074  Aachen (Germany)
> Tel: +49 241/80-24915
> 



Re: [OMPI devel] new CRS component added (criu)

2014-02-11 Thread Jeff Squyres (jsquyres)
On Feb 8, 2014, at 4:49 PM, Adrian Reber  wrote:

>> I note you have a stray $3 at the end of your configure.m4, too (it might 
>> supposed to be $2?).
> 
> I think I do not really understand configure.m4 and was happy to just
> copy it from blcr. Especially what $2 and $3 mean and how they are
> supposed to be used. I will try to simplify my configure.m4. Is there an
> example which I can have a look at?

Sorry -- been a bit busy with releasing OMPI 1.7.4 and preparing for 1.7.5...

m4 is a macro language, so think of it as templates with some intelligence.  

$1, $2, and $3 are the "parameters" passed in to the macro.  So when you do 
something like:

AC_DEFUN([FOO], [
   echo 1 is $1
   echo 2 is $2])

and you invoke that macro via

   FOO([hello world], [goodbye world])

the generated script will contain:

   echo 1 is hello world
   echo 2 is goodbye world

In our case, $1 is the action to execute if the package is happy / wants to 
build, and $2 is the action to execute if the package is unhappy / does not 
want to build.

Meaning: we have a top-level engine that is iterating over all frameworks and 
components, and calling their *_CONFIG macros with appropriate $1 and $2 values 
that expand to actions-to-execute-if-happy / actions-to-execute-if-unhappy.

Make sense?

>> Finally, I note you're looking for libcriu.  Last time I checked with the 
>> CRIU guys -- which was quite a while ago -- that didn't exist (but I put in 
>> my $0.02 that OMPI would like to see such a userspace library).  I take it 
>> that libcriu now exists?
> 
> Yes criu has introduced libcriu with the 1.1 release. It is used to
> create RPCs to the criu process running as a service. I submitted a few
> patches to criu to actually install the headers and libraries and
> included it in the Fedora package:
> 
> https://admin.fedoraproject.org/updates/criu-1.1-4.fc20
> 
> This is what I am currently using to build against criu.

Gotcha.

I guess I should go look at that; thanks.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] new CRS component added (criu)

2014-02-11 Thread Adrian Reber
On Tue, Feb 11, 2014 at 08:09:35PM +, Jeff Squyres (jsquyres) wrote:
> On Feb 8, 2014, at 4:49 PM, Adrian Reber  wrote:
> 
> >> I note you have a stray $3 at the end of your configure.m4, too (it might 
> >> supposed to be $2?).
> > 
> > I think I do not really understand configure.m4 and was happy to just
> > copy it from blcr. Especially what $2 and $3 mean and how they are
> > supposed to be used. I will try to simplify my configure.m4. Is there an
> > example which I can have a look at?
> 
> Sorry -- been a bit busy with releasing OMPI 1.7.4 and preparing for 1.7.5...
> 
> m4 is a macro language, so think of it as templates with some intelligence.  
> 
> $1, $2, and $3 are the "parameters" passed in to the macro.  So when you do 
> something like:
> 
> AC_DEFUN([FOO], [
>echo 1 is $1
>echo 2 is $2])
> 
> and you invoke that macro via
> 
>FOO([hello world], [goodbye world])
> 
> the generated script will contain:
> 
>echo 1 is hello world
>echo 2 is goodbye world
> 
> In our case, $1 is the action to execute if the package is happy / wants to 
> build, and $2 is the action to execute if the package is unhappy / does not 
> want to build.
> 
> Meaning: we have a top-level engine that is iterating over all frameworks and 
> components, and calling their *_CONFIG macros with appropriate $1 and $2 
> values that expand to actions-to-execute-if-happy / 
> actions-to-execute-if-unhappy.
> 
> Make sense?

Thanks. I also tried to understand the macros better, and with the
generated output and your description I think I understand them now.

Trying to simplify configure.m4 like you suggested I would change this:

AS_IF([test "$check_crs_criu_good" != "yes"], [$2],
      [AS_IF([test ! -z "$with_criu" -a "$with_criu" != "yes"],
             [check_crs_criu_dir="$with_criu"
              check_crs_criu_dir_msg="$with_criu (from --with-criu)"])
       AS_IF([test ! -z "$with_criu_libdir" -a "$with_criu_libdir" != "yes"],
             [check_crs_criu_libdir="$with_criu_libdir"
              check_crs_criu_libdir_msg="$with_criu_libdir (from --with-criu-libdir)"])
      ])

to this:

AS_IF([test "$check_crs_criu_good" = "yes" -a ! -z "$with_criu" -a "$with_criu" != "yes"],
      [check_crs_criu_dir="$with_criu"
       check_crs_criu_dir_msg="$with_criu (from --with-criu)"],
      [$2
       check_crs_criu_good="no"])

AS_IF([test "$check_crs_criu_good" = "yes" -a ! -z "$with_criu_libdir" -a "$with_criu_libdir" != "yes"],
      [check_crs_criu_dir_libdir="$with_criu_libdir"
       check_crs_criu_dir_libdir_msg="$with_criu_libdir (from --with-criu)"],
      [$2
       check_crs_criu_good="no"])


Is that correct? With three checks in one line it seems a bit unreadable,
and the nested AS_IF seems easier for me to understand.
Did I understand correctly what you meant, or did you
mean something else?

Adrian


[OMPI devel] 1.7.5 status

2014-02-11 Thread Ralph Castain
Things are looking relatively good - I see two recurring failures:

1. idx_null - no idea what that test does, but it routinely fails

2. intercomm_create - this is the 3-way connect/accept/merge. Nathan - I 
believe you had a fix for that?

Ralph



Re: [OMPI devel] v1.7.4 REGRESSION: build failure w/ old OFED

2014-02-11 Thread Paul Hargrove
Dave,

Tonight's trunk tarball built successfully on the effected system.

Thanks,
-Paul


On Tue, Feb 11, 2014 at 11:19 AM, Dave Goodell (dgoodell) <
dgood...@cisco.com> wrote:

> Should be fixed on trunk by r30674.  It's been CMRed to v1.7.5 here:
> https://svn.open-mpi.org/trac/ompi/ticket/4254
>
> -Dave
>
> On Feb 11, 2014, at 11:00 AM, Jeff Squyres (jsquyres) 
> wrote:
>
> > Excellent; thanks Paul.  We're having a look.
> >
> > On Feb 8, 2014, at 6:50 PM, Paul Hargrove  wrote:
> >
> >> With Ralph's announcement that oshmem had been merged to v1.7 I started
> tests on lots of systems.
> >> When I found the problem described below, I tried the 1.7.4 release, I
> found the problem exists there too!!
> >>
> >> One system I tried is a fairly ancient x86-64/linux system w/ QLogic
> HCAs, and thus builds and tests mtl:psm.
> >> As a guest on this system I had NOT been testing it with all the
> 1.7.4rc's, but had tested at least once w/o problems (
> http://www.open-mpi.org/community/lists/devel/2014/01/13661.php).
> >>
> >> However, with both the 1.7.4 release and the current tarball
> (1.7.5a1r30634) I seem to be getting an ibv error that is probably due to
> the age of the OFED on this system:
> >>
> >>  CCLD otfmerge-mpi
> >>
> /home/phhargrove/OMPI/openmpi-1.7-latest-linux-x86_64-psm/BLD/ompi/contrib/vt/vt/../../../.libs/libmpi.so:
> undefined reference to `ibv_event_type_str'
> >> collect2: ld returned 1 exit status
> >>
> >> The problem seems to be originating in the usenic btl:
> >> $ grep -rl ibv_event_type_str .
> >> ./ompi/mca/btl/usnic/btl_usnic_module.c
> >>
> >> -Paul
> >>
> >>
> >> --
> >> Paul H. Hargrove  phhargr...@lbl.gov
> >> Future Technologies Group
> >> Computer and Data Sciences Department Tel: +1-510-495-2352
> >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900