[OMPI users] [PATCH] hooks: disable malloc override inside of Gentoo sandbox

2013-07-02 Thread Justin Bronder
As described in the comments in the source, Gentoo's own version of
fakeroot, sandbox, also runs into hangs when malloc is overridden.
Sandbox environments can easily be detected by looking for SANDBOX_PID
in the environment.  When detected, employ the same fix used for
fakeroot.

See https://bugs.gentoo.org/show_bug.cgi?id=462602
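
For reference, the variable is easy to check by hand from inside a sandbox
shell (the output below is illustrative only):

$ sandbox
$ echo $SANDBOX_PID
4242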
---
 opal/mca/memory/linux/hooks.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/opal/mca/memory/linux/hooks.c b/opal/mca/memory/linux/hooks.c
index 6a1646f..ce91e76 100644
--- a/opal/mca/memory/linux/hooks.c
+++ b/opal/mca/memory/linux/hooks.c
@@ -747,9 +747,16 @@ static void opal_memory_linux_malloc_init_hook(void)
"fakeroot" build environment that allocates memory during
stat() (see http://bugs.debian.org/531522).  It may not be
necessary any more since we're using access(), not stat().  But
-   we'll leave the check, anyway. */
+   we'll leave the check, anyway.
+
+   This is also an issue when using Gentoo's version of 'fakeroot',
+   sandbox v2.5.  Sandbox environments can also be detected fairly
+   easily by looking for SANDBOX_PID.
+*/
+
 if (getenv("FAKEROOTKEY") != NULL ||
-getenv("FAKED_MODE") != NULL) {
+getenv("FAKED_MODE") != NULL ||
+getenv("SANDBOX_PID") != NULL ) {
     return;
 }
 
-- 
1.8.1.5


-- 
Justin Bronder




Re: [OMPI users] Wrappers should put include path *after* user args

2010-01-19 Thread Justin Bronder
On 04/12/09 16:20 -0500, Jeff Squyres wrote:
> Oy -- more specifically, we should not be putting -I/usr/include on the 
> command line *at all* (because it's special and already included by the 
> compiler search paths; similar for /usr/lib and /usr/lib64).  We should have 
> some special case code that looks for /usr/include and simply drops it.  Let 
> me check and see what's going on...
> 

I believe this was initially added here: 
https://svn.open-mpi.org/trac/ompi/ticket/870

> Can you send the contents of your 
> $prefix/share/openmpi/mpif90-wrapper-data.txt?  (it is *likely* in that 
> directory, but it could be somewhere else under prefix as well -- the 
> mpif90-wrapper-data.txt file is the important one)
> 
> 
> 
> On Dec 4, 2009, at 1:08 PM, Jed Brown wrote:
> 
> > Open MPI is installed by the distro with headers in /usr/include
> > 
> >   $ mpif90 -showme:compile -I/some/special/path
> >   -I/usr/include -pthread -I/usr/lib/openmpi -I/some/special/path
> > 
> > Here's why it's a problem:
> > 
> > HDF5 is also installed in /usr with modules at /usr/include/h5*.mod.  A
> > new HDF5 cannot be compiled using the wrappers because it will always
> > resolve the USE statements to /usr/include which is binary-incompatible
> > with the new version (at a minimum, they "fixed" the size of an
> > argument to H5Lget_info_f between 1.8.3 and 1.8.4).
> > 
> > To build the library, the current choices are
> > 
> >   (a) get rid of the system copy before building
> >   (b) not use mpif90 wrapper
> > 
> > 
> > I just checked that MPICH2 wrappers consistently put command-line args
> > before the wrapper args.
> > 
> > Jed

Any news on this?  It doesn't look like it made it into the 1.4.1 release.
Also, it's not just /usr/include that is a problem, but the fact that the
wrappers pass their paths before the user-specified ones.  Here's an
example using mpich2 and openmpi with non-standard install paths.

MPICH2 (some output stripped, since mpicc -compile_info prints everything):
jbronder@mejis ~ $ which mpicc
/usr/lib64/mpi/mpi-mpich2/usr/bin/mpicc
jbronder@mejis ~ $ mpicc -compile_info -I/bleh
x86_64-pc-linux-gnu-gcc -I/bleh -I/usr/lib64/mpi/mpi-mpich2/usr/include 

OpenMPI:
jbronder@mejis ~ $ which mpicc
/usr/lib64/mpi/mpi-openmpi/usr/bin/mpicc
jbronder@mejis ~ $ mpicc -showme:compile -I/bleh
-I/usr/lib64/mpi/mpi-openmpi/usr/include/openmpi -pthread -I/bleh
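
Until the ordering is fixed, the only workaround I know of is to bypass the
wrapper by hand so that the user paths really do come first (sketch only;
foo.f90 and the special include path are made up):

gfortran -I/some/special/path $(mpif90 -showme:compile) -c foo.f90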


Thanks,

-- 
Justin Bronder




[OMPI users] Open-MPI 1.2 and GM

2007-03-27 Thread Justin Bronder

Having a user who requires some of the features of gfortran in 4.1.2, I
recently began building a new image.  The issue is that "-mca btl gm" fails
while "-mca mtl gm" works.  I have not yet done any benchmarking, as I was
wondering if the move to mtl is part of the upgrade.  Below are the packages
I rebuilt.

Kernel 2.6.16.27 -> 2.6.20.1
Gcc 4.1.1 -> 4.1.2
GM Drivers 2.0.26 -> 2.0.26 (with patches for newer kernels)
OpenMPI 1.1.4 -> 1.2


The following works as expected:
/usr/local/ompi-gnu/bin/mpirun -np 4 -mca mtl gm --host node84,node83 ./xhpl

The following fails:
/usr/local/ompi-gnu/bin/mpirun -np 4 -mca btl gm --host node84,node83 ./xhpl

I've attached gzipped files as suggested in the "Getting Help" section of the
website, along with the output from the failed mpirun.  Both nodes are known
good Myrinet nodes, using FMA to map.
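
Going by the help text in the attached failure output, it may also be that the
BTL list simply needs "self" (and possibly "sm") spelled out explicitly, e.g.
(untested on my end):

/usr/local/ompi-gnu/bin/mpirun -np 4 -mca btl gm,self --host node84,node83 ./xhpl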


Thanks in advance,

-- 
Justin Bronder

Advanced Computing Research Lab
University of Maine, Orono
20 Godfrey Dr
Orono, ME 04473
www.clusters.umaine.edu


config.log.gz
Description: Binary data


ompi_info.gz
Description: Binary data
--
Process 0.1.2 is unable to reach 0.1.2 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of 
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--
Process 0.1.1 is unable to reach 0.1.1 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of 
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--
Process 0.1.0 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of 
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--
Process 0.1.3 is unable to reach 0.1.3 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of 
usable components.
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.

Re: [OMPI users] problem abut openmpi running

2006-10-19 Thread Justin Bronder

On a number of my Linux machines, /usr/local/lib is not searched by
ldconfig, and hence libraries installed there are not found when a program
is run.  You can fix this by adding /usr/local/lib to /etc/ld.so.conf and
running ldconfig (add the -v flag if you want to see the output).
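
For example (adjust the library name to whatever the loader complains about):

# echo "/usr/local/lib" >> /etc/ld.so.conf
# ldconfig
# ldconfig -p | grep libmpi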

-Justin.

On 10/19/06, Durga Choudhury  wrote:


George

I knew that was the answer to Calin's question, but I still would like to
understand the issue:

by default, the openMPI installer installs the libraries in
/usr/local/lib, which is a standard location for the C compiler to look for
libraries. So *why* do I need to explicitly specify this with
LD_LIBRARY_PATH? For example, when I am compiling with pthread calls and
pass -lpthread to gcc, I need not specify the location of libpthread.so with
LD_LIBRARY_PATH. I had the same problem as Calin so I am curious. This
is assuming he has not redirected the installation path to some non-standard
location.

Thanks
Durga


On 10/19/06, George Bosilca  wrote:
>
> Calin,
>
> Looks like you're missing a proper value for the LD_LIBRARY_PATH.
> Please read the Open MPI FAQ at http://www.open-mpi.org/faq/?
> category=running.
>
>   Thanks,
> george.
>
> On Oct 19, 2006, at 6:41 AM, calin pal wrote:
>
> >
> >   hi,
> > I'm Calin from India and I'm working on Open MPI.  I have installed
> > openmpi-1.1.1.tar.gz on four machines in our college lab.  On one
> > machine Open MPI works properly; I have written a "hello world"
> > program on all of the machines, but it only runs correctly on that
> > one machine.  The other machines give:
> >
> > hello: error while loading shared libraries: libmpi.so.0: cannot
> > open shared object file: No such file or directory
> >
> >
> > What is the problem, and how do I solve it?  Please tell me.
> >
> > calin pal
> > india
> > fergusson college
> > msc.tech(maths and computer sc.)
> >



--
Devil wanted omnipresence;
He therefore created communists.




Re: [OMPI users] Problem with Openmpi 1.1

2006-07-08 Thread Justin Bronder
1.)  Compiling without XL will take a little while, but I have the setup
for the
other questions ready now.  I figured I'd answer them right away.

2.)  TCP works fine, and is quite quick compared to mpich-1.2.7p1 by the
way.
I just reverified this.
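For reference, "TCP" here means forcing the BTL selection along these lines
(exact path and flags from memory):

/usr/local/ompi-xl/bin/mpirun -mca btl tcp,self -np 2 xhpl
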
WR11C2R45000   160 1 2  10.10  8.253e+00
||Ax-b||_oo / ( eps * ||A||_1  * N) =0.0412956 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0272613 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0053214 .. PASSED


3.)  Exactly the same setup, using mpichgm-1.2.6..14b
WR11C2R45000   160 1 2  10.43  7.994e+00

||Ax-b||_oo / ( eps * ||A||_1  * N) =0.0353693 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0233491 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0045577 .. PASSED

It also worked with mpichgm-1.2.6..15  (I believe this is the version, I
don't have
a node up with it at the moment).

Obviously mpich-1.2.7p1 works as well over ethernet.


Anyways, I'll begin the build with the standard gcc compilers that are
included
with OS X.  This is powerpc-apple-darwin8-gcc-4.0.1.

Thanks,

Justin.

Jeff Squyres (jsquyres) wrote:
> Justin --
>  
> Can we eliminate some variables so that we can figure out where the
> error is originating?
>  
> - Can you try compiling without the XL compilers?
> - Can you try running with just TCP (and not Myrinet)?
> - With the same support library installation (such as BLAS, etc.,
> assumedly also compiled with XL), can you try another MPI (e.g., LAM,
> MPICH-gm, whatever)?
>
> Let us know what you find.  Thanks!
>  
>
> 
> *From:* users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] *On Behalf Of *Justin Bronder
> *Sent:* Thursday, July 06, 2006 3:16 PM
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Problem with Openmpi 1.1
>
> With 1.0.3a1r10670 the same problem is occurring.  Again, the same
> configure arguments
> as before.  For clarity, the Myrinet driver we are using is 2.0.21.
>
> node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info
> GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
> r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX
> Fri Jun 16 14:39:45 EDT 2006."
>
> node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
> /usr/local/ompi-xl-1.0.3/bin/mpirun -np 2 xhpl
> This succeeds.
> ||Ax-b||_oo / ( eps * ||A||_1  * N) =0.1196787
> .. PASSED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0283195
> .. PASSED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0063300
> .. PASSED
>
> node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
> /usr/local/ompi-xl-1.0.3/bin/mpirun -mca btl gm -np 2 xhpl
> This fails.
> ||Ax-b||_oo / ( eps * ||A||_1  * N) =
> 717370209518881444284334080.000 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597
> .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722
> .. FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
> 2037398812542965504.00
> ||A||_oo . . . . . . . . . . . . . . . . . . . =2561.554752
> ||A||_1  . . . . . . . . . . . . . . . . . . . =2558.129237
> ||x||_oo . . . . . . . . . . . . . . . . . . . =
> 300175355203841216.00
> ||x||_1  . . . . . . . . . . . . . . . . . . . =
> 31645943341479366656.00
>
> Does anyone have a working system with OS X and Myrinet (GM)?  If
> so, I'd love to hear
>     the configure arguments and various versions you are using.  Bonus
> points if you are
> using the IBM XL compilers.
>
> Thanks,
> Justin.
>
>
> On 7/6/06, *Justin Bronder* <jsbron...@gmail.com
> <mailto:jsbron...@gmail.com>> wrote:
>
> Yes, that output was actually cut and pasted from an OS X
> run.  I'm about to test
> against 1.0.3a1r10670.
>
> Justin.
>
> On 7/6/06, *Galen M. Shipman* < gship...@lanl.gov
> <mailto:gship...@lanl.gov>> wrote:
>
> Justin, 
>
> Is the OS X run showing the same residual failure?
>
> - Galen 
>
> On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:
>
> Disregard the failure on Linux, a rebuild from scratch of
>         HPL and OpenMPI
>

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

With 1.0.3a1r10670 the same problem is occurring.  Again, the same configure
arguments
as before.  For clarity, the Myrinet driver we are using is 2.0.21.

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$ gm_board_info
GM build ID is "2.0.21_MacOSX_rc20050429075134PDT
r...@node96.meldrew.clusters.umaine.edu:/usr/src/gm-2.0.21_MacOSX Fri Jun 16
14:39:45 EDT 2006."

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun
-np 2 xhpl
This succeeds.
||Ax-b||_oo / ( eps * ||A||_1  * N) =0.1196787 .. PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =0.0283195 .. PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =0.0063300 .. PASSED

node90:~/src/hpl/bin/ompi-xl-1.0.3 jbronder$
/usr/local/ompi-xl-1.0.3/bin/mpirun
-mca btl gm -np 2 xhpl
This fails.
||Ax-b||_oo / ( eps * ||A||_1  * N) =
717370209518881444284334080.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 226686309135.4274597 ..
FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 2386641249.6518722 ..
FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . = 2037398812542965504.00
||A||_oo . . . . . . . . . . . . . . . . . . . =2561.554752
||A||_1  . . . . . . . . . . . . . . . . . . . =2558.129237
||x||_oo . . . . . . . . . . . . . . . . . . . = 300175355203841216.00
||x||_1  . . . . . . . . . . . . . . . . . . . = 31645943341479366656.00

Does anyone have a working system with OS X and Myrinet (GM)?  If so, I'd
love to hear
the configure arguments and various versions you are using.  Bonus points if
you are
using the IBM XL compilers.

Thanks,
Justin.


On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote:


Yes, that output was actually cut and pasted from an OS X run.  I'm about
to test
against 1.0.3a1r10670.

Justin.

On 7/6/06, Galen M. Shipman <gship...@lanl.gov> wrote:

> Justin,
> Is the OS X run showing the same residual failure?
>
> - Galen
>
> On Jul 6, 2006, at 10:49 AM, Justin Bronder wrote:
>
> Disregard the failure on Linux, a rebuild from scratch of HPL and
> OpenMPI
> seems to have resolved the issue.  At least I'm not getting the errors
> during
> the residual checks.
>
> However, this is persisting under OS X.
>
> Thanks,
> Justin.
>
> On 7/6/06, Justin Bronder < jsbron...@gmail.com> wrote:
>
> > For OS X:
> > /usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl
> >
> > For Linux:
> > ARCH=ompi-gnu-1.1.1a
> > /usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path
> > /usr/local/$ARCH/bin ./xhpl
> >
> > Thanks for the speedy response,
> > Justin.
> >
> > On 7/6/06, Galen M. Shipman < gship...@lanl.gov> wrote:
> >
> > > Hey Justin,
> > Please provide us your mca parameters (if any), these could be in a
> > config file, environment variables or on the command line.
> >
> > Thanks,
> >
> > Galen
> >
> >  On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:
> >
> > As far as the nightly builds go, I'm still seeing what I believe to be
> >
> > this problem in both r10670 and r10652.  This is happening with
> > both Linux and OS X.  Below are the systems and ompi_info for the
> > newest revision 10670.
> >
> > As an example of the error, when running HPL with Myrinet I get the
> > following error.  Using tcp everything is fine and I see the results
> > I'd
> > expect.
> >
> > 
> > ||Ax-b||_oo / ( eps * ||A||_1  * N) =
> > 42820214496954887558164928727596662784.000 .. FAILED
> > ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182.. 
FAILED
> > ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558.. 
FAILED
> > ||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
> > 272683853978565028754868928512.00
> > ||A||_oo . . . . . . . . . . . . . . . . . . . =3822.884181
> > ||A||_1  . . . . . . . . . . . . . . . . . . . =3823.922627
> > ||x||_oo . . . . . . . . . . . . . . . . . . . =
> > 37037692483529688659798261760.00
> > ||x||_1  . . . . . . . . . . . . . . . . . . . =
> > 4102704048669982798475494948864.00
> > ===
> >
> > Finished  1 tests with the following results:
> >   0 tests completed and passed residual checks,
> >   1 tests completed and failed residual checks,
> >   0 tests skipped because of illegal input values.
> >
> > 
> >
> > Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

Disregard the failure on Linux; a rebuild from scratch of HPL and OpenMPI
seems to have resolved the issue.  At least I'm not getting the errors
during
the residual checks.

However, this is persisting under OS X.

Thanks,
Justin.

On 7/6/06, Justin Bronder <jsbron...@gmail.com> wrote:


For OS X:
/usr/local/ompi-xl/bin/mpirun -mca btl gm -np 4 ./xhpl

For Linux:
ARCH=ompi-gnu-1.1.1a
/usr/local/$ARCH/bin/mpiexec -mca btl gm -np 2 -path /usr/local/$ARCH/bin
./xhpl

Thanks for the speedy response,
Justin.

On 7/6/06, Galen M. Shipman <gship...@lanl.gov> wrote:

> Hey Justin,
> Please provide us your mca parameters (if any), these could be in a
> config file, environment variables or on the command line.
>
> Thanks,
>
> Galen
>
> On Jul 6, 2006, at 9:22 AM, Justin Bronder wrote:
>
> As far as the nightly builds go, I'm still seeing what I believe to be
> this problem in both r10670 and r10652.  This is happening with
> both Linux and OS X.  Below are the systems and ompi_info for the
> newest revision 10670.
>
> As an example of the error, when running HPL with Myrinet I get the
> following error.  Using tcp everything is fine and I see the results I'd
>
> expect.
>
> 
> ||Ax-b||_oo / ( eps * ||A||_1  * N) =
> 42820214496954887558164928727596662784.000 .. FAILED
> ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182.. 
FAILED
> ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 ..
> FAILED
> ||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
> 272683853978565028754868928512.00
> ||A||_oo . . . . . . . . . . . . . . . . . . . =3822.884181
> ||A||_1  . . . . . . . . . . . . . . . . . . . =3823.922627
> ||x||_oo . . . . . . . . . . . . . . . . . . . =
> 37037692483529688659798261760.00
> ||x||_1  . . . . . . . . . . . . . . . . . . . =
> 4102704048669982798475494948864.00
> ===
>
> Finished  1 tests with the following results:
>   0 tests completed and passed residual checks,
>   1 tests completed and failed residual checks,
>   0 tests skipped because of illegal input values.
>
> 
>
> Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64
> PPC970FX, altivec supported GNU/Linux
> jbronder@node41 ~ $ /usr/local/ompi- gnu-1.1.1a/bin/ompi_info
> Open MPI: 1.1.1a1r10670
>Open MPI SVN revision: r10670
> Open RTE: 1.1.1a1r10670
>Open RTE SVN revision: r10670
> OPAL: 1.1.1a1r10670
>OPAL SVN revision: r10670
>   Prefix: /usr/local/ompi-gnu-1.1.1a
>  Configured architecture: powerpc64-unknown-linux-gnu
>Configured by: root
>Configured on: Thu Jul  6 10:15:37 EDT 2006
>   Configure host: node41
> Built by: root
> Built on: Thu Jul  6 10:28:14 EDT 2006
>   Built host: node41
>   C bindings: yes
> C++ bindings: yes
>   Fortran77 bindings: yes (all)
>   Fortran90 bindings: yes
>  Fortran90 bindings size: small
>   C compiler: gcc
>  C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
>C++ compiler absolute: /usr/bin/g++
>   Fortran77 compiler: gfortran
>   Fortran77 compiler abs:
> /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
>   Fortran90 compiler: gfortran
>   Fortran90 compiler abs:
> /usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
>  C profiling: yes
>C++ profiling: yes
>  Fortran77 profiling: yes
>  Fortran90 profiling: yes
>   C++ exceptions: no
>   Thread support: posix (mpi: no, progress: no)
>   Internal debug support: no
>  MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
>  libltdl support: yes
>   MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component
> v1.1.1)
>MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
>MCA maffinity: first_use (MCA v1.0, API v1.0, Component
> v1.1.1)
>MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
>MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Componentv1.1.1)
>
>MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
> MCA coll: self (MCA v1.0, API v1

Re: [OMPI users] Problem with Openmpi 1.1

2006-07-06 Thread Justin Bronder

As far as the nightly builds go, I'm still seeing what I believe to be
this problem in both r10670 and r10652.  This is happening with
both Linux and OS X.  Below are the systems and ompi_info for the
newest revision 10670.

As an example of the error, when running HPL with Myrinet I get the
following error.  Using tcp everything is fine and I see the results I'd
expect.

||Ax-b||_oo / ( eps * ||A||_1  * N) =
42820214496954887558164928727596662784.000 .. FAILED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) = 156556068835.2711182 ..
FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 1156439380.5172558 ..
FAILED
||Ax-b||_oo  . . . . . . . . . . . . . . . . . =
272683853978565028754868928512.00
||A||_oo . . . . . . . . . . . . . . . . . . . =3822.884181
||A||_1  . . . . . . . . . . . . . . . . . . . =3823.922627
||x||_oo . . . . . . . . . . . . . . . . . . . =
37037692483529688659798261760.00
||x||_1  . . . . . . . . . . . . . . . . . . . =
4102704048669982798475494948864.00
===

Finished  1 tests with the following results:
 0 tests completed and passed residual checks,
 1 tests completed and failed residual checks,
 0 tests skipped because of illegal input values.


Linux node41 2.6.16.19 #1 SMP Wed Jun 21 17:22:01 EDT 2006 ppc64 PPC970FX,
altivec supported GNU/Linux
jbronder@node41 ~ $ /usr/local/ompi-gnu-1.1.1a/bin/ompi_info
   Open MPI: 1.1.1a1r10670
  Open MPI SVN revision: r10670
   Open RTE: 1.1.1a1r10670
  Open RTE SVN revision: r10670
   OPAL: 1.1.1a1r10670
  OPAL SVN revision: r10670
 Prefix: /usr/local/ompi-gnu-1.1.1a
Configured architecture: powerpc64-unknown-linux-gnu
  Configured by: root
  Configured on: Thu Jul  6 10:15:37 EDT 2006
 Configure host: node41
   Built by: root
   Built on: Thu Jul  6 10:28:14 EDT 2006
 Built host: node41
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (all)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc
C compiler absolute: /usr/bin/gcc
   C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
 Fortran77 compiler: gfortran
 Fortran77 compiler abs:
/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
 Fortran90 compiler: gfortran
 Fortran90 compiler abs:
/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.0/gfortran
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
 MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
  MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
   MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
  MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
   MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
 MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
  MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
  MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
 MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
   MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
 

[OMPI users] OpenMpi 1.1 and Torque 2.1.1

2006-06-29 Thread Justin Bronder

I'm having trouble getting Open MPI to execute jobs when submitting through
Torque.  Everything works fine if I use the included mpirun scripts, but this
is obviously not a good solution for the general users on the cluster.

I'm running under OS X 10.4, Darwin 8.6.0.  I configured Open MPI with:
export CC=/opt/ibmcmp/vac/6.0/bin/xlc
export CXX=/opt/ibmcmp/vacpp/6.0/bin/xlc++
export FC=/opt/ibmcmp/xlf/8.1/bin/xlf90_r
export F77=/opt/ibmcmp/xlf/8.1/bin/xlf_r
export LDFLAGS=-lSystemStubs
export LIBTOOL=glibtool

PREFIX=/usr/local/ompi-xl

./configure \
   --prefix=$PREFIX \
   --with-tm=/usr/local/pbs/ \
   --with-gm=/opt/gm \
   --enable-static \
   --disable-cxx

I also had to employ the fix listed in:
http://www.open-mpi.org/community/lists/users/2006/04/1007.php


I've attached the output of ompi_info while in an interactive job.  Looking
through the list,
I can at least save a bit of trouble by listing what does work.  Anything
outside of Torque
seems fine.  From within an interactive job, pbsdsh works fine, hence the
earlier problems
with poll are fixed.
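
A quick way to double-check that the Torque support actually got built (I'm
guessing at the exact component names here) is:

/usr/local/ompi-xl/bin/ompi_info | grep tm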

Here is the error that is reported when I attempt to run hostname on one
processor:
node96:/usr/src/openmpi-1.1 jbronder$ /usr/local/ompi-xl/bin/mpirun -np 1
-mca pls_tm_debug 1 /bin/hostname
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: final top-level argv:
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: orted --no-daemonize
--bootproxy 1 --name  --num_procs 2 --vpid_start 0 --nodename  --universe
jbron...@node96.meldrew.clusters.umaine.edu:default-universe --nsreplica "
0.0.0;tcp://10.0.1.96:49395" --gprreplica "0.0.0;tcp://10.0.1.96:49395"
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: Set
prefix:/usr/local/ompi-xl
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: launching on node
localhost
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: resetting PATH:
/usr/local/ompi-xl/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/pbs/bin:/usr/local/mpiexec/bin:/opt/ibmcmp/xlf/8.1/bin:/opt/ibmcmp/vac/6.0/bin:/opt/ibmcmp/vacpp/6.0/bin:/opt/gm/bin:/opt/fms/bin
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: found
/usr/local/ompi-xl/bin/orted
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: not oversubscribed --
setting mpi_yield_when_idle to 0
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: executing: orted
--no-daemonize --bootproxy 1 --name 0.0.1 --num_procs 2 --vpid_start 0
--nodename localhost --universe
jbron...@node96.meldrew.clusters.umaine.edu:default-universe
--nsreplica "0.0.0;tcp://10.0.1.96:49395" --gprreplica "0.0.0
;tcp://10.0.1.96:49395"
[node96.meldrew.clusters.umaine.edu:00850] pls:tm: start_procs returned
error -13
[node96.meldrew.clusters.umaine.edu:00850] [0,0,0] ORTE_ERROR_LOG: Not found
in file rmgr_urm.c at line 184
[node96.meldrew.clusters.umaine.edu:00850] [0,0,0] ORTE_ERROR_LOG: Not found
in file rmgr_urm.c at line 432
[node96.meldrew.clusters.umaine.edu:00850] mpirun: spawn failed with
errno=-13
node96:/usr/src/openmpi-1.1 jbronder$


My thanks for any help in advance,

Justin Bronder.


ompi_info.log.gz
Description: GNU Zip compressed data


Re: [OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-31 Thread Justin Bronder

On 5/31/06, Brian W. Barrett  wrote:


A quick workaround is to edit opal/include/opal_config.h and change the
#defines for OMPI_CXX_GCC_INLINE_ASSEMBLY and OMPI_CC_GCC_INLINE_ASSEMBLY
from 1 to 0.  That should allow you to build Open MPI with those XL
compilers.  Hopefully IBM will fix this in a future version ;).
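Something along these lines should do it (untested, and assuming the macros
are currently defined to 1 in that header):

sed -i \
  -e 's/#define OMPI_CC_GCC_INLINE_ASSEMBLY 1/#define OMPI_CC_GCC_INLINE_ASSEMBLY 0/' \
  -e 's/#define OMPI_CXX_GCC_INLINE_ASSEMBLY 1/#define OMPI_CXX_GCC_INLINE_ASSEMBLY 0/' \
  opal/include/opal_config.h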



Well I actually edited include/ompi_config.h and set both
OMPI_C_GCC_INLINE_ASSEMBLY
and OMPI_CXX_GCC_INLINE_ASSEMBLY to 0.  This worked until libtool tried to
create
a shared library:

/bin/sh ../libtool --tag=CC --mode=link gxlc_64  -O -DNDEBUG
-qnokeyword=asm   -export-dynamic   -o libopal.la -rpath
/usr/local/ompi-xl/lib   libltdl/libltdlc.la asm/libasm.la class/libclass.la
event/libevent.la mca/base/libmca_base.la memoryhooks/libopalmemory.la
runtime/libruntime.la threads/libthreads.la util/libopalutil.la
mca/maffinity/base/libmca_maffinity_base.la
mca/memory/base/libmca_memory_base.la
mca/memory/malloc_hooks/libmca_memory_malloc_hooks.la
mca/paffinity/base/libmca_paffinity_base.la
mca/timer/base/libmca_timer_base.la mca/timer/linux/libmca_timer_linux.la
-lm  -lutil -lnsl
mkdir .libs
gxlc_64 -shared  --whole-archive libltdl/.libs/libltdlc.a asm/.libs/libasm.a
class/.libs/libclass.a event/.libs/libevent.a mca/base/.libs/libmca_base.a
memoryhooks/.libs/libopalmemory.a runtime/.libs/libruntime.a
threads/.libs/libthreads.a util/.libs/libopalutil.a
mca/maffinity/base/.libs/libmca_maffinity_base.a
mca/memory/base/.libs/libmca_memory_base.a
mca/memory/malloc_hooks/.libs/libmca_memory_malloc_hooks.a
mca/paffinity/base/.libs/libmca_paffinity_base.a
mca/timer/base/.libs/libmca_timer_base.a
mca/timer/linux/.libs/libmca_timer_linux.a --no-whole-archive  -ldl -lm
-lutil -lnsl -lc  -qnokeyword=asm -soname libopal.so.0 -o
.libs/libopal.so.0.0.0
gxlc: 1501-257 Option --whole-archive is not recognized.  Option will be
ignored.
gxlc: 1501-257 Option --no-whole-archive is not recognized.  Option will be
ignored.
gxlc: 1501-257 Option -qnokeyword=asm is not recognized.  Option will be
ignored.
gxlc: 1501-257 Option -soname is not recognized.  Option will be ignored.
xlc: 1501-218 file libopal.so.0 contains an incorrect file suffix
xlc: 1501-228 input file libopal.so.0 not found
xlc -q64 -qthreaded -D_REENTRANT -lpthread -qmkshrobj
libltdl/.libs/libltdlc.a asm/.libs/libasm.a class/.libs/libclass.a
event/.libs/libevent.a mca/base/.libs/libmca_base.a
memoryhooks/.libs/libopalmemory.a runtime/.libs/libruntime.a
threads/.libs/libthreads.a util/.libs/libopalutil.a
mca/maffinity/base/.libs/libmca_maffinity_base.

I was able to fix this by editing libtool and replacing $CC with $LD in the
following:

# Commands used to build and install a shared archive.
archive_cmds="\$LD -shared \$libobjs \$deplibs \$compiler_flags
\${wl}-soname \$wl\$soname -o \$lib"
archive_expsym_cmds="\$echo \\\"{ global:\\\" >
\$output_objdir/\$libname.ver~
 cat \$export_symbols | sed -e \\\"s/(.*)/1;/\\\" >>
\$output_objdir/\$libname.ver~
 \$echo \\\"local: *; };\\\" >> \$output_objdir/\$libname.ver~
   \$LD -shared \$libobjs \$deplibs \$compiler_flags \${wl}-soname
\$wl\$soname \${wl}-version-script \${wl}\$output_objdir/\$libname.ver -o
\$lib"

We then fail later on at:

make[3]: Entering directory `/usr/src/openmpi-1.0.3a1r10133
/orte/tools/orted'
/bin/sh ../../../libtool --tag=CC --mode=link gxlc_64  -O -DNDEBUG
-export-dynamic   -o orted   orted.o ../../../orte/liborte.la
../../../opal/libopal.la  -lm  -lutil -lnsl
gxlc_64 -O -DNDEBUG -o .libs/orted orted.o --export-dynamic
../../../orte/.libs/liborte.so
/usr/src/openmpi-1.0.3a1r10133/opal/.libs/libopal.so
../../../opal/.libs/libopal.so -ldl -lm -lutil -lnsl --rpath
/usr/local/ompi-xl/lib
gxlc: 1501-257 Option --export-dynamic is not recognized.  Option will be
ignored.
gxlc: 1501-257 Option --rpath is not recognized.  Option will be ignored.
xlc: 1501-274 An incompatible level of gcc has been specified.
xlc: 1501-228 input file /usr/local/ompi-xl/lib not found
xlc -q64 -qthreaded -D_REENTRANT -lpthread -O -DNDEBUG -o .libs/orted
orted.o ../../../orte/.libs/liborte.so
/usr/src/openmpi-1.0.3a1r10133/opal/.libs/libopal.so
../../../opal/.libs/libopal.so -ldl -lm -lutil -lnsl /usr/local/ompi-xl/lib

Simply substituting ld for gxlc_64 here obviously won't work either:
node42 orted # ld  -O -DNDEBUG -o .libs/orted orted.o --export-dynamic
../../../orte/.libs/liborte.so
/usr/src/openmpi-1.0.3a1r10133/opal/.libs/libopal.so
../../../opal/.libs/libopal.so -ldl -lm -lutil -lnsl --rpath
/usr/local/ompi-xl/lib -lpthread
ld: warning: cannot find entry symbol _start; defaulting to 10013ed8

Of course, I've been told that directly linking with ld isn't such a great
idea in the first
place.  Ideas?
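
The closest thing I can think of is to let xlc drive the link and push the
GNU ld options through with -Wl, along these lines (untested):

xlc -q64 -qthreaded -O -DNDEBUG -o .libs/orted orted.o \
    ../../../orte/.libs/liborte.so ../../../opal/.libs/libopal.so \
    -ldl -lm -lutil -lnsl -Wl,--export-dynamic -Wl,-rpath,/usr/local/ompi-xl/lib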

Thanks,

Justin.


Re: [OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-31 Thread Justin Bronder

On 5/30/06, Brian Barrett <brbar...@open-mpi.org> wrote:


On May 28, 2006, at 8:48 AM, Justin Bronder wrote:

> Brian Barrett wrote:
>> On May 27, 2006, at 10:01 AM, Justin Bronder wrote:
>>
>>
>>> I've attached the required logs.  Essentially the problem seems to
>>> be that the XL Compilers fail to recognize "__asm__ __volatile__" in
>>> opal/include/sys/powerpc/atomic.h when building 64-bit.
>>>
>>> I've tried using various xlc wrappers such as gxlc and xlc_r to
>>> no avail.  The current log uses xlc_r_64 which is just a one line
>>> shell script forcing the -q64 option.
>>>
>>> The same works flawlessly with gcc-4.1.0.  I'm using the nightly
>>> build in order to link with Torque's new shared libraries.
>>>
>>> Any help would be greatly appreciated.  For reference here are
>>> a few other things that may provide more information.
>>>
>>
>> Can you send the config.log file generated by configure?  What else
>> is in the xlc_r_64 shell script, other than the -q64 option?

> I've attached the config.log, and here's what all of the *_64 scripts
> look like.

Can you try compiling without the -qkeyword=__volatile__?  It looks
like XLC now has some support for GCC-style inline assembly, but it
doesn't seem to be working in this case.  If that doesn't work, try
setting CFLAGS and CXXFLAGS to include -qnokeyword=asm, which should
disable GCC inline assembly entirely.  I don't have access to a linux
cluster with the XL compilers, so I can't verify this.  But it should
work.
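For example, something like this (I can't verify the exact XL flags here):

./configure CC=xlc_r_64 CFLAGS="-qnokeyword=asm" CXXFLAGS="-qnokeyword=asm" ...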

Brian



No good, sadly.  The same error continues to appear.  I had actually
initially
attempted to compile without -qkeyword=__volatile__, but had hoped to
force xlc to recognize it.  This is obviously more of an XL issue,
especially
as I've since found that everything works flawlessly in 32-bit mode.  If
anyone
has more suggestions, I'd love the help as I'm lost at this point.

Thanks for the help thus far,

Justin.


[OMPI users] [PMX:VIRUS] Re: OpenMPI 1.0.3a1r10002 Fails to build with IBM XL Compilers.

2006-05-28 Thread Justin Bronder
Brian Barrett wrote:
> On May 27, 2006, at 10:01 AM, Justin Bronder wrote:
>
>   
>> I've attached the required logs.  Essentially the problem seems to
>> be that the XL Compilers fail to recognize "__asm__ __volatile__" in
>> opal/include/sys/powerpc/atomic.h when building 64-bit.
>>
>> I've tried using various xlc wrappers such as gxlc and xlc_r to
>> no avail.  The current log uses xlc_r_64 which is just a one line
>> shell script forcing the -q64 option.
>>
>> The same works flawlessly with gcc-4.1.0.  I'm using the nightly
>> build in order to link with Torque's new shared libraries.
>>
>> Any help would be greatly appreciated.  For reference here are
>> a few other things that may provide more information.
>> 
>
> Can you send the config.log file generated by configure?  What else  
> is in the xlc_r_64 shell script, other than the -q64 option?
>
>
>   
I've attached the config.log, and here's what all of the *_64 scripts
look like.


node42 openmpi-1.0.3a1r10002 # cat /opt/ibmcmp/vac/8.0/bin/xlc_r_64
#!/bin/sh
xlc_r -q64 "$@"


Thanks,

-- 
Justin Bronder
University of Maine, Orono

Advanced Computing Research Lab
20 Godfrey Dr
Orono, ME 04473
www.clusters.umaine.edu

Mathematics Department
425 Neville Hall
Orono, ME 04469




config.log.tar.gz
Description: application/gzip