Re: [OMPI users] MPI Java Bindings on Mac OSX

2013-01-03 Thread Doug Reeder
Ralph,

The source is available at

http://modules.sourceforge.net/

Doug

On Jan 3, 2013, at 10:49 AM, Ralph Castain wrote:

> Hi Doug
> 
> What modules software do you use on the Mac? Would be nice to know :-)
> 
> 
> On Jan 3, 2013, at 8:34 AM, Doug Reeder <d...@centurylink.net> wrote:
> 
>> Chuck,
>> 
>> In step 4 you might want to consider the following
>> 
>> --prefix=/usr/local/openmpi-1.7rc5
>> 
>> and use the modules software to select which version of openmpi to use. I 
>> have to have multiple versions of openmpi available on my macs and this 
>> approach has worked well for me.
>> 
>> Doug Reeder
>> On Jan 3, 2013, at 9:22 AM, Chuck Mosher wrote:
>> 



Re: [OMPI users] MPI Java Bindings on Mac OSX

2013-01-03 Thread Doug Reeder
Chuck,

In step 4 you might want to consider the following

--prefix=/usr/local/openmpi-1.7rc5

and use the modules software to select which version of openmpi to use. I have 
to have multiple versions of openmpi available on my macs and this approach has 
worked well for me.
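
As a rough sketch (the version number and module name are only illustrative),
that workflow looks something like:

./configure --enable-java --prefix=/usr/local/openmpi-1.7rc5
make all && sudo make install
module load openmpi/1.7rc5   # hypothetical modulefile that prepends the matching bin and lib dirs
which mpirun                 # should now report /usr/local/openmpi-1.7rc5/bin/mpirun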

Doug Reeder
On Jan 3, 2013, at 9:22 AM, Chuck Mosher wrote:

> Hi,
> 
> I've been trying to get a working version of the MPI java bindings on Mac OSX 
> (10.6.8 with Java 1.6.0_37).
> 
> I ran into a number of issues along the way that I thought I would record 
> here for others who might be foolish enough to try the same ;-)
> 
> The issues I had to spend time with were:
> 
> 1. Installing a C compiler that can run from the command line
> 2. Finding and installing an appropriate Java JDK for my OS version
> 3. Building and installing OpenMPI for the first time on a Mac
> 4. Conflicts with the existing OpenMPI version 1.2.8 that was installed 
> already on my Mac
> 5. Figuring out syntax for using the mpirun command line to run java
> 6. Odd behavior when trying to use "localhost" or the output from `hostname` 
> on the command line or in a hostfile
> 
> Resolution for each of these in order:
> 
> 1. Installing a C compiler for the command line
> Found a good resource here:
> http://www.macobserver.com/tmo/article/install_the_command_line_c_compilers_in_os_x_lion
> The solution is to install XCode, then enable command line compilers from the 
> XCode console.
> 
> 2. Finding and installing an appropriate Java JDK for my OS version
> Used this resource to eventually figure out what to do:
> http://www.wikihow.com/Install-the-JDK-(Java-Development-Kit)-on-Mac-OS-X
> It didn't exactly match my setup, but had enough clues.
> The solution is to first find your java version (java -version, 1.6.0_37 in 
> my case) and then match that version number to the Apple Java update version 
> (11 in my case). 
> The key document is:
> http://developer.apple.com/library/mac/#technotes/tn2002/tn2110.html
> Which is a table relating java version numbers to the appropriate "Java for 
> Mac OS X xx.x Update xx".
> Once you know the update number, you can download the JDK installer from
> https://developer.apple.com/downloads/index.action
> where you of course have to have an Apple developer ID to access.
> Enter "java" in the search bar on the left and find the matching java update, 
> and you're good to go.
> 
> 3. Building and installing OpenMPI for the first time on a Mac
> After the usual false starts with a new installation on a new OS, I managed 
> to get a working build of openmpi-1.7rc5 with Java bindings.
> I could only find the java bindings in the 1.7 pre-release.
> I used the defaults as much as possible. 
> 
> After downloading from:
> http://www.open-mpi.org/software/ompi/v1.7/
> and unarchiving to Downloads, open a Terminal window.
> 
> cd Downloads/openmpi-1.7rc5
> ./configure --enable-java --prefix=/usr/local
> make all
> sudo make install
> 
> Verify that you can run the commands and examples:
> 
> chuck-> /usr/local/bin/mpirun -version
> mpirun (Open MPI) 1.7rc5
> 
> chuck-> cd examples
> chuck-> make
> chuck-> /usr/local/bin/mpirun -np 2 hello_c
> Hello, world, I am 0 of 2, (Open MPI v1.7rc5, package: Open MPI 
> chuck@chucks-iMac.local Distribution, ident: 1.7rc5, Oct 30, 2012, 111)
> Hello, world, I am 1 of 2, (Open MPI v1.7rc5, package: Open MPI 
> chuck@chucks-iMac.local Distribution, ident: 1.7rc5, Oct 30, 2012, 111)
> 
> 4. Conflicts with the existing OpenMPI version 1.2.8 that was installed 
> already on my Mac
> OpenMPI Version 1.2.8 was already installed for my OS in /usr/bin
> So, if you accidentally type:
> 
> chuck-> mpirun -np 2 hello_c
> --
> A requested component was not found, or was unable to be opened
> ...
> 
> you picked up the wrong "mpirun" and you will get a bunch of error output 
> complaining about sockets or mis-matched shared library versions.
> 
> I dealt with this by moving the existing OpenMPI-related commands to a 
> subdirectory, and then creating symbolic links from /usr/local/bin into /usr/bin 
> for the commands I needed.
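> 
> Roughly, that amounted to something like the following (names and paths here 
> are only illustrative, not the exact commands I ran):
> 
> sudo mkdir /usr/local/apple-ompi-1.2.8
> sudo mv /usr/bin/mpirun /usr/bin/mpicc /usr/local/apple-ompi-1.2.8/
> sudo ln -s /usr/local/bin/mpirun /usr/bin/mpirun
> sudo ln -s /usr/local/bin/mpicc /usr/bin/mpicc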
> 
> 5. Figuring out syntax for using the mpirun command line to run java
> First be sure you can run Java
> 
> chuck-> /usr/bin/java -version
> java version "1.6.0_37"
> Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-10M3909)
> Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)
> 
> Then be sure you can run your java class from the command line as well. To 
> figure this out I created a couple of simple java files in a temp directory

Re: [OMPI users] regarding the problem occurred while running an mpi programs

2012-04-25 Thread Doug Reeder
That is well documented as a BAD idea.

On Apr 25, 2012, at 8:23 AM, seshendra seshu wrote:

> Hi,
> Yes, I run as root. 
> 
> On Wed, Apr 25, 2012 at 4:20 PM, tyler.bal...@huskers.unl.edu 
>  wrote:
> Seshendra, 
> 
> Do you always run as root? If not, your root bash profile may not have the 
> correct path settings, but that is a shot in the dark.
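> 
> A couple of illustrative things to try (the mpirun options are real, the paths 
> are made up for the example):
> 
> mpirun --prefix /usr/local/openmpi-1.4.5 --hostfile hostfile ./out   # point orted at the Open MPI install on the remote nodes
> mpirun -x LD_LIBRARY_PATH --hostfile hostfile ./out                  # or export your current LD_LIBRARY_PATH to the launched processes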
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of 
> seshendra seshu [seshu...@gmail.com]
> Sent: Wednesday, April 25, 2012 9:16 AM
> To: Open MPI Users
> Subject: [OMPI users] regarding the problem occurred while running an mpi 
> programs
> 
> 
> Hi,
> I have got the following error while running mpi programs
> 
> To run the MPI program I used a hostfile that specifies all the nodes 
> in my cluster, and "out" is the executable generated by "mpicc -o out 
> basic.c". Then I got the following error:
> 
> [root@ip-10-80-106-70 openmpi-1.4.5]# mpirun --hostfile hostfile out
> out: error while loading shared libraries: libmpi_cxx.so.0: cannot open 
> shared object file: No such file or directory
> --
> mpirun was unable to launch the specified application as it could not find an 
> executable:
> 
> Executable: out
> Node: ip-10-85-134-176.example.com
> 
> while attempting to start process rank 1.
> 
> 
> So kindly provide me a solution; I am short of time.
> 
> -- 
>  WITH REGARDS
> M.L.N.Seshendra
> 
> 
> 
> 
> -- 
>  WITH REGARDS
> M.L.N.Seshendra



Re: [OMPI users] heterogenous cluster

2011-02-02 Thread Doug Reeder
Jody,

With the gnu compilers the -m32 flag works. With other compilers the same or an 
equivalent flag should work. 
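
For example, a 32-bit Open MPI build on the 64-bit machine could be configured
roughly like this (a sketch only; adjust the flags to your compilers):

./configure CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 --prefix=/opt/openmpi-32
make all install

and the application must then be compiled with the same 32-bit flag.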

Doug Reeder
On Feb 1, 2011, at 11:46 PM, jody wrote:

> Thanks for your reply.
> 
> If i try your suggestion, every process fails with the following message:
> 
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [aim-triops:15460] Abort before MPI_INIT completed successfully; not
> able to guarantee that all other processes were killed!
> 
> I think this is caused by the fact that on the 64Bit machine Open MPI
> is also built as a 64 bit application.
> How can i force OpenMPI to be built as a 32Bit application on a 64Bit machine?
> 
> Thank You
> Jody
> 
> On Tue, Feb 1, 2011 at 9:00 PM, David Mathog <mat...@caltech.edu> wrote:
>> 
>>> I have sofar used a homogenous 32-bit cluster.
>>> Now i have added a new machine which is 64 bit
>>> 
>>> This means i have to reconfigure open MPI with
>> `--enable-heterogeneous`, right?
>> 
>> Not necessarily.  If you don't need the 64bit capabilities you could run
>> 32 bit binaries along with a 32 bit version of OpenMPI.  At least that
>> approach has worked so far for me.
>> 
>> Regards,
>> 
>> David Mathog
>> mat...@caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>> 
> 




Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

2011-02-01 Thread Doug Reeder
Jeff,

We have similar circumstances and have been able to install and use versions of 
openmpi newer than those supplied with the OS. It is necessary to have some means of 
path management to ensure that applications build against the desired version 
of openmpi and run with the version of openmpi they were built with. We use the 
module system for this path management. We create modules for each version of 
openmpi and each version of the applications. We then include the appropriate 
openmpi module in the module for the application. Then when a user loads a 
module for their application they automatically get the correct version of 
openmpi.
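
As an illustration (the module and version names here are invented), the effect
for a user is simply:

module load myapp/2.1
module list
  Currently Loaded Modulefiles:
    1) openmpi/1.4.3   2) myapp/2.1
which mpirun
  /opt/openmpi/1.4.3/bin/mpirun

because the myapp module itself loads the matching openmpi module.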

Doug Reeder
On Feb 1, 2011, at 2:02 PM, Jeffrey A Cummings wrote:

> I use OpenMPI on a variety of platforms:  stand-alone servers running Solaris 
> on sparc boxes and Linux (mostly CentOS) on AMD/Intel boxes, also Linux 
> (again CentOS) on large clusters of AMD/Intel boxes.  These platforms all 
> have some version of the 1.3 OpenMPI stream.  I recently requested an upgrade 
> on all systems to 1.4.3 (for production work) and 1.5.1 (for 
> experimentation).  I'm getting a lot of push back from the SysAdmin folks 
> claiming that OpenMPI is closely intertwined with the specific version of the 
> operating system and/or other system software (i.e., Rocks on the clusters).  
> I need to know if they are telling me the truth or if they're just making 
> excuses to avoid the work.  To state my question another way:  Apparently 
> each release of Linux and/or Rocks comes with some version of OpenMPI bundled 
> in.  Is it dangerous in some way to upgrade to a newer version of OpenMPI?  
> Thanks in advance for any insight anyone can provide. 
> 
> - Jeff



Re: [OMPI users] Mac Ifort and gfortran together

2010-12-15 Thread Doug Reeder
Hello,

You may be bumping into conflicts between the Apple-supplied Open MPI and your own MPI build. 
I use modules to force my MPI build to the front of the PATH and DYLD_LIBRARY_PATH 
variables.
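
Without modules, the same effect can be approximated by hand; a minimal sketch,
assuming the Intel-compiled build lives in /opt/intelsoft/openmpi:

export PATH=/opt/intelsoft/openmpi/bin:$PATH
export DYLD_LIBRARY_PATH=/opt/intelsoft/openmpi/lib:$DYLD_LIBRARY_PATH
which mpif90   # confirm it resolves to the build you expect

with the corresponding gfortran paths when you switch back.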

Doug Reeder
On Dec 15, 2010, at 5:22 PM, Jeff Squyres wrote:

> Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, 
> and the MPI Forum meeting last week...
> 
> 
> On Nov 29, 2010, at 2:12 PM, David Robertson wrote:
> 
>> I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use 
>> both Intel Ifort 11.1 and gfortran 4.3 on the same machine and switch 
>> between them to test and debug code.
>> 
>> I had runtime problems when I compiled openmpi in my usual way of no shared 
>> libraries so I switched to shared and it runs now.
> 
> What problems did you have?  OMPI should work fine when compiled statically.
> 
>> However, in order for it to work with ifort I ended up needing to add the 
>> location of my intel compiled Open MPI libraries 
>> (/opt/intelsoft/openmpi/lib) to my DYLD_LIBRARY_PATH environment variable to 
>> to get codes to compile and/or run with ifort.
> 
> Is this what Intel recommends for anything compiled with ifort on OS X, or is 
> this unique to OMPI-compiled MPI applications?
> 
>> The problem is that adding /opt/intelsoft/openmpi/lib to DYLD_LIBRARY_PATH 
>> broke my Open MPI for gfortran. Now when I try to compile with mpif90 for 
>> gfortran it thinks it's actually trying to compile with ifort still. As soon 
>> as I take the above path out of DYLD_LIBRARY_PATH everything works fine.
>> 
>> Also, when I run ompi_info everything looks right except prefix. It says 
>> /opt/intelsoft/openmpi rather than /opt/gfortransoft/openmpi like it should. 
>> It should be noted that having /opt/intelsoft/openmpi in LD_LIBRARY_PATH 
>> does not produce the same effect.
> 
> I'm not quite clear on your setup, but it *sounds* like you're somehow mixing 
> up 2 different installations of OMPI -- one in /opt/intelsoft and the other 
> in /opt/gfortransoft.
> 
> Can you verify that you're using the "right" mpif77 (and friends) when you 
> intend to, and so on?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 




Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Doug Reeder
In my experience hyperthreading can't really deliver two cores' worth of 
processing simultaneously for processes expecting sole use of a core. Since you 
really have 512 cores I'm not surprised that you see a performance hit when 
requesting > 512 compute units. We should really get input from a 
hyperthreading expert, preferably from Intel.
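
One way to test that hypothesis is to rerun the same case while staying at or
below the physical core count, e.g. with the command you give below but a
smaller process count:

/opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile ../machines -np 512 scatttest

and see whether the jump between 500 and 600 processes goes away.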

Doug Reeder
On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:

> We have 64 compute nodes which are dual qual-core and hyperthreaded CPUs. So 
> we have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to 
> scatter an array from the master node to the compute nodes using mpiCC and 
> mpirun using C++. 
> 
> Here is my test:
> 
> The array size is 18KB * Number of compute nodes and is scattered to the 
> compute nodes 5000 times repeatly. 
> 
> The average running time(seconds):
> 
> 100 nodes: 170,
> 400 nodes: 690,
> 500 nodes: 855,
> 600 nodes: 2550,
> 700 nodes: 2720,
> 800 nodes: 2900,
> 
> There is a big jump of running time from 500 nodes to 600 nodes. Don't know 
> what's the problem. 
> Tried both in OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster for 
> all the tests in 1.4.2 but the jump still exists. 
> Tried using either Bcast function or simply Send/Recv which give very close 
> results. 
> Tried both in running it directly or using SGE and got the same results.
> 
> The code and ompi_info are attached to this email. The direct running command 
> is :
> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile 
> ../machines -np 600 scatttest
> 
> The ifconfig of head node for eth0 is:
> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44  
>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000 
>   RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5 GiB)
>   Interrupt:106 Memory:d600-d6012800 
> 
> A typical ifconfig of a compute node is:
> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC  
>   inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000 
>   RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9 GiB)
>   Interrupt:82 Memory:d600-d6012800 
> 
> 
> Does anyone help me out of this? It bothers me a lot.
> 
> Thank you very much.
> 
> Linbao



Re: [OMPI users] Building OpenMPI 10.4 with PGI fortran 10.8 and gcc

2010-09-14 Thread Doug Reeder
Axel,

Should the argument be -ipthread?
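
Another workaround that has been suggested for this class of problem is to let
the PGI compiler ignore switches it does not recognize; if I remember correctly
that is something like (untested here, so treat it as an assumption):

./configure CC=gcc FC=pgfortran F77=pgfortran FCFLAGS=-noswitcherror FFLAGS=-noswitcherror ...

so the stray -pthread produces a warning instead of an error.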

Doug Reeder
On Sep 14, 2010, at 12:17 PM, Axel Schweiger wrote:

> Trying to build a hybrid OpenMPI with PGI fortran and gcc to support WRF model
> The problem appears to be due to a -pthread switch passed to pgfortran.
> 
> 
> 
> libtool: link: pgfortran -shared  -fpic -Mnomain  .libs/mpi.o 
> .libs/mpi_sizeof.o .libs/mpi_comm_spawn_multiple_f90.o 
> .libs/mpi_testall_f90.o .libs/mpi_testsome_f90.o .libs/mpi_waitall_f90.o 
> .libs/mpi_waitsome_f90.o .libs/mpi_wtick_f90.o .libs/mpi_wtime_f90.o   
> -Wl,-rpath -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/ompi/.libs -Wl,-rpath 
> -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs -Wl,-rpath 
> -Wl,/home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs -Wl,-rpath 
> -Wl,/opt/openmpi-pgi-gcc-1.42/lib 
> -L/home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs 
> -L/home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs 
> ../../../ompi/.libs/libmpi.so 
> /home/axel/AxboxInstall/openmpi-1.4.2/orte/.libs/libopen-rte.so 
> /home/axel/AxboxInstall/openmpi-1.4.2/opal/.libs/libopen-pal.so -ldl -lnsl 
> -lutil -lm -pthread -Wl,-soname -Wl,libmpi_f90.so.0 -o 
> .libs/libmpi_f90.so.0.0.0
> pgfortran-Error-Unknown switch: -pthread
> make[4]: *** [libmpi_f90.la] Error 1
> 
> 
> There has been discussion on this issue and the below solution suggested. 
> This doesn't appear to work for the 10.8
> release.
> 
> http://www.open-mpi.org/community/lists/users/2009/04/8911.php
> 
> There was a previous thread:
> http://www.open-mpi.org/community/lists/users/2009/03/8687.php
> 
> suggesting other solutions.
> 
> Wondering if there is a better solution right now? Building 1.4.2
> 
> Thanks
> Axel




Re: [OMPI users] Configuring with torque: error and patch

2010-05-30 Thread Doug Reeder

John,

I haven't done a build with torque lately, but I think you need to  
have a -ltorque argument in the load step.
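
A hedged example of how that can be passed in at configure time (the torque path
is taken from the log below; the flags are only illustrative):

./configure --with-tm=/usr/local/torque-2.4.0b1 LDFLAGS=-L/usr/local/torque-2.4.0b1/lib LIBS=-ltorque

so the tm_* symbols (tm_init, tm_spawn, ...) can resolve at link time.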


Doug Reeder
On May 30, 2010, at 9:13 AM, John Cary wrote:


Upon configuring and building openmpi on a system with
torque, I repeatedly got build errors of the sort,

/bin/sh ../../../libtool --tag=CXX --mode=link g++ -O3 -DNDEBUG -finline-functions -pthread -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lm
libtool: link: g++ -O3 -DNDEBUG -finline-functions -pthread -o .libs/ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/.libs/libmpi.so -L/usr/local/torque-2.4.0b1/lib /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so /scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/opal/.libs/libopen-pal.so -ldl -lnsl -lutil -lm -pthread -Wl,-rpath -Wl,/usr/local/contrib/openmpi-1.4.2-nodl/lib
/scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_spawn'
/scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_poll'
/scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_finalize'
/scr_multipole/cary/facetspkgs/builds/openmpi-1.4.2/nodl/orte/.libs/libopen-rte.so: undefined reference to `tm_init'

collect2: ld returned 1 exit status

which I fixed by adding one or the other of

$(ORTE_WRAPPER_EXTRA_LDFLAGS) $(ORTE_WRAPPER_EXTRA_LIBS)

$(OMPI_WRAPPER_EXTRA_LDFLAGS) $(OMPI_WRAPPER_EXTRA_LIBS)

to various LDADD variables.  I doubt that this is consistent
with how your build system is designed, but it works for me.
I am sending you the diff in case it helps you in any way.
BTW, I also fixed some blanks after backslashes in
contrib/Makefile.am.  This is also in the attached patch.

Best,
John Cary




Re: [OMPI users] Building 1.4.x on mac snow leopard with intel compilers

2010-05-23 Thread Doug Reeder

Mike,

Are you sure that you are getting the openmpi that you built and not 
the one supplied with OS X? I use modules to make sure that I am getting 
the openmpi version I built instead of the OS X-supplied version.


Doug Reeder
On May 23, 2010, at 10:45 AM, Glass, Micheal W wrote:

I’m having problems building a working version of openmpi 1.4.1/2 on  
a new Apple Mac Pro (dual quad-core nehalem processors) running snow  
leopard (10.6.3) with the Intel 11.1 compilers. I’ve tried the Intel  
11.1.084 and 11.1.088 versions of the compilers.  Everything appears  
to build just fine and some mpi test programs run but whenever I run  
a program with an MPI_Reduce() or MPI_Allreduce() I get a segfault  
(even with np=1).  I’m building openmpi with:


configure --without-xgrid --prefix= CC=icc CXX=icpc 
F77=ifort FC=ifort


When I build openmpi 1.4.1/2 with the GNU 4.3 compilers (installed  
via macports) using:


configure --without-xgrid --prefix= CC=gcc-mp-4.3 
CXX=g++-mp-4.3 F77=gfortran-mp-4.3 FC=gfortran-mp-4.3


all my mpi tests (6000+) run fine.  Any help would be appreciated.

Thanks,
Mike




Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-04 Thread Doug Reeder

Hello,

I have a mac with two quad core nehalem chips (8 cores). The sysctl  
command shows 16 cpus (apparently w/ hyperthreading). I have a finite  
element code that runs in parallel using openmpi. Running on the  
single machine with mpirun -np 8 takes about 2/3 the time that running  
with -np 16 does. The program is very well optimized for parallel  
processing so I strongly suspect that hyperthreading is not helping.  
The program fairly aggressively uses 100% of each cpu it is on so I  
don't think hyperthreading gets much of a chance to split the cpu  
activity. I would certainly welcome input/insight from an intel  
hardware engineer. I make sure that I don't ask for more processors  
than there are physical cores and that seems to work.
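
On the Mac the physical/logical split is easy to check; a quick sketch (the
output values and program name are only illustrative):

sysctl hw.physicalcpu hw.logicalcpu
  hw.physicalcpu: 8
  hw.logicalcpu: 16
mpirun -np 8 ./my_fem_code   # stay at or below hw.physicalcpu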


Doug Reeder
On May 4, 2010, at 7:06 PM, Gus Correa wrote:


Hi Ralph

Thank you so much for your help.

You are right, paffinity is turned off (default):

**
/opt/sw/openmpi/1.4.2/gnu-4.4.3-4/bin/ompi_info --param opal all |  
grep paffinity
   MCA opal: parameter "opal_paffinity_alone" (current  
value: "0", data source: default value, synonyms:  
mpi_paffinity_alone, mpi_paffinity_alone)

**

I will try your suggestion to turn off HT tomorrow,
and report back here.
Douglas Guptill kindly sent a recipe to turn HT off via BIOS settings.

Cheers,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-


Ralph Castain wrote:

On May 4, 2010, at 4:51 PM, Gus Correa wrote:

Hi Ralph

Ralph Castain wrote:
One possibility is that the sm btl might not like that you have  
hyperthreading enabled.

I remember that hyperthreading was discussed months ago,
in the previous incarnation of this problem/thread/discussion on  
"Nehalem vs. Open MPI".

(It sounds like one of those supreme court cases ... )

I don't really administer that machine,
or any machine with hyperthreading,
so I am not much familiar to the HT nitty-gritty.
How do I turn off hyperthreading?
Is it a BIOS or a Linux thing?
I may try that.
I believe it can be turned off via an admin-level cmd, but I'm not  
certain about it
Another thing to check: do you have any paffinity settings turned  
on

(e.g., mpi_paffinity_alone)?

I didn't turn on or off any paffinity setting explicitly,
either in the command line or in the mca config file.
All that I did on the tests was to turn off "sm",
or just use the default settings.
I wonder if paffinity is on by default, is it?
Should I turn it off?
It is off by default - I mention it because sometimes people have  
it set in the default MCA param file and don't realize it is on.  
Sounds okay here, though.

Our paffinity system doesn't handle hyperthreading at this time.

OK, so *if* paffinity is on by default (Is it?),
and hyperthreading is also on, as it is now,
I must turn off one of them, maybe both, right?
I may go combinatorial about this tomorrow.
Can't do it today.
Darn locked office door!
I would say don't worry about the paffinity right now - sounds like  
it is off. You can always check, though, by running "ompi_info --param 
opal all" and checking for the setting of the  
opal_paffinity_alone variable

I'm just suspicious of the HT since you have a quad-core machine,

and the limit where things work seems to be 4...

It may be.
If you tell me how to turn off HT (I'll google around for it  
meanwhile),

I will do it tomorrow, if I get a chance to
hard reboot that pesky machine now locked behind a door.
Yeah, I'm beginning to believe it is the HT that is causing the  
problem...

Thanks again for your help.

Gus


On May 4, 2010, at 3:44 PM, Gus Correa wrote:

Hi Jeff

Sure, I will certainly try v1.4.2.
I am downloading it right now.
As of this morning, when I first downloaded,
the web site still had 1.4.1.
Maybe I should have refreshed the web page on my browser.

I will tell you how it goes.

Gus

Jeff Squyres wrote:

Gus -- Can you try v1.4.2 which was just released today?
On May 4, 2010, at 4:18 PM, Gus Correa wrote:

Hi Ralph

Thank you very much.
The "-mca btl ^sm" workaround seems to have solved the problem,
at least for the little hello_c.c test.
I just ran it fine up to 128 processes.

I confess I am puzzled by this workaround.
* Why should we turn off "sm" in a standalone machine,
where everything is supposed to operate via shared memory?
* Do I incur in a performance penalty by not using "sm"?
* What other mechanism is actually used by OpenMPI for process
communication in this case?

It seems to be using tcp, because when I try -np 256 I get  
this error:


[spinoza:02715] [[11518,0],0] ORTE_ERROR_LOG: The system limit  
on number

of network connections a process ca

Re: [OMPI users] openmpi 1.4.1 and xgrid

2010-04-30 Thread Doug Reeder

Cristobal,

It may be a 10.6 vs 10.5 difference. In the configure --help output it  
looks like --with-xgrid=no should turn off the default behavior of  
building with support for xgrid.


Doug Reeder
On Apr 30, 2010, at 3:28 PM, Cristobal Navarro wrote:

this is strange, because some weeks ago i compiled openmpi 1.4.1 on  
a mac 10.5.6

and the parameter --without-xgrid worked good.

can you turn off xgrid on the macs you are working with?? that might  
help



Cristobal




On Fri, Apr 30, 2010 at 6:19 PM, Doug Reeder <d...@cox.net> wrote:
Alan,

I haven't tried to build 1.4.x on os x 10.6.x yet, but it sounds  
like the configure script has become too clever by half. Is there a  
configure argument to force no xgrid (e.g., --with-xgrid=no or 
--enable-xgrid=no)?


Doug Reeder

On Apr 30, 2010, at 3:12 PM, Alan wrote:


Hi guys, thanks,

Well, I can assure you I have the right things, as explained here:

ompi 1.2.8 (apple)
/usr/bin/ompi_info | grep xgrid
 MCA ras: xgrid (MCA v1.0, API v1.3, Component  
v1.2.8)
 MCA pls: xgrid (MCA v1.0, API v1.3, Component  
v1.2.8)


ompi 1.3.3 (Fink)
/sw/bin/ompi_info | grep xgrid
"nothing"

ompi 1.4.1 (mine, for Amber11)
/Users/alan/Programmes/amber11/exe/ompi_info | grep xgrid
  MCA plm: xgrid (MCA v2.0, API v2.0, Component  
v1.4.1)


So, my problem is "simple", the formula I used to compile ompi  
without xgrid used to work, but it's simply not working anymore  
with ompi 1.4.1, even though I see in compilation:


--- MCA component plm:xgrid (m4 configuration macro)
checking for MCA component plm:xgrid compile mode... static
checking if C and Objective C are link compatible... yes
checking for XgridFoundation Framework... yes
configure: WARNING: XGrid components must be built as DSOs.   
Disabling

checking if MCA component plm:xgrid can compile... no

Any help helps.

Thanks,

Alan

On Fri, Apr 30, 2010 at 20:32, Cristobal Navarro  
<axisch...@gmail.com> wrote:

try launching mpirun -v and see what version it is picking up.
maybe it's the included 1.2.x


Cristobal





On Fri, Apr 30, 2010 at 3:22 PM, Doug Reeder <d...@cox.net> wrote:
Alan,

Are you sure that the ompi_info and mpirun that you are using are  
the 1.4.1 versions and not the apple supplied versions. I use  
modules to help ensure that I am using the openmpi that I built and  
not the apple supplied versions.


Doug Reeder
On Apr 30, 2010, at 12:14 PM, Alan wrote:


Hi there,

No matter what I do I cannot disable xgrid while compiling openmpi. I 
tried:


--without-xgrid --enable-shared --enable-static

And still see with ompi_info:

 MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1)

And because of xgrid on ompi, I have:

openmpi-1.4.1/examples% mpirun -c 2 hello_c
[amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error:  
1 in file src/plm_xgrid_module.m at line 119
[amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error:  
1 in file src/plm_xgrid_module.m at line 15


Using mac SL 10.6.3

Compiling 1.3.3, I didn't have any problem.

Thanks in advance,

Alan

--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<








--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<








Re: [OMPI users] openmpi 1.4.1 and xgrid

2010-04-30 Thread Doug Reeder

Alan,

I haven't tried to build 1.4.x on os x 10.6.x yet, but it sounds like  
the configure script has become too clever by half. Is there a  
configure argument to force no xgrid (e.g., --with-xgrid=no or 
--enable-xgrid=no)?


Doug Reeder
On Apr 30, 2010, at 3:12 PM, Alan wrote:


Hi guys, thanks,

Well, I can assure you I have the right things, as explained here:

ompi 1.2.8 (apple)
/usr/bin/ompi_info | grep xgrid
 MCA ras: xgrid (MCA v1.0, API v1.3, Component v1.2.8)
 MCA pls: xgrid (MCA v1.0, API v1.3, Component v1.2.8)

ompi 1.3.3 (Fink)
/sw/bin/ompi_info | grep xgrid
"nothing"

ompi 1.4.1 (mine, for Amber11)
/Users/alan/Programmes/amber11/exe/ompi_info | grep xgrid
  MCA plm: xgrid (MCA v2.0, API v2.0, Component  
v1.4.1)


So, my problem is "simple", the formula I used to compile ompi  
without xgrid used to work, but it's simply not working anymore with  
ompi 1.4.1, even though I see in compilation:


--- MCA component plm:xgrid (m4 configuration macro)
checking for MCA component plm:xgrid compile mode... static
checking if C and Objective C are link compatible... yes
checking for XgridFoundation Framework... yes
configure: WARNING: XGrid components must be built as DSOs.  Disabling
checking if MCA component plm:xgrid can compile... no

Any help helps.

Thanks,

Alan

On Fri, Apr 30, 2010 at 20:32, Cristobal Navarro  
<axisch...@gmail.com> wrote:

try launching mpirun -v and see what version it is picking up.
maybe it's the included 1.2.x


Cristobal





On Fri, Apr 30, 2010 at 3:22 PM, Doug Reeder <d...@cox.net> wrote:
Alan,

Are you sure that the ompi_info and mpirun that you are using are  
the 1.4.1 versions and not the apple supplied versions. I use  
modules to help ensure that I am using the openmpi that I built and  
not the apple supplied versions.


Doug Reeder
On Apr 30, 2010, at 12:14 PM, Alan wrote:


Hi there,

No matter what I do I cannot disable xgrid while compiling openmpi. I 
tried:


--without-xgrid --enable-shared --enable-static

And still see with ompi_info:

 MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1)

And because of xgrid on ompi, I have:

openmpi-1.4.1/examples% mpirun -c 2 hello_c
[amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error:  
1 in file src/plm_xgrid_module.m at line 119
[amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error:  
1 in file src/plm_xgrid_module.m at line 15


Using mac SL 10.6.3

Compiling 1.3.3, I didn't have any problem.

Thanks in advance,

Alan

--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<








--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<




Re: [OMPI users] openmpi 1.4.1 and xgrid

2010-04-30 Thread Doug Reeder

Alan,

Are you sure that the ompi_info and mpirun that you are using are the  
1.4.1 versions and not the apple supplied versions. I use modules to  
help ensure that I am using the openmpi that I built and not the apple  
supplied versions.


Doug Reeder
On Apr 30, 2010, at 12:14 PM, Alan wrote:


Hi there,

No matter what I do I cannot disable xgrid while compiling openmpi. I tried:

--without-xgrid --enable-shared --enable-static

And still see with ompi_info:

 MCA plm: xgrid (MCA v2.0, API v2.0, Component v1.4.1)

And because of xgrid on ompi, I have:

openmpi-1.4.1/examples% mpirun -c 2 hello_c
[amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1  
in file src/plm_xgrid_module.m at line 119
[amadeus.local:26559] [[63998,0],0] ORTE_ERROR_LOG: Unknown error: 1  
in file src/plm_xgrid_module.m at line 15


Using mac SL 10.6.3

Compiling 1.3.3, I didn't have any problem.

Thanks in advance,

Alan

--
Alan Wilter S. da Silva, D.Sc. - CCPN Research Associate
Department of Biochemistry, University of Cambridge.
80 Tennis Court Road, Cambridge CB2 1GA, UK.
>>http://www.bio.cam.ac.uk/~awd28<<




Re: [OMPI users] configure script fails

2010-01-13 Thread Doug Reeder

Christoph,

It looks like you need to add -L/usr/local/lib to the fortran and f90  
flags, either on the configure input or in the environment variables,  
so that the loader can find libgfortran.
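
A hedged example (the exact library path depends on where your gcc/gfortran 4.5
is installed):

./configure CC=gcc FC=gfortran F77=gfortran LDFLAGS=-L/usr/local/lib

so the small Fortran test programs that configure builds can find libgfortran.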


Doug
On Jan 13, 2010, at 4:09 PM, von Tycowicz, Christoph wrote:


Hi,

when running the configure script it breaks with:
configure: error: Could not run a simple Fortran 77 program.   
Aborting.

(logs with details attached)

I don't know how to interpret this error since I already  
successfully compiled fortran code using these compilers(gcc/ 
gfortran 4.5).

If would be really grateful for any clues on this.

best regards
Christoph





Re: [OMPI users] OpenMPI on OS X - file is not of required architecture

2009-09-11 Thread Doug Reeder

Andreas,

Have you checked that ifort is creating 64-bit objects? If I remember 
correctly, with 10.1 the default was to create 32-bit objects.
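
A quick way to check (the file output and filenames are only illustrative):

ifort -c hello.f -o hello.o
file hello.o                       # look for x86_64 vs i386 in the output
ifort -m64 -c hello.f -o hello.o   # force 64-bit objects if the default is 32-bit

and, if needed, add -m64 to FFLAGS/FCFLAGS when configuring Open MPI.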


Doug Reeder
On Sep 11, 2009, at 3:25 PM, Andreas Haselbacher wrote:

On Fri, Sep 11, 2009 at 5:10 PM, Jeff Squyres <jsquy...@cisco.com>  
wrote:

On Sep 11, 2009, at 10:05 AM, Andreas Haselbacher wrote:

I've built openmpi version 1.3.3 on a MacPro with OS X 10.5.8 and  
the Intel 10.1.006 Fortran compiler and gcc 4.0.  As far as I can  
tell, the configure and make commands completed fine. There are some  
warnings, but it's not clear to me that they are critical - or the  
explanation for what's not working. After installing, I try to  
compile a simple F77 hello world code. The output is:


% mpif77 helloworld_mpi.f -o helloworld_mpi
ld: warning in /opt/openmpi/lib/libmpi_f77.a, file is not of  
required architecture


This means that it skipped that library because it didn't match what  
you were trying to compile against.


Can you send the output of mpif77 --showme?


ifort -I/opt/openmpi/include -L/opt/openmpi/lib -lmpi_f77 -lmpi -lopen-rte -lopen-pal -lutil



Undefined symbols:
 "_mpi_init_", referenced from:
 _MAIN__ in ifortIsUNoZ.o

None of these symbols were found because libmpi_f77.a was skipped.


Right.


Here's my configure command:

./configure --prefix=/opt/openmpi --enable-static --disable-shared 
CC=gcc CFLAGS=-m64 CXX=g++ CXXFLAGS=-m64 F77=ifort FC=ifort 
FFLAGS=-assume nounderscore FCFLAGS=-assume nounderscore


I do not have the intel compilers for Mac; do they default to  
producing 64 bit objects?  I ask because it looks like you forced  
the C and C++ compilers to produce 64 bit objects -- do you need to  
do the same with ifort?  (via the FCFLAGS and FFLAGS env variables)


If I remember correctly, I had to add those flags, otherwise  
configure claimed that the compilers were not compatible. I can  
rerun configure if you suspect that this is an issue.  I did not add  
these flags to the Fortran variables because configure did not  
complain further, but I can see that this might be an issue.



Also, did you quote the "-assume nounderscore" arguments to FFLAGS/FCFLAGS?  
I.e., something like this:


   "FFLAGS=-assume nounderscore"


Yes, I did.

Andreas

--
Jeff Squyres
jsquy...@cisco.com






Re: [OMPI users] ompi_info segmentation fault with Snow Leopard

2009-09-01 Thread Doug Reeder

Marcus,

What version of openmpi ships with 10.6? Are you making sure that you 
are getting the includes and libraries for 1.3.3 and not the native 
Apple version of openmpi?
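
Two quick hedged checks (paths are illustrative):

which ompi_info              # make sure it is the 1.3.3 install, not /usr/bin/ompi_info
otool -L $(which ompi_info)  # confirm the libraries it links come from your 1.3.3 prefix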


Doug Reeder
On Sep 1, 2009, at 4:31 PM, Marcus Herrmann wrote:


Hi,
I am trying to install openmpi 1.3.3 under OSX 10.6 (Snow Leopard)  
using the 11.1.058 intel compilers. Configure and build seem to work  
fine. However trying to run ompi_info after install causes directly  
a segmentation fault without any additional information being printed.

Did anyone have success in using 1.3.3 under Snow Leopard?

Thanks
Marcus




Re: [OMPI users] Configuration problem or network problem?

2009-07-06 Thread Doug Reeder

Lin,

Try -np 16 and not running on the head node.

Doug Reeder
On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:


Hi all,
The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as  
a headnode, they are connected by a high speed switch.
There are point-to-point communication functions( MPI_Send and  
MPI_Recv ), the data size is about 40KB, and a lot of computings  
which will consume a long time(about 1 sec)in a loop.The co- 
processor in PS3 can take care of the computation, the main  
processor take care of point-to-point communication,so the computing  
and communication can overlap.The communication funtions should  
return much faster than computing function.
My question is that after some circles, the time consumed by  
communication functions in a PS3 will increase heavily, and the  
whole cluster's sync state will corrupt.When I decrease the  
computing time, this situation just disappeare.I am very confused  
about this.
I think there is a mechanism in OpenMPI that cause this case, does  
everyone get this situation before?
I use "mpirun --mca btl tcp, self -np 17 --hostfile ...", is there  
something i should added?

Lin




Re: [OMPI users] Compiling Open MPI with PGI compilers in 32-bit mode

2009-03-20 Thread Doug Reeder

Ethan,

It looks like some of the object files that you are trying to link with 
malloc.o and malloc-stats.o were compiled as 64-bit objects. Are 
you using the 32-bit compiler flag for the compile step as well as the 
link step?
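
A sketch of what that looks like at configure time (the -tp=k8-32 flag is the
one from your log; everything else is illustrative):

./configure CC=pgcc CFLAGS=-tp=k8-32 CXXFLAGS=-tp=k8-32 FFLAGS=-tp=k8-32 FCFLAGS=-tp=k8-32 LDFLAGS=-tp=k8-32 ...

so the objects are produced as 32-bit as well as linked that way.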


Doug Reeder
On Mar 20, 2009, at 10:49 AM, Ethan Mallove wrote:


Hi,

Has anyone successfully compiled Open MPI with the PGI compilers in
32-bit mode (e.g., using -tp=k8-32 flag)?  I am getting the following
error with 32-bit:

  $ cd opal/mca/memory/ptmalloc2
  $ make
  /bin/sh ../../../../libtool --tag=CC --mode=link pgcc -O -DNDEBUG -tp=k8-32 -export-dynamic -o libopenmpi-malloc.la -rpath /opt/SUNWhpc/HPC8.2/pgi/lib malloc.lo malloc-stats.lo -lnsl -lutil -lpthread
  libtool: link: pgcc -shared -fpic -DPIC .libs/malloc.o .libs/malloc-stats.o -lnsl -lutil -lpthread -lc -Wl,-soname -Wl,libopenmpi-malloc.so.0 -o .libs/libopenmpi-malloc.so.0.0.0
  /usr/bin/ld: warning: i386 architecture of input file `.libs/malloc.o' is incompatible with i386:x86-64 output
  /usr/bin/ld: warning: i386 architecture of input file `.libs/malloc-stats.o' is incompatible with i386:x86-64 output

 .libs/malloc.o(.text+0xcb3): In function `realloc_check':
 : undefined reference to `opal_memcpy_base_module'
 .libs/malloc.o(.text+0x14e3): In function `munmap_chunk':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x1560): In function `mremap_chunk':
 : undefined reference to `opal_mem_hooks_release_hook'
 .libs/malloc.o(.text+0x2be2): In function `_int_free':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x30ae): In function `_int_realloc':
 : undefined reference to `opal_mem_hooks_release_hook'
 .libs/malloc.o(.text+0x3c2a): In function  
`opal_mem_free_ptmalloc2_sbrk':

 : undefined reference to `opal_mem_hooks_release_hook'
 .libs/malloc.o(.text+0x3fab): In function `ptmalloc_init':
 : undefined reference to `opal_mem_hooks_set_support'
 .libs/malloc.o(.text+0x40ad): In function `new_heap':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x40d5): In function `new_heap':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x414f): In function `new_heap':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x4198): In function `new_heap':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x4282): In function `heap_trim':
 : undefined reference to `opal_mem_free_ptmalloc2_munmap'
 .libs/malloc.o(.text+0x44aa): In function `arena_get2':
 : undefined reference to `opal_atomic_wmb'
 make: *** [libopenmpi-malloc.la] Error 2

-Ethan




Re: [OMPI users] 1.3 and --preload-files and --preload-binary

2009-01-22 Thread Doug Reeder

Josh,

It sounds like "." is not in your path. That would prevent mpirun from 
finding the binary in the current directory.


Doug Reeder
On Jan 22, 2009, at 10:48 AM, Josh Hursey wrote:


As a followup.

I can confirm that --preload-files is not working as it should.

I was able to use --preload-binary with a full path to the binary  
without a problem though. The following commands worked fine (where /tmp 
is not mounted on all machines):

  shell$ mpirun -np 2 --preload-binary /tmp/hello
  shell$ mpirun -np 2 -s /tmp/hello

However if I referred directly to the binary in the current  
directory I saw the same failure:

shell$ cd /tmp
shell$ mpirun -np 2 -s hello
--
mpirun was unable to launch the specified application as it could  
not find an executable:


Executable: hello
Node: odin101

while attempting to start process rank 0.
--


I'll keep digging into this bug, and let you know when I have a fix.  
I filed a ticket (below) that you can use to track the progress on  
this bug.

 https://svn.open-mpi.org/trac/ompi/ticket/1770

Thanks again for the bug report, I'll try to resolve this soon.

Josh

On Jan 22, 2009, at 10:49 AM, Josh Hursey wrote:

The warning is to be expected if the file already exists on the  
remote side. Open MPI has a policy not to replace the file if it  
already exists.


The segv is concerning. :/

I will take a look and see if I can diagnose what is going on here.  
Probably in the next day or two.


Thanks for the bug report,
Josh

On Jan 22, 2009, at 10:11 AM, Geoffroy Pignot wrote:


Hello,

As you can notice , I am trying the work done on this new release.  
preload-files and preload-binary options are very interesting to  
me because I work on a cluster without any shared space between  
nodes.
I tried those basically , but no success . You will find below the  
error messages.
If I did things wrong,  would it be possible to get simple  
examples showing how these options work.


Thanks

Geoffroy

> /tmp/openmpi-1.3/bin/mpirun --preload-files hello.c --hostfile /tmp/hostlist -np 2 hostname

--
WARNING: Could not preload specified file: File already exists.

Fileset: /tmp/hello.c
Host: compil03

Will continue attempting to launch the process.

--
[compil03:26657] filem:rsh: get(): Failed to preare the request  
structure (-1)

--
WARNING: Could not preload the requested files and directories.

Fileset:
Fileset: hello.c

Will continue attempting to launch the process.

--
> [compil03:26657] [[13938,0],0] ORTE_ERROR_LOG: Error in file base/odls_base_state.c at line 127
> [compil03:26657] [[13938,0],0] ORTE_ERROR_LOG: Error in file base/odls_base_default_fns.c at line 831

[compil03:26657] *** Process received signal ***
[compil03:26657] Signal: Segmentation fault (11)
[compil03:26657] Signal code: Address not mapped (1)
[compil03:26657] Failing at address: 0x395eb15000
> [compil03:26657] [ 0] /lib64/tls/libpthread.so.0 [0x395f80c420]
> [compil03:26657] [ 1] /lib64/tls/libc.so.6(memcpy+0x3f) [0x395ed718df]
> [compil03:26657] [ 2] /tmp/openmpi-1.3/lib64/libopen-pal.so.0 [0x2a956b0a10]
> [compil03:26657] [ 3] /tmp/openmpi-1.3/lib64/libopen-rte.so.0(orte_odls_base_default_launch_local+0x55c) [0x2a955809cc]
> [compil03:26657] [ 4] /tmp/openmpi-1.3/lib64/openmpi/mca_odls_default.so [0x2a963655f2]
> [compil03:26657] [ 5] /tmp/openmpi-1.3/lib64/libopen-rte.so.0(orte_daemon_cmd_processor+0x57d) [0x2a9557812d]
> [compil03:26657] [ 6] /tmp/openmpi-1.3/lib64/libopen-pal.so.0 [0x2a956b9828]
> [compil03:26657] [ 7] /tmp/openmpi-1.3/lib64/libopen-pal.so.0(opal_progress+0xb0) [0x2a956ae820]
> [compil03:26657] [ 8] /tmp/openmpi-1.3/lib64/libopen-rte.so.0(orte_plm_base_launch_apps+0x1ed) [0x2a95584e7d]
> [compil03:26657] [ 9] /tmp/openmpi-1.3/lib64/openmpi/mca_plm_rsh.so [0x2a95c3ed98]
> [compil03:26657] [10] /tmp/openmpi-1.3/bin/mpirun [0x403330]
> [compil03:26657] [11] /tmp/openmpi-1.3/bin/mpirun [0x402ad3]
> [compil03:26657] [12] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x395ed1c4bb]
> [compil03:26657] [13] /tmp/openmpi-1.3/bin/mpirun [0x402a2a]
[compil03:26657] *** End of error message ***
Segmentation fault

And it's not better with --preload-binary . a.out_32

> compil03% /tmp/openmpi-1.3/bin/mpirun -s --hostfile /tmp/hostlist -wdir /tmp -np 2 a.out_32

--
mpirun was unable to launch the specified application as it could  
not find an executable:


Executable: a.out_32
Node: compil02

while attempting to start process rank 1.



Re: [OMPI users] FW: Re: [MTT users] Is the stock MPI that comes with OSX leopard broken with xgrid?

2008-12-17 Thread Doug Reeder
I believe that the openmpi that comes with leopard doesn't support  
xgrid. If you type ompi_info|grep xgrid you get nothing. I'm not sure  
what apple was thinking.


Doug Reeder
On Dec 17, 2008, at 6:30 AM, Ethan Mallove wrote:


Hi John,

I'm forwarding your question to the Open MPI users list.

Regards,
Ethan

On Wed, Dec/17/2008 08:35:00AM, John Fink wrote:

   Hello OpenMPI folks,

   I've got a large pool of Macs running Leopard that are all on an xgrid.
   However, I can't seem to use the mpirun that comes with Leopard with the
   xgrid.  I've got my grid and password environment variables set up okay on
   my controller, all the xgrid command line commands work (displaying grid
   IDs, things like that) but mpirun only wants to run things on the local
   host.

   I'm extremely new to OpenMPI and only slightly less new to Macs so there's
   probably something very obvious that I'm missing, but I'm trying what's
   detailed on this page:
   http://www.macresearch.org/runing_mpi_job_through_xgrid (the /bin/hostname
   example).  Here's my output:

   as-0003-l:~ locadmin$ mpirun -n 8 /bin/hostname
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca
   as-0003-l.lib.mcmaster.ca

   Issuing the same command with -nolocal yields the following:

   as-0003-l:~ locadmin$ mpirun --nolocal -n 8 /bin/hostname

   --------------------------------------------------------------------------
   There are no available nodes allocated to this job. This could be because
   no nodes were found or all the available nodes were already used.

   Note that since the -nolocal option was given no processes can be
   launched on the local node.
   --------------------------------------------------------------------------
   [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out
   of resource in file
   /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_support_fns.c
   at line 168
   [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out
   of resource in file
   /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/round_robin/rmaps_rr.c
   at line 402
   [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out
   of resource in file
   /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_map_job.c
   at line 210
   [as-0003-l.lib.mcmaster.ca:82776] [0,0,0] ORTE_ERROR_LOG: Temporarily out
   of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmgr/urm/rmgr_urm.c
   at line 372
   [as-0003-l.lib.mcmaster.ca:82776] mpirun: spawn failed with errno=-3


   Thanks very much for any help you can provide!

   jf

   --
   http://libgrunt.blogspot.com -- library culture and technology.










Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-21 Thread Doug Reeder

Brian,

I'm not sure I understand the problem. The ale3d program from LLNL  
operates exactly as you describe and it can be built with mpich, lam,  
or openmpi.


Doug Reeder
On Oct 21, 2008, at 3:08 PM, Adams, Brian M wrote:


-Original Message-
From: users-boun...@open-mpi.org
[mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
Sent: Tuesday, October 21, 2008 11:36 AM
To: Open MPI Users
Subject: Re: [OMPI users] OpenMPI runtime-specific
environment variable?

Hi,

Am 21.10.2008 um 18:52 schrieb Ralph Castain:


On Oct 21, 2008, at 10:37 AM, Adams, Brian M wrote:


Doug is right that we could use an additional command line flag to
indicate MPI runs, but at this point, we're trying to hide

that from

the user, such that all they have to do is run the binary vs.
orterun/mpirun the binary and we detect whether it's a serial or
parallel run.


And when you have this information you decide for your user,
whether to use mpirun (and the correct version to use) or
just the plain binary?


I might have created some confusion here too.  The goal is to build  
an MPI-enabled binary 'foo' which a user may invoke as


(1) ./foo
-OR-
(2) mpirun -np 4 ./foo

The binary foo then determines at run-time whether it is to run in  
(1) serial, where MPI_Init will never be called; or (2) parallel,  
calling MPI_Init and so on.  This is a historical behavior which we  
need to preserve, at least for our present software release.



You are making something like "strings the_binary" and grep
for indications of the compilation type? For the standard
Open MPI with shared libraries a "ldd the_binary" might
reveal some information.


Hadn't thought to do that actually, since it addresses a slightly  
different problem than I propose above.  Thanks for the  
suggestion.  This is another possibility if instead of doing this  
detection directly in our binary, we decide to change to a wrapper  
script approach.


In any case, I appreciate all the discussion -- I believe I have a  
reasonable path forward using a combination of pre-processor  
defines that the OMPI wrappers and headers make with the runtime  
environment variables Ralph suggested (I'll just check for both the  
<1.3 and >= 1.3 environment cases).


Brian






Re: [OMPI users] OpenMPI runtime-specific environment variable?

2008-10-20 Thread Doug Reeder

Brian,

In your code branch for the parallel run you could set an environment  
or internal variable when you call mpi_init. Can you parse the  
command line (arg 0) and figure out if you are running parallel or  
serial.
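
A quick way to see what the launcher actually sets in a given Open MPI version
(purely illustrative; as noted in the thread, the variable names differ between
releases):

mpirun -np 1 env | grep '^OMPI_' | sort

and then test for one of those variables (or parse argv[0]) inside the program.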


Doug Reeder
On Oct 20, 2008, at 3:40 PM, Adams, Brian M wrote:

I work on an application (DAKOTA) that has opted for single  
binaries with source code to detect serial vs. MPI execution at run- 
time.  While I realize there are many other ways to handle this  
(wrapper scripts, command-line switches, different binaries for  
serial vs. MPI, etc.), I'm looking for a reliable way to detect (in  
source) whether a binary has been launched in serial or with orterun.


We typically do this via detecting environment variables, so the  
easiest path for me would be to know an environment variable  
present when an application is invoked with orterun that is not  
typically present outside that MPI runtime environment.  Some  
candidates that came up in my particular environment include the  
following, but I don't know if any is a safe bet:


OMPI_MCA_gpr_replica_uri
OMPI_MCA_mpi_paffinity_processor
OMPI_MCA_mpi_yield_when_idle
OMPI_MCA_ns_nds
OMPI_MCA_ns_nds_cellid
OMPI_MCA_ns_nds_jobid
OMPI_MCA_ns_nds_num_procs
OMPI_MCA_ns_nds_vpid
OMPI_MCA_ns_nds_vpid_start
OMPI_MCA_ns_replica_uri
OMPI_MCA_orte_app_num
OMPI_MCA_orte_base_nodename
OMPI_MCA_orte_precondition_transports
OMPI_MCA_pls
OMPI_MCA_ras
OMPI_MCA_rds
OMPI_MCA_rmaps
OMPI_MCA_rmgr
OMPI_MCA_universe

I'd also welcome suggestions for other in-source tests that might  
reliably detect run via orterun.  Thanks!


Brian
--
Brian M. Adams, PhD (bria...@sandia.gov)
Optimization and Uncertainty Estimation
Sandia National Laboratories, Albuquerque, NM
http://www.sandia.gov/~briadam






Re: [OMPI users] Passing LD_LIBRARY_PATH to orted

2008-10-14 Thread Doug Reeder
In torque/pbs using the #PBS -V command pushes the environment  
variables out to the nodes. I don't know if that is what was  
happening with slurm.
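
For reference, a minimal PBS script sketch using that option (the resource line
is only an example):

#!/bin/sh
#PBS -V
#PBS -l nodes=2:ppn=8
cd $PBS_O_WORKDIR
mpirun ./a.out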


Doug Reeder
On Oct 14, 2008, at 12:33 PM, Ralph Castain wrote:

I -think- there is...at least here, it does seem to behave that way  
on our systems. Not sure if there is something done locally to make  
it work.


Also, though, I have noted that LD_LIBRARY_PATH does seem to be  
getting forwarded on the 1.3 branch in some environments. OMPI  
isn't doing it directly to the best of my knowledge, but I think  
the base environment might be. Specifically, I noticed it on slurm  
earlier today. I'll check the others as far as I can.


Craig: what environment are you using? ssh?
Ralph



On Oct 14, 2008, at 1:18 PM, George Bosilca wrote:

I use modules too, but they only work locally. Or is there a  
feature in "module" to automatically load the list of currently  
loaded local modules remotely ?


 george.

On Oct 14, 2008, at 3:03 PM, Ralph Castain wrote:

You might consider using something like  "module" - we use that  
system for exactly this reason. Works quite well and solves the  
multiple compiler issue.


Ralph

On Oct 14, 2008, at 12:56 PM, Craig Tierney wrote:


George Bosilca wrote:
The option to expand the remote LD_LIBRARY_PATH, in such a way  
that Open MPI related applications have their dependencies  
satisfied, is in the trunk. The fact that the compiler requires  
some LD_LIBRARY_PATH is out of the scope of an MPI  
implementation, and I don't think we should take care of it.
Passing the local LD_LIBRARY_PATH to the remote nodes doesn't
make much sense. There are plenty of environments where the
head node has a different configuration than the compute
nodes. Again, in this case my original solution seems not that
bad. If you copy (or link, if you prefer) the compiler shared
libraries into the Open MPI lib directory, this will work.

george.


This does work.  It just increases maintenance for each new version
of OpenMPI.  How often does a head node have a different configuration
than the compute nodes?  It would seem that this would further support
passing LD_LIBRARY_PATH for the OpenMPI tools, to handle exactly the
heterogeneous configuration you described.


Thanks,
Craig





On Oct 14, 2008, at 12:11 PM, Craig Tierney wrote:

George Bosilca wrote:

Craig,
This is a problem with the Intel libraries and not the Open  
MPI ones. You have to somehow make these libraries available  
on the compute nodes.
What I usually do (but it's not the best way to solve this  
problem) is to copy these libraries somewhere on my home area  
and to add the directory to my LD_LIBRARY_PATH.

george.


This is ok when you only ever use one compiler, but it isn't  
very flexible.
I want to keep it as simple as possible for my users, while  
having a maintainable

system.

The libraries are on the compute nodes, the problem deals with  
supporting
multiple versions of compilers.  I can't just list all of the  
lib paths
in ld.so.conf, because then the user will never get the  
correct one.  I can't
specify a static LD_LIBRARY_PATH for the same reason.  I would  
prefer not

to build my system libraries static.

To the OpenMPI developers, what is your opinion on changing  
orterun/mpirun
to pass LD_LIBRARY_PATH to the remote hosts when starting  
OpenMPI processes?

By hand, all that would be done is:

env LD_LIBRARY_PATH=$LD_LIBRARY_PATH $OMPIPATH/orted 

This would ensure that orted is launched correctly.

Or is it better to just build the OpenMPI tools statically?   
We also
use other compilers (PGI, Lahey) so I need a solution that  
works for

all of them.

Thanks,
Craig




On Oct 10, 2008, at 6:17 PM, Craig Tierney wrote:
I am having problems launching openmpi jobs on my system.  I  
support multiple versions
of MPI and compilers using GNU Modules.  For the default  
compiler, everything is fine.

For non-default, I am having problems.

I built Openmpi-1.2.6 (and 1.2.7) with the following  
configure options:


# module load intel/10.1
# ./configure CC=icc CXX=icpc F77=ifort FC=ifort F90=ifort -- 
prefix=/opt/openmpi/1.2.7-intel-10.1 --without-
gridengine --enable-io-romio --with-io-romio-flags=--with- 
file-sys=nfs+ufs --with-openib=/opt/hjet/ofed/1.3.1


When I launch a job, I run the module command for the right  
compiler/MPI version to set the paths
correctly.  Mpirun passes LD_LIBRARY_PATH to the executable  
I am launching, but not orted.


When orted is launched on the remote system, the  
LD_LIBRARY_PATH

doesn't come with, and the Intel 10.1 libraries can't be found.

/opt/openmpi/1.2.7-intel-10.1/bin/orted: error while loading  
shared libraries: libintlc.so.5: cannot open shared object  
file: No such file or directory


How do others solve this problem?

Thanks,
Craig


--
Craig Tierney (craig.tier...@noaa.gov)
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] OMPI link error with petsc 2.3.3

2008-10-07 Thread Doug Reeder

Yann,

It looks like somehow the libmpi and libmpi_f90 have different values  
for the variable mpi_fortran_status_ignore. It sounds like a  
configure problem. You might check the mpi include files to see if  
you can see where the different values are coming from.
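
Something along these lines might show where the mismatch comes from
(the include path is a guess based on the install prefix in your
warning; use nm -D or elfdump -s if plain nm prints nothing for the
shared libraries):

  grep -ri status_ignore /opt/SUNWhpc/HPC8.0/include
  nm /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi.so | grep -i status_ignore
  nm /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi_f90.so | grep -i status_ignore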


Doug Reeder
On Oct 7, 2008, at 7:55 AM, Yann JOBIC wrote:


Hello,

I'm using openmpi 1.3r19400 (ClusterTools 8.0), with sun studio 12,  
and solaris 10u5


I've got this error when linking a PETSc code :
ld: warning: symbol `mpi_fortran_status_ignore_' has differing sizes:
   (file /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi.so value=0x8;  
file /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi_f90.so value=0x14);

   /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi.so definition taken


Isn't it very strange ?

Have you got any idea on the way to solve it ?

Many thanks,

Yann
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] segfault issue - possible bug in openmpi

2008-10-04 Thread Doug Reeder

Shafagh,

I missed the dependence on the number of processors. Apparently there  
is some thread support.
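
As for attaching a debugger, two common tricks (illustrative, not
tested on your setup):

  # launch each rank under gdb in its own xterm (needs a working X display)
  mpirun -np 17 xterm -e gdb ./Openmpi_md_twham

  # or add e.g. sleep(30) early in main, then on the node running a rank:
  gdb -p <pid-of-that-rank>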


Doug
On Oct 4, 2008, at 5:29 PM, Shafagh Jafer wrote:


Doug Reeder,
Daniel is saying that the problem only occurs in openmpi when
running more than 16 processes. So could the cause still be that
openmpi does not support threads?!


--- On Fri, 10/3/08, Doug Reeder <d...@rain.org> wrote:
From: Doug Reeder <d...@rain.org>
Subject: Re: [OMPI users] segfault issue - possible bug in openmpi
To: "Open MPI Users" <us...@open-mpi.org>
Date: Friday, October 3, 2008, 2:40 PM

Daniel,

Are you using threads? I don't think openmpi-1.2.x works with
threads.


Doug Reeder
On Oct 3, 2008, at 2:30 PM, Daniel Hansen wrote:


Oh, by the way, here is the segfault:

[m4b-1-8:11481] *** Process received signal ***
[m4b-1-8:11481] Signal: Segmentation fault (11)
[m4b-1-8:11481] Signal code: Address not mapped (1)
[m4b-1-8:11481] Failing at address: 0x2b91c69eed
[m4b-1-8:11483] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
[m4b-1-8:11483] [ 1] /fslhome/dhansen7/openmpi/lib/libmpi.so.0  
[0x2abea7c0]
[m4b-1-8:11483] [ 2] /fslhome/dhansen7/openmpi/lib/libmpi.so.0  
[0x2abea675]
[m4b-1-8:11483] [ 3] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 
(mca_pml_ob1_send+0x2da) [0x2abeaf55]
[m4b-1-8:11483] [ 4] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 
(MPI_Send+0x28e) [0x2ab52c5a]
[m4b-1-8:11483] [ 5] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham(twham_init+0x708) [0x42a8a8]
[m4b-1-8:11483] [ 6] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham(repexch+0x73c) [0x425d5c]
[m4b-1-8:11483] [ 7] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham(main+0x855) [0x4133a5]
[m4b-1-8:11483] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)  
[0x33e841d8a4]
[m4b-1-8:11483] [ 9] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham [0x4040b9]

[m4b-1-8:11483] *** End of error message ***



On Fri, Oct 3, 2008 at 3:20 PM, Daniel Hansen <dhan...@byu.net>  
wrote:
I have been testing some code against openmpi lately that always  
causes it to crash during certain mpi function calls.  The code  
does not seem to be the problem, as it runs just fine against  
mpich.  I have tested it against openmpi 1.2.5, 1.2.6, and 1.2.7  
and they all exhibit the same problem.  Also, the problem only  
occurs in openmpi when running more than 16 processes.  I have  
posted this stack trace to the list before, but I am submitting it  
now as a potential bug report.  I need some help debugging it and  
finding out exactly what is going on in openmpi when the segfault  
occurs.  Are there any suggestions on how best to do this?  Is  
there an easy way to attach gdb to one of the processes or  
something??  I have already compiled openmpi with debugging,  
memory profiling, etc.  How can I best take advantage of these  
features?


Thanks,
Daniel Hansen
Systems Administrator
BYU Fulton Supercomputing Lab

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] segfault issue - possible bug in openmpi

2008-10-03 Thread Doug Reeder

Daniel,

Are you using threads? I don't think openmpi-1.2.x works with threads.

Doug Reeder
On Oct 3, 2008, at 2:30 PM, Daniel Hansen wrote:


Oh, by the way, here is the segfault:

[m4b-1-8:11481] *** Process received signal ***
[m4b-1-8:11481] Signal: Segmentation fault (11)
[m4b-1-8:11481] Signal code: Address not mapped (1)
[m4b-1-8:11481] Failing at address: 0x2b91c69eed
[m4b-1-8:11483] [ 0] /lib64/libpthread.so.0 [0x33e8c0de70]
[m4b-1-8:11483] [ 1] /fslhome/dhansen7/openmpi/lib/libmpi.so.0  
[0x2abea7c0]
[m4b-1-8:11483] [ 2] /fslhome/dhansen7/openmpi/lib/libmpi.so.0  
[0x2abea675]
[m4b-1-8:11483] [ 3] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 
(mca_pml_ob1_send+0x2da) [0x2abeaf55]
[m4b-1-8:11483] [ 4] /fslhome/dhansen7/openmpi/lib/libmpi.so.0 
(MPI_Send+0x28e) [0x2ab52c5a]
[m4b-1-8:11483] [ 5] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham(twham_init+0x708) [0x42a8a8]
[m4b-1-8:11483] [ 6] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham(repexch+0x73c) [0x425d5c]
[m4b-1-8:11483] [ 7] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham(main+0x855) [0x4133a5]
[m4b-1-8:11483] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4)  
[0x33e841d8a4]
[m4b-1-8:11483] [ 9] /fslhome/dhansen7/compute/for_DanielHansen/ 
replica_mpi_marylou2/Openmpi_md_twham [0x4040b9]

[m4b-1-8:11483] *** End of error message ***



On Fri, Oct 3, 2008 at 3:20 PM, Daniel Hansen <dhan...@byu.net> wrote:
I have been testing some code against openmpi lately that always  
causes it to crash during certain mpi function calls.  The code  
does not seem to be the problem, as it runs just fine against  
mpich.  I have tested it against openmpi 1.2.5, 1.2.6, and 1.2.7  
and they all exhibit the same problem.  Also, the problem only  
occurs in openmpi when running more than 16 processes.  I have  
posted this stack trace to the list before, but I am submitting it  
now as a potential bug report.  I need some help debugging it and  
finding out exactly what is going on in openmpi when the segfault  
occurs.  Are there any suggestions on how best to do this?  Is  
there an easy way to attach gdb to one of the processes or  
something??  I have already compiled openmpi with debugging, memory  
profiling, etc.  How can I best take advantage of these features?


Thanks,
Daniel Hansen
Systems Administrator
BYU Fulton Supercomputing Lab

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] 1.2.2 to 1.2.7 differences.

2008-10-02 Thread Doug Reeder

Shafagh,

You should be able to google modules. It should take you to
http://modules.sourceforge.net. That is where the software is.


Doug
On Oct 1, 2008, at 9:32 PM, Shafagh Jafer wrote:

Could you please be specific on what I should google? Please give
me the keywords. I couldn't hit the target :<


--- On Wed, 10/1/08, Doug Reeder <d...@rain.org> wrote:
From: Doug Reeder <d...@rain.org>
Subject: Re: [OMPI users] 1.2.2 to 1.2.7 differences.
To: "Open MPI Users" <us...@open-mpi.org>
Date: Wednesday, October 1, 2008, 8:58 PM

Shafagh,

You should be able to run whatever version of open-mpi you want.
You just need to make sure that in the build and run steps you
don't mix the two. I have had good results using modules (you can
google it, download it, build and install it) to keep them
separate. You probably want to upgrade to gcc 3.x.x or 4.x.x and
use modules for that also.


Doug Reeder
On Oct 1, 2008, at 8:11 PM, Shafagh Jafer wrote:

On our cluster we have RedHat Linux 7.3 Professional, and the
cluster specification says the following:

-The cluster should be able to run the following software tools:
gcc 2.96.x (or 2.95.x or 2.91.66)
Bison 1.28
flex 2.5.4
mpich 1.2.5
So I am just wondering if my cluster is capable of running openmpi
1.2.7? I haven't contacted the cluster technicians yet but I just
wanted to know your answer first. Many thanks in advance.



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] 1.2.2 to 1.2.7 differences.

2008-10-02 Thread Doug Reeder

Shafagh,

You should be able to run whatever version of open-mpi you want. You
just need to make sure that in the build and run steps you don't
mix the two. I have had good results using modules (you can google
it, download it, build and install it) to keep them separate. You
probably want to upgrade to gcc 3.x.x or 4.x.x and use modules for
that also.
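
Once modules is installed, day-to-day use looks roughly like this
(module names are illustrative):

  module avail                       # list installed compiler/MPI modules
  module load gcc-3.4 openmpi-1.2.7
  which mpicc mpirun                 # should point at the selected install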


Doug Reeder
On Oct 1, 2008, at 8:11 PM, Shafagh Jafer wrote:

On our cluster we have RedHat Linux 7.3 Professional, and the
cluster specification says the following:

-The cluster should be able to run the following software tools:
gcc 2.96.x (or 2.95.x or 2.91.66)
Bison 1.28
flex 2.5.4
mpich 1.2.5
So I am just wondering if my cluster is capable of running openmpi
1.2.7? I haven't contacted the cluster technicians yet but I just
wanted to know your answer first. Many thanks in advance.



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Doug Reeder
It sounds like you may not have set up passwordless ssh between all
your nodes.
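
Setting that up usually amounts to something like the following, run
as the user who submits jobs (illustrative; assumes a shared home
directory across the nodes):

  ssh-keygen -t rsa                                # accept defaults, empty passphrase
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  ssh node001 hostname                             # should not ask for a password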


Doug Reeder
On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:


At 10:45 PM 9/29/2008 +0200, you wrote:

Am 29.09.2008 um 22:33 schrieb Zhiliang Hu:


At 07:37 PM 9/29/2008 +0200, Reuti wrote:


"-l nodes=6:ppn=2" is all I have to specify the node requests:


this might help: http://www.open-mpi.org/faq/?category=tm


Essentially the examples given on this web page are no different from
what I did.
The only new thing is "qsub -I", which I suppose is for interactive mode.
When I did this:

 qsub -I -l nodes=7 mpiblastn.sh

It hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
start".



UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
where "mpi_program" is a file with one line:
/path/to/mpirun -np 12 /path/to/my_program


Can you please try this jobscript instead:

#!/bin/sh
set | grep PBS
/path/to/mpirun /path/to/my_program

All should be handled by Open MPI automatically. With the "set" bash
command you will get a list of all defined variables for further
analysis, where you can check for the variables set by Torque.

-- Reuti


"set | grep PBS" part had nothing in output.


Strange - you checked the .o and .e files of the job? - Reuti


There is nothing in either the -o or the -e output.  I had to kill the job.
I checked torque log, it shows (/var/spool/torque/server_logs):

09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing  
into default, state 1 hop 1
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued  
at request of z...@xxx.xxx.xxx, owner = z...@xxx.xxx.xxx, job name =  
mpiblastn.sh, queue = default
09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent  
command new
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job  
Modified at request of schedu...@xxx.xxx.xxx
09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted  
at request of z...@xxx.xxx.xxx
09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing  
from default, state EXITING
09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent  
command term
09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad  
attempt to connect from 172.16.100.1:1021 (address not trusted -  
check entry in server_priv/nodes)


where the server_priv/nodes has:
node001 np=4
node002 np=4
node003 np=4
node004 np=4
node005 np=4
node006 np=4
node007 np=4

which was set up by the vender.

What is "address not trusted"?

Zhiliang




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] compile openmpi with a gcc that is not default gcc??

2008-09-27 Thread Doug Reeder

Shafagh,

You could put the full paths to the 3.4.4 compiler in the configure
arguments. See ./configure --help.
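
For example (paths are illustrative for a gcc-3.4.4 installed under
your home directory):

  ./configure CC=$HOME/gcc-3.4.4/bin/gcc CXX=$HOME/gcc-3.4.4/bin/g++ \
      --prefix=$HOME/openmpi
  make all install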


Doug Reeder
On Sep 27, 2008, at 3:21 PM, Shafagh Jafer wrote:


I have a simple question:
My default gcc is 2.95.3, so I installed a newer version in my own  
home directory, it's gcc-3.4.4. Now I want to install openmpi and  
compile it with this new version. I don't know how to force it not  
to pick the default one. I want it to use the 3.4.4 version. Please  
let me know what to do exactly. Thanks.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Configure and Build ok, but mpi module not recognized?

2008-09-22 Thread Doug Reeder

Jeff,

I think that unless make all depends on make clean and make clean  
depends on Makefile, you have to manually run make clean and/or  
manually delete the module files.


Doug Reeder
On Sep 22, 2008, at 3:16 PM, Jeff Squyres wrote:


On Sep 22, 2008, at 6:08 PM, Brian Harker wrote:


Here's the config.log file...now that I look through it more
carefully, I see some errors that I didn't see when watching
./configure scroll by...still don't know what to do though.  :(


Not to worry; there are many tests in configure that are designed  
to fail.  So it's not a problem to see lots of failures in config.log.


I see that it did use ifort for both the F77 and F90 compilers;  
that's what I wanted to check with configure output and config.log.


Per Doug's comment, if OMPI is not re-compiling the Fortran module  
when you reconfigure with a new fortran compiler, that is likely a  
bug.  Can you "make clean all install" and see if it works?  If  
not, send all the output here (see http://www.open-mpi.org/ 
community/help/ for instructions; please compress).


--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Configure and Build ok, but mpi module not recognized?

2008-09-22 Thread Doug Reeder

Brian,

Try doing a make clean before doing the build with your new make file  
(from the new configure process). It looks like you are getting the  
leftover module files from the old makefile/compilers.
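
In other words, something like (the prefix is illustrative):

  make clean
  ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=$HOME/openmpi-intel
  make all install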


Doug reeder
On Sep 22, 2008, at 2:52 PM, Brian Harker wrote:


Ok, here's something funny/weird/stupid:

Looking at the actual mpi.mod module file in the $OPENMPI_HOME/lib
directory, the very first line is:
GFORTRAN module created from mpi.f90 on Fri Sep 19 14:01:27 2008

WTF!?  I specified that I wanted to use the ifort/icc/icpc compiler
suite when I installed (see my first post)!  Why would it create the
module with gfortran?  This would seem to be the source of my
troubles...



On Mon, Sep 22, 2008 at 11:27 AM, Gus Correa  
<g...@ldeo.columbia.edu> wrote:

Hi Brian and list

I read your original posting and Jeff's answers.

Here on CentOS from Rocks Cluster I have a "native" OpenMPI, with  
a mpi.mod,

compiled with gfortran.
Note that I don't even have gfortran installed!
This is besides the MPI versions (MPICH2 and OpenMPI)
I installed from scratch using combinations of ifort and pgi with  
gcc.
It may be that mpif90 is not picking the right mpi.mod, as Jeff  
suggested.

Something like this may be part of your problem.
A "locate mpi.mod" should show what your system has.

Have you tried to force the directory where mpi.mod is searched for?
Something like this:

/full/path/to/openmpi/bin/mpif90  -module
 /full/path/to/openmpi_mpi.mod_directory/   hello_f90.f90

The ifort man pages have the "-module" syntax details.

I hope this helps.

Gus Correa

--
-
Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
-


Brian Harker wrote:


Hi Gus-

Thanks for the input.  I have been using full path names to both the
wrapper compilers and mpiexec from the first day I had two MPI
implementations on my machine, depending on whether I want to use  
MPICH or

openMPI, but still the problem remains.  ARGG!

On Mon, Sep 22, 2008 at 9:40 AM, Gus Correa  
<g...@ldeo.columbia.edu> wrote:




Hello Brian and list

My confusing experiences with multiple MPI implementations
were fixed the day I decided to use full path names to the MPI  
compiler

wrappers (mpicc, mpif77, etc) at compile time,
and to the MPI job launcher (mpirun, mpiexec, and so on) at run  
time,
and to do this in a consistent fashion (using the tools from the  
same

install to compile and to run the programs).

Most Linux distributions come with built-in MPI implementations
(oftentimes more than one),
and so do commercial compilers and other tools.
You end up with a mess of different MPI versions on your  
"native" PATH,

as well as a variety of bin, lib, and include directories containing
different
MPI stuff.
The easy way around is to use full path names, particularly if you
install
yet another MPI implementation
from scratch.
Another way is to fix your PATH on your initialization files  
(.cshrc,

etc)
to point to your preferred implementation (put the appropriate bin
directory
ahead of everything else).
Yet another is to install the "environment modules" package on your
system
and use it consistently.

My two cents.

Gus Correa

--
--- 
--

Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
--- 
--



Brian Harker wrote:




I built and installed both MPICH2 and openMPI from source, so no
distribution packages or anything.  MPICH2 has the modules  
located in

/usr/local/include, which I assume would be found (since it's in my
path), were it not for specifying -I$OPENMPI_HOME/lib at  
compile time,

right?  I can't imagine that if you tell it where to look for the
correct modules, it would search through your path first before  
going
to where you tell it to go.  Or am I too optimistic?  Thanks  
again for

the input!

On Mon, Sep 22, 2008 at 8:58 AM, Jeff Squyres <jsquy...@cisco.com>
wrote:




On Sep 22, 2008, at 10:10 AM, Brian Harker wrote:





Thanks for the reply...crap, $HOME/openmpi/lib does contains  
all the

various lilbmpi* files as well as mpi.mod,




That should be correct.





but still get the same
error at compile-time.  Yes, I made sure to specifically  
build openMPI
with ifort 10.1.012, and did run the --showme command right  
after
installation to make sure the wrapper compiler was using  
ifort as

well.




Ok, good.





Before posting to this mailing list, I did uninstall and re- 
install
openMPI several times to make sure I had a clean install. 

Re: [OMPI users] How to get started?

2008-08-15 Thread Doug Reeder

Yes, I run it on my dual core apple notebook.
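
For learning, a plain two-process run on one machine is enough
(hello.c is whatever small test program you like):

  mpicc hello.c -o hello
  mpirun -np 2 ./hello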

Doug Reeder
On Aug 15, 2008, at 9:58 AM, Anugraha Sankaranarayanan wrote:



>>Are you talking about a single notebook or multiple? It doesn't make
sense to just have it on a single machine - unless you're building codes
that are going to go into a cluster.




I have an HP Compaq notebook with a dual-core processor. Can I use MPI
on this? For learning purposes?



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] mpirun on 8-way node with rsh

2008-08-03 Thread Doug Reeder

Pete,

I don't know why the behavior on an 8 processor machine differs with  
the machine file format/syntax. You don't need to specify a machine  
file on a single multiprocessor machine.


On your torque-scheduled cluster you shouldn't need a machine file for
openmpi. Openmpi should just use the number of processors you
requested from torque. It will communicate with torque to find out  
which ones to use.
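
Concretely (illustrative):

  # single multiprocessor machine, no machine file needed:
  mpirun -np 8 ./mpi-program

  # under torque/moab, request the slots in the job script and let
  # openmpi pick up the allocation from the environment:
  #PBS -l nodes=1:ppn=8
  mpirun ./mpi-program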


Doug Reeder
On Aug 3, 2008, at 10:45 AM, Pete Schmitt wrote:

I use the following:  mpirun -machinefile machine.file -np 8 ./mpi- 
program

and the machine file has the following:

t01
t01
t01
t01
t01
t01
t01
t01

I get the following error:

rm_12992: (0.632812) net_send: could not write to fd=4, errno = 32
rm_13053: (0.421875) net_send: could not write to fd=4, errno = 32
rm_l_3_13050: (0.636719) net_send: could not write to fd=5, errno = 32
rm_13114: (0.210938) net_send: could not write to fd=4, errno = 32
rm_12870: (1.066406) net_send: could not write to fd=4, errno = 32
rm_12931: (0.855469) net_send: could not write to fd=4, errno = 32
rm_l_4_13111: (0.425781) net_send: could not write to fd=5, errno = 32
rm_l_1_12929: (1.070312) net_send: could not write to fd=5, errno = 32
rm_l_2_12989: (0.859375) net_send: could not write to fd=5, errno = 32
rm_l_5_13172: (0.214844) net_send: could not write to fd=5, errno = 32
p0_12866: (5.285156) net_send: could not write to fd=4, errno = 32

If I use np=6 or less, it works fine.  It also works with 8 if the
machine.file just contains t01:8.
Since we want to submit this to a torque/moab cluster, it's not
possible to get the latter format.

The OS is a 64-bit RH 5.2.


--
Pete Schmitt
Technical Director:
 Discovery Cluster / Computational Genetics Lab
URL: http://discovery.dartmouth.edu
179M Berry Baker Library, HB 6224
Dartmouth College
Hanover, NH 03755

Dart: 603-646-8109
DHMC: 603-653-3598
Fax:  603-646-1042
Cell: 603-252-2452


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] getting fortran90 to compile

2008-07-13 Thread Doug Reeder

Zachary,

I believe you need to add F90=/usr/bin/gfortran-4.2 (or something
similar) to the configure arguments; FC= just gets f77 support.
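
For example (the prefix is illustrative, and the exact variable names
recognized vary by Open MPI version -- check ./configure --help):

  ./configure --prefix=$HOME/openmpi-1.2.6 \
      F77=/usr/bin/gfortran-4.2 FC=/usr/bin/gfortran-4.2 F90=/usr/bin/gfortran-4.2
  make all install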


Doug Reeder
On Jul 13, 2008, at 8:58 AM, zach wrote:


I installed openmpi like
./configure --prefix= FC=/usr/bin/gfortran-4.2
make all install

When i type
mpif90 file1.f90 file2.f90 file3.f90

I get
Unfortunately, this installation of Open MPI was not compiled with
Fortran 90 support.  As such, the mpif90 compiler is non-functional.

What am I doing wrong?

Zachary
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] gfortran bindings apparently not built on mac os leopard

2008-06-16 Thread Doug Reeder

Greg,

In your run_output file you don't appear to be using the openmpi  
versions that you built. From your make-install.out file it looks  
like your versions are in /usr/local/openmpi/1.2.6-gcc4.0/bin. You  
need to use that absolute path or prepend that path to your PATH  
environment variable.
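
For example (bash syntax):

  export PATH=/usr/local/openmpi/1.2.6-gcc4.0/bin:$PATH
  which mpif77 mpif90    # should now point at the build you installed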


Doug Reeder
On Jun 16, 2008, at 9:25 AM, Weirs, V Gregory wrote:




I am having trouble building mpif77/mpif90 with gfortran on Mac OS  
10.5. Or maybe just running. The configure, make all, and make  
install seemed to go just fine, finding my gfortran and apparently  
using it, but the scripts mpif77 and mpif90 give the error that my  
openmpi was not built with fortran bindings. Mpicc and mpicxx don’t  
give this error.  Ompi_info says the f77 and f90 bindings were built.


I know that OS X 10.5 comes with openmpi mpicc and mpicxx  
installed, but not fortran bindings, and I was careful to put the  
openmpi I built first in the path.


Some run output (mpif77 —version, ompi_info), config.log,  
configure.log, make.out, make-install.out are in the attached tarball.


Any clues?

Thanks,
Greg


--
V. Gregory Weirs
Sandia National Laboratoriesvgwe...@sandia.gov
P.O.Box 5800, MS 0378phone: 505 845 2032
Albuquerque, NM 87185-0378  fax: 505 284 0154

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Different CC for orte and opmi?

2008-06-10 Thread Doug Reeder

Ashley,

I had a similar situation linking to the intel libraries and used the  
following in the link step


-L/opt/intel/compiler_10.1/x86_64/lib -Wl,-non_shared -limf -lsvml - 
lintlc -Wl,-call_shared


This created binaries statically linked to the intel compiler  
libraries so I didn't have to push the intel libraries out to the  
nodes or worry about the LD_LIBRARY_PATH.
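
Applied to an Open MPI build, that might look something like this
(untested sketch; the library path and prefix are the ones from this
thread):

  ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.2.6/intel \
      LDFLAGS="-L/opt/intel/compiler_10.1/x86_64/lib -Wl,-non_shared -limf -lsvml -lintlc -Wl,-call_shared"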


Doug Reeder
On Jun 10, 2008, at 4:28 AM, Ashley Pittman wrote:



Sorry, I'll try and fill in the background.  I'm attempting to package
openmpi for a number of customers we have; whenever possible on our
clusters we use modules to provide users with a choice of MPI
environment.

I'm using the 1.2.6 stable release and have built the code twice, once
to /opt/openmpi-1.2.6/gnu and once to /opt/openmpi-1.2.6/intel. I have
created two module environments called openmpi-gnu and openmpi-intel and
am also using an existing one called intel-compiler.  The build was
successful in both cases.

If I load the openmpi-gnu module I can compile and run code using
mpicc/mpirun as expected, if I load openmpi-intel and intel-compiler I
find I can compile code but I get an error about missing libimf.so  
when

I try to run it (reproduced below).

The application *will* run if I add the line "module load
intel-compiler" to my bashrc as this allows orted to link.  What I  
think

I want to do is to compile the actual library with icc but to compile
orted with gcc so that I don't need to load the intel environment by
default.  I'm assuming that the link problems only exist with orted  
and
not with the actual application as the LD_LIBRARY_PATH is set  
correctly

in the shell which is launching the program.

Ashley Pittman.

sccomp@demo4-sles-10-1-fe:~/benchmarks/IMB_3.0/src> mpirun -H  
comp00,comp01 ./IMB-MPI1
/opt/openmpi-1.2.6/intel/bin/orted: error while loading shared  
libraries: libimf.so: cannot open shared object file: No such file  
or directory
/opt/openmpi-1.2.6/intel/bin/orted: error while loading shared  
libraries: libimf.so: cannot open shared object file: No such file  
or directory
[demo4-sles-10-1-fe:29303] ERROR: A daemon on node comp01 failed to  
start as expected.
[demo4-sles-10-1-fe:29303] ERROR: There may be more information  
available from

[demo4-sles-10-1-fe:29303] ERROR: the remote shell (see above).
[demo4-sles-10-1-fe:29303] ERROR: The daemon exited unexpectedly  
with status 127.
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file  
base/pls_base_orted_cmds.c at line 275
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file  
pls_rsh_module.c at line 1166
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file  
errmgr_hnp.c at line 90
[demo4-sles-10-1-fe:29303] ERROR: A daemon on node comp00 failed to  
start as expected.
[demo4-sles-10-1-fe:29303] ERROR: There may be more information  
available from

[demo4-sles-10-1-fe:29303] ERROR: the remote shell (see above).
[demo4-sles-10-1-fe:29303] ERROR: The daemon exited unexpectedly  
with status 127.
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file  
base/pls_base_orted_cmds.c at line 188
[demo4-sles-10-1-fe:29303] [0,0,0] ORTE_ERROR_LOG: Timeout in file  
pls_rsh_module.c at line 1198
-- 

mpirun was unable to cleanly terminate the daemons for this job.  
Returned value Timeout instead of ORTE_SUCCESS.
-- 



$ ldd /opt/openmpi-1.2.6/intel/bin/orted
linux-vdso.so.1 =>  (0x7fff877fe000)
libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- 
rte.so.0 (0x7fe97f3ac000)
libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- 
pal.so.0 (0x7fe97f239000)

libdl.so.2 => /lib64/libdl.so.2 (0x7fe97f135000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x7fe97f01f000)
libutil.so.1 => /lib64/libutil.so.1 (0x7fe97ef1c000)
libm.so.6 => /lib64/libm.so.6 (0x7fe97edc7000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fe97ecba000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fe97eba3000)
libc.so.6 => /lib64/libc.so.6 (0x7fe97e972000)
libimf.so => /opt/intel/compiler_10.1/x86_64/lib/libimf.so  
(0x7fe97e61)
libsvml.so => /opt/intel/compiler_10.1/x86_64/lib/ 
libsvml.so (0x7fe97e489000)
libintlc.so.5 => /opt/intel/compiler_10.1/x86_64/lib/ 
libintlc.so.5 (0x7fe97e35)

/lib64/ld-linux-x86-64.so.2 (0x7fe97f525000)
$ ssh comp00 ldd /opt/openmpi-1.2.6/intel/bin/orted
libopen-rte.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- 
rte.so.0 (0x2b1f0c0c5000)
libopen-pal.so.0 => /opt/openmpi-1.2.6/intel/lib/libopen- 
pal.so.0 (0x2b1f0c23e000)

libdl.so.2 => /lib64/libdl.so.2 (0x2b1f0c3bc000)
libnsl.so.1 => /lib64/libnsl.so.1 

Re: [OMPI users] Different CC for orte and opmi?

2008-06-09 Thread Doug Reeder

Ashley,

I am confused. In your first post you said orted fails, with link
errors, when you try to launch a job. From this I inferred that the
build and install steps for creating openmpi were successful. Was the
build/install step successful? If so, what dynamic libraries does ldd
say that orted is using?


Doug Reeder
On Jun 9, 2008, at 12:54 PM, Ashley Pittman wrote:



Putting aside any religious views I might have about static linking,
how would that help in this case?   It appears to be orted itself that
fails to link; I'm assuming that the application would actually run,
either because the LD_LIBRARY_PATH is set correctly on the front end or
because of the --prefix option to mpirun.

Or do you mean static linking of the tools?  I could go for that if
there is a configure option for it.

Ashley Pittman.

On Mon, 2008-06-09 at 08:27 -0700, Doug Reeder wrote:

Ashley,

It could work but I think you would be better off to try and
statically link the intel libraries.

Doug Reeder
On Jun 9, 2008, at 4:34 AM, Ashley Pittman wrote:



Is there a way to use a different compiler for the orte component  
and

the shared library component when using openmpi?  We are finding
that if
we use icc to compile openmpi then orted fails with link errors  
when I
try and launch a job as the intel environment isn't loaded by  
default.


We use the module command heavily and have modules for openmpi- 
gnu and

openmpi-intel as well as an intel_compiler module.  To use openmpi-
intel
we have to load intel_compiler by default on the compute nodes which
isn't ideal, is it possible to compile the orte component with  
gcc and

the library component with icc?

Yours,

Ashley Pittman,

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Different CC for orte and opmi?

2008-06-09 Thread Doug Reeder

Ashley,

It could work but I think you would be better off to try and  
statically link the intel libraries.


Doug Reeder
On Jun 9, 2008, at 4:34 AM, Ashley Pittman wrote:



Is there a way to use a different compiler for the orte component and
the shared library component when using openmpi?  We are finding  
that if

we use icc to compile openmpi then orted fails with link errors when I
try and launch a job as the intel environment isn't loaded by default.

We use the module command heavily and have modules for openmpi-gnu and
openmpi-intel as well as an intel_compiler module.  To use openmpi- 
intel

we have to load intel_compiler by default on the compute nodes which
isn't ideal, is it possible to compile the orte component with gcc and
the library component with icc?

Yours,

Ashley Pittman,

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Open MPI instructional videos

2008-06-06 Thread Doug Reeder

Jeff,

I believe that with quicktime-pro you can export the videos in  
several formats.


Doug Reeder
On Jun 3, 2008, at 1:48 PM, Jeff Squyres wrote:


On May 30, 2008, at 9:55 AM, Andreas Schäfer wrote:


I've never really dug into Open MPI's guts, not because I wasn't
interested, but mainly because the time required to get my bearings
seemed just too much. Until now. I've watched a couple of the videos
while coding and it was pretty awesome. Easy to understand,  
structured

and well spoken.


Good!  I'm glad you've found them useful.


- Do you like the format?
- Is the (slides+narration) format useful?


Yes, I like it a lot. I guess a pure podcast would be insufficient  
for

complex issues where you simply need diagrams.


That was definitely my thought here -- pictures can be worth a million
words, etc.


Maybe a small
suggestion: maybe it's just me, but I'd actually prefer (even) leaner
slides. Currently you're basically duplicating on screen what you're
saying, which is good when you're a nervous, mumbling college  
student

and might lose your audience somewhere. But when you're an experienced
speaker (which you obviously are), the audience rarely needs this
redundancy and might rather get confused when trying to digest both
streams of information (visual and auditory) simultaneously. But this
is of course a question of personal preference.


Thanks for the compliment snuggled in there.  :-)

Yes, this might be a style thing -- I have found that at least some
people like to have slides that are more-or-less what the speaker
actually said for two reasons:

- so that the visuals and audio agree with each other -- it's not two
different thought processes while you're trying to absorb the
information.  Sure, some people read ahead on the slide and get bored
because the speaker eventually catches up, but at least in my
experience, these people are a minority.

- more importantly, however, the audience likes to take the slides
away and when they actually look at them 6 weeks after the lecture,
they might actually remember the content better because they received
the same information via two forms of sensory input (audio + visual).


- Would terminal screen-scrape sessions be useful?


I'd prefer how-to pages for this, as you can copy the commands
directly into your own shell.


Good point.


- ...other [low-budget] suggestions?


Maybe an a tad higher audio bitrate. And some people don't like the
.mov format, but that isn't really important.



Ok, I can bump up the audio rate and see what happens to the filesize
(that was my prime concern, actually).  Plus it *is* just the builtin
microphone on my Mac, so it may not be the greatest sound quality to
begin with.  :-)

As for .mov, yes, this is definitely a compromise.  I tried uploading
the videos to YouTube and Google Video and a few others, but a) most
have a time or file size restriction (e.g., 10 mins max) -- I was not
willing to spend the extra work to split up the videos into multiple
segments, and b) they down-res'ed the videos so much as to make the
slides look crappy and/or unreadable.  So I had to go with the video
encoder that I could get for darn little money (Cisco's a big company,
but my budget is still tiny :-) ).  That turned out to be a fun little
program called iShowU for OS X that does screen scraping + audio
capture.  It outputs Quicktime movies, so that was really my only
choice.

Is it a real hardship for people to install the QT player?  Are there
easy-to-install convertors?  I'm not opposed to hosting it in multiple
formats if it's easy and free to convert them.

--
Jeff Squyres
Cisco Systems


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] openmpi 32-bit g++ compilation issue

2008-05-19 Thread Doug Reeder

Arif,

It looks like your system is 64 bit by default and it therefore
doesn't pick up the 32 bit libraries automatically at the link step
(note the -L/.../x86_64-suse-linux/lib entries prior to the
corresponding entries pointing to the 32 bit library versions). I don't
use suse linux so I don't know if this is something you can control
in the configure step for open-mpi.
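
One thing you could try is forcing every compile and link step in the
openmpi build to 32-bit (untested sketch; the prefix is the one from
your log):

  ./configure CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 LDFLAGS=-m32 \
      --prefix=/opt/openmpi/1.2.6/gnu_4.1.2/32
  make all install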


Doug Reeder
On May 19, 2008, at 2:48 PM, Arif Ali wrote:


Hi,

OS: SLES10 SP1
OFED: 1.3
openmpi: 1.2 1.2.5 1.2.6
compilers: gcc g++ gfortran

I am creating a 32-bit build of openmpi on an Infiniband cluster,
and the compilation gets stuck. If I use the /usr/lib64/gcc/x86_64-
suse-linux/4.1.2/32/libstdc++.so library manually it compiles that
piece of code. I was wondering if anyone else has had this problem,
or if there is any other way of getting this to work. I feel that
there may be something very silly here that I have missed, but
I can't seem to spot it.


I have also tried this on a fresh install of OFED 1.3 with openmpi  
1.2.6



libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../../opal/include - 
I../../../orte/include -I../../../ompi/include - 
DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 - 
I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT file.lo  
-MD -MP -MF .deps/file.Tpo -c file.cc  -fPIC -DPIC -o .libs/file.o

depbase=`echo win.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`;\
/bin/sh ../../../libtool --tag=CXX   --mode=compile g++ - 
DHAVE_CONFIG_H -I. -I../../../opal/include -I../../../orte/include - 
I../../../ompi/include  -DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 - 
DOMPI_SKIP_MPICXX=1 -I../../..-O3 -DNDEBUG -m32 -finline- 
functions -pthread -MT win.lo -MD -MP -MF $depbase.Tpo -c -o win.lo  
win.cc &&\

mv -f $depbase.Tpo $depbase.Plo
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../../../opal/include - 
I../../../orte/include -I../../../ompi/include - 
DOMPI_BUILDING_CXX_BINDINGS_LIBRARY=1 -DOMPI_SKIP_MPICXX=1 - 
I../../.. -O3 -DNDEBUG -m32 -finline-functions -pthread -MT win.lo - 
MD -MP -MF .deps/win.Tpo -c win.cc  -fPIC -DPIC -o .libs/win.o
/bin/sh ../../../libtool --tag=CXX   --mode=link g++  -O3 -DNDEBUG - 
m32 -finline-functions -pthread  -export-dynamic -m32  -o  
libmpi_cxx.la -rpath /opt/openmpi/1.2.6/gnu_4.1.2/32/lib mpicxx.lo  
intercepts.lo comm.lo datatype.lo file.lo win.lo  -lnsl -lutil  -lm
libtool: link: g++ -shared -nostdlib /usr/lib64/gcc/x86_64-suse- 
linux/4.1.2/../../../../lib/crti.o /usr/lib64/gcc/x86_64-suse-linux/ 
4.1.2/32/crtbeginS.o  .libs/mpicxx.o .libs/intercepts.o .libs/ 
comm.o .libs/datatype.o .libs/file.o .libs/win.o   -Wl,-rpath -Wl,/ 
usr/lib64/gcc/x86_64-suse-linux/4.1.2 -Wl,-rpath -Wl,/usr/lib64/gcc/ 
x86_64-suse-linux/4.1.2 -lnsl -lutil -L/usr/lib64/gcc/x86_64-suse- 
linux/4.1.2/32 -L/usr/lib64/gcc/x86_64-suse-linux/4.1.2/../../../../ 
x86_64-suse-linux/lib/../lib -L/usr/lib64/gcc/x86_64-suse-linux/ 
4.1.2/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib64/ 
gcc/x86_64-suse-linux/4.1.2 -L/usr/lib64/gcc/x86_64-suse-linux/ 
4.1.2/../../../../x86_64-suse-linux/lib -L/usr/lib64/gcc/x86_64- 
suse-linux/4.1.2/../../.. /usr/lib64/gcc/x86_64-suse-linux/4.1.2/ 
libstdc++.so -lm -lpthread -lc -lgcc_s /usr/lib64/gcc/x86_64-suse- 
linux/4.1.2/32/crtendS.o /usr/lib64/gcc/x86_64-suse-linux/ 
4.1.2/../../../../lib/crtn.o  -m32 -pthread -m32   -pthread -Wl,- 
soname -Wl,libmpi_cxx.so.0 -o .libs/libmpi_cxx.so.0.0.0
/usr/lib64/gcc/x86_64-suse-linux/4.1.2/libstdc++.so: could not read  
symbols: File in wrong format

collect2: ld returned 1 exit status
--
Arif Ali
Software Engineer
OCF plc

Mobile: +44 (0)7970 148 122
DDI:+44 (0)114 257 2240
Office: +44 (0)114 257 2200
Fax:+44 (0)114 257 0022
Email:  a...@ocf.co.uk
Web:http://www.ocf.co.uk

Support Phone:   +44 (0)845 702 3829
Support E-mail:  supp...@ocf.co.uk

Skype:  arif_ali80
MSN:a...@ocf.co.uk

This email is confidential in that it is intended for the exclusive
attention of the addressee(s) indicated. If you are not the intended
recipient, this email should not be read or disclosed to any other
person. Please notify the sender immediately and delete this email  
from

your computer system. Any opinions expressed are not necessarily those
of the company from which this email was sent and, whilst to the  
best of

our knowledge no viruses or defects exist, no responsibility can be
accepted for any loss or damage arising from its receipt or subsequent
use of this email.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard

2008-05-07 Thread Doug Reeder

Linwei,

Did you build the liblapack.a file? It is of the wrong architecture.
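
You can check what is actually in the archive with (path taken from
your output):

  file /Users/maomaowlw/ATLAS/build/lib/liblapack.a
  lipo -info /Users/maomaowlw/ATLAS/build/lib/liblapack.a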

Doug Reeder
On May 7, 2008, at 2:58 PM, Linwei Wang wrote:


Hi, Doug

I've checked the makefiles and made sure that the -m64 flag is used for
all the compiling,
but the error still exists.


Linwei

On May 7, 2008, at 5:33 PM, Doug Reeder wrote:


Linwei,

It looks like you are getting a mix of 32 and 64 bit code (hence the
'file is not of required architecture' error). Are you using the
command line flag -m64 for some parts of the build and not for
others. You need to use either -m32 or -m64 for all the builds.

Doug Reeder
On May 7, 2008, at 2:25 PM, Linwei Wang wrote:


Dear sir,

  Thanks very much for your detailed guideline~
  I'm now trying to follow it out~
  I've installed gcc 4.3 & openmpi~
  When compiling CLAPACK, I'm trying to use the optimized BLAS
library by ATLAS, so I set the BLASLIB in the make.inc as:
  BLASLIB = ../../libcblaswr.a -lcblas -latlas
  then build the libraries  (also before that, I built the f2clib
following the guideline in netlib
  It went well, but when I tried to build the blas testing code, it
generates errors for "undefined symbols"
  looks like those should be in the f2clib, but I already built
it
 "gccsblat2.o  \
 ../../F2CLIBS/libf2c.a -lm  -o ../xblat2s
Undefined symbols:
  "_f2c_ssbmv", referenced from:
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schk2_ in sblat2.o
  "_f2c_sgbmv", referenced from:
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schke_ in sblat2.o
  _schk1_ in sblat2.o
..."

On the other side, when compiling ATLAS, I did the configure as you
said and "make build" went well.
But when I tried "make check" for testing, it again give errors for
"undefined symbols"...

"d: warning in /Users/maomaowlw/ATLAS/build/lib/liblapack.a, file is
not of required architecture
Undefined symbols:
  "_ATL_slauum", referenced from:
  _test_inv in sinvtst.o
  "_ATL_strtri", referenced from:
  _test_inv in sinvtst.o
  "_ATL_spotrf", referenced from:
  _test_inv in sinvtst.o
  "_ATL_sgetrf", referenced from:
  _test_inv in sinvtst.o
  "_ATL_sgetri", referenced from:
  _test_inv in sinvtst.o
"

I'm not sure where is the problem? Can you provide any help?

Thanks again!

Linwei


On May 6, 2008, at 11:11 AM, Gregory John Orris wrote:


Points to clarify if I may, having gone through this relatively
recently:
g77 and gfortran are NOT one and the same.
gfortran from sourceforge works well, but it is based on gnu gcc  
4.3

and not on the gnu gcc 4.0.1 that comes with Leopard.
Your best bet is to download the ENTIRE gcc package from  
sourceforge

and install it into /usr/local. This includes gcc, g++, and
gfortran.

Then you will need to do a number of things to actually get a
reliable
set of packages all compiled from the same version of gcc 4.3.
Why? Because 4.3 seems to be notoriously faster. AND, I had a  
lot of
problems integrating the 4.0.1 libs with the 4.3 libs without  
errors

1. download CLAPACK-3.1.1 from netlib And compile
2. Download ATLAS-1.8 from sourceforge (netlib is a little behind
here) and configure it with the --with-netlib-lapack=your just
compiled lapack from CLAPACK
3. Download OpenMPI 1.2.6 and install it also so that openMPI will
have the fortran not installed with Leopard.
4. NOW you can compile BLACS and ScaLAPACK

In all of this you will need to do a couple of additional things
like
set the env's
setenv LDFLAGS "-L/usr/local/lib/x86_64"
setenv DYLD_LIBRARY_PATH "your openmpi path"
setenv LD_LIBRARY_PATH "your openmpi path"

Do all this right and make sure you compile with the -m64 -
mtune=core2
flags and you will be golden.

So what will you have---
A new cblas, atlas, lapack, openmpi, fortran, c, c++, blacs, and
scalapack.
All on the same version of gnu c.

Alternatively you can buy and use the intel compiler. It is
significantly faster than gfortran, but it has a host of other
problems associated with it.
But if you follow the outline above,  you will be left with the  
best

that's available. I have lots more info on this, but time is short.

FINALLY, and this is important, DO NOT FORGET ABOUT THE small STACK
size on Mac's when using gfortran. It's so small that it's useless
for
large parallel jobs.


On May 6, 2008, at 10:09 AM, Jeff Squyres wrote:


FWIW, I'm not a fortran expert, but if you built your Fortran
libraries with g77 and then tried to link against them with
gfortran,
you might run into problems.

My advice would be to use a single fortran compiler for building
everything: Open

Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard

2008-05-07 Thread Doug Reeder

Linwei,

It looks like you are getting a mix of 32 and 64 bit code (hence the  
'file is not of required architecture' error). Are you using the  
command line flag -m64 for some parts of the build and not for  
others. You need to use either -m32 or -m64 for all the builds.


Doug Reeder
On May 7, 2008, at 2:25 PM, Linwei Wang wrote:


Dear sir,

   Thanks very much for your detailed guideline~
   I'm now trying to follow it out~
   I've installed gcc 4.3 & openmpi~
   When compiling CLAPACK, I'm trying to use the optimized BLAS
library by ATLAS, so I set the BLASLIB in the make.inc as:
   BLASLIB = ../../libcblaswr.a -lcblas -latlas
   then build the libraries  (also before that, I built the f2clib
following the guideline in netlib
   It went well, but when I tried to build the blas testing code, it
generates errors for "undefined symbols"
   looks like those should be in the f2clib, but I already built  
it

  "gccsblat2.o  \
  ../../F2CLIBS/libf2c.a -lm  -o ../xblat2s
Undefined symbols:
   "_f2c_ssbmv", referenced from:
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schk2_ in sblat2.o
   "_f2c_sgbmv", referenced from:
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schke_ in sblat2.o
   _schk1_ in sblat2.o
..."

On the other side, when compiling ATLAS, I did the configure as you
said and "make build" went well.
But when I tried "make check" for testing, it again give errors for
"undefined symbols"...

"d: warning in /Users/maomaowlw/ATLAS/build/lib/liblapack.a, file is
not of required architecture
Undefined symbols:
   "_ATL_slauum", referenced from:
   _test_inv in sinvtst.o
   "_ATL_strtri", referenced from:
   _test_inv in sinvtst.o
   "_ATL_spotrf", referenced from:
   _test_inv in sinvtst.o
   "_ATL_sgetrf", referenced from:
   _test_inv in sinvtst.o
   "_ATL_sgetri", referenced from:
   _test_inv in sinvtst.o
"

I'm not sure where is the problem? Can you provide any help?

Thanks again!

Linwei


On May 6, 2008, at 11:11 AM, Gregory John Orris wrote:


Points to clarify if I may, having gone through this relatively
recently:
g77 and gfortran are NOT one and the same.
gfortran from sourceforge works well, but it is based on gnu gcc 4.3
and not on the gnu gcc 4.0.1 that comes with Leopard.
Your best bet is to download the ENTIRE gcc package from sourceforge
and install it into /usr/local. This includes gcc, g++, and gfortran.

Then you will need to do a number of things to actually get a  
reliable

set of packages all compiled from the same version of gcc 4.3.
Why? Because 4.3 seems to be notoriously faster. AND, I had a lot of
problems integrating the 4.0.1 libs with the 4.3 libs without errors
1. download CLAPACK-3.1.1 from netlib And compile
2. Download ATLAS-1.8 from sourceforge (netlib is a little behind
here) and configure it with the --with-netlib-lapack=your just
compiled lapack from CLAPACK
3. Download OpenMPI 1.2.6 and install it also so that openMPI will
have the fortran not installed with Leopard.
4. NOW you can compile BLACS and ScaLAPACK

In all of this you will need to do a couple of additional things like
set the env's
setenv LDFLAGS "-L/usr/local/lib/x86_64"
setenv DYLD_LIBRARY_PATH "your openmpi path"
setenv LD_LIBRARY_PATH "your openmpi path"

Do all this right and make sure you compile with the -m64 - 
mtune=core2

flags and you will be golden.

So what will you have---
A new cblas, atlas, lapack, openmpi, fortran, c, c++, blacs, and
scalapack.
All on the same version of gnu c.

Alternatively you can buy and use the intel compiler. It is
significantly faster than gfortran, but it has a host of other
problems associated with it.
But if you follow the outline above,  you will be left with the best
that's available. I have lots more info on this, but time is short.

FINALLY, and this is important, DO NOT FORGET ABOUT THE small STACK
size on Mac's when using gfortran. It's so small that it's useless  
for

large parallel jobs.


On May 6, 2008, at 10:09 AM, Jeff Squyres wrote:


FWIW, I'm not a fortran expert, but if you built your Fortran
libraries with g77 and then tried to link against them with  
gfortran,

you might run into problems.

My advice would be to use a single fortran compiler for building
everything: Open MPI, your libraries, your apps.  I prefer gfortran
because it's more modern, but I have not done any performance
evaluations of gfortran vs. g77 -- I have heard [unverified]
anecdotes
that gfortran is "slower" than g77 -- google around and see what the
recent buzz is.

FW

Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard

2008-05-05 Thread Doug Reeder

Linwei,

Have you tried using -funderscoring with gfortran? I don't think the
trouble you are having is caused by having g77 and gfortran both
installed.


Do you know where the unreferenced symbols (_s_wsle, _e_wsle, etc.)
are supposed to be coming from? If they are in your fortran programs
then using -funderscoring should help.


Doug Reeder
On May 5, 2008, at 11:21 AM, Linwei Wang wrote:


Dear Reeder,

   I've tried adding the gfortran flag "-fno-underscoring", but the same
errors persist...
Is that possibly because I have both g77 and gfortran on my
computer?

   Best,
   Linwei

On May 5, 2008, at 1:17 PM, Doug Reeder wrote:


Linwei,

Is there a problem with trailing underscores? Are you linking c/c++
files with fortran? Do the _s_wsle family members need to have a
trailing underscore?

Where are the unreferenced symbols supposed to be coming from? If they
have a trailing underscore in their names you probably need to add a
command-line flag to your fortran command to append the underscore.

Doug Reeder
On May 5, 2008, at 10:12 AM, Linwei Wang wrote:


Dear Dr. Simon,

Do I need to remove g77 from my computer then? Since after  
installing

gfortran (for Leopard), there is some link problem with gfortran..
When I try to build some routines in the BLACS, it gives error like:

Undefined symbols:
  "_s_wsle", referenced from:
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  "_e_wsle", referenced from:
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  "_do_lio", referenced from:
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  _MAIN__ in tc_fCsameF77.o
  "_s_stop", referenced from:
  _MAIN__ in tc_fCsameF77.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

For some which are successfully built, they cannot be run either,
giving errors like:

iris-wl03:14541] *** Process received signal ***
[iris-wl03:14541] Signal: Bus error (10)
[iris-wl03:14541] Signal code:  (2)
[iris-wl03:14541] Failing at address: 0xe3
[iris-wl03:14541] [ 0] 2   libSystem.B.dylib
0x955f45eb _sigtramp + 43
[iris-wl03:14541] [ 1] 3   ???
0x 0x0 + 4294967295
[iris-wl03:14541] [ 2] 4   xcmpi_sane
0x1cc3 main + 51
[iris-wl03:14541] [ 3] 5   xcmpi_sane
0x1c56 start + 54
[iris-wl03:14541] *** End of error message ***
mpirun noticed that job rank 0 with PID 14541 on node iris-wl03.rit.edu
exited on signal 10 (Bus error).



The second problem happens when I use g77 too, but there were no
linking problems with g77...

Thanks for any help!

Best,
Linwei


On May 2, 2008, at 7:04 AM, Christian Simon wrote:


Dear Linwei,

On 1 May 08, at 20:32, Linwei Wang wrote:


other type at (1) [info -f g77 M GLOBALS]


What compiler are you using?
--
Dr. Christian SIMON
Laboratoire LI2C-UMR7612
Universite Pierre et Marie Curie
Case 51
4 Place Jussieu
75252 Paris Cedex 05
France/Europe

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard

2008-05-01 Thread Doug Reeder

Linwei,

mpif.h is the include file for fortran programs to use openmpi. The
apple version does not support fortran. If you want to use openmpi
from fortran you will need to install a version of openmpi that
supports fortran; this will install mpif.h. I suggest you install the
new version in a different directory than the apple version (use
--prefix in the openmpi configure command). You will also need to
remove the apple version or rename the openmpi include and library
files so that the linker can find your new, fortran-supporting version.
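
A minimal sketch of such a build (the version number and install
prefix are only examples; adjust them to the tarball you actually
download):

  ./configure --prefix=/usr/local/openmpi-1.2.6 F77=gfortran FC=gfortran
  make all
  sudo make install
  export PATH=/usr/local/openmpi-1.2.6/bin:$PATH   # put the new wrappers ahead of Apple's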


Doug Reeder
On May 1, 2008, at 8:42 AM, Linwei Wang wrote:


Dear all,

I'm new to openmpi. I'm now trying to use BLACS and ScaLAPACK on
Leopard.  Since it has built-in Open MPI, I didn't install any other
versions. I followed the BLACS install guidances in FAQ section, and
it generated errors as:

"No rule to make target `/usr/include/mpif.h', needed by
`mpif.h'.  Stop."

The problem is I could not find "mpif.h" on my computer. Does this
mean I should install another Open MPI version rather than using
Leopard's built-in version?

   Thanks for the help!

   Best,
  Linwei

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] trouble building on a macbook

2008-04-28 Thread Doug Reeder

Robert,

Did you mean to install openmpi-1.2.6 in /usr? That is where the
apple-supplied openmpi-1.2.3 is installed. That doesn't appear to
be the problem causing your make install error. Were there any
warnings or errors when you ran make?


Doug Reeder
On Apr 27, 2008, at 1:11 PM, Robert Taylor wrote:


I have had trouble building on a macbook running OS X 10.5.2.

Specifically it fails after the configure when I run make all -- files attached.

Is this the right place to get help?  I do note that the ompi_config.h
is in ompi/include, not share/include.

rlt
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] install intel mac with Leopard

2008-04-25 Thread Doug Reeder

Jeff,

I think that error message is a good compromise and addresses the  
most common problems that cause it to be written.


Doug Reeder
On Apr 25, 2008, at 4:08 AM, Jeff Squyres wrote:


Sorry, I should have been more specific: how about this?

**
It appears that your Fortran 77 compiler is unable to link against
object files created by your C compiler.  This typically indicates
one of a few possibilities:

   - A conflict between CFLAGS and FFLAGS
   - A problem with your compiler installation(s)
   - Different default build options between compilers (e.g., C
 building for 32 bit and Fortran building for 64 bit)
   - Incompatible compilers

Such problems can usually be solved by picking compatible compilers
and/or CFLAGS and FFLAGS.  More information (including exactly what
command was given to the compilers and what error resulted when the
commands were executed) is available in the config.log file in this
directory.
**

On Apr 25, 2008, at 7:00 AM, Jeff Squyres wrote:


How about a compromise -- I'll extend the message to also include the
possibility of architecture mismatches.



--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] install intel mac with Leopard

2008-04-24 Thread Doug Reeder

Jeff,

I don't know if there is a way to capture the "not of required
architecture" response and add it to the error message. I agree that
the current error message captures the problem in broad terms and
points to the config.log file. It is just not very specific. If the
architecture problem can't be added to the error message then I
think we are stuck with what we have. If that is the case, is it
worthwhile to add this to the FAQ for building openmpi?


Doug
On Apr 24, 2008, at 9:34 AM, Jeff Squyres wrote:


On Apr 24, 2008, at 12:24 PM, George Bosilca wrote:


There are so many special errors that are compiler and operating
system dependent that there is no way to handle each of them
specifically. And even if it was possible, I will not use autoconf
if the resulting configure file was 100MB ...


More specifically, the error messages in config.log are mostly written
by the compiler/linker (i.e., redirect stdout/stderr from the command
line to config.log). We don't usually modify that -- the Autoconf Way
is that Autoconf is 100% responsible for config.log.


Additionally, I think the error message is more than clear. It
clearly state that the problem is coming from a mismatch between the
CFLAGS and FFLAGS. There is even a hint that one has to look in
config.log to find the real cause...


As George specifies, the stdout from configure is what we can most
directly affect, and that's why we chose to output this message:


* It appears that your Fortran 77 compiler is unable to link against
* object files created by your C compiler.  This generally indicates
* either a conflict between the options specified in CFLAGS and FFLAGS
* or a problem with the local compiler installation.  More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were  
executed) is

* available in the config.log file in this directory.


OMPI doesn't know *why* the test link failed; we just know that it
failed.  I agree with George that trying to put in compiler-specific
stdout/stderr analysis is a black hole that would be extraordinarily
difficult.

Do you have any suggestions for re-wording this message?  That's
probably the best that we can do.




 george.

On Apr 24, 2008, at 11:57 AM, Doug Reeder wrote:


Jeff,

For the specific problem of the gcc compiler creating i386 objects
and ifort creating x86_64 objects, in the config.log file it says

configure:26935: ifort -o conftest conftest.f conftest_c.o >&5
ld: warning in conftest_c.o, file is not of required architecture

If configure could pick up on this and write an error message
something like "Your C and fortran compilers are creating objects for
different architectures. You probably need to change your CFLAG or
FFLAG arguments to ensure that they are consistent" it would point
the user more directly to the real problem. Right now the information
is in the config.log file but it doesn't jump out at you.

Doug Reeder
On Apr 24, 2008, at 8:40 AM, Jeff Squyres wrote:


On Apr 24, 2008, at 11:07 AM, Doug Reeder wrote:


Make sure that your compilers are all creating code for the same
architecture (i386 or x86-64). ifort usually installs such that the
64 bit version of the compiler is the default while the apple gcc
compiler creates i386 output by default. Check the architecture of
the .o files with file *.o and if the gcc output needs to be x86_64
add the -m64 flag to the c and c++ flags. That has worked for me.
You shouldn't need the intel c/c++ compilers. I find the configure
error message to be a little bit cryptic and not very insightful.


Do you have a suggestion for a new configure error message?  I
thought
it was very clear, but then again, I'm one of the implementors...

checking if C and Fortran 77 are link compatible... no
**********************************************************************
* It appears that your Fortran 77 compiler is unable to link against
* object files created by your C compiler.  This generally indicates
* either a conflict between the options specified in CFLAGS and FFLAGS
* or a problem with the local compiler installation.  More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were executed) is
* available in the config.log file in this directory.
**********************************************************************
configure: error: C and Fortran 77 compilers are not link compatible.
Can not continue.




--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org

Re: [OMPI users] install intel mac with Leopard

2008-04-24 Thread Doug Reeder

Jeff,

For the specific problem of the gcc compiler creating i386 objects
and ifort creating x86_64 objects, in the config.log file it says

configure:26935: ifort -o conftest conftest.f conftest_c.o >&5
ld: warning in conftest_c.o, file is not of required architecture


If configure could pick up on this and write an error message  
something like "Your C and fortran compilers are creating objects for  
different architectures. You probably need to change your CFLAG or  
FFLAG arguments to ensure that they are consistent" it would point  
the user more directly to the real problem. Right now the information  
is in the config.log file but it doesn't jump out at you.
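
For reference, the check can be reproduced by hand along these lines
(the conftest names stand in for any C and Fortran objects):

  file conftest_c.o      # e.g. Mach-O object i386
  file conftest.o        # e.g. Mach-O 64-bit object x86_64
  # force the C side to 64 bit so it matches ifort's default:
  ./configure CC=gcc CXX=g++ F77=ifort FC=ifort CFLAGS=-m64 CXXFLAGS=-m64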


Doug Reeder
On Apr 24, 2008, at 8:40 AM, Jeff Squyres wrote:


On Apr 24, 2008, at 11:07 AM, Doug Reeder wrote:


Make sure that your compilers are all creating code for the same
architecture (i386 or x86-64). ifort usually installs such that the
64 bit version of the compiler is the default while the apple gcc
compiler creates i386 output by default. Check the architecture of
the .o files with file *.o and if the gcc output needs to be x86_64
add the -m64 flag to the c and c++ flags. That has worked for me.
You shouldn't need the intel c/c++ compilers. I find the configure
error message to be a little bit cryptic and not very insightful.


Do you have a suggestion for a new configure error message?  I thought
it was very clear, but then again, I'm one of the implementors...

checking if C and Fortran 77 are link compatible... no
**********************************************************************
* It appears that your Fortran 77 compiler is unable to link against
* object files created by your C compiler.  This generally indicates
* either a conflict between the options specified in CFLAGS and FFLAGS
* or a problem with the local compiler installation.  More
* information (including exactly what command was given to the
* compilers and what error resulted when the commands were executed) is
* available in the config.log file in this directory.
**********************************************************************
configure: error: C and Fortran 77 compilers are not link compatible.

Can not continue.




--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Problems with program-execution with OpenMPI: Orted: command not found

2008-04-22 Thread Doug Reeder

Stephan,

A couple things to try

Put -np 2 after -hostfile /home/stephan/mpd.hosts

put the command you want to run after -np 2
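
Putting those together, something like this (./my_program is a
placeholder for the executable you actually want to run):

  mpirun -hostfile /home/stephan/mpd.hosts -np 2 ./my_program
  # if orted is still not found on the remote nodes, point mpirun at the install tree:
  mpirun --prefix /home/stephan/openmpi-install -hostfile /home/stephan/mpd.hosts -np 2 ./my_program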

Good luck,

Doug Reeder

On Apr 21, 2008, at 11:56 PM, gildo@gmx.de wrote:


Dear all,

I wanted to compare MPICH and OpenMPI. MPICH works fine. So I  
installed OpenMPI the same way (configure, make, make install). The  
commands are found in the OpenMPI installation directory.


When I tried to run programs I was a little bit confused that
there does not seem to be a default hosts file like in MPICH. I
included it in the command with "--hostfile".


When I now want to run my first test with

  mpirun -np 2 --hostfile /home/stephan/mpd.hosts

I get the error-message:

  orted: command not found

The "orted"-executable resides as well as the "mpirun"- and  
"mpiexec"-executables in the directory /home/stephan/openmpi- 
install. "orted" is also found by "which orted".


What might be the problem? How does "orted" work? I'm not aware of
anything equivalent in MPICH...


Thanks in advance for your help!

Kind Regards

Stephan
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] remote host not accessible

2008-04-01 Thread Doug Reeder

Danesh,

The filesystem physically on the master, specifically including the
directory where you are running the open-mpi program, should be NFS
mounted by the slave machines. The absolute path name should be the
same on all machines. I don't know if that will fix your problem but
we had to do that on our linux clusters and os x clusters.
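
As a rough sketch, on each slave that would look something like this
(the server name and directory are placeholders for your own setup):

  sudo mkdir -p /home/mpi/work
  sudo mount -t nfs master:/home/mpi/work /home/mpi/work
  # or make it permanent with a line like this in /etc/fstab:
  # master:/home/mpi/work  /home/mpi/work  nfs  defaults  0 0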


Doug
On Apr 1, 2008, at 2:22 PM, Danesh Daroui wrote:


You mean I should mount NFS filesystems of the slave machines on master so
their disks can be accessed from a mount point on master? In that case,
what mount point on master
should it be? Should I configure open-MPI about this mount point? Can't
it work without mounting? I think it should work since the processes are
run locally via SSH on the remote machines.

D.


Doug Reeder skrev:

Danesh,

Do they all have access to the same file system/physical hard drive?
You will probably need to NFS mount the filesystem on master on the
other two systems.

Doug Reeder
On Apr 1, 2008, at 1:46 PM, Danesh Daroui wrote:



Hi all,

I have installed Open-MPI on three machines which run OpenSUSE and it
has been installed successfully. I can submit jobs locally on each
machine using "mpirun" and it works fine. I have defined a
host file on one of them (master) where I have defined the IP address of
each machine and the number of slots. First when I tried to submit jobs to
master it asked for a password for the SSH connection, which showed
that master can communicate with the slaves. Then I set up all machines to
communicate with each other using SSH without a password. Now when I
submit a job on master, the job just blocks and nothing
happens. The program runs locally on each machine but it will not run
when I submit it on master to be run on the slaves. What can it be?

D.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] remote host not accessible

2008-04-01 Thread Doug Reeder

Danesh,

Do they all have access to the same file system/physical hard drive?
You will probably need to NFS mount the filesystem on master on the
other two systems.


Doug Reeder
On Apr 1, 2008, at 1:46 PM, Danesh Daroui wrote:


Hi all,

I have installed Open-MPI on three machines which run OpenSUSE and it
has been installed successfully. I can submit jobs locally on each
machine using "mpirun" and it works fine. I have defined a
host file on one of them (master) where I have defined the IP address of
each machine and the number of slots. First when I tried to submit jobs to
master it asked for a password for the SSH connection, which showed
that master can communicate with the slaves. Then I set up all machines to
communicate with each other using SSH without a password. Now when I
submit a job on master, the job just blocks and nothing
happens. The program runs locally on each machine but it will not run
when I submit it on master to be run on the slaves. What can it be?

D.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Doug Reeder

Greg,

I would disagree with your statement that the available fortran  
options can't pass a cost-benefit analysis. I have found that for  
scientific programming (e.g., Livermore Fortran Kernels and actual  
PDE solvers) that code produced by the intel compiler runs 25 to 55%  
faster than code from gfortran or g95. Looking at the cost of adding  
processors with g95/gfortran to get the same throughput as with ifort  
you recover the $549 compiler cost real quickly.


Doug Reeder
On Mar 6, 2008, at 9:20 AM, Gregory John Orris wrote:


Sorry for the long delay in response.

Let's get back to the beginning:
My original compiler configuration was gcc from the standard
Leopard Developer Tools supplied off the installation DVD. This
version was 4.0.1. However, it has been significantly modified by
Apple to work with Leopard. If you haven't used Apple's Developer
Environment, you're missing out on something. It's pretty sweet.
But the price you pay for it is no fortran support (not usually a
problem for me but it is relevant here) and usually a somewhat
time-lagged compiler. I'm not as plugged into Apple as perhaps I should
be, but I can only imagine that their philosophy is to really over-test
their compiler. Gratis, Apple throws into its "frameworks" a
shared library called vecLib, which includes machine optimized BLAS
and CLAPACK routines. Also, with Leopard, Apple has integrated
open-mpi (yea!). But they have once again not included fortran support
(boo!).


Now, to get fortran on a Mac you have several options (most of
which cannot really survive the cost-benefit analysis of a
competent manager), but a perfectly fine freeware option is to get
it off of hpc.sourceforge.net. This version is based on gcc 4.3.0.
There are a few legitimate reasons to stick with Apple's older gcc,
as it's not really a good idea to try and mix libraries from one
compiler version with another. Especially here, because (without
knowing precisely what Apple has done) there is a tremendous
difference in execution speed of code written with gcc 4.0 and 4.1
as opposed to 4.2 and later. (This has been well documented on many
systems.) Also, out of a bit of laziness, I really didn't want to
go to the trouble of re-writing (or finding) all of the compiler
scripts in the Developer Environment to use the new gcc.


So, I compiled open-mpi-1.2.5 with gcc, g++ 4.0.1, and gfortran  
4.3. Then, I compiled BLACS and ScaLAPACK using the configuration  
from the open-mpi FAQ page. Everything compiles perfectly ok,  
independent of whether you choose 32 or 64 bit addressing. First  
problem was that I was still calling mpicc from the Apple supplied  
openmpi and mpif77 from the newly installed distribution. Once  
again, I've not a clue what Apple has done, but while the two would  
compile items together, they DO NOT COMMUNICATE properly in 64-bit  
mode. MPI_COMM_WORLD even in the test routines of openMPI would  
fail! This is the point at which I originated the message asking if  
anyone had gotten a 64-bit version to actually work. The errors  
were in libSystem and were not what I'd expect from a simple  
openmpi error. I believe this problem is caused by a difference in  
how pointers were/are treated within gcc from version to version.  
Thus mixing versions essentially caused failure within the Apple  
supplied openmpi distribution and the new one I installed.


How to get over this hurdle? Install the complete gcc 4.3.0 from  
the hpc.sourceforge.net site and recompile EVERYTHING!


You might think you were done here, but there is one (or actually  
four) additional problem(s). Now NONE of the complex routines  
worked. All of the test routines returned failure. And I tracked it  
down the the fact that pzdotc, pzdotu, pcdotc, and pcdotu inside of  
the PBLAS routines were failing. Potentially this was a much more  
difficult problem, since rewriting these codes is really not what  
I'm paid to do. Tracing down these errors further I found that the  
actual problem is with the zdotc, zdotu, cdotc, and cdotu BLAS  
routines inside of Apple's vecLib. So, the problem seemed as though  
a faulty manufacturer supplied and optimized library was not  
functioning properly. Well, as it turns out there is a peculiar  
difference (again) between versions of the gcc suite in how it  
regards returned values from complex fortran functions (I'm only
assuming this since the workaround was successful). This problem  
has been know for some time now (perhaps 4 years or more). See,   
http://developer.apple.com/hardware/ve/errata.html#fortran_conventions


How to get over this hurdle? Install ATLAS, CLAPACK, and CBLAS off  
the netlib.org web site, and compile them with the gcc 4.3.0 suite.


So, where am I now? BLACS and ScaLAPACK, and PBLAS work in 64-bit  
mode with CLAPACK-3.1.1, ATLAS 3.8.1, Open-MPI-1.2.5, and GCC 4.3.0  
and link with ATLAS and CLAPACK and NOT vecLib!


Long way

Re: [OMPI users] -prefix option to mpirun.

2008-03-04 Thread Doug Reeder

Ashley,

Could you define an alias for mpirun that includes -prefix and the
necessary argument?
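
For example, something along these lines in the users' shell startup
(the path is just an example matching your install):

  alias mpirun='/opt/openmpi-1.2.5/bin/mpirun --prefix /opt/openmpi-1.2.5'

or, if your modules package supports it, the equivalent set-alias line
in the modulefile.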


Doug Reeder
On Mar 4, 2008, at 6:28 AM, Ashley Pittman wrote:



Hello,

I work for a medium sized UK based ISV and am packaging open-mpi so that
it can be made available as an option to our users. So far I've been
very impressed by how smoothly things have gone, but I've got one problem
which doesn't seem to be covered by the FAQ.

We install openmpi to /opt/openmpi-1.2.5 and are using the modules
command to select which mpi to use, the modules command correctly sets
PATH to pick up mpicc and mpirun on the head node however the issue
comes with running a job, users need to specify -prefix on the mpirun
command line.  Is there a way to specify this in the environment so I
could make it happen automatically as part of the modules environment?

I've searched the archives for this; the closest I can find is this
exchange in 2006. If I specify a full path to mpirun then it does the
right thing, but is there a way to extend this functionality to the case
where mpirun is run from the path?
http://www.open-mpi.org/community/lists/users/2006/01/0480.php

Yours,  Ashley Pittman.

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] shared library problem with openmpi-1.2.3 and opensuse10.2

2008-02-20 Thread Doug Reeder

Yoshi,

Is the appropriate version of libgfortran.so.1 (32 bit or 64 bit) in
your LD_LIBRARY_PATH? What is the output from

ldd ./a.out

The version of libgfortran.so.1 it lists needs to be in your
LD_LIBRARY_PATH.

What does file ./a.out say? If it is an AMD x86-64 binary then you should
put /usr/lib64 in your LD_LIBRARY_PATH, otherwise put /usr/lib in your
LD_LIBRARY_PATH.
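
Concretely, a sketch of those checks (the /usr/lib64 path comes from
the find output quoted below):

  ldd ./a.out                                # which libraries it wants, and which are missing
  file ./a.out                               # 32-bit or 64-bit executable?
  export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
  mpirun -np 8 -x LD_LIBRARY_PATH ./a.out    # -x forwards the variable to the other ranks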


Doug Reeder

On Feb 19, 2008, at 10:00 PM, yoshi.plala wrote:


Dear sirs

I am a beginner with openmpi-1.2.3 (and opensuse10.2).
But, I have some experience with mpich-1.2 and FreeBSD5.4.

I am struggling to build scalapack, parallel-octave and matlab on top
of them.


I have now succeeded in installing intel fortran/c 10.0.026 and
openMPI-1.2.3.


like belows

#mkdir build
#cd build
#../configure --prefix=/opt/openmpi/1.2.3 --enable-mpi-threads  CC=icc
CXX=icpc F77=ifort FC=ifort
#make all
#make install

test@linux-4e1d:~> set |grep LD_
DYLD_LIBRARY_PATH=/opt/intel/cce/10.0.026/lib:/opt/intel/fce/10.0.026/lib
LD_LIBRARY_PATH=/opt/openmpi/1.2.3/lib:/opt/intel/cce/10.0.026/lib:/opt/intel/fce/10.0.026/lib
LD_RUN_PATH=/opt/openmpi/1.2.3/lib:/opt/intel/cce/10.0.026/lib:/opt/intel/fce/10.0.026/lib:/usr/lib64:/usr/lib64/gcc/x86_64-suse-linux/4.1.2
test@linux-4e1d:~>


hello_c worked without any trouble.
test@linux-4e1d:~/openmpi-1.2.3/examples> mpirun -np 8 hello_c -hostfile /opt/openmpi/1.2.3/etc/openmpi-default-hostfile
Hello, world, I am 7 of 8
Hello, world, I am 6 of 8
Hello, world, I am 4 of 8
Hello, world, I am 3 of 8
Hello, world, I am 5 of 8
Hello, world, I am 0 of 8
Hello, world, I am 2 of 8
Hello, world, I am 1 of 8
test@linux-4e1d:~/openmpi-1.2.3/examples>

But, my benchmark program doesn't work. Are there any mistakes in my
configuration?

test@linux-4e1d:~/himenoBMT/mpi> ls
README.txt  a.out  himenoBMTxpr.f  param.h  paramset.sh
test@linux-4e1d:~/himenoBMT/mpi> mpirun -np 8 ./a.out -hostfile /opt/openmpi/1.2.3/etc/openmpi-default-hostfile
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory
./a.out: error while loading shared libraries: libgfortran.so.1: cannot open shared object file: No such file or directory

[1]+  Stopped mpirun -np 8 ./a.out -hostfile /opt/openmpi/1.2.3/etc/openmpi-default-hostfile
test@linux-4e1d:~/himenoBMT/mpi>

linux-4e1d:/home/test/himenoBMT/mpi # find / -name libgfortran.so.1 -print
/usr/lib64/libgfortran.so.1
/usr/lib/libgfortran.so.1
/usr/local/matlab75/sys/os/glnxa64/libgfortran.so.1
linux-4e1d:/home/test/himenoBMT/mpi #


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Doug Reeder

Brock,

Do you mean flash memory, like a USB memory stick? What kind of file
system is on the memory? Is there some filesystem limit you are
bumping into?


Doug Reeder
On Jan 25, 2008, at 8:38 AM, Brock Palen wrote:


Is anyone using flash with openMPI?  We are here, but whenever it
tries to write its second checkpoint file it segfaults once it gets
to 2.2GB, always in the same location.

Debugging is a pain as it takes 3 days to get to that point.  Just
wondering if anyone else has seen this same behavior.


Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Tracing the library using gdb and xterm

2008-01-03 Thread Doug Reeder

Krishna,

Would it work to launch the gdb/ddd process separately on the remote
machine and then attach to the running mpi job from within gdb/ddd?
Something like

ssh -X [hostname|ip address] [ddd|gdb]
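
A rough sketch of that approach (the hostname, program path and PID
are placeholders; use whatever ps reports for the running rank):

  ssh -X remotehost
  ps -ef | grep peruse_ex1        # find the PID of the rank you want to inspect
  gdb /path/to/peruse_ex1 12345   # attach to it; or use 'gdb -p 12345'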

Doug Reeder
On Jan 3, 2008, at 8:32 AM, Jeff Squyres wrote:


Per my previous mail, Open MPI (by default) closes its ssh sessions
after the remote processes are launched, so X forwarding through ssh
will not work.

If it is possible (and I think it is, based on your subsequent
replies), you might be best served with unencrypted X forwarding.


On Jan 3, 2008, at 11:02 AM, Doug Reeder wrote:


Krishna,

Review the ssh and sshd man pages. When using ssh -X it takes care
of defining the DISPLAY and sending the X11 images to your screen.
Defining DISPLAY directly generally won't work (that is how you do it
with rlogin but not with ssh).

Doug Reeder
On Jan 3, 2008, at 1:54 AM, Krishna Chaitanya wrote:


Hi Rolf,
Thanks for that. There is still one minor problem,
though. The xwindow is getting spawned on the remote machine and
not on my local machine. It now looks like,
mpirun --prefix /usr/local -hostfile machines -x DISPLAY -x PATH -np 2 xterm -e gdb peruse_ex1
Please let me know what I can do to have it displayed
on my machine. I have the DISPLAY variable set to 0.0 on both the
machines and I am ssh-ing into the other machine by using the -X
switch.

Thanks,
Krishna Chaitanya


On 1/2/08, Rolf Vandevaart <rolf.vandeva...@sun.com> wrote: Krishna
Chaitanya wrote:

Hi,
   I have been tracing the interactions between the

PERUSE

and MPI library,on one machine. I have been using gdb along with

xterm

to have two windows open at the same time as I step through the

code. I

wish to get a better glimpse of the working of the point to point

calls,

by launching the job on two machines and by tracing the flow in a
similar manner. This is where I stand as of now :

mpirun --prefix /usr/local -hostfile machines -np 2 xterm -e gdb peruse_ex1

xterm Xt error: Can't open display:
xterm:  DISPLAY is not set

   I tried using the display option for xterm and

setting

the value as 0.0, that was not of much help.
   If someone can guide me as to where the DISPLAY

parameter

has to be set to allow the remote machine to open the xterm

window, it

will be of great help.

Thanks,
Krishna



I also do the following:

-x DISPLAY -x PATH

In this way, both your DISPLAY and PATH settings make it to the
remote node.

Rolf
--

=
rolf.vandeva...@sun.com
781-442-3043
=



--
In the middle of difficulty, lies opportunity
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Tracing the library using gdb and xterm

2008-01-03 Thread Doug Reeder

Krishna,

Review the ssh and sshd man pages. When using ssh -X it takes care of  
defining the DISPLAY and sending the X11 images to your screen.  
Defining DISPLAY directly generally won't work (that is how you do it
with rlogin but not with ssh).


Doug Reeder
On Jan 3, 2008, at 1:54 AM, Krishna Chaitanya wrote:


Hi Rolf,
Thanks for that. There is still one minor problem,
though. The xwindow is getting spawned on the remote machine and
not on my local machine. It now looks like,
mpirun --prefix /usr/local -hostfile machines -x DISPLAY -x PATH -np 2 xterm -e gdb peruse_ex1
Please let me know what I can do to have it displayed
on my machine. I have the DISPLAY variable set to 0.0 on both the
machines and I am ssh-ing into the other machine by using the -X
switch.


Thanks,
Krishna Chaitanya


On 1/2/08, Rolf Vandevaart <rolf.vandeva...@sun.com> wrote:
Krishna Chaitanya wrote:
> Hi,
>I have been tracing the interactions between the  
PERUSE
> and MPI library,on one machine. I have been using gdb along with  
xterm
> to have two windows open at the same time as I step through the  
code. I
> wish to get a better glimpse of the working of the point to point  
calls,

> by launching the job on two machines and by tracing the flow in a
> similar manner. This is where I stand as of now :
>
> mpirun --prefix /usr/local -hostfile machines -np 2 xterm -e gdb peruse_ex1

> xterm Xt error: Can't open display:
> xterm:  DISPLAY is not set
>
>I tried using the display option for xterm and  
setting

> the value as 0.0, that was not of much help.
>If someone can guide me as to where the DISPLAY  
parameter
> has to be set to allow the remote machine to open the xterm  
window, it

> will be of great help.
>
> Thanks,
> Krishna
>

I also do the following:

-x DISPLAY -x PATH

In this way, both your DISPLAY and PATH settings make it to the  
remote node.


Rolf
--

=
rolf.vandeva...@sun.com
781-442-3043
=



--
In the middle of difficulty, lies opportunity
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Tracing the library using gdb and xterm

2008-01-01 Thread Doug Reeder

Krishna,

If you are using ssh to connect to the second machine you need to be  
sure that ssh X11 forwarding is enabled and you may need to have mpi  
use ssh -X or ssh -Y to connect to the second machine. That is how  
the DISPLAY gets set using ssh.
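
In practice that looks something like this (remotehost is a
placeholder, and the sshd_config location can vary by system):

  # on the remote machine, /etc/ssh/sshd_config must contain:
  #   X11Forwarding yes
  ssh -X remotehost
  echo $DISPLAY    # ssh sets this itself, typically to something like localhost:10.0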


Doug Reeder

On Jan 1, 2008, at 8:11 AM, Krishna Chaitanya wrote:


Hi,
   I have been tracing the interactions between the  
PERUSE and MPI library,on one machine. I have been using gdb along  
with xterm to have two windows open at the same time as I step  
through the code. I wish to get a better glimpse of the working of  
the point to point calls, by launching the job on two machines and  
by tracing the flow in a similar manner. This is where I stand as  
of now :


mpirun --prefix /usr/local -hostfile machines -np 2 xterm -e gdb peruse_ex1

xterm Xt error: Can't open display:
xterm:  DISPLAY is not set

   I tried using the display option for xterm and  
setting the value as 0.0, that was not of much help.
   If someone can guide me as to where the DISPLAY  
parameter has to be set to allow the remote machine to open the  
xterm window, it will be of great help.


Thanks,
Krishna



--
In the middle of difficulty, lies opportunity
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] compiler warnings in openmpi-1.2.5rc2

2007-12-27 Thread Doug Reeder

Hello,

The attachment contains a short explanation of a compiler warning
using the gcc-4.3.0 compilers from hpc.sourceforge.net on OS X 10.5.1.
The warning doesn't occur when using the apple gcc-4.0.1 compilers.
This was on a mac/x86 machine.


Doug Reeder


openmpi.wrn
Description: Binary data